
Rohit K

Graduate Student
Johns Hopkins University
Baltimore, Maryland.

Email LinkedIn Github Scholar

Research

My research interests lie in the areas of signal processing, machine learning, deep learning, and neuroscience with applications to robust speech recognition, speech enhancement, auditory and behavioral neuroscience.

Thesis

  • Mask Estimator Approaches For Audio Beamforming
    MTech Thesis
    Instructor: Prof. Sriram Ganapathy

Publications

2021

  1. End-To-End Speech Recognition With Joint Dereverberation of Sub-Band Autoregressive Envelopes
     Rohit Kumar, Anurenjan Purushothaman, Anirudh Sreeram and Sriram Ganapathy
     Under Review

  2. Dereverberation of autoregressive envelopes for far-field speech recognition
     Anurenjan Purushothaman, Anirudh Sreeram, Rohit Kumar and Sriram Ganapathy
     Elsevier Computer Speech and Language

  3. SRIB-LEAP submission to Far-field Multi-Channel Speech Enhancement Challenge for Video Conferencing
     R G Prithvi Raj, Rohit Kumar, M K Jayesh, Anurenjan Purushothaman, Sriram Ganapathy and M A Basha Shaik
     INTERSPEECH 2021

  4. Towards sound based testing of COVID-19 - Summary of the first Diagnostics of COVID-19 using Acoustics (DiCOVA) Challenge
     Neeraj Kumar Sharma, Ananya Muguli, Prashant Krishnan, Rohit Kumar, Srikanth Raj Chetupalli and Sriram Ganapathy
     Elsevier Computer Speech and Language

  5. Multi-modal Point-of-Care Diagnostics for COVID-19 Based On Acoustics and Symptoms
     Srikanth Raj Chetupalli, Prashant Krishnan, Neeraj Sharma, Ananya Muguli, Rohit Kumar, Viral Nanda, Lancelot Mark Pinto, Prasanta Kumar Ghosh and Sriram Ganapathy
     Under Review

  6. DiCOVA Challenge: Dataset, task, and baseline system for COVID-19 diagnosis using acoustics
     Ananya Muguli, Lancelot Pinto, Nirmala R., Neeraj Sharma, Prashant Krishnan, Prasanta Kumar Ghosh, Rohit Kumar, Shrirama Bhat, Srikanth Raj Chetupalli, Sriram Ganapathy, Shreyas Ramoji, Viral Nanda
     INTERSPEECH 2021

  7. Investigating the Feature Selection and Explainability of COVID-19 Diagnostics from Cough Sounds
     Flavio Avila, Amir Hossein Poorjam, Deepak Mittal, Charles Dognin, Ananya Muguli, Rohit Kumar, Srikanth Raj Chetupalli, Sriram Ganapathy and Maneesh Singh
     INTERSPEECH 2021

2020

  1. Unsupervised Neural Mask Estimator for Generalized Eigen-Value Beamforming Based ASR
     Rohit Kumar, Anirudh Sreeram, Anurenjan Purushothaman and Sriram Ganapathy
     International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

  2. Coswara - A Database of Breathing, Cough, and Voice Sounds for COVID-19 Diagnosis
     Neeraj Sharma, Prashant Krishnan, Rohit Kumar, Shreyas Ramoji, Srikanth Raj Chetupalli, Nirmala R., Prasanta Kumar Ghosh, and Sriram Ganapathy
     INTERSPEECH 2020

  3. Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition
     Anurenjan Purushothaman, Anirudh Sreeram, Rohit Kumar, Sriram Ganapathy
     INTERSPEECH 2020

  4. LEAP Submission to CHiME-6 ASR Challenge
     Anirudh Sreeram, Anurenjan Purushothaman, Rohit Kumar, Sriram Ganapathy
     CHiME-6 Challenge


Other projects

  • END-TO-END DEEP NETWORK FOR IMAGE TO AUDIO CONVERSION
    EN.520.612.01.FA21 Machine Learning for Signal Processing
    Instructor: Prof. Najim Dehak
    Abstract:
    Image-to-speech synthesis plays a crucial role in assisting blind people: the goal is to synthesize intelligible and natural speech for a given input image. In this work, we propose an end-to-end encoder-decoder architecture for image-to-speech synthesis that generates audio directly from the image, without producing intermediate text. Further, we employ a transformer network to capture long-range dependencies in images for synthesizing better speech. We will analyze our model's performance by evaluating it on a standard benchmark dataset and comparing it with existing state-of-the-art methods.

  • FEATURE ANALYSIS IN AUDIO RECORDINGS FOR COVID-19 DETECTION
    EN.520.645.01.FA21 Audio Signal Processing
    Instructor: Prof. Mounya Elhilali
    Abstract:
    Current COVID-19 testing processes present many barriers, such as high cost, lack of availability, and time delay in the results. While the use of audio signals as biomarkers for COVID-19 detection has shown promise in recent work, there is still room for improvement in both signal analysis and classification performance for COVID-19 audio data. Through feature extraction and analysis of multiple audio modalities, this project attempts to provide a signal-level understanding of COVID-19 biomarkers in audio, allowing for better feature selection and classification performance going forward. To accomplish this, COVID-19 audio samples from breathing, cough, and speech recordings were obtained from the Coswara dataset (Sharma et al., 2020). Features outlined in the INTERSPEECH 2013 Computational Paralinguistics Evaluation baseline (Schuller et al., 2013) were extracted, including a set of low-level features along with their statistical functionals. Features were ranked and selected from each modality according to a two-part selection process. Finally, an in-depth analysis of the optimal features was performed in order to provide further insight into effective features for COVID-19 classification.
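As a toy illustration of the ranking step described above (not the project's actual two-part selection process), each feature can be scored by its absolute correlation with the binary label; the data and names below are synthetic:

```python
import numpy as np

def rank_features(F, y):
    """Rank columns of feature matrix F (n_samples, n_features) by the
    absolute correlation of each feature with the binary labels y."""
    Fz = (F - F.mean(0)) / (F.std(0) + 1e-12)   # z-score each feature
    yz = (y - y.mean()) / (y.std() + 1e-12)
    scores = np.abs(Fz.T @ yz) / len(y)         # |corr(feature, label)|
    return np.argsort(scores)[::-1], scores

# Toy data: feature 0 carries the label, features 1-4 are pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 300).astype(float)
F = rng.standard_normal((300, 5))
F[:, 0] += 2.0 * y
order, scores = rank_features(F, y)
```

In practice, the ComParE-style functionals would populate `F` and a second selection stage would follow, but the ranking mechanics are the same.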

  • DEEP MULTIWAY CANONICAL CORRELATION ANALYSIS FOR DECODING THE AUDITORY BRAIN
    E9 205 Machine Learning for Signal Processing
    Instructor: Prof. Sriram Ganapathy
    Abstract:
    Decoding the auditory brain for an acoustic stimulus involves finding the relationship between the audio input and the brain activity measured through electroencephalography (EEG) recordings. Prior methods in this domain analyse each subject's activity separately, using linear methods such as Canonical Correlation Analysis (CCA) and non-linear methods such as Deep CCA. A recent approach, multiway CCA, combines the brain activity recordings of multiple subjects and extracts the information that is common across subjects, yielding a larger dataset of stimulus-response pairs to work with. In this project, we introduce a deep learning framework to perform correlation analysis in this setup: we replace the multiway CCA block, which is a linear formulation of Generalized Canonical Correlation Analysis, with a deep version of Generalized CCA. We present the results of applying the existing multiway CCA method to the data, and compare the correlations obtained for each subject with and without the influence of the other subjects' data.
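For intuition on the linear building block mentioned above, here is a minimal two-view CCA sketch in NumPy (the standard whitening-plus-SVD formulation; the toy data and variable names are illustrative, not the project's EEG setup):

```python
import numpy as np

def cca(X, Y, k=1, reg=1e-6):
    """Top-k canonical correlations and projections for views X (n, dx), Y (n, dy)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Lx = np.linalg.cholesky(Cxx)                 # Cxx = Lx Lx^T
    Ly = np.linalg.cholesky(Cyy)
    # Whitened cross-covariance; its singular values are the canonical correlations.
    T = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(T)
    A = np.linalg.solve(Lx.T, U[:, :k])          # projection for X
    B = np.linalg.solve(Ly.T, Vt[:k].T)          # projection for Y
    return s[:k], A, B

# Two noisy views sharing one latent signal z; CCA recovers the shared direction.
rng = np.random.default_rng(0)
z = rng.standard_normal((500, 1))
X = np.hstack([z, rng.standard_normal((500, 2))]) @ rng.standard_normal((3, 3))
Y = np.hstack([z, rng.standard_normal((500, 2))]) @ rng.standard_normal((3, 3))
rho, A, B = cca(X, Y, k=1)
```

Multiway/generalized CCA extends this pairwise objective to many views (one per subject), and the deep variant replaces the linear projections with neural networks.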

  • Non-Negative Matrix Factorization
    E0 229 Foundations of Data Science
    Instructor: Prof. Siddharth Barman
    Abstract:
    Linear dimensionality reduction techniques such as principal component analysis and singular value decomposition are powerful tools for dealing with high-dimensional data. In this report, we explore non-negative matrix factorization (NMF), a low-rank approximation technique that is useful when all entries of the data are non-negative, e.g., the entries of a spectrogram matrix or the pixels of an image. More precisely, we seek to approximate a given non-negative matrix as a product of two low-rank non-negative matrices. We examine the theoretical complexity of this problem, algorithms for finding the non-negative factors, and the applications in which this factorization is useful.
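The factorization problem stated above can be sketched with the classic Lee-Seung multiplicative updates for the Frobenius-norm objective; the rank, iteration count, and toy data below are illustrative:

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9):
    """Approximate V (m, n) with V >= 0 as W @ H, where W (m, rank) >= 0
    and H (rank, n) >= 0, via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay
        # non-negative, and the Frobenius error ||V - W H|| never increases.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Sanity check: an exactly rank-2 non-negative matrix is recovered closely.
Wt = np.abs(np.random.default_rng(1).standard_normal((20, 2)))
Ht = np.abs(np.random.default_rng(2).standard_normal((2, 30)))
V = Wt @ Ht
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Note that the individual factors are only identifiable up to permutation and scaling, which is why the check above compares the product, not W and H themselves.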

  • Speech dereverberation using variance-normalized delayed linear prediction
    E9 261 Speech Information Processing
    Instructor: Prof. Prasanta Kumar Ghosh and Prof. Sriram Ganapathy
    Abstract:
    Automatic speech recognition in reverberant conditions is a challenging task, as the long-term envelopes of the reverberant speech are temporally smeared. In this project, I implemented the work by Nakatani et al., which proposed a statistical model-based speech dereverberation approach that can cancel the late reverberation of a reverberant speech signal. The REVERB Challenge dataset was used, and various objective evaluation tests were performed on the enhanced audio.

  • Sparsity in Linear Prediction Coefficient
    E9 203 Compressive sensing and sparse signal processing
    Instructor: Prof. K.V.S Hari


Here is a list of all the courses I have taken, during both my PhD and Master's.