
Rohit K

Graduate Student
Johns Hopkins University
Baltimore, Maryland.

Email LinkedIn Github Scholar

Research

My research interests lie in the areas of signal processing, machine learning, deep learning, and neuroscience with applications to robust speech recognition, speech enhancement, auditory and behavioral neuroscience.

Thesis

  • Mask Estimator Approaches For Audio Beamforming
    MTech Thesis
    Instructor: Prof. Sriram Ganapathy

Publications

2021

  1. End-To-End Speech Recognition With Joint Dereverberation of Sub-Band Autoregressive Envelopes
     Rohit Kumar, Anurenjan Purushothaman, Anirudh Sreeram and Sriram Ganapathy
     Under Review

  2. Dereverberation of autoregressive envelopes for far-field speech recognition
     Anurenjan Purushothaman, Anirudh Sreeram, Rohit Kumar and Sriram Ganapathy
     Elsevier Computer Speech and Language

  3. SRIB-LEAP submission to Far-field Multi-Channel Speech Enhancement Challenge for Video Conferencing
     R G Prithvi Raj, Rohit Kumar, M K Jayesh, Anurenjan Purushothaman, Sriram Ganapathy and M A Basha Shaik
     INTERSPEECH 2021

  4. Towards sound based testing of COVID-19 - Summary of the first Diagnostics of COVID-19 using Acoustics (DiCOVA) Challenge
     Neeraj Kumar Sharma, Ananya Muguli, Prashant Krishnan, Rohit Kumar, Srikanth Raj Chetupalli and Sriram Ganapathy
     Elsevier Computer Speech and Language

  5. Multi-modal Point-of-Care Diagnostics for COVID-19 Based On Acoustics and Symptoms
     Srikanth Raj Chetupalli, Prashant Krishnan, Neeraj Sharma, Ananya Muguli, Rohit Kumar, Viral Nanda, Lancelot Mark Pinto, Prasanta Kumar Ghosh and Sriram Ganapathy
     Under Review

  6. DiCOVA Challenge: Dataset, task, and baseline system for COVID-19 diagnosis using acoustics
     Ananya Muguli, Lancelot Pinto, Nirmala R., Neeraj Sharma, Prashant Krishnan, Prasanta Kumar Ghosh, Rohit Kumar, Shrirama Bhat, Srikanth Raj Chetupalli, Sriram Ganapathy, Shreyas Ramoji, Viral Nanda
     INTERSPEECH 2021

  7. Investigating the Feature Selection and Explainability of COVID-19 Diagnostics from Cough Sounds
     Flavio Avila, Amir Hossein Poorjam, Deepak Mittal, Charles Dognin, Ananya Muguli, Rohit Kumar, Srikanth Raj Chetupalli, Sriram Ganapathy and Maneesh Singh
     INTERSPEECH 2021

2020

  1. Unsupervised Neural Mask Estimator for Generalized Eigen-Value Beamforming Based ASR
     Rohit Kumar, Anirudh Sreeram, Anurenjan Purushothaman and Sriram Ganapathy
     International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

  2. Coswara - A Database of Breathing, Cough, and Voice Sounds for COVID-19 Diagnosis
     Neeraj Sharma, Prashant Krishnan, Rohit Kumar, Shreyas Ramoji, Srikanth Raj Chetupalli, Nirmala R., Prasanta Kumar Ghosh, and Sriram Ganapathy
     INTERSPEECH 2020

  3. Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition
     Anurenjan Purushothaman, Anirudh Sreeram, Rohit Kumar, Sriram Ganapathy
     INTERSPEECH 2020

  4. LEAP Submission to CHiME-6 ASR Challenge
     Anirudh Sreeram, Anurenjan Purushothaman, Rohit Kumar, Sriram Ganapathy
     CHiME-6 Challenge


Other projects

  • END-TO-END DEEP NETWORK FOR IMAGE TO AUDIO CONVERSION
    EN.520.612.01.FA21 Machine Learning for Signal Processing
    Instructor: Prof. Najim Dehak
    Abstract:
    Image-to-speech synthesis plays a crucial role in assisting blind people: the goal is to synthesize intelligible and natural speech for a given input image. In this work, we propose an end-to-end encoder-decoder architecture for image-to-speech synthesis that generates audio directly from the image, without producing intermediate text. Further, we employ a transformer network to capture long-range dependencies in images for synthesizing better speech. We will analyze our model's performance by evaluating it on a standard benchmark dataset and comparing it with existing state-of-the-art methods.

  • FEATURE ANALYSIS IN AUDIO RECORDINGS FOR COVID-19 DETECTION
    EN.520.645.01.FA21 Audio Signal Processing
    Instructor: Prof. Mounya Elhilali
    Abstract:
    Current COVID-19 testing processes present many barriers, such as high cost, lack of availability, and time delay in the results. While the use of audio signals as biomarkers for COVID-19 detection has shown promise in recent work, there is still room for improvement in both signal analysis and classification performance for COVID-19 audio data. Through feature extraction and analysis of multiple audio modalities, this project attempts to provide a signal-level understanding of COVID-19 biomarkers in audio, allowing for better feature selection and classification performance going forward. To accomplish this, COVID-19 audio samples from breathing, cough, and speech recordings were obtained from the Coswara dataset (Sharma et al., 2020). Features outlined in the INTERSPEECH 2013 Computational Paralinguistics Evaluation baseline (Schuller et al., 2013) were extracted, including a set of low-level features along with their statistical functionals. Features were ranked and selected from each modality according to a two-part selection process. Finally, an in-depth analysis of the optimal features was performed in order to provide further insight into effective features for COVID-19 classification.
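As a toy illustration of the ranking step described above (not the project's actual two-part selection process), each feature can be scored by its absolute correlation with the binary label; the data and names below are synthetic:

```python
import numpy as np

def rank_features(F, y):
    """Rank columns of feature matrix F (n_samples, n_features) by the
    absolute correlation of each feature with the binary labels y."""
    Fz = (F - F.mean(0)) / (F.std(0) + 1e-12)   # z-score each feature
    yz = (y - y.mean()) / (y.std() + 1e-12)
    scores = np.abs(Fz.T @ yz) / len(y)         # |corr(feature, label)|
    return np.argsort(scores)[::-1], scores

# Toy data: feature 0 carries the label, features 1-4 are pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 300).astype(float)
F = rng.standard_normal((300, 5))
F[:, 0] += 2.0 * y
order, scores = rank_features(F, y)
```

In practice, the ComParE-style functionals would populate `F` and a second selection stage would follow, but the ranking mechanics are the same.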

  • DEEP MULTIWAY CANONICAL CORRELATION ANALYSIS FOR DECODING THE AUDITORY BRAIN
    E9 205 Machine Learning for Signal Processing
    Instructor: Prof. Sriram Ganapathy
    Abstract:
    Decoding the auditory brain for an acoustic stimulus involves finding the relationship between the audio input and the brain activity measured through electroencephalography (EEG) recordings. Prior methods in this domain analyse each subject's activity separately, using linear methods such as Canonical Correlation Analysis (CCA) and non-linear methods such as Deep CCA. A recent approach, multiway CCA, combines the brain activity recordings of multiple subjects and extracts the information that is common across subjects, yielding a larger dataset of stimulus-response pairs to work with. In this project, we introduce a deep learning framework to perform correlation analysis in this setup: we replace the multiway CCA block, which is a linear formulation of Generalized Canonical Correlation Analysis, with a deep version of Generalized CCA. We present the results of applying the existing multiway CCA method to the data, and compare the correlations obtained for each subject with and without the influence of the other subjects' data.
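For intuition on the linear building block mentioned above, here is a minimal two-view CCA sketch in NumPy (the standard whitening-plus-SVD formulation; the toy data and variable names are illustrative, not the project's EEG setup):

```python
import numpy as np

def cca(X, Y, k=1, reg=1e-6):
    """Top-k canonical correlations and projections for views X (n, dx), Y (n, dy)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Lx = np.linalg.cholesky(Cxx)                 # Cxx = Lx Lx^T
    Ly = np.linalg.cholesky(Cyy)
    # Whitened cross-covariance; its singular values are the canonical correlations.
    T = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(T)
    A = np.linalg.solve(Lx.T, U[:, :k])          # projection for X
    B = np.linalg.solve(Ly.T, Vt[:k].T)          # projection for Y
    return s[:k], A, B

# Two noisy views sharing one latent signal z; CCA recovers the shared direction.
rng = np.random.default_rng(0)
z = rng.standard_normal((500, 1))
X = np.hstack([z, rng.standard_normal((500, 2))]) @ rng.standard_normal((3, 3))
Y = np.hstack([z, rng.standard_normal((500, 2))]) @ rng.standard_normal((3, 3))
rho, A, B = cca(X, Y, k=1)
```

Multiway/generalized CCA extends this pairwise objective to many views (one per subject), and the deep variant replaces the linear projections with neural networks.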

  • Non-Negative Matrix Factorization
    E0 229 Foundations of Data Science
    Instructor: Prof. Siddharth Barman
    Abstract:
    Linear dimensionality reduction techniques such as principal component analysis and singular value decomposition are powerful tools for dealing with high-dimensional data. In this report, we explore non-negative matrix factorization (NMF), a low-rank approximation technique that is useful when all entries of the data are non-negative, e.g., the entries of a spectrogram matrix or the pixels of an image. More precisely, we seek to approximate a given non-negative matrix as a product of two low-rank non-negative matrices. We examine the theoretical complexity of this problem, algorithms for finding the non-negative factors, and the applications in which this factorization is useful.
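The factorization problem stated above can be sketched with the classic Lee-Seung multiplicative updates for the Frobenius-norm objective; the rank, iteration count, and toy data below are illustrative:

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9):
    """Approximate V (m, n) with V >= 0 as W @ H, where W (m, rank) >= 0
    and H (rank, n) >= 0, via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay
        # non-negative, and the Frobenius error ||V - W H|| never increases.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Sanity check: an exactly rank-2 non-negative matrix is recovered closely.
Wt = np.abs(np.random.default_rng(1).standard_normal((20, 2)))
Ht = np.abs(np.random.default_rng(2).standard_normal((2, 30)))
V = Wt @ Ht
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Note that the individual factors are only identifiable up to permutation and scaling, which is why the check above compares the product, not W and H themselves.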

  • Speech dereverberation using variance-normalized delayed linear prediction
    E9 261 Speech Information Processing
    Instructor: Prof. Prasanta Kumar Ghosh and Prof. Sriram Ganapathy
    Abstract:
    Automatic speech recognition in reverberant conditions is a challenging task, as the long-term envelopes of the reverberant speech are temporally smeared. In this project, I implemented the work by Nakatani et al., which proposed a statistical model-based speech dereverberation approach that can cancel the late reverberation of a reverberant speech signal. The REVERB Challenge dataset was used, and various objective evaluation tests were performed on the enhanced audio.

  • Sparsity in Linear Prediction Coefficient
    E9 203 Compressive sensing and sparse signal processing
    Instructor: Prof. K.V.S Hari


Here is a list of all the courses I have taken, during both my PhD and Master's.