Team

My immediate team, made up of some of the best minds in the ASR business!

  • Etienne Marcheret
  • John Hershey
  • Karthik Visweswariah
  • Liam Comerford
  • Mirek Novak
  • Peder Olsen
  • Pierre Dognin
  • Roberto Sicconi
  • Sabine Deligne
  • Serdar Kozat
  • Sree Balakrishnan
  • Tony Lee
  • Vaibhava Goel


And Roberto Sicconi’s team, which is defining the future of dialog management and AVSR (audio-visual speech recognition)!

  • Daniel Coffman
  • Gregg Daggett
  • Jing Huang
  • Leonid Rachevsky
  • Makis Potamianos
  • Mark Epstein
  • Rajesh Balchandran
  • Vit Libal


I also enjoy working regularly on interesting problems with the extended team in Research and with the worldwide IBM Voice Systems R&D team.


What follows is a selected list of recent publications from my team, organized by topic.

Acoustic Processing

1.      S. Dharanipragada and B. Rao, "MVDR-Based Feature Extraction for Robust Speech Recognition," ICASSP 2001.

2.      U. Yapanel and S. Dharanipragada, "Perceptual MVDR-Based Cepstral Coefficients for Robust Speech Recognition," ICASSP 2003.

3.      E. Marcheret, K. Visweswariah, and G. Potamianos, "Speech Activity Detection Fusing Acoustic Phonetic and Energy Features," Eurospeech 2005.

4.      T. Kristjansson, S. Deligne, and P. Olsen, "Voicing Features for Robust Speech Detection," Eurospeech 2005.

Acoustic Modeling

1.      P. Olsen and R. A. Gopinath, "Modeling Inverse Covariances in Gaussian Mixture Models," ICASSP 2002.

2.      S. Axelrod, R. A. Gopinath, and P. Olsen, "Modeling with a Subspace Constraint on Inverse Covariances," ICSLP 2002.

3.      K. Visweswariah, P. Olsen, R. A. Gopinath, and S. Axelrod, "Maximum Likelihood Training of Subspaces for Inverse Covariance Modeling," ICASSP 2003.

4.      V. Goel, S. Axelrod, R. A. Gopinath, P. Olsen, and K. Visweswariah, "Discriminative Estimation of Subspace Precision & Mean (SPAM) Models," Eurospeech 2003.

5.      P. Olsen and S. Dharanipragada, "An Efficient Integrated Gender Detection Scheme and Time Mediated Averaging of Gender Dependent Acoustic Models," ICASSP 2003.

Acoustic Model Adaptation

1.      K. Visweswariah, V. Goel, and R. A. Gopinath, "Structuring Linear Transformations for Adaptation Using Training Time Information," ICASSP 2002.

2.      S. V. Balakrishnan, "Fast Incremental Adaptation Using Maximum Likelihood Regression and Stochastic Gradient Descent," Eurospeech 2003.

3.      K. Visweswariah and P. Olsen, "Feature Adaptation Using Projection of Gaussian Posteriors," Interspeech 2005.

4.      S. S. Kozat, K. Visweswariah, and R. A. Gopinath, "Efficient, Low Latency Adaptation for Speech Recognition," ICASSP 2006 (submitted).

5.      S. S. Kozat, K. Visweswariah, and R. A. Gopinath, "Feature Adaptation Based on Gaussian Posteriors," ICASSP 2006 (submitted).

Search

1.      M. Novak, R. Hampl, P. Krbec, V. Bergl, and J. Sedivy, "Two-Pass Strategy for Large List Recognition on Embedded Speech Recognition Platforms," ICASSP 2003.

2.      M. Novak and R. Diego, "Confidence Measure Driven Scalable Two-Pass Search Strategy for Large List Grammars," Eurospeech 2003.

3.      M. Novak and V. Bergl, "Memory Efficient Decoding Graph Compilation with Wide Cross-Word Acoustic Context," ICSLP 2004.

4.      M. Novak, "Memory Efficient Approximate Lattice Generation for Grammar Based Decoding," Eurospeech 2005.

Lexicon and Pronunciation Modeling

1.      S. Deligne and L. Mangu, "On the Use of Lattices for Automatic Generation of Pronunciation," ICASSP 2003.

2.      B. Maison, "Automatic Baseform Generation from Acoustic Data," Eurospeech 2003.

Language Modeling

1.      H. Printz and P. Olsen, "Theory and Practice of Acoustic Confusability," Computer Speech and Language, January 2002. Special issue: Advances in Large Vocabulary Speech Recognition.

2.      V. Goel, J. Kuo, S. Deligne, and C. Wu, "Language Model Estimation for Optimizing End-to-End Performance of a Natural Language Call Routing System," ICASSP 2005.

Spoken Language Understanding and Dialog Management

1.      R. Sarikaya, J. Kuo, V. Goel, and Y. Gao, "Exploiting Unlabeled Data Using Multiple Classifiers for Improved Natural Language Call Routing," Eurospeech 2005.

2.      J. Kuo and V. Goel, "Active Learning with Minimum Expected Error for Spoken Language Understanding," Eurospeech 2005.

3.      V. Goel and R. Gopinath, "Using EM Clustering to Build Optimal Trees for Dialog Context Sensitive Language Models," ICASSP 2006.

Audio-Visual Speech Processing

1.      J. Huang, E. Marcheret, and K. Visweswariah, "Rapid Feature Space Adaptation for Multi-Stream HMM-Based Audio-Visual Speech Recognition," ICME 2005.

2.      J. Huang and K. Visweswariah, "Improving Lip-Reading with Feature Space Transforms for Multi-Stream Audio-Visual Speech Recognition," Eurospeech 2005.