Multimodal Conversational Solutions


Our ability to organize and intelligently combine sensory data from multiple sensors (modulated by perceptual relevance and sensory confidence) is crucial for building a robust model of objects and events in our environment, in spite of dramatically varying perceptual conditions. Our group's goal is to exploit this human perceptual principle of sensory integration to improve the recognition of human activity (e.g., speech recognition, speech activity, speaker change), intent (e.g., speech intent), and identity (e.g., speaker recognition), particularly in the presence of acoustic degradation due to noise and channel effects, and to improve the analysis and mining of multimedia content.

The applications for this work include (but are not limited to) accurate transcription of human activity for improved human information interfaces, multimedia content mining, and meeting transcription. The links in the left-side menu contain more detailed information about the different areas we are exploring. The project is currently managed by Bhuvana Ramabhadran.



Last updated 21 Nov 2008