The goal of the Speech-to-Speech Translation (S2S) research is to enable real-time, interpersonal communication via natural spoken language for people who do not share a common language. The Multilingual Automatic Speech-to-Speech Translator (MASTOR) system is the first S2S system that allows for bidirectional (English-Mandarin) free-form speech input and output.
The research leading to MASTOR was initiated in 2001 as an IBM adventurous research project and was also selected for funding by the Defense Advanced Research Projects Agency (DARPA) CAST program (formerly called the “Babylon” program).
MASTOR combines IBM's cutting-edge technologies in automatic speech recognition, natural language understanding and speech synthesis. The tight coupling of speech recognition and understanding mitigates the effect that speech recognition errors and ungrammatical input have on the quality of the translated output. Such errors and input are common in colloquial conversational speech, as opposed to well-formed written text or the read speech of dictation and broadcast news. The result is a highly robust system for limited domains. MASTOR currently translates bidirectionally between English and Mandarin on unconstrained, free-form natural speech input with a large vocabulary (over 30,000 words in each direction) in multiple domains, including travel, emergency medical diagnosis, and defense-oriented force protection and security. MASTOR runs in real time on a laptop and has also been ported to a handheld PDA with minimal performance degradation. Both versions of the system performed outstandingly in the February and August 2004 DARPA evaluations across all criteria, including task completion rate, usability and user satisfaction. The IBM team was also the only team able to present a stand-alone, bidirectional speech-to-speech translation system on a PDA in the DARPA CAST program, because its accurate, optimized algorithms and code require the least memory and processing power for adequate performance.
The GUI for S2S in Medical Domain
DARPA and the tech community recognize MASTOR as a breakthrough in spoken language translation for its ability to produce usable bidirectional translated output from free-form spoken input on real portable devices. MASTOR has been showcased on many occasions, including CeBIT’2003, DARPATech’2004 and technology demonstrations to U.S. senators and to the deputy director of the Department of Defense. Yuqing Gao, the principal researcher on the project, has received two awards from the DARPA CAST program for technology progress. The innovation has also been highlighted widely by the media, including the BBC, an MIT Technology Review article featuring “10 Emerging Technologies That Will Change Your World,” and National Public Radio's “Marketplace Morning Report.”
Constructing robust speech-to-speech translation systems to facilitate cross-lingual oral communication has been the dream of speech and natural language researchers for decades. It is technically extremely difficult because it requires integrating a set of complex technologies (Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), Machine Translation (MT), Natural Language Generation (NLG), and Text-to-Speech Synthesis (TTS)) that are far from mature individually, much less when cascaded together. Blindly integrating ASR, MT and TTS components does not provide acceptable results, because typical machine translation technologies are primarily oriented towards well-formed written text and are not adequate for conversational speech rife with imperfect syntax and speech recognition errors. Initial work in this area in the 1990s, for example by researchers at CMU and Japan’s ATR labs, resulted in systems severely limited to a small vocabulary or otherwise constrained in the variety of expressions supported. Currently, the only commercially available speech translation technology is Phraselator, a simple unidirectional translation device customized for military use. It searches a fixed set of English sentences and plays the corresponding voice recordings in foreign languages, and it cannot handle bidirectional speech.
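To make the limitation of a fixed-phrase device concrete, the following is a minimal sketch of phrase-table lookup, assuming recognized text is matched against a small list of stored sentences. It is not Phraselator's or MASTOR's actual algorithm: the phrase table, the file paths, the `lookup_recording` function and the 0.8 similarity cutoff are all hypothetical, and the fuzzy matching uses Python's standard `difflib` module purely for illustration.

```python
import difflib

# Hypothetical phrase table: fixed English sentences mapped to
# prerecorded foreign-language audio files (paths are made up).
PHRASE_TABLE = {
    "where does it hurt": "zh/where_does_it_hurt.wav",
    "show me your identification": "zh/show_id.wav",
    "do you need a doctor": "zh/need_doctor.wav",
}


def lookup_recording(recognized_text, cutoff=0.8):
    """Return the recording for the closest stored phrase, or None.

    The input is lowercased and stripped of surrounding punctuation,
    then compared with every stored phrase by string similarity.
    Anything scoring below `cutoff` is treated as unsupported.
    """
    normalized = recognized_text.lower().strip(" ?.!")
    matches = difflib.get_close_matches(
        normalized, list(PHRASE_TABLE), n=1, cutoff=cutoff
    )
    return PHRASE_TABLE[matches[0]] if matches else None


if __name__ == "__main__":
    # A stored phrase is found and its recording would be played.
    print(lookup_recording("Where does it hurt?"))
    # A free-form reply has no stored counterpart, so nothing can be played.
    print(lookup_recording("my stomach has been aching since yesterday"))
```

In this sketch, any utterance outside the stored list returns nothing, and there is no path for the other speaker's reply. That is the gap a free-form, bidirectional system is meant to close.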
IBM MASTOR Architecture
Related Publications
Fu Hua Liu, Liang Gu, Yuqing Gao and Michael Picheny, "Use of Statistical N-Gram Models in Natural Language Generation for Machine Translation," ICASSP 2003, IEEE, April 2003.
Yuqing Gao, Bowen Zhou, Zijian Diao, Jeffrey Sorensen and Michael Picheny, "MARS: A Statistical Semantic Parsing and Generation-Based Multilingual Automatic tRanslation System," Journal of Machine Translation, Vol. 17, 185-212, 2002.
Ruhi Sarikaya, Yuqing Gao, Michael Picheny and Hakan Erdogan, "Semantic Confidence Measurement for Spoken Dialog Systems", IEEE Trans. Speech and Audio Processing, July, 2005.
Bowen Zhou, Daniel Dechelotte and Yuqing Gao, "Two-way Speech-to-Speech Translation on Handheld Devices", Int. Conf. of Spoken Language Processing (ICSLP), Korea, Oct. 2004.
Liang Gu and Yuqing Gao, "On Feature Selection in Maximum Entropy Approach to Statistical Concept-based Speech-to-Speech Translation," Int. Workshop on Spoken Language Translation, Kyoto, Japan, Oct. 2004.
Hong-Kwang Jeff Kuo and Yuqing Gao, "Maximum Entropy Direct Model as a Unified Model for Acoustic Modeling in Speech Recognition," in Proc. of Int. Conf. of Spoken Language Processing (ICSLP), Korea, Oct. 2004.
Awards
Yuqing Gao:
Principal Investigator of Year 2002 – DARPA CAST Program
Industrial Principal Investigator of Year 2003 – DARPA CAST Program
News and Information
"IBM Developing Translation Software", by Lisa Bowman, CNET News.com, April 24, 2003.
"Future Tech: 20 Hot Technologies to Watch", by Cade Metz, PC Magazine, July 1, 2003.
"Voice Ideas: Automating the Tower of Babel", by Dr. Judith Markowitz, Speech Technology Magazine, Sept/Oct, 2004.
"Machines Not Lost in Translation", by Ann Harrison, Wired News, March 9, 2005.
“Soon, you too can speak Chinese - With a little computer help. We test new translation technology.” By Anders W. Hagen, Dagbladet.no, April 25, 2005
“Universal Translator – one of the 10 Emerging Technologies That Will Change Your World,” by Greg Huang, MIT Technology Review, January 2004.
S2S DEMONSTRATION VIDEO CLIP from CeBIT 2004, CeBIT Germany, 2004.

Yuqing Gao, Researcher




