
MASTORing languages
IBM's history with speech recognition technology goes back over 40 years – to its first demonstration of an early discrete speech recognition system at the 1964 New York World's Fair. At the time, the "shoebox recognizer" could identify the spoken digits zero through nine – for a total of ten words.
Speech recognition has made great strides from those humble beginnings, but travel and the Internet are helping to flatten the world, so more robust and comprehensive tools are needed. Going through airports or getting sick in a foreign country and not being able to understand or communicate can be, at best, inconvenient and, at worst, dangerous.
Yuqing Gao, manager of speech recognition and understanding for IBM Research, had been working on speech technologies for many years when, in 2000, she felt that the statistical approach to speech recognition had reached a plateau. She believed that the field needed new perspectives, such as language-independent semantic meaning representation and dialogue and pragmatic context, to strengthen it for practical uses. She viewed speech translation as an integrated task of speech recognition and understanding. To break through that plateau and go beyond what had existed before, Gao and her team worked to add semantic meaning and context to the work – a goal they have accomplished with MASTOR (or Multilingual Automatic Speech-to-Speech Translator).
Of course, if this were easy, it would have been done long ago. Gao admits there are a number of hurdles, referring to recognition and translation as "two open-ended challenges."
"Look at recognition alone," she said. "There are different dialects, accents, some low voices, some high pitched, older voices, a speaker who uses slang or who slurs or who speaks slowly. And, no matter what, there is always background noise."
To work as intended, the system needs to recognize English and translate it to Arabic or Chinese and then recognize Arabic or Chinese and translate it back again. But Arabic dialects, for instance, are mainly oral, not written. The written language is very different from the spoken language and those issues had to be dealt with during development. To overcome the difficulties, the team invented algorithms and filed over 10 patents specifically related to these problems.
On top of that, researchers had to create a tool that could work without unlimited resources. "If someone could carry a Blue Gene supercomputer around, then the algorithms could be numerous and complex," explained Gao. "But we had to shrink and optimize the algorithms to make sure the application could run on a very small device. We optimized the code for laptops, tablet PCs and handheld devices. The target is to optimize it further to work on smart phones."
In fact, Gao sees the future of MASTOR in smaller and smaller devices, and in more languages. MASTOR can manage Mandarin Chinese and Arabic, both complex languages, pretty well. Asian languages, in general, are more difficult than Romance or Germanic languages, as there are more local dialects and the syntax and grammar are very different. "Now that those are done," Gao confided, "the European languages will seem easier to tackle." In an effort to lay the groundwork for expanding the language offerings, the team aimed to formalize the algorithms and create approaches that would be language independent.
While there are other solutions that attempt to address the same problems, they are mostly one-way, fixed-phrase translation tools, meaning that the English used has to come from pre-selected patterns using pre-selected sounds. MASTOR offers two-way, free-form translation, meaning both speakers can say anything they want.
The technology is mostly software, but microphone and device quality are very important. The device needs to be small enough to carry and, because the application is computation intensive, powerful enough to run it. As the hardware industry has taken great leaps forward in device miniaturization, the possibilities have grown.
Gao offers many suggestions for how MASTOR might be used.
"Businesses with global knowledge workers could use this tool for conference calls or video conferences, which would be much easier if employees could speak in their native languages. For companies with stores around the world, this would help their businesses grow faster and easier. And as India's economy matures and grows, other parts of the world could use MASTOR to gain an edge for the call center jobs. It could also help with the shortage of doctors in third-world nations as they wouldn't have to learn the local language before going, which could open up whole new fields of opportunity."
MASTOR is in a trial deployment with the military, where it is used for communication. By easing communications, this tool can help personnel provide essentials, such as food, clean water, clothing, electricity and medical treatment. In addition to this important outreach, better communication can also make it possible to provide jobs for local populations. Gao is pleased with the way MASTOR has been used in the market. "Even something like sending troops overseas can be made better with MASTOR. If military personnel can communicate with local people, it will eliminate unnecessary conflicts caused by language barriers. If this technology can help people communicate better, then we could have a better world."