IBM Research

Think Research


 

Featured Concept
Erasing language barriers

By Gary Taubes

Skim the index of a thesaurus and you'll get a feel for what makes translation such a fusion of science and art: two dozen shades of meaning for a simple adjective like "soft," and nearly three dozen for a word like "mark" that can be either a noun or a verb. With all the idioms and nuances of grammar, syntax and semantics that make a language unique, it should be no surprise that efforts to create computer programs that can translate one language into another have been under way for half a century.

Demand for such programs is at an all-time high. The globalization of business requires endless translations; technical manuals and corporate documents have to be available in a variety of languages. And the dominance of English on the Web makes its reach somewhat less than world-wide. "Machine Translation is getting better, improving global communications among ordinary people working in global environments," says Marshall Schor, senior manager of data mining, machine translation and computer music at IBM's Thomas J. Watson Research Center.

Filling Slots
The Language Analysis & Translation group at IBM Research, led by Michael McCord, is hoping to solve these problems. The group developed the Language Analysis & Translation (LA&T) system, which translates between English and Spanish, French, German or Italian. The LA&T system can translate Web pages on the fly and is suitable for heavy use by Web servers. It runs on a variety of platforms, including Win32, AIX™, Solaris®, Linux®, HP, AS/400, OS/2 and VM. An independent double-blind comparison of leading English-to-German machine translation systems for Web translation gave it the highest rating.

At the heart of the LA&T system is a theory of McCord's known as Slot Grammar. A slot, explains McCord, is a placeholder for the different parts of a sentence associated with a word. A word may have several slots associated with it, and these form a slot frame for the word. "For example," McCord explains, "in the sentence, 'I give the chocolate to you,' the word 'give' has three slots: a subject (I), a direct object (chocolate) and an indirect object (you)."
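
The slot frame McCord describes can be pictured as a small data structure. The following is only an illustrative sketch, not code from the LA&T system: the class names `SlotFrame` and `FilledFrame` and the slot labels are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SlotFrame:
    """Hypothetical slot frame: a word plus the slots it can take."""
    word: str
    slots: Tuple[str, ...]


@dataclass
class FilledFrame:
    """A slot frame together with the phrases that fill its slots."""
    frame: SlotFrame
    fillers: Dict[str, str] = field(default_factory=dict)

    def fill(self, slot: str, phrase: str) -> None:
        # Only slots declared in the frame may be filled.
        if slot not in self.frame.slots:
            raise ValueError(f"'{self.frame.word}' has no slot '{slot}'")
        self.fillers[slot] = phrase

    def unfilled(self) -> List[str]:
        # Slots that still have no filler, in frame order.
        return [s for s in self.frame.slots if s not in self.fillers]


# The frame for "give" from the example sentence in the article.
GIVE = SlotFrame("give", ("subject", "direct_object", "indirect_object"))

analysis = FilledFrame(GIVE)
analysis.fill("subject", "I")
analysis.fill("direct_object", "the chocolate")
analysis.fill("indirect_object", "you")
```

Once all three slots are filled, `analysis.unfilled()` returns an empty list, which is one simple way to tell that the frame for "give" is complete.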












In order to translate a sentence, the system first analyzes it. For each word, the Slot Grammar parser draws on the word's slot frames to cycle through the possible sentence constructions. For example, the indirect object slot for "give" might be filled by "to you" or by "you." Using a series of word relationship tests to establish context, the system then tries to determine the meaning of the sentence.
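
To make the slot-filling step concrete, here is a minimal sketch of how the two constructions mentioned above ("to you" versus "you") could lead to the same filled slots. It is a toy under strong assumptions: the input is already segmented into phrases, only the verb "give" is handled, and the function name `fill_give_slots` is invented for this example. The real parser is far more general.

```python
from typing import Dict, List


def fill_give_slots(chunks: List[str]) -> Dict[str, str]:
    """Fill the slots of 'give' from pre-segmented phrases.

    Handles two constructions:
      subject give direct_object to indirect_object
      subject give indirect_object direct_object
    """
    verb_index = chunks.index("give")
    if verb_index == 0:
        raise ValueError("no subject before the verb")
    subject = chunks[verb_index - 1]

    complements = chunks[verb_index + 1:]
    if len(complements) != 2:
        raise ValueError("expected exactly two complements after 'give'")
    first, second = complements

    if second.startswith("to "):
        # Prepositional construction: "give the chocolate to you"
        return {
            "subject": subject,
            "direct_object": first,
            "indirect_object": second[len("to "):],
        }
    # Double-object construction: "give you the chocolate"
    return {
        "subject": subject,
        "direct_object": second,
        "indirect_object": first,
    }


prepositional = fill_give_slots(["I", "give", "the chocolate", "to you"])
double_object = fill_give_slots(["I", "give", "you", "the chocolate"])
```

Both calls produce the same slot assignment, which is the point of the frame: different surface constructions are reduced to one analysis before the system tries to determine meaning.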

The translation of the sentences "The police officer questioned the suspect" and "The police officer questioned his motives" illustrates the complexity of this task. McCord explains, "In Spanish, if the subject is a police officer and the object is a suspect, then the word for 'question' is 'interrogar.' But if the object is 'motive,' the phrase for 'question' is 'poner en duda.' The program must specify enough tests on all the words and relationships in the source sentence to make the right decisions."
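
The kind of test McCord describes can be sketched as an ordered list of conditions on the analyzed sentence, each paired with a Spanish rendering. This is a hypothetical illustration only: the `Clause` type, the rule list and the function name are assumptions, and the two rules simply restate the example from the quote.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Clause(NamedTuple):
    """Simplified analysis of a sentence: head words of the filled slots."""
    subject: str
    verb: str
    direct_object: str


# Each rule pairs a test on the clause with the Spanish verb to use.
# Rules are tried in order; the first test that passes wins.
QUESTION_RULES: List[Tuple[Callable[[Clause], bool], str]] = [
    (lambda c: c.subject == "police officer" and c.direct_object == "suspect",
     "interrogar"),
    (lambda c: c.direct_object in {"motive", "motives"},
     "poner en duda"),
]


def choose_spanish_for_question(clause: Clause) -> Optional[str]:
    """Return the Spanish rendering of 'question', or None if no rule applies."""
    if clause.verb != "question":
        return None
    for test, spanish in QUESTION_RULES:
        if test(clause):
            return spanish
    return None


interrogation = choose_spanish_for_question(
    Clause("police officer", "question", "suspect"))
doubt = choose_spanish_for_question(
    Clause("police officer", "question", "motives"))
```

Here `interrogation` is "interrogar" and `doubt` is "poner en duda". A clause that matches no rule returns `None`, which hints at why the program needs enough tests to cover every combination it may meet.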

The last task in the translation process is to establish the correct word order. This, too, can be difficult. "Mary became the most popular player" translates into Spanish as "Mary herself has turned into the player most popular."

 
Next Steps

As the LA&T system progresses, other IBM translation efforts are proceeding in Asia. The Tokyo Research Laboratory's approach, known as Internet King of Translation, uses a technique called pattern-based translation in place of Slot Grammar. The program matches specific patterns (grammatical fragments or phrases, for instance, or specific patterns of words) against the source sentence and then maps those English sentence patterns directly to their corresponding Japanese patterns.
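
The general idea of pattern-based translation can be illustrated with a toy example. Nothing below comes from Internet King of Translation itself: the two English patterns, their Japanese templates, the tiny lexicon and the function name are all assumptions chosen to show the matching-and-mapping step.

```python
import re
from typing import Optional

# Hypothetical pattern table: an English sentence pattern with one
# variable part, paired with the corresponding Japanese template.
PATTERNS = [
    (re.compile(r"^I like (?P<x>.+)$"), "私は{x}が好きです"),
    (re.compile(r"^This is (?P<x>.+)$"), "これは{x}です"),
]

# Hypothetical lexicon for the variable part of each pattern.
LEXICON = {
    "apples": "りんご",
    "a book": "本",
}


def translate_by_pattern(sentence: str) -> Optional[str]:
    """Translate a sentence if it matches one of the known patterns."""
    text = sentence.strip().rstrip(".")
    for pattern, template in PATTERNS:
        match = pattern.match(text)
        if match:
            filler = match.group("x")
            # Fall back to the untranslated phrase if it is not in the lexicon.
            return template.format(x=LEXICON.get(filler, filler))
    return None  # no pattern matched


liked = translate_by_pattern("I like apples.")
shown = translate_by_pattern("This is a book.")
```

The sentence is never parsed into slots here; the whole English pattern is mapped in one step to a Japanese pattern, with only the variable part looked up separately.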

The Homepage Translator system, a real-time, online English-to-Chinese translator developed by IBM's China Research Lab, combines the two techniques. First, Homepage Translator parses the English sentence using McCord's English Slot Grammar. Then an English-Chinese pattern-based dictionary is matched against the resulting parse tree to build corresponding patterns for the translation. This work appears in the new release of Netscape 6. The China Research Lab is now developing the Chinese-to-English complement based on Slot Grammar.

IBM is developing additional technologies that will help writers and editors construct coherent prose and ensure that nothing will be lost in translation. One offspring of Slot Grammar, EasyEnglishAnalyzer (EEA), looks at the English Slot Grammar parse of a sentence to discover ambiguities. It then suggests rephrasing in ways that are easier to translate. An editor interface lets writers click on the appropriate choice, which is then substituted automatically in the text.

When a document needs to be translated into different languages, as is the case with much technical documentation, it is useful to fix the original English source text with EEA before attempting machine translation. That way, potential problems are caught before they are propagated to the many translated versions.

Gary Taubes is a freelance writer who lives in Venice, California.

