IBM Research

Think Research


 


Featured Concept
Editor's corner

By Rowan Dordick

The ability to use language has often been cited as the defining characteristic of human beings, a difference that separates us from all other forms of life. It is becoming increasingly clear that speech-enabled applications are creating a comparable distinction in technology. Computers with which we can interact via speech will outshine and outcompete those not so enabled. So far, dictation - the ability to enter text without the need for a pen or a keyboard - is the most visible speech application, and with good reason. But other uses of speech in human-computer interaction are emerging that may have an even greater impact on computing.

While the ability to talk naturally to a computer, or to any device for that matter, and have one's words understood as well as recognized, is the ultimate goal, it is also the most distant. Nevertheless, the advantages of numerous other, if less momentous, achievements are significant. In some cases, the ability to use speech can make the difference between using a computer and working without one. For example, the sheer difficulty of using a keyboard to write characters has been a serious impediment to computing in China and Japan.

This issue includes accounts of several accomplishments that, in one way or another, represent milestones in the development of speech technology. The main story on the subject recounts the challenges of creating the first continuous speech recognition programs for Chinese and Japanese (see "Words Out of Characters"). Despite being able to build on the work done for English speech dictation products, handling the additional complexities found in both of these Asian languages - such as tones in Chinese or multiple ways of writing in Japanese - required substantial innovation. While speech technology benefits all languages, the potential gain in user productivity is greatly magnified in both Chinese and Japanese because of the immense problem of inputting the characters.

Although speech-recognition programs are typically regarded as substitutes for keyboard entry, that task reflects neither the full scope of how humans use speech nor the potential of the technology. A program designed to help children learn to read is an example of how specific speech-enabled solutions can greatly extend the scope of particular kinds of applications. In most cases, children learn to read by being read to and by practicing what they have heard. Watch Me Read, a program created at the Thomas J. Watson Research Center, makes no pretense of replacing parents or teachers, but through the use of speech recognition it offers children a degree of interactivity and feedback that could not otherwise be obtained in many classroom settings (see Solutions).

Being able to talk to a specific application is a first step toward computers that fully comprehend what is said to them. While speech-enabled command and control programs with limited capabilities are available, the full scope of this type of interface is only beginning to be explored. At the fall Comdex show, Mark Lucente of Watson demonstrated a multimodal user interface that allows the use of both voice and gesture to interact with a computer (see "LabNotes"). The brief description of his work in this issue will be supplemented with a feature story in the next.

Other subjects in the next issue include systems management, novel approaches to lithography, future scenarios for e-commerce, techniques for classifying email, and more.



