IBM Research

Think Research


 

Featured Concept
IBM gets smart about Artificial Intelligence

By Pamela Kramer

“Just what do you think you’re doing, Dave? Look, Dave, I can see you’re really upset about this. I honestly think you ought to sit down calmly, take a stress pill and talk things over.”
— HAL, just prior to disconnection, 2001: A Space Odyssey, a film by Stanley Kubrick

Midway through the first year of the new millennium, science reality offers nothing like science fiction’s human-reasoning, chess-playing, art-appreciating, psycho-babbling, Heuristic Algorithm. But IBM Research is actually developing components of the very artificial intelligence technologies that enabled HAL to run the Jupiter-bound mission in the film 2001: A Space Odyssey.

“There are pieces in HAL that can be identified with ongoing projects going after their own more specific goals,” says Chidanand Apte, the data abstraction research group manager at IBM’s Thomas J. Watson Research Center. “Researchers have looked at HAL not as a single entity, but as a collection of key AI applications.”

Setting aside the programming-gone-bad that made HAL a murderously unstable machine, research focuses on practical pieces. Computer face recognition already helps maintain security in the workplace by monitoring the presence and identification of an employee at a particular computer terminal, for example, and soon will help protect us against international terrorism. Programs that allow machines to recognize our speech and understand natural (human) language make it more convenient for us to use them. And in a world exploding with e-everything, systems that weed through, learn from and act upon vast amounts of data can become machine experts that make humans more effective in their jobs and personal lives.

Seeing the future

HAL’s ability to recognize not only the astronauts walking about the ship, but the face in a portrait that one of them has drawn isn’t yet a reality. “We haven’t been able to really mimic general-purpose vision in a machine,” says Ruud Bolle, manager of the exploratory computer vision group at IBM Research. “More task-based vision machines are built to perform a specific task, and do it very well.”


Face recognition, still early in its development, works only when a face is positioned carefully in front of the computer’s camera. The computer matches the face with data representing facial features and their geometrical relationships. Researchers are working toward recognition of faces in a larger environmental context — to identify known terrorists in an airport, for instance. But “it will take some time,” Bolle says.

At this point, just finding a face amid other “objects” in a scene is complex. Computers have to string together subtle clues, says Jonathan Connell, a computer vision researcher who helped develop technology for a face-finding video browser. Using the browser, the computer looks for a pinkish color that suggests the possibility of a face, regardless of a person’s race. But color-matching alone might as easily select a brick. Connell explains, “Consider someone wearing a tank top; bare arms could trick a browser based on color alone.” To further refine its search, the computer looks for other clues — for instance, dark bars where shadows from the eyes, nose, mouth and chin appear. The browser quickly searches hours of videotape to find a specific segment. For instance, a user who wants to locate an interview segment can make the browser look for clips in which just two faces appear. Combined with speech recognition technology on the audio track, it can find faces that are talking about specific topics.
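The idea of stringing clues together can be sketched in a few lines of Python. This is only a toy illustration, not the browser's actual method: the function names, the color rule and every threshold are assumptions made up for the example. A color test alone accepts a bare arm or a brick, so the sketch also requires at least two "dark bar" rows before calling a region a face.

```python
# Toy sketch: every name and threshold here is hypothetical, not IBM's method.

def is_skin_tone(pixel):
    """Naive color cue: a reddish pixel whose channels are ordered r > g > b."""
    r, g, b = pixel
    return r > 95 and r > g > b and (r - b) > 15

def is_dark(pixel):
    """Shadow cue: mean brightness below an assumed cutoff."""
    return sum(pixel) / 3 < 60

def count_dark_bars(region, min_dark_fraction=0.25):
    """Count rows in which at least a quarter of the pixels are dark."""
    return sum(
        1 for row in region
        if sum(is_dark(p) for p in row) / len(row) >= min_dark_fraction
    )

def looks_like_face(region, min_skin_fraction=0.5, min_dark_bars=2):
    """Require both clues: enough skin-colored pixels and enough dark bars."""
    pixels = [p for row in region for p in row]
    skin_fraction = sum(is_skin_tone(p) for p in pixels) / len(pixels)
    return (skin_fraction >= min_skin_fraction
            and count_dark_bars(region) >= min_dark_bars)

# Made-up 4x5 pixel regions for illustration only.
SKIN, DARK, BRICK = (224, 172, 150), (40, 30, 30), (178, 74, 50)
face = [
    [SKIN, SKIN, SKIN, SKIN],
    [DARK, SKIN, SKIN, DARK],   # eye shadows
    [SKIN, SKIN, SKIN, SKIN],
    [SKIN, DARK, DARK, SKIN],   # mouth shadow
    [SKIN, SKIN, SKIN, SKIN],
]
arm = [[SKIN] * 4 for _ in range(5)]
wall = [[BRICK] * 4 for _ in range(5)]

print(looks_like_face(face), looks_like_face(arm), looks_like_face(wall))
```

In this toy setup the brick passes the color test by itself, which is exactly why the second clue is needed.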

Sighting speech

Computer vision is important to speech recognition, too. Visual cues help computers decipher speech sounds that are obscured by environmental noise. Chalapathy Neti, manager of IBM’s audiovisual speech technologies (AVST) group at Watson, often cites HAL’s lip-reading ability in 2001 in promoting the group’s work.

Lip reading can reduce confusion among the sounds that make up words (phonemes) when other noise intrudes. In one AVST project, every 10 milliseconds the computer receives 10 to 12 values representing what’s visually important about a speaker’s mouth — the shape of the lips, whether they’re open or closed, the positions of the teeth and whether they’re in contact with the lips, and what the tip of the tongue is doing. The computer recognizes possible visemes, or word sounds that are visually distinguishable. It weighs the viseme data with audio data representing phonemes. Finally, Neti says, it combines that information with a language model for “a final hypothesis of what was actually said.”
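One common way to weigh two evidence streams against a language model is a weighted sum of log-probabilities. The sketch below assumes that approach purely for illustration; the article does not say how the AVST system combines its scores, and the function name, the weights and the probabilities are all made up.

```python
import math

def fuse_scores(audio_probs, visual_probs, lm_probs, audio_weight=0.5):
    """Pick the word with the best weighted log-score.

    Each argument maps candidate words to a probability. The weighting
    scheme is an assumed one, not the AVST group's actual algorithm.
    """
    visual_weight = 1.0 - audio_weight
    scores = {
        word: audio_weight * math.log(audio_probs[word])
        + visual_weight * math.log(visual_probs[word])
        + math.log(lm_probs[word])
        for word in audio_probs
    }
    return max(scores, key=scores.get), scores

# Made-up numbers: noise blurs "might" and "night" in the audio, but the
# closed lips of an "m" are easy to see.
audio = {"might": 0.45, "night": 0.55}
visual = {"might": 0.80, "night": 0.20}
language_model = {"might": 0.50, "night": 0.50}

print(fuse_scores(audio, visual, language_model, audio_weight=1.0)[0])  # audio only
print(fuse_scores(audio, visual, language_model, audio_weight=0.5)[0])  # audio + visual
```

With these toy values, audio alone favors "night", while the combined score favors "might".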

The AVST group has found dramatic improvement in speech recognition when a computer is fed a combination of audio and visual signals, rather than relying on audio alone. (The same is true for speech recognition by people, Neti says.) Because speech recognition is being deployed as the “user interface of choice” in a variety of pervasive environments, he says, its accuracy needs to be improved. Working algorithms already demonstrate significant improvements in overcoming “speech babble” noise, and in settings with more than one speaker. Neti says he expects a real-time prototype using the technology later this year. He predicts that IBM desktop products like ViaVoice® will be the first to use it commercially, followed by future embedded offerings.

Words to the wise

Understanding what is communicated, whether it’s spoken or written, and responding to it intelligently pose a different set of problems.

“When you think about the knowledge you need to understand sentences, you need a whole lot of information about the world,” says Leora Morgenstern, a researcher in the common sense reasoning group at Watson and chair of the Artificial Intelligence Professional Interest Community. Morgenstern illustrates the problem with a series of simple sentences: “Susan saw the dog in the window. She pressed her nose against it. She wanted to buy it.” Our life knowledge tells us that Susan pressed her nose against the window, not the dog; and that she most likely wanted to buy the dog, not the window.

But telling this to a computer is no easy task. In finite areas of expertise, however, computers can be given enough knowledge to contextualize well, so that they can reason with new information they encounter. Morgenstern and Moninder Singh are working on a patent for a program that helps its users dynamically configure portfolios to maximize investment goals. The system tells the user about all of the tax rules that apply, how they interact, and what to do if the rules are in conflict. “You’re not giving it a canned set of rules,” Morgenstern says. “On the spot, when it’s trying to connect combinations it hasn’t seen before, it reasons about how the rules interact.”

Salim Roukos, a natural language researcher at Watson, is working on conversational interfaces. One example is the DARPA (Defense Advanced Research Projects Agency) communicator research prototype, which makes travel arrangements using natural language understanding and a dialog manager, a “brain” that consults its databases to figure out the appropriate answers to the user’s questions. It supplies the computer’s end of the conversation by printing a page, displaying a picture or responding over the phone.

The system is fed a few thousand sentences that people may use to make travel plans, and the meanings — assigned manually — of those sentences. A statistical model relying in part on sentence diagrams allows it to infer the most probable meaning of sentences that don’t precisely match its example sentences. For instance, it will infer that if the word “arriving” appears in a sentence between two time elements, the second time element is the desired arrival time.
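The “arriving” inference can be made concrete with a short sketch. Note the swap: the system described above learns such patterns statistically from example sentences, whereas the code below hard-codes that one pattern as a hand-written rule. The function name and the time-matching expression are assumptions for illustration.

```python
import re

# Hypothetical pattern for time elements such as "9 am" or "1:30 pm".
TIME_PATTERN = re.compile(r"\b\d{1,2}(?::\d{2})?\s?(?:am|pm)\b", re.IGNORECASE)

def infer_arrival_time(sentence):
    """Return the time element that follows "arriving" when the word sits
    between two time elements; otherwise return None."""
    keyword = re.search(r"\barriving\b", sentence, re.IGNORECASE)
    if keyword is None:
        return None
    times = list(TIME_PATTERN.finditer(sentence))
    # Look at each pair of neighboring time elements in sentence order.
    for first, second in zip(times, times[1:]):
        if first.end() <= keyword.start() and keyword.end() <= second.start():
            return second.group()
    return None

print(infer_arrival_time("I want to leave at 9 am arriving by 1:30 pm"))
print(infer_arrival_time("How is the weather?"))
```

The first call returns the second time element as the arrival time; the second returns None because the sentence contains nothing the rule recognizes.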

“You can always stop it by asking, ‘How is the weather?’” Roukos says, because the weather is outside the computer’s expertise. But within its air-travel domain, its accuracy is between 80 and 95 percent on new sentences.

Arranging flights from New York to Chicago may not be as glamorous as running a mission to Jupiter. However, conversational interfaces with content and knowledge can make life on Earth a lot smoother. One system pages a user when an e-mail arrives requiring a schedule change, scans the e-mail and can then answer the user’s questions about it. The computer can then check the user’s calendar and, if the schedule allows, adjust travel plans accordingly.

Thirst for knowledge

IBM has played a key developmental role in knowledge discovery and data mining. This discipline combines machine learning technology and pattern recognition with statistics and database management on a large scale in highly automated systems. The systems gain insights — patterns or rules, for instance — that are not explicitly apparent to humans. These insights enable the systems to make predictions that give businesses a competitive edge. Data abstraction manager Apte gives an example: Sports car owners historically have been hit with high car-insurance premiums. But mining an insurance company’s database may show that middle-aged, high-income professionals who own sports cars are good risks; their cars are hobbies and spend most of their time in the garage. The company gains an edge over the competition in offering that demographic segment lower premiums.

One data-mining project at Watson has developed a capability for automatically generating predictive rules from complex data. This capability has been used in industry-specific solutions for insurance risk management, as in Apte's sports car example, and to enhance targeted marketing for direct mail campaigns.

Researchers are currently focusing on how to eliminate the need for human experts to write business rules for decision systems. The intent is for search algorithms to analyze data to select the combination of factors for the best business result, and automatically write those out as rules. Several projects in IBM labs are working toward systems for this automatic rule generation.
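As a rough picture of what automatic rule generation involves, the sketch below searches every combination of factors in a tiny table and writes out the best one as a rule. It is a brute-force toy, not the search algorithm used in any IBM project, and the records are invented solely to echo the sports car example; the function and field names are hypothetical.

```python
from itertools import combinations

def best_rule(records, factors, outcome, min_support=2):
    """Exhaustively search factor=value conjunctions for the lowest outcome rate.

    Returns (rate, conditions, support), or None if no segment has enough records.
    """
    best = None
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            # Sort the observed value tuples so the search order is repeatable.
            for values in sorted({tuple(r[f] for f in combo) for r in records}):
                matches = [r for r in records
                           if tuple(r[f] for f in combo) == values]
                if len(matches) < min_support:
                    continue
                rate = sum(r[outcome] for r in matches) / len(matches)
                if best is None or rate < best[0]:
                    best = (rate, dict(zip(combo, values)), len(matches))
    return best

def format_rule(rate, conditions, support, outcome):
    """Write the winning segment out as a readable rule."""
    clause = " AND ".join(f"{k}={v}" for k, v in conditions.items())
    return f"IF {clause} THEN {outcome} rate {rate:.2f} ({support} records)"

# Invented records for illustration only: (car, age, claim filed?).
rows = [("sports", "young", 1), ("sports", "young", 1),
        ("sports", "middle", 0), ("sports", "middle", 0), ("sports", "middle", 0),
        ("sedan", "young", 0), ("sedan", "young", 1),
        ("sedan", "middle", 0), ("sedan", "middle", 1)]
records = [{"car": c, "age": a, "claim": k} for c, a, k in rows]

rate, conditions, support = best_rule(records, ["car", "age"], "claim")
print(format_rule(rate, conditions, support, "claim"))
```

On this made-up table, neither factor alone identifies the best segment; only the combination of sports car and middle age does, which is the kind of insight the article describes. Exhaustive search is used here only because the table is tiny.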

“We all know that HAL didn’t happen in 2001,” Apte says. “I think we will be there eventually. The only part we may not achieve is the emotional aspect. But progress in achieving many of these other HAL-like capabilities is creeping forward. … However, without the emotions, it’s only a computer — it’s still controllable.”

