IBM Research

Think Research


 


COVER STORY: Transparent Computing

By Gary Taubes

Responsive to your every word and gesture, future interfaces will let you focus on tasks instead of technology.


Communication is often described as a two-way street, but the analogy of a multilane highway would be closer to the truth. People communicate by words and vocal inflections, by subtle bodily movements and nuances of gaze, by touch and gesture, and even by uncontrollable reactions like blushing or falling speechless.

By comparison, human-computer interaction is an impoverished affair. The primary interfaces, a keyboard and a pointing device, not only force the user to master new techniques but also restrict the range of the interaction. "The major obstacle facing the user today," says Barton Smith, manager of the human interface research group at IBM's Almaden Research Center, "is not how fast we can process information and how much memory we can put in, because we are making progress in those areas at a tremendous rate, but rather the usability of the systems: making them do what we want them to do."

Scientists at IBM Research labs around the world are working on a wide range of human-computer interaction technologies that are helping to make it easier to get computers to do what we want them to do. And a logical way to accomplish that is to get the user interface out of the way. "A good interface," says Shumin Zhai, a member of the user systems ergonomic research (USER) group at Almaden, "is one that's transparent. That means it is so good you don't notice it, allowing you to fully concentrate on the task at hand. That's what we're after." From advanced speech recognition, input devices and interaction techniques to "attentive environments" and new kinds of pervasive computers, IBM Research projects aim to bring such transparency to human-computer interaction.

"Behind all this work," explains Jim Spohrer, senior manager of the USER group, "is a simple guiding principle: to look at every conceivable information processing task, every way in which an individual or a group might use a computing device, and imagine how that interaction could be simplified, facilitated and made more productive, as well as more enjoyable."

IBM's ScrollPoint Mouse is one of the latest products to emerge from this activity. It grew out of the realization, says Zhai, that scrolling through a document is annoyingly inefficient. It isn't transparent: you have to divert your attention from the immediate task, reading the document, to push a key or click on a scroll bar.

The solution—which was driven by intensive human-factors studies—was to put a scrolling joystick on the mouse itself. The saddle-shaped control responds to finger pressure exerted forward or backward, as well as left or right. "You can easily position your finger on it without looking at the mouse," says Zhai. "It's also very efficient, because it embodies the concept of rate control: the harder you push, the faster the document scrolls."
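Rate control, as Zhai describes it, ties how hard you push to how fast the document moves, rather than to how far it moves. The Python sketch below illustrates the idea under stated assumptions. The function names, the dead zone, the gain and the exponent are all hypothetical values chosen for illustration. They are not the ScrollPoint's actual transfer function, which the article does not describe.

```python
def scroll_rate(force: float, dead_zone: float = 0.05, gain: float = 400.0,
                exponent: float = 1.5, max_force: float = 1.0) -> float:
    """Map a signed, normalized finger force (-1..1) to a signed scroll
    velocity in lines per second. All parameter values are hypothetical."""
    magnitude = min(abs(force), max_force)
    if magnitude <= dead_zone:
        return 0.0  # ignore the light pressure of a resting finger
    # Rescale the force beyond the dead zone to 0..1, then apply a power
    # curve: gentle pressure gives fine control, firm pressure scrolls fast.
    excess = (magnitude - dead_zone) / (max_force - dead_zone)
    speed = gain * excess ** exponent
    return speed if force > 0 else -speed


def step(position: float, force: float, dt: float) -> float:
    """Advance the document position by one time step of dt seconds.
    Under rate control force sets velocity, so position is its integral."""
    return position + scroll_rate(force) * dt
```

The small dead zone keeps a resting finger from making the page drift. An exponent above 1 trades a little top speed at light pressure for finer control.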

The first model of the ScrollPoint Mouse came out in 1997. An improved version, with higher control resolution and continuous scrolling, was shipped at the end of 1999. And the ScrollPoint Pro, released in 2000, adds a sculpted shape for greater comfort.

LOOKING AT EYES

One way to reduce the attention humans need to devote to interacting with computers is to shift the burden of work to the computers. For that to occur, interactions will need to be increasingly multimodal, involving not just hands to type or to manipulate a mouse, but voice, hearing, sight and other means of sensing. Already, voice recognition technology like IBM's ViaVoice allows a computer to take dictation. While that can simplify text entry, IBM researchers believe that within a few years voice will be used to issue sophisticated commands to computers and even to converse with them.

Important as voice is, human communication is heavily dependent on nonverbal cues. IBM's BlueEyes research project began with a simple question, according to Myron Flickner, a manager in Almaden's USER group: Can we exploit nonverbal cues to create more effective user interfaces?

One such cue is gaze—the direction in which a person is looking. Flickner and his colleagues have created some new techniques for tracking a person's eyes and have incorporated this gaze-tracking technology into two prototypes. One, called SUITOR (short for Simple User Interest Tracker), fills a scrolling ticker on a computer screen with information related to the user's current task. SUITOR knows where you are looking, what applications you are running, and what Web pages you may be browsing. "If I'm reading a Web page about IBM, for instance," says Paul Maglio, the Almaden cognitive scientist who invented SUITOR, "the system presents the latest stock price or business news stories that could affect IBM. If I read the headline off the ticker, it pops up the story in a browser window. If I start to read the story, it adds related stories to the ticker. That's the whole idea of an attentive system—one that attends to what you are doing, typing, reading, so that it can attend to your information needs."

The second prototype, known as MAGIC pointing (for Manual And Gaze Input Cascaded pointing), lets your eyes influence the movement of the cursor, a longtime dream of computer designers. Letting the eyes wholly determine cursor movement turns out to be impractical, because many eye movements are involuntary. "If you try to control the cursor with your eyes, you'll quickly find yourself exhausted and confused," says Zhai, who has led the MAGIC pointing research. Instead, MAGIC pointing settles for less to achieve more.

The gaze-tracking cameras follow your eyes, but your fingers have to be involved as well. As soon as you touch the pointing device, the cursor jumps precisely to the point of your gaze. "What's 'magic' about it," says Zhai, "is that the user doesn't know why and how the cursor moves, but suddenly it's there. That saves you the time and effort of moving the cursor across the screen."
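The cascade Zhai describes, in which the gaze supplies the coarse jump and the hand supplies the fine adjustment, can be sketched as a small event handler. This is a hypothetical illustration. The class, its method names and the warp threshold (in pixels) are assumptions. Real gaze trackers also report noisy estimates that would need smoothing before they are used.

```python
import math
from dataclasses import dataclass


@dataclass
class Cursor:
    x: float
    y: float


class GazeCascadedPointer:
    """Gaze supplies the coarse jump; the hand supplies fine positioning.

    Hypothetical sketch: gaze coordinates are never applied on their own,
    only at the moment the user first touches the pointing device.
    """

    def __init__(self, warp_threshold: float = 120.0) -> None:
        # Warp only if the gaze is well away from the cursor; trackers are
        # too imprecise for the last few pixels, so the hand finishes the job.
        self.warp_threshold = warp_threshold
        self.cursor = Cursor(0.0, 0.0)
        self.touching = False

    def on_touch(self, gaze_x: float, gaze_y: float) -> None:
        """The user has just touched the pointing device."""
        if self.touching:
            return
        self.touching = True
        distance = math.hypot(gaze_x - self.cursor.x, gaze_y - self.cursor.y)
        if distance > self.warp_threshold:
            self.cursor = Cursor(gaze_x, gaze_y)

    def on_move(self, dx: float, dy: float) -> None:
        """Ordinary manual motion refines the position after the jump."""
        self.cursor = Cursor(self.cursor.x + dx, self.cursor.y + dy)

    def on_release(self) -> None:
        self.touching = False
```

The cursor moves only on touch, so involuntary eye movements during reading never disturb it. That avoids the exhaustion and confusion Zhai mentions.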

Flickner expects that within five years, eye-tracking technology will be common, if not ubiquitous, as people find themselves living and working in "attentive environments," places that contain a multiplicity of computing devices. Spohrer agrees. "In an attentive environment," he says, "gaze tracking simplifies the task of speech recognition. If a device knows it's being looked at, it can, in effect, pay attention. That way, every device in a room won't be struggling, for instance, to make sense of the 'play' command that you're giving your VCR."
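Spohrer's VCR example amounts to using gaze as an addressing step that comes before speech interpretation. Below is a minimal sketch. It assumes each device publishes a small command vocabulary and that some gaze tracker reports which device, if any, the user is looking at. The function and device names are hypothetical.

```python
from typing import Dict, Optional, Set


def route_command(command: str, gazed_device: Optional[str],
                  vocabularies: Dict[str, Set[str]]) -> Optional[str]:
    """Return the device that should act on a spoken command, or None.

    Only the device the user is looking at gets to interpret the utterance,
    so a "play" aimed at the VCR is ignored by everything else in the room.
    """
    if gazed_device is None:
        return None  # nobody is being addressed
    word = command.strip().lower()
    if word in vocabularies.get(gazed_device, set()):
        return gazed_device
    return None  # the addressed device does not understand this command
```

For example, with the hypothetical vocabularies {"vcr": {"play", "stop"}, "lights": {"on", "off"}}, saying "Play" while looking at the VCR routes to "vcr". Saying it while looking at the lights routes nowhere.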

A more immediate use for computer vision technology in human-computer interaction is among people with disabilities that make the current interfaces difficult or impossible to use (see "Eyes on aging"). The existing technology for people who have only limited movement of their head or hands involves what are known as single-switch interfaces. "You turn the interface into single-switch mode," says Rick Kjeldsen of the exploratory computer vision group at IBM's Thomas J. Watson Research Center, "and the cursor automatically goes from hot spot to hot spot on the screen, each of which represents a different action. When it lands on the desired action, the user touches a switch, which might be next to the user's head." But for some people, even a switch requires too much strength and coordination to control. Besides, such contraptions are expensive and cumbersome to set up.
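The single-switch mode Kjeldsen describes is a scanning interface. A highlight steps through the hot spots at a fixed pace, and one switch press picks whatever is highlighted. The sketch below simulates that loop in discrete ticks, where one tick stands for one dwell interval. The function name and the tick model are illustrative assumptions, not a description of any particular product.

```python
from typing import Optional, Sequence, Set


def scan_select(hot_spots: Sequence[str], switch_ticks: Set[int],
                max_ticks: int = 100) -> Optional[str]:
    """Simulate a single-switch scanning interface.

    The highlight advances one hot spot per tick (one dwell interval),
    wrapping around at the end; the first tick on which the switch is
    pressed selects whatever is highlighted at that moment.
    """
    if not hot_spots:
        return None
    for tick in range(max_ticks):
        highlighted = hot_spots[tick % len(hot_spots)]
        if tick in switch_ticks:
            return highlighted
    return None  # the user never pressed the switch
```

The cost of this design is time. The wait before a choice can be made grows with the number of hot spots, and that is one reason a lower-effort switch matters so much to these users.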

Kjeldsen has devised a system that replaces the physical switch with an inexpensive IBM PC camera and a vision system that can be trained to recognize the most subtle of movements. The camera is aimed at whatever part of the body the user can move, and the user "triggers the switch," in effect, by moving the body part within a target area of the camera's field of view. "You can put the target next to their thumb, for instance," says Kjeldsen, "and you can bring the camera so close that a very small twitch of the thumb will activate it. No force at all is needed. In some cases, that's very important." The technology has been released as a product by Edmark Corp., an IBM company.
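The article says Kjeldsen's system can be trained to recognize subtle movements, but it does not describe the method. The sketch below substitutes a much simpler technique, plain frame differencing. It counts how many pixels inside a target rectangle change between consecutive camera frames and fires the virtual switch when enough of them do. Frames are assumed to be grayscale lists of rows. The thresholds and function names are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]             # grayscale intensities 0-255, row-major
Region = Tuple[int, int, int, int]  # (top, left, height, width) in pixels


def changed_fraction(prev: Frame, curr: Frame, region: Region,
                     pixel_delta: int = 25) -> float:
    """Fraction of pixels in the target region whose intensity changed by
    more than pixel_delta between two consecutive frames."""
    top, left, height, width = region
    changed = 0
    for r in range(top, top + height):
        for c in range(left, left + width):
            if abs(curr[r][c] - prev[r][c]) > pixel_delta:
                changed += 1
    return changed / (height * width)


def switch_fired(prev: Frame, curr: Frame, region: Region,
                 min_fraction: float = 0.2) -> bool:
    """Treat enough motion inside the target area as a switch press."""
    return changed_fraction(prev, curr, region) >= min_fraction
```

Moving the camera closer makes the body part larger in the image, so the same small twitch changes more pixels inside the region. This is the effect Kjeldsen describes with the thumb.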

Kjeldsen and his IBM colleagues are looking to push the technology much further. One concept is to use the vision system to track faces, so people with impaired vision can move a magnifying glass around the screen by dragging it, in effect, with their nose. "That way, they can leave their hands on the keyboard and mouse and interact with the system normally," says Kjeldsen.

The long-term dream of Kjeldsen and his colleagues is to replace keyboard and mouse for all users with voice recognition and gesture recognition programs. "Given the ability to recognize where you're looking and what you're doing and the position of your hands and fingers and their gestures," he says, "we're asking what's the best way to combine those things into an interface that works smoothly and seamlessly."

IN TOUCH WITH EMOTIONS

Researchers are also exploring just how far the concept of attentive computing can be carried. Is it possible, for example, that a computer could gauge your emotional state, slowing down or speeding up the presentation of information, or prioritizing it, depending on whether you were tired, anxious or in a hurry? Might a learning program sense a student's difficulties and modify its teaching style accordingly? Wendy Ark, a member of the USER group, thinks that these are indeed realistic expectations and that the ability to sense the user's emotional state is an important way of enhancing human-computer interaction.

Ark and her colleagues have been measuring the heart rate, temperature, galvanic skin response and minute body movements of test subjects, and then matching the measurements with six emotional states: happiness, surprise, anger, fear, sadness and disgust. Now, the researchers are developing devices that can take such readings from a person's hand. Ark points out that a set of sensors, including infrared detectors and temperature-sensitive chips, can easily be placed in different locations or devices: office chair, keyboard, mouse, phone handle and so on. "Once contact is made," she says, "the computer learns about the user's emotional state. Over time, it can learn about the user's personality, likes and dislikes."
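The article does not say how Ark's group matched physiological readings to emotional states. One simple technique that fits the description is nearest-centroid classification, substituted here purely for illustration. Each emotion is represented by the average reading recorded for it during labeled sessions, and a new reading is assigned to the closest average. The feature order, the scaling and the function name are assumptions. No centroid values are given, because none appear in the article.

```python
import math
from typing import Dict, Sequence

EMOTIONS = ("happiness", "surprise", "anger", "fear", "sadness", "disgust")


def nearest_emotion(reading: Sequence[float],
                    centroids: Dict[str, Sequence[float]],
                    scales: Sequence[float]) -> str:
    """Label a physiological reading with the closest emotion centroid.

    reading and each centroid hold the same features in the same order,
    e.g. (heart rate, skin temperature, galvanic skin response, movement).
    scales divides each feature so no single unit dominates the distance.
    """
    unknown = set(centroids) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")

    def distance(a: Sequence[float], b: Sequence[float]) -> float:
        return math.sqrt(sum(((x - y) / s) ** 2
                             for x, y, s in zip(a, b, scales)))

    return min(centroids, key=lambda label: distance(reading, centroids[label]))
```

The scaling step matters here. Heart rate in beats per minute and skin response in microsiemens differ by orders of magnitude, and without scaling one unit would dominate the distance and swamp the others.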

Such technology has many potential applications, says Spohrer. "Any activity in which our emotional or physiological state is apt to affect our performance is an obvious candidate for this kind of technology." Sitting in front of a traditional computer is just one area, Spohrer adds. "A steering wheel with a built-in emotion detector could sound an alert if the driver became drowsy or, depending on how the car's computer is programmed, might suggest that the driver pull over to the side of the road and relax if the sensors detected tension or anxiety," he says. "Similarly, a Web-based help desk could provide better service if it received input about the user's level of frustration. This is all part of the attentive environment, taking advantage of nonsymbolic input into the computer to create more intelligent services."

EMBEDDED INTELLIGENCE

The notion of a computer is changing. The traditional image of a box, a screen and a keyboard is rapidly being replaced by concepts wherein computing power is distributed among a multitude of devices. Even the idea of specialized computing devices is likely to become outdated, as microprocessors find their way into all sorts of objects. Indeed, such chips may one day be as pervasive as bar codes are today, which will make better forms of human-computer interaction even more pressing.

IBM researchers are working on a spectrum of experimental computing systems that will take computing off the desktop and the laptop and place it anywhere in the house, car or environment, and they are working on the communication systems that will link all this information together and make it accessible from anywhere. For some people, the idea of being continuously connected to a computer is unnerving. To others, it is practically a necessity. One's attitude is likely to reflect one's experiences.

To Mitch Stein, the defining moment came with the purchase of his first alphanumeric pager, several years ago. Stein, who is now senior manager of the Life Networking group at Watson, says when he showed up at work wearing his new pager, his co-workers showered him with condolences. "It must be really horrible to be so tethered to your job that you have to have a pager," they'd say.

"My response to these people," Stein says, "was that it's a beautiful afternoon, and we're sitting in the courtyard having a cup of coffee, and I'm not running back to work because I don't have to be tied to my desk waiting for critical information—it comes to me. For me, the pager was a liberating experience."

Now Stein and a host of researchers at IBM are aiming to make personal computing an equally liberating experience. The basic idea, known variously as pervasive or ubiquitous computing or what Stein calls life networking, was perhaps first clearly articulated at the Xerox Palo Alto Research Center in the late 1980s, but it has been embraced and extended by scientists elsewhere. The goal is to do for digital information what the mobile phone has done for the human voice, says Tom Zimmerman, a researcher in the USER group. Because the pervasive-computing revolution is poised to fundamentally change the way humans interact with computers, the value of voice, gaze, gesture and emotional interactions is growing. All the innovations that are simplifying desktop computing will be even more crucial in the pervasive world, in which computers will be encountered in unfamiliar settings and often may not even be visible or immediately recognizable as computers.

Stein and his colleagues are working on a prototype of a device they call the InfoPortal, which Stein describes as a "super personal information manager." This flat-screen computer will be able to sit on your desk or hang on your kitchen cabinets and serve as a single point of access for the entire spectrum of information that might come your way: email, faxes, voice mail, computer files or Web pages. Later versions will be mobile, says Stein. "There could be one for your car and even one you could wear."

Stein's group is now working on adding a "sensory bezel"—an amalgam of electronics, including a video camera and a fingerprint reader, that lets the InfoPortal sense the presence of users and adapt the interface according to their distance. When the user is across the room, the InfoPortal will display large fonts and enable speech interaction as the main navigation tool. As the user approaches, the font will automatically scale down, allowing more information to be displayed on screen. The person will then be able to steer the cursor by means of "touchless pointing" (especially useful when your hands are coated in cookie dough, and you need to turn the page on your electronic cookbook).
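The distance-adaptive behavior described for the InfoPortal can be summarized as a mapping from the user's distance to a font size and a primary input mode. Below is a minimal sketch. It assumes the sensory bezel yields a distance estimate in meters. The range limits, the point sizes, the linear interpolation and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Presentation:
    font_pt: int
    input_mode: str  # "speech" or "touchless_pointing"


def adapt_to_distance(distance_m: float,
                      near_m: float = 1.0, far_m: float = 4.0,
                      min_pt: int = 12, max_pt: int = 48) -> Presentation:
    """Choose font size and primary input mode from the user's distance."""
    # Clamp to the calibrated range, then interpolate font size linearly:
    # large type across the room, smaller type (more content) up close.
    d = max(near_m, min(distance_m, far_m))
    t = (d - near_m) / (far_m - near_m)
    font = round(min_pt + t * (max_pt - min_pt))
    # Speech drives navigation from afar; pointing takes over up close.
    mode = "touchless_pointing" if distance_m <= near_m else "speech"
    return Presentation(font, mode)
```

A real system would probably add hysteresis so that the layout does not flicker when the user stands near a threshold. That refinement is left out of this sketch.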

The InfoPortal is just one of a range of new packaging concepts—or form factors—that IBM researchers are experimenting with, from coffee-table computers that control all your entertainment systems to "watchpad" computers that will provide many of the functions of personal digital assistants (PDAs) in a compact form. "Several of these projects are based on packaging technology in new ways so users can interact with the devices and the services they provide in new ways," says IBM distinguished engineer John Karidis. "Before experimenting with these new forms," he adds, "we try to think about the most convenient possible ways for people to interact with information and computers, not just about how they currently interact with them."

Karidis himself has been working on several new computers: a watch computer; a home computer that combines the convenience of a touch-screen kiosk with the functionality of a desktop computer in a sleek, lightweight package that can be placed anywhere in the house; and a mobile phone that lets users view full-screen images of Web pages. "Today's Internet-enabled phones have very limited display capabilities and require special services to translate and reformat normal Web site information for their small, low-resolution displays," says Karidis. "But it's technologically feasible to build a mobile phone that, while being held comfortably to your ear, allows you to view what appears to be a normal 12-inch notebook display at a distance of about 20 inches."

The phone concept displays images by means of a miniature high-resolution display and special optics built into the flip-phone cover. Navigation combines speech recognition with a thumb-operated TrackPoint® pointing device. Although Karidis says the technology is not quite ready to allow the phone to be manufactured cost-effectively, he adds that "it's only a matter of time." Other, nontechnical issues will need to be resolved as well. "We have to consider the obvious safety concerns with having people reading a screen on a cell phone while driving," Karidis says, "or the possible social awkwardness of looking at information projected through the cover of a flip phone. We're trying to think far enough out in the future with these new form factors that we can explore a wide range of possibilities."

CONVERSING WITH MACHINES

The trend toward pervasive computing is driving research into ever-more-natural forms of human-computer interaction that require ever-more-sophisticated technology. The use of voice is a good example. "We're bringing speech recognition and conversational technologies to a variety of different pervasive computing platforms," says Ponani Gopalakrishnan, manager of the pervasive speech technologies group at Watson. "Imagine first a device or a computer that responds to spoken commands, such as 'turn on the lights' or 'open my email.' The next stage would be a conversational system, where you engage in a dialogue. For example, you might say 'send this message to David,' and it might respond 'I know of two Davids. Which one do you mean?'"
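The step from command to conversation that Gopalakrishnan describes can be illustrated with his own example. Instead of failing or guessing when a name is ambiguous, the system asks a follow-up question. The sketch below is hypothetical. The function names, the first-name matching rule and the wording of the prompt are assumptions, and a real system would also need speech recognition and dialogue state.

```python
from typing import List, Sequence, Tuple


def resolve_recipient(name: str,
                      contacts: Sequence[str]) -> Tuple[str, List[str]]:
    """Match a spoken first name against an address book.

    Returns ("send", [contact]) for a unique match, ("clarify", matches)
    when several contacts share the name, and ("unknown", []) otherwise.
    """
    spoken = name.strip().lower()
    matches = [c for c in contacts
               if c.split() and c.split()[0].lower() == spoken]
    if len(matches) == 1:
        return "send", matches
    if matches:
        return "clarify", matches
    return "unknown", []


def clarification_prompt(name: str, matches: Sequence[str]) -> str:
    """The follow-up question the system asks when a name is ambiguous."""
    return (f"I know of {len(matches)} people named {name}: "
            f"{' or '.join(matches)}. Which one do you mean?")
```

With a hypothetical address book of "David Chen", "David Ortiz" and "Maria Lopez", "send this message to David" leads to the "clarify" branch. The prompt then lists both Davids.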

Such technologies allow for hands-free computing, as well as for commands that aren't limited to what you can type or click on a screen. The technological challenges include creating highly accurate speech recognition systems that work despite background noise, and designing systems that work alongside point-and-click or other interfaces.

Gopalakrishnan's group is designing software that will allow voice access to Web sites on handheld devices. "We now have to look at how to implement browsers that can support that level of voice interface," says Gopalakrishnan. A consortium of companies, including IBM, is taking a step in that direction by defining a new markup language, VoiceXML, for expressing voice interfaces. The group is also working to extend this approach to multimodal interfaces.

Already, Gopalakrishnan's group has designed a voice-interface module that plugs into the serial port of a PalmPilot or IBM WorkPad® PDA. The device could be a reality within a year, says Gopalakrishnan. He adds that conversational interfaces should be available for a wide range of pervasive computing devices within five years. "There should be many different levels of technology available on different devices," he says. "Some may be fairly simple, like just dialing by voice on a car phone, all the way to having full conversational interfaces with large-vocabulary natural-language understanding."

THE RIGHT INFORMATION

As computing devices take on more portable forms, supplying them with the right information at the right time and in the right form will become paramount. Part of the job will be done by intermediaries: systems that sit between the browser and the server and that "transcode" Web pages, or reformat them for the limited screens of PDAs or mobile phones. Rob Barrett and his colleagues in the USER group have designed one such system, known as Web Browser Intermediary, or WBI (pronounced "webby"). "All your requests for information, your form submissions, the documents you see, they all flow through WBI," says Barrett. WBI can do anything from translating such information into a different language to presenting only highlights that can be easily browsed on small screens.
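An intermediary of this kind is essentially a chain of document transformations applied between fetching a page and displaying it. The sketch below is hypothetical and does not reproduce WBI's actual plug-in interface. The Intermediary class, its fetch callback and the two transcoders are illustrative names. Regular expressions keep it short, but a real transcoder would use a proper HTML parser.

```python
import re
from typing import Callable, List

Transcoder = Callable[[str], str]


def strip_images(html: str) -> str:
    """Drop <img> tags, which small low-resolution screens cannot use well."""
    return re.sub(r"<img\b[^>]*>", "", html, flags=re.IGNORECASE)


def headings_only(html: str) -> str:
    """Keep only top-level headings as an easily browsed summary."""
    heads = re.findall(r"<h[1-3]\b[^>]*>.*?</h[1-3]>", html,
                       flags=re.IGNORECASE | re.DOTALL)
    return "\n".join(heads)


class Intermediary:
    """Sits between browser and server and transforms every document."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self.fetch = fetch  # how the intermediary reaches the server
        self.chain: List[Transcoder] = []

    def add(self, transcoder: Transcoder) -> "Intermediary":
        self.chain.append(transcoder)
        return self

    def get(self, url: str) -> str:
        document = self.fetch(url)
        for transcoder in self.chain:  # applied in the order added
            document = transcoder(document)
        return document
```

Because each transcoder is just a function from document to document, a language translator or a phone-sized reformatter could be added to the same chain without changing the browser or the server.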

But pervasive computing devices will also have to communicate with one another, and with their environment. "If you're waiting to be seated in a busy restaurant," says Tom Zimmerman, "you might want your PDA to display the menu. If you're walking into Grand Central Station, you want to see the local train schedule, not the train schedule for Tokyo." The solution is context-specific and location-specific information. In other words, says Zimmerman, "what's needed is a technology that filters out all but the information relevant to us in space and time."

A key enabling technology is a new form of radio communication. IBM is among the five founding companies of an international consortium known as Bluetooth that has created a standard for low-cost, low-power digital radio communication between devices (see Think Research, Number 1, 1999, "Unwired"). "The signals go only about 30 feet, which by default gives you location-specific information," says Zimmerman.
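With a radio range of about 30 feet, being able to hear a broadcast at all already implies being near its source. The sketch below models that spatial filter together with the time filter Zimmerman mentions. It uses plain coordinates in meters and hours of the day. The Broadcast record, the 9 m cutoff (roughly 30 feet) and the function name are assumptions for illustration. In practice, reachability depends on the radio link itself rather than on geometric distance.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

RADIO_RANGE_M = 9.0  # roughly the 30 feet quoted in the article


@dataclass
class Broadcast:
    topic: str
    source_xy: Tuple[float, float]  # position of the transmitter, meters
    valid_from: float               # hour of day when the item applies
    valid_to: float


def relevant_now(items: Sequence[Broadcast], here: Tuple[float, float],
                 hour: float, range_m: float = RADIO_RANGE_M) -> List[str]:
    """Keep only items that are both nearby (in space) and current (in time)."""
    return [b.topic for b in items
            if math.dist(here, b.source_xy) <= range_m
            and b.valid_from <= hour <= b.valid_to]
```

This is the Grand Central case in miniature. Of everything being broadcast, only the nearby train schedule that is currently valid survives the filter.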

Ultimately, the quest for truly transparent computing could lead to what Zimmerman calls a personal-area network—a spontaneous and ever-changing system of "contagious information." Short-range wireless communication will enable every device in your house to synchronize instantaneously with every other device. "It will create a digital cloud of information," says Zimmerman. "As devices come into my digital sphere, they will update each other automatically. If I have to go on a trip, I'll grab my PDA and I'll know that I have everything I need for the trip: my presentation and papers, my hotel and airline reservations, and so on. And when I get there, I know I'll have everything I need to know about the place I'm staying. With the new technology, not only will the world be at your fingertips, but it will be that part of the world you need, when you need it."
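One simple way two devices could "update each other automatically" when they meet is a last-writer-wins merge. Every item carries a timestamp, and on contact each device keeps whichever copy is newer. This is a hypothetical technique chosen for illustration, not a description of any IBM system. The item keys and the function name are illustrative.

```python
from typing import Dict, Tuple

Store = Dict[str, Tuple[float, str]]  # key -> (timestamp, value)


def synchronize(a: Store, b: Store) -> None:
    """Merge two device stores in place so both end up identical.

    For each key, the copy with the later timestamp wins (last-writer-wins);
    on an exact timestamp tie, device a's copy is kept.
    """
    for key in set(a) | set(b):
        candidates = [store[key] for store in (a, b) if key in store]
        newest = max(candidates, key=lambda item: item[0])
        a[key] = newest
        b[key] = newest
```

Last-writer-wins is easy to reason about but silently discards concurrent edits. A device that must preserve both versions would need a richer merge policy.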




Gary Taubes, a freelance writer who lives in Venice, California, is a frequent contributor to Think Research.

