IBM Research

Think Research


 


Featured Concept
Pipeline

By Paul Kallender, Katherine Silberger, Xu Fang, Dennis Normile and Terence Finan

A Seeing Eye for the Web

The Nanoworld on a Shoestring

Automated Web Publishing

Is It Live or Is It Text to Speech?

IBM Software + Web = Science Online

Why the ThinkPad Keeps Coming and Going


A Seeing Eye for the Web

While most of us take the World Wide Web for granted, this invaluable information resource has been closed to the visually impaired. But new software developed by a team at IBM's Tokyo Research Laboratory makes the online world accessible at last. "When I discovered the Web and found how useful it could be," says team leader Chieko Asakawa, "it occurred to me that a Web-reading environment would be a great help to blind people." The resulting product, Homepage Reader, is available in a Japanese-language version. An English version is in preparation.

Asakawa, who is blind, says she was inspired by IBM's Screen Reader™ 2, which enabled her to access the Netscape Navigator™ Web browser. The problem was that the software reads text-based information only, and was therefore not suited to the Internet's multimedia environment, which contains embedded images and hyperlinks.

Asakawa and her team set out to design a system capable of interpreting the Web's special coding. One of the biggest challenges, Asakawa notes, was devising a simple way to navigate. "Remember, blind people can't use the mouse," she says.

The team's solution was to use the computer's numeric keypad. Once online, Homepage Reader announces the default home page, and the surfer can use number keys to back up and move forward through pages, lines and individual characters. "If I want to read something, I just double-click key 2 and the system will read the page from the beginning," says Asakawa. Other keys let the user fast-forward, rewind or jump to the next link on the page. When the reader arrives at a hyperlink, the voice switches from male to female -- a cue that Asakawa deems "intuitive and natural."
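
The voice cue described above can be illustrated with a short sketch. This is not Homepage Reader's code; it only assumes that a page is split into plain text and link text, and that each piece is tagged with the voice a speech engine should use. The class name, the function name and the "male"/"female" labels are hypothetical.

```python
from html.parser import HTMLParser


class VoiceCueParser(HTMLParser):
    """Split a page into (voice, text) segments for a speech engine."""

    def __init__(self):
        super().__init__()
        self.segments = []      # (voice, text) pairs in reading order
        self._link_depth = 0    # greater than 0 while inside <a href=...>

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("href"):
            self._link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._link_depth > 0:
            self._link_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text:
            # Assumed convention: link text is read in the second voice.
            voice = "female" if self._link_depth else "male"
            self.segments.append((voice, text))


def voice_segments(html):
    """Return the (voice, text) segments for an HTML string."""
    parser = VoiceCueParser()
    parser.feed(html)
    parser.close()
    return parser.segments
```

Calling `voice_segments` on a paragraph that contains one link yields three segments, with only the link text marked for the second voice.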

As part of an outreach program, IBM has already trained some 400 people in the use of Homepage Reader, and plans to train another 1,500 over the next year. Asakawa says users can master the basic functions in less than an hour. The biggest hurdle, she says, is explaining the concept of a hyperlink -- or, indeed, the Internet itself -- to some visually impaired users. But the rewards are enormous. "One trainee," Asakawa reports, "said Homepage Reader made him feel empowered. He had never imagined he would ever be able to access the Web, let alone so easily."

-- Paul Kallender

FYI: http://www.research.ibm.com/topics/innovate/hci/


The Nanoworld on a Shoestring

The atomic force microscope (AFM), which can image nanometer-scale surfaces and structures, has become a basic tool of science and industry. But the instrument has been something of a luxury. With a typical price tag of $100,000, the AFM has been too costly for many corporate laboratories, let alone for schools. In May, however, Seiko Instruments introduced a novel AFM -- initially for the Japanese market -- that lowers the price barrier.

As the result of a joint effort with IBM's Zurich Research Laboratory, where the development of the low-cost AFM was begun by Gerd Binnig and Walter Häberle in 1994, Seiko Instruments is now manufacturing and marketing the system, called Nanopics, at less than a third of the price of previous AFMs. One cost-saving innovation is the use of integrated sensors, instead of lasers, to measure the minute deflections of the microscope's cantilever caused by the forces acting between its tip and the surface atoms. (Those tip movements are translated into an image of the contours of the object under study.) Nanopics also replaces costly piezoelectric elements with inexpensive voice coils -- similar to the drivers in stereo speakers -- to control the tip movement. To cut costs further, only the head with the cantilever, not the sample itself, moves during scanning, and images are captured through simple video-quality recording.

Another innovation is a compact design that reduces the AFM's sensitivity to vibration. This places less stringent requirements on the operating environment.

"Our collaboration with Seiko Instruments," says Binnig, "has produced a microscope that we believe will become standard equipment in many laboratories." Binnig, who invented the original AFM and developed it with his Zurich colleague Christopher Gerber and with Calvin Quate of Stanford University, predicts that Nanopics will allow even high-school students to explore nature on the nanometer scale.

-- Katherine Silberger


Automated Web Publishing

Converting printed materials such as books and catalogs into Web pages is about to become a lot simpler, thanks to a tool being developed at IBM's China Research Laboratory, in Beijing. With current Web publishing products, such as Microsoft's FrontPage98®, text and images must be input separately, and a new page layout must be created using HTML tags. Generating the hyperlinks for navigating the document is yet another time sink. IBM's prototype Web publishing tool speeds up the process by automatically recognizing a document's component parts -- table of contents, headings, page numbers, text and images -- and structuring the Web version accordingly. Although the work is aimed at developing a product for the Chinese market, the technology could be adapted to any language, notes Hui Su, who leads the team working on the prototype.

To create a Web page, a printed document is first scanned into a computer. An optical character recognition program developed at IBM then maps the document into blocks of text, images and titles. (The program can recognize more than 4,000 Chinese characters in six fonts, or about 99 percent of the characters used in general text.) Finally, the publishing tool restores files to their original layout, but in accordance with a chosen Web format such as HTML.
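
The last step, turning recognized blocks into a Web format, might look roughly like the sketch below. It assumes the recognition stage has already produced a list of (type, content) blocks in reading order; the block type names and the tag mapping are hypothetical, not the format used by IBM's tool.

```python
from html import escape

# Hypothetical block types a layout-recognition pass might emit.
TAG_FOR_BLOCK = {"title": "h1", "heading": "h2", "text": "p"}


def blocks_to_html(blocks):
    """Render recognized blocks, in reading order, as an HTML fragment."""
    parts = []
    for kind, content in blocks:
        if kind == "image":
            # For image blocks, content is the path of the cropped image.
            parts.append('<img src="%s" alt="">' % escape(content, quote=True))
        else:
            tag = TAG_FOR_BLOCK.get(kind, "p")  # unknown types become text
            parts.append("<%s>%s</%s>" % (tag, escape(content), tag))
    return "\n".join(parts)
```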

No Web page is complete without hyperlinks, and the publishing tool simplifies their addition to a page. Users can create a list of URLs, together with related strings of text. The tool then automatically detects those strings on the page and inserts links to the desired URLs. Hyperlinks can also be added manually.
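
A minimal sketch of that automatic step, assuming the user's list is a table mapping each text string to its URL. The function name and the longest-match-first rule are illustrative choices, not details taken from the prototype.

```python
import re
from html import escape


def insert_links(text, link_table):
    """Wrap each listed string found in plain text with a link to its URL."""
    # Try longer strings first so "IBM Research" wins over "IBM".
    phrases = sorted((p for p in link_table if p), key=len, reverse=True)
    if not phrases:
        return escape(text)
    pattern = re.compile("|".join(re.escape(p) for p in phrases))

    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(escape(text[pos:m.start()]))   # text before the match
        url = link_table[m.group(0)]
        out.append('<a href="%s">%s</a>'
                   % (escape(url, quote=True), escape(m.group(0))))
        pos = m.end()
    out.append(escape(text[pos:]))                # text after the last match
    return "".join(out)
```

Given a table with entries for both "IBM" and "IBM Research", the sentence "IBM Research builds tools at IBM." comes back with each phrase linked to its own URL.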

The tool is especially useful for publishing book-length documents. "We have used our single-page technique to automatically build a 'Web book,'" says Su. Because the tool recognizes tables of contents and page numbers, it can create links that take the reader directly to a chapter or page of interest. As the technology matures, large virtual libraries will become easier and easier to build.
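
As a rough illustration of how a recognized table of contents could become navigation links, the sketch below assumes each contents line ends with a page number and that every page of the Web book carries an anchor named `page-N`. Both the line format and the anchor scheme are assumptions.

```python
import re
from html import escape

# A contents line such as "2. Methods ..... 17": title, dot leader, page.
TOC_LINE = re.compile(r"^(?P<title>.*?)[\s.]*(?P<page>\d+)\s*$")


def toc_to_links(toc_lines):
    """Turn contents lines into an HTML list of links to page anchors."""
    items = []
    for line in toc_lines:
        m = TOC_LINE.match(line.strip())
        if not m or not m.group("title"):
            continue  # no trailing page number, so not a contents entry
        items.append('<li><a href="#page-%s">%s</a></li>'
                     % (m.group("page"), escape(m.group("title"))))
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```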

-- Xu Fang


Is It Live or Is It Text to Speech?

In their quest for customer-service perfection, three large Japanese banks have raised their automated telephone banking systems to a new level of user friendliness. When customers of Fuji Bank, Sanwa Bank and the Bank of Tokyo-Mitsubishi call in to check their account balances or transfer funds, they hear not the usual disjointed "robot speak" but a natural-sounding human voice. The secret is a new text-to-speech (TTS) system developed at IBM's Tokyo Research Laboratory.

The basic technology was originally used in IBM's award-winning Japanese program called ProTALKER, which reads PC text aloud. But as researcher Takashi Saito explains, adapting the technology to server-based telephony required considerable tweaking.

At the outset, the research team recognized that even state-of-the-art text-to-speech technology falls short of a recorded voice. So the system uses prerecorded words and numbers, combining them with personal and company names generated by TTS. To smooth transitions, the same speaker who records the set phrases also records the phonetic dictionary -- called the voice font -- that is used to generate the TTS names.
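
The hybrid approach can be pictured as a playback plan in which each piece of a prompt is either a recorded clip or a string handed to the TTS engine. The sketch below is illustrative only: the clip names, the template syntax and the digit-by-digit reading of numbers are assumptions, not details of the banks' systems.

```python
import re

# Hypothetical set phrases that exist as studio recordings, keyed by clip name.
RECORDED = {
    "greeting": "Thank you for calling.",
    "balance_is": "Your balance is",
    "yen": "yen.",
}


def build_prompt(template, values):
    """Turn a template into an ordered playback plan.

    A bare token names a recorded clip; {slot} is filled from `values`.
    Digits are played from recorded number clips, and any other slot
    value (a personal or company name) is sent to TTS.
    """
    plan = []
    for token in template.split():
        slot = re.fullmatch(r"\{(\w+)\}", token)
        if slot is None:
            if token not in RECORDED:
                raise KeyError("no recording for clip %r" % token)
            plan.append(("recorded", token))
            continue
        value = str(values[slot.group(1)])
        if value.isdigit():
            # Simplification: read the number one recorded digit at a time.
            plan.extend(("recorded", "digit_" + d) for d in value)
        else:
            plan.append(("tts", value))
    return plan
```

For example, `build_prompt("greeting {name} balance_is {amount} yen", {"name": "Tanaka Shoji", "amount": 1250})` sends only the name to TTS; every other step plays a recording.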

One key innovation is a simple method for automating the creation of a voice font. The TTS engine uses a database of 1,600 words recorded for the voice font and segments them into phonetic units. Each unit, made up of a consonant-vowel pair, is classified according to context, which can affect the way the sound is articulated. A syllable may be subject to 100 or so contextual variations. Automating the creation of a voice font reduces the time and effort involved in configuring or modifying a telephone banking system.
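
The bookkeeping behind such a voice font can be sketched symbolically. The code below works on romanized text rather than audio, assumes every word consists of simple consonant-plus-vowel syllables, and takes "context" to mean only the neighboring units; the real system segments recordings and may classify context quite differently.

```python
import re
from collections import defaultdict

# Assumption: romanized words made only of open (consonant + vowel) syllables.
UNIT = re.compile(r"[^aeiou]*[aeiou]")
BOUNDARY = "#"  # marks the start or end of a word


def segment(word):
    """Split a word into consonant-vowel units, e.g. sakura -> sa, ku, ra."""
    units = UNIT.findall(word)
    if "".join(units) != word:
        raise ValueError("cannot segment %r into consonant-vowel units" % word)
    return units


def build_voice_font(words):
    """Map each unit to the set of (left, right) neighbor contexts seen."""
    font = defaultdict(set)
    for word in words:
        units = segment(word)
        padded = [BOUNDARY] + units + [BOUNDARY]
        for i, unit in enumerate(units, start=1):
            font[unit].add((padded[i - 1], padded[i + 1]))
    return font
```

With the words "sakura" and "kasa", the unit "sa" is recorded in two contexts: at the start of a word before "ku", and at the end of a word after "ka".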

In developing the new system, Saito's team overcame a common TTS problem: the distortion that arises when phonetic units are strung together. The researchers patented a method of connecting units so that pitch shaking and rumbling sounds are minimized. Saito says the technique could be applied to any language in the world.

Another modification was optimizing the voice for the narrow, 4-kilohertz bandwidth of telephones. The researchers also improved processing by configuring a server that is dedicated to TTS and connected by a high-speed network to a Direct Talk/6000™ call transaction handler.

The Tokyo group is working to improve the product even further. Saito says the next step will be to refine the prosody -- the rhythm and intonation -- of TTS speech. Within three to five years, he hopes to raise TTS quality to the point where the recorded speech can be eliminated.

-- Dennis Normile


IBM Software + Web = Science Online

Since the first days of the Web, researchers and students have been frustrated by a flaw in Web browsers: their inability to display the elegant symbols of math and science as easily as they do regular text. Now, IBM is providing a solution in the form of the techexplorer Hypermedia Browser™ -- software that displays mathematical expressions and scientific documents in a variety of electronic media, including Web pages and CD-ROM-based textbooks.

The techexplorer software builds on the TeX and LaTeX formatting languages used in scientific and technical publishing, and adds features such as hypertext and multimedia. The software is also starting to support the Mathematical Markup Language specification from the World Wide Web Consortium. This new standard makes the creation and display of equations easier and more interactive than is possible using plain HTML and embedded images.

The new browser is available in two versions, each offering a different level of functionality. The Introductory Edition is a no-charge viewer that lets users read interactive documents containing text and mathematical expressions. The commercial Professional Edition offers other features as well, including an interface that allows users to write Java™ applets that interact with techexplorer. That edition contains a search engine, supports printing and has an extension mechanism for communicating with other applications.

IBM expects the Professional Edition to be the foundation of a new generation of interactive textbooks and online journals. "Our goal is to introduce novel forms of interactivity into technical and scientific documents, so that researchers and students can actually solve problems within their Web browsers," says Bob Sutor, a research staff member and manager of the interactive scientific publishing group at IBM's Thomas J. Watson Research Center.

The company is working with several partners to explore the possible uses of techexplorer for education and Web-based publishing. One partner is the NSF-funded Project Links group at Rensselaer Polytechnic Institute in Troy, New York. The project is developing online science courses in which undergraduates can conduct experiments and simulations using Java applets. "Our techexplorer software provides new ways to really explore the topics discussed in scientific documents," Sutor says.



