IBM Research

Think Research


 


Featured Concept
Information at a Glance
COVER STORY: Part 7

Viewed chronologically, geographically or in other novel ways, the contents of a database take on new clarity

The challenge in finding online information is locating precisely what one wants without having to sift through thousands of irrelevant documents. Conversely, while a database query may produce information the user wants, related information in the database, or in another one linked to it, may get passed over because the query and search technology is not sophisticated enough to locate it.

To overcome these limitations and cope with the continually growing volume of information stored in databases, not only as text but frequently as images, video and audio data, researchers at IBM's Tokyo Research Laboratory (TRL) in Japan have developed what they call an information outlining system. The system combines the ability to extract information and display it graphically with the ability to navigate through it. For example, in a demo using a Japanese database of 180,000 newspaper articles, a search on the word uchu (Japanese for "space" or "cosmos") produces a list of 598 articles.

With a conventional database system, users would have to scan the titles one by one, retrieving those that appear most relevant to their inquiry. But information outlining presents additional clues to help users quickly grasp aspects of the articles not readily apparent from the titles alone.

CHANGING VIEWS

Selecting a geographical view displays a map of Japan with its 47 prefectures. When each prefecture is colored according to how often it is mentioned in the articles, it becomes evident that some prefectures, such as Kagoshima (where a space center is located), come up frequently, while others are rarely mentioned. Clicking on Kagoshima, then, would very likely retrieve articles about space in the cosmic sense rather than, say, the architectural or mathematical sense.
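The coloring and click-through steps described above can be sketched in Python. This is a minimal sketch, not TRL's implementation: it assumes articles are plain romanized strings, finds prefecture names by simple substring matching (the system described here relies on natural language processing to extract key words), and uses a hypothetical four-prefecture list and color scale.

```python
from collections import Counter

# Hypothetical subset of Japan's 47 prefectures (romanized names).
PREFECTURES = ["Kagoshima", "Tokyo", "Osaka", "Hokkaido"]

# Hypothetical color scale, ordered from least to most frequently mentioned.
COLORS = ["white", "yellow", "orange", "red"]


def count_mentions(articles, prefectures=PREFECTURES):
    """Count how many times each prefecture name occurs across all articles."""
    counts = Counter({p: 0 for p in prefectures})
    for text in articles:
        for p in prefectures:
            # Naive substring matching stands in for real key-word extraction.
            counts[p] += text.count(p)
    return counts


def assign_colors(counts, colors=COLORS):
    """Map each prefecture to a color bucket scaled to the most-mentioned prefecture."""
    top = max(counts.values(), default=0)
    result = {}
    for p, n in counts.items():
        # Integer arithmetic keeps the bucket index in [0, len(colors) - 1].
        level = 0 if top == 0 else n * (len(colors) - 1) // top
        result[p] = colors[level]
    return result


def articles_mentioning(articles, prefecture):
    """Return the articles behind a clicked prefecture."""
    return [text for text in articles if prefecture in text]


if __name__ == "__main__":
    articles = [
        "Rocket launched from Kagoshima",
        "Kagoshima space center prepares satellite",
        "Tokyo conference on space law",
    ]
    counts = count_mentions(articles)
    print(assign_colors(counts))
    print(articles_mentioning(articles, "Kagoshima"))
```

With this toy input, Kagoshima falls into the top color bucket and unmentioned prefectures stay white. Scaling against the most-mentioned prefecture, rather than fixed thresholds, keeps the map readable whether a query returns ten articles or thousands.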

A chronological view produces a bar chart showing the number of articles published in different time periods. For the "uchu" search, such a chart reveals that far more articles were written around important space-related events, for instance the launch or return of a space shuttle mission. Other views map key words according to when they were used or how frequently they appear in the articles. All of these views are built from key words and their categories (for example, the key word "IBM" and its category, "organization"), which are extracted automatically from the text using natural language processing technology developed at TRL and the Thomas J. Watson Research Center.
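A minimal Python sketch of the chronological and key-word views follows. It assumes each article carries a publication date and that key words are supplied as a plain list (a hypothetical simplification; in the system described here they are extracted automatically by natural language processing), and it bins articles by calendar month, an arbitrary choice of time period. The function names and sample data are illustrative only, not TRL's API or data.

```python
from collections import Counter
from datetime import date


def articles_per_period(articles):
    """Count articles per (year, month); each article is a (publication_date, text) pair."""
    counts = Counter((d.year, d.month) for d, _ in articles)
    return dict(sorted(counts.items()))


def bar_chart(period_counts):
    """Render one text bar per period, with one '#' per article."""
    return "\n".join(
        f"{year}-{month:02d} {'#' * n}" for (year, month), n in period_counts.items()
    )


def keyword_frequency(articles, keywords):
    """Count how many articles mention each key word at least once."""
    return {k: sum(1 for _, text in articles if k in text) for k in keywords}


def keyword_timeline(articles, keyword):
    """Count, per (year, month), the articles that mention a given key word."""
    return articles_per_period([(d, t) for d, t in articles if keyword in t])


if __name__ == "__main__":
    # Hypothetical sample articles, not drawn from any real database.
    sample = [
        (date(1998, 10, 29), "shuttle launch draws crowds"),
        (date(1998, 10, 30), "shuttle crew begins experiments"),
        (date(1998, 11, 7), "shuttle returns to Earth"),
        (date(1999, 1, 5), "panel discusses space law"),
    ]
    print(bar_chart(articles_per_period(sample)))
    print(keyword_frequency(sample, ["shuttle", "law"]))
    print(keyword_timeline(sample, "shuttle"))
```

Sorting the period keys before charting keeps the bars in time order, so a cluster of articles around an event shows up as a visible spike.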

The world's largest newspaper company, the Tokyo-based Yomiuri Shimbun, is currently using the information outlining system to help its editors and writers sift through the millions of articles, photographs and charts the newspaper has stored in its databases over the past 11 years.

In addition, notes Jung-kook Hong, manager of the Solutions Research Center at TRL, the system has been applied to retrieve information on Japanese patents. "It's now an easy matter to see if IBM -- or a competitor -- is associated with a particular technology," he says. The technology is also used by the National Museum of Ethnology in Osaka, for searching its multimedia collections. "We believe information outlining will find many uses in business," Hong predicts.



