IBM Research

Think Research


 


Featured Concept
Profile



From Theory to Practice

Tokyo researcher Shinichi Morishita's interests range from mathematical logic to medical databases. His versatility has helped to create products and services with strong commercial potential.

Since he graduated from the University of Tokyo in 1985, with a Master's degree in computer science, Shinichi Morishita has pursued two interlocked careers in computing. For part of that time, he has undertaken academic studies of computer theory, which have led to several articles in scholarly journals. In the increasingly long intervals between academic exploits, he has worked at IBM's Tokyo Research Laboratory (TRL), helping to develop products and services for the company. The two careers share a common theme: many of the products and services have relied on practical extensions of his theories.

Recently, Morishita has been a key project manager in TRL's data mining projects, designed to extract items of interest from numerical data for banks and similar institutions. His efforts, based on theories he developed during a two-year stint at Stanford University, have had significant success. Japan's Information-Technology Promotion Agency recently selected TRL to oversee a data mining project that will run for two years with a budget of ¥250 million ($2.25 million). Morishita has also created an algorithm for extracting medically meaningful relationships from numerical databases that has since found broader application among nonmedical clients.

In his early years at TRL, starting in 1985, Morishita followed up the subject of his Master's thesis: the application of mathematical logic to computer science. His academic goal was to derive an efficient program from a simple but naive program, while preserving all the properties verified for the original. He successfully defined a transformation mechanism to achieve the goal.

In 1988, he and TRL scientist Masayuki Numao extended that theoretical work to develop a graphical debugger for PROLOG, a well-known programming language favored by specialists in artificial intelligence. He also created an expert system for NKK, the largest steel-making company in Japan. Called Scheiker, the system has particular value in scheduling industrial processes. It has since found wide application in several industries beyond steel. More than 80 Japanese companies - in forest products, credit card printing, printed circuits and other industries - have purchased the expert system.

Return to academe

After that work, Morishita returned to academic life. In 1990, he received his doctorate from the University of Tokyo, with a thesis on a multivalued logic programming language. His work, which focused on the Boolean algebra used for calculations that involve binary numbers, had a specific practical application: diagnosing electrical circuits. "Theoretically it was interesting," he says of the study, which he has published in two journals. "But from the protocol point of view it couldn't succeed as well, because its practical impact was limited to only a few types of circuits."

Morishita then spent two years at Stanford University, working with Jeffrey D. Ullman, a well-known professor of computer science. There, he developed a theory that Cartesian products - the "joins" that result from combining items that have no common attributes - are unnecessary for certain types of search. That's significant because obtaining Cartesian products in searches of databases is expensive and time-consuming. The idea has appeal for users who want to track down relationships among more than three classes of items. Morishita's theory shows that a Cartesian-product-free approach exists that costs less than the optimal approach involving Cartesian products. The Journal of the ACM recently accepted a paper on this work. "I liked the work. It was intellectually satisfying," Morishita recalls. "I hope somebody will implement the algorithm."
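To make the cost of a Cartesian product concrete, here is a minimal sketch in Python. It is not Morishita's algorithm, and the relations R, S and T and the `join` helper are hypothetical toy constructs invented for illustration. It only shows that joining two relations with no common attribute pairs every row with every row, while joining on a shared attribute first keeps the intermediate result small and yields the same final answer.

```python
def join(left, right):
    """Natural join of two lists of dict rows on their shared keys.

    With no shared keys, every left row pairs with every right row,
    which is exactly a Cartesian product.
    """
    out = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[k] == r[k] for k in shared):
                out.append({**l, **r})
    return out


# Hypothetical toy relations: R(a, b), S(b, c), T(c, d)
R = [{"a": i, "b": i} for i in range(4)]
S = [{"b": i, "c": i} for i in range(4)]
T = [{"c": i, "d": i} for i in range(4)]

cartesian_first = join(R, T)   # no shared attribute: 4 * 4 intermediate rows
joined_first = join(R, S)      # shared attribute b: 4 intermediate rows

result_a = join(cartesian_first, S)  # final answer via the Cartesian product
result_b = join(joined_first, T)     # same answer without one
```

Both orders produce the same four rows, but the first builds a 16-row intermediate table to get there. On real databases that gap grows with the product of the table sizes.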

While at Stanford, he also worked on methods of handling recursive processes in database management. In these, by definition, each querying step is automatically repeated until it has tracked down the required information from the database. Conventional databases cannot handle this type of querying, but Morishita developed a theory of recursive querying that led him to devise query optimization techniques for general situations. Based on that work, he developed a query optimizer - now included in free software distributed to many universities - for a deduction-based system called Glue-Nail that had been developed at Stanford.
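The notion of a query that repeats until it has found everything can be illustrated with a short Python sketch. This is not the Glue-Nail optimizer or Morishita's technique; it is a generic fixpoint evaluation of a reachability query, and the `edges` relation and the `reachable` function are hypothetical names chosen for the example. Each round joins only the facts found in the previous round with the base relation, and the loop stops when a round yields nothing new.

```python
def reachable(edges):
    """Return all (a, b) pairs such that b can be reached from a.

    Hypothetical illustration: `edges` is a set of (source, target) pairs.
    """
    closure = set(edges)
    delta = set(edges)            # facts discovered in the previous round
    while delta:
        # Join only the newly found facts with the base relation.
        new = {(a, d) for (a, b) in delta for (c, d) in edges if b == c}
        delta = new - closure     # keep only facts not seen before
        closure |= delta
    return closure
```

For example, given the edges a to b, b to c and c to d, the function also derives a to c, b to d and a to d, then stops because a further round adds no new pairs.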

Back to industry

By 1992, Morishita was ready for a change. "At Stanford," he explains, "I was working on theoretical issues. I returned to TRL to work on practical ones." Pursuing his interest in databases, he heard about Almaden researcher Rakesh Agrawal's work on data mining. He got in touch with Agrawal and "tried to figure out what I could contribute."

Agrawal's approach basically targets relationships between nonnumerical items, such as names and addresses. However, databases also contain several numerical attributes. Japanese banks, for example, typically have more than 30 numerical attributes on individual customers, such as daily account balances. So Morishita started to look at relationships among numerical attributes in databases, with the goal of complementing the power of Research's existing data mining tools.

Dealing with diabetes

In an initial project, he focused on diabetes, a disease in which glucose builds up in the blood because the body fails to produce, or respond to, the hormone insulin. Morishita and his collaborators used standard medical databases to obtain several numerical attributes from patients with diabetes, and set about devising a way to analyze the data. In collaboration with TRL researcher Takeshi Tokuyama, he came up with a general technique that they named the System for Optimized Numeric Association Rules (SONAR).

SONAR soon proved its worth. It revealed a relationship between the combination of a patient's glucose level and body mass index (the weight in kilograms divided by the square of the height in meters) and the likelihood that the patient has diabetes. Morishita and his collaborators have used this relationship to create an algorithm that makes a classification tree for diabetes. The tree organizes the glucose data into branches that represent, for example, patients with and without the disease. "The benefit of this type of classification," Morishita explains, "is that the tree size can be very small."
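The article does not describe how SONAR works internally, so the following Python sketch is only a rough illustration of the kind of question a numeric association rule answers. `body_mass_index` follows the definition given above. `best_range` is a hypothetical brute-force search, not SONAR's algorithm, that looks for the range of a numeric attribute in which a condition is most frequent, provided the range covers at least a chosen share of the records.

```python
def body_mass_index(weight_kg, height_m):
    """BMI as defined in the text: kilograms divided by meters squared."""
    return weight_kg / height_m ** 2


def best_range(records, min_support):
    """Brute-force search for the value range with the highest confidence.

    `records` is a list of (value, has_condition) pairs. Support is the
    fraction of records inside the range; confidence is the fraction of
    those records where has_condition is True. Illustrative only.
    """
    values = sorted({v for v, _ in records})
    best = None
    for i, lo in enumerate(values):
        for hi in values[i:]:
            inside = [flag for v, flag in records if lo <= v <= hi]
            support = len(inside) / len(records)
            if support < min_support:
                continue          # range covers too few records
            confidence = sum(inside) / len(inside)
            if best is None or confidence > best[2]:
                best = (lo, hi, confidence)
    return best
```

The function returns a tuple of the lower bound, the upper bound and the confidence, or `None` if no range meets the support threshold. A search like this checks every pair of endpoints, which is workable only for small data sets.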

Although Morishita has applied SONAR only to benchmark medical databases so far, he has no doubt that doctors can use it effectively as an aid for diagnosing diabetes and other ailments in real patients. So general is the approach that it has already found use in entirely different types of numerical databases. "We have applied this approach to databases in a bank - for example, characterizing people who send in late payments on their credit cards," says Morishita. "It's also been applied by insurance companies to qualify customers."

TRL is now offering SONAR as a service to clients, and is seeking joint development projects that involve the technology. Since SONAR is available only for IBM's DB2®, and not for rival database systems, its popularity could open the way for Japanese acceptance of other IBM database products. "We hope," says Morishita, "that SONAR could be the killer application for IBM database systems."



