HealthMiner

Innovation Matters


HealthMiner mines clinical data to find new relationships that could have direct applications for conducting biomedical research, making prognoses and developing diagnostic tests.

HealthMiner, an application that analyzes patient data, was developed as part of a joint project between IBM Research and the IBM Healthcare & Life Sciences group. This innovative solution fills a void in digital patient record analysis by discovering rules and relationships that can lead to new knowledge, new hypotheses and more focused medical research. As a middleware package, it allows IBM to offer a way for our Independent Software Vendor (ISV) partners to automate their rule-authoring process.

Although HealthMiner is a very focused application, its underlying methods are general enough to make the middleware solution applicable to a wide range of problems, including risk management, economic and market analysis, quality control, epidemiology, security risk identification, and more. The strength of this hybrid solution is that it draws upon the complementary strengths of three quite different and very powerful analytic methods: Thoth, CliniMiner® and Predictive Analysis.

Thoth, developed by the Bioinformatics and Pattern Discovery group, continues the group's longstanding effort to identify recurring patterns of nearby elements within sequential inputs of arbitrary length, and to use them to build applications such as multiple sequence alignment, gene finding and protein annotation. Thoth discovers patterns and then uses their relationships to deduce increasingly complicated association rules. This, in turn, permits the construction of extensive diagnostic canons.
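The step from recurring patterns to association rules can be illustrated with a generic support/confidence calculation. This is a minimal sketch of a textbook technique, not Thoth's algorithm, which is not described here; the function name, thresholds and record contents are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(records, min_support=2, min_confidence=0.6):
    """Derive simple 'lhs -> rhs' rules from items that co-occur in records.

    support    = number of records containing both items
    confidence = support / number of records containing lhs
    """
    item_counts = Counter()
    pair_counts = Counter()
    for record in records:
        items = sorted(set(record))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), support in pair_counts.items():
        if support < min_support:
            continue
        # Test the rule in both directions; confidence is asymmetric.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical toy records; the item names are invented for illustration.
records = [
    {"elevated_glucose", "diabetes", "hypertension"},
    {"elevated_glucose", "diabetes"},
    {"hypertension"},
    {"elevated_glucose"},
]
rules = pair_rules(records)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs} (support={support}, confidence={confidence:.2f})")
```

In this toy data only the pair of "diabetes" and "elevated_glucose" occurs often enough to yield rules, one in each direction with different confidence.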

[Figure: flow chart]


CliniMiner® (a.k.a. "FANO") is based on the Bayesian expectation of information and its intrinsic moments, and enables the rapid computation of degrees of mutual dependency between variables by using relationships with the Riemann Zeta function, and other number-theoretic methods. Significant relationships between variables (including preferred, rare, negative or non-observed associations) can be identified by selecting those conditions that satisfy Zeta-function-based criteria.
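As a rough illustration of a zeta-function-based association measure, the sketch below assumes a partial-sum (incomplete) zeta function at s = 1. It scores a combination of variables as the difference between that sum taken at the observed co-occurrence count and at the count expected under independence. This form is an assumption made for illustration, not CliniMiner's actual implementation; the function names and the rounding of the expected count to an integer are hypothetical.

```python
from math import log


def partial_zeta(n: int, s: float = 1.0) -> float:
    """Partial sum 1 + 2**-s + ... + n**-s; returns 0.0 when n <= 0."""
    return sum(k ** -s for k in range(1, n + 1))


def association_score(observed: int, expected: float, s: float = 1.0) -> float:
    """Assumed association measure (in nats for s = 1).

    Positive when a combination is seen more often than expected,
    negative when it is seen less often, zero when the counts agree.
    The expected count is rounded to an integer for simplicity (assumption).
    """
    return partial_zeta(observed, s) - partial_zeta(round(expected), s)


# For large counts the score approaches the familiar log ratio ln(observed/expected).
print(association_score(2000, 1000.0), log(2000 / 1000))
# For small counts it stays finite, including for combinations never observed.
print(association_score(0, 3.0))
```

One appeal of a measure of this shape is that sparse data is handled gracefully: a combination that is never observed gets a finite negative score instead of a logarithm of zero.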

Predictive Analysis was developed by the Data Analytics Research group, drawing on its extensive expertise in machine learning. This analytic method learns decision rules using a procedure known as Lightweight Rule Induction (LRI), which works by searching through many candidate rules to find the best ones in terms of predictive value, sensitivity and specificity. The method generates classification rules in disjunctive normal form, i.e. rules that are a sequence of OR-ed expressions, each expression being a conjunction of one or more variables.
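To make the rule format concrete, the sketch below represents a rule in disjunctive normal form as a list of conjunctions and scores it by sensitivity, specificity and positive predictive value. It only evaluates a given rule and does not implement the LRI search itself; the variable names, records and labels are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, bool]
# A DNF rule is a list of conjunctions; each conjunction is a list of variable names.
DnfRule = List[List[str]]


def rule_fires(rule: DnfRule, record: Record) -> bool:
    """True if at least one conjunction has all of its variables set in the record."""
    return any(all(record.get(var, False) for var in conj) for conj in rule)


def evaluate(rule: DnfRule, records: List[Record], labels: List[bool]) -> Dict[str, float]:
    """Score a rule against labelled records using the standard definitions."""
    tp = fp = tn = fn = 0
    for record, label in zip(records, labels):
        predicted = rule_fires(rule, record)
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "predictive_value": tp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical rule: (fever AND cough) OR (positive_culture)
rule = [["fever", "cough"], ["positive_culture"]]
records = [
    {"fever": True, "cough": True, "positive_culture": False},
    {"fever": True, "cough": False, "positive_culture": False},
    {"fever": False, "cough": False, "positive_culture": True},
    {"fever": True, "cough": True, "positive_culture": False},
    {"fever": False, "cough": True, "positive_culture": False},
]
labels = [True, False, True, False, True]
scores = evaluate(rule, records, labels)
print(scores)
```

A rule learner of the kind described above would generate many such candidate rules and keep those with the best scores.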

In developing our prototype we processed 700,000 patient records made available to us by the University of Virginia. Preliminary analyses by our colleagues have shown that virtually 100 percent of the outputs generated by HealthMiner are biologically appropriate and well established. Interestingly, some outputs are novel, which makes them potentially very valuable for medical research.

Following the success of our first pilot project, we are extending our work to the analysis of approximately 400,000 digital records from patients with clinically diagnosed schizophrenia.

Related Publications  

Isidore Rigoutsos, Tien Huynh, Aris Floratos, Laxmi Parida and Daniel Platt. Dictionary-driven Protein Annotation. Nucleic Acids Research 30(17):3901-3916, 2002.

B. Robson, B. Mushlin and R. Mushlin. The Dragon on the Gold: Myths and Realities for Data Mining in Biotechnology using Digital and Molecular Libraries. Journal of Proteome Research, 2004.

B. Robson, B. Mushlin and R. Mushlin. Genomic Messaging System for Information-Based Personalized Medicine with Clinical and Proteome Research Applications. Journal of Proteome Research, 2004.

Barry Robson and R. Mushlin. Clinical and Pharmacogenomic Data Mining. 2. A Simple Method for the Combination of Information from Associations and Multivariances to Facilitate Analysis, Decision and Design in Clinical Research and Practice. Journal of Proteome Research, 2004.

B. Robson and J. Garnier. The Future of Highly Personalized Health Care. Stud Health Technol Inform 80:163-174, 2002.

Barry Robson. Clinical and Pharmacogenomic Data Mining. 1. The Generalized Theory of Expected Information and Application to the Development of Tools. Journal of Proteome Research 2(3):283-302, January 2003.

S. Weiss and N. Indurkhya. Lightweight rule induction. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 1135-1142.

S. Weiss, R. Galen and P. Tadepalli. Maximizing the predictive value of production rules. J. Artif. Intell. 45(1-2):47-71, 1990.

Laxmi Parida, Aris Floratos and Isidore Rigoutsos. An Approximation Algorithm for Alignment of Multiple Sequences Using Motif Discovery. Journal of Combinatorial Optimization 3(2-3):247-275, July 1999.

Isidore Rigoutsos and Aris Floratos. Combinatorial Pattern Discovery In Biological Sequences: The TEIRESIAS Algorithm. Bioinformatics 14(1), 1998.



Innovator's corner  

Daniel Platt, Researcher
What is the most exciting potential future use for the work you're doing?
The methods that we have combined make it possible to discover new relationships between diseases, diagnostic test results, treatments, prescriptions, and other patient characteristics. Some of those relationships can reveal negative or positive side effects of medications or treatments, new correlations between combinations of diagnostic test results and disease, and other information that could pave the way toward a deeper understanding of disease physiology. HealthMiner can also aid in creating new diagnostic software tools to guide physicians and facilitate their diagnoses.


What is the most interesting part of your research?
What excites me most is finding ways to use the tools we have been developing to create new and unexpected applications, for example going from pattern discovery to the discovery of actionable association rules. Then there is the immediate connection with basic physiological processes, with biochemistry and medicine; collaborating with physicians to understand the meaning and importance of the rules we've discovered; and working with people who have very diverse backgrounds and a common objective. All of this is fascinating.


What inspired you to go into this field?
The multidisciplinary character of it: we have an unprecedented opportunity to use pattern discovery to connect physics, biochemistry, medicine, computer science and other domains. We also have the opportunity to connect to IBM's bottom line through contributions we can make to HealthCare and Life Sciences.


What is your favorite invention of all time?
The mechanical clock escapement: it translates oscillatory rotational movement into rotation in one direction. The idea is echoed in a number of different technologies, including the ratchet and pawl and piston engines. Together, these simple inventions were important to technological development for millennia, and provided the foundations for the world's industrial revolution.
