HealthMiner mines clinical data to find new relationships that could have direct applications for conducting biomedical research, making prognoses and developing diagnostic tests.
HealthMiner, an application that analyzes patient data, was developed as part of a joint project between IBM Research and the IBM Healthcare & Life Sciences group. This innovative solution fills a void in digital patient record analysis by discovering rules and relationships that can lead to new knowledge, new hypotheses and more focused medical research. As a middleware package, it gives our Independent Software Vendor (ISV) partners a way to automate their rule-authoring process.
Although HealthMiner is a focused application, the methods underlying it are general enough to make the middleware solution applicable to a wide range of problems, including risk management, economic and market analysis, quality control, epidemiology and security risk identification. The strength of this hybrid solution is that it draws on the complementary strengths of three quite different and powerful analytic methods: Thoth, CliniMiner® and Predictive Analysis.
Thoth, developed by the Bioinformatics and Pattern Discovery group, continues the group's longstanding effort to identify recurring patterns of nearby elements within sequential inputs of arbitrary length and to use those patterns to build applications such as multiple sequence alignment, gene finding and protein annotation. Thoth first discovers patterns and then uses the relationships among them to deduce increasingly complex association rules. This, in turn, permits the construction of extensive diagnostic canons.
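The idea of "patterns of nearby elements" feeding "association rules" can be illustrated with a deliberately simple stand-in. The sketch below is not Thoth's algorithm (which builds on the TEIRESIAS line of combinatorial pattern discovery); it only counts pairs of distinct elements that co-occur within a window in each sequence, keeps pairs that recur in enough sequences, and turns them into rules of the form A → B with a confidence score. The function names, the window size and both thresholds are hypothetical.

```python
from collections import Counter


def frequent_window_pairs(sequences, window, min_support):
    """Hypothetical helper: count, once per sequence, unordered pairs of
    distinct elements occurring within `window` positions of each other,
    and keep the pairs seen in at least `min_support` sequences."""
    support = Counter()
    for seq in sequences:
        seen = set()
        for i, a in enumerate(seq):
            for b in seq[i + 1:i + 1 + window]:
                if a != b:
                    seen.add(tuple(sorted((a, b))))
        support.update(seen)
    return {pair: n for pair, n in support.items() if n >= min_support}


def association_rules(sequences, pairs, min_confidence):
    """Hypothetical helper: derive rules lhs -> rhs from frequent pairs.
    Confidence = (sequences with the nearby pair) / (sequences containing lhs)."""
    item_support = Counter()
    for seq in sequences:
        item_support.update(set(seq))
    rules = []
    for (a, b), n in pairs.items():
        for lhs, rhs in ((a, b), (b, a)):
            confidence = n / item_support[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, n, confidence))
    return rules


# Toy "records" as ordered lists of clinical findings (invented for illustration).
records = [
    ["fever", "cough", "xray_abn"],
    ["fever", "cough"],
    ["rash", "fever", "cough"],
    ["headache", "rash"],
]
pairs = frequent_window_pairs(records, window=1, min_support=2)
print(pairs)
print(association_rules(records, pairs, min_confidence=0.8))
```

Real pattern-discovery systems handle longer patterns with wildcards and much larger inputs; the pairwise window here only conveys the pattern-then-rule structure.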
CliniMiner® (a.k.a. "FANO") is based on the Bayesian expectation of information and its intrinsic moments. It enables rapid computation of the degree of mutual dependency between variables by exploiting relationships with the Riemann zeta function and other number-theoretic methods. Significant relationships between variables (including preferred, rare, negative or non-observed associations) can be identified by selecting those conditions that satisfy zeta-function-based criteria.
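To make the zeta-function connection concrete, here is a minimal sketch assuming the simplified form used in Robson's expected-information papers: the association between A and B is scored as ζ(1, n_AB) − ζ(1, e_AB), where ζ(1, n) = 1 + 1/2 + … + 1/n is the incomplete zeta function at s = 1 (the harmonic number) and e_AB = n_A·n_B/N is the count expected under independence. For large counts this approaches ln(observed/expected), while behaving more gracefully for small or zero counts. The helper names and the classification threshold are hypothetical and are not CliniMiner®'s actual criteria.

```python
import math

EULER_GAMMA = 0.5772156649015329


def harmonic(x: float) -> float:
    """Generalized harmonic number H(x), i.e. the incomplete zeta function at
    s = 1, for real x >= 0 (equals 1 + 1/2 + ... + 1/x for integer x).
    Uses the recurrence H(x) = H(x + 1) - 1/(x + 1) to shift x above 10,
    then an asymptotic expansion."""
    if x < 0:
        raise ValueError("x must be non-negative")
    correction = 0.0
    while x < 10.0:
        x += 1.0
        correction += 1.0 / x
    asymptotic = (math.log(x) + EULER_GAMMA + 1 / (2 * x)
                  - 1 / (12 * x ** 2) + 1 / (120 * x ** 4))
    return asymptotic - correction


def expected_information(n_ab: int, n_a: int, n_b: int, n_total: int) -> float:
    """Assumed score I(A;B) = zeta(1, observed) - zeta(1, expected), in nats."""
    e_ab = n_a * n_b / n_total
    return harmonic(n_ab) - harmonic(e_ab)


def classify(i_ab: float, threshold: float = 0.5) -> str:
    """Hypothetical decision rule: positive = co-occur more than expected,
    negative = less than expected (including never observed together)."""
    if i_ab > threshold:
        return "positive"
    if i_ab < -threshold:
        return "negative"
    return "independent"


# Invented counts: 100 patients, 30 with A, 20 with B, 15 with both (6 expected).
i_ab = expected_information(15, 30, 20, 100)
print(round(i_ab, 3), classify(i_ab))
# A pair never observed together gets a clearly negative score.
print(classify(expected_information(0, 30, 20, 100)))
```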
Predictive Analysis was developed by the Data Analytics Research group, drawing on its extensive expertise in machine learning. This analytic method learns decision rules using a procedure known as Lightweight Rule Induction (LRI), which searches through many candidate rules to find the best ones in terms of predictive value, sensitivity and specificity. The method generates classification rules in disjunctive normal form, i.e. rules that are a disjunction (OR) of terms, each term being a conjunction (AND) of one or more conditions on variables.
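The sketch below shows what a disjunctive-normal-form rule looks like as a data structure and how the three quality measures named above can be computed for it. It does not implement LRI's search, only the representation and scoring a rule-induction procedure would rely on. The tuple-based rule encoding, the feature names and the toy records are hypothetical.

```python
import operator

# Hypothetical encoding: a condition is (feature, operator, value); a term is a
# list of conditions joined by AND; a rule is a list of terms joined by OR.
OPS = {">": operator.gt, "<=": operator.le, "==": operator.eq}


def term_holds(record, term):
    """A conjunction is satisfied only if every condition holds."""
    return all(OPS[op](record[feature], value) for feature, op, value in term)


def rule_fires(record, rule):
    """A disjunction fires if any of its terms holds."""
    return any(term_holds(record, term) for term in rule)


def score(rule, records, labels):
    """Sensitivity, specificity and positive predictive value of a rule."""
    tp = fp = tn = fn = 0
    for record, actual in zip(records, labels):
        predicted = rule_fires(record, rule)
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "predictive_value": tp / (tp + fp) if tp + fp else 0.0,
    }


# (glucose > 126) OR (bmi > 30 AND age > 50), on invented patients.
rule = [[("glucose", ">", 126)], [("bmi", ">", 30), ("age", ">", 50)]]
patients = [
    {"glucose": 140, "bmi": 25, "age": 40},
    {"glucose": 100, "bmi": 32, "age": 60},
    {"glucose": 95, "bmi": 31, "age": 35},
    {"glucose": 110, "bmi": 24, "age": 55},
    {"glucose": 130, "bmi": 22, "age": 30},
]
has_condition = [True, True, False, True, False]
print(score(rule, patients, has_condition))
```

An induction procedure such as LRI would generate and compare many candidate rules of this shape, keeping those that score best on these measures.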
In developing our prototype, we processed 700,000 patient records made available to us by the University of Virginia. Preliminary analyses by our colleagues indicate that virtually all of the outputs generated by HealthMiner are biologically appropriate, and most correspond to well-established relationships. Notably, some outputs are novel, which could make them very valuable for medical research.
Following the success of our first pilot project, we are extending our work to the analysis of approximately 400,000 digital records from patients with clinically diagnosed schizophrenia.
Related Publications
I. Rigoutsos, T. Huynh, A. Floratos, L. Parida and D. Platt. Dictionary-driven Protein Annotation. Nucleic Acids Research 30(17):3901-3916, 2002.
B. Robson, B. Mushlin and R. Mushlin. The Dragon on the Gold: Myths and Realities for Data Mining in Biotechnology using Digital and Molecular Libraries. Journal of Proteome Research, 2004.
B. Robson, B. Mushlin and R. Mushlin. Genomic Messaging System for Information-Based Personalized Medicine with Clinical and Proteome Research Applications. Journal of Proteome Research, 2004.
B. Robson and R. Mushlin. Clinical and Pharmacogenomic Data Mining. 2. A Simple Method for the Combination of Information from Associations and Multivariances to Facilitate Analysis, Decision and Design in Clinical Research and Practice. Journal of Proteome Research, 2004.
B. Robson and J. Garnier. The Future of Highly Personalized Health Care. Stud Health Technol Inform 80:163-174, 2002.
B. Robson. Clinical and Pharmacogenomic Data Mining. 1. The Generalized Theory of Expected Information and Application to the Development of Tools. Journal of Proteome Research 2(3):283-302, January 2003.
S. Weiss and N. Indurkhya. Lightweight Rule Induction. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 1135-1142.
S. Weiss, R. Galen and P. Tadepalli. Maximizing the Predictive Value of Production Rules. J. Artif. Intell. 45(1-2):47-71, 1990.
L. Parida, A. Floratos and I. Rigoutsos. An Approximation Algorithm for Alignment of Multiple Sequences Using Motif Discovery. Journal of Combinatorial Optimization 3(2-3):247-275, July 1999.
I. Rigoutsos and A. Floratos. Combinatorial Pattern Discovery in Biological Sequences: The TEIRESIAS Algorithm. Bioinformatics 14(1), 1998.