
Computational Biology: IT Meets the Microscope
When the Human Genome Project is completed, researchers will have amassed a catalog of some 50,000 to 100,000 human genes. All told, they will have sequenced many billions of base pairs of DNA molecules, the basic "bits" of genetic information. On its own, this accomplishment is undoubtedly significant. But if researchers were to stop here without trying to figure out what all this genetic data means, they would be squandering an unprecedented opportunity to gain insight into human diseases.
Finding Relevance
Analyzing and managing this genetic data is a task more akin to computer science and information management than biology, especially given the amount of data being collected. Some life sciences organizations are amassing data at the rate of close to a terabyte a week, or the equivalent of 50 million pages of information!
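To put the terabyte-a-week figure in perspective, the arithmetic can be sketched in a few lines. This assumes a decimal terabyte (10^12 bytes), which the text does not specify; the variable names are illustrative only.

```python
# Rough arithmetic behind "a terabyte a week, or 50 million pages".
BYTES_PER_TERABYTE = 10**12        # assumption: decimal terabyte
PAGES_PER_TERABYTE = 50_000_000    # figure quoted in the text

# Implied size of one "page" of information, in bytes.
bytes_per_page = BYTES_PER_TERABYTE / PAGES_PER_TERABYTE

# Pages arriving per day if a terabyte lands every week.
pages_per_day = PAGES_PER_TERABYTE / 7

print(f"{bytes_per_page:,.0f} bytes per page")
print(f"{pages_per_day:,.0f} pages per day")
```

Under that assumption, the quoted equivalence works out to about 20 kilobytes per page and roughly 7 million pages of new data every day.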
"Information Technology (IT) has necessarily become the language of Biology," says Paul Horn, director of IBM Research, "in the same way that math became the language of physicists at the end of the last century."
Since cracking the genetic code will be the key to the search for new drugs and cures for human ills, helping researchers in the life sciences manage their newfound wealth of genetic data constitutes an important new market for information technology. IBM will be able to apply its considerable computing expertise to taking vast amounts of genomic data, putting it into databases and coming up with ways of teasing out telltale patterns.
The Human Genome Project, for instance, is giving drug developers and medical researchers reams of data on the structure and function of proteins. But they need help in the form of advanced algorithms and software that can extract important information linking the function and structure of protein families to their sequence. Researchers can then use this information to systematically tag and classify new sequences of unknown function.
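As a rough illustration of what tagging a new sequence might look like, the sketch below scans an amino-acid sequence for short signature patterns and reports every family whose pattern appears. It is not IBM's algorithm. The function name, family names and patterns are assumptions chosen for illustration; the first pattern is modeled on the widely cited P-loop (Walker A) motif and the second is a simplified zinc-finger-style spacing of cysteines and histidines.

```python
import re

# Illustrative family signatures written as regular expressions over
# one-letter amino-acid codes. These are assumptions for the example,
# not curated database entries.
FAMILY_PATTERNS = {
    "p_loop_ntpase": re.compile(r"[AG].{4}GK[ST]"),
    "c2h2_zinc_finger": re.compile(r"C.{2,4}C.{12}H.{3,5}H"),
}

def tag_sequence(sequence):
    """Return the sorted names of all families whose pattern occurs in the sequence."""
    seq = sequence.upper()
    return sorted(
        name for name, pattern in FAMILY_PATTERNS.items()
        if pattern.search(seq)
    )

# Example sequences (short fragments used only to exercise the patterns).
print(tag_sequence("MTEYKLVVVGAGGVGKSALTIQ"))    # contains a P-loop-like stretch
print(tag_sequence("YKCPECGKSFSQKSNLQKHQRTH"))   # contains a zinc-finger-like stretch
print(tag_sequence("MKLLVV"))                    # matches nothing
```

Real classification systems rely on far richer models than fixed patterns, but the same idea applies: once a pattern has been linked to a family of known function, any new sequence that contains it can be tagged as a candidate member.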
Recent advances in silicon gene chip technology are giving biologists new data germane to a crucial question: what environmental factors are responsible for turning genes "on" and "off"? The answers may lie in advanced pattern discovery algorithms that IBM is developing with academic partners. These algorithms analyze gene-chip data and look for associations between genes and diseases. So far, preliminary research has yielded promising results on cell lines for cancers such as melanoma, lung cancer and colon cancer. The ultimate goal is to develop low-cost chips that doctors can use to diagnose specific diseases.
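One simple way to look for an association between a gene and a disease in gene-chip data is to compare the gene's expression level in diseased and healthy samples. The sketch below ranks genes by a signal-to-noise score (difference of group means divided by the sum of group standard deviations). This is a much simpler stand-in for the pattern discovery algorithms described above, not a description of them; the function name, gene names, labels and expression values are all made up for illustration.

```python
from statistics import mean, stdev

def rank_genes(expression, labels):
    """Rank genes by the absolute signal-to-noise score between sample groups.

    expression: dict mapping gene name -> list of expression values, one per sample
    labels:     list of "disease" / "healthy" labels, one per sample
                (each group needs at least two samples for stdev)
    """
    scores = {}
    for gene, values in expression.items():
        disease = [v for v, label in zip(values, labels) if label == "disease"]
        healthy = [v for v, label in zip(values, labels) if label == "healthy"]
        spread = stdev(disease) + stdev(healthy)
        # A gene with no variation in either group gets a neutral score.
        scores[gene] = (mean(disease) - mean(healthy)) / spread if spread else 0.0
    # Strongest associations (in either direction) come first.
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)

# Hypothetical readings: three diseased samples followed by three healthy ones.
labels = ["disease"] * 3 + ["healthy"] * 3
expression = {
    "gene_a": [9.0, 8.5, 9.5, 2.0, 2.5, 1.5],   # switched "on" in disease
    "gene_b": [5.0, 5.2, 4.8, 5.1, 4.9, 5.0],   # unchanged
    "gene_c": [1.0, 1.5, 0.5, 6.0, 6.5, 5.5],   # switched "off" in disease
}
ranking = rank_genes(expression, labels)
print(ranking)
```

Genes near the top of the ranking are the ones whose activity differs most consistently between the two groups, which makes them candidates for a diagnostic chip; genes near the bottom carry little information about the disease.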
Blue Gene
Researchers will need more than ordinary computational firepower to help solve some of the most profound biological problems. In late 1999, IBM announced an exploratory project to build Blue Gene, a supercomputer 1,000 times more powerful than the Deep Blue machine that beat world chess champion Garry Kasparov. It will be about 2 million times more powerful than today's desktop PCs and powerful enough to help scientists gain a fundamental understanding of how proteins fold.
IBM's worldwide research team is dedicating 50 scientists from eight countries to work on the project. To answer these and other complex questions posed by nature, experts from many disciplines -- biology, chemistry, materials science, computer science, electrical engineering -- will use technology to push the limits of health research.