
Power of proteins
By Sam Howard-Spink
IBM's Blue Gene (TM) uses computer science to solve one of the grand challenges of molecular biology—understanding how proteins fold. IBM is 20 months into a long-term effort to build an unprecedented computational tool: a massively parallel computer—one that uses many thousands of chips working in harmony—that will be able to simulate protein folding faster than any machine before it.
This multifaceted project encompasses disciplines ranging from engineering to biology and chemistry. An IBM Research group of about 25 people works in the interdisciplinary field of computational biology. Teams examining areas such as bioinformatics look for information in sequences of DNA and protein data and use mathematical algorithms to establish their functions. Others focus on structural predictions, analyzing the 3D shape of proteins. On the computational and architectural side, engineers improve the hardware used for protein simulations and develop applications for its use. In order to bridge gaps in expertise and capitalize on the diverse perspectives of the academic community, last year IBM launched the Blue Gene Speaker Series, a forum in which acclaimed scientists shared their knowledge of protein folding with members of the IBM Research life sciences team.
Nature's mechanics
So why is protein folding so important that such substantial computational and intellectual resources should be dedicated to it? Vijay Pande, a Stanford University assistant professor of chemistry, structural biology and SSRL (Stanford Synchrotron Radiation Laboratory), is collaborating closely with a team led by Bill Swope at IBM Research in Almaden, Calif. He describes proteins as nature's nanomachines. "Just about anything that needs to get done in biology is done by a protein," explains Pande. "When you have a machine on this tiny scale, how is it built? When you are dealing with something on an atomic scale, you don't have atomic-sized hammers and screwdrivers. What biology has done is create machines that can assemble themselves. The process of self-assembly they go through is called folding."
The body contains over 100,000 of these protein nanomachines. Dennis Newns, a researcher on the life sciences team, has worked on simulating molecular activity at IBM for eight years. He describes these "machines" as a set of beads (actually amino acids) on a string. Small proteins have chains of perhaps 100 beads, while large ones have roughly 700. "The sequence of the beads is what comes out of the genome, which is a code that enables you to place those beads on a string in the correct order," he explains. "In the form of a chain the protein is of no use; it has to coil up into a structure before it becomes functional."
If the protein does not fold correctly, a variety of problems can ensue. A group of around 20 diseases, including mad cow, cystic fibrosis, Alzheimer's and Parkinson's, are believed to be related to protein mis-folding. Just one amino acid "mistake" in hemoglobin, which carries oxygen through the blood, will result in sickle cell anemia. Scientists believe that by understanding how proteins fold, they will ultimately learn how to manipulate this folding process, thereby preventing such diseases.
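The "beads on a string" picture can be made concrete with a small sketch. The Python below is illustrative only: it uses a toy eight-codon DNA fragment (an assumption chosen for this example, not data from the Blue Gene project) and a partial table drawn from the standard genetic code. It shows how changing a single DNA letter swaps one bead, glutamic acid, for another, valine, which is the kind of substitution associated with sickle cell anemia.

```python
# Partial codon table: only the codons used below (standard genetic code).
CODON_TO_AMINO_ACID = {
    "ATG": "Met", "GTG": "Val", "CAT": "His", "CTG": "Leu",
    "ACT": "Thr", "CCT": "Pro", "GAG": "Glu", "AAG": "Lys",
}


def translate(dna: str) -> list[str]:
    """Read DNA three letters at a time and return the chain of 'beads'."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TO_AMINO_ACID[codon] for codon in codons]


# Toy fragment (hypothetical, for illustration): eight codons.
normal = "GTGCATCTGACTCCTGAGGAGAAG"

# Change one letter in the sixth codon: GAG becomes GTG.
mutant = normal[:16] + "T" + normal[17:]

normal_chain = translate(normal)
mutant_chain = translate(mutant)

# List the positions where the two chains differ.
changed = [
    (position, before, after)
    for position, (before, after) in enumerate(zip(normal_chain, mutant_chain), start=1)
    if before != after
]

print("-".join(normal_chain))
print("-".join(mutant_chain))
print(changed)
```

Only one bead in the chain changes, yet as the article notes, a single substitution of this kind is enough to cause disease.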
Conducting experiments with proteins is extremely tricky. Direct experiments, those experiments performed on actual proteins, can only reveal so much, such as the rate of folding or the effect of substituting one bead in the chain. But they can't reveal much about how a protein folds in different environments, or about the function of time on the atomic scale. Simulating the folding is a far more controllable way of examining what proteins do. The drawback is how computationally demanding it is. "To simulate the very smallest, fastest protein folding right now on a fast workstation would probably take about 30 years," says Pande. "So that's where developing a faster machine comes in."
The Blue Gene project began in 1999, when IBM was looking for a "blue sky" computer science research project: a concerted effort to solve one of science's great challenges. Around the same time, the 30-gigaflop Blue Gene chip was developed. Newns says, "We figured that with 30,000 chips we could fold a smallish protein in a few months in real time. This would be a grand challenge project, something the public would appreciate."
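The scale implied by those two figures is easy to work out. The sketch below multiplies the per-chip rate by the chip count quoted above; it assumes perfect scaling across chips, which is an idealization for illustration rather than a claim about the machine's real performance.

```python
# Figures quoted in the article.
CHIP_GIGAFLOPS = 30    # rate of one Blue Gene chip, in gigaflops
NUM_CHIPS = 30_000     # chip count Newns mentions


def aggregate_flops(chip_gigaflops: float, num_chips: int) -> float:
    """Peak combined rate in flops, assuming ideal (perfect) scaling."""
    return chip_gigaflops * 1e9 * num_chips


total = aggregate_flops(CHIP_GIGAFLOPS, NUM_CHIPS)
print(f"{total / 1e12:.0f} teraflops")   # 900 teraflops
print(f"{total / 1e15:.1f} petaflops")   # 0.9 petaflops
```

In other words, 30,000 such chips would add up to a peak of roughly nine-tenths of a petaflop.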
Protein perspectives
As part of the Blue Gene Speaker Series, prominent molecular biologists from around the United States have addressed members of the IBM life sciences research team at Watson Research Center in New York. According to Newns, "They really are the top people in the field, and there are more to come."
The experts have brought a range of viewpoints and specialties to the project. Two speakers, Dr. William Eaton of the National Institutes of Health (NIH) and Dr. Terry Oas of Duke University Medical Center, gave talks on experimental developments in protein folding, divulging, for example, that it takes roughly a tenth of a millisecond to about one second for a protein to fold. Researchers in protein structures from Cornell University discussed the importance of the protein's sequence in determining its architecture. Leslie Greengard from New York University described the most efficient techniques for protein folding simulations. Stanford's Pande spoke on his own approach to the kinetics of protein folding.
"We've learned that there are lots of different opinions about what is important to study in this field," says Bob Germain, manager of the biomolecular dynamics and scalable modeling team at Watson. "For example, there is a wide range in the level of detail in the models that people use. We've had many suggestions about how to use this large computational resource that we have."
The Speaker Series has helped to establish collaborative projects between IBM and several universities. Germain cites a team of chemists at Columbia University who are working on algorithmic techniques for carrying out simulations more efficiently. The knowledge traffic travels two ways: IBM's computational biologists often find themselves called on to speak to industry and academic customers.
"It is important to collaborate with people on the outside to try to improve the simulation models we have," says Germain. "By doing large-scale simulations using various models and comparing them with experimental data, we can feed information back to people to help them build better models. All the computational power in the world is useless if the models aren't correct."
The insights gained from the Speaker Series will inspire future applications of Blue Gene. Newns says he is already applying the project's mathematical processes and algorithms in other areas of his IBM work. Improving the simulations will make computers more attractive to drug companies trying to establish which drug molecules will bind to certain proteins. Looking further ahead, the development of incredibly powerful computers to study atomic-scale phenomena could give rise to new technologies. Pande is hopeful that the field of nanotechnology will benefit from the work on proteins and their simulations.
Sam Howard-Spink is a British freelance journalist based in New York specializing in Internet technologies and the music industry. Protein-folding is his new passion.