IBM Research

Think Research


 


Featured Concept
An Immune System for Cyberspace
COVER STORY: Computer Viruses meet their Match

By Gary Taubes

Research is creating an antivirus system for the entire Internet. In addition to detecting viruses, it will disinfect and immunize computers throughout the world.


In Brief:

IBM Research's massively distributed systems group is creating a computer immune system for cyberspace. Client machines running the group's software will be able to detect the presence of a new virus and send a sample over the Internet back to the antivirus headquarters. There, computers will dissect it, analyze it, and identify the means for completely removing it from the infected computer. The system will then communicate the method for identifying and removing the virus to computers worldwide - in effect, immunizing them within minutes of the initial appearance of the virus.


The first virus Steve White encountered was the CHRISTMA worm, which hit global computer networks, and eventually IBM, in late 1987. The program, which would always come from a friend of the recipient, traveled by e-mail to the individual's in-basket. An attached message said that the file was worth executing, implying that it was going to display a Christmas tree and a holiday greeting. If the recipient did execute it, it e-mailed a copy of itself to everyone in the recipient's address book (see Avoiding Accidental Afflictions).

The CHRISTMA worm was primitive, harmless and relatively simple to eradicate. But to White, who was involved in computer security, and to some of his colleagues at IBM's Thomas J. Watson Research Center, it was a pivotal event. "I remember the morning it happened," he says. "I said: 'I'm not getting any work done today; I'm going to watch this. This is going to be an important event. It's going to change the world.'"

By the time the notorious Internet worm struck the following year, infecting thousands of machines worldwide, White and his colleagues were already discussing how to combat this new threat to computer security. Today, White is the senior manager of IBM's massively distributed systems group at Watson, which produces the company's remarkable IBM AntiVirus line of products. By 1998, this will include a networkwide system for tracking down and eradicating viruses over the entire Internet. White and his colleagues call it an "immune system for cyberspace."

The system will be capable of recognizing the presence of a new virus, using a sophisticated neural network program designed by Watson researchers Jeffrey Kephart, Gregory Sorkin and Gerald Tesauro. It will then capture a sample, and send it off to secure computers at IBM. Those computers will promptly dissect the virus, decipher it and figure out how best to eradicate it. Within minutes, that information will be returned to the infected computer, whose IBM AntiVirus software will use it to remove the virus from its system and return the computer to its pristine pre-infected condition.

Moreover, that same information will be made available simultaneously to computers all over the world. "Our immune system for cyberspace will inoculate the whole world at once," says White. "If a virus is found in the middle of the night in Belgium, the entire world will be vaccinated against it within minutes. We will get that cure out faster than the virus can spread."

A coming plague

In the next few years, however, both the problem and the cure are likely to change, says Jeffrey Kephart, manager of antivirus science and technology at Watson. With the entire world networked, viruses can potentially spread faster than humans can keep up. The number of viruses may well increase also. Today, antivirus researchers know of a mere 10,000 different viruses, appearing at a rate of six or so a day. But, says White, "there's no particular reason why six is a magic number. It could be 60 or 600. If that happens, the current manual methods of the antivirus industry break down."

Cyberspace will no longer be able to afford the luxury of human intervention to disinfect new viruses. "People can't do it," Kephart says. "Cyberspace won't survive without quick, automatic defenses against new viruses, like the immune system."

The antivirus group at Watson began officially in 1989 with two employees. Since then it has grown to 26, who produce the IBM AntiVirus line of products for six different operating systems from DOS and Windows 3.1® to OS/2, Windows 95, Windows NT and Novell Netware®. The entire line is updated with new code and new features every three months, because, as White says, "new viruses come out all the time." The team is developing the immune system technology for projected release by 1998. "At that point," says White, "it will be fully deployed, finding and killing viruses all over the world."

Viruses under wraps

The core of the antivirus system is a method with a twin goal. Not only does it detect and identify the presence of a virus; it also figures out how the virus attaches itself to its host, and how to get rid of it and restore the host computer to its original condition.

To determine how best to do that, the Watson researchers collect viruses from customers and other antivirus groups, which they analyze in a specially designed safe room that boasts a motion detector, a badge-locked door, padlocked cabinets, and no networked connections to the outside world. "Inside that badge-locked door is where we keep the viruses themselves," says Dave Chess, a research staff member, "and that's where we do the live virus work. At any given time, we always have some viruses in the lab that we haven't analyzed yet. The lab is extremely secure. We want to make sure that nobody wanders in - by accident or on purpose - and takes a disk out."

The system itself will eventually be linked between client computers running IBM AntiVirus and the safe room, where the viruses will be dissected and analyzed. As Kephart explains it, the IBM AntiVirus software in the client computer will detect an unknown virus, capture a sample, and send it over the Internet to IBM. The virus analysis machine that receives the virus will proceed to step one, which the Watson researchers call the "triage program."

The analysis then enters phase two: the "autoverv" process, a term whose origin, according to Kephart, is "distressingly obscure." This technology looks at samples of the virus created by the triage system, and performs a sophisticated pattern-matching analysis, invented by researcher Gregory Sorkin, to determine the virus's structure and the way in which it attaches to the host program. From this it derives one prescription for verifying the exact identity of the virus, and another for removing the virus from any given host program.

The patterns of infection

Each virus has its own infection pattern, says Sorkin. A virus typically removes up to 20 bytes from the beginning of a program and sequesters them away in its own code, sometimes even encrypting them. Why such a complex procedure? The virus has to put its own code in the beginning so that it will be executed, explains Sorkin, but it has to save the host code so that, afterward, it can restore that code and enable the infected machine to appear to function normally.

The autoverv program looks at samples of the virus and uninfected samples of the program itself, and determines which parts of the infected program contain pieces of the virus or have been altered by it. "Once the program learns where those infected areas are, it can replace them," he says. "It can surgically remove the virus."
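The comparison of infected and uninfected samples can be illustrated with a minimal Python sketch. It is not the autoverv algorithm, whose internals the article does not describe. The toy infection layout (a placeholder marker written over the first few bytes, with the virus body and the saved host bytes appended at the end) and the function names `toy_infect`, `altered_regions` and `repair` are all assumptions made for illustration.

```python
def toy_infect(host: bytes, virus_body: bytes, header_len: int = 4) -> bytes:
    """Hypothetical infection: overwrite the first header_len bytes of the
    host with a marker, then append the virus body and the saved host bytes."""
    saved = host[:header_len]
    marker = b"\xE9" + b"\xFF" * (header_len - 1)  # placeholder "jump" bytes
    return marker + host[header_len:] + virus_body + saved


def altered_regions(clean: bytes, infected: bytes) -> list[tuple[int, int]]:
    """Return half-open (start, end) ranges where the infected file differs
    from the clean file or extends beyond its end."""
    regions = []
    start = None
    for i in range(len(infected)):
        differs = i >= len(clean) or infected[i] != clean[i]
        if differs and start is None:
            start = i
        elif not differs and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(infected)))
    return regions


def repair(infected: bytes, original_len: int, header_len: int = 4) -> bytes:
    """Undo the toy infection: put the saved bytes back at the front and
    drop everything the virus appended."""
    saved = infected[-header_len:]
    return saved + infected[header_len:original_len]
```

In this toy layout the altered regions are the overwritten header and the appended tail. A real analysis would have to infer the layout from several samples, since each virus has its own infection pattern.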

The analysis system also identifies an unambiguous signature for the virus itself, according to Kephart - one that can be used to identify it in the future. This consists of a series of bits that will appear in all instances of the virus but are extremely unlikely to show up in any healthy code. When that's done, the immune system computers send back to the original infected client computer the various pieces of information needed to locate, verify and remove the virus. If the virus appears again, the same machine will recognize it and, armed with the necessary "antibodies," will disinfect it.
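One simplified way to picture signature selection is the Python sketch below. It keeps byte strings that occur in every infected sample and in none of a set of known-clean files. Using absence from a clean corpus as a stand-in for "extremely unlikely to show up in healthy code" is an assumption, as are the function name `candidate_signatures` and the default signature length of 8 bytes. The article does not say how IBM's system estimates that likelihood.

```python
def candidate_signatures(samples: list[bytes],
                         clean_files: list[bytes],
                         length: int = 8) -> list[bytes]:
    """Return byte strings of the given length that occur in every infected
    sample and in none of the clean files (a crude proxy for 'unlikely in
    healthy code')."""
    if not samples:
        return []
    first = samples[0]
    # Every window of the requested length in the first sample.
    windows = {first[i:i + length] for i in range(len(first) - length + 1)}
    # Keep only windows that also occur in all the other samples.
    common = {w for w in windows if all(w in s for s in samples[1:])}
    # Discard anything that also shows up in known-clean code.
    return sorted(w for w in common if not any(w in c for c in clean_files))
```

A scanner could then flag any file containing one of the returned strings. With only a small clean corpus this check is much weaker than a statistical estimate of false-positive risk.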

Just as the human immune system makes the entire body immune to a virus that enters through a small scratch, the IBM AntiVirus system will make computers all over the world immune to a new virus no matter where it is found initially. Once an unknown virus is identified in a single machine, IBM's immune system will analyze the virus and send the information needed to disinfect it to every client computer in the world.

Targeting tougher viruses

Over the next year and a half, the IBM AntiVirus team will refine its immune system. It will also address a new species of virus that has, says Kephart, recently appeared on the Internet "with a vengeance." Previous viruses always embedded themselves in conventional programs, to ensure that they were executed. The new viruses can camp out in macros - tiny programs embedded in files such as spreadsheets or even word processor documents, where they execute simple commands. What makes these "macro viruses" so pernicious is that they can spread whenever documents are exchanged on the net, which happens very frequently.

For instance, the most highly reported virus in the world today is the Concept virus, which infects Microsoft Word® documents. "It can travel whenever you exchange documents using MS Word," says John Morar, manager of the antivirus technology and systems group at Watson. "If an infected document is distributed widely, it could infect all the machines it is run on."

The IBM researchers first heard about the new virus simultaneously from clients and a consortium of antivirus companies that share information. They quickly dissected, analyzed and learned how to disinfect the particular virus. Since then they have been working to ensure that IBM's antivirus system handles any new macro viruses - or even entirely different ones - that come along. Says Chess: "The most challenging part is predicting the future - figuring out what will happen next, what is the next fundamental shift in the virus problem and how to be prepared for that" (see "Targeting Unknown Viruses").

Helping hands from humans

For this reason, even the fully deployed IBM immune system will not be devoid of human assistance. Nor will it put the IBM antivirus researchers out of work. As new viruses emerge, the team will always have to shape the system to respond to customers' fresh concerns or fears. Moreover, says Kephart, "there will always be viruses that can elude a given antivirus system. There's mathematical proof that you'll never have a system that can handle any virus that comes along." In that sense, he says, the need for humans in the system still fits the biological analogy of an immune system.

"Our immune systems have protected us for millions of years," he continues. "They're not perfect; otherwise, we wouldn't need the medical profession. We can continually improve our technology, so the kinds of viruses we can detect and deal with will grow all the time. But the Internet changes and the computing environment changes. When the environment changes - even in the biological world - it affects the whole ecology, and different kinds of computer viruses will flourish. So humans will always be needed to keep up with that."


Gary Taubes is a freelance writer based in Boston. His latest book is entitled Bad Science: The Short Life and Weird Times of Cold Fusion.



More Information:

Avoiding Accidental Afflictions

Swat Team Versus Viruses

Targeting Unknown Viruses



Avoiding Accidental Afflictions

Computer viruses may be the most malicious agents that wreak havoc in the Internet, but they won't be the only ones. Computer networks will also be vulnerable to an entirely new kind of threat, say IBM researchers in the antivirus science and technology group at the Thomas J. Watson Research Center.

Consider "maelstroms," to use the term coined by Watson researcher Jeffrey Kephart. A condition that can occur whenever computers are programmed to forward e-mail automatically, it is one of many surprising things that can happen when computers handle information and send it on without direct human supervision.

One computer, for instance, might automatically send e-mail on to a distributed mailing list. If any of the receiving computers is programmed to do the same, e-mail can begin to circulate endlessly. The result: exponentially increasing vortices of e-mail. "You very quickly get billions of mail messages," says Kephart's colleague Jim Hanson. "The network gets completely clogged with mail no one wants. You get gridlock and traffic jams, and nothing works."
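The exponential growth described above can be seen in a small Python model. It assumes, purely for illustration, that every auto-forwarding machine re-sends each message it receives to a list of `list_size` recipients, and that `forwarders_on_list` of those recipients auto-forward in turn. The function name and both parameters are hypothetical and not taken from the researchers' work.

```python
def maelstrom_growth(list_size: int, forwarders_on_list: int,
                     rounds: int) -> list[int]:
    """Count messages delivered in each forwarding round, starting from a
    single message that reaches one auto-forwarding machine."""
    delivered = []
    active = 1  # messages currently sitting at auto-forwarding machines
    for _ in range(rounds):
        sent = active * list_size          # each one is re-sent to the list
        delivered.append(sent)
        active *= forwarders_on_list       # only these copies get forwarded again
    return delivered
```

With two auto-forwarders on each list the traffic doubles every round. With exactly one it circulates endlessly at a constant rate, and with none it stops after the first round.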

Kephart, Hanson and Raja Das are learning how to prevent the formation of maelstroms, as part of a broad study of emergent phenomena in computer networks. In such phenomena, the collective behaviors of a myriad of agents interact and reinforce to produce a potentially monumental effect. They arise in everything from economies and biological systems - consciousness in the human brain, for instance - to, for good or ill, computer networks.

The key to such emergent phenomena is the autonomy of the agents doing the forwarding. Small mail-loops have already been discovered among a handful of computers, but human operators have been able to close them down quickly. But in the future, warns Kephart, most users will have intelligent autonomous agents reading and forwarding their mail. So the number and intensity of emergent phenomena will be all that much worse. "As soon as everybody is hooked up to the net and everybody has agents working on their behalf," says Hanson, "things like this will be showing up every day. We're hoping to get a handle on that before it happens."


Swat Team Versus Viruses

On the evening of Tuesday, May 14, 1996, the staff of a bank holding company in Cleveland saw the first signs that the company's computer system was infected by viruses: When they tried to log onto the system they were denied access. At 2 a.m. on Wednesday, the company called in IBM, which immediately sent staffers to the site and organized support from an antivirus "swat team."

Having diagnosed a massive virus infection, the on-site team sent details to a group at the Thomas J. Watson Research Center headed by Dave Chess. The researchers rapidly analyzed the virus - a variant of an established virus that they had not encountered before - and came up with a cure, which they sent to the on-site team. Working round the clock, the team installed new software on the 300 servers that accommodated the bank's 10,000 workstations. By Friday morning, the system was clean, and the bank's operations had returned to normal.


Targeting Unknown Viruses

The key to a successful immune system is recognizing not just the viruses you know but also those you don't. That may seem simple, but an unfamiliar virus will fool most existing antivirus programs. "They have a signature file that recognizes a few thousand known viruses," explains Gerald Tesauro, a research scientist at IBM's Thomas J. Watson Research Center. "But this completely falls apart when confronted with a new virus that's not in the signature database."

The IBM AntiVirus system has a solution to this problem: a neural network program created by Watson researchers Gregory Sorkin, Jeffrey Kephart and Tesauro that can detect viruses no one has ever seen before.

The neural network is a genre of artificial intelligence - a computational model that can be trained in the otherwise human art of pattern recognition. The program - "loosely inspired by what we believe to be going on in the human brain," in Tesauro's words - models an interconnected network of neurons and synapses. While the network can be taught by adjusting the strengths of its synapses, Sorkin says, "it is only as good as its input data. To figure out what that should be, we looked to a prototype built by a human, advisory programmer William Arnold."

The training site for the antiviral neural network is a database with a few hundred samples of computer boot sectors - the first 512 bytes of a floppy disk or hard drive, in which viruses are most likely to be found. Based on Arnold's work, the boot sectors are represented by features, each of which is a sequence of three bytes, known as a trigram. "The hardest part is figuring out which trigrams to use," says Sorkin. "So we start with all of them and whittle them down to a set of 50 that 'cover' all the viruses."

The neural network looks at these strings of 50 zeros and ones and learns to recognize patterns that signify the presence or absence of a virus. After learning to recognize known viruses with nearly 100 percent accuracy, the program is unleashed on unknown viruses.
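The feature construction can be sketched in Python as follows. The sketch covers only the step of turning a boot sector into a string of zeros and ones, not the neural network itself. The greedy selection of a covering set, the exclusion of trigrams that appear in clean sectors, and the function names `trigrams`, `greedy_cover` and `feature_vector` are assumptions. The article says only that the researchers whittle the trigrams down to a set of 50 that cover all the viruses.

```python
def trigrams(sector: bytes) -> set[bytes]:
    """All three-byte sequences that occur in a boot sector."""
    return {sector[i:i + 3] for i in range(len(sector) - 2)}


def greedy_cover(viral: list[bytes], clean: list[bytes],
                 max_features: int = 50) -> list[bytes]:
    """Greedily pick trigrams until every viral sample contains at least one
    chosen trigram, or the feature budget runs out (assumed heuristic)."""
    viral_sets = [trigrams(v) for v in viral]
    clean_grams = set().union(*(trigrams(c) for c in clean))
    # Assumption: trigrams seen in clean sectors are not useful features.
    candidates = set().union(*viral_sets) - clean_grams
    uncovered = set(range(len(viral)))
    chosen = []
    while uncovered and candidates and len(chosen) < max_features:
        # Sorting makes ties resolve the same way on every run.
        best = max(sorted(candidates),
                   key=lambda g: sum(1 for i in uncovered if g in viral_sets[i]))
        gained = {i for i in uncovered if best in viral_sets[i]}
        if not gained:
            break
        chosen.append(best)
        candidates.discard(best)
        uncovered -= gained
    return chosen


def feature_vector(sector: bytes, features: list[bytes]) -> list[int]:
    """One bit per chosen trigram: 1 if the sector contains it, else 0."""
    grams = trigrams(sector)
    return [1 if f in grams else 0 for f in features]
```

The resulting vectors are what a classifier would be trained on. Any model that takes binary inputs could stand in for the neural network in an experiment.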

Since its release in 1995, says Kephart, the neural network has caught 75 percent of the unknown viruses that have appeared in boot sectors. With a few slight modifications, he and his colleagues hope to improve that to nearly 100 percent. Meanwhile, Kephart, Sorkin and visiting student Alex Morin are training another program to recognize viruses that infect files rather than boot sectors. "Because there are thousands of different file-infecting viruses, and they are more complex than boot viruses, the problem is much harder," Sorkin notes, "but we're getting great preliminary results." Stay tuned.

For more information see http://www.av.ibm.com/current/Front Page



