IBM Research

Think Research


 


Featured Concept
Virus beater

By Victor D. Chase

Now there's an immune system for your PC.


It sounds like a virologist's dream: a completely automated system that, in less than an hour, can spot a new virus, transmit a specimen to antivirus headquarters for analysis, develop an antidote and ship it throughout the world—making epidemics a thing of the past.

Such a system has recently become reality, albeit in the world of computer viruses, not biological ones. Developed to prevent the world's virus writers from creating electronic pandemics over the Internet, the computer immune system is the brainchild of the 11-year-old antivirus science and technology unit at IBM's Thomas J. Watson Research Center.

The researchers' virus-fighting efforts recently passed a major milestone with the successful completion of a large-scale, two-month pilot test involving five major corporations—including banks, insurance companies and IBM itself—that use Norton AntiVirus® in their networks. As a result, Symantec Corporation, the maker of Norton AntiVirus software and IBM's new partner in the battle against digital disease, has decided to commercialize the immune system later this year.

When it comes to computer viruses, says Elizabeth Magliana, vice president of product management for Symantec's Enterprise Solution Division, "we need to spread the cure faster than the virus can spread, and we believe that this immune system is a giant step in that direction. The Norton AntiVirus packages that Symantec will ship in June will be immune-enabled. Some features of the system will be switched on very quickly, and some things will take a little longer to integrate." Even so, the entire system is expected to be up and running by September.

"Here's how the immune system works," Magliana explains. "If Norton AntiVirus, running on your PC, detects an unknown virus, it automatically packages the infected file and sends it securely over the Internet to our automated virus analysis center. There, it is analyzed, a cure is derived and tested, and it becomes just like any other virus that we know about and have a cure for."

The immune system currently targets two major classes of viruses—macro and file viruses—which are distinguished by the kinds of digital hosts they invade. File viruses exist in programs, such as computer games, while macro viruses strike documents, such as those created with a word processor. When a new strain of either class of virus is detected, the immune system creates an antidote that can be sent first to the original infected PC to disinfect it and then sent to all other computers that are registered with the immune system. The immune system can respond very quickly to a new virus outbreak. In the past, users of antivirus software have had to wait for other users to find and submit new virus samples, for human experts to analyze them, and for periodic scheduled updates to the software.

In the pilot, customers were set up with an advanced edition of Norton AntiVirus that connected to a test version of the immune system. When a pilot machine identified a potential virus, a sample was forwarded to an experimental virus analysis center at Watson. If the sample was found to be viral, an antidote was developed and immediately transmitted back to the afflicted machine. In most cases, the cycle was completed within 45 minutes and with no human intervention. At the same time, the new virus's identity and its antidote were added to Norton's virus definitions file, which was promptly made available to all the customer machines on the pilot system, effectively providing an inoculation. In the process, the Internet—today's primary route of virus distribution—became the medium for beating virus writers at their own game.

During the pilot period, the researchers also subjected the immune system to an acid test. "We obtained 38 new viruses sent recently to Symantec by its customers, and put the immune system through its paces on them," says John Morar, manager of antivirus technology and systems at Watson. Of the 38 new viruses, 28 were completely resolved with no human intervention. The immune system automatically forwarded the remaining 10 viruses to humans for analysis. "But even those 10 included three that had been partially analyzed by the system," notes Morar. Overall, he says, the pilot test demonstrated that the system can automatically detect and repair about 80 percent of the most common types of new, unknown viruses.

HOW TO FOIL A VIRUS

The ability to automatically discover new viruses arises from so-called "heuristic" detection algorithms, which try to recognize new viruses by their viruslike appearance or behavior. A powerful example is IBM's "neural network" detector, which has been incorporated into Norton AntiVirus. Like the mind, a neural network learns to generalize, creating concepts from examples. So instead of having to "memorize" individual virus codes, the neural network "forms a general principle, recognizing which kinds of program codes are found in viruses and which are not, giving it a generic recognition capability," explains Gerry Tesauro, who did the research behind the system's neural network. After being given samples of viruses, the neural network software learns to distinguish them from healthy programs.
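To make the idea of learning "which kinds of program codes are found in viruses" concrete, here is a minimal sketch. It is not IBM's detector: it assumes a single-layer perceptron (about the simplest neural network there is) trained on counts of three-byte sequences, and the byte strings are harmless placeholders invented for illustration.

```python
from collections import Counter

def trigrams(data: bytes) -> Counter:
    """Count every overlapping 3-byte sequence in a program image."""
    return Counter(data[i:i + 3] for i in range(len(data) - 2))

def score(data: bytes, weights: Counter, bias: float) -> float:
    """Weighted sum of the trigram counts plus a bias term."""
    return bias + sum(weights[g] * n for g, n in trigrams(data).items())

def train(samples, epochs: int = 20):
    """Perceptron training. samples holds (bytes, label) pairs,
    where label is +1 for viral and -1 for healthy."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for data, label in samples:
            if score(data, weights, bias) * label <= 0:
                # Misclassified: nudge the weights toward the right answer.
                for g, n in trigrams(data).items():
                    weights[g] += label * n
                bias += label
    return weights, bias

def looks_viral(data: bytes, weights: Counter, bias: float) -> bool:
    return score(data, weights, bias) > 0

# Placeholder "viral" fragment (an assumption, not real virus code).
MARK = b"\xde\xad\xbe\xef"
training = [
    (MARK + b"aaaa", +1),
    (b"hello world", -1),
    (b"zz" + MARK, +1),
    (b"goodbye moon", -1),
]
weights, bias = train(training)
print(looks_viral(b"qq" + MARK + b"qq", weights, bias))  # unseen, contains MARK
print(looks_viral(b"plain text file", weights, bias))    # unseen, healthy
```

Neither test input appears in the training set, so a correct answer on both reflects a learned general pattern rather than a memorized file, which is the point Tesauro makes above.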

At the same time, the antivirus researchers have taken pains to ensure that the detection capabilities created by the immune system do not result in false positives, which could cause undue alarm. This was no small challenge. "In principle, it's mathematically impossible to write a program that can accurately distinguish uninfected files from infected ones 100 percent of the time," says Steve White, who leads the antivirus unit. But researcher Jeff Kephart found that by exercising statistical brute force the scientists were able to come "amazingly close" to eliminating all false positives. This was possible, he says, because most programmers use similar tools to create healthy programs, and because there are normal sequences of code, just as there are normal sequences of words. One is much more likely to see "president" and "Clinton" together than "aardvark" and "Popsicle." When the immune system analyzes a virus and extracts a string of characteristic code, it avoids common sequences, preferring instead the rarer combinations that will not be found in uninfected files.
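The preference for rare sequences can be sketched in a few lines. This is a crude stand-in for the statistical model described above, not the actual method: it assumes that "commonness" can be approximated by counting how often each two-byte pair of a candidate string occurs in a corpus of healthy files. The function names and the sample data are hypothetical.

```python
from collections import Counter

def ngram_counts(clean_files, n: int = 2) -> Counter:
    """Tally every n-byte sequence seen in the healthy corpus."""
    counts = Counter()
    for data in clean_files:
        counts.update(data[i:i + n] for i in range(len(data) - n + 1))
    return counts

def commonness(candidate: bytes, counts: Counter, n: int = 2) -> int:
    """Sum of corpus counts over the candidate's n-grams; lower means rarer."""
    return sum(counts[candidate[i:i + n]]
               for i in range(len(candidate) - n + 1))

def pick_signature(virus_code: bytes, clean_files, length: int = 4, n: int = 2) -> bytes:
    """Return the fixed-length slice of the virus least like healthy code."""
    counts = ngram_counts(clean_files, n)
    candidates = [virus_code[i:i + length]
                  for i in range(len(virus_code) - length + 1)]
    return min(candidates, key=lambda c: commonness(c, counts, n))

# Toy corpus: ordinary phrases play the role of healthy programs.
clean = [b"the cat sat on the mat", b"the hat on the cat"]
virus = b"the qzxj the"
signature = pick_signature(virus, clean)
print(signature, commonness(signature, ngram_counts(clean)))
```

Slices such as "the " score high because their byte pairs are everywhere in the healthy corpus, so the chosen signature comes from the unusual middle of the string, the "aardvark" and "Popsicle" of the example.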

The next phase of immunization is the automatic production of "antibodies" to neutralize newly discovered viruses and to repair damaged files. Greg Sorkin, a mathematician on the team, categorized the usual ways viruses infect files and then wrote a series of algorithms that would allow a computer to prescribe the right mathematical medicine to kill the virus and repair the damaged file. Sorkin's system does this by locating common viral elements and examining infected files to determine how they have been changed. "Information gleaned in this manner provides clues as to what the virus looks like and how it corrupts files, which is essential to figuring out how to detect infected files and then repair them," Sorkin says.
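As an illustration of working out "how they have been changed," consider the simplest case. The sketch below assumes a virus that only appends a fixed block of bytes to its host and leaves everything else untouched; real viruses use many other infection styles, which Sorkin's categorization covers and this toy does not. All names and sample bytes are hypothetical.

```python
def learn_appended_body(pairs):
    """Given (original, infected) byte pairs, return the block the virus
    appended, or None if the samples do not fit the simple appender model."""
    tails = []
    for original, infected in pairs:
        if not infected.startswith(original):
            return None  # host bytes were altered: needs a different model
        tails.append(infected[len(original):])
    if len(set(tails)) != 1 or not tails[0]:
        return None      # no samples, no change, or inconsistent additions
    return tails[0]

def repair(infected: bytes, body: bytes):
    """Strip the learned viral block; return None if it is not present."""
    if body and infected.endswith(body):
        return infected[:-len(body)]
    return None

VIRAL = b"\xde\xad\xbe\xef"  # placeholder bytes, not real virus code
samples = [
    (b"GAME-ONE", b"GAME-ONE" + VIRAL),
    (b"EDITOR", b"EDITOR" + VIRAL),
]
body = learn_appended_body(samples)
print(repair(b"REPORT" + VIRAL, body))
```

Comparing several infected files against their known originals is what isolates the common viral element; once it is known, the same information serves both to recognize an infected file and to restore it.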

The final link in the immunization chain is automatic distribution of the characteristic string that defines the virus, and of the antibody that cures it. The system uses a two-way electronic street by which new viral strains are sent to the central analysis facility, and from which identification and repair information is relayed to the computers of Symantec customers around the world. In creating this last link, the researchers had to make sure the analysis system would not get bogged down if thousands of computers notified it of the same new strain. Accordingly, team member Ed Pring invented an architecture that accepts the first sample of a new virus for analysis, then uses the result of that analysis to respond quickly to other copies that are submitted.
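The article does not describe the internals of Pring's architecture, but the behavior it attributes to it, analyzing the first sample and answering later copies from that result, can be sketched as follows. The class, the `analyze` callback, and the idea of matching later copies by their extracted signature are all assumptions made for illustration.

```python
class AnalysisCenter:
    """Runs the expensive analysis once per strain, then reuses the result."""

    def __init__(self, analyze):
        self._analyze = analyze  # hypothetical: sample -> (signature, cure)
        self._known = {}         # signature -> cure
        self.analyses_run = 0

    def submit(self, sample: bytes) -> str:
        # Fast path: the sample matches a strain that was already analyzed.
        for signature, cure in self._known.items():
            if signature in sample:
                return cure
        # Slow path: first sighting, so run the full analysis and remember it.
        self.analyses_run += 1
        signature, cure = self._analyze(sample)
        self._known[signature] = cure
        return cure

def toy_analyze(sample: bytes):
    """Stand-in for the real analysis: treat the last 4 bytes as the signature."""
    signature = sample[-4:]
    return signature, "remove " + signature.hex()

center = AnalysisCenter(toy_analyze)
VIRAL = b"\xde\xad\xbe\xef"  # placeholder bytes, not real virus code
cures = {center.submit(b"host-%d-" % i + VIRAL) for i in range(1000)}
print(center.analyses_run, cures)
```

A thousand submissions from differently named hosts trigger a single analysis here. A production system would also have to cope with copies arriving while the first analysis is still running, which this single-threaded sketch ignores.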

The recent pilot test has further cemented a relationship between IBM and Symantec that began in mid-1998. With the foundation of each of the immune system's elements firmly in place, IBM and Symantec joined forces to make a splash in the then-consolidating antivirus industry. Symantec wanted access to the technology and the key patents IBM holds in the field, while IBM wanted to ensure that the immune system would be broadly available and that IBM platforms would be protected. According to Symantec's Magliana, "The pilot confirmed that the immune system solves customer problems and that it is the right direction for Symantec."

INTERNET IN A BOX

The vandals of cyberspace show no signs of slowing down, so IBM's antivirus unit will not be lacking for work once Symantec takes the new immune system to the marketplace. The researchers will be kept busy developing new approaches to fighting viruses.

One of the unit's current efforts is aimed at developing a "virtual Internet" environment called Internet in a Box. Its purpose is to extend the immune system to analyze the new class of viruses specifically designed to spread over the Internet. Unlike previous generations of viruses, which were transmitted through infected documents and programs, Internet viruses seek out email address books and independently send themselves to everyone listed on an infected computer. Because they do not require the computer user to take any action in order to proliferate, these insidious new self-perpetuating viruses spread far faster than their predecessors.

As a result, says Magliana, "The immune system and the fast response it affords is not just a 'nice to have,' it's a 'must have,' because these new viruses can literally bring down a network in a matter of hours. In the past, viruses that relied on standard forms of propagation might take days, or maybe weeks, to cause the same outcome."

The Internet in a Box is an extension of what has already been developed in the immune system analysis center to analyze file and macro viruses. "But," says White, "the new Internet viruses are even tougher to analyze, because you can't understand what they do by looking at just one machine. Analyzing these viruses means watching how they spread across many machines. That's a lot harder."

Through such research—and through advances like the new Norton AntiVirus system—the day may soon come when computer users no longer need fear the potentially devastating consequences of a computer virus strike. Yet automation will never completely eliminate the human element from the picture. Human skill and intelligence will continue to be needed to deal with novel forms of viruses, like the recent polymorphic macro viruses that alter themselves as they spread.

And then there are the uniquely human efforts of people like Dave Chess, a founding member of IBM's antivirus unit, who has been hanging out in online news groups where he suspects many of the virus writers lurk. "I try to get across the point," he says, "that writing viruses is never a cool thing to do."

IBM's Antivirus Online


Victor D. Chase is a freelance writer who lives in Yorktown Heights, New York.

