Research is creating an antivirus system for the entire Internet. In
addition to detecting viruses, it will disinfect and immunize computers
throughout the world.
By Gary Taubes
In Brief:
IBM Research's massively distributed systems group is creating a
computer immune system for cyberspace. Client machines running the group's
software will be able to detect the presence of a new virus and send a sample
over the Internet back to the antivirus headquarters. There, computers will
dissect it, analyze it, and identify the means for completely removing it from
the infected computer. The system will then communicate the method for
identifying and removing the virus to computers worldwide - in effect,
immunizing them within minutes of the initial appearance of the virus.
The first virus Steve White encountered was the CHRISTMA worm, which hit
global computer networks, and eventually IBM, in late 1987. The program, which
would always come from a friend of the recipient, traveled by e-mail to the
individual's in-basket. An attached message said that the file was worth
executing, implying that it was going to display a Christmas tree and a holiday
greeting. If the recipient did execute it, it e-mailed a copy of itself to
everyone in the recipient's address book (see "Avoiding Accidental
Afflictions").
The CHRISTMA worm was primitive, harmless and relatively simple to
eradicate. But to White, who was involved in computer security, and to some of
his colleagues at IBM's Thomas J. Watson Research Center, it was a pivotal
event. "I remember the morning it happened," he says. "I said:
'I'm not getting any work done today; I'm going to watch this. This is going to
be an important event. It's going to change the world.'"
By the time the notorious Internet worm struck the following year, infecting
thousands of machines worldwide, White and his colleagues were already
discussing how to combat this new threat to computer security. Today, White is
the senior manager of IBM's massively distributed systems group at Watson, which
produces the company's remarkable IBM AntiVirus line of products. By 1998, this
will include a networkwide system for tracking down and eradicating viruses over
the entire Internet. White and his colleagues call it an "immune system for
cyberspace."
The system will be capable of recognizing the presence of a new virus, using a
sophisticated neural network program designed by Watson researchers Jeffrey Kephart, Gregory Sorkin and Gerald Tesauro. It will then capture a sample, and send it off to secure computers at IBM. Those computers will promptly dissect the virus, decipher it and figure out how best to eradicate it. Within minutes, that information will be returned to the infected computer, whose IBM AntiVirus software will use it to remove the virus from its system and return the computer to its pristine pre-infected condition.
Moreover, that same information will be made available simultaneously to computers all over the world. "Our immune system for cyberspace will inoculate the whole world at once," says White. "If a virus is found in the middle of the night in Belgium, the entire world will be vaccinated against it within minutes. We will get that cure out faster than the virus can spread."
A coming plague
In the next few years, however, both the problem and the cure are likely to
change, says Jeffrey Kephart, manager of antivirus science and technology at
Watson. With the entire world networked, viruses can potentially spread faster
than humans can keep up. The number of viruses may well increase also. Today,
antivirus researchers know of a mere 10,000 different viruses, appearing at a
rate of six or so a day. But, says White, "there's no particular reason why
six is a magic number. It could be 60 or 600. If that happens, the current
manual methods of the antivirus industry break down."
Cyberspace will no longer be able to afford the luxury of human intervention
to disinfect new viruses. "People can't do it," Kephart says. "Cyberspace
won't survive without quick, automatic defenses against new viruses, like the
immune system."
The antivirus group at Watson began officially in 1989 with two employees.
Since then it has grown to 26, who produce the IBM AntiVirus line of products
for six different operating systems from DOS and Windows 3.1® to OS/2,
Windows 95, Windows NT and Novell Netware®. The entire line is updated with
new code and new features every three months, because, as White says, "new
viruses come out all the time." The team is developing the immune system
technology for projected release by 1998. "At that point," says White,
"it will be fully deployed, finding and killing viruses all over the world."
Viruses under wraps
The core of the antivirus system is a method with a twin goal. Not only does
it detect and identify the presence of a virus; it also figures out how the
virus attaches itself to its host, and how to get rid of it and restore the host
computer to its original condition.
To determine how best to do that, the Watson researchers collect viruses
from customers and other antivirus groups, which they analyze in a specially
designed safe room that boasts a motion detector, a badge-locked door, padlocked
cabinets, and no networked connections to the outside world. "Inside that
badge-locked door is where we keep the viruses themselves," says Dave
Chess, a research staff member, "and that's where we do the live virus
work. At any given time, we always have some viruses in the lab that we haven't
analyzed yet. The lab is extremely secure. We want to make sure that nobody
wanders in - by accident or on purpose - and takes a disk out."
The system itself will eventually be linked between client computers running
IBM AntiVirus and the safe room, where the viruses will be dissected and
analyzed. As Kephart explains it, the IBM AntiVirus software in the client
computer will detect an unknown virus, capture a sample, and send it over the
Internet to IBM. The virus analysis machine that receives the virus will proceed
to step one, which the Watson researchers call the "triage program."
The analysis then enters phase two: the "autoverv" process, a term
whose origin, according to Kephart, is "distressingly obscure." This
technology looks at samples of the virus created by the triage system, and
performs a sophisticated pattern-matching analysis, invented by researcher
Gregory Sorkin, to determine the virus's structure and the way in which it
attaches to the host program. From this it derives one prescription for
verifying the exact identity of the virus, and another for removing the virus
from any given host program.
The patterns of infection
Each virus has its own infection pattern, says Sorkin. A virus typically
removes up to 20 bytes from the beginning of a program and sequesters them away
in its own code, sometimes even encrypting them. Why such a complex procedure?
The virus has to put its own code in the beginning so that it will be executed,
explains Sorkin, but it has to save the host code so that, afterward, it can
restore that code and enable the infected machine to appear to function
normally.
The autoverv program looks at samples of the virus and uninfected samples of
the program itself, and determines which parts of the infected program contain
pieces of the virus or have been altered by it. "Once the program learns
where those infected areas are, it can replace them," he says. "It can
surgically remove the virus."
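The idea of comparing infected and uninfected copies can be illustrated with a small sketch. It is not IBM's autoverv code: the toy virus, its 4-byte stub, and the names `toy_infect`, `learn_recipe` and `repair` are all assumptions made up for this example. The toy virus overwrites the first bytes of a host, appends its own body, and tucks the original bytes at the very end. The sketch learns a repair recipe from one clean/infected pair and applies it to a different infected file.

```python
from dataclasses import dataclass

# Hypothetical toy virus, invented for illustration only.
STUB = b"\xe9\xfe\xfe\xfe"   # bytes written over the start of the host
BODY = b"TOY-VIRUS-BODY"     # bytes appended after the host


def toy_infect(host: bytes) -> bytes:
    """Overwrite the host's head, append the body, then the saved head."""
    return STUB + host[len(STUB):] + BODY + host[:len(STUB)]


def changed_regions(clean: bytes, infected: bytes) -> list[tuple[int, int]]:
    """Half-open (start, end) ranges of `infected` that differ from `clean`.

    Bytes beyond the end of `clean` count as changed (appended material).
    """
    regions, start = [], None
    for i in range(len(infected)):
        differs = i >= len(clean) or infected[i] != clean[i]
        if differs and start is None:
            start = i
        elif not differs and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(infected)))
    return regions


@dataclass
class RepairRecipe:
    head_len: int      # bytes overwritten at the start of the host
    appended_len: int  # bytes added after the end of the host
    saved_offset: int  # position of the saved head inside the appended area


def learn_recipe(clean: bytes, infected: bytes) -> RepairRecipe:
    """Infer how the toy virus attaches by comparing one clean/infected pair."""
    regions = changed_regions(clean, infected)
    if not regions or regions[0][0] != 0:
        raise ValueError("expected the start of the host to be altered")
    head_len = regions[0][1]
    tail = infected[len(clean):]
    saved_offset = tail.find(clean[:head_len])
    if saved_offset < 0:
        raise ValueError("original head not found in the appended area")
    return RepairRecipe(head_len, len(infected) - len(clean), saved_offset)


def repair(infected: bytes, recipe: RepairRecipe) -> bytes:
    """Put the saved head back and cut off everything the virus appended."""
    host_len = len(infected) - recipe.appended_len
    tail = infected[host_len:]
    head = tail[recipe.saved_offset:recipe.saved_offset + recipe.head_len]
    return head + infected[recipe.head_len:host_len]
```

A recipe learned from one host can then clean another: `repair(toy_infect(other_host), learn_recipe(host, toy_infect(host)))` returns `other_host` unchanged. A real virus that encrypts the saved bytes or varies its length would defeat this simple version.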
The analysis system also identifies an unambiguous signature for the virus
itself, according to Kephart - one that can be used to identify it in the
future. This consists of a series of bits that will appear in all instances of
the virus but are extremely unlikely to show up in any healthy code. When that's
done, the immune system computers send back to the original infected client
computer the various pieces of information needed to locate, verify and remove
the virus. If the virus appears again, the same machine will recognize it and,
armed with the necessary "antibodies," will disinfect it.
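A rough sketch of what such a signature search might look like follows. It swaps in a much simpler test than the one the article describes: instead of estimating how unlikely a byte sequence is in healthy code, it only checks the sequence against a small corpus of known-clean files. The function names and the signature length of 8 bytes are assumptions for illustration.

```python
def candidate_signatures(
    infected_samples: list[bytes],
    clean_corpus: list[bytes],
    length: int = 8,
) -> list[bytes]:
    """Byte strings found in every infected sample and in no clean file.

    Requires at least one infected sample.
    """
    first, *rest = infected_samples
    # Every window of `length` bytes in the first sample is a candidate.
    common = {first[i:i + length] for i in range(len(first) - length + 1)}
    # Keep only candidates that also occur in all the other samples.
    for sample in rest:
        common = {sig for sig in common if sig in sample}
    # Discard anything that shows up in known-clean files.
    return sorted(
        sig for sig in common
        if not any(sig in clean for clean in clean_corpus)
    )


def scan(data: bytes, signatures: list[bytes]) -> bool:
    """True if any signature occurs anywhere in `data`."""
    return any(sig in data for sig in signatures)
```

Once a signature set is chosen, `scan` is all a client needs to recognize the virus the next time it appears; the quality of the result depends entirely on how representative the clean corpus is.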
Just as the human immune system makes the entire body immune to a virus that
enters through a small scratch, the IBM AntiVirus system will make computers all
over the world immune to a new virus no matter where it is found initially. Once
an unknown virus is identified in a single machine, IBM's immune system will
analyze the virus and send the information needed to disinfect it to every
client computer in the world.
Targeting tougher viruses
Over the next year and a half, the IBM AntiVirus team will refine its immune
system. It will also address a new species of virus that has, says Kephart,
recently appeared on the Internet "with a vengeance." Previous viruses
always embedded themselves in conventional programs, to ensure that they were
executed. The new viruses can camp out in macros - tiny programs embedded
in files such as spreadsheets or even word processor documents, where they
execute simple commands. What makes these "macro viruses" so
pernicious is that they can spread whenever documents are exchanged on the net,
which happens very frequently.
For instance, the most highly reported virus in the world today is the
Concept virus, which infects Microsoft Word® documents. "It can travel
whenever you exchange documents using MS Word," says John Morar, manager of
the antivirus technology and systems group at Watson. "If an infected
document is distributed widely, it could infect all the machines it is run on."
The IBM researchers first heard about the new virus simultaneously from
clients and a consortium of antivirus companies that share information. They
quickly dissected, analyzed and learned how to disinfect the particular virus.
Since then they have been working to ensure that IBM's antivirus system handles
any new macro viruses - or even entirely different ones - that come
along. Says Chess: "The most challenging part is predicting the future -
figuring out what will happen next, what is the next fundamental shift in the
virus problem and how to be prepared for that" (see "Targeting Unknown
Viruses").
Helping hands from humans
For this reason, even the fully deployed IBM immune system will not be
devoid of human assistance. Nor will it put the IBM antivirus researchers out of
work. As new viruses emerge, the team will always have to shape the system to
respond to customers' fresh concerns or fears. Moreover, says Kephart, "there
will always be viruses that can elude a given antivirus system. There's
mathematical proof that you'll never have a system that can handle any virus
that comes along." In that sense, he says, the need for humans in the
system still fits the biological analogy of an immune system.
"Our immune systems have protected us for millions of years,"
he continues. "They're not perfect; otherwise, we wouldn't need the medical
profession. We can continually improve our technology, so the kinds of viruses
we can detect and deal with will grow all the time. But the Internet changes and
the computing environment changes. When the environment changes - even in
the biological world - it affects the whole ecology, and different kinds of
computer viruses will flourish. So humans will always be needed to keep up with
that."
Gary Taubes is a freelance writer based in Boston. His latest book is
entitled Bad Science: The Short Life and Weird Times of Cold Fusion.
More Information:
Avoiding Accidental Afflictions
Swat Team Versus Viruses
Targeting Unknown Viruses
Avoiding Accidental Afflictions
Computer viruses may be the most malicious agents that wreak havoc on the
Internet, but they won't be the only ones. Computer networks will also be
vulnerable to an entirely new kind of threat, say IBM researchers in the
antivirus science and technology group at the Thomas J. Watson Research Center.
Consider "maelstroms," to use the term coined by Watson researcher
Jeffrey Kephart. A condition that can occur whenever computers are programmed to
forward e-mail automatically, it is one of many surprising things that can
happen when computers handle information and send it on without direct human
supervision.
One computer, for instance, might automatically send e-mail on to a
distributed mailing list. If any of the receiving computers is programmed to do
the same, e-mail can begin to circulate endlessly. The result: exponentially
increasing vortices of e-mail. "You very quickly get billions of mail
messages," says Kephart's colleague Jim Hanson. "The network gets
completely clogged with mail no one wants. You get gridlock and traffic jams,
and nothing works."
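The arithmetic behind that growth can be sketched with a toy simulation. The model is an assumption made for illustration, not the Watson group's: each address maps to the addresses it forwards to automatically, every copy is forwarded once per round, and addresses with no entry belong to people who simply read their mail.

```python
from collections import Counter


def simulate(forwarding: dict[str, list[str]], start: str, rounds: int) -> list[int]:
    """Return the number of messages delivered in each forwarding round."""
    in_flight = Counter({start: 1})
    history = []
    for _ in range(rounds):
        delivered = Counter()
        for address, copies in in_flight.items():
            # An auto-forwarder sends every copy it holds to each target.
            for target in forwarding.get(address, []):
                delivered[target] += copies
        in_flight = delivered
        history.append(sum(delivered.values()))
    return history


# Three hypothetical mailing lists that each forward to one person
# and to the other two lists, forming a loop.
LOOPED = {
    "list-a": ["alice", "list-b", "list-c"],
    "list-b": ["bob", "list-a", "list-c"],
    "list-c": ["carol", "list-a", "list-b"],
}

if __name__ == "__main__":
    print(simulate(LOOPED, "list-a", 5))
```

In this configuration the traffic doubles every round, so a single message becomes more than a billion within about 30 rounds. Remove the list-to-list entries and the same message is delivered once and stops.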
Kephart, Hanson and Raja Das are learning how to prevent the formation of
maelstroms, as part of a broad study of emergent phenomena in computer networks.
In such phenomena, the collective behaviors of a myriad of agents interact and
reinforce to produce a potentially monumental effect. They arise in everything
from economies and biological systems - consciousness in the human brain,
for instance - to, for good or ill, computer networks.
The key to such emergent phenomena is the autonomy of the agents doing the
forwarding. Small mail loops have already been discovered among a handful of
computers, but human operators have been able to close them down quickly. In
the future, however, warns Kephart, most users will have intelligent autonomous
agents reading and forwarding their mail, so emergent phenomena will become
both more frequent and more intense. "As soon as everybody is hooked up
to the net and everybody has agents working on their behalf," says Hanson, "things
like this will be showing up every day. We're hoping to get a handle on
that before it happens."
Swat Team Versus Viruses
On the evening of Tuesday, May 14, 1996, the staff of a bank holding company
in Cleveland saw the first signs that the company's computer system was infected
by viruses: When they tried to log onto the system they were denied access. At 2
a.m. on Wednesday, the company called in IBM, which immediately sent staffers to
the site and organized support from an antivirus "swat team."
Having diagnosed a massive virus infection, the on-site team sent details to
a group at the Thomas J. Watson Research Center headed by Dave Chess. The
researchers rapidly analyzed the virus - a variant of an established virus
that they had not encountered before - and came up with a cure, which they
sent to the on-site team. Working round the clock, the team installed new
software on the 300 servers that accommodated the bank's 10,000 work stations.
By Friday morning, the system was clean, and the bank's operations had returned
to normal.
Targeting Unknown Viruses
The key to a successful immune system is recognizing not just the viruses
you know but also those you don't. That may seem simple, but an unfamiliar virus
will fool most existing antivirus programs. "They have a signature file
that recognizes a few thousand known viruses," explains Gerald Tesauro, a
research scientist at IBM's Thomas J. Watson Research Center. "But this
completely falls apart when confronted with a new virus that's not in the
signature database."
The IBM AntiVirus system has a solution to this problem: a neural network
program created by Watson researchers Gregory Sorkin, Jeffrey Kephart and
Tesauro that can detect viruses no one has ever seen before.
The neural network is a genre of artificial intelligence - a
computational model that can be trained in the otherwise human art of pattern
recognition. The program - "loosely inspired by what we believe to be
going on in the human brain," in Tesauro's words - models an
interconnected network of neurons and synapses. While the network can be taught
by adjusting the strengths of its synapses, Sorkin says, "it is only as
good as its input data. To figure out what that should be, we looked to a
prototype built by a human" - advisory programmer William Arnold.
The training site for the antiviral neural network is a database with a few
hundred samples of computer boot sectors - the first 512 bytes of a
floppy disk or hard drive in which viruses are most likely to be found. Based on
Arnold's work, the boot sectors are represented by features, each of which is a
sequence of three bytes, known as a trigram. "The hardest part is figuring
out which trigrams to use," says Sorkin. "So we start with all of them
and whittle them down to a set of 50 that 'cover' all the viruses."
The neural network looks at these strings of 50 zeros and ones and learns to
recognize patterns that signify the presence or absence of a virus. After
learning to recognize known viruses with nearly 100 percent accuracy, the
program is unleashed on unknown viruses.
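To make the "zeros and ones" concrete, here is a minimal sketch of the two steps: turning a boot sector into a presence vector over a fixed trigram list, and training a classifier on those vectors. The article does not describe the network's architecture, so a single-layer perceptron stands in for it here, and the function names and any trigrams passed in are assumptions for illustration.

```python
def features(sector: bytes, trigrams: list[bytes]) -> list[int]:
    """One bit per trigram: 1 if it occurs anywhere in the sector, else 0."""
    return [1 if trigram in sector else 0 for trigram in trigrams]


def predict(x: list[int], weights: list[float], bias: float) -> int:
    """Return 1 (looks viral) if the weighted sum is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(
    samples: list[list[int]],
    labels: list[int],
    epochs: int = 50,
) -> tuple[list[float], float]:
    """Classic perceptron rule; stops early once every sample is classified."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            delta = y - predict(x, weights, bias)
            if delta != 0:
                errors += 1
                # Nudge the weights toward the correct answer.
                weights = [w + delta * xi for w, xi in zip(weights, x)]
                bias += delta
        if errors == 0:
            break
    return weights, bias
```

With the real system's 50 trigrams, `features` would return the 50-bit string the article mentions. A trained model then flags a sector it has never seen if that sector contains trigrams that were characteristic of the viral training samples.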
Since its release in 1995, says Kephart, the neural network has caught 75
percent of the unknown viruses that have appeared in boot sectors. With a few
slight modifications, he and his colleagues hope to improve that to nearly 100
percent. Meanwhile, Kephart, Sorkin and visiting student Alex Morin are training
another program to recognize viruses that infect files rather than boot sectors.
"Because there are thousands of different file-infecting viruses, and they
are more complex than boot viruses, the problem is much harder," Sorkin
notes, "but we're getting great preliminary results." Stay tuned.
For more information see
http://www.av.ibm.com/current/Front
Page