A New Vision of Cradle-to-Grave Electronic Services Support Promises to Free
PC Users from Such Time-Consuming Chores as Installing Software Fixes or
Updates.
The Ultimate Payoff: Lower Total Cost of Ownership.
In Brief:
A new array of technologies is being designed
to beat back the administrative costs of
computers and networks. IBM is developing technologies that link PCs to
vendor Web sites for automatic upgrades and allow many or a few systems to
be serviced at the same time. Software in the works can speed the
diagnosis of system faults and even improve service at computer help desks.
The steam engine and the computer stand as icons of their respective eras.
Both transform energy into useful work, and both are ultimately subject to
the same physical laws. But from the standpoint of complexity, their
radical differences have huge implications. While a well-built steam engine
might go on working for years with an occasional squirt of oil and an
adjustment here and there, the cost of maintaining the PCs and servers
within an enterprise can, in a single year, far outweigh the cost of the
machines themselves.
There are lots of reasons for that: software requires fixes or updates; new
components and software have to be installed; ultimately, hardware becomes
obsolete in the face of new applications and needs to be replaced. And
there are always issues of performance, system availability and, indeed,
everything that affects the productivity of users. All of those concerns
fall under the broad heading of systems management, says Joe Hellerstein, a
researcher at IBM's Thomas J. Watson Research Center who teaches a course
on the subject at Columbia University. "Basically," he says, "the term
encompasses every aspect of systems that people have to cope with."
The range of issues is huge, says Robert Morris, director of personal
systems and advanced systems technology at Watson. "Basically, it includes
everything that is nonroutine, from installing the machine and getting your
environment right to dealing with failures and surprises." While these
concerns are not new, they are receiving increased attention, especially in
companies faced with the growth of distributed computer networks, not to
mention the growing number of PCs in homes and small businesses.
At the heart of the issue is the emerging concept of "total cost of
ownership." TCO includes all costs related to owning computers, from
purchase price to time spent taking care of the machines. While the retail
costs of computers are falling, says Morris, the total costs of managing
them are going up. According to the Gartner Group, the "people cost" for a
distributed system is often 60 to 90 percent of the total cost of owning
the system.
"The industry has been forcing every PC user to become a system
administrator," says Morris. "We needed to focus on the value of ownership
and on finding ways to minimize the loss of time and effort in keeping
systems up and running."
TOP DOWN & BOTTOM UP
Given its importance, systems management has become a growing business in
its own right. In March 1996, IBM bought Tivoli, an Austin, Texas, company
that specializes in enterprisewide systems management for large accounts
and that is now starting to develop solutions for smaller businesses as
well. IBM Research, which has a long history of working on systems
management, has made the subject one of its strategy areas and recently
formed a joint project with Tivoli called the Systems Management Technology
Institute (SMTI), headed by Seraphin Calo of Watson.
SMTI is focusing on several key areas. One of the most critical,
performance management, is aimed at finding the best way to manage a
distributed system to optimize the end-to-end response time. "You have to
be able to decompose the system into the critical elements and understand
the effect of component failures on overall performance," says Calo.
Visualization is another research objective of the joint project. "As
systems grow more complex, it becomes increasingly important to be able to
represent the state of the system to operators and administrators so that
they can quickly grasp what is happening," explains Calo. SMTI is also
looking at ways of allowing end users to manage certain tasks by means of
automated support software. And a research effort headed by a team at the
Zurich Research Laboratory is developing Java agents that can travel over
the network and report back on the state of routers and other devices.
Meanwhile, a distributed team comprising researchers at Watson, the Haifa
Research Laboratory and the Almaden Research Center (where Morris worked
until moving to Watson in 1996) has begun collaborating on a far-reaching
project with the IBM Personal Systems Group. This new approach represents a
vision for simplifying the management of client machines across networks,
encompassing individual consumers all the way up to the enterprise level.
The project grew out of the recognition several years ago by Morris and
Norm Pass, manager of cyberspace technology and applications at Almaden,
that one of the more taxing problems PC users have to deal with is updating
or replacing software or hardware components. "If you're anything like most
users," says Morris, "you spend a lot more time on those kinds of tasks
than you like, and you have the feeling that technology owes you a better
deal." This new framework is an attempt to implement such a deal.
MANAGING THROUGH THE WEB
The group's first effort focused on extending Netfinity, a workgroup
systems management application for managing the hardware aspects of
personal computers. "It does that by invoking a software agent sitting on
each client machine to provide a visual interface to a remote
administrator," explains Steve Welch, a member of Pass's team. "The agent
also provides some information to end users, such as 'check your disk
space,' but its main function is to enable the administrator to see and
control what's happening on the machine."
Netfinity was limited, however, to managing one PC at a time. "Our idea was
to make it possible for one person to manage a workgroup of clients and set
system parameters and thresholds for an entire group at a time, even
multiple kinds of client machines," says Pass.
The key to making that possible was the Web. The Almaden team turned
Netfinity into a system for "mass administration" -- the ability to manage
several machines of different types simultaneously -- by adding a layer of
software. And even more important, all this was possible from any Web
browser, eliminating the need for specially installed code to manage the
client machines.
The resulting Netfinity Web manager, called Webfinity, represented a
tremendous simplification. Shown at the Comdex computer show in the spring
of 1996 and delivered in the fourth quarter of that year, it was the first
completely Web-based systems management application for both hardware and
software. Webfinity proved to be just the beginning. It prompted Morris,
Pass and Welch to continue thinking about the goals of systems management,
ultimately leading them to a concept they dubbed "the umbilical."
Consider a typical scenario endemic to the computer business, says Pass.
"You buy a PC. It's been entombed in cardboard for six to eight weeks, its
BIOS is several revs out of date, and there are software bugs." Instead of
launching PCs into the world and leaving those problems for the consumer to
solve, the researchers decided, an electronic connection -- an umbilical --
should link the computer to the support center from the cradle to the
grave. Such a connection, says Pass, "requires zero knowledge on your part
to connect you to fixes via the Internet."
A simple example of this approach is IBM's Update Connector, developed
independently by a team from IBM in Endicott, New York. Shipped with
Aptivas® and other IBM personal computers, Update Connector links new
machines to the appropriate IBM service center. When the user first connects
with the service center, the system automatically downloads any fixes that
have been identified since the computer left the factory -- with no effort
on the part of the user. Beyond that, says Morris, "each client remains
connected to the mother for life, through the Web."
AUTOMATED UPDATING
Researchers at Almaden and Watson have extended the concept of the
umbilical connection. In a project that represents the first phase of
implementing the new management framework, they have gone beyond Update
Connector's one-to-one link between PC and service center and introduced
the concept of update management. "The idea," explains Watson's Tom
Chefalas, one of the architects of the new approach, "is that a large
organization would not want to handle every machine on an individual basis."
The new update management function is a systems management tool that
permits an administrator to distribute software updates automatically and
selectively to a large population of computers. "The traditional way of
updating software," says Chefalas, "is for administrators or end users to
periodically search the Web for the applicable fixes for their machines,
download all the updates and carefully install them on each computer."
Update management relieves administrators of this tedious task. It also
allows them to target updates to machines of their choosing. "For example,"
says Chefalas, "the administrator may want to test out the latest fixes on
a select group before distributing to everyone's machine. Since users have
different jobs and systems requirements, everyone does not need to have the
same level of updates installed."
The technology relies on three segments working together: client agents, an
update management tool and the support center database of available
software. "The agent inventories the hardware and software on the client
computer," explains Jeff Kreulen, an Almaden researcher. "That information
is used to query the support center for the available software that may be
applicable to the system, allowing the update manager tool to automatically
determine the client's needs. Knowing exactly what software is applicable
to each system allows a systems administrator complete control over
distribution and maintenance."
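The three-part flow Kreulen describes can be pictured with a short sketch. It is only an illustration under stated assumptions, not IBM's implementation: the `Update` record, the `applicable_updates` and `plan_distribution` functions, the machine names and the version numbers are all invented here. The agent's inventory is modeled as a mapping from component to installed version, the support center database as a list of available fixes, and the update manager as a function that matches the two for an administrator-chosen group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """One entry in a hypothetical support center catalog."""
    component: str      # hardware or software component the fix applies to
    fixed_in: tuple     # version that contains the fix, e.g. (1, 4)
    description: str


def applicable_updates(inventory, catalog):
    """Return catalog entries newer than what the client agent reported.

    inventory maps component name -> installed version tuple.
    """
    return [
        u for u in catalog
        if u.component in inventory and inventory[u.component] < u.fixed_in
    ]


def plan_distribution(inventories, catalog, pilot_group):
    """Build a per-machine update plan, limited to the administrator's chosen group."""
    return {
        machine: applicable_updates(inv, catalog)
        for machine, inv in inventories.items()
        if machine in pilot_group
    }


# Illustrative data only; names and versions are invented.
catalog = [
    Update("bios", (1, 4), "BIOS refresh"),
    Update("video_driver", (2, 0), "display driver fix"),
]
inventories = {
    "pc-001": {"bios": (1, 2), "video_driver": (2, 0)},
    "pc-002": {"bios": (1, 4), "modem_driver": (3, 1)},
    "pc-003": {"bios": (1, 0), "video_driver": (1, 9)},
}
plan = plan_distribution(inventories, catalog, pilot_group={"pc-001", "pc-002"})
for machine, updates in sorted(plan.items()):
    print(machine, [u.description for u in updates])
```

Restricting the plan to a pilot group mirrors the administrator's option of testing fixes on a select set of machines before wider distribution; a machine outside the group, such as pc-003 here, is simply left out of the plan.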
Update management is a step along the route to a full-service goal. "This
new management approach starts at the original vendors, and drives all of
the fixes into the clients," explains Chefalas. If a vendor develops a new
product, for example, an effective full-service system would provide that
firm with a list of machines for which the product would work and a means
of informing the owners about the product.
KNOWLEDGE MANAGEMENT
That facility expands the boundaries of systems management. "Starting from
a focus on hardware management, we brought in software management, and now
we're including yet another level, knowledge management," says Chefalas.
Knowledge management has two parts: appropriate Web sites containing the
requisite content, and the daily evolution of that content as new problems
are discovered and fixes arrive for old problems.
Knowledge management, or the ability to capture and share information
across an organization, is also being applied to improve the operation of
help desks. "Staffing help desks is a very expensive piece of a low-margin
business like personal computers," says Watson researcher Sid Hantler. In
addition, the help-desk person who first answers a call must field all
sorts of queries -- from simple requests for information, such as where to
buy a part, to pleas for help when a computer has seized up. Hantler's
group has designed a prototype system, based on an innovative combination
of databases and decision trees, that helps find the answers. The payoff is
higher efficiency and shorter help-desk calls (see "Help for the Help Desk").
HOMING IN ON FAULTS
In a related area, Almaden's Roger Williams is working on ways to identify
the root causes of breakdowns in systems as quickly as possible. The
objective is to cut the cost of diagnosis. "If a printer problem is caused
by a fault in the network, you don't want to call in the printer guy,"
explains Williams. Avoiding such mistakes requires corporate policies that
specify how the systems administrator should deal with different types of
faults. In collaboration with Tivoli, Williams has developed Policy Studio,
a system that aids in setting appropriate policies (see "Getting to the Root of the Problem").
That approach represents a move toward providing network administrators or
individual users with the information they need to use their time more
effectively. Such an approach might inform a Webmaster, for example, that a
particular Web page takes a long time to download because of a logo, giving
the Webmaster the option of omitting the logo. Similarly, it might persuade
individual users to turn off the graphics on their browser because the GIF
files are downloading too slowly.
The new management approach is a long-term initiative, but it is already
changing the way people think about an important aspect of systems
management. By linking IBM's PC customers with all the resources needed for
lifetime support, it can dramatically simplify PC ownership. We may never
experience again the relatively carefree technology of the past, but, if
this new vision is successfully deployed, end users should enjoy much
greater productivity and satisfaction.
Peter Gwynne is a freelance writer based in Cape Cod,
Massachusetts.
Help for the Help Desk
People who staff company call centers and computer help desks often need
help themselves. They can expect a wide spectrum of questions, ranging from
"What is the part number of the memory module I need for my computer?" to
"My computer no longer has sound -- what should I do?" Even the
best-trained professional cannot deal with every question quickly and
effectively. So the race is on for an automated system that provides most
of the answers for help desk personnel.
A team at IBM's Thomas J. Watson Research Center is developing just such a
system. Such a system must not only help diagnose computer problems but also
provide an answer to virtually any question about the vendor's operations
and products. "We put together a prototype system that uses a simple
knowledge-based system approach," explains team member Sid Hantler. "We're
trying to build it so that it handles anything from the simplest
information inquiry to the most complex diagnostic problem."
Typically, call center information is stored on a server. On receiving a
request from a call taker, the server doles out information to the call
taker's client machine. The Watson design, started when the team identified
several problems in other vendors' call center software, has two special
features. First, the client/server application, written in Java, takes
pressure off the server by enlisting the client machines to do many of the
calculations necessary to answer a question. That should free up the server
to run more applications, speeding up the call center's operation. In
addition, notes Hantler, "we have a novel way of representing the
information." Instead of using case examples as the basis for answers, he
says, "we think that a database is more reasonable at the simple end and
decision trees at the complex end of the spectrum."
The call takers won't have to decide whether a problem is simple or
complex. They will need only to describe the problem. Information coming
back from the system will appear on their screens.
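One way to picture that split is sketched below. It is a hedged illustration, not the Watson prototype: the `FACTS` table, the `NO_SOUND_TREE` structure, the `walk` and `answer` functions and all of the advice strings are invented for this example. Simple requests are answered by a direct lookup, complex ones by walking a decision tree, and the call taker supplies only the problem description and the yes/no answers.

```python
# Hypothetical fact table for simple information requests.
FACTS = {
    "memory module part number": "See the parts list for the machine model.",
}

# Hypothetical decision tree: inner nodes are (question, yes_branch, no_branch);
# leaves are plain strings holding the advice.
NO_SOUND_TREE = (
    "Is the volume muted?",
    "Unmute the volume.",
    ("Is the audio driver installed?",
     "Check the speaker connection.",
     "Reinstall the audio driver."),
)
TREES = {"no sound": NO_SOUND_TREE}


def walk(tree, answer_fn):
    """Descend the tree, calling answer_fn(question) -> bool at each inner node."""
    while not isinstance(tree, str):
        question, yes_branch, no_branch = tree
        tree = yes_branch if answer_fn(question) else no_branch
    return tree


def answer(problem, answer_fn=None):
    """Route a described problem: fact lookup first, then a decision tree."""
    key = problem.strip().lower()
    if key in FACTS:
        return FACTS[key]
    if key in TREES:
        return walk(TREES[key], answer_fn)
    return "No answer found; escalate."


# The caller's replies are canned here; a real client would prompt the call taker.
replies = {"Is the volume muted?": False, "Is the audio driver installed?": False}
print(answer("No sound", replies.get))
print(answer("Memory module part number"))
```

The routing lives inside `answer`, so the call taker never has to classify a problem as simple or complex before asking.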
The design already has a customer. Staffers at IBM North America's call
centers in Atlanta, Dallas and Toronto started to use a pilot version in
mid-March. Further development, financed by IBM North America, should
expand the system's range of application.
Getting to the Root of the Problem
When they detect problems in their networks, systems administrators want to
pinpoint the cause and take remedial action as soon as possible. But all
too often, says Roger Williams, a scientist at the Almaden Research Center,
"IT managers spend time chasing dependent symptoms rather than causes."
They might, for example, attempt to diagnose a fault in a computer when the
real problem is in the network itself. A tool for helping administrators
identify root causes would therefore save time and money.
Enter Policy Studio. Under development by Williams, this is a tool for
getting to the bottom of network problems. It provides a user interface in
which the system manager can specify system dependencies and so identify the
root causes of possible failures. The manager can also construct system
policies that specify the actions to take when a failure occurs. An
example of a failure caused by an underlying system dependency is a network
outage that makes the name server unavailable, causing an application
failure.
In a typical scenario, the help desk will hear from a Lotus Notes user
unable to access his mail database, even though Policy Studio has already
identified the problem as a network outage. A typical policy to follow
in this case is: "Call Fred when a critical application (e.g., Notes) has
been down for 20 minutes, unless this has been caused by a dependent
interruption." When Policy Studio is in use, a network technician, rather
than Fred, will be dispatched to address the root-cause network problem.
"That achieves three objectives," says Williams. "It ensures that the
person most likely to resolve the problem is dispatched. It ensures that
the wrong person is not trying to fix it. And it provides the customer --
the systems administrator -- with a direct understanding of the root cause
and the downstream symptoms."
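The Notes scenario can be sketched in a few lines. This is a rough illustration under stated assumptions, not Policy Studio itself: the `DEPENDS_ON` map, the `OWNER` table (apart from Fred), and the `root_causes` and `dispatch` functions are invented here, and the dependency map is assumed to contain no cycles.

```python
# Hypothetical dependency map: each component lists what it depends on.
DEPENDS_ON = {
    "notes": ["name_server"],
    "name_server": ["network"],
    "network": [],
}

# Hypothetical owners to dispatch for each component.
OWNER = {
    "notes": "Fred",
    "name_server": "server administrator",
    "network": "network technician",
}


def root_causes(component, down):
    """Return the failed components at the bottom of the dependency chain.

    A failed component counts as a root cause if none of its dependencies are down.
    """
    failed_deps = [d for d in DEPENDS_ON.get(component, []) if d in down]
    if not failed_deps:
        return {component}
    causes = set()
    for dep in failed_deps:
        causes |= root_causes(dep, down)
    return causes


def dispatch(component, down, minutes_down, threshold=20):
    """Apply the policy: once the threshold is reached, call each root cause's owner."""
    if component not in down or minutes_down < threshold:
        return []
    return sorted(OWNER[c] for c in root_causes(component, down))


# Network outage takes the name server and Notes down with it.
print(dispatch("notes", {"notes", "name_server", "network"}, minutes_down=25))
# Notes fails on its own, so the application owner is called.
print(dispatch("notes", {"notes"}, minutes_down=25))
```

Following the dependency chain before dispatching is what turns "call Fred" into "call the network technician" when the application outage is only a downstream symptom.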
Williams is aiming to improve the Policy Studio approach in two ways. He
wants to further automate the process of problem identification and make
the policies easier to construct. And he plans to make the system more
user-friendly. "Several customers have indicated that, if usability can be
improved, the payoff in terms of the ability to specify more complicated
rules would be very high," says Williams.