In Brief:
Handling a high-profile Web site can be a Webmaster's nightmare. However,
the Atlanta Olympics Web site at the 1996 summer games ran like a dream,
handling nearly 200 million hits over the course of 17 days. It was created and
managed using a new, evolving IBM technology called Web Object Manager (WOM).
This builds pages dynamically from "objects" stored in an underlying
database as they are requested, even customizing them to the needs of the user
or the characteristics of the client's browser. The Olympics site had some 35,000
objects. WOM also incorporates a new approach to load balancing and scalability,
which allows additional processing power to be easily added to an existing site.
In what seems like an almost inconceivably short time, the Internet has
changed from a useful resource for the few to an indispensable tool for the
many. Much of the credit goes to the introduction of the World Wide Web, which
provides hyperlinked access to data and programs on a vast number of servers
around the world. Yet, despite their growing centrality in our lives, both the
Internet and the Web remain mysterious, more easily talked about and used than
understood. Nor is that surprising. One cannot point to the Internet and say it
is here or it is there. Rather, like the ghost of Hamlet's father, it is here and
there and, indeed, everywhere. It is natural, then, that efforts to
characterize the advance of Internet technology tend to seize on the most
visible elements. Browsers, new content and electronic commerce, along with the
perennial quest for greater bandwidth, appear to be the key factors determining
the future of the Web. While no one would deny that these are among the forces
driving the popularity of the Internet, the continued expansion of the Web
depends on addressing a quite different set of issues.
As more than one Webmaster charged with creating and running a Web site has
discovered, popularity can pose problems of its own. The goal of managing
content and presenting site "visitors" with information tailored to
their needs grows ever more elusive as the site expands, and as the number of
users increases. The more pages a site creates, the more work there is in
updating them, ensuring consistency, and simply keeping track of what's already
been created. At the same time, as Web sites are called on to handle not just a
greater number of daily hits for simple files but also requests that require the
server to execute programs, such as traditional Common Gateway Interface (CGI)
scripts, the pressure on resources can soar. Web servers designed to meet an
anticipated level of service may work admirably up to a point and then stumble
or crash under a sudden flurry of hits because they are unable to grow, or be
scaled up, in a simple way.
Content management and scalability are two of the key problems that the
Internet now faces. What is the solution? The desperate one of simply throwing
more programmers and more hardware at the problem can be extremely costly and
inefficient: ultimately, the need for new content and updates will always
outstrip the practical resources available, whereas hardware brought in to
handle peak loads sits by idly the rest of the time. The first steps toward a
novel solution began to emerge at IBM in 1994. Sean Martin, a member of IBM's UK
Client Server Group based in London, and Frank Schwichtenberg, of IBM Germany,
helped create a Web site, AixServe, for the Europe-wide Installation Support
Center that would provide product information and technical help. As he focused
on the task, Martin was struck by the existing limitations of Web servers.
A Web site, he realized, is essentially a collection of digital files
(texts, images, programs, and so forth) that are typically treated as independent
entities. The same navigation bar, for example, may appear on all the pages, but
it has to be recoded each time. In the course of discussing the problem with IBM
Austin's Rob Gore, the architect of AixServe, it dawned on Martin that there was
a better way. How much simpler, he reasoned, if the pieces needed to assemble a
page were regarded as the basic entities, and pages were created from a pool of
reusable parts each time they were requested.
By the end of 1994, Martin and Schwichtenberg had implemented the first
version of a technology that treated Web pages as entities that can be
generated on the fly (see "Serving Web Requests on the Fly,"
below). That pioneering Web server was the basis of what has come to be called
Web Object Manager (WOM).
A match for any sport
In early 1995, Martin gave a talk at an internal Internet meeting in
Thornwood, New York. As a result, he soon received an invitation to work on IBM's
1996 Summer Olympics project, which brought together leaders in Internet
technology from around the company to create the games' official Web site (see "An
Olympic Victory," below). The group "warmed up" by working on
other sporting events, starting with a Web site for the 1995 Wimbledon tennis
tournament in June.
New challenges they faced at that time included the quirkiness of different
browsers: not all could display Web pages created with a given version of
Hypertext Markup Language (HTML), the set of tags used to format a Web page. "For
Wimbledon," says Martin, who is currently in the United States on
assignment at the Internet Division, "we added a new level of
personalization to the capability of dynamically generating pages on the fly. By
merging a request for specific information with the HTML format
required by a client's browser, a customized page was sent to the user."
It became immediately apparent that dynamic page generation was especially
attractive for such rapidly changing sites, in which scores need to be
continually updated. Together with Andy Stanford-Clark, a member of the UK's
Hursley Laboratory currently on assignment at the Thomas J. Watson Research
Center, Martin wrote a new version of WOM for the PGA golf tournament in August,
and still another version for the U.S. Open tennis tournament the following
month. A core technical team at Watson has continued to develop the underlying
concepts of WOM, and Chet Murthy has now rewritten the code in Java on top of
IBM's DB2 relational database. As WOM evolves, the advantages of its ability to
simplify content authoring and management have become increasingly evident. "Beyond
the immediate benefits to Webmasters, the kind of personalization WOM makes
possible represents a huge extension of the most natural and desirable way of
doing business," says Scott Penberthy, one of the WOM team co-leaders at
Watson. "In the long run, it will allow companies to have a one-to-one
relationship with customers on a massive scale."
Balancing the load
Popular sporting events make for popular Web sites. In addition to the
normal heavy demand experienced by the site servers, key matches often generate
huge peaks. WOM itself adds to the computational load because of its dynamic
page generation capability. With the request to build multiple sporting event
Web sites staring him in the face, Martin began thinking about ways to allow the
server to run on multiple processors.
For Wimbledon, there wasn't time to do more than modify traditional
load-balancing techniques, which were used to shift requests between a server in
White Plains, New York, and one at the Hursley Lab, in the south of England. But
Martin did introduce the first version of WOMbat. This technique ensures that
each time a request is received from a new user, all the servers supporting a
site "ping" the client machine, just like a bat. The first server to
receive an "echo" then becomes the default server for that user. In
August, while preparing for the PGA Golf tournament, Martin teamed up with
Stanford-Clark to design a more general load-balancing scheme. The previous
January, Stanford-Clark, then at Hursley's Commercial Parallel Center building
parallel applications for customers using the IBM SP scalable parallel system,
had had his own epiphany. "I saw that, as people began putting up Web sites
with more advanced applications, such as catalogs, there would be a need for a
scalable Web server," he recalls.
Using load-leveling software called Interactive Session Support (ISS) that
Stanford-Clark and Graham Wallis had written a couple of years before, Martin
and Stanford-Clark designed and built a scalable Web server based on an SP. "ISS
allows a number of computers, such as the various processor nodes of an SP, to
behave as though they were a single machine," says Stanford-Clark. A key
additional component, which enabled essentially unlimited scalability, was a new
kind of router code, recently renamed Network Dispatcher, that had been
developed at Watson earlier (see "Scalability Without Limits," below).
Linking to legacy data
In the course of its evolution, the vision of what WOM will enable has
considerably enlarged. Martin credits Murthy with making him aware of WOM's true
potential. "Chet saw how WOM might become an Internet middleware layer that
would allow applications to find their home in the WOM programming environment,"
he explains.
In Murthy's vision, WOM provides the means to link users to legacy, or
enterprise, data, most of which resides on mainframes. Such a link is important
both for intranets (companywide networks based on Internet technology) and for
the Internet as a whole. "For example," says Jim Russell, the other
co-leader of the Watson WOM team, "it might be desirable to allow employees
to access their 401K account data online, not only to look at their balance but
to modify their allocations. That means having an intranet application that
enables one to access the database, have a "page" of data sent to one,
modify it and send back the updated version."
Potentially, transactions over the Internet could be even more far-reaching.
They could range from generating a personalized "catalog" from a
retailer's stock to transferring funds in and out of one's bank account or making
and paying for an airline reservation. For that to happen, two things are
needed: gateways to access legacy data and Web applications that have the
robustness and the reliability that users have come to expect for their
mission-critical applications. "When one thinks about it," says
Murthy, "IBM built the original intranet transaction system inside the
company a long time ago, with distributed clients running all around the world
using CICS (Customer Information Control System) to carry out transactions
remotely and interact with mainframe databases. Our goal is to merge the
acknowledged capabilities of CICS and its associated legacy software with the
world of new media content." WOM technology is becoming available to
selected customers who face the challenge of running large, complex Web sites.
As the rollout continues, WOM will inevitably continue to grow closer to
realizing the vision of its developers and in the process change the nature of
networked computing.
More Information:
Serving Web Requests on the Fly
Scalability Without Limits
An Olympic Victory
Serving Web Requests on the Fly
When a traditional Web site receives a request from a browser, it returns a
previously composed page, which may be one of hundreds or thousands. Each page
stored at the site must be created separately, even though certain elements of
the pages (such as a common site logo, the navigation bar, and so on) may be the
same. In some instances, a user's request requires the output of a Common Gateway
Interface script, a program that can be invoked from a browser to run on the
server.
Web Object Manager (WOM) approaches both flat files and scripts in a very
different way. It replaces the concept of a Web page with an object called a
canvas, which provides a common look and feel to all the pages served by the Web
site. The canvas contains special tags for embedding page parts that constitute
the various components of the page. For example, page parts such as headings,
formatted in Hypertext Markup Language (HTML), are common to every page. Other
HTML parts contain content specific to the page or to the user who requests the
page. Finally, there are parts that invoke Java code that can perform business
logic functions, such as querying a database and returning the data in an HTML
format.
Each page part also contains a pointer that enables the actual part to be
retrieved on the fly from an underlying database. The fully constituted page is
then sent to the user. Page parts not only enable reuse of common presentation
and business logic elements across multiple applications, but they also allow
automatic customization. For example, a single page part representing a company's
balance sheet might render itself as an HTML table or a Java spreadsheet,
depending on the capabilities of the Internet browser or on the user's
preference.
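The assembly step described above can be illustrated with a minimal sketch. It is not WOM's actual implementation: the {{part:NAME}} tag syntax, the in-memory PARTS dictionary standing in for the underlying database, and the supports_tables flag are all assumptions made for the example.

```python
import re

# Hypothetical part store standing in for the underlying database.
# Each part maps a rendering variant to its markup.
PARTS = {
    "heading": {"default": "<h1>Example Site</h1>"},
    "nav": {"default": "<p>Home | Results | News</p>"},
    "scores": {
        "table": "<table><tr><td>USA</td><td>3</td></tr></table>",
        "default": "<pre>USA 3</pre>",
    },
}

# A canvas: the shared layout, with made-up {{part:NAME}} tags
# marking where each page part is embedded.
CANVAS = "<html><body>{{part:heading}}{{part:nav}}{{part:scores}}</body></html>"

TAG = re.compile(r"\{\{part:(\w+)\}\}")


def render(canvas: str, parts: dict, supports_tables: bool) -> str:
    """Replace every part tag with the markup chosen for this client."""
    variant = "table" if supports_tables else "default"

    def resolve(match):
        versions = parts[match.group(1)]
        # Fall back to the default variant when no tailored one exists.
        return versions.get(variant, versions["default"])

    return TAG.sub(resolve, canvas)
```

Calling render(CANVAS, PARTS, supports_tables=False) yields the same page with the scores shown as preformatted text rather than a table, while the heading and navigation parts are reused unchanged.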
In addition to the page parts themselves, which constitute the primary data,
WOM stores "metadata" about the objects. That can include a variety of
information, such as the name of the author who created the part, the date when
it was created, the date after which it is no longer valid, lists of names of
system users authorized to modify the part, and status information indicating
whether the part is being modified, reviewed and so forth. Different versions of
a part can be tracked in this way, and the entire task of managing and updating
the content of a Web site can be greatly simplified. WOM can also hold arbitrary
metadata associated with a part that can be used by an application, such as
showing confidential information only to approved users.
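One way to picture a part together with its metadata is the sketch below. The field names, the status values, and the "viewers" entry used for the confidentiality check are hypothetical; the text lists the kinds of metadata WOM stores but not how they are represented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class PagePart:
    """A page part plus the kinds of metadata listed above (names assumed)."""
    name: str
    body: str
    author: str
    created: date
    valid_until: Optional[date] = None          # date after which the part is no longer valid
    editors: set = field(default_factory=set)   # users authorized to modify the part
    status: str = "published"                   # e.g. "being_modified", "in_review"
    extra: dict = field(default_factory=dict)   # arbitrary application metadata

    def is_valid(self, today: date) -> bool:
        """True until the validity date has passed."""
        return self.valid_until is None or today <= self.valid_until

    def can_edit(self, user: str) -> bool:
        return user in self.editors

    def visible_to(self, user: str) -> bool:
        # Application-level rule: an optional "viewers" entry in the
        # arbitrary metadata restricts who may see the part.
        viewers = self.extra.get("viewers")
        return viewers is None or user in viewers
```

Keeping the application-specific rule in the open-ended extra dictionary mirrors the distinction the text draws between standard metadata and arbitrary metadata used by an application.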
Scalability Without Limits
The goal of scalability is to provide sufficient resources to meet demand
seamlessly, without any interruption of service or inconvenience to the client.
For that reason, simply replacing a computer with a more powerful one is not a
form of scalability, since in most cases that single machine will itself run out
of steam at some point.
Traditionally, Web sites have run on a single server. Individuals wishing to
access the site must know the name of the server. When the Internet name is
entered by the person interacting with the Web through a browser, the name is
sent to a domain name server (DNS) somewhere in the network, which "resolves"
the name into an Internet Protocol (IP) address and then sends it back to the
user's machine, the client. It is that address that is then sent on to the server
to initiate a session with a particular Web site.
Inherent in this scenario is the principle that every server has one and
only one address. If one adds additional servers, one would need to share a
single address among multiple machines. One would also need to implement a means
of sharing the load (the incoming requests) among the various servers. IBM
implemented such a scheme to meet these requirements for its Olympic Web site.
Once the address is returned to the client by the DNS, the request is sent
to a machine running a software technology known as Network Dispatcher (ND). ND
is not itself a Web server and so cannot return any data. "What it does,"
says Andy Stanford-Clark, who helped develop the scalability scheme, "is
pass on requests to backend servers. In this way, multiple servers can share a
single Internet address."
The ND also runs a load-balancing software program called Interactive
Session Support (ISS). Each backend server runs an ISS agent, which provides
feedback to the ND about the current load of the server. The ND then passes on
requests in a weighted round-robin fashion, starting with the server having the
smallest load.
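A weighted round-robin pass of this kind might look like the following sketch. The load scale (0.0 for idle, 1.0 for saturated), the formula that converts load into a weight, and the scale parameter are assumptions for illustration; the text does not say how ND derives its weights from the ISS feedback.

```python
def weights_from_load(loads: dict, scale: int = 4) -> dict:
    """Turn reported loads (0.0 idle .. 1.0 saturated) into integer weights.

    Every server keeps a weight of at least 1, so even a busy
    server still receives an occasional request.
    """
    return {s: max(1, round((1.0 - load) * scale)) for s, load in loads.items()}


def schedule(loads: dict, scale: int = 4) -> list:
    """Build one weighted round-robin cycle, least-loaded server first."""
    remaining = weights_from_load(loads, scale)
    order = sorted(loads, key=loads.get)  # smallest load first
    cycle = []
    while any(remaining.values()):
        for server in order:
            if remaining[server] > 0:
                cycle.append(server)
                remaining[server] -= 1
    return cycle
```

With reported loads of 0.75, 0.25 and 0.5 for three hypothetical servers a, b and c, the cycle is b, c, a, b, c, b: the lightly loaded server b handles three requests for every one sent to a.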
An ND and the backend servers constitute a cluster, and when needed, as it was
for www.atlanta.olympic.org during the 1996 summer games, a Web site can be scaled
even further by interconnecting a group of clusters, which can be at widely
separated locations. In this configuration, the ISS can emulate the function of
a domain name server. However, unlike an ordinary DNS, ISS enables the DNS to
associate a single Internet name with more than one IP address, each of which is
that of a particular cluster.
Each cluster's ISS agent sends continual updates on the status of the cluster
to ISS. When ISS receives a name resolution request from a client, it determines
which IP address should be returned. "In this way," notes
Stanford-Clark, "a single logical Web site defined by a specific www...
name is distributed across multiple physical locations."
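The name-resolution step can be sketched as follows. The selection rule (return the least-loaded cluster that is currently up) is an assumption, since the text says only that ISS "determines which IP address should be returned"; the class name, method names and addresses are likewise hypothetical.

```python
class NameResolver:
    """Maps one site name to several cluster addresses and picks one per query."""

    def __init__(self, name: str):
        self.name = name
        self.clusters = {}  # ip -> latest status reported by that cluster's agent

    def report(self, ip: str, load: float, up: bool = True) -> None:
        """Record a status update from a cluster's agent."""
        self.clusters[ip] = {"load": load, "up": up}

    def resolve(self, name: str) -> str:
        """Return the address of the least-loaded cluster that is up."""
        if name != self.name:
            raise KeyError(name)
        live = {ip: s["load"] for ip, s in self.clusters.items() if s["up"]}
        if not live:
            raise RuntimeError("no cluster available")
        return min(live, key=live.get)
```

Because each query is answered from the most recent status reports, successive clients asking for the same name can be handed different cluster addresses as loads shift.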
An Olympic Victory
To a degree unusual for technology barely out of the prototype stage, the
concept of the Web Object Manager (WOM) and the associated ideas for scalability
and application development have undergone a remarkable proof of concept. In the
spring and early summer of 1996, an IBM team based in Southbury, Connecticut,
used WOM to build the official Internet Web site of the Atlanta Olympic Games as
part of IBM's corporate sponsorship of the event.
The site, comprising 50 nodes of a distributed IBM RS/6000 SP scalable
parallel system spread over five locations and four countries, easily qualified
as the largest Web server in the world. Over 17 days, it logged 192 million
hits, with a peak of nearly 17 million in a single day. And each of these pages
was generated dynamically.
Even before the Olympics opened, a server based on the IBM Net.Commerce
server was used to sell more than $5 million worth of tickets to events at the
games over the Internet. During the games, a results system and server
technology developed by a team at Watson led by Paul Dantzig made it possible to
provide Internet users with real-time results for all of the 37 sports.
Most of the processing power was located at Southbury. But the four mirror
sites (at Cornell Theory Center in Ithaca, New York; the IBM Hursley Laboratory;
the University of Karlsruhe, near Mainz, Germany; and Keio University, near
Tokyo, Japan) served nearly exact replicas. The Sneak Peek application, however,
which furnished a constantly updated supply of images from 38 venues at the rate
of 21,000 per hour, was maintained only at Southbury. Audio and video clips, as
well as live audio from WGST Atlanta, were also made available using a streaming
technology known as Bamba (see "Net Results").
Personal home pages could also be created for visitors, based on WOM's
dynamic-page-generation capability and information supplied by a user's browser "cookie,"
which identifies the end user. By prompting users to specify the sports,
athletes and countries they were most interested in following, a Home Page
application would build a personalized page. Another personal touch was provided
by an online data mining tool that tracked users' "clicks" and streamed
all the data into DB2 tables by means of an 8-way IBM System/390® Sysplex.
The information was then used to compile a user profile so that a person could
request links to information based on his or her browsing patterns.
A particularly useful, if not essential, aid to navigating, as well as
managing, a large site is a way of categorizing the various
pages. Traditional HTML hyperlinks provide one way of doing this. But the
metadata category tags in the WOM database made it possible to build an "information
space," in which pages that contain conceptually similar material would be
found "close" together. That would allow someone following, say, the
Canoeing pages to readily look at those related to Kayaking, although there
might be no HTML link between them.
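The idea of pages lying "close" together can be made concrete with a small sketch. Measuring closeness as the overlap between two pages' sets of category tags (Jaccard similarity) is an assumption chosen for illustration; the text does not describe how the Olympics site actually computed it.

```python
def closeness(tags_a: set, tags_b: set) -> float:
    """Jaccard similarity: shared tags divided by all tags on either page."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0


def nearest(page: str, catalog: dict, k: int = 3) -> list:
    """Up to k other pages, ranked by tag overlap with the given page."""
    scores = [
        (closeness(catalog[page], tags), other)
        for other, tags in catalog.items()
        if other != page
    ]
    # Highest similarity first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda pair: (-pair[0], pair[1]))
    # Pages sharing no tags at all are not considered neighbors.
    return [other for score, other in ranked[:k] if score > 0]
```

Under this measure, a canoeing page and a kayaking page that share most of their tags rank as near neighbors even when neither links to the other.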
These features represent just a few of the applications developed for the
Olympics site. They illustrate the power of the WOM architecture to enhance the
usefulness of the Web and simplify its management.