IBM Research

Think Research


 


Featured Concept
Managing the Web


In Brief:

Handling a high-profile Web site can be a Webmaster's nightmare. However, the Atlanta Olympics Web site at the 1996 summer games ran like a dream, handling nearly 200 million hits over the course of 17 days. It was created and managed using a new, evolving IBM technology called Web Object Manager (WOM). WOM builds pages dynamically, as they are requested, from "objects" stored in an underlying database, even customizing them to the needs of the user or the characteristics of the client's browser. The Olympics site had some 35,000 objects. WOM also incorporates a new approach to load balancing and scalability, which allows additional processing power to be easily added to an existing site.


In what seems like an almost inconceivably short time, the Internet has changed from a useful resource for the few to an indispensable tool for the many. Much of the credit goes to the introduction of the World Wide Web, which provides hyperlinked access to data and programs on a vast number of servers around the world. Yet, despite their growing centrality in our lives, both the Internet and the Web remain mysterious, more easily talked about and used than understood. Nor is that surprising. One cannot point to the Internet and say it is here or it is there. Rather, like the ghost of Hamlet's father, it is here and there and, indeed, everywhere. It is natural, then, that efforts to characterize the advance of Internet technology tend to seize on the most visible elements. Browsers, new content and electronic commerce, along with the perennial quest for greater bandwidth, appear to be the key factors determining the future of the Web. While no one would deny that these are among the forces driving the popularity of the Internet, the continued expansion of the Web depends on addressing a quite different set of issues.

As more than one Webmaster charged with creating and running a Web site has discovered, popularity can pose problems of its own. The goal of managing content and presenting site "visitors" with information tailored to their needs grows ever more elusive as the site expands and as the number of users increases. The more pages a site creates, the more work there is in updating them, ensuring consistency, and simply keeping track of what's already been created. At the same time, as Web sites are called on to handle not just a greater number of daily hits for simple files but also requests that require the server to execute programs, such as traditional Common Gateway Interface (CGI) scripts, the pressure on resources can soar. Web servers designed to meet an anticipated level of service may work admirably up to a point and then stumble or crash under a sudden flurry of hits because they are unable to grow, or be scaled up, in a simple way.

Content management and scalability are two of the key problems that the Internet now faces. What is the solution? The desperate one of simply throwing more programmers and more hardware at the problem can be extremely costly and inefficient: ultimately, the need for new content and updates will always outstrip the practical resources available, whereas hardware brought in to handle peak loads sits idle the rest of the time. The first steps toward a novel solution began to emerge at IBM in 1994. Sean Martin, a member of IBM's UK Client Server Group based in London, and Frank Schwichtenberg, of IBM Germany, helped create a Web site, AixServe, for the Europe-wide Installation Support Center that would provide product information and technical help. As he focused on the task, Martin was struck by the existing limitations of Web servers.

A Web site, he realized, is essentially a collection of digital files (texts, images, programs, and so forth) that are typically treated as independent entities. The same navigation bar, for example, may appear on all the pages, but it has to be recoded each time. In the course of discussing the problem with IBM Austin's Rob Gore, the architect of AixServe, it dawned on Martin that there was a better way. How much simpler it would be, he reasoned, if the pieces needed to assemble a page were regarded as the basic entities, and pages were created from a pool of reusable parts each time they were requested.

By the end of 1994, Martin and Schwichtenberg had implemented the first version of a technology that treated Web pages as entities that can be generated dynamically on the fly (see "Serving Web Requests on the Fly," below). That pioneering Web server was the basis of what has come to be called Web Object Manager (WOM).

A match for any sport

In early 1995, Martin gave a talk at an internal Internet meeting in Thornwood, New York. As a result, he soon received an invitation to work on IBM's 1996 Summer Olympics project, which brought together leaders in Internet technology from around the company to create the games' official Web site (see "An Olympic Victory," below). The group "warmed up" by working on other sporting events, starting with a Web site for the 1995 Wimbledon tennis tournament in June.

New challenges they faced at that time included the quirkiness of different browsers: not all could display Web pages created with a given version of Hypertext Markup Language (HTML), the set of tags used to format a Web page. "For Wimbledon," says Martin, who is currently in the United States on assignment at the Internet Division, "we added a new level of personalization to the capability of dynamically generating pages on the fly. By merging a request for specific information with the HTML format required by a client's browser, a customized page was sent to the user."

It became immediately apparent that dynamic page generation was especially attractive for such rapidly changing sites, in which scores need to be continually updated. Together with Andy Stanford-Clark, a member of the UK's Hursley Laboratory currently on assignment at the Thomas J. Watson Research Center, Martin wrote a new version of WOM for the PGA golf tournament in August, and still another version for the U.S. Open tennis tournament the following month. A core technical team at Watson has continued to develop the underlying concepts of WOM, and Chet Murthy has now rewritten the code in Java on top of IBM's DB2 relational database. As WOM evolves, the advantages of its ability to simplify content authoring and management have become increasingly evident. "Beyond the immediate benefits to Webmasters, the kind of personalization WOM makes possible represents a huge extension of the most natural and desirable way of doing business," says Scott Penberthy, one of the WOM team co-leaders at Watson. "In the long run, it will allow companies to have a one-to-one relationship with customers on a massive scale."

Balancing the load

Popular sporting events make for popular Web sites. In addition to the normal heavy demand experienced by the site servers, key matches often generate huge peaks. WOM itself adds to the computational load because of its dynamic page generation capability. With the request to build multiple sporting event Web sites staring him in the face, Martin began thinking about ways to allow the server to run on multiple processors.

For Wimbledon, there wasn't time to do more than modify traditional load-balancing techniques, which were used to shift requests between a server in White Plains, New York, and one at the Hursley Lab, in the south of England. But Martin did introduce the first version of WOMbat. With this technique, each time a request is received from a new user, all the servers supporting a site "ping" the client machine, just like a bat. The first server to receive an "echo" then becomes the default server for that user. In August, while preparing for the PGA golf tournament, Martin teamed up with Stanford-Clark to design a more general load-balancing scheme. The previous January, Stanford-Clark, then at Hursley's Commercial Parallel Center building parallel applications for customers using the IBM SP scalable parallel system, had had his own epiphany. "I saw that, as people began putting up Web sites with more advanced applications, such as catalogs, there would be a need for a scalable Web server," he recalls.
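The WOMbat rule is simple enough to simulate in a few lines of Python. This is a minimal sketch, not IBM's code: the class and method names are hypothetical, and the round-trip times are passed in as made-up numbers rather than measured with real pings. What it shows is the rule itself: the fastest echo wins, and that choice then sticks for the user.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WombatSelector:
    """Toy model of WOMbat: the first server to hear an echo wins.

    Round-trip times are supplied by the caller (hypothetical values);
    a real deployment would measure them by actually pinging the client.
    """

    defaults: dict[str, str] = field(default_factory=dict)  # client -> server

    def server_for(self, client: str, rtt_ms: dict[str, float]) -> str:
        # A returning client keeps the server chosen on first contact.
        if client in self.defaults:
            return self.defaults[client]
        # Every server "pings" the new client; the shortest round trip
        # means that server's echo arrives first, so it becomes the default.
        winner = min(rtt_ms, key=rtt_ms.get)
        self.defaults[client] = winner
        return winner


selector = WombatSelector()
print(selector.server_for("client-a", {"white-plains": 95.0, "hursley": 12.0}))
print(selector.server_for("client-a", {"white-plains": 5.0, "hursley": 80.0}))
```

Both calls print `hursley`: the second call ignores the new timings because the user already has a default server, which is what keeps a session on one machine.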

Using load-leveling software called Interactive Session Support (ISS) that Stanford-Clark and Graham Wallis had written a couple of years before, Martin and Stanford-Clark designed and built a scalable Web server based on an SP. "ISS allows a number of computers, such as the various processor nodes of an SP, to behave as though they were a single machine," says Stanford-Clark. A key additional component, which enabled essentially unlimited scalability, was a new kind of router code, recently renamed Network Dispatcher, that had been developed earlier at Watson (see "Scalability Without Limits," below).

Linking to legacy data

In the course of its evolution, the vision of what WOM will enable has considerably enlarged. Martin credits Murthy with making him aware of WOM's true potential. "Chet saw how WOM might become an Internet middleware layer that would allow applications to find their home in the WOM programming environment," he explains.

In Murthy's vision, WOM provides the means to link users to legacy, or enterprise, data, most of which resides on mainframes. Such a link is important both for intranets (companywide networks based on Internet technology) and for the Internet as a whole. "For example," says Jim Russell, the other co-leader of the Watson WOM team, "it might be desirable to allow employees to access their 401K account data online, not only to look at their balance but to modify their allocations. That means having an intranet application that enables one to access the database, have a 'page' of data sent to one, modify it and send back the updated version."

Potentially, transactions over the Internet could be even more far-reaching. They could range from generating a personalized "catalog" from a retailer's stock to transferring funds in and out of one's bank account or making and paying for an airline reservation. For that to happen, two things are needed: gateways to access legacy data, and Web applications with the robustness and reliability that users have come to expect of their mission-critical applications. "When one thinks about it," says Murthy, "IBM built the original intranet transaction system inside the company a long time ago, with distributed clients running all around the world using CICS (Customer Information Control System) to carry out transactions remotely and interact with mainframe databases. Our goal is to merge the acknowledged capabilities of CICS and its associated legacy software with the world of new media content." WOM technology is becoming available to selected customers who face the challenge of running large, complex Web sites. As the rollout continues, WOM should move steadily closer to realizing the vision of its developers and, in the process, change the nature of networked computing.



More Information:

Serving Web Requests on the Fly

Scalability Without Limits

An Olympic Victory


Serving Web Requests on the Fly

When a traditional Web site receives a request from a browser, it returns a previously composed page, which may be one of hundreds or thousands. Each page stored at the site must be created separately, even though certain elements of the pages (such as a common site logo, the navigation bar, and so on) may be the same. In some instances, a user's request requires the output of a Common Gateway Interface (CGI) script, a program that can be invoked from a browser to run on the server.

Web Object Manager (WOM) approaches both flat files and scripts in a very different way. It replaces the concept of a Web page with an object called a canvas, which provides a common look and feel to all the pages served by the Web site. The canvas contains special tags for embedding page parts that constitute the various components of the page. For example, page parts such as headings, formatted in Hypertext Markup Language (HTML), are common to every page. Other HTML parts contain content specific to the page or to the user who requests the page. Finally, there are parts that invoke Java code that can perform business logic functions, such as querying a database and returning the data in an HTML format.
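To make the canvas idea concrete, here is a minimal Python sketch of assembling a page from parts. It is an illustration under stated assumptions, not WOM's implementation: the `<part name="..."/>` tag syntax is hypothetical (the article does not give WOM's real tags), a plain dictionary stands in for the underlying database, and a Python function stands in for a part that runs business logic.

```python
from __future__ import annotations

import re
from typing import Callable, Union

# A part is either static HTML or a function producing HTML for a request.
Part = Union[str, Callable[[dict], str]]

# Hypothetical tag syntax; WOM's actual canvas tags are not documented here.
PART_TAG = re.compile(r'<part\s+name="([^"]+)"\s*/>')


def render_canvas(canvas: str, parts: dict[str, Part], request: dict) -> str:
    """Replace each part tag in the canvas with that part's HTML."""

    def expand(match: re.Match) -> str:
        part = parts[match.group(1)]  # stand-in for a database lookup
        # Logic parts run per request; static HTML parts are reused as-is.
        return part(request) if callable(part) else part

    return PART_TAG.sub(expand, canvas)


canvas = '<html><body><part name="heading"/><part name="greeting"/></body></html>'
parts: dict[str, Part] = {
    "heading": "<h1>Results</h1>",                              # shared by every page
    "greeting": lambda req: f"<p>Welcome, {req['user']}</p>",   # per-user content
}
print(render_canvas(canvas, parts, {"user": "Ada"}))
```

Because the heading lives in one place, changing it changes every page built from the canvas, which is the maintenance saving the article describes.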

Each page part also contains a pointer that enables the actual part to be retrieved on the fly from an underlying database. The fully constituted page is then sent to the user. Page parts not only enable reuse of common presentation and business logic elements across multiple applications, but they also allow automatic customization. For example, a single page part representing a company's balance sheet might render itself as an HTML table or a Java spreadsheet, depending on the capabilities of the Internet browser or on the user's preference.
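The balance-sheet example can be sketched as a single part with two renderings. This Python fragment assumes a hypothetical set of capability flags already inferred from the browser, and the applet class name `Spreadsheet.class` is purely illustrative; the article does not say how WOM detected capabilities or which applet it used.

```python
from __future__ import annotations

from html import escape


def render_balance_sheet(rows: list[tuple[str, int]], caps: set[str],
                         prefers_table: bool = False) -> str:
    """One page part, two renderings, chosen per client.

    'caps' is a hypothetical set of browser capabilities; the applet
    class name below is an illustrative placeholder.
    """
    if "java" in caps and not prefers_table:
        # Java-capable browsers get an interactive spreadsheet applet,
        # with the figures passed in as applet parameters.
        params = "".join(
            f'<param name="row{i}" value="{escape(item)}:{amount}">'
            for i, (item, amount) in enumerate(rows)
        )
        return f'<applet code="Spreadsheet.class">{params}</applet>'
    # Other browsers, or users who prefer it, get a plain HTML table.
    cells = "".join(
        f"<tr><td>{escape(item)}</td><td>{amount}</td></tr>" for item, amount in rows
    )
    return f"<table>{cells}</table>"


rows = [("Cash", 120), ("Debt", -40)]
print(render_balance_sheet(rows, {"tables"}))
print(render_balance_sheet(rows, {"tables", "java"}))
```

The data is stored once; only the presentation branches, so adding a third rendering would not touch any page that uses the part.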

In addition to the page parts themselves, which constitute the primary data, WOM stores "metadata" about the objects. That can include a variety of information, such as the name of the author who created the part, the date when it was created, the date after which it is no longer valid, lists of names of system users authorized to modify the part, and status information indicating whether the part is being modified, reviewed and so forth. Different versions of a part can be tracked in this way, and the entire task of managing and updating the content of a Web site can be greatly simplified. WOM can also hold arbitrary metadata associated with a part that can be used by an application, such as showing confidential information only to approved users.
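A metadata record of the kind described above might look like the following Python sketch. The field names, the status strings and the `"confidential"` key are assumptions made for illustration; the article lists the kinds of information WOM kept, not its schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class PartMetadata:
    """Illustrative metadata for one page part (field names are assumptions)."""

    author: str
    created: date
    expires: date | None = None                     # no longer valid after this date
    editors: set[str] = field(default_factory=set)  # users allowed to modify the part
    status: str = "published"                       # e.g. "draft", "in review"
    version: int = 1
    extra: dict[str, str] = field(default_factory=dict)  # application-defined metadata

    def is_live(self, today: date) -> bool:
        # Serve the part only while published and not past its expiry date.
        return self.status == "published" and (self.expires is None or today <= self.expires)

    def can_edit(self, user: str) -> bool:
        return user in self.editors

    def visible_to(self, clearances: set[str]) -> bool:
        # Application-level use of arbitrary metadata: a confidentiality label.
        label = self.extra.get("confidential")
        return label is None or label in clearances


meta = PartMetadata("jdoe", date(1996, 7, 1), expires=date(1996, 8, 4),
                    editors={"jdoe"}, extra={"confidential": "staff"})
print(meta.is_live(date(1996, 7, 20)), meta.is_live(date(1996, 9, 1)))
```

Keeping these checks next to the part, rather than in each page, is what lets a site retire or restrict content in one place.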


Scalability Without Limits


The goal of scalability is to provide sufficient resources to meet demand seamlessly, without any interruption of service or inconvenience to the client. For that reason, simply replacing a computer with a more powerful one is not a form of scalability, since in most cases that single machine will itself run out of steam at some point.

Traditionally, Web sites have run on a single server. Individuals wishing to access the site must know the name of the server. When the Internet name is entered by the person interacting with the Web through a browser, the name is sent to a domain name server (DNS) somewhere in the network, which "resolves" the name into an Internet Protocol (IP) address and then sends it back to the user's machine, the client. The client then uses that address to contact the server and initiate a session with a particular Web site.

Inherent in this scenario is the principle that every server has one and only one address. To add servers, one would need to share a single address among multiple machines. One would also need a means of sharing the load (the incoming requests) among the various servers. IBM implemented a scheme meeting these requirements for its Olympic Web site.

Once the DNS has returned the address to the client, the request is sent to a machine running a software technology known as Network Dispatcher (ND). ND is not itself a Web server and so cannot return any data. "What it does," says Andy Stanford-Clark, who helped develop the scalability scheme, "is pass on requests to backend servers. In this way, multiple servers can share a single Internet address."

The ND also runs a load-balancing software program called Interactive Session Support (ISS). Each backend server runs an ISS agent, which provides feedback to the ND about the current load of the server. The ND then passes on requests in a weighted round-robin fashion, starting with the server having the smallest load.
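Weighted round-robin driven by load reports can be sketched in Python as follows. This uses the "smooth" weighted round-robin variant, and the rule that converts a reported load into a weight (spare capacity, 100 minus load percent) is an assumption; the article says only that ND weights its rotation by the load the ISS agents report.

```python
from __future__ import annotations


class WeightedRoundRobin:
    """Smooth weighted round-robin fed by backend load reports.

    The weight rule (100 - load percent, minimum 1) is an assumption
    made for illustration, not Network Dispatcher's actual formula.
    """

    def __init__(self) -> None:
        self.weights: dict[str, int] = {}
        self.current: dict[str, int] = {}

    def report_load(self, server: str, load_percent: int) -> None:
        # Called whenever a backend's ISS agent sends fresh feedback.
        self.weights[server] = max(1, 100 - load_percent)
        self.current.setdefault(server, 0)

    def next_server(self) -> str:
        if not self.weights:
            raise RuntimeError("no backend servers registered")
        total = sum(self.weights.values())
        # Every server accumulates credit in proportion to its weight...
        for server, weight in self.weights.items():
            self.current[server] += weight
        # ...the one with the most credit gets the request and pays back the total.
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= total
        return chosen


rr = WeightedRoundRobin()
rr.report_load("node-a", 75)   # busy node: weight 25
rr.report_load("node-b", 25)   # lightly loaded node: weight 75
print([rr.next_server() for _ in range(4)])  # ['node-b', 'node-a', 'node-b', 'node-b']
```

The first request goes to the least-loaded server, and over each cycle of four requests the lightly loaded node receives three, while the busy one still gets some traffic instead of being starved.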

An ND and its backend servers constitute a cluster, and when needed (as it was for www.atlanta.olympic.org during the 1996 summer games), a Web site can be scaled even further by interconnecting a group of clusters, which can be at widely separated locations. In this configuration, ISS can emulate the function of a domain name server. However, unlike an ordinary DNS, ISS can associate a single Internet name with more than one IP address, each of which is that of a particular cluster.

Each cluster's ISS agent sends continual updates on the status of its cluster to ISS. When ISS receives a name resolution request from a client, it determines which IP address should be returned. "In this way," notes Stanford-Clark, "a single logical Web site defined by a specific www... name is distributed across multiple physical locations."
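The cluster-level name resolution can be sketched the same way. In this Python fragment the class names are hypothetical, the host name and addresses come from reserved documentation ranges rather than the real Olympic configuration, and the selection policy (answer with the least-loaded cluster that is up) is an assumption, since the article does not say how ISS chose among clusters.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ClusterStatus:
    ip: str          # address of the cluster's Network Dispatcher
    load: float      # 0.0 (idle) to 1.0 (saturated), as reported by its agent
    up: bool = True


class ClusterNameServer:
    """One logical name, many cluster addresses (illustrative policy)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.clusters: dict[str, ClusterStatus] = {}

    def update(self, cluster_id: str, status: ClusterStatus) -> None:
        # Each cluster's agent pushes its latest status here.
        self.clusters[cluster_id] = status

    def resolve(self, name: str) -> str:
        if name != self.name:
            raise KeyError(name)
        live = [c for c in self.clusters.values() if c.up]
        if not live:
            raise RuntimeError("no cluster available")
        # Assumed policy: answer with the least-loaded cluster that is up.
        return min(live, key=lambda c: c.load).ip


ns = ClusterNameServer("www.example.org")
ns.update("east", ClusterStatus("192.0.2.10", load=0.8))
ns.update("europe", ClusterStatus("198.51.100.20", load=0.3))
print(ns.resolve("www.example.org"))
```

Because the choice is made at name-resolution time, a cluster that goes down simply stops being handed out, and clients never see a different Web site name.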


An Olympic Victory

To a degree unusual for technology barely out of the prototype stage, the concept of the Web Object Manager (WOM) and the associated ideas for scalability and application development have undergone a remarkable proof of concept. In the spring and early summer of 1996, an IBM team based in Southbury, Connecticut, used WOM to build the official Internet Web site of the Atlanta Olympic Games as part of IBM's corporate sponsorship of the event.

The site, comprising 50 nodes of a distributed IBM RS/6000 SP scalable parallel system spread over five locations in four countries, easily qualified as the largest Web server in the world. Over 17 days, it logged 192 million hits, with a peak of nearly 17 million in a single day. And every one of the pages served was generated dynamically.
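To put those figures in perspective, here is a back-of-the-envelope calculation using only the numbers above. It assumes hits were spread evenly over each day, which they certainly were not, so real instantaneous peaks were higher:

```latex
\bar{h}_{\text{day}} = \frac{192 \times 10^{6}\ \text{hits}}{17\ \text{days}} \approx 11.3 \times 10^{6}\ \text{hits/day},
\qquad
r_{\text{peak day}} \approx \frac{17 \times 10^{6}\ \text{hits}}{86\,400\ \text{s}} \approx 197\ \text{hits/s}
```

The busiest day therefore carried roughly 1.5 times the average daily load, the kind of swing the scalability scheme was designed to absorb.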

Even before the Olympics opened, a server based on the IBM Net.Commerce server was used to sell more than $5 million worth of tickets to events at the games over the Internet. During the games, a results system and server technology developed by a team at Watson led by Paul Dantzig made it possible to provide Internet users with real-time results for all of the 37 sports.

Most of the processing power was located at Southbury. But the four mirror sites (at Cornell Theory Center in Ithaca, New York; the IBM Hursley Laboratory; the University of Karlsruhe, Germany; and Keio University, near Tokyo, Japan) served nearly exact replicas. The Sneak Peek application, however, which furnished a constantly updated supply of images from 38 venues at the rate of 21,000 per hour, was maintained only at Southbury. Audio and video clips, as well as live audio from WGST Atlanta, were also made available using a streaming technology known as Bamba (see "Net Results").

Personal home pages could also be created for visitors, based on WOM's dynamic-page-generation capability and on information supplied by a user's browser "cookie," which identifies the end user. After prompting users to specify the sports, athletes and countries they were most interested in following, a Home Page application would build a personalized page. Another personal touch was provided by an online data mining tool that tracked users' "clicks" and streamed all the data into DB2 tables by means of an 8-way IBM System/390® Sysplex. The information was then used to compile a user profile so that a person could request links to information based on his or her browsing patterns.
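The combination of stated interests and click history can be sketched in a few lines of Python. This is a toy stand-in for the DB2 and Sysplex pipeline described above: the in-memory click list, the function names and the ranking rule (stated interests first, then the most-clicked other categories) are all assumptions for illustration.

```python
from __future__ import annotations

from collections import Counter


def build_profile(clicks: list[str]) -> Counter[str]:
    """Count how often a user clicked into each category (e.g. a sport)."""
    return Counter(clicks)


def home_page_links(stated: list[str], clicks: list[str], k: int = 2) -> list[str]:
    """Stated interests first, then the k most-clicked categories not already listed."""
    profile = build_profile(clicks)
    extra = [category for category, _ in profile.most_common() if category not in stated]
    return stated + extra[:k]


clicks = ["Diving", "Rowing", "Diving", "Canoeing", "Rowing", "Diving", "Judo"]
print(home_page_links(["Canoeing"], clicks))  # ['Canoeing', 'Diving', 'Rowing']
```

The cookie's role in this picture is only to identify which stored interests and click history belong to the person requesting the page.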

A way of categorizing the various pages is particularly useful, if not essential, for navigating and managing a large site. Traditional HTML hyperlinks provide one way of doing this. But the metadata category tags in the WOM database made it possible to build an "information space," in which pages containing conceptually similar material would be found "close" together. That would allow someone following, say, the Canoeing pages to readily look at those related to Kayaking, even though there might be no HTML link between them.
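One simple way to turn category tags into a notion of "closeness" is to measure how much two pages' tag sets overlap. The article does not say how WOM computed distance in its information space, so the Jaccard similarity used in this Python sketch is a swapped-in stand-in, and the page names and tags are invented examples.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two tag sets: 1.0 means identical, 0.0 means disjoint."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def nearest_pages(page: str, tags: dict[str, set[str]], k: int = 3) -> list[str]:
    """Pages sharing the most category tags with 'page', closest first."""
    scored = [
        (jaccard(tags[page], other_tags), other)
        for other, other_tags in tags.items()
        if other != page
    ]
    # Drop pages with no tags in common; sort by similarity, then by name.
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:k]]


tags = {
    "canoeing": {"water", "paddle", "boat"},
    "kayaking": {"water", "paddle", "boat", "whitewater"},
    "rowing": {"water", "boat", "oar"},
    "judo": {"combat", "mat"},
}
print(nearest_pages("canoeing", tags))  # ['kayaking', 'rowing']
```

No hyperlink connects the canoeing and kayaking pages here; they are found together purely because their metadata overlaps.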

These features represent just a few of the applications developed for the Olympics site. They illustrate the power of the WOM architecture to enhance the usefulness of the Web and simplify its management.



