Web Technologies Research

Web


IBM Research has been active in virtually all areas of Web-related research since the Web’s inception. This work has included serving dynamic data at some of the most highly accessed Web sites in the world, developing new techniques for searching and characterizing the Web, contributing to standards that affect the entire industry, creating the infrastructure that allows the Web to move to XML, developing cooperative and collaborative applications enabled by the Web, and enabling Web access from mobile devices.

Web performance

We have pioneered new techniques for efficiently serving dynamic content. Our work has addressed the challenging problem of publishing large amounts of dynamically changing content while handling high peak loads. The techniques resulting from our research have been deployed at some of the most highly accessed Web sites in the world. Over the last few years, these sites have maintained 100% availability despite increases of orders of magnitude in publishing volumes and hit rates.

Other Web performance work has focused on caching, load balancing, Web server acceleration, Web traffic characterization, optimization and benchmarking of Web server performance, transparent discovery of network performance, and server support for gigabit networking.

XML infrastructure

We are creating the infrastructure for the Web to move to XML as its data exchange format and for much of its data persistence. Several directions are being explored, including 1) a modular XHTML system that will integrate SMIL, XForms, MathML, and P3P into HTML; 2) the ability to separate information content from information rendering and to recombine them using the XSL stylesheet language; and 3) support for application- or industry-specific markup vocabularies, described with the XML Schema language and queried with the XML Query language.

Collaboration and awareness

To aid collaboration, we have developed Livemaps, a tool for high-level awareness and collaboration that projects live information onto a Web site map. We are also investigating how to overcome two of the most difficult aspects of remote collaboration: common ground and group focus. This work involves modeling group behavior in collaborative environments and using these models to inform the design of improved collaboration spaces. Another project focuses on the "distance" among people in a collaborative environment: breaking isolation and providing group awareness while preserving privacy. In our approach, awareness is provided by multi-device event perception, and distance is adjusted by multiagent cooperation and negotiation. We are exploring multiagent negotiation methods that allow the "distances" among people to be adjusted intelligently.

Information retrieval

Another effort designs techniques to deal with information overload in today's interconnected world of Web and Intranet servers. Most of us react to the information explosion by reading only relevant and authoritative material. Relevance can be characterized by the documents that the user has seen or liked, and by their link structure. Authority or quality can be attributed to documents based on hyperlink citations. Various techniques based on machine learning and graph algorithms are being used to mine documents in large hypertext databases for relevance and quality.
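As an illustration of how hyperlink citations can be turned into authority scores, the following is a minimal sketch of a hubs-and-authorities style iteration on a toy link graph. It is not presented as the exact method used in this work; the function name hits, the iteration count, and the sample graph are all assumptions made for the example.

```python
from math import sqrt


def hits(links, iterations=50):
    """Return (hub, authority) score dicts for a directed link graph.

    `links` maps each page to the list of pages it cites.
    The name, signature, and iteration count are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages citing it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # A page's hub score is the sum of the authority scores it cites.
        hub = {p: sum(auth[d] for d in links.get(p, ())) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical toy graph: three pages cite "c", and only "a" also cites "d".
toy_links = {"a": ["c", "d"], "b": ["c"], "e": ["c"]}
hub_scores, auth_scores = hits(toy_links)
print(sorted(auth_scores, key=auth_scores.get, reverse=True))
```

In this toy graph the most-cited page receives the highest authority score, and the page that cites the most authoritative pages receives the highest hub score.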

We are currently extending these methods to WWW image retrieval. By analyzing the page-to-image as well as the page-to-page link structure, we are able to retrieve relevant images based on text queries. Additionally, we can locate image containers and image hubs, which are defined as Web pages that are rich in relevant images and Web pages from which many images are readily accessible, respectively.
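To make the two notions concrete, here is a deliberately simple counting sketch, assuming that a container is scored by the relevant images a page holds and a hub by the relevant images one link away. This is a stand-in heuristic rather than the link-analysis technique used in the project; the function name image_scores, its parameters, and the sample data are hypothetical.

```python
def image_scores(page_images, page_links, relevant):
    """Score pages as image containers and image hubs by simple counting.

    page_images: page -> list of images embedded in that page
    page_links:  page -> list of pages it links to
    relevant:    set of images judged relevant to the query
    All names and the scoring rule are illustrative assumptions.
    """
    pages = set(page_images) | set(page_links)
    containers, hubs = {}, {}
    for page in pages:
        # Relevant images embedded directly in the page.
        own = relevant & set(page_images.get(page, ()))
        # Relevant images embedded in pages one link away.
        reachable = set()
        for target in page_links.get(page, ()):
            reachable |= relevant & set(page_images.get(target, ()))
        containers[page] = len(own)
        hubs[page] = len(reachable - own)
    return containers, hubs


# Hypothetical sample data.
sample_images = {"gallery": ["img1", "img2", "img3"], "index": [], "blog": ["img4"]}
sample_links = {"index": ["gallery", "blog"], "gallery": [], "blog": ["gallery"]}
sample_relevant = {"img1", "img2", "img4"}

container_scores, hub_scores = image_scores(sample_images, sample_links, sample_relevant)
print(container_scores, hub_scores)
```

Here "gallery" scores highest as a container because it embeds the most relevant images, while "index" scores highest as a hub because it links to pages that hold them.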

Web access from mobile devices

We are investigating the needs of mobile knowledge seekers, who typically need to access information from the Web when they are away from their desktops. The constraints imposed by mobile devices, such as slow communication and limited form factor, often make information discovery tasks impractical. We have developed a new focused-search approach tailored to the mode of work and the constraints dictated by mobile devices. It combines focused search within specific topics with encapsulation of topic-specific information in a persistent repository. The repositories are based on "knowledge-agent bases" that comprise all the information necessary to access information about a topic and that assist in the full search process, from query formulation to result scanning, on the device itself.

Related Publications  

"The Connectivity Sonar: Detecting Site Functionality by Structural Patterns", Einat Amitay, David Carmel, Adam Darlow, Ronny Lempel, Aya Soffer, Proceedings of the 2003 ACM Hypertext Conference (HT '03), ACM Nelson award for best paper by a Hypertext conference newcomer.

"Energy Conservation Policies for Web Servers", Elmootazbellah Elnozahy, Mike Kistler, Ramakrishnan Rajamony, Proceedings of the 2003 USENIX Symposium on Internet Technologies and Systems (USITS 2003)

"Application Specific Data Replication for Edge Services", Lei Gao, Mike Dahlin, Amol Nayate, Jiandan Zheng, Arun K. Iyengar, Proceedings of the 12th International World Wide Web Conference (WWW 2003)

"Kernel Support for Faster Web Proxies," Marcel Rosu, Daniela Rosu, Proceedings of the 2003 USENIX Annual Technical Conference (USENIX 2003)


Figure: Architecture of the focused-search service for mobile users

Related Research  

Disciplines: Computer Science
Research Areas: Web
Research Labs: Almaden Research Center, Haifa Research Lab