Operating Systems: Selected Papers

An Infrastructure for Multiprocessor Run-Time Adaptation, Jonathan Appavoo, Kevin Hui, Michael Stumm, Robert Wisniewski, Dilma da Silva, Orran Krieger, Craig Soules, WOSS 2002.

As computer systems become more complex, they become more difficult to administer properly. Modern systems are so complex that special training is needed to configure and maintain them, and this complexity is continuing to increase. Autonomic computing systems address this problem by managing themselves. Ideal autonomic systems just work, configuring and tuning themselves as needed. In this work we address the operating system issues in designing and implementing autonomic systems. This work has tremendous potential to produce machines that are easy to maintain, perform better across a wider variety of applications, and are more secure and available.


An Overview of the BlueGene/L Supercomputer. Jose Moreira, NR Adiga et al. Supercomputing Conference (SC2002)

This paper gives an overview of the BlueGene/L Supercomputer. The project is a jointly funded research partnership between IBM and the Lawrence Livermore National Laboratory as part of the United States Department of Energy ASCI Advanced Architecture Research Program. Application performance and scaling studies have recently been initiated with partners at a number of academic and government institutions, including the San Diego Supercomputer Center and the California Institute of Technology. This massively parallel system of 65,536 nodes is based on a new architecture that exploits system-on-a-chip technology to deliver target peak processing power of 360 teraFLOPS (trillion floating-point operations per second). The machine is scheduled to be operational in the 2004-2005 time frame, at price/performance and power consumption/performance targets unobtainable with conventional architectures. This paper is important from an operating system perspective because it describes the system software approach to achieving extreme scalability: we use full-blown operating systems (Linux) on 1,024 I/O nodes. Each Linux operating system image controls 64 attached compute nodes through a lightweight compute node kernel that executes on each compute node. Therefore, we provide the view of a large but manageable machine size (1,024 nodes) while really offering 64k compute engines.
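
The node counts quoted above can be checked with a few lines of arithmetic. The following is only a sketch: the node counts and the peak figure come from the abstract, while the per-node peak is a derived value that assumes the target peak is spread evenly over the compute nodes, which the abstract does not state.

```python
# Figures quoted in the abstract above.
IO_NODES = 1_024
COMPUTE_NODES_PER_IO_NODE = 64
TARGET_PEAK_TFLOPS = 360

# Each Linux I/O node fronts 64 compute nodes.
compute_nodes = IO_NODES * COMPUTE_NODES_PER_IO_NODE

# Assumption (not from the abstract): peak is divided evenly across compute nodes.
per_node_gflops = TARGET_PEAK_TFLOPS * 1_000 / compute_nodes

print(compute_nodes)              # 65536, i.e. the "64k compute engines"
print(round(per_node_gflops, 2))  # 5.49
```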


"An Observation-based Approach Towards Self-Managing Web Servers", Prashant Pradhan, Renu Tewari, Sambit Sahu (IBM Research), Abhishek Chandra, Prashant Shenoy (UMass), in Proceedings of IWQoS 2002, Miami Beach, Florida.

Web server architectures that provide performance isolation, service differentiation, and QoS guarantees rely on external administrators to set the right parameter values for the desired performance. Due to the complexity of handling varying workloads and bottleneck resources, configuring such parameters optimally becomes a challenge. In this paper we describe an observation-based approach for self-managing web servers that can adapt to changing workloads while maintaining the QoS requirements of different classes. In this approach, the system state is monitored continuously and parameter values of various system resources (primarily the accept queue and the CPU) are adjusted to maintain the system-wide QoS goals. We implement our techniques using the Apache web server and the Linux operating system. We first demonstrate the need to manage different resources in the system depending on the workload characteristics. We then experimentally demonstrate that our observation-based system can adapt to workload changes by dynamically adjusting the resource shares in order to maintain the QoS goals.
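
The monitor-and-adjust idea described above can be illustrated with a small sketch. This is not the algorithm from the paper: the function name `adjust_shares`, the proportional update rule, the `gain` and `floor` parameters, and the class names and numbers are all hypothetical, chosen only to show how observed response times might drive a change in per-class resource shares.

```python
def adjust_shares(shares, observed_ms, target_ms, gain=0.2, floor=0.05):
    """Return new per-class shares, nudged toward classes missing their goal.

    Hypothetical rule: scale each share by (1 + gain * relative_error),
    keep every class above a minimum share, then renormalise to sum to 1.
    """
    raw = {}
    for cls, share in shares.items():
        # Positive error: the class is slower than its response-time goal.
        error = (observed_ms[cls] - target_ms[cls]) / target_ms[cls]
        raw[cls] = max(floor, share * (1.0 + gain * error))
    total = sum(raw.values())
    return {cls: value / total for cls, value in raw.items()}


# One iteration of the loop with made-up observations.
shares = {"gold": 0.5, "bronze": 0.5}
observed_ms = {"gold": 300.0, "bronze": 200.0}  # measured response times
target_ms = {"gold": 100.0, "bronze": 400.0}    # per-class QoS goals

new_shares = adjust_shares(shares, observed_ms, target_ms)
print(new_shares)  # gold gains share because it is missing its goal
```

In a real system the same step would run periodically, with the shares mapped onto concrete knobs such as accept-queue and CPU allocations.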


Autonomic Computing and Grid, Pratap Pattnaik, Kattamuri Ekanadham, Joefon Jann. Book chapter in "GRID2002", ed. by G. Fox/Ian Foster, published by John Wiley.

In this paper, we give an operational definition of an autonomic component of a server and identify the key features it must possess to be effective in the emerging environment. We then examine a scenario in the grid environment and argue that the design of subsystems within the grid environment must naturally face the concerns addressed by autonomic designs. Hence, there is synergy between these two perspectives and one stands to gain by blending them together. Autonomic computing is the subject of study by many researchers in various contexts. Often the meaning of the term is vague and subject to interpretation. This paper attempts to standardize the notion by introducing certain essential parameters that must be spelled out in any autonomic subsystem. To the best of our knowledge, no one has attempted to do this, and we believe that such a concrete definition is necessary for studying the problem and seeing whether any techniques from the literature (math and control theory) can be applied. The paper also addresses some common problems that occur in a Grid environment.


Energy trade-offs in the IBM Wristwatch Computer, N. Kamijoh, T. Inoue, C. M. Olsen, M. T. Raghunath, C. Narayanaswami, 2001 International Symposium on Wearable Computing (ISWC'01)

We recently demonstrated a high-function wristwatch computer prototype that runs the Linux operating system and the X11 graphics library. We describe the unique energy-related challenges and tradeoffs we encountered while building this watch. We show that the usage duty factor for the device heavily dictates which of the two, active power or sleep power, needs to be minimized more aggressively to achieve the longest battery life. We also describe energy issues that percolate through several layers of software, all the way from device usage scenarios, applications, user interfaces, and system-level software to device drivers, and the need to systematically address all of them to achieve the longest battery life. This project demonstrates how to extend battery life in pervasive devices and what impact to expect on Linux source code. It also gave the Linux Watch external visibility.
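
The role of the duty factor can be made concrete with a simple two-state energy model. This is a sketch under stated assumptions, not the model from the paper: it assumes the device is either fully active or asleep, and the power and battery figures below are hypothetical rather than measurements of the watch.

```python
def average_power_mw(duty, active_mw, sleep_mw):
    """Time-averaged power for a device that is active a fraction `duty` of the time."""
    return duty * active_mw + (1.0 - duty) * sleep_mw


def dominant_term(duty, active_mw, sleep_mw):
    """Report which state contributes more to the average power."""
    active_part = duty * active_mw
    sleep_part = (1.0 - duty) * sleep_mw
    return "active" if active_part > sleep_part else "sleep"


def battery_life_hours(capacity_mwh, duty, active_mw, sleep_mw):
    """Battery life implied by the two-state model."""
    return capacity_mwh / average_power_mw(duty, active_mw, sleep_mw)


# Hypothetical figures, for illustration only.
ACTIVE_MW, SLEEP_MW, CAPACITY_MWH = 500.0, 5.0, 200.0

for duty in (0.001, 0.1):
    print(
        duty,
        dominant_term(duty, ACTIVE_MW, SLEEP_MW),
        round(battery_life_hours(CAPACITY_MWH, duty, ACTIVE_MW, SLEEP_MW), 1),
    )
```

Under this model the two contributions are equal at a duty factor of sleep_mw / (active_mw + sleep_mw); below that point sleep power is the term worth minimizing, above it active power is.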


High-Performance Memory-Based Web Servers: Kernel and User-Space Performance, Philippe Joubert, Robert King, Richard Neves, Mark Russinovich and John Tracey, 2001 USENIX Annual Technical Conference

Adaptive Fast Path Architecture (AFPA) is a software architecture for high-performance network servers. The architecture is specifically designed to be general purpose. However, most of our work to date has focused on Web servers. AFPA provides a framework that includes three main components: a RAM-based cache, a reverse, split-connection proxy, and a layer-7 (a.k.a. content-based) router. The cache allows AFPA to serve static content efficiently. The proxy distributes requests for dynamic content (that cannot be cached) to a set of "back-end" servers. The layer-7 router can be configured to identify which requests are handled by the cache and which by the proxy, based on URL path and extension (MIME type). All three components are implemented as extensions to the operating system kernel for maximum efficiency. AFPA was originally designed to cache content obtained via a file system interface. N-source In-Kernel Cache (NICache) is an extension to AFPA that allows caching of data from different sources, for example a back-end Web server or a user-level application server. NICache provides a caching framework that can be augmented with components for data retrieval from various sources, thus expanding AFPA's impact beyond origin Web servers to edge caches and application servers. AFPA has allowed IBM to achieve a leadership position in Web server performance. It has been used extensively to improve performance on IBM's SPECweb benchmark submissions and ships on all four IBM eServer platforms.
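
The routing decision described above (cache versus proxy, chosen by URL path and extension) can be sketched in a few lines. This is a user-space illustration only, whereas AFPA implements its components inside the kernel; the function name `route`, the extension set, and the path prefixes are hypothetical and not taken from AFPA's configuration.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical configuration: extensions treated as static, cacheable content.
STATIC_EXTENSIONS = {".html", ".css", ".js", ".png", ".jpg", ".gif"}
# Hypothetical configuration: path prefixes that always denote dynamic content.
DYNAMIC_PREFIXES = ("/cgi-bin/", "/app/")


def route(url):
    """Return "cache" for static content and "proxy" for everything else."""
    path = urlsplit(url).path  # drops the query string and fragment
    if path.startswith(DYNAMIC_PREFIXES):
        return "proxy"
    extension = posixpath.splitext(path)[1].lower()
    return "cache" if extension in STATIC_EXTENSIONS else "proxy"


for url in ("/images/logo.PNG", "/cgi-bin/search.pl?q=linux", "/report"):
    print(url, "->", route(url))
```

Defaulting unknown paths to the proxy is a conservative design choice in this sketch: content that cannot be classified as static is forwarded to a back-end server rather than served from the cache.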


Improving Linux Block I/O for Enterprise Workloads, Shailabh Nagar, Hubertus Franke (IBM T.J.Watson Research Center) Peter W.Y. Wong, Badari Pulavarty, Janet Morgan, Jonathan Lahr, Bill Hartner, Suparna Bhattacharya (IBM Linux Technology Center) Ottawa Linux Symposium 2002.

The paper addresses the performance of the 2.4 Linux kernel's block I/O subsystem. Using I/O-intensive decision support workloads, it systematically identifies the major bottlenecks in the block layer's performance. For each of the bottlenecks, the paper proposes a kernel modification to alleviate the problem and demonstrates the performance improvements seen thereafter. Four bottlenecks are shown to affect performance: the use of bounce buffers for machines with over 1GB of memory, lock contention on the io_request_lock, suboptimal block sizes for unbuffered I/O, and inefficient support for scatter-gather unbuffered I/O. The kernel modifications proposed in the paper for these bottlenecks improve overall performance for decision-support workloads by 233%. The modifications are also analyzed using microbenchmarks and profiling data. The paper outlines a technique to reduce memory overhead for a key kernel I/O data structure, addressing a resource scalability concern for servers with a large number of attached devices. In addition, the paper examines some of the changes made to the block I/O code in the 2.5 development kernel and concludes that three of the four 2.4 kernel bottlenecks are addressed by the 2.5 kernel changes. Overall, the paper provides a comprehensive and systematic examination of 2.4 kernel block I/O performance for an enterprise workload with key insights useful for future Linux kernel development.


Using CQUAL for Static Analysis of Authorization Hook Placement, Xiaolan Zhang, Antony Edwards, Trent Jaeger. USENIX 2002 Security Symposium.

In this paper, we apply the C source code analysis tool CQUAL to verify the correct placement of authorization hooks in the Linux Security Modules (LSM) framework. The LSM framework consists of over 200 authorization hooks placed throughout the Linux base kernel source that are intended to enable access control over all security-sensitive Linux kernel operations, but it is not obvious that these hooks actually mediate all such operations. We show that the combination of some GCC parse tree analyses and CQUAL can be used to identify operations that are not mediated by an LSM hook. In fact, one of these operations was identified as a real flaw in the placement of an LSM hook and has since been fixed. The key features of this static analysis are that it is: (1) complete, such that the analysis has no false negatives over the entire kernel (assuming type safety), and (2) automated, so no user intervention is needed to annotate the kernel source. In this paper, the static analysis is applied to a Linux 2.4.16 LSM kernel.