IBM Israel Research Seminars
 
We revisit key performance issues and assumptions on which modern parallel and distributed systems are constructed, while taking both theoretical and practical points of view. Specifically, we focus on two representative forms of computation: tightly-coupled parallel computing and extremely large-scale distributed systems.
In tightly-coupled clusters, the communication bottleneck at the end-nodes themselves is often the major cause for poor performance. While the introduction of System Area Networks (SANs) alleviated this bottleneck, applications seldom realize raw hardware capabilities, mainly due to substantial mismatches between the software architecture and hardware.
One source for such mismatches is the strong semantics of standard messaging APIs, which are not always required by the application. Therefore, many common overheads can be tackled upfront by deviating from these APIs. Using a full-fledged prototype of a demanding Distributed Shared Memory (DSM) system, we show how network and OS primitives can be integrated in the kernel to provide a high-performance platform that is well matched to application semantics. An alternative approach that we explore entails avoiding most overheads using theoretical memory models that are better matched to the hardware's semantics. Specifically, we assess implementation tradeoffs of the Bulk Synchronous Parallel (BSP) model, which endorses coarse-grain synchronization.
In extremely large-scale systems, in contrast, the varying communication latencies between nodes and dynamic changes in topology play a far more important role, and the ever-increasing demand for scalability has driven locality considerations into every aspect of algorithm design. Unfortunately, there are many important problems that cannot be solved in a local manner per se, i.e., O(1) complexity in problem size for all instances. In addition, the common measures for characterizing the potential efficiency of algorithms and that actually achieved by them, namely worst- and average-case (over problem instances) complexities, fail to capture the essence of real problems in a meaningful way.
Thus, we advocate instance-local algorithms whose performance depends on some metric of the input instance rather than the graph size. We initially demonstrate this approach with two practical algorithms: an algorithm for majority voting in highly dynamic networks, and a scalable quota management protocol for Grid computing environments. Then, we take a pure theoretical stand and establish a formal foundation for reasoning about global aggregation problems, which constitute an important application in many large-scale systems. We define a new metric on problem instances, Veracity Radius (VR), which captures the inherent possibility to compute them locally. We prove that VR yields a tight lower bound on both output-stabilization and quiescence times, and provide an efficient algorithm whose performance is proportional to VR for every problem instance.
 
- Speaker: Liran Liss, Technion
- Time: 05/03/2006, 10:00 AM - 11:00 PM
- Back to Previous Seminar Listings
