2005 Pat Goldberg Memorial Best Paper Award

Computer Science, Electrical Engineering and Mathematics


132 papers published in conference proceedings and journals in 2005 were submitted by IBM Research authors worldwide for consideration as 2005 Pat Goldberg Memorial CS/EE/Math Best Papers. The 2006 Best Paper Award Winners will be announced in August 2007.
The following six were selected (in alphabetical order)
:


  • Automatic Fragment Detection in Dynamic Web Pages and Its Impact on Caching
    Lakshmish Ramaswamy (Georgia Tech), Arun Iyengar, Ling Liu (Georgia Tech), Fred Douglis
    IEEE Transactions on Knowledge and Data Engineering
    Paper Description:
    This paper describes the first work which automatically detects fragments in Web pages for aiding fragment-based generation and caching of Web pages. The techniques described in the paper make it considerably easier to deploy fragment-based Web publication and caching. The paper introduces algorithms for automatically detecting, in dynamic web pages, fragments that are shared across documents and fragments that have specific lifetime characteristics. The algorithms rely on a precise definition of a candidate fragment and on an augmented fragment tree data structure in which nodes are annotated with fingerprints that allow for effective and efficient comparison of fragments. Further, an algorithm is introduced to incrementally compute fingerprint annotations, which improves performance. The study and simulations are thorough: the paper provides experimental results to evaluate the performance of the algorithms and to provide guidance into the determination of the required configuration parameters. The potential for long-term impact both in industry and academy is high. Fragment-based Web publication and fragment-based caching have already been used commercially by several companies including IBM. The presentation is very clear and concise. The paper is an extended, enhanced version of a conference paper which won the Best Paper Award at the 13th International World Wide Web conference (WWW2004), a Web PIC key conference.


  • Consistently Estimating the Selectivity of Conjuncts of Predicates
    Volker Markl, Peter Haas, Nimrod Megiddo, Marcel Kutsch (IBM Germany), T.M. Tran (IBM SVL), Utkarsh Srivastava (former summer intern, Stanford)
    International Conference on Very Large Data Bases (VLDB)
    Paper Description:
    Cost-based optimization is at the core of relational database technology for query processing. Since Selinger’s seminal paper in ’79, numerous studies approached the problem of estimating selectivity from different angles, including histograms, sampling, multivariate estimation and probabilistic models. Several recent publications pointed out deficiencies in selectivity estimation and plan selection in commercial databases. This paper stands out in addressing this problem through the use of entropy as a measure of selectivity and applying the principle of maximum entropy for cardinality estimation. It addresses the problem in which statistics of some predicates in a conjunctive form are known and the cost optimizer needs to estimate the selectivity of the whole conjunctive predicate (joint distribution). The paper formalizes the problem in terms of the information theoretic concept of maximum entropy instead of making ad hoc assumptions, such as independent and uniform distributions, which yield estimates that overshoot or undershoot the joint distribution. It then presents an iterative scaling algorithm that computes unbiased selectivity estimates using maximum entropy. Experiments using a prototype version of DB2 UDP shows that orders of magnitude improvement in selectivity estimation can be obtained which significantly reduces query execution time. The paper has been accepted for publication as an invited paper in the VLDB Journal.


  • Learning to Estimate Query Difficulty (Including Applications to Missing Content Detection and Distributed Information Retrieval)
    Elad Yom-Tov, Shai Fine, David Carmel, Adam Darlow (formerly HRL)
    ACM SIGIR Conference

    Paper Description:
    This paper tackles the problem of predicting the "goodness" (i.e. quality) of search results returned for a given query. This goodness or "query difficulty" prediction is computed before results are displayed to the user in order to be able to automatically activate tools for improving the quality of the results. The paper demonstrates that it is possible to predict the quality of queries with reasonably high accuracy using machine learning methods. Once the difficulty of a query is predicted, the paper demonstrates that it can be used in a variety of ways. The first, as mentioned above, is to modify the search tactics to reflect the computed "difficulty". Another use demonstrated is re-weighting results from different repositories of texts according to the predicted quality of their resources for answering a given query. This approach improved retrieval by up to 13% on the collections tested. It has also been shown that it is possible to use this "query difficulty" prediction to decide whether the quality of results is poor due to bad search strategies or simply due to the fact that the collection does not contain sufficient information. Experiments show that the technology detects over 70% of content-less queries, at the expense of misclassifying less than 10%. The paper received the best paper award at SIGIR, a Web PIC key conference.


  • Reincarnating PCs with Portable SoulPads
    Ramon Caceres, Casey Carter (former summer intern, UIUC), Chandra Narayanaswami, M. T. Raghunath (formerly Watson)
    ACM/USENIX International Conference on Mobile Systems, Applications, and Services (MobiSys)

    Paper Description:
    This paper demonstrates to the Mobile Computing community that the long-cherished vision of "walk up to any computer, personalize it, and use it as one's own" can be realized today using current devices, OS, and middleware. The work demonstrates how one may decouple the "soul" of computing (the user's applications, sessions and data) from an inter-changeable computing "body" (the monitor, keyboard, hard drive). In the approach taken here, all user's application software, data, and computing state is stored in a USB 2.0 portable hard drive. The drive can be plugged to virtually any x86 machine in use today and, as the machine boots from it, the entirety of the user's session state is restored, without any software being required or installed in the host machine, and without requiring a network connection. The approach requires only an off-the-shelf self-configuring Linux kernel that is able to boot the PC from the USB drive and a virtual machine system that runs the user's native OS (Windows or Linux). Since both the user's OS and her applications are running on the virtual machine, the user's state can be suspended and resumed on different kinds of hardware. Experiments demonstrate only slight to moderate impact on application performance and user experience. A key innovation of the paper is the use of appropriate encryption to ensure that no personal data of any kind is left on the temporarily-rented "body". The paper describes the whole structure of the system, highlights what is new, provides enough information to replicate it, and considers all the important issues including security, performance measurements, limitations of current devices and usability. The paper delicately balances the specific implementation details with the longer term architectural implications. It received the best paper award at MobiSys'05, a Mobile Computing PIC key conference.


  • The Unique Games Conjecture, Integrality Gap for Cut Problems and Embeddability of Negative Type Metrics into \ell 1
    Subhash A. Khot (Georgia Tech) Nisheeth K. Vishnoi
    IEEE Symposium on Foundations of Computer Science (FOCS)

    Paper Description:
    This paper addresses a fundamental problem in Computer Science: How to partition a graph into two "roughly" equal parts so as to minimize the number of edges going across the partition. This problem arises naturally in a variety of areas such as VLSI layout, clustering and parallel computation and is referred to as Balanced Separator. The classic result of Leighton and Rao ('88) provided a O(log n) factor approximation for this (and the closely related problem of Sparsest Cut). Their approach was based on a Linear Programming relaxation of the problem. Soon, a Semidefinite Programming (SDP) based approach was suggested and it was conjectured that it would lead to a constant factor approximation algorithm. At the core, this conjecture (made by several experts in the area) was about "low distortion" embeddability of one metric space into another. The lower the distortion, the better the algorithm. About 10 years after this metric embeddability conjecture appeared explicitly in the literature, there was a breakthrough result due to Arora, Rao and Vazirani ('04) who showed that this SDP based approach infact gives a O(\sqrt{log n}) factor algorithm. The main result of this paper is a disproof of this metric embedding conjecture. The key implication of this result is that this SDP based approach cannot lead to an approximation algorithm for the Sparsest Cut and Balanced Separator problems which achieves a factor better than loglog n! was a huge surprise to the community. The techniques used in the disproof are varied such as PCPs, Unique Games and Fourier Analysis and are powerful enough to establish the limits of these SDP based approaches for other cut problems such as Maximum Cut and Minimum UnCut. The paper got the FOCS best paper award for 2005, an Algorithms & Theory PIC key conference, and it was invited to be published in Journal of the ACM.


  • The Uniform-Phase Uniform-Amplitude Resonant-Load Global Clock Distribution
    Steven Chan, Kenneth Shepard (Columbia University), Phillip J. Restle
    IEEE Journal of Solid-State Circuits
    Paper Description:
    Clock distribution is a crucial element of the high-frequency processor design. As the frequency approaches multi-gigahertz range, the clock power accounts for a significant (15-20%) chip power and the clock uncertainty becomes a significant portion (10-15%) of the cycle time. The clock uncertainty directly translates into unusable time in a synchronous, pipelined design, thus needs to be minimized. Power is also becoming a constraint for modern microprocessors that everybody is struggling with due to the severe leakage problems in advanced CMOS technologies. This paper presents a resonant clock distribution by including on-chip spiral inductors at the global level. In this scheme, the energy of the fundamental frequency resonates between electric and magnetic forms, with reduced admittance of the clock network allowing for significantly lower gain requirement in the buffering network. Clock jitter reduction can also be achieved with the scheme due to less gain stages needed to drive the load. The authors showed experimental results demonstrating a 20% clock power reduction as well as an order-of-magnitude jitter reduction. This paper is ground breaking, and may change the clock network design in IBM's future processors. There is an on-going experiment to adopt the scheme in STI (Cell) processor designed for 11s.


 

Best paper award archive