
Research Staff Member
Research lab: Almaden Research Center
Hello! I am a researcher at IBM working on a variety of data management and query processing problems, including autonomic and grid computing, data compression, and adaptive query processing in DBMSs. I am also interested in information integration, especially the ETL, data cleansing, and transformation steps. Before joining IBM, I studied under Prof. Joseph M. Hellerstein at the University of California at Berkeley, where I earned a Ph.D. in Computer Science. I also have a B.Tech. from the Indian Institute of Technology (IIT), Madras.
I have received two IBM Research Division awards (for LEO and for Shark Offload), a Microsoft Fellowship, and an AT&T Asia-Pacific Leadership Award.
CURRENT WORK AND RECENT PUBLICATIONS
Data Compression: On modern processors, data movement costs (cache-memory-disk) often dwarf computation costs. To address this imbalance, we are investigating extreme compression of databases, and techniques for operating efficiently over compressed databases, such as how to reduce dictionary sizes and how to do aggregations and updates over compressed data.
Proof that partitioning a multiset is entropy neutral
How to Wring a Table Dry: Entropy Compression of Relations and Querying of Compressed Relations (pdf, slides) (V. Raman, G. Swart). Intl. Conf. on Very Large Data Bases (VLDB), 2006. -- describes a method that compresses relations down to their entropy, gaining about an order of magnitude compression, while being able to apply equality and range predicates directly on compressed data without decoding.
How to Barter Bits for Chronons: Compression and Bandwidth Trade Offs for Database Scans (pdf) (A. Holloway, V. Raman, G. Swart, D. DeWitt). ACM SIGMOD Intl. Conf. on Management of Data, 2007. -- investigates the tradeoff between efficiency of compression and efficiency of query processing over compressed data.
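As a rough illustration of what applying equality and range predicates to compressed data without decoding can look like, here is a minimal Python sketch. It does not reproduce the method of the VLDB 2006 paper. It swaps in a much simpler technique, a fixed-width order-preserving dictionary code, and every function name and the sample column are hypothetical.

```python
import bisect
import math
from collections import Counter


def empirical_entropy_bits(values):
    """Shannon entropy (bits per value) of the observed value distribution."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def build_dictionary(values):
    """Sorted list of distinct values; a value's code is its index.

    Because the list is sorted, code order matches value order
    (an assumption of this sketch, not a claim about the paper's codes).
    """
    return sorted(set(values))


def encode(dictionary, values):
    """Replace each value with its integer code."""
    code_of = {v: i for i, v in enumerate(dictionary)}
    return [code_of[v] for v in values]


def select_equal(dictionary, codes, constant):
    """Row positions where column == constant, comparing codes only."""
    i = bisect.bisect_left(dictionary, constant)
    if i == len(dictionary) or dictionary[i] != constant:
        return []  # the constant never occurs in the column
    return [row for row, c in enumerate(codes) if c == i]


def select_range(dictionary, codes, low, high):
    """Row positions where low <= column <= high, comparing codes only."""
    lo = bisect.bisect_left(dictionary, low)
    hi = bisect.bisect_right(dictionary, high)  # exclusive upper code
    return [row for row, c in enumerate(codes) if lo <= c < hi]


# Hypothetical sample column.
column = ["CA", "NY", "CA", "TX", "CA", "NY", "CA", "WA"]
dictionary = build_dictionary(column)
codes = encode(dictionary, column)
```

In this toy column the fixed-width code spends 2 bits per value, while `empirical_entropy_bits(column)` is 1.75 bits, which is the gap an entropy code would aim to close. The predicate constant is translated into code space once, and the scan then compares integers only.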
Adaptive Query Processing and Optimization: I work on several projects in this area, all with the overall goal of making DBMSs self-managing and autonomic. Currently I am developing a method for adaptive star joins that performs predictably even when database statistics are inaccurate. In another project called Progressive Optimization (POP), I designed and prototyped in DB2 a technique for dynamically changing query plans during query execution, in order to make DBMSs more robust to erroneous cardinality estimates. This gives up to two orders of magnitude speedup in query execution. Here are a couple of papers on POP from the SIGMOD 2004 and VLDB 2004 conferences. LEO is a project that automatically warehouses each estimation error made by a DBMS, and mines this warehouse to prevent recurrence of similar errors in the future. The first phase of this project, a statistics collection tool and mining algorithm, has been released as part of IBM DB2 v8.2 (Stinger) and is profiled in this press release.
Lazy, Adaptive RID-List Intersection, and its application to Index Anding (pdf) (V. Raman, L. Qiao, W. Han, I. Narang, Y. Chen, A. Yang, and F. Lin). ACM SIGMOD Intl. Conf. on Management of Data, 2007.
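The idea behind POP, changing the plan mid-execution when a cardinality estimate turns out to be wrong, can be sketched in a few lines of Python. This is a toy model and not the DB2 prototype: the `Checkpoint` class, the `tolerance` ratio, the two-way `choose_join` "optimizer", and its 100-row threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Hypothetical check placed after an intermediate result."""
    estimated_rows: int
    tolerance: float  # assumed: allowed ratio between actual and estimate

    def is_violated(self, actual_rows: int) -> bool:
        low = self.estimated_rows / self.tolerance
        high = self.estimated_rows * self.tolerance
        return not (low <= actual_rows <= high)


def choose_join(outer_rows: int) -> str:
    """Toy optimizer: nested-loop for small outers, hash join otherwise."""
    return "nested_loop_join" if outer_rows <= 100 else "hash_join"


def run_with_reoptimization(rows, predicate, estimated_rows, tolerance=10.0):
    """Filter rows, then pick a join method; re-plan if the estimate was off.

    Returns (chosen join method, whether re-optimization was triggered).
    """
    plan = choose_join(estimated_rows)            # plan from the estimate
    filtered = [r for r in rows if predicate(r)]  # materialized intermediate
    check = Checkpoint(estimated_rows, tolerance)
    reoptimized = check.is_violated(len(filtered))
    if reoptimized:
        plan = choose_join(len(filtered))         # re-plan with observed count
    return plan, reoptimized
```

For example, with 10,000 input rows, a predicate that keeps half of them, and an estimate of 20 rows, the checkpoint fires and the sketch switches from the nested-loop choice to a hash join. With an estimate of 4,000 rows the observed count stays inside the tolerated range and the original plan is kept.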
Scalable Database Architectures is a project investigating new architectures for data processing systems, using either massive parallelism or embedded query functionality in storage devices. Here is a paper on parallel query processing on a grid.
Tesla is a project to build an On-Demand Information System that virtualizes independent, low-quality data sources and compute resources into a robust, unified information system. The two key characteristics of Tesla are transparent masking of the failures and brittleness of individual system components, and provision of predictable QoS to application programmers even when system characteristics or application loads vary. The Metawrapper is a module within Tesla that, at query execution time, dynamically binds application queries written against a logical domain of data sources to a specific set of data sources and/or replicas.
PRIOR WORK
My Ph.D. dissertation was on interactive and adaptive query processing. I developed several new interactive and adaptive query processing algorithms that are now used in the Telegraph dataflow system. These algorithms have been taught in courses at Berkeley, Maryland, MIT, Stanford, and UIUC. I also developed the Online Reordering technique for incorporating user feedback in database query processors. I prototyped online reordering in the Informix Dynamic Server DBMS, and the results were pretty good: this paper was selected as a best paper at the 25th International Conference on Very Large Data Bases. A summary paper on this work also appears in this graduate database textbook.
Another area I have worked on is data cleansing and transformation. I developed and released the open-source data cleansing tool Potter's Wheel A-B-C.
Attached files:
theorem.pdf
Last updated 12 Jul 2007