IBM Research has contributed to the area of data management for more than three decades. These contributions include E. F. Codd's seminal work on relational algebra; the System R relational database management system prototype (which led to IBM's DB2®); ARIES transaction recovery and logging; Starburst extensible database technology; and DB2 parallel database technology. Today, we continue to explore new data management technology in the areas of data warehousing, object-relational features, digital libraries, multimedia content management, federated databases, and integration of structured, semi-structured, and unstructured data, as well as the emerging areas of e-commerce, Internet, and mobile applications.
Some current projects include:
DBCache
The goal of this project is to incorporate a cache feature into DB2 by modifying engine code and building on existing federated database functionality. As a result, we can apply DB2’s distributed query processing to database caching: a query can be executed at the local database cache or at the remote backend server. More importantly, a query can be partitioned and distributed across both databases for cost-optimal execution.
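As a rough illustration of the routing idea, the following Python sketch decides whether a query runs locally, remotely, or split across both. It is not DB2 code: the table names, the `CACHED_TABLES` set, and the `route` and `describe` helpers are all hypothetical, and the decision is based only on which tables are cached, whereas a real optimizer would compare execution costs.

```python
from dataclasses import dataclass

# Hypothetical set of tables that are replicated in the local cache database.
CACHED_TABLES = {"products", "categories"}


@dataclass(frozen=True)
class Plan:
    local: frozenset   # tables read from the local cache
    remote: frozenset  # tables read from the backend server


def route(tables):
    """Split the tables a query references between cache and backend."""
    tables = frozenset(tables)
    local = tables & CACHED_TABLES
    return Plan(local=local, remote=tables - local)


def describe(plan):
    """Name the execution strategy implied by a plan."""
    if not plan.remote:
        return "local"
    if not plan.local:
        return "remote"
    return "partitioned"
```

A query that touches only cached tables stays local, one that touches none goes to the backend, and a mixed query is partitioned between the two.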
Hippocratic Databases
Privacy is the right of individuals to determine when personal information can be collected and how it should be used based on individual consent. To address privacy issues, we are exploring a database architecture supporting automatic enforcement of privacy policies.
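To make the idea of automatic enforcement concrete, here is a minimal Python sketch in which every query must state a purpose, and the data layer returns only the rows whose owners consented to that purpose and only the columns the policy allows. The records, the `POLICY` table, and the `query` function are illustrative assumptions, not the project's architecture.

```python
# Hypothetical records: each row carries the purposes its owner consented to.
PATIENTS = [
    {"name": "Alice", "diagnosis": "flu", "consent": {"treatment", "research"}},
    {"name": "Bob", "diagnosis": "asthma", "consent": {"treatment"}},
]

# Hypothetical policy: the columns each purpose is allowed to see.
POLICY = {
    "treatment": {"name", "diagnosis"},
    "research": {"diagnosis"},
}


def query(rows, purpose):
    """Return only the rows and columns permitted for the stated purpose."""
    allowed = POLICY.get(purpose, set())
    return [
        {col: row[col] for col in row if col in allowed}
        for row in rows
        if purpose in row["consent"]
    ]
```

Because the check lives in the data layer rather than in each application, a purpose that the policy does not list, or that an individual did not consent to, yields no data.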
SMART: Self Managing and Resource Tuning Databases
SMART is part of IBM's autonomic computing initiative to help reduce complexity and improve quality of service through the advancement of self-managing capabilities within a database environment. Recently, IBM integrated new tools within DB2 UDB to automate database performance tuning and recovery. DB2 Recovery Expert for multi-platforms provides simplified, comprehensive, automated recovery features with extensive diagnostic and self-managing capabilities to minimize database outages. DB2 Performance Expert consolidates, reports on, and analyzes DB2 performance-related information, and recommends SMART changes based on it.
XML Databases
The goal of this project is the development of Structured Query Language (SQL) extensions for XML, including defining an XML data type, XML value functions, and bi-directional mappings between SQL and XML. XML value functions use XPath and/or XQuery to navigate and manipulate XML data. Our work in this field includes exploring rewrite, indexing, and optimization algorithms, and inventing indexing techniques that improve the response time of XQuery and SQL/XML queries.
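The kind of navigation an XML value function performs can be sketched outside the database with Python's standard library, which supports a limited subset of XPath. The sample document and the `titles_for_year` helper are made up for illustration; this is not SQL/XML syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML value, as it might be stored in a column of XML type.
DOC = """
<catalog>
  <book year="2003"><title>Data Streams</title><price>40</price></book>
  <book year="2001"><title>Query Optimizers</title><price>55</price></book>
</catalog>
"""


def titles_for_year(xml_text, year):
    """Use an XPath-style path with an attribute predicate to pick titles."""
    root = ET.fromstring(xml_text.strip())
    path = f"./book[@year='{year}']/title"
    return [title.text for title in root.findall(path)]
```

For example, `titles_for_year(DOC, 2003)` selects the title of the one book whose `year` attribute is 2003.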
Xperanto
The Xperanto project leverages XML (Extensible Markup Language), XQuery, text search capabilities and Web services technology to enable users to search XML documents, flat files, spreadsheets and other sources of information housed in a single database.
Active Technologies
Our technologies enable the rapid development of active functionality, which allows applications to detect customized situations without having to be aware of the occurrence of the underlying basic events. The technologies are useful for e-business applications, operations management, customer relationship management, command and control applications, and monitoring systems.
MultiDimensional Clustering
The goal of this project is to provide an elegant relational database solution for clustering tables using multiple dimensions. This is very important in data warehousing and data mart applications, since most of the analysis tends to be dimensional. For example, a retail data warehouse is organized along the key dimensions of Time, Products, Geography (Regions), Suppliers, and Customers, and analysis is performed using some or all of these dimensions. A number of OLAP and multidimensional products targeted at this growing market have been implemented in recent years. Their vendors claim that relational databases are not naturally suited for such work. With this project, we aim to enhance relational database technology so that it performs multidimensional analysis efficiently, without requiring complex indexes.
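The basic idea of clustering on several dimensions at once can be sketched in a few lines of Python: rows that share the same combination of dimension values are stored together in one cell, so a query on any subset of the dimensions reads only the matching cells. The sample data and the `cluster` and `select` helpers are assumptions for illustration and do not reflect how DB2 lays out or indexes blocks.

```python
from collections import defaultdict


def cluster(rows, dimensions):
    """Group rows into cells keyed by their values on every dimension."""
    cells = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        cells[key].append(row)
    return cells


def select(cells, dimensions, **wanted):
    """Read only the cells whose dimension values match `wanted`."""
    positions = {d: i for i, d in enumerate(dimensions)}
    hits = []
    for key, block in cells.items():
        if all(key[positions[d]] == v for d, v in wanted.items()):
            hits.extend(block)
    return hits


# Hypothetical retail facts clustered on two dimensions.
SALES = [
    {"year": 2003, "region": "East", "amount": 10},
    {"year": 2003, "region": "West", "amount": 20},
    {"year": 2004, "region": "East", "amount": 30},
]
DIMS = ("year", "region")
CELLS = cluster(SALES, DIMS)
```

A call such as `select(CELLS, DIMS, region="East")` filters on one dimension, while `select(CELLS, DIMS, year=2003, region="West")` narrows the read to a single cell; neither needs a separate index per dimension in this toy layout.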
Related Publications
A. Adi and O. Etzion. Amit - the situation manager. VLDB Journal, 2004.
C. C. Aggarwal, J. Han, J. Wang and P. S. Yu. A Framework for Clustering Evolving Data Streams. Proc. Intl. Conf. Very Large Data Bases (VLDB). 2003.
R. Agrawal, A. Evfimievski and R. Srikant. Information Sharing Across Private Databases. Proc. ACM Intl. Conf. on Management of Data (SIGMOD), San Diego, California. June 2003.
B. Bhattacharjee, S. Padmanabhan, T. Malkemus, T. Lai, L. Cranston and M. Huras. Efficient Query Processing for Multi-Dimensionally Clustered Tables in DB2. Proc. Intl. Conf. Very Large Data Bases (VLDB). 2003.
Ronald Fagin, Phokion G. Kolaitis and Lucian Popa. Data Exchange: Getting to the Core. PODS 2003 - ACM SIGMOD/PODS. February 2003.
L. Lim, M. Wang and J. S. Vitter. SASH: A Self-Adaptive Histogram Set for Dynamically Changing Workloads. Proc. Intl. Conf. Very Large Data Bases (VLDB), 2003.
M. Stillger, G. M. Lohman, V. Markl and M. Kandil. LEO - DB2's LEarning Optimizer. Proc. Intl. Conf. Very Large Data Bases (VLDB). 2001.
Recent Accomplishments:
Berni Schiefer, Co-Chair, VLDB-2004 (Industrial Applications)
Charu Aggarwal, Workshop Chair, ACM Knowledge Discovery and Data Mining Conference, 2003
Guy Lohman, Industrial Committee, ACM SIGMOD/PODS 2004
John R. Smith, Advisory Committee of NIST TREC Video Retrieval Evaluation
Philip S. Yu, IEEE Outstanding Contributions Award, 2003
