Providing an elegant method for flexible, continuous, and automatic clustering of data along multiple dimensions with applications in business intelligence and transaction processing environments.
Multi Dimensional Clustering (MDC) is a new data layout scheme incorporated in DB2 Universal Database (UDB) Version 8, which allows a relational table to be clustered on one or more orthogonal clustering attributes (or expressions) of a table. Many applications, e.g., Online Analytic Processing (OLAP) and data warehousing, process a table or tables in a database using a multi-dimensional access paradigm. Currently, most database systems can only support organization of a table using a primary clustering index on one dimension. Secondary indexes are created to access the tables when the primary key index is not applicable. Unfortunately, secondary indexes perform many random I/O access against the table for a simple operation such as a range query.
A three-dimensional MDC layout on nation, itemid and year (orderDate)
Our work in MDC addresses this important deficiency in database systems. It allows the robustness and scalability provided by relational database systems, but is able to provide efficiency for multidimensional data access. The above figure indicates a simple MDC configuration on three dimensions based on the following table columns: nation, itemId and a third generated from the year of the orderDate column. The intersection of the dimension values form logical entities called cells, which are, in turn made up of physical entities called blocks. These are accessed using block indexes. A configuration like this would allow efficient access for user queries on the dimensions.
MDC also provides for a continuous and automatic maintenance of the clustering over time, which reduces the need to reorganize the table to regain clustering -- as happens with current technology. MDC also allows the ability to co-exist with existing database features such as row based indexes, table constraints, materialized views, high speed load and mass delete.
These features make MDC useful for a business intelligence environment as well as a transaction processing environment. Many major DB2 clients are realizing significant query performance gains by using MDC in their production systems. As reported in "MDC Performance Customer Examples & Experience" at the DB2 Information Management Technical Conference 2004 in Madrid, Spain by Leslie Cranston, IBM clients like the Canadian Astronomy Data Center (CADC), BrasilTelecom, BankOne and Thomas West have reported significant performance gains in their workloads and benchmarks using MDC. In particular, the talk mentions Thomas West as reporting "The new multidimensional data clustering capability has improved performance of our most complex queries by up to 30 times while removing the need for additional reorganization". MDC has been used in systems as large as 32 terabytes and in major industrial benchmarks like the DB2 10TB TPCH benchmark.
Members of the MDC team have been awarded the IBM Outstanding Innovation Award for their contributions to DB2 V8 for MDC.
Related Publications
B. Bhattacharjee, S. Padmanabhan, T. Malkemus, T. Lai, L. Cranston and M. Huras. Efficient Query Processing for Multi-Dimensionally Clustered Tables in DB2. Proc. Intl. Conf. Very Large Data Bases (VLDB). 2003.
B. Bhattacharjee, L. Cranston, T. Malkemus and S. Padmanabhan. The Luster of Data Clustering. eServer Magazine, April 2003.
B. Bhattacharjee, L. Cranston, T. Malkemus and S. Padmanabhan. Boosting Query Performance: Multidimensional Clustering. DB2 Magazine, Quarter 2, 2003.
S. Padmanabhan, B. Bhattacharjee, T. Malkemus, L. Cranston and M. Huras. Multi- Dimensional Clustering: A New Data Layout Scheme in DB2. Proceedings of the ACM SIGMOD Conference. 2003.
News and Information:
M. Gonzales, G. Robinson, The OLAP-Aware Database, DB2 Magazine, Quarter 2, 2003.
Meet The Experts: Matt Huras and Version 8 Enhancements For DB2 on Linux, Unix and Windows, DB2 Developers Domain, 23 July 2002.
Multidimensional Clustering, DB2 Information Center, 2003.
P Gunning, Database Technology Leaps Ahead, DB2 Magazine, Quarter 4, 2002.
R. Welgan, Comparing Query Performance, MDC vs. Non-MDC Tables, DB2 Magazine, Quarter 2, 2003.
S. Gausden, T. Mason, Getting Started With MDC Space Management, The IDUG Solutions Journal, October 2002.
Rate this article




