
MDC Tables

Multi-Dimensional Clustering (MDC) is a method for partitioning data in DB2, allowing flexible clustering along multiple dimensions, primarily for data warehousing and large database systems. It utilizes concepts such as blocks, indexes, and dimensions to organize data efficiently, improving query performance and reducing logging and maintenance. However, MDC tables may require more disk space and careful design of clustering keys to avoid performance issues.

Multi-Dimensional Clustering

A High-Level Overview
Zoran Kulina
DB2 CE Kernel Development

© 2009 IBM Corporation

MDC Purpose

• One of the three methods for partitioning data in DB2 (the others being range partitioning and database partitioning).
• Allows flexible, continuous, and automatic clustering of data along multiple dimensions.
• Primarily intended for data warehousing and large database systems; can also be used in OLTP environments.
• Enables a table to be physically clustered on more than one key (or dimension) simultaneously.


MDC Concepts
• Block
– MDC version of an extent
– Consecutive set of pages on disk
– The smallest allocation unit of an MDC table

• Block index
– Automatically created
– Points to blocks of data rather than individual rows
– Cannot enforce uniqueness
– Cannot be dropped


MDC Concepts
• Dimension block index
– One per dimension
– Used to access dimension data

• Composite block index
– One per table or partition
– Contains all dimension columns
– Used to maintain clustering of data during insert or update


MDC Concepts
• Block map
– Maintains usage status information for blocks (extents)
– Facilitates quick lookup of empty blocks in MDC tables

[Figure: block map for the extents in the table.
Extent: 0 1 2 3 4 5 6 7 ...
Status: X F U U U F U F ...
Legend: X = reserved; F = free (no bits set); U = in use (data assigned to a cell).
Example cell data stored in the in-use extents: (East, 1996), (North, 1996), (North, 1997), (South, 1999).]
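
The block map lookup can be illustrated with a small Python sketch. This is a conceptual model only, not DB2's internal format: the names `BlockStatus`, `parse_block_map`, and `first_free_block` are hypothetical, and the status letters are taken from the figure legend.

```python
from enum import Enum
from typing import List, Optional


class BlockStatus(Enum):
    """Status letters from the figure legend (illustrative model only)."""
    RESERVED = "X"
    FREE = "F"
    IN_USE = "U"


def parse_block_map(statuses: str) -> List[BlockStatus]:
    """Turn a whitespace-separated status row into a list indexed by extent number."""
    return [BlockStatus(letter) for letter in statuses.split()]


def first_free_block(block_map: List[BlockStatus]) -> Optional[int]:
    """Return the number of the first free extent, or None if every extent is taken."""
    for extent_number, status in enumerate(block_map):
        if status is BlockStatus.FREE:
            return extent_number
    return None


# Status row from the figure: extents 0..7.
block_map = parse_block_map("X F U U U F U F")

free = first_free_block(block_map)      # extent 1 is the first free one
block_map[free] = BlockStatus.IN_USE    # assign it to a cell
next_free = first_free_block(block_map) # extent 5 is now the first free one
```

Scanning a compact status list, rather than the data pages themselves, is what makes the lookup of an empty block cheap in this model.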

MDC Concepts
• Dimension
– Ordered set of one or more columns (clustering keys) of the table
– Axis along which data is organized in an MDC table
– Example: dimensions for nation, colour, and year

[Figure: a cube of cells organized along three axes: the nation dimension, the colour dimension, and the year dimension. Example cells include (1997, Canada, blue), (1997, Canada, yellow), (1998, Canada, yellow), (1997, Mexico, blue), (1997, Mexico, yellow), and (1998, Mexico, yellow).]


MDC Concepts
• Slice
– Portion of the table that contains all the rows that have a specific dimension value (e.g. nation = 'Canada')

[Figure: the same cube of cells along the nation, colour, and year dimensions, with the Canada slice highlighted: all cells whose nation value is Canada.]


MDC Concepts
• Cell
– Portion of the table that contains rows having the same unique set of dimension values
– Intersection of slices from each dimension (e.g. all records where year = 2002, nation = 'Canada', and colour = 'yellow')

[Figure: the same cube of cells along the nation, colour, and year dimensions, with the cell for (1997, Canada, yellow) highlighted. Each cell contains one or more blocks.]
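
The relationship between dimensions, slices, and cells can be sketched in Python as a grouping problem. This is an illustration of the concepts only, not of how DB2 stores data: the sample rows and the helper names `cell_key`, `group_into_cells`, and `slice_of` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Dimension order assumed for this sketch.
DIMENSIONS = ("year", "nation", "colour")

# Made-up sample rows; "id" stands in for the non-dimension columns.
rows = [
    {"id": 1, "year": 1997, "nation": "Canada", "colour": "yellow"},
    {"id": 2, "year": 1997, "nation": "Canada", "colour": "yellow"},
    {"id": 3, "year": 1997, "nation": "Canada", "colour": "blue"},
    {"id": 4, "year": 1998, "nation": "Canada", "colour": "yellow"},
    {"id": 5, "year": 1997, "nation": "Mexico", "colour": "yellow"},
    {"id": 6, "year": 1998, "nation": "Mexico", "colour": "yellow"},
]


def cell_key(row: dict) -> Tuple:
    """A cell is identified by one value per dimension."""
    return tuple(row[d] for d in DIMENSIONS)


def group_into_cells(table: List[dict]) -> Dict[Tuple, List[dict]]:
    """Group rows so that each cell holds rows with identical dimension values."""
    cells = defaultdict(list)
    for row in table:
        cells[cell_key(row)].append(row)
    return dict(cells)


def slice_of(cells: Dict[Tuple, List[dict]], dimension: str, value) -> Dict[Tuple, List[dict]]:
    """A slice is every cell that has the given value on one dimension."""
    position = DIMENSIONS.index(dimension)
    return {key: members for key, members in cells.items() if key[position] == value}


cells = group_into_cells(rows)
canada_slice = slice_of(cells, "nation", "Canada")
one_cell = cells[(1997, "Canada", "yellow")]  # rows with id 1 and 2
```

In this model, intersecting the year = 1997, nation = 'Canada', and colour = 'yellow' slices leaves exactly the one cell keyed `(1997, "Canada", "yellow")`.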

MDC Syntax
• ORGANIZE BY clause in CREATE TABLE

CREATE TABLE mdctable (
    Year INT,
    Nation CHAR(25),
    Colour VARCHAR(10),
    ... )
ORGANIZE BY (Year, Nation, Colour)

• This MDC table will have four block indexes:
– Three dimension block indexes: Year, Nation, and Colour
– One composite block index: (Year, Nation, Colour)
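
The count of four follows from the rule stated earlier: one dimension block index per dimension plus one composite block index. A minimal Python sketch of that arithmetic is shown here; the function name `block_indexes` is hypothetical, and the sketch applies the rule literally without modelling any special cases DB2 may handle differently.

```python
from typing import Dict, List, Tuple


def block_indexes(dimensions: List[str]) -> Dict[str, List[Tuple[str, ...]]]:
    """List the block indexes implied by an ORGANIZE BY dimension list.

    Rule from the slides: one dimension block index per dimension,
    plus one composite block index over all dimension columns.
    """
    return {
        "dimension": [(name,) for name in dimensions],
        "composite": [tuple(dimensions)],
    }


indexes = block_indexes(["Year", "Nation", "Colour"])
total = len(indexes["dimension"]) + len(indexes["composite"])  # 3 + 1 = 4
```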


MDC Syntax
• DB2_MDC_ROLLOUT registry variable
– 1, TRUE, ON, YES, IMMEDIATE (default)
– 0, FALSE, OFF, NO
– DEFER

• Special register for DELETE statements (CURRENT MDC ROLLOUT MODE)
– SET CURRENT MDC ROLLOUT MODE IMMEDIATE (immediate cleanup)
– SET CURRENT MDC ROLLOUT MODE NONE
– SET CURRENT MDC ROLLOUT MODE DEFERRED (deferred cleanup)


MDC Benefits
• Improved query performance
– Block indexes are much smaller than row-level indexes
– Data is guaranteed to be clustered
– Prefetching is more efficient with MDC tables

• Reduced logging
– Inserts update (and log) the block indexes only when a new block is needed
– Mass deletes (rollouts) of entire cells log less data than regular deletes


MDC Benefits
• Reduced table maintenance
– Clustering is maintained automatically
– No need for REORG except to reclaim space

• Reduced application dependence on clustering indexes
– No need to reference columns in a particular order for optimum usage


MDC Usage Considerations


• Performance
– Best suited for data warehouses where queries are complex and long-running
– Good for OLTP environments, but some update operations on MDC tables may take longer than on regular tables

• Disk space
– MDC tables typically take more space than equivalent regular tables, because every cell occupies at least one block

• Table design
– Poor selection of clustering keys may lead to wasted disk space and no performance gain
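
The disk space and table design points can be made concrete with a rough estimate. Because each cell contains one or more blocks, a clustering key that produces many sparsely populated cells leaves most blocks nearly empty. The Python sketch below is a back-of-the-envelope model under stated assumptions: the figure of 1,000 rows per block and the row counts are invented for illustration, and the function names are hypothetical.

```python
import math
from typing import Dict, Hashable


def blocks_needed_mdc(rows_per_cell: Dict[Hashable, int], rows_per_block: int) -> int:
    """Each non-empty cell needs at least one block; blocks are not shared between cells."""
    return sum(math.ceil(count / rows_per_block)
               for count in rows_per_cell.values() if count > 0)


def blocks_needed_regular(total_rows: int, rows_per_block: int) -> int:
    """Baseline: rows packed densely, ignoring any clustering."""
    return math.ceil(total_rows / rows_per_block)


ROWS_PER_BLOCK = 1000  # assumed capacity, for illustration only

# Well-chosen key: 10 cells with 10,000 rows each (100,000 rows in total).
dense_cells = {cell: 10_000 for cell in range(10)}

# Poorly chosen key: 50,000 cells with 2 rows each (also 100,000 rows).
sparse_cells = {cell: 2 for cell in range(50_000)}

baseline = blocks_needed_regular(100_000, ROWS_PER_BLOCK)      # 100 blocks
dense = blocks_needed_mdc(dense_cells, ROWS_PER_BLOCK)         # 100 blocks
sparse = blocks_needed_mdc(sparse_cells, ROWS_PER_BLOCK)       # 50,000 blocks
```

Under these assumptions the sparse design uses 500 times the space of the baseline for the same data, which is why low-cardinality columns (or coarser derived values) tend to make better dimensions.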


