An Introduction To OLAP in SQL Server 2005
This article discusses the major OLAP components of Analysis Services, all of which can be implemented by even a first-time cube builder. A
follow-up article by Mark Frawley will examine the differences between Analysis Services in SQL 2000 and SQL 2005.
Measures
Measures are the key performance indicators that you want to evaluate. To determine which of the numbers in the data might be measures, a rule of
thumb is: If a number makes sense when it is aggregated, then it is a measure. For example, it makes sense to aggregate daily volume to month,
quarter and year. On the other hand, aggregating zip codes or telephone numbers would not make sense; therefore, zip codes and telephone
numbers are not measures. Typical measures include volume, sales, and cost.
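The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the rows, the zip codes, and the function name `monthly_volume` are made up for the example and do not come from any sample database.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily rows: (day, zip_code, volume)
daily_rows = [
    (date(2003, 1, 30), "10001", 120),
    (date(2003, 1, 31), "10002", 80),
    (date(2003, 2, 1), "10001", 100),
]

def monthly_volume(rows):
    """Roll daily volume up to (year, month); the sum is meaningful."""
    totals = defaultdict(int)
    for day, _zip_code, volume in rows:
        totals[(day.year, day.month)] += volume
    return dict(totals)

volume_by_month = monthly_volume(daily_rows)
# Adding zip codes together would produce a number with no meaning,
# so zip_code is not a measure.
```

Volume passes the test because its monthly total answers a real question. The zip code column fails it.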
Dimensions
Dimensions are the categories of data analysis. The rule of thumb is: When a report is requested "by" something, that something is usually a
dimension. For example, in a revenue report by month by sales region, the two dimensions needed are time and sales region. For this reason, OLAP
analysts often nickname dimensions the "bys." Typical dimensions include product, time, and region.
Dimensions are arranged in hierarchical levels, with unique positions within each level. For example, a time dimension may have four levels, such as
Year, Quarter, Month, and Day. Or the dimension might have only three levels, for example, Year, Week, and Day. The values within the levels are
called members. For example, the years 2002 and 2003 are members of the level Year in the Time dimension.
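One way to picture levels and members is as a nested structure. The following Python sketch assumes a three-level Time dimension (Year, Quarter, Month) with invented members; the names `TIME_LEVELS`, `time_dimension`, and `members_at` are hypothetical and are not part of Analysis Services.

```python
# Levels of a hypothetical Time dimension, ordered from coarsest to finest.
TIME_LEVELS = ["Year", "Quarter", "Month"]

# Members nested by level: Year -> Quarter -> list of Months.
time_dimension = {
    "2002": {"Q1": ["Jan", "Feb", "Mar"], "Q2": ["Apr", "May", "Jun"]},
    "2003": {"Q1": ["Jan", "Feb", "Mar"]},
}

def members_at(level):
    """Return the members of a level as paths that are unique within it."""
    depth = TIME_LEVELS.index(level)
    paths = []
    for year, quarters in time_dimension.items():
        if depth == 0:
            paths.append((year,))
            continue
        for quarter, months in quarters.items():
            if depth == 1:
                paths.append((year, quarter))
            else:
                paths.extend((year, quarter, month) for month in months)
    return paths
```

Each member is identified by its full path, so "Q1" of 2002 and "Q1" of 2003 remain distinct positions within the Quarter level.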
We believe that, as a best practice, a cube should have no more than twelve dimensions. A
cube with more than twelve dimensions becomes difficult to understand and browse. Too
many dimensions can confuse end users, and too many dimensions and aggregations can
also lead to "data explosion": as the number of dimensions and levels increases, the
amount of data grows exponentially. An OLAP application is typically used to manipulate
large volumes of data, so to optimize response time, Analysis Services usually
pre-aggregates a multidimensional schema.
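A rough way to see the growth is to count the level combinations that could be pre-aggregated. The Python sketch below assumes every dimension has three levels plus an implicit "All" level. That is a simplification for illustration; it does not describe how Analysis Services actually chooses which aggregations to build.

```python
def possible_aggregations(levels_per_dimension):
    """Count level combinations, adding one 'All' level per dimension."""
    count = 1
    for levels in levels_per_dimension:
        count *= levels + 1
    return count

# Four dimensions versus twelve, each with three levels (an assumption).
four_dims = possible_aggregations([3] * 4)
twelve_dims = possible_aggregations([3] * 12)
```

Under these assumptions, four dimensions give 256 combinations and twelve give more than 16 million, which is why the count is usually called exponential.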
A dimension can be thought of as a tree structure. Many OLAP tools present it in a tree
control (see Figure 2). This familiar control makes dimensions easier to use because it
shows dimension members and their relationships at the same time, and it lets users
view data at different levels simultaneously.
A fact table contains a column for each measure as well as a column for each dimension. Each dimension
column has a foreign-key relationship to the related dimension table, and the dimension columns taken
together are the key to the fact table.
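The fact table layout described above can be sketched with Python's built-in `sqlite3` module. SQLite stands in for SQL Server only so the example is self-contained, and the table and column names (`dim_time`, `dim_region`, `fact_sales`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    time_id INTEGER NOT NULL REFERENCES dim_time(time_id),
    region_id INTEGER NOT NULL REFERENCES dim_region(region_id),
    sales REAL,  -- measure
    cost REAL,   -- measure
    PRIMARY KEY (time_id, region_id)  -- dimension columns together form the key
);
INSERT INTO dim_time VALUES (1, 2003, 1);
INSERT INTO dim_region VALUES (1, 'East');
INSERT INTO fact_sales VALUES (1, 1, 500.0, 300.0);
""")

def insert_fact(time_id, region_id, sales, cost):
    """Insert one fact row; return False if a key constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
            (time_id, region_id, sales, cost),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

In this sketch, a fact row that points at an unknown region, or that repeats an existing time and region combination, is rejected by the constraints.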
After determining the measures, dimensions, and schema using the BI Workbench, there is one more step—
you must decide where the data aggregation is to be stored. Historically, there were three basic storage
options: Multidimensional OLAP (MOLAP), Relational OLAP (ROLAP), or Hybrid OLAP (HOLAP). SQL Server
2005's introduction of what Microsoft calls the Unified Dimensional Model, which leverages the best of
relational and OLAP cube technologies, allows the designer many more storage options and unlike SQL Server
2000, allows combining them in the same solution.
Figure 3. Simple Star Schema: The figure shows a basic star schema with the dimension tables arranged around a central fact table that contains the
measures.
DTS
Microsoft's Data Transformation Services (DTS) is perhaps the most critical tool in an OLAP project. DTS is used to pull data from various sources
into the star schema. The data warehouse will, in turn, feed the Analysis Services database. More often than not, you must transform data from the
source (for example, you may have to convert currency values, balance calculations, and the like) and remap it. Microsoft has estimated that in most
cases, organizations spend eighty percent of their data warehousing effort on the extract, transform, and load (ETL) phase.
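To make the transform step concrete, here is a minimal Python sketch of the kind of work involved. It is not DTS code. The exchange rates, region codes, and the `transform` function are invented for the example.

```python
# Hypothetical exchange rates to the warehouse currency (USD); not real market data.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.25, "GBP": 2.0}

# Hypothetical mapping from source region codes to warehouse region keys.
REGION_KEYS = {"NE": 1, "SW": 2}

def transform(source_row):
    """Convert one source row into the shape the fact table expects."""
    rate = RATES_TO_USD[source_row["currency"]]
    return {
        "region_id": REGION_KEYS[source_row["region_code"]],  # remap the code
        "sales": round(source_row["amount"] * rate, 2),       # convert currency
    }

source_rows = [
    {"region_code": "NE", "amount": 100.0, "currency": "EUR"},
    {"region_code": "SW", "amount": 50.0, "currency": "GBP"},
]
loaded = [transform(row) for row in source_rows]
```

A real package would also have to handle unknown codes, missing rates, and rejected rows, which is part of why the ETL phase takes so much effort.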
Visual Studio 2005 hosts a new tool, BI Workbench, which is a replacement for DTS Designer. Chief among the improvements found in BI
Workbench is its separation of control flow (insertions, looping, sequencing, scripting, etc.) from the data flow (source identification, aggregation,
character mapping, and data conversion) tasks (see Figure 4 and Figure 5). This separation makes DTS packages easier to read, develop, and
maintain. BI Workbench is reason enough to learn and use Visual Studio 2005.
Because DTS has been completely reworked in SQL Server 2005, current SQL 2000 DTS users will need to brush up on DTS and learn a few new
tricks.
Analysis Services has built-in wizards that make the actual process of creating dimensions fairly easy, especially if you're already familiar with SQL
2000's version, although SQL Server 2005's version does add one additional step—you must create a Data Source View to import your database
objects.
MDX
Just as you use SQL to query relational databases, you use MDX to query a multidimensional cube (see Figure
6). For those who are eager to interrogate the cube without learning MDX, there is an Excel Pivot Table add-in
that provides a drag-and-drop query interface. This interface generates MDX, queries the cube on behalf of
the user and, as a bonus, displays the results in Excel.
Figure 6. SQL vs. MDX: The figure compares data extraction using SQL vs. MDX.
You use MDX to create "calculated measures" that would be too complex or impossible to do in SQL. For
example, suppose the VP of Sales wants to know the average sales price of each product.
Unfortunately, average sales price is not a measure in the Sales cube; however, Store Sales and Sales Count
are available. Because you can calculate Average Sales Price by dividing Store Sales by Sales Count, you can
calculate the measure (hence the name "calculated measure") by using MDX. Here's the MDX code:
WITH
MEMBER Measures.[Average Sale Price] AS
'Measures.[Store Sales] /
Measures.[Sales Count]'
SELECT
{ Measures.[Average Sale Price] } ON COLUMNS,
{ Product.CHILDREN } ON ROWS
FROM Sales
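For readers who have not yet seen MDX, the same arithmetic can be sketched in Python. The per-product totals are invented, and returning `None` for an empty cell is a choice made for this sketch rather than a statement about how MDX handles division by zero.

```python
# Hypothetical per-product totals for the two stored measures.
cube_cells = {
    "Drink": {"Store Sales": 300.0, "Sales Count": 120},
    "Food": {"Store Sales": 900.0, "Sales Count": 300},
    "Non-Consumable": {"Store Sales": 0.0, "Sales Count": 0},
}

def average_sale_price(cells):
    """Derive Average Sale Price = Store Sales / Sales Count per product."""
    result = {}
    for product, measures in cells.items():
        count = measures["Sales Count"]
        # Guard against empty cells; None is this sketch's own convention.
        result[product] = measures["Store Sales"] / count if count else None
    return result

averages = average_sale_price(cube_cells)
```

The derived value is computed from the two stored measures and is not stored itself, which is what makes it a calculated measure.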
Luckily, some third-party tools let users create calculated measures that may have been intentionally omitted from the original cube design, such as
commission or bonus calculations.
Cube Browser
After creating the cube, you need a cube browser to connect to the cube and display the data. Cube browsers usually provide user-friendly tree-
structured dimension filters and/or drag and drop interfaces that allow end users to interrogate the cube. You can set up pre-defined queries, or allow
ad hoc querying by letting users combine the various measures with dimensions.
For example, suppose you want to create a report that shows Revenue by Sales Territory by Product. Because dimensions are hierarchical, you can
obtain the details of a dimension by drilling down. This usually involves clicking on the dimension (for example, clicking on Sales Territory may reveal
the individual stores at the next level of that dimension).
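Drilling down amounts to re-aggregating the same facts one level lower in the hierarchy. The Python sketch below assumes a Sales Territory dimension with a Store level beneath it; the rows and function names are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical fact rows: (sales_territory, store, product, revenue)
rows = [
    ("East", "Store 1", "Food", 100.0),
    ("East", "Store 2", "Food", 150.0),
    ("East", "Store 2", "Drink", 50.0),
    ("West", "Store 3", "Food", 200.0),
]

def revenue_by_territory(facts):
    """Top-level view: total revenue per sales territory."""
    totals = defaultdict(float)
    for territory, _store, _product, revenue in facts:
        totals[territory] += revenue
    return dict(totals)

def drill_down(facts, territory):
    """Break one territory's revenue out by the stores below it."""
    totals = defaultdict(float)
    for terr, store, _product, revenue in facts:
        if terr == territory:
            totals[store] += revenue
    return dict(totals)

top_level = revenue_by_territory(rows)
east_detail = drill_down(rows, "East")
```

The store totals for a territory add back up to that territory's figure, which is the consistency a cube browser depends on when a user expands a member.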
Dimensions can have multiple levels (such as year, quarter, and month). Users can mix and match members within the same dimension.
Furthermore, some cube browsers enable developers to export cube browsers as a Web part that they can then easily include in a portal site or digital
dashboard.
Some OLAP developers find debugging cube design and validating data using pivot tables much easier than performing the same tasks using the
native Analysis Services screen.
Hopefully, this primer has whetted your OLAP appetite and given you the confidence to start creating OLAP cubes yourself. A good way to get started
is to use the sample Foodmart or Adventure Works databases that ship with SQL Server 2005.
Gail Tieh is a Project Leader with Citigate Hudson's Pervasive Business Intelligence team. Gail's expertise includes process improvement through
the use of technology. Gail holds an MBA in Information Systems and BA in Economics from Baruch College of City University of New York. She
currently serves on the Board of New York Software Industry Association (NYSIA) and is also the Special Interest Group Leader of NYSIA's
Database Professionals Council.