Unit-I: Introduction and Data Warehousing
Data warehouses and OLAP tools are based on a multidimensional data model.
This model views data in the form of a data cube.
What is a data cube? A data cube allows data to be modeled and viewed in
multiple dimensions. It is defined by dimensions and facts.
In general terms, dimensions are the perspectives or entities with respect to which
an organization wants to keep records. For example, All Electronics may create a
sales data warehouse to keep records of the store's sales with respect to the
dimensions time, item, branch, and location. These dimensions allow the store to
keep track of things like monthly sales of items and the branches and locations at
which the items were sold.
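A data cube can be pictured as a mapping from cells (one value per dimension) to facts. The following is a minimal Python sketch under stated assumptions: the fact rows, the dollars_sold measure, and the helper name build_cube are hypothetical and only illustrate the dimensions time, item, branch, and location from the example.

```python
from collections import defaultdict

# Hypothetical fact rows for a sales cube (made-up numbers): one value per
# dimension (time, item, branch, location) plus one fact, dollars_sold.
sales = [
    {"time": "Jan", "item": "computer", "branch": "B1", "location": "Vancouver", "dollars_sold": 1000},
    {"time": "Jan", "item": "phone", "branch": "B1", "location": "Vancouver", "dollars_sold": 400},
    {"time": "Feb", "item": "computer", "branch": "B2", "location": "Toronto", "dollars_sold": 1500},
    {"time": "Feb", "item": "phone", "branch": "B1", "location": "Vancouver", "dollars_sold": 300},
    {"time": "Feb", "item": "computer", "branch": "B1", "location": "Vancouver", "dollars_sold": 500},
]

DIMENSIONS = ("time", "item", "branch", "location")

def build_cube(rows, dims=DIMENSIONS, measure="dollars_sold"):
    """Map each cell (a tuple of dimension values) to the summed fact."""
    cube = defaultdict(int)
    for row in rows:
        cell = tuple(row[d] for d in dims)
        cube[cell] += row[measure]
    return dict(cube)

# Full 4-D cube: one cell per (time, item, branch, location) combination seen.
cube = build_cube(sales)
print(cube[("Jan", "computer", "B1", "Vancouver")])  # 1000

# Viewing the same facts along only time and item gives monthly sales per item.
monthly_item_sales = build_cube(sales, dims=("time", "item"))
print(monthly_item_sales[("Feb", "computer")])  # 2000
```

Keeping only a subset of the dimensions is what lets the store answer questions such as "monthly sales of items" from the same underlying facts.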
Each dimension may have a table associated with it, called a dimension table,
which further describes the dimension. For example, a dimension table for item
may contain the attributes item name, brand, and type. Dimension tables can be
specified by users or experts, or automatically generated and adjusted based on data
distributions.
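A dimension table lets fact rows carry only a key while descriptive attributes live in one place. This is a minimal sketch assuming a hypothetical item dimension keyed by an integer item_key with the attributes item name, brand, and type; the class ItemDim and the function sales_by are illustrative names, not part of any specific system.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class ItemDim:
    """One row of a hypothetical item dimension table."""
    item_key: int
    item_name: str
    brand: str
    type: str

# Hypothetical dimension table, looked up by its key.
item_dim = {
    1: ItemDim(1, "Laptop X", "BrandA", "computer"),
    2: ItemDim(2, "Phone Y", "BrandB", "phone"),
    3: ItemDim(3, "Laptop Z", "BrandB", "computer"),
}

# Hypothetical fact rows: (item_key, dollars_sold). Facts reference items only by key.
facts = [(1, 1200), (2, 300), (3, 900), (1, 800)]

def sales_by(attribute):
    """Join facts to the item dimension and total dollars_sold per attribute value."""
    totals = defaultdict(int)
    for item_key, dollars_sold in facts:
        totals[getattr(item_dim[item_key], attribute)] += dollars_sold
    return dict(totals)

print(sales_by("brand"))  # {'BrandA': 2000, 'BrandB': 1200}
print(sales_by("type"))   # {'computer': 2900, 'phone': 300}
```

Because brand and type are stored once per item rather than once per sale, the same facts can be summarized by any descriptive attribute of the dimension.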
Implementation
Four different views regarding the design of a data warehouse must be considered: the
top-down view, the data source view, the data warehouse view, and the business query
view.
The top-down view allows the selection of the relevant information necessary for the
data warehouse. This information matches the current and future business needs.
The data source view exposes the information being captured, stored, and managed by
operational systems. This information may be documented at various levels of detail and
accuracy, from individual data source tables to integrated data source tables. Data
sources are often modeled using traditional data modeling techniques, such as the entity-relationship model or CASE (computer-aided software engineering) tools.
The data warehouse view includes fact tables and dimension tables. It represents the
information that is stored inside the data warehouse, including precalculated totals and
counts, as well as information regarding the source, date, and time of origin, added to
provide historical context.
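The data warehouse view stores precalculated totals and counts together with information about where and when the data originated. The sketch below illustrates that idea under assumptions: the fact rows, the source names pos_store_1 and web_orders, and the function precompute_summary are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical detailed fact rows extracted from two operational sources.
fact_rows = [
    {"item": "computer", "dollars_sold": 1200, "source": "pos_store_1"},
    {"item": "phone", "dollars_sold": 300, "source": "pos_store_1"},
    {"item": "computer", "dollars_sold": 900, "source": "web_orders"},
]

def precompute_summary(rows, loaded_at):
    """Precalculate total and count per item, keeping lineage (sources, load time)."""
    summary = defaultdict(lambda: {"total": 0, "count": 0, "sources": set()})
    for r in rows:
        entry = summary[r["item"]]
        entry["total"] += r["dollars_sold"]
        entry["count"] += 1
        entry["sources"].add(r["source"])
    # Attach the same load timestamp to every summary row for historical context.
    return {item: {**e, "loaded_at": loaded_at} for item, e in summary.items()}

summary = precompute_summary(fact_rows, datetime(2024, 1, 31, tzinfo=timezone.utc))
print(summary["computer"]["total"], summary["computer"]["count"])  # 2100 2
```

Queries that only need totals or counts can then read the small summary instead of rescanning every fact row.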
Finally, the business query view is the perspective of data in the data warehouse from
the viewpoint of the end user.
Further Development
An exception model for data cubes considers variations and patterns in the measure value
across all of the dimensions to which a cell belongs. For example, if the analysis of
item-sales data reveals an increase in sales in December in comparison to all other
months, this may seem like an exception in the time dimension. It is not an exception,
however, if the item dimension is considered and other items show a similar increase
in December.
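Whether a December rise is exceptional depends on which dimensions are examined: looking at time alone flags any item whose December sales jump, while also looking across items flags only items that behave differently from the others. The sketch below contrasts the two views. It is a deliberately simplified ratio heuristic, not a full statistical exception model; the sales figures, the THRESHOLD value, and all function names are hypothetical.

```python
from statistics import mean, median

# Hypothetical monthly unit sales per item (made-up numbers).
sales = {
    "laptop": {"Sep": 100, "Oct": 110, "Nov": 90, "Dec": 200},
    "phone":  {"Sep": 50,  "Oct": 55,  "Nov": 45, "Dec": 100},
    "tablet": {"Sep": 60,  "Oct": 60,  "Nov": 60, "Dec": 120},
    "camera": {"Sep": 40,  "Oct": 40,  "Nov": 40, "Dec": 40},
}
THRESHOLD = 1.5  # hypothetical: beyond 1.5x (or below 1/1.5) counts as unusual

def december_ratio(item):
    """December sales divided by the item's average over the other months."""
    months = sales[item]
    others = [v for m, v in months.items() if m != "Dec"]
    return months["Dec"] / mean(others)

def is_unusual(ratio):
    return ratio > THRESHOLD or ratio < 1 / THRESHOLD

def time_only_exceptions():
    """Time dimension only: is December unusual for this item?"""
    return {item for item in sales if is_unusual(december_ratio(item))}

def item_aware_exceptions():
    """Time and item dimensions: is this item's December ratio unusual
    compared with the typical (median) ratio of the other items?"""
    flagged = set()
    for item in sales:
        typical = median(december_ratio(o) for o in sales if o != item)
        if is_unusual(december_ratio(item) / typical):
            flagged.add(item)
    return flagged

print(sorted(time_only_exceptions()))   # ['laptop', 'phone', 'tablet']
print(sorted(item_aware_exceptions()))  # ['camera']
```

Along time alone, the December doubling looks exceptional for three items; once the item dimension is considered, the doubling is the norm and the flat camera sales become the exception.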
Data cubes facilitate the answering of data mining queries as they allow the
computation of aggregate data at multiple levels of granularity.
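Aggregates at multiple levels of granularity correspond to the group-bys (cuboids) over every subset of the dimensions, from the fully detailed base cuboid down to the single grand total. This naive sketch computes all 2^n cuboids for three hypothetical dimensions; the fact rows and the function name compute_all_cuboids are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("time", "item", "location")

# Hypothetical base facts: (time, item, location, dollars_sold).
facts = [
    ("Q1", "computer", "Vancouver", 1000),
    ("Q1", "phone", "Toronto", 400),
    ("Q2", "computer", "Toronto", 1500),
    ("Q2", "computer", "Vancouver", 500),
]

def compute_all_cuboids(rows, dims=DIMENSIONS):
    """Return {grouping dims: {cell: total}} for all 2**n subsets of dims."""
    cuboids = {}
    for k in range(len(dims) + 1):
        for group in combinations(range(len(dims)), k):
            totals = defaultdict(int)
            for row in rows:
                # The last field of each row is the measure being summed.
                totals[tuple(row[i] for i in group)] += row[-1]
            cuboids[tuple(dims[i] for i in group)] = dict(totals)
    return cuboids

cuboids = compute_all_cuboids(facts)
print(len(cuboids))        # 8 cuboids for 3 dimensions
print(cuboids[()])         # apex cuboid: {(): 3400}
print(cuboids[("item",)])  # {('computer',): 3000, ('phone',): 400}
```

This version rescans the base data once per cuboid for clarity; practical cube computation shares work between cuboids, for example by deriving coarser ones from finer ones.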
Many complex data mining queries can be answered by multifeature cubes without
any significant increase in computational cost compared with cube computation
for simple queries on standard data cubes.
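A multifeature query computes an aggregate that depends on another aggregate, such as "for each group, the maximum price and the total sales of the tuples sold at that maximum price". The sketch below shows the two-pass idea for a single grouping; the rows and the function name max_price_and_sales are hypothetical, and a full multifeature cube would repeat this for every subset of the grouping attributes.

```python
from collections import defaultdict

# Hypothetical sales tuples: (item, region, price, sales).
rows = [
    ("computer", "west", 1200, 3),
    ("computer", "west", 1500, 2),
    ("computer", "east", 1500, 4),
    ("phone", "west", 300, 10),
    ("phone", "east", 300, 5),
]

def max_price_and_sales(rows, key):
    """Per group: the maximum price, and total sales of tuples sold at that price."""
    max_price = {}
    for r in rows:  # pass 1: a simple aggregate (max)
        k = key(r)
        max_price[k] = max(max_price.get(k, r[2]), r[2])
    sales_at_max = defaultdict(int)
    for r in rows:  # pass 2: an aggregate that depends on pass 1
        k = key(r)
        if r[2] == max_price[k]:
            sales_at_max[k] += r[3]
    return {k: (max_price[k], sales_at_max[k]) for k in max_price}

by_item = max_price_and_sales(rows, key=lambda r: r[0])
print(by_item)  # {'computer': (1500, 6), 'phone': (300, 15)}

by_item_region = max_price_and_sales(rows, key=lambda r: (r[0], r[1]))
```

Each pass is an ordinary scan with a grouped aggregate, which is why such queries need not cost much more than a standard cube computation.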
Many data cube applications need to analyze changes in complex measures in
multidimensional space.
For example, in real estate, we may want to ask how the average house price in the
Vancouver area changed in 2004 compared with 2003. The answer could be that the
average price of houses sold to professionals in the West End went down by 20%, while
the average price of those sold to business people in Metrotown went up by 10%, and so on.
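The house-price question amounts to computing a complex measure (an average) per cell for each year and then comparing the cells across years. The sketch below does this for cells defined by buyer group and area. The sale records are made up purely so that the output mirrors the -20% and +10% in the example; average_by_cell and percent_change are hypothetical names.

```python
from collections import defaultdict

# Hypothetical house sales: (year, buyer_group, area, price). Made-up values.
sales = [
    (2003, "professional", "West End", 500_000),
    (2003, "professional", "West End", 700_000),
    (2004, "professional", "West End", 480_000),
    (2003, "business", "Metrotown", 400_000),
    (2004, "business", "Metrotown", 440_000),
]

def average_by_cell(year):
    """Average price per (buyer_group, area) cell for one year."""
    sums, counts = defaultdict(float), defaultdict(int)
    for y, group, area, price in sales:
        if y == year:
            sums[(group, area)] += price
            counts[(group, area)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

def percent_change(old_year, new_year):
    """Percent change of the average price for cells present in both years."""
    old, new = average_by_cell(old_year), average_by_cell(new_year)
    return {cell: 100 * (new[cell] - old[cell]) / old[cell] for cell in old if cell in new}

changes = percent_change(2003, 2004)
print(changes)  # {('professional', 'West End'): -20.0, ('business', 'Metrotown'): 10.0}
```

Note that an average cannot be updated by simply adding the two years' averages; the sketch keeps sums and counts per cell so the average stays correct.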
Data mining is not confined to the analysis of data stored in data warehouses.
It may analyze data existing at more detailed granularities than the summarized data
provided in a data warehouse.
It may also analyze transactional, spatial, textual, and multimedia data that are difficult
to model with current multidimensional database technology.
In this context, data mining covers a broader spectrum than OLAP with respect to data
mining functionality and the complexity of the data handled.
Because data mining involves more automated and deeper analysis than OLAP, data
mining is expected to have broader applications.
Data mining can help business managers find and reach more suitable customers, as well
as gain critical business insights that may help drive market share and raise profits.
In addition, data mining can help managers understand customer group characteristics
and develop optimal pricing strategies accordingly, bundle items based on actual item
groups derived from customer purchase patterns rather than on intuition, reduce
promotional spending, and at the same time increase the overall net effectiveness of
promotions.
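Item groups "derived from customer purchase patterns" can be found by counting which items are bought together. The sketch below counts co-purchased pairs and keeps those above a minimum support; it covers only the pair-counting step of frequent-itemset mining, not a complete algorithm such as Apriori. The transactions, MIN_SUPPORT, and frequent_pairs are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical customer transactions (sets of purchased items).
transactions = [
    {"computer", "antivirus", "printer"},
    {"computer", "antivirus"},
    {"printer", "paper"},
    {"computer", "antivirus", "paper"},
    {"printer", "paper", "computer"},
]
MIN_SUPPORT = 0.4  # hypothetical: a pair must appear in at least 40% of transactions

def frequent_pairs(transactions, min_support=MIN_SUPPORT):
    """Return item pairs bought together in at least min_support of transactions."""
    counts = Counter(
        pair
        for t in transactions
        for pair in combinations(sorted(t), 2)  # sorted so ("a", "b") == ("b", "a") case is unified
    )
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

bundles = frequent_pairs(transactions)
print(bundles)
```

Pairs such as computer and antivirus, which co-occur in most transactions, are candidates for a bundle; rarely co-purchased pairs fall below the support threshold.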