Unit 1

Uploaded by

Bharathi S

Data Warehouse and Data Mining

Course Code: CS811

Credits : 3
Lecture Hours (L: T: P): 39:0:0
Type of Course: Professional Elective Course
Course Outcomes
CO-1 : Understand the various architectures and main components of a data
warehouse.
CO-2 : Comprehend the data mining tasks, the KDD process, domain
information, the issues and challenges.
CO-3 : Apply pre-processing statistical methods and process raw data to make
it suitable for a range of data mining algorithms
CO-4 : Discover and measure interesting patterns from large business datasets.
CO-5 : Apply various clustering and classification algorithms to real world data
UNIT 1:

• Introduction to Data Warehousing: Data Warehouse: Basic Concepts;
Data Warehouse Modeling: Data Cube and OLAP (Data Cube; Stars,
Snowflakes, and Fact Constellations; Typical OLAP Operations); Data
Warehouse Design and Usage.
Scenario 1
• XYZ Pvt. Ltd is a company with branches at Mumbai, Delhi, Chennai and
Bangalore.
• Each branch has a separate operational system.
• The Sales Manager wants a quarterly sales report.
• Extract sales information from each database.
• Store the information in a common repository at a single site
Scenario 2
• One Stop Shopping Super Market has a huge operational database.
• Whenever executives want a report, the OLTP system becomes
slow and data entry operators have to wait for some time.
Solution 2
• Extract data needed for analysis from operational database.
• Store it in another system, the data warehouse.
• Refresh the warehouse at regular intervals so that it contains up-to-date
information for analysis.
• The warehouse will contain data with a historical perspective.
• Why do you need a warehouse?
– Operational systems cannot provide strategic information
– Executives and managers need such strategic information for
• Making proper decisions
• Formulating business strategies
• Establishing goals
• Setting objectives
• Monitoring results
How is a Data warehouse different from a Database? How are they similar?
Differences between a data warehouse and a database:
A data warehouse is a repository of information collected from
multiple sources, over a history of time, stored under a unified schema, and
used for data analysis and decision support,
whereas a database is a collection of interrelated data that
represents the current status of the stored data. There could be multiple
heterogeneous databases where the schema of one database may not agree
with the schema of another. A database system supports ad hoc queries and
on-line transaction processing.
Similarities between a data warehouse and a database:
Both are repositories of information, storing huge amounts of persistent data.
What Is a Data Warehouse?
• Data warehousing provides architectures and tools for business
executives to systematically organize, understand, and use their data to
make strategic decisions.
The term "Data Warehouse" was first coined by Bill Inmon in 1990.
Characteristics of Data Warehousing
The major features of a data warehouse are that it is
• subject-oriented,
• integrated,
• time-variant, and
• nonvolatile.
• Subject-oriented
– A data warehouse is organized around major subjects, such as sales, product,
and customer, rather than around applications.
– Data organized by subject focuses only on the information necessary for decision making.
– It excludes data that are not useful in the decision support process.
• Integration
– A data warehouse is usually constructed by integrating multiple
heterogeneous sources, such as relational databases, flat files, and
online transaction records.
– Data preprocessing techniques, namely data cleaning and data
integration, are applied to ensure consistency, for example in naming
conventions.
• Time-variant
– Provides information from a historical perspective, e.g., the past 5-10 years
– Every key structure contains either implicitly or explicitly an element of
time, i.e., every record has a timestamp.
– The time-variant nature in a DW
• Allows for analysis of the past
• Relates information to the present
• Enables forecasts for the future
• Non-volatile
– A data warehouse is always a physically separate store of data transformed
from the operational environment.
– Due to this it does not require transaction processing, recovery, and
concurrency control mechanisms.
– Data access requires only two operations:
• Initial loading of data
• Access of data
• In sum, a data warehouse is a semantically consistent data store that
serves as a physical implementation of a decision support data model.
• It stores the information an enterprise needs to make strategic decisions.
Operational Systems
• The major task of operational database systems is to perform on-line
transaction and query processing.
• These systems are called on-line transaction processing (OLTP) systems.
• They cover most of the day-to-day operations of an organization, such
as purchasing, inventory, manufacturing, banking, payroll, registration,
and accounting.
Informational systems
• On the other hand, there are other functions that go on within the
enterprise that have to do with planning, forecasting and managing the
organization
• These functions are also critical to the survival of the organization.
• Functions like “marketing planning”, “engineering planning” and
“financial analysis” also require information systems to support them.
• But these functions are different from operational ones, and the types
of systems and information required are also different.
• “Informational systems” have to do with analysing data and making decisions
• Such systems can organize and present data in various formats in order to
accommodate the diverse needs of the different users.
• These systems are known as on-line analytical processing (OLAP) systems.
• The major distinguishing features between OLTP and OLAP are summarized as
follows:
– Users and system orientation
– Data contents
– Database design
– View
– Access pattern
• Users and system orientation
– OLTP : customer-oriented and is used for transaction and query
processing by clerks, clients, and information technology professionals.
– OLAP : market-oriented and is used for data analysis by knowledge
workers, including managers, executives, and analysts
• Data contents
– OLTP: manages current data that, typically, are too detailed to be used
for decision making.
– OLAP: manages large amounts of historical data and provides facilities for
summarization and aggregation.
• Database design
– OLTP: adopts an entity-relationship (ER) data model and an
application-oriented database design.
– OLAP: adopts either a star or snowflake model and a subject-oriented
database design.
• Star schema : A fact table in the middle connected to a set of dimension
tables
• Snowflake schema : A refinement of star schema where some
dimensional hierarchy is normalized into a set of smaller dimension
tables forming a shape similar to snowflake
[Figure: Star schema]
[Figure: Snowflake schema]
• View
– An OLTP system focuses mainly on the current data.
– An OLAP system often spans multiple versions of a database schema.
– OLAP systems also deal with information that originates from, and is
integrated across, many data stores.
– Because of their huge volume, OLAP data are stored on multiple
storage media.
• Access patterns
– The access patterns of an OLTP system consist mainly of short, atomic
transactions.
– Such a system requires concurrency control and recovery mechanisms.
– However, accesses to OLAP systems are mostly read-only operations
• Other features that distinguish between OLTP and OLAP systems are
summarized in the following Table
Why Have a Separate Data Warehouse?

• Because operational databases store huge amounts of data, one may ask:
“Why not perform online analytical processing directly on such
databases instead of spending additional time and resources to
construct a separate data warehouse?”
• High performance of both systems
– Operational DBMS: tuned for known tasks, such as searching for particular
records and optimizing “canned” queries
– Warehouse: tuned for OLAP, i.e., complex OLAP queries, multidimensional views,
and consolidation
• Different functions and different data
– Missing data: decision support requires historical data, which operational DBs
do not typically maintain
– Data consolidation: decision support requires consolidation (aggregation,
summarization) of data from heterogeneous systems
• Processing OLAP queries in operational databases would substantially
degrade the performance for operational tasks.
Data warehouse architecture

• Data warehouse architecture is based on a relational database
management system (RDBMS) server that functions as the central repository for
informational data.
• In the data warehouse architecture, operational data and processing are
completely separate from data warehouse processing.
• Bottom tier
– Back-end tools and utilities are used to feed data into the bottom tier from
operational databases
– These tools and utilities perform data extraction, cleaning, and transformation
as well as load and refresh functions to update the data warehouse.
– The data are extracted using application program interfaces known as
gateways.
– A gateway is supported by the underlying DBMS and allows client programs to
generate SQL code to be executed at a server (e.g., ODBC, JDBC).
– This tier also contains a metadata repository, which stores information about
the data warehouse and its contents.
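The extract, clean, transform, and load steps performed by the back-end tools can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not a description of any particular tool: the two branch record lists, the field names, and the functions `clean`, `transform`, and `load` are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical extracts from two branch operational systems (made-up rows).
mumbai_rows = [
    {"item": "TV ", "amount": "1200", "sold_on": "2016-01-15"},
    {"item": "phone", "amount": None, "sold_on": "2016-02-03"},  # missing amount
]
delhi_rows = [
    {"item": "Phone", "amount": "300", "sold_on": "2016-04-20"},
]

def clean(rows):
    """Drop records with a missing amount and normalize item names."""
    return [
        {**r, "item": r["item"].strip().lower()}
        for r in rows
        if r["amount"] is not None
    ]

def transform(rows, branch):
    """Convert to one unified format: (branch, item, quarter, amount)."""
    out = []
    for r in rows:
        year, month, _day = r["sold_on"].split("-")
        quarter = f"{year}-Q{(int(month) - 1) // 3 + 1}"
        out.append((branch, r["item"], quarter, float(r["amount"])))
    return out

def load(conn, records):
    """Append the transformed records to the warehouse table."""
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (branch TEXT, item TEXT, quarter TEXT, amount REAL)"
)
for branch, rows in [("Mumbai", mumbai_rows), ("Delhi", delhi_rows)]:
    load(warehouse, transform(clean(rows), branch))

loaded = warehouse.execute(
    "SELECT branch, item, quarter, amount FROM sales ORDER BY branch"
).fetchall()
print(loaded)
```

A refresh would rerun the same pipeline on newly extracted rows; a real gateway (ODBC, JDBC) would replace the hard-coded lists.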
Data Warehouse Models:
• From the architecture point of view, there are three data warehouse
models:
– the enterprise warehouse,
– the data mart, and
– the virtual warehouse.
• Enterprise warehouse
– An enterprise warehouse collects all of the information about subjects
spanning the entire organization.
– It provides corporate-wide data integration, usually from one or more
operational systems.
– An enterprise data warehouse may be implemented on traditional
mainframes or computer superservers.
• Data mart:
– A data mart contains a subset of corporate-wide data
– The scope is confined to specific selected subjects.
For example, a marketing data mart may confine its subjects to customer, item,
and sales.
– Data marts are usually implemented on low-cost servers that are Unix/Linux or
Windows based.
– Depending on the source of data, data marts can be independent or dependent:
• Independent data marts are sourced from data captured from one or more
operational systems or external information providers.
• Dependent data marts are sourced directly from enterprise data warehouses.
• Virtual warehouse
– A virtual warehouse is a set of views over operational databases.
– Only some of the possible summary views may be materialized.
– A virtual warehouse is easy to build but requires excess capacity on
operational database servers
Data Warehouse Metadata
• Metadata are data about data. When used in a data warehouse,
metadata are the data that define warehouse objects.
• Metadata are created for the data names and definitions of the given
warehouse.
• Additional metadata are created and captured for time-stamping any
extracted data, the source of the extracted data, and missing fields that
have been added by data cleaning or integration processes.
A metadata repository should contain:
• A description of the data warehouse structure, which includes the
warehouse schema, views, dimensions, hierarchies, and
derived data definitions, as well as data mart locations and contents.
• Operational metadata, which include data lineage (history of migrated
data and the sequence of transformations applied to it), currency of
data (active, archived, or purged), and monitoring information
(warehouse usage statistics, error reports, and audit trails).
• The algorithms used for summarization, which include measure and
dimension definition algorithms, data on granularity, partitions, subject
areas, aggregation, summarization, and predefined queries and reports.
• The mapping from the operational environment to the data
warehouse, which includes source databases and their contents,
gateway descriptions, data partitions, data extraction, cleaning,
transformation rules and defaults, data refresh and purging rules, and
security (user authorization and access control).
• Data related to system performance, which include indices and profiles
that improve data access and retrieval performance, in addition to
rules for the timing and scheduling of refresh, update, and replication
cycles.
• Business metadata, which include business terms and definitions, data
ownership information, and charging policies.
Data Warehouse Modeling: Data Cube and Online analytical
processing (OLAP)
• OLAP systems are data warehouse front-end software tools to make aggregate
data available efficiently, for advanced analysis, to managers of an enterprise.
• Data warehouses and OLAP tools are based on a multidimensional data model.
• This model views data in the form of a data cube.
• In this section, you will learn
– how data cubes model n-dimensional data,
– what concept hierarchies are, and
– how they can be used in basic OLAP operations to allow interactive mining at
multiple levels of abstraction.
• Typical OLAP operations: roll-up, drill-down, slice and dice, pivot (rotate).
Why cubes?
• A data cube is meant to be used by application builders who want to provide
analytical functionality.
• It gives a logical view of the analyzed data:
– how analysts look at data,
– how they think of data,
– not how the data are physically implemented in the data stores.
Data Cube: A Multidimensional Data Model
• “What is a data cube?”
– It is a multidimensional structure that contains information for
analytical purposes.
– The main constituents of a cube are dimensions and measures (or facts).
– Dimensions define the structure of the cube that you slice and
dice over.
– Measures provide aggregated numerical values of interest to the end
user.
• In general terms, dimensions are the perspectives or entities with
respect to which an organization wants to keep records.
• Ex. AllElectronics may create a sales data warehouse in order to keep
records of the store’s sales with respect to the dimensions time, item,
branch, and location.
– These dimensions let the store keep track of things like monthly sales of
items and the branches and locations at which the items were sold.
• Each dimension may have a table associated with it, called a dimension
table, which further describes the dimension
For example, a dimension table for item may contain the attributes
item_name, brand, and type
• Dimension tables can be specified by users or experts, or generated and
adjusted based on data distributions
• A multidimensional data model is typically organized around a central
theme, such as SALES
• This theme is represented by a fact table
• The fact table contains the names of the facts, or measures, as well as
keys to each of the related dimension tables.
• To gain a better understanding of data cubes and the multidimensional
data model, let’s start by looking at a simple 2-D data cube
• Consider sales data from AllElectronics
• In particular, we will look at the AllElectronics sales data for items sold
per quarter in the city of Vancouver.
• In this 2-D representation, the sales for Vancouver are shown with
respect to
– the time dimension (organized in quarters) and
– the item dimension (organized according to the types of items sold).
– The fact or measure displayed is dollars sold (in thousands)
• [Table: 2-D representation of AllElectronics sales data for items sold per
quarter in the city of Vancouver]
• Now, suppose that we would like to view the sales data with a third
dimension.
– For instance, suppose we would like to view the data according to time and
item, as well as location, for the cities Chicago, New York, Toronto, and
Vancouver.
– These 3-D data are shown in Table ….
• The 3-D data in the table are represented as a series of 2-D tables.
• Conceptually, we may also represent the same data in the form
of a 3-D data cube.
Efficient Computation of Data Cubes
• At the core of multidimensional data analysis is the efficient
computation of aggregations across many sets of dimensions
• In SQL terms, these aggregations are referred to as group-by’s.
• Each group-by can be represented by a cuboid
• The set of group-by’s forms a lattice of cuboids defining a data
cube.
• We will explore issues relating to the efficient computation of data cubes.
• In the data warehousing research literature, a data cube like those
shown in Figure is often referred to as a cuboid.
• Given a set of dimensions, we can generate a cuboid for each of the
possible subsets of the given dimensions.
• The result would form a lattice of cuboids, each showing the data at a
different level of summarization, or group-by.
• Let us understand this through a simple example.
• For three dimensions (a, b, c), the possible group-by’s are
{(a,b,c), (a,b), (a,c), (b,c), (a), (b), (c), ()}
Example
• Suppose that you would like to create a data cube for AllElectronics
sales that contains the following: city, item, year, and sales in dollars.
You would like to be able to analyze the data, with queries such as the
following:
– “Compute the sum of sales, grouping by city and item.”
– “Compute the sum of sales, grouping by city.”
– “Compute the sum of sales, grouping by item.”
• What is the total number of cuboids, or group-by’s, that can be computed for this
data cube?
• Take the three attributes city, item, and year as the dimensions for the data
cube, and sales in dollars as the measure.
• The total number of cuboids, or group-by’s, that can be computed for this data
cube is $2^3 = 8$.
• The possible group-by’s are the following:
{ (city, item, year),
(city, item), (city, year), (item, year),
(city), (item), (year), () }
where () means that the group-by is empty, i.e., the dimensions are not grouped.
• These group-by’s form a lattice of cuboids for the data cube.
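The enumeration of group-by's above can be reproduced with a short Python sketch. The function name `cuboids` is an assumption made for illustration; the idea is simply that each cuboid corresponds to one subset of the dimensions.

```python
from itertools import combinations

def cuboids(dimensions):
    """Return every group-by (subset of the dimensions), base cuboid first, apex last."""
    result = []
    for k in range(len(dimensions), -1, -1):
        result.extend(combinations(dimensions, k))
    return result

lattice = cuboids(("city", "item", "year"))
for group_by in lattice:
    print(group_by)
```

The list starts with the base cuboid `('city', 'item', 'year')` and ends with the empty tuple `()`, which stands for the apex cuboid. For four dimensions the same function yields 16 subsets.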
• The apex cuboid, or 0-D cuboid, refers to the case where the group-by is empty.
It contains the total sum of all sales.
The apex cuboid is the most generalized cuboid and is often
denoted by all.
• The base cuboid contains all three dimensions: city, item, and year.
It can return the total sales for any combination of the three dimensions.
The base cuboid is the least generalized cuboid.
An SQL query containing
• no group-by, such as “compute the sum of total sales,” is a
zero-dimensional operation;
• one group-by, such as “compute the sum of sales, group by city,” is a
one-dimensional operation;
• one group-by on two attributes, such as “compute the sum of sales, group by
city and item,” is a two-dimensional operation.
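As a sketch of these three cases, the queries can be run from Python against an in-memory SQLite table. The table name `sales`, its columns, and the four rows are made-up values used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, item TEXT, year INTEGER, dollars REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Vancouver", "TV", 2015, 100.0),
        ("Vancouver", "phone", 2015, 50.0),
        ("Toronto", "TV", 2016, 70.0),
        ("Toronto", "TV", 2015, 30.0),
    ],
)

# 0-D: no group-by, a single grand total (the apex cuboid).
total = conn.execute("SELECT SUM(dollars) FROM sales").fetchone()[0]

# 1-D: group by city.
by_city = conn.execute(
    "SELECT city, SUM(dollars) FROM sales GROUP BY city ORDER BY city"
).fetchall()

# 2-D: group by city and item.
by_city_item = conn.execute(
    "SELECT city, item, SUM(dollars) FROM sales "
    "GROUP BY city, item ORDER BY city, item"
).fetchall()

print(total, by_city, by_city_item)
```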
Data Mining Query Language (DMQL)
• DMQL was proposed by Han, Fu, Wang, et al. for the DBMiner data mining
system.
• DMQL is based on SQL.
• DMQL can be designed to support ad hoc and interactive data mining.
• DMQL provides commands for specifying primitives.
• DMQL can be used to define data mining tasks.
• In particular, we examine how to define cubes, dimensions, and shared dimensions.
Cube definition syntax in DMQL

Defining Star Schema in DMQL

• A statement such as
compute cube sales_star
would explicitly instruct the system to compute the sales
aggregate cuboids for all 16 subsets of the set {time, item,
branch, location}, including the empty subset.
Defining Snowflake Schema

Defining Fact Constellation in DMQL

The Role of Concept Hierarchies
• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to
higher-level, more general concepts.
• A concept hierarchy organizes concepts (attribute values) hierarchically and is usually
associated with each dimension in a data warehouse.
• Concept hierarchies facilitate drilling and rolling in data warehouses to view data at
multiple granularities.
• Hierarchies can be explicitly specified by domain experts and/or data warehouse
designers.
• Consider a concept hierarchy for the dimension location.
[Figure: Hierarchical and lattice structures of attributes in warehouse
dimensions: (a) a hierarchy for location and (b) a lattice for time.]
Number of Cuboids
• How many cuboids are there in an n-dimensional data cube?
– If there were no hierarchies associated with each dimension, then the total
number of cuboids for an n-dimensional data cube, as we have seen, is $2^n$.
– For dimensions (Product, Region, City), $2^n = 2^3 = 8$ cuboids.
– However, in practice, many dimensions do have hierarchies.
– For an n-dimensional data cube, the total number of cuboids that can be
generated, including hierarchies, is
$$T = \prod_{i=1}^{n} (L_i + 1)$$
where $L_i$ is the number of levels associated with dimension $i$. One is added
to $L_i$ to include the virtual top level, all.
Dim_Product                 Dim_Region                      Dim_Time
Class    Item    Product    Country  State      City        Year  Month  Day
Class 1  Item 1  Camera     India    Karnataka  Mysore      2016  2      3
Class 2  Item 2  DVD        India    Tamilnadu  Salem       2015  4      2
Class 3  Item 3  LED        India    Kerala     Kozhikode   2014  5      1
…        …       …          …        …          …           …     …      …

• The Product dimension has three levels (class, item, product).
• The Region dimension has three levels (country, state, city).
• The Time dimension has three levels (year, month, day).
• Thus, this cube will generate $(3+1)(3+1)(3+1) = 4^3 = 64$ cuboids.
• Cuboids such as {(product, city, day), (product, city, month), (product,
city, year), …, (all)}
• Similarly, if n = 10 and each dimension has one level, then
$T = 2^{10} = 1024$.
• If n = 10 and each dimension has 4 levels, then
$T = 5^{10} = 9{,}765{,}625$.
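These counts can be checked with a few lines of Python. The sketch assumes the usual counting rule in which each dimension contributes its number of hierarchy levels plus one (for the virtual top level, all); the function name `total_cuboids` is hypothetical.

```python
from math import prod

def total_cuboids(levels_per_dimension):
    """Multiply (levels + 1) over all dimensions; the +1 is the virtual level 'all'."""
    return prod(levels + 1 for levels in levels_per_dimension)

print(total_cuboids([1] * 10))   # 10 dimensions, one level each
print(total_cuboids([4] * 10))   # 10 dimensions, four levels each
print(total_cuboids([3, 3, 3]))  # Product, Region, Time with three levels each
```

With no hierarchies (one level per dimension) the rule reduces to 2 raised to the number of dimensions.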
• Exercise: Construct a lattice of cuboids forming a data
cube for the dimensions time, item, location, and supplier.
• Star schema:
– The most common modeling paradigm is the star schema, in which the
data warehouse contains
1. A large central table (fact table) containing the bulk of the data, with
no redundancy, and
2. A set of smaller attendant tables (dimension tables), one for each
dimension.
– The schema graph resembles a starburst, with the dimension tables
displayed in a radial pattern around the central fact table.
• Star schema for AllElectronics sales with four dimensions: time, item,
branch, and location.
• The schema contains a central fact table for sales that contains keys to
each of the four dimensions, along with two measures: dollars sold and
units sold
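A star schema of this shape can be sketched with Python's built-in `sqlite3` module. The table names, column names, and rows below are hypothetical choices for illustration; only the overall structure (one fact table holding four dimension keys and the two measures) follows the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
-- Fact table: one key per dimension plus the two measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);
INSERT INTO time_dim VALUES (1, 'Q1', 2016), (2, 'Q2', 2016);
INSERT INTO item_dim VALUES (1, 'Model A', 'BrandX', 'TV'), (2, 'Model B', 'BrandY', 'phone');
INSERT INTO branch_dim VALUES (1, 'B1');
INSERT INTO location_dim VALUES (1, 'Vancouver', 'Canada'), (2, 'Toronto', 'Canada');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 1, 500.0, 2),
    (1, 2, 1, 1, 200.0, 1),
    (2, 1, 1, 2, 750.0, 3);
""")

# A typical star join: the fact table joined to two of its dimension tables.
rows = conn.execute("""
    SELECT l.city, t.quarter, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON f.time_key = t.time_key
    JOIN location_dim AS l ON f.location_key = l.location_key
    GROUP BY l.city, t.quarter
    ORDER BY l.city, t.quarter
""").fetchall()
print(rows)
```

In a snowflake variant, `location_dim` would be split further, for example into a separate city table referenced by key.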
• Snowflake schema:
– The snowflake schema is a variant of the star schema model, where
some dimension tables are normalized, thereby further splitting the
data into additional tables.
– The resulting schema graph forms a shape similar to a snowflake.
• Fact constellation:
– Sophisticated applications may require multiple fact tables to share
dimension tables.
– This kind of schema can be viewed as a collection of stars, and hence is
called a galaxy schema or a fact constellation.
– For each star schema it is possible to construct a fact constellation schema.
Ex.: splitting the original star schema into several star schemas, each of which
describes facts at a different level of the dimension hierarchies.
• The main shortcoming of the fact constellation schema is a more
complicated design, because many variants for aggregation must be
considered and selected.
Data Cube Measures: Three Categories

• Distributive: if the result derived by applying the function to n aggregate values is the same as that
derived by applying the function on all the data without partitioning
• E.g., count(), sum(), min(), max()
• Algebraic: if it can be computed by an algebraic function with M arguments (where M is a bounded
integer), each of which is obtained by applying a distributive aggregate function
• E.g., avg(), min_N(), standard_deviation()
• Holistic: if there is no constant bound on the storage size needed to describe a subaggregate.
• E.g., median(), mode(), rank()
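The three categories can be illustrated with a small Python sketch over partitioned data. The partitions and values are made up; the point is only that sum combines directly from partial results, avg combines from a bounded number of partial results (sum and count), and median does not combine from per-partition medians.

```python
from statistics import median

partitions = [[4, 8, 6], [10, 2], [7]]          # made-up data split into chunks
all_values = [v for part in partitions for v in part]

# Distributive: the sum of the per-partition sums equals the sum over all data.
partial_sums = [sum(p) for p in partitions]
assert sum(partial_sums) == sum(all_values)

# Algebraic: avg is derived from two distributive results, sum and count.
partial_counts = [len(p) for p in partitions]
avg = sum(partial_sums) / sum(partial_counts)

# Holistic: combining per-partition medians does not give the true median here.
median_of_medians = median(median(p) for p in partitions)
true_median = median(all_values)

print(avg, median_of_medians, true_median)
```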

Typical OLAP operations
Roll-up:
• The roll-up operation (also called the drill-up operation by some vendors) performs aggregation on a
data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction.
• For example, take the location hierarchy defined as the total order “street < city < province or state < country.” A roll-up
on location aggregates the data by ascending the location hierarchy from the level of city to the
level of country.
• In other words, rather than grouping the data by city, the resulting cube groups the data by country.
• When roll-up is performed by dimension reduction, one or more dimensions are removed from the
given cube.
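Both forms of roll-up can be sketched in Python over a cube stored as a dictionary of cells. The sales figures, the city-to-country mapping, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical base cells: (city, quarter) -> dollars sold (made-up figures).
sales = {
    ("Vancouver", "Q1"): 100, ("Vancouver", "Q2"): 120,
    ("Toronto", "Q1"): 80,    ("Toronto", "Q2"): 90,
    ("Chicago", "Q1"): 150,   ("Chicago", "Q2"): 160,
}
# One step of the location hierarchy: city < country.
city_to_country = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}

def roll_up_location(cells, mapping):
    """Roll up by climbing the hierarchy: regroup city-level cells by country."""
    out = defaultdict(int)
    for (city, quarter), dollars in cells.items():
        out[(mapping[city], quarter)] += dollars
    return dict(out)

def roll_up_drop_time(cells):
    """Roll up by dimension reduction: remove the time dimension entirely."""
    out = defaultdict(int)
    for (city, _quarter), dollars in cells.items():
        out[city] += dollars
    return dict(out)

by_country = roll_up_location(sales, city_to_country)
by_city = roll_up_drop_time(sales)
print(by_country)
print(by_city)
```

Going the other way requires the more detailed cells themselves: the country totals alone cannot be split back into cities.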
Drill-down
• Drill-down is the reverse of roll-up. It navigates from less detailed data to more detailed data.
• Drill-down can be realized by either stepping down a concept hierarchy for a dimension or introducing
additional dimensions.
• For example, a drill-down on the central cube can step down a concept
hierarchy for time defined as “day < month < quarter < year.”
• Drill-down occurs by descending the time hierarchy from the level of quarter to the more detailed level
of month. The resulting data cube details the total sales per month rather than summarizing them by
quarter.
Slice and dice
• The slice operation performs a selection on one dimension of the given cube, resulting in a subcube.
• For example, a slice can select the sales data from the central cube for the dimension time
using the criterion time = “Q1”.
• The dice operation defines a subcube by performing a selection on two or more dimensions.
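Slice and dice can be sketched as filters over the cells of a cube. The cube contents, the dimension order in `DIMS`, and the function names `slice_` and `dice` are assumptions made for this illustration.

```python
# Hypothetical cube cells: (city, quarter, item) -> dollars sold (made-up figures).
cube = {
    ("Vancouver", "Q1", "TV"): 100, ("Vancouver", "Q2", "TV"): 120,
    ("Vancouver", "Q1", "phone"): 40, ("Toronto", "Q1", "TV"): 80,
    ("Toronto", "Q2", "phone"): 60, ("Chicago", "Q1", "phone"): 70,
}
DIMS = ("city", "quarter", "item")

def dice(cells, **allowed):
    """Keep cells whose coordinate on each named dimension is in the allowed set."""
    idx = {name: DIMS.index(name) for name in allowed}
    return {
        key: value
        for key, value in cells.items()
        if all(key[idx[name]] in values for name, values in allowed.items())
    }

def slice_(cells, dim, value):
    """A slice is a selection on a single dimension."""
    return dice(cells, **{dim: {value}})

q1 = slice_(cube, "quarter", "Q1")                              # time = "Q1"
sub = dice(cube, city={"Toronto", "Vancouver"}, item={"TV"})    # two dimensions
print(q1)
print(sub)
```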
Pivot (rotate)
• Pivot (also called rotate) is a visualization operation that rotates the data axes in view in order to provide
an alternative presentation of the data.
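For a 2-D view, a pivot amounts to swapping the row and column axes. The following Python sketch uses made-up labels and figures; the function name `pivot` is hypothetical.

```python
# Hypothetical 2-D view: rows are items, columns are cities (made-up figures).
items = ["TV", "phone"]
cities = ["Vancouver", "Toronto", "Chicago"]
view = [
    [100, 80, 150],   # TV
    [40, 60, 70],     # phone
]

def pivot(row_labels, col_labels, table):
    """Swap the two axes: rows become columns and columns become rows."""
    rotated = [list(column) for column in zip(*table)]
    return col_labels, row_labels, rotated

new_rows, new_cols, rotated = pivot(items, cities, view)
print(new_rows, new_cols, rotated)
```

The values are unchanged; only their presentation differs, which is why pivot is described as a visualization operation.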
Steps for the Design and Construction of Data Warehouses
“What can business analysts gain from having a data warehouse?”
• First, having a data warehouse may provide a competitive advantage by presenting relevant information from
which to measure performance and make critical adjustments in order to help win over competitors.
• Second, a data warehouse can enhance business productivity because it is able to quickly and efficiently gather
information that accurately describes the organization.
• Third, a data warehouse facilitates customer relationship management because it provides a consistent view of
customers and items across all lines of business, all departments, and all markets.
• Finally, a data warehouse may bring about cost reduction by tracking trends, patterns, and exceptions over long
periods in a consistent and reliable manner.
• To design an effective data warehouse, we need to understand and analyze business needs and
construct a business analysis framework.
• The construction of a large and complex information system can be viewed as the construction
of a large and complex building, for which the owner, architect, and builder have different
views.
• These views are combined to form a complex framework that represents the top-down,
business-driven, or owner’s perspective, as well as the bottom-up, builder-driven, or
implementer's view of the information system.
• Four different views regarding the design of a data warehouse must be considered: the top-down view,
the data source view, the data warehouse view, and the business query view.
• The top-down view allows the selection of the relevant information necessary for the data warehouse.
This information matches the current and future business needs.
• The data source view exposes the information being captured, stored, and managed by operational
systems. This information may be documented at various levels of detail and accuracy, from individual
data source tables to integrated data source tables.
• Data sources are often modeled by traditional data modeling techniques, such as the
entity-relationship model or CASE (computer-aided software engineering) tools.
• The data warehouse view includes fact tables and dimension tables. It represents the information that
is stored inside the data warehouse, including precalculated totals and counts, as well as information
regarding the source, date, and time of origin, added to provide historical context.
• Finally, the business query view is the perspective of data in the data warehouse from the viewpoint
of the end user.
The warehouse design process consists of the following steps.
1. Choose a business process to model, for example, orders, invoices, shipments, inventory, account
administration, sales, or the general ledger.
– If the business process is organizational and involves multiple complex object collections, a data
warehouse model should be followed. However, if the process is departmental and focuses on the
analysis of one kind of business process, a data mart model should be chosen.
2. Choose the grain of the business process. The grain is the fundamental, atomic level of data to be
represented in the fact table for this process, for example, individual transactions, individual daily
snapshots, and so on.
3. Choose the dimensions that will apply to each fact table record. Typical dimensions are time, item,
customer, supplier, warehouse, transaction type, and status.
4. Choose the measures that will populate each fact table record. Typical measures are numeric additive
quantities like dollars sold and units sold.
Indexing OLAP Data: Bitmap Index
• Index on a particular column
• Each value in the column has a bit vector: bit-op is fast
• The length of the bit vector: # of records in the base table
• The i-th bit is set if the i-th row of the base table has the value for the indexed column
• Not suitable for high-cardinality domains
– A more recent bit compression technique, Word-Aligned Hybrid (WAH), makes it work for high-cardinality domains as well
[Wu, et al. TODS’06]

Base table:
Cust  Region   Type
C1    Asia     Retail
C2    Europe   Dealer
C3    Asia     Dealer
C4    America  Retail
C5    Europe   Dealer

Index on Region:
RecID  Asia  Europe  America
1      1     0       0
2      0     1       0
3      1     0       0
4      0     0       1
5      0     1       0

Index on Type:
RecID  Retail  Dealer
1      1       0
2      0       1
3      0       1
4      1       0
5      0       1
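The bitmap index in the example above can be rebuilt with a short Python sketch. Bit vectors are stored as plain lists of 0/1 for readability (a real system would pack them into machine words); the function name `build_bitmap_index` is hypothetical.

```python
# Base table from the example above: (Cust, Region, Type).
base_table = [
    ("C1", "Asia", "Retail"),
    ("C2", "Europe", "Dealer"),
    ("C3", "Asia", "Dealer"),
    ("C4", "America", "Retail"),
    ("C5", "Europe", "Dealer"),
]

def build_bitmap_index(rows, column):
    """Map each distinct value of a column to a bit vector with one bit per row."""
    index = {}
    for i, row in enumerate(rows):
        index.setdefault(row[column], [0] * len(rows))[i] = 1
    return index

region_index = build_bitmap_index(base_table, 1)
type_index = build_bitmap_index(base_table, 2)

# Answer "Region = Asia AND Type = Dealer" with a bitwise AND of two vectors.
hits = [a & b for a, b in zip(region_index["Asia"], type_index["Dealer"])]
matching = [base_table[i][0] for i, bit in enumerate(hits) if bit]
print(region_index["Asia"], type_index["Dealer"], matching)
```

The selection is answered by combining two bit vectors, which is why bit operations make this index fast for low-cardinality columns.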
Indexing OLAP Data: Join Indices

• Join index: JI(R-id, S-id) where R (R-id, …) ⋈ S (S-id, …)
• Traditional indices map the values to a list of record ids
– A join index materializes the relational join in the JI file and so speeds up relational joins
• In data warehouses, a join index relates the values of the dimensions of a star
schema to rows in the fact table.
– E.g. fact table: Sales and two dimensions city and product
• A join index on city maintains for each distinct city a list of R-IDs of the
tuples recording the Sales in the city
– Join indices can span multiple dimensions
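A join index for the Sales example can be sketched in Python as a mapping from each dimension value to the row ids of the matching fact tuples. The fact rows, row ids, and the function name `build_join_index` are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table Sales: row id -> (city, product, dollars); made-up rows.
sales = {
    "R1": ("Mysore", "Camera", 300),
    "R2": ("Salem", "DVD", 50),
    "R3": ("Mysore", "LED", 400),
    "R4": ("Salem", "Camera", 320),
}

def build_join_index(fact_rows, position):
    """For each distinct dimension value, list the fact-table row ids that join to it."""
    index = defaultdict(list)
    for rid, row in fact_rows.items():
        index[row[position]].append(rid)
    return dict(index)

city_index = build_join_index(sales, 0)
product_index = build_join_index(sales, 1)

# Sales in Mysore are fetched by row id, without scanning or joining the tables.
mysore_total = sum(sales[rid][2] for rid in city_index["Mysore"])

# A join index spanning two dimensions: (city, product) -> row ids.
city_product_index = defaultdict(list)
for rid, (city, product, _dollars) in sales.items():
    city_product_index[(city, product)].append(rid)

print(city_index, mysore_total)
```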

Chapter 3 Data Mining
No ratings yet
Chapter 3 Data Mining
3 pages
Data Warehouse Components
No ratings yet
Data Warehouse Components
26 pages
3.1 What Is Data Warehouse?: Unit Iii
No ratings yet
3.1 What Is Data Warehouse?: Unit Iii
33 pages
DWDM Lecturenotes PDF
No ratings yet
DWDM Lecturenotes PDF
133 pages
Data Mining-Data Warehouse
No ratings yet
Data Mining-Data Warehouse
7 pages
DWDM Unit-2 PDF
No ratings yet
DWDM Unit-2 PDF
149 pages
DM Part 2
No ratings yet
DM Part 2
24 pages
Muthyam Resume
No ratings yet
Muthyam Resume
2 pages
CSE 530 - Database Management Systems: Data Warehousing Presentation by Ali Gardezi Prashanth Janardanan Aaron Sheffield
No ratings yet
CSE 530 - Database Management Systems: Data Warehousing Presentation by Ali Gardezi Prashanth Janardanan Aaron Sheffield
69 pages
6.interview Questions
No ratings yet
6.interview Questions
59 pages
Ghouse Moinuddin Mohammad Email: - Phone: (872) 228-9552 Professional Summary
No ratings yet
Ghouse Moinuddin Mohammad Email: - Phone: (872) 228-9552 Professional Summary
9 pages
Data Warehousing
No ratings yet
Data Warehousing
154 pages
Current and Emerging Trends Transparencies: © Pearson Education Limited, 2004 1
No ratings yet
Current and Emerging Trends Transparencies: © Pearson Education Limited, 2004 1
85 pages
Data Science Terminology Flashcards - Quizlet
100% (1)
Data Science Terminology Flashcards - Quizlet
15 pages
Unit-1 Data Warehousing
No ratings yet
Unit-1 Data Warehousing
17 pages
Snowpro Core
No ratings yet
Snowpro Core
55 pages
Data Warehousing Logical Design
100% (1)
Data Warehousing Logical Design
23 pages
Top 50 Informatica Interview Questions & Answers
No ratings yet
Top 50 Informatica Interview Questions & Answers
11 pages
DWMquestion Bank
No ratings yet
DWMquestion Bank
5 pages
640005
No ratings yet
640005
4 pages
Literature Review Datawarehouse
100% (1)
Literature Review Datawarehouse
40 pages
Elective-I Advanced Database Management Systems
No ratings yet
Elective-I Advanced Database Management Systems
67 pages
Pengenalan Data Mining
No ratings yet
Pengenalan Data Mining
25 pages
SR 3
No ratings yet
SR 3
11 pages
An Iot Based Smart Energy Management of Hvac System
No ratings yet
An Iot Based Smart Energy Management of Hvac System
9 pages
Course Description
No ratings yet
Course Description
3 pages
Module 1: Infrastructure and Capacity Planning For Microsoft Dynamics Ax 2012 Module Overview
No ratings yet
Module 1: Infrastructure and Capacity Planning For Microsoft Dynamics Ax 2012 Module Overview
34 pages
The Cognos BI 10.1.1 Dynamic Query Cookbook - IBM Developer
No ratings yet
The Cognos BI 10.1.1 Dynamic Query Cookbook - IBM Developer
69 pages
Oracle 10g Database Administrator: Implementation and Administration
No ratings yet
Oracle 10g Database Administrator: Implementation and Administration
35 pages
Concepts and Techniques: Data Mining
No ratings yet
Concepts and Techniques: Data Mining
61 pages
SEM 5 - IT - DOC 1-Advance Data Management Technologies 2024 MAY To 2022 DEC PYQ - Aeraxia - in
No ratings yet
SEM 5 - IT - DOC 1-Advance Data Management Technologies 2024 MAY To 2022 DEC PYQ - Aeraxia - in
4 pages
Full Download Information Systems in Organizations 1st Edition Patricia Wallace Test Bank All Chapter 2024 PDF
100% (20)
Full Download Information Systems in Organizations 1st Edition Patricia Wallace Test Bank All Chapter 2024 PDF
44 pages
Case Study Assignment 2
No ratings yet
Case Study Assignment 2
76 pages
QlikView SAP Connector v5.60 - SR2 Reference Manual
No ratings yet
QlikView SAP Connector v5.60 - SR2 Reference Manual
73 pages
Advantages of Multidimensional Data Model
No ratings yet
Advantages of Multidimensional Data Model
6 pages
Project Guides
No ratings yet
Project Guides
5 pages
DE Skills and Tools Guide
No ratings yet
DE Skills and Tools Guide
20 pages
Data Mining Questions
No ratings yet
Data Mining Questions
9 pages
Learn Data Warehousing in 24 Hours
From Everand
Learn Data Warehousing in 24 Hours
Alex Nordeen
No ratings yet
Practical Data Strategies and Recipes
From Everand
Practical Data Strategies and Recipes
Tom Henricksen
No ratings yet

Unit 1

Uploaded by

Unit 1

Uploaded by

Data Warehouse and Data Mining

Course Code: CS811

• Introduction to Data Warehousing: Data Warehouse: Basic Concepts,

• Making proper decision

• Because operational databases store huge amounts of data

“Why not perform online analytical processing directly on such

• Data warehouse architecture is based on relational database

{ (city, item, year),

one group-by, such as “compute the sum of sales, group

one group-by, such as “compute the sum of sales, group by

 The DMQL is actually based on the SQL

 DMQL can be designed to support ad hoc and interactive data mining.

 DMQL provides commands for specifying primitives.

 DMQL can be used to define data mining tasks

 Particularly we examine how to define Cube, Dimension and Shared Dimensions

would explicitly instruct the system to compute the sales

 The Product dimension has three hierarchies (class, item, product)

 Cuboids such as {(product, city, day), (product, city, month), (product,

 Similarly, If n=10 and each dimension has one level, then T=

– The snowflake schema is a variant of the star schema model, where

“What can business analysts gain from having a data warehouse?”

Base table Index on Region Index on Type

• Join index: JI(R-id, S-id) where R (R-id, …)  S (S-id, …)

You might also like