Module-1

Uploaded by

Raghvendra Sharma

Welcome

DATA WAREHOUSING AND DATA MINING - 21CS732

29-11-2024
Department of Information Science and Engg
1
Transform Here
Modules and High Level Topics

Module – 1: Data warehousing and OLAP

Module – 2: Data warehouse implementation & Data Mining
Module – 3: Association Analysis Methods
Module – 4: Classification Methods
Module – 5: Clustering Analysis Methods

Detailed Syllabus – Module Wise
Module-1: Data warehousing and OLAP : Basic Concepts: Data
Warehousing: A multitier Architecture, Data warehouse models: Enterprise
warehouse, Data mart and virtual warehouse, Extraction, Transformation and
loading, Data Cube: A multidimensional data model, Stars, Snowflakes and
Fact constellations: Schemas for multidimensional Data models, Dimensions:
The role of concept Hierarchies, Measures: Their Categorization and
computation, Typical OLAP Operations

Module-2: Data warehouse implementation & Data mining: Efficient Data
Cube computation: An overview, Indexing OLAP Data: Bitmap index and join
index, Efficient processing of OLAP Queries, OLAP server Architecture:
ROLAP versus MOLAP versus HOLAP. Introduction: What is data mining,
Challenges, Data Mining Tasks, Data: Types of Data, Data Quality, Data
Preprocessing, Measures of Similarity and Dissimilarity

Module-3: Association Analysis: Association Analysis: Problem Definition,
Frequent Item set Generation, Rule generation. Alternative Methods for
Generating Frequent Item sets, FPGrowth Algorithm, Evaluation of
Association Patterns.

Module-4: Classification: Decision Trees Induction, Method for Comparing

Classifiers, Rule Based Classifiers, Nearest Neighbor Classifiers, Bayesian
Classifiers.

Module-5: Clustering Analysis: Overview, K-Means, Agglomerative

Hierarchical Clustering, DBSCAN, Cluster Evaluation, Density-Based
Clustering, Graph-Based Clustering, Scalable Clustering Algorithms.

Course Outcomes

CO1: Apply DWH architecture and multidimensional Modelling for

DWH Solutions
CO2: Design DWH for real world problem statements
CO3: Design association rules and Classification statements for a
given data pattern
CO4: Evaluate the Classification and Clustering techniques for
real world problem statements

Text Books:
1. Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining,
Pearson, First Impression, 2014.
2. Jiawei Han, Micheline Kamber, Jian Pei: Data Mining: Concepts and Techniques, 3rd
Edition, Morgan Kaufmann Publishers, 2012.

Reference Books:
1. Sam Anahory, Dennis Murray: Data Warehousing in the Real World, Pearson, Tenth
Impression, 2012.
2. Michael J. Berry, Gordon S. Linoff: Mastering Data Mining, Wiley, Second
Edition, 2012.

We will deep dive into DWH & DM
Module-1: Data Warehousing & Modelling
Basic Concepts: Data Warehousing: A multitier Architecture.
Data warehouse models: Enterprise warehouse, Data mart and virtual warehouse.
Extraction, Transformation and loading.
Data Cube: A multidimensional data model.
Stars, Snowflakes and Fact constellations: Schemas for multidimensional Data models.
Dimensions: The role of concept Hierarchies.
Measures: Their Categorization and computation.
Typical OLAP Operations.

Basic Definitions
Data: Raw facts that can be recorded or acquired and that have an implicit
meaning, e.g., age, color, name.

Database: A collection of related data, organized for effective and
efficient storage and retrieval.

Database Management System (DBMS): A software package/system that
facilitates the creation and maintenance of a computerized database.

Mini-world (DB problem statement): Some part of the real world about
which data is stored in a database, for example, student grades and
transcripts at a university.

What is a Data Warehouse?
■ Defined in many different ways, but not rigorously.
■ A decision support database that is maintained separately from
the organization’s operational database
■ Support information processing by providing a solid platform of
consolidated, historical data for analysis.
■ “A data warehouse is a subject-oriented, integrated, time-variant, and
nonvolatile collection of data in support of management’s decision-
making process.”— W. H. Inmon

■ Data warehousing:
■ The process of constructing and using data warehouses
Data Warehouse - Subject-Oriented
■ Organized around major subjects, such as customer,
product, sales
■ Focusing on the modeling and analysis of data for decision
makers, not on daily operations or transaction processing.
■ Provide a simple and concise view around particular subject
issues by excluding data that are not useful in the decision
support process

Data Warehouse - Integrated
■ Constructed by integrating multiple, heterogeneous data
sources
■ Relational databases, flat files, on-line transaction
records
■ Data cleaning and data integration techniques are
applied.
■ Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different data
sources
■ E.g., Hotel price: currency, tax, breakfast covered, etc.
■ When data is moved to the warehouse, it is
converted.
Data Warehouse - Nonvolatile

■ A physically separate store of data transformed from the

operational environment
■ Operational update of data does not occur in the data
warehouse environment
■ Does not require transaction processing, recovery, and
concurrency control mechanisms
■ Requires only two operations in data accessing:
■ initial loading of data and access of data

Data Warehouse - Time Variant
■ The time horizon for the data warehouse is significantly longer
than that of operational systems
■ Operational database: current value data
■ Data warehouse data: provide information from a
historical perspective (e.g., past 5-10 years)
■ Every key structure in the data warehouse
■ Contains an element of time, explicitly or implicitly
■ But the key of operational data may or may not
contain a “time element” (e.g., dwh_create_time (dwh_cttm),
dwh_update_time (dwh_up_time))
The major distinguishing features of OLTP and OLAP are
summarized as follows:
Users and system orientation:

• An OLTP system is customer-oriented and is used for

transaction and query processing by clerks, clients, and
information technology professionals.

• An OLAP system is market-oriented and is used for data

analysis by knowledge workers, including managers, executives,
and analysts.

OLTP – Online Transaction Processing

OLAP – Online Analytical Processing
■ Data contents:

An OLTP system manages current data that, typically, are

too detailed to be easily used for decision making.

An OLAP system manages large amounts of historic data,

provides facilities for summarization and aggregation, and
stores and manages information at different levels of
granularity.

These features make the data easier to use for informed

decision making.

Database Design:

An OLTP system usually adopts an entity-relationship (ER) data

model and an application-oriented database design.

An OLAP system typically adopts either a star or a

snowflake model and a subject-oriented database design.

■ View: An OLTP system focuses mainly on the current data
within an enterprise or department, without referring to
historic data or data in different organizations.

■ In contrast, an OLAP system often spans multiple versions of

a database schema, due to the evolutionary process of an
organization.

■ OLAP systems also deal with information that originates from

different organizations, integrating information from many
data stores.
■ Because of their huge volume, OLAP data are stored on
multiple storage media.
■ Access patterns: The access patterns of an OLTP system consist
mainly of short, atomic transactions. Such a system requires
concurrency control and recovery mechanisms.

■ However, accesses to OLAP systems are mostly read-only

operations (because most data warehouses store historic rather
than up-to-date information), although many could be complex
queries.
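The two access patterns can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `orders` table, its columns, and the sample rows are assumptions and do not come from the slides. The first part is an OLTP-style short atomic transaction; the second is an OLAP-style read-only query that scans and summarizes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used for both access patterns.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)

# OLTP-style access: short, atomic write transactions touching a few records.
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute("INSERT INTO orders VALUES (1, 'East', 2023, 120.0)")
with conn:
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(2, "East", 2024, 80.0), (3, "West", 2024, 200.0)],
    )

# OLAP-style access: a read-only query that scans many rows and aggregates them.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

In a real deployment the two workloads would normally run on separate systems; a single in-memory database is used here only to keep the sketch self-contained.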

OLTP vs. OLAP
Parameter            OLTP                                 OLAP
users                clerk, IT professional               knowledge worker
function             day to day operations                decision support
DB design            application-oriented                 subject-oriented
data                 current, up-to-date, detailed,       historical, summarized, multidimensional,
                     flat relational, isolated            integrated, consolidated
usage                repetitive                           ad-hoc
access               read/write,                          lots of scans
                     index/hash on prim. key
unit of work         short, simple transaction            complex query
# records accessed   tens                                 millions
# users              thousands                            hundreds
DB size              100MB-GB                             100GB-TB
metric               transaction throughput               query throughput, response
How are organizations using the information from
data warehouses?
Many organizations use this information to support business decision-making
activities, including
1. Increasing customer focus, which includes the analysis of customer buying
patterns (such as buying preference, buying time, budget cycles, and
appetites for spending);
2. Repositioning products and managing product portfolios by comparing
the performance of sales by quarter, by year, and by geographic regions
in order to fine-tune production strategies;
3. Analyzing operations and looking for sources of profit; and
4. Managing customer relationships, making environmental corrections, and
managing the cost of corporate assets

■ Because operational databases store huge amounts of
data, you may wonder, “Why not perform online
analytical processing directly on such databases instead
of spending additional time and resources to
construct a separate data warehouse?”

Why a Separate Data Warehouse?
■ High performance for both systems
■ DBMS - tuned for OLTP: access methods, indexing, concurrency control,

recovery
■ Warehouse - tuned for OLAP: complex OLAP queries, multidimensional
view, consolidation
■ Different functions and different data:
■ missing data: Decision support (DS) requires historical data which
operational DBs do not typically maintain
■ data consolidation: DS requires consolidation (aggregation,
summarization) of data from heterogeneous sources
■ data quality: different sources typically use inconsistent data
representations, codes and formats which have to be reconciled
■ Note: There are more and more systems which perform OLAP analysis directly
on relational databases
Data Warehouse: A Multi-Tiered Architecture

[Figure: multi-tiered data warehouse architecture. Data Sources (operational DBs, other sources) feed, via extract, transform, load, and refresh steps with a monitor and integrator, into Data Storage (data warehouse, data marts, metadata); the OLAP Engine (OLAP server) serves the Front-End Tools (analysis, query, reports, data mining).]
■ The bottom tier is a warehouse database server that is
almost always a relational database system. Back-end tools
and utilities are used to feed data into the bottom tier from
operational databases or other external sources (e.g.,
customer profile information provided by external
consultants).

■ These tools and utilities perform data extraction, cleaning,
and transformation (e.g., to merge similar data from different
sources into a unified format), as well as load and refresh
functions to update the data warehouse.
The middle tier is an OLAP server that is typically
implemented using either

a) A relational OLAP (ROLAP) model
(i.e., an extended relational DBMS that maps operations on
multidimensional data to standard relational operations); or

b) A multidimensional OLAP (MOLAP) model
(i.e., a special-purpose server that directly implements
multidimensional data and operations).
The top tier is a front-end client layer, which contains
query and reporting tools, analysis tools, and/or
data mining tools (e.g., trend analysis, prediction,
and so on).

Three Data Warehouse Models
■ Enterprise warehouse
■ collects all of the information about subjects spanning the
entire organization
■ Data Mart
■ a subset of corporate-wide data that is of value to a specific
group of users. Its scope is confined to specific, selected
groups, such as a marketing data mart
■ Independent vs. dependent (directly from warehouse) data mart
■ Virtual warehouse
■ A set of views over operational databases
■ Only some of the possible summary views may be
materialized
■ A virtual warehouse is easy to build but requires
excess capacity on operational database servers
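A minimal sketch of the idea, assuming a hypothetical operational table `sales_txn` held in an in-memory SQLite database (the table, view, and column names are illustrative and not from the slides). The "warehouse" is only a view, so every analytical query is evaluated on the operational server, which is where the extra capacity requirement comes from.

```python
import sqlite3

# Stand-in for an operational database (table and column names are hypothetical).
op_db = sqlite3.connect(":memory:")
op_db.execute(
    "CREATE TABLE sales_txn (txn_id INTEGER PRIMARY KEY, item TEXT, city TEXT, amount REAL)"
)
op_db.executemany(
    "INSERT INTO sales_txn (item, city, amount) VALUES (?, ?, ?)",
    [("TV", "Toronto", 500.0), ("TV", "Vancouver", 450.0), ("Phone", "Toronto", 300.0)],
)

# The virtual warehouse is just a view: no data is copied, so each query
# against it runs on the operational server itself.
op_db.execute(
    "CREATE VIEW sales_by_city AS "
    "SELECT city, SUM(amount) AS total_sales FROM sales_txn GROUP BY city"
)

summary = op_db.execute(
    "SELECT city, total_sales FROM sales_by_city ORDER BY city"
).fetchall()
print(summary)
```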

“What are the pros and cons of the top-down and bottom-up
approaches to data warehouse development?”
■ The top-down development of an enterprise warehouse
serves as a systematic solution and minimizes integration
problems.
■ However, it is expensive, takes a long time to develop,
and lacks flexibility due to the difficulty in achieving
consistency and consensus for a common data model for the
entire organization.
■ The bottom-up approach to the design,
development, and deployment of independent
data marts provides flexibility, low cost, and rapid
return on investment.

■ It, however, can lead to problems when
integrating various disparate data marts into a
consistent enterprise data warehouse.
■ Depending on the source of data, data marts can be
categorized as independent or dependent.
■ Independent data marts are sourced from data
captured from one or more operational systems
or external information providers, or from data
generated locally within a particular department
or geographic area.
■ Dependent data marts are sourced directly from
enterprise data warehouses.

Extraction, Transformation, and Loading (ETL)
■ Data warehouse systems use back-end tools and utilities to populate and
refresh their data. These tools and utilities include the following functions:
■ Data extraction
■ get data from multiple, heterogeneous, and external
sources
■ Data cleaning
■ detect errors in the data and rectify them when possible
■ Data transformation
■ convert data from legacy or host format to warehouse
format
■ Load
■ sort, summarize, consolidate, compute views, check integrity, and
build indices and partitions
■ Refresh
■ propagate the updates from the data sources to the warehouse
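The functions listed above can be sketched as a tiny pipeline. Everything here is an assumption made for illustration: the two in-memory "sources", their field names, the cleaning rule (skip rows whose amount cannot be parsed), and the `sales` warehouse table. Real ETL tools do far more; this only shows the order of the steps.

```python
import sqlite3

# Two hypothetical sources that describe the same kind of record differently.
source_a = [{"cust": " Alice ", "amount_usd": "120.50"}, {"cust": "Bob", "amount_usd": "n/a"}]
source_b = [{"customer_name": "carol", "amount_cents": 9900}]


def extract():
    """Extraction: gather raw records from multiple, heterogeneous sources."""
    return [("a", rec) for rec in source_a] + [("b", rec) for rec in source_b]


def clean_and_transform(tagged_records):
    """Cleaning and transformation: drop bad rows, convert to one warehouse format."""
    rows = []
    for origin, rec in tagged_records:
        try:
            if origin == "a":
                name, amount = rec["cust"], float(rec["amount_usd"])
            else:
                name, amount = rec["customer_name"], rec["amount_cents"] / 100
        except ValueError:
            continue  # error detected but not rectifiable here, so skip the record
        rows.append((name.strip().title(), round(amount, 2)))
    return rows


def load(conn, rows):
    """Load: write rows into the warehouse table and build an index."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_customer ON sales (customer)")
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, clean_and_transform(extract()))
loaded = warehouse.execute("SELECT customer, amount FROM sales ORDER BY customer").fetchall()
print(loaded)
```

A refresh step would rerun the same pipeline on only the records that changed at the sources since the last load.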
Metadata Repository
Metadata is the data defining warehouse objects. It stores:
■ Description of the structure of the data warehouse
■ schema, view, dimensions, hierarchies, derived data definitions,
data mart locations and contents.
■ Operational metadata
■ data lineage (history of migrated data and transformation path),
currency of data (active, archived, or purged),
monitoring information (warehouse usage statistics, error reports,
audit trails)
■ The algorithms used for summarization
■ which include measure and dimension definition algorithms, data on
granularity, partitions, subject areas, aggregation, summarization, and
predefined queries and reports.
■ The mapping from operational environment to the data warehouse
■ which includes source databases and their contents, gateway
descriptions, data partitions, data extraction, cleaning,
transformation rules and defaults, data refresh and purging rules,
and security (user authorization and access control).
■ Data related to system performance
■ which include indices and profiles that improve data access and
retrieval performance, in addition to rules for the timing and
scheduling of refresh, update, and replication cycles.
■ Business data
■ which include business terms and definitions, data ownership
information, and charging policies.
Data Warehousing and On-line Analytical Processing

■ Data Warehouse: Basic Concepts

■ Data Warehouse Modeling: Data Cube and OLAP
■ Data Warehouse Design and Usage
■ Data Warehouse Implementation
■ Data Generalization by Attribute-Oriented Induction

■ Summary

From Tables and Spreadsheets to Data Cubes

■ “What is a data cube?”

“A data cube allows data to be modeled and viewed in
multiple dimensions”.

■ It is defined by dimensions and facts.

Facts are numerical measures.

A dimension is a structure that categorizes data in order to enable
users to answer business questions.
Dimensions are the perspectives or entities with respect to which an
organization wants to keep records.
■ E.g., AllElectronics may create a sales data warehouse in order to
keep records of the store’s sales with respect to the dimensions
time, item, branch, and location. These dimensions allow the store
to keep track of things like monthly sales of items and the
branches and locations at which the items were sold.

Each dimension may have a table associated with it, called a
dimension table, which further describes the dimension.
• For example, a dimension table for item may contain the attributes
item name, brand, and type.
• Dimension tables can be specified by users or experts, or
automatically generated and adjusted based on data
distributions.
■ Facts are numeric measures. Think of them as the quantities
by which we want to analyze relationships between
dimensions.

■ Examples of facts for a sales data warehouse include

dollars sold (sales amount in dollars), units sold
(number of units sold), and amount budgeted.

■ The fact table contains the names of the facts, or measures,

as well as keys to each of the related dimension tables.

■ In the 2-D representation, the sales for Vancouver are
shown with respect to the time dimension
(organized in quarters) and the item
dimension (organized according to the types of
items sold).

■ The fact or measure displayed is dollars sold (in
thousands).
■ Suppose that we would like to view the sales data
with a third dimension.

■ For instance, suppose we would like to view the
data according to time and item, as well as
location, for the cities Chicago, New York, Toronto,
and Vancouver. These 3-D data are shown in
Table 4.3.
■ A 3-D data cube representation (a 3-D cuboid) of the data in Table 4.3, according to
time, item, and location.
■ The measure displayed is dollars sold (in thousands).
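The relationship between the 3-D view and a 2-D view for one city can be sketched with a plain dictionary keyed by (time, item, location). The fact rows below are made-up numbers, not the values of Table 4.3, and the variable names are illustrative.

```python
from collections import defaultdict

# Hypothetical sales facts: (quarter, item type, city, dollars sold in thousands).
facts = [
    ("Q1", "home entertainment", "Vancouver", 10),
    ("Q1", "computer", "Vancouver", 20),
    ("Q2", "home entertainment", "Vancouver", 30),
    ("Q1", "home entertainment", "Toronto", 40),
    ("Q2", "computer", "Toronto", 50),
    ("Q1", "computer", "Vancouver", 5),  # same cell as the second row, so it is added to it
]

# 3-D view: one cell per (time, item, location) combination.
cube_3d = defaultdict(int)
for quarter, item, city, dollars in facts:
    cube_3d[(quarter, item, city)] += dollars

# 2-D view for one city: fix location = "Vancouver", keep time and item.
vancouver_2d = {
    (quarter, item): dollars
    for (quarter, item, city), dollars in cube_3d.items()
    if city == "Vancouver"
}
print(vancouver_2d)
```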

Multidimensional Data
Sales volume as a function of product, month, and
region.

■ Suppose that we would now like to view our sales
data with an additional fourth dimension such as
supplier.

A 4-D data cube representation of sales data, according to time, item, location,
and supplier. The measure displayed is dollars sold (in thousands). For improved
readability, only some of the cube values are shown.

Cube: A Lattice of Cuboids

➢ Given a set of dimensions, we can generate a cuboid for each of
the possible subsets of the given dimensions.
➢ The result would form a lattice of cuboids, each showing the data
at a different level of summarization, or group-by.
➢ The lattice of cuboids is then referred to as a data cube.
➢ For example, a lattice of cuboids can be formed for the dimensions
time, item, location, and supplier, making up a 4-D data cube.
■ The cuboid that holds the lowest level of summarization is called the
base cuboid.

■ For example, the 4-D cuboid in Figure 4.4 is the base cuboid for the
given time, item, location, and supplier dimensions.

■ The 0-D cuboid, which holds the highest level of summarization, is

called the apex cuboid.

■ In our example, this is the total sales, or dollars sold, summarized

over all four dimensions. The apex cuboid is typically denoted by all.
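As a small sketch, the cuboids for the four dimensions can be enumerated as the subsets of the dimension set, which gives 2^4 = 16 cuboids when concept hierarchies within each dimension are ignored (an assumption of this sketch). The variable names are illustrative.

```python
from itertools import combinations

dimensions = ("time", "item", "location", "supplier")

# One cuboid per subset of the dimensions (i.e., per possible group-by),
# organized by the number of dimensions k that the cuboid keeps.
lattice = {k: list(combinations(dimensions, k)) for k in range(len(dimensions) + 1)}

apex_cuboid = lattice[0][0]                # (): the 0-D cuboid, often written "all"
base_cuboid = lattice[len(dimensions)][0]  # keeps all four dimensions, least summarized
total_cuboids = sum(len(cuboids) for cuboids in lattice.values())

for k, cuboids in lattice.items():
    print(f"{k}-D cuboids: {cuboids}")
print("total:", total_cuboids)
```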

[Figure: Lattice of cuboids, making up a 4-D data cube for time, item, location, and supplier. Each cuboid represents a different degree of summarization.]
Stars, Snowflakes, and Fact Constellations:
Schemas for Multidimensional Data Models
Conceptual Modeling of Data Warehouses
■ Modeling data warehouses: dimensions & measures
■ Star schema: A fact table in the middle connected to a set
of dimension tables
■ Snowflake schema: A refinement of star schema where
some dimensional hierarchy is normalized into a set of
smaller dimension tables, forming a shape similar to
snowflake
■ Fact constellations: Multiple fact tables share
dimension tables, viewed as a collection of stars,
therefore called galaxy schema or fact constellation
Star Schema:
■ The most common modeling paradigm is the star
schema, in which the data warehouse contains

1) A large central table (fact table) containing the

bulk of the data, with no redundancy, and
2) A set of smaller attendant tables (dimension
tables), one for each dimension.

■ The schema graph resembles a starburst, with the

dimension tables displayed in a radial pattern around
the central fact table.
■ Example 4.1 Star schema. A star schema for
AllElectronics sales is shown in Figure 4.6. Sales
are considered along four dimensions: time, item,
branch, and location. The schema contains a
central fact table for sales that contains keys to
each of the four dimensions, along with two
measures: dollars sold and units sold.
■ To minimize the size of the fact table, dimension
identifiers (e.g., time key and item key) are
system-generated identifiers.
Figure 4.6 Star schema of sales data warehouse.

Example of Star Schema
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, state_or_province, country
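The star schema above can be sketched in SQLite through Python's sqlite3 module. The `_dim` and `_fact` suffixes on table names, the column types, and all sample rows are assumptions added for illustration; only the column names follow the slide. The query at the end joins the fact table to two of its dimension tables and aggregates a measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (the _dim suffix is a naming choice made for this sketch).
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
                       month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                       type TEXT, supplier_type TEXT);
CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                           state_or_province TEXT, country TEXT);
-- Central fact table: one key per dimension plus the numeric measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    branch_key INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    units_sold INTEGER,
    dollars_sold REAL
);
-- Made-up sample rows.
INSERT INTO time_dim VALUES (1, 5, 'Friday', 1, 'Q1', 2024), (2, 9, 'Tuesday', 4, 'Q2', 2024);
INSERT INTO item_dim VALUES (1, 'Phone X', 'Acme', 'phone', 'wholesale');
INSERT INTO branch_dim VALUES (1, 'Downtown', 'retail');
INSERT INTO location_dim VALUES (1, '1 Main St', 'Vancouver', 'British Columbia', 'Canada'),
                                (2, '2 King St', 'Toronto', 'Ontario', 'Canada');
INSERT INTO sales_fact VALUES (1, 1, 1, 1, 2, 100.0), (1, 1, 1, 2, 1, 50.0),
                              (2, 1, 1, 1, 3, 150.0), (1, 1, 1, 1, 1, 50.0);
""")

# Dollars sold per quarter and city: one join per dimension that the query uses.
star_result = conn.execute("""
    SELECT t.quarter, l.city, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON f.time_key = t.time_key
    JOIN location_dim AS l ON f.location_key = l.location_key
    GROUP BY t.quarter, l.city
    ORDER BY t.quarter, l.city
""").fetchall()
print(star_result)
```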
Snowflake Schema:
■ The snowflake schema is a variant of the star
schema model, where some dimension tables are
normalized, thereby further splitting the data into
additional tables.

■ The resulting schema graph forms a shape similar
to a snowflake.
■ The major difference between the snowflake and star schema models is
that the dimension tables of the snowflake model may be kept in
normalized form to reduce redundancies.

■ Such a table is easy to maintain and saves storage space. However,
this space savings is negligible in comparison to the typical magnitude
of the fact table.

■ Furthermore, the snowflake structure can reduce the effectiveness of
browsing, since more joins will be needed to execute a query.
Consequently, the system performance may be adversely impacted.
Hence, although the snowflake schema reduces redundancy, it is not as
popular as the star schema in data warehouse design.
Example of Snowflake Schema
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_key
supplier: supplier_key, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city_key
city: city_key, city, state_or_province, country
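To illustrate the extra join that normalization introduces, the sketch below builds only the location branch of the snowflake schema (location referencing city) plus a cut-down fact table. Table suffixes, column types, and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The location dimension is normalized: city attributes live in their own table.
CREATE TABLE city_dim (city_key INTEGER PRIMARY KEY, city TEXT,
                       state_or_province TEXT, country TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT,
                           city_key INTEGER REFERENCES city_dim(city_key));
-- Cut-down fact table with a single dimension key and one measure.
CREATE TABLE sales_fact (location_key INTEGER REFERENCES location_dim(location_key),
                         dollars_sold REAL);
-- Made-up sample rows.
INSERT INTO city_dim VALUES (1, 'Vancouver', 'British Columbia', 'Canada'),
                            (2, 'Chicago', 'Illinois', 'USA');
INSERT INTO location_dim VALUES (10, '1 Main St', 1), (11, '2 Oak Ave', 1), (12, '3 Lake Rd', 2);
INSERT INTO sales_fact VALUES (10, 100.0), (11, 40.0), (12, 75.0);
""")

# Reaching "country" now takes two joins (fact -> location -> city)
# instead of the single join a star schema would need.
snowflake_result = conn.execute("""
    SELECT c.country, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON f.location_key = l.location_key
    JOIN city_dim AS c ON l.city_key = c.city_key
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()
print(snowflake_result)
```

The city attributes for Vancouver are stored once even though two street addresses refer to them, which is the redundancy saving the slide describes.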
Fact Constellation
Sophisticated applications may require multiple fact tables to share dimension
tables. This kind of schema can be viewed as a collection of stars, and hence is
called a galaxy schema or a fact constellation.

Example: This schema specifies two fact tables, sales and shipping.
The sales table definition is identical to that of the star schema.
The shipping table has five dimensions, or keys: item key, time key, shipper key,
from location, and to location, and two measures: cost and units shipped.
A fact constellation schema allows dimension tables to be shared between fact
tables.
For example, the dimension tables for time, item, and location are shared
between both the sales and shipping fact tables.
Example of Fact Constellation
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location
Measures: dollars_cost, units_shipped
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
shipper: shipper_key, shipper_name, location_key, shipper_type
Dimensions: The Role of Concept Hierarchies

■ A concept hierarchy defines a sequence of mappings from
a set of low-level concepts to higher-level, more general
concepts.
■ Consider a concept hierarchy for the dimension location. City
values for location include Vancouver, Toronto, New York,
and Chicago.
■ Many concept hierarchies are implicit within the database
schema. For example, suppose that the dimension location is
described by the attributes number, street, city, province or
state, zip code, and country. These attributes are related by a
total order, forming a concept hierarchy such as “street < city
< province or state < country.”
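A concept hierarchy for location can be sketched as a chain of lookup tables, one per step of "city < province or state < country". The per-city sales figures and the function names are hypothetical; the city-to-state and state-to-country mappings are ordinary geography.

```python
from collections import defaultdict

# Each mapping climbs one level of the hierarchy city < province_or_state < country.
city_to_state = {
    "Vancouver": "British Columbia",
    "Toronto": "Ontario",
    "New York": "New York",
    "Chicago": "Illinois",
}
state_to_country = {
    "British Columbia": "Canada",
    "Ontario": "Canada",
    "New York": "USA",
    "Illinois": "USA",
}


def generalize(city, level):
    """Map a city value up to the requested level of the hierarchy."""
    if level == "city":
        return city
    state = city_to_state[city]
    if level == "province_or_state":
        return state
    if level == "country":
        return state_to_country[state]
    raise ValueError(f"unknown level: {level}")


def roll_up(sales, level):
    """Aggregate per-city sales at a higher level of the hierarchy."""
    totals = defaultdict(int)
    for city, amount in sales.items():
        totals[generalize(city, level)] += amount
    return dict(totals)


# Made-up per-city sales figures.
sales_by_city = {"Vancouver": 10, "Toronto": 20, "New York": 30, "Chicago": 40}
sales_by_country = roll_up(sales_by_city, "country")
print(sales_by_country)
```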
■ Hierarchical and lattice structures of
attributes in warehouse dimensions:
■ (a) a hierarchy for location and
■ (b) a lattice for time.

Lattice:
In general usage, a regular geometrical arrangement of
points or objects over an area or in space. Here it refers to
a partial order on attributes, in which some attributes
(e.g., week and month) are not comparable with each other.
A Concept Hierarchy: Dimension (location)

all:     all
region:  Europe, ..., North_America
country: Germany, ..., Spain (Europe); Canada, ..., Mexico (North_America)
city:    Frankfurt, ... (Germany); Vancouver, ..., Toronto (Canada)
office:  L. Chan, ..., M. Wind
View of Warehouses and Hierarchies

Specification of hierarchies
■ Schema hierarchy
day < {month < quarter; week} < year
■ Set_grouping hierarchy
{1..10} < inexpensive

URL: https://www2.cs.sfu.ca/CourseCentral/459/han/tutorial/tutorial.html
Measures: Their Categorization and Computation

■ A data cube measure is a numeric function that can be
evaluated at each point in the data cube space.
■ A measure value is computed for a given point by
aggregating the data corresponding to the respective
dimension-value pairs defining the given point.
■ Measures can be organized into three categories,
based on the kind of aggregate functions used:
■ Distributive,
■ Algebraic, and
■ Holistic
Data Cube Measures: Three Categories
■ Distributive: if the result derived by applying the function to
n aggregate values, is the same as that derived by applying
the function on all the data without partitioning
■ E.g., count(), sum(), min(), max()
■ Algebraic: if it can be computed by an algebraic function with
M arguments (where M is a bounded integer), each of which is
obtained by applying a distributive aggregate function
■ E.g., avg(), min_N(), standard_deviation()
■ Holistic: if there is no constant bound on the storage size
needed to describe a subaggregate.
■ E.g., median(), mode(), rank()
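The three categories can be checked on a small partitioned data set. The numbers and partitioning below are arbitrary; the point is that sum() combines exactly from partial sums, avg() is computed from two distributive sub-aggregates (sum and count), and median() cannot in general be recovered from per-partition medians.

```python
import statistics

# Arbitrary data split into three partitions.
partitions = [[4, 8, 15], [16, 23], [42]]
all_values = [value for part in partitions for value in part]

# Distributive: applying sum() to the partial sums equals sum() over all the data.
partial_sums = [sum(part) for part in partitions]
sum_from_parts = sum(partial_sums)

# Algebraic: avg() is derived from two distributive sub-aggregates, sum and count.
partial_counts = [len(part) for part in partitions]
avg_from_parts = sum(partial_sums) / sum(partial_counts)

# Holistic: the median of per-partition medians is generally not the true median,
# so no fixed-size sub-aggregate per partition is enough.
median_of_medians = statistics.median([statistics.median(part) for part in partitions])
true_median = statistics.median(all_values)

print(sum_from_parts, avg_from_parts, median_of_medians, true_median)
```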

Typical OLAP Operations
■ Roll up (drill-up): summarize data
■ by climbing up hierarchy or by dimension reduction

■ Drill down (roll down): reverse of roll-up

■ from higher level summary to lower level summary or
detailed data, or introducing new dimensions
■ Slice and dice: project and select
■ Pivot (rotate):
■ reorient the cube, visualization, 3D to series of 2D planes

■ Other operations
■ Drill Across: involving (across) more than one fact table

■ Drill Through: through the bottom level of the cube to its back-end
relational tables (using SQL)
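The core operations can be sketched on a tiny cube stored as a dictionary keyed by (quarter, item, city). The cell values, the city-to-country mapping used for roll-up, and the function names are assumptions made for illustration; drill-down is not shown as a function because it simply returns to the more detailed cells that were kept.

```python
from collections import defaultdict

# Hypothetical cube cells: (quarter, item, city) -> dollars sold.
cube = {
    ("Q1", "phone", "Toronto"): 10,
    ("Q1", "phone", "Vancouver"): 5,
    ("Q1", "phone", "Chicago"): 20,
    ("Q1", "tv", "Toronto"): 30,
    ("Q2", "phone", "Toronto"): 40,
    ("Q2", "tv", "Chicago"): 50,
}
city_to_country = {"Toronto": "Canada", "Vancouver": "Canada", "Chicago": "USA"}


def roll_up_location(cells, mapping):
    """Roll up by climbing the location hierarchy from city to country."""
    out = defaultdict(int)
    for (quarter, item, city), value in cells.items():
        out[(quarter, item, mapping[city])] += value
    return dict(out)


def roll_up_drop_item(cells):
    """Roll up by dimension reduction: aggregate the item dimension away."""
    out = defaultdict(int)
    for (quarter, _item, city), value in cells.items():
        out[(quarter, city)] += value
    return dict(out)


def slice_quarter(cells, quarter):
    """Slice: select one value on the time dimension, leaving a 2-D plane."""
    return {(item, city): v for (q, item, city), v in cells.items() if q == quarter}


def dice(cells, quarters, cities):
    """Dice: select on two or more dimensions, leaving a smaller sub-cube."""
    return {k: v for k, v in cells.items() if k[0] in quarters and k[2] in cities}


def pivot(plane):
    """Pivot: swap the two axes of a 2-D plane."""
    return {(b, a): v for (a, b), v in plane.items()}


by_country = roll_up_location(cube, city_to_country)
by_quarter_city = roll_up_drop_item(cube)
q1_plane = slice_quarter(cube, "Q1")
sub_cube = dice(cube, {"Q1", "Q2"}, {"Toronto", "Vancouver"})
q1_pivoted = pivot(q1_plane)
print(by_country)
```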

ADDITIONAL INFORMATION

Design of Data Warehouse: A Business Analysis Framework
■ Four views regarding the design of a data warehouse
■ Top-down view
■ allows selection of the relevant information necessary for the data
warehouse
■ Data source view
■ exposes the information being captured, stored, and managed by
operational systems
■ Data warehouse view
■ consists of fact tables and dimension tables
■ Business query view
■ sees the perspectives of data in the warehouse from the view of end-
user
Data Warehouse Design Process
■ Top-down, bottom-up approaches or a combination of both
■ Top-down: Starts with overall design and planning (mature)
■ Bottom-up: Starts with experiments and prototypes (rapid)
■ From software engineering point of view
■ Waterfall: structured and systematic analysis at each step before
proceeding to the next
■ Spiral: rapid generation of increasingly functional systems, with short
turnaround time
■ Typical data warehouse design process
■ Choose a business process to model, e.g., orders, invoices, etc.
■ Choose the grain (atomic level of data) of the business process
■ Choose the dimensions that will apply to each fact table record
■ Choose the measures that will populate each fact table record
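The four design steps can be made concrete with a small star-schema sketch. It uses Python's built-in sqlite3 module; the table names, columns, and sample rows are hypothetical choices for an "orders" business process and are not prescribed by the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimensions chosen for the (hypothetical) orders process.
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, day TEXT, quarter TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    -- Grain: one fact row per product per order line.
    CREATE TABLE fact_orders (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,   -- measure
        amount      REAL       -- measure
    );
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(1, "2024-01-15", "Q1"), (2, "2024-04-02", "Q2")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "phone"), (2, "laptop")])
conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)",
                 [(1, 1, 2, 800.0), (1, 2, 1, 1200.0), (2, 1, 3, 1200.0)])

# Aggregate one measure along a dimension attribute (quarter).
rows = conn.execute("""
    SELECT d.quarter, SUM(f.amount)
    FROM fact_orders AS f JOIN dim_date AS d ON f.date_key = d.date_key
    GROUP BY d.quarter ORDER BY d.quarter
""").fetchall()
print(rows)
```

Choosing a finer grain (one row per order line rather than per day) keeps the detailed data available for drill-down, at the cost of a larger fact table.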
Data Warehouse Development: A Recommended Approach

[Figure: recommended development approach. Starting from "Define a high-level corporate data model", model refinement leads to data marts and the enterprise data warehouse, then to distributed data marts, and finally to a multi-tier data warehouse.]

Data Warehouse Usage
■ Three kinds of data warehouse applications
■ Information processing
■ supports querying, basic statistical analysis, and reporting
using crosstabs, tables, charts and graphs
■ Analytical processing
■ multidimensional analysis of data warehouse data
■ supports basic OLAP operations, slice-dice, drilling, pivoting
■ Data mining
■ knowledge discovery by finding hidden patterns
■ supports associations, constructing analytical models, performing
classification and prediction, and presenting the mining results using
visualization tools
From On-Line Analytical Processing (OLAP) to On-Line
Analytical Mining (OLAM)
■ Why Online Analytical Mining?
■ High quality of data in data warehouses
■ DW contains integrated, consistent, cleaned data
■ Available information processing structure surrounding data
warehouses
■ ODBC, OLEDB, Web accessing, service facilities,
reporting and OLAP tools
■ OLAP-based exploratory data analysis
■ Mining with drilling, dicing, pivoting, etc.
■ On-line selection of data mining functions
■ Integration and swapping of multiple mining
functions, algorithms, and tasks

Reflections about today's Session

Google Form – Quiz

https://docs.google.com/forms/d/e/1FAIpQLSfdiEz7A6Z3iNU26F6XAxLO0AU6P06AuOl7mGeGKzMOeUhiKw/viewform

Conclusion
We have studied the following concepts in today's class
1. Topics of Module-1
2. Learning Objectives
3. Basic Definitions of database approaches
4. Database system environment
5. Main Characteristics of the Database Approach
6. Advantages of using the DBMS Approach
7. Historical Development of Database Technology
8. Database Languages and Architectures
9. Schemas versus Instances
10. Reflections

Contact Details:

Dr.Manjunath T N
Professor and Dean – ER
Department of Information Science and Engg
BMS Institute of Technology and Management
Mobile: +91-9900130748
E-Mail: [email protected] / [email protected]

