Unit-II Data Warehousing

Uploaded by

G.Akshaya

SCSA3001

Subject Name: Data Mining & Data Warehousing

Faculty Name: K.Babu

UNIT-II
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Data Warehouse
➢ A data warehouse is a collection of data marts representing
historical data from different operations in the company.
➢ It collects data from multiple heterogeneous sources (databases,
flat files, text files, etc.).
➢ It stores a large volume of data, typically covering 5 to 10
years. This data is stored in a structure optimized for querying
and data analysis.

Subject Oriented: Data that gives information about a particular
subject instead of about a company’s ongoing operations.
Integrated: Data that is gathered into the data warehouse from a
variety of sources and merged into a coherent whole.
Time-variant: All data in the data warehouse is identified with a
particular time period.
Non-volatile: Data is stable in a data warehouse. More data is
added but data is never removed.

Enterprise Data Warehouse:
It collects all information about subjects (customers, products,
sales, assets, personnel) that span the entire organization.
Decision Support System (DSS): Information technology that helps
knowledge workers (executives, managers, and analysts) make
faster and better decisions.

Operational and Informational Data
Operational Data:
➢ Focuses on transactional functions such as bank card
withdrawals and deposits
➢ Detailed
➢ Updateable
➢ Reflects current data

Informational Data:
➢ Focuses on providing answers to problems posed by decision
makers
➢ Summarized
➢ Non-updateable

Data Warehouse Characteristics
➢ It is a database designed for analytical tasks
➢ Its content is periodically updated
➢ It contains current and historical data to provide a historical
perspective of information.

DATA WAREHOUSE COMPONENTS

➢ The data warehouse architecture is based on a database
management system server.
➢ The central information repository is surrounded by a number of
key components.
➢ A data warehouse is an environment, not a product. It is based
on a relational database management system.
➢ Data entered into the data warehouse is transformed into an
integrated structure and format.

➢ The transformation process involves conversion, summarization,
and filtering.
➢ The data warehouse must be capable of holding and managing
large volumes of data as well as different data structures over
time.

Key components
➢Data sourcing, cleanup, transformation, and migration tools
➢Metadata repository
➢Warehouse/database technology
➢Data marts
➢Data query, reporting, analysis, and mining tools
➢Data warehouse administration and management
➢ Information delivery system

Data sourcing, cleanup, transformation, and migration tools

➢ They perform conversions, summarization, key changes, and
structural changes.
➢ Data transformation is required so that the data can be used by
decision support tools.
➢ The transformation produces programs and control statements.
➢ They move data into the data warehouse from multiple operational
systems.

The functionalities of these tools are listed below:

➢ Removing unwanted data from operational databases
➢ Converting to common data names and attributes
➢ Calculating summaries and derived data
➢ Establishing defaults for missing data
➢ Accommodating source data definition changes
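
The functionalities listed above can be sketched in a few lines of Python. This is only an illustration, not a real ETL tool: the source rows, the column names (cust_id, amt, debug_flag) and the default values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows from two operational systems with different column names.
source_a = [{"cust_id": 1, "amt": 120.0, "region": "North"},
            {"cust_id": 2, "amt": None, "region": "South"}]
source_b = [{"customer": 1, "amount": 80.0, "region": None, "debug_flag": "x"}]

# Mapping from source-specific names to common warehouse names (assumed).
COMMON_NAMES = {"cust_id": "customer_id", "customer": "customer_id",
                "amt": "amount", "amount": "amount", "region": "region"}
# Defaults used when a value is missing (assumed).
DEFAULTS = {"amount": 0.0, "region": "UNKNOWN"}

def transform(row):
    """Rename columns, drop unwanted fields, and fill defaults for missing data."""
    clean = {COMMON_NAMES[k]: v for k, v in row.items() if k in COMMON_NAMES}
    for field, default in DEFAULTS.items():
        if clean.get(field) is None:
            clean[field] = default
    return clean

rows = [transform(r) for r in source_a + source_b]

# Derived data: total amount per customer.
totals = defaultdict(float)
for r in rows:
    totals[r["customer_id"]] += r["amount"]
```

Fields that have no entry in COMMON_NAMES (here debug_flag) are treated as unwanted data and dropped.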

Metadata repository
Metadata is data about data. It is used for maintaining, managing
and using the data warehouse.
Technical metadata: It contains information about data warehouse
data used by warehouse designers and administrators to carry out
development and management tasks. It includes:
➢ Information about data stores
➢ Transformation descriptions, that is, mapping methods from the
operational database to the warehouse database
➢ Warehouse object and data structure definitions for target data
➢ The rules used to perform cleanup and data enhancement
➢ Data mapping operations
➢ Access authorization, backup history, archive history,
information delivery history, data acquisition history, data
access, etc.

Business metadata: It contains information that describes the
information stored in the data warehouse to users. It includes:
➢ Subject areas and information object types, including queries,
reports, images, video, audio clips, etc.
➢ Internet home pages
➢ Information related to the information delivery system
➢ Data warehouse operational information such as ownership, audit
trails, etc.

Metadata helps users understand the content and find the data.
Metadata is stored in a separate data store, known as the
informational directory or metadata repository, which helps to
integrate, maintain and view the contents of the data warehouse.

Data warehouse database

This is the central part of the data warehousing environment.
It is implemented using RDBMS technology.

Data marts
A data mart is an inexpensive alternative to the data warehouse.
It is based on a subject area. A data mart is used in the
following situations:
➢ Extremely urgent user requirements
➢ The absence of a budget for a full-scale data warehouse strategy
➢ The decentralization of business needs

Data query, reporting, analysis, and mining tools
The purpose of these tools is to provide information to business
users for decision making. There are five categories:
➢Data query and reporting tools
➢Application development tools
➢Executive info system tools (EIS)
➢OLAP tools
➢Data mining tools

Query and reporting tools: Used to generate queries and reports.
➢ Production reporting tools are used to generate regular
operational reports.
➢ Desktop report writers are inexpensive desktop tools designed for
end users.
Managed query tools: Used to generate SQL queries. They place a
metalayer of software between users and databases, which offers
point-and-click creation of SQL statements.

Application development tools: These provide a graphical data
access environment that integrates OLAP tools with the data
warehouse and can be used to access all database systems.
OLAP tools: Used to analyze data in multidimensional and
complex views.
Data mining tools: Used to discover knowledge from the data
warehouse data.

Data warehouse administration and management
➢Security and priority management
➢Monitoring updates from multiple sources
➢Data quality checks
➢Managing and updating meta data
➢Auditing and reporting data warehouse usage and status
➢Backup and recovery
➢Data warehouse storage management which includes capacity
planning, hierarchical storage management and purging of aged
data etc.,

Information delivery system

➢ It enables users to subscribe to data warehouse information.
➢ It delivers the information to one or more destinations
according to a specified scheduling algorithm.

BUILDING A DATA WAREHOUSE

Business factors:
➢ Business users want to make decisions quickly and correctly
using all available data.
Technological factors:
➢ To address the incompatibility of operational data stores
➢ IT infrastructure is changing rapidly. Its capacity is increasing
and its cost is decreasing, which makes building a data warehouse
easier.

Business factors
Top-Down Approach: Enterprise-wide business requirements are
collected first, and an enterprise data warehouse is built with
subset data marts.
Bottom-Up Approach: Data marts are developed as and when the
requirements are clear, and are then integrated to form a data
warehouse.
The advantage of the bottom-up approach is that it does not
require high initial costs and has a faster implementation time.

Design considerations:
➢ In general, a data warehouse integrates data from multiple
heterogeneous sources into a query database. This is also one of
the reasons why a data warehouse is difficult to build.
Data content
➢ The content and structure of the data warehouse are reflected in
its data model.
➢ The data model is the template that describes how information
will be organized within the integrated warehouse framework.
➢ The data in the warehouse must be detailed. It must be
formatted, cleaned up and transformed to fit the warehouse data
model.

Meta data
➢It defines the location and contents of data in the warehouse.
➢Meta data is searchable by users to find definitions or subject
areas.
Data distribution
➢ Data volumes continue to grow. Therefore, it becomes necessary
to know how the data should be divided across multiple servers.
➢The data can be distributed based on the subject area, location
(geographical region), or time (current, month, year)
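
As a rough sketch of the distribution criteria above, the same set of records can be grouped by subject area, by location, or by time. The records and field names below are hypothetical, and each resulting group simply stands in for the data that would be placed on one server.

```python
from collections import defaultdict

# Hypothetical warehouse records; field names are illustrative only.
records = [
    {"subject": "sales", "region": "south", "year": 2023, "amount": 10},
    {"subject": "sales", "region": "north", "year": 2024, "amount": 20},
    {"subject": "inventory", "region": "south", "year": 2024, "amount": 5},
]

def partition(rows, key):
    """Group rows by one attribute; each group could live on its own server."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)

by_subject = partition(records, "subject")   # distribution by subject area
by_region = partition(records, "region")     # distribution by location
by_year = partition(records, "year")         # distribution by time
```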

Technological factors
Technical considerations:
Hardware platforms
➢ An important consideration when choosing a data warehouse
server is its capacity for handling high volumes of data.
➢ It must offer large data capacity and high throughput.
➢ Modern servers can also support large data volumes and a large
number of flexible GUIs.

Data warehouse and DBMS specialization
➢ Databases are very large and must process complex ad hoc
queries in a short time.
➢ The most important requirements for the data warehouse DBMS
are performance, throughput and scalability.
Communication infrastructure
➢ Data warehouse users require relatively large bandwidth
to interact with the data warehouse and retrieve a significant
amount of data for analysis.
➢ This may mean that communication networks have to be
expanded and new hardware and software may have to be purchased.

Access tools:
Data warehouse implementation relies on selecting suitable data
access tools. The following lists the various types of data that
can be accessed:
1. Simple tabular form data
2. Ranking data
3. Multivariable data
4. Time series data
5. Graphing, charting and pivoting data

Data extraction, cleanup, transformation and migration:

➢ Timeliness of data delivery to the warehouse
➢ The tool must be able to identify the particular data so that it
can be read by the conversion tool.
➢ The tool must support flat files and indexed files, since much
corporate data is still stored in these formats.
➢ The tool must have the capability to merge data from multiple
data stores.
➢ The tool should have a specification interface to indicate the
data to be extracted.

➢ The tool should be able to read data from the data dictionary.
➢ The code generated by the tool should be completely
maintainable.
➢ The data warehouse database system must be able to load data
directly from these tools.

Data replication
➢ Data replication (data movement) places the data needed by a
particular workgroup in a localized database.
➢ Most companies use data replication servers to copy their most
needed data to a separate database.

Metadata
➢ Metadata is a road map to the information stored in the
warehouse. It defines all elements and their attributes.

Data placement strategies

➢ As a data warehouse grows, there are at least two options for
data placement. One is to move some of the data in the data
warehouse onto other storage media.
➢ The second option is to distribute the data in the data warehouse
across multiple servers.

User levels
Casual users: These users are most comfortable retrieving
information from the warehouse in predefined formats and running
pre-existing queries and reports.
Power users: These users can use predefined as well as
user-defined queries to create simple and ad hoc reports. They can
engage in drill-down operations and may have experience of using
reporting and query tools.
Expert users: These users tend to create their own complex
queries and perform standard analysis on the information they
retrieve.

Benefits of data warehousing

Tangible benefits:
➢ Improvement in product inventory
➢ Reduction in production cost
➢ Improvement in selection of target markets
Intangible benefits (not easy to quantify):
➢ Improvement in productivity by keeping all data in a single
location and eliminating rekeying of data
➢ Reduced redundant processing
➢ Enhanced customer relations

Multi Dimensional Data Model

The multi-dimensional data model is a method for organizing data
in the database so that its contents are well arranged and
assembled.
The multi-dimensional data model allows users to pose analytical
questions associated with market or business trends, unlike
relational databases, which allow users to access data in the
form of queries.

It represents data in the form of data cubes. Data cubes allow
data to be modeled and viewed from many dimensions and
perspectives. A data cube is defined by dimensions and facts and
is represented by a fact table. Facts are numerical measures, and
the fact table contains the names of the facts (measures) and
keys to the related dimension tables.

Working on a Multidimensional Data Model

Stage 1: Assembling data from the client: In the first stage,
correct data is collected from the client. Typically, software
professionals explain to the client the range of data that can be
obtained with the selected technology and then collect the
complete data in detail.
Stage 2: Grouping different segments of the system: In the
second stage, all the data is recognized and classified into the
sections it belongs to, which makes it straightforward to apply
step by step.

Stage 3: Noticing the different proportions: The third stage
forms the basis of the system design. In this stage, the main
factors are recognized according to the user’s point of view.
These factors are also known as “dimensions”.
Stage 4: Preparing the actual-time factors and their respective
qualities: In the fourth stage, the factors recognized in the
previous step are used to identify their related qualities. These
qualities are also known as “attributes” in the database.

Stage 5: Finding the actuality of factors which are listed
previously and their qualities: In the fifth stage, the actuality
(the facts) is separated and differentiated from the factors
collected so far. These facts play a significant role in the
arrangement of a multi-dimensional data model.
Stage 6: Building the schema to place the data, with respect to
the information collected from the steps above: In the sixth
stage, a schema is built on the basis of the data collected
previously.

For example:
1. Let us take the example of a firm. The revenue cost of a firm
can be analyzed on the basis of different factors such as the
geographical location of the firm’s workplace, the products of the
firm, advertisements done, time taken to develop a product, etc.

Let us take the example of the data of a factory which sells
products per quarter in Bangalore. The data is represented in the
table given below:

In the presentation above, the factory’s sales for Bangalore are
shown with respect to the time dimension, which is organized into
quarters, and the item dimension, which is sorted according to the
kind of item sold. The facts here are represented in rupees (in
thousands).
Now, if we want to view the sales data in three dimensions, it can
be represented as a series of two-dimensional tables. Let us
consider the data according to item, time and location (like
Kolkata, Delhi, Mumbai). Here is the table:

This data can be represented in the form of three dimensions
conceptually, which is shown in the image below:
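
A minimal sketch of the same idea in Python: a three-dimensional cube can be held as a mapping from (item, quarter, location) to a sales figure, and each location then yields one two-dimensional item-by-quarter table. All numbers here are made-up placeholders (in thousands of rupees), not the figures from the slides.

```python
# Each cell of the cube is addressed by (item, quarter, location).
# All figures are hypothetical placeholders, in thousands of rupees.
cube = {
    ("Car", "Q1", "Delhi"): 300, ("Car", "Q2", "Delhi"): 350,
    ("Bus", "Q1", "Delhi"): 120, ("Bus", "Q2", "Delhi"): 150,
    ("Car", "Q1", "Kolkata"): 200, ("Car", "Q2", "Kolkata"): 220,
    ("Bus", "Q1", "Kolkata"): 90, ("Bus", "Q2", "Kolkata"): 110,
}

def table_for(location):
    """Return the 2-D (item x quarter) table for one location."""
    table = {}
    for (item, quarter, loc), sales in cube.items():
        if loc == location:
            table.setdefault(item, {})[quarter] = sales
    return table

delhi = table_for("Delhi")
```

Calling table_for once per location gives the series of two-dimensional tables that together make up the cube.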

Advantages of Multi Dimensional Data Model

✓ A multi-dimensional data model is easy to handle.
✓ It is easy to maintain.
✓ Its performance is better than that of normal databases
✓ The representation of data is better than traditional databases.
That is because the multi-dimensional databases are multi-viewed
and carry different types of factors.

Disadvantages of Multi Dimensional Data Model

➢ The multi-dimensional Data Model is slightly complicated in
nature and it requires professionals to recognize and examine the
data in the database.
➢ During the work of a Multi-Dimensional Data Model, when the
system caches, there is a great effect on the working of the
system.
➢ It is complicated in nature due to which the databases are
generally dynamic in design.

OLAP OPERATIONS IN THE MULTIDIMENSIONAL DATA MODEL

OLAP stands for Online Analytical Processing. It is a software
technology that allows users to analyze information from multiple
database systems at the same time. It is based on the
multidimensional data model and allows the user to query
multi-dimensional data (e.g. Delhi -> 2018 -> Sales data). OLAP
databases are divided into one or more cubes, and these cubes
are known as hypercubes.

OLAP operations:
Drill down: In drill-down operation, the less detailed data is
converted into highly detailed data. It can be done by:
✓ Moving down in the concept hierarchy
✓ Adding a new dimension
In the cube given in overview section, the drill down operation is
performed by moving down in the concept hierarchy
of Time dimension (Quarter -> Month).
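
Drill-down along the Time hierarchy can be sketched as follows. The sketch assumes sales are stored at month level and that a month-to-quarter mapping is available; the monthly figures are hypothetical.

```python
# Sales stored at month level; month -> quarter is the concept hierarchy.
# Figures are hypothetical.
monthly_sales = {"Jan": 100, "Feb": 150, "Mar": 120,
                 "Apr": 90, "May": 110, "Jun": 130}
QUARTER_OF = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
              "Apr": "Q2", "May": "Q2", "Jun": "Q2"}

def quarterly_view(sales):
    """Summarized view: one total per quarter."""
    totals = {}
    for month, value in sales.items():
        quarter = QUARTER_OF[month]
        totals[quarter] = totals.get(quarter, 0) + value
    return totals

def drill_down(sales, quarter):
    """Move down the hierarchy: show the months behind one quarter."""
    return {m: v for m, v in sales.items() if QUARTER_OF[m] == quarter}

summary = quarterly_view(monthly_sales)     # less detailed data
q1_detail = drill_down(monthly_sales, "Q1")  # more detailed data
```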

Roll up: It is just the opposite of the drill-down operation. It
performs aggregation on the OLAP cube. It can be done by:
✓Climbing up in the concept hierarchy
✓Reducing the dimensions
In the cube given in the overview section, the roll-up operation is
performed by climbing up in the concept hierarchy
of Location dimension (City -> Country).

Dice:
It selects a sub-cube from the OLAP cube by selecting two or more
dimensions. In the cube given in the overview section, a sub-cube
is selected by selecting following dimensions with criteria:
Location = “Delhi” or “Kolkata”
Time = “Q1” or “Q2”
Item = “Car” or “Bus”
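
The dice criteria above can be expressed as a filter over the cube cells. In this sketch the cube is a dictionary keyed by (location, time, item); the cell values and the extra "Truck" item are hypothetical.

```python
# Hypothetical cube cells keyed by (location, time, item).
cube = {
    ("Delhi", "Q1", "Car"): 300, ("Delhi", "Q3", "Car"): 280,
    ("Kolkata", "Q2", "Bus"): 110, ("Mumbai", "Q1", "Car"): 400,
    ("Delhi", "Q2", "Truck"): 75,
}

def dice(cells, locations, times, items):
    """Keep only cells whose coordinates satisfy all three criteria."""
    return {key: value for key, value in cells.items()
            if key[0] in locations and key[1] in times and key[2] in items}

sub_cube = dice(cube, {"Delhi", "Kolkata"}, {"Q1", "Q2"}, {"Car", "Bus"})
```

The result still has all three dimensions; only the range of values along each dimension is restricted.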

Slice: It selects a single value on one dimension of the OLAP
cube, which results in a new sub-cube. In the cube given in the
overview section, slice is performed on the dimension Time =
“Q1”.
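
A minimal sketch of a slice on Time = "Q1", assuming the cube is a dictionary keyed by (location, time, item) with hypothetical values. Fixing the time value leaves a sub-cube over the remaining two dimensions.

```python
# Hypothetical cube cells keyed by (location, time, item).
cube = {
    ("Delhi", "Q1", "Car"): 300,
    ("Delhi", "Q2", "Car"): 350,
    ("Kolkata", "Q1", "Bus"): 90,
}

def slice_time(cells, quarter):
    """Fix Time to one value and drop it from the key, leaving (location, item)."""
    return {(loc, item): value
            for (loc, time, item), value in cells.items()
            if time == quarter}

q1 = slice_time(cube, "Q1")
```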

Pivot: It is also known as the rotation operation, as it rotates
the current view to get a new view of the representation. In the
sub-cube obtained after the slice operation, performing the pivot
operation gives a new view of it.
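
Pivot can be pictured as swapping the row and column axes of a two-dimensional view. The table below is a hypothetical location-by-item slice; the values are placeholders.

```python
# 2-D view: rows are locations, columns are items (hypothetical values).
view = {
    "Delhi": {"Car": 300, "Bus": 120},
    "Kolkata": {"Car": 200, "Bus": 90},
}

def pivot(table):
    """Swap the row and column axes of a nested-dict table."""
    rotated = {}
    for row, columns in table.items():
        for column, value in columns.items():
            rotated.setdefault(column, {})[row] = value
    return rotated

rotated = pivot(view)  # rows are now items, columns are locations
```

The data itself is unchanged; pivoting twice returns the original view.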

Three-Tier Data Warehouse Architecture

Data warehouses usually have a three-level (tier) architecture
that includes:
✓ Bottom Tier (Data Warehouse Server)
✓ Middle Tier (OLAP Server)
✓ Top Tier (Front-end Tools)
The bottom tier consists of the data warehouse server, which
is almost always an RDBMS. It may include several specialized data
marts and a metadata repository.

Data from operational databases and external sources (such as
user profile data provided by external consultants) are extracted
using application program interfaces called gateways. A gateway
is supported by the underlying DBMS and allows client programs
to generate SQL code to be executed at a server.
Examples of gateways include ODBC (Open Database Connectivity)
and OLE DB (Object Linking and Embedding, Database), by
Microsoft, and JDBC (Java Database Connectivity).

The middle tier consists of an OLAP server for fast querying
of the data warehouse. It is typically implemented using either:
(1) A Relational OLAP (ROLAP) model, i.e., an extended relational
DBMS that maps functions on multidimensional data to standard
relational operations.
(2) A Multidimensional OLAP (MOLAP) model, i.e., a special-purpose
server that directly implements multidimensional data and
operations.
The top tier contains front-end tools for displaying results
provided by OLAP, as well as additional tools for data mining of
the OLAP-generated data.

The metadata repository stores information that defines DW
objects. It includes the following parameters and information for
the middle and the top-tier applications:
1. A description of the DW structure, including the warehouse
schema, dimensions, hierarchies, data mart locations and
contents, etc.
2. Operational metadata, which usually describes the currency
level of the stored data (active, archived or purged) and
warehouse monitoring information (usage statistics, error
reports, audit trails, etc.).
3. System performance data, which includes indices used to
improve data access and retrieval performance.
4. Information about the mapping from operational databases,
which includes source RDBMSs and their contents, cleaning
and transformation rules, etc.
5. Summarization algorithms, predefined queries, and reports.
6. Business data, which include business terms and definitions,
ownership information, etc.

Load Performance
Data warehouses require incremental loading of new data on a
periodic basis within narrow time windows; performance of the load
process should be measured in hundreds of millions of rows and
gigabytes per hour and must not artificially constrain the volume
of data required by the business.
Load Processing
Many steps must be taken to load new or updated data into the
data warehouse, including data conversion, filtering, reformatting,
indexing, and metadata update.

Data Quality Management

Fact-based management demands the highest data quality. The
warehouse must ensure local consistency, global consistency, and
referential integrity despite "dirty" sources and massive database
size.
Query Performance
Fact-based management must not be slowed by the performance of the
data warehouse RDBMS; large, complex queries must complete in
seconds, not days.

SCHEMAS FOR MULTI-DIMENSIONAL DATA MODEL

A schema is a logical description of the entire database. It
includes the name and description of records of all record types,
including all associated data items and aggregates. Much like a
database, a data warehouse also needs to maintain a schema.
A database uses the relational model, while a data warehouse uses
the Star, Snowflake, and Fact Constellation schemas. In this
chapter, we will discuss the schemas used in a data warehouse.

Star Schema
➢ Each dimension in a star schema is represented with only one
dimension table.
➢ This dimension table contains the set of attributes.
➢ The following diagram shows the sales data of a company with
respect to the four dimensions, namely time, item, branch, and
location.

➢ There is a fact table at the center. It contains the keys to each
of the four dimensions.
➢ The fact table also contains the measures, namely dollars sold
and units sold.
Note − Each dimension is held in a single dimension table. For
example, the location dimension table contains the attribute set
{location_key, street, city, province_or_state, country}.
This constraint may cause data redundancy.
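
The star schema described above can be sketched with Python's built-in sqlite3 module. The table names (time_dim, sales_fact, and so on), the column types, the attributes of the time, item and branch tables, and the sample rows are assumptions made for illustration; only the four dimensions, the two measures and the location attribute set come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per dimension (attribute lists partly assumed).
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                           province_or_state TEXT, country TEXT);
-- Central fact table: one key per dimension plus the two measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);
-- Hypothetical sample rows.
INSERT INTO time_dim VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
INSERT INTO item_dim VALUES (1, 'Car', 'BrandA');
INSERT INTO branch_dim VALUES (1, 'Main');
INSERT INTO location_dim VALUES (1, '1 Main St', 'Delhi', 'Delhi', 'India');
INSERT INTO sales_fact VALUES (1, 1, 1, 1, 500.0, 5), (2, 1, 1, 1, 700.0, 7);
""")

# A typical star join: the fact table joined to two of its dimensions.
rows = conn.execute("""
    SELECT l.city, t.quarter, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN time_dim t ON f.time_key = t.time_key
    JOIN location_dim l ON f.location_key = l.location_key
    GROUP BY l.city, t.quarter
    ORDER BY t.quarter
""").fetchall()
```

Because every location row repeats its city, state and country, two streets in the same city would store those values twice, which is the redundancy the note refers to.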

Snowflake Schema
➢ Some dimension tables in the snowflake schema are
normalized.
➢ The normalization splits up the data into additional tables.
➢ Unlike the star schema, the dimension tables in a snowflake
schema are normalized. For example, the item dimension table of
the star schema is normalized and split into two dimension tables,
namely the item and supplier tables.

Now the item dimension table contains the attributes item_key,
item_name, type, brand, and supplier_key.
The supplier key is linked to the supplier dimension table. The
supplier dimension table contains the attributes supplier_key and
supplier_type.
Note − Due to normalization in the snowflake schema, redundancy
is reduced; therefore, it becomes easier to maintain and saves
storage space.
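
The normalized item dimension can be sketched with sqlite3 as well. The attribute names follow the text; the table names item_dim and supplier_dim, the column types and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Supplier details are split out of the item dimension.
CREATE TABLE supplier_dim (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item_dim (
    item_key     INTEGER PRIMARY KEY,
    item_name    TEXT,
    type         TEXT,
    brand        TEXT,
    supplier_key INTEGER REFERENCES supplier_dim(supplier_key)
);
-- Hypothetical sample rows: two items share one supplier row.
INSERT INTO supplier_dim VALUES (10, 'Wholesale');
INSERT INTO item_dim VALUES (1, 'Car', 'Vehicle', 'BrandA', 10),
                            (2, 'Bus', 'Vehicle', 'BrandB', 10);
""")

# Reading supplier details now needs one extra join.
rows = conn.execute("""
    SELECT i.item_name, s.supplier_type
    FROM item_dim i
    JOIN supplier_dim s ON i.supplier_key = s.supplier_key
    ORDER BY i.item_key
""").fetchall()
```

The supplier type is stored once and referenced by both items, at the cost of an additional join at query time.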

Fact Constellation Schema
➢ A fact constellation has multiple fact tables. It is also known as
galaxy schema.
➢ The following diagram shows two fact tables, namely sales and
shipping.
➢ The sales fact table is the same as that in the star schema.
➢ The shipping fact table has five dimension keys, namely
item_key, time_key, shipper_key, from_location, to_location.
➢ The shipping fact table also contains two measures, namely
dollars cost and units shipped.

It is also possible to share dimension tables between fact tables.
For example, the time, item, and location dimension tables are
shared between the sales and shipping fact tables.

OLAP
Online Analytical Processing (OLAP) is a category of software
that allows users to analyze information from multiple database
systems at the same time. It is a technology that enables analysts to
extract and view business data from different points of view.
How does it work?
A data warehouse extracts information from multiple data
sources and formats like text files, Excel sheets, multimedia
files, etc. The extracted data is cleaned and transformed. Data is
then loaded into an OLAP server, where information is
pre-calculated for further analysis.

Basic analytical operations of OLAP

1) Roll-up:
Roll-up is also known as “consolidation” or “aggregation”. The
roll-up operation can be performed in 2 ways:
➢ Reducing dimensions
➢ Climbing up the concept hierarchy. A concept hierarchy is a
system of grouping things based on their order or level.

➢ In this example, the cities New Jersey and Los Angeles are rolled
up into the country USA.
➢ The sales figures of New Jersey and Los Angeles are 440 and
1560 respectively. They become 2000 after roll-up.
➢ In this aggregation process, the location hierarchy moves up
from city to country.
➢ In the roll-up process, one or more dimensions need to be
removed. In this example, the Cities dimension is removed.
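
The roll-up in this example can be reproduced in a few lines of Python. The sales figures are the ones quoted above; the dictionary layout and the function name roll_up are illustrative choices.

```python
# City-level sales from the example above.
city_sales = {"New Jersey": 440, "Los Angeles": 1560}
# Concept hierarchy: city -> country.
COUNTRY_OF = {"New Jersey": "USA", "Los Angeles": "USA"}

def roll_up(sales, hierarchy):
    """Aggregate values one level up the concept hierarchy."""
    totals = {}
    for city, value in sales.items():
        country = hierarchy[city]
        totals[country] = totals.get(country, 0) + value
    return totals

country_sales = roll_up(city_sales, COUNTRY_OF)  # 440 + 1560 = 2000 for USA
```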

2) Drill-down:
In drill-down, data is fragmented into smaller parts. It is the
opposite of the roll-up process. It can be done by:
➢ Moving down the concept hierarchy
➢ Increasing a dimension
Consider the diagram below:
➢ Quarter Q1 is drilled down to the months January, February, and
March. The corresponding sales are also registered.
➢ In this example, the dimension Months is added.

3) Slice:
Here, one dimension is selected, and a new sub-cube is created.
The following diagram explains how the slice operation is
performed:
➢ Dimension Time is sliced with Q1 as the filter.
➢ A new cube is created altogether.

Dice:
This operation is similar to a slice. The difference in dice is
that you select 2 or more dimensions, which results in the
creation of a sub-cube.

4) Pivot:
In pivot, you rotate the data axes to provide an alternative
presentation of the data.
In the following example, the pivot is based on item types.

Types of OLAP

Relational OLAP (ROLAP): ROLAP is an extended RDBMS along with
multidimensional data mapping to perform the standard relational
operations.
Multidimensional OLAP (MOLAP): MOLAP implements operations on
multidimensional data.
Hybrid Online Analytical Processing (HOLAP): In the HOLAP
approach, the aggregated totals are stored in a multidimensional
database while the detailed data is stored in the relational
database. This offers both the data efficiency of the ROLAP model
and the performance of the MOLAP model.
Desktop OLAP (DOLAP): In Desktop OLAP, a user downloads a part of
the data from the database locally, or onto their desktop, and
analyzes it.
Web OLAP (WOLAP): Web OLAP is an OLAP system accessible via the
web browser. WOLAP is a three-tiered architecture. It consists of
three components: client, middleware, and a database server.
Mobile OLAP: Mobile OLAP helps users to access and analyze OLAP
data using their mobile devices.
Spatial OLAP (SOLAP): SOLAP is created to facilitate management of
both spatial and non-spatial data in a Geographic Information
System (GIS).


Advantages of OLAP
➢ OLAP is a platform for all types of business activity, including
planning, budgeting, reporting, and analysis.
➢ Information and calculations are consistent in an OLAP cube.
This is a crucial benefit.
➢ “What if” scenarios can be created and analyzed quickly.
➢ The OLAP database can easily be searched for broad or specific terms.
➢ OLAP provides the building blocks for business modeling tools,
data mining tools, and performance reporting tools.


Disadvantages of OLAP
➢ OLAP requires organizing data into a star or snowflake schema.
These schemas are complicated to implement and administer.
➢ You cannot have a large number of dimensions in a single OLAP
cube.
➢ Transactional data cannot be accessed with an OLAP system.
➢ Any modification in an OLAP cube needs a full update of the
cube. This is a time-consuming process.

Comparison of OLAP and OLTP

OLAP (Online analytical processing) | OLTP (Online transaction processing)

Consists of historical data from various databases. | Consists only of current operational data.
It is subject oriented. Used for data mining, analytics, decision making, etc. | It is application oriented. Used for business tasks.
The data is used in planning, problem solving, and decision making. | The data is used to perform day-to-day fundamental operations.
A large amount of data is stored, typically in TB or PB. | The size of the data is relatively small, as the historical data is archived. For example, MB or GB.
Relatively slow, as the amount of data involved is large. Queries may take hours. | Very fast, as the queries operate on 5% of the data.
It only needs backup from time to time as compared to OLTP. | The backup and recovery process is maintained religiously.
This data is generally managed by the CEO, MD, and GM. | This data is managed by clerks and managers.
Only read and rarely write operations. | Both read and write operations.
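
The difference in access pattern can be illustrated with Python's built-in `sqlite3` module. This is only a sketch: the `sales` table, its columns, and its rows are hypothetical, and in practice OLTP and OLAP workloads usually run on separate systems rather than on one in-memory database.

```python
import sqlite3

# One in-memory database stands in for both systems in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, city TEXT, quarter TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales (city, quarter, amount) VALUES (?, ?, ?)",
    [("Chennai", "Q1", 100), ("Bangalore", "Q1", 200), ("Chennai", "Q2", 150)],
)

# OLTP-style access: a short write followed by a lookup of one current row.
conn.execute("UPDATE sales SET amount = amount + 10 WHERE id = 1")
current_amount = conn.execute(
    "SELECT amount FROM sales WHERE id = 1"
).fetchone()[0]

# OLAP-style access: a read-only aggregate over all stored rows.
totals = conn.execute(
    "SELECT quarter, SUM(amount) FROM sales GROUP BY quarter ORDER BY quarter"
).fetchall()

print(current_amount)  # 110
print(totals)          # [('Q1', 310), ('Q2', 150)]
conn.close()
```

The first query touches a single row and changes it; the second reads every row and only summarizes, which matches the read/write contrast in the last row of the comparison.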


Architecture for On-Line Analytical Mining: An OLAM server
performs analytical mining in data cubes in a similar manner as
an OLAP server performs on-line analytical processing.
The OLAM and OLAP servers both accept user on-line
queries (or commands) via a graphical user interface API and
work with the data cube in the data analysis via a cube API.


A metadata directory is used to guide the access of the data cube.

The data cube can be constructed by accessing and/or integrating
multiple databases via an MDDB API and/or by filtering a data
warehouse via a database API that may support OLE DB or ODBC
connections.
Since an OLAM server may perform multiple data mining tasks,
such as concept description, association, classification, prediction,
clustering, time-series analysis, and so on, it usually consists of
multiple integrated data mining modules and is more
sophisticated than an OLAP server.
