VV - Data Warehousing and Data Mining

A data warehouse is a centralized repository for storing and managing large amounts of data from various sources, enhancing analysis and reporting for improved business intelligence. It provides benefits such as data security, scalability, and access to historical insights, facilitating data-driven decision-making. The document also discusses metadata, data modeling, and the importance of aggregate tables in optimizing query performance.

Uploaded by

p bb

Data Warehousing and Data Mining

Q1) What is a Data Warehouse and why do we need it? BENEFITS.

A data warehouse is a secure, centralized repository for storing and managing large amounts of data from various sources.
A data warehouse is usually used for linking and analyzing heterogeneous sources of business data.

The data warehouse is the center of the data collection and reporting framework developed for the BI system.

NEED
Enhancing the turnaround time for analysis and reporting
Improved Business Intelligence
Benefit of historical data
Standardization of data
Immense ROI (Return On Investment)

BENEFITS
Improved Data Security
Scalability
Access to Historical Insights
Works On-Premises and on Cloud
Q2) 1.3 Data Warehouse and its Need

1. What is it?: A centralized repository for storing data from multiple sources for reporting and analysis.
2. Need in Modern Business: Enables data-driven decision-making and provides a 360-degree view of the business.

Q3) DESIGN APPROACH

Bottom-Up Approach
Integrated

Time Variant
Q) OLAP vs OLTP
GRANULARITY
Granularity refers to the level of detail of the data.

Low level of granularity: more detail, less summarization (fine granularity).

High level of granularity: less detail, more summarization (coarse, or gross, granularity).
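
As a small illustration of the two levels above, the sketch below stores the same sales at a fine grain (one row per sale) and then rolls them up to a coarser grain (one row per month and product). The sample rows and the names `fine_grain` and `roll_up_to_month` are invented for this example and are not part of the original notes.

```python
from collections import defaultdict

# Fine granularity: one row per individual sale (more detail, less summary).
# These rows are made-up sample data for illustration only.
fine_grain = [
    ("2024-01-05", "pen", 10),
    ("2024-01-20", "pen", 5),
    ("2024-02-03", "pen", 7),
    ("2024-02-03", "ink", 2),
]


def roll_up_to_month(rows):
    """Summarize day-level rows into one total per (month, product)."""
    totals = defaultdict(int)
    for day, product, qty in rows:
        month = day[:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        totals[(month, product)] += qty
    return dict(totals)


# Coarse granularity: fewer rows, less detail, more summary.
coarse_grain = roll_up_to_month(fine_grain)
print(len(fine_grain), "detail rows ->", len(coarse_grain), "summary rows")
print(coarse_grain)
```

The day on which each sale happened cannot be recovered from `coarse_grain`, which is the trade-off between detail and summary that the grain decision fixes.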

META DATA

In a general sense, metadata is "data about data." It describes the structure, format, and characteristics of the data, enabling effective
management and usage.

In the Context of a Data Warehouse

In a data warehousing environment, metadata takes on a more specialized role. It serves as the roadmap or directory that helps users and
applications interact with the data in the warehouse. It can contain information about:

1. Data Source: Describes where the data comes from, including database names, tables, and columns.
2. Data Transformations: Records any changes made to the data during the ETL (Extract, Transform, Load) process, such as data cleansing,
aggregation, or enrichment.
3. Data Structure: Describes the schema, tables, and fields in the warehouse. This can include field definitions, data types, and
relationships between tables.
4. Business Metadata: Includes definitions, business rules, and lineage to make the data understandable and usable by business users.
5. Operational Metadata: Information about batch loads, query performance statistics, and data usage metrics.
6. Data Lineage: Information about how data flows through the system, useful for troubleshooting and impact analysis.

Importance
1. Data Understanding: Helps users understand what data is available and how to use it.
2. Data Governance: Assists in maintaining data quality, lineage, and security.
3. Query Optimization: Utilized by the system to optimize query performance.
4. Compliance: Important for meeting regulatory requirements related to data management and usage.

In summary, metadata in a data warehouse provides a crucial layer of information that facilitates both the effective use of data by end-users
and the efficient operation of the data warehouse itself.

DATA DICTIONARY VS META DATA

Both data dictionaries and metadata serve the purpose of providing additional information about data, but they are used in different contexts
and for different scopes.

| Aspect | Data Dictionary | Metadata |
|---|---|---|
| Definition | A centralized repository of information about data such as meaning, relationships, origin, and usage. | Data about data, describing the structure, type, and characteristics of the data. |
| Scope | Primarily focuses on database objects like tables, columns, keys, and indexes within a specific database. | Broader in scope, can pertain to any type of data including files, images, and configurations. |
| Context | Mostly used in the context of relational databases. | Used in various contexts including databases, data warehouses, file systems, and more. |
| Users | Database administrators, developers, and sometimes end-users. | Database administrators, developers, data analysts, and sometimes automated systems. |
| Content | Contains names, definitions, and attributes of database objects. | Can include data lineage, transformations, source systems, and operational metadata. |
| Purpose | Aids in database design, maintenance, and documentation. | Facilitates data management, governance, and usage across different systems and applications. |
| Accessibility | Usually accessible through specific database management system tools. | Could be embedded within the data or accessible through separate metadata repositories. |
| Update Frequency | Generally static, updated when database schema changes. | Can be dynamic, updated as data is transformed, moved, or used. |

Example:

• Data Dictionary: In a customer database, a data dictionary will specify that the CustomerID column is an integer, serves as a primary
key, and is auto-incremented. It may also specify constraints and relationships with other tables.
• Metadata: In the context of a data warehouse, metadata might indicate that the CustomerID field is sourced from the "Sales" database
and transformed by removing leading zeros during the ETL process.

In summary, while both serve to describe data, a data dictionary is more specific to the structure of a database, whereas metadata is a broader
term that covers additional aspects of data including its lineage, transformations, and usage across different systems and applications.
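
The CustomerID example above can be sketched in a few lines of Python using the standard `sqlite3` module. This is only an illustration under stated assumptions: the table name `customer`, the `Name` column, the dictionary keys in `warehouse_metadata`, and the helper `strip_leading_zeros` are all hypothetical. The structural facts are read back from the database itself (the data dictionary side), while the lineage and transformation notes are written by hand (the metadata side).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " CustomerID INTEGER PRIMARY KEY AUTOINCREMENT,"
    " Name TEXT NOT NULL)"
)

# Data-dictionary view: structural facts the database can report itself.
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
# Note: it does not report the AUTOINCREMENT property.
data_dictionary = {
    row[1]: {"type": row[2], "not_null": bool(row[3]), "primary_key": bool(row[5])}
    for row in conn.execute("PRAGMA table_info(customer)")
}

# Warehouse metadata: maintained outside the table definition.
# The keys used here are hypothetical, chosen for the example.
warehouse_metadata = {
    "CustomerID": {
        "source_system": "Sales",
        "transformation": "remove leading zeros during ETL",
    }
}


def strip_leading_zeros(raw_id: str) -> int:
    """Apply the transformation that the metadata entry describes."""
    return int(raw_id.lstrip("0") or "0")


print(data_dictionary["CustomerID"])
print(warehouse_metadata["CustomerID"])
print(strip_leading_zeros("000417"))
```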
CHAPTER 2

[Figure: comparison diagram not recoverable. Surviving labels: "1) There is data preprocessing", "2) No data preprocessing before loading", "BUT", "Has both."]
A data mart is a subject-oriented relational database that stores transactional data in rows and columns, which makes it easy to access,
organize, and understand.
UNIT 3 DIMENSIONAL MODELLING

Q) WHAT IS A DATA MODEL?

A data model is a representation of how data is stored in a database. It is usually drawn as a diagram of the tables and the relationships
that exist between them.

Q) WHAT IS DIMENSIONAL MODELLING?

Dimensional modeling is a data model design technique adopted when building a data warehouse. Put simply, dimensional modeling is intended to
reduce the response time of queries compared with the normalized models typical of relational (transactional) systems.
STAR SCHEMA
SNOWFLAKE SCHEMA

The snowflake schema is an extension of the star schema that adds further dimension tables to give more meaning to the logical view of the
database. These additional tables are more normalized than the dimension tables of a star schema.

The snowflake model is the result of decomposing one or more of the dimensions. A snowflake schema in a data warehouse is a logical
arrangement of tables in a multidimensional database such that the ER diagram resembles a snowflake shape.

3.6.1 Features of Snowflake Schema

Following are the important features of snowflake schema:
1. It has normalized tables.
2. It occupies less disk space.
3. It requires more lookup time, because queries must join the many interconnected tables that extend the dimensions.
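
The features above can be seen in a minimal sketch that models the same product dimension both ways, using Python's standard `sqlite3` module. All table and column names (`dim_product_star`, `dim_product_snow`, `dim_category`, `fact_sales`) and the sample rows are assumptions made for this illustration. In the snowflake variant the category name is stored once in its own table, and the same question needs one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star style: one denormalized product dimension.
CREATE TABLE dim_product_star (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_name TEXT
);
-- Snowflake style: the category is decomposed into its own table.
CREATE TABLE dim_category (
    category_id INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product_snow (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER,
    amount REAL
);
INSERT INTO dim_product_star VALUES (1, 'pen', 'stationery'), (2, 'ink', 'stationery');
INSERT INTO dim_category VALUES (10, 'stationery');
INSERT INTO dim_product_snow VALUES (1, 'pen', 10), (2, 'ink', 10);
INSERT INTO fact_sales VALUES (1, 5.0), (1, 2.5), (2, 4.0);
""")

# Star schema: a single join from the fact table to the dimension.
star_sql = """
SELECT p.category_name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product_star p ON p.product_id = f.product_id
GROUP BY p.category_name
"""

# Snowflake schema: the same question needs a second join.
snow_sql = """
SELECT c.category_name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product_snow p ON p.product_id = f.product_id
JOIN dim_category c ON c.category_id = p.category_id
GROUP BY c.category_name
"""

star_result = conn.execute(star_sql).fetchall()
snow_result = conn.execute(snow_sql).fetchall()
print(star_result)
print(snow_result)
```

Both queries return the same totals. The difference lies in where the category text is stored and in how many tables the query has to visit.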
AGGREGATE TABLES
Aggregate fact tables roll up the base fact tables of the schema to speed up query processing. BI tools can then select the appropriate level
of aggregation to improve query performance. Aggregate fact tables contain foreign keys referring to dimension tables.

Points to note about Aggregate tables:

1) They are also called summary tables.
2) They contain pre-computed query results (summaries) of the data warehouse schema.
3) They reduce the dimensionality of the base fact tables.
4) They can answer queries that involve only the dimensions retained in the aggregate.
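
A minimal sketch of these points, again using Python's standard `sqlite3` module: a day-level fact table is rolled up into a monthly summary table. The names `fact_sales` and `agg_sales_month_product`, the columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (
    date_key TEXT,       -- day level, e.g. '2024-01-05'
    product_id INTEGER,
    store_id INTEGER,
    amount REAL
);
INSERT INTO fact_sales VALUES
    ('2024-01-05', 1, 100, 5.0),
    ('2024-01-20', 1, 101, 2.5),
    ('2024-02-03', 1, 100, 4.0),
    ('2024-02-03', 2, 100, 1.5);

-- Aggregate (summary) table: the store dimension is dropped and the
-- date is rolled up from day to month; the totals are stored once.
CREATE TABLE agg_sales_month_product AS
SELECT substr(date_key, 1, 7) AS month_key,
       product_id,
       SUM(amount) AS total_amount
FROM fact_sales
GROUP BY substr(date_key, 1, 7), product_id;
""")

# A monthly report reads the small pre-computed table, not the base facts.
monthly = conn.execute(
    "SELECT month_key, product_id, total_amount "
    "FROM agg_sales_month_product ORDER BY month_key, product_id"
).fetchall()
print(monthly)
```

Because `store_id` is no longer present, this aggregate can answer questions by month and product only. A per-store question still has to go to the base fact table.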

NEED FOR BUILDING AGGREGATE FACT TABLES

1) Reduction in query processing time

2) Ready-made results for composite queries over the DW schema, so responses are returned faster.
