DWDM Unit-2

Data Warehouse and Data Mining unit 2 notes

Uploaded by

Pramath Pathak

DWDM UNIT-2

Summary Table: Components Of Data Mining Architecture

Component          | Description
Data Sources       | Raw input data repositories
Data Warehouse     | Centralized data storage for analysis
Data Preprocessing | Cleaning, transforming, and selecting data
Data Mining Engine | Core engine that applies mining algorithms
Pattern Evaluation | Filters and evaluates interesting patterns
Knowledge Base     | Domain knowledge and metadata support
User Interface     | Interaction layer for users


Market Basket Analysis
Market Basket Analysis is a data mining technique used to uncover purchase patterns in a retail setting by analyzing which combinations of products are bought together.

It is a careful study of the purchases customers make, for example in a supermarket, and it identifies items that customers frequently buy in combination. Companies can use this analysis to design deals, offers, and sales promotions, and data mining techniques make the analysis practical at scale.
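The core of the technique can be sketched with plain co-occurrence counting: for every pair of items, count how often they appear in the same basket, then derive support and confidence. The baskets below are hypothetical example data, not from any real dataset.

```python
from itertools import combinations
from collections import Counter

# Hypothetical transactions; in practice these come from point-of-sale records.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

# Count how often each item and each item pair occurs across baskets.
item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    for item in basket:
        item_counts[item] += 1
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

n = len(baskets)
for (a, b), count in pair_counts.items():
    support = count / n                  # fraction of baskets containing both items
    confidence = count / item_counts[a]  # P(b in basket | a in basket)
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Here "bread -> milk" has support 3/5 = 0.6 (both items appear in three of the five baskets) and confidence 3/4 = 0.75 (milk appears in three of the four baskets that contain bread). Real systems such as the Apriori algorithm prune this search so it scales beyond pairs.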

Types of Market Basket Analysis

There are three types of Market Basket Analysis. They are as follow:
1. Descriptive Market Basket Analysis: This sort of analysis looks for patterns and
connections that exist between the components of a market basket. It is mostly
used to understand consumer behavior, including which products are purchased
in combination and which item combinations are most typical. By revealing which
products are frequently bought together, descriptive market basket analysis helps
retailers place products in their stores more profitably.

2. Predictive Market Basket Analysis: Market basket analysis that predicts future
purchases based on past purchasing patterns is known as predictive market
basket analysis. In this sort of analysis, machine learning algorithms analyze
large volumes of data to predict which products are most likely to be bought
together in the future. Predictive market basket analysis helps retailers make
data-driven decisions about which products to carry, how to price them, and
how to optimize store layouts.

3. Differential Market Basket Analysis: Differential market basket analysis
compares two sets of market basket data to identify variations between them.
It is commonly used to compare the behavior of different customer segments,
or the behavior of the same customers over time. Differential market basket
analysis helps retailers respond to shifting consumer behavior by adjusting
their marketing and sales tactics.

Benefits of Market Basket Analysis

1. Enhanced Customer Understanding

2. Improved Inventory Management

3. Better Pricing Strategies

4. Sales Growth

Measures Of Central Tendency


Parallel Processors and Cluster Systems in Data Warehouse Process
Technology

In data warehouse technology, Parallel Processors and Cluster Systems play a
crucial role in enhancing performance, scalability, and reliability.

Parallel Processing involves dividing large tasks into smaller sub-tasks and
executing them simultaneously across multiple processors. This significantly
speeds up data loading, querying, and analysis in data warehouses. Parallel
processing systems include:

• Shared Memory Systems, where all processors share a global memory.

• Shared Nothing Systems, where each processor has its own memory and disk,
suitable for large-scale data warehousing.
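The partition-and-aggregate pattern behind both system types can be sketched in a few lines. This is a minimal illustration, not warehouse code: the thread pool models the shared-memory case (all workers see the same data), while in a shared-nothing cluster each partition would live on a separate node and only the partial results would travel over the network.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each worker aggregates only its own partition of the data.
    return sum(chunk)

def parallel_total(values, workers=4):
    # Split the input into one partition per worker, aggregate each
    # partition concurrently, then combine the partial results.
    chunks = [values[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)

sales = list(range(1, 1001))  # hypothetical measure column from a fact table
print(parallel_total(sales))  # 500500, same as sum(sales)
```

The same scatter/aggregate/combine structure underlies parallel query execution in real warehouse engines: each processor scans and aggregates its partition, and a final step merges the partial aggregates.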

Cluster Systems consist of multiple interconnected computers (nodes) that work
together as a single system. They are cost-effective and offer high availability:
if one node fails, others can take over, ensuring uninterrupted operations.
Clusters support load balancing and failover mechanisms, which are essential
for managing large volumes of data in real time.

Both technologies allow data warehouses to handle massive datasets efficiently,
support complex queries, and scale horizontally by adding more processors or
nodes. They are essential for modern Business Intelligence (BI) and Big Data
analytics platforms.

In summary, parallel processors and cluster systems form the backbone of high-
performance data warehouses, enabling fast, reliable, and scalable data
processing.
Warehousing Software and Warehouse Schema Design in Data Warehouse
Process Technology

Warehousing Software provides the tools needed to build, manage, and access a
data warehouse. It supports data integration, extraction, transformation, loading
(ETL), query processing, and reporting. Popular warehousing software includes
Microsoft SQL Server, Oracle Warehouse Builder, Informatica, and Snowflake. These
tools ensure efficient data storage, real-time access, and business intelligence
support.

Warehouse Schema Design refers to how data is logically structured in the
warehouse. It affects query performance and data organization. The main types of
schemas are:

• Star Schema: Central fact table linked to multiple dimension tables. Simple and
fast for queries.

• Snowflake Schema: Extension of the star schema with normalized dimensions.
Saves storage but is more complex.

• Galaxy (Fact Constellation) Schema: Contains multiple fact tables sharing
dimension tables. Used for complex applications.

Effective schema design ensures efficient data retrieval, minimizes redundancy, and
improves performance.
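A star schema can be sketched concretely with SQLite from the Python standard library: one fact table of measures, each row pointing at dimension tables that describe it. The table and column names here are illustrative, not taken from any particular product.

```python
import sqlite3

# Minimal star schema: one fact table referencing two dimension tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'Bread'), (2, 'Milk');
INSERT INTO dim_store   VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO fact_sales  VALUES (1, 1, 40.0), (2, 1, 25.0), (1, 2, 35.0);
""")

# A typical star-schema query: join the fact table to a dimension and aggregate.
rows = con.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(rows)  # [('Bread', 75.0), ('Milk', 25.0)]
```

A snowflake schema would further normalize the dimensions (e.g. splitting city out of dim_store into its own table), and a galaxy schema would add more fact tables sharing these same dimensions.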

In summary, warehousing software handles the technical operations of the
warehouse, while schema design structures the data logically for optimized access
and analysis. Both are essential for a functional and scalable data warehouse
system.

a. Differentiate between:

(i) Min-Max Normalization vs Z-score Normalization

Aspect     | Min-Max Normalization                        | Z-score Normalization
Definition | Scales data to a fixed range, usually [0, 1] | Transforms data based on mean and standard deviation
Formula    | X' = (X − Xmin) / (Xmax − Xmin)              | Z = (X − μ) / σ
Use Case   | When the data range is known and fixed       | When the data distribution needs to be standardized
Example    | X = 80, min = 50, max = 100 ⇒ X' = (80−50)/(100−50) = 0.6 | X = 80, μ = 70, σ = 10 ⇒ Z = (80−70)/10 = 1.0
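Both formulas translate directly into code; the sketch below reproduces the worked examples from the table.

```python
def min_max(x, xmin, xmax):
    # Scale x into [0, 1] relative to the known data range.
    return (x - xmin) / (xmax - xmin)

def z_score(x, mean, std):
    # Express x as a number of standard deviations from the mean.
    return (x - mean) / std

print(min_max(80, 50, 100))  # 0.6
print(z_score(80, 70, 10))   # 1.0
```

Note that min-max normalization breaks if new data falls outside the assumed [min, max] range, whereas z-score normalization handles outliers more gracefully because it has no fixed bounds.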

(ii) Binary Data Variables vs Nominal Data Variables

Aspect     | Binary Variables                         | Nominal Variables
Definition | Variables with only two values (0 or 1)  | Categorical variables with more than two values
Values     | True/False, Yes/No, Male/Female          | Red, Blue, Green; Apple, Orange, Banana
Example    | Gender: Male (1), Female (0)             | Color: Red, Green, Blue
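The practical difference shows up when preparing data for mining: a binary variable is already a usable 0/1 number, while a nominal variable with more than two unordered values is typically one-hot encoded rather than mapped to a single number. A minimal sketch with made-up values:

```python
# Binary variable: two values, directly usable as 0/1.
is_member = [1, 0, 1, 1]

# Nominal variable: more than two unordered categories, so one-hot encode it.
# Mapping colors to 0/1/2 would wrongly impose an order on them.
colors = ["Red", "Green", "Blue", "Green"]
categories = sorted(set(colors))  # ['Blue', 'Green', 'Red']
one_hot = [[int(c == cat) for cat in categories] for c in colors]
print(one_hot)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```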
