DWDM Unit-2

Data Warehouse and Data Mining unit 2 notes

Uploaded by

Pramath Pathak

DWDM UNIT-2

Summary Table: Components Of Data Mining Architecture

Component          | Description
Data Sources       | Raw input data repositories
Data Warehouse     | Centralized data storage for analysis
Data Preprocessing | Cleaning, transforming, and selecting data
Data Mining Engine | Core engine that applies mining algorithms
Pattern Evaluation | Filters and evaluates interesting patterns
Knowledge Base     | Domain knowledge and metadata support
User Interface     | Interaction layer for users


Market Basket Analysis
Market Basket Analysis is a data mining technique used to uncover purchase patterns in a retail setting by analyzing which combinations of products are bought together.

It is a careful study of the purchases customers make, for example in a supermarket, and it identifies items that customers frequently buy in combination. Companies can use this analysis to design deals, offers, and sales promotions, and data mining techniques make the analysis practical at scale.
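The core of the technique can be sketched with plain co-occurrence counting: for every pair of items, count how often they appear in the same basket, then derive support and confidence. The baskets below are hypothetical example data, not from any real dataset.

```python
from itertools import combinations
from collections import Counter

# Hypothetical transactions; in practice these come from point-of-sale records.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

# Count how often each item and each item pair occurs across baskets.
item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    for item in basket:
        item_counts[item] += 1
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

n = len(baskets)
for (a, b), count in pair_counts.items():
    support = count / n                  # fraction of baskets containing both items
    confidence = count / item_counts[a]  # P(b in basket | a in basket)
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Here "bread -> milk" has support 3/5 = 0.6 (both items appear in three of the five baskets) and confidence 3/4 = 0.75 (milk appears in three of the four baskets that contain bread). Real systems such as the Apriori algorithm prune this search so it scales beyond pairs.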

Types of Market Basket Analysis

There are three types of Market Basket Analysis. They are as follow:
1. Descriptive Market Basket Analysis: This sort of analysis looks for patterns and
connections that exist between the components of a market basket. It is mostly
used to understand consumer behavior, including which products are purchased
in combination and which item combinations are most typical. By revealing which
products are frequently bought together, descriptive market basket analysis helps
retailers place products in their stores more profitably.

2. Predictive Market Basket Analysis: Market basket analysis that predicts future
purchases based on past purchasing patterns is known as predictive market
basket analysis. In this sort of analysis, machine learning algorithms analyze
large volumes of data to predict which products are most likely to be bought
together in the future. Predictive market basket analysis helps retailers make
data-driven decisions about which products to carry, how to price them, and
how to optimize store layouts.

3. Differential Market Basket Analysis: Differential market basket analysis
compares two sets of market basket data to identify variations between them.
It is commonly used to compare the behavior of different customer segments,
or the behavior of the same customers over time. Differential market basket
analysis helps retailers respond to shifting consumer behavior by adjusting
their marketing and sales tactics.

Benefits of Market Basket Analysis

1. Enhanced Customer Understanding

2. Improved Inventory Management

3. Better Pricing Strategies

4. Sales Growth

Measures Of Central Tendency


Parallel Processors and Cluster Systems in Data Warehouse Process
Technology

In data warehouse technology, Parallel Processors and Cluster Systems play a
crucial role in enhancing performance, scalability, and reliability.

Parallel Processing involves dividing large tasks into smaller sub-tasks and
executing them simultaneously across multiple processors. This significantly
speeds up data loading, querying, and analysis in data warehouses. Parallel
processing systems include:

• Shared Memory Systems, where all processors share a global memory.

• Shared Nothing Systems, where each processor has its own memory and disk,
suitable for large-scale data warehousing.
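The partition-and-aggregate pattern behind both system types can be sketched in a few lines. This is a minimal illustration, not warehouse code: the thread pool models the shared-memory case (all workers see the same data), while in a shared-nothing cluster each partition would live on a separate node and only the partial results would travel over the network.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each worker aggregates only its own partition of the data.
    return sum(chunk)

def parallel_total(values, workers=4):
    # Split the input into one partition per worker, aggregate each
    # partition concurrently, then combine the partial results.
    chunks = [values[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)

sales = list(range(1, 1001))  # hypothetical measure column from a fact table
print(parallel_total(sales))  # 500500, same as sum(sales)
```

The same scatter/aggregate/combine structure underlies parallel query execution in real warehouse engines: each processor scans and aggregates its partition, and a final step merges the partial aggregates.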

Cluster Systems consist of multiple interconnected computers (nodes) that work
together as a single system. They are cost-effective and offer high availability:
if one node fails, others can take over, ensuring uninterrupted operations.
Clusters support load balancing and failover mechanisms, which are essential
for managing large volumes of data in real time.

Both technologies allow data warehouses to handle massive datasets efficiently,
support complex queries, and scale horizontally by adding more processors or
nodes. They are essential for modern Business Intelligence (BI) and Big Data
analytics platforms.

In summary, parallel processors and cluster systems form the backbone of high-
performance data warehouses, enabling fast, reliable, and scalable data
processing.
Warehousing Software and Warehouse Schema Design in Data Warehouse
Process Technology

Warehousing Software provides the tools needed to build, manage, and access a
data warehouse. It supports data integration, extraction, transformation, loading
(ETL), query processing, and reporting. Popular warehousing software includes
Microsoft SQL Server, Oracle Warehouse Builder, Informatica, and Snowflake. These
tools ensure efficient data storage, real-time access, and business intelligence
support.

Warehouse Schema Design refers to how data is logically structured in the
warehouse. It affects query performance and data organization. The main types of
schemas are:

• Star Schema: Central fact table linked to multiple dimension tables. Simple and
fast for queries.

• Snowflake Schema: Extension of the star schema with normalized dimensions.
Saves storage but is more complex.

• Galaxy (Fact Constellation) Schema: Contains multiple fact tables sharing
dimension tables. Used for complex applications.

Effective schema design ensures efficient data retrieval, minimizes redundancy, and
improves performance.
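A star schema can be sketched concretely with SQLite from the Python standard library: one fact table of measures, each row pointing at dimension tables that describe it. The table and column names here are illustrative, not taken from any particular product.

```python
import sqlite3

# Minimal star schema: one fact table referencing two dimension tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'Bread'), (2, 'Milk');
INSERT INTO dim_store   VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO fact_sales  VALUES (1, 1, 40.0), (2, 1, 25.0), (1, 2, 35.0);
""")

# A typical star-schema query: join the fact table to a dimension and aggregate.
rows = con.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(rows)  # [('Bread', 75.0), ('Milk', 25.0)]
```

A snowflake schema would further normalize the dimensions (e.g. splitting city out of dim_store into its own table), and a galaxy schema would add more fact tables sharing these same dimensions.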

In summary, warehousing software handles the technical operations of the
warehouse, while schema design structures the data logically for optimized access
and analysis. Both are essential for a functional and scalable data warehouse
system.

a. Differentiate between:

(i) Min-Max Normalization vs Z-score Normalization

Aspect     | Min-Max Normalization                        | Z-score Normalization
Definition | Scales data to a fixed range, usually [0, 1] | Transforms data based on mean and standard deviation
Formula    | X' = (X − Xmin) / (Xmax − Xmin)              | Z = (X − μ) / σ
Use Case   | When the data range is known and fixed       | When the data distribution needs to be standardized
Example    | X = 80, min = 50, max = 100 ⇒ X' = (80−50)/(100−50) = 0.6 | X = 80, μ = 70, σ = 10 ⇒ Z = (80−70)/10 = 1.0
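Both formulas translate directly into code; the sketch below reproduces the worked examples from the table.

```python
def min_max(x, xmin, xmax):
    # Scale x into [0, 1] relative to the known data range.
    return (x - xmin) / (xmax - xmin)

def z_score(x, mean, std):
    # Express x as a number of standard deviations from the mean.
    return (x - mean) / std

print(min_max(80, 50, 100))  # 0.6
print(z_score(80, 70, 10))   # 1.0
```

Note that min-max normalization breaks if new data falls outside the assumed [min, max] range, whereas z-score normalization handles outliers more gracefully because it has no fixed bounds.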

(ii) Binary Data Variables vs Nominal Data Variables

Aspect     | Binary Variables                         | Nominal Variables
Definition | Variables with only two values (0 or 1)  | Categorical variables with more than two values
Values     | True/False, Yes/No, Male/Female          | Red, Blue, Green; Apple, Orange, Banana
Example    | Gender: Male (1), Female (0)             | Color: Red, Green, Blue
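The practical difference shows up when preparing data for mining: a binary variable is already a usable 0/1 number, while a nominal variable with more than two unordered values is typically one-hot encoded rather than mapped to a single number. A minimal sketch with made-up values:

```python
# Binary variable: two values, directly usable as 0/1.
is_member = [1, 0, 1, 1]

# Nominal variable: more than two unordered categories, so one-hot encode it.
# Mapping colors to 0/1/2 would wrongly impose an order on them.
colors = ["Red", "Green", "Blue", "Green"]
categories = sorted(set(colors))  # ['Blue', 'Green', 'Red']
one_hot = [[int(c == cat) for cat in categories] for c in colors]
print(one_hot)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```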
