
Unit1 Detailed Notes DWDM MAKAUT

This document provides detailed notes on Data Warehousing and Data Mining, covering definitions, characteristics, architecture, and the differences between OLAP and OLTP. It also discusses the need for data mining, the KDD process, frequent pattern mining, and algorithms such as Apriori and FP-Growth. Additionally, it introduces sequential pattern mining, its primitives, and scalable methods.

Uploaded by

arpisaha.cse.23
Copyright
© All Rights Reserved

Unit 1: Data Warehousing and Data Mining (Detailed Notes)

Data Warehousing: Definitions, characteristics, architecture, OLAP vs OLTP.

Definition:

A Data Warehouse is a centralized repository used to store integrated data from multiple sources. It supports decision-making and business intelligence through historical data analysis.

Characteristics:

- Subject-Oriented: Organized by business subject (e.g., sales).

- Integrated: Combines data from different sources.

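- Time-Variant: Stores historical data, with each record tied to a time period so that trends can be analyzed.

- Non-Volatile: Once loaded, data is read-only; it is not updated or deleted by day-to-day transactions.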

Architecture:

1. Data Sources (OLTP systems)

2. ETL (Extract, Transform, Load)

3. Data Warehouse Storage

4. OLAP Servers

5. Front-End Tools
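
The flow from data sources through ETL into warehouse storage can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: an in-memory SQLite database stands in for the warehouse, and the source_rows data, the sales_fact table, and the function names are all hypothetical.

```python
import sqlite3

# Hypothetical rows as they might arrive from OLTP sources (assumed data).
source_rows = [
    {"order_id": 1, "region": " east ", "amount": "120.50", "date": "2024-01-05"},
    {"order_id": 2, "region": "WEST", "amount": "80.00", "date": "2024-01-06"},
    {"order_id": 3, "region": "East", "amount": None, "date": "2024-01-06"},
]

def extract():
    """Extract: read raw records from the source systems."""
    return list(source_rows)

def transform(rows):
    """Transform: drop incomplete rows and standardize formats."""
    cleaned = []
    for row in rows:
        if row["amount"] is None:
            continue  # skip records with a missing measure
        cleaned.append((row["order_id"], row["region"].strip().lower(),
                        float(row["amount"]), row["date"]))
    return cleaned

def load(conn, rows):
    """Load: append the cleaned rows to the warehouse fact table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_fact "
                 "(order_id INTEGER, region TEXT, amount REAL, sale_date TEXT)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for warehouse storage
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT region, amount FROM sales_fact ORDER BY order_id").fetchall()
print(loaded)  # [('east', 120.5), ('west', 80.0)]
```

The OLAP server and front-end tools would then query sales_fact rather than the source systems.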

OLAP vs OLTP:

Feature       | OLTP                  | OLAP
------------- | --------------------- | ----------------------
Purpose       | Day-to-day operations | Analytical processing
Data Type     | Current, operational  | Historical, analytical
Queries       | Simple, short         | Complex, long
Normalization | Highly normalized     | De-normalized
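
The difference in query style can be seen on a small example. The sketch below uses an in-memory SQLite table for both query types purely for illustration; in practice OLTP and OLAP workloads usually run on separate systems. The sales table and its rows are assumed data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales "
             "(order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, "east", 2023, 100.0),
    (2, "west", 2023, 50.0),
    (3, "east", 2024, 70.0),
    (4, "west", 2024, 30.0),
    (5, "east", 2024, 20.0),
])

# OLTP-style: a short query that touches one record by its key.
oltp_result = conn.execute(
    "SELECT amount FROM sales WHERE order_id = ?", (3,)).fetchone()

# OLAP-style: an aggregate over historical data, grouped by dimensions.
olap_result = conn.execute(
    "SELECT region, year, SUM(amount) FROM sales "
    "GROUP BY region, year ORDER BY region, year").fetchall()

print(oltp_result)  # (70.0,)
print(olap_result)
```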

Data Mining: Introduction, need, KDD process.

Data Mining is the process of discovering patterns, trends, and knowledge from large datasets.

Need:

- Helps in decision making.

- Identifies hidden patterns.

- Useful in business, healthcare, marketing, etc.

KDD (Knowledge Discovery in Databases) Process:

1. Data Cleaning

2. Data Integration

3. Data Selection

4. Data Transformation

5. Data Mining

6. Pattern Evaluation

7. Knowledge Presentation
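
The seven steps can be read as a pipeline in which each stage feeds the next. The sketch below maps one small function to each step; the store_a and store_b records, the function names, and the simple item-counting "mining" step are all assumptions chosen to keep the example short.

```python
from collections import Counter

# Hypothetical raw records from two sources (assumed data).
store_a = [{"id": 1, "items": "milk,bread"}, {"id": 2, "items": None}]
store_b = [{"id": 3, "items": "milk,butter"}, {"id": 4, "items": "bread,milk"}]

def clean(records):               # 1. Data Cleaning: drop records with missing values
    return [r for r in records if r["items"] is not None]

def integrate(*sources):          # 2. Data Integration: combine the sources
    return [r for src in sources for r in src]

def select(records):              # 3. Data Selection: keep only the relevant field
    return [r["items"] for r in records]

def transform(item_strings):      # 4. Data Transformation: strings -> sets of items
    return [set(s.split(",")) for s in item_strings]

def mine(transactions):           # 5. Data Mining: count how often each item occurs
    return Counter(item for t in transactions for item in t)

def evaluate(counts, min_count):  # 6. Pattern Evaluation: keep frequent items only
    return {item: c for item, c in counts.items() if c >= min_count}

def present(patterns):            # 7. Knowledge Presentation: readable report
    return [f"{item}: {c} transactions" for item, c in sorted(patterns.items())]

data = integrate(clean(store_a), clean(store_b))
patterns = evaluate(mine(transform(select(data))), min_count=2)
report = present(patterns)
print(report)  # ['bread: 2 transactions', 'milk: 3 transactions']
```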

Mining frequent patterns, associations, correlations.

Frequent Pattern Mining:

Discovers recurring relationships among data items.

Association Rule:

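If-Then statements that help uncover relationships (e.g., {Milk, Bread} => {Butter}). For a rule A => B:

Support: Fraction of all transactions that contain both A and B.

Confidence: Fraction of transactions containing A that also contain B, i.e., support(A and B) / support(A).

Lift: Confidence divided by support(B). A lift above 1 means A and B occur together more often than expected if they were independent.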
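
A small worked example makes the three measures concrete. The sketch below computes them for the rule {milk, bread} => {butter}, using support(A => B) = fraction of transactions containing both A and B, confidence = support(A and B) / support(A), and lift = confidence / support(B). The five transactions are made-up data.

```python
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the share with the consequent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

def lift(antecedent, consequent, transactions):
    """Confidence relative to how common the consequent is overall."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))

A, B = {"milk", "bread"}, {"butter"}
rule_support = support(A | B, transactions)       # 2 of 5 transactions
rule_confidence = confidence(A, B, transactions)  # 2 of the 3 containing A
rule_lift = lift(A, B, transactions)              # confidence / (4 of 5)
print(round(rule_support, 2), round(rule_confidence, 2), round(rule_lift, 2))
```

Here the lift comes out below 1 (about 0.83), so in this toy data butter is slightly less likely in baskets that already contain milk and bread than in baskets overall.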

Correlation:

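Measures the strength and direction of a relationship between itemsets, e.g., whether buying one item makes another more or less likely. Lift is one commonly used correlation measure.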

Apriori algorithm, FP-Growth.



Apriori Algorithm:

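- Uses a "bottom-up", level-wise approach: frequent itemsets are extended one item at a time.

- Relies on the Apriori property: every subset of a frequent itemset must also be frequent.

- Generates candidate itemsets and prunes those below minimum support.

Steps:

1. Scan the database to count the support of each single item.

2. Generate candidate itemsets of size k+1 from the frequent itemsets of size k.

3. Prune candidates that have an infrequent subset or fall below minimum support.

4. Repeat steps 2 and 3 until no new frequent itemsets are found.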
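
The steps above can be sketched as a short level-wise loop. This is a minimal, unoptimized illustration rather than a reference implementation: it assumes min_support is given as an absolute count, and the function name apriori and the toy transactions are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support_count} for all frequent itemsets.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the database to count each candidate itemset.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = count(items)
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result

transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]
frequent_itemsets = apriori(transactions, min_support=3)
for itemset, n in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With a minimum support of 3, the three single items and the three pairs are frequent, while {milk, bread, butter} appears in only two transactions and is dropped.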

FP-Growth:

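- Uses an FP-Tree (frequent pattern tree) to avoid candidate generation.

- Compresses the database into the tree, then grows patterns recursively from each item's conditional pattern base.

Advantages:

- FP-Growth is typically faster than Apriori on large datasets because it needs only two database scans and generates no candidates.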
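
To show how an FP-Tree replaces candidate generation, here is a compact sketch of FP-Growth. It is a simplified teaching version under stated assumptions: min_support is an absolute count, transactions are passed as (items, count) pairs so the same function can process conditional pattern bases, and the names Node, build_tree, and fp_growth are hypothetical.

```python
from collections import defaultdict

class Node:
    """One FP-tree node: an item, a count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(weighted_transactions, min_support):
    """Build an FP-tree; return (header table of node lists, item frequencies)."""
    freq = defaultdict(int)
    for items, n in weighted_transactions:      # first scan: count items
        for i in set(items):
            freq[i] += n
    freq = {i: c for i, c in freq.items() if c >= min_support}
    root = Node(None, None)
    header = defaultdict(list)                  # item -> nodes holding that item
    for items, n in weighted_transactions:      # second scan: insert paths
        # Keep frequent items, ordered by descending frequency (ties by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for i in ordered:
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += n                     # shared prefixes are compressed
    return header, freq

def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Yield (itemset, support_count) for every frequent itemset."""
    header, freq = build_tree(weighted_transactions, min_support)
    for item, nodes in header.items():
        pattern = suffix | {item}
        yield pattern, freq[item]
        # Conditional pattern base: each node's prefix path, weighted by its count.
        base = []
        for node in nodes:
            path, p = [], node.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_support, pattern)

transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]
fp_result = dict(fp_growth([(t, 1) for t in transactions], min_support=3))
print(len(fp_result))  # 6
```

On this toy data the result is the three single items (support 4) and the three pairs (support 3), found without ever enumerating candidate itemsets.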

Sequential Pattern Mining: Concept, primitives, scalable methods.

Concept:

Sequential pattern mining identifies frequent subsequences in sequence data (e.g., customer purchasing behavior over time).

Primitives:

- Sequence: Ordered list of events.

- Itemset: A set of items occurring together.

- Support: Proportion of data sequences containing the pattern.
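
These primitives can be made concrete with a short sketch: a sequence is modeled as a list of itemsets, and a pattern is contained in a sequence if its itemsets are subsets of distinct sequence itemsets in the same order. The function names contains and seq_support and the purchase data are assumptions for illustration.

```python
def contains(sequence, pattern):
    """True if pattern is a subsequence of sequence.

    Both are lists of itemsets; each pattern itemset must be a subset of a
    distinct sequence itemset, and the order must be preserved.
    """
    pos = 0
    for p in pattern:
        # Greedily advance to the earliest itemset that covers p.
        while pos < len(sequence) and not p <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def seq_support(database, pattern):
    """Proportion of data sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in database) / len(database)

# Each customer's history is an ordered list of baskets (assumed data).
database = [
    [{"phone"}, {"case", "charger"}, {"earbuds"}],
    [{"phone"}, {"earbuds"}],
    [{"case"}, {"phone"}],
    [{"phone", "case"}, {"charger"}, {"earbuds"}],
]
pattern = [{"phone"}, {"earbuds"}]
pattern_support = seq_support(database, pattern)
print(pattern_support)  # 0.75
```

Three of the four customers buy a phone and later earbuds, so the pattern has support 0.75; the third customer buys a phone but nothing afterwards.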

Methods:

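- GSP (Generalized Sequential Pattern): Apriori-style, level-wise candidate generation.

- SPADE (Sequential Pattern Discovery using Equivalence classes): uses a vertical ID-list data format.

- PrefixSpan (Prefix-projected Sequential pattern mining): pattern-growth approach.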



Scalability:

- Pruning infrequent subsequences.

- Using pattern-growth rather than candidate generation.
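
Both ideas appear in the following sketch of a PrefixSpan-style miner: infrequent extensions are pruned as soon as they are counted, and patterns are grown from projected suffixes instead of generated as candidates. It is deliberately simplified and assumes each event is a single item (no itemsets) and that min_support is an absolute count; the function name prefixspan and the data are hypothetical.

```python
def prefixspan(database, min_support):
    """Return {pattern_tuple: support_count} for sequences of single items."""
    results = {}

    def grow(prefix, projected):
        # projected holds (sequence_index, start_position) pairs: the suffix
        # of each sequence that follows the current prefix.
        counts = {}
        for sid, start in projected:
            seen = set()
            for item in database[sid][start:]:
                if item not in seen:          # count each item once per sequence
                    seen.add(item)
                    counts[item] = counts.get(item, 0) + 1
        for item, n in counts.items():
            if n < min_support:
                continue                      # prune infrequent extensions
            pattern = prefix + (item,)
            results[pattern] = n
            # Project: keep the suffix after the first occurrence of item.
            new_projected = []
            for sid, start in projected:
                seq = database[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            grow(pattern, new_projected)

    grow((), [(sid, 0) for sid in range(len(database))])
    return results

database = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "a", "c"],
    ["a", "b"],
]
seq_patterns = prefixspan(database, min_support=3)
print(seq_patterns)
```

Because each recursive call only scans the suffixes that follow the current prefix, the data examined shrinks as patterns grow longer.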
