Fundamentals of Data Mining
DATA MINING
CHAPTER 1- INTRODUCTION
1. Why data mining?
2. What is Data Mining?
3. Data Mining Process
4. Data Mining Applications & Benefits
WHY DATA MINING?
Explosive growth of data: from terabytes to petabytes.
Data collection and data availability: automated data collection tools, database systems, the web, email, and a computerized society.
SOURCES OF DATA
Web data
E-Commerce
Bank Transaction
Digital media
Online Games
Research
ONLINE DATA
Every 60 seconds:
98k+ tweets
Millions of FB updates
11 million chats
217 new mobile users
SOLUTION
We are drowning in data but lacking in knowledge.
The solution is to mine knowledge from data.
This means automated analysis of massive data sets.
WHAT IS DATA MINING?
DATA MINING
It is the process of mining knowledge from large amounts of data.
[Diagram: data mining techniques yield useful data]
WHY DO WE DO THIS?
• We get trends, patterns, and insights from the collected data.
• Data mining is also called Knowledge Discovery in Databases (KDD).
DATA MINING TECHNIQUES
Segmentation
Mathematical
AI
ML
KNN algorithm
Apriori algorithm
K-means algorithm
Naïve Bayes
DATA MINING PROCESS
DATA SELECTION
DATA PREPROCESSING
DATA TRANSFORMATION
DATA MINING
PATTERN EVALUATION
KNOWLEDGE PRESENTATION
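The six steps above can be read as a chain of small functions, each feeding the next. The sketch below is only an illustration under stated assumptions: the records, the function names, and the `min_count` threshold are all hypothetical, and a real pipeline would do far more at each step.

```python
from collections import Counter

# Hypothetical raw records: (customer_id, item). None marks a missing value.
raw_records = [
    (1, "Bread"), (1, "butter"), (2, "bread "),
    (2, None), (3, "Butter"), (3, "bread"),
]

def select(records):
    """Data selection: keep only the field needed for this analysis."""
    return [item for _customer, item in records]

def preprocess(items):
    """Data preprocessing: drop missing values."""
    return [item for item in items if item is not None]

def transform(items):
    """Data transformation: bring values into one consistent form."""
    return [item.strip().lower() for item in items]

def mine(items):
    """Data mining: count how often each item occurs."""
    return Counter(items)

def evaluate(counts, min_count):
    """Pattern evaluation: keep only patterns that meet a threshold."""
    return {item: n for item, n in counts.items() if n >= min_count}

def present(patterns):
    """Knowledge presentation: format the result for a reader."""
    return [f"{item}: {n} purchases" for item, n in sorted(patterns.items())]

patterns = evaluate(mine(transform(preprocess(select(raw_records)))), min_count=2)
report = present(patterns)
```

Keeping each step as its own function mirrors the slide: any one stage can be swapped out without touching the others.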
ARCHITECTURE
Knowledge base
Pattern Evaluation
User Interface
WHAT KIND OF DATA CAN BE MINED?
Kinds of data that can be mined:
TRANSACTIONAL DATA
DATA MINING APPLICATIONS
Demand Prediction
DATA MINING TASKS
There are two types of tasks: descriptive and predictive.
Descriptive
Clustering
• Grouping similar customers based on their interests
Association Rule Mining
• Finding relationships between items in data
Predictive
Classification
• Categorizing new data based on previous patterns
Regression
• Predicting continuous values like sales and stock prices
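As one illustration of a predictive task, classification can be sketched with the k-nearest-neighbours (KNN) idea: a new record takes the majority label of the most similar past records. The customer data, the feature choice, and the helper names below are hypothetical, and the features are left unscaled for brevity.

```python
from collections import Counter

# Hypothetical labelled history: (monthly_visits, monthly_spend) -> segment.
history = [
    ((12, 300.0), "loyal"),
    ((10, 250.0), "loyal"),
    ((11, 280.0), "loyal"),
    ((1, 20.0), "occasional"),
    ((2, 35.0), "occasional"),
    ((3, 30.0), "occasional"),
]

def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_classify(point, labelled, k=3):
    """Return the majority label among the k records closest to `point`."""
    nearest = sorted(labelled, key=lambda pair: squared_distance(point, pair[0]))[:k]
    votes = Counter(label for _features, label in nearest)
    return votes.most_common(1)[0][0]

prediction = knn_classify((9, 240.0), history)
```

In practice the features would usually be scaled first, since here the spend column dominates the distance.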
WHAT KIND OF PATTERNS CAN BE MINED?
CLASS/CONCEPT DESCRIPTION:
In data mining, class/concept description helps in
understanding and summarizing data by describing
characteristics and differences of data groups.
Characterization
Discrimination
Mining Frequent Patterns
Association and Correlations
Classification and Regression
CHARACTERIZATION AND DISCRIMINATION
Characterization: (Describing a group)
• It describes the common characteristics of a group (class or
concept).
• It summarizes general patterns in data.
Discrimination: (Comparing two or more groups)
• It compares two or more groups to find differences between
them.
• It identifies what makes one group different from another.
COMPARISON
Feature: What it does
• Characterization: Describes common characteristics of a group
• Discrimination: Compares two or more groups to find differences
Feature: Example
• Characterization: "Loyal customers shop frequently and spend more"
• Discrimination: "High-risk borrowers have low credit scores"
Feature: Use case
• Characterization: Customer profiling, business trends
• Discrimination: Fraud detection, risk analysis
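A minimal sketch of the two ideas, assuming a tiny hypothetical customer table: characterization summarizes one group with averages, and discrimination reports how one group's averages differ from another's. The field names and both helper functions are illustrative only.

```python
from statistics import mean

# Hypothetical customer records, used only for illustration.
customers = [
    {"segment": "loyal", "visits": 12, "spend": 300.0},
    {"segment": "loyal", "visits": 10, "spend": 260.0},
    {"segment": "occasional", "visits": 2, "spend": 40.0},
    {"segment": "occasional", "visits": 4, "spend": 60.0},
]

def characterize(rows, segment):
    """Characterization: summarize one group by its average values."""
    group = [r for r in rows if r["segment"] == segment]
    return {
        "visits": mean(r["visits"] for r in group),
        "spend": mean(r["spend"] for r in group),
    }

def discriminate(rows, target, contrast):
    """Discrimination: difference between the averages of two groups."""
    a = characterize(rows, target)
    b = characterize(rows, contrast)
    return {key: a[key] - b[key] for key in a}

loyal_profile = characterize(customers, "loyal")
gap = discriminate(customers, "loyal", "occasional")
```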
MINING FREQUENT PATTERNS
Frequent pattern mining is a technique in data mining that
finds repeating patterns in large datasets. These patterns help
in understanding trends, making predictions, and improving
decision-making.
What is a Frequent Pattern?
A frequent pattern is something that appears often in a dataset.
Example:
Supermarket Purchases
Many customers buy bread and butter together.
If this happens frequently, it is called a frequent pattern.
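The bread-and-butter example can be sketched by counting how many baskets contain each pair of items and keeping the pairs that reach a chosen count. The baskets, the `frequent_pairs` name, and the `min_count` threshold are assumptions made for illustration. This is a brute-force count, not an optimized algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, used only for illustration.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "bread", "eggs"},
    {"milk", "eggs"},
]

def frequent_pairs(transactions, min_count):
    """Count every item pair per basket and keep pairs seen at least min_count times."""
    counts = Counter()
    for basket in transactions:
        # Sorting gives each pair one canonical order, e.g. ("bread", "butter").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

result = frequent_pairs(baskets, min_count=3)
```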
TYPES OF FREQUENT PATTERNS
Frequent Itemsets → Groups of items that appear together
frequently.
Example: Customers often buy milk, bread, and eggs together.
Sequential Patterns → Repeated patterns in a sequence (ordered
events).
Example: A customer first buys a phone, then buys a phone
case after a week.
Association Rules → If one event happens, another is likely to
happen.
Example: If people buy diapers, they often buy baby wipes
too.
ASSOCIATION AND CORRELATIONS
These are techniques used in data mining to find relationships between items
in a dataset.
Association:
It finds connections between items that often appear together.
Example: If customers buy bread, they often buy butter too.
Correlations:
It checks if two things change together and how strong their relationship is.
Example (Weather & Ice Cream Sales):
On hot days, ice cream sales increase.
This means temperature and ice cream sales are correlated.
A high correlation means the two things are strongly related.
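One common way to put a number on "how strong" is the Pearson correlation coefficient, which ranges from -1 to 1. The slides do not name a specific measure, so using Pearson here is an assumption, and the temperature and sales figures are made up for illustration.

```python
from math import sqrt

# Hypothetical daily observations, used only for illustration.
temperature_c = [20, 24, 28, 32, 36]
ice_cream_sales = [110, 135, 160, 190, 215]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)

# Close to 1 for this data: sales rise almost in step with temperature.
r = pearson(temperature_c, ice_cream_sales)
```

A value near 1 or -1 indicates a strong relationship, and a value near 0 indicates a weak one.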
KEY DIFFERENCE
Association = Items that appear together frequently.
Correlation = Values that change together (this alone does not show that one causes the other).
ASSOCIATION RULE MINING
Association rule mining (ARM) is also called market basket analysis.
The set of items in a transaction is called a market basket.
It is mostly used in the retail industry.
SUPPORT AND CONFIDENCE
In association rule mining, we use support and confidence to measure
the strength of a rule.
Support:
Support tells how often an itemset appears in the dataset. It helps in
finding popular items.
Confidence:
For a rule "if A, then B", confidence tells how often the rule holds: of the
transactions that contain A, the share that also contain B.
Example:
We want to check the rule:
If a customer buys milk, they also buy bread.
ASSOCIATION ANALYSIS
Transaction ID    Items Purchased
1                 Bread, Cheese, Egg, Juice
2                 Bread, Cheese, Juice
3                 Bread, Yogurt, Milk
4                 Bread, Juice, Milk
5                 Cheese, Juice, Milk
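The rule "milk, then bread" can be checked against the five transactions above. The sketch below assumes the usual definitions: support is the fraction of transactions that contain the itemset, and confidence is the number of transactions with both sides of the rule divided by the number with the left side. The function names are illustrative.

```python
# The five transactions from the table above.
transactions = [
    {"Bread", "Cheese", "Egg", "Juice"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Yogurt", "Milk"},
    {"Bread", "Juice", "Milk"},
    {"Cheese", "Juice", "Milk"},
]

def count_containing(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count_containing(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions with the antecedent, the fraction that also have the consequent."""
    with_antecedent = count_containing(antecedent, transactions)
    if with_antecedent == 0:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    both = count_containing(set(antecedent) | set(consequent), transactions)
    return both / with_antecedent

# Milk and bread appear together in transactions 3 and 4: 2 of 5.
rule_support = support({"Milk", "Bread"}, transactions)
# Milk appears in transactions 3, 4, and 5; two of those also contain bread: 2 of 3.
rule_confidence = confidence({"Milk"}, {"Bread"}, transactions)
```

So the rule has a support of 2/5 (40%) and a confidence of 2/3 (about 67%).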