Fundamentals of Data Mining

The document provides an introduction to data mining, explaining its importance due to the explosive growth of data and the need for automated analysis to extract knowledge. It outlines the data mining process, techniques, applications, and the role of data warehouses and OLAP in managing and analyzing data. Additionally, it discusses various data mining tasks, including descriptive and predictive tasks, as well as methods for finding patterns and associations in data.

Uploaded by

noahwilson686
Copyright
© All Rights Reserved

FUNDAMENTALS OF DATA MINING
Lecture 1
CHAPTER 1 - INTRODUCTION
1. Why data mining?
2. What is Data Mining?
3. Data Mining Process
4. Data Mining Applications & Benefits
WHY DATA MINING?
Explosive growth of data: from terabytes to petabytes.
Data collection and data availability: automated data collection tools, database systems, the web, email, and a computerized society.
SOURCES OF DATA
Web data
E-Commerce
Bank Transactions
Digital media
Online Games
Research
ONLINE DATA
Every 60 seconds:
98k+ tweets
Millions of Facebook updates
11 million chats
217 new mobile users
SOLUTION
We are drowning in data but lacking in knowledge.
The solution is to mine knowledge from the data through automated analysis of massive data sets.
WHAT IS DATA MINING?
DATA MINING
It is the process of mining knowledge from large amounts of data.

Data Mining Techniques → Useful data
WHY DO WE DO THIS?
1. Companies and organizations get huge amounts of data from different sources and platforms.
2. As the size of a database increases, it becomes very difficult to search it manually for useful information.
3. They use data mining techniques, which include AI and complex mathematical algorithms, to extract specific and useful data.
CONTINUED...
1. We also get trends, patterns, and insights from the collected data.
2. Data mining is also called Knowledge Discovery in Databases (KDD).
DATA MINING TECHNIQUES
• Statistics
• Mathematical
• Cluster techniques
• Regression
• Segmentation
• AI
• ML
• KNN algorithm
• Apriori algorithm
• K-means algorithm
• Naïve Bayes
DATA MINING PROCESS
1. DATA SELECTION
2. DATA PREPROCESSING
3. DATA TRANSFORMATION
4. DATA MINING
5. PATTERN EVALUATION
6. KNOWLEDGE PRESENTATION
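The process steps above can be sketched as a chain of small functions. This is only an illustrative sketch: the records, the field names, the "average amount per region" mining step, and the threshold of 150 are all assumptions made for the example, not part of the lecture.

```python
# Hypothetical raw records; None marks a missing value.
raw = [
    {"customer": "A", "amount": "120", "region": "north"},
    {"customer": "B", "amount": None, "region": "south"},
    {"customer": "C", "amount": "80", "region": "north"},
    {"customer": "D", "amount": "200", "region": "south"},
]

def select(records):
    # Data selection: keep only the fields relevant to the analysis.
    return [{"amount": r["amount"], "region": r["region"]} for r in records]

def preprocess(records):
    # Data preprocessing: drop records with missing values.
    return [r for r in records if r["amount"] is not None]

def transform(records):
    # Data transformation: convert text amounts to numbers.
    return [{"amount": float(r["amount"]), "region": r["region"]} for r in records]

def mine(records):
    # Data mining: here, simply the average amount per region.
    groups = {}
    for r in records:
        groups.setdefault(r["region"], []).append(r["amount"])
    return {region: sum(v) / len(v) for region, v in groups.items()}

def evaluate(patterns, threshold=150.0):
    # Pattern evaluation: keep only patterns judged interesting.
    return {k: v for k, v in patterns.items() if v >= threshold}

def present(patterns):
    # Knowledge presentation: turn patterns into readable statements.
    return [f"{region}: average amount {avg:.1f}" for region, avg in sorted(patterns.items())]

patterns = mine(transform(preprocess(select(raw))))
report = present(evaluate(patterns))
```

With these sample records, `report` contains the single line `south: average amount 200.0`, because the north average (100.0) falls below the assumed threshold.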
ARCHITECTURE

Database or data warehouse
Data Mining Engine
Knowledge base
Pattern Evaluation
User Interface
WHAT KIND OF DATA CAN BE
MINED?
The kinds of data that can be mined:
• Database data
• Data warehouse data
• Other kinds of data

DATA WAREHOUSE
A data warehouse is a repository of
information collected from multiple
sources, stored under a unified schema,
and usually residing at a single site. Data
warehouses are constructed via a
process of data cleaning, data
integration, data transformation, data
loading, and periodic data refreshing.
KEY FEATURES
• Subject-Oriented
• Integrated
• Non-Volatile
• Time-Variant
• Data Granularity
A data warehouse is
usually modeled by a
multidimensional data
structure, called a data
cube, in which each
dimension corresponds to
an attribute or a set of
attributes in the schema,
and each cell stores the
value of some aggregate
measure such as count or
sum(sales amount). A data
cube provides a
multidimensional view of
data and allows the
precomputation and fast
access of summarized
data.
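As a rough illustration of the data cube idea, the sketch below precomputes the two aggregate measures named above, count and sum(sales amount), for each cell. The dimensions (region, quarter, product) and the sales figures are hypothetical, and a real warehouse would use a dedicated engine rather than Python dictionaries.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, quarter, product, sales_amount).
sales = [
    ("north", "Q1", "tv", 100),
    ("north", "Q1", "phone", 50),
    ("north", "Q2", "tv", 70),
    ("south", "Q1", "tv", 30),
    ("south", "Q2", "phone", 90),
    ("south", "Q2", "phone", 10),
]

# Each cell is keyed by one value per dimension and stores aggregates.
cube_sum = defaultdict(int)
cube_count = defaultdict(int)
for region, quarter, product, amount in sales:
    cell = (region, quarter, product)
    cube_sum[cell] += amount   # sum(sales amount) for the cell
    cube_count[cell] += 1      # count of facts in the cell

# Summaries are now a direct lookup instead of a scan over all facts.
south_q2_phone_total = cube_sum[("south", "Q2", "phone")]
south_q2_phone_count = cube_count[("south", "Q2", "phone")]
```

Here `south_q2_phone_total` is 100 and `south_q2_phone_count` is 2, since two facts fall into that cell.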
OLAP
OLAP (Online Analytical Processing) is a
technology used in data warehouses to
analyze large volumes of data from multiple
perspectives quickly and efficiently. It allows
users to perform complex queries, such as
comparing sales by region, time, or product
category, and interact with the data to
discover insights.
OLAP OPERATIONS
• Pivoting
• Slice and Dice
• Roll up and drill down
PIVOTING
SLICE AND DICE
ROLL UP AND DRILL DOWN
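The operations named above can be sketched on a small fact table. This is a minimal illustration under assumed data: the dimension names, the sales values, and the helper functions `roll_up`, `slice_`, `dice`, and `pivot` are hypothetical and do not come from any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact table with three dimensions and one measure.
facts = [
    {"region": "north", "quarter": "Q1", "product": "tv", "sales": 100},
    {"region": "north", "quarter": "Q2", "product": "tv", "sales": 70},
    {"region": "north", "quarter": "Q1", "product": "phone", "sales": 50},
    {"region": "south", "quarter": "Q1", "product": "tv", "sales": 30},
    {"region": "south", "quarter": "Q2", "product": "phone", "sales": 90},
]

def roll_up(rows, keep):
    """Aggregate sales over every dimension not listed in keep."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in keep)] += r["sales"]
    return dict(totals)

def slice_(rows, dim, value):
    """Slice: fix a single dimension to one value."""
    return [r for r in rows if r[dim] == value]

def dice(rows, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]

def pivot(table):
    """Pivot: swap the two axes of a two-dimensional summary."""
    return {(b, a): v for (a, b), v in table.items()}

by_region = roll_up(facts, ["region"])                     # rolled up to region
by_region_quarter = roll_up(facts, ["region", "quarter"])  # drilled down to quarter
q1_only = slice_(facts, "quarter", "Q1")
north_tv = dice(facts, region={"north"}, product={"tv"})
by_quarter_region = pivot(by_region_quarter)
```

Rolling up to region gives 220 for north and 120 for south. Drilling down adds the quarter dimension back, so north splits into 150 for Q1 and 70 for Q2.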
OTHER DATA

TRANSACTIONAL DATA
DATA MINING APPLICATIONS
• Customer Segmentation
• Market Basket Analysis
• Risk Management
• Fraud Detection
• Demand Prediction

Benefits:
• Manufacturing
• Mail Order
• Supermarkets
• Airlines
• Department Stores
• Insurance
• Banks
DATA MINING TASKS
There are two types of tasks:

Descriptive
Clustering
• Grouping similar customers based on their interests
Association Rule Mining
• Finding relationships between items in data

Predictive
Classification
• Categorizing new data based on previous patterns
Regression
• Predicting continuous values like sales and stock prices
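The two predictive tasks can be illustrated with a short sketch. It assumes a k-nearest-neighbor vote for classification and a least-squares straight line for regression. Both are common choices, but the lecture does not prescribe them, and all data values below are hypothetical.

```python
from collections import Counter

# Hypothetical labeled customers: (visits_per_month, avg_spend) -> label.
train = [
    ((1, 20), "occasional"), ((2, 25), "occasional"), ((3, 30), "occasional"),
    ((8, 90), "loyal"), ((9, 110), "loyal"), ((10, 95), "loyal"),
]

def knn_classify(point, data, k=3):
    """Classification: label a new point by majority vote of its k nearest neighbors."""
    def squared_distance(item):
        return sum((a - b) ** 2 for a, b in zip(point, item[0]))
    nearest = sorted(data, key=squared_distance)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

label = knn_classify((9, 100), train)                    # categorical prediction
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # hypothetical monthly sales
forecast = slope * 5 + intercept                         # continuous prediction for month 5
```

The classifier returns a category ("loyal" for this point), while the regression returns a number (11.0 for month 5, since the sample data lies on the line y = 2x + 1).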
WHAT KIND OF PATTERNS
CAN BE MINED?
CLASS/CONCEPT
DESCRIPTION:
In data mining, class/concept description helps in
understanding and summarizing data by describing
characteristics and differences of data groups.

Characterization
Discrimination
Mining Frequent Patterns
Association and Correlations
Classification and Regression
CHARACTERIZATION AND
DISCRIMINATION
Characterization: (Describing a group)
• It describes the common characteristics of a group (class or
concept).
• It summarizes general patterns in data.
Discrimination: (Comparing two or more groups)
• It compares two or more groups to find differences between
them.
• It identifies what makes one group different from another.
COMPARISON
Features | Characterization | Discrimination
What it does | Describes common characteristics of a group | Compares two or more groups to find differences
Example | "Loyal customers shop frequently and spend more" | "High-risk borrowers have low credit scores"
Use case | Customer profiling, business trends | Fraud detection, risk analysis
MINING FREQUENT
PATTERNS
Frequent pattern mining is a technique in data mining that
finds repeating patterns in large datasets. These patterns help
in understanding trends, making predictions, and improving
decision-making.
What Is a Frequent Pattern?
A frequent pattern is something that appears often in a dataset.
Example:
Supermarket Purchases
Many customers buy bread and butter together.
If this happens frequently, it is called a frequent pattern.
TYPES OF FREQUENT
PATTERNS
Frequent Itemsets → Groups of items that appear together
frequently.
Example: Customers often buy milk, bread, and eggs together.
Sequential Patterns → Repeated patterns in a sequence (ordered
events).
Example: A customer first buys a phone, then buys a phone
case after a week.
Association Rules → If one event happens, another is likely to
happen.
Example: If people buy diapers, they often buy baby wipes
too.
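Frequent itemsets can be found on small data by simply counting every item combination. The sketch below is a brute-force count, not the Apriori algorithm, and the baskets and minimum counts are hypothetical values chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "eggs", "butter"},
    {"milk", "eggs"},
]

def frequent_itemsets(transactions, size, min_count):
    """Count every itemset of the given size and keep those seen at least min_count times."""
    counts = Counter()
    for items in transactions:
        # Sorting gives each itemset one canonical tuple form.
        for combo in combinations(sorted(items), size):
            counts[combo] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_count}

pairs = frequent_itemsets(baskets, size=2, min_count=3)
triples = frequent_itemsets(baskets, size=3, min_count=2)
```

For these baskets, the pairs (bread, milk) and (eggs, milk) each appear 3 times, and the triple (bread, eggs, milk) appears twice. Brute-force counting grows quickly with the number of items, which is why algorithms such as Apriori prune candidates early.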
ASSOCIATION AND
CORRELATIONS
These are techniques used in data mining to find relationships between items
in a dataset.

Association:
It finds connections between items that often appear together.
Example: If customers buy bread, they often buy butter too.

Correlations:
It checks if two things change together and how strong their relationship is.
Example (Weather & Ice Cream Sales):
On hot days, ice cream sales increase.
This means temperature and ice cream sales are correlated.
A high correlation means the two things are strongly related.
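One common way to measure how strongly two quantities change together is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below assumes that measure, and the temperature and sales figures are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical daily observations.
temperature_c = [18, 22, 25, 30, 35]
ice_cream_sales = [120, 150, 200, 260, 330]

r = pearson(temperature_c, ice_cream_sales)
```

For this sample, `r` is above 0.99, which indicates a strong positive relationship. A value near 0 would indicate little linear relationship, and a value near -1 a strong inverse one.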
KEY DIFFERENCE
Association = Items that appear together frequently.
Correlation = Quantities that change together, with a measurable strength.
ASSOCIATION RULE MINING
• ARM is also called market basket analysis.
• The set of items in a transaction is called a market basket.
• It is mostly used in the retail industry.
SUPPORT AND CONFIDENCE
In association rule mining, we use support and confidence to measure
the strength of a rule.
Support:
Support tells how often an itemset appears in the dataset. It helps in
finding popular items.
Confidence:
Confidence tells how often an association rule is true. It shows the
likelihood of B happening when A occurs.
Example:
We want to check the rule:
If a customer buys milk, they also buy bread
ASSOCIATION ANALYSIS
Transaction ID | Items Purchased
1 | Bread, Cheese, Egg, Juice
2 | Bread, Cheese, Juice
3 | Bread, Yogurt, Milk
4 | Bread, Juice, Milk
5 | Cheese, Juice, Milk
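The rule "if a customer buys milk, they also buy bread" can be checked against the five transactions above. The sketch assumes the standard definitions: support is the fraction of transactions containing an itemset, and confidence of A → B is support(A and B) divided by support(A). The function names are hypothetical.

```python
# The five transactions from the table, keyed by transaction ID.
transactions = {
    1: {"Bread", "Cheese", "Egg", "Juice"},
    2: {"Bread", "Cheese", "Juice"},
    3: {"Bread", "Yogurt", "Milk"},
    4: {"Bread", "Juice", "Milk"},
    5: {"Cheese", "Juice", "Milk"},
}

def support(itemset, data):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= items for items in data.values()) / len(data)

def confidence(antecedent, consequent, data):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, data) / support(antecedent, data)

rule_support = support({"Milk", "Bread"}, transactions)          # 2 of 5 transactions
rule_confidence = confidence({"Milk"}, {"Bread"}, transactions)  # 2 of the 3 milk transactions
```

Milk and bread appear together in transactions 3 and 4, so the support is 2/5 = 0.4. Milk appears in transactions 3, 4, and 5, so the confidence of milk → bread is 2/3, or about 0.67.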
