Lecturenotes Data Mining

Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. It involves discovering patterns and relationships within large datasets. Common techniques include classification, clustering, association rule mining, and prediction. Decision trees and clustering are popular algorithms. The CRISP-DM methodology provides a standardized process for conducting a data mining project through phases of business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

Uploaded by

tanyah Lloyd

DATA MINING

• It is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. (http://www.anderson.ucla.edu)
• Also defined as the process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions. (Connolly & Begg, 2005)
• Technically, it is a process of discovering meaningful patterns and relationships that lie hidden within very large databases. (Seidman, 2001)
• Refers to the mining or discovery of new information, in terms of patterns or rules, from vast amounts of data.
• The keyword here is patterns. So what is a pattern?
• A pattern is a set of events that occur with enough frequency in the dataset to reveal a relationship between them. Revealing the relationship is usually an inductive reasoning process.
THE MATHEMATICS OF DATA MINING

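• Mathematicians have provided an ideal framework within which to conduct data mining, called the “Euclidean space”; the mathematical theory describing it is known as linear algebra.
• So what is the Euclidean space? Informally, a record with n numeric attributes can be treated as a point in n-dimensional space, where the straight-line distance between two points can be measured.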
GOALS OF DATA MINING

• Prediction
• Classification
• Optimization
• Identification
STYLES OF DATA MINING
• Directed data mining takes the form of predictive modelling, where we know exactly what we want to predict.
• It classifies data for use in making predictions or estimates, with the goal of deriving target values.
• E.g., banks may use it to predict defaulters on loans; businesses may use it to decide whom to market their products to.
• It uses popular data mining algorithms such as decision trees (discussed in detail later on).
• Undirected data mining finds patterns in the data and leaves it up to the user to determine whether or not these patterns are important.
• Data is placed in a format that makes it easier for us to make sense of it.
• The most commonly used algorithm is clustering, which groups data together based on common characteristics (discussed in detail later).
• One can then take one of the derived clusters and apply the decision tree algorithm to it, so that the analysis focuses on a particular segment of the cluster.
DATA MINING METHODOLOGY
DATA MINING ALGORITHMS

• A data mining algorithm is a well-defined procedure that takes data as input and produces models or patterns as output.
DECISION TREES

• This algorithm analyzes the data and creates a repeating series of branches until no more relevant branches can be made.
• The end result is a tree structure (often binary) in which the splits in the branches can be followed along specific criteria to reach the desired result.
• Decision Tree (DT):
– A tree where the root and each internal node is labeled with a question.
– The arcs represent each possible answer to the associated question.
– Each leaf node represents a prediction of a solution to the problem.
• A popular technique for classification: the leaf node indicates the class to which the corresponding tuple belongs.
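
The question/arc/leaf structure described above can be sketched in a few lines of Python. This is only an illustration: the loan-default tree, the attribute names `has_prior_default` and `debt_to_income`, and the 0.4 threshold are all invented for the example, and the tree is written by hand rather than learned from data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Union


@dataclass
class Leaf:
    """Leaf node: holds the predicted class."""
    label: str


@dataclass
class Node:
    """Root or internal node: a question plus one child per possible answer."""
    question: Callable[[dict], str]  # maps a record to an answer (an arc label)
    children: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def classify(tree, record):
    """Follow the arc matching each answer until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.children[tree.question(record)]
    return tree.label


# Hypothetical hand-built tree; attributes and threshold are assumptions.
tree = Node(
    question=lambda r: "yes" if r["has_prior_default"] else "no",
    children={
        "yes": Leaf("likely to default"),
        "no": Node(
            question=lambda r: "high" if r["debt_to_income"] > 0.4 else "low",
            children={
                "high": Leaf("likely to default"),
                "low": Leaf("likely to repay"),
            },
        ),
    },
)

applicant = {"has_prior_default": False, "debt_to_income": 0.25}
print(classify(tree, applicant))
```

A decision tree algorithm would choose the questions and thresholds from training data; here they are fixed so that only the traversal is shown.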
CLUSTERING
• This algorithm groups data into clusters.
• The goal of clustering is to place records into groups such that records in a group are similar to each other and dissimilar to records in other groups.
• An important facet of clustering is the similarity function that is used.
• The Euclidean distance (the ordinary, straight-line distance between two points) can be used to measure similarity.
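
As a rough sketch of how Euclidean distance can drive clustering, the Python below uses k-means. The notes do not name a specific clustering algorithm, so k-means is an assumption chosen because it is a common one; the sample points and starting centroids are also made up.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
        for p in points
    ]


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids as cluster means."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        for i in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return assign(points, centroids), centroids


# Made-up 2-D records: two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(points, centroids=[points[0], points[3]])
print(labels)
```

Starting centroids are passed in explicitly so the run is repeatable; practical implementations usually pick them at random and may give different groupings from run to run.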
ASSOCIATION RULE MINING

• It is an important data mining model, initially used for Market Basket Analysis to find how items purchased by customers are related.
• Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.

[Table: Market-basket transactions]

Example of Association Rules

{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!
DEFINITION: ASSOCIATION RULE
● Association Rule
– An implication expression of the form X → Y, where X and Y are itemsets
– Example: {Milk, Diaper} → {Beer}

● Rule Evaluation Metrics
– Support (s)
◆ Fraction of transactions that contain both X and Y
– Confidence (c)
◆ Measures how often items in Y appear in transactions that contain X
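
The two metrics are usually written as ratios of counts. In the sketch below, the symbols σ (the number of transactions containing every item of an itemset) and N (the total number of transactions) are notation introduced here for illustration; they do not appear in the text above.

```latex
s(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{N},
\qquad
c(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}
```

Support depends only on the combined itemset X ∪ Y, while confidence also depends on which items sit on the left-hand side of the rule.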
MINING ASSOCIATION RULES
Example of Rules:
{Milk,Diaper} → {Beer} (s=0.4, c=0.67)
{Milk,Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper,Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk,Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk,Beer} (s=0.4, c=0.5)
{Milk} → {Diaper,Beer} (s=0.4, c=0.5)

Observations:
• All the above rules are binary partitions of the same itemset:
{Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but
can have different confidence
• Thus, we may decouple the support and confidence requirements
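
The support and confidence figures listed above can be checked with a short Python sketch. The notes do not include the underlying transactions, so the five-transaction table below is an assumption: it is the market-basket table commonly paired with this example, and it happens to yield the same values. The function names are also just illustrative.

```python
# ASSUMPTION: this transaction table is not in the notes; it is the table
# commonly used with this example and is consistent with the listed values.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset):
    """Number of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(x, y):
    """Fraction of all transactions that contain both X and Y."""
    return support_count(x | y) / len(transactions)


def confidence(x, y):
    """Of the transactions containing X, the fraction that also contain Y."""
    return support_count(x | y) / support_count(x)


rules = [
    ({"Milk", "Diaper"}, {"Beer"}),
    ({"Milk", "Beer"}, {"Diaper"}),
    ({"Diaper", "Beer"}, {"Milk"}),
    ({"Beer"}, {"Milk", "Diaper"}),
    ({"Diaper"}, {"Milk", "Beer"}),
    ({"Milk"}, {"Diaper", "Beer"}),
]

for x, y in rules:
    print(sorted(x), "->", sorted(y),
          f"s={support(x, y):.2f}, c={confidence(x, y):.2f}")
```

Because all six rules split the same itemset, `support_count(x | y)` is identical for each of them; only the denominator in `confidence` changes.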
CROSS INDUSTRY STANDARD PROCESS FOR DATA MINING (CRISP-DM)
CRISP-DM: OVERVIEW

• CRISP-DM is a comprehensive data mining methodology and process model that provides anyone, from novices to data mining experts, with a complete blueprint for conducting a data mining project.
• CRISP-DM breaks down the life cycle of a data mining project into six phases.
CRISP-DM: PHASES

Business Understanding
• Understanding project objectives and requirements; data mining problem definition
Data Understanding
• Initial data collection and familiarization; identifying data quality issues; initial, obvious results
Data Preparation
• Record and attribute selection; data cleansing
Modeling
• Run the data mining tools
Evaluation
• Determine if results meet business objectives; identify business issues that should have been addressed earlier
Deployment
• Put the resulting models into practice; set up for continuous mining of the data
