Unit 1 - Lecture 2

The document provides an overview of data mining, distinguishing it from simple data querying and outlining its classification schemes, tasks, and techniques. It introduces the CRISP-DM framework for data mining projects, detailing its iterative and flexible nature, and discusses the components and architecture of data mining systems. Additionally, it covers predictive analytics, its challenges, and various applications across different industries.

Uploaded by

adarshsingh.swg




Data Mining

Content

• Data mining Introduction
• KDD

What is (not) Data Mining?

What is not Data Mining?
– Look up a phone number in a phone directory
– Query a Web search engine for information about “Amazon”
– Querying or searching

What is Data Mining?
– Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)
– Group together similar documents returned by a search engine according to their context (e.g., Amazon rainforest, Amazon.com)
– Finding trends and patterns
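The contrast between querying and mining can be sketched in a few lines. The records, phone numbers, and function names below (`lookup`, `prefix_share_by_city`) are hypothetical toy values chosen for illustration, not real statistics: a lookup only retrieves a value we already know how to ask for, while the mining step discovers a pattern (how common "O'" surnames are per city) that was never stored explicitly.

```python
from collections import Counter
from typing import Optional

# Hypothetical toy data: (surname, city) pairs. Not real statistics.
RECORDS = [
    ("O'Brien", "Boston"), ("O'Reilly", "Boston"), ("O'Rourke", "Boston"),
    ("Smith", "Boston"), ("Garcia", "San Antonio"), ("Smith", "San Antonio"),
    ("Lopez", "San Antonio"), ("O'Brien", "San Antonio"),
]

# Hypothetical phone directory (fictional 555 numbers).
PHONE_BOOK = {"O'Brien": "555-0101", "Smith": "555-0199"}


def lookup(name: str) -> Optional[str]:
    """Querying: retrieve a value that is already stored under a known key."""
    return PHONE_BOOK.get(name)


def prefix_share_by_city(records, prefix="O'"):
    """Mining: compute, per city, the share of surnames starting with prefix."""
    totals = Counter(city for _, city in records)
    hits = Counter(city for name, city in records if name.startswith(prefix))
    return {city: hits[city] / totals[city] for city in totals}


print(lookup("Smith"))
print(prefix_share_by_city(RECORDS))
```

With this toy data the mining step reports a 0.75 share of "O'" surnames in Boston against 0.25 in San Antonio, a regularity that no single lookup would reveal.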

Data Mining: Classification Schemes

• Decisions in data mining
  – Kinds of databases to be mined
  – Kinds of knowledge to be discovered
  – Kinds of techniques utilized
  – Kinds of applications adapted

• Data mining tasks
  – Descriptive data mining
  – Predictive data mining

Decisions in data mining
• Databases to be mined
  – Relational, transactional, object-oriented, spatial, time-series, text, multimedia, heterogeneous, WWW, etc.
• Knowledge to be mined
  – Characterization, discrimination, association, classification, clustering, trend, deviation and outlier analysis, etc.
  – Multiple/integrated functions and mining at multiple levels
• Techniques utilized
  – Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc.
• Applications adapted
  – Retail, telecommunication, banking, fraud analysis, …

Data mining tasks/techniques
• Predictive modeling
  – Use some variables to predict unknown or future values of other variables.
• Descriptive modeling
  – Find human-interpretable patterns that describe the data.
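Predictive modeling in its simplest form uses one variable to predict another. The sketch below fits a straight line y = a + b·x by ordinary least squares; the "advertising spend" and "sales" numbers are hypothetical toy values, and `fit_line`/`predict` are illustrative names, not a library API.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


def predict(a, b, x):
    """Use the fitted line to predict y for a new, unseen x."""
    return a + b * x


# Hypothetical toy data: advertising spend (x) and sales (y).
ad_spend = [1, 2, 3, 4, 5]
sales = [2.9, 5.1, 7.0, 9.2, 10.8]
a, b = fit_line(ad_spend, sales)
print(round(a, 2), round(b, 2), round(predict(a, b, 6), 2))
```

Here the fitted line is roughly y = 1.03 + 1.99·x, so the model predicts about 12.97 for x = 6, a value that was not in the data.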

Data mining tasks/techniques
• Predictive Modeling:
  – Classification: Assigning data instances to predefined classes (e.g., decision trees, neural networks, support vector machines).
  – Regression: Predicting continuous numerical values (e.g., linear regression). Logistic regression, despite its name, predicts class probabilities and is normally used for classification.
  – Time Series Analysis: Analyzing data points collected at specific time intervals to forecast future values (e.g., ARIMA, exponential smoothing).
• Descriptive Modeling:
  – Clustering: Grouping similar data points together (e.g., k-means, hierarchical clustering).
  – Association Rule Mining: Discovering relationships between items (e.g., market basket analysis).
  – Outlier Detection: Identifying abnormal data points.
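Association rule mining is usually driven by two measures: the support of an itemset (the fraction of transactions containing it) and the confidence of a rule X → Y (support of X ∪ Y divided by support of X). The sketch below computes both on a hypothetical five-basket dataset and finds rules by brute-force enumeration; real miners such as Apriori prune the search instead, so treat this only as an illustration of the measures, with illustrative function names.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of (antecedent | consequent) divided by support of antecedent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / base


def brute_force_rules(transactions, min_support, min_confidence, max_len=3):
    """Enumerate every itemset up to max_len items and split it into rules."""
    items = sorted(set().union(*transactions))
    found = []
    for size in range(2, max_len + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset, transactions)
            if s < min_support:
                continue  # itemset is not frequent enough
            for k in range(1, size):
                for lhs in combinations(sorted(itemset), k):
                    lhs = frozenset(lhs)
                    rhs = itemset - lhs
                    c = confidence(lhs, rhs, transactions)
                    if c >= min_confidence:
                        found.append((lhs, rhs, s, c))
    return found


for lhs, rhs, s, c in brute_force_rules(TRANSACTIONS, 0.4, 0.6):
    print(sorted(lhs), "->", sorted(rhs), f"support={s:.2f}", f"confidence={c:.2f}")
```

For example, the rule {milk, diapers} → {beer} has support 2/5 = 0.4 and confidence 2/3, because beer appears in two of the three baskets that contain both milk and diapers.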

CRISP-DM: Framework for Data Mining
CRISP-DM stands for Cross-Industry Standard Process for Data Mining.
• Widely adopted methodology.
• Provides a structured approach for planning and executing data mining projects.
• Designed to be adaptable across various industries and applications.
• Key Characteristics of CRISP-DM
  – Iterative: The process is not strictly linear. You may need to revisit previous phases as you progress.
  – Flexible: It can be adapted to various project sizes and …

CRISP-DM: Data Mining Operations
1. Business Understanding:
   1. Determine business objectives and requirements.
   2. Assess situation and resources.
   3. Determine data mining goals.
2. Data Understanding:
   1. Collect initial data.
   2. Describe data.
   3. Explore data.
   4. Verify data quality.
3. Data Preparation:
   1. Select and clean data.
   2. Construct data.
4. Data Modeling:
   1. Select modeling techniques.
   2. Generate test design.
   3. Build and assess models.
5. Evaluation:
   1. Evaluate results.
   2. Review process.
   3. Determine next steps.
6. Deployment:
   1. Plan deployment.
   2. Plan monitoring and maintenance.
   3. Produce final report.
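To make the iterative nature of CRISP-DM concrete, the six phases and their tasks can be written down as data, together with the moves allowed between phases. The tasks are taken from the slide; the set of allowed moves is an assumption of this sketch, modeled on the arrows usually drawn in the CRISP-DM reference diagram (Business ⇄ Data Understanding, Data Preparation ⇄ Modeling, Evaluation back to Business Understanding). `Phase`, `ALLOWED_NEXT`, `check_path`, and `checklist` are hypothetical names.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = "Business Understanding"
    DATA_UNDERSTANDING = "Data Understanding"
    DATA_PREPARATION = "Data Preparation"
    MODELING = "Data Modeling"
    EVALUATION = "Evaluation"
    DEPLOYMENT = "Deployment"


# Tasks per phase, as listed on the slide.
TASKS = {
    Phase.BUSINESS_UNDERSTANDING: [
        "Determine business objectives and requirements",
        "Assess situation and resources",
        "Determine data mining goals",
    ],
    Phase.DATA_UNDERSTANDING: [
        "Collect initial data", "Describe data", "Explore data", "Verify data quality",
    ],
    Phase.DATA_PREPARATION: ["Select and clean data", "Construct data"],
    Phase.MODELING: [
        "Select modeling techniques", "Generate test design", "Build and assess models",
    ],
    Phase.EVALUATION: ["Evaluate results", "Review process", "Determine next steps"],
    Phase.DEPLOYMENT: [
        "Plan deployment", "Plan monitoring and maintenance", "Produce final report",
    ],
}

# Assumed allowed moves: the forward path plus back-edges that make it iterative.
ALLOWED_NEXT = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),  # a new project cycle would start from Business Understanding
}


def check_path(path):
    """Raise ValueError if a sequence of phases breaks the allowed moves."""
    if not path or path[0] is not Phase.BUSINESS_UNDERSTANDING:
        raise ValueError("a CRISP-DM project starts with Business Understanding")
    for current, following in zip(path, path[1:]):
        if following not in ALLOWED_NEXT[current]:
            raise ValueError(f"cannot go from {current.value} to {following.value}")


def checklist(path):
    """Flatten a valid path into the ordered list of tasks to carry out."""
    check_path(path)
    return [f"{phase.value}: {task}" for phase in path for task in TASKS[phase]]


# A project that loops back from modeling to data preparation once.
project = [
    Phase.BUSINESS_UNDERSTANDING, Phase.DATA_UNDERSTANDING, Phase.DATA_PREPARATION,
    Phase.MODELING, Phase.DATA_PREPARATION, Phase.MODELING,
    Phase.EVALUATION, Phase.DEPLOYMENT,
]
for step in checklist(project):
    print(step)
```

Representing the back-edges explicitly is what distinguishes this from a simple waterfall: revisiting Data Preparation after Modeling is a legal move, while jumping from Business Understanding straight to Modeling is rejected.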

CRISP-DM: Framework for Data Mining

Components of Data Mining
• Data Source: The origin of the data, which can be databases, data warehouses, or other repositories.
• Data Warehouse Server: Retrieves the relevant data from the data source based on user requests.
• Data Mining Engine: The core of the data mining system; it applies various algorithms and techniques to extract patterns from the data.
• Pattern Evaluation Module: Assesses the discovered patterns against predefined criteria to determine their significance and usefulness.
• Graphical User Interface (GUI): Provides a user-friendly interface for interacting with the data mining system.
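The flow between these components can be sketched as a small pipeline. Every class name here is hypothetical, the "mining algorithm" is deliberately trivial (the fraction of fetched records containing each item), and the GUI is replaced by a text-rendering function; the point is only how data passes from source to server to engine to evaluator to the user.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DataSource:
    """Origin of the data: here simply an in-memory list of records."""
    records: list


class WarehouseServer:
    """Retrieves the records relevant to a user request."""

    def __init__(self, source):
        self.source = source

    def fetch(self, predicate):
        return [r for r in self.source.records if predicate(r)]


class MiningEngine:
    """Applies a (toy) algorithm: fraction of records containing each item."""

    def mine(self, records):
        if not records:
            return {}
        counts = Counter(item for r in records for item in set(r["items"]))
        return {item: c / len(records) for item, c in counts.items()}


class PatternEvaluator:
    """Keeps only patterns meeting a predefined criterion (minimum support)."""

    def __init__(self, min_support):
        self.min_support = min_support

    def evaluate(self, patterns):
        return {p: s for p, s in patterns.items() if s >= self.min_support}


def render(patterns):
    """Stand-in for the GUI: list surviving patterns, most frequent first."""
    ordered = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    return "\n".join(f"{item}: {share:.0%}" for item, share in ordered)


# Hypothetical sales records from two stores.
source = DataSource([
    {"store": "A", "items": ["milk", "bread"]},
    {"store": "A", "items": ["milk"]},
    {"store": "B", "items": ["beer"]},
    {"store": "A", "items": ["bread", "eggs"]},
])
rows = WarehouseServer(source).fetch(lambda r: r["store"] == "A")
patterns = MiningEngine().mine(rows)
print(render(PatternEvaluator(min_support=0.5).evaluate(patterns)))
```

Keeping each component behind its own small interface means the toy counting step in `MiningEngine` could be swapped for a real algorithm without touching the data access or evaluation code.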
Data Mining Architecture

Predictive Analytics

• It is the use of data to predict future trends and events.
• It attempts to answer the question, “What might happen next?”
• It leverages historical data, statistical modeling, and machine learning algorithms to identify patterns and make forecasts.
• It works by identifying correlations between different elements in selected datasets.
• There are broadly two types of predictive analytics models:
  – classification models, which predict a category or class label
  – regression models, which predict a continuous numerical value
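As a minimal sketch of a classification model, the k-nearest-neighbors classifier below labels a new case by majority vote among the k most similar historical cases (Euclidean distance via `math.dist`, Python 3.8+). The two features and the "legit"/"fraud" labels are hypothetical toy data loosely inspired by fraud detection; k-NN is just one of many possible classifiers.

```python
import math
from collections import Counter

# Hypothetical history: (transaction amount in hundreds, transactions in last hour).
TRAIN = [
    ((1.0, 1.0), "legit"), ((1.5, 2.0), "legit"), ((2.0, 1.0), "legit"),
    ((8.0, 9.0), "fraud"), ((9.0, 8.5), "fraud"), ((7.5, 8.0), "fraud"),
]


def knn_classify(train, query, k=3):
    """Predict the label of query by majority vote of its k nearest neighbors."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training examples")
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(knn_classify(TRAIN, (1.2, 1.5)))
print(knn_classify(TRAIN, (8.5, 8.0)))
```

An odd k avoids tied votes in this two-class setting; the same historical data could instead feed a regression model if the target were a number, such as a risk score.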

Predictive Analytics Challenges
• Data Quality: Inaccurate, incomplete, or biased data can lead to unreliable models.
• Data Availability: Insufficient or limited data can hinder model development.
• Model Complexity: Complex models can be difficult to interpret and explain.
• Overfitting: Models that are too closely fitted to the training data may not perform well on new data.
• Ethical Considerations: Concerns about privacy, bias, and fairness in model development and deployment.
• Computational Resources: Handling large datasets and complex models requires significant computational power.
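Overfitting is easiest to see by comparing accuracy on the training data with accuracy on held-out data. In this sketch the toy data is hypothetical: the underlying rule is assumed to be "class 1 when x ≥ 5", and the training labels at x = 3 and x = 8 are deliberately noisy. A model that memorizes the training set (1-nearest-neighbor) scores perfectly on it but does poorly on new points, while a simpler threshold rule does the opposite.

```python
def nearest_neighbor_model(train):
    """Memorizes the training set: predicts the label of the closest training x."""
    def predict(x):
        return min(train, key=lambda pair: abs(pair[0] - x))[1]
    return predict


def threshold_model(cutoff):
    """A much simpler model: class 1 if x >= cutoff, else class 0."""
    def predict(x):
        return 1 if x >= cutoff else 0
    return predict


def accuracy(model, data):
    """Fraction of (x, y) pairs the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


# Hypothetical data; labels at x=3 and x=8 in TRAIN are noise.
TRAIN = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1)]
TEST = [(0.5, 0), (2.8, 0), (3.2, 0), (7.8, 1), (8.2, 1), (9.5, 1)]

memorizer = nearest_neighbor_model(TRAIN)
simple = threshold_model(5)
print("memorizer:", accuracy(memorizer, TRAIN), round(accuracy(memorizer, TEST), 2))
print("threshold:", accuracy(simple, TRAIN), accuracy(simple, TEST))
```

The memorizer reaches 100% training accuracy but only about 33% on the test points, because it has learned the noise; the threshold model accepts 75% on training data and generalizes perfectly here. Evaluating on data the model has not seen is the standard way to detect this.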

Predictive Analytics Applications
• Finance: Fraud detection, credit risk assessment, investment portfolio optimization, market trend prediction.
• Healthcare: Disease outbreak prediction, patient risk assessment, drug discovery, personalized medicine.
• Retail: Customer segmentation, demand forecasting, inventory management, recommendation systems.
• Marketing: Customer churn prediction, campaign optimization, targeted advertising.
• Manufacturing: Predictive maintenance, supply chain optimization, quality control.
• Insurance: Risk assessment, fraud detection, customer churn prediction.
