2 - Unit 1 - Lecture 3

Uploaded by

sihagmukesh05


Data Mining

Content

• Data mining introduction
• KDD (Knowledge Discovery in Databases)
What is (not) Data Mining?

What is not Data Mining?
– Look up phone number in phone directory
– Query a Web search engine for information about “Amazon”
– Querying or searching

What is Data Mining?
– Certain names are more prevalent in certain US locations (O’Brien, O’Rurke, O’Reilly… in Boston area)
– Group together similar documents returned by search engine according to their context (e.g. Amazon rainforest, Amazon.com)
– Finding trends and patterns
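The contrast above can be sketched in a few lines of Python. A lookup returns a fact that is already stored. Mining derives a pattern that no single record states: here, how prevalent “O’” surnames are in each city. The phone book, the residents and the function names are all hypothetical, invented only for illustration.

```python
from collections import Counter
from typing import Optional

# Hypothetical toy data for illustration only; not taken from any real directory.
PHONE_BOOK = {"Smith": "555-0100", "O'Brien": "555-0101"}
RESIDENTS = [
    ("O'Brien", "Boston"), ("O'Reilly", "Boston"), ("O'Rurke", "Boston"),
    ("Smith", "Boston"), ("Garcia", "Phoenix"), ("Smith", "Phoenix"),
    ("Nguyen", "Phoenix"), ("O'Brien", "Phoenix"),
]


def look_up(name: str) -> Optional[str]:
    """Not data mining: retrieve a fact already stored under a known key."""
    return PHONE_BOOK.get(name)


def o_prefix_share_by_city(rows):
    """Data mining in miniature: derive a pattern (the share of "O'" surnames
    per city) that is not stored anywhere as an explicit record."""
    totals = Counter(city for _, city in rows)
    hits = Counter(city for name, city in rows if name.startswith("O'"))
    return {city: hits[city] / totals[city] for city in totals}


print(look_up("Smith"))                    # 555-0100
print(o_prefix_share_by_city(RESIDENTS))   # {'Boston': 0.75, 'Phoenix': 0.25}
```

On real data the same idea scales up to the location-versus-surname pattern described in the slide. Counting is only the simplest possible "pattern"; actual mining would also test whether the difference is meaningful.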

Data Mining: Classification Schemes

• Decisions in data mining
  – Kinds of databases to be mined
  – Kinds of knowledge to be discovered
  – Kinds of techniques utilized
  – Kinds of applications adapted

• Data mining tasks
  – Descriptive data mining
  – Predictive data mining
Decisions in data mining
• Databases to be mined
  – Relational, transactional, object-oriented, spatial, time-series, text, multi-media, heterogeneous, WWW, etc.
• Knowledge to be mined
  – Characterization, discrimination, association, classification, clustering, trend, deviation and outlier analysis, etc.
  – Multiple/integrated functions and mining at multiple levels
• Techniques utilized
  – Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc.
• Applications adapted
  – Retail, telecommunication, banking, fraud analysis, etc.
Data mining tasks/techniques
• Predictive modeling
  – Use some variables to predict unknown or future values of other variables.
• Descriptive modeling
  – Find human-interpretable patterns that describe the data.
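A minimal sketch of the two kinds of modeling follows. The predictive part fits a least-squares line to known (x, y) pairs and uses it to estimate y for an unseen x. The descriptive part summarises the same data as two readable groups. The spend/sales data, the threshold and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical toy history of (advertising spend, sales) pairs, invented for illustration.
HISTORY = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]


def fit_line(points):
    """Predictive: ordinary least-squares fit of y = a + b * x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    return a, b


def describe(points, threshold):
    """Descriptive: summarise the data as two human-readable groups."""
    low = [y for x, y in points if x < threshold]
    high = [y for x, y in points if x >= threshold]
    return {"mean_sales_low_spend": mean(low), "mean_sales_high_spend": mean(high)}


a, b = fit_line(HISTORY)
print(a + b * 5.0)              # predicted sales for an unseen spend of 5.0 -> 11.0
print(describe(HISTORY, 2.5))   # {'mean_sales_low_spend': 4.0, 'mean_sales_high_spend': 8.0}
```

The predictive model outputs a value for a case it has never seen. The descriptive summary only restates what is already in the data, in a form a person can read.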
Data mining tasks/techniques
• Predictive Modeling:
  – Classification: Assigning data instances to predefined classes (e.g., decision trees, neural networks, support vector machines).
  – Regression: Predicting continuous numerical values (e.g., linear regression). Logistic regression, despite its name, predicts class probabilities and is normally used for classification.
  – Time Series Analysis: Analyzing data points collected at specific time intervals to forecast future values (e.g., ARIMA, exponential smoothing).
• Descriptive Modeling:
  – Clustering: Grouping similar data points together (e.g., k-means, hierarchical clustering).
  – Association Rule Mining: Discovering relationships between items (e.g., market basket analysis).
  – Outlier Detection: Identifying abnormal data points.
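The three descriptive tasks can each be sketched in plain Python under simple assumptions:

- **Clustering:** k-means on 2-D points, run for a fixed number of iterations from caller-chosen starting centroids.
- **Association rules:** the standard support and confidence measures over a list of baskets.
- **Outlier detection:** a z-score rule with a threshold of 2 population standard deviations.

The threshold, the data and the function names are illustrative choices, not fixed parts of these techniques.

```python
import math
from statistics import mean, pstdev


def k_means(points, centroids, iterations=10):
    """Clustering: Lloyd's k-means on 2-D points, starting from given centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid (Euclidean distance)
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its points; keep it if its cluster is empty
        centroids = [
            (mean(x for x, _ in c), mean(y for _, y in c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


def support(transactions, itemset):
    """Association rules: fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of antecedent -> consequent; assumes the antecedent occurs at least once."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)


def z_score_outliers(values, threshold=2.0):
    """Outlier detection: values more than `threshold` population std devs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, [(0, 0), (10, 10)])
print(clusters)  # [[(1, 1), (1, 2), (2, 1)], [(8, 8), (8, 9), (9, 8)]]

baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(support(baskets, {"bread", "milk"}))       # 0.5
print(confidence(baskets, {"bread"}, {"milk"}))  # about 0.667
print(z_score_outliers([10, 11, 9, 10, 12, 10, 50]))  # [50]
```

In the basket example, the rule {bread} -> {milk} holds in two of the three baskets that contain bread. None of these results predicts anything: each one describes structure already present in the data.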
CRISP-DM: Framework for Data Mining
CRISP-DM stands for Cross-Industry Standard Process for Data Mining.
• Widely adopted methodology.
• Provides a structured approach for planning and executing DM projects.
• Designed to be adaptable across various industries and applications.
• Key Characteristics of CRISP-DM
  – Iterative: The process is not strictly linear. You may need to revisit previous phases as you progress.
  – Flexible: It can be adapted to various project sizes and types.
SELF -> Key Characteristics
Here’s a simplified explanation of the key characteristics of CRISP-DM:
1. Iterative: The CRISP-DM process isn’t a straight line; it’s more like a circle. As you
work on a data project, you might find that you need to go back and revisit earlier
steps. For example, after analyzing your data, you might realize you need to refine
your questions or gather more data.
2. Flexible: CRISP-DM can be used for different types of projects, whether they are big
or small. You can adjust the process to fit the specific needs of your project, making it
versatile for various situations.
3. Industry-Neutral: This approach can be used in any industry, whether it’s healthcare,
finance, marketing, or any other field. It’s designed to be useful no matter what kind of
data you’re working with.
4. Focus on Business Value: At the heart of CRISP-DM is the idea of understanding
what the business needs. It’s important to make sure that your data analysis is
aligned with the goals of the organization. This way, your work provides real value and
helps the business succeed.
5. Structured Framework: CRISP-DM provides a clear framework for managing data
mining projects. It outlines specific steps to follow, making it easier for teams to
collaborate and stay organized. This structure helps ensure that all important aspects
of the project are covered, from understanding the problem to evaluating the results.
CRISP-DM: Data Mining Operations
1. Business Understanding:
   1. Determine business objectives and requirements.
   2. Assess situation and resources.
   3. Determine data mining goals.
2. Data Understanding:
   1. Collect initial data.
   2. Describe data.
   3. Explore data.
   4. Verify data quality.
3. Data Preparation:
   1. Select and clean data.
   2. Construct data.
4. Data Modeling:
   1. Select modeling techniques.
   2. Generate test design.
   3. Build and assess models.
5. Evaluation:
   1. Evaluate results.
   2. Review process.
   3. Determine next steps.
6. Deployment:
   1. Plan deployment.
   2. Plan monitoring and maintenance.
   3. Produce final report.
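The iterative character of the six phases can be modelled as a small state machine. The forward order comes from the list above. The backward loops in `REVISIT` are an assumption based on the commonly drawn CRISP-DM cycle diagram: data understanding back to business understanding, modeling back to data preparation, and evaluation back to business understanding. `run_project` and the `checks` callbacks are hypothetical names used only for this sketch.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Forward order of the six phases listed above
NEXT = {
    Phase.BUSINESS_UNDERSTANDING: Phase.DATA_UNDERSTANDING,
    Phase.DATA_UNDERSTANDING: Phase.DATA_PREPARATION,
    Phase.DATA_PREPARATION: Phase.MODELING,
    Phase.MODELING: Phase.EVALUATION,
    Phase.EVALUATION: Phase.DEPLOYMENT,
}

# Assumed backward loops, modelled on the commonly drawn CRISP-DM cycle diagram
REVISIT = {
    Phase.DATA_UNDERSTANDING: Phase.BUSINESS_UNDERSTANDING,
    Phase.MODELING: Phase.DATA_PREPARATION,
    Phase.EVALUATION: Phase.BUSINESS_UNDERSTANDING,
}


def run_project(checks, max_steps=50):
    """Walk the phases in order. `checks` maps a phase to a zero-argument function
    returning False when the team decides to revisit an earlier phase."""
    phase, path = Phase.BUSINESS_UNDERSTANDING, []
    for _ in range(max_steps):
        path.append(phase)
        if phase is Phase.DEPLOYMENT:
            break
        passed = checks.get(phase, lambda: True)()
        phase = REVISIT[phase] if not passed and phase in REVISIT else NEXT[phase]
    return path


# Usage: the first evaluation fails, so the team returns to business understanding once.
evaluations = []


def evaluation_ok():
    evaluations.append("run")
    return len(evaluations) > 1


print([p.name for p in run_project({Phase.EVALUATION: evaluation_ok})])
```

The `max_steps` cap is a design choice. It keeps the sketch from looping forever if a check never passes, which a real project would handle by re-scoping instead.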

CRISP-DM: Framework for Data Mining
Components of Data Mining
• Data Source: This is the origin of the data, which can be databases, data warehouses, or other repositories.
• Data Warehouse Server: This component retrieves relevant data from the data source based on user requests.
• Data Mining Engine: The heart of the data mining process, it applies various algorithms and techniques to extract patterns from the data.
• Pattern Evaluation Module: Assesses the discovered patterns based on predefined criteria to determine their significance and usefulness.
• Graphical User Interface (GUI): This provides a user-friendly interface for interaction with the data mining system.
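The flow between the five components can be sketched as a chain of functions, one per component. The records, the store filter, the choice of co-occurring item pairs as the "pattern", and the 0.5 support threshold are all hypothetical. A real system would put a DBMS or warehouse, a library of mining algorithms and an actual interface behind each stage.

```python
from collections import Counter
from itertools import combinations

# Data Source: hypothetical transaction records, invented for illustration
DATA_SOURCE = [
    {"store": "A", "items": {"bread", "milk"}},
    {"store": "A", "items": {"bread", "butter"}},
    {"store": "B", "items": {"bread", "milk", "butter"}},
    {"store": "A", "items": {"bread", "milk"}},
]


def warehouse_server(source, store):
    """Data warehouse server: retrieve only the records relevant to the user's request."""
    return [r["items"] for r in source if r["store"] == store]


def mining_engine(transactions):
    """Data mining engine: apply an algorithm (here, count co-occurring item pairs)."""
    pairs = Counter()
    for items in transactions:
        pairs.update(combinations(sorted(items), 2))
    return pairs


def pattern_evaluation(pair_counts, n_transactions, min_support):
    """Pattern evaluation module: keep patterns whose support meets a predefined threshold."""
    return {pair: c / n_transactions for pair, c in pair_counts.items()
            if c / n_transactions >= min_support}


def gui(patterns):
    """Stand-in for the GUI: format the surviving patterns for a human reader."""
    return [f"{a} & {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]


transactions = warehouse_server(DATA_SOURCE, store="A")
patterns = pattern_evaluation(mining_engine(transactions), len(transactions), min_support=0.5)
for line in gui(patterns):
    print(line)  # bread & milk: support 0.67
```

In store A, the pair (bread, butter) is found by the mining engine but is discarded by the evaluation step. Its support of 1/3 is below the threshold.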
Data Mining Architecture / Components of Data Mining
Predictive Analytics

• It is the use of data to predict future trends and events.
• Attempts to answer the question, “What might happen next?”
• It leverages historical data, statistical modeling, and machine learning algorithms to identify patterns and make forecasts.
• It works by identifying correlations between different elements in selected datasets.
• There are broadly two types of predictive analytics models:
  – classification models
  – regression models
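As an illustration of a classification model, here is a minimal k-nearest-neighbours sketch. The slide does not prescribe k-NN; it is swapped in here because it is one of the simplest classifiers. The churn-style features and labels are hypothetical. The model predicts a new customer's class from the majority label among the most similar historical customers.

```python
import math
from collections import Counter

# Hypothetical labelled history: (monthly_visits, support_tickets) -> outcome
TRAINING = [
    ((20, 0), "stays"), ((18, 1), "stays"), ((25, 0), "stays"),
    ((2, 5), "churns"), ((3, 4), "churns"), ((1, 6), "churns"),
]


def knn_classify(x, training, k=3):
    """Classification model: majority label among the k nearest training examples."""
    nearest = sorted(training, key=lambda row: math.dist(x, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_classify((22, 1), TRAINING))  # stays
print(knn_classify((2, 5), TRAINING))   # churns
```

A regression model follows the same pattern but outputs a number instead of a label. One simple variant averages the neighbours' numeric targets instead of voting.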
Predictive Analytics Challenges
• Data Quality: Inaccurate, incomplete, or biased data can lead to unreliable models.
• Data Availability: Insufficient or limited data can hinder model development.
• Model Complexity: Complex models can be difficult to interpret and explain.
• Overfitting: Models that are too closely fitted to the training data may not perform well on new data.
• Ethical Considerations: Concerns about privacy, bias, and fairness in model development and deployment.
• Computational Resources: Handling large datasets and complex models requires significant computational power.
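The overfitting point can be shown with a small sketch. One model is a 1-nearest-neighbour "memorizer" that reproduces every training label, including a deliberately noisy one. The other is a single threshold chosen to maximise training accuracy. The data are hypothetical, and both models are chosen only to make the effect visible.

```python
# Hypothetical data: the true rule is "high" when x >= 5; (3, "high") is a noisy label.
TRAIN = [(1, "low"), (2, "low"), (3, "high"), (4, "low"), (4.5, "low"),
         (6, "high"), (7, "high"), (8, "high")]
TEST = [(2.9, "low"), (3.1, "low"), (5.5, "high"), (9, "high")]


def accuracy(predict, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


def memorizer(train):
    """Overly flexible model: 1-nearest-neighbour, which fits every training point."""
    def predict(x):
        return min(train, key=lambda row: abs(row[0] - x))[1]
    return predict


def threshold_rule(train):
    """Simpler model: one cut-off, picked among midpoints to maximise training accuracy."""
    xs = sorted(x for x, _ in train)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = max(candidates,
               key=lambda t: accuracy(lambda x: "high" if x >= t else "low", train))
    return lambda x: "high" if x >= best else "low"


m, r = memorizer(TRAIN), threshold_rule(TRAIN)
print(accuracy(m, TRAIN), accuracy(m, TEST))  # 1.0 0.5
print(accuracy(r, TRAIN), accuracy(r, TEST))  # 0.875 1.0
```

The memorizer scores perfectly on the data it was fitted to, but it carries the noisy label over to nearby unseen points. The simpler rule accepts one training error and generalises better. This is why models are judged on held-out data.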
Predictive Analytics Applications
• Finance: Fraud detection, credit risk assessment, investment portfolio optimization, market trend prediction.
• Healthcare: Disease outbreak prediction, patient risk assessment, drug discovery, personalized medicine.
• Retail: Customer segmentation, demand forecasting, inventory management, recommendation systems.
• Marketing: Customer churn prediction, campaign optimization, targeted advertising.
• Manufacturing: Predictive maintenance, supply chain optimization, quality control.
• Insurance: Risk assessment, fraud detection, customer churn prediction.
