Data Mining

Andrew Kusiak
Intelligent Systems Laboratory
2139 Seamans Center
The University of Iowa
Iowa City, IA 52242-1527
[email protected]
http://www.icaen.uiowa.edu/~ankusiak
Tel. 319-335-5934
Fax. 319-335-5669

What Is Data Mining?

• Domain understanding
• Data selection
• Data cleaning, e.g., data duplication, missing data
• Preprocessing, e.g., integration of different files
• Pattern (knowledge) discovery
• Interpretation (e.g., visualization)
• Reporting
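The data-cleaning steps above (duplicate removal, missing data) can be sketched with a toy pipeline; the records and field names below are hypothetical, not from the lecture:

```python
# Toy sketch of two cleaning steps named on the slide:
# removing duplicate records and filling missing values.

records = [
    {"id": 1, "temp": 1.02, "quality": "Good"},
    {"id": 1, "temp": 1.02, "quality": "Good"},   # duplicate record
    {"id": 2, "temp": None, "quality": "Poor"},   # missing value
]

# Data cleaning: drop exact duplicates (keyed on all fields)
seen, cleaned = set(), []
for r in records:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        cleaned.append(r)

# Data cleaning: fill missing numeric values with the column mean
temps = [r["temp"] for r in cleaned if r["temp"] is not None]
mean_temp = sum(temps) / len(temps)
for r in cleaned:
    if r["temp"] is None:
        r["temp"] = mean_temp

print(len(cleaned), cleaned[1]["temp"])  # → 2 1.02
```

In a real project these steps would typically run over database extracts or flat files before any pattern-discovery algorithm is applied.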
Pharmaceutical Industry

An individual object (e.g., product, patient, drug) orientation
vs
a population of objects (products, patients, drugs) orientation

• Selection of “patient-suitable” medication
  – Adverse drug effects minimized
  – Drug effectiveness maximized
  – New markets for “seemingly ineffective” drugs
• “Medication bundle”
  – Life-time treatments
• Design and virtual testing of new drugs
Learning Systems (1/2)

• Classical statistical methods (e.g., discriminant analysis)
• Modern statistical techniques (e.g., k-nearest neighbor, Bayes theorem)
• Neural networks
• Support vector machines
• Decision tree algorithms
• Decision rule algorithms
• Learning classifier systems

Learning Systems (2/2)

• Association rule algorithms
• Text mining algorithms
• Meta-learning algorithms
• Inductive logic programming
• Sequence learning
Neural Networks

• Feed-forward NN – regression analogy
• Multi-layer NN – nonlinear regression analogy

Types of Decision Trees

• CHAID: Chi-Square Automatic Interaction Detection
  – Kass (1980)
  – n-way splits
  – Categorical variables
• CART: Classification and Regression Trees
  – Breiman, Friedman, Olshen, and Stone (1984)
  – Binary splits
  – Continuous variables
• C4.5
  – Quinlan (1993)
  – Also used for rule induction
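CART's binary splits on a continuous variable can be illustrated with a minimal sketch (pure Python; the threshold search is a simplified illustration, using the Process_parameter_1 values from the product-quality example later in the deck):

```python
# Minimal sketch of a CART-style binary split: choose the threshold on a
# continuous variable that minimizes the weighted Gini impurity.

def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_binary_split(values, labels):
    best = None
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v < t]
        right = [l for v, l in zip(values, labels) if v >= t]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, t)
    return best  # (weighted Gini, threshold)

# Process_parameter_1 values and quality labels (G = good, P = poor)
values = [1.02, 2.03, 0.99, 2.03, 0.03, 0.04, 0.99, 1.02]
labels = ["G", "P", "G", "G", "P", "P", "G", "P"]
print(best_binary_split(values, labels))
```

The best split here separates the two small values (poor-quality objects 5 and 6) from the rest, consistent with the first decision rule induced later in the deck; a full CART implementation would also recurse on each side and place the threshold midway between observed values.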
Supervised Learning Algorithms

• kNN
  – Quick and easy
  – Models tend to be very large
• Neural networks
  – Difficult to interpret
  – Training can be time consuming
• Rule induction
  – Understandable
  – Need to limit calculations
• Decision trees
  – Understandable
  – Relatively fast
  – Easy to translate into SQL queries

Knowledge Representation Forms

• Decision rules
• Trees (graphs)
• Patterns (matrices)
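A minimal kNN sketch shows why it is "quick and easy" yet yields large models: the model is simply the stored training set (the data points here are illustrative):

```python
# Minimal k-nearest-neighbor sketch: there is no training phase;
# the "model" is the training set itself, which is why kNN models
# tend to be very large.
import math
from collections import Counter

train = [((1.02, 2.98), "Good"), ((2.03, 1.04), "Poor"),
         ((0.99, 3.04), "Good"), ((0.03, 0.96), "Poor")]

def knn_predict(x, k=3):
    # sort training objects by distance to x, take a majority vote of k
    nearest = sorted(train, key=lambda p: math.dist(x, p[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((1.0, 3.0)))  # → Good
```

Every prediction scans the whole training set, so classification cost grows with the data, in contrast to a decision tree, which is compact and fast once built.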
DM: Product Quality Example

Training data set

Product ID | Process_param_1 | Test_1 | Process_param_2 | Test_2 | Quality (D)
     1     |      1.02       | Red    |      2.98       | High   | Good_Quality
     2     |      2.03       | Black  |      1.04       | Low    | Poor_Quality
     3     |      0.99       | Blue   |      3.04       | High   | Good_Quality
     4     |      2.03       | Blue   |      3.11       | High   | Good_Quality
     5     |      0.03       | Orange |      0.96       | Low    | Poor_Quality
     6     |      0.04       | Blue   |      1.04       | Medium | Poor_Quality
     7     |      0.99       | Orange |      1.04       | Medium | Good_Quality
     8     |      1.02       | Red    |      0.94       | Low    | Poor_Quality

Decision Rules

Rule 1. IF (Process_parameter_1 < 0.515) THEN (D = Poor_Quality);
[2, 2, 50.00%, 100.00%][2, 0][5, 6]

Rule 2. IF (Test_2 = Low) THEN (D = Poor_Quality);
[3, 3, 75.00%, 100.00%][3, 0][2, 5, 8]

Rule 3. IF (Process_parameter_2 >= 2.01) THEN (D = Good_Quality);
[3, 3, 75.00%, 100.00%][0, 3][1, 3, 4]

Rule 4. IF (Process_parameter_1 >= 0.515) & (Test_1 = Orange) THEN (D = Good_Quality);
[1, 1, 25.00%, 100.00%][0, 1][7]

The University of Iowa Intelligent Systems Laboratory
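As a sanity check, the four rules can be applied back to the training set (a minimal sketch; evaluating the rules in numbered order is an assumption):

```python
# The four induced rules applied as a classifier to the training table.

data = {
    1: (1.02, "Red", 2.98, "High", "Good_Quality"),
    2: (2.03, "Black", 1.04, "Low", "Poor_Quality"),
    3: (0.99, "Blue", 3.04, "High", "Good_Quality"),
    4: (2.03, "Blue", 3.11, "High", "Good_Quality"),
    5: (0.03, "Orange", 0.96, "Low", "Poor_Quality"),
    6: (0.04, "Blue", 1.04, "Medium", "Poor_Quality"),
    7: (0.99, "Orange", 1.04, "Medium", "Good_Quality"),
    8: (1.02, "Red", 0.94, "Low", "Poor_Quality"),
}

def classify(p1, t1, p2, t2):
    if p1 < 0.515:                 # Rule 1
        return "Poor_Quality"
    if t2 == "Low":                # Rule 2
        return "Poor_Quality"
    if p2 >= 2.01:                 # Rule 3
        return "Good_Quality"
    if p1 >= 0.515 and t1 == "Orange":
        return "Good_Quality"      # Rule 4
    return None                    # no rule fires

correct = sum(classify(*row[:4]) == row[4] for row in data.values())
print(correct)  # → 8
```

All eight training objects are re-classified correctly, matching the 100% confidence reported for each rule.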
Decision Rule Metrics

Rule 12
IF (Flow = 6) AND (Pressure = 7) THEN (Efficiency = 81);
[13, 8, 4.19%, 61.54%]   ← Support, Strength, Relative strength, Confidence
[1, 8, 4]                ← Number of supporting objects per class
[{ 524 },
 { 527, 528, 529, 530, 531, 533, 535, 536 },
 { 525, 526, 532, 534 }] ← Supporting objects

Definitions

• Support = number of objects satisfying the conditions of the rule
• Strength = number of objects satisfying the conditions and the decision of the rule
• Relative strength = number of objects satisfying the conditions and the decision of the rule / number of objects in the class
• Confidence = Strength / Support
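The four definitions above can be written directly as code; the sketch below checks them against Rule 2 of the product-quality example, IF (Test_2 = Low) THEN (D = Poor_Quality):

```python
# Rule metrics per the definitions: support, strength,
# relative strength, and confidence.

# (object_id, Test_2, D) triples from the product-quality training table
objects = [
    (1, "High", "Good_Quality"), (2, "Low", "Poor_Quality"),
    (3, "High", "Good_Quality"), (4, "High", "Good_Quality"),
    (5, "Low", "Poor_Quality"), (6, "Medium", "Poor_Quality"),
    (7, "Medium", "Good_Quality"), (8, "Low", "Poor_Quality"),
]

def rule_metrics(objects, condition, decision):
    support = [o for o in objects if condition(o)]
    strength = [o for o in support if o[2] == decision]
    in_class = [o for o in objects if o[2] == decision]
    return (len(support), len(strength),
            len(strength) / len(in_class),   # relative strength
            len(strength) / len(support))    # confidence

m = rule_metrics(objects, lambda o: o[1] == "Low", "Poor_Quality")
print(m)  # → (3, 3, 0.75, 1.0)
```

The result reproduces the bracketed metrics [3, 3, 75.00%, 100.00%] reported for Rule 2.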
Classification Accuracy

Test: leaving-one-out

Confusion matrix
                Poor_Quality  Good_Quality  None
Poor_Quality         3             1          0
Good_Quality         1             3          0

Average accuracy [%]
                Correct  Incorrect  None
Total            75.00     25.00    0.00
Poor_Quality     75.00     25.00    0.00
Good_Quality     75.00     25.00    0.00

Decision Rules

Rule 113
IF (B_Master >= 1634.26)
AND (B_Temp in (1601.2, 1660.22])
AND (B_Pressure in [17.05, 18.45))
AND (A_point = 0.255) AND (Average_O2 = 77)
THEN (Eff = 87) OR (Eff = 88);
[6, 6, 23.08%, 100.00%][0, 0, 0, 0, 0, 0, 0, 3, 3, 0]
[{2164, 2167, 2168}, {2163, 2165, 2166}]
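Leaving-one-out testing can be sketched as follows, here with a 1-nearest-neighbor classifier on the two numeric parameters of the product-quality table (an illustration only; the lecture's 75% figure comes from its own rule-based classifier, though this toy setup happens to produce the same confusion matrix):

```python
# Leaving-one-out evaluation: each object is held out once, a classifier
# is applied using the remaining objects, and a confusion matrix is
# accumulated from (actual, predicted) pairs.
import math

rows = [((1.02, 2.98), "Good"), ((2.03, 1.04), "Poor"),
        ((0.99, 3.04), "Good"), ((2.03, 3.11), "Good"),
        ((0.03, 0.96), "Poor"), ((0.04, 1.04), "Poor"),
        ((0.99, 1.04), "Good"), ((1.02, 0.94), "Poor")]

confusion = {("Good", "Good"): 0, ("Good", "Poor"): 0,
             ("Poor", "Good"): 0, ("Poor", "Poor"): 0}

for i, (x, actual) in enumerate(rows):
    rest = rows[:i] + rows[i + 1:]                       # leave one out
    _, predicted = min(rest, key=lambda r: math.dist(x, r[0]))  # 1-NN
    confusion[(actual, predicted)] += 1

accuracy = (confusion[("Good", "Good")] + confusion[("Poor", "Poor")]) / len(rows)
print(confusion, accuracy)  # accuracy → 0.75
```

With only eight objects, leaving-one-out makes the most of the data: every object serves as a test case exactly once.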
Decision Rules

Rule 12
IF (Ave_Middle_Bed = 0) AND (PA_Fan_Flow = 18) THEN (Efficiency = 71);
[16, 10, 10.31%, 62.50%][1, 1, 2, 10, 2]
[{ 682 }, { 681 }, { 933, 936 },
 { 875, 876, 877, 878, 879, 880, 934, 935, 1000, 1001 },
 { 881, 882 }]

Decision Rule vs Decision Tree Algorithms

F1 F2 F3 F4 | D
 0  0  0  1 | One
 0  0  1  1 | Two
 0  1  1  1 | Three
 1  1  1  1 | Four

Decision tree:

            F2
         0/    \1
        F3      F1
      0/  \1  0/  \1
    One  Two Three Four
   0001 0011  0111 1111
Decision Rules

Rule 1. (F3 = 0) THEN (D = One);   [1, 100.00%, 100.00%][1]
Rule 2. (F2 = 0) AND (F3 = 1) THEN (D = Two);   [1, 100.00%, 100.00%][2]
Rule 3. (F1 = 0) AND (F2 = 1) THEN (D = Three);   [1, 100.00%, 100.00%][3]
Rule 4. (F1 = 1) THEN (D = Four);   [1, 100.00%, 100.00%][4]

Decision Tree vs Rule Tree

F1 F2 F3 F4 | D
 0  0  0  1 | One
 0  0  1  1 | Two
 0  1  1  1 | Three
 1  1  1  1 | Four

Decision tree:

            F2
         0/    \1
        F3      F1
      0/  \1  0/  \1
    One  Two Three Four
   0001 0011  0111 1111

Rule tree (each node tests a single feature):

  F3: 0 → One (0001),   1 → test F2
  F2: 0 → Two (0011),   1 → test F1
  F1: 0 → Three (0111), 1 → Four (1111)

The rules identify the unique features of an object rather than
partitioning the entire population of objects.
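The rule-tree idea above can be coded as a decision list (a sketch: each object is recognized by one distinguishing feature, and unmatched objects fall through to the next rule):

```python
# The four rules as a decision list over the F1–F4 table.

table = {(0, 0, 0, 1): "One", (0, 0, 1, 1): "Two",
         (0, 1, 1, 1): "Three", (1, 1, 1, 1): "Four"}

def rule_classify(f1, f2, f3, f4):
    if f3 == 0:                # Rule 1: only "One" has F3 = 0
        return "One"
    if f2 == 0 and f3 == 1:    # Rule 2
        return "Two"
    if f1 == 0 and f2 == 1:    # Rule 3
        return "Three"
    if f1 == 1:                # Rule 4: only "Four" has F1 = 1
        return "Four"

assert all(rule_classify(*row) == d for row, d in table.items())
print("all four objects classified correctly")
```

Note that an object such as (0, 1, 1, 1) needs at most three single-feature tests here, and each rule reads off one distinguishing feature, whereas the decision tree always routes every object through its fixed sequence of splits.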
Traditional Modeling

• Regression analysis
• Neural network

Data Mining

• Rules
• Decision trees
• Patterns

Data Farming

• Cultivating data rather than assuming that it is available
• Process: data farming → result evaluation → knowledge extraction → decision-making