This document discusses machine learning techniques for data mining and extracting information from data. It describes Weka, a software suite for data preparation, classification, regression, clustering, and other machine learning tasks. The document provides examples of using structural descriptions such as rules and decision trees to classify and make predictions from datasets about weather conditions, contact lenses, iris flowers, and CPU performance. It contrasts classification rules with association rules and discusses issues with finding accurate and interesting patterns in data.


Data Mining: Practical Machine Learning Tools and Techniques

Weka 3: Machine Learning Software in Java
Weka is a collection of machine learning algorithms
for data mining tasks. It contains tools for data
preparation, classification, regression, clustering,
association rules mining, and visualization.
From data to information
• Society produces huge amounts of data
• Sources: business, science, medicine, economics, geography,
environment, sports, …
• This data is a potentially valuable resource
• Raw data is useless: need techniques to automatically
extract information from it
• Data: recorded facts
• Information: patterns underlying the data
• We are concerned with machine learning techniques for
automatically finding patterns in data
• Patterns that are found may be represented as structural
descriptions or as black-box models
Structural descriptions
• Example: if-then rules

If tear production rate = reduced
  then recommendation = none
Otherwise, if age = young and astigmatic = no
  then recommendation = soft

Age | Spectacle prescription | Astigmatism | Tear production rate | Recommended lenses
Young | Myope | No | Reduced | None
Young | Hypermetrope | No | Normal | Soft
Pre-presbyopic | Hypermetrope | No | Reduced | None
Presbyopic | Myope | Yes | Normal | Hard
… | … | … | … | …

5
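The two rules above can be read as an executable procedure: test the conditions in order and return the first recommendation that applies. As a rough illustration (my own transcription, not part of the slides), they might look like this in Python:

```python
def recommend(tear_production, age, astigmatic):
    """Apply the two example if-then rules in order (a partial rule set)."""
    if tear_production == "reduced":
        return "none"
    if age == "young" and astigmatic == "no":
        return "soft"
    return None  # remaining cases are not covered by these two rules

# Rows from the table above:
print(recommend("reduced", "young", "no"))  # first rule fires: "none"
print(recommend("normal", "young", "no"))   # second rule fires: "soft"
```

The "Otherwise" on the slide matters: the second rule is only reached when the first one fails, which is exactly the if/elif reading above.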
Machine learning
• Definitions of “learning” from dictionary:
  – To get knowledge of by study, experience, or being taught  (difficult to measure)
  – To become aware by information or from observation  (difficult to measure)
  – To commit to memory  (trivial for computers)
  – To be informed of, ascertain; to receive instruction  (trivial for computers)

• Operational definition:
  Things learn when they change their behavior in a way that makes them
  perform better in the future.  (Does a slipper learn?)

• Does learning imply intention?


Data mining
• Finding patterns in data that provide insight or enable
fast and accurate decision making
• Strong, accurate patterns are needed to make decisions
• Problem 1: most patterns are not interesting
• Problem 2: patterns may be inexact (or spurious)
• Problem 3: data may be garbled or missing
• Machine learning techniques identify patterns in data and
provide many tools for data mining
• Of primary interest are machine learning techniques that
provide structural descriptions

The weather problem
• Conditions for playing a certain game

Outlook Temperature Humidity Windy Play


Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild Normal False Yes
… … … … …

If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes

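The rule set above is ordered: rules are tried from top to bottom and the first match wins, with a default at the end. A small sketch (my own transcription, not from the slides) applying it to the four rows shown:

```python
def play(outlook, humidity, windy):
    """First-match evaluation of the weather rule set above."""
    if outlook == "sunny" and humidity == "high":
        return "no"
    if outlook == "rainy" and windy:
        return "no"
    if outlook == "overcast":
        return "yes"
    if humidity == "normal":
        return "yes"
    return "yes"  # "if none of the above"

rows = [("sunny", "high", False), ("sunny", "high", True),
        ("overcast", "high", False), ("rainy", "normal", False)]
print([play(*r) for r in rows])  # matches the Play column: no, no, yes, yes
```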
Classification vs. association rules
• Classification rule:
predicts value of a given attribute (the classification of an example)

If outlook = sunny and humidity = high then play = no

• Association rule:
predicts value of arbitrary attribute (or combination)

If temperature = cool then humidity = normal
If humidity = normal and windy = false then play = yes
If outlook = sunny and play = no then humidity = high
If windy = false and play = no then outlook = sunny and humidity = high

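A common way to assess association rules like these is by their confidence: the fraction of instances that match the rule's antecedent and also match its consequent. (Confidence is a standard association-rule measure, though it is not defined on this slide.) A sketch checking the third rule against the four sample weather rows shown earlier:

```python
def confidence(rows, antecedent, consequent):
    """Estimate P(consequent | antecedent) from a list of attribute dicts."""
    matches = [r for r in rows if all(r[k] == v for k, v in antecedent.items())]
    if not matches:
        return None
    hits = [r for r in matches if all(r[k] == v for k, v in consequent.items())]
    return len(hits) / len(matches)

# The four example rows of the weather data (the full data set has more rows).
rows = [
    {"outlook": "sunny", "temp": "hot", "humidity": "high", "windy": False, "play": "no"},
    {"outlook": "sunny", "temp": "hot", "humidity": "high", "windy": True, "play": "no"},
    {"outlook": "overcast", "temp": "hot", "humidity": "high", "windy": False, "play": "yes"},
    {"outlook": "rainy", "temp": "mild", "humidity": "normal", "windy": False, "play": "yes"},
]

# "If outlook = sunny and play = no then humidity = high"
print(confidence(rows, {"outlook": "sunny", "play": "no"}, {"humidity": "high"}))  # 1.0
```

Note that the antecedent and consequent can involve any attributes, including the class, which is exactly what distinguishes association rules from classification rules.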
Weather data with mixed attributes
• Some attributes have numeric values

Outlook Temperature Humidity Windy Play


Sunny 85 85 False No
Sunny 80 90 True No
Overcast 83 86 False Yes
Rainy 75 80 False Yes
… … … … …

If outlook = sunny and humidity > 83 then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity < 85 then play = yes
If none of the above then play = yes

The contact lenses data
Age | Spectacle prescription | Astigmatism | Tear production rate | Recommended lenses
Young | Myope | No | Reduced | None
Young | Myope | No | Normal | Soft
Young | Myope | Yes | Reduced | None
Young | Myope | Yes | Normal | Hard
Young | Hypermetrope | No | Reduced | None
Young | Hypermetrope | No | Normal | Soft
Young | Hypermetrope | Yes | Reduced | None
Young | Hypermetrope | Yes | Normal | Hard
Pre-presbyopic | Myope | No | Reduced | None
Pre-presbyopic | Myope | No | Normal | Soft
Pre-presbyopic | Myope | Yes | Reduced | None
Pre-presbyopic | Myope | Yes | Normal | Hard
Pre-presbyopic | Hypermetrope | No | Reduced | None
Pre-presbyopic | Hypermetrope | No | Normal | Soft
Pre-presbyopic | Hypermetrope | Yes | Reduced | None
Pre-presbyopic | Hypermetrope | Yes | Normal | None
Presbyopic | Myope | No | Reduced | None
Presbyopic | Myope | No | Normal | None
Presbyopic | Myope | Yes | Reduced | None
Presbyopic | Myope | Yes | Normal | Hard
Presbyopic | Hypermetrope | No | Reduced | None
Presbyopic | Hypermetrope | No | Normal | Soft
Presbyopic | Hypermetrope | Yes | Reduced | None
Presbyopic | Hypermetrope | Yes | Normal | None
A complete and correct rule set

If tear production rate = reduced then recommendation = none
If age = young and astigmatic = no
  and tear production rate = normal then recommendation = soft
If age = pre-presbyopic and astigmatic = no
  and tear production rate = normal then recommendation = soft
If age = presbyopic and spectacle prescription = myope
  and astigmatic = no then recommendation = none
If spectacle prescription = hypermetrope and astigmatic = no
  and tear production rate = normal then recommendation = soft
If spectacle prescription = myope and astigmatic = yes
  and tear production rate = normal then recommendation = hard
If age = young and astigmatic = yes
  and tear production rate = normal then recommendation = hard
If age = pre-presbyopic
  and spectacle prescription = hypermetrope
  and astigmatic = yes then recommendation = none
If age = presbyopic and spectacle prescription = hypermetrope
  and astigmatic = yes then recommendation = none

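Because each rule is just a conjunction of attribute tests, the whole rule set can be represented as data rather than hand-written code. A sketch of that idea (my own encoding, assuming first-match semantics; attribute names are abbreviations I chose):

```python
# Each rule: (conditions that must all hold, recommendation).
RULES = [
    ({"tpr": "reduced"}, "none"),
    ({"age": "young", "astigmatic": "no", "tpr": "normal"}, "soft"),
    ({"age": "pre-presbyopic", "astigmatic": "no", "tpr": "normal"}, "soft"),
    ({"age": "presbyopic", "prescription": "myope", "astigmatic": "no"}, "none"),
    ({"prescription": "hypermetrope", "astigmatic": "no", "tpr": "normal"}, "soft"),
    ({"prescription": "myope", "astigmatic": "yes", "tpr": "normal"}, "hard"),
    ({"age": "young", "astigmatic": "yes", "tpr": "normal"}, "hard"),
    ({"age": "pre-presbyopic", "prescription": "hypermetrope", "astigmatic": "yes"}, "none"),
    ({"age": "presbyopic", "prescription": "hypermetrope", "astigmatic": "yes"}, "none"),
]

def recommend(instance):
    """Return the recommendation of the first rule whose conditions all hold."""
    for conditions, outcome in RULES:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return outcome
    return None  # a complete rule set should never reach this

# Spot-check against rows of the contact lens table:
print(recommend({"age": "young", "prescription": "myope",
                 "astigmatic": "yes", "tpr": "normal"}))   # hard
print(recommend({"age": "presbyopic", "prescription": "hypermetrope",
                 "astigmatic": "no", "tpr": "normal"}))    # soft
```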
A decision tree for this problem

[Figure: decision tree for the contact lens data; image not included in this extract]
Classifying iris flowers

    | Sepal length | Sepal width | Petal length | Petal width | Type
1   | 5.1 | 3.5 | 1.4 | 0.2 | Iris setosa
2   | 4.9 | 3.0 | 1.4 | 0.2 | Iris setosa
…
51  | 7.0 | 3.2 | 4.7 | 1.4 | Iris versicolor
52  | 6.4 | 3.2 | 4.5 | 1.5 | Iris versicolor
…
101 | 6.3 | 3.3 | 6.0 | 2.5 | Iris virginica
102 | 5.8 | 2.7 | 5.1 | 1.9 | Iris virginica

If petal length < 2.45 then Iris setosa
If sepal width < 2.10 then Iris versicolor
...

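The two numeric rules above only begin a rule set (hence the "..."): applied on their own, they classify the setosa rows and leave the rest undecided. A quick sketch (my own, not from the slides) over the six sample rows:

```python
def classify(sepal_length, sepal_width, petal_length, petal_width):
    """Apply the two partial iris rules shown above, in order."""
    if petal_length < 2.45:
        return "Iris setosa"
    if sepal_width < 2.10:
        return "Iris versicolor"
    return None  # not covered by the partial rule set

rows = [(5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2),   # setosa
        (7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5),   # versicolor
        (6.3, 3.3, 6.0, 2.5), (5.8, 2.7, 5.1, 1.9)]   # virginica
print([classify(*r) for r in rows])
# only the first two rows are classified; the rest need further rules
```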
Predicting CPU performance
• Example: 209 different computer configurations
    | Cycle time (ns) | Main memory (KB) |      | Cache (KB) | Channels |       | Performance
    | MYCT            | MMIN  | MMAX     |      | CACH       | CHMIN    | CHMAX | PRP
1   | 125 | 256  | 6000  | 256 | 16 | 128 | 198
2   | 29  | 8000 | 32000 | 32  | 8  | 32  | 269
…
208 | 480 | 512  | 8000  | 32  | 0  | 0   | 67
209 | 480 | 1000 | 4000  | 0   | 0  | 0   | 45

• Linear regression function

PRP = -55.9 + 0.0489 MYCT + 0.0153 MMIN + 0.0056 MMAX
            + 0.6410 CACH - 0.2700 CHMIN + 1.480 CHMAX

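The regression function predicts PRP as a weighted sum of the six numeric attributes. A sketch evaluating it on the first two configurations in the table; as a linear fit over 209 machines it is only approximate, so predictions differ from the recorded performance values:

```python
def predict_prp(myct, mmin, mmax, cach, chmin, chmax):
    """Linear regression function for CPU performance, from the slide."""
    return (-55.9 + 0.0489 * myct + 0.0153 * mmin + 0.0056 * mmax
            + 0.6410 * cach - 0.2700 * chmin + 1.480 * chmax)

# (MYCT, MMIN, MMAX, CACH, CHMIN, CHMAX, actual PRP) from the table
for row in [(125, 256, 6000, 256, 16, 128, 198),
            (29, 8000, 32000, 32, 8, 32, 269)]:
    *attrs, actual = row
    print(f"predicted {predict_prp(*attrs):.1f}, actual {actual}")
```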
Data from labor negotiations

Attribute | Type | 1 | 2 | 3 | … | 40
Duration | (number of years) | 1 | 2 | 3 | … | 2
Wage increase first year | percentage | 2% | 4% | 4.3% | … | 4.5
Wage increase second year | percentage | ? | 5% | 4.4% | … | 4.0
Wage increase third year | percentage | ? | ? | ? | … | ?
Cost of living adjustment | {none, tcf, tc} | none | tcf | ? | … | none
Working hours per week | (number of hours) | 28 | 35 | 38 | … | 40
Pension | {none, ret-allw, empl-cntr} | none | ? | ? | … | ?
Standby pay | percentage | ? | 13% | ? | … | ?
Shift-work supplement | percentage | ? | 5% | 4% | … | 4
Education allowance | {yes, no} | yes | ? | ? | … | ?
Statutory holidays | (number of days) | 11 | 15 | 12 | … | 12
Vacation | {below-avg, avg, gen} | avg | gen | gen | … | avg
Long-term disability assistance | {yes, no} | no | ? | ? | … | yes
Dental plan contribution | {none, half, full} | none | ? | full | … | full
Bereavement assistance | {yes, no} | no | ? | ? | … | yes
Health plan contribution | {none, half, full} | none | ? | full | … | half
Acceptability of contract | {good, bad} | bad | good | good | … | good

Decision trees for the labor data

[Figure: decision trees for the labor negotiations data; image not included in this extract]
Questions
Thank you
