This document discusses machine learning techniques for data mining and extracting information from data. It describes Weka, a software suite for data preparation, classification, regression, clustering, and other machine learning tasks. The document provides examples of using structural descriptions such as rules and decision trees to classify and make predictions from datasets about weather conditions, contact lenses, iris flowers, and CPU performance. It contrasts classification rules with association rules and discusses issues with finding accurate and interesting patterns in data.


Data Mining: Practical Machine Learning Tools and Techniques

Weka 3: Machine Learning Software in Java
Weka is a collection of machine learning algorithms
for data mining tasks. It contains tools for data
preparation, classification, regression, clustering,
association rules mining, and visualization.
From data to information
• Society produces huge amounts of data
• Sources: business, science, medicine, economics, geography,
environment, sports, …
• This data is a potentially valuable resource
• Raw data is useless: need techniques to automatically
extract information from it
• Data: recorded facts
• Information: patterns underlying the data
• We are concerned with machine learning techniques for
automatically finding patterns in data
• Patterns that are found may be represented as structural
descriptions or as black-box models
Structural descriptions
• Example: if-then rules

If tear production rate = reduced
  then recommendation = none
Otherwise, if age = young and astigmatic = no
  then recommendation = soft

Age | Spectacle prescription | Astigmatism | Tear production rate | Recommended lenses
Young | Myope | No | Reduced | None
Young | Hypermetrope | No | Normal | Soft
Pre-presbyopic | Hypermetrope | No | Reduced | None
Presbyopic | Myope | Yes | Normal | Hard
… | … | … | … | …

5
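The two rules above can be read as an executable procedure: test the conditions in order and return the first recommendation that applies. As a rough illustration (my own transcription, not part of the slides), they might look like this in Python:

```python
def recommend(tear_production, age, astigmatic):
    """Apply the two example if-then rules in order (a partial rule set)."""
    if tear_production == "reduced":
        return "none"
    if age == "young" and astigmatic == "no":
        return "soft"
    return None  # remaining cases are not covered by these two rules

# Rows from the table above:
print(recommend("reduced", "young", "no"))  # first rule fires: "none"
print(recommend("normal", "young", "no"))   # second rule fires: "soft"
```

The "Otherwise" on the slide matters: the second rule is only reached when the first one fails, which is exactly the if/elif reading above.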
Machine learning
• Definitions of “learning” from dictionary:
  – To get knowledge of by study, experience, or being taught  (difficult to measure)
  – To become aware by information or from observation  (difficult to measure)
  – To commit to memory  (trivial for computers)
  – To be informed of, ascertain; to receive instruction  (trivial for computers)

• Operational definition:
  Things learn when they change their behavior in a way that makes them
  perform better in the future.  (Does a slipper learn?)

• Does learning imply intention?


Data mining
• Finding patterns in data that provide insight or enable
fast and accurate decision making
• Strong, accurate patterns are needed to make decisions
• Problem 1: most patterns are not interesting
• Problem 2: patterns may be inexact (or spurious)
• Problem 3: data may be garbled or missing
• Machine learning techniques identify patterns in data and
provide many tools for data mining
• Of primary interest are machine learning techniques that
provide structural descriptions

The weather problem
• Conditions for playing a certain game

Outlook Temperature Humidity Windy Play


Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild Normal False Yes
… … … … …

If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes

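The rule set above is ordered: rules are tried from top to bottom and the first match wins, with a default at the end. A small sketch (my own transcription, not from the slides) applying it to the four rows shown:

```python
def play(outlook, humidity, windy):
    """First-match evaluation of the weather rule set above."""
    if outlook == "sunny" and humidity == "high":
        return "no"
    if outlook == "rainy" and windy:
        return "no"
    if outlook == "overcast":
        return "yes"
    if humidity == "normal":
        return "yes"
    return "yes"  # "if none of the above"

rows = [("sunny", "high", False), ("sunny", "high", True),
        ("overcast", "high", False), ("rainy", "normal", False)]
print([play(*r) for r in rows])  # matches the Play column: no, no, yes, yes
```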
Classification vs. association rules
• Classification rule:
predicts value of a given attribute (the classification of an example)

If outlook = sunny and humidity = high then play = no

• Association rule:
predicts value of arbitrary attribute (or combination)

If temperature = cool then humidity = normal
If humidity = normal and windy = false then play = yes
If outlook = sunny and play = no then humidity = high
If windy = false and play = no then outlook = sunny and humidity = high

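A common way to assess association rules like these is by their confidence: the fraction of instances that match the rule's antecedent and also match its consequent. (Confidence is a standard association-rule measure, though it is not defined on this slide.) A sketch checking the third rule against the four sample weather rows shown earlier:

```python
def confidence(rows, antecedent, consequent):
    """Estimate P(consequent | antecedent) from a list of attribute dicts."""
    matches = [r for r in rows if all(r[k] == v for k, v in antecedent.items())]
    if not matches:
        return None
    hits = [r for r in matches if all(r[k] == v for k, v in consequent.items())]
    return len(hits) / len(matches)

# The four example rows of the weather data (the full data set has more rows).
rows = [
    {"outlook": "sunny", "temp": "hot", "humidity": "high", "windy": False, "play": "no"},
    {"outlook": "sunny", "temp": "hot", "humidity": "high", "windy": True, "play": "no"},
    {"outlook": "overcast", "temp": "hot", "humidity": "high", "windy": False, "play": "yes"},
    {"outlook": "rainy", "temp": "mild", "humidity": "normal", "windy": False, "play": "yes"},
]

# "If outlook = sunny and play = no then humidity = high"
print(confidence(rows, {"outlook": "sunny", "play": "no"}, {"humidity": "high"}))  # 1.0
```

Note that the antecedent and consequent can involve any attributes, including the class, which is exactly what distinguishes association rules from classification rules.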
Weather data with mixed attributes
• Some attributes have numeric values

Outlook Temperature Humidity Windy Play


Sunny 85 85 False No
Sunny 80 90 True No
Overcast 83 86 False Yes
Rainy 75 80 False Yes
… … … … …

If outlook = sunny and humidity > 83 then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity < 85 then play = yes
If none of the above then play = yes

The contact lenses data
Age | Spectacle prescription | Astigmatism | Tear production rate | Recommended lenses
Young | Myope | No | Reduced | None
Young | Myope | No | Normal | Soft
Young | Myope | Yes | Reduced | None
Young | Myope | Yes | Normal | Hard
Young | Hypermetrope | No | Reduced | None
Young | Hypermetrope | No | Normal | Soft
Young | Hypermetrope | Yes | Reduced | None
Young | Hypermetrope | Yes | Normal | Hard
Pre-presbyopic | Myope | No | Reduced | None
Pre-presbyopic | Myope | No | Normal | Soft
Pre-presbyopic | Myope | Yes | Reduced | None
Pre-presbyopic | Myope | Yes | Normal | Hard
Pre-presbyopic | Hypermetrope | No | Reduced | None
Pre-presbyopic | Hypermetrope | No | Normal | Soft
Pre-presbyopic | Hypermetrope | Yes | Reduced | None
Pre-presbyopic | Hypermetrope | Yes | Normal | None
Presbyopic | Myope | No | Reduced | None
Presbyopic | Myope | No | Normal | None
Presbyopic | Myope | Yes | Reduced | None
Presbyopic | Myope | Yes | Normal | Hard
Presbyopic | Hypermetrope | No | Reduced | None
Presbyopic | Hypermetrope | No | Normal | Soft
Presbyopic | Hypermetrope | Yes | Reduced | None
Presbyopic | Hypermetrope | Yes | Normal | None
A complete and correct rule set

If tear production rate = reduced then recommendation = none
If age = young and astigmatic = no
  and tear production rate = normal then recommendation = soft
If age = pre-presbyopic and astigmatic = no
  and tear production rate = normal then recommendation = soft
If age = presbyopic and spectacle prescription = myope
  and astigmatic = no then recommendation = none
If spectacle prescription = hypermetrope and astigmatic = no
  and tear production rate = normal then recommendation = soft
If spectacle prescription = myope and astigmatic = yes
  and tear production rate = normal then recommendation = hard
If age = young and astigmatic = yes
  and tear production rate = normal then recommendation = hard
If age = pre-presbyopic
  and spectacle prescription = hypermetrope
  and astigmatic = yes then recommendation = none
If age = presbyopic and spectacle prescription = hypermetrope
  and astigmatic = yes then recommendation = none

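Because each rule is just a conjunction of attribute tests, the whole rule set can be represented as data rather than hand-written code. A sketch of that idea (my own encoding, assuming first-match semantics; attribute names are abbreviations I chose):

```python
# Each rule: (conditions that must all hold, recommendation).
RULES = [
    ({"tpr": "reduced"}, "none"),
    ({"age": "young", "astigmatic": "no", "tpr": "normal"}, "soft"),
    ({"age": "pre-presbyopic", "astigmatic": "no", "tpr": "normal"}, "soft"),
    ({"age": "presbyopic", "prescription": "myope", "astigmatic": "no"}, "none"),
    ({"prescription": "hypermetrope", "astigmatic": "no", "tpr": "normal"}, "soft"),
    ({"prescription": "myope", "astigmatic": "yes", "tpr": "normal"}, "hard"),
    ({"age": "young", "astigmatic": "yes", "tpr": "normal"}, "hard"),
    ({"age": "pre-presbyopic", "prescription": "hypermetrope", "astigmatic": "yes"}, "none"),
    ({"age": "presbyopic", "prescription": "hypermetrope", "astigmatic": "yes"}, "none"),
]

def recommend(instance):
    """Return the recommendation of the first rule whose conditions all hold."""
    for conditions, outcome in RULES:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return outcome
    return None  # a complete rule set should never reach this

# Spot-check against rows of the contact lens table:
print(recommend({"age": "young", "prescription": "myope",
                 "astigmatic": "yes", "tpr": "normal"}))   # hard
print(recommend({"age": "presbyopic", "prescription": "hypermetrope",
                 "astigmatic": "no", "tpr": "normal"}))    # soft
```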
A decision tree for this problem

[Figure: decision tree for the contact lens data; image not included in this extract]
Classifying iris flowers

    | Sepal length | Sepal width | Petal length | Petal width | Type
1   | 5.1 | 3.5 | 1.4 | 0.2 | Iris setosa
2   | 4.9 | 3.0 | 1.4 | 0.2 | Iris setosa
…
51  | 7.0 | 3.2 | 4.7 | 1.4 | Iris versicolor
52  | 6.4 | 3.2 | 4.5 | 1.5 | Iris versicolor
…
101 | 6.3 | 3.3 | 6.0 | 2.5 | Iris virginica
102 | 5.8 | 2.7 | 5.1 | 1.9 | Iris virginica

If petal length < 2.45 then Iris setosa
If sepal width < 2.10 then Iris versicolor
...

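The two numeric rules above only begin a rule set (hence the "..."): applied on their own, they classify the setosa rows and leave the rest undecided. A quick sketch (my own, not from the slides) over the six sample rows:

```python
def classify(sepal_length, sepal_width, petal_length, petal_width):
    """Apply the two partial iris rules shown above, in order."""
    if petal_length < 2.45:
        return "Iris setosa"
    if sepal_width < 2.10:
        return "Iris versicolor"
    return None  # not covered by the partial rule set

rows = [(5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2),   # setosa
        (7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5),   # versicolor
        (6.3, 3.3, 6.0, 2.5), (5.8, 2.7, 5.1, 1.9)]   # virginica
print([classify(*r) for r in rows])
# only the first two rows are classified; the rest need further rules
```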
Predicting CPU performance
• Example: 209 different computer configurations
    | Cycle time (ns) | Main memory (KB) |      | Cache (KB) | Channels |       | Performance
    | MYCT            | MMIN  | MMAX     |      | CACH       | CHMIN    | CHMAX | PRP
1   | 125 | 256  | 6000  | 256 | 16 | 128 | 198
2   | 29  | 8000 | 32000 | 32  | 8  | 32  | 269
…
208 | 480 | 512  | 8000  | 32  | 0  | 0   | 67
209 | 480 | 1000 | 4000  | 0   | 0  | 0   | 45

• Linear regression function

PRP = -55.9 + 0.0489 MYCT + 0.0153 MMIN + 0.0056 MMAX
            + 0.6410 CACH - 0.2700 CHMIN + 1.480 CHMAX

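The regression function predicts PRP as a weighted sum of the six numeric attributes. A sketch evaluating it on the first two configurations in the table; as a linear fit over 209 machines it is only approximate, so predictions differ from the recorded performance values:

```python
def predict_prp(myct, mmin, mmax, cach, chmin, chmax):
    """Linear regression function for CPU performance, from the slide."""
    return (-55.9 + 0.0489 * myct + 0.0153 * mmin + 0.0056 * mmax
            + 0.6410 * cach - 0.2700 * chmin + 1.480 * chmax)

# (MYCT, MMIN, MMAX, CACH, CHMIN, CHMAX, actual PRP) from the table
for row in [(125, 256, 6000, 256, 16, 128, 198),
            (29, 8000, 32000, 32, 8, 32, 269)]:
    *attrs, actual = row
    print(f"predicted {predict_prp(*attrs):.1f}, actual {actual}")
```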
Data from labor negotiations

Attribute | Type | 1 | 2 | 3 | … | 40
Duration | (number of years) | 1 | 2 | 3 | … | 2
Wage increase first year | percentage | 2% | 4% | 4.3% | … | 4.5
Wage increase second year | percentage | ? | 5% | 4.4% | … | 4.0
Wage increase third year | percentage | ? | ? | ? | … | ?
Cost of living adjustment | {none, tcf, tc} | none | tcf | ? | … | none
Working hours per week | (number of hours) | 28 | 35 | 38 | … | 40
Pension | {none, ret-allw, empl-cntr} | none | ? | ? | … | ?
Standby pay | percentage | ? | 13% | ? | … | ?
Shift-work supplement | percentage | ? | 5% | 4% | … | 4
Education allowance | {yes, no} | yes | ? | ? | … | ?
Statutory holidays | (number of days) | 11 | 15 | 12 | … | 12
Vacation | {below-avg, avg, gen} | avg | gen | gen | … | avg
Long-term disability assistance | {yes, no} | no | ? | ? | … | yes
Dental plan contribution | {none, half, full} | none | ? | full | … | full
Bereavement assistance | {yes, no} | no | ? | ? | … | yes
Health plan contribution | {none, half, full} | none | ? | full | … | half
Acceptability of contract | {good, bad} | bad | good | good | … | good

Decision trees for the labor data

[Figure: decision trees for the labor negotiations data; image not included in this extract]
Questions
Thank you
