
Data Mining Techniques

(Topics in Machine Learning and Data Mining)

3 – Input (part1)
Dr. Kambiz Ghazinour
Fall 2016
Kent State University
© 2015 Department of Computer Science/Kambiz Ghazinour
Input: Concepts, instances, attributes

What’s a concept?
 Classification, association, clustering, numeric prediction

What’s in an example?
 Relations, flat files, recursion

What’s in an attribute?
 Nominal, ordinal, interval, ratio

Preparing the input
 ARFF, attributes, missing values, getting to know data

Data Mining: Practical Machine Learning Tools and Techniques (Chapter 2)
What’s a concept?

Styles of learning:
 Classification learning: predicting a discrete class
 Association learning: detecting associations between features
 Clustering: grouping similar instances into clusters
 Numeric prediction: predicting a numeric quantity

Concept: the thing to be learned
Concept description: the output of learning
Classification learning

Example problems: weather data, contact lenses, irises, labor negotiations

Classification learning is supervised
 The scheme is provided with the actual outcome
 The outcome is called the class of the example

Measure success on fresh data for which class labels are known (test data)
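To make the supervised setup concrete, here is a minimal sketch of classification learning using a plain 1-nearest-neighbour rule; the toy data, the `nn_classify` name, and the squared-distance choice are illustrative assumptions, not the book's own code.

```python
# Minimal 1-nearest-neighbour classifier: a sketch of supervised
# classification on assumed toy data (petal length, petal width).

def nn_classify(train, query):
    """Return the class of the training instance closest to query."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    features, label = min(train, key=lambda ex: dist(ex[0], query))
    return label

# Toy training set: (petal length, petal width) -> species
train = [((1.4, 0.2), "setosa"),
         ((4.7, 1.4), "versicolor"),
         ((6.0, 2.5), "virginica")]

# A fresh "test" instance whose class label we pretend not to know
print(nn_classify(train, (1.5, 0.3)))  # prints "setosa"
```

Success would be measured by how often such predictions match the known labels of held-out test instances.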
Association learning

Can be applied if no class is specified and any kind of structure is considered “interesting”

Difference from classification learning:
 Can predict any attribute’s value, not just the class, and more than one attribute’s value at a time
 Hence: far more association rules than classification rules
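The point that association rules may predict any attribute's value can be sketched as follows; the toy records and the `rules` helper are assumptions for illustration, and only one-condition rules with 100% confidence are enumerated.

```python
# Sketch: enumerate one-condition association rules "if A=a then B=b"
# and keep those that hold with 100% confidence in the toy records.
from itertools import permutations

records = [
    {"outlook": "sunny", "humidity": "high", "play": "no"},
    {"outlook": "sunny", "humidity": "high", "play": "no"},
    {"outlook": "overcast", "humidity": "high", "play": "yes"},
]

def rules(records):
    found = []
    attrs = records[0].keys()
    for a, b in permutations(attrs, 2):      # any attribute may be predicted
        for va in {r[a] for r in records}:
            matching = [r for r in records if r[a] == va]
            vbs = {r[b] for r in matching}
            if len(vbs) == 1:                # every match agrees: confidence 1.0
                found.append((a, va, b, vbs.pop()))
    return found

for a, va, b, vb in rules(records):
    print(f"if {a}={va} then {b}={vb}")
```

Even three records yield many rules, each predicting a different attribute, which illustrates why there are far more association rules than classification rules.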
Clustering

Finding groups of items that are similar

Clustering is unsupervised
 The class of an example is not known

Success is often measured subjectively

      Sepal length  Sepal width  Petal length  Petal width  Type
1     5.1           3.5          1.4           0.2          Iris setosa
2     4.9           3.0          1.4           0.2          Iris setosa
…
51    7.0           3.2          4.7           1.4          Iris versicolor
52    6.4           3.2          4.5           1.5          Iris versicolor
…
101   6.3           3.3          6.0           2.5          Iris virginica
102   5.8           2.7          5.1           1.9          Iris virginica
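A minimal sketch of unsupervised grouping, assuming a plain one-dimensional k-means on the petal-length column of the rows above; the `kmeans_1d` helper is illustrative, not a library API, and no class labels are used.

```python
# Sketch: k-means with k=2 on petal lengths (unsupervised: the
# species labels are never consulted). Values taken from the table.

def kmeans_1d(values, centers, steps=10):
    for _ in range(steps):
        # Assign each value to its nearest centre
        groups = {c: [] for c in centers}
        for v in values:
            nearest = min(centers, key=lambda c: abs(c - v))
            groups[nearest].append(v)
        # Move each centre to the mean of its group
        centers = [sum(g) / len(g) for g in groups.values() if g]
    return sorted(centers)

petal_lengths = [1.4, 1.4, 4.7, 4.5, 6.0, 5.1]
print(kmeans_1d(petal_lengths, centers=[1.0, 6.0]))
```

The two recovered centres separate the setosa-like values from the rest, but since no labels were given, judging whether the grouping is "right" is, as the slide notes, subjective.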
Numeric prediction

A variant of classification learning where the “class” is numeric (also called “regression”)

Learning is supervised
 The scheme is provided with a target value

Measure success on test data

Outlook   Temperature  Humidity  Windy  Play-time
Sunny     Hot          High      False  5
Sunny     Hot          High      True   0
Overcast  Hot          High      False  55
Rainy     Mild         Normal    False  40
…         …            …         …      …
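As a sketch of numeric prediction, here is a deliberately simple model that predicts the mean play-time observed for each outlook value; the `fit_means` helper is an illustrative assumption, and the toy rows come from the table above.

```python
# Sketch: a one-attribute numeric predictor. The "class" (play-time)
# is a number, so the model outputs means rather than discrete labels.

rows = [("Sunny", 5), ("Sunny", 0), ("Overcast", 55), ("Rainy", 40)]

def fit_means(rows):
    totals = {}
    for outlook, play_time in rows:
        totals.setdefault(outlook, []).append(play_time)
    return {k: sum(v) / len(v) for k, v in totals.items()}

model = fit_means(rows)
print(model["Sunny"])  # 2.5: predicted play-time for a sunny day
```

As with classification, success would be measured by comparing predictions against known target values on fresh test data.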
A family tree

[Family-tree diagram: Peter (M) = Peggy (F), with children Steven (M), Graham (M), and Pam (F); Grace (F) = Ray (M), with children Ian (M), Pippa (F), and Brian (M); Pam = Ian, with children Anna (F) and Nikki (F).]
Family tree represented as a table

Name    Gender  Parent1  Parent2
Peter   Male    ?        ?
Peggy   Female  ?        ?
Steven  Male    Peter    Peggy
Graham  Male    Peter    Peggy
Pam     Female  Peter    Peggy
Ian     Male    Grace    Ray
Pippa   Female  Grace    Ray
Brian   Male    Grace    Ray
Anna    Female  Pam      Ian
Nikki   Female  Pam      Ian
The “sister-of” relation

First person  Second person  Sister of?     First person  Second person  Sister of?
Peter         Peggy          No             Steven        Pam            Yes
Peter         Steven         No             Graham        Pam            Yes
…             …              …              Ian           Pippa          Yes
Steven        Peter          No             Brian         Pippa          Yes
Steven        Graham         No             Anna          Nikki          Yes
Steven        Pam            Yes            Nikki         Anna           Yes
…             …              …              All the rest                 No
Ian           Pippa          Yes
…             …              …
Anna          Nikki          Yes
…             …              …
Nikki         Anna           Yes

Closed-world assumption
A full representation in one table

First person                      Second person                     Sister of?
Name    Gender  Parent1  Parent2  Name   Gender  Parent1  Parent2
Steven  Male    Peter    Peggy    Pam    Female  Peter    Peggy     Yes
Graham  Male    Peter    Peggy    Pam    Female  Peter    Peggy     Yes
Ian     Male    Grace    Ray      Pippa  Female  Grace    Ray       Yes
Brian   Male    Grace    Ray      Pippa  Female  Grace    Ray       Yes
Anna    Female  Pam      Ian      Nikki  Female  Pam      Ian       Yes
Nikki   Female  Pam      Ian      Anna   Female  Pam      Ian       Yes
All the rest                                                        No

If second person’s gender = female
and first person’s parent = second person’s parent
then sister-of = yes
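The rule above can be sketched directly in code over the family-tree table, under the closed-world assumption that any pair the rule does not derive is "no"; the `people` dictionary and the `sister_of` name are illustrative choices.

```python
# Sketch: the sister-of rule over the family-tree table.
# Each entry: name -> (gender, parent1, parent2); "?" becomes None.

people = {
    "Peter": ("Male", None, None),        "Peggy": ("Female", None, None),
    "Grace": ("Female", None, None),      "Ray":   ("Male", None, None),
    "Steven": ("Male", "Peter", "Peggy"), "Graham": ("Male", "Peter", "Peggy"),
    "Pam": ("Female", "Peter", "Peggy"),  "Ian":   ("Male", "Grace", "Ray"),
    "Pippa": ("Female", "Grace", "Ray"),  "Brian": ("Male", "Grace", "Ray"),
    "Anna": ("Female", "Pam", "Ian"),     "Nikki": ("Female", "Pam", "Ian"),
}

def sister_of(first, second):
    g1, p1a, p1b = people[first]
    g2, p2a, p2b = people[second]
    # Second person is female and both share the same (known) parents
    return (first != second and g2 == "Female"
            and p1a is not None and (p1a, p1b) == (p2a, p2b))

print(sister_of("Steven", "Pam"))    # True
print(sister_of("Steven", "Graham")) # False: Graham is male
```

Every pair not covered by the rule silently evaluates to False, which is exactly the "All the rest: No" row of the table.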
Generating a flat file

The process of flattening is called “denormalization”
 Several relations are joined together to make one

Denormalization may produce spurious regularities that reflect the structure of the database
 Example: “supplier” predicts “supplier address”
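A sketch of denormalization on two assumed toy relations (orders and suppliers); note how the flattened rows exhibit exactly the spurious regularity mentioned: the supplier value determines the supplier address in every row.

```python
# Sketch: joining two toy relations into one flat table.
suppliers = {"S1": "12 Elm St", "S2": "9 Oak Ave"}   # supplier -> address
orders = [("O1", "S1", "bolts"),                      # (order, supplier, part)
          ("O2", "S1", "nuts"),
          ("O3", "S2", "screws")]

# Denormalize: copy the supplier's address into every order row
flat = [(order, part, supplier, suppliers[supplier])
        for order, supplier, part in orders]

for row in flat:
    print(row)
```

A rule learner run on `flat` would "discover" that supplier predicts supplier address with perfect confidence, a regularity that reflects only the database design, not the domain.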
Recursion

Infinite relations require recursion

If person1 is a parent of person2
then person1 is an ancestor of person2

If person1 is a parent of person2
and person2 is an ancestor of person3
then person1 is an ancestor of person3

Appropriate techniques are known as “inductive logic programming”
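The two rules above translate naturally into a recursive function; the toy `parent_of` pairs are an assumption matching part of the family tree.

```python
# Sketch: the two ancestor rules as a recursive function over a
# parent-of relation (a subset of the family-tree pairs).

parent_of = {("Peter", "Steven"), ("Peter", "Pam"), ("Pam", "Anna")}

def is_ancestor(a, b):
    # Base rule: a parent is an ancestor.
    if (a, b) in parent_of:
        return True
    # Recursive rule: if a is a parent of mid and mid is an
    # ancestor of b, then a is an ancestor of b.
    return any(is_ancestor(mid, b) for (p, mid) in parent_of if p == a)

print(is_ancestor("Peter", "Anna"))  # True, via Pam
```

Inductive logic programming systems learn such recursive clauses from examples rather than having them written by hand, which is what makes them suitable for relations like ancestor-of that no finite flat table can capture.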
