Data Mining Techniques

(Topics in Machine Learning and Data Mining)


2 – Definition
Dr. Kambiz Ghazinour
Fall 2016
Kent State University
© 2015 Department of Computer Science/Kambiz Ghazinour
Data vs. information
Society produces huge amounts of data


Sources: business, science, medicine, economics,
geography, environment, sports, …

Potentially valuable resource

Raw data is useless: need techniques to
automatically extract information from it
◆Data: recorded facts
◆Information: patterns underlying the data

Data mining
Extracting information from data

◆implicit,
◆previously unknown,

◆potentially useful


Needed: programs that detect patterns and
regularities in the data

Strong patterns => good predictions
◆Problem 1: most patterns are not interesting
◆Problem 2: patterns may be inexact (or spurious)

◆Problem 3: data may be garbled or missing


Machine learning techniques

Algorithms for acquiring structural
descriptions from examples

Structural descriptions represent patterns
explicitly
◆ Can be used to predict the outcome in new situations
◆ Can be used to understand and explain how a prediction is derived
(may be even more important)

Methods originate from artificial intelligence, statistics, and research on databases

Data Mining: Practical Machine Learning Tools and Techniques (Chapter 1)
Structural descriptions

Example: if-then rules
If tear production rate = reduced
then recommendation = none
Otherwise, if age = young and astigmatic = no
then recommendation = soft

Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
Young           Myope                   No           Reduced               None
Young           Hypermetrope            No           Normal                Soft
Pre-presbyopic  Hypermetrope            No           Reduced               None
Presbyopic      Myope                   Yes          Normal                Hard
…               …                       …            …                     …
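The two rules above can be read as a short program that checks them in order. This is a sketch under stated assumptions: the function name `recommend_lenses` and the choice to return `None` when neither rule fires are assumptions made for this example, because the slide shows only the first two rules of a longer set (the presbyopic/hard row, for instance, is not covered by them).

```python
# Sketch: the two contact-lens rules from the slide, checked in order.
# recommend_lenses and the None fallback are hypothetical; the slide
# shows only part of a larger rule set.
from typing import Optional


def recommend_lenses(age: str, astigmatic: str, tear_rate: str) -> Optional[str]:
    """Return the recommendation of the first matching rule, or None."""
    if tear_rate == "reduced":
        return "none"
    if age == "young" and astigmatic == "no":
        return "soft"
    return None  # no rule shown on the slide covers this case


rows = [
    # (age, spectacle prescription, astigmatism, tear production rate, recommended)
    ("young", "myope", "no", "reduced", "none"),
    ("young", "hypermetrope", "no", "normal", "soft"),
    ("pre-presbyopic", "hypermetrope", "no", "reduced", "none"),
    ("presbyopic", "myope", "yes", "normal", "hard"),
]

for age, _prescription, astig, tear, actual in rows:
    predicted = recommend_lenses(age, astig, tear)
    print(f"{age:15} predicted={predicted!s:5} actual={actual}")
```

The first three rows are predicted correctly; the last one falls through both rules, which is exactly why a full rule set needs more than the two rules listed here.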

Can machines really learn?

Definitions of “learning” from dictionary:
◆ Difficult to measure:
  To get knowledge of by study, experience, or being taught
  To become aware by information or from observation
◆ Trivial for computers:
  To commit to memory
  To be informed of, ascertain; to receive instruction

Operational definition:
Things learn when they change their behavior in a way that makes them perform better in the future.
◆ Does a slipper learn?
◆ Does learning imply intention?

The weather problem

Conditions for playing a certain game
Outlook Temperature Humidity Windy Play
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild Normal False Yes
… … … … …

If outlook = sunny and humidity = high then play = no


If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes
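The final rule ("If none of the above") implies that the rules are read top to bottom and the first match wins, i.e. as a decision list. A minimal sketch of that reading follows; the dict record format and the name `play_nominal` are assumptions made for this example, and the four rows are the ones shown in the table.

```python
# Sketch: the nominal weather rules as an ordered decision list.
# play_nominal and the dict record format are hypothetical.


def play_nominal(rec: dict) -> str:
    """Return "yes"/"no" from the first rule whose condition holds."""
    if rec["outlook"] == "sunny" and rec["humidity"] == "high":
        return "no"
    if rec["outlook"] == "rainy" and rec["windy"]:
        return "no"
    if rec["outlook"] == "overcast":
        return "yes"
    if rec["humidity"] == "normal":
        return "yes"
    return "yes"  # "If none of the above then play = yes"


data = [
    ({"outlook": "sunny", "temperature": "hot", "humidity": "high", "windy": False}, "no"),
    ({"outlook": "sunny", "temperature": "hot", "humidity": "high", "windy": True}, "no"),
    ({"outlook": "overcast", "temperature": "hot", "humidity": "high", "windy": False}, "yes"),
    ({"outlook": "rainy", "temperature": "mild", "humidity": "normal", "windy": False}, "yes"),
]

for rec, play in data:
    print(rec["outlook"], rec["humidity"], "->", play_nominal(rec), "(table:", play + ")")
```

Order matters: a rainy, windy day with normal humidity satisfies both the second rule (play = no) and the fourth (play = yes); reading the list top to bottom gives no.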

Weather data with mixed attributes

Some attributes have numeric values
Outlook Temperature Humidity Windy Play
Sunny 85 85 False No
Sunny 80 90 True No
Overcast 83 86 False Yes
Rainy 75 80 False Yes
… … … … …

If outlook = sunny and humidity > 83 then play = no


If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity < 85 then play = yes
If none of the above then play = yes
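With numeric attributes the tests become comparisons against thresholds. The sketch below keeps the same first-match reading but stores the rules as a list of (condition, class) pairs, so the rule set is data rather than nested `if` statements; the `Day` tuple, `RULES`, `DEFAULT`, and `classify` are hypothetical names chosen for illustration. Temperature is carried along but, as on the slide, no rule tests it.

```python
# Sketch: numeric weather rules stored as an ordered list of
# (condition, class) pairs; the first condition that holds decides.
# Day, RULES, DEFAULT and classify are hypothetical names.
from typing import Callable, NamedTuple


class Day(NamedTuple):
    outlook: str
    temperature: float
    humidity: float
    windy: bool


RULES: list[tuple[Callable[[Day], bool], str]] = [
    (lambda d: d.outlook == "sunny" and d.humidity > 83, "no"),
    (lambda d: d.outlook == "rainy" and d.windy, "no"),
    (lambda d: d.outlook == "overcast", "yes"),
    (lambda d: d.humidity < 85, "yes"),
]
DEFAULT = "yes"  # "If none of the above then play = yes"


def classify(day: Day) -> str:
    for condition, label in RULES:
        if condition(day):
            return label
    return DEFAULT


table = [
    (Day("sunny", 85, 85, False), "no"),
    (Day("sunny", 80, 90, True), "no"),
    (Day("overcast", 83, 86, False), "yes"),
    (Day("rainy", 75, 80, False), "yes"),
]

for day, play in table:
    print(day.outlook, day.humidity, classify(day), play)
```

Keeping rules as data makes them easy to reorder, extend, or print, which is convenient when the rules are produced by a learning algorithm rather than written by hand.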

Applications
•The result of learning—or the learning method itself—is
deployed in practical applications
–Processing loan applications
–Screening images for oil slicks
–Electricity supply forecasting
–Diagnosis of machine faults
–Marketing and sales
–Separating crude oil and natural gas
–Reducing banding in rotogravure printing
–Finding appropriate technicians for telephone faults
–Scientific applications: biology, astronomy, chemistry
–Automatic selection of TV programs
–Monitoring intensive care patients

Privacy Preserving Data Mining
• We want to release aggregate information
about the data, without leaking individual
information about participants.

Data mining and ethics I
Ethical issues arise in practical applications

Anonymizing data is difficult
◆ 85% of Americans can be identified from just zip code, birth date and sex

Data mining is often used to discriminate
◆ E.g. loan applications: using some information (e.g. sex, religion, race) is unethical

The ethical situation depends on the application
◆ E.g. the same information is OK in a medical application

Attributes may contain problematic information
◆ E.g. area code may correlate with race
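The point about anonymization can be made concrete by counting how many records remain unique on the combination (zip code, birth date, sex) after names are removed. The records and the `QUASI_IDENTIFIER` tuple below are invented toy data for illustration, not statistics about any real population. A record that is unique on those three attributes can be linked back to its owner by anyone who already knows them, for example from another data source.

```python
# Sketch: dropping names does not anonymize records that are unique on
# (zip code, birth date, sex). All records here are invented toy data.
from collections import Counter

records = [
    {"name": "A", "zip": "11111", "birth_date": "1990-03-14", "sex": "F", "diagnosis": "flu"},
    {"name": "B", "zip": "11111", "birth_date": "1990-03-14", "sex": "F", "diagnosis": "asthma"},
    {"name": "C", "zip": "22222", "birth_date": "1985-07-02", "sex": "M", "diagnosis": "diabetes"},
    {"name": "D", "zip": "33333", "birth_date": "1978-11-30", "sex": "F", "diagnosis": "flu"},
]

QUASI_IDENTIFIER = ("zip", "birth_date", "sex")


def key(rec: dict) -> tuple:
    """Project a record onto the quasi-identifier attributes."""
    return tuple(rec[a] for a in QUASI_IDENTIFIER)


# Naive "anonymization": remove the name, release everything else.
released = [{k: v for k, v in r.items() if k != "name"} for r in records]

# Records whose quasi-identifier value occurs only once are re-identifiable.
group_sizes = Counter(key(r) for r in released)
unique = [r for r in released if group_sizes[key(r)] == 1]
print(f"{len(unique)} of {len(released)} released records are unique on {QUASI_IDENTIFIER}")
```

In this toy set the two records sharing (11111, 1990-03-14, F) hide each other, while the other two are unique, so their diagnoses are exposed to anyone who knows those three attributes.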

Data mining and ethics II
Important questions:

◆Who is permitted access to the data?


◆For what purpose was the data collected?

◆What kind of conclusions can be legitimately drawn from it?

Caveats (warnings) must be attached to
results

Purely statistical arguments are never
sufficient!

Are resources put to good use?

