
2 Data Mining

Uploaded by

ygayathri2003
Copyright
© All Rights Reserved

DATA MINING

Coping with Information

• Computerization of daily life produces data
  • Point-of-sale, Internet shopping (& browsing), credit cards, banks . . .
  • Info on credit cards, purchase patterns, payment history, sites visited . . .
• Travel: one trip by one person generates info on destination, airline preferences, seat selection, hotel, rental car, name, address, restaurant choices . . .
• Data cannot be processed or even inspected manually
• Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories
Data Overload
• Vast quantities of data are collected and stored out of fear that important info will be missed
• Data volume grows so fast that old data is never analyzed
• Only a small portion of data collected is analyzed (estimate: 5%)
• Database systems do not support queries like
  • “Who is likely to buy product X”
  • “List all reports of problems similar to this one”
  • “Flag all fraudulent transactions”
• But these may be the most important questions!
Why mine data?

• There is often information ‘hidden’ in the data that is not readily evident
• “More often, data mining yields unexpected nuggets of information that open the company’s eyes to new markets, new ways of reaching customers and new ways of doing business”
• Human analysts may take a very long time to discover useful information
What Is Data Mining?

• Data mining (knowledge discovery in databases):
  • Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases
• Alternative names:
  • Data mining: a misnomer?
  • Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
Data Mining: A KDD Process

• Data mining: the core of the knowledge discovery process.

[Figure: the KDD process. Databases -> Data Cleaning & Data Integration -> Data Warehouse -> Selection & transformation -> Data Mining -> patterns -> Evaluation & presentation]
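
The stages in the KDD figure can be read as a chain of small functions. The following is only a rough sketch under stated assumptions: the two source "databases", their field names, and every function name are hypothetical, and the "mining" step is reduced to counting how often each item occurs.

```python
from collections import Counter

# Hypothetical source databases (records and field names are made up for illustration).
store_db = [
    {"customer": "c1", "item": "printer", "amount": 120.0},
    {"customer": "c2", "item": "computer", "amount": None},  # missing value
]
web_db = [
    {"customer": "c3", "item": "computer", "amount": 900.0},
    {"customer": "c1", "item": "computer", "amount": 850.0},
]

def integrate(*sources):
    """Data integration: combine records from several sources."""
    return [record for source in sources for record in source]

def clean(records):
    """Data cleaning: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]

def select(records, fields):
    """Selection & transformation: keep only the task-relevant fields."""
    return [{f: r[f] for f in fields} for r in records]

def mine(records, field):
    """Data mining (toy version): count how often each value occurs."""
    return Counter(r[field] for r in records)

def evaluate(patterns, min_count):
    """Evaluation: keep only patterns seen at least min_count times."""
    return {value: n for value, n in patterns.items() if n >= min_count}

warehouse = clean(integrate(store_db, web_db))
task_data = select(warehouse, ["item"])
patterns = mine(task_data, "item")
interesting = evaluate(patterns, min_count=2)
print(interesting)
```

Each function takes the output of the previous stage, which mirrors the bottom-to-top flow of the figure.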
Data Mining: On What Kinds of Data?

• Database-oriented data sets and applications
  • Relational database, data warehouse, transactional database
• Advanced data sets and advanced applications
  • Data streams and sensor data
  • Time-series data, temporal data, sequence data (incl. bio-sequences)
  • Structure data, graphs, social networks and multi-linked data
  • Object-relational databases
  • Heterogeneous databases and legacy databases
  • Spatial data and spatiotemporal data
  • Multimedia database
  • Text databases
  • The World-Wide Web

Data Mining: Concepts and Techniques February 11, 2025


Data Mining Functions
(What kind of patterns can be mined)

Concept/class Descriptions
Mining frequent patterns, Associations & correlation
Classification & Prediction
Cluster Analysis
Outlier Analysis
Evolution Analysis
Concept/class description:

Data can be associated with classes or concepts.

E.g., in an electronics store:
• classes of items: computers & printers
• concepts of customers: big spenders & budget spenders
Mining frequent patterns, Associations & Correlation:

• Frequent patterns are the patterns that occur frequently in data.
• There are many kinds of frequent patterns, including itemsets, subsequences, substructures.
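
As an illustration of frequent itemsets, here is a minimal brute-force sketch that counts every small itemset in a handful of transactions and keeps those that occur often enough. The transactions, the threshold, and the function name `frequent_itemsets` are assumptions made for this example; practical algorithms prune the search instead of enumerating every combination.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (illustrative data only).
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
]

def frequent_itemsets(baskets, min_count, max_size=3):
    """Return {itemset: count} for itemsets found in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        items = sorted(basket)  # sort so each itemset has one canonical tuple form
        for size in range(1, min(max_size, len(items)) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}

frequent = frequent_itemsets(transactions, min_count=3)
for itemset, n in sorted(frequent.items()):
    print(itemset, n)
```

With this data every single item and every pair is frequent, while the three-item set appears in only two baskets and is dropped.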


Classification & prediction

• Given: A collection of records (training set)
• Task:
  • Find a model for the class attribute as a function of other attributes
  • Use the model to predict the class for previously unseen records
• Goal:
  • Model should accurately predict the class for previously unseen records (test set)
Process (1): Model Construction

Training Data -> Classification Algorithms -> Classifier (Model)

Training Data:

NAME     RANK             YEARS   TENURED
ravi     Assistant Prof   3       no
suresh   Assistant Prof   7       yes
raghav   Professor        2       yes
rohan    Associate Prof   7       yes
david    Assistant Prof   6       no
shiva    Associate Prof   3       no

Classifier (Model):

IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’
Process (2): Using the Model in Prediction

Testing Data -> Classifier
Unseen Data (Sriram, Professor, 4) -> Classifier -> Tenured?

Testing Data:

NAME      RANK             YEARS   TENURED
mellisa   Assistant Prof   2       no
ritu      Associate Prof   7       no
priya     Professor        5       yes
Joseph    Assistant Prof   7       yes
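
The two process slides can be traced in a few lines of code. This sketch does not learn anything: it hard-codes the rule shown as the classifier (model), then applies it to the training records, the testing records, and the unseen record. The function and variable names are assumptions for illustration.

```python
# Records copied from the slides: (name, rank, years, tenured)
TRAINING = [
    ("ravi", "Assistant Prof", 3, "no"),
    ("suresh", "Assistant Prof", 7, "yes"),
    ("raghav", "Professor", 2, "yes"),
    ("rohan", "Associate Prof", 7, "yes"),
    ("david", "Assistant Prof", 6, "no"),
    ("shiva", "Associate Prof", 3, "no"),
]
TESTING = [
    ("mellisa", "Assistant Prof", 2, "no"),
    ("ritu", "Associate Prof", 7, "no"),
    ("priya", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

def predict_tenured(rank, years):
    """Rule from the slide: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"

def accuracy(records):
    """Fraction of records whose predicted label matches the recorded label."""
    correct = sum(
        predict_tenured(rank, years) == tenured
        for _name, rank, years, tenured in records
    )
    return correct / len(records)

train_accuracy = accuracy(TRAINING)
test_accuracy = accuracy(TESTING)
unseen_prediction = predict_tenured("Professor", 4)  # (Sriram, Professor, 4)

print(train_accuracy, test_accuracy, unseen_prediction)
```

The rule fits all six training records but only three of the four testing records (it predicts 'yes' for ritu, who is recorded as 'no'), which is why accuracy is measured on records the model has not seen. For the unseen record (Sriram, Professor, 4) the rule answers 'yes'.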
age(x, “youth”) AND income(x, “high”) -> class(x, “A”)
age(x, “youth”) AND income(x, “low”) -> class(x, “B”)
age(x, “middle-aged”) -> class(x, “C”)
age(x, “senior”) -> class(x, “C”)

Fig: IF-THEN rules


Age?
  youth -> Income?
    high -> Class A
    low -> Class B
  middle_aged, senior -> Class C

Fig: a decision tree
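
The IF-THEN rules and the decision tree describe the same classifier, so both can be written as one nested conditional. A minimal sketch, assuming string-valued attributes spelled as in the rules ("youth", "middle-aged", "senior", "high", "low"); the function name `classify` is made up for this example.

```python
def classify(age, income=None):
    """Walk the decision tree: test age first, then income for the youth branch."""
    if age == "youth":
        if income == "high":
            return "A"
        if income == "low":
            return "B"
        raise ValueError(f"unknown income value: {income!r}")
    if age in ("middle-aged", "senior"):
        return "C"  # income is not consulted on this branch
    raise ValueError(f"unknown age value: {age!r}")

print(classify("youth", "high"))
print(classify("youth", "low"))
print(classify("senior"))
```

Each root-to-leaf path in the tree corresponds to one IF-THEN rule, which is why the two figures are interchangeable.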
Clustering

“The art of finding groups in data”

• Given:
  • A set of data points
  • Each data point has a set of attributes
  • A distance/similarity measure between data points
    • E.g., Euclidean distance, cosine distance etc.
• Task:
  • Partition the data points into separate groups (clusters)
• Goal:
  • Data points that belong to the same cluster are similar to one another
• Much more difficult than classification since the classes are not known in advance (no training)
• Technique: unsupervised learning
The objects are clustered or grouped based on the principle of maximizing the intraclass similarity & minimizing the interclass similarity.
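
To make the distance measures and the grouping principle concrete, the sketch below defines Euclidean and cosine distance and then runs a very small k-means loop. k-means is a swapped-in example of a partitioning method; the slides do not name a specific algorithm. The sample points, the naive "first k points" initialisation, and all function names are assumptions. Cosine distance here assumes non-zero vectors.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cosine_distance(p, q):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)

def assign(points, centroids):
    """Put each point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: euclidean_distance(p, centroids[i]),
        )
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, iterations=10):
    """Tiny k-means: naive initialisation with the first k points."""
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = assign(points, centroids)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]  # keep the old centroid for an empty cluster
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Hypothetical 2-D points forming two visible groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

No class labels are supplied anywhere: the groups emerge only from the distances between points, which is what makes this unsupervised.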
Outlier analysis

A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are outliers.
Some data mining methods discard outliers as noise or exceptions.
However, analyzing outliers is useful in some applications such as fraud detection.
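
One simple way to flag such objects is a z-score test: mark any value that lies far from the mean, measured in standard deviations. This is only one possible technique, chosen here for illustration; the transaction amounts, the threshold of 2.0, and the function name are assumptions.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds threshold standard deviations."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]

# Hypothetical transaction amounts; one is far larger than the rest.
amounts = [20, 22, 19, 21, 23, 20, 500]
suspicious = zscore_outliers(amounts)
print(suspicious)
```

In a fraud-detection setting the flagged values would be passed on for review instead of being discarded as noise.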
Evolution analysis

Evolution analysis describes & models regularities or trends for objects whose behavior changes over time.

Example:
A college's results over the last several years would give an idea of the quality of the graduates it produces.
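
A trend of this kind can be summarised by fitting a straight line to the yearly figures and reading off its slope. The sketch below uses ordinary least squares; the years and pass rates are invented purely for illustration, and the function name is an assumption.

```python
def linear_trend(xs, ys):
    """Least-squares slope of ys against xs (change in y per unit of x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical pass rates (percent) for five consecutive years.
years = [2019, 2020, 2021, 2022, 2023]
pass_rates = [70.0, 72.0, 74.0, 76.0, 78.0]

slope = linear_trend(years, pass_rates)
print(f"pass rate changes by about {slope:.1f} points per year")
```

A positive slope suggests results are improving over time, a negative one that they are declining.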
