Lecture - 2 - Data Mining Concepts

This document discusses data mining and provides an overview of key concepts. It defines descriptive and predictive tasks, with descriptive tasks deriving patterns from data and predictive tasks predicting values based on other attribute values. It also outlines common data mining methods like classification, clustering, association rule mining, and frequent pattern mining. Visualization and different data types are also covered at a high level.
Data Mining

LECTURE 2
Representation of Data
 Semi-structured Data
 Unstructured Data
Road Map

1. Definitions & Motivations

2. Data to be mined

3. Knowledge to be discovered

4. Major Issues in Data Mining


Data Mining Models and Tasks
Descriptive and Predictive Tasks
1. Descriptive tasks: Derive patterns (correlations, trends, trajectories) that summarize the underlying relationships in the data.

These tasks present the general properties of the data stored in a database. They are used to find patterns such as clusters, correlations, trends, and anomalies.

2. Predictive tasks: Predict the value of a specific attribute (the target, or dependent, variable) based on the values of other attributes (the explanatory variables).

A predictive data mining task predicts the value of one attribute, known as the target or dependent variable, from the values of other attributes. The attributes used to make the prediction are known as independent variables.
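
To make the distinction concrete, here is a minimal Python sketch on made-up data. The attribute names (`hours`, `score`) and their values are assumptions for illustration only and do not come from the lecture. The correlation coefficient plays the descriptive role (it summarizes a relationship), while the fitted line plays the predictive role (it estimates the target for an unseen value of the explanatory attribute).

```python
from math import sqrt

# Hypothetical toy data (not from the lecture):
# hours studied is the explanatory attribute, exam score is the target.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 64.0, 68.0]

def mean(xs):
    return sum(xs) / len(xs)

def pearson(xs, ys):
    """Descriptive: summarize how strongly two attributes move together."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def fit_line(xs, ys):
    """Predictive: least-squares line y = a + b * x learned from known pairs."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

r = pearson(hours, score)      # descriptive summary of the relationship
a, b = fit_line(hours, score)  # model of the target given the explanatory attribute
predicted = a + b * 6.0        # predicted score for an unseen value (6 hours)

print(f"correlation r = {r:.3f}")
print(f"predicted score for 6 hours = {predicted:.1f}")
```

A straight line is only one possible predictive model; it is used here because it is short enough to read in full.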
Types of Variables/Attributes
Data can help us solve specific problems.
How should these pictures be placed into 3 groups?
How should these pictures be placed into groups? How many groups should there be?
Which genes are associated with a disease? How can expression values be used to predict survival?
What items should Amazon display for me?
Is it likely that this stock was traded based on illegal insider information?
Where are the faces in this picture?
Is this spam?
What techniques do people apply to data?
 They apply data mining algorithms and discover useful knowledge

 So, what are some of the well-known data mining tasks?
 Clustering,
 Classification,
 Frequent Patterns,
 Association Rules,
 ….
What do people do with time series data?
 Clustering
 Classification
 Motif Discovery
 Rule Discovery (s = 0.5, c = 0.3)
 Query by Content
 Visualization
 Novelty Detection
 Motif Association
What do people do with trajectory data?
 Clustering
 Frequent Travel Patterns
 Motif Discovery
 Prediction
 Visualization
 Classification
In Summary
Types of Data
 Transactional Data
 Sequence Data
 Interval Data
 Time Series Data
 Spatial Data
 Spatio-Temporal Data
 Data Set with Multiple Kinds of Data
 ….
Data Mining Methods (Algorithms)
 Frequent Pattern Discovery
 Clustering
 Outlier Detection
 Classification
 Statistical Analysis
 …
Activity 1 (complete by the 2nd class of this week)
 Find the top 3 recent research activities around the world that analyze data. Write a short summary of each research activity. The first three lines must follow this format:
 Line 1: The problem they are trying to solve, along with the dataset they are using
 Line 2: How they are solving the problem
 Line 3: Your justification for rating this work as one of the top 3 activities
 Remaining lines… you can decide yourself ….

Example: The BigN’Smart Research group at IIT-Roorkee is analyzing the “YelpReview” dataset to learn location-to-activity tagging. They are applying … . I feel this is interesting research because …
Related Fields
[Diagram: Data Mining and Knowledge Discovery at the center, surrounded by Machine Learning, Visualization, Statistics, and Databases]
 Statistics:
 more theory-based
 more focused on testing hypotheses

 Machine learning
 more heuristic
 focused on improving performance of a learning agent
 also looks at real-time learning and robotics – areas not part of
data mining

 Data Mining and Knowledge Discovery
 integrates theory and heuristics
 focuses on the entire process of knowledge discovery, including data cleaning, learning, and integration and visualization of results

 Distinctions are fuzzy


Classification
Learn a method for predicting the instance class from pre-labeled (classified) instances

Many approaches: Statistics, Decision Trees, Neural Networks, ...
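
As an illustration of learning from pre-labeled instances, the following Python sketch trains a decision stump, that is, a decision tree with a single split on one numeric feature. It is a deliberately tiny stand-in for the decision-tree approach, not the method used in the lecture. The training data and the names `train_stump` and `predict` are assumptions made for this example.

```python
# Hypothetical pre-labeled instances (not from the lecture):
# each instance is (feature value, class label).
training = [
    (1.0, "low"), (2.0, "low"), (3.0, "low"),
    (6.0, "high"), (7.0, "high"), (8.0, "high"),
]

def train_stump(instances):
    """Learn a one-split decision tree: pick the threshold and the pair of
    labels (below/above) that misclassify the fewest training instances.
    Assumes at least two distinct feature values are present."""
    values = sorted({x for x, _ in instances})
    labels = sorted({y for _, y in instances})
    best = None  # (errors, threshold, label_below, label_above)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # candidate split between neighboring values
        for below in labels:
            for above in labels:
                errors = sum(
                    1 for x, y in instances
                    if (below if x <= threshold else above) != y
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, below, above)
    return best[1:]

def predict(stump, x):
    """Classify a new, unlabeled instance with the learned split."""
    threshold, below, above = stump
    return below if x <= threshold else above

stump = train_stump(training)
print(stump)
print(predict(stump, 2.5), predict(stump, 7.5))
```

The learned model is just a threshold and two labels, which makes it easy to see that classification means fitting a rule to labeled examples and then applying it to new ones.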

Clustering
Find “natural” grouping of instances given unlabeled data
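
One common way to find such groupings is k-means, sketched below in plain Python. The lecture does not name a specific clustering algorithm, so k-means is an assumption chosen for illustration, as are the sample points, the function names, and the naive choice of the first k points as starting centroids.

```python
# Hypothetical unlabeled 2-D points (not from the lecture).
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (9.0, 8.5), (8.5, 9.0),
]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    """Basic k-means: alternate between assigning points to the nearest
    centroid and moving each centroid to the mean of its members."""
    centroids = list(points[:k])  # naive deterministic start: first k points
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute each centroid from its current members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return labels, centroids

labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

No class labels are supplied anywhere: the grouping comes only from distances between points, which is what separates clustering from classification.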

Association Rules & Frequent Itemsets
Transactions:
TID  Produce
1    MILK, BREAD, EGGS
2    BREAD, SUGAR
3    BREAD, CEREAL
4    MILK, BREAD, SUGAR
5    MILK, CEREAL
6    BREAD, CEREAL
7    MILK, CEREAL
8    MILK, BREAD, CEREAL, EGGS
9    MILK, BREAD, CEREAL

Frequent Itemsets:
Milk, Bread (4)
Bread, Cereal (4)
Milk, Bread, Cereal (2)
…

Rules:
Milk => Bread (66%)
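
The itemset counts and the rule percentage can be recomputed directly from the nine transactions. The sketch below assumes the number in parentheses after an itemset is its support count (how many transactions contain every item), and that the rule percentage is its confidence (the support count of Milk and Bread together divided by the support count of Milk alone). The function names are illustrative.

```python
# The nine transactions from the table, one set of items per TID.
transactions = [
    {"MILK", "BREAD", "EGGS"},
    {"BREAD", "SUGAR"},
    {"BREAD", "CEREAL"},
    {"MILK", "BREAD", "SUGAR"},
    {"MILK", "CEREAL"},
    {"BREAD", "CEREAL"},
    {"MILK", "CEREAL"},
    {"MILK", "BREAD", "CEREAL", "EGGS"},
    {"MILK", "BREAD", "CEREAL"},
]

def support_count(itemset):
    """Number of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent):
    """Fraction of transactions containing the antecedent that also
    contain the consequent."""
    return support_count(antecedent | consequent) / support_count(antecedent)

for itemset in ({"MILK", "BREAD"}, {"BREAD", "CEREAL"}, {"MILK", "BREAD", "CEREAL"}):
    print(sorted(itemset), support_count(itemset))

rule_conf = confidence({"MILK"}, {"BREAD"})
print(f"MILK => BREAD confidence: {rule_conf:.1%}")
```

Real frequent-itemset miners avoid checking every candidate against every transaction, but for a table this small the brute-force count is the clearest way to verify the numbers.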

Visualization & Data Mining
 Visualizing the data to facilitate human discovery

 Presenting the discovered results in a visually "nice" way

Summarization

 Describe features of the selected group
 Use natural language and graphics
 Usually in combination with deviation detection or other methods

Average length of stay in this study area rose 45.7 percent, from 4.3 days to 6.2 days, because ...
