
2 Data Mining Functionalities 14-12-2024

The document outlines the functionalities of data mining, categorizing tasks into descriptive and predictive types. It details various techniques such as classification, clustering, and association analysis, emphasizing the importance of identifying patterns and relationships within data. Additionally, it discusses the concepts of supervised and unsupervised learning, along with measures of interestingness for discovered patterns.

Uploaded by

Bharani Kumar

Data mining functionalities

March 6, 2025 SWE2009 - Data Mining Techniques 1


Introduction
• Data mining functionalities specify the kinds of patterns to be found in data mining tasks.
• Data mining tasks can be classified into two categories: descriptive and predictive.
• Descriptive mining tasks characterize the general properties of the data in the database.
• Predictive mining tasks perform inference on the current data in order to make predictions.

Functionalities/Techniques
• Concept/Class Description: Characterization and Discrimination
• Mining Frequent Patterns, Associations and Correlations
• Classification and Prediction
• Cluster Analysis
• Outlier Analysis
• Evolution Analysis


Characterization and Discrimination
• Data can be associated with classes or concepts.
• For example, in an electronics store, classes of items for sale include computers and printers, and concepts of customers include bigSpenders and budgetSpenders.
• It is useful to describe individual classes and concepts in summarized, concise, and yet precise terms. Such descriptions of a class or a concept are called class/concept descriptions.


Contd.
• These descriptions can be derived via
  (1) data characterization, by summarizing the data of the class under study (often called the target class) in general terms, or
  (2) data discrimination, by comparison of the target class with one or a set of comparative classes (often called the contrasting classes), or
  (3) both data characterization and discrimination.


Characterization and Discrimination
• Data characterization: a summarization of the general characteristics or features of a target class of data. For instance, a data mining system should be able to produce a description summarizing the characteristics of customers.
• Example: the characteristics of customers who spend more than $1000 a year at a store called AllElectronics. The result can be a general profile of these customers, such as age, employment status, or credit rating.
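The characterization example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer records, the field names (`age`, `employment`, `credit`, `yearly_spend`) and the `characterize` helper are all hypothetical and do not come from the slides.

```python
from collections import Counter
from statistics import mean

# Hypothetical customer records; every value here is invented for illustration.
customers = [
    {"age": 45, "employment": "employed", "credit": "excellent", "yearly_spend": 1800},
    {"age": 23, "employment": "student", "credit": "fair", "yearly_spend": 300},
    {"age": 41, "employment": "employed", "credit": "excellent", "yearly_spend": 2500},
    {"age": 52, "employment": "self-employed", "credit": "good", "yearly_spend": 1200},
    {"age": 30, "employment": "employed", "credit": "good", "yearly_spend": 700},
]


def characterize(records, min_spend=1000):
    """Summarize the target class: customers spending more than min_spend a year."""
    target = [r for r in records if r["yearly_spend"] > min_spend]
    return {
        "count": len(target),
        "mean_age": mean(r["age"] for r in target),  # raises if target is empty
        "employment": Counter(r["employment"] for r in target),
        "credit": Counter(r["credit"] for r in target),
    }


profile = characterize(customers)
print(profile)
```

The returned dictionary plays the role of the "general profile" mentioned above; a real system would typically present it as a chart, crosstab, or characteristic rule.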


Contd.
• Data discrimination: a comparison of the general features of the target class data objects with the general features of objects from one or a set of contrasting classes. The user can specify the target and contrasting classes.
• Example: the user may like to compare the general features of software products whose sales increased by 10% in the last year with those whose sales decreased by about 30% in the same period.


Contd.
• The output of data characterization can be presented in various forms.
• Examples include pie charts, bar charts, curves, multidimensional data cubes, and multidimensional tables, including crosstabs.
• The resulting descriptions can also be presented as generalized relations or in rule form (called characteristic rules).


Associations and Correlations
• Frequent patterns: as the name suggests, patterns that occur frequently in data.
• Frequent itemset: a set of items that frequently appear together in a transactional data set, such as milk and bread.
• Frequent sequential pattern: a frequently occurring subsequence, such as the pattern that customers tend to purchase first a PC, followed by a digital camera, and then a memory card.

Contd.
• Substructure: refers to different structural forms, such as graphs, trees, or lattices, which may be combined with itemsets or subsequences.
• If a substructure occurs frequently, it is called a (frequent) structured pattern.
• Mining frequent patterns leads to the discovery of interesting associations and correlations within data.
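To make the notion of a frequent itemset concrete, here is a minimal brute-force sketch. The transactions, the `min_support` threshold of 0.6 and the `frequent_itemsets` helper are assumptions made for illustration; the slides do not prescribe an algorithm, and practical miners such as Apriori prune the candidate space rather than enumerating it.

```python
from itertools import combinations

# Toy transactions, invented for illustration.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
]


def frequent_itemsets(transactions, min_support=0.6, max_size=2):
    """Return {itemset: support} for every itemset meeting min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, max_size + 1):
        # Brute force: test every combination of the given size.
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count / n >= min_support:
                result[candidate] = count / n
    return result


frequent = frequent_itemsets(transactions)
for itemset, support in frequent.items():
    print(itemset, support)
```

With this toy data, {bread, milk} is the only pair that reaches the threshold, which matches the "milk and bread" intuition from the slide.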


Contd.
Association analysis: from a marketing perspective, determining which items are frequently purchased together within the same transaction.
Example: a rule mined from the AllElectronics transactional database:
buys(X, “Computers”) → buys(X, “software”) [support = 1%, confidence = 50%]
• X represents a customer.
• Confidence (certainty) = 50% means that if a customer buys a computer, there is a 50% chance that he/she will buy software as well.
• Support = 1% means that 1% of all the transactions under analysis showed that computer and software were purchased together.


Are All the “Discovered” Patterns Interesting?
• Data mining may generate thousands of patterns; not all of them are interesting.
  - Suggested approach: human-centered, query-based, focused mining
• Interestingness measures
  - A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm.
• Objective vs. subjective interestingness measures
  - Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
  - Subjective: based on the user’s beliefs about the data, e.g., unexpectedness, novelty, actionability, etc.

Contd.
• Support → usefulness
• Confidence → certainty
• The support of a rule X → Y is the fraction of all transactions under analysis that contain both X and Y.
• The confidence of a rule X → Y is the fraction of the transactions containing X that also contain Y.
• In multidimensional databases, where each attribute is referred to as a dimension, a rule that involves more than one attribute is referred to as a multidimensional association rule.


Support and Confidence
• Support count: the support count of an itemset X, denoted by X.count, in a data set T is the number of transactions in T that contain X. Assume T has n transactions.
• Then, for a rule X → Y,

$$\text{support} = \frac{(X \cup Y).\text{count}}{n}$$

$$\text{confidence} = \frac{(X \cup Y).\text{count}}{X.\text{count}}$$

Contd.
Transactions (10 in total):
Bag, Uniform, Crayons
Books, Bag, Uniform
Bag, Uniform, Pencil
Bag, Pencil, Books
Uniform, Crayons, Bag
Bag, Pencil, Books
Crayons, Uniform, Bag
Books, Crayons, Bag
Uniform, Crayons, Pencil
Pencil, Uniform, Books

Support for {Bag, Uniform} = 5/10 = 0.5
Confidence for Bag → Uniform = 5/8 = 0.625
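The two figures on this slide can be checked with a short script. This is a sketch, not part of the original deck: the helper names `count`, `support` and `confidence` are chosen here for illustration, and the item written "Book" in two rows of the slide is assumed to be the same item as "Books".

```python
# The ten transactions from the slide ("Book" is treated as "Books").
transactions = [
    {"Bag", "Uniform", "Crayons"},
    {"Books", "Bag", "Uniform"},
    {"Bag", "Uniform", "Pencil"},
    {"Bag", "Pencil", "Books"},
    {"Uniform", "Crayons", "Bag"},
    {"Bag", "Pencil", "Books"},
    {"Crayons", "Uniform", "Bag"},
    {"Books", "Crayons", "Bag"},
    {"Uniform", "Crayons", "Pencil"},
    {"Pencil", "Uniform", "Books"},
]


def count(itemset, transactions):
    """Support count: number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """(X u Y).count / n"""
    return count(x | y, transactions) / len(transactions)


def confidence(x, y, transactions):
    """(X u Y).count / X.count"""
    return count(x | y, transactions) / count(x, transactions)


print(support({"Bag"}, {"Uniform"}, transactions))     # 0.5
print(confidence({"Bag"}, {"Uniform"}, transactions))  # 0.625
```

Bag appears in 8 transactions and together with Uniform in 5 of them, which gives the 5/10 and 5/8 shown above.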


t1: Beef, Chicken, Milk
t2: Beef, Cheese
t3: Cheese, Boots
t4: Beef, Chicken, Cheese
t5: Beef, Chicken, Clothes, Cheese, Milk
t6: Chicken, Clothes, Milk
t7: Chicken, Milk, Clothes

Clothes → Milk, Chicken

Clothes, Chicken → Milk
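The slide gives the seven transactions and two rules without stating their support or confidence. One way to work them out is sketched below; the `rule_stats` helper is a name introduced here, and exact fractions are used so the results read as 3/7 rather than a rounded decimal.

```python
from fractions import Fraction

# Transactions t1..t7 as listed on the slide.
transactions = {
    "t1": {"Beef", "Chicken", "Milk"},
    "t2": {"Beef", "Cheese"},
    "t3": {"Cheese", "Boots"},
    "t4": {"Beef", "Chicken", "Cheese"},
    "t5": {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    "t6": {"Chicken", "Clothes", "Milk"},
    "t7": {"Chicken", "Milk", "Clothes"},
}


def rule_stats(antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    both = antecedent | consequent
    n_both = sum(1 for items in transactions.values() if both <= items)
    n_antecedent = sum(1 for items in transactions.values() if antecedent <= items)
    return Fraction(n_both, len(transactions)), Fraction(n_both, n_antecedent)


rule_1 = rule_stats({"Clothes"}, {"Milk", "Chicken"})
rule_2 = rule_stats({"Clothes", "Chicken"}, {"Milk"})
print(rule_1)  # (Fraction(3, 7), Fraction(1, 1))
print(rule_2)  # (Fraction(3, 7), Fraction(1, 1))
```

Clothes occurs only in t5, t6 and t7, and each of those also contains Milk and Chicken, so both rules have support 3/7 and confidence 1.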


Contd.
• Motivation: finding inherent regularities in data
  - What products were often purchased together (for example, Bag and Uniform)?
  - What are the subsequent purchases after buying a PC?
  - What kinds of DNA are sensitive to this new drug?
  - Can we automatically classify web documents?

Associations and Correlations
• Another example:
  age(X, “20…29”) ∧ income(X, “20K-29K”) → buys(X, “CD Player”) [support = 2%, confidence = 60%]
• For customers between 20 and 29 years of age with an income of $20,000-$29,000, there is a 60% chance that they will purchase a CD player, and 2% of all the transactions under analysis showed that customers in this age group and income range bought a CD player.


Classification and Prediction
• Classification is the process of finding a model that describes and distinguishes data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown.
• Construct models (functions) that describe and distinguish classes or concepts for future prediction.
• Training data → building the model
• Test data → evaluating the model
• A classification model can be represented in various forms, such as
  - IF-THEN rules
  - A decision tree
  - A neural network

Contd.
• A decision tree is a flow-chart-like tree structure, where each internal node denotes a test on an attribute value, each branch represents an outcome of the test, and tree leaves represent classes or class distributions.
• Decision trees can easily be converted to classification rules.
• A neural network, when used for classification, is typically a collection of neuron-like processing units with weighted connections between the units.

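The claim that a decision tree converts easily into classification rules can be illustrated with a small sketch. The tree below (attributes `age`, `student`, `credit_rating` and the class labels) is hypothetical, as are the `classify` and `to_rules` helpers; each root-to-leaf path simply becomes one IF-THEN rule.

```python
# Hypothetical tree: a dict is an internal node (attribute test),
# a plain string is a leaf (class label).
tree = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"yes": "buys", "no": "does_not_buy"},
        },
        "middle_aged": "buys",
        "senior": {
            "attribute": "credit_rating",
            "branches": {"excellent": "buys", "fair": "does_not_buy"},
        },
    },
}


def classify(node, record):
    """Follow the branch matching the record's attribute value until a leaf."""
    while isinstance(node, dict):
        node = node["branches"][record[node["attribute"]]]
    return node


def to_rules(node, conditions=()):
    """Turn every root-to-leaf path into one IF-THEN rule."""
    if not isinstance(node, dict):
        body = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {body} THEN class = {node}"]
    rules = []
    for value, child in node["branches"].items():
        rules.extend(to_rules(child, conditions + ((node["attribute"], value),)))
    return rules


rules = to_rules(tree)
for rule in rules:
    print(rule)
print(classify(tree, {"age": "youth", "student": "no"}))
```

The tree has five leaves, so five rules are produced, one per path.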
Classification Model (figure not reproduced)


Cluster Analysis
• Clustering analyzes data objects without consulting a known class label.
• It groups data objects so that the objects within a single group are similar to one another and dissimilar to the objects in other groups.
• Principle: maximize the intraclass similarity and minimize the interclass similarity.
• Example: result analysis
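As a rough illustration of grouping without class labels, the sketch below runs a one-dimensional k-means on a list of marks. Several things here are assumptions: that "result analysis" refers to grouping exam marks, the marks themselves, the choice of k-means (the slides name no algorithm), and the hand-picked starting centers.

```python
def kmeans_1d(values, centers, iterations=20):
    """Lloyd-style k-means on 1-D data, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, clusters


# Hypothetical exam marks; no labels such as "low"/"medium"/"high" are given.
marks = [35, 38, 42, 61, 64, 66, 88, 91, 95]
centers, clusters = kmeans_1d(marks, centers=[30, 60, 90])
print(clusters)
```

Marks inside each resulting group are close to one another (high intraclass similarity) while the groups are far apart (low interclass similarity).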


Cluster Analysis (figure not reproduced)


Outlier Analysis
• Outlier analysis: a database may contain data objects that do not comply with the general behavior or model of the data. These data objects are outliers.
• "Outliers" are values that "lie outside" the other values.
• Example: detecting fraudulent usage of credit cards. Outlier analysis may uncover fraudulent usage of credit cards by detecting purchases of extremely large amounts for a given account number in comparison to the regular charges incurred by the same account. Outlier values may also be detected with respect to the location and type of purchase or the purchase frequency.

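The credit card example can be sketched with one common heuristic, the modified z-score based on the median absolute deviation (MAD). The charges, the `find_outliers` helper and the cutoff of 3.5 are assumptions for illustration; the slides do not specify a detection method.

```python
from statistics import median


def find_outliers(amounts, threshold=3.5):
    """Flag amounts whose modified z-score exceeds threshold.

    modified z = 0.6745 * |x - median| / MAD, a robust alternative to the
    ordinary z-score that is not distorted by the outliers themselves.
    """
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread to measure against; nothing is flagged
    return [a for a in amounts if 0.6745 * abs(a - med) / mad > threshold]


# Hypothetical charges on a single account.
charges = [42.0, 55.5, 38.2, 61.0, 47.3, 52.8, 2500.0, 44.1]
suspicious = find_outliers(charges)
print(suspicious)
```

Only the 2500.0 charge stands far outside the account's regular charges, so it is the only value flagged.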
Evolution Analysis
• Evolution analysis: data evolution analysis describes and models regularities or trends for objects whose behavior changes over time.
• Example: time-series data. Suppose stock market data (a time series) for the last several years is available from the New York Stock Exchange and one would like to invest in shares of high-tech industrial companies. A data mining study of stock exchange data may identify stock evolution regularities for overall stocks and for the stocks of particular companies. Such regularities may help predict future trends in stock market prices, contributing to one’s decision making regarding stock investments.

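A very simple way to expose a trend in time-series data is a moving average. The sketch below is only one possible technique, chosen here for illustration; the prices are invented and the `moving_average` helper and window size are assumptions.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical closing prices over seven periods.
closing_prices = [10, 12, 11, 13, 15, 14, 16]
smoothed = moving_average(closing_prices, window=3)
trend = "rising" if smoothed[-1] > smoothed[0] else "falling or flat"
print(smoothed, trend)
```

Smoothing removes the period-to-period dips so the underlying upward movement becomes visible; it describes a past regularity and does not by itself guarantee future prices.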
Supervised vs. Unsupervised Learning
• Supervised learning (classification)
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations.
  - New data is classified based on the training set.
• Unsupervised learning (clustering)
  - The class labels of the training data are unknown.
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.

Test Partition (in supervised learning)

Training data (build model)

Validation data (evaluate model)

Test data (re-evaluate model)

New data (predict/classify using the final model)
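The partitioning above might be implemented along the following lines. The 60/20/20 proportions, the fixed seed and the `partition` helper are assumptions for illustration; the slides do not give split sizes.

```python
import random


def partition(records, train=0.6, validation=0.2, seed=0):
    """Shuffle records and split them into train / validation / test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],                 # build the model
        shuffled[n_train:n_train + n_val],  # evaluate the model
        shuffled[n_train + n_val:],         # re-evaluate the model
    )


records = list(range(100))  # stand-in for 100 labeled observations
train_set, validation_set, test_set = partition(records)
print(len(train_set), len(validation_set), len(test_set))
```

The three parts are disjoint, so the test set is never seen while the model is built or tuned; new, unlabeled data is then classified with the final model.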
