2 Data Mining Functionalities 14-12-2024

The document outlines the functionalities of data mining, categorizing tasks into descriptive and predictive types. It details various techniques such as classification, clustering, and association analysis, emphasizing the importance of identifying patterns and relationships within data. Additionally, it discusses the concepts of supervised and unsupervised learning, along with measures of interestingness for discovered patterns.

Uploaded by

Bharani Kumar

Data mining functionalities

March 6, 2025 SWE2009 - Data Mining Techniques 1

Introduction
• Data mining functionalities are used to specify the kind of patterns to be found in data mining tasks.
• Data mining tasks are classified into two categories: descriptive and predictive.
• Descriptive mining tasks characterize the general properties of the data in the database.
• Predictive mining tasks perform inference on the current data in order to make predictions.

Functionalities/Techniques
• Concept/Class Description: Characterization and Discrimination
• Mining Frequent Patterns, Associations, and Correlations
• Classification and Prediction
• Cluster Analysis
• Outlier Analysis
• Evolution Analysis

Characterization and Discrimination
• Data can be associated with classes or concepts.
• For example, in the AllElectronics store, classes of items for sale include computers and printers, and concepts of customers include bigSpenders and budgetSpenders.
• It is useful to describe individual classes and concepts in summarized, concise, and yet precise terms. Such descriptions of a class or a concept are called class/concept descriptions.

Contd….
• These descriptions can be derived via
  (1) data characterization, by summarizing the data of the class under study (often called the target class) in general terms, or
  (2) data discrimination, by comparison of the target class with one or a set of comparative classes (often called the contrasting classes), or
  (3) both data characterization and discrimination.

Characterization and Discrimination
• Data Characterization: A summarization of the general characteristics or features of a target class of data. For example, a data mining system should be able to produce a description summarizing the characteristics of a chosen group of customers.
• Example: The characteristics of customers who spend more than $1000 a year at AllElectronics. The result can be a general profile of these customers, covering attributes such as age, employment status, and credit rating.
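
As a rough illustration of data characterization, the sketch below summarizes a target class (customers who spend more than $1000 a year) into a general profile. The records, the field names, and the `characterize` helper are hypothetical and are not taken from any AllElectronics data.

```python
from collections import Counter
from statistics import mean

# Hypothetical customer records (illustrative only).
customers = [
    {"age": 34, "employment": "employed", "credit": "excellent", "spend": 1500},
    {"age": 45, "employment": "employed", "credit": "good", "spend": 2200},
    {"age": 23, "employment": "student", "credit": "fair", "spend": 300},
    {"age": 41, "employment": "self-employed", "credit": "excellent", "spend": 1800},
    {"age": 29, "employment": "employed", "credit": "good", "spend": 700},
]


def characterize(records, min_spend=1000):
    """Summarize the target class: customers whose yearly spend exceeds min_spend."""
    target = [r for r in records if r["spend"] > min_spend]
    return {
        "size": len(target),
        "mean_age": mean(r["age"] for r in target),
        "employment": Counter(r["employment"] for r in target),
        "credit": Counter(r["credit"] for r in target),
    }


profile = characterize(customers)
print(profile)
```

The resulting dictionary plays the role of the "general profile": one summary value or distribution per attribute of the target class.
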

Contd….
• Data Discrimination: A comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes. The user can specify the target and contrasting classes.
• Example: The user may like to compare the general features of software products whose sales increased by 10% in the last year with those whose sales decreased by about 30% in the same period.
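
Data discrimination can be sketched in the same spirit: select a target class and a contrasting class, then compare a summary of each. The product records, the `price` attribute, and the exact thresholds below are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical software products: price and yearly sales change in percent.
products = [
    {"name": "A", "price": 40, "sales_change": 12},
    {"name": "B", "price": 55, "sales_change": 15},
    {"name": "C", "price": 120, "sales_change": -30},
    {"name": "D", "price": 150, "sales_change": -35},
    {"name": "E", "price": 60, "sales_change": 2},
]

# Target class: sales rose by at least 10%.
# Contrasting class: sales fell by at least 30%.
target = [p for p in products if p["sales_change"] >= 10]
contrast = [p for p in products if p["sales_change"] <= -30]

comparison = {
    "target_mean_price": mean(p["price"] for p in target),
    "contrast_mean_price": mean(p["price"] for p in contrast),
}
print(comparison)
```

Product E belongs to neither class, which is expected: discrimination only compares the classes the user specified.
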

Contd….
• The output of data characterization can be presented in various forms.
• Examples include pie charts, bar charts, curves, multidimensional data cubes, and multidimensional tables, including crosstabs.
• The resulting descriptions can also be presented as generalized relations or in rule form (called characteristic rules).

Associations and Correlations
• Frequent Patterns: As the name suggests, patterns that occur frequently in data.
• Frequent Itemset: A set of items that frequently appear together in a transactional data set, such as milk and bread.
• Frequent Sequential Pattern: A frequently occurring subsequence, such as the pattern that customers tend to purchase first a PC, followed by a digital camera, and then a memory card.
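
A minimal sketch of how a sequential pattern can be checked against purchase histories, assuming each customer's history is an ordered list and that other purchases may occur between the pattern's items. The histories and the `contains_subsequence` helper are hypothetical.

```python
def contains_subsequence(sequence, pattern):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so later items must appear later.
    return all(item in it for item in pattern)


# Hypothetical purchase histories, one ordered list per customer.
histories = [
    ["PC", "digital camera", "memory card"],
    ["PC", "printer", "digital camera", "memory card"],
    ["digital camera", "PC", "memory card"],
    ["PC", "memory card"],
]

pattern = ["PC", "digital camera", "memory card"]
matches = sum(contains_subsequence(h, pattern) for h in histories)
support = matches / len(histories)
print(matches, support)
```

Here the pattern is supported by two of the four customers; whether that counts as "frequent" depends on a minimum support threshold chosen by the analyst.
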
Contd….
• Substructure: Refers to different structural forms, such as graphs, trees, or lattices, which may be combined with itemsets or subsequences.
• If a substructure occurs frequently, it is called a (frequent) structured pattern.
• Mining frequent patterns leads to the discovery of interesting associations and correlations within data.

Contd….
• Association Analysis: From a marketing perspective, determining which items are frequently purchased together within the same transaction.
• Example: A rule mined from the AllElectronics transactional database:
  buys(X, “Computers”) → buys(X, “software”) [support = 1%, confidence = 50%]
• X represents a customer.
• Confidence (or certainty) = 50% means that if a customer buys a computer, there is a 50% chance that he/she will buy software as well.
• Support = 1% means that 1% of all the transactions under analysis showed that computer and software were purchased together.
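
The two figures also determine how often the antecedent itself occurs, because confidence is the support of the whole rule divided by the support of its left-hand side. A short worked step, using only the numbers from this example:

```latex
\[
\text{confidence}(X \Rightarrow Y) = \frac{\text{support}(X \cup Y)}{\text{support}(X)}
\quad\Longrightarrow\quad
\text{support}(\text{computer}) = \frac{\text{support}(\text{computer} \cup \text{software})}{\text{confidence}} = \frac{1\%}{50\%} = 2\%
\]
```

In other words, under these figures computers appear in 2% of all transactions, and half of those transactions also include software.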

Are All the “Discovered” Patterns Interesting?
• Data mining may generate thousands of patterns; not all of them are interesting.
  • Suggested approach: human-centered, query-based, focused mining
• Interestingness measures
  • A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm.
• Objective vs. subjective interestingness measures
  • Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
  • Subjective: based on the user’s belief in the data, e.g., unexpectedness, novelty, actionability, etc.

Contd…
• Support → usefulness
• Confidence → certainty
• The support for a rule X → Y is the fraction of all transactions under analysis that contain both X and Y.
• The confidence of a rule X → Y is the fraction of the transactions containing X that also contain Y.
• In multidimensional databases, where each attribute is referred to as a dimension, a rule that involves more than one attribute is referred to as a multidimensional association rule, whereas a rule over a single predicate, such as buys in the rule above, is a single-dimensional association rule.

Support and Confidence
• Support count: The support count of an itemset X, denoted by X.count, in a data set T is the number of transactions in T that contain X. Assume T has n transactions.
• Then, for a rule X → Y,

$$\text{support} = \frac{(X \cup Y).\text{count}}{n}$$

$$\text{confidence} = \frac{(X \cup Y).\text{count}}{X.\text{count}}$$

Contd….

Transactions:
 1. Bag, Uniform, Crayons
 2. Books, Bag, Uniform
 3. Bag, Uniform, Pencil
 4. Bag, Pencil, Books
 5. Uniform, Crayons, Bag
 6. Bag, Pencil, Books
 7. Crayons, Uniform, Bag
 8. Books, Crayons, Bag
 9. Uniform, Crayons, Pencil
10. Pencil, Uniform, Books

Support for {Bag, Uniform} = 5/10 = 0.5
Confidence for Bag → Uniform = 5/8 = 0.625
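
The two results above can be reproduced with a few lines of code. This is a minimal sketch: the helper names `count`, `support`, and `confidence` are my own, and "Book" in the source table is treated as "Books".

```python
# The ten transactions from the example ("Book" normalized to "Books").
transactions = [
    {"Bag", "Uniform", "Crayons"},
    {"Books", "Bag", "Uniform"},
    {"Bag", "Uniform", "Pencil"},
    {"Bag", "Pencil", "Books"},
    {"Uniform", "Crayons", "Bag"},
    {"Bag", "Pencil", "Books"},
    {"Crayons", "Uniform", "Bag"},
    {"Books", "Crayons", "Bag"},
    {"Uniform", "Crayons", "Pencil"},
    {"Pencil", "Uniform", "Books"},
]


def count(itemset, data):
    """Support count: number of transactions that contain every item in itemset."""
    return sum(1 for t in data if itemset <= t)


def support(x, y, data):
    """(X ∪ Y).count / n"""
    return count(x | y, data) / len(data)


def confidence(x, y, data):
    """(X ∪ Y).count / X.count"""
    return count(x | y, data) / count(x, data)


print(support({"Bag"}, {"Uniform"}, transactions))     # 0.5
print(confidence({"Bag"}, {"Uniform"}, transactions))  # 0.625
```

Bag and Uniform occur together in 5 of the 10 transactions, and Bag occurs in 8, which gives the 5/10 and 5/8 shown in the example.
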

t1: Beef, Chicken, Milk
t2: Beef, Cheese
t3: Cheese, Boots
t4: Beef, Chicken, Cheese
t5: Beef, Chicken, Clothes, Cheese, Milk
t6: Chicken, Clothes, Milk
t7: Chicken, Milk, Clothes

Clothes → Milk, Chicken
Clothes, Chicken → Milk
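
The slide lists the two rules without their measures. The sketch below enumerates every rule that can be formed from the itemset {Clothes, Milk, Chicken} over t1 to t7 and keeps those that reach a minimum confidence. The 0.8 threshold and the helper names are assumptions chosen for illustration.

```python
from itertools import combinations

# Transactions t1..t7 from the example.
transactions = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]


def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def rules_from(itemset, min_conf):
    """Yield (antecedent, consequent, support, confidence) for rules meeting min_conf."""
    items = frozenset(itemset)
    sup = count(items) / len(transactions)
    for size in range(1, len(items)):
        for antecedent in combinations(sorted(items), size):
            a = frozenset(antecedent)
            conf = count(items) / count(a)
            if conf >= min_conf:
                yield a, items - a, sup, conf


# min_conf = 0.8 is an assumed threshold, not a value given in the slides.
found = list(rules_from({"Clothes", "Milk", "Chicken"}, min_conf=0.8))
for a, c, sup, conf in found:
    print(sorted(a), "->", sorted(c), f"support={sup:.2f} confidence={conf:.2f}")
```

Both rules from the slide come out with support 3/7 and confidence 3/3, since Clothes appears only in t5, t6, and t7, and each of those also contains Milk and Chicken. The same threshold also admits Clothes, Milk → Chicken.
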

Contd…
• Motivation: Finding inherent regularities in data
  • What products were often purchased together? Bag and Uniform?
  • What are the subsequent purchases after buying a PC?
  • What kinds of DNA are sensitive to this new drug?
  • Can we automatically classify web documents?

Associations and Correlations
• Another example:
  age(X, 20…29) ∧ income(X, 20K-29K) → buys(X, “CD Player”) [support = 2%, confidence = 60%]
• Among customers who are 20 to 29 years old with an income of $20,000 to $29,000, there is a 60% chance that they will purchase a CD player, and 2% of all the transactions under analysis showed that customers in this age group and income range bought a CD player.

Classification and Prediction
• Classification is the process of finding a model that describes and distinguishes data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown.
• Construct models (functions) that describe and distinguish classes or concepts for future prediction.
  • Training data → building the model
  • Test data → evaluating the model
• A classification model can be represented in various forms, such as
  • IF-THEN rules
  • A decision tree
  • A neural network

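
The build-then-evaluate workflow can be sketched with a deliberately tiny model: one IF-THEN rule per attribute value, predicting the majority class seen in training. The records, the single `income` attribute, and the helper names are hypothetical; real classifiers use many attributes.

```python
from collections import Counter, defaultdict

# Hypothetical labeled records: (income level, class label).
training_data = [
    ("high", "bigSpender"), ("high", "bigSpender"), ("high", "budgetSpender"),
    ("low", "budgetSpender"), ("low", "budgetSpender"), ("medium", "bigSpender"),
]
test_data = [
    ("high", "bigSpender"), ("low", "budgetSpender"), ("medium", "budgetSpender"),
]


def build_model(records):
    """Learn one IF-THEN rule per attribute value: predict its majority class."""
    by_value = defaultdict(Counter)
    for value, label in records:
        by_value[value][label] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}


def evaluate(model, records):
    """Return accuracy: the fraction of records whose class is predicted correctly."""
    correct = sum(1 for value, label in records if model.get(value) == label)
    return correct / len(records)


model = build_model(training_data)     # training data builds the model
accuracy = evaluate(model, test_data)  # test data evaluates it
print(model, accuracy)
```

Keeping the test records out of `build_model` is the point of the split: accuracy is measured on data the model has not seen.
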
Contd….
• A decision tree is a flow-chart-like tree structure, where each internal node denotes a test on an attribute value, each branch represents an outcome of the test, and tree leaves represent classes or class distributions.
• Decision trees can easily be converted to classification rules.
• A neural network, when used for classification, is typically a collection of neuron-like processing units with weighted connections between the units.

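
To make the tree-to-rules conversion concrete, here is a small sketch with a hand-written tree (not one learned from data). The attributes, outcomes, class labels, and the tuple-based representation are all assumptions for illustration.

```python
# A hand-written tree. Internal nodes are (attribute, {outcome: subtree})
# tuples; leaves are class labels.
tree = ("age", {
    "youth": ("student", {"yes": "buys", "no": "does not buy"}),
    "middle_aged": "buys",
    "senior": ("credit", {"fair": "does not buy", "excellent": "buys"}),
})


def classify(node, record):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(node, tuple):
        attribute, branches = node
        node = branches[record[attribute]]
    return node


def to_rules(node, conditions=()):
    """Turn every root-to-leaf path into one IF-THEN rule."""
    if not isinstance(node, tuple):
        tests = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {tests} THEN {node}"]
    attribute, branches = node
    rules = []
    for outcome, subtree in branches.items():
        rules.extend(to_rules(subtree, conditions + ((attribute, outcome),)))
    return rules


print(classify(tree, {"age": "youth", "student": "yes"}))
for rule in to_rules(tree):
    print(rule)
```

Each leaf yields exactly one rule, so this tree with five leaves converts to five rules.
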
[Figure: Classification Model]

Cluster Analysis
• Clustering analyses data objects without consulting a known class label.
• It groups data elements into different groups based on the similarity between elements within a single group.
• Objects are grouped by maximizing the intraclass similarity and minimizing the interclass similarity.
• Example: Result analysis
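
One way to picture clustering is a plain k-means pass over one-dimensional data. The slides do not name an algorithm, so k-means is my choice here, and the exam marks and starting centroids are hypothetical.

```python
def k_means_1d(values, centroids, iterations=10):
    """Plain k-means on one-dimensional data with caller-supplied starting centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated
    return centroids, clusters


# Hypothetical exam marks; no class labels are supplied.
marks = [35, 38, 42, 60, 64, 66, 88, 91, 95]
centroids, clusters = k_means_1d(marks, centroids=[30, 60, 90])
print(centroids)
print(clusters)
```

The groups emerge from the distances alone: marks within a cluster are close to each other and far from the other clusters, which is the intraclass versus interclass criterion stated above.
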

[Figure: Cluster Analysis]

Outlier Analysis
• Outlier Analysis: A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are outliers.
• "Outliers" are values that "lie outside" the other values.
• Example: Finding fraudulent usage of credit cards. Outlier analysis may uncover fraudulent usage of credit cards by detecting purchases of extremely large amounts for a given account number in comparison to regular charges incurred by the same account. Outlier values may also be detected with respect to the location and type of purchase or the purchase frequency.

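
A minimal sketch of the credit card example, flagging charges that lie far from an account's typical charge. The median absolute deviation (MAD) test, the threshold of 5, and the charge amounts are assumptions; the slides do not prescribe a detection method.

```python
from statistics import median


def find_outliers(amounts, threshold=5.0):
    """Flag amounts whose distance from the median exceeds threshold * MAD."""
    center = median(amounts)
    mad = median(abs(a - center) for a in amounts)  # median absolute deviation
    if mad == 0:
        return []  # no spread to compare against
    return [a for a in amounts if abs(a - center) / mad > threshold]


# Hypothetical charges on one account.
charges = [42.0, 18.5, 63.0, 25.0, 51.0, 37.5, 2400.0, 29.0]
print(find_outliers(charges))
```

The median is used instead of the mean so that the extreme charge does not distort the notion of a "regular" charge it is being compared against.
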
Evolution Analysis
• Evolution Analysis: Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time.
• Example: Time-series data. Suppose the stock market data (time series) of the last several years are available from the New York Stock Exchange and one would like to invest in shares of high-tech industrial companies. A data mining study of stock exchange data may identify stock evolution regularities for overall stocks and for the stocks of particular companies. Such regularities may help predict future trends in stock market prices, contributing to one’s decision making regarding stock investments.

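
As one simple instance of trend modeling on time-series data, the sketch below smooths a price series with a moving average and reads off the direction. The prices, the window size, and the `moving_average` helper are hypothetical, and a moving average is only one of many possible techniques.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive values to smooth short-term noise."""
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]


# Hypothetical closing prices, oldest first.
prices = [100, 102, 101, 105, 107, 110, 108, 112]
smoothed = moving_average(prices, window=3)
trend = "upward" if smoothed[-1] > smoothed[0] else "flat or downward"
print(smoothed, trend)
```

A description like this summarizes past behavior only; it does not by itself justify a forecast.
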
Supervised vs. Unsupervised Learning
• Supervised learning (classification)
  • Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations.
  • New data is classified based on the training set.
• Unsupervised learning (clustering)
  • The class labels of the training data are unknown.
  • Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.

Test Partition (in Supervised Learning)
• Training Data (Build Model)
• Validation Data (Evaluate Model)
• Test Data (Re-evaluate Model)
• New Data (Predict/classify using final model)
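
The partition can be sketched as a shuffle followed by two cuts. The 60/20/20 proportions, the fixed seed, and the `partition` helper are assumptions for this example; the slides do not specify split sizes.

```python
import random


def partition(records, train=0.6, validation=0.2, seed=0):
    """Shuffle records and split them into training, validation, and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains is the test set
    )


# 100 stand-in records; real records would be labeled observations.
train_set, validation_set, test_set = partition(range(100))
print(len(train_set), len(validation_set), len(test_set))  # 60 20 20
```

Every record lands in exactly one of the three sets, so the validation and test evaluations are made on data that was not used to build the model.
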
