Data Mining Tasks

The document outlines various data mining tasks, which are classified into descriptive and predictive categories. Key tasks include mining frequent patterns, association analysis, correlation analysis, cluster analysis, and classification and regression for predictive analysis. It emphasizes the importance of these tasks in discovering patterns, making predictions, and summarizing data in various applications.

Uploaded by

virat18kohli360
DATA MINING TASKS

 Definition
 Classifications of Data Mining Tasks
 Key Data Mining Tasks
Definition
 Data mining tasks are the kinds of data patterns that
can be mined.
 Data mining functionalities are used to specify the
kinds of patterns to be found in data mining tasks.
Classifications of Data Mining Tasks
In general, data mining tasks can be classified into two
categories:

 Descriptive
 Predictive
Descriptive data mining

 Descriptive mining tasks characterize the general
properties of the data in a target data set.
 Descriptive data mining demonstrates the common
characteristics in the results.
 It offers knowledge of the data and gives insight into
what's going on inside the data without any prior
idea.
Predictive Data Mining

 Predictive mining tasks perform induction
(inferences) on the current data in order to
make predictions.
 Predictive data mining provides prediction
features from data to its users.
Key Data Mining Tasks
Class/Concept Description: Characterization and
Discrimination

 Data entries can be associated with classes or
concepts.
 It can be useful to describe individual classes and
concepts in summarized, concise, and yet precise
terms. Such descriptions of a class or a concept are
called class/concept descriptions.
 These descriptions can be derived in the
following two ways:
 Data characterization is a summarization of the general
characteristics or features of a target class of data.
 Data discrimination is a comparison of the general features
of a target class with those of one or more contrasting classes.
Mining Frequent Patterns
 Frequent patterns are patterns that occur frequently in
transactional data.
 There are many kinds of frequent patterns:
 Frequent itemsets
 Frequent subsequences (also known as sequential
patterns)
 Frequent substructures
Frequent itemset

 A frequent itemset refers to a set of items that
frequently appear together in a transactional data set.
 For example, milk and bread are frequently
bought together in grocery stores by many
customers.
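The idea of a frequent itemset can be sketched by counting how often pairs of items share a basket. This is a minimal illustration, not a full mining algorithm: the transactions, the function name `frequent_pairs`, and the `min_count` threshold are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions; not taken from the slides.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"eggs", "butter"},
]

def frequent_pairs(transactions, min_count):
    """Return item pairs that appear together in at least min_count baskets."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

pairs = frequent_pairs(transactions, min_count=3)
print(pairs)  # {('bread', 'milk'): 3}
```

Only pairs are counted here; larger itemsets would need a level-wise search such as Apriori.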
Frequent Subsequence
 A frequent subsequence is a sequence of items or events
that occurs frequently, such as the purchase of a camera
followed by the purchase of a memory card.
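Unlike an itemset, a subsequence depends on order. The sketch below counts how many purchase histories contain one item followed later by another. The histories and the helper `ordered_pair_counts` are hypothetical, and only pairs are considered.

```python
from collections import Counter

# Hypothetical purchase histories, each listed in time order.
histories = [
    ["camera", "memory card", "tripod"],
    ["camera", "tripod", "memory card"],
    ["laptop", "mouse"],
    ["camera", "memory card"],
]

def ordered_pair_counts(histories):
    """Count, once per history, each (earlier, later) pair of distinct items."""
    counts = Counter()
    for history in histories:
        seen = set()
        for i, first in enumerate(history):
            for later in history[i + 1:]:
                if first != later:
                    seen.add((first, later))
        counts.update(seen)
    return counts

counts = ordered_pair_counts(histories)
print(counts[("camera", "memory card")])  # 3
print(counts[("memory card", "camera")])  # 0
```

The asymmetry between the two counts is what distinguishes a sequential pattern from a plain co-occurrence.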
Frequent substructure
 A substructure refers to a structural form, such as a
tree or a graph, that can be combined with itemsets or
subsequences.
 If a substructure occurs frequently, it is called a
(frequent) structured pattern.
 Mining frequent patterns leads to the discovery of
interesting associations and correlations within data.

Association Analysis

 Association analysis examines the sets of items that generally
occur together in a transactional dataset. It is also known as
Market Basket Analysis because of its wide use in retail sales.
Two parameters are used for determining the association rules:
 Support identifies how common an item set is in the
database.
 Confidence is the conditional probability that an item occurs
when another item occurs in a transaction.
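Both parameters, support and confidence, can be computed directly from transaction counts. The following is a small sketch on made-up transactions; the function names `support` and `confidence` are chosen for the example, and the rule notation A -> B is an assumption about how the rules are written.

```python
# Hypothetical toy transactions; not taken from the slides.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"eggs", "butter"},
]

def count_containing(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in transactions if itemset <= basket)

def support(transactions, itemset):
    """Fraction of all transactions that contain the itemset."""
    return count_containing(transactions, itemset) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = count_containing(transactions, set(antecedent) | set(consequent))
    base = count_containing(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return both / base

print(support(transactions, {"milk", "bread"}))       # 0.6
print(confidence(transactions, {"milk"}, {"bread"}))  # 1.0
print(confidence(transactions, {"bread"}, {"milk"}))  # 0.75
```

Note that confidence is directional: every basket with milk also has bread, but not every basket with bread has milk.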
Correlation Analysis

 Correlation is a mathematical technique that can show whether
and how strongly pairs of attributes are related to each other.
 For example, taller people tend to weigh more.
 It is a kind of additional analysis performed to uncover
interesting statistical correlations between two item sets, to
determine whether they have a positive, negative or no effect on
each other.
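One common way to measure such a relationship between two numeric attributes is the Pearson correlation coefficient, which ranges from -1 (negative) through 0 (none) to +1 (positive). The slides do not name a specific measure, so using Pearson here is an assumption, and the height and weight figures are invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical measurements: height in cm, weight in kg.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 88]

r = pearson(heights, weights)
print(round(r, 3))  # close to +1: a strong positive correlation
```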
Cluster Analysis

 A cluster is a group of similar objects.
 Cluster analysis refers to forming groups of objects
that are very similar to each other but are highly
different from the objects in other clusters.
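As one illustration of forming such groups, here is a minimal one-dimensional k-means sketch. K-means is only one of many clustering methods and is not named in the slides; the values, the starting centers, and the function `kmeans_1d` are assumptions for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Assign each value to its nearest center, then move centers to cluster means."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical data with two obvious groups.
values = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers)   # [1.5, 10.0]
print(clusters)  # [[1.0, 1.5, 2.0], [9.0, 10.0, 11.0]]
```

Objects within each resulting cluster are close together, while the two clusters are far apart.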
Summarization

 Summarization is the generalization of data. A set of
relevant data is summarized, which results in a smaller
set that gives aggregated information about the data.
 For example, the shopping done by a customer can
be summarized into total products, total spending,
offers used, etc.
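The shopping example can be sketched as a simple aggregation. The purchase records and field names below are hypothetical.

```python
# Hypothetical purchase records for one customer.
purchases = [
    {"product": "milk", "price": 1.50, "offer_used": False},
    {"product": "bread", "price": 2.25, "offer_used": True},
    {"product": "coffee", "price": 4.00, "offer_used": True},
]

def summarize(purchases):
    """Reduce a list of purchase records to a few aggregate figures."""
    return {
        "total_products": len(purchases),
        "total_spending": sum(p["price"] for p in purchases),
        "offers_used": sum(1 for p in purchases if p["offer_used"]),
    }

summary = summarize(purchases)
print(summary)
# {'total_products': 3, 'total_spending': 7.75, 'offers_used': 2}
```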
Sequence Discovery

 Sequence discovery, or sequential pattern mining,
is a data mining technique that is used to find
relevant and important patterns in sequential
data.
Classification and Regression for Predictive Analysis
Classification
 Classification is the process of finding a model (or function)
that describes and distinguishes data classes or concepts.
 Classification derives a model to determine the class of an
object based on its attributes.
 The model is derived based on the analysis of a set of training
data (i.e., data objects for which the class labels are known).
 The model is used to predict the class label of objects for
which the class label is unknown.
 A classification model can be represented in various forms:
 IF-THEN rules
 Decision tree
 Neural network
IF-THEN rules
Decision Tree
 A decision tree is a flowchart-like
tree structure, where each node
denotes a test on an attribute value,
each branch represents an outcome
of the test, and tree leaves represent
classes or class distributions.
 Decision trees can easily be
converted to classification rules.
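The conversion from a decision tree to IF-THEN rules can be sketched as follows. The tree, its attributes, and its class labels are invented for illustration; internal nodes are written as `(attribute, branches)` tuples and leaves as plain class labels, which is just one possible representation.

```python
# Hypothetical tree: internal nodes are (attribute, {value: subtree});
# leaves are class labels (strings).
tree = ("age", {
    "youth": ("student", {"yes": "buys", "no": "does not buy"}),
    "middle_aged": "buys",
    "senior": ("credit", {"fair": "does not buy", "excellent": "buys"}),
})

def classify(node, record):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[record[attribute]]
    return node

def to_rules(node, conditions=()):
    """Turn every root-to-leaf path into one IF-THEN rule."""
    if isinstance(node, str):
        premise = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {premise} THEN class = {node}"]
    attribute, branches = node
    rules = []
    for value, subtree in branches.items():
        rules.extend(to_rules(subtree, conditions + ((attribute, value),)))
    return rules

print(classify(tree, {"age": "youth", "student": "yes"}))  # buys
for rule in to_rules(tree):
    print(rule)
```

Each leaf yields exactly one rule, so this tree produces five rules.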
Neural network

 A neural network, when
used for classification, is
typically a collection of
neuron-like processing
units with weighted
connections between the
units.
There are many other methods for constructing classification models,
such as Naive Bayesian classification, support vector machines,
and k-nearest-neighbor classification.
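Of the methods listed, k-nearest-neighbor is the simplest to sketch: an unlabeled object receives the majority class of its closest labeled training objects. The training points, the labels, and the function `knn_predict` below are assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical training data: (feature vector, known class label).
training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((8.0, 8.0), "B"), ((9.0, 8.5), "B"), ((8.5, 9.0), "B"),
]

def knn_predict(training, point, k=3):
    """Predict the label of point by majority vote among its k nearest neighbors."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(training, (2.0, 2.0)))  # A
print(knn_predict(training, (8.0, 9.0)))  # B
```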
Regression
 Regression is learning a function which maps a data item to a
real-valued prediction variable.
 Regression is used to predict missing or unavailable
numerical data values rather than (discrete) class labels. The
term prediction refers to both numeric prediction and class
label prediction.
 Regression analysis is a statistical methodology that is most
often used for numeric prediction.
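A minimal sketch of numeric prediction is simple linear regression fitted by ordinary least squares. The choice of a straight-line model and the data points are assumptions for the example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)       # 2.0 1.0
print(slope * 5 + intercept)  # 11.0, the predicted value for x = 5
```

The fitted function maps an input to a real value, in contrast to a classifier, which returns a discrete label.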
Time Series Analysis
 Time series is a sequence of events where the next event is
determined by one or more of the preceding events.
 Time series reflects the process being measured and there are
certain components that affect the behaviour of a process.
 Time series analysis includes methods to analyze time-series
data in order to extract useful patterns, trends, rules and
statistics.
 Stock market prediction is an important application of time-
series analysis.
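One basic way to expose the trend in a time series is a moving average, which smooths short-term fluctuations. This is only one of many time-series methods, and the series values below are invented.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

# Hypothetical daily closing prices.
prices = [10, 12, 14, 13, 15, 17]
trend = moving_average(prices, window=3)
print(trend)  # [12.0, 13.0, 14.0, 15.0]
```

The smoothed values rise steadily even though the raw series dips on the fourth day.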
Prediction
 The prediction task predicts the possible values of missing or
future data. Prediction involves developing a model based on
the available data; this model is then used to predict future
values for a new data set of interest.
 For example, a model can predict the income of an employee
based on education, experience and other demographic
factors such as place of residence and gender.
 Prediction analysis is also used in many other areas, including
medical diagnosis and fraud detection.
