03 Data Mining Functionalities

Data mining functionalities are categorized into descriptive and predictive tasks, which involve characterizing data properties and making predictions, respectively. Key functionalities include concept/class description, frequent pattern mining, classification and prediction, cluster analysis, and outlier analysis. Each functionality serves specific purposes, such as summarizing data characteristics, predicting outcomes, and identifying anomalies.
Data Mining Functionalities

• Data mining functionalities specify the kinds of patterns to be found in data mining tasks.

• In general, data mining tasks can be classified into two categories: descriptive and predictive.

• Descriptive mining tasks characterize the general properties of the data in the database.

• Predictive mining tasks perform inference on the current data in order to make predictions.
Data Mining Functionalities
• Data mining functionalities, and the kinds of patterns they can discover, are:

• Concept/Class Description: Characterization and Discrimination

• Mining Frequent Patterns, Associations, and Correlations

• Classification and Prediction

• Cluster Analysis

• Outlier Analysis
Concept/Class Description
• Data can be associated with classes or concepts.

• Class: A collection of things sharing a common attribute.

• Classes of items for sale include computers and printers.

• Concept: An abstract or general idea inferred or derived from specific instances.

• Concepts of customers include bigSpenders and budgetSpenders.

• Summarized, concise, and precise descriptions of individual classes and concepts are called class/concept descriptions.

• These descriptions can be derived via data characterization, data discrimination, or both.
Concept/Class Description
• Data characterization is a summary of the general characteristics or features of a target class of data.

• The data corresponding to the user-specified class are typically collected by a database query.

• For example, to study the characteristics of software products whose sales increased by 10% in the last year, the data related to such products can be collected by executing an SQL query.

• Simple data summaries can be produced from statistical measures and plots.
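
The collection and summary steps above can be sketched as follows. This is a minimal illustration only: the `products` table, its columns, and the sample rows are invented, and "increased by 10%" is read here as "grew by at least 10%".

```python
import sqlite3
import statistics

# Hypothetical product rows: (name, category, sales last year, sales this year).
rows = [
    ("EditPro", "software", 1000.0, 1150.0),
    ("CalcSuite", "software", 2000.0, 2100.0),
    ("PhotoFix", "software", 500.0, 600.0),
    ("LaserJet", "printer", 800.0, 1000.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (name TEXT, category TEXT, prev_sales REAL, curr_sales REAL)"
)
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", rows)

# Collect the target class: software products whose sales grew by at least 10%.
target = conn.execute(
    """
    SELECT name, prev_sales, curr_sales
    FROM products
    WHERE category = 'software' AND curr_sales >= 1.10 * prev_sales
    ORDER BY name
    """
).fetchall()
conn.close()

# Summarize the target class with simple statistical measures.
growth = [(curr - prev) / prev for _, prev, curr in target]
summary = {
    "count": len(target),
    "mean_growth": statistics.mean(growth),
    "max_growth": max(growth),
}
print(target)
print(summary)
```

The query isolates the target class; the summary dictionary stands in for the statistical measures that would normally feed a chart or table.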
Concept/Class Description

• The output of data characterization can be presented in various forms.

• Examples include pie charts, bar charts, curves, multidimensional data cubes, and multidimensional tables.
Concept/Class Description
• Example:
• A data mining system should be able to produce a description summarizing the characteristics of customers who spend more than $1,000 a year at AllElectronics.

• The result could be a general profile of these customers, for example that they are 40–50 years old, employed, and have excellent credit ratings.

• The system should allow users to drill down on any dimension, such as occupation, in order to view these customers according to their type of employment.
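
A rough sketch of such a profile and a drill-down on occupation is shown below. The customer records, the ten-year age bands, and the choice of "most common value" as the summary are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical customer records: (age, occupation, credit rating, yearly spend in $).
customers = [
    (45, "engineer", "excellent", 1500),
    (48, "teacher", "excellent", 1200),
    (42, "engineer", "excellent", 2200),
    (30, "student", "fair", 300),
    (55, "manager", "good", 900),
]

# Target class: customers who spend more than $1,000 a year.
target = [c for c in customers if c[3] > 1000]

# General profile: the most common value of each summarized attribute.
age_bands = Counter(
    f"{(age // 10) * 10}-{(age // 10) * 10 + 9}" for age, *_ in target
)
credit = Counter(rating for _, _, rating, _ in target)
profile = {
    "age_band": age_bands.most_common(1)[0][0],
    "credit_rating": credit.most_common(1)[0][0],
}

# Drill down on the occupation dimension.
by_occupation = Counter(occ for _, occ, _, _ in target)

print(profile)
print(dict(by_occupation))
```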
Concept/Class Description
• Data discrimination is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes.

• The target and contrasting classes can be specified by the user, and the corresponding data objects retrieved through database queries.

• For example, the user may want to compare the general features of software products whose sales increased by 10% in the last year with those whose sales decreased by at least 30% during the same period.
Concept/Class Description
• Example of data discrimination:

• A data mining system should be able to compare two groups of AllElectronics customers, such as those who shop for computer products regularly versus those who rarely shop for such products.
Concept/Class Description

• 80% of the customers who frequently purchase computer products are between 20 and 40 years old and have a university education.

• 60% of the customers who infrequently buy such products are either seniors or youths, and have no university degree.

• Drilling down on a dimension, such as occupation, or adding new dimensions, such as income level, may help in finding even more discriminative features between the two classes.
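
A comparison of this kind can be sketched by computing, for each class, the share of customers that match a feature. The records and the age cut-offs for "senior" (65 and over) and "youth" (under 20) are assumptions chosen so that the toy data reproduces the 80% and 60% figures.

```python
# Hypothetical customers: (age, has_degree, buys_computer_products_often).
customers = [
    (25, True, True), (34, True, True), (38, True, True),
    (29, True, True), (52, False, True),
    (68, False, False), (17, False, False), (71, False, False),
    (45, True, False), (33, True, False),
]

def share(records, predicate):
    """Fraction of records that satisfy the predicate."""
    return sum(1 for r in records if predicate(r)) / len(records)

target = [c for c in customers if c[2]]        # frequent buyers
contrast = [c for c in customers if not c[2]]  # infrequent buyers

def young_graduate(c):
    return 20 <= c[0] <= 40 and c[1]

def senior_or_youth_no_degree(c):
    return (c[0] >= 65 or c[0] < 20) and not c[1]

# For each feature: (share in target class, share in contrasting class).
comparison = {
    "young_graduate": (
        share(target, young_graduate),
        share(contrast, young_graduate),
    ),
    "senior_or_youth_no_degree": (
        share(target, senior_or_youth_no_degree),
        share(contrast, senior_or_youth_no_degree),
    ),
}
print(comparison)
```

A feature is discriminative when its two shares differ strongly, as both do here.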
Mining Frequent Patterns
• Frequent patterns are patterns that occur frequently in data.

• A frequent itemset is a set of items that frequently appear together in a transactional data set, such as milk and bread.

• A frequent subsequence is a sequential pattern, such as the pattern that customers tend to purchase first a PC, followed by a digital camera, and then a memory card.

• A substructure can refer to different structural forms, such as graphs, trees, or lattices, which may be combined with itemsets or subsequences.

• Mining frequent patterns leads to the discovery of interesting associations and correlations within data.
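
As a small illustration of frequent itemsets, the sketch below counts how often item combinations occur in a handful of invented transactions. It is a brute-force enumeration, not an efficient algorithm such as Apriori, and the 0.6 support threshold is an arbitrary assumption.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def frequent_itemsets(transactions, min_support, max_size=2):
    """Enumerate itemsets up to max_size whose support meets min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(set(combo), transactions)
            if s >= min_support:
                result[combo] = s
    return result

frequent = frequent_itemsets(transactions, min_support=0.6)

# Confidence of the association rule "milk -> bread":
# of the transactions containing milk, the fraction that also contain bread.
confidence = support({"milk", "bread"}, transactions) / support({"milk"}, transactions)

print(frequent)
print(confidence)
```

In this toy data, milk and bread are each frequent on their own and also frequent as a pair, which is what makes the rule between them worth examining.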


Classification and Prediction
• Classification: the process of finding a model that describes and distinguishes data classes or concepts.

• The model is used to predict the class of objects whose class label is unknown.

• The derived model is based on the analysis of a set of training data.

• How is the derived model presented?

• Classification (IF-THEN) rules

• Decision trees

• Mathematical formulae or neural networks


Classification and Prediction
• Classification predicts categorical (discrete, unordered) labels.

• Prediction models continuous-valued functions.

• Prediction is used to estimate missing or unavailable numerical data values rather than class labels.

• Regression analysis is a statistical methodology that is most often used for numeric prediction, although other methods exist as well.
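
Numeric prediction by regression can be sketched with an ordinary least-squares line fit. The data points (a discount percentage against revenue) are invented, and a straight line is only one of many possible model forms.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: discount (%) and revenue (thousands of $).
discount = [1.0, 2.0, 3.0, 4.0]
revenue = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(discount, revenue)

# Predict a continuous value for an unseen input.
predicted = slope * 5.0 + intercept
print(slope, intercept, predicted)
```

Unlike a classifier, the fitted model returns a number rather than a class label.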
Classification and Prediction
• Example:

• Classify a large set of items in the store, based on three kinds of responses to a sales campaign: good response, mild response, and no response.

• Derive a model for each of these three classes based on the descriptive features of the items, such as price, brand, place made, type, and category.

• IF-THEN rules:
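
Rules of this form might look like the sketch below. The thresholds, the `brand_known` attribute, and the rule order are invented for illustration; a real model would be derived from training data.

```python
def classify_response(item):
    """Apply hypothetical IF-THEN rules in order; the first match wins."""
    # IF price < 100 AND brand is well known THEN good response
    if item["price"] < 100 and item["brand_known"]:
        return "good response"
    # IF price < 500 THEN mild response
    if item["price"] < 500:
        return "mild response"
    # Default rule when nothing else applies
    return "no response"

# Hypothetical items described by two of their features.
items = [
    {"name": "USB cable", "price": 50, "brand_known": True},
    {"name": "Headphones", "price": 300, "brand_known": False},
    {"name": "Generic mouse", "price": 50, "brand_known": False},
    {"name": "Workstation", "price": 900, "brand_known": True},
]

labels = {item["name"]: classify_response(item) for item in items}
print(labels)
```

The same ordered tests could equally be drawn as a decision tree, with each condition as an internal node and each returned label as a leaf.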
Classification and Prediction
• Example:

• Decision tree:

• Predict the amount of revenue that each item will generate during an upcoming sale at AllElectronics, based on previous sales data.
Cluster Analysis
• Unlike classification and prediction, which analyse class-labelled data objects, clustering analyses data objects without consulting a known class label.

• The objects are clustered or grouped based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity.

• Objects within a cluster have high similarity in comparison to one another, but are very dissimilar to objects in other clusters.
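
One common way to form such groups is k-means, sketched here on one-dimensional data. The slides do not name a specific algorithm, so k-means, the spending figures, and the starting centroids are all assumptions.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on numbers: assign to nearest centroid, then re-average."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters

# Hypothetical yearly spend per customer, with no class labels attached.
spend = [100, 120, 110, 900, 950, 1000]

centroids, clusters = kmeans_1d(spend, centroids=[100, 1000])
print(centroids)
print(clusters)
```

No labels are consulted at any point: the two groups emerge only from the distances between the values.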
Outlier Analysis
• A database may contain data objects that do not comply with the general behavior or model of the data.

• These data objects are outliers. Most data mining methods discard outliers as noise or exceptions.
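
A simple statistical test for such objects is a z-score check, sketched below. The purchase amounts and the threshold of 2 standard deviations are assumptions; other definitions of "does not comply" (distance-based or deviation-based, for example) are equally possible.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing can stand out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical purchase amounts for one account.
purchases = [50, 52, 48, 51, 49, 50, 500]

outliers = zscore_outliers(purchases)
print(outliers)
```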
