Post Processing Phase
Evaluation of Patterns
Finding interesting patterns
A pattern is interesting if it has the following properties:
1. It is easily understood by humans.
2. It is valid on new or test data with some degree of certainty.
3. It is potentially useful.
4. It is novel.
5. It validates some hypothesis that a user seeks to confirm.
Pattern Evaluation
Consider the following confusion matrix:
a b   <-- classified as
7 2 | a = yes
3 2 | b = no
Class "a" is misclassified as "b" exactly twice, and class "b" is
misclassified as "a" three times.
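A confusion matrix like the one above can be reproduced in code. The sketch below (assuming scikit-learn is available) rebuilds the slide's counts as label lists and computes the matrix:

```python
from sklearn.metrics import confusion_matrix

# Rebuild the slide's example: 9 actual "yes" instances
# (7 predicted yes, 2 predicted no) and 5 actual "no" instances
# (3 predicted yes, 2 predicted no).
y_true = ["yes"] * 9 + ["no"] * 5
y_pred = ["yes"] * 7 + ["no"] * 2 + ["yes"] * 3 + ["no"] * 2

# Rows are the actual class, columns the predicted class.
cm = confusion_matrix(y_true, y_pred, labels=["yes", "no"])
print(cm)  # [[7 2]
           #  [3 2]]
```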
Pattern Evaluation: Statistical Evaluation Measures

Precision Example
a b   <-- classified as
7 2 | a = yes
3 2 | b = no
Precision for class yes = 7/(7+3) = 0.7
Precision for class no  = 2/(2+2) = 0.5
A perfect precision score of 1.0 means that every item placed in the
positive class was relevant. However, it says nothing about whether
all relevant items were classified into the positive class.
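Precision is tp/(tp+fp): the true positives divided by everything predicted positive. A minimal check of the slide's numbers with scikit-learn (assuming it is installed):

```python
from sklearn.metrics import precision_score

# The slide's confusion matrix, rebuilt as label lists.
y_true = ["yes"] * 9 + ["no"] * 5
y_pred = ["yes"] * 7 + ["no"] * 2 + ["yes"] * 3 + ["no"] * 2

p_yes = precision_score(y_true, y_pred, pos_label="yes")  # 7 / (7 + 3) = 0.7
p_no = precision_score(y_true, y_pred, pos_label="no")    # 2 / (2 + 2) = 0.5
print(p_yes, p_no)
```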
Recall
The number of true positives (tp) divided by the total number of
elements that actually belong to the positive class (tp + fn). Also
called the true positive rate.
Recall (R) = tp/(tp+fn)
It is the fraction of relevant instances that are retrieved.
A perfect recall score of 1.0 means that all relevant items were
classified into the positive class. However, it says nothing about
how many irrelevant items were also included in the positive class.
Recall Example
a b   <-- classified as
7 2 | a = yes
3 2 | b = no
Recall for class yes = 7/(7+2) ≈ 0.78
Recall for class no  = 2/(3+2) = 0.4
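The per-class recall for this confusion matrix can be computed the same way, dividing each row's correct count by its row total (a sketch assuming scikit-learn):

```python
from sklearn.metrics import recall_score

# The slide's confusion matrix, rebuilt as label lists.
y_true = ["yes"] * 9 + ["no"] * 5
y_pred = ["yes"] * 7 + ["no"] * 2 + ["yes"] * 3 + ["no"] * 2

r_yes = recall_score(y_true, y_pred, pos_label="yes")  # 7 / (7 + 2) ≈ 0.78
r_no = recall_score(y_true, y_pred, pos_label="no")    # 2 / (3 + 2) = 0.4
print(r_yes, r_no)
```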
Precision vs Recall
A precision score of 1.0 for a class C means that every item labelled
as belonging to class C does indeed belong to class C (but says
nothing about the number of items from class C that were not labelled
correctly).
A recall score of 1.0 means that every item from class C was labelled
as belonging to class C (but says nothing about how many other items
were incorrectly also labelled as belonging to class C).
F-Measure
This is a combined measure of precision and recall. The two measures
are combined in the F-measure to provide a single measurement for a
system. It is computed as follows:
F-Measure = 2 * Precision * Recall / (Precision + Recall)
ROC Area
The area under a ROC curve quantifies the overall ability of the test
to discriminate between individuals with the condition (true
positives) and those without the condition (true negatives).
A truly useless test has an area of 0.5, meaning that it is no better
at identifying true positives than false positives.
A perfect test (zero false positives and zero false negatives) has an
area of 1.0, meaning that it identifies true positives without error.
A real test has an area somewhere between these two values.
ROC Area Example
(Figure: three ROC curves plotted on the same axes.)
ROC Area
The graph shows three ROC curves, representing excellent, good, and
worthless tests, plotted on the same axes. The accuracy of the test
depends on how well it separates the group being tested into those
with and without the condition in question. Accuracy is measured by
the area under the ROC curve: an area of 1 represents a perfect test,
and an area of 0.5 represents a worthless test.
ROC Area
A rough guide for classifying the accuracy of a diagnostic test is
the traditional academic point system:
0.90-1.00 = excellent (A)
0.80-0.90 = good (B)
0.70-0.80 = fair (C)
0.60-0.70 = poor (D)
0.50-0.60 = fail (F)
A value near 0.5 indicates the lack of any statistical dependence.
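The two extremes of this scale can be illustrated numerically. The sketch below (assuming scikit-learn is available, with hypothetical scores) shows a test that ranks every positive above every negative, and a test that cannot separate the classes at all:

```python
from sklearn.metrics import roc_auc_score

# 1 = has the condition, 0 = does not (hypothetical data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]

# A perfect test scores every positive above every negative -> AUC = 1.0.
perfect_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

# A worthless test gives everyone the same score -> AUC = 0.5.
worthless_scores = [0.5] * 8

auc_perfect = roc_auc_score(y_true, perfect_scores)
auc_worthless = roc_auc_score(y_true, worthless_scores)
print(auc_perfect, auc_worthless)  # 1.0 0.5
```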
ROC Area
ROC curves can also be constructed from clinical prediction rules.
The graphs show how well clinical findings predict strep throat. The
study compared patients in Virginia (VA) and Nebraska (NE). The rule
performed more accurately in Virginia, with an area under the curve
of 0.78, than in Nebraska, where the area under the curve was 0.73.
Pattern Evaluation: Statistical Evaluation measures
post processing
Visualization Techniques
Pre-processing
DBA
Data Sources
Paper, Files, Information Providers, Database Systems,
Benefits of Pattern Visualization
2. It can be misleading.
Example of a misleading visualization:
Year   Sales
1999   2,110
2000   2,105
2001   2,120
2002   2,121
2003   2,124
(Figure: a bar chart of these sales figures, with y-axis ticks
running from 500 to 3,000.)
Visualization: Example
The data in the previous slide can be visualized as follows:
When handling a numeric attribute, C4.5 creates a threshold and then
splits the instances into those whose attribute value is above the
threshold and those whose value is less than or equal to it.
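The threshold-selection idea can be sketched in a few lines. This is a simplified illustration, not C4.5 itself: it tries midpoints between consecutive distinct values and keeps the one with the highest information gain, on hypothetical petal-length data:

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n)
                for c in (labels.count(v) for v in set(labels)))

def best_threshold(values, labels):
    """Pick the binary split threshold with the highest information
    gain, trying midpoints between consecutive distinct values
    (the strategy C4.5 uses for numeric attributes)."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_t, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split point between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        gain = (base
                - (len(left) / len(pairs)) * entropy(left)
                - (len(right) / len(pairs)) * entropy(right))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t

# Hypothetical petal lengths with their class labels.
vals = [1.4, 1.3, 1.5, 4.7, 4.5, 5.1]
labs = ["setosa"] * 3 + ["versicolor"] * 3
print(best_threshold(vals, labs))  # 3.0 (midpoint of 1.5 and 4.5)
```

The chosen threshold (3.0) yields two pure groups, so the split has maximum information gain.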
Interpretation of Patterns
This involves explaining data mining results by describing the
patterns produced during mining. It requires interaction with domain
experts, and visualization makes interpretation easier.
Example: the tree in the previous slide shows that
All 50 setosa samples in the original dataset were classified without
any misclassification, so this split was successful.
46 samples reached the virginica leaf and 45 of them were virginica,
but 1 of the samples was not a virginica.
(Figure: the KDD process. Raw data passes through selection to give
target data; cleaning and integration followed by transformation
produce transformed data, often held in a data warehouse; data mining
yields patterns and rules; and interpretation and evaluation turn
these into knowledge, understanding, and knowledge usage.)
Required Effort for Each KDD Step
(Figure: relative effort spent on each KDD step.)
(Figure: data mining system architecture — a database or data
warehouse server feeding the data mining engine, with a pattern
evaluation module and a knowledge base.)
NB
If ID3 is disabled in the Weka Explorer, it is because your data
contains numeric attributes: ID3 operates only on nominal attributes,
whereas J48 operates on both nominal and numeric attributes.
End
Thank you
Questions