Lesson 6 Analytics Methods
Model Evaluation
• Evaluation metrics: How can we measure accuracy? Other metrics to
consider?
• Use a test set of class-labeled tuples, rather than the training set, when
assessing accuracy
• Methods for estimating a classifier’s accuracy:
• Holdout method, random subsampling
• Cross-validation
• Bootstrap
• Comparing classifiers:
• Confidence intervals
• Cost-benefit analysis and ROC Curves
Classifier Evaluation Metrics: Accuracy & Error Rate

Confusion matrix:
• true positives (TP): cases in which we predicted yes (they have the disease), and they do have the disease
• true negatives (TN): we predicted no, and they don't have the disease
• false positives (FP): we predicted yes, but they don't actually have the disease (also known as a "Type I error")
• false negatives (FN): we predicted no, but they actually do have the disease (also known as a "Type II error")

Classifier accuracy, or recognition rate: the percentage of test set tuples that are correctly classified,
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Error rate: 1 − accuracy, or
Error rate = (FP + FN) / (TP + TN + FP + FN)
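As a quick check of these two formulas, here is a minimal Python sketch; the function name and the example counts are illustrative assumptions, not part of the lesson.

# Minimal sketch: accuracy and error rate from confusion-matrix counts.
# The function name and the example counts are hypothetical.
def accuracy_and_error(tp, tn, fp, fn):
    total = tp + tn + fp + fn        # all test set tuples
    accuracy = (tp + tn) / total     # fraction classified correctly
    return accuracy, 1 - accuracy    # error rate = 1 - accuracy = (fp + fn) / total

acc, err = accuracy_and_error(tp=100, tn=50, fp=10, fn=5)
print(f"accuracy = {acc:.2f}, error rate = {err:.2f}")  # accuracy = 0.91, error rate = 0.09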
Classifier Evaluation Metrics: Example Confusion Matrix

             Predicted: No   Predicted: Yes   Total
Actual: No   TN = 50         FP = 10          60
Actual: Yes  FN = 5          TP = 100         105
Total        55              110              165

Accuracy: Overall, how often is the classifier correct?
• (TP + TN)/total = (100 + 50)/165 = 0.91
Misclassification Rate: Overall, how often is it wrong?
• (FP + FN)/total = (10 + 5)/165 = 0.09
• equivalent to 1 minus Accuracy
• also known as "Error Rate"
True Positive Rate: When it's actually yes, how often does it predict yes?
• TP/actual yes = 100/105 = 0.95
• also known as "Sensitivity" or "Recall"
False Positive Rate: When it's actually no, how often does it predict yes?
• FP/actual no = 10/60 = 0.17
Specificity: When it's actually no, how often does it predict no?
• TN/actual no = 50/60 = 0.83
• equivalent to 1 minus False Positive Rate
Precision: When it predicts yes, how often is it correct?
• TP/predicted yes = 100/110 = 0.91
Prevalence: How often does the yes condition actually occur in our sample?
• actual yes/total = 105/165 = 0.64
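The arithmetic above can be reproduced directly; the following minimal Python sketch recomputes each metric from the four counts in the example (the variable names are assumptions for illustration).

# Recompute the example's metrics from its confusion-matrix counts.
tp, tn, fp, fn = 100, 50, 10, 5
total = tp + tn + fp + fn              # 165 test set tuples
actual_yes = tp + fn                   # 105
actual_no = tn + fp                    # 60
predicted_yes = tp + fp                # 110

print(f"accuracy      = {(tp + tn) / total:.2f}")   # 0.91
print(f"error rate    = {(fp + fn) / total:.2f}")   # 0.09
print(f"TPR (recall)  = {tp / actual_yes:.2f}")     # 0.95
print(f"FPR           = {fp / actual_no:.2f}")      # 0.17
print(f"specificity   = {tn / actual_no:.2f}")      # 0.83
print(f"precision     = {tp / predicted_yes:.2f}")  # 0.91
print(f"prevalence    = {actual_yes / total:.2f}")  # 0.64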
[Figure: sensitivity shown in yellow, specificity in red]
[Figure: precision shown in red, recall in yellow]
Equations
• sensitivity = recall = TP / P = TP / (TP + FN), where P is the number of actual positives
• specificity = TN / N = TN / (TN + FP), where N is the number of actual negatives
• precision = TP / P′ = TP / (TP + FP), where P′ is the number of predicted positives
Equations explanation
• Sensitivity/recall – how good a test is at detecting the positives. A test
can cheat and maximize this by always returning “positive”.
• Specificity – how good a test is at avoiding false alarms. A test can
cheat and maximize this by always returning “negative”.
• Precision – how many of the positively classified were relevant. A test
can cheat and maximize this by only returning positive on one result
it’s most confident in.
• Such cheating is exposed by looking at both relevant metrics instead of
just one: e.g., the cheating test with 100% sensitivity that always says
“positive” has 0% specificity.
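A tiny Python sketch makes the first cheat concrete; the ground-truth labels below are invented for illustration.

# An "always positive" classifier: sensitivity is maximal, specificity collapses.
labels = [1, 1, 0, 1, 0, 0, 0, 1]       # hypothetical ground truth (1 = positive)
predictions = [1] * len(labels)          # the cheating test: always say "positive"

tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)

print(f"sensitivity = {tp / (tp + fn):.0%}")  # 100%: every positive is caught
print(f"specificity = {tn / (tn + fp):.0%}")  # 0%: every negative is a false alarm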
Classifier Evaluation Metrics:
Sensitivity and Specificity
• Class Imbalance Problem:
• one class may be rare, e.g. in fraud detection or medical data
• a significant majority of tuples belong to the negative class
and a minority to the positive class
• Sensitivity: true positive recognition rate, TP / P = TP / (TP + FN)
• Specificity: true negative recognition rate, TN / N = TN / (TN + FP)
Classifier Evaluation Metrics: Example
[Figure: example confusion matrix showing predicted class versus actual class]
[Fig 2: Cross-Validation Method]
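Since only the figure caption survives, here is a minimal sketch of the k-fold idea the figure depicts: each tuple serves once as test data and k − 1 times as training data. The fold count and data size are assumptions.

# Minimal k-fold cross-validation sketch; k and n are hypothetical.
def k_fold_indices(n, k):
    # Split indices 0..n-1 into k roughly equal folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

n, k = 10, 5
folds = k_fold_indices(n, k)
for i, test_idx in enumerate(folds):
    # Train on every fold except the current test fold.
    train_idx = [j for f in folds if f is not test_idx for j in f]
    print(f"fold {i}: test={test_idx}, train={train_idx}")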
Model Selection: ROC Curves

• ROC (Receiver Operating Characteristic) curves: for visual comparison
of classification models
• Originated from signal detection theory
• Shows the trade-off between the true positive rate and the false
positive rate
• The area under the ROC curve is a measure of the accuracy of the model
• Rank the test tuples in decreasing order: the one that is most likely
to belong to the positive class appears at the top of the list
• The closer the curve is to the diagonal line (i.e., the closer the area
is to 0.5), the less accurate the model

[Figure: ROC curves for Model 1 and Model 2, with a diagonal reference
line. The vertical axis represents the true positive rate; the horizontal
axis represents the false positive rate. A model with perfect accuracy
will have an area of 1.0. Model 1 is better than Model 2. Why?]
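To make the ranking idea concrete, here is a minimal Python sketch that sweeps down the test tuples ranked by decreasing score, traces the (FPR, TPR) points of the ROC curve, and estimates the area under it with the trapezoidal rule; the scores and labels are invented for illustration.

def roc_points(labels, scores):
    # Sort tuples by decreasing score; each step down the ranking
    # lowers the threshold and adds one more predicted positive.
    ranked = sorted(zip(scores, labels), reverse=True)
    pos = sum(labels)            # actual positives
    neg = len(labels) - pos      # actual negatives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _score, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))  # (FPR, TPR)
    return points

def auc(points):
    # Trapezoidal-rule estimate of the area under the curve.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0, 0, 1, 0]                          # hypothetical ground truth
scores = [0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20]  # model confidences
print(f"AUC = {auc(roc_points(labels, scores)):.2f}")      # AUC = 0.75

A perfect ranking would place all positives ahead of all negatives, giving an area of 1.0, while a random ranking hugs the diagonal at about 0.5.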