ISBN: 9780133354690

The document discusses evaluating different candidate rules (R1, R2, R3) for a classification problem using various metrics including accuracy, information gain, likelihood ratio, Laplace measure, and m-estimate measure. For the training data, R2 performed best according to information gain, Laplace measure, and m-estimate, while R3 performed best based on likelihood ratio. When evaluating rules learned from another dataset, R1 achieved the highest scores on likelihood ratio, Laplace measure, and m-estimate.
Assignment #4 Solutions (Chapter 5)

4. Consider a training set that contains 100 positive examples and 400 negative examples.
For each of the following candidate rules,

R1: A → + (covers 4 positive and 1 negative example),

R2: B → + (covers 30 positive and 10 negative examples),

R3: C → + (covers 100 positive and 90 negative examples),

determine which is the best and worst candidate rule according to:

a) Rule accuracy.

Answer: The accuracies of the rules are 80% (for R1), 75% (for R2), and 52.6% (for R3),
respectively. Therefore R1 is the best candidate and R3 is the worst candidate according to
rule accuracy.
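These accuracies are easy to verify with a short script (a sketch of my own; the coverage counts are taken from the problem statement):

```python
# Coverage counts (positives, negatives) for each candidate rule.
rules = {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}

# Rule accuracy: the fraction of covered examples that are positive.
for name, (p, n) in rules.items():
    print(f"{name}: {p / (p + n):.1%}")  # R1: 80.0%, R2: 75.0%, R3: 52.6%
```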

b) FOIL’s information gain.

Answer: Assume the initial rule is ∅ → +. This rule covers p0 = 100 positive examples
and n0 = 400 negative examples.
The rule R1 covers p1 = 4 positive examples and n1 = 1 negative example.
Therefore, the information gain for this rule (using base-2 logarithms throughout) is

4 × [log2(4/5) − log2(100/500)] = 8.

The rule R2 covers p1 = 30 positive examples and n1 = 10 negative examples. Therefore,


the information gain for this rule is

30 × [log2(30/40) − log2(100/500)] = 57.2.

The rule R3 covers p1 = 100 positive examples and n1 = 90 negative examples. Therefore,
the information gain for this rule is

100 × [log2(100/190) − log2(100/500)] = 139.6.

Therefore, R3 is the best candidate and R1 is the worst candidate according to FOIL’s
information gain.
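The same computation as a Python sketch (the function name foil_gain is my own; the formula is the one used above):

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain for a rule covering p1 positives and n1
    negatives, relative to an initial rule covering p0/n0 (log base 2)."""
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

for name, (p1, n1) in {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}.items():
    print(f"{name}: {foil_gain(100, 400, p1, n1):.1f}")  # 8.0, 57.2, 139.6
```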

c) The likelihood ratio statistic.

Answer: For R1, the expected frequency for the positive class is 5 × 100/500 = 1 and the
expected frequency for the negative class is 5 × 400/500 = 4. Therefore, the likelihood
ratio for R1 is

2 × [ 4 × log2(4/1) + 1 × log2(1/4) ] = 12.


For R2, the expected frequency for the positive class is 40×100/500 = 8 and the expected
frequency for the negative class is 40 × 400/500 = 32. Therefore, the likelihood ratio for R2 is

2 × [ 30 × log2(30/8) + 10 × log2(10/32) ] = 80.85

For R3, the expected frequency for the positive class is 190 ×100/500 = 38 and the expected
frequency for the negative class is 190 ×400/500 = 152. Therefore, the likelihood ratio for R3 is

2 × [ 100 × log2(100/38) + 90 × log2(90/152) ] = 143.09

Therefore, R3 is the best candidate and R1 is the worst candidate according to the likelihood
ratio statistic.
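A quick check of these values (a sketch; the expected frequencies are computed from the class proportions exactly as above):

```python
import math

def likelihood_ratio(p, n, P, N):
    """Likelihood ratio statistic for a rule covering p positives and n
    negatives, out of P positives and N negatives in total (log base 2)."""
    k = p + n                                  # examples covered by the rule
    ep = k * P / (P + N)                       # expected positive frequency
    en = k * N / (P + N)                       # expected negative frequency
    return 2 * (p * math.log2(p / ep) + n * math.log2(n / en))

for name, (p, n) in {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}.items():
    print(f"{name}: {likelihood_ratio(p, n, 100, 400):.2f}")  # 12.00, 80.85, 143.09
```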

d) The Laplace measure.

Answer: The Laplace measures of the rules are 71.43% (for R1), 73.81% (for R2), and 52.60%
(for R3), respectively. Therefore R2 is the best candidate and R3 is the worst candidate
according to the Laplace measure.
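These follow directly from the formula (p + 1)/(p + n + k) with k = 2 classes, as the sketch below shows:

```python
def laplace(p, n, k=2):
    """Laplace measure: (p + 1) / (p + n + k), with k the number of classes."""
    return (p + 1) / (p + n + k)

for name, (p, n) in {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}.items():
    print(f"{name}: {laplace(p, n):.2%}")  # 71.43%, 73.81%, 52.60%
```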

e) The m-estimate measure (with k = 2 and p+ = 0.2).

Answer: The m-estimate measures of the rules are 62.86% (for R1), 72.38% (for R2), and 52.29%
(for R3), respectively. Therefore R2 is the best candidate and R3 is the worst candidate
according to the m-estimate measure.
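As a quick check, the m-estimate formula (p + k·p+)/(p + n + k) with k = 2 and p+ = 0.2 gives (a sketch):

```python
def m_estimate(p, n, k=2, p_plus=0.2):
    """m-estimate measure: (p + k * p_plus) / (p + n + k)."""
    return (p + k * p_plus) / (p + n + k)

for name, (p, n) in {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}.items():
    print(f"{name}: {m_estimate(p, n):.2%}")  # 62.86%, 72.38%, 52.29%
```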

5. Figure 5.4 illustrates the coverage of the classification rules R1, R2, and R3. Determine
which is the best and worst rule according to:

a) The likelihood ratio statistic.

Answer: There are 29 positive examples and 21 negative examples in the data set. R1 covers
12 positive examples and 3 negative examples. The expected frequency for the positive class is
15 × 29/50 = 8.7 and the expected frequency for the negative class is 15×21/50 = 6.3.
Therefore,
the likelihood ratio for R1 is

2 × [12 × log2(12/8.7) + 3 × log2(3/6.3) ] = 4.71.

R2 covers 7 positive examples and 3 negative examples. The expected frequency for the
positive class is 10 × 29/50 = 5.8 and the expected frequency for the negative class is
10 × 21/50 = 4.2. Therefore, the likelihood ratio for R2 is

2 × [ 7 × log2(7/5.8) + 3 × log2(3/4.2) ] = 0.89.

R3 covers 8 positive examples and 4 negative examples. The expected frequency for the positive
class is 12 × 29/50 = 6.96 and the expected frequency for the negative class is 12 × 21/50 = 5.04.
Therefore, the likelihood ratio for R3 is

2 × [ 8 × log2(8/6.96) + 4 × log2(4/5.04) ] = 0.5472.

R1 is the best rule and R3 is the worst rule according to the likelihood ratio statistic.
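The same likelihood-ratio computation as in Exercise 4 applies, only with the new totals (29 positives, 21 negatives). A self-contained sketch:

```python
import math

def likelihood_ratio(p, n, P, N):
    """Likelihood ratio statistic for a rule covering p positives and n
    negatives, out of P positives and N negatives in total (log base 2)."""
    k = p + n
    ep, en = k * P / (P + N), k * N / (P + N)  # expected class frequencies
    return 2 * (p * math.log2(p / ep) + n * math.log2(n / en))

# Coverage counts from Figure 5.4.
for name, (p, n) in {"R1": (12, 3), "R2": (7, 3), "R3": (8, 4)}.items():
    print(f"{name}: {likelihood_ratio(p, n, 29, 21):.2f}")  # 4.71, 0.89, 0.55
```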

b) The Laplace measure.

Answer: The Laplace measures for the rules are 76.47% (for R1), 66.67% (for R2), and 64.29% (for
R3), respectively. Therefore R1 is the best rule and R3 is the worst rule according to the
Laplace measure.

c) The m-estimate measure (with k = 2 and p+ = 0.58).

Answer: The m-estimate measures for the rules are 77.41% (for R1), 68.00% (for R2), and 65.43%
(for R3), respectively. Therefore R1 is the best rule and R3 is the worst rule according to
the m-estimate measure.

d) The rule accuracy after R1 has been discovered, where none of the examples covered by R1
are discarded.

Answer: If the examples for R1 are not discarded, then R2 will be chosen because it has a higher
accuracy (70%) than R3 (66.7%).

e) The rule accuracy after R1 has been discovered, where only the positive examples covered by
R1 are discarded.

Answer: If the positive examples covered by R1 are discarded, the new accuracies for R2 and R3
are 70% and 60%, respectively. Therefore R2 is preferred over R3.

f) The rule accuracy after R1 has been discovered, where both positive and negative examples
covered by R1 are discarded.

Answer: If the positive and negative examples covered by R1 are discarded, the new accuracies for
R2 and R3 are 70% and 75%, respectively. In this case, R3 is preferred over R2.
