Assignment 3

For a data set too large to fit in memory, one way to construct a decision tree is to load the data in chunks, build a subtree on each chunk, and combine the subtrees afterwards. Converting a decision tree to rules and then pruning has an advantage over pruning first and then converting: each rule can be pruned individually, rather than removing whole subtrees. For the training set with 100 positive and 400 negative examples, R2 is the best candidate rule and R3 the worst under the Laplace and m-estimate measures; the other measures rank the rules differently.

Uploaded by Ashutosh Gupta

1. Given a 5 GB data set with 50 attributes (each containing 100 distinct values) and
512 MB of main memory in your laptop, outline an efficient method that constructs
decision trees in such large data sets. Justify your answer by rough calculation of
your main memory usage.
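
One possible way to set up the rough calculation, offered as a sketch rather than the expected answer: a RainForest-style method keeps only per-attribute class counts (AVC-sets) in memory and streams the records from disk. The number of classes (2) and the counter size (4 bytes) are assumptions, since the assignment does not state them; `avc_group_bytes` is a hypothetical helper name.

```python
def avc_group_bytes(n_attributes, n_values, n_classes, bytes_per_count=4):
    """Memory for one node's AVC-group: one counter per
    (attribute, attribute value, class) combination."""
    return n_attributes * n_values * n_classes * bytes_per_count


MEMORY_BYTES = 512 * 1024**2  # 512 MB of main memory

# Assumed: 2 class labels and 4-byte counters (not given in the assignment).
root_bytes = avc_group_bytes(n_attributes=50, n_values=100, n_classes=2)

print(f"AVC-group at the root: {root_bytes} bytes")
print(f"Fraction of main memory: {root_bytes / MEMORY_BYTES:.6f}")
```

Under these assumptions the counts for a node occupy a tiny fraction of the 512 MB, so the 5 GB data set only needs to be scanned from disk, never held in memory.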

2. Given a decision tree, you have the option of (a) converting the decision tree to rules
and then pruning the resulting rules, or (b) pruning the decision tree and then
converting the pruned tree to rules. What advantage does (a) have over (b)?

3. Consider a training set that contains 100 positive examples and 400 negative
examples. For each of the following candidate rules,
R1: A → + (covers 4 positive and 1 negative examples),
R2: B → + (covers 30 positive and 10 negative examples),
R3: C → + (covers 100 positive and 90 negative examples),
determine which is the best and worst candidate rule according to:

I. FOIL’s information gain.
II. The likelihood ratio statistic.
III. The Laplace measure.
IV. The m-estimate measure (with k = 2 and p+ = 0.2).
V. Rule accuracy.
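
The five measures can be evaluated with a short script. This is a sketch under the usual textbook definitions, with two assumptions: FOIL's gain is measured against the empty rule that covers all 100 positive and 400 negative examples, and logarithms are base 2 (the base does not change the ranking). All function names are illustrative.

```python
from math import log2

P0, N0 = 100, 400  # positives and negatives in the training set

# (positives covered, negatives covered) for each candidate rule
RULES = {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}


def foil_gain(p1, n1, p0=P0, n0=N0):
    # Gain relative to the empty rule, which covers every training example.
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))


def likelihood_ratio(p1, n1, p0=P0, n0=N0):
    # Compares observed class counts with those expected from the class
    # priors. Assumes both observed counts are non-zero.
    covered = p1 + n1
    expected_pos = covered * p0 / (p0 + n0)
    expected_neg = covered * n0 / (p0 + n0)
    return 2 * (p1 * log2(p1 / expected_pos) + n1 * log2(n1 / expected_neg))


def laplace(p1, n1, k=2):
    return (p1 + 1) / (p1 + n1 + k)


def m_estimate(p1, n1, k=2, p_pos=0.2):
    return (p1 + k * p_pos) / (p1 + n1 + k)


def accuracy(p1, n1):
    return p1 / (p1 + n1)


MEASURES = {
    "FOIL's information gain": foil_gain,
    "Likelihood ratio": likelihood_ratio,
    "Laplace": laplace,
    "m-estimate": m_estimate,
    "Accuracy": accuracy,
}


def rank(measure):
    """Return (best rule, worst rule) under the given measure."""
    scores = {name: measure(*counts) for name, counts in RULES.items()}
    return max(scores, key=scores.get), min(scores, key=scores.get)


for label, measure in MEASURES.items():
    best, worst = rank(measure)
    print(f"{label}: best={best}, worst={worst}")
```

Measures that reward coverage (FOIL's gain, the likelihood ratio) and measures that reward purity (accuracy) need not agree, which is the point of comparing them on the same three rules.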

4. (a) Suppose the fraction of undergraduate students who smoke is 15% and the
fraction of graduate students who smoke is 23%. If one-fifth of the college students
are graduate students and the rest are undergraduates, what is the probability that a
student who smokes is a graduate student?
(b) Given the information in part (a), is a randomly chosen college student more
likely to be a graduate or undergraduate student?
(c) Repeat part (b) assuming that the student is a smoker.
(d) Suppose 30% of the graduate students live in a dorm but only 10% of the
undergraduate students live in a dorm. If a student smokes and lives in the dorm, is
he or she more likely to be a graduate or undergraduate student? You can assume
independence between students who live in a dorm and those who smoke.
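
Parts (a), (c) and (d) are all applications of Bayes' theorem, which can be checked numerically. The sketch below assumes, as part (d) allows, that smoking and dorm residence are independent given the student type; `posterior` is an illustrative helper, not part of the assignment.

```python
def posterior(prior, *likelihoods):
    """Bayes' theorem over a set of classes.

    prior: dict mapping class -> P(class)
    likelihoods: dicts mapping class -> P(evidence | class); multiple
    pieces of evidence are assumed conditionally independent given the class.
    """
    joint = dict(prior)
    for likelihood in likelihoods:
        for cls in joint:
            joint[cls] *= likelihood[cls]
    total = sum(joint.values())  # P(evidence)
    return {cls: value / total for cls, value in joint.items()}


prior = {"graduate": 0.2, "undergraduate": 0.8}
p_smoke = {"graduate": 0.23, "undergraduate": 0.15}
p_dorm = {"graduate": 0.30, "undergraduate": 0.10}

given_smoker = posterior(prior, p_smoke)
given_smoker_and_dorm = posterior(prior, p_smoke, p_dorm)

print("P(type | smokes):", given_smoker)
print("P(type | smokes, dorm):", given_smoker_and_dorm)
```

Comparing the two printed distributions shows how the dorm evidence shifts the posterior relative to the smoking evidence alone.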

5. Consider the data set shown in Table 1.

(a) Estimate the conditional probabilities for P(A|+), P(B|+), P(C|+), P(A|-), P(B|-),
and P(C|-).
(b) Use the estimate of conditional probabilities given in the previous question to
predict the class label for a test sample (A = 0, B = 1, C = 0) using the naive Bayes
approach.
(c) Estimate the conditional probabilities using the m-estimate approach, with p =
1/2 and m = 4.
(d) Repeat part (b) using the conditional probabilities given in part (c).
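
The m-estimate used in parts (c) and (d) replaces the raw fraction n_c / n with (n_c + m·p) / (n + m). The sketch below shows the formula and the naive Bayes product it feeds into. The counts in the example are hypothetical placeholders, not values from Table 1, and the function names are illustrative.

```python
from math import prod


def m_estimate_prob(n_c, n, p=0.5, m=4):
    """m-estimate of a conditional probability.

    n_c: records of the class that have the attribute value
    n:   records of the class
    p:   prior estimate of the probability, m: equivalent sample size
    """
    return (n_c + m * p) / (n + m)


def naive_bayes_score(class_prior, conditionals):
    """Unnormalised naive Bayes score: P(class) * product of P(x_i | class)."""
    return class_prior * prod(conditionals)


# Hypothetical counts for illustration only (NOT taken from Table 1):
# 3 of 5 records of a class have the attribute value, and 0 of 5 have another.
smoothed = m_estimate_prob(n_c=3, n=5)
smoothed_zero = m_estimate_prob(n_c=0, n=5)

score = naive_bayes_score(0.5, [smoothed, smoothed_zero])
print(smoothed, smoothed_zero, score)
```

The second example shows why the m-estimate matters: a raw count of zero would force the whole naive Bayes product to zero, while the smoothed estimate stays positive.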

Table 1
