Assignment 3

1. Given a 5 GB data set with 50 attributes (each containing 100 distinct values) and
512 MB of main memory on your laptop, outline an efficient method for constructing
decision trees on such large data sets. Justify your answer with a rough calculation
of your main-memory usage.
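One standard way to approach question 1 (in the spirit of RainForest-style AVC-sets) is to keep only attribute-value-class counts in memory and scan the 5 GB file from disk in chunks. The Python sketch below is a minimal illustration, not a full tree builder; the binary class label, the 8-byte counters, and the chunk-reading interface are assumptions, since the question does not specify them.

from collections import defaultdict

# Back-of-the-envelope memory estimate for one node's AVC (attribute-value-class) set,
# assuming a binary class label and 8-byte counters (neither is stated in the question).
n_attrs, n_values, n_classes, bytes_per_counter = 50, 100, 2, 8
avc_bytes = n_attrs * n_values * n_classes * bytes_per_counter
print(f"AVC-set per node: about {avc_bytes / 1024:.0f} KB")  # ~78 KB, far below 512 MB

def build_avc_set(row_chunks):
    """Accumulate counts over disk-resident data, one memory-sized chunk at a time.

    `row_chunks` yields lists of (values, label) pairs, where `values` is a tuple of
    the 50 attribute values; only the counts stay in memory, never the 5 GB of rows.
    """
    counts = defaultdict(int)  # (attribute index, value, label) -> count
    for chunk in row_chunks:
        for values, label in chunk:
            for a, v in enumerate(values):
                counts[(a, v, label)] += 1
    return counts

# The best split at a node can be chosen from these counts alone (e.g. by information
# gain or Gini), so the data only needs to be streamed from disk, not held in RAM.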

2. Given a decision tree, you have the option of (a) converting the decision tree to rules
and then pruning the resulting rules, or (b) pruning the decision tree and then
converting the pruned tree to rules. What advantage does (a) have over (b)?

3. Consider a training set that contains 100 positive examples and 400 negative
examples. For each of the following candidate rules,
R1: A → + (covers 4 positive and 1 negative example),
R2: B → + (covers 30 positive and 10 negative examples),
R3: C → + (covers 100 positive and 90 negative examples),
determine which is the best and which is the worst candidate rule according to each of the following measures (a sketch of their definitions follows the list):

I. FOIL’s information gain.
II. The likelihood ratio statistic.
III. The Laplace measure.
IV. The m-estimate measure (with k = 2 and p+ = 0.2).
V. Rule accuracy.
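The Python sketch below gives one common formulation of each measure (following Tan, Steinbach and Kumar), written for a rule that covers p positive and n negative examples out of the 100 positives and 400 negatives in the training set. FOIL's gain is taken relative to the empty rule covering the whole training set, natural logs are used in the likelihood ratio statistic (some texts use base 2, which does not change the ranking of the rules), and the function names are illustrative.

import math

P, N = 100, 400  # class totals in the training set

def foil_gain(p, n, p0=P, n0=N):
    """FOIL's information gain of a rule covering p positives and n negatives,
    relative to an initial rule covering p0/n0 (here: the empty rule)."""
    return p * (math.log2(p / (p + n)) - math.log2(p0 / (p0 + n0)))

def likelihood_ratio(p, n):
    """Likelihood ratio statistic 2 * sum_i f_i * ln(f_i / e_i) over the two classes."""
    covered = p + n
    e_pos = covered * P / (P + N)  # expected positives under the prior class distribution
    e_neg = covered * N / (P + N)
    return 2 * (p * math.log(p / e_pos) + n * math.log(n / e_neg))

def laplace(p, n, k=2):
    """Laplace measure with k classes."""
    return (p + 1) / (p + n + k)

def m_estimate(p, n, k=2, p_plus=0.2):
    """m-estimate measure with k = 2 and prior positive fraction p+ = 0.2."""
    return (p + k * p_plus) / (p + n + k)

def accuracy(p, n):
    """Rule accuracy: fraction of covered examples that are positive."""
    return p / (p + n)

# Example usage for R1, which covers 4 positives and 1 negative:
# foil_gain(4, 1), likelihood_ratio(4, 1), laplace(4, 1), m_estimate(4, 1), accuracy(4, 1)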

4. (a) Suppose the fraction of undergraduate students who smoke is 15% and the
fraction of graduate students who smoke is 23%. If one-fifth of the college students
are graduate students and the rest are undergraduates, what is the probability that a
student who smokes is a graduate student?
(b) Given the information in part (a), is a randomly chosen college student more
likely to be a graduate or undergraduate student?
(c) Repeat part (b) assuming that the student is a smoker.
(d) Suppose 30% of the graduate students live in a dorm but only 10% of the
undergraduate students live in a dorm. If a student smokes and lives in a dorm, is
he or she more likely to be a graduate or undergraduate student? You may assume
that living in a dorm and smoking are independent of each other within each group.
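Parts (a) through (d) of question 4 are applications of Bayes' theorem. The minimal sketch below works through (a) and the comparison in (d) using only the probabilities stated in the question, with dorm residence and smoking treated as independent within each group as the question instructs.

# Probabilities stated in the question.
p_grad, p_ugrad = 0.20, 0.80                    # one-fifth graduates, the rest undergraduates
p_smoke_given_grad, p_smoke_given_ugrad = 0.23, 0.15

# Part (a): Bayes' theorem, P(grad | smoke) = P(smoke | grad) * P(grad) / P(smoke).
p_smoke = p_smoke_given_grad * p_grad + p_smoke_given_ugrad * p_ugrad
p_grad_given_smoke = p_smoke_given_grad * p_grad / p_smoke
print(f"P(grad | smoke) = {p_grad_given_smoke:.3f}")

# Part (d): with dorm residence independent of smoking within each group, the
# likelihoods multiply: P(smoke, dorm | grad) = P(smoke | grad) * P(dorm | grad).
p_dorm_given_grad, p_dorm_given_ugrad = 0.30, 0.10
num_grad = p_smoke_given_grad * p_dorm_given_grad * p_grad
num_ugrad = p_smoke_given_ugrad * p_dorm_given_ugrad * p_ugrad
print("more likely graduate" if num_grad > num_ugrad else "more likely undergraduate")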

5. Consider the data set shown in Table 1.


(a) Estimate the conditional probabilities for P(A|+), P(B|+), P(C|+), P(A|-), P(B|-),
and P(C|-).
(b) Use the conditional probabilities estimated in part (a) to predict the class label
for a test sample (A = 0, B = 1, C = 0) using the naive Bayes approach.
(c) Estimate the conditional probabilities using the m-estimate approach, with p =
1/2 and m = 4.
(d) Repeat part (b) using the conditional probabilities given in part (c).

Table 1
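The numerical answers to question 5 require the counts from Table 1 (not reproduced here), so the sketch below only shows the mechanics asked for in parts (a) through (d): maximum-likelihood and m-estimate conditional probabilities, and the naive Bayes comparison for the test sample (A = 0, B = 1, C = 0). The count arguments are placeholders to be filled in from Table 1.

def cond_prob(count_attr_and_class, count_class):
    """Maximum-likelihood estimate, e.g. P(A=0 | +) = #(A=0, +) / #(+)."""
    return count_attr_and_class / count_class

def m_estimate_prob(count_attr_and_class, count_class, p=0.5, m=4):
    """m-estimate smoothing with p = 1/2 and m = 4, as asked in part (c)."""
    return (count_attr_and_class + m * p) / (count_class + m)

def naive_bayes_score(prior, cond_probs):
    """Score proportional to P(class) * product of P(attribute value | class)."""
    score = prior
    for cp in cond_probs:
        score *= cp
    return score

# For the test sample (A=0, B=1, C=0), compare
#   naive_bayes_score(P(+), [P(A=0|+), P(B=1|+), P(C=0|+)])
# against
#   naive_bayes_score(P(-), [P(A=0|-), P(B=1|-), P(C=0|-)]),
# filling each probability in from the Table 1 counts; the larger score gives the
# predicted class label.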
