
Data Mining Practice Final Exam Solutions

True/False Questions:

1. T F Our use of association analysis will yield the same frequent itemsets and strong
association rules whether a specific item occurs once or three times in an individual
transaction.

2. T F The k-means clustering algorithm that we studied will automatically find the best
value of k as part of its normal operation.

3. T F A density-based clustering algorithm can generate non-globular clusters.

4. T F In association rule mining the generation of the frequent itemsets is the
computationally intensive step.

Multiple Choice Questions

5. In the figure below, there are two clusters. They are connected by a line which represents the
distance used to determine inter-cluster similarity.

Which inter-cluster similarity metric does this line represent (circle one)?

a. MIN
b. MAX
c. Group Average
d. Distance between centroids

Short Form Questions

1. Sometimes a data set is partitioned such that a validation set is provided. What is the purpose of the
validation set?

The validation set is used to select amongst multiple models or to tune a specific model (which can
be viewed as a family of models).

Here is more explanation. In this sense it is used like a test set, in that it is used for evaluation, but it
is not like a test set in that it cannot be used for reporting the performance of the model. That is, the
validation set chooses a model (for example, the right amount of pruning), but then that model can
be evaluated on a test set and that performance number can be reported. If one data set could be used
both for finding the best model and for reporting the performance, you could generate one million
models, pick the one that does best on that data set, and report that performance. But that model
likely does not truly have the best performance; it just happened to do best on that one data set.
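
As a concrete sketch of this workflow (using scikit-learn and the iris data purely for illustration;
none of this is from the exam): the validation set selects the tree depth, and the test set, touched
only once, yields the reported number.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Split into train / validation / test portions.
    X, y = load_iris(return_X_y=True)
    X_trval, X_test, y_trval, y_test = train_test_split(X, y, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X_trval, y_trval, random_state=0)

    # The validation set selects among the family of models (here, tree depths).
    best_depth, best_score = None, -1.0
    for depth in (1, 2, 3, 5, None):
        model = DecisionTreeClassifier(max_depth=depth, random_state=0)
        model.fit(X_train, y_train)
        score = model.score(X_val, y_val)
        if score > best_score:
            best_depth, best_score = depth, score

    # Only the chosen model is evaluated on the test set, and only once.
    final = DecisionTreeClassifier(max_depth=best_depth, random_state=0)
    final.fit(X_train, y_train)
    print(final.score(X_test, y_test))   # this is the number you may report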

2. Are decision trees easy to interpret (circle one): Yes No

Yes, ease of interpretation is one of the main advantages of decision trees.

3. How can you convert a decision tree into a rule set? Explain the process.

Create one rule per leaf node by traversing the conditions from the root node to the leaf and
conjoining those conditions. Note that the rules would be mutually exclusive, meaning that only one
rule could “fire” at a time.
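
A minimal sketch of this process in Python (the nested-tuple tree encoding is an assumption
invented for this example, not a standard representation):

    # Each internal node is (test, subtree_if_true, subtree_if_false);
    # each leaf is a class label. One rule is produced per leaf.
    def tree_to_rules(node, conditions=()):
        if isinstance(node, str):                       # leaf
            yield conditions, node
        else:                                           # internal node
            test, if_true, if_false = node
            yield from tree_to_rules(if_true, conditions + (test,))
            yield from tree_to_rules(if_false, conditions + ("NOT " + test,))

    tree = ("outlook=sunny", ("humidity=high", "No", "Yes"), "Yes")
    for conds, label in tree_to_rules(tree):
        print(" AND ".join(conds) or "TRUE", "->", label)

Because each example follows exactly one root-to-leaf path, the printed rules are mutually
exclusive, as noted above.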

4. List two reasons why data mining is popular now when it wasn't as popular 20 years ago.

Faster computers, cheaper memory, more data being routinely recorded (e.g., popularity of the Web
and devices like smartphones), and to a lesser degree better algorithms.

5. How does an ordinal feature differ from a nominal feature? Explain in one or two sentences.

An ordinal feature is a nominal feature where there is a natural ordering of the attribute values.

6. Sally measures the pressure of all of the tires coming into her garage for an oil change and records the
values. Unknown to her, her tire gauge is miscalibrated and adds 3 psi to each reading. According to
the definition of noise used by our textbook, is this error introduced by the tire gauge considered
noise? Answer “yes” or “no” and justify your answer.

No, since noise must be random, not systematic.



7. For a two-class classification problem, with a positive class P and a negative class N, we can
describe the performance of the algorithm using the following terms: TP, FP, TN, and FN.

a) What do each of these terms refer to?

TP: True Positive

TN: True Negative

FP: False Positive

FN: False Negative

b) Place the 4 terms listed above in part a into the appropriate slots in the table below.

                 Predicted
                 Positive   Negative
Actual Positive  TP         FN
       Negative  FP         TN

c) Provide the formula for accuracy in terms of TP, TN, FP, and FN.

(TP + TN) / (TP + TN + FP + FN)

d) Provide the formula for precision and recall using TP, TN, FP, and FN.

Precision = TP/(TP + FP)

Recall = TP/(TP + FN)
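
The formulas in parts (c) and (d) translate directly into code; here is a small sketch with
made-up counts:

    def accuracy(tp, tn, fp, fn):
        return (tp + tn) / (tp + tn + fp + fn)

    def precision(tp, fp):
        return tp / (tp + fp)

    def recall(tp, fn):
        return tp / (tp + fn)

    # Hypothetical confusion matrix: TP=40, TN=45, FP=5, FN=10.
    print(accuracy(40, 45, 5, 10))   # 0.85
    print(precision(40, 5))          # ~0.889
    print(recall(40, 10))            # 0.8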

8. If we build a classifier and evaluate it on the training set and the test set:

a) Which data set would we expect to have the higher accuracy: training set test set

Answer: the training set.

b) Which data set provides the best estimate of accuracy on new data: training set test set

Answer: the test set.

9. A learning curve shows the performance of a classifier as the training set size increases. Assume that
training set size is plotted on the x-axis and accuracy is plotted on the y-axis.

a) On the figure below, plot a typical/expected learning curve when the accuracy is measured on
the 1) training set data and 2) the test set data (i.e., draw two curves). Should there be any
difference? If so, comment on the expected difference.
[Figure: blank axes for the sketch; y-axis: Accuracy, x-axis: Training set size]

The most important point is that the curve for the test set data increases, steeply at first, and then
begins to plateau. Do not worry too much about the training set curve, since its expected shape is less
clear.

10. You need to split on attribute a1 in your decision tree. The attribute has 8 values. Why might a two-way
split be better than an 8-way split? What might be a problem with the 8-way split?

The 8-way split can lead to the problem of data fragmentation. The data will be split up excessively
leaving smaller amounts of data available for future splits.

Entropy(t) = −Σ_j p(j|t) log p(j|t)          GINI(t) = 1 − Σ_j [p(j|t)]^2

11. Given a training set with 5+ and 10- examples,

a) What is the entropy value associated with this data set? You need not simplify your answer to
get a numerical answer.

-(1/3)log(1/3) - (2/3)log(2/3)

b) What is the Gini associated with this data set? In this case you should simplify your result,
although you may express the answer as a fraction rather than a decimal.

1 − [(1/3)^2 + (2/3)^2] = 1 − 1/9 − 4/9 = 1 − 5/9 = 4/9



c) If you generated a decision tree with just the root node for the examples in this data set, what
class value would you assign and what would be the training-set error rate associated with this
(very short) decision tree?

Majority class is the negative class, so classify it as negative. The training error rate is then 5/15 = 1/3.
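
The arithmetic in parts (a) through (c) can be checked with a few lines of Python:

    import math

    counts = [5, 10]                            # 5 positive, 10 negative examples
    probs = [c / sum(counts) for c in counts]   # [1/3, 2/3]

    entropy = -sum(p * math.log2(p) for p in probs)
    gini = 1 - sum(p ** 2 for p in probs)
    error = min(counts) / sum(counts)           # error of the majority-class tree

    print(entropy)   # ~0.918
    print(gini)      # ~0.444 (= 4/9)
    print(error)     # ~0.333 (= 1/3)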

12. The nearest neighbor algorithm relies on having a good notion of similarity, or distance. In class we
discussed several factors that can make it non-trivial to have a good similarity metric. What were two
of the factors?

A good similarity metric requires that the scales of the features are similar. For example, if one
feature varies from 1 to 100 and another from 1 to 1,000,000, then there is a problem and the values
should be rescaled. Another problem is that some features may be much less important than others,
and yet by default all features are considered equally important. Also, redundant or highly
correlated features will throw off the distance metric, because the related features will be
overweighted.
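
As a sketch of the rescaling fix for the first problem (the feature values below are made up):
min-max scaling maps each feature onto [0, 1] so that no single feature dominates the distance
computation.

    def min_max_scale(values):
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]

    incomes = [20_000, 50_000, 1_000_000]   # spans roughly 10^6
    ages = [18, 45, 90]                     # spans roughly 10^2
    print(min_max_scale(incomes))           # both now lie in [0, 1]
    print(min_max_scale(ages))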

13. What classifier induction algorithm can effectively generate the most expressive classifiers, in terms
of the decision boundaries that can be formed? Which is the least expressive? Rank order them from
most to least expressive. Briefly justify your ordering.

The induction algorithms are: decision trees, linear classifiers, and nearest neighbor.

Most expressive: nearest neighbor

Middle: decision trees

Least expressive: linear classifier

Justification: a nearest-neighbor classifier can form arbitrarily complex piecewise decision
boundaries, a decision tree is limited to boundaries built from axis-parallel segments, and a linear
classifier can form only a single hyperplane.

14. What is the curse of dimensionality?

The curse of dimensionality is that as the number of features increases, the data points become
increasingly sparse in the instance space, which makes it harder to find patterns. For example, with 100
data points and one variable the space is likely dense, but with 100 features the same points will be
spread very thinly.
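
A rough way to see this empirically (the point counts here are arbitrary): hold the number of
random points fixed and watch the average nearest-neighbor distance grow with the dimension.

    import random

    def avg_nearest_neighbor_distance(n_points, dim):
        pts = [[random.random() for _ in range(dim)] for _ in range(n_points)]
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
        # Average, over all points, of the distance to the closest other point.
        return sum(min(dist(p, q) for q in pts if q is not p)
                   for p in pts) / n_points

    for d in (1, 10, 100):
        print(d, avg_nearest_neighbor_distance(100, d))
    # Distances grow with d: the same 100 points become ever sparser.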

15. What does it mean if the rule set for a rule learner is exhaustive?

It means that the rules will collectively cover every possible example.

16. Does the Ripper rule learner build rules from general to specific or specific to general?

It builds rules from general to specific. It starts with a rule where the antecedent has no conditions and
then adds conditions one at a time.
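
Here is a toy sketch of that general-to-specific growth (the tiny dataset and the precision-based
scoring are simplifications assumed for illustration; Ripper's actual growing heuristic is FOIL's
information gain):

    # Start with an empty antecedent (covers everything) and greedily add
    # the condition that most improves precision on the positive class.
    data = [({"sky": "sunny", "windy": "no"}, True),
            ({"sky": "sunny", "windy": "yes"}, False),
            ({"sky": "rainy", "windy": "no"}, False)]
    conditions = [("sky", "sunny"), ("sky", "rainy"),
                  ("windy", "no"), ("windy", "yes")]

    def precision(rule):
        covered = [label for ex, label in data
                   if all(ex[attr] == val for attr, val in rule)]
        return sum(covered) / len(covered) if covered else 0.0

    rule = []
    while precision(rule) < 1.0:   # stops once the rule is pure on this toy data
        rule.append(max(conditions, key=lambda c: precision(rule + [c])))
    print(rule)   # [('sky', 'sunny'), ('windy', 'no')]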

17. (4 points) We generally will be more interested in association rules with high confidence. However,
often we will not be interested in association rules that have a confidence of 100%. Why? Then
specifically explain why association rules with 99% confidence may be interesting (i.e., what might
they indicate)?

While we generally prefer association rules with high confidence, a rule with 100% confidence most
likely represents some already known fact or policy (e.g., checking account → savings account may
just indicate that all customers are required to have a checking account if they have a savings
account). Rules with 99% confidence are interesting not because of the 99% part but because of the
1% part. These are the exceptions to the rule. They may indicate, for example, that a policy is being
violated. They might also indicate that there is a data entry error. Either way, it would be interesting
to understand why the 1% do not follow the general pattern.
18. (4 points) The algorithm that we used to do association rule mining is the Apriori algorithm. This
algorithm is efficient because it relies on and exploits the Apriori property. What is the Apriori
property?
The Apriori property states that if an itemset is frequent then all of its subsets must also be frequent.
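
The property is exploited in the candidate-pruning step; a minimal sketch (the tuple-based itemset
representation is assumed for illustration):

    from itertools import combinations

    # A size-k candidate survives only if every (k-1)-subset was frequent
    # in the previous pass; otherwise the Apriori property rules it out.
    def prune(candidates, prev_frequent):
        prev = set(map(frozenset, prev_frequent))
        return [c for c in candidates
                if all(frozenset(sub) in prev
                       for sub in combinations(c, len(c) - 1))]

    print(prune([("A", "B", "D"), ("A", "B", "E")],
                [("A", "B"), ("A", "D"), ("B", "D")]))
    # Only ('A', 'B', 'D') survives: {A, E} and {B, E} were not frequent.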

19. (4 points) Discuss the basic difference between the agglomerative and divisive hierarchical
clustering algorithms and mention which type of hierarchical clustering algorithm is more commonly
used.

Agglomerative methods start with each object as an individual cluster and then incrementally
build larger clusters by merging clusters. Divisive methods, on the other hand, start with all points
belonging to one cluster and then split apart a cluster at each iteration. The agglomerative method is
more common.
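
A tiny agglomerative run using SciPy (the four 2-D points are made up; linkage() repeatedly
merges the two closest clusters until one remains):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    X = np.array([[0, 0], [0, 1], [5, 5], [5, 6]])
    Z = linkage(X, method="average")               # group-average distance
    print(fcluster(Z, t=2, criterion="maxclust"))  # e.g., [1 1 2 2]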

20. (4 points) Are the two clusters shown below well separated? Circle an answer: Yes No
Now in one or two sentences justify your answer.

The clusters are not well separated because some points in each cluster are closer to points in the
other cluster than to points in their own cluster.

[Figure: two small clusters of points, each drawn with '+' marks, lying close to one another]

Long Problem (33 points)

1. A database has 4 transactions, shown below.

TID Date items_bought


T100 10/15/04 {K, A, D, B}
T200 10/15/04 {D, A, C, E, B}
T300 10/19/04 {C, A, B, E}
T400 10/22/04 {B, A, D}

Assuming a minimum level of support min_sup = 60% and a minimum level of confidence
min_conf = 80%:
(a) Find all frequent itemsets (not just the ones with the maximum width/length) using the
Apriori algorithm. Show your work—just showing the final answer is not acceptable. For
each iteration show the candidate and acceptable frequent itemsets. You should show your
work similar to the way the example was done in the PowerPoint slides.

Answer:

C1/L1:

Itemset      Support Count
{A}          4
{B}          4
{C}          2
{D}          3
{E}          2
{K}          1

C2/L2:

Itemset      Support Count
{A, B}       4
{A, D}       3
{B, D}       3

C3/L3:

Itemset      Support Count
{A, B, D}    3

The final answer is: {{A}, {B}, {D}, {A, B}, {A, D}, {B, D}, {A, B, D}}
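
The support counts in the tables above can be double-checked directly from the four transactions
(min_sup of 60% of 4 transactions means a count of at least 3):

    transactions = [set("KADB"), set("DACEB"), set("CABE"), set("BAD")]

    def support_count(itemset):
        # Number of transactions containing every item in the itemset.
        return sum(itemset <= t for t in transactions)

    for itemset in [{"A"}, {"B"}, {"D"}, {"A", "B"}, {"A", "D"},
                    {"B", "D"}, {"A", "B", "D"}]:
        print(sorted(itemset), support_count(itemset))
    # Every itemset printed has count >= 3, i.e., support >= 75%.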

(b) List all of the strong association rules, along with their support and confidence values, which
match the following metarule, where X is a variable representing customers and item_i denotes
variables representing items (e.g., “A”, “B”, etc.).

∀X ∈ transaction, buys(X, item1) ∧ buys(X, item2) → buys(X, item3)

Hint: don’t worry about the fact that the statement above uses relations. The point of the
metarule is to tell you to only worry about association rules of the form X ∧ Y → Z (or {X,
Y} → Z if you prefer that notation). That is, you don’t need to worry about rules of the form
X → Z.
Grading: This part is worth 4 points. Each of the strong association rules is worth 2
points.

Answer:
buys(X, A) ∧ buys(X, B) → buys(X, D) (75%, 75%) Not Strong
buys(X, A) ∧ buys(X, D) → buys(X, B) (75%, 100%) Strong
buys(X, B) ∧ buys(X, D) → buys(X, A) (75%, 100%) Strong
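
These confidence values follow from conf(X → Y) = sup(X ∪ Y) / sup(X); a quick check:

    transactions = [set("KADB"), set("DACEB"), set("CABE"), set("BAD")]

    def support_count(s):
        return sum(s <= t for t in transactions)

    def confidence(lhs, rhs):
        return support_count(lhs | rhs) / support_count(lhs)

    print(confidence({"A", "B"}, {"D"}))   # 0.75 -> below min_conf = 80%
    print(confidence({"A", "D"}, {"B"}))   # 1.0  -> strong
    print(confidence({"B", "D"}, {"A"}))   # 1.0  -> strong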
