Data Mining Exercises - Solutions
Python is an easy-to-use, easy-to-read, expressive, object-oriented, open-source, and portable
programming language with many libraries for data mining algorithms.
<class 'tuple'>
[2.0, 100, 5]
3. (10 pts) Please make the necessary change in the given code so that it doesn’t give the following
error message and works as commented:
4. What is the output of the following code? (Hint: if the loop test is False, execution jumps to
the else: clause.)
4 320
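The hint about the else: clause can be illustrated with a minimal sketch. This is not the code from the exercise; the function name first_multiple and its inputs are hypothetical and chosen only to show when the else: clause of a while loop runs.

```python
def first_multiple(numbers, divisor):
    """Return the first element divisible by divisor, or None if there is none."""
    index = 0
    while index < len(numbers):
        if numbers[index] % divisor == 0:
            result = numbers[index]
            break  # leaving the loop through break skips the else: clause
        index += 1
    else:
        # Reached only when the loop test became False (no break happened).
        result = None
    return result


print(first_multiple([3, 5, 8, 9], 4))
print(first_multiple([3, 5], 4))
```

The first call leaves the loop through break, so the else: clause is skipped; the second call exhausts the list, the loop test becomes False, and the else: clause runs.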
5. What is the output of the following code?
True
False
None
6. We are writing a sublist function that compares two lists and returns True if the first list (lst1) is
a sublist of (is contained inside) the second list (lst2). We created a filtered version of the second
list, ls2, by eliminating all elements of lst2 that are not in the first list, to see if the final lists are
the same. However, even though the final lists contain the same elements, the comparison does not
give the expected result. What property of lists can we use in the comparison ( ?==? ) so that the
function gives the correct result (True) in the given example above?
Line 4 must be → return sorted(lst1) == sorted(ls2)
Note: Another sublist function given in the Apriori algorithm code runs faster.
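The original listing is not reproduced here, so the following is only a reconstruction of the idea described in the question, assuming ls2 is built with a list comprehension. It shows why sorting both lists before comparing matters: list equality in Python depends on element order.

```python
def sublist(lst1, lst2):
    """Return True if every element of lst1 is contained in lst2 (sketch)."""
    # Keep only the elements of lst2 that also appear in lst1.
    ls2 = [element for element in lst2 if element in lst1]
    # Lists compare element by element, so sort both to ignore ordering.
    return sorted(lst1) == sorted(ls2)


print(sublist([3, 1], [1, 2, 3]))
print([3, 1] == [1, 3])
```

In the first call ls2 becomes [1, 3], which has the same elements as lst1 but in a different order; a plain == comparison, as in the second line, is False for that reason.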
7. What is the output of the following code?
25
81
75
In predictive modeling, a labeled dataset is split into two parts: a training
dataset and a test dataset. A model is built using the training dataset, and the
test dataset is fed into the model to predict its labels. The actual labels of the
test dataset are compared with the predicted labels to evaluate the model's performance.
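The split, fit, predict, and compare workflow can be sketched with the standard library alone. Everything in this example is an assumption made for illustration: the toy dataset, the helper names, and the use of a 1-nearest-neighbour rule as a stand-in for whatever model is actually being evaluated.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def predict_1nn(train, x):
    """Predict the label of x as the label of the closest training value."""
    nearest = min(train, key=lambda record: abs(record[0] - x))
    return nearest[1]


def accuracy(train, test):
    """Fraction of test records whose predicted label matches the actual label."""
    correct = sum(1 for x, label in test if predict_1nn(train, x) == label)
    return correct / len(test)


# Hypothetical labeled dataset: one numeric attribute and a binary label.
data = [(x, "low" if x < 50 else "high") for x in range(0, 100, 5)]
train, test = train_test_split(data)
score = accuracy(train, test)
print(len(train), len(test), score)
```

The model only ever sees the training records; the test labels are used solely in the final comparison, which is what makes the accuracy an estimate of performance on unseen data.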
Classification techniques are most suited for predicting or describing data sets
with binary or nominal categories. Decision trees and rule-based classifiers
are examples.
10. At one stage in K-Means clustering of the given data set with two
attributes, the distances of the points to each centroid are given in the following
table. What will be the centroid coordinates in the next stage?
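Each new centroid is the arithmetic mean (a.m.) of the points assigned to it:
C1 = [a.m.(X0,X1,X9), a.m.(Y0,Y1,Y9)] =
[(12+28+55)/3, (39+30+14)/3]
Similarly
C2 = [a.m.(X8,X10,X11), a.m.(Y8,Y10,Y11)] =
[(53+64+69)/3, (23+19+7)/3]
and
C3 = [(29+24+45+52+52+55)/6, (54+55+63+70+63+58)/6]
Answer: C1 ≈ [31.67, 27.67], C2 ≈ [62.00, 16.33], C3 ≈ [42.83, 60.50]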
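The centroid update can be checked with a short script. The coordinates are taken from the sums in the solution; pairing the X and Y values of the third cluster position by position is an assumption, although the means do not depend on how the values are paired.

```python
def centroid(points):
    """Return the arithmetic mean of the x and y coordinates of the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# Points assigned to each cluster, read off the sums in the solution.
cluster1 = [(12, 39), (28, 30), (55, 14)]
cluster2 = [(53, 23), (64, 19), (69, 7)]
cluster3 = [(29, 54), (24, 55), (45, 63), (52, 70), (52, 63), (55, 58)]

new_centroids = [centroid(c) for c in (cluster1, cluster2, cluster3)]
for name, (cx, cy) in zip(("C1", "C2", "C3"), new_centroids):
    print(f"{name} = [{cx:.2f}, {cy:.2f}]")
```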
11. How many different splits can be made on the dataset given below?
Note: Use the “Entropy” measure for information gain given by the following
formula:
Id  a1  a2  a3   Class
1   T   T   1.0  +
2   T   T   6.0  +
3   T   F   5.0  -
4   F   F   4.0  +
5   F   T   7.0  -
6   F   T   3.0  -
7   F   F   8.0  -
8   T   F   7.0  +
9   F   T   5.0  -
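As a rough check of the entropy-based evaluation, the sketch below computes the information gain of the two binary attributes on this dataset, using the standard definition of entropy as the negative sum of p * log2(p) over the class fractions. It also lists candidate thresholds for the continuous attribute a3, assuming binary splits at the midpoints between consecutive distinct values; that convention is an assumption and may differ from the one intended in the exercise.

```python
from collections import Counter
from math import log2

# Rows of the table: (a1, a2, a3, class label).
records = [
    ("T", "T", 1.0, "+"), ("T", "T", 6.0, "+"), ("T", "F", 5.0, "-"),
    ("F", "F", 4.0, "+"), ("F", "T", 7.0, "-"), ("F", "T", 3.0, "-"),
    ("F", "F", 8.0, "-"), ("T", "F", 7.0, "+"), ("F", "T", 5.0, "-"),
]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column):
    """Parent entropy minus the weighted entropy after splitting on a column."""
    parent = entropy([row[-1] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row[-1])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted


gain_a1 = information_gain(records, 0)
gain_a2 = information_gain(records, 1)

# Candidate thresholds for a3: midpoints between consecutive distinct values.
values = sorted({row[2] for row in records})
thresholds = [(low + high) / 2 for low, high in zip(values, values[1:])]

print(f"gain(a1) = {gain_a1:.4f}, gain(a2) = {gain_a2:.4f}")
print("a3 thresholds:", thresholds)
```

On this data, a1 gives a noticeably larger gain (about 0.229) than a2 (about 0.007), so a1 would be preferred of the two binary attributes.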