ML Exercises 4 5 6 en
Chapter 4
(Bayes Classifier)
4.1 A study at a university found that 15% of undergraduate students smoke and 23% of graduate
students smoke. If 1/5 of the students in the university are graduate students and the rest are
undergraduate students, what is the probability that a student who smokes is a graduate student?
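One way to set this up, as a sketch with G denoting "is a graduate student" and S denoting "smokes" (the event symbols are illustrative, not part of the exercise statement), is Bayes' rule:

```latex
P(G \mid S) = \frac{P(S \mid G)\,P(G)}{P(S \mid G)\,P(G) + P(S \mid \bar{G})\,P(\bar{G})}
            = \frac{0.23 \times 0.2}{0.23 \times 0.2 + 0.15 \times 0.8}
```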
4.3 State the difference between k-nearest neighbor algorithm and Naïve Bayes in classification.
4.4 State the assumption about the characteristics of the dataset that allows us to apply the Naïve
Bayes classifier.
If we have a test pattern P with feature 1 = 0, feature 2 = 0, and feature 3 = 1, classify this
pattern using the Naïve Bayes classifier.
4.6. Given the dataset in Exercise 4.4. Since the attributes are not continuous, we apply the
following method to calculate the distance between two patterns with categorical attributes.
Given two patterns X and Y, each consisting of m categorical attributes, the distance between X
and Y is the total number of differences between the corresponding attribute values of the two
patterns. The smaller the total number of differences, the more similar the two patterns are.
That is:
d(X, Y) = Σ_{i=1}^{m} δ(x_i, y_i), where δ(x_i, y_i) = 0 if x_i = y_i and δ(x_i, y_i) = 1 otherwise.
Using this distance measure, apply the 1-nearest-neighbor algorithm to classify the test pattern
P = (0, 0, 1), based on the dataset given in Exercise 4.4.
Compare the results of the two classification methods: the 1-nearest-neighbor algorithm and Naïve
Bayes (Exercise 4.4).
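A minimal sketch of this mismatch-count distance together with 1-nearest-neighbor classification; the training arrays are again hypothetical placeholders, since the Exercise 4.4 table is not reproduced here.

```python
def categorical_distance(x, y):
    """Number of attribute positions where the two patterns differ."""
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

def one_nn_predict(X_train, y_train, x):
    """Return the class label of the training pattern closest to x."""
    distances = [categorical_distance(x, xt) for xt in X_train]
    return y_train[distances.index(min(distances))]

# Hypothetical placeholder data (NOT the Exercise 4.4 table).
X_train = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (0, 0, 1)]
y_train = ["A", "A", "B", "B", "A"]
print(one_nn_predict(X_train, y_train, (0, 0, 1)))   # classify P = (0, 0, 1)
```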
Chapter 5
(Decision Trees)
5.2. Consider the following data set for a binary classification problem. Each pattern has two
binary attributes and one class label (+ or -).
A B Class label
T F +
T T +
T T +
T F -
T T +
F F -
F F -
F F -
T T -
T F -
Use information gain to determine the splitting attribute. Which of the attributes is selected as
the splitting attribute at the root node of the decision tree for this data set?
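One way to carry out the computation is sketched below: Shannon entropy (in bits) and the resulting information gain for each of the two attributes on the table above.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    """Entropy reduction obtained by splitting on the given attribute."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(r[attribute_index] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attribute_index] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Data set of Exercise 5.2: attributes (A, B) and class labels.
rows = [("T", "F"), ("T", "T"), ("T", "T"), ("T", "F"), ("T", "T"),
        ("F", "F"), ("F", "F"), ("F", "F"), ("T", "T"), ("T", "F")]
labels = ["+", "+", "+", "-", "+", "-", "-", "-", "-", "-"]
for name, idx in (("A", 0), ("B", 1)):
    print(name, round(information_gain(rows, labels, idx), 4))
```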
5.3. Consider the following Weather dataset for a binary classification problem. Each pattern has
four discrete attributes and one class label (Yes or No).
Use information gain to determine the splitting attribute. Which of the attributes is selected as
the splitting attribute at the root node of the decision tree for this data set?
5.4 (True/false) The depth of a learned decision tree can be larger than the number of training
examples used to create the tree.
Chapter 6
(Clustering)
6.1 State the difference between supervised learning (classification) and unsupervised learning
(clustering).
6.3 Given a set of n patterns that must be clustered into two clusters, how many such partitions
are there?
6.5 In agglomerative hierarchical clustering, how do we select the most suitable pair of clusters
to merge from among the current clusters?
6.8 State the strong points and weak points of k-means algorithm.
6.9. State the computational complexity of agglomerative hierarchical clustering and divisive
hierarchical clustering.
6.13 State the similarity between two clustering algorithms: k-means and fuzzy-c-means.
6.14 Given a set of 2-dimensional patterns: X1 = (1, 6), X2 = (2, 5), X3 = (3, 8), X4 = (4, 4),
X5 = (5, 7), X6 = (6, 9). Apply fuzzy c-means with k = 2 to cluster this dataset. Assume that at a
certain iteration, the dataset is grouped into 2 clusters with the membership weights as follows.
            X1    X2    X3    X4    X5    X6
Cluster c1  0.8   0.9   0.7   0.3   0.5   0.2
Cluster c2  0.2   0.1   0.3   0.7   0.5   0.8
Perform the next iteration, which consists of two steps: recalculating the centroids and
reassigning the membership weights for each pattern.
Note: Euclidean distance is used in the fuzzy-c-means.
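A minimal sketch of one such iteration is given below. The exercise does not state the fuzzifier, so the standard choice m = 2 is assumed here; with a different fuzzifier the centroid and membership updates change accordingly.

```python
import numpy as np

# Patterns X1..X6 and the current membership weights from the exercise.
X = np.array([(1, 6), (2, 5), (3, 8), (4, 4), (5, 7), (6, 9)], dtype=float)
U = np.array([[0.8, 0.9, 0.7, 0.3, 0.5, 0.2],    # cluster c1
              [0.2, 0.1, 0.3, 0.7, 0.5, 0.8]])   # cluster c2
m = 2.0                                          # fuzzifier (assumed; not given in the exercise)

# Step 1: recalculate the centroids as membership-weighted means.
W = U ** m
centroids = (W @ X) / W.sum(axis=1, keepdims=True)

# Step 2: reassign membership weights from the Euclidean distances to the new centroids.
dist = np.linalg.norm(X[None, :, :] - centroids[:, None, :], axis=2)   # shape (2, 6)
U_new = 1.0 / (dist ** (2.0 / (m - 1.0)))
U_new /= U_new.sum(axis=0, keepdims=True)        # normalize so each column sums to 1

print(np.round(centroids, 3))
print(np.round(U_new, 3))
```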
6.18 Give an example in which clustering can be used as a preprocessing step for another data
classification task.
6.19 Explain the term incremental clustering. State the weak point of the Leader algorithm for
incremental clustering.
A = (1, 1), B = (1, 2), C = (2, 2), D = (6, 2), E = (7, 2), F = (6, 6), G = (7, 6)
Apply the Leader algorithm to cluster this dataset. Assume that the data are processed in the
order A, B, C, D, E, F, G, and that the user-specified threshold T is 3.
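A minimal sketch of the Leader algorithm under these settings (the first pattern becomes a leader; each subsequent pattern joins the first existing cluster whose leader is within distance T, otherwise it starts a new cluster):

```python
from math import dist   # Euclidean distance (Python 3.8+)

def leader_clustering(points, threshold):
    """Single-pass Leader algorithm: returns a list of (leader, members) clusters."""
    clusters = []                                    # each entry: ((name, coords), [member names])
    for name, p in points:
        for leader, members in clusters:
            if dist(p, leader[1]) <= threshold:      # within T of an existing leader
                members.append(name)
                break
        else:                                        # no leader close enough: start a new cluster
            clusters.append(((name, p), [name]))
    return clusters

data = [("A", (1, 1)), ("B", (1, 2)), ("C", (2, 2)), ("D", (6, 2)),
        ("E", (7, 2)), ("F", (6, 6)), ("G", (7, 6))]
for (leader_name, _), members in leader_clustering(data, threshold=3):
    print(leader_name, members)
```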