ML Exercises 4 5 6 en
Chapter 4
(Bayes Classifier)
4.1 A study at a university found that 15% of undergraduate students smoke and 23% of graduate
students smoke. If 1/5 of the students in the university are graduate students and the rest are
undergraduate students, what is the probability that a student who smokes is a graduate student?
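One way to set this up, as a sketch with G denoting "is a graduate student" and S denoting "smokes" (the event symbols are illustrative, not part of the exercise statement), is Bayes' rule:

```latex
P(G \mid S) = \frac{P(S \mid G)\,P(G)}{P(S \mid G)\,P(G) + P(S \mid \bar{G})\,P(\bar{G})}
            = \frac{0.23 \times 0.2}{0.23 \times 0.2 + 0.15 \times 0.8}
```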
4.3 State the difference between k-nearest neighbor algorithm and Naïve Bayes in classification.
4.4 State the assumption about the characteristics of the dataset that allows us to apply the Naïve
Bayes classifier.
If we have a test pattern P with feature 1 = 0, feature 2 = 0, and feature 3 = 1, classify this
pattern using the Naïve Bayes classifier.
4.6. Given the dataset in Exercise 4.4. Since the attributes are not continuous, we apply the
following method to calculate the distance between two patterns with categorical attributes.
Given two patterns X and Y, each consisting of m categorical attributes, the distance between X
and Y is the total number of differences between the corresponding attribute values of the two
patterns. The smaller the total number of differences, the more similar the two patterns are.
That is:
d(X, Y) = Σ_{i=1}^{m} δ(x_i, y_i), where δ(x_i, y_i) = 0 if x_i = y_i and δ(x_i, y_i) = 1 otherwise.
Using this distance measure, apply the 1-nearest-neighbor algorithm to classify the test pattern
P = (0, 0, 1), based on the dataset given in Exercise 4.4.
Compare the results of the two classification methods: the 1-nearest-neighbor algorithm and Naïve
Bayes (Exercise 4.4).
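A minimal sketch of this mismatch-count distance together with 1-nearest-neighbor classification; the training arrays are again hypothetical placeholders, since the Exercise 4.4 table is not reproduced here.

```python
def categorical_distance(x, y):
    """Number of attribute positions where the two patterns differ."""
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

def one_nn_predict(X_train, y_train, x):
    """Return the class label of the training pattern closest to x."""
    distances = [categorical_distance(x, xt) for xt in X_train]
    return y_train[distances.index(min(distances))]

# Hypothetical placeholder data (NOT the Exercise 4.4 table).
X_train = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (0, 0, 1)]
y_train = ["A", "A", "B", "B", "A"]
print(one_nn_predict(X_train, y_train, (0, 0, 1)))   # classify P = (0, 0, 1)
```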
Chapter 5
(Decision Trees)
5.2. Consider the following data set for a binary classification problem. Each pattern has two
binary attributes and one class label (+ or -).
A B Class label
T F +
T T +
T T +
T F -
T T +
F F -
F F -
F F -
T T -
T F -
Use information gain to determine the splitting attribute. Which of the attributes is selected as
the splitting attribute at the root node of the decision tree for this data set?
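One way to carry out the computation is sketched below: Shannon entropy (in bits) and the resulting information gain for each of the two attributes on the table above.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    """Entropy reduction obtained by splitting on the given attribute."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(r[attribute_index] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attribute_index] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Data set of Exercise 5.2: attributes (A, B) and class labels.
rows = [("T", "F"), ("T", "T"), ("T", "T"), ("T", "F"), ("T", "T"),
        ("F", "F"), ("F", "F"), ("F", "F"), ("T", "T"), ("T", "F")]
labels = ["+", "+", "+", "-", "+", "-", "-", "-", "-", "-"]
for name, idx in (("A", 0), ("B", 1)):
    print(name, round(information_gain(rows, labels, idx), 4))
```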
5.3. Consider the following Weather dataset for a binary classification problem. Each pattern has
four discrete attributes and one class label (Yes or No).
Use information gain to determine the splitting attribute. Which of the attributes is selected as
the splitting attribute at the root node of the decision tree for this data set?
5.4 (True/false) The depth of a learned decision tree can be larger than the number of training
examples used to create the tree.
Chapter 6
(Clustering)
6.1 State the difference between supervised learning (classification) and unsupervised learning
(clustering).
6.3 Given a set of n patterns that must be clustered into two clusters, how many such partitions
are there?
6.5 In agglomerative hierarchical clustering, how do we select the most suitable pair of clusters
to merge from among the current clusters?
6.8 State the strong points and weak points of k-means algorithm.
6.9. State the computational complexity of agglomerative hierarchical clustering and divisive
hierarchical clustering.
6.13 State the similarity between two clustering algorithms: k-means and fuzzy-c-means.
6.14 Given a set of 2-dimensional patterns: X1 = (1, 6), X2 = (2, 5), X3 = (3, 8), X4 = (4, 4),
X5 = (5, 7), X6 = (6, 9). Apply fuzzy c-means with k = 2 to cluster this dataset. Assume that at a
certain iteration, the dataset is grouped into 2 clusters with the membership weights as follows.
            X1    X2    X3    X4    X5    X6
Cluster c1  0.8   0.9   0.7   0.3   0.5   0.2
Cluster c2  0.2   0.1   0.3   0.7   0.5   0.8
Perform the next iteration, which consists of two steps: recalculating the centroids and
reassigning the membership weights for each pattern.
Note: Euclidean distance is used in the fuzzy-c-means.
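A minimal sketch of one such iteration is given below. The exercise does not state the fuzzifier, so the standard choice m = 2 is assumed here; with a different fuzzifier the centroid and membership updates change accordingly.

```python
import numpy as np

# Patterns X1..X6 and the current membership weights from the exercise.
X = np.array([(1, 6), (2, 5), (3, 8), (4, 4), (5, 7), (6, 9)], dtype=float)
U = np.array([[0.8, 0.9, 0.7, 0.3, 0.5, 0.2],    # cluster c1
              [0.2, 0.1, 0.3, 0.7, 0.5, 0.8]])   # cluster c2
m = 2.0                                          # fuzzifier (assumed; not given in the exercise)

# Step 1: recalculate the centroids as membership-weighted means.
W = U ** m
centroids = (W @ X) / W.sum(axis=1, keepdims=True)

# Step 2: reassign membership weights from the Euclidean distances to the new centroids.
dist = np.linalg.norm(X[None, :, :] - centroids[:, None, :], axis=2)   # shape (2, 6)
U_new = 1.0 / (dist ** (2.0 / (m - 1.0)))
U_new /= U_new.sum(axis=0, keepdims=True)        # normalize so each column sums to 1

print(np.round(centroids, 3))
print(np.round(U_new, 3))
```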
6.18 Give an example in which clustering can be used as a preprocessing step for another data
classification task.
6.19 Explain the term incremental clustering. State the weak point of the Leader algorithm for
incremental clustering.
A = (1, 1), B = (1, 2), C = (2, 2), D = (6, 2), E = (7, 2), F = (6, 6), G = (7, 6)
Apply the Leader algorithm to cluster this dataset. Assume that the data are processed in the
order A, B, C, D, E, F, G, and that the user-specified threshold T is 3.
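A minimal sketch of the Leader algorithm under these settings (the first pattern becomes a leader; each subsequent pattern joins the first existing cluster whose leader is within distance T, otherwise it starts a new cluster):

```python
from math import dist   # Euclidean distance (Python 3.8+)

def leader_clustering(points, threshold):
    """Single-pass Leader algorithm: returns a list of (leader, members) clusters."""
    clusters = []                                    # each entry: ((name, coords), [member names])
    for name, p in points:
        for leader, members in clusters:
            if dist(p, leader[1]) <= threshold:      # within T of an existing leader
                members.append(name)
                break
        else:                                        # no leader close enough: start a new cluster
            clusters.append(((name, p), [name]))
    return clusters

data = [("A", (1, 1)), ("B", (1, 2)), ("C", (2, 2)), ("D", (6, 2)),
        ("E", (7, 2)), ("F", (6, 6)), ("G", (7, 6))]
for (leader_name, _), members in leader_clustering(data, threshold=3):
    print(leader_name, members)
```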