COMP1942 Question Paper
Instructions:
(1) Please answer all questions in Part A and Part B in the answer sheet.
(2) You can optionally answer the bonus question in Part C in the answer sheet. You can obtain additional
marks for the bonus question if you answer it correctly.
(3) You can use a calculator.
Q2 (20 Marks)
(a) Consider the forgetful sequential k-means clustering algorithm. Let a be a constant defined in this
algorithm.
(i) Please write down the steps of the forgetful sequential k-means clustering algorithm.
(ii) Consider a cluster found by the algorithm that contains n examples and whose initial mean is equal to
m0. Let xj be the j-th example added to this cluster and mj be the mean vector of this cluster after the
first j examples are added, for j = 1, 2, …, n. We can express mn in the following form.
$$m_n = X \cdot m_0 + \sum_{p=1}^{n} Y \cdot x_p$$
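The recursion behind part (ii) can be checked numerically. The sketch below is not taken from the paper: it assumes the commonly taught update rule m ← m + a(x − m) for the cluster that receives example x, covers only the mean update of a single cluster (not the nearest-cluster assignment step), and uses illustrative function names and sample values.

```python
# Sketch only: assumes the textbook update m <- m + a * (x - m) for the
# cluster that receives x. All names and sample values are illustrative.

def forgetful_update(mean, x, a):
    """Move `mean` a fraction `a` of the way towards the new example `x`."""
    return [m + a * (xi - m) for m, xi in zip(mean, x)]


def mean_after_all(m0, examples, a):
    """Apply the update once per example, in arrival order."""
    mean = list(m0)
    for x in examples:
        mean = forgetful_update(mean, x, a)
    return mean


def expanded_mean(m0, examples, a):
    """Unrolled recursion: (1-a)^n * m0 + sum over p of a * (1-a)^(n-p) * x_p."""
    n = len(examples)
    result = [(1 - a) ** n * m for m in m0]
    for p, x in enumerate(examples, start=1):
        weight = a * (1 - a) ** (n - p)
        result = [r + weight * xi for r, xi in zip(result, x)]
    return result


m0 = [0.0, 0.0]
examples = [[1.0, 2.0], [3.0, 1.0], [2.0, 4.0]]
a = 0.5

recursive = mean_after_all(m0, examples, a)
unrolled = expanded_mean(m0, examples, a)
print(recursive, unrolled)
```

Under this assumed update rule, both computations give the same vector, which shows how older examples are discounted geometrically as new ones arrive.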
Q3 (20 Marks)
Q4 (20 Marks)
The following shows a history of customers with their incomes, ages and an attribute called “Have_iPhone”
indicating whether they have an iPhone. We also indicate whether they will buy an iPad or not in the last
column. You cannot use XLMiner in this question.
No.  Income  Age    Have_iPhone  Buy_iPad
1    high    young  yes          yes
2    high    old    yes          yes
3    medium  young  no           yes
4    high    old    no           yes
5    medium  young  no           no
6    medium  young  no           no
7    medium  old    no           no
8    medium  old    no           no
We want to train a CART decision tree classifier to predict whether a new customer will buy an iPad or not.
We define the value of attribute Buy_iPad to be the label of a record.
(a) Please find a CART decision tree according to the above example. In the decision tree, whenever
a node contains at most 3 records, we do not continue to process this node for splitting.
(b) Consider a new young customer whose income is medium and he has an iPhone. Please predict
whether this new customer will buy an iPad or not.
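As a way to check hand calculations for the root split, the following sketch computes the Gini impurity and the impurity reduction of each attribute on the eight records above. It assumes the usual CART formulation (Gini impurity, reduction = parent impurity minus the size-weighted impurity of the children); the helper names are illustrative, and it evaluates only the first split rather than building the whole tree. Every attribute here takes exactly two values, so splitting by value is already a binary split.

```python
from collections import Counter

# Records from the table: (Income, Age, Have_iPhone, Buy_iPad).
records = [
    ("high", "young", "yes", "yes"),
    ("high", "old", "yes", "yes"),
    ("medium", "young", "no", "yes"),
    ("high", "old", "no", "yes"),
    ("medium", "young", "no", "no"),
    ("medium", "young", "no", "no"),
    ("medium", "old", "no", "no"),
    ("medium", "old", "no", "no"),
]
ATTRIBUTES = {"Income": 0, "Age": 1, "Have_iPhone": 2}
LABEL = 3


def gini(rows):
    """Gini impurity: 1 minus the sum of squared label proportions."""
    counts = Counter(row[LABEL] for row in rows)
    total = len(rows)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def gini_gain(rows, column):
    """Impurity reduction obtained by splitting `rows` on one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row)
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return gini(rows) - weighted


gains = {name: gini_gain(records, col) for name, col in ATTRIBUTES.items()}
best = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: {gain:.4f}")
print("largest reduction:", best)
```

The same two functions can be reapplied to the records in any child node that still holds more than 3 records.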
Q5. [Removed]
A. [Removed]
B. [Removed]
C. [Removed]
D. [Removed]
E. [Removed]
Q6. [Removed]
A. [Removed]
B. [Removed]
C. [Removed]
D. [Removed]
E. [Removed]
Q7. [Removed]
A. [Removed]
B. [Removed]
C. [Removed]
D. [Removed]
E. [Removed]
Q8. [Removed]
A. [Removed]
B. [Removed]
C. [Removed]
D. [Removed]
E. [Removed]
We are given four items, namely A, B, C and D. Their corresponding unit profits are pA, pB, pC and pD.
The following shows five transactions with these items. Each row corresponds to a transaction, and each
non-negative integer in the row is the total number of occurrences of the corresponding item in that
transaction.
A  B  C  D
0  0  3  2
3  4  0  0
0  0  1  3
1  0  3  5
6  0  0  0
The frequency of an itemset in a row is defined to be the minimum of the number of occurrences of all items
in the itemset. For example, itemset {C, D} has frequency = 2 in the first row, but frequency = 1 in the
third row.
The frequency of an itemset in the dataset is defined to be the sum of the frequencies of the itemset in all
rows in the dataset. For example, itemset {C, D} has frequency = 2+0+1+3+0 = 6.
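The two frequency definitions above can be written down directly. The sketch below uses illustrative names (`row_frequency`, `dataset_frequency`) that do not come from the paper, and reproduces the {C, D} example.

```python
# Transactions from the table; columns are in the order A, B, C, D.
ITEMS = ("A", "B", "C", "D")
TRANSACTIONS = [
    (0, 0, 3, 2),
    (3, 4, 0, 0),
    (0, 0, 1, 3),
    (1, 0, 3, 5),
    (6, 0, 0, 0),
]


def row_frequency(row, itemset):
    """Frequency in one row: the minimum occurrence count over the items."""
    return min(row[ITEMS.index(item)] for item in itemset)


def dataset_frequency(itemset):
    """Frequency in the dataset: the sum of the per-row frequencies."""
    return sum(row_frequency(row, itemset) for row in TRANSACTIONS)


cd_per_row = [row_frequency(row, {"C", "D"}) for row in TRANSACTIONS]
cd_total = dataset_frequency({"C", "D"})
print(cd_per_row, cd_total)  # [2, 0, 1, 3, 0] 6
```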
Define a function f on an itemset s. This function will be specified later. One example of this function is
$f(s) = \sum_{i \in s} p_i$. In this example, if s = {C, D}, then f(s) = pC + pD.
The profit of an itemset s in the dataset is defined to be the product of the frequency of this itemset in the
dataset and f(s).
For example, itemset {C, D} has profit = 6 · f({C, D}).
(a) Assume that we adopt function f such that $f(s) = \left(\sum_{i \in s} p_i\right) / |s|$, where |s| denotes the number of items in s.
Suppose that we know that pA = 10, pB = 10, pC = 10 and pD = 10.
We want to find all itemsets with profit at least 50.
Can the Apriori Algorithm be adapted to find these itemsets?
If yes, please write down the pseudo-code and illustrate it with the above example.
If no, please explain why the Apriori Algorithm cannot be adapted. In this case, please also design
an algorithm, write down the pseudo-code and illustrate it with the above example.
(b) Assume that we adopt function f such that $f(s) = \sum_{i \in s} p_i$.
Suppose that we know that pA = 5, pB = 10, pC = 6 and pD = 4.
We want to find all itemsets with profit at least 50.
Can the Apriori Algorithm be adapted to find these itemsets?
If yes, please write down the pseudo-code and illustrate it with the above example.
If no, please explain why the Apriori Algorithm cannot be adapted. In this case, please also design
an algorithm, write down the pseudo-code and illustrate it with the above example.
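Any answer to parts (a) and (b) can be cross-checked against an exhaustive enumeration of all 15 non-empty itemsets. The sketch below is only such a brute-force check, not an adaptation of the Apriori Algorithm; the function names are illustrative, and the profits and threshold are the ones stated in each part.

```python
from itertools import combinations

# Transactions from the table; columns are in the order A, B, C, D.
ITEMS = ("A", "B", "C", "D")
TRANSACTIONS = [
    (0, 0, 3, 2),
    (3, 4, 0, 0),
    (0, 0, 1, 3),
    (1, 0, 3, 5),
    (6, 0, 0, 0),
]


def frequency(itemset):
    """Sum over rows of the minimum occurrence count of the items."""
    columns = [ITEMS.index(item) for item in itemset]
    return sum(min(row[c] for c in columns) for row in TRANSACTIONS)


def f_sum(itemset, profits):
    """f(s) = sum of the unit profits of the items in s (part (b))."""
    return sum(profits[item] for item in itemset)


def f_avg(itemset, profits):
    """f(s) = average unit profit of the items in s (part (a))."""
    return f_sum(itemset, profits) / len(itemset)


def profitable_itemsets(f, profits, threshold):
    """Brute force: test every non-empty itemset against the threshold."""
    found = {}
    for size in range(1, len(ITEMS) + 1):
        for itemset in combinations(ITEMS, size):
            profit = frequency(itemset) * f(itemset, profits)
            if profit >= threshold:
                found[itemset] = profit
    return found


part_a = profitable_itemsets(f_avg, {"A": 10, "B": 10, "C": 10, "D": 10}, 50)
part_b = profitable_itemsets(f_sum, {"A": 5, "B": 10, "C": 6, "D": 4}, 50)
print("(a)", part_a)
print("(b)", part_b)
```

Comparing the output for a qualifying itemset with the profits of its subsets is a quick way to see whether profit behaves like support under each choice of f, which is the property a level-wise pruning step would rely on.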
End of Paper