COMP1942 Question Paper

This document is the question paper for the COMP1942 Exploring and Visualizing Data midterm examination from Spring Semester 2014. It contains three parts: Part A with 4 compulsory short questions worth 20 marks each, Part B with 4 compulsory multiple-choice questions worth 5 marks each (20 marks in total), and an optional Part C bonus question worth 10 additional marks. The paper covers association rule mining, clustering algorithms, decision trees, and profit calculations for itemsets.

COMP1942 Exploring and Visualizing Data (Spring Semester 2014)

Midterm Examination (Question Paper)
Date: 1 April, 2014 (Tue)
Time: 12:05-13:20
Duration: 1 hour 15 minutes

Student ID:_______________ Student Name:_______________

Seat No. :__________________

Instructions:
(1) Please answer all questions in Part A and Part B in the answer sheet.
(2) You can optionally answer the bonus question in Part C in the answer sheet. You can obtain additional
marks for the bonus question if you answer it correctly.
(3) You can use a calculator.

Part A (Compulsory Short Questions)

Q1 (20 Marks)

(a) Consider a data set containing 10 transactions and 6 items.

We know that the lift ratio of the association rule “{A, B} → C” is 1.25.
We also know that the support of {A} is 7, the support of {B} is 5, the support of {C} is 6, the support
of {A, B} is 4, the support of {A, C} is 5 and the support of {B, C} is 3.
Is it always true that we can find the support of “{A, B} → C”? If yes, please explain why and write down
the support of “{A, B} → C”. Otherwise, please elaborate.
(b) In the Apriori algorithm, we know how to find some sets L1, C2, L2, ….
(i) Is it always true that the number of itemsets in L2 is smaller than or equal to the number of itemsets
in C2? If yes, please explain it. Otherwise, please give a counter example.
(ii) Is it always true that the number of itemsets in C2 is larger than or equal to the number of itemsets in
L1? If yes, please explain it. Otherwise, please give a counter example.
(c) We know that conditional FP-trees are constructed from an FP-tree. Is it always true that we can
reconstruct the FP-tree from all of the conditional FP-trees constructed? Please elaborate.
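
The link between the lift ratio and the support counts in Q1(a) can be sketched as follows. This is a minimal illustration, assuming the usual definition lift(X → Y) = confidence(X → Y) / (support(Y) / N) with supports given as transaction counts. The function name `lift_ratio` and its parameters are hypothetical and do not come from the paper.

```python
from fractions import Fraction

def lift_ratio(n_transactions, supp_antecedent, supp_consequent, supp_rule):
    """Lift of X -> Y from support counts (assumed standard definition)."""
    confidence = Fraction(supp_rule, supp_antecedent)
    benchmark = Fraction(supp_consequent, n_transactions)
    return confidence / benchmark

# Counts from Q1(a): N = 10, supp({A, B}) = 4, supp({C}) = 6.
# Try every count for {A, B, C} up to supp({A, B}) and keep those whose lift is 1.25.
target = Fraction(5, 4)
matches = [s for s in range(0, 5) if lift_ratio(10, 4, 6, s) == target]
print(matches)
```

Exact fractions are used instead of floats so that the comparison with 1.25 does not depend on rounding.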

Q2 (20 Marks)

(a) Consider the forgetful sequential k-means clustering algorithm. Let a be a constant defined in this
algorithm.
(i) Please write down the steps of the forgetful sequential k-means clustering algorithm.
(ii) Consider a cluster found by the algorithm that contains n examples and whose initial mean is
m_0. Let x_j be the j-th example added to this cluster and m_j be the mean vector of this cluster after the
first j examples are added, for j = 1, 2, …, n. We can express m_n in the following form:

$$m_n = X \cdot m_0 + \sum_{p=1}^{n} Y \cdot x_p$$

where X and Y are some expressions.

Please show that m_n can be expressed in this form. After you show this statement, please also
write down what X is and what Y is.
(You are not required to memorize the formula for this question. You just need to show how you
obtain the above expression, from which you can then obtain X and Y.)
(b) We are given the following table with 3 input attributes, namely “Gender”, “Child” and “Income”, and
1 target attribute, namely “Insurance”. “Actual Insurance” corresponds to the actual values for
attribute “Insurance” and “Predicted Insurance” corresponds to the values for attribute “Insurance”
given by a classification model (e.g., decision tree).
Gender  Child  Income  Actual Insurance  Predicted Insurance
Male    Yes    High    Yes               Yes
Male    Yes    Low     No                Yes
Male    No     High    No                No
Female  Yes    High    Yes               No
Female  Yes    Medium  No                Yes
Female  No     Medium  Yes               Yes
Female  No     Low     No                No
(i) Please give the confusion matrix.
(ii) Please give the lift chart.
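
The tallying behind a confusion matrix for the table in Q2(b) can be sketched as follows. This is only an illustration under the assumption that the matrix counts each (actual, predicted) combination of "Insurance". The helper name `confusion_counts` is hypothetical.

```python
from collections import Counter

# (actual, predicted) values of "Insurance", copied row by row from the table in Q2(b).
rows = [
    ("Yes", "Yes"),
    ("No", "Yes"),
    ("No", "No"),
    ("Yes", "No"),
    ("No", "Yes"),
    ("Yes", "Yes"),
    ("No", "No"),
]

def confusion_counts(pairs):
    """Count how often each (actual, predicted) combination occurs."""
    return Counter(pairs)

counts = confusion_counts(rows)
for actual in ("Yes", "No"):
    for predicted in ("Yes", "No"):
        print(f"actual={actual:<3} predicted={predicted:<3} count={counts[(actual, predicted)]}")
```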

Q3 (20 Marks)

(a) Please give two reasons why we need to do clustering.

(b) We are given five data points.
a: (1, 2), b: (2, 4), c: (7, 6), d: (6, 9), e: (8, 9)
Suppose that there are two clusters. The first cluster contains points a and b while the second cluster
contains points c, d and e.
(i) (1) What is the center of the first cluster if we use the centroid linkage as a distance measurement?
(2) What is the center of the second cluster if we use the centroid linkage as a distance measurement?
(ii) Consider the agglomerative approach for hierarchical clustering.
Suppose that these two clusters are merged.
(1) What is the center of the merged cluster if we use the centroid linkage as a distance measurement?
(2) What is the center of the merged cluster if we use the median linkage as a distance measurement?
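
The difference between the two linkage measures in Q3(b) can be sketched as follows. This assumes the common definitions: under centroid linkage the center is the mean of all points in the cluster, and under median linkage the center of a merged cluster is the unweighted midpoint of the two centers being merged. The function names `centroid` and `midpoint` are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def midpoint(p, q):
    """Unweighted midpoint of two centers (assumed median-linkage rule)."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

cluster1 = [(1, 2), (2, 4)]          # points a, b
cluster2 = [(7, 6), (6, 9), (8, 9)]  # points c, d, e

c1 = centroid(cluster1)
c2 = centroid(cluster2)
merged_centroid = centroid(cluster1 + cluster2)  # weighted by cluster size
merged_median = midpoint(c1, c2)                 # ignores cluster size
print(c1, c2, merged_centroid, merged_median)
```

The two merged centers differ because the clusters have different sizes: the centroid is pulled towards the larger cluster, while the midpoint is not.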

Q4 (20 Marks)
The following shows a history of customers with their incomes, ages and an attribute called “Have_iPhone”
indicating whether they have an iPhone. We also indicate whether they will buy an iPad or not in the last
column. You cannot use XLMiner in this question.
No.  Income  Age    Have_iPhone  Buy_iPad
1    high    young  yes          yes
2    high    old    yes          yes
3    medium  young  no           yes
4    high    old    no           yes
5    medium  young  no           no
6    medium  young  no           no
7    medium  old    no           no
8    medium  old    no           no

We want to train a CART decision tree classifier to predict whether a new customer will buy an iPad or not.
We define the value of attribute Buy_iPad to be the label of a record.
(a) Please find a CART decision tree according to the above example. In the decision tree, whenever
a node contains at most 3 records, we do not continue to process this node for splitting.
(b) Consider a new young customer whose income is medium and who has an iPhone. Please predict
whether this new customer will buy an iPad or not.
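
How candidate splits at the root could be compared in Q4 can be sketched as follows. The paper does not state the splitting criterion, so this assumes that CART uses the Gini impurity, which is a common choice. The names `gini`, `gini_gain` and `ATTRIBUTES` are hypothetical, and the stopping rule from part (a) is not implemented here.

```python
from collections import Counter
from fractions import Fraction

# (Income, Age, Have_iPhone, Buy_iPad) for customers 1-8 in Q4.
records = [
    ("high", "young", "yes", "yes"),
    ("high", "old", "yes", "yes"),
    ("medium", "young", "no", "yes"),
    ("high", "old", "no", "yes"),
    ("medium", "young", "no", "no"),
    ("medium", "young", "no", "no"),
    ("medium", "old", "no", "no"),
    ("medium", "old", "no", "no"),
]
ATTRIBUTES = {"Income": 0, "Age": 1, "Have_iPhone": 2}

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum(Fraction(c, n) ** 2 for c in Counter(labels).values())

def gini_gain(rows, column):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    parent = gini([r[-1] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[column], []).append(r[-1])
    children = sum(Fraction(len(g), len(rows)) * gini(g) for g in groups.values())
    return parent - children

gains = {name: gini_gain(records, col) for name, col in ATTRIBUTES.items()}
print(gains)
```

Each attribute takes only two values in this table, so grouping by attribute value already gives a binary split.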

Part B (Compulsory Multiple-Choice (MC) Questions)

In this part, there are 4 multiple-choice questions, namely Q5, Q6, Q7 and Q8. This part is worth
20 marks in total. Each question is worth 5 marks.

Q5. [Removed]

A. [Removed]
B. [Removed]
C. [Removed]
D. [Removed]
E. [Removed]

Q6. [Removed]

A. [Removed]
B. [Removed]
C. [Removed]
D. [Removed]
E. [Removed]

Q7. [Removed]

A. [Removed]
B. [Removed]
C. [Removed]
D. [Removed]
E. [Removed]

Q8. [Removed]

A. [Removed]
B. [Removed]
C. [Removed]
D. [Removed]
E. [Removed]

Part C (Bonus Question)

Note: The following bonus question is an OPTIONAL question. You can decide whether you will answer
it or not.

Q9 (10 Additional Marks)

We are given four items, namely A, B, C and D. Their corresponding unit profits are pA, pB, pC and pD.

The following shows five transactions with these items. Each row corresponds to a transaction, where each
non-negative integer in the row is the total number of occurrences of the corresponding
item in the transaction.
A B C D
0 0 3 2
3 4 0 0
0 0 1 3
1 0 3 5
6 0 0 0
The frequency of an itemset in a row is defined to be the minimum of the number of occurrences of all items
in the itemset. For example, itemset {C, D} in the first row has frequency = 2. But, itemset {C, D} in the
third row has frequency = 1.
The frequency of an itemset in the dataset is defined to be the sum of the frequencies of the itemset in all
rows in the dataset. For example, itemset {C, D} has frequency = 2+0+1+3+0 = 6.
Define a function f on an itemset s. This function will be specified later. One example of this function is
$f(s) = \sum_{i \in s} p_i$. In this example, if s = {C, D}, then f(s) = pC + pD.
The profit of an itemset s in the dataset is defined to be the product of the frequency of this itemset in the
dataset and f(s).
For example, itemset {C, D} has profit = 6 · f({C, D}).
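
The frequency and profit definitions above can be sketched as follows. This is a minimal illustration that follows the stated definitions directly. The names `row_frequency`, `frequency` and `profit` are hypothetical, and f is passed in as a parameter because the paper specifies it later.

```python
ITEMS = ("A", "B", "C", "D")
# The five transactions from Q9, one tuple of occurrence counts per row.
TRANSACTIONS = [
    (0, 0, 3, 2),
    (3, 4, 0, 0),
    (0, 0, 1, 3),
    (1, 0, 3, 5),
    (6, 0, 0, 0),
]

def row_frequency(row, itemset):
    """Minimum occurrence count over the items of the itemset in one row."""
    return min(row[ITEMS.index(item)] for item in itemset)

def frequency(itemset):
    """Sum of the per-row frequencies over the whole dataset."""
    return sum(row_frequency(row, itemset) for row in TRANSACTIONS)

def profit(itemset, f):
    """Frequency in the dataset multiplied by f(itemset)."""
    return frequency(itemset) * f(itemset)

print([row_frequency(row, {"C", "D"}) for row in TRANSACTIONS])  # [2, 0, 1, 3, 0]
print(frequency({"C", "D"}))  # 6
```

The printed values reproduce the worked example in the text, where itemset {C, D} has frequency 2+0+1+3+0 = 6.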
(a) Assume that we adopt function f such that $f(s) = (\sum_{i \in s} p_i)/|s|$, where |s| denotes the number of items in s.
Suppose that we know that pA = 10, pB = 10, pC = 10 and pD = 10.
We want to find all itemsets with profit at least 50.
Can the Apriori Algorithm be adapted to find these itemsets?
If yes, please write down the pseudo-code and illustrate it with the above example.
If no, please explain why the Apriori Algorithm cannot be adapted. In this case, please also design
an algorithm, write down the pseudo-code and illustrate it with the above example.
(b) Assume that we adopt function f such that $f(s) = \sum_{i \in s} p_i$.
Suppose that we know that pA = 5, pB = 10, pC = 6 and pD = 4.
We want to find all itemsets with profit at least 50.
Can the Apriori Algorithm be adapted to find these itemsets?
If yes, please write down the pseudo-code and illustrate it with the above example.
If no, please explain why the Apriori Algorithm cannot be adapted. In this case, please also design
an algorithm, write down the pseudo-code and illustrate it with the above example.

End of Paper
