DWH&DM Ver2
Instructions:
Attempt all questions.
BE PRECISE AND DON’T COPY.
Attempt the questions in the sequence given in the question paper. Make sure to
write the actual question number on the answer sheet.
Each question must be attempted on an A4 sheet (HAND-WRITTEN) with the
following personal information at the top of each sheet:
STUDENT NAME, REGISTRATION NUMBER, CNIC NUMBER,
CONTINUATION SHEET NUMBER (E.G., 1/4, 2/4 …), SIGNATURE.
Submit one PDF file on CUOnline on time. Late submissions will not be
accepted, so manage your time carefully.
Long Questions
Apply Hierarchical Agglomerative clustering with average link to the data in the
following table. Use Euclidean distance as the distance measure and draw the
dendrogram. Clearly show all steps and calculations.
S# x y
1 0.40 0.53
2 0.A2 0.38
3 0.35 0.32
4 0.26 0.C9
5 0.B8 0.41
6 0.45 0.30
where A, B and C are digits computed from the last two digits of your
registration number:
A = (reg_no_last2digits % 5) + 1
B = (reg_no_last2digits % 5) + 2
C = (reg_no_last2digits % 5) + 3
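As an illustration of the substitution rule, the short Python sketch below computes the three digits and fills them into the table entries. The registration ending 37 is purely hypothetical, and the helper names `placeholder_digits` and `fill` are invented for this example.

```python
def placeholder_digits(reg_no_last2digits: int) -> dict:
    """Compute the digits A, B and C from the last two digits of a registration number."""
    base = reg_no_last2digits % 5
    return {"A": base + 1, "B": base + 2, "C": base + 3}


def fill(template: str, digits: dict) -> float:
    """Replace each placeholder letter in a table entry such as '0.A2' with its digit."""
    for letter, digit in digits.items():
        template = template.replace(letter, str(digit))
    return float(template)


# Hypothetical registration number ending in 37: 37 % 5 = 2
digits = placeholder_digits(37)
print(digits)                 # {'A': 3, 'B': 4, 'C': 5}
print(fill("0.A2", digits))   # 0.32
print(fill("0.C9", digits))   # 0.59
print(fill("0.B8", digits))   # 0.48
```

Since the remainder of a division by 5 is at most 4, every placeholder always resolves to a single digit.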
Instruction: No marks will be given if you use numbers other than those derived
from your own registration number.
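The average-link procedure can be sketched in a few lines of Python, which may help when checking hand calculations. This is a minimal sketch under stated assumptions: the function name `average_link` is invented here, and the four points are toy data, not the table above. At every step the two clusters with the smallest mean pairwise Euclidean distance are merged.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available from Python 3.8


def average_link(points):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [(i,) for i in range(len(points))]  # start with singletons
    merges = []

    def avg(a, b):
        # Average link: mean of all pairwise distances between the two clusters
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((a, b, avg(a, b)))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges


# Toy data (hypothetical), indexed from 0
toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 3.0)]
merges = average_link(toy_points)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

The merge distances in the returned history are the heights at which the corresponding branches join in the dendrogram.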
Apply the Decision Tree (DT) algorithm to the training data in the table below and
clearly show all steps and calculations of Entropy and Information Gain. Draw the
Decision Tree after completing the calculations.
a) Which attribute would information gain choose as the root of the tree?
b) Draw the decision tree that would be constructed by recursively applying
information gain to select roots of sub-trees, as in the Decision-Tree-Learning
algorithm.
c) Generate decision rules from the decision tree.
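The Entropy and Information Gain calculations can be sketched as follows. This is only an illustration: the four training rows and the attribute names `Windy`, `Humid` and `Play` are hypothetical, and the function names are invented for this example. Entropy is computed as -sum(p * log2(p)) over the class proportions, and the gain of an attribute is the entropy of the whole set minus the weighted entropy of the subsets produced by splitting on that attribute.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy training set
rows = [
    {"Windy": "yes", "Humid": "yes", "Play": "no"},
    {"Windy": "yes", "Humid": "no", "Play": "no"},
    {"Windy": "no", "Humid": "yes", "Play": "yes"},
    {"Windy": "no", "Humid": "no", "Play": "yes"},
]
gains = {a: information_gain(rows, a, "Play") for a in ("Windy", "Humid")}
root = max(gains, key=gains.get)  # attribute with the highest gain becomes the root
print(gains, root)
```

In this toy set `Windy` separates the classes perfectly, so it has a gain of 1 bit and is selected as the root, while `Humid` has a gain of 0. The same selection is repeated on each branch to choose the roots of the sub-trees.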
Run the FP-Growth algorithm on the following transactional database with a minimum
support of 50%. Mine the FP-tree and extract the set of frequent patterns.
Show the step-by-step execution.
TID ITEMS
T1 {A,B,C,E}
T2 {B,D,E,F}
T3 {A,B,C,D,F,G}
T4 {A,B,C,D,E,G,H}
T5 {A,C,D,E,H}
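The first pass of FP-Growth (counting item supports, discarding infrequent items, and reordering each transaction by descending support) can be sketched in Python as below. Two assumptions are made here: a minimum support of 50% over 5 transactions is read as a count of at least 3, and ties in support are broken alphabetically, which is only one of several valid orderings. The sketch stops before building and mining the FP-tree.

```python
from collections import Counter
from math import ceil

transactions = {
    "T1": {"A", "B", "C", "E"},
    "T2": {"B", "D", "E", "F"},
    "T3": {"A", "B", "C", "D", "F", "G"},
    "T4": {"A", "B", "C", "D", "E", "G", "H"},
    "T5": {"A", "C", "D", "E", "H"},
}

min_support = 0.5
min_count = ceil(min_support * len(transactions))  # assumption: 50% of 5 means >= 3

# Pass 1: support count of every single item
counts = Counter(item for items in transactions.values() for item in items)
frequent = {item: n for item, n in counts.items() if n >= min_count}


def rank(item):
    # Descending support, then alphabetical as an assumed tie-break
    return (-frequent[item], item)


# Keep only frequent items and reorder each transaction for FP-tree insertion
ordered = {
    tid: sorted((i for i in items if i in frequent), key=rank)
    for tid, items in transactions.items()
}

print(min_count)
print(sorted(frequent.items()))
for tid, items in ordered.items():
    print(tid, items)
```

The reordered transactions are what get inserted into the FP-tree one by one, sharing prefixes where they coincide.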
Good Luck