CST466
PART A
Answer all questions, each carries 3 marks. Marks
0400CST466082401
10 TF-IDF is preferred over raw term frequency counts for document similarity
analysis because it reduces the influence of common words and highlights
distinctive terms. (3)
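The effect described in answer 10 can be seen in a minimal sketch. It assumes the common variant tf = count / document length and idf = log(N / document frequency); the function name `tf_idf` and the three sample sentences are hypothetical and do not come from the question paper.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length,
    idf = log(N / number of documents containing the term).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical sample documents, for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
weights = tf_idf(docs)
```

Here "the" occurs in every document, so its weight is zero despite its high raw count, while "mat", which occurs in only one document, gets the largest weight in the first document.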
PART B
Answer one full question from each module, each carries 14 marks.
Module I
11 a) Three tier architecture - diagram (4)
Explanation (3)
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25 (2)
- Bin 3: 26, 28, 29, 34
* Smoothing by bin means:
- Bin 1: 9, 9, 9, 9 (3)
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29
* Smoothing by bin boundaries:
- Bin 1: 4, 4, 4, 15 (3)
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34
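The three binning results above can be checked with a short sketch. It assumes equal-frequency partitioning of the sorted values, bin means rounded to the nearest integer, and ties in boundary smoothing going to the lower boundary; the function names are hypothetical.

```python
def equal_frequency_bins(values, n_bins):
    """Split sorted values into n_bins bins of equal size.

    Assumes len(values) is divisible by n_bins.
    """
    ordered = sorted(values)
    size = len(ordered) // n_bins
    return [ordered[i * size:(i + 1) * size] for i in range(n_bins)]

def smooth_by_means(bins):
    """Replace every value with its bin mean, rounded to the nearest integer."""
    return [[round(sum(b) / len(b))] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value with the closer of its bin's min and max.

    Ties go to the lower boundary (an assumption; none occur in this data).
    """
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed

# Data values as they appear in the bins of the answer key.
data = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
bins = equal_frequency_bins(data, 3)
by_means = smooth_by_means(bins)
by_boundaries = smooth_by_boundaries(bins)
```

The unrounded bin means are 9, 22.75 and 29.25, which round to the 9, 23 and 29 given in the key.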
OR
14 a) Stepwise forward selection with example (2)
Stepwise backward elimination with example (2)
Combination of forward selection and backward elimination with example (2)
b) PCA Explanation (4)
Example (4)
Module III
15 a) Gain of attribute A = 0.5769 (7)
Gain of attribute B = 1.0365
Root is B (the attribute with the higher gain)
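The root selection can be sketched as follows. The training table from the question is not reproduced in this key, so the four rows below are hypothetical and will not give the gain values listed above; the sketch only shows how entropy-based information gain is computed and compared.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after the split."""
    base = entropy([row[target] for row in rows])
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(p) / len(rows) * entropy(p)
                    for p in partitions.values())
    return base - remainder

# Hypothetical rows: B separates the classes perfectly, A not at all.
rows = [
    {"A": "x", "B": "p", "label": "yes"},
    {"A": "x", "B": "q", "label": "no"},
    {"A": "y", "B": "p", "label": "yes"},
    {"A": "y", "B": "q", "label": "no"},
]
gains = {attr: information_gain(rows, attr, "label") for attr in ("A", "B")}
root = max(gains, key=gains.get)  # attribute with the highest gain
```

In this toy table the gain of A is 0 and the gain of B is 1, so B is chosen as the root, following the same rule as the answer above.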
b) Explain DBSCAN with core point, border point and noise point (4)
Write any three advantages of DBSCAN:
Can find clusters of arbitrary shape. (3)
Does not require specifying the number of clusters in advance.
Can identify noise and outliers.
Needs only two parameters, Eps and MinPts.
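The three point types can be illustrated with a minimal sketch. It only labels points and does not build the clusters, and it assumes the common convention that a point counts as its own neighbour; the function name, the sample points and the parameter values are hypothetical.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border' or 'noise' using DBSCAN's definitions.

    core:   at least min_pts points (itself included) within distance eps
    border: not core, but within eps of some core point
    noise:  neither core nor border
    """
    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    neighbourhoods = [neighbours(i) for i in range(len(points))]
    core = {i for i, nb in enumerate(neighbourhoods) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbourhoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")
        else:
            labels.append("noise")
    return labels

# Hypothetical 2-D points: a dense square, one point on its edge, one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8)]
labels = classify_points(points, eps=1.0, min_pts=3)
```

With these values the four corner points are core points, (2, 1) is a border point because it lies within Eps of the core point (1, 1), and (8, 8) is noise.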
OR
16 a) Explain the use of dendrogram in hierarchical clustering. (3)
Single Linkage: Minimum distance between points in the two clusters.
Complete Linkage: Maximum distance between points in the two clusters. (3)
Average Linkage: Average distance between points in the two clusters.
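The three linkage criteria differ only in how the pairwise distances are aggregated, as this sketch shows. It assumes Euclidean distance; the two sample clusters are hypothetical.

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance between every point of a and every point of b."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    """Minimum distance between points in the two clusters."""
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    """Maximum distance between points in the two clusters."""
    return max(pairwise_distances(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    """Average distance over all pairs of points in the two clusters."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)

# Hypothetical clusters, for illustration only.
a = [(0, 0), (0, 1)]
b = [(3, 0), (4, 0)]
single = single_linkage(a, b)
complete = complete_linkage(a, b)
average = average_linkage(a, b)
```

For any pair of clusters the single-linkage distance is at most the average-linkage distance, which is at most the complete-linkage distance.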
OR
18 a) Frequent Itemsets (support ≥ 40%): {milk, bread}, {bread, eggs}, {bread, butter} (5)
Association Rules (confidence ≥ 60%):
{milk} → {bread} (Confidence = 100%) (3)
{bread} → {milk} (Confidence = 80%)
{eggs} → {bread} (Confidence = 100%)
{butter} → {bread} (Confidence = 100%)
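Support and confidence can be computed as in the sketch below. The transaction table from the question is not reproduced in this key, so the five transactions are hypothetical; they were chosen only so that the resulting values are comparable with the rules listed above.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

# Hypothetical transactions, not the data from the question paper.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "bread"},
    {"bread", "eggs", "butter"},
]

milk_to_bread = confidence({"milk"}, {"bread"}, transactions)
bread_to_milk = confidence({"bread"}, {"milk"}, transactions)
eggs_to_bread = confidence({"eggs"}, {"bread"}, transactions)
```

On this data {milk} → {bread} has confidence 100% and {bread} → {milk} has confidence 80%, which shows that confidence is not symmetric: it depends on the support of the antecedent.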
****