IS328 Final Exam

IS328

Uploaded by

Tetz


Name:                          ID Number:

THE UNIVERSITY OF THE SOUTH PACIFIC

IS328: Data Mining

Faculty of Science, Technology and Environment
School of Computing, Information and Mathematical Sciences

Final Examination
Semester 2, 2016

Mode: Face to Face
Duration of Exam: 3 hours + 10 minutes
Reading Time: 10 minutes
Writing Time: 3 hours
Total marks: 100

Question and Answer Booklet

Instructions
1. This exam has three sections: A, B, and C.
2. Answer ALL questions in Sections A and B.
3. Answer ONLY ONE question from Section C.
4. Answer multiple choice questions on the multiple choice grid provided on page 6.
5. Write your answers to Sections B and C in the answer script provided.
6. This exam paper has 11 pages including this cover page.
7. You may use non-programmable calculators.
8. This exam is worth 80% of your overall mark.
9. Minimum pass mark for this exam is 40/100.
10. Hand this examination booklet to your supervisor when you complete the examination.

Section A - Multiple Choice Questions

(01) Which of the following is not an attribute type in data mining?
    A  Interval
    B  Ratio
    C  Random
    D  Ordinal
    E  Nominal

Consider two objects represented by the tuples P = (22, 1, 42, 10) and Q = (20, 0, 36, 8) and answer questions 2 to 4.

(02) The Euclidean distance between the two objects P and Q.
    A  11.6    B  7.6    C  6.71    D  6.17

(03) The Manhattan distance between the two objects P and Q.
    A  9    B  11    C  13    D  5

(04) The Minkowski distance between the two objects P and Q, using [parameter illegible]
    A  5.16    B  6.3    C  [illegible]    D  6.15

(05) Assume that we have the following frequent itemsets for a given transaction database:
    {{A, B}, {A, C}, {B, C}, {A, D}, {B, D}, {A, B, C} and {A, B, D}}
    How many different candidate rules are there?
    A  [illegible]    B  [illegible]    C  16    D  [illegible]

(06) A common weakness of association rule mining is that
    A  It is too inefficient
    B  It produces too many rules
    C  It produces not enough interesting rules
    D  All of the above

Use the three-class confusion matrix below to answer questions (07) through (10).

                 Computed Decision
                 Class 1    Class 2    Class 3
    Class 1      10         5          [illegible]
    Class 2      5          16         [illegible]
    Class 3      2          4          [illegible]

(07) How many instances were correctly classified?
    A  [illegible]    B  36    C  [illegible]    D  [illegible]

(08) How many instances were incorrectly classified with class 2?
    A  6    B  7    C  5    D  9

(09) Which class instances were classified with the least error rate?
    A  Class 1
    B  Class 2
    C  Class 3
    D  Class 1 and Class 2

(10) What is the misclassification error rate for the model?
    A  40%    B  50%    C  60%    D  70%

11) Given the following two objects:

                 Attribute1  Attribute2  Attribute3  Attribute4
    Object 1     [values partly illegible: 1 0 0]
    Object 2     [values partly illegible: 1 0 4 0]

    What is the distance between the objects if all variables are symmetric?
    What is the distance between the objects if all variables are asymmetric?
    A  0.5, 0.5
    B  0.3, 0.67
    C  0.67, 0.8
    D  0.67, 0.67

12) Which of the following is not a normalisation method?
    A  min-max normalisation
    B  decimal scaling
    C  z-score normalisation
    D  logarithmic normalisation

13) Assume the APRIORI algorithm identified the following seven 4-item sets that satisfy a user given support threshold:
    acde, acdf, adfg, bcde, bcef, bcdf, cdef.
    What initial candidate 5-itemsets are created by the APRIORI algorithm?
    A  acdef, [illegible]
    B  acdef, bcdef
    C  bcdef, acd
    D  bcdef, abcd

14) Which of the following is not a data reduction technique?
    A  Data cube aggregation
    B  Dimensionality Reduction
    C  Data Compression
    D  Data Transformation
    E  Numerosity Reduction

15) Suppose that doc1 and doc2
are two vectors as follows:
    doc1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
    doc2 = (2, 0, 1, 0, 5, 1, 0, 5, 0, 2)
    The cosine similarity between the two vectors is
    A  0.798    B  0.756    C  0.765    D  0.759

16) Suppose a group of 12 students with the test scores listed as follows:
    19, 71, 48, 63, 35, 85, 69, 81, 72, 88, 99, 95
    By partitioning them into four bins using the equal-width method, how many numbers are there in the third bin?
    [options illegible]

Consider the transactions below and answer questions 17, 18 and 19.

    Transaction-id    Items
    1                 ABCE
    2                 ABDE
    3                 BCDE
    4                 BDE
    5                 ABD
    6                 BEC
    7                 BAE
    8                 CBE
    9                 BE
    10                CE

17) What is the support of the itemset {B, C, E}?
    A  20%    B  30%    C  40%    D  50%    E  60%

18) The length of the largest possible frequent itemset is
    A  2    B  3    C  4    D  5    E  6

19) Which of the following rules has the highest confidence?
    [options illegible]

20) Which of the following are strategies for data transformation?
    A  Smoothing
    B  Attribute Construction
    C  Aggregation
    D  All of the above

Section A - Multiple Choice Questions
(Each question has only one answer)

    1)   (A)  (B)  (C)  (D)  (E)
    2)   (A)  (B)  (C)  (D)  (E)
    3)   (A)  (B)  (C)  (D)  (E)
    4)   (A)  (B)  (C)  (D)  (E)
    5)   (A)  (B)  (C)  (D)  (E)
    6)   (A)  (B)  (C)  (D)  (E)
    7)   (A)  (B)  (C)  (D)  (E)
    8)   (A)  (B)  (C)  (D)  (E)
    9)   (A)  (B)  (C)  (D)  (E)
    10)  (A)  (B)  (C)  (D)  (E)
    11)  (A)  (B)  (C)  (D)  (E)
    12)  (A)  (B)  (C)  (D)  (E)
    13)  (A)  (B)  (C)  (D)  (E)
    14)  (A)  (B)  (C)  (D)  (E)
    15)  (A)  (B)  (C)  (D)  (E)
    16)  (A)  (B)  (C)  (D)  (E)
    17)  (A)  (B)  (C)  (D)  (E)
    18)  (A)  (B)  (C)  (D)  (E)
    19)  (A)  (B)  (C)  (D)  (E)
    20)  (A)  (B)  (C)  (D)  (E)

Section B Short Answers and Calculations (60 Marks)

Question 21: Frequent Itemset and Sequence Mining (20 marks)

(a) Explain the difference between the following: [2 marks]
    (i) Frequent itemset
    (ii) Candidate itemset

(b) Given are the following five transactions on items {A, B, C, D, K}:

    TID    Items
    100    {A, B, K}
    200    {A, B, [illegible]
    300    {A, D, [illegible]
    400    {C, D}
    500    [illegible]
    600    {A, D, [illegible]

    Use the Apriori algorithm to compute all frequent itemsets, and their support, with minimum support of 33.34%. It is important that you clearly indicate the steps of the algorithm. [8 marks]

(c) Which of the itemsets from (b) are closed? Which of the itemsets from (b) are maximal? [2 marks]

Consider the following frequent 3-sequences and answer questions (d) to (f):
    <{1, 2, 3}>, <{1, 2}{3}>, <{1}{2, 3}>, <{1, 2}{4}>, <{1, 3}{4}>, <{1, 2, 4}>, <{2, 3}{3}>, <{2, 3}{4}>, <{2}{3}{3}>, and <{2}{3}{4}>

(d) List all the candidate 4-sequences produced by the candidate generation step of the Generalized Sequential Pattern (GSP) algorithm. [3 marks]

(e) List all the candidate 4-sequences pruned during the candidate pruning step of the GSP algorithm (assuming no timing constraints). [2 marks]

(f) List all the candidate 4-sequences pruned during the candidate pruning step of the GSP algorithm (assuming maxgap = 1). [3 marks]

Question 22: Classification Techniques (20 marks)

(a) Describe the principles and ideas of decision tree-based classification. [4 marks]

(b) Derive all possible rules from the decision tree below and write down a set of classification rules. [4 marks]

    [Figure: decision tree with nodes Refund (Yes / No), Marital Status (Single, Divorced / Married) and Taxable Income; leaf labels and the income threshold are illegible]

(c) What is a confusion matrix? [1 mark]
    Using the following test set, evaluate the above model. Create the confusion matrix and calculate the classification accuracy and error rate. [7 marks]

    Refund    Marital Status    Taxable Income
    No        Divorced          75000
    Yes       Single            [illegible]
    No        Divorced          100000
    Yes       Married           [illegible]
    No        Single            35000
    No        Married           [illegible]

(d) A dataset of 1000 cases was partitioned into a training set of 600 cases and a validation set of 400 cases. A K-Nearest Neighbours model with k = 1 had a misclassification error rate of 8%
It was subsequently found thatthe partioning had been done incorrectly and that 100 eases from the taining data set had been aceidentally duplested and had overritten 100 cases in the validation dataset. What isthe mislassifcaion error rate for ‘the 300 cases that were truly part ofthe validation data? [4 marks}Name: ID Number: Question 23: Cluster Analysis (20_ marks) (2) List and briefly describe the following three approaches for clustering. [3 marks] (i) Partitioning Methods (ii) Hierarchical Methods (iil) Density-Based Methods () List at least six requirements of clustering in data mining. (3 marks] (@)K-Means Clustering ~ 8 marks ‘Suppose you want to cluster the eight points shown below using K-means. [ar_[ar 2 [0 2s sie sie 715 o4 1? rm ‘Assume that k = 3 and that initially the points are assigned to clusters as follows: Cl = (xl, x2, x3}, C2 = {x4, x5, x6), C3 = (X7, x8}, Apply the k-means algorithm until convergence (i., until the clusters do not change), using the Manhattan distance. Make sure you clearly identify the final clustering and show your steps. Give the value of the k-means error function after ‘convergence. [8 marks] (6) Hierarchical Clustering - 6 marks Describe the principles and ideas regarding Agglomorative Hierarchical (Clustering. Show the different steps of the algorithm using the dissimilarity matrix below and complete link clustering. Give partial results after each step. (6 marks}Name: ID Number: Section C Answer only ONE Question (20 Marks) uestion 24: General Data Mi Issues (20 marks 8) Why do we pre-process data? Briefly desribe the processes involved in data pre-processing. [4marks] 'b) Explain the difference between classification and prediction. 
Illustrate the difference using examples. [2 marks]

c) Briefly outline how to compute the dissimilarity between objects described by the following types of variables:
    (i) Numerical (interval-scaled) variables [2 marks]
    (ii) Asymmetric binary variables [2 marks]
    (iii) Categorical variables [2 marks]

d) Given the following measurements for the variable age:
    18; 22; 25; 42; 28; 43; 33; 35; 56; 28
    standardize the variable by the following:
    (i) Compute the mean absolute deviation of age. [4 marks]
    (ii) Compute the z-score for the first four measurements. [4 marks]

Question 25: Data Mining Applications and Big Data (20 marks)

a) Data Mining Applications - 6 marks
    Data mining applications can be found in five areas, namely: financial data analysis (FDA); retail and telecommunication industries (RTI); science and engineering (SE); intrusion detection and prevention (IDP); and recommender systems (RS).
    Choose ONLY one from the five areas named above and describe how the application will work and the benefit(s) of such an application. [6 marks]

b) Privacy and Data Mining - 6 marks
    Vinay banks with the Bank of the South Pacific (BSP), which has a robust data mining system. The bank's data mining team has been studying Vinay's bank card usage patterns. They notice that recently he has made numerous payments at Vinod Patel Hardware Stores. Based on their data analysis, the bank then decided to contact him to discuss their special loans package for home renovations.
    (i) Discuss how this may conflict with your right to privacy. [2 marks]
    (ii) Describe a privacy-preserving data mining method that may allow the bank to perform customer pattern analysis without infringing on customers' right to privacy. [2 marks]
    (iii) Describe an example where data mining could be used to help society. [1 mark]
    (iv) Explain how data mining may be detrimental to society. [1 mark]

c) Big Data Analytics - 8 marks
    (i) What is Big Data in simple terms? [1 mark]
    (ii) How can big data be described?
[1 mark]
    (iii) Discuss some key enablers for big data. [2 marks]
    (iv) Explain the difference between structured data and unstructured data. Give examples. [2 marks]
    (v) What are the special requirements for data mining procedures when handling big data? [2 marks]

End of Paper

[Hand this examination booklet and the answer script to your supervisor when you complete the examination]
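Study note on questions 2 to 4 (not part of the exam paper). The three distances asked for are all instances of the Minkowski distance, so one helper can be used to check a hand calculation. This is a minimal sketch: the tuples are read as P = (22, 1, 42, 10) and Q = (20, 0, 36, 8), the function name `minkowski` is illustrative, and the order h = 3 for question 4 is an assumption because the parameter is not legible in the paper.

```python
def minkowski(p, q, h):
    """Minkowski distance of order h between two equal-length tuples."""
    if len(p) != len(q):
        raise ValueError("tuples must have the same length")
    return sum(abs(a - b) ** h for a, b in zip(p, q)) ** (1.0 / h)


# Tuples as read from the paper (assumed reading of the garbled text).
P = (22, 1, 42, 10)
Q = (20, 0, 36, 8)

manhattan = minkowski(P, Q, 1)  # h = 1 gives the Manhattan distance
euclidean = minkowski(P, Q, 2)  # h = 2 gives the Euclidean distance
order3 = minkowski(P, Q, 3)     # h = 3 is an assumed order for question 4

print(round(manhattan, 2), round(euclidean, 2), round(order3, 2))
```

Using one function for all three makes the relationship explicit: only the exponent changes between the questions.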
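Study note on question 16 (not part of the exam paper). Equal-width binning splits the range from the minimum to the maximum score into k intervals of the same width. The sketch below assumes half-open intervals with the maximum value placed in the last bin; other textbooks draw the boundaries differently, so the convention used in the course should be checked. The function name `equal_width_bins` is illustrative.

```python
def equal_width_bins(values, k):
    """Partition values into k equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in values:
        # The maximum value would index bin k, so clamp it into the last bin.
        idx = min(int((v - lo) / width), k - 1)
        bins[idx].append(v)
    return bins


scores = [19, 71, 48, 63, 35, 85, 69, 81, 72, 88, 99, 95]
bins = equal_width_bins(scores, 4)

for number, members in enumerate(bins, start=1):
    print(number, sorted(members))
```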
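Study note on questions 17 to 19 (not part of the exam paper). Support is the fraction of transactions containing an itemset, and the confidence of a rule X -> Y is support(X and Y together) divided by support(X). The sketch below uses the ten transactions from the table in Section A; the function names and the example rule {B, C} -> {E} are illustrative and are not taken from the answer options, which are illegible in the paper.

```python
# The ten transactions from the table for questions 17 to 19.
transactions = [
    set("ABCE"), set("ABDE"), set("BCDE"), set("BDE"), set("ABD"),
    set("BEC"), set("BAE"), set("CBE"), set("BE"), set("CE"),
]


def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    items = set(itemset)
    return sum(items <= t for t in db) / len(db)


def confidence(antecedent, consequent, db):
    """Confidence of the rule antecedent -> consequent."""
    base = support(antecedent, db)
    if base == 0:
        raise ValueError("antecedent never occurs in the database")
    return support(set(antecedent) | set(consequent), db) / base


print(support("BCE", transactions))
print(confidence("BC", "E", transactions))  # hypothetical example rule
```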
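Study note on question 24 d) (not part of the exam paper). The question standardizes age with the mean absolute deviation in place of the standard deviation, so the z-score is (x - mean) / s where s is the average of |x - mean|. This is a minimal sketch, assuming the garbled measurement list reads 18, 22, 25, 42, 28, 43, 33, 35, 56, 28; the function names are illustrative.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


def z_scores(values, subset):
    """z-scores of subset, using the mean and mean absolute deviation of values."""
    m = sum(values) / len(values)
    s = mean_absolute_deviation(values)
    if s == 0:
        raise ValueError("mean absolute deviation is zero")
    return [(x - m) / s for x in subset]


# Assumed reading of the measurements listed in the question.
ages = [18, 22, 25, 42, 28, 43, 33, 35, 56, 28]

mad = mean_absolute_deviation(ages)
first_four = z_scores(ages, ages[:4])

print(mad)
print([round(z, 3) for z in first_four])
```

The mean absolute deviation is less sensitive to outliers than the standard deviation, which is the usual reason given for preferring it when standardizing.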
