IS328 Final Exam
IS328
Uploaded by
Tetz
Name:                ID Number:

THE UNIVERSITY OF THE SOUTH PACIFIC
IS328: Data Mining
Faculty of Science, Technology and Environment
School of Computing, Information and Mathematical Sciences
Final Examination, Semester 2 2016
Mode: Face to Face
Duration of Exam: 3 hours + 10 minutes
Reading Time: 10 minutes
Writing Time: 3 hours
Total mark: 100
Question and Answer Booklet

Instructions
This exam has three sections: A, B, and C.
Answer ALL questions in Sections A and B.
Answer ONLY ONE question from Section C.
Answer multiple choice questions on the multiple choice grid provided on page 6.
Write your answers to Sections B and C in the answer script provided.
This exam paper has 11 pages including this cover page.
You may use non-programmable calculators.
This exam is worth 80% of your overall mark.
Minimum pass mark for this exam is 40/100.
Hand this examination booklet to your supervisor when you complete the examination.

Multiple Choice Questions

(Q1) Which of the following is not an attribute type in data mining?
A. Interval; B. Ratio; C. Random; D. Ordinal; E. Nominal

Consider two objects represented by the tuples P = (22, 1, 42, 10) and Q = (20, 0, 36, 8) and answer questions 2 to 4.

(Q2) The Euclidean distance between the two objects P and Q is:
[answer options illegible in this copy]

(Q3) The Manhattan distance between the two objects P and Q is:
[answer options illegible in this copy]

(Q4) The Minkowski distance between the two objects P and Q, using [value of h illegible in this copy], is:
[answer options illegible in this copy]

(Q5) Assume that we have the following frequent itemsets for a given transaction database: {{A, B}, {A, C}, {B, C}, {A, D}, {B, D}, {A, B, C}, {A, B, D}}. How many different candidate rules are there?
[answer options illegible in this copy]

(Q6) A common weakness of association rule mining is that:
A. It is too inefficient; B. It produces too many rules; C. It produces not enough interesting rules; D. All of the above

Use the three-class confusion matrix below to answer questions (Q7) through (Q10).
Rows: actual class; columns: computed decision (Class 1, Class 2, Class 3).
[The matrix is only partly legible in this copy. As extracted, row Class 1 reads 10, 5; row Class 2 reads 5, 16; row Class 3 reads 2, 4, followed by an illegible entry.]

(Q7) How many instances were correctly classified?
A. [illegible]; B. 36; C. [illegible]; D. [illegible]

(Q8) How many instances were incorrectly classified as class 2?
[answer options illegible in this copy]

(Q9) Which class's instances were classified with the lowest error rate?
A. Class 1; B. Class 2; C. Class 3; D. Class 1 and Class 2

(Q10) What is the misclassification error rate for the model?
A. 40%; B. 50%; C. 60%; D. 70%

(Q11) Given the following two objects, described by Attribute 1 to Attribute 4:
[attribute values for Object 1 and Object 2 illegible in this copy]
What is the distance between the objects if all variables are symmetric? What is the distance between the objects if all variables are asymmetric?
A. 0.5, 0.5; B. 0.3, 0.67; C. 0.67, 0.8; D. 0.67, 0.67

(Q12) Which of the following is not a normalisation method?
A. Min-max normalisation; B. Decimal scaling; C. Z-score normalisation; D. Logarithmic normalisation

(Q13) Assume the APRIORI algorithm identified the following seven 4-itemsets that satisfy a user-given support threshold: acde, acdf, adfg, bcde, bcef, [illegible], cdef. What initial candidate 5-itemsets are created by the APRIORI algorithm?
A. acdef, [illegible]; B. acdef, bcdef; C. bcdef, [illegible]; D. bcdef, abcd

(Q14) Which of the following is not a data reduction technique?
A. Data cube aggregation; B. Dimensionality reduction; C. Data compression; D. Data transformation; E. Numerosity reduction

(Q15) Suppose that doc1 and doc2
are two vectors as follows:
doc1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
doc2 = (2, 0, 1, 0, 5, 1, 0, 5, 0, 2)
The cosine similarity between the two vectors is:
A. 0.798; B. 0.756; C. 0.765; D. 0.759

(Q16) Suppose a group of 12 students has the following test scores: 19, 71, 48, 63, 35, 85, 69, 81, 72, 88, 99, 95. If the scores are partitioned into four bins using the equal-width method, how many numbers are there in the third bin?
[answer options illegible in this copy]

Consider the transactions below and answer questions 17, 18 and 19.
Transaction-id   Items
1                ABCE
2                ABDE
3                BCDE
4                BDE
5                ABD
6                BEC
7                BAE
8                CBE
9                BE
10               CE

(Q17) What is the support of the itemset {B, C, E}?
A. 20%; B. 30%; C. 40%; D. 50%; E. 60%

(Q18) The length of the largest possible frequent itemset is:
A. 2; B. 3; C. 4; D. 5; E. 6

(Q19) Which of the following rules has the highest confidence?
[answer options illegible in this copy]

(Q20) Which of the following are strategies for data transformation?
A. Smoothing; B. Attribute construction; C. Aggregation; D. All of the above

Section A - Multiple Choice Questions (Each question has only one answer)
Answer grid: questions 1) to 20), each with the choices (A) (B) (C) (D) (E).

Section B - Short Answers and Calculations (60 marks)

Question 21: Frequent Itemset and Sequence Mining (20 marks)
(a) Explain the difference between the following: [2 marks]
(i) Frequent itemset
(ii) Candidate itemset
(b) Given are the following five transactions on items {A, B, C, D, K}:
TID   Items
100   {A, B, K}
200   {A, B [rest of row illegible]
300   {A, D [rest of row illegible]
400   {C, D}
[remaining rows illegible in this copy]
Use the Apriori algorithm to compute all frequent itemsets, and their support, with a minimum support of 33.34%. It is important that you clearly indicate the steps of the algorithm. [8 marks]
(c) Which of the itemsets from (b) are closed? Which of the itemsets from (b) are maximal? [2 marks]

Consider the following frequent 3-sequences and answer questions (d) to (f):
<{1, 2, 3}>, <{1, 2}{3}>, <{1}{2, 3}>, <{1, 2}{4}>, <{1, 3}{4}>, <{1, 2, 4}>, <{2, 3}{3}>, <{2, 3}{4}>, <{2}{3}{3}>, and <{2}{3}{4}>.
(d) List all the candidate 4-sequences produced by the candidate generation step of the Generalized Sequential Pattern (GSP) algorithm. [3 marks]
(e) List all the candidate 4-sequences pruned during the candidate pruning step of the GSP algorithm (assuming no timing constraints). [2 marks]
(f) List all the candidate 4-sequences pruned during the candidate pruning step of the GSP algorithm (assuming maxgap = 1). [3 marks]

Question 22: Classification Techniques (20 marks)
(a) Describe the principles and ideas of decision tree-based classification. [4 marks]
(b) Derive all possible rules from the decision tree below and write down a set of classification rules. [4 marks]
[Figure: decision tree with root node Refund (Yes / No), a Marital Status node splitting into {Single, Divorced} and {Married}, and a Taxable Income node. Leaf labels and split thresholds are illegible in this copy.]
(c) What is a confusion matrix? [1 mark]
Using the following test set, evaluate the above model. Create the confusion matrix and calculate the classification accuracy and error rate. [7 marks]
Refund   Marital Status   Taxable Income
No       Divorced         75000
Yes      Single           [illegible]
No       Divorced         [illegible]
Yes      Married          [illegible]
No       Single           35000
No       Married          [illegible]
[class labels of the test set illegible in this copy]
(d) A dataset of 1000 cases was partitioned into a training set of 600 cases and a validation set of 400 cases. A k-nearest neighbours model with k = 1 had a misclassification error rate of 8% on the validation data.
It was subsequently found that the partitioning had been done incorrectly and that 100 cases from the training data set had been accidentally duplicated and had overwritten 100 cases in the validation data set. What is the misclassification error rate for the 300 cases that were truly part of the validation data? [4 marks]

Question 23: Cluster Analysis (20 marks)
(a) List and briefly describe the following three approaches for clustering. [3 marks]
(i) Partitioning methods
(ii) Hierarchical methods
(iii) Density-based methods
(b) List at least six requirements of clustering in data mining. [3 marks]
(c) K-Means Clustering (8 marks)
Suppose you want to cluster the eight points shown below using K-means.
[Table of coordinates for points x1 to x8; the values are illegible in this copy.]
Assume that k = 3 and that initially the points are assigned to clusters as follows: C1 = {x1, x2, x3}, C2 = {x4, x5, x6}, C3 = {x7, x8}. Apply the k-means algorithm until convergence (i.e., until the clusters do not change), using the Manhattan distance. Make sure you clearly identify the final clustering and show your steps. Give the value of the k-means error function after convergence. [8 marks]
(d) Hierarchical Clustering (6 marks)
Describe the principles and ideas of agglomerative hierarchical clustering. Show the different steps of the algorithm using the dissimilarity matrix below and complete-link clustering. Give partial results after each step. [6 marks]
[Dissimilarity matrix not reproduced in this copy.]

Section C - Answer only ONE question (20 marks)

Question 24: General Data Mining Issues (20 marks)
(a) Why do we pre-process data? Briefly describe the processes involved in data pre-processing. [4 marks]
(b) Explain the difference between classification and prediction.
Illustrate the difference using examples. [2 marks]
(c) Briefly outline how to compute the dissimilarity between objects described by the following types of variables:
(i) Numerical (interval-scaled) variables [2 marks]
(ii) Asymmetric binary variables [2 marks]
(iii) Categorical variables [2 marks]
(d) Given the following measurements for the variable age: 18, 22, 25, 42, 28, 43, 33, 35, 56, 28, standardize the variable as follows:
(i) Compute the mean absolute deviation of age. [4 marks]
(ii) Compute the z-score for the first four measurements. [4 marks]

Question 25: Data Mining Applications and Big Data (20 marks)
(a) Data Mining Applications (6 marks)
Data mining applications can be found in five areas, namely: financial data analysis (FDA); retail and telecommunication industries (RTI); science and engineering (SE); intrusion detection and prevention (IDP); and recommender systems (RS).
Choose ONLY one of the five areas named above and describe how the application will work and the benefit(s) of such an application. [6 marks]
(b) Privacy and Data Mining (6 marks)
Vinay banks with the Bank of the South Pacific (BSP), which has a robust data mining system. The bank's data mining team has been studying Vinay's bank card usage patterns. They notice that he has recently made numerous payments at Vinod Patel Hardware Stores. Based on their data analysis, the bank then decided to contact him to discuss its special loans package for home renovations.
(i) Discuss how this may conflict with your right to privacy. [2 marks]
(ii) Describe a privacy-preserving data mining method that may allow the bank to perform customer pattern analysis without infringing on customers' right to privacy. [2 marks]
(iii) Describe an example where data mining could be used to help society. [1 mark]
(iv) Explain how data mining may be detrimental to society. [1 mark]
(c) Big Data Analytics (8 marks)
(i) What is big data in simple terms? [1 mark]
(ii) How can big data be derived?
[1 mark]
(iii) Discuss some key enablers for big data. [2 marks]
(iv) Explain the difference between structured data and unstructured data. Give examples. [2 marks]
(v) What are the special requirements for data mining procedures when handling big data? [2 marks]

End of Paper
[Hand this examination booklet and the answer script to your supervisor when you complete the examination.]
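The distance measures in Questions 2 to 4 are all special cases of the Minkowski distance, d(P, Q) = (sum of |p_i - q_i|^h)^(1/h). Euclidean distance uses h = 2 and Manhattan distance uses h = 1. The Python sketch below evaluates them for P and Q. The helper names are illustrative. The value of h for Question 4 is illegible in this copy, so h = 3 is used purely as an assumption.

```python
import math

# Tuples copied from the stem of Questions 2 to 4.
P = (22, 1, 42, 10)
Q = (20, 0, 36, 8)


def minkowski(x, y, h):
    """Minkowski (L_h) distance: (sum |x_i - y_i|^h) ** (1/h)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1.0 / h)


def euclidean(x, y):
    """Euclidean distance is the Minkowski distance with h = 2."""
    return minkowski(x, y, 2)


def manhattan(x, y):
    """Manhattan (city-block) distance is the Minkowski distance with h = 1."""
    return minkowski(x, y, 1)


if __name__ == "__main__":
    print(f"Euclidean:       {euclidean(P, Q):.3f}")
    print(f"Manhattan:       {manhattan(P, Q):.3f}")
    # h = 3 is an assumption: the value of h in Question 4 is illegible.
    print(f"Minkowski (h=3): {minkowski(P, Q, 3):.3f}")
```

The attribute-wise differences are 2, 1, 6 and 2. The sketch therefore gives sqrt(45), about 6.708, for the Euclidean distance and 11 for the Manhattan distance.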
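Question 5 can be read as counting every rule X -> Y that splits a listed frequent itemset into a non-empty antecedent X and a non-empty consequent Y. Under that reading a k-itemset yields 2^k - 2 rules. The sketch below enumerates the rules this way. This is an interpretation of the question, not a worked answer taken from the paper, and `candidate_rules` is a hypothetical helper.

```python
from itertools import combinations

# Frequent itemsets listed in Question 5.
FREQUENT = [
    {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "D"}, {"B", "D"},
    {"A", "B", "C"}, {"A", "B", "D"},
]


def candidate_rules(itemset):
    """Yield every rule X -> Y with X, Y non-empty, disjoint and X | Y == itemset."""
    items = sorted(itemset)
    for r in range(1, len(items)):  # r = size of the antecedent X
        for antecedent in combinations(items, r):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent


rules = [rule for s in FREQUENT for rule in candidate_rules(s)]

if __name__ == "__main__":
    for lhs, rhs in rules:
        print(f"{{{','.join(lhs)}}} -> {{{','.join(rhs)}}}")
    print("total:", len(rules))
```

Under this reading the five 2-itemsets give 2 rules each and the two 3-itemsets give 6 rules each, for 22 in total.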
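Cosine similarity (Question 15) is the dot product of two vectors divided by the product of their lengths. The sketch below applies it to doc1 and doc2 as they were extracted from this scanned copy. Those extracted values give roughly 0.657, which matches none of the listed options, so at least one vector was probably mis-transcribed. Treat the vectors as placeholders to be checked against the printed paper. `cosine_similarity` is an illustrative name.

```python
import math

# Vectors as extracted from Question 15; the scan is partly garbled,
# so verify them against the printed paper before relying on the result.
doc1 = [5, 0, 3, 0, 2, 0, 0, 2, 0, 0]
doc2 = [2, 0, 1, 0, 5, 1, 0, 5, 0, 2]


def cosine_similarity(x, y):
    """cos(x, y) = (x . y) / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


if __name__ == "__main__":
    print(f"cos(doc1, doc2) = {cosine_similarity(doc1, doc2):.3f}")
```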
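Equal-width partitioning (Question 16) splits the range [min, max] into k intervals of width (max - min)/k. For these scores the width is (99 - 19)/4 = 20, giving boundaries at 39, 59 and 79. The sketch assumes half-open intervals with the maximum placed in the last bin. That is a common convention, but the question does not state it. `equal_width_bins` is an illustrative name.

```python
# Scores listed in Question 16.
scores = [19, 71, 48, 63, 35, 85, 69, 81, 72, 88, 99, 95]


def equal_width_bins(values, k):
    """Split values into k equal-width intervals over [min, max].

    Bin i covers [lo + i*w, lo + (i+1)*w); the maximum value is placed in
    the last bin so that it is not left out.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        raise ValueError("all values are equal; equal-width bins are undefined")
    bins = [[] for _ in range(k)]
    for v in sorted(values):
        i = min(int((v - lo) / width), k - 1)
        bins[i].append(v)
    return width, bins


if __name__ == "__main__":
    width, bins = equal_width_bins(scores, 4)
    print("bin width:", width)
    for i, b in enumerate(bins, start=1):
        print(f"bin {i}: {b}")
```

With that convention the third bin, [59, 79), holds 63, 69, 71 and 72.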
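Support and confidence for the transactions used in Questions 17 to 19 follow directly from their definitions. support(X) is the fraction of transactions containing X, and conf(X -> Y) = support(X ∪ Y) / support(X). In the sketch below the rule {B, C} -> {E} is only an illustration, because the rule options of Question 19 are illegible in this copy. The helper names are illustrative.

```python
# Transactions from the table before Question 17 (items are single letters).
TRANSACTIONS = {
    1: "ABCE", 2: "ABDE", 3: "BCDE", 4: "BDE", 5: "ABD",
    6: "BEC", 7: "BAE", 8: "CBE", 9: "BE", 10: "CE",
}
BASKETS = [frozenset(t) for t in TRANSACTIONS.values()]


def support(itemset, baskets=BASKETS):
    """Fraction of transactions that contain every item of itemset."""
    items = frozenset(itemset)
    return sum(items <= b for b in baskets) / len(baskets)


def confidence(lhs, rhs, baskets=BASKETS):
    """conf(lhs -> rhs) = support(lhs | rhs) / support(lhs)."""
    lhs_support = support(lhs, baskets)
    if lhs_support == 0:
        raise ValueError("antecedent never occurs")
    return support(set(lhs) | set(rhs), baskets) / lhs_support


if __name__ == "__main__":
    print(f"support({{B,C,E}}) = {support('BCE'):.0%}")
    # Illustrative rule only; the rule options of Question 19 are illegible.
    print(f"confidence({{B,C}} -> {{E}}) = {confidence('BC', 'E'):.0%}")
```

{B, C, E} occurs in transactions 1, 3, 6 and 8, so its support is 4/10 = 40%.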
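Question 18 depends on a minimum support threshold that is not legible in this copy. The level-wise Apriori sketch below joins (k-1)-itemsets that share a common prefix, prunes candidates with an infrequent subset, and then counts support. It runs on the same ten transactions with an assumed minimum support count of 3 (30%). A different threshold can change the answer. `apriori` here is an illustrative helper, not a library function.

```python
from itertools import combinations

# Transactions from the table before Question 17.
BASKETS = [frozenset(t) for t in
           ["ABCE", "ABDE", "BCDE", "BDE", "ABD", "BEC", "BAE", "CBE", "BE", "CE"]]


def apriori(baskets, min_count):
    """Level-wise Apriori: return {frozenset(itemset): support count}."""
    def count(candidates):
        return {c: sum(c <= b for b in baskets) for c in candidates}

    items = {frozenset([i]) for b in baskets for i in b}
    level = {c: n for c, n in count(items).items() if n >= min_count}
    frequent = dict(level)
    k = 2
    while level:
        prev = sorted(tuple(sorted(s)) for s in level)
        # Join step: merge (k-1)-itemsets that share their first k-2 items.
        candidates = set()
        for a, b in combinations(prev, 2):
            if a[:-1] == b[:-1]:
                candidates.add(frozenset(a) | frozenset(b))
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c: n for c, n in count(candidates).items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent


if __name__ == "__main__":
    result = apriori(BASKETS, min_count=3)  # min_count = 3 (30%) is an assumption
    for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print("".join(sorted(itemset)), n)
```

With that assumed threshold the frequent 3-itemsets are {A, B, E}, {B, C, E} and {B, D, E}, and no 4-itemset candidate survives the join step.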
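Question 22(d) can be reasoned through as follows. A 1-nearest-neighbour model classifies each of the 100 overwritten validation cases by its own exact copy in the training set (distance 0). Those cases therefore contribute no errors, assuming the training set has no identical inputs with conflicting labels. All 8% x 400 = 32 errors then fall among the 300 genuine validation cases. The sketch below encodes that assumption; `true_validation_error` is a hypothetical helper.

```python
from fractions import Fraction

VALIDATION_SIZE = 400
OBSERVED_ERROR_RATE = Fraction(8, 100)  # 8% reported on the contaminated validation set
DUPLICATED_CASES = 100                  # training cases copied over validation cases


def true_validation_error(n_val, observed_rate, n_dup, dup_errors=0):
    """Error rate on the genuine validation cases.

    Assumes a 1-nearest-neighbour model: each duplicated case has an exact
    copy in the training set (distance 0), so by default it adds no errors.
    """
    total_errors = observed_rate * n_val
    return (total_errors - dup_errors) / (n_val - n_dup)


if __name__ == "__main__":
    rate = true_validation_error(VALIDATION_SIZE, OBSERVED_ERROR_RATE, DUPLICATED_CASES)
    print(f"{rate} = {float(rate):.2%}")
```

Under this assumption the error rate is 32/300 = 8/75, about 10.67%.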
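For Question 24(d), the mean absolute deviation is s_f = (1/n) * sum of |x_i - m_f|, where m_f is the mean. The sketch assumes, as the question's wording suggests, that the z-score is standardised with this deviation rather than with the standard deviation: z_i = (x_i - m_f) / s_f. The helper names are illustrative.

```python
from statistics import mean

# Ages listed in Question 24(d).
ages = [18, 22, 25, 42, 28, 43, 33, 35, 56, 28]


def mean_absolute_deviation(values):
    """s_f = (1/n) * sum |x_i - m_f|, where m_f is the mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)


def z_scores(values):
    """Standardise using the mean absolute deviation: z_i = (x_i - m_f) / s_f."""
    m = mean(values)
    s = mean_absolute_deviation(values)
    return [(x - m) / s for x in values]


if __name__ == "__main__":
    print("mean:", mean(ages))
    print("mean absolute deviation:", mean_absolute_deviation(ages))
    print("z-scores of first four:", [round(z, 2) for z in z_scores(ages)[:4]])
```

Here m_f = 330/10 = 33 and s_f = 88/10 = 8.8, so the first four z-scores are about -1.70, -1.25, -0.91 and 1.02.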