IS421 Exam

Uploaded by

Shikha Nand

IS421: Knowledge Discovery in Databases

School of Computing, Information and Mathematical Sciences

Final Examination
Semester 1, 2017

F2F Mode

Duration of Exam: 3 hours + 10 minutes

Reading Time: 10 minutes

Writing Time: 3 hours

Instructions:

1. This exam has two sections:
   a. Section A – 7 questions (30 marks)
   b. Section B – 5 questions (70 marks)
2. Answer ALL questions in the two sections.
3. The exam is worth 50% of the overall course mark. Students must score a
minimum of 40 marks in this exam to pass the course.
4. There are a total of 8 pages (including the cover page) in this exam question
booklet.
5. This is a CLOSED book exam.
6. No other materials are allowed into the exam room.
7. A non-programmable calculator may be used during the exam.

Section A – Short Answers (30 marks)

Write your answers in the Answer Book provided.

1. Discuss, in your own words and with examples, the four factors that enhance data
quality. (4 marks)

2. Outline three methods for cleaning data. (3 marks)

3. You have been hired as a data analyst for Tappoos Fiji Ltd. Upon examining the
prices of certain items sold at their Duty Free Department, you realize that the data
need preprocessing. Illustrate three binning methods that you will use to smooth the
data using the following prices: 15, 21, 8, 4, 21, 25, 24, 34, 28 (6 marks)

4. Calculate the z-score normalized value for an income of $39,500. The mean and
standard deviation of the attribute income are $30,000 and $8,000, respectively.
Elaborate on the significance of the z-score and when it is most suitable to use. (4 marks)

5. Explain the four major features of a data warehouse. (4 marks)

6. Discuss four benefits of using information from data warehouses. (4 marks)

7. Compare and contrast online transaction processing (OLTP) systems and online
analytical processing (OLAP) systems. (5 marks)
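
The arithmetic behind Questions 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a model answer: it assumes the three binning methods intended are smoothing by bin means, bin medians, and bin boundaries over equal-frequency bins of depth 3, and the function names are hypothetical. An equal-width partition would be another reasonable reading of the question.

```python
from statistics import mean, median

def equal_depth_bins(values, depth):
    """Sort the values and split them into consecutive bins of `depth` items."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]

def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean."""
    return [[mean(b)] * len(b) for b in bins]

def smooth_by_medians(bins):
    """Replace every value in a bin with the bin median."""
    return [[median(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin minimum and maximum.

    Ties go to the lower boundary (an assumption; conventions vary).
    """
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins are already sorted
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

def z_score(value, mu, sigma):
    """Z-score normalization: distance from the mean in standard deviations."""
    return (value - mu) / sigma

prices = [15, 21, 8, 4, 21, 25, 24, 34, 28]   # prices from Question 3
bins = equal_depth_bins(prices, 3)            # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]

print(smooth_by_means(bins))
print(smooth_by_medians(bins))
print(smooth_by_boundaries(bins))
print(z_score(39_500, 30_000, 8_000))         # (39500 - 30000) / 8000 = 1.1875
```

A z-score of about 1.19 says the income lies a little more than one standard deviation above the mean, which is the kind of interpretation Question 4 asks for.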

Section B (70 marks)

Question 8 KDD Process [14 marks]


Knowledge discovery in databases is the process of identifying hidden knowledge buried in
the huge volumes of data that have been created and stored.

a) Elaborate on the steps involved in the KDD process. (5 marks)

Figure 1

b) Figure 1 illustrates the application of a data mining technique in the field of medicine.
Identify the data mining technique and justify your choice. (4 marks)

c) Fraudulent use of credit cards can be detected using a data mining
functionality. Discuss the functionality and how it is applied. (3 marks)

d) Examine Figure 2 and discuss the data mining technique applied to derive new
knowledge. (2 marks)

Figure 2
Question 9 Association Rule [14 marks]
Study the data provided below and answer the questions that follow.

Customer   Items purchased
1          Orange juice, potato chips
2          Milk, orange juice, window cleaner
3          Orange juice, washing detergent
4          Orange juice, washing detergent, potato chips
5          Window cleaner, potato chips

a) Calculate the confidence score for every pair of items purchased by customers. (5 marks)

b) Using a confidence score above 50% as the threshold, determine which two items
are purchased together. (3 marks)

c) Determine which item is never purchased with potato chips or washing detergent.
(2 marks)

d) Discuss the Apriori algorithm and its significance in KDD. Use an example to support
your discussion. (4 marks)
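
The confidence calculation in parts (a) and (b) can be sketched as below. This is an illustrative sketch rather than a marking scheme: it assumes the usual definition confidence(A -> B) = support(A and B) / support(A), treats each rule direction separately, and uses a hypothetical helper named `confidence`. The baskets are the five customer rows from the table above.

```python
from itertools import permutations

# One set of items per customer, copied from the Question 9 table.
transactions = [
    {"orange juice", "potato chips"},
    {"milk", "orange juice", "window cleaner"},
    {"orange juice", "washing detergent"},
    {"orange juice", "washing detergent", "potato chips"},
    {"window cleaner", "potato chips"},
]

def confidence(antecedent, consequent, baskets):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

items = sorted(set().union(*transactions))

# Confidence of every ordered pair (A -> B) of distinct items.
rules = {
    (a, b): confidence(a, b, transactions)
    for a, b in permutations(items, 2)
}

# Rules whose confidence is strictly above the 50% threshold from part (b).
strong = {rule: conf for rule, conf in rules.items() if conf > 0.5}

for (a, b), conf in sorted(strong.items()):
    print(f"{a} -> {b}: {conf:.2f}")
```

Confidence is asymmetric, so orange juice -> potato chips (2 of 4 baskets) and potato chips -> orange juice (2 of 3 baskets) give different scores. That is why the sketch enumerates ordered pairs rather than unordered ones.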

Question 10 Cluster Analysis [16 marks]

Use Figure 3 shown below to answer the questions that follow.

Figure 3

The distance function is Euclidean distance, and points A1, B1, and C1 are initially assigned
as the centers of the three clusters, respectively. Use the k-means algorithm to:

a) show the three cluster centers after the first cycle. (4 marks)

b) determine the final number of clusters in the data set by showing all the calculations
required to arrive at your answer. (12 marks)

Question 11 Classification [14 marks]

Use Table 1 to answer the questions that follow.


Outlook    Temperature  Humidity  Windy  Play?
sunny      hot          high      false  No
sunny      hot          high      true   No
overcast   hot          high      false  Yes
rain       mild         high      false  Yes
rain       cool         normal    false  Yes
rain       cool         normal    false  No
overcast   cool         normal    true   Yes
sunny      mild         high      false  No
sunny      cool         normal    false  Yes
rain       mild         normal    false  Yes
sunny      mild         normal    false  Yes
overcast   mild         high      true   Yes
overcast   hot          normal    false  Yes
rain       mild         high      true   No

Table 1

a) Develop a decision tree for “Play”. (4 marks)

b) Calculate the information gain for the attributes: outlook, temperature, windy, and
humidity. (8 marks)

c) Identify the best attribute and justify your choice. (2 marks)
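
The information gain calculation in part (b) can be sketched as follows. This is a minimal sketch, assuming the standard entropy-based definition used by ID3, Gain(A) = H(Play) - sum over values v of (|S_v| / |S|) * H(Play | A = v), with logarithms to base 2. The rows are copied from Table 1 exactly as printed, and the names `ROWS`, `entropy`, and `information_gain` are hypothetical.

```python
from collections import Counter
from math import log2

COLUMNS = ("Outlook", "Temperature", "Humidity", "Windy", "Play")

# Rows of Table 1 as printed; the last field is the class label.
ROWS = [
    ("sunny", "hot", "high", "false", "No"),
    ("sunny", "hot", "high", "true", "No"),
    ("overcast", "hot", "high", "false", "Yes"),
    ("rain", "mild", "high", "false", "Yes"),
    ("rain", "cool", "normal", "false", "Yes"),
    ("rain", "cool", "normal", "false", "No"),
    ("overcast", "cool", "normal", "true", "Yes"),
    ("sunny", "mild", "high", "false", "No"),
    ("sunny", "cool", "normal", "false", "Yes"),
    ("rain", "mild", "normal", "false", "Yes"),
    ("sunny", "mild", "normal", "false", "Yes"),
    ("overcast", "mild", "high", "true", "Yes"),
    ("overcast", "hot", "normal", "false", "Yes"),
    ("rain", "mild", "high", "true", "No"),
]

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, column_index, label_index=-1):
    """Entropy of the labels minus the weighted entropy after splitting on a column."""
    labels = [row[label_index] for row in rows]
    remainder = 0.0
    for value in {row[column_index] for row in rows}:
        subset = [row[label_index] for row in rows if row[column_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

gains = {name: information_gain(ROWS, i) for i, name in enumerate(COLUMNS[:-1])}
best = max(gains, key=gains.get)

for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {gain:.4f}")
print("Best attribute:", best)
```

The attribute with the largest gain produces the purest partitions of Play, so under this criterion it would be chosen as the root of the decision tree in part (a).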

Question 12 Data Cube [12 marks]

a) Discuss the use of data cube technology in data mining. (2 marks)

b) You have been hired by the Rups Big Bear Company Ltd as a data analyst, and you
see an opportunity to build data cubes to satisfy the marketing department’s request
to analyze all sales by product and customer made in the 2016 calendar year.
List the key steps in building the required data cube, named “Sales”.
(6 marks)

c) Use the data cube shown in Figures 3 and 4 to answer the questions that follow:

Figure 3

Figure 4

i) Identify the customer and the store location that sold the highest number of a
single part. (1 mark)

ii) Determine the part numbers and store locations that sell the highest and lowest
numbers of items. (3 marks)

THE END.
