Data Mining Questions Q&A

This document provides instructions for a five-question exam (three questions to be answered) assessing skills in the data mining tools WEKA, RapidMiner, and SPSS. Question 1 requires using WEKA and SPSS to perform tasks on an absenteeism dataset, including feature selection, data visualization, classification/clustering, descriptive statistics, discretization, and comparison of variables. Question 2 involves using RapidMiner to construct a decision tree from a sample purchase computer dataset and addressing issues such as missing-value and outlier handling. Question 3 describes a medical drug sample testing scenario and spreadsheet dataset for analytical deductions.


SESSION: REGULAR DURATION: 2.5 HOURS

INSTRUCTIONS:
There are five (5) Questions in this Exam with each Question worth a total of 25
Marks. Read the Questions carefully and attempt Question 1 OR Question 2, and any
other Two (2) Questions. In all, you are answering THREE (3) Questions out of the
five provided. You are expected to type your answers to Question 1 OR Question 2
(depending on the one you choose) in MS Word, and save in pdf with your
StudentID as the file name. Note that, answers to your two selected Questions from
Questions 3 to 5 must be written in the Answer Booklet provided.
Submit your saved pdf together with all relevant files by uploading them to the
Moodle LMS via the Exam thread created.
Remember Question 1 OR 2 is mandatory to answer.
The following toolkits are required for answering Questions 1 or 2.
-WEKA
-RapidMiner
-SPSS

Question 1. [25 Marks]

This Question requires the use of WEKA and SPSS toolkits. Consider the Absenteeism
dataset saved as a .txt file with the name absenteeism.txt with the attribute information
provided at Appendix A at the end of Question 5. You are expected to download the dataset
from the link: https://tinyurl.com/wvac2a6z
The dataset was created with records of absenteeism at work from July 2007 to July 2010 at a
courier company in a given country.
You are expected to perform the following tasks:
a) Create both .arff and .sav files from the given absenteeism.txt file. Save your file with
the names absenteeism.arff and absenteeism.sav
You need to make sure you consider all the necessary details needed when saving
your file as .arff i.e., @relation, @attribute and @data before calling it in WEKA.
Again, provide the necessary variable names and details for the .sav file before calling
it in SPSS. You need to submit both .arff and .sav files. [4 marks]

b) You are to call your .arff file in WEKA and consider a relevant feature selection
technique to select your features bearing in mind the label or target feature as shown in
Appendix A. Consider using the attribute evaluator and search method functions. Report on
your selected features as well as the feature selection algorithm used. That is, provide
the total number of features selected and their respective names. [3 marks]

c) At the preprocess tab in WEKA, select the features reported in (b) using the invert
and remove buttons. Report a data visualization of the selected features together with
the target feature using the visualize all button. [1 mark]

Examiner: Dr Solomon Mensah Page 1 of 6

d) Based on the label feature, identify whether the given dataset can be used for a
classification or clustering problem. [2 marks]

e) With regard to your response in (d), use any suitable classification or clustering
technique to train and validate the dataset in WEKA. Report your result with respect to
significant information. [5 marks]

f) Call the dataset in SPSS and provide descriptive statistics for all features. A single
table will do for this part. Report on relevant statistics (Mean, mode, median, min,
max, range, standard deviation) based on the features. [3 marks]

g) Out of the selected features, you are to discretize all the continuous features or
variables. You can consider using the recode into different variables function in SPSS.
You are to save and send the updated version of the .sav dataset bearing the recoded
variable names. [2 marks]

h) For each of your discretized variables, provide either a bar chart or pie chart. [2 marks]

i) Using the discretized variables, make any comparison between the target variable and
any of your discretized variables. You can consider using the cross tabulation
functionality in SPSS. Explain your result. [3 marks]
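
For part (a) of Question 1, the conversion from absenteeism.txt to absenteeism.arff could be sketched as below. This is only an illustration under stated assumptions: the text file is assumed to be semicolon-delimited with a header row, every attribute is declared numeric, and the two-row sample is hypothetical. In practice, coded attributes such as the reason for absence may be better declared as nominal.

```python
def parse_delimited(text, sep=";"):
    """Split delimited text into a header list and a list of data rows."""
    records = [ln.split(sep) for ln in text.strip().splitlines() if ln.strip()]
    return records[0], records[1:]


def to_arff(relation, attribute_names, rows):
    """Build ARFF text; every attribute is declared numeric (an assumption)."""
    lines = [f"@relation {relation}", ""]
    for name in attribute_names:
        # Quote the name so that spaces inside it remain valid in ARFF.
        lines.append(f"@attribute '{name}' numeric")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical two-row sample standing in for the real absenteeism.txt.
sample = "ID;Age;Absenteeism time in hours\n11;33;4\n36;50;0\n"
header, rows = parse_delimited(sample)
arff_text = to_arff("absenteeism", header, rows)
print(arff_text)
```

The printed text contains the three sections the question asks for: @relation, one @attribute line per column, and @data followed by comma-separated rows.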

Question 2. [25 Marks]

This Question requires the use of the RapidMiner toolkit. Consider the dataset in Table 1 to
be used to train a decision tree. The dataset comprises the following attributes, namely
age, income, student, credit_rating and buys_computer. The buys_computer attribute is
considered as the dependent variable and the remaining attributes are considered as the
independent variables. Imagine you are asked to set up a decision tree for training the
dataset, briefly explain how you will address the following issues:
a) How many features are in the dataset presented in the table below? [1 mark]
Answer: There are five (5) features in the dataset.
b) How many tuples are in the dataset? [1 mark]
Answer: There are fourteen (14) tuples in the dataset.
c) Which feature of the dataset makes it suitable for considering a supervised
learning algorithm such as decision tree? [1 mark]
Answer: buys_computer, as it is the dependent variable and it is labelled.
d) Aside from the decision tree, list any two supervised learning algorithms that can
also be used for training the dataset. [1 mark]
Answer: Any two of random forest, logistic regression, neural network,
support vector machine.
e) Out of the categorical variables, list two (2) dichotomous variables. [1 mark]
Answer: Any two of student, credit_rating and buys_computer.
f) Compute the information gain for each of the independent attributes. [5 marks]
g) With reference to the information gains computed in (f), determine which attribute
can be considered as the root node for the decision tree. [1 mark]
Answer: age, because it has the highest information gain (approximately 0.247).
h) Complete the construction of the decision tree showing how you arrived at the tree.
[5 marks]
i) Assume there were missing values in the dataset, discuss two ways of handling
them. [2 marks]
Answer:
• Fill in the missing values manually (tedious and often infeasible).
• Fill them in automatically with a global constant, the attribute mean, the most
probable value, etc.
j) Assume there were outliers in the dataset, discuss two ways of handling them.
[2 marks]
Answer: Remove the outliers, or smooth the data using a statistical model.
k) Call the Purchase Computer dataset (Table 1) in RapidMiner and construct the tree
using the various operators. Report on the step by step procedure you considered in
constructing the tree based on your selected operators in RapidMiner. [5 marks]

Steps in setting up predictive models

1. Extract data (primary/secondary)
2. Preprocess extracted data (trimming and log transformation)
3. Feature selection (recommend prior feature)
4. Sample selection (Bellwethers)
5. Training + validation needs
6. Learner (deep learning)
7. Performance evaluation (recommend single evaluator)
8. Statistical and practical significance (Yuen's test, Brunner's ANOVA-like
test, Cliff's delta effect size)


Table 1. Purchase Computer Dataset
age     income   student   credit_rating   buys_computer
≤30     high     no        fair            no
≤30     high     no        excellent       no
31-40   high     no        fair            yes
>40     medium   no        fair            yes
>40     low      yes       fair            yes
>40     low      yes       excellent       no
31-40   low      yes       excellent       yes
≤30     medium   no        fair            no
≤30     low      yes       fair            yes
>40     medium   yes       fair            yes
≤30     medium   yes       excellent       yes
31-40   medium   no        excellent       yes
31-40   high     yes       fair            yes
>40     medium   no        excellent       no
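
For parts (f) and (g) of Question 2, the information gain of each independent attribute can be checked against Table 1 with a short script. The sketch below uses the standard entropy-based definition (base-2 logarithm); the function and variable names are illustrative and not part of the exam paper.

```python
from collections import Counter
from math import log2

# Table 1, one tuple per row: age, income, student, credit_rating, buys_computer.
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31-40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31-40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31-40", "medium", "no", "excellent", "yes"),
    ("31-40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
ATTRIBUTES = ["age", "income", "student", "credit_rating"]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column):
    """Entropy of the target minus the weighted entropy after splitting on column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
root = max(gains, key=gains.get)  # attribute with the highest gain
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name:14s} {gain:.3f}")
print("root:", root)
```

The attribute with the largest gain becomes the root node; the same function can then be applied to the subset of rows in each branch to continue building the tree for part (h).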

Question 3. [25 Marks]

a) Given a set of 100,000 medical drug products each emerging from two (2) different
pharmaceutical companies, namely Company A and Company B to be provided to
patients at a hospital in Tema. Imagine that out of the total products, only a sample
of 300 products were tested by the Food and Drugs Authority (FDA). The FDA
found at least two defective/fake products resulting in discarding the total set of
products. After prior testing done by the two companies on each of the 100,000
drugs, it was found that 0.5% of the products emerging from Company A were
defective and none was defective from the perspective of Company B. As a
research student abreast with Data Mining techniques, you are presented with the
dataset from these two companies on a spreadsheet and need to make analytical
deductions and predictions from it. Assume there are 5 input features located on
A2:E100001 and one target feature located on F2:F100001 on the spreadsheet
respectively.
i. Per the information given above about the medical products, which type of
classification algorithm will you use to perform your mining - supervised
or unsupervised classification? Provide a reason. [2 marks]
Answer: Supervised classification, because the data are labelled
(defective or not), which allows us to train a model to predict the
target feature based on the input features.

ii. Mention any four algorithms you can use per your recommended type of
classification in (i) above. [2 marks]
Answer: random forest, logistic regression, neural network, support
vector machine.

iii. Explain any three major tasks you will undertake during preprocessing of
the data. [3 marks]
Answer:
• Data cleaning: Fill in missing values, smooth noisy data, identify or remove
outliers, and resolve inconsistencies
• Data integration: Integration of multiple databases, data cubes, or files
• Data transformation: Normalization and aggregation
• Data reduction: Obtains reduced representation in volume but produces the
same or similar analytical results
• Data discretization: Part of data reduction but with particular importance,
especially for numerical data

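iv. Explain how you will normalize your dataset with any suitable
normalization technique. [2 marks]
Answer:
• Normalization: values are scaled to fall within a small, specified range
– min-max normalization: v' = (v - min) / (max - min) * (new_max - new_min) + new_min
– z-score normalization: v' = (v - mean) / standard deviation
– normalization by decimal scaling: v' = v / 10^j, where j is the smallest
integer such that max(|v'|) < 1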
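
The three normalization techniques listed above can be sketched in a few lines. The input values are hypothetical, and the z-score version uses the population standard deviation, which is one common convention rather than the only one.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]; requires max > min."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Centre on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer giving max(|v'|) < 1."""
    j = 0
    while max(abs(v) for v in values) / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]


values = [200.0, 300.0, 400.0, 600.0, 1000.0]  # hypothetical feature column
print(min_max(values))
print(z_score(values))
print(decimal_scaling(values))
```

On the spreadsheet described in the question, each of the five input columns would be normalized separately, with the statistics (min, max, mean, standard deviation) ideally computed on the training portion only.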

b. Explain how you will separate the dataset into the right percentages or
partitions before subjecting it to your chosen algorithm. [3 marks]
Answer: 70% for training and 30% for testing, or 80% for training and 20% for
testing.
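
A hold-out split of this kind can be sketched as follows. The row indices stand in for the 100,000 spreadsheet records, and the seed value is an arbitrary choice made only so that the shuffle is reproducible.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and cut it into training and testing partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100_000))  # stand-in for spreadsheet rows 2 to 100001
train, test = train_test_split(rows, train_fraction=0.7)
print(len(train), len(test))
```

Shuffling before cutting matters when the spreadsheet is ordered, for example with all Company A products first. Passing train_fraction=0.8 gives the 80/20 alternative.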

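c. Do you think prediction or forecasting can be made from your chosen
model implemented from the algorithm used? If yes, how can prediction be made
for new input values? [3 marks]
Answer: Yes. The values of the five input features for a new product are passed
to the trained model, which outputs the predicted target value (for example,
using regression).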


d. State, with valid evidence, the type of probability model used to subject
the 300 sampled products to test. [2 marks]
Answer: Binomial distribution (explained after part v below).

v. Consider the following set of frequent 3-itemsets:
{1,2,3}, {1,2,4}, {1,3,4}, {1,3,5}, {2,3,4}
i. List all candidate 4-itemsets obtained using the candidate generation step of
the Apriori algorithm. [4 marks]
ii. List all candidate 4-itemsets that survive the candidate pruning step of the
Apriori algorithm before support counting. [4 marks]
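
No answer is recorded for part v, so the following is only a sketch of how the two steps could be checked. It assumes the common F(k-1) x F(k-1) join, in which two frequent 3-itemsets are merged when their first two items agree, followed by pruning of any candidate that has an infrequent 3-item subset.

```python
from itertools import combinations

FREQUENT_3 = [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}]


def generate_candidates(frequent):
    """Join step: merge sorted itemsets that share all items but the last."""
    items = sorted(tuple(sorted(s)) for s in frequent)
    candidates = []
    for a, b in combinations(items, 2):
        if a[:-1] == b[:-1]:
            candidates.append(a + (b[-1],))
    return candidates


def prune(candidates, frequent):
    """Prune step: keep a candidate only if every (k-1)-subset is frequent."""
    known = {tuple(sorted(s)) for s in frequent}
    return [
        c for c in candidates
        if all(sub in known for sub in combinations(c, len(c) - 1))
    ]


candidates = generate_candidates(FREQUENT_3)
survivors = prune(candidates, FREQUENT_3)
print("candidates:", candidates)
print("after pruning:", survivors)
```

Under this join rule the candidates are {1,2,3,4} and {1,3,4,5}. The second is pruned because its subsets {1,4,5} and {3,4,5} are not in the frequent list, leaving {1,2,3,4}.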

Binomial Distribution:
The binomial distribution is commonly used when conducting tests involving
binary outcomes (e.g., defective or non-defective, fake or genuine).
In this scenario, the FDA is testing a sample of products to identify defects, which
is a binary outcome (defective or non-defective).
The binomial distribution describes the number of successes (defective products)
in a fixed number of independent Bernoulli trials (testing each product).

Hypothesis Testing for Proportions:
The FDA may have formulated hypotheses about the proportion of defective
products in the entire population.
They would then collect a sample of products and test whether the proportion of
defective products in the sample differs significantly from a specified value (e.g., the
proportion of defective products from Company A).

Evidence:
The scenario mentions that the FDA found "at least two defective/fake products"
in the sample of 300 products. This suggests that the FDA was interested in
determining the proportion of defective products in the sample.
The use of a binomial test or hypothesis testing for proportions aligns with the
objective of identifying defective products in a sample through statistical inference.

In summary, based on the scenario and the objective of testing the sampled
products for defects, it is likely that the FDA used a probability model such as a
binomial test or a hypothesis test for proportions. These models are commonly
employed when dealing with binary outcomes and testing hypotheses about
proportions in a population.
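
As an illustration of the binomial model described above, the probability of finding at least two defective products in a sample of 300 can be computed directly. The defect rate used here is an assumption: the 0.5% figure reported for Company A is applied to every sampled product, and the trials are treated as independent, which is a reasonable approximation when 300 items are drawn from 100,000.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def prob_at_least(k_min, n, p):
    """Probability of k_min or more successes, via the complement."""
    return 1.0 - sum(binomial_pmf(k, n, p) for k in range(k_min))


n = 300      # products tested by the FDA
p = 0.005    # assumed per-product defect probability (Company A's 0.5%)
p_two_or_more = prob_at_least(2, n, p)
print(f"P(at least 2 defective in {n}) = {p_two_or_more:.4f}")
```

A different assumed defect rate, for example one that accounts for the share of products coming from Company B, only requires changing p.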

Question 4. [25 Marks]

Consider a dataset, namely weather, with four input features – outlook,
temperature, humidity and windy. The target for the given dataset is play, which is
a dichotomous variable with labels yes and no. The dataset has 14 instances, with
the target variable having 9 instances in the yes class and 5 instances in the no
class.
In an attempt to set up a classification model for the given dataset, two
classification algorithms, namely Naïve Bayes and Logistic Regression, were set
up in WEKA and their outputs are given below in Fig. 1 and Fig. 2 respectively.

a) Comparing the two outputs in Fig. 1 (LHS) and Fig. 2 (RHS), which model will you
recommend as optimal for classification of the given dataset? [4 marks]
Ans: The Naïve Bayes model.

b) Justify your answer for the best model in (a) above with valid reasons based on the
outputs presented. [7 marks]
Ans: The evaluation metrics are higher for the Naïve Bayes model than for
logistic regression. For instance, the weighted average recall of Naïve Bayes
is 0.643, which is greater than the 0.571 of logistic regression.

c) Explain the Confusion Matrix for your model selected in (a). [10 marks]
Ans: For the Naïve Bayes model, with yes as the positive class, TP = 8, FN = 1,
FP = 4 and TN = 1:
TP = 8 FN = 1
FP = 4 TN = 1
Of the 9 actual yes instances, 8 were classified correctly (TP) and 1 was
misclassified as no (FN). Of the 5 actual no instances, 1 was classified
correctly (TN) and 4 were misclassified as yes (FP). Overall, (8 + 1)/14 ≈ 0.643
of the instances were classified correctly.

d) Imagine the yes class has 4 instances instead of 9 and the no class has 10 instances
instead of 5, which technique can be considered to increase the success (yes)
instances while maintaining the failure (no) instances. [4 marks]
Ans: Synthetic Minority Oversampling Technique (SMOTE) / oversampling of the
minority (yes) class.
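
The simplest form of oversampling can be sketched as below. Note that this is plain random oversampling, which duplicates existing minority rows, and not SMOTE itself, which synthesises new rows by interpolating between numeric neighbours. The rows are hypothetical and carry only the class label.

```python
import random
from collections import Counter


def random_oversample(rows, minority, label_index=-1, seed=0):
    """Duplicate random minority rows until both classes have equal counts."""
    rng = random.Random(seed)
    minority_rows = [r for r in rows if r[label_index] == minority]
    majority_count = len(rows) - len(minority_rows)
    shortfall = majority_count - len(minority_rows)
    extra = [rng.choice(minority_rows) for _ in range(shortfall)]
    return list(rows) + extra


# Hypothetical imbalanced data: 4 yes instances and 10 no instances.
rows = [("yes",)] * 4 + [("no",)] * 10
balanced = random_oversample(rows, minority="yes")
print(Counter(r[-1] for r in balanced))
```

The no instances are left untouched, which matches the requirement in (d) to increase the yes instances while maintaining the no instances.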


[Fig. 1: WEKA output for Naïve Bayes. Fig. 2: WEKA output for Logistic Regression. Figures not reproduced.]

Question 5. [25 Marks]

A. Assume that the support vector machine (SVM) classifier is applied on a given dataset
and the output from the classifier benchmarked against the actual labels of the dataset
is depicted in the following table:
Actual Label   Y  Y  Y  N  N  N  N  Y  N  N  N
SVM Output     Y  Y  N  Y  N  Y  N  Y  Y  Y  Y

a) Provide a general overview of the confusion matrix in a tabular form showing the true
positives (TP), true negatives (TN), false positives (FP) and false negatives (FN).
[4 marks]

ANS: CONFUSION MATRIX

            Predicted Y   Predicted N
Actual Y    TP            FN
Actual N    FP            TN

b) Create a confusion matrix in a tabular form for the output from the classifier and the
actual dataset labels. [7 marks]
ANS:
            Predicted Y   Predicted N
Actual Y    TP = 3        FN = 1
Actual N    FP = 5        TN = 2

c) From the confusion matrix in (b), compute the following performance measures by
showing the step-by-step procedure involved in arriving at your results.
i. Accuracy = (TP + TN) / (TP + TN + FP + FN) [2 marks]
= (3 + 2) / (3 + 2 + 5 + 1) = 5/11 ≈ 0.455
ii. Precision = TP / (TP + FP) [2 marks]
= 3 / (3 + 5) = 3/8 = 0.375
iii. Recall = TP / (TP + FN) [2 marks]
= 3 / (3 + 1) = 3/4 = 0.75
iv. F-measure = 2 * (precision * recall) / (precision + recall) [2 marks]
= 2 * (3/8 * 3/4) / (3/8 + 3/4) = (9/16) / (9/8) = 1/2 = 0.5
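
As a check on the hand calculation, the confusion matrix counts and the four measures can be recomputed from the two label sequences given in part A. The sketch treats Y as the positive class; the variable names are illustrative.

```python
actual = "Y Y Y N N N N Y N N N".split()
predicted = "Y Y N Y N Y N Y Y Y Y".split()

pairs = list(zip(actual, predicted))
tp = sum(a == "Y" and p == "Y" for a, p in pairs)  # actual Y, predicted Y
fn = sum(a == "Y" and p == "N" for a, p in pairs)  # actual Y, predicted N
fp = sum(a == "N" and p == "Y" for a, p in pairs)  # actual N, predicted Y
tn = sum(a == "N" and p == "N" for a, p in pairs)  # actual N, predicted N

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} F-measure={f_measure:.3f}")
```

The F-measure is the harmonic mean of precision and recall, so it always lies between the two values.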

B. Consider the following set of one-dimensional points: {6, 12, 18, 24, 30, 42, 48}.
For each of the following sets of initial centroids
a) {18, 45} [3 marks]
b) {15, 40} [3 marks]
create two clusters by assigning each point to the nearest centroid, and then calculate
the sum squared error for each set of two clusters after updating the centroids.

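Answer: a) Cluster 1 {6, 12, 18, 24, 30}  Cluster 2 {42, 48}

New Centroid 1: (6 + 12 + 18 + 24 + 30)/5 = 18
New Centroid 2: (42 + 48)/2 = 45
SSE = (144 + 36 + 0 + 36 + 144) + (9 + 9) = 360 + 18 = 378

Answer: b) Cluster 1 {6, 12, 18, 24}  Cluster 2 {30, 42, 48}

New Centroid 1: (6 + 12 + 18 + 24)/4 = 15
New Centroid 2: (30 + 42 + 48)/3 = 40
SSE = (81 + 9 + 9 + 81) + (100 + 4 + 64) = 180 + 168 = 348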
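
The assignment, centroid update and sum of squared errors for both sets of initial centroids can be verified with a short script. This is a single K-means iteration on the one-dimensional points from the question; the helper names are illustrative.

```python
POINTS = [6, 12, 18, 24, 30, 42, 48]


def assign(points, centroids):
    """Put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for x in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        clusters[nearest].append(x)
    return clusters


def update_and_sse(points, centroids):
    """One K-means step: assign, recompute the means, then compute the SSE."""
    clusters = assign(points, centroids)
    new_centroids = [sum(c) / len(c) for c in clusters]
    sse = sum((x - m) ** 2 for c, m in zip(clusters, new_centroids) for x in c)
    return clusters, new_centroids, sse


for label, initial in (("a", [18, 45]), ("b", [15, 40])):
    clusters, new_centroids, sse = update_and_sse(POINTS, initial)
    print(label, clusters, new_centroids, sse)
```

The set of centroids with the smaller SSE gives the tighter pair of clusters.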

TAKE HOME

Question 6:

Answer B:
Both sets of initial centroids coincide with the means of the clusters they produce
({18, 45} and {15, 40}), so updating the centroids leaves them unchanged. Additionally,
the clusters are well separated. Therefore, the K-means algorithm, when applied with
these initial centroids, converges without any further changes in the cluster
assignments. Hence, both sets of centroids represent stable solutions for this
specific dataset.

Answer C:
The output of the function is 1 only when A is 0 and B is 1. Visually, output class 1
consists of a single point (A=0, B=1), which can be separated by a straight line from
the three points of output class 0. Therefore, the function (NOT A) AND B is linearly
separable.

The output of the function is 1 only when A=0 and B=1 or A=1 and B=0. Visually, the
two points of output class 1, (A=0, B=1) and (A=1, B=0), lie on opposite corners of the
unit square, with the class 0 points on the other two corners. These classes cannot be
separated by a single straight line (hyperplane) in the input space. Therefore, the
function (A XOR B) AND (A OR B), which reduces to A XOR B, is not linearly separable.
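
The two conclusions in Answer C can be illustrated with a small search for a separating line w1*A + w2*B + b > 0 over the four Boolean inputs. The search range of integer weights from -2 to 2 is an arbitrary choice for this sketch. Finding a line shows separability; finding none within the range is consistent with, but does not by itself prove, non-separability.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def not_a_and_b(a, b):
    """(NOT A) AND B on 0/1 inputs."""
    return (1 - a) & b


def xor_and_or(a, b):
    """(A XOR B) AND (A OR B) on 0/1 inputs."""
    return (a ^ b) & (a | b)


def find_separator(func, values=range(-2, 3)):
    """Return weights (w1, w2, b) of a line that reproduces func, or None."""
    for w1, w2, b in product(values, repeat=3):
        if all((w1 * a + w2 * x + b > 0) == bool(func(a, x)) for a, x in INPUTS):
            return w1, w2, b
    return None


print("(NOT A) AND B:", find_separator(not_a_and_b))
print("(A XOR B) AND (A OR B):", find_separator(xor_and_or))
```

For the second function the search returns None, as expected for XOR, whose two classes sit on opposite diagonals of the unit square.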

Appendix A: Attribute Information of Absenteeism Dataset
1. Individual identification (ID)
2. Reason for absence (ICD).
Absences attested by the International Code of Diseases (ICD) stratified into 21 categories (I to XXI)
as follows:
I Certain infectious and parasitic diseases
II Neoplasms
III Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism
IV Endocrine, nutritional and metabolic diseases
V Mental and behavioural disorders
VI Diseases of the nervous system
VII Diseases of the eye and adnexa
VIII Diseases of the ear and mastoid process
IX Diseases of the circulatory system
X Diseases of the respiratory system
XI Diseases of the digestive system
XII Diseases of the skin and subcutaneous tissue
XIII Diseases of the musculoskeletal system and connective tissue
XIV Diseases of the genitourinary system
XV Pregnancy, childbirth and the puerperium
XVI Certain conditions originating in the perinatal period
XVII Congenital malformations, deformations and chromosomal abnormalities
XVIII Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified
XIX Injury, poisoning and certain other consequences of external causes
XX External causes of morbidity and mortality
XXI Factors influencing health status and contact with health services.

And 7 categories without ICD: patient follow-up (22), medical consultation (23), blood donation (24),
laboratory examination (25), unjustified absence (26), physiotherapy (27), dental consultation (28).
3. Month of absence
4. Day of the week (Monday (2), Tuesday (3), Wednesday (4), Thursday (5), Friday (6))
5. Seasons
6. Transportation expense
7. Distance from Residence to Work (kilometers)
8. Service time
9. Age
10. Work load Average/day
11. Hit target
12. Disciplinary failure (yes=1; no=0)
13. Education (high school (1), graduate (2), postgraduate (3), master and doctor (4))
14. Son (number of children)
15. Social drinker (yes=1; no=0)
16. Social smoker (yes=1; no=0)
17. Pet (number of pets)
18. Weight
19. Height
20. Body mass index
21. Absenteeism time in hours (target)
