ML SP24 Mid Term Exam - Solution

The document outlines the Mid-Term Examination for the Machine Learning course at COMSATS University Islamabad, detailing the exam structure, questions, and topics covered. It includes various tasks related to Market Basket Analysis, Decision Trees, Performance Evaluation Metrics, and algorithms like E-Clat and Naïve Bayes. Additionally, it describes a semester project focused on Financial Fraud Detection using machine learning techniques.


COMSATS University Islamabad, Wah Campus

Mid-Term Examination Spring 2024


Department of Computer Science

Program(s)/Classes: BCS 6 A Date: 20th April 2024


Subject: Machine Learning (CSC354) Maximum Marks: 50 Marks
Instructor Name(s): Prof. Dr Sheraz Anjum Total Time Allowed: 1.5 hr

Note: Solve the questions on separately provided Answer Sheet. Attempt all questions.
Question No. 1 (CLO-1(SO-1)) (03+04+06 = 13 Marks)
a- Please fill in the table below with information relevant to Market Basket Analysis.
Rules                         Confidence   Support
{Mango, Plum} -> {Dates}      1            0.4
{Plum} -> {Mango, Dates}      0.67         0.4
{Apricot} -> {Pear, Plum}     0            0

b- The FP-Growth tree for a dataset has been computed and depicted below, along with the
frequent-itemset table. Determine the association rules between frequent items using
confidence, with a confidence threshold of 75% or higher.

• Confidence of (B→E) = S(B∩E) / S(B) = 3/4 = 0.75
• Confidence of (E→B) = S(E∩B) / S(E) = 3/3 = 1.0
• Confidence of (J→C) = S(J∩C) / S(J) = 3/4 = 0.75
• Confidence of (C→J) = S(C∩J) / S(C) = 3/3 = 1.0
• Confidence of (B→J) = S(B∩J) / S(B) = 3/4 = 0.75
• Confidence of (J→B) = S(J∩B) / S(J) = 3/4 = 0.75
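As a cross-check, the confidence values above can be recomputed in Python from the support counts (the counts S(B) = 4, S(E) = 3, S(C) = 3, S(J) = 4 and the pairwise counts are read off the frequent-itemset table referenced in the question):

```python
# Confidence of a rule X -> Y from itemset support counts: conf = S(X ∪ Y) / S(X).
def confidence(support_xy, support_x):
    return support_xy / support_x

# (antecedent, consequent): (S(X ∪ Y), S(X)), taken from the solution above.
pairs = {
    ("B", "E"): (3, 4),
    ("E", "B"): (3, 3),
    ("J", "C"): (3, 4),
    ("C", "J"): (3, 3),
    ("B", "J"): (3, 4),
    ("J", "B"): (3, 4),
}
threshold = 0.75
for (x, y), (s_xy, s_x) in pairs.items():
    c = confidence(s_xy, s_x)
    status = "accepted" if c >= threshold else "rejected"  # rules at/above 75% are kept
    print(f"{x} -> {y}: confidence = {c:.2f} ({status})")
```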

c- Apply the E-Clat algorithm to the following dataset with a minimum support of 2.
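Since the exam's dataset is not reproduced in this text, the sketch below runs E-Clat's core step, intersecting vertical tid-lists, on a small hypothetical set of transactions (T1–T4 and items A–D are made up for illustration):

```python
from itertools import combinations

# Hypothetical transactions (the exam's actual dataset is not shown here).
transactions = {
    "T1": {"A", "B", "D"},
    "T2": {"B", "C"},
    "T3": {"A", "B", "C"},
    "T4": {"B", "D"},
}
min_support = 2

# E-Clat uses the vertical layout: item -> set of transaction IDs (tid-list).
tidlists = {}
for tid, items in transactions.items():
    for item in items:
        tidlists.setdefault(item, set()).add(tid)

# Frequent 1-itemsets, then grow candidates by intersecting tid-lists:
# support of an itemset is simply the size of its tid-list intersection.
frequent = {frozenset([i]): t for i, t in tidlists.items() if len(t) >= min_support}
level = frequent
while level:
    next_level = {}
    for (a, ta), (b, tb) in combinations(level.items(), 2):
        candidate, tids = a | b, ta & tb
        if len(candidate) == len(a) + 1 and len(tids) >= min_support:
            next_level[candidate] = tids
    frequent.update(next_level)
    level = next_level

for itemset, tids in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), "support =", len(tids))
```

On this toy data the frequent itemsets are the four single items plus {A, B}, {B, C}, and {B, D}.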

Question No. 2 (CLO-2(SO-2,4)) (05+05+05+02 = 17 Marks)

a- Decision Tree: Imagine you're conducting a wildlife survey in a particular region, and you've
identified four main animal species: dogs, cats, monkeys, and sparrows. After careful
observation, you've recorded the following occurrences: 12 dogs, 18 cats, zero monkeys,
and 5 sparrows. Suppose you are constructing a Decision Tree using the ID3 algorithm to
analyze the diversity and distribution of these species. Compute both the Gini index and
entropy for each class label.
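The two impurity measures can be computed directly from the class proportions; a minimal sketch (the zero-count monkey class contributes nothing to entropy, since 0·log 0 is taken as 0):

```python
from math import log2

# Species counts from the survey.
counts = {"dog": 12, "cat": 18, "monkey": 0, "sparrow": 5}
total = sum(counts.values())  # 35
probs = [c / total for c in counts.values()]

gini = 1 - sum(p ** 2 for p in probs)
entropy = -sum(p * log2(p) for p in probs if p > 0)  # skip p = 0 terms

print(f"Gini index: {gini:.3f}")       # ≈ 0.598
print(f"Entropy: {entropy:.3f} bits")  # ≈ 1.424
```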

b- Performance Evaluation Metrics: You are a student conducting a study on a medical
diagnostic model designed to predict whether a patient has a rare disease based on certain
test results. The model has been tested on 200 patient cases. Out of 50 patients who had the
disease, the model correctly identified 40 of them as positive. Out of the 150 patients who
did not have the disease, the model correctly identified 120 of them as negative. Construct
the Confusion Matrix according to the above scenario and calculate Accuracy and F1-
Score.
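From the scenario: 40 of 50 diseased patients are correctly flagged (TP = 40, FN = 10) and 120 of 150 healthy patients are correctly cleared (TN = 120, FP = 30). A short sketch of the arithmetic:

```python
# Confusion-matrix entries derived from the scenario above.
TP, FN = 40, 10    # diseased patients: correctly / incorrectly classified
TN, FP = 120, 30   # healthy patients: correctly / incorrectly classified

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy: {accuracy:.2f}")   # 0.80
print(f"F1-score: {f1:.3f}")         # ≈ 0.667
```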

c- Underfitting & Overfitting: Briefly define “sweet spot” and complete the following table.
Sweet Spot:
The "sweet spot" refers to an optimal balance between two competing factors or variables,
where an ideal outcome or performance is achieved. In machine learning, the sweet spot
often refers to finding the right balance between bias and variance in a model to achieve
optimal predictive performance.
Bias   Variance   Fit Type    Example
Low    High       Over Fit    Model fits the training data too closely, capturing noise.
Low    Low        Best Fit    Model captures the underlying patterns perfectly.
High   Low        Under Fit   Model fails to capture the underlying patterns in the data.

d- Cross Validation: Mention any Two differences between Hold-out method and K-fold
method along with an example.
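The key difference can be illustrated without any library: hold-out uses one fixed train/test split, so some samples are never tested, while k-fold rotates the test set so every sample is tested exactly once. A minimal sketch on 10 sample indices:

```python
# Minimal sketch: how hold-out and 5-fold partition 10 samples.
indices = list(range(10))

# Hold-out: one fixed 70/30 split; the 3 test samples never train the model.
split = int(0.7 * len(indices))
holdout_train, holdout_test = indices[:split], indices[split:]
print("Hold-out test set:", holdout_test)

# K-fold: each fold is the test set once, so every sample is tested exactly
# once and used for training in the other k - 1 rounds.
k = 5
fold_size = len(indices) // k
folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(k)]
for i, test_fold in enumerate(folds):
    train = [x for x in indices if x not in test_fold]
    print(f"Fold {i + 1}: test={test_fold}, train={train}")
```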

Question No. 3 (CLO-2(SO-2,4)) (05+10 = 15 Marks)

a- Linear Regression: Emma recently bought a new car and decided to track the gallons of
gas she used on five of her business trips. She seeks assistance in predicting the consumed
gallons of gas. So, calculate the consumed gallons using the equation ŷ = -2125 + 1.08x for
2005, 2006, 2007, 2009, and 2013.
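Plugging each year into the given regression equation ŷ = -2125 + 1.08x:

```python
# Prediction from the given fitted line: y-hat = -2125 + 1.08 * x.
def predict_gallons(year):
    return -2125 + 1.08 * year

for year in (2005, 2006, 2007, 2009, 2013):
    print(f"{year}: {predict_gallons(year):.2f} gallons")
# 2005 -> 40.40, 2006 -> 41.48, 2007 -> 42.56, 2009 -> 44.72, 2013 -> 49.04
```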

b- Naïve Bayes is a probabilistic machine learning algorithm that can be used in a wide variety
of classification tasks. Consider the following dataset and predict whether a person named
AMISH can buy a computer or not. The attribute values for AMISH are given below.

AMISH (X) = (age > 40 , income = medium, student = no, credit_rating = fair)

P(Ci ):
P(buys_computer = “yes”) = 9/14 = 0.643

P(buys_computer = “no”) = 5/14= 0.357

Compute P(X|Ci) for each class


P(age = “> 40” | buys_computer = “yes”) = 3/9 = 0.33

P(age = “> 40” | buys_computer = “no”) = 2/5 = 0.4

P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444

P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4

P(student = “no” | buys_computer = “yes”) = 2/9 = 0.22

P(student = “no” | buys_computer = “no”) = 4/5 = 0.8

P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667

P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

X = (age > 40 , income = medium, student = no, credit_rating = fair)

P(X|Ci) :
P(X|buys_computer = “yes”) = 0.33 x 0.444 x 0.22 x 0.667 = 0.021

P(X|buys_computer = “no”) = 0.4 x 0.4 x 0.8 x 0.4 = 0.051

P(X|Ci)*P(Ci) :
P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.021 * 0.643 = 0.0135

P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.051 * 0.357 = 0.0182

Therefore, X belongs to class (“buys_computer = NO”)
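The same Naïve Bayes arithmetic can be reproduced with exact fractions (the conditionals are the ones computed above; exact fractions give 0.0141 and 0.0183 rather than the rounded 0.0135 and 0.0182, and the prediction is “no” either way):

```python
# Naïve Bayes score for each class: P(X|C) * P(C), with conditional
# independence, for X = (age > 40, income = medium, student = no,
# credit_rating = fair).
priors = {"yes": 9 / 14, "no": 5 / 14}
likelihoods = {
    # P(age>40|C), P(income=medium|C), P(student=no|C), P(credit=fair|C)
    "yes": [3 / 9, 4 / 9, 2 / 9, 6 / 9],
    "no":  [2 / 5, 2 / 5, 4 / 5, 2 / 5],
}

scores = {}
for cls in priors:
    p_x_given_c = 1.0
    for p in likelihoods[cls]:
        p_x_given_c *= p
    scores[cls] = p_x_given_c * priors[cls]

for cls, score in scores.items():
    print(f"P(X|{cls}) * P({cls}) = {score:.4f}")
print("Prediction: buys_computer =", max(scores, key=scores.get))
```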

Question No. 4 (CLO-5(SO-2,3,4,5)) (05 Marks)

• What is the name of your semester project along with a brief description of the main aim
and idea of your proposed semester project?
Financial Fraud Detection
Financial fraud poses a significant threat to individuals and institutions, necessitating
robust and adaptive solutions for detection and prevention. This semester project aims to
develop a comprehensive Financial Fraud Detection System using machine learning
techniques.
• Which dataset do you plan to explore? (mention the main web source of your dataset)
Synthetic Financial Datasets For Fraud Detection (source: Kaggle)
• What algorithms do you intend to apply for your project, and what platform
(language/library/tool) will you use?
SVM, Decision Trees
Python: Libraries: scikit-learn, TensorFlow, PyTorch, Keras, XGBoost, LightGBM
Tools: Jupyter Notebook, Pandas, NumPy

