ML SP24 Mid Term Exam - Solution
Note: Solve the questions on the separately provided answer sheet. Attempt all questions.
Question No. 1 (CLO-1(SO-1)) (03+04+06 = 13 Marks)
a- Please fill in the table below with information relevant to Market Basket Analysis.
Rules                        Confidence   Support
{Mango, Plum} -> Dates       1            0.4
{Plum} -> {Mango, Dates}     0.67         0.4
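The transaction data behind the table is not reproduced here, so the following is only a minimal sketch of how support and confidence are computed. The five transactions (including the item "Apple") are hypothetical, chosen so that the results match the table values; `support` and `confidence` are illustrative helper names.

```python
# Hypothetical transactions (NOT the exam's dataset), chosen to match the table.
transactions = [
    {"Mango", "Plum", "Dates"},
    {"Mango", "Plum", "Dates"},
    {"Plum", "Apple"},
    {"Mango", "Apple"},
    {"Dates"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)

rule_support = support({"Mango", "Plum", "Dates"})
conf_rule_1 = confidence({"Mango", "Plum"}, {"Dates"})
conf_rule_2 = confidence({"Plum"}, {"Mango", "Dates"})

print(f"support = {rule_support:.2f}")
print(f"confidence {{Mango, Plum}} -> Dates = {conf_rule_1:.2f}")
print(f"confidence {{Plum}} -> {{Mango, Dates}} = {conf_rule_2:.2f}")
```

Both rules cover the same three items, so they share one support value; only the antecedent support in the denominator differs.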
b- The FP-Growth tree for a dataset has been computed and depicted below, along with the
frequent itemset table. Determine the association rules between frequent items whose
confidence exceeds the 75% threshold.
c- Apply the ECLAT algorithm to the following dataset with a minimum support count of 2.
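The dataset for this part is not reproduced here, so the sketch below runs ECLAT on hypothetical transactions (items A to D, transaction IDs 1 to 5) purely to illustrate the procedure: convert to a vertical layout (item to set of transaction IDs), then intersect the ID sets depth-first. The function name `eclat` and the data are assumptions, and minimum support is treated as a count of 2.

```python
# Hypothetical horizontal dataset (NOT the exam's data): tid -> items.
horizontal = {
    1: {"A", "B", "C"},
    2: {"A", "C"},
    3: {"B", "C"},
    4: {"A", "B", "C"},
    5: {"D"},
}

# Vertical layout used by ECLAT: item -> set of transaction IDs.
vertical = {}
for tid, items in horizontal.items():
    for item in items:
        vertical.setdefault(item, set()).add(tid)

def eclat(tidlists, min_support, prefix=(), out=None):
    """Depth-first search; the support of an itemset is the size of its tid-set."""
    if out is None:
        out = {}
    items = sorted(tidlists)
    for i, item in enumerate(items):
        tids = tidlists[item]
        if len(tids) < min_support:
            continue
        itemset = prefix + (item,)
        out[itemset] = len(tids)
        # Conditional tid-lists: intersect with every later item.
        suffix = {}
        for other in items[i + 1:]:
            common = tids & tidlists[other]
            if len(common) >= min_support:
                suffix[other] = common
        if suffix:
            eclat(suffix, min_support, itemset, out)
    return out

frequent = eclat(vertical, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, count)
```

On this toy data, item D appears only once and is pruned, while {A, B, C} survives with a support count of 2.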
Question No. 2 (CLO-2(SO-2,4)) (05+05+05+02 = 17 Marks)
a- Decision Tree: Imagine you're conducting a wildlife survey in a particular region, and you've
identified four main animal species: dogs, cats, monkeys, and sparrows. After careful
observation, you've recorded the following occurrences: 12 dogs, 18 cats, zero monkeys,
and 5 sparrows. Suppose you are constructing a Decision Tree using the ID3 algorithm to
analyze the diversity and distribution of these species. Compute both the Gini index and
entropy for each class label.
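As a check on the arithmetic, here is a minimal sketch that computes the class probabilities, the per-class terms, and the overall Gini index and entropy from the counts in the question. It assumes base-2 logarithms and the usual convention that a class with zero count contributes 0 to the entropy; the variable names are illustrative.

```python
import math

counts = {"dogs": 12, "cats": 18, "monkeys": 0, "sparrows": 5}
total = sum(counts.values())  # 35 observations

probs = {label: c / total for label, c in counts.items()}

# Per-class terms: p^2 for Gini, -p * log2(p) for entropy (0 when p == 0).
gini_terms = {label: p ** 2 for label, p in probs.items()}
entropy_terms = {
    label: (-p * math.log2(p) if p > 0 else 0.0) for label, p in probs.items()
}

gini = 1 - sum(gini_terms.values())
entropy = sum(entropy_terms.values())

for label in counts:
    print(f"{label}: p={probs[label]:.3f}, "
          f"p^2={gini_terms[label]:.3f}, -p*log2(p)={entropy_terms[label]:.3f}")
print(f"Gini index = {gini:.3f}")
print(f"Entropy    = {entropy:.3f} bits")
```

With these counts the Gini index comes out at approximately 0.598 and the entropy at approximately 1.424 bits; the monkey class contributes nothing to either sum.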
b- Performance Evaluation Metrics: You are a student conducting a study on a medical
diagnostic model designed to predict whether a patient has a rare disease based on certain
test results. The model has been tested on 200 patient cases. Out of 50 patients who had the
disease, the model correctly identified 40 of them as positive. Out of the 150 patients who
did not have the disease, the model correctly identified 120 of them as negative. Construct
the Confusion Matrix according to the above scenario and calculate Accuracy and F1-Score.
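The scenario fixes all four cells of the confusion matrix: 40 true positives and 120 true negatives are stated, and the remainders follow by subtraction. A minimal sketch of the calculation, treating "has the disease" as the positive class (variable names are illustrative):

```python
# Counts taken from the scenario.
actual_positive = 50
actual_negative = 150
tp = 40                      # diseased patients correctly flagged
tn = 120                     # healthy patients correctly cleared
fn = actual_positive - tp    # 10 diseased patients missed
fp = actual_negative - tn    # 30 healthy patients wrongly flagged

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print("                 Predicted +   Predicted -")
print(f"Actual +         {tp:<13} {fn}")
print(f"Actual -         {fp:<13} {tn}")
print(f"Accuracy  = {accuracy:.3f}")
print(f"Precision = {precision:.3f}")
print(f"Recall    = {recall:.3f}")
print(f"F1-Score  = {f1:.3f}")
```

This gives an accuracy of 0.80 and an F1-Score of about 0.667, which is lower than the accuracy because the 30 false positives pull precision down to about 0.571.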
c- Underfitting & Overfitting: Briefly define “sweet spot” and complete the following table.
Sweet Spot:
The "sweet spot" refers to an optimal balance between two competing factors or variables,
where an ideal outcome or performance is achieved. In machine learning, the sweet spot
often refers to finding the right balance between bias and variance in a model to achieve
optimal predictive performance.
Bias   Variance   Fit Type               Example
Low    High       Over Fit               Model fits the training data too closely, capturing noise.
Low    Low        Best Fit / Ideal Fit   Model captures the underlying patterns perfectly.
d- Cross Validation: Mention any two differences between the hold-out method and the K-fold
method, along with an example.
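One way to see the contrast is to look at which samples end up in the test set under each scheme. The sketch below is a plain-Python illustration, not a library API: `holdout_split` and `kfold_splits` are hypothetical helper names, and the 80/20 ratio, k = 5, and 10 samples are assumed values.

```python
import random

def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Single split: one fixed train set and one fixed test set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n_samples * test_fraction))
    return idx[n_test:], idx[:n_test]

def kfold_splits(n_samples, k=5, seed=0):
    """k splits: each fold serves as the test set exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

train, test = holdout_split(10)
print("hold-out:", len(train), "train /", len(test), "test, evaluated once")

splits = list(kfold_splits(10, k=5))
for n, (tr, te) in enumerate(splits, start=1):
    print(f"fold {n}: train={sorted(tr)} test={sorted(te)}")
```

Hold-out trains and evaluates the model once, so the estimate depends on a single random split; K-fold trains k models and averages k scores, so every sample is used for testing exactly once at roughly k times the compute cost.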
Question No. 3 (CLO-2(SO-2,4)) (05+10 = 15 Marks)
a- Linear Regression: Emma recently bought a new car and decided to track the gallons of
gas she used on five of her business trips. She seeks assistance in predicting the consumed
gallons of gas. So, calculate the consumed gallons using the equation ŷ = -2125 + 1.08x for
2005, 2006, 2007, 2009, and 2013.
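The predictions follow from substituting each value into the given equation. A minimal sketch, assuming x is the calendar year exactly as listed in the question (the function name `predict_gallons` is illustrative):

```python
def predict_gallons(year):
    """Evaluate the given regression line y_hat = -2125 + 1.08 * x."""
    return -2125 + 1.08 * year

years = [2005, 2006, 2007, 2009, 2013]
predictions = {year: round(predict_gallons(year), 2) for year in years}

for year, gallons in predictions.items():
    print(f"x = {year}: predicted gallons = {gallons:.2f}")
```

The slope of 1.08 means each additional year adds 1.08 predicted gallons, from 40.40 for 2005 up to 49.04 for 2013.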
b- Naïve Bayes is a probabilistic machine learning algorithm that can be used in a wide variety
of classification tasks. Consider the following dataset and predict whether a person named
AMISH can buy a computer or not. The attribute values for AMISH are given below.
AMISH (X) = (age > 40, income = medium, student = no, credit_rating = fair)
P(Ci):
P(buys_computer = “yes”) = 9/14 = 0.643
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(X|Ci):
P(X|buys_computer = “yes”) = 0.33 × 0.44 × 0.22 × 0.66 = 0.021
P(X|Ci) * P(Ci):
P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.021 * 0.643 = 0.0135
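The arithmetic for the "yes" class can be reproduced with a few lines. This sketch uses only the prior and the four conditional probabilities that appear in the worked solution above; it does not compute the "no" class, whose values are not shown here, so it illustrates the scoring step rather than the final prediction.

```python
import math

# Values copied from the worked solution for buys_computer = "yes".
prior_yes = 0.643
conditionals_yes = [0.33, 0.44, 0.22, 0.66]

# Naive Bayes assumes conditional independence, so P(X|Ci) is a product.
likelihood_yes = math.prod(conditionals_yes)

# Unnormalised posterior score P(X|Ci) * P(Ci).
score_yes = likelihood_yes * prior_yes

# The worked solution rounds P(X|Ci) to three decimals before multiplying.
score_yes_rounded = round(likelihood_yes, 3) * prior_yes

print(f"P(X|yes)          = {likelihood_yes:.4f}")
print(f"P(X|yes) * P(yes) = {score_yes:.4f}")
print(f"with rounding     = {score_yes_rounded:.4f}")
```

The unrounded score is about 0.0136; rounding the likelihood to 0.021 first gives the 0.0135 shown above. The predicted class is whichever of the "yes" and "no" scores is larger.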
• What is the name of your semester project along with a brief description of the main aim
and idea of your proposed semester project?
Financial Fraud Detection
Financial fraud poses a significant threat to individuals and institutions, necessitating
robust and adaptive solutions for detection and prevention. This semester project aims to
develop a comprehensive Financial Fraud Detection System using machine learning
techniques.
• Which dataset do you plan to explore? (mention the main web source of your dataset)
Synthetic Financial Datasets For Fraud Detection (from Kaggle)
• What algorithms do you intend to apply for your project, and what platform
(language/library/tool) will you use?
SVM, Decision Trees
Python libraries: scikit-learn, TensorFlow, PyTorch, Keras, XGBoost, LightGBM
Tools: Jupyter Notebook, Pandas, NumPy