QUIZ1 Solution
b) To transform data into a common scale without distorting differences in the range of values
d) To detect outliers
4) In a scatter plot, what does a strong positive correlation between two variables look like?
a) Descriptive analytics forecasts future events, while predictive analytics summarizes historical data.
b) Descriptive analytics explains what happened in the past, while predictive analytics forecasts what is
likely to happen in the future.
d) Descriptive analytics optimizes business processes, while predictive analytics identifies patterns.
a) Regression Analysis
b) Simulation Models
c) Clustering
d) Data Summarization
7) How does Business Intelligence (BI) generally differ from Business Analytics?
a) BI focuses on historical reporting, while Business Analytics involves predictive and prescriptive
modeling.
b) BI involves data visualization, while Business Analytics includes real-time data analysis.
R-squared: 0.820
Adjusted R-squared: 0.815
Coefficient
F-statistic: 150.00
Prob (F-statistic): 0.000
9) The coefficient for the variable "Distance to City" is -20000. What does this coefficient imply?
A) For every additional mile away from the city, the house price decreases by $20,000.
B) For every additional mile closer to the city, the house price decreases by $20,000.
C) For every additional mile away from the city, the house price increases by $20,000.
D) For every additional mile away from the city, the house price stays the same.
10) The R-squared value is 0.820. What does this tell you about the model?
11) What does a p-value of 0.000 for all independent variables indicate?
12) Which variable has a larger impact on house prices according to the regression output?
A) Number of Bedrooms
B) Square Footage
C) Distance to City
D) Intercept
13) The adjusted R-squared value is 0.815, slightly lower than the R-squared of 0.820. Why is this?
14) The model's intercept is 50000. What does this value represent?
A) The baseline house price with zero bedrooms, zero square footage, and zero distance from the city
center.
B) The baseline house price when all variables are at their mean value.
C) The maximum possible house price in the model.
D) It has no practical significance.
15) If a house has 4 bedrooms, 3000 square feet, and is located 5 miles from the city center, what would
the predicted price be?
16) The model’s F-statistic is 150. What does a high F-statistic suggest about the model?
17) A retail company wants to forecast monthly sales based on advertising spend, time of the year, and
number of competitors in the market. The company fits a linear regression model and finds that although
the model performs well on historical data, the forecasted sales for the upcoming months are consistently
underestimated.
Question:
What could be the possible reasons for this underestimation? Suggest a strategy to address this issue and
improve the accuracy of the forecasts.
18) A logistic regression model predicts a probability of 0.62 for an event. If the threshold for
classification is 0.5, what is the predicted class for this observation?
A) Class 0
B) Class 1
C) The model is inconclusive
D) Need more data to classify
19) A logistic regression model predicts 80 instances, out of which 60 were classified correctly. What
is the model's accuracy?
A) 0.60 B) 0.65 C) 0.75 D) 0.80
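As a quick check on Q19, accuracy is the number of correct predictions divided by the total number of predictions. A minimal Python sketch of that arithmetic follows; the function name `accuracy` is an illustrative choice, not something defined in the quiz.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of predictions that were classified correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total

# Values taken from Q19: 60 correct out of 80 predictions.
q19_accuracy = accuracy(60, 80)
print(q19_accuracy)  # 0.75
```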
20) If a logistic regression model has a true positive rate of 0.85 and a false positive rate of 0.10, what is
the value of the area under the ROC curve (AUC)?
A) 0.10
B) 0.50
C) 0.85
D) 0.95
Q21) An online payment platform is using logistic regression to predict whether a transaction is
fraudulent. The dataset includes features such as transaction amount, transaction time, location, and
device type. The model outputs the probability of fraud for each transaction. (10 marks)
Questions:
1. For a particular transaction, the model predicts a probability of 0.85 for fraud. What decision
would the platform make if it sets the classification threshold at 0.7? Would you recommend
adjusting the threshold? Why or why not?
Detailed Solution:
Decision: Since the predicted probability of fraud (0.85) exceeds the classification threshold of
0.7, the platform would classify this transaction as fraudulent and flag it for further investigation.
Threshold Adjustment Recommendation: Whether to adjust the threshold depends on the business
goals:
If the platform wants to minimize false positives (legitimate transactions incorrectly flagged), it
should increase the threshold to make the fraud detection stricter.
If the platform wants to catch more fraud (reduce false negatives), it should lower the threshold to
flag more transactions as fraudulent.
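The threshold decision described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a transaction is flagged when its probability is greater than or equal to the threshold (the question does not say how ties are handled); the function name `classify` and the alternative thresholds 0.5 and 0.9 are hypothetical.

```python
def classify(probability: float, threshold: float) -> int:
    """Return 1 (flag as fraud) if probability >= threshold, else 0.

    Using >= rather than > at the boundary is an assumption.
    """
    return 1 if probability >= threshold else 0

# Values from the question: predicted probability 0.85, threshold 0.7.
decision = classify(0.85, 0.7)
print(decision)  # 1, so the transaction is flagged as fraudulent

# Hypothetical alternative thresholds, to show how the decision can change.
for t in (0.5, 0.7, 0.9):
    print(t, classify(0.85, t))
```

With the stricter hypothetical threshold of 0.9 the same transaction would no longer be flagged, which is the trade-off the recommendation above describes.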
Ans :
False Positives (FP): A legitimate transaction is incorrectly flagged as fraud. The consequences are:
Customer frustration due to transaction delays or cancellations.
False Negatives (FN): A fraudulent transaction is allowed to proceed. The consequences are:
Increased fraud risk, as fraudsters may continue exploiting weaknesses in the system.
More Critical to Minimize: In this case, false negatives (fraudulent transactions allowed to
proceed) are more critical to minimize. Fraud results in direct financial loss and potential legal and
reputational risks, making it the higher-priority issue.
3. Precision-Recall Trade-off:
After evaluating the model, you find that the precision is 0.80 and the recall is 0.60. If the
company wants to minimize the number of legitimate transactions incorrectly flagged as fraud,
how would you adjust the model’s threshold?
Ans :
Precision (0.80): Out of all transactions flagged as fraudulent, 80% were actually fraudulent.
Recall (0.60): Out of all actual fraud cases, only 60% were detected by the model.
Adjusting the Threshold:
If the company wants to minimize the number of legitimate transactions incorrectly flagged as
fraud, it needs to increase precision. This can be achieved by raising the threshold for flagging
fraud. By increasing the threshold, the model will become more conservative and flag fewer
transactions, thereby reducing false positives but potentially missing more actual fraud cases
(lowering recall).
Recommendation: Adjust the threshold upward if the cost of incorrectly flagging legitimate
transactions is higher than missing a few fraud cases.
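The effect of raising the threshold can be illustrated with a small Python sketch. The scores and labels below are hypothetical, made up purely to show the direction of the trade-off; they are not the quiz's data and do not reproduce the 0.80 and 0.60 figures above.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when scores >= threshold are flagged as fraud."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical model scores and true labels (1 = fraud, 0 = legitimate).
scores = [0.95, 0.90, 0.80, 0.75, 0.65, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

low = precision_recall(scores, labels, 0.5)   # lower threshold
high = precision_recall(scores, labels, 0.8)  # higher threshold
print(low)
print(high)
```

On this toy data, moving the threshold from 0.5 to 0.8 removes both false positives (precision rises) but misses one more fraud case (recall falls).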
Calculate the model’s accuracy and F1 score. What do these metrics reveal about the model's fraud
detection capability?
Interpretation:
Accuracy (0.75): The model correctly classifies 75% of the transactions overall. However, accuracy alone
may not fully reflect the model's performance in an imbalanced fraud detection context, where false
negatives can be very costly.
F1 Score (0.666): The F1 score is lower than the accuracy, which suggests that the model is
struggling to balance precision and recall. This metric shows that the model's overall performance,
especially in detecting fraud (TP vs FP and FN), could be improved.
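For reference, both metrics can be computed directly from confusion-matrix counts. The sketch below uses hypothetical counts (TP = 50, FP = 25, FN = 25, TN = 100), chosen only because they give values close to those reported above; the quiz's actual confusion matrix does not appear in this section, so these numbers should not be read as the original data.

```python
def accuracy_and_f1(tp: int, fp: int, fn: int, tn: int):
    """Accuracy and F1 score from confusion-matrix counts.

    Assumes tp + fp > 0 and tp + fn > 0 so the ratios are defined.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1

# Hypothetical counts, not the quiz's confusion matrix.
acc, f1 = accuracy_and_f1(tp=50, fp=25, fn=25, tn=100)
print(round(acc, 3), round(f1, 3))  # 0.75 0.667
```

Because F1 ignores true negatives, it can sit well below accuracy when legitimate transactions dominate the data.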
5. Business Decision:
Fraudulent transactions cost the company $1000 per case, while incorrectly flagging a legitimate
transaction costs $50. Based on the confusion matrix, calculate the total cost of the current
model’s predictions. How would you adjust the model to reduce overall costs?
The goal is to balance the trade-off between false positives and false negatives in a way that
minimizes the overall financial cost. Lowering the threshold would likely lead to a higher recall,
which would reduce the more expensive false negatives.
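The cost calculation can be sketched as follows. The per-case costs ($1000 per missed fraud, $50 per wrongly flagged legitimate transaction) come from the question; the false positive and false negative counts are hypothetical placeholders, since the confusion matrix is not reproduced in this section.

```python
COST_FALSE_NEGATIVE = 1000  # missed fraud, from the question
COST_FALSE_POSITIVE = 50    # legitimate transaction flagged, from the question

def total_cost(fp: int, fn: int) -> int:
    """Total cost of a model's errors in dollars."""
    return fp * COST_FALSE_POSITIVE + fn * COST_FALSE_NEGATIVE

# Hypothetical counts at the current threshold.
current = total_cost(fp=25, fn=25)
# Hypothetical counts after lowering the threshold: more FPs, fewer FNs.
lowered = total_cost(fp=45, fn=15)

print(current)  # 26250
print(lowered)  # 17250
```

One false negative costs as much as 20 false positives (1000 / 50), so under these assumed counts accepting 20 extra false positives to prevent 10 missed frauds lowers the total cost.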
MCQ Answers:
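20) C - A single operating point (false positive rate 0.10, true positive rate 0.85) does not fix
the AUC exactly. Joining (0, 0), (0.10, 0.85), and (1, 1) with straight lines gives
(0.85 + 0.90) / 2 = 0.875, so 0.85 is the closest option.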
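As a rough check on Q20, the AUC can be estimated from the one reported operating point by assuming the ROC curve is piecewise linear through (0, 0), (FPR, TPR), and (1, 1). That linearity is an assumption, since a real ROC curve is built from many thresholds, and the function name `single_point_auc` is illustrative.

```python
def single_point_auc(tpr: float, fpr: float) -> float:
    """Area under the piecewise-linear ROC through (0,0), (fpr,tpr), (1,1).

    Triangle on [0, fpr] plus trapezoid on [fpr, 1] simplifies to
    (tpr + 1 - fpr) / 2.
    """
    return (tpr + 1 - fpr) / 2

# Values from Q20: true positive rate 0.85, false positive rate 0.10.
auc = single_point_auc(tpr=0.85, fpr=0.10)
print(round(auc, 3))  # 0.875
```

Under this assumption the estimate is 0.875, which is nearer to option C (0.85) than to any other listed choice.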