QUIZ1 Solution

The document consists of a series of multiple-choice questions and case study questions related to data analysis, regression models, and business analytics. It covers topics such as box plots, histograms, normalization, correlation, and the differences between descriptive and predictive analytics. Additionally, it includes a case study on predicting house prices and a logistic regression model for fraud detection, along with calculations and recommendations for improving model accuracy.

Uploaded by

bharat.goel

1) What does a box plot primarily represent?

A) Mean and standard deviation

B) Median, quartiles, and outliers

C) Mode and range

D) Skewness and kurtosis

2) What information does a histogram provide about a dataset?

a) The frequency distribution of continuous or discrete data

b) The correlation between two variables

c) The causal relationships between variables

d) The central tendency and spread of categorical data

3) What is the purpose of normalization in data preprocessing?

a) To reduce the data's dimensionality

b) To transform data into a common scale without distorting differences in the range of values

c) To handle missing values

d) To detect outliers

4) In a scatter plot, what does a strong positive correlation between two variables look like?

a) A cloud of points scattered randomly

b) A horizontal line of points

c) A vertical line of points

d) Points forming an upward-sloping line

5) What is a key difference between descriptive analytics and predictive analytics?

a) Descriptive analytics forecasts future events, while predictive analytics summarizes historical data.

b) Descriptive analytics explains what happened in the past, while predictive analytics forecasts what is
likely to happen in the future.

c) Descriptive analytics provides recommendations, while predictive analytics visualizes data.

d) Descriptive analytics optimizes business processes, while predictive analytics identifies patterns.

6) Which technique is commonly used in prescriptive analytics to provide actionable recommendations?

a) Regression Analysis

b) Simulation Models

c) Clustering

d) Data Summarization

7) How does Business Intelligence (BI) generally differ from Business Analytics?

a) BI focuses on historical reporting, while Business Analytics involves predictive and prescriptive
modeling.

b) BI involves data visualization, while Business Analytics includes real-time data analysis.

c) BI and Business Analytics are used interchangeably in all contexts.

d) BI focuses on forecasting, while Business Analytics is concerned with descriptive statistics.

8) In business analytics, which type of problem is best addressed by classification techniques?

a) Predicting continuous numerical values

b) Determining the impact of independent variables on a dependent variable

c) Assigning data points to predefined categories or classes

d) Estimating future trends based on historical patterns

Case Study Questions 9 - 16

Dependent Variable: House Prices (in $)

R-squared: 0.820
Adjusted R-squared: 0.815

Variable             Coefficient   Std. Error   t-Statistic   P-value
Intercept                  50000        10000          5.00     0.000
Number of Bedrooms         30000         5000          6.00     0.000
Square Footage               150           25          6.00     0.000
Distance to City          -20000         1000        -20.00     0.000

F-statistic: 150.00
Prob (F-statistic): 0.000
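
As a quick sanity check on the regression output above, each t-statistic should equal the coefficient divided by its standard error. The Python sketch below reproduces that column; the dictionary name and layout are illustrative and not part of the original output.

```python
# Coefficients and standard errors copied from the regression output above.
regression_output = {
    "Intercept": (50000, 10000),
    "Number of Bedrooms": (30000, 5000),
    "Square Footage": (150, 25),
    "Distance to City": (-20000, 1000),
}

# t-statistic = coefficient / standard error
t_statistics = {
    name: coef / std_err for name, (coef, std_err) in regression_output.items()
}

for name, t in t_statistics.items():
    print(f"{name:<20} t = {t:.2f}")
```

The printed values match the t-Statistic column, so the table is internally consistent.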

9) The coefficient for the variable "Distance to City" is -20000. What does this coefficient imply?
A) For every additional mile away from the city, the house price decreases by $20,000.
B) For every additional mile closer to the city, the house price decreases by $20,000.
C) For every additional mile away from the city, the house price increases by $20,000.
D) For every additional mile away from the city, the house price stays the same.

10) The R-squared value is 0.820. What does this tell you about the model?

A) The model explains 18% of the variance in house prices.

B) The model explains 82% of the variance in house prices.
C) The model fits the data perfectly.
D) The model does not explain any of the variance in house prices.

11) What does a p-value of 0.000 for all independent variables indicate?

A) All independent variables have no effect on house prices.

B) All independent variables are statistically insignificant.
C) All independent variables are statistically significant at any conventional level (e.g., 1%, 5%).
D) The model does not explain any variance in house prices.

12) Which variable has a larger impact on house prices according to the regression output?

A) Number of Bedrooms
B) Square Footage
C) Distance to City
D) Intercept

13) The adjusted R-squared value is 0.815, slightly lower than the R-squared of 0.820. Why is this?

A) Adjusted R-squared is always higher than R-squared.

B) Adjusted R-squared accounts for the number of predictors and penalizes adding irrelevant variables.
C) Adjusted R-squared only applies to time series models.
D) Adjusted R-squared is not relevant in this case.
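
The gap between the two values in Question 13 follows from the usual formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The quiz does not state n; the value n = 112 in the sketch below is an assumption, chosen only because it reproduces the reported 0.815 with p = 3.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared for a linear model that includes an intercept."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# n_obs = 112 is hypothetical (not given in the quiz); the model has 3 predictors.
adj = adjusted_r_squared(0.820, n_obs=112, n_predictors=3)
print(round(adj, 3))

# Same R-squared with one extra predictor: the penalty grows, so the value drops.
adj_extra = adjusted_r_squared(0.820, n_obs=112, n_predictors=4)
print(round(adj_extra, 3))
```

Holding R² fixed, every added predictor lowers the adjusted value, which is the penalty the question refers to.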

14) The model's intercept is 50000. What does this value represent?

A) The baseline house price with zero bedrooms, zero square footage, and zero distance from the city
center.
B) The baseline house price when all variables are at their mean value.
C) The maximum possible house price in the model.
D) It has no practical significance.

15) If a house has 4 bedrooms, 3000 square feet, and is located 5 miles from the city center, what would
the predicted price be?

16) The model’s F-statistic is 150. What does a high F-statistic suggest about the model?

A) The model explains all of the variance in house prices.

B) The overall regression model is statistically significant, and at least one predictor has a non-zero
coefficient.
C) The independent variables are not related to the dependent variable.
D) The model does not fit the data well.

17) A retail company wants to forecast monthly sales based on advertising spend, time of the year, and
number of competitors in the market. The company fits a linear regression model and finds that although
the model performs well on historical data, the forecasted sales for the upcoming months are consistently
underestimated.
Question:
What could be the possible reasons for this underestimation? Suggest a strategy to address this issue and
improve the accuracy of the forecasts.

18) A logistic regression model predicts a probability of 0.62 for an event. If the threshold for
classification is 0.5, what is the predicted class for this observation?

A) Class 0
B) Class 1
C) The model is inconclusive
D) Need more data to classify

19) A logistic regression model predicts 80 instances, out of which 60 were classified correctly. What
is the model's accuracy?
A) 0.60
B) 0.65
C) 0.75
D) 0.80

20) If a logistic regression model has a true positive rate of 0.85 and a false positive rate of 0.10, what is
the value of the area under the ROC curve (AUC)?
A) 0.10
B) 0.50
C) 0.85
D) 0.95

21) An online payment platform is using logistic regression to predict whether a transaction is
fraudulent. The dataset includes features such as transaction amount, transaction time, location, and
device type. The model outputs the probability of fraud for each transaction. (10 marks)

Questions:

1. For a particular transaction, the model predicts a probability of 0.85 for fraud. What decision
would the platform make if it sets the classification threshold at 0.7? Would you recommend
adjusting the threshold? Why or why not?
Detailed Solution:
Decision: Since the predicted probability of fraud (0.85) exceeds the classification threshold of
0.7, the platform would classify this transaction as fraudulent and flag it for further investigation.

Threshold Adjustment Recommendation: Whether to adjust the threshold depends on the business
goals:

If the platform wants to minimize false positives (legitimate transactions incorrectly flagged), it
should increase the threshold to make the fraud detection stricter.

If the platform wants to catch more fraud (reduce false negatives), it should lower the threshold to
flag more transactions as fraudulent.

Recommendation: If catching as much fraud as possible is the priority (which is common in fraud
detection), the threshold should not be raised. If, on the other hand, the platform is flagging too
many legitimate transactions, raising the threshold could help reduce the false positives.
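
The decision rule described in this answer can be sketched in a few lines of Python. The function name is illustrative, and the strict "greater than" comparison is an assumption based on the wording "exceeds"; some systems flag at "greater than or equal to" instead.

```python
def classify(probability: float, threshold: float) -> int:
    """Return 1 (flag as fraud) if the probability exceeds the threshold, else 0."""
    return int(probability > threshold)

flag_at_07 = classify(0.85, threshold=0.7)  # the case in this question
flag_at_09 = classify(0.85, threshold=0.9)  # same transaction, stricter threshold

print(flag_at_07, flag_at_09)
```

The same transaction is flagged at 0.7 but passes at 0.9, which is why the choice of threshold is a business decision rather than a property of the model.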
2. Explain the consequences of false positives (a legitimate transaction flagged as fraudulent) and
false negatives (a fraudulent transaction allowed to proceed). In this context, which is more
critical to minimize?

Ans:

False Positives (FP): A legitimate transaction is incorrectly flagged as fraud. The consequences are:
Customer frustration due to transaction delays or cancellations.

Possible loss of business or customer trust.

Operational costs associated with investigating false alerts.

False Negatives (FN): A fraudulent transaction is allowed to proceed. The consequences are:

Financial loss to the company due to undetected fraud.

Reputational damage, especially if the fraud is discovered later.

Increased fraud risk, as fraudsters may continue exploiting weaknesses in the system.
More Critical to Minimize: In this case, false negatives (fraudulent transactions allowed to
proceed) are more critical to minimize. Fraud results in direct financial loss and potential legal and
reputational risks, making it the higher-priority issue.

3. Precision-Recall Trade-off:
After evaluating the model, you find that the precision is 0.80 and the recall is 0.60. If the
company wants to minimize the number of legitimate transactions incorrectly flagged as fraud,
how would you adjust the model’s threshold?

Ans:

Precision (0.80): Of all transactions flagged as fraudulent, 80% were actually fraudulent.

Recall (0.60): Out of all actual fraud cases, only 60% were detected by the model.
Adjusting the Threshold:

If the company wants to minimize the number of legitimate transactions incorrectly flagged as
fraud, it needs to increase precision. This can be achieved by raising the threshold for flagging
fraud. By increasing the threshold, the model will become more conservative and flag fewer
transactions, thereby reducing false positives but potentially missing more actual fraud cases
(lowering recall).

Recommendation: Adjust the threshold upward if the cost of incorrectly flagging legitimate
transactions is higher than missing a few fraud cases.
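
The trade-off can be illustrated with a small sketch. The scores and labels below are hypothetical, invented purely for illustration, and are not the quiz's data; the point is only to show how raising the threshold moves precision and recall in opposite directions.

```python
# Hypothetical fraud scores and true labels (1 = fraud), for illustration only.
scores = [0.95, 0.90, 0.85, 0.75, 0.65, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

def evaluate(threshold):
    """Return (precision, recall, false_positives) at the given threshold."""
    flagged = [s > threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall, fp

low = evaluate(0.5)   # lower threshold: more transactions flagged
high = evaluate(0.8)  # higher threshold: fewer transactions flagged

print("threshold 0.5:", low)
print("threshold 0.8:", high)
```

On this toy data, raising the threshold from 0.5 to 0.8 removes both false positives but lowers recall from 0.8 to 0.6.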

4. Confusion Matrix Analysis:

The confusion matrix for the model is as follows:
True Positives: 250, True Negatives: 500, False Positives: 150, False Negatives: 100

Calculate the model’s accuracy and F1 score. What do these metrics reveal about the model's fraud
detection capability?
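Ans:

Accuracy = (TP + TN) / Total = (250 + 500) / 1000 = 0.75

Precision = TP / (TP + FP) = 250 / 400 = 0.625

Recall = TP / (TP + FN) = 250 / 350 ≈ 0.714

F1 Score = 2 × Precision × Recall / (Precision + Recall) = 2 × 0.625 × 0.714 / (0.625 + 0.714) ≈ 0.667

Interpretation:

Accuracy (0.75): The model correctly classifies 75% of the transactions overall. However, accuracy alone
may not fully reflect the model's performance in a fraud detection context, where false
negatives can be very costly.

F1 Score (0.667): The F1 score is lower than the accuracy because it ignores the 500 true negatives
and reflects only performance on the fraud class. Precision is 0.625 (150 of the 400 flagged
transactions were legitimate) and recall is 0.714 (100 of the 350 fraud cases were missed), so the
model's fraud detection could be improved on both counts.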
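
The accuracy and F1 figures quoted above can be reproduced directly from the four confusion-matrix counts. A minimal Python sketch, using only the numbers given in the question:

```python
# Confusion-matrix counts from the question.
tp, tn, fp, fn = 250, 500, 150, 100

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # share of flagged transactions that are fraud
recall = tp / (tp + fn)     # share of fraud cases that were flagged
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Note that F1 can also be written as 2·TP / (2·TP + FP + FN) = 500 / 750, which gives the same value.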

5. Business Decision:
Fraudulent transactions cost the company $1000 per case, while incorrectly flagging a legitimate
transaction costs $50. Based on the confusion matrix, calculate the total cost of the current
model’s predictions. How would you adjust the model to reduce overall costs?

Ans: Total cost for false positives = 150 × 50 = $7,500

Total cost for false negatives = 100 × 1,000 = $100,000

Total Cost: $7,500 + $100,000 = $107,500

Adjusting the Model:

Since the cost of false negatives is much higher than false positives, the model should prioritize
minimizing false negatives (i.e., detecting more fraud). To do this, lowering the threshold for
classifying fraud could help capture more fraudulent transactions, even if it slightly increases
the number of false positives.

The goal is to balance the trade-off between false positives and false negatives in a way that
minimizes the overall financial cost. Lowering the threshold would likely lead to a higher recall,
which would reduce the more expensive false negatives.
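
The cost calculation, and the direction of the recommended adjustment, can be sketched as follows. The break-even probability and the alternative error counts are illustrative assumptions, not results from the quiz: the break-even formula assumes the model's probabilities are well calibrated and that the two stated costs are the only ones involved.

```python
COST_FN = 1000  # cost of a missed fraudulent transaction (from the question)
COST_FP = 50    # cost of flagging a legitimate transaction (from the question)

def total_cost(false_positives: int, false_negatives: int) -> int:
    """Total cost of the model's errors under the two stated unit costs."""
    return false_positives * COST_FP + false_negatives * COST_FN

current = total_cost(false_positives=150, false_negatives=100)

# Hypothetical outcome of a lower threshold: 200 more FPs, 50 fewer FNs.
hypothetical = total_cost(false_positives=350, false_negatives=50)

# Flagging pays off when p * COST_FN > (1 - p) * COST_FP,
# i.e. when p exceeds COST_FP / (COST_FP + COST_FN). Assumes calibrated p.
break_even = COST_FP / (COST_FP + COST_FN)

print(current, hypothetical, round(break_even, 4))
```

Under these assumptions, even a large increase in false positives is worth accepting if it removes a moderate number of false negatives, which is consistent with the recommendation to lower the threshold.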

MCQ Answers:

1 – B, 2 – A, 3 – B, 4 – D, 5 – B, 6 – B, 7 – A, 8 – C, 9 – A, 10 – B, 11 – C, 12 – A, 13 – B, 14 – A

15) Predicted Price = 50000 + (30000 × 4) + (150 × 3000) + (-20000 × 5)

= 50,000 + 120,000 + 450,000 - 100,000 = $520,000
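
The same prediction can be written as a small function that applies the fitted coefficients from the regression output. The function and parameter names are illustrative.

```python
def predict_price(bedrooms: float, square_feet: float, miles_to_city: float) -> float:
    """Predicted house price in dollars, using the coefficients from the case study."""
    return 50000 + 30000 * bedrooms + 150 * square_feet - 20000 * miles_to_city

price = predict_price(bedrooms=4, square_feet=3000, miles_to_city=5)
print(f"${price:,.0f}")
```

Changing one argument at a time shows each coefficient's meaning; for example, one extra mile from the city lowers the prediction by $20,000.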

16 – B, 17 – (subjective; no answer given), 18 – B, 19 – C (60/80 = 0.75), 20 – C

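20) C - A single operating point does not determine the AUC exactly. Joining (0, 0), (0.10, 0.85),
and (1, 1) with straight lines gives an area of (1 + 0.85 - 0.10) / 2 = 0.875, and 0.85 is the
closest of the listed options.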
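
For reference, a hedged sketch of how an AUC estimate can be computed from a single operating point: the ROC curve is approximated by straight lines through (0, 0), the given (FPR, TPR) point, and (1, 1), and the area is found with the trapezoid rule. This is only an approximation, since the true AUC depends on the full curve, which the question does not provide.

```python
def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve given (fpr, tpr) points."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Endpoints of any ROC curve plus the single point from Question 20.
roc_points = [(0.0, 0.0), (0.10, 0.85), (1.0, 1.0)]
auc = trapezoid_auc(roc_points)
print(round(auc, 3))
```

The single-point estimate is 0.875, so among the listed options 0.85 is the nearest value.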
