Business Analytics - Assignment

1. A bank built a logistic regression model to predict the probability of customers signing up for direct payroll deposits based on their average monthly balance. The model found a positive relationship between balance and probability of signup. 2. A factory manager built a linear regression model to predict overhead costs based on machine hours and production runs. The model was a good fit and explained 86.6% of variation in costs. 3. A study found that higher budget Bollywood movies were more likely to fail at the box office based on a logistic regression model. The break-even budget was estimated to be 101.3 million rupees.

Uploaded by

Videhi Bajaj
Copyright
© All Rights Reserved

Business Analytics – Assignment 09/Oct/2019

______________________________________________________________________________

I. ‘FNB Bank’ is looking to increase the number of customers who use a ‘Direct payroll’ deposit.
The Management is considering a new marketing campaign that will require each branch
manager to call up customers who do not have this deposit. Because of the time and cost
involved, the management wants to focus their efforts only on those customers who have the
highest probability of signing up for the Direct payroll deposit.
The management believes that the ‘average monthly balance’ in a customer’s savings account
may be a useful predictor of whether or not the customer will sign up for the Direct Payroll
deposit.
To investigate the relation between these two variables, the bank built a model using the
average monthly balance (in hundreds of $) and whether the customer signed up for the
payroll deposit (coded 1 if they signed and 0 if not).
A portion of the output is as follows:
Parameter          Coef.      SE Coef.    P
Constant           -2.6335    0.7985      0.001
Monthly Balance     0.22018   0.09002     0.0001
(a) Write the Equation that this model represents.

$$P(Y=1) = \frac{e^{-2.6335 + 0.22018\,(\text{Monthly Balance})}}{1 + e^{-2.6335 + 0.22018\,(\text{Monthly Balance})}}$$
Here P(Y=1) is the probability of customers signing up for the Pay Roll deposit
And Z = -2.6335 + 0.22018( Average Monthly Balance )
(b) Estimate the probability that customers with an average monthly balance of $1000
will sign up for a Direct payroll deposit.
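A balance of $1000 is 10 in hundreds of $, so
Z = -2.6335 + 0.22018(10) = -0.4317
P(Y=1) = e^(-0.4317) / (1 + e^(-0.4317)) = 0.39372046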
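As a quick check of part (b), the calculation can be sketched in Python. The function name `signup_probability` is illustrative and not part of the assignment; the coefficients are taken from the model output, and the balance is entered in hundreds of $.

```python
import math

# Coefficients from the model output (balance is in hundreds of $).
B0 = -2.6335
B1 = 0.22018

def signup_probability(balance_hundreds: float) -> float:
    """Logistic probability of signing up for a given average monthly balance."""
    z = B0 + B1 * balance_hundreds
    return math.exp(z) / (1.0 + math.exp(z))

# A $1000 balance corresponds to 10 in the model's units.
p = signup_probability(10)
print(round(p, 4))  # 0.3937
```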

(c) Suppose the bank wants to contact customers who have a 0.5 or higher probability of
opening a Direct payroll deposit, what is the average monthly balance required to
achieve this probability?

Using the logit form of the model:

ln{p(y=1)/(1-p(y=1))} = b0 + b1 (monthly balance)
ln{0.5/(1-0.5)} = -2.6335 + 0.22018 (monthly balance)
ln(1) = -2.6335 + 0.22018 (monthly balance)
0 + 2.6335 = 0.22018 (monthly balance)
Monthly balance = 2.6335/0.22018 = 11.96066854 (in hundreds of $)
Monthly balance = 11.96066854 × 100 = $1196.07
Therefore, the minimum average monthly balance required to reach a probability of 0.5 is
about $1196.07.
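The same inversion works for any target probability, not only 0.5. A minimal sketch, assuming the coefficients from the output in part I; the helper name `balance_for_probability` is hypothetical.

```python
import math

# Coefficients from the model output (balance is in hundreds of $).
B0 = -2.6335
B1 = 0.22018

def balance_for_probability(p: float) -> float:
    """Average monthly balance in $ at which the model predicts probability p."""
    logit = math.log(p / (1.0 - p))   # ln(odds); equals 0 when p = 0.5
    return (logit - B0) / B1 * 100.0  # convert hundreds of $ to $

print(round(balance_for_probability(0.5), 2))  # 1196.07
```

Because the coefficient on balance is positive, any customer with a balance above this threshold has a predicted probability above the target.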

II. Bendrix company manufactures various types of parts for automobiles. The manager of the
factory wants to get a better understanding of overhead costs. Over the past 36 months, the
manager has tracked total overhead costs. To help explain these, he has also collected data
on two variables that relate to the work done at the factory: Machine Hours (number of
machine hours in a month) and Production Runs (number of runs during a month). Refer
to the file ‘Overhead costs.xls’.
a. Build a model to help the manager predict Overhead costs.

Overhead Cost = 3996.678 +43.536(Machine Hours) + 883.617(Production Runs)


b. Is your model a good fit to predict overhead costs? Explain
H0: Machine Hours and Production Runs have no effect on Overhead Costs.
H1: At least one of Machine Hours and Production Runs has an effect on Overhead Costs.
The F statistic is 107.03 with a p-value of 0.001, which is less than alpha, so we reject H0:
at least one of the two predictors has a significant effect on Overhead Costs. Also, R^2
is 86.64%, which means that 86.64% of the variation in Overhead Costs is explained by Machine
Hours and Production Runs.
Thus, the model is a good fit to predict Overhead Costs.

c. What would the predicted overhead cost be for 60 production runs and 1500
Machine hours.
Y = 3996.678 + 43.536(1500) + 883.617(60)
Y = 122,317.70
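The prediction can be reproduced with a few lines of Python. This is a sketch using the coefficients exactly as written in part (a); the function name `predict_overhead` is illustrative. With those coefficients the result is about 122,317.70, and rounding or truncating the coefficients before multiplying shifts it by a few units.

```python
# Coefficients of the fitted model from part (a).
INTERCEPT = 3996.678
COEF_MACHINE_HOURS = 43.536
COEF_PRODUCTION_RUNS = 883.617

def predict_overhead(machine_hours: float, production_runs: float) -> float:
    """Predicted monthly overhead cost from the linear model."""
    return (INTERCEPT
            + COEF_MACHINE_HOURS * machine_hours
            + COEF_PRODUCTION_RUNS * production_runs)

print(round(predict_overhead(1500, 60), 2))
```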

III. Box Office success of Bollywood movies was analysed using several variables. A model was
developed to predict success (1) or failure (0) of a movie using the movie Budget as an
independent variable. A sample of the output is shown below:
Parameter    Parameter Estimate    P value
Budget       -0.016                0.001
Constant      1.621                0.002

(a) Are higher budget movies more likely to fail at the box office? Explain.
$$P(Y=1) = \frac{e^{1.621 - 0.016\,(\text{Budget})}}{1 + e^{1.621 - 0.016\,(\text{Budget})}}$$
From the equation above, the coefficient of Budget is -0.016, so the higher the budget,
the lower the likelihood that the movie succeeds.
Budget         50          100         200         350
Z              0.821       0.021       -1.579      -3.979
Odds           2.272771    1.021222    0.206181    0.018704
Probability    0.694449    0.50525     0.170937    0.018361
Yes, higher budget movies are more likely to fail. As the table above shows, the
probability of success falls as the budget rises, i.e. there is a negative relationship
between budget and the success of the movie.
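The Z, Odds and Probability rows can be regenerated for any budget. A minimal sketch, assuming the two coefficients from the output above; `success_stats` is a hypothetical helper name.

```python
import math

# Coefficients from the model output.
CONSTANT = 1.621
COEF_BUDGET = -0.016

def success_stats(budget: float):
    """Return (z, odds, probability of success) for a given budget."""
    z = CONSTANT + COEF_BUDGET * budget
    odds = math.exp(z)            # odds of success = e^z
    prob = odds / (1.0 + odds)    # probability of success
    return z, odds, prob

for budget in (50, 100, 200, 350):
    z, odds, prob = success_stats(budget)
    print(f"{budget:>4}  z={z:7.3f}  odds={odds:.6f}  p={prob:.6f}")
```

Since the coefficient on Budget is negative, each row printed has lower odds and a lower probability than the one before it.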

(b) Calculate the budget for which box office success and failure are equally likely.
Here, P(Success = 1) = 0.5
Odds = P/(1-P) = 1 (i.e. no. of favourable outcomes / no. of unfavourable outcomes)
Budget = (1.621 - ln(Odds))/0.016 = 1.621/0.016 = 101.3125
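
The budget formula comes from solving the logit equation for Budget. A short derivation, using only the coefficients from the output table:

```latex
\begin{aligned}
\ln\!\left(\frac{P}{1-P}\right) &= 1.621 - 0.016\,\text{Budget} \\
\text{Budget} &= \frac{1.621 - \ln(\text{Odds})}{0.016},
  \qquad \text{Odds} = \frac{P}{1-P} \\
P = 0.5 \;\Rightarrow\; \text{Odds} = 1,\ \ln(1) = 0
  \;&\Rightarrow\; \text{Budget} = \frac{1.621}{0.016} = 101.3125
\end{aligned}
```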

IV. The ‘Restaurant Customer Satisfaction’ Survey was conducted during the period 2016-17.
The data is available in the Excel file ‘Restaurant Ratings’. The variable ‘Type’ indicates if the
restaurant is ‘Italian’ or ‘Chinese’. ‘Price’ indicates the average amount paid per person for
dinner. ‘Score’ reflects the customer’s overall satisfaction, with higher values indicating
greater satisfaction.
a. Develop a model to show how ‘overall customer satisfaction’(Score) is related to
‘average price of the meal’ and ‘Type of restaurant’. Is the ‘Type of restaurant’ a
significant factor in overall customer satisfaction? Paste your output and answer the
question.
H0: There is no relationship between Type of Restaurant, Average Price of the
meal and Overall Customer Satisfaction.
H1: There is a relationship between Type of Restaurant, Average Price of the meal
and Overall Customer Satisfaction.
Here, P < alpha, so we reject H0.
Yes, the type of restaurant is a significant factor in Overall Customer
Satisfaction, as the p-value of the restaurant type, 0.017, is less than alpha, i.e.
0.05.
b. Use the model to predict the satisfaction score of a Chinese restaurant that has an
average meal price of $20.
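Satisfaction Score = 67.40 + 0.573(Avg. Meal Price) + 3.038(Italian Restaurant),
where Italian Restaurant = 1 for an Italian restaurant and 0 for a Chinese restaurant.
Satisfaction Score of a Chinese restaurant with avg. meal price of $20
= 67.40 + 0.573(20) + 3.038(0) = 78.86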

c. How much would the predicted score have changed for an Italian restaurant?
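Satisfaction Score = 67.40 + 0.573(Avg. Meal Price) + 3.038(Italian Restaurant)
Satisfaction Score of an Italian restaurant with avg. meal price of $20
= 67.40 + 0.573(20) + 3.038(1) = 81.898
The predicted score would be 3.038 points higher than for the Chinese restaurant.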
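
Parts (b) and (c) differ only in the value of the dummy variable for restaurant type. A minimal sketch, assuming the fitted equation above with Italian coded 1 and Chinese coded 0; the function name `predict_score` is illustrative.

```python
def predict_score(avg_price: float, is_italian: bool) -> float:
    """Predicted satisfaction score from the fitted regression equation."""
    italian_dummy = 1 if is_italian else 0  # Chinese is the baseline (0)
    return 67.40 + 0.573 * avg_price + 3.038 * italian_dummy

chinese = predict_score(20, is_italian=False)
italian = predict_score(20, is_italian=True)

# Scores for Chinese and Italian, then the difference between them.
print(round(chinese, 3), round(italian, 3), round(italian - chinese, 3))
```

The difference between the two predictions equals the dummy coefficient, 3.038, regardless of the meal price.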

V. A bank is interested in predicting which customers will respond to its direct marketing
campaign to open a Fixed Deposit with the bank. The response variable Y = 1 implies that a
customer will open an FD after the campaign. The Bank built a model using ‘Job’ as a
predictor variable. The kind of jobs was categorized into 5 levels ; Blue Collar, Management,
Self Employed, Unemployed and Other. The model output is shown as follows:
Variables        Parameter Estimates    Significance
Blue Collar      -0.627                 0.0001
Self Employed    -0.285                 0.002
Management        0.060                 0.003
Unemployed       -0.264                 0.0001
Constant         -1.916                 0.001
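
The output lists coefficients for four of the five job levels, which suggests that ‘Other’ is the reference category with an implied coefficient of 0. Under that assumption (it is not stated in the assignment), the logit for a customer is the constant plus the coefficient of that customer's job level. The sketch below uses hypothetical names (`JOB_COEF`, `fd_probability`).

```python
import math

CONSTANT = -1.916

# Coefficients from the output; "Other" is ASSUMED to be the reference level.
JOB_COEF = {
    "Blue Collar": -0.627,
    "Self Employed": -0.285,
    "Management": 0.060,
    "Unemployed": -0.264,
    "Other": 0.0,
}

def fd_probability(job: str) -> float:
    """Predicted probability of opening an FD for a given job category."""
    z = CONSTANT + JOB_COEF[job]
    return 1.0 / (1.0 + math.exp(-z))  # same as e^z / (1 + e^z)

for job in JOB_COEF:
    print(f"{job:<14} {fd_probability(job):.4f}")
```

For ‘Other’ every dummy is 0, so the probability depends on the constant alone and comes out at roughly 0.128.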

a) Write the Equation that this model represents.


b) Calculate the probability of FD subscription for the job category ‘Others’
c) From the following classification table, calculate the Sensitivity, Specificity and
overall accuracy of this model.

            Predicted 0    Predicted 1
Actual 0    3871           129
Actual 1    452            69
(BONUS) If you were responsible for this direct marketing campaign to open FDs in the bank,
which part of the classification matrix would be critical to you?
