MMLA Assignment01 FT201070
Assignment 01
Name: Sandeep Kumar Pallo (FT201070)
1. Find one good source(s) of data (on the Web) that you feel provide useful
research data for the purpose of business analytics (can be in any context).
Answer the following questions.
a. Provide a snapshot of the web site explaining the variables in the data and a brief
discussion of why this data would be potentially useful to a manager or a policy
maker.
The data analysts have collected 2013 sales data for 1559 products across 10 stores of
Big Mart located in different cities. The target is to build a predictive model and find out
the sales of each product at a particular store. Big Mart will try to understand the
properties of products and stores which play a key role in increasing sales.
Item_Fat_Content has 2 levels (Regular, Low), which can be coded as a categorical variable (1, 2).
The model is built using 1 continuous variable (Item_MRP) and 5 categorical variables
(Item_Fat_Content, Item_Type, Outlet_Size, Outlet_Location_Type, Outlet_Type) as independent
variables to predict Item_Outlet_Sales.
Model
Sales = β0 + β1 Item_Fat_Content + β2 Item_Type + β3 Item_MRP + β4 Outlet_Size
+ β5 Outlet_Location_Type + β6 Outlet_Type + ε_id
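The coding of the categorical predictors can be sketched as follows. This is a minimal illustration, not the method used in the assignment: it uses 0/1 dummy coding with one baseline level dropped, which is a common alternative to the (1, 2) coding mentioned above. The rows, their values, and the helper name one_hot are hypothetical and are not taken from the Big Mart file.

```python
# Hypothetical rows; the values are illustrative, not taken from the dataset.
rows = [
    {"Item_Fat_Content": "Low", "Outlet_Size": "Small", "Item_MRP": 120.5},
    {"Item_Fat_Content": "Regular", "Outlet_Size": "Medium", "Item_MRP": 48.0},
    {"Item_Fat_Content": "Low", "Outlet_Size": "High", "Item_MRP": 210.0},
]


def one_hot(rows, column):
    """Dummy-code one column, dropping the first sorted level as the baseline.

    Returns (names, matrix): one 0/1 column per non-baseline level.
    """
    levels = sorted({r[column] for r in rows})
    kept = levels[1:]  # the first level is absorbed into the intercept
    names = [f"{column}_{level}" for level in kept]
    matrix = [[1 if r[column] == level else 0 for level in kept] for r in rows]
    return names, matrix


fat_names, fat_dummies = one_hot(rows, "Item_Fat_Content")
size_names, size_dummies = one_hot(rows, "Outlet_Size")

print(fat_names, fat_dummies)
print(size_names, size_dummies)
```

With dummy coding, each β on a categorical predictor is read as the difference in expected sales relative to the dropped baseline level.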
c. How will you estimate your proposed model?
I will read the dataset into the tool and then code the Fat Content column. I will use a multiple
linear regression. The betas of the model will be estimated by ordinary least squares, which
minimizes the sum of squared residuals. The fit will then be assessed with the root mean square
error (RMSE): the lower the RMSE, the better the model fits the data.
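A minimal sketch of the estimation step, assuming the betas are fitted by ordinary least squares (which minimizes the sum of squared residuals, and therefore the RMSE on the fitting data). The data below are hypothetical and are not from the Big Mart file; NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical data: 6 observations, 2 predictors (not from the Big Mart file).
X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0],
              [4.0, 1.0], [5.0, 0.0], [6.0, 1.0]])
y = np.array([3.1, 6.9, 7.2, 10.8, 11.1, 15.0])

# Add a column of ones so that beta[0] plays the role of the intercept β0.
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: choose beta to minimize the sum of squared residuals.
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# RMSE is computed afterwards, as a measure of fit in the units of y.
residuals = y - X_design @ beta
rmse = float(np.sqrt(np.mean(residuals ** 2)))

print("betas:", beta)
print("RMSE:", rmse)
```

Because RMSE is expressed in the units of the dependent variable, it is most informative when compared with the typical size of that variable or with the RMSE of a competing model.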
d. How can you improve this model, if you have access to more data? What data
would you like? Be specific
2. You are a data scientist assigned with the task of understanding how
customer banking habits contribute to revenues and profitability. The bank
has customer age and bank account information.
a. Plot the distribution of total revenue. What do you infer from such a distribution?
The distribution is right-skewed. The largest share of customers, approx. 58%, lies at 0.
The revenue increases slowly, indicating that the profit is minimal.
From the descriptive statistics we can find that the mean of Revenue is 2.16 whereas the median
is 1.175. This means some of the customers have generated very high revenue which has shifted
the mean towards the right. But 50 percent of the customers generate Revenue less than 1.175.
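The relationship between the mean and the median in a right-skewed distribution can be illustrated with a short sketch. The revenue values below are hypothetical and are chosen only to show the pattern; they are not the bank data.

```python
import statistics

# Hypothetical revenues: most customers near zero, a few very large values.
revenue = [0.0, 0.0, 0.1, 0.5, 1.0, 1.2, 1.5, 2.0, 9.0, 15.0]

mean_rev = statistics.mean(revenue)
median_rev = statistics.median(revenue)

# A mean well above the median is a typical sign of right skew:
# a few large values pull the mean up but leave the median unchanged.
is_right_skewed = mean_rev > median_rev

print("mean:", mean_rev, "median:", median_rev, "right-skewed:", is_right_skewed)
```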
Based on the inference above, the dependent variable here is a continuous variable. The
appropriate model for this is a Multiple Linear regression.
Model
Model Output:
The model seems to be statistically significant. RMSE is approx. 3.05, measured in the same
units as revenue, which I interpret as a good fit, meaning that the observed data points lie
close to the model’s predicted values. The predictors Balance total, Offers, Card, Loan, and
Insurance are statistically significant.
d. Based on your estimation results, what are your conclusions regarding the effects
of the various predictors on the profitability of a future customer of the bank?
What suggestions / recommendations do you have for the bank?
Conclusion regarding the effects of various predictors:
From the figure above, we can see that the t value is lowest for LOAN. The variable is
statistically significant, and its beta is the highest among all the predictors. It indicates
that a 1-unit increase in LOAN (a customer availing of a loan) is associated with an increase
in total revenue of 2.273 units, with everything else held constant.
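The "everything else remaining constant" reading of the LOAN coefficient can be sketched as follows. Only the value 2.273 comes from the estimation results above. The intercept, the BALANCE_TOTAL coefficient, the variable names, and the treatment of LOAN as a 0/1 indicator are assumptions made for illustration.

```python
# Only the LOAN coefficient (2.273) comes from the text; the intercept and the
# BALANCE_TOTAL coefficient are hypothetical placeholders.
coefficients = {"intercept": 0.5, "LOAN": 2.273, "BALANCE_TOTAL": 0.001}


def predict_revenue(loan, balance_total):
    """Predicted revenue from a linear model with the coefficients above."""
    return (coefficients["intercept"]
            + coefficients["LOAN"] * loan
            + coefficients["BALANCE_TOTAL"] * balance_total)


# Two customers who are identical except for the loan (assumed 0/1 indicator).
without_loan = predict_revenue(loan=0, balance_total=1000.0)
with_loan = predict_revenue(loan=1, balance_total=1000.0)

# The gap equals the LOAN coefficient, whatever the other values are.
difference = with_loan - without_loan
print("difference in predicted revenue:", round(difference, 3))
```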