Example for Regression
Fourteen Twenty-Two Food Stores, Inc., is planning to expand its convenience store chain.
To aid in selecting locations for the new stores, it has collected weekly sales data from each
of its 23 stores. To help explain the variability in weekly sales, it has also collected
information describing four variables that it believes are related to sales. The variables are
defined as follows:
SALES: average weekly sales for each store in thousands of dollars
AUTOS: average weekly auto traffic volume in thousands of cars
ENTRY: ease of entry/exit, measured on a scale of 1 to 100
ANNINC: average annual household income for the area in thousands of dollars
DISTANCE: distance in miles from the store to the nearest supermarket
Regression Statistics
Multiple R           0.915034
R Square             0.837288
Adjusted R Square    0.186441
Standard Error       0.73646
Observations         23
ANOVA
              df    SS         MS        F         Significance
Regression    4     2861495    715374    102.39    0.002
Residual      18    125761     6896.7
Total         22    2987256
              Coefficients    Standard deviation    t Stat    P-value
Intercept     175.37          92.62                 1.89      0.075
AUTOS         -0.028          0.315                 -0.09     0.929
ENTRY         3.775           1.272                 2.97      0.008
ANNINC        1.990           4.510                 0.44      0.664
DISTANCE      212.41          28.090                7.56      0.000
Yes, the regression is significant as a whole. The F-statistic in the ANOVA table tests the
overall significance of the regression model. Its value is 102.39, and the associated p-value
is 0.002. Because the p-value is below the conventional significance level of 0.05, we reject
the null hypothesis that the coefficients of all four predictor variables (AUTOS, ENTRY,
ANNINC, and DISTANCE) are zero. At least one of the predictor variables therefore has a
statistically significant linear relationship with sales, which justifies the overall
significance of the regression model.
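As a rough check on the ANOVA figures, the mean squares and the F-statistic can be recomputed
from the sums of squares and degrees of freedom. The sketch below uses only values printed in
the ANOVA table; the variable names are illustrative.

```python
# Values copied from the ANOVA table.
ss_regression, df_regression = 2861495, 4
ss_residual, df_residual = 125761, 18

# Mean square = sum of squares / degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F = explained mean square / unexplained mean square.
f_stat = ms_regression / ms_residual

print(f"MS regression = {ms_regression:.1f}")
print(f"MS residual   = {ms_residual:.1f}")
print(f"F             = {f_stat:.2f}")
```

The recomputed F agrees with the reported 102.39. The residual mean square comes out near
6986.7 rather than the 6896.7 printed in the table, which looks like a transposed digit in the
output.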
b. Provide the best fitting regression equation for the sales of Fourteen Twenty-Two
Food Stores, Inc.
The regression equation is given by: SALES = 175.37 - 0.028(AUTOS) + 3.775(ENTRY) +
1.990(ANNINC) + 212.41(DISTANCE)
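The fitted equation can be turned into a small prediction helper. This is a minimal sketch:
the function name predict_sales and the input values in the usage line are hypothetical and
are not taken from the store data.

```python
def predict_sales(autos, entry, anninc, distance):
    """Predicted average weekly sales (thousands of dollars) from the fitted equation."""
    return (175.37
            - 0.028 * autos      # AUTOS: weekly auto traffic, thousands of cars
            + 3.775 * entry      # ENTRY: ease of entry/exit, scale 1 to 100
            + 1.990 * anninc     # ANNINC: household income, thousands of dollars
            + 212.41 * distance) # DISTANCE: miles to the nearest supermarket

# Hypothetical candidate site, for illustration only.
example = predict_sales(autos=10, entry=50, anninc=20, distance=1)
print(f"Predicted weekly sales: {example:.2f} thousand dollars")
```

Predictions for sites whose characteristics lie outside the range of the 23 observed stores
would be extrapolations and should be treated with extra caution.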
c. Comment on the significance of each predictor. How would it impact the sales? What
measures should be taken by the store manager to have a successful run of the new
stores?
The significance of each predictor can be determined by examining their individual p-values:
AUTOS: The coefficient for AUTOS is -0.028, and its p-value is 0.929. Because the p-value
is well above 0.05, the average weekly auto traffic volume (AUTOS) is not a
significant predictor of sales once the other variables are in the model. The data give
no evidence that changes in auto traffic affect sales.
ENTRY: The coefficient for ENTRY is 3.775, and its p-value is 0.008. The p-value <
0.05, suggesting that the ease of entry/exit (ENTRY) is a significant
predictor of sales. Holding the other predictors fixed, each one-point increase in the
ENTRY score (easier access to the store) is associated with about 3.775 thousand dollars
more in average weekly sales. The store manager should prioritize locations with easier
entry and exit to potentially boost sales.
ANNINC: The coefficient for ANNINC is 1.990, and its p-value is 0.664. Because the
p-value is well above 0.05, the average annual household income for the area
(ANNINC) is not a significant predictor of sales. The data give no evidence that
household income affects sales once the other variables are accounted for.
DISTANCE: The coefficient for DISTANCE is 212.41, and its p-value is 0.000. The
p-value is much less than 0.05, indicating that the distance from the store to the
nearest supermarket (DISTANCE) is a highly significant predictor of sales. The
coefficient is positive: holding the other predictors fixed, each additional mile from the
nearest supermarket is associated with about 212.41 thousand dollars more in average
weekly sales. Store locations farther from supermarkets are likely to generate higher
sales, which is consistent with facing less competition.
To have a successful run of the new stores, the store manager should focus on selecting
locations with easier access (higher ENTRY scores) and greater distance from the nearest
supermarket (higher DISTANCE). These are the only two predictors with a statistically
significant effect on sales in the regression analysis.
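The screening of predictors can be sketched in a few lines: each t statistic is the
coefficient divided by its standard deviation, and a predictor is flagged when its p-value
falls below 0.05. The numbers are copied from the coefficient table; the names predictors,
t_stats, and significant are illustrative.

```python
# (coefficient, standard deviation, p-value) copied from the coefficient table.
predictors = {
    "AUTOS": (-0.028, 0.315, 0.929),
    "ENTRY": (3.775, 1.272, 0.008),
    "ANNINC": (1.990, 4.510, 0.664),
    "DISTANCE": (212.41, 28.090, 0.000),
}
ALPHA = 0.05  # conventional significance level used in the text

# t statistic = coefficient / standard deviation of the coefficient.
t_stats = {name: coef / sd for name, (coef, sd, _) in predictors.items()}

# A predictor is flagged as significant when its p-value is below ALPHA.
significant = [name for name, (_, _, p) in predictors.items() if p < ALPHA]

for name, t in t_stats.items():
    print(f"{name:<9} t = {t:6.2f}  significant: {name in significant}")
```

The recomputed t statistics match the reported values to two decimals, and only ENTRY and
DISTANCE pass the 0.05 threshold.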
d. How do you rate the predictive power of the model? Is it sufficient for generalization
of the model? Discuss.
The model's predictive power can be assessed using the R-squared value, which is a
measure of how well the model explains the variability in the dependent variable (SALES)
based on the independent variables (AUTOS, ENTRY, ANNINC, and DISTANCE).
The R-squared value of the model is 0.837288, which means approximately 83.73% of the
variability in weekly sales can be explained by the predictor variables. This is a relatively
high R-squared value, indicating that the model fits the data well and the predictor variables
collectively have a strong relationship with sales.
However, the reported adjusted R-squared value is 0.186441, which is considerably lower
than the R-squared value. The adjusted R-squared takes into account the number of predictor
variables and penalizes the model for including irrelevant or redundant variables. Taken at
face value, the large difference between R-squared and adjusted R-squared suggests that
some of the predictor variables (AUTOS, ANNINC) are not adding much value to the model's
predictive power, which agrees with their high p-values. A gap this large is unusual with
only four predictors and 23 observations, so the reported figure should be rechecked against
the original output before relying on it.
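Adjusted R-squared is conventionally computed as 1 - (1 - R-squared)(n - 1)/(n - k - 1),
where n is the number of observations and k the number of predictors. The sketch below
applies that standard formula to the reported R-squared as a check; it assumes the output
used this formula, and the variable names are illustrative.

```python
r_squared = 0.837288  # reported R Square
n = 23                # observations
k = 4                 # predictors: AUTOS, ENTRY, ANNINC, DISTANCE

# Standard adjustment: penalize R-squared for the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(f"Adjusted R-squared = {adjusted_r_squared:.4f}")
```

Under this formula the adjusted value is about 0.80, not the reported 0.186441, so the
reported figure is worth rechecking against the original output.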
Regarding the generalization of the model, it's important to be cautious. While the model
shows a strong relationship between the predictor variables and sales based on the available
data, generalization to new stores and locations requires additional validation and testing. The
model should be tested on new data from different stores and locations to assess its
performance and predictive accuracy before making business decisions based solely on the
current model.
e. The residuals do not follow homoscedasticity. Would it be a major decision-making
factor in the opening of the store? Explain with proper reasoning and data support.
Residuals are the differences between the observed sales values and the predicted sales values
obtained from the regression model. Homoscedasticity refers to the assumption that the
residuals should have a constant variance across all levels of the predictor variables. If
the residuals exhibit non-constant variance (heteroscedasticity), it can affect the reliability
and validity of the regression model's results.
In this case, the Standard Error of the regression model is 0.73646, which indicates the
typical deviation of the actual sales values from the predicted values. Under
heteroscedasticity, however, the variability of the residuals changes across levels of the
predictor variables, so this single figure understates the error for some stores and
overstates it for others. The coefficient standard deviations, t statistics, and p-values
also become less reliable, which makes predictions for new locations less trustworthy.
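One informal way to look for non-constant variance is to compare the spread of the residuals
at low and high fitted values, in the spirit of a Goldfeld-Quandt comparison. The sketch
below is a minimal illustration that uses made-up fitted values and residuals, because the
store-level data are not given here; variance_ratio is a hypothetical helper, not a library
function.

```python
from statistics import variance

def variance_ratio(fitted, residuals):
    """Residual variance in the upper half of fitted values divided by the lower half."""
    pairs = sorted(zip(fitted, residuals))  # order observations by fitted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]      # residuals at the lowest fitted values
    high = [r for _, r in pairs[-half:]]    # residuals at the highest fitted values
    return variance(high) / variance(low)

# Hypothetical values for illustration only, NOT the stores' data.
fitted = [100, 150, 200, 250, 300, 350, 400, 450]
residuals = [2, -3, 1, -2, 20, -25, 30, -15]

ratio = variance_ratio(fitted, residuals)
print(f"Variance ratio (high / low fitted values): {ratio:.1f}")
```

A ratio far from 1 hints that the residual spread grows or shrinks with predicted sales. A
formal test, or a plot of residuals against fitted values, would be needed before drawing
conclusions.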