Deliverytime 3

Uploaded by

KamalSilvas
Multiple Linear Regression on Delivery Time Data

Delivery Time Data

The goal is to predict the amount of time required by the route driver to service the vending machines in an outlet.
Response: delivery time (y).
Predictors: number of cases of product stocked (x1) and distance walked by the route driver (x2).

library(readxl)
DeliveryTime <- read_excel("DeliveryTime.xlsx")

colnames(DeliveryTime)<-c("Time","NumberofCases", "Distance")
summary(DeliveryTime)

## Time NumberofCases Distance


## Min. : 8.00 Min. : 2.00 Min. : 36.0
## 1st Qu.:13.75 1st Qu.: 4.00 1st Qu.: 150.0
## Median :18.11 Median : 7.00 Median : 330.0
## Mean :22.38 Mean : 8.76 Mean : 409.3
## 3rd Qu.:21.50 3rd Qu.:10.00 3rd Qu.: 605.0
## Max. :79.24 Max. :30.00 Max. :1460.0

pairs(DeliveryTime)

[Figure: scatterplot matrix of Time, NumberofCases and Distance]

The scatter plots of y vs x1 and y vs x2 show linear relationships. In addition, the x1 vs x2 plot also shows a linear
relationship, which points to possible multicollinearity between the predictors.
If there is only one (or a few) dominant regressor, or if the regressors operate nearly independently, the matrix
of scatterplots is most useful. However, when several important regressors are themselves interrelated, then
these scatter diagrams can be very misleading.

cor(DeliveryTime)

## Time NumberofCases Distance


## Time 1.0000000 0.9646146 0.8916701
## NumberofCases 0.9646146 1.0000000 0.8242150
## Distance 0.8916701 0.8242150 1.0000000
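The strength of the collinearity between the two predictors can be put on a scale with a variance inflation factor (VIF). The VIF is not computed in the original analysis; the lines below are a minimal sketch that assumes only the correlation between NumberofCases and Distance printed above. With exactly two predictors, the VIF of either one is 1/(1 - r12^2).

```r
# Correlation between NumberofCases and Distance, copied from the cor() output
r12 <- 0.8242150

# With two predictors, regressing one on the other gives R^2 = r12^2,
# so both predictors share the same variance inflation factor
vif <- 1 / (1 - r12^2)
round(vif, 3)
```

A value of about 3.1 is below the commonly quoted rule-of-thumb thresholds of 5 or 10, so the collinearity seen in the scatterplot matrix looks moderate rather than severe.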

cor.test(as.numeric(DeliveryTime$Distance),DeliveryTime$Time)

##
## Pearson’s product-moment correlation
##
## data: as.numeric(DeliveryTime$Distance) and DeliveryTime$Time
## t = 9.4465, df = 23, p-value = 2.214e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7666503 0.9515461
## sample estimates:

## cor
## 0.8916701

cor.test(DeliveryTime$NumberofCases,DeliveryTime$Time)

##
## Pearson’s product-moment correlation
##
## data: DeliveryTime$NumberofCases and DeliveryTime$Time
## t = 17.546, df = 23, p-value = 8.22e-15
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9202275 0.9845031
## sample estimates:
## cor
## 0.9646146
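The t statistics reported by cor.test() can be recovered from the correlations alone, using t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 = 23 degrees of freedom. The following is a small check that assumes only the values printed above; the variable names are illustrative.

```r
n <- 25  # number of observations, so df = n - 2 = 23

# Correlations with Time, copied from the cor() output
r <- c(Distance = 0.8916701, NumberofCases = 0.9646146)

# t statistic for H0: true correlation = 0, and its two-sided p-value
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)

round(t_stat, 3)
p_val
```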

Both predictor variables have significant and strong positive linear correlations with the response.
A 3D scatter plot with the fitted regression plane can be obtained as below.

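library(scatterplot3d)

Del_lm2<-lm(Time~NumberofCases+Distance,data=DeliveryTime)

# The original call was cut off at the page margin. The axis labels below are
# taken from the plot; any further arguments of the original call are unknown.
sp<-scatterplot3d(DeliveryTime$NumberofCases,DeliveryTime$Distance,DeliveryTime$Time,
                  xlab = "No. of Cases", ylab = "Distance", zlab = "Delivery time")

# add the fitted regression plane
sp$plane3d(Del_lm2, lty.box = "solid")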


[Figure: 3D scatter plot of delivery time against No. of Cases and Distance, with the fitted regression plane]
Modeling
Form of the linear model:
y = β0 + β1 x1 + β2 x2 + ϵ

colnames(DeliveryTime)

## [1] "Time" "NumberofCases" "Distance"

First, fit the simple linear regression model using NumberofCases only.

Del_lm1<-lm(Time~NumberofCases,data=DeliveryTime)

Now add the second variable, Distance, to obtain the multiple linear regression model.

Del_lm2<-lm(Time~NumberofCases+Distance,data=DeliveryTime)

To get the ANOVA table:

anova(Del_lm2)

## Analysis of Variance Table


##
## Response: Time
## Df Sum Sq Mean Sq F value Pr(>F)
## NumberofCases 1 5382.4 5382.4 506.619 < 2.2e-16 ***
## Distance 1 168.4 168.4 15.851 0.0006312 ***
## Residuals 22 233.7 10.6
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

The ANOVA table in the R output shows the sum of squares for each predictor separately (added in sequence), but the overall
test needs the total regression sum of squares. Let's obtain it as below.
Null hypothesis: all the regression coefficients are zero, β1 = β2 = 0.
Alternative hypothesis: at least one regression coefficient (β1 or β2) is nonzero.
SSR = 5382.4 + 168.4 = 5550.8 with df = 2
SSRes = 233.7 with df = 25 - 2 - 1 = 22
F statistic = (5550.8/2)/(233.7/22) = 261.27
Critical value of F = F(2, 22) at alpha = 0.05 = 3.44

qf(0.95,2,22)

## [1] 3.443357
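The hand calculation above can be reproduced in a few lines. This is a sketch that assumes only the rounded sums of squares printed by anova(Del_lm2), so the result agrees with the F-statistic that R reports up to rounding; the variable names are illustrative.

```r
# Sequential sums of squares copied from the anova(Del_lm2) table
ss_cases <- 5382.4
ss_dist  <- 168.4
ss_res   <- 233.7

n <- 25  # observations
k <- 2   # predictors

ss_reg <- ss_cases + ss_dist                      # overall regression SS
f_stat <- (ss_reg / k) / (ss_res / (n - k - 1))   # MSR / MSRes
f_crit <- qf(0.95, k, n - k - 1)                  # 5% critical value
p_val  <- pf(f_stat, k, n - k - 1, lower.tail = FALSE)

c(SSR = ss_reg, F = f_stat, F_crit = f_crit)
p_val
```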

The F statistic is greater than the critical value; therefore, we reject the null hypothesis and conclude that at least
one regression coefficient is nonzero, i.e. the regression is significant.
Equivalently, H0 : β1 = β2 = 0 is rejected based on the F-test since the p-value is approximately 0.
Since the P value of the F statistic is very small, we conclude that delivery time is related to delivery volume
and/or distance. However, this does not necessarily imply that the relationship found is an appropriate
one for predicting delivery time as a function of volume and distance. Further tests of model adequacy are
required.

summary(Del_lm2)

##
## Call:
## lm(formula = Time ~ NumberofCases + Distance, data = DeliveryTime)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7880 -0.6629 0.4364 1.1566 7.4197
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.341231 1.096730 2.135 0.044170 *
## NumberofCases 1.615907 0.170735 9.464 3.25e-09 ***
## Distance 0.014385 0.003613 3.981 0.000631 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 3.259 on 22 degrees of freedom
## Multiple R-squared: 0.9596, Adjusted R-squared: 0.9559
## F-statistic: 261.2 on 2 and 22 DF, p-value: 4.687e-16

The bottom of the model summary also reports the overall F-statistic.
Further, R2 for the multiple regression model for the delivery time data is R2 = 0.96, or 96.0%.
Compare the R-squared values of the single-predictor and two-predictor models.

summary(Del_lm1)

##
## Call:
## lm(formula = Time ~ NumberofCases, data = DeliveryTime)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.5811 -1.8739 -0.3493 2.1807 10.6342
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.321 1.371 2.422 0.0237 *
## NumberofCases 2.176 0.124 17.546 8.22e-15 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 4.181 on 23 degrees of freedom
## Multiple R-squared: 0.9305, Adjusted R-squared: 0.9275
## F-statistic: 307.8 on 1 and 23 DF, p-value: 8.22e-15

Adjusted R2 = 0.956 (95.6%) for the two-variable model, while for the simple linear regression model with only
x1 (cases), adjusted R2 = 0.928 (92.8%). Therefore, we would conclude that adding x2 (distance) to the model
did result in a meaningful reduction of the unexplained variability. This implies that having both distance and number of cases
in the model is better than having only number of cases in the model.
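Both R2 and adjusted R2 follow directly from the sums of squares in the ANOVA tables. The sketch below assumes only the rounded values printed in this document (residual sums of squares 402.1 and 233.7, total 5382.4 + 168.4 + 233.7); fit_stats is a hypothetical helper, not part of the original analysis.

```r
n <- 25
ss_total <- 5382.4 + 168.4 + 233.7   # total sum of squares, df = n - 1

# Hypothetical helper: R^2 and adjusted R^2 from a residual SS and k predictors
fit_stats <- function(ss_res, k) {
  r2     <- 1 - ss_res / ss_total
  adj_r2 <- 1 - (ss_res / (n - k - 1)) / (ss_total / (n - 1))
  c(R2 = r2, AdjR2 = adj_r2)
}

m1 <- fit_stats(402.1, k = 1)   # Time ~ NumberofCases
m2 <- fit_stats(233.7, k = 2)   # Time ~ NumberofCases + Distance

round(rbind(cases_only = m1, cases_and_distance = m2), 4)
```

Adjusted R2 penalises the extra parameter, so its increase from the one-predictor to the two-predictor model is not merely an artefact of adding a variable.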
Also comment on the significance of the individual predictors using the t-tests given in the summary.
H0 : β2 = 0 is rejected based on the t-test since the p-value is 0.000631. Hence, we conclude that the regressor x2
(distance) contributes significantly to the model given that x1 (cases) is also in the model.
Here, t statistic = 0.014385/0.003613 = 3.98 and the critical value is t(n-k-1) at 0.05/2, which is 2.073873.

qt(0.975,22)

## [1] 2.073873
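The test for Distance can be carried out by hand from the coefficient table. This sketch assumes only the rounded estimate and standard error printed by summary(Del_lm2); the variable names are illustrative.

```r
est    <- 0.014385   # estimate for Distance
se     <- 0.003613   # its standard error
df_res <- 22         # residual degrees of freedom, n - k - 1

t_stat <- est / se
t_crit <- qt(0.975, df_res)                               # two-sided, alpha = 0.05
p_val  <- 2 * pt(abs(t_stat), df_res, lower.tail = FALSE)

c(t = t_stat, t_crit = t_crit, p = p_val)
```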

#plot(Del_lm2)
confint.lm(Del_lm2)

## 2.5 % 97.5 %
## (Intercept) 0.066751987 4.61571030
## NumberofCases 1.261824662 1.96998976
## Distance 0.006891745 0.02187791

Extra-sum-of-squares method

This method measures the contribution of xj as if it were the last variable added to the model.

anova(Del_lm1)

## Analysis of Variance Table


##
## Response: Time
## Df Sum Sq Mean Sq F value Pr(>F)
## NumberofCases 1 5382.4 5382.4 307.85 8.22e-15 ***
## Residuals 23 402.1 17.5
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

anova(Del_lm2)

## Analysis of Variance Table
##
## Response: Time
## Df Sum Sq Mean Sq F value Pr(>F)
## NumberofCases 1 5382.4 5382.4 506.619 < 2.2e-16 ***
## Distance 1 168.4 168.4 15.851 0.0006312 ***
## Residuals 22 233.7 10.6
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

anova(Del_lm1,Del_lm2)

## Analysis of Variance Table


##
## Model 1: Time ~ NumberofCases
## Model 2: Time ~ NumberofCases + Distance
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 23 402.13
## 2 22 233.73 1 168.4 15.851 0.0006312 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
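The partial F statistic in the model comparison is the drop in residual sum of squares per added parameter, divided by the residual mean square of the full model. The sketch below assumes only the values printed above, and also checks that this F equals the square of the t value for Distance, as expected when a single coefficient is tested.

```r
# Residual sums of squares and degrees of freedom from anova(Del_lm1, Del_lm2)
rss_reduced <- 402.13; df_reduced <- 23   # Time ~ NumberofCases
rss_full    <- 233.73; df_full    <- 22   # Time ~ NumberofCases + Distance

extra_ss  <- rss_reduced - rss_full       # SS explained by Distance given cases
df_extra  <- df_reduced - df_full
f_partial <- (extra_ss / df_extra) / (rss_full / df_full)
p_val     <- pf(f_partial, df_extra, df_full, lower.tail = FALSE)

t_dist <- 3.981   # t value for Distance from summary(Del_lm2)

c(extra_SS = extra_ss, F = f_partial, t_squared = t_dist^2, p = p_val)
```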

Confidence intervals for regression coefficients

# compute 95% confidence intervals for the coefficients of Del_lm2


confint(Del_lm2,level = 0.95)

## 2.5 % 97.5 %
## (Intercept) 0.066751987 4.61571030
## NumberofCases 1.261824662 1.96998976
## Distance 0.006891745 0.02187791
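These intervals are estimate ± t(0.975, 22) × standard error. As a check, the sketch below rebuilds them from the rounded coefficient table of summary(Del_lm2), so small differences in the last digits are expected.

```r
# Estimates and standard errors copied from summary(Del_lm2)
est <- c("(Intercept)" = 2.341231, NumberofCases = 1.615907, Distance = 0.014385)
se  <- c(1.096730, 0.170735, 0.003613)

t_crit <- qt(0.975, 22)   # 22 residual degrees of freedom

ci <- cbind(lower = est - t_crit * se,
            upper = est + t_crit * se)
ci
```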

# compute Bonferroni confidence intervals with 95% joint coverage


confint(Del_lm2, level = 1 -0.05/length(coef(Del_lm2)))

## 0.833 % 99.167 %
## (Intercept) -0.500628740 5.1830910
## NumberofCases 1.173496917 2.0583175
## Distance 0.005022556 0.0237471
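The Bonferroni call splits alpha = 0.05 across the three coefficients, so each interval is built at level 1 - 0.05/3 and uses the quantile t(1 - 0.05/(2·3), 22). A minimal sketch, again assuming only the rounded coefficient table from summary(Del_lm2):

```r
alpha <- 0.05
p     <- 3   # number of coefficients, including the intercept

level_each <- 1 - alpha / p                # per-interval confidence level
t_bonf     <- qt(1 - alpha / (2 * p), 22)  # wider than qt(0.975, 22)

est <- c("(Intercept)" = 2.341231, NumberofCases = 1.615907, Distance = 0.014385)
se  <- c(1.096730, 0.170735, 0.003613)

ci_bonf <- cbind(lower = est - t_bonf * se,
                 upper = est + t_bonf * se)
ci_bonf
```

The intervals are wider than the individual 95% intervals; that is the price for all three holding simultaneously with probability of at least 0.95.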

CI Estimation of the Mean Response

predict(Del_lm2, data.frame(NumberofCases=8, Distance=275), interval = "confidence", level = 0.95)

## fit lwr upr


## 1 19.22432 17.6539 20.79474

Prediction interval for a new observation

predict(Del_lm2, data.frame(NumberofCases=8, Distance=275), interval = "prediction", level = 0.95)

## fit lwr upr


## 1 19.22432 12.28456 26.16407
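The prediction interval is wider than the confidence interval for the mean response because it adds the variance of a single new observation to the variance of the fitted mean. The sketch below assumes only numbers printed earlier in this document: the rounded coefficients, the residual sum of squares 233.73 on 22 df, and the confidence limits for the mean response at NumberofCases = 8 and Distance = 275.

```r
b  <- c(2.341231, 1.615907, 0.014385)   # coefficients from summary(Del_lm2)
x0 <- c(1, 8, 275)                      # intercept, NumberofCases, Distance
fit <- sum(b * x0)                      # point prediction

sigma2 <- 233.73 / 22                   # residual mean square
t_crit <- qt(0.975, 22)

# Standard error of the fitted mean, backed out of the printed confidence limits
ci_mean <- c(17.6539, 20.79474)
se_fit  <- (ci_mean[2] - ci_mean[1]) / (2 * t_crit)

# Prediction interval: variance of the mean plus variance of a new observation
pi_half  <- t_crit * sqrt(sigma2 + se_fit^2)
pred_int <- c(lwr = fit - pi_half, upr = fit + pi_half)

round(c(fit = fit, pred_int), 3)
```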
