Project
library(pROC)
##
## Attaching package: ’pROC’
## X id Gender Customer.Type
## Min. : 0 Min. : 2 Female:36949 disloyal Customer:13371
## 1st Qu.:18183 1st Qu.: 32441 Male :35784 Loyal Customer :59362
## Median :36366 Median : 64717
## Mean :36366 Mean : 64825
## 3rd Qu.:54549 3rd Qu.: 97240
## Max. :72732 Max. :129880
##
## Age Type.of.Travel Class Flight.Distance
## Min. : 7.00 Business travel:50228 Business:34777 Min. : 31
## 1st Qu.:27.00 Personal Travel:22505 Eco :32719 1st Qu.: 413
## Median :40.00 Eco Plus: 5237 Median : 842
## Mean :39.37 Mean :1188
## 3rd Qu.:51.00 3rd Qu.:1741
## Max. :85.00 Max. :4983
##
## Inflight.wifi.service Departure.Arrival.time.convenient Ease.of.Online.booking
## Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
## Median :3.000 Median :3.000 Median :3.000
## Mean :2.732 Mean :3.059 Mean :2.756
## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000 Max. :5.000
##
## Gate.location Food.and.drink Online.boarding Seat.comfort
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
## Median :3.000 Median :3.000 Median :3.000 Median :4.000
## Mean :2.973 Mean :3.206 Mean :3.252 Mean :3.443
## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:5.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
##
## Inflight.entertainment On.board.service Leg.room.service Baggage.handling
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:3.000
## Median :4.000 Median :4.000 Median :4.000 Median :4.000
## Mean :3.359 Mean :3.381 Mean :3.348 Mean :3.633
## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:5.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
##
## Checkin.service Inflight.service Cleanliness Departure.Delay.in.Minutes
## Min. :0.000 Min. :0.000 Min. :0.000 Min. : 0.0
## 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:2.000 1st Qu.: 0.0
## Median :3.000 Median :4.000 Median :3.000 Median : 0.0
## Mean :3.307 Mean :3.641 Mean :3.289 Mean : 14.8
## 3rd Qu.:4.000 3rd Qu.:5.000 3rd Qu.:4.000 3rd Qu.: 12.0
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :1305.0
##
## Arrival.Delay.in.Minutes satisfaction
## Min. : 0.00 neutral or dissatisfied:41133
## 1st Qu.: 0.00 satisfied :31600
## Median : 0.00
## Mean : 15.14
## 3rd Qu.: 13.00
## Max. :1280.00
## NA’s :220
##
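The 220 NA's in Arrival.Delay.in.Minutes match the 220 observations the logistic regression below reports deleting. By default, glm() uses na.omit and silently drops incomplete rows. Here is a minimal sketch of that behaviour on a small hypothetical data frame; the values are made up for illustration and are not the project data:

```r
# Hypothetical toy data: two rows are missing Arrival.Delay.in.Minutes
toy <- data.frame(
  satisfied = c(0, 1, 0, 1, 1, 0, 1, 0),
  Arrival.Delay.in.Minutes = c(5, 0, NA, 12, 0, 40, NA, 3)
)
# Rows that glm's default na.action (na.omit) will drop
n_incomplete <- sum(!complete.cases(toy))
fit <- glm(satisfied ~ Arrival.Delay.in.Minutes, family = binomial, data = toy)
cat("incomplete rows:", n_incomplete, "\n")
cat("rows used by glm:", nobs(fit), "\n")
```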
## Call:
## glm(formula = satisfaction ~ ., family = "binomial", data = train[,
## 3:25])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.8399 -0.4966 -0.1781 0.3931 4.0168
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -7.800e+00 9.363e-02 -83.305 < 2e-16 ***
## GenderMale 6.586e-02 2.318e-02 2.841 0.00450 **
## Customer.TypeLoyal Customer 2.030e+00 3.566e-02 56.944 < 2e-16 ***
## Age -8.651e-03 8.469e-04 -10.215 < 2e-16 ***
## Type.of.TravelPersonal Travel -2.729e+00 3.748e-02 -72.801 < 2e-16 ***
## ClassEco -7.087e-01 3.049e-02 -23.242 < 2e-16 ***
## ClassEco Plus -8.450e-01 4.951e-02 -17.068 < 2e-16 ***
## Flight.Distance -2.133e-05 1.349e-05 -1.581 0.11388
## Inflight.wifi.service 3.897e-01 1.362e-02 28.612 < 2e-16 ***
## Departure.Arrival.time.convenient -1.290e-01 9.747e-03 -13.236 < 2e-16 ***
## Ease.of.Online.booking -1.448e-01 1.345e-02 -10.770 < 2e-16 ***
## Gate.location 3.079e-02 1.089e-02 2.827 0.00469 **
## Food.and.drink -2.466e-02 1.275e-02 -1.935 0.05305 .
## Online.boarding 6.122e-01 1.221e-02 50.147 < 2e-16 ***
## Seat.comfort 6.755e-02 1.331e-02 5.076 3.86e-07 ***
## Inflight.entertainment 5.335e-02 1.698e-02 3.142 0.00168 **
## On.board.service 3.033e-01 1.211e-02 25.051 < 2e-16 ***
## Leg.room.service 2.543e-01 1.014e-02 25.083 < 2e-16 ***
## Baggage.handling 1.332e-01 1.356e-02 9.825 < 2e-16 ***
## Checkin.service 3.173e-01 1.018e-02 31.152 < 2e-16 ***
## Inflight.service 1.244e-01 1.435e-02 8.667 < 2e-16 ***
## Cleanliness 2.201e-01 1.446e-02 15.220 < 2e-16 ***
## Departure.Delay.in.Minutes 4.876e-03 1.180e-03 4.133 3.58e-05 ***
## Arrival.Delay.in.Minutes -9.301e-03 1.165e-03 -7.987 1.39e-15 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 99276 on 72512 degrees of freedom
## Residual deviance: 48864 on 72489 degrees of freedom
## (220 observations deleted due to missingness)
## AIC: 48912
##
## Number of Fisher Scoring iterations: 5
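glm() models the probability of the second factor level, which here is satisfied. Exponentiating a coefficient therefore gives the multiplicative change in the odds of satisfaction for a one-unit change in that predictor, with the other predictors held fixed. The short sketch below uses three estimates copied from the output above and adds approximate 95% Wald intervals. The choice of predictors is only illustrative:

```r
# Estimates and standard errors copied from the glm summary above
est <- c("Customer.TypeLoyal Customer"   =  2.030,
         "Type.of.TravelPersonal Travel" = -2.729,
         "Online.boarding"               =  6.122e-01)
se  <- c(3.566e-02, 3.748e-02, 1.221e-02)
z95 <- qnorm(0.975)
# Odds ratio and approximate 95% Wald interval: exp(beta +/- 1.96 * SE)
or_table <- data.frame(
  odds_ratio = exp(est),
  lower      = exp(est - z95 * se),
  upper      = exp(est + z95 * se),
  row.names  = names(est)
)
print(round(or_table, 3))
```

By this reading, loyal customers have roughly 7.6 times the odds of being satisfied. Personal travel cuts the odds to about 0.065 times those of business travel.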
plot(density(train$Flight.Distance))
[Figure: density.default(x = train$Flight.Distance); y-axis: Density]
plot(table(train$Food.and.drink, train$satisfaction))
[Figure: mosaic plot of table(train$Food.and.drink, train$satisfaction); Food.and.drink ratings 0 to 5 against neutral or dissatisfied / satisfied]
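The plot compares the rating groups by eye. prop.table() gives the same comparison as numbers: the share of each outcome within each rating. The sketch below runs on small hypothetical vectors, not the project data. On the real data the equivalent call would be prop.table(table(train$Food.and.drink, train$satisfaction), margin = 1).

```r
# Hypothetical toy ratings and outcomes, NOT the project data
food <- c(1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5)
sat  <- factor(c("neutral or dissatisfied", "neutral or dissatisfied", "satisfied",
                 "neutral or dissatisfied", "satisfied",
                 "neutral or dissatisfied", "satisfied",
                 "satisfied", "neutral or dissatisfied",
                 "satisfied", "satisfied", "neutral or dissatisfied"))
tab <- table(food, sat)
# margin = 1: divide each row by its total, giving outcome shares per rating
shares <- prop.table(tab, margin = 1)
print(round(shares, 2))
```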
In the logistic regression summary, the two predictors that struggle with significance are Flight.Distance (p = 0.114) and Food.and.drink (p = 0.053). A quick look at the density of flight distance shows a right-skewed distribution (median 842, mean 1188, maximum 4983). Even so, there is plenty of data across the range, so it may simply be a weak predictor in this model.

In the stacked bar (mosaic) plot of Food.and.drink ratings, each rating category holds a similar number of passengers. A rating of 1 appears strongly associated with dissatisfaction. Ratings 2 and 3 look roughly the same, with slightly higher satisfaction, and ratings 4 and 5 improve slightly more on those two. On that evidence it looks like a reasonable predictor on its own, so something else may be going on. One possibility is that its information overlaps with the other service ratings in the model. That would also fit its small negative coefficient (-0.0247) despite the positive trend in the plot.
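Both borderline p-values can be reproduced from the reported estimates and standard errors. summary.glm() uses a Wald test: z = estimate / SE, compared against a standard normal. A quick check, with the numbers copied from the coefficient table above:

```r
# Estimates and standard errors copied from the glm summary above
est <- c(Flight.Distance = -2.133e-05, Food.and.drink = -2.466e-02)
se  <- c(Flight.Distance =  1.349e-05, Food.and.drink =  1.275e-02)
z <- est / se                  # Wald z statistic
p <- 2 * pnorm(-abs(z))        # two-sided p-value under N(0, 1)
print(round(cbind(z = z, p = p), 4))
```

This gives p of about 0.114 for Flight.Distance and about 0.053 for Food.and.drink, matching the Pr(>|z|) column.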
[Figure: residuals(model.LR) plotted against predict(model.LR)]
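A binary response explains at least part of the pattern in a plot of residuals against the linear predictor. Each deviance residual has the sign of y minus the fitted probability. Observations with y = 1 therefore lie on one curve above zero, and those with y = 0 lie on another curve below it. Here is a sketch on simulated data from a known logistic model; it is an illustration only, not the project data:

```r
set.seed(1)
# Simulated binary outcome from a known logistic model (illustration only)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.5 * x))
fit <- glm(y ~ x, family = binomial)
r   <- residuals(fit)          # deviance residuals (the default type for glm)
eta <- predict(fit)            # linear predictor (log-odds scale)
# y = 1 residuals are all positive, y = 0 residuals all negative,
# so the scatter splits into two bands, one per observed class
plot(eta, r, col = ifelse(y == 1, "blue", "red"),
     xlab = "predict(fit)", ylab = "residuals(fit)")
```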
[Figure: Logistic Regression ROC; y-axis: Sensitivity]
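The ROC curve was presumably drawn with pROC. As a cross-check that doesn't depend on that package, note that the area under an ROC curve equals the Mann-Whitney probability that a randomly chosen positive case scores above a randomly chosen negative one, with ties counted as one half. Below is a minimal base-R sketch; auc_rank is a hypothetical helper, not part of pROC:

```r
# AUC via the Mann-Whitney rank identity; labels are 1/TRUE for positives
auc_rank <- function(scores, labels) {
  pos   <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  ranks <- rank(scores)        # average ranks, so tied scores count as 1/2
  (sum(ranks[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

print(auc_rank(c(0.1, 0.2, 0.8, 0.9), c(0, 0, 1, 1)))  # perfect separation
```

Assume model.LR and the validation set val used elsewhere in this project. After dropping rows with missing predictions, something like auc_rank(predict(model.LR, val, type = "response"), val$satisfaction == "satisfied") should agree with the AUC pROC reports.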
## Detection Prevalence : 0.5837
## Balanced Accuracy : 0.8731
##
## ’Positive’ Class : neutral or dissatisfied
##
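So the diagnostic plots are suspicious: there are clear patterns in the residuals (some of them quite pretty, as well). With a binary response, though, the residuals always fall on two curves, one for each observed class, so some of this banding is expected. The ROC curve looks reasonable. The confusion matrix makes the performance clearer, with a 95% CI for accuracy of (0.8739, 0.8812) and a balanced accuracy of 0.8731.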
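The accuracy interval reported by caret::confusionMatrix() is, to my understanding, an exact (Clopper-Pearson) binomial interval on the number of correct predictions, computed with binom.test(). The sketch below shows that calculation with hypothetical counts (878 correct out of 1000), not the project's results:

```r
# Hypothetical counts, NOT the project's actual results
n_correct <- 878
n_total   <- 1000
acc <- n_correct / n_total
# Exact (Clopper-Pearson) 95% interval for the proportion correct
ci <- binom.test(n_correct, n_total)$conf.int
print(round(c(accuracy = acc, lower = ci[1], upper = ci[2]), 4))
```

A larger sample narrows the interval, which is why the interval reported above is much tighter than this toy one.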
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
caret::confusionMatrix(lda.valpredictions$class, val$satisfaction)
## Prevalence : 0.5692
## Detection Rate : 0.5150
## Detection Prevalence : 0.5850
## Balanced Accuracy : 0.8711
##
## ’Positive’ Class : neutral or dissatisfied
##
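caret defines three rates in terms of the positive class, which here is neutral or dissatisfied:

- Prevalence is actual positives / N.
- Detection Rate is true positives / N.
- Detection Prevalence is predicted positives / N.

Those three rates are enough to recover the LDA's sensitivity and specificity from the output above and to check them against the reported balanced accuracy:

```r
# Rates reported above for the LDA validation predictions
prevalence  <- 0.5692  # actual positives / N
detect_rate <- 0.5150  # true positives (TP) / N
detect_prev <- 0.5850  # predicted positives (TP + FP) / N
sensitivity <- detect_rate / prevalence                            # TP / (TP + FN)
specificity <- 1 - (detect_prev - detect_rate) / (1 - prevalence)  # TN / (TN + FP)
balanced    <- (sensitivity + specificity) / 2
print(round(c(sensitivity = sensitivity, specificity = specificity,
              balanced = balanced), 4))
```

This gives a sensitivity of about 0.905 and a specificity of about 0.838. Their average is the reported 0.8711.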
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
caret::confusionMatrix(qda.valpredictions$class, val$satisfaction)
## Balanced Accuracy : 0.8563
##
## ’Positive’ Class : neutral or dissatisfied
##
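The LDA and QDA fits themselves are not shown in this section. The lda.valpredictions$class and qda.valpredictions$class objects look like output of predict() on MASS::lda and MASS::qda fits, but that is an assumption. Below is a sketch of that kind of comparison on simulated two-class data, not the project data. One class has a wider spread, so the class covariances differ:

```r
library(MASS)  # recommended package shipped with R; provides lda() and qda()
set.seed(42)
n <- 400
# Simulated two-class data (illustration only): class B is shifted and wider
cls <- factor(rep(c("A", "B"), each = n / 2))
x1  <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 2, 2))
x2  <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 2, 2))
dat <- data.frame(cls, x1, x2)
idx <- sample(n, n / 2)        # simple train/validation split
train_s <- dat[idx, ]
val_s   <- dat[-idx, ]
lda_fit <- lda(cls ~ x1 + x2, data = train_s)  # shared covariance matrix
qda_fit <- qda(cls ~ x1 + x2, data = train_s)  # one covariance per class
lda_pred <- predict(lda_fit, val_s)$class
qda_pred <- predict(qda_fit, val_s)$class
# Confusion tables with predictions in rows and truth in columns
lda_tab <- table(predicted = lda_pred, actual = val_s$cls)
qda_tab <- table(predicted = qda_pred, actual = val_s$cls)
acc <- function(tab) sum(diag(tab)) / sum(tab)
print(c(lda = acc(lda_tab), qda = acc(qda_tab)))
```

In the project itself, QDA's balanced accuracy (0.8563) came out below LDA's (0.8711), so allowing class-specific covariances did not help on this data.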