Chapter 4
Supervised Learning in R: Regression
Logistic regression to predict probabilities
Predicting Probabilities
Predicting whether an event occurs (yes/no): classification
Predicting the probability that an event occurs: regression
Linear regression: predicts values in (−∞, ∞)
Probabilities: limited to the [0, 1] interval
Because the output must stay within [0, 1], we treat predicting probabilities as a non-linear problem
DataCamp Supervised Learning in R: Regression
outcome: has_dmd
inputs: CK, H
0: FALSE
1: TRUE
Logistic Regression
$$ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots $$
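The left-hand side of the equation is the log-odds (logit) of the probability p. A minimal sketch of the logit and its inverse in base R follows; the helper names `logit` and `inv_logit` are illustrative and not part of the slides.

```r
# Log-odds (logit) and its inverse, written out by hand.
# `logit` and `inv_logit` are hypothetical helper names for this sketch.
logit <- function(p) log(p / (1 - p))
inv_logit <- function(z) 1 / (1 + exp(-z))

p <- c(0.1, 0.5, 0.9)
z <- logit(p)           # negative below 0.5, zero at 0.5, positive above
p_back <- inv_logit(z)  # maps any real number back into (0, 1)

# Base R provides the same pair as qlogis() and plogis().
print(all.equal(z, qlogis(p)))
print(all.equal(p_back, plogis(z)))
```

The linear predictor β0 + β1 x1 + ... can take any real value; the inverse logit is what squeezes it into the (0, 1) range of a probability.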
DMD model
> model <- glm(has_dmd ~ CK + H, data = train, family = binomial)
outcome: two classes, e.g. a and b
model returns Prob(b)
Recommended: encode the outcome as 0/1 or FALSE/TRUE
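The same `glm()` call can be tried on simulated data when the DMD data is not at hand. The sketch below assumes a made-up data frame `toy` with inputs `x1`, `x2` and a 0/1 outcome `y`; none of these names come from the slides.

```r
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

# Assumed "true" model for the simulation:
# log-odds = -0.5 + 1.5 * x1 - 1.0 * x2
toy$y <- rbinom(n, size = 1,
                prob = plogis(-0.5 + 1.5 * toy$x1 - 1.0 * toy$x2))

# Same pattern as the DMD model: 0/1 outcome, family = binomial.
toy_model <- glm(y ~ x1 + x2, data = toy, family = binomial)

# Coefficients are estimated on the log-odds scale.
print(coef(toy_model))
```

With a 0/1 outcome there is no ambiguity about which class the predicted probability refers to: it is the probability that `y` equals 1.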
DMD Model
> model <- glm(has_dmd ~ CK + H, data = train, family = binomial)
> test$pred <- predict(model, newdata = test, type = "response")
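The `type = "response"` argument matters: without it, `predict()` on a `glm` object returns values on the link (log-odds) scale. A small sketch on simulated data (the names `toy`, `toy_model` and `new_data` are assumptions for illustration):

```r
set.seed(2)
toy <- data.frame(x = rnorm(100))
toy$y <- rbinom(100, size = 1, prob = plogis(0.8 * toy$x))
toy_model <- glm(y ~ x, data = toy, family = binomial)

new_data <- data.frame(x = c(-2, 0, 2))

# Default is type = "link": predictions are log-odds.
log_odds <- predict(toy_model, newdata = new_data)

# type = "response": predictions are probabilities.
probs <- predict(toy_model, newdata = new_data, type = "response")

# The two scales are related by the inverse logit.
print(all.equal(plogis(log_odds), probs))
```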
Evaluating a logistic regression model: pseudo-R²
$$ R^2 = 1 - \frac{RSS}{SS_{Tot}} $$
$$ \text{pseudo}R^2 = 1 - \frac{\text{deviance}}{\text{null.deviance}} $$
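Both quantities in the pseudo-R² formula are stored on a fitted `glm` object as `deviance` and `null.deviance`, so the value can be computed directly. A minimal sketch on simulated data (the `toy` names are assumptions, not from the slides):

```r
set.seed(3)
toy <- data.frame(x = rnorm(300))
toy$y <- rbinom(300, size = 1, prob = plogis(1.2 * toy$x))
toy_model <- glm(y ~ x, data = toy, family = binomial)

# deviance:      how far the fitted model is from a perfect fit
# null.deviance: the same measure for an intercept-only model
pseudo_r2 <- 1 - toy_model$deviance / toy_model$null.deviance
print(pseudo_r2)
```

This plays the same role as R² for linear models: it is the fraction of the null deviance that the model explains on the training data.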
Pseudo-R² on Training data
Using broom::glance()
## pseudoR2
## 1 0.5922402
Using sigr::wrapChiSqTest()
> wrapChiSqTest(model)
Pseudo-R² on Test data
# Test data
> test %>%
+ mutate(pred = predict(model, newdata = test, type = "response")) %>%
+ wrapChiSqTest("pred", "has_dmd", TRUE)
Arguments:
data frame
prediction column name
outcome column name
target value (target event)
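To see what a test-set pseudo-R² measures, the deviance can also be computed by hand from the predicted probabilities. The sketch below is one way to do this in base R and is not sigr's implementation; it assumes the null model predicts the mean outcome rate of the test set, and all names (`toy`, `binom_deviance`, and so on) are hypothetical.

```r
set.seed(4)
n <- 400
toy <- data.frame(x = rnorm(n))
toy$y <- rbinom(n, size = 1, prob = plogis(1.0 * toy$x))
train <- toy[1:300, ]
test  <- toy[301:400, ]

toy_model <- glm(y ~ x, data = train, family = binomial)
test$pred <- predict(toy_model, newdata = test, type = "response")

# Binomial deviance for a 0/1 outcome y and predicted probabilities p.
binom_deviance <- function(y, p) {
  -2 * sum(y * log(p) + (1 - y) * log(1 - p))
}

dev <- binom_deviance(test$y, test$pred)

# Assumption: the null model predicts the test-set mean for every row.
null_dev <- binom_deviance(test$y, rep(mean(test$y), nrow(test)))

test_pseudo_r2 <- 1 - dev / null_dev
print(test_pseudo_r2)
```

Unlike the training value, a test-set pseudo-R² can be negative if the model predicts worse than the constant null model.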
Let's practice!
Poisson and quasipoisson regression to predict counts
Nina Zumel and John Mount
Win-Vector, LLC
Predicting Counts
Linear regression: predicts values in (−∞, ∞)
Counts: non-negative integers, in the range [0, ∞)
Poisson/Quasipoisson Regression
glm(formula, data, family)
family: either poisson or quasipoisson
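A short sketch of the `glm(formula, data, family)` pattern for counts, fitted with both families on simulated data. The data frame `toy` and its columns are made up for illustration.

```r
set.seed(6)
toy <- data.frame(x = runif(500))
# Simulated counts whose variance exceeds their mean (negative binomial).
toy$cnt <- rnbinom(500, mu = exp(1 + 2 * toy$x), size = 3)

fit_p  <- glm(cnt ~ x, data = toy, family = poisson)
fit_qp <- glm(cnt ~ x, data = toy, family = quasipoisson)

# Both families give the same coefficient estimates;
# they differ in the standard errors they report.
print(all.equal(coef(fit_p), coef(fit_qp)))

# As with logistic regression, type = "response" returns predictions
# on the outcome scale (expected counts) rather than the log scale.
pred <- predict(fit_qp, newdata = toy, type = "response")
print(range(pred))
```

Because the point estimates agree, the choice of family does not change the predicted counts; it affects the uncertainty estimates attached to the coefficients.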
## mean var
## 1 130.5587 14351.25
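Poisson regression assumes the mean and the variance of the outcome are equal. In the output above the variance (14351.25) is roughly 110 times the mean (130.5587), which points toward quasipoisson. The sketch below shows the same check on simulated counts; the helper `choose_family` and its factor-of-2 cutoff are arbitrary illustrations, not a standard rule.

```r
set.seed(5)
counts_pois <- rpois(1000, lambda = 20)          # variance close to the mean
counts_over <- rnbinom(1000, mu = 20, size = 2)  # variance well above the mean

print(c(mean = mean(counts_pois), var = var(counts_pois)))
print(c(mean = mean(counts_over), var = var(counts_over)))

# Hypothetical helper: pick a family from the variance-to-mean ratio.
# The cutoff `ratio = 2` is an assumption made for this sketch only.
choose_family <- function(y, ratio = 2) {
  if (var(y) > ratio * mean(y)) "quasipoisson" else "poisson"
}

print(choose_family(counts_pois))
print(choose_family(counts_over))
```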
$$ \text{pseudo}R^2 = 1 - \frac{\text{deviance}}{\text{null.deviance}} $$
## pseudoR2
## 1 0.7654358
## rmse
## 1 69.32869
> sd(bikesFeb$cnt)
[1] 134.2865
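Comparing RMSE with the standard deviation of the outcome gives a quick sense of scale: in the output above the RMSE (69.33) is about half of `sd(bikesFeb$cnt)` (134.29), so the model does noticeably better than always predicting the mean. A sketch of the same comparison on simulated counts, with made-up names `toy` and `toy_model`:

```r
set.seed(7)
toy <- data.frame(x = runif(300))
toy$cnt <- rpois(300, lambda = exp(1 + 2 * toy$x))

toy_model <- glm(cnt ~ x, data = toy, family = quasipoisson)
toy$pred <- predict(toy_model, type = "response")

# Root mean squared error of the predictions, in units of the outcome.
rmse <- sqrt(mean((toy$cnt - toy$pred)^2))

# sd() is roughly the RMSE of a model that always predicts the mean.
print(c(rmse = rmse, sd = sd(toy$cnt)))
```

This sketch evaluates on the training data for brevity; the comparison is more informative on held-out data.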
Let's practice!
family:
> summary(model)
## ...
##
## R-sq.(adj) = 0.619 Deviance explained = 64.1%
## GCV = 49.132 Scale est. = 45.153 n = 40
Let's practice!