
Logistic Regression Essentials in R
kassambara | 11/03/2018
Logistic regression is used to predict the class (or category) of individuals based on one or multiple predictor variables (x). It is used to model a binary outcome, that is, a variable that can take only two possible values: 0 or 1, yes or no, diseased or non-diseased.

Logistic regression belongs to the family of Generalized Linear Models (GLMs), developed to extend the linear regression model (Chapter @ref(linear-regression)) to other situations. Synonyms include binary logistic regression, binomial logistic regression and logit model.

Logistic regression does not directly return the class of observations. It allows us to estimate the probability (p) of class membership, which ranges between 0 and 1. You need to choose the threshold probability at which the predicted category flips from one class to the other. By default, this is set to p = 0.5, but in practice it should be chosen based on the purpose of the analysis.

In this chapter you’ll learn how to:

Define the logistic regression equation and key terms such as log-odds and logit
Perform logistic regression in R and interpret the results
Make predictions on new test data and evaluate the model accuracy

Contents:

Logistic function
Loading required R packages
Preparing the data
Computing logistic regression
Quick start R code
Simple logistic regression
Multiple logistic regression

Interpretation
Making predictions
Assessing model accuracy
Discussion
References

Logistic function
The standard logistic regression function, for predicting the outcome of an observation given a predictor variable (x), is an s-shaped curve defined as p = exp(y) / [1 + exp(y)] (James et al. 2014). This can also be written more simply as p = 1/[1 + exp(-y)], where:

y = b0 + b1*x,
exp() is the exponential function and
p is the probability of the event occurring (event = 1) given x. Mathematically, this is written as p(event=1|x) and abbreviated as p(x), so p(x) = 1/[1 + exp(-(b0 + b1*x))].

With a bit of manipulation, it can be shown that p/(1-p) = exp(b0 + b1*x). Taking the logarithm of both sides gives a linear combination of the predictors: log[p/(1-p)] = b0 + b1*x.

When you have multiple predictor variables, the logistic function looks like: log[p/(1-p)] = b0 + b1*x1 + b2*x2 + ... + bn*xn

b0 and b1 are the regression beta coefficients. A positive b1 indicates that increasing x will be associated with increasing p. Conversely, a negative b1 indicates that increasing x will be associated with decreasing p.

The quantity log[p/(1-p)] is called the logarithm of the odds, also known as the log-odds or logit.

The odds reflect the likelihood that the event will occur, and can be seen as the ratio of “successes” to “non-successes”. Technically, the odds are the probability of an event divided by the probability that the event will not take place (P. Bruce and Bruce 2017). For example, if the probability of being diabetes-positive is 0.5, the probability of “won’t be” is 1 - 0.5 = 0.5, and the odds are 1.0.

Note that the probability can be calculated back from the odds as p = Odds/(1 + Odds).
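To make these relationships concrete, here is a minimal R sketch converting between probability, odds and log-odds (the helper names are illustrative, not functions from any package):

# Convert between probability, odds and log-odds (logit)
prob_to_odds <- function(p) p / (1 - p)
odds_to_prob <- function(odds) odds / (1 + odds)
logit <- function(p) log(p / (1 - p))

prob_to_odds(0.5)  # 1: even odds, as in the diabetes example above
odds_to_prob(1)    # 0.5
logit(0.5)         # 0: the log-odds of an even chance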

Loading required R packages


tidyverse for easy data manipulation and visualization
caret for easy machine learning workflow

library(tidyverse)
library(caret)
theme_set(theme_bw())

Preparing the data


Logistic regression works with data that contain continuous and/or categorical predictor variables.

Performing the following steps might improve the accuracy of your model:

Remove potential outliers.
Make sure that the predictor variables are approximately normally distributed. If not, you can use a log, root or Box-Cox transformation.
Remove highly correlated predictors to minimize overfitting. The presence of highly correlated predictors can lead to an unstable model solution (a quick screening sketch follows the data-splitting code below).

Here, we’ll use the PimaIndiansDiabetes2 data set [in the mlbench package], introduced in Chapter @ref(classification-in-r), to predict the probability of being diabetes-positive based on multiple clinical variables.

We’ll randomly split the data into a training set (80%, for building a predictive model) and a test set (20%, for evaluating the model). Make sure to set the seed for reproducibility.

# Load the data and remove NAs
data("PimaIndiansDiabetes2", package = "mlbench")
PimaIndiansDiabetes2 <- na.omit(PimaIndiansDiabetes2)
# Inspect the data
sample_n(PimaIndiansDiabetes2, 3)
# Split the data into training and test set
set.seed(123)
training.samples <- PimaIndiansDiabetes2$diabetes %>%
createDataPartition(p = 0.8, list = FALSE)
train.data <- PimaIndiansDiabetes2[training.samples, ]
test.data <- PimaIndiansDiabetes2[-training.samples, ]
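
As mentioned in the data-preparation steps, you can screen the predictors for high pairwise correlation before fitting. Here is a minimal sketch using caret’s findCorrelation(); the 0.75 cutoff is an illustrative choice, not a rule from this chapter:

# Flag numeric predictors that are highly correlated with one another
num.vars <- sapply(train.data, is.numeric)
cor.matrix <- cor(train.data[, num.vars])
findCorrelation(cor.matrix, cutoff = 0.75, names = TRUE)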

Computing logistic regression


The R function glm(), for generalized linear models, can be used to compute logistic regression. You need to specify the option family = binomial, which tells R that we want to fit a logistic regression.

Quick start R code

# Fit the model
model <- glm(diabetes ~ ., data = train.data, family = binomial)
# Summarize the model
summary(model)
# Make predictions
probabilities <- model %>% predict(test.data, type = "response")
predicted.classes <- ifelse(probabilities > 0.5, "pos", "neg")
# Model accuracy
mean(predicted.classes == test.data$diabetes)

Simple logistic regression


Simple logistic regression is used to predict the probability of class membership based on a single predictor variable.

The following R code builds a model to predict the probability of being diabetes-positive based on the plasma glucose concentration:

model <- glm(diabetes ~ glucose, data = train.data, family = binomial)
summary(model)$coef

## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.3267 0.7241 -8.74 2.39e-18
## glucose 0.0437 0.0054 8.09 6.01e-16

The output above shows the estimates of the regression beta coefficients and their significance levels. The intercept (b0) is -6.32 and the coefficient of the glucose variable is 0.043.

The logistic equation can be written as p = exp(-6.32 + 0.043*glucose)/[1 + exp(-6.32 + 0.043*glucose)]. Using this formula, for each new plasma glucose concentration value, you can predict the probability that an individual is diabetes-positive.
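
As a sanity check, here is a small sketch that computes this probability by hand from the fitted coefficients (using coef(model) rather than the rounded values above):

# Manual computation of p(x) from the fitted coefficients
b <- coef(model)               # b[1] = intercept, b[2] = glucose coefficient
glucose <- c(20, 180)
p <- 1 / (1 + exp(-(b[1] + b[2] * glucose)))
p  # should match predict(model, data.frame(glucose = glucose), type = "response")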

Predictions can easily be made using the function predict(). Use the option type = "response" to obtain the probabilities directly:

newdata <- data.frame(glucose = c(20, 180))
probabilities <- model %>% predict(newdata, type = "response")
predicted.classes <- ifelse(probabilities > 0.5, "pos", "neg")
predicted.classes

The logistic function gives an s-shaped probability curve, illustrated as follows:

train.data %>%
mutate(prob = ifelse(diabetes == "pos", 1, 0)) %>%
ggplot(aes(glucose, prob)) +
geom_point(alpha = 0.2) +
geom_smooth(method = "glm", method.args = list(family = "binomial")) +
labs(
title = "Logistic Regression Model",
x = "Plasma Glucose Concentration",
y = "Probability of being diabete-pos"
)

Multiple logistic regression


Multiple logistic regression is used to predict the probability of class membership based on multiple predictor variables, as follows:


model <- glm(diabetes ~ glucose + mass + pregnant,
             data = train.data, family = binomial)
summary(model)$coef

Here, we want to include all the predictor variables available in the data set. This is done using ~.:

model <- glm(diabetes ~ ., data = train.data, family = binomial)
summary(model)$coef

## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -9.50372 1.31719 -7.215 5.39e-13
## pregnant 0.04571 0.06218 0.735 4.62e-01
## glucose 0.04230 0.00657 6.439 1.20e-10
## pressure -0.00700 0.01291 -0.542 5.87e-01
## triceps 0.01858 0.01861 0.998 3.18e-01
## insulin -0.00159 0.00139 -1.144 2.52e-01
## mass 0.04502 0.02887 1.559 1.19e-01
## pedigree 0.96845 0.46020 2.104 3.53e-02
## age 0.04256 0.02158 1.972 4.86e-02

From the output above, the coefficients table shows the beta coefficient estimates and their significance levels. Columns are:

Estimate: the intercept (b0) and the beta coefficient estimates associated to each predictor variable
Std.Error: the standard error of the coefficient estimates. This represents the accuracy of the coefficients. The larger the standard error, the less confident we are
about the estimate.
z value: the z-statistic, which is the coefficient estimate divided by its standard error
Pr(>|z|): The p-value corresponding to the z-statistic. The smaller the p-value, the more significant the estimate is.

Note that the functions coef() and summary() can be used to extract only the coefficients, as follows:

coef(model)
summary(model)$coef
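
As a quick check of the relationship described above, you can recompute the z-statistics from the extracted coefficient table:

coefs <- summary(model)$coef
# The z value column is the estimate divided by its standard error
all.equal(coefs[, "z value"], coefs[, "Estimate"] / coefs[, "Std. Error"])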

Interpretation
It can be seen that not all of the 8 predictors are significantly associated with the outcome; in the output above, glucose, pedigree and age have p-values below the conventional 0.05 level.

The coefficient estimate of the variable glucose is b = 0.042, which is positive. This means that an increase in glucose is associated with an increased probability of being diabetes-positive. By contrast, the coefficient for the variable pressure is b = -0.007, which is negative: an increase in blood pressure is associated with a decreased probability of being diabetes-positive.

An important concept to understand when interpreting the logistic beta coefficients is the odds ratio. An odds ratio measures the association between a predictor variable (x) and the outcome variable (y). It represents the ratio of the odds that the event occurs (event = 1) when the predictor is present (x = 1) to the odds of the event occurring when the predictor is absent (x = 0).

For a given predictor (say x1), the associated beta coefficient (b1) in the logistic regression function corresponds to the log of the odds ratio for that predictor.

If the odds ratio is 2, then the odds that the event occurs (event = 1) are two times higher when the predictor x is present (x = 1) than when it is absent (x = 0).
For example, the regression coefficient for glucose is 0.042. This indicates that a one-unit increase in the glucose concentration multiplies the odds of being diabetes-positive by exp(0.042) = 1.04.
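
You can obtain the odds ratios for all coefficients at once by exponentiating them; a minimal sketch:

# Odds ratios: exponentiate the coefficients of the fitted model
exp(coef(model))
# exp(confint(model)) would give confidence intervals on the odds-ratio scale
# (profile-likelihood based; on older R versions this method comes from MASS)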

From the logistic regression results, it can be seen that several variables (for example, triceps and insulin) are not statistically significant. Keeping them in the model may contribute to overfitting, so they should be eliminated. This can be done automatically using statistical techniques such as stepwise regression and penalized regression methods, which are described in the next sections. Briefly, these consist of selecting an optimal model with a reduced set of variables, without compromising the model accuracy.

Here, as we have a small number of predictors (n = 8), we can manually select a reduced set of variables to keep (a sketch of the automatic alternative follows the code below):

model <- glm(diabetes ~ pregnant + glucose + pressure + mass + pedigree,
             data = train.data, family = binomial)
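
As a preview of the automatic approach mentioned above, here is a hedged sketch using AIC-based stepwise selection with MASS::stepAIC (one common implementation; the dedicated chapter may use a different procedure):

# AIC-based stepwise selection, starting from the full model
library(MASS)  # note: MASS masks dplyr::select()
full.model <- glm(diabetes ~ ., data = train.data, family = binomial)
step.model <- stepAIC(full.model, direction = "both", trace = FALSE)
summary(step.model)$coef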

Making predictions
We’ll make predictions using the test data in order to evaluate the performance of our logistic regression model.

The procedure is as follows:

1. Predict the class membership probabilities of observations based on the predictor variables
2. Assign each observation to the class with the highest probability score (i.e., above 0.5)

The R function predict() can be used to predict the probability of being diabetes-positive, given the predictor values.

Predict the probabilities of being diabetes-positive:

probabilities <- model %>% predict(test.data, type = "response")
head(probabilities)

## 21 25 28 29 32 36
## 0.3914 0.6706 0.0501 0.5735 0.6444 0.1494

Which classes do these probabilities refer to? In our example, the output is the probability that the diabetes test will be positive. We know that these values correspond to the probability of the test being positive, rather than negative, because the contrasts() function indicates that R has created a dummy variable with a 1 for "pos" and a 0 for "neg". The probabilities always refer to the class dummy-coded as 1.

Check the dummy coding:

contrasts(test.data$diabetes)

## pos
## neg 0
## pos 1

Predict the class of individuals:

The following R code categorizes individuals into two groups based on their predicted probabilities (p) of being diabetes-positive. Individuals with p above 0.5 (the default threshold) are classified as diabetes-positive.

predicted.classes <- ifelse(probabilities > 0.5, "pos", "neg")
head(predicted.classes)

## 21 25 28 29 32 36
## "neg" "pos" "neg" "pos" "pos" "neg"

Assessing model accuracy


The model accuracy is measured as the proportion of observations that have been correctly classified. Conversely, the classification error is the proportion of observations that have been misclassified.

Proportion of correctly classified observations:

mean(predicted.classes == test.data$diabetes)

## [1] 0.756

The classification prediction accuracy is about 76%, which is good. The misclassification error rate is 24%.
Note that there are several metrics for evaluating the performance of a classification model (Chapter @ref(classification-model-evaluation)).
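
Beyond raw accuracy, caret’s confusionMatrix() reports the full confusion matrix along with sensitivity, specificity and related metrics; a minimal sketch (the factor conversion is needed because predicted.classes is a character vector):

# Confusion matrix and classification metrics with caret
confusionMatrix(
  data = factor(predicted.classes, levels = levels(test.data$diabetes)),
  reference = test.data$diabetes,
  positive = "pos"
)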

Discussion
In this chapter, we have described how logistic regression works and provided R code to compute it. Additionally, we demonstrated how to make predictions and assess the model accuracy. Logistic regression output is easy to interpret compared to that of other classification methods, and because of its simplicity the model is less prone to overfitting than flexible methods such as decision trees.

Note that many concepts from linear regression also hold for logistic regression modeling. For example, you need to perform some diagnostics (Chapter @ref(logistic-regression-assumptions-and-diagnostics)) to make sure that the assumptions made by the model are met for your data.

Furthermore, you need to measure how well the model predicts the outcome on new test observations. Here, we described how to compute the raw classification accuracy, but note that other important performance metrics exist (Chapter @ref(classification-model-evaluation)).

When you have many predictors, you can select a minimal list of predictor variables that contribute the most to the model, without compromising the prediction accuracy, using stepwise regression (Chapter @ref(stepwise-logistic-regression)) and lasso regression techniques (Chapter @ref(penalized-logistic-regression)).
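
For the lasso variant, here is a hedged sketch with the glmnet package; cv.glmnet() chooses the penalty by cross-validation, and alpha = 1 selects the lasso (the settings are illustrative, not those of the referenced chapter):

# Lasso-penalized logistic regression with glmnet
library(glmnet)
x <- model.matrix(diabetes ~ ., train.data)[, -1]  # predictor matrix, intercept column dropped
y <- train.data$diabetes
cv.lasso <- cv.glmnet(x, y, family = "binomial", alpha = 1)
coef(cv.lasso, s = "lambda.min")  # coefficients at the CV-chosen penalty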

Additionally, you can add interaction terms to the model, or include spline terms.

The same problems concerning confounding and correlated variables apply to logistic regression (see Chapters @ref(confounding-variables) and @ref(multicollinearity)).

You can also fit generalized additive models (Chapter @ref(polynomial-and-spline-regression)) when the linearity of a predictor cannot be assumed. This can be done using the mgcv package:

library("mgcv")
# Fit the model
gam.model <- gam(diabetes ~ s(glucose) + mass + pregnant,
data = train.data, family = "binomial")
# Summarize model
summary(gam.model )
# Make predictions
probabilities <- gam.model %>% predict(test.data, type = "response")
predicted.classes <- ifelse(probabilities> 0.5, "pos", "neg")
# Model Accuracy
mean(predicted.classes == test.data$diabetes)

Logistic regression is limited to two-class classification problems. There is an extension, called multinomial logistic regression, for multiclass classification problems (Chapter @ref(multinomial-logistic-regression)).
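
For reference, multinomial models can be fit with nnet::multinom(); a hedged sketch on a generic three-class outcome (the built-in iris data is an illustrative stand-in, not data from this chapter):

# Multinomial logistic regression on a three-class outcome
library(nnet)
multi.model <- multinom(Species ~ ., data = iris, trace = FALSE)
predicted <- predict(multi.model, iris)
mean(predicted == iris$Species)  # training-set accuracy, for illustration only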

Note that one of the most popular methods for multiclass tasks is Linear Discriminant Analysis (Chapter @ref(discriminant-analysis)).


References
Bruce, Peter, and Andrew Bruce. 2017. Practical Statistics for Data Scientists. O’Reilly Media.

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2014. An Introduction to Statistical Learning: With Applications in R. Springer.

Last update: 19/05/2018

