Logistic Regression With R

- The document describes building a logistic regression model to predict student admissions from GRE, GPA, and rank.
- The model is trained on 80% of the data; GPA and rank are statistically significant predictors, while GRE is not.
- The model is then used to predict admission probabilities for the remaining 20% (test data), giving a misclassification error rate of 29.3%.
- A goodness-of-fit test confirms the model is a statistically significant improvement over the null model.


Logistic Regression

YouTube video: https://www.youtube.com/watch?v=AVx7Wc1CQ7Y

R code

#Logistic Regression

# Read data file (choose the data from your folder)
mydata <- read.csv(file.choose(), header = T)
str(mydata)

'data.frame': 400 obs. of 4 variables:
$ admit: int 0 1 1 1 0 1 1 0 1 0 ...
$ gre : int 380 660 800 640 520 760 560 400 540 700 ...
$ gpa : num 3.61 3.67 4 3.19 2.93 3 2.98 3.08 3.39 3.92 ...
$ rank : int 3 3 1 4 4 2 1 2 3 2 ...

Notice that admit and rank were read in as int; they are actually categorical variables, so we convert them to factors.

mydata$admit <- as.factor(mydata$admit)
mydata$rank <- as.factor(mydata$rank)
str(mydata)

'data.frame': 400 obs. of 4 variables:
$ admit: Factor w/ 2 levels "0","1": 1 2 2 2 1 2 2 1 2 1 ...
$ gre : int 380 660 800 640 520 760 560 400 540 700 ...
$ gpa : num 3.61 3.67 4 3.19 2.93 3 2.98 3.08 3.39 3.92 ...
$ rank : Factor w/ 4 levels "1","2","3","4": 3 3 1 4 4 2 1 2 3 2 ...

# Two-way table of admit by rank (both factors)
xtabs(~admit + rank, data = mydata)

rank
admit 1 2 3 4
0 28 97 93 55
1 33 54 28 12
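
An optional follow-up (not in the original walkthrough): converting these counts to column proportions shows that the admission rate drops as the school rank number increases.

# Admission rate by rank (column proportions of the same two-way table)
round(prop.table(xtabs(~admit + rank, data = mydata), margin = 2), 2)
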
#Partition data - train (80%) & test (20%)
set.seed(1234)
ind <- sample(2, nrow(mydata), replace = T, prob = c(0.8, 0.2))
train <- mydata[ind==1,]
test <- mydata[ind==2,]
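
Note that sample() with prob = c(0.8, 0.2) gives an approximately 80/20 split rather than an exact one; an optional quick check of the resulting partition sizes:

# How many rows ended up in each partition (roughly 80% / 20%)
table(ind)
nrow(train); nrow(test)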

#Logistic regression model
mymodel <- glm(admit ~ gre + gpa + rank, data = train, family = 'binomial')
summary(mymodel)

Call:
glm(formula = admit ~ gre + gpa + rank, family = "binomial",
data = train)

Deviance Residuals:
Min 1Q Median 3Q Max
-1.5873 -0.8679 -0.6181 1.1301 2.1178

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.009514 1.316514 -3.805 0.000142 ***
gre 0.001631 0.001217 1.340 0.180180
gpa 1.166408 0.388899 2.999 0.002706 **
rank2 -0.570976 0.358273 -1.594 0.111005
rank3 -1.125341 0.383372 -2.935 0.003331 **
rank4 -1.532942 0.477377 -3.211 0.001322 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 404.39 on 324 degrees of freedom
Residual deviance: 369.99 on 319 degrees of freedom
AIC: 381.99

Number of Fisher Scoring iterations: 4
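
As an optional extra step (not in the original), note that the coefficients are on the log-odds scale, so exponentiating them gives odds ratios; for example, exp(1.166) ≈ 3.2 means each additional GPA point roughly triples the odds of admission, holding the other variables fixed.

# Odds ratios and Wald confidence intervals for the fitted coefficients
exp(coef(mymodel))
exp(confint.default(mymodel))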

From the result, GRE is not statistically significant (p ≈ 0.18), so we drop that variable.
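
A more formal way to justify dropping gre is a likelihood-ratio test comparing the two nested models; a minimal sketch (the names full and reduced are illustrative, not from the original):

# Likelihood-ratio test: deviance difference 371.81 - 369.99 = 1.82 on 1 df, p ~ 0.18
full    <- glm(admit ~ gre + gpa + rank, data = train, family = 'binomial')
reduced <- glm(admit ~ gpa + rank, data = train, family = 'binomial')
anova(reduced, full, test = "Chisq")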

#without gre
mymodel <- glm(admit ~ gpa + rank, data = train, family = 'binomial')
summary(mymodel)

Call:
glm(formula = admit ~ gpa + rank, family = "binomial", data = train)

Deviance Residuals:
Min 1Q Median 3Q Max
-1.5156 -0.8880 -0.6318 1.1091 2.1688

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.7270 1.2918 -3.659 0.000253 ***
gpa 1.3735 0.3590 3.826 0.000130 ***
rank2 -0.5712 0.3564 -1.603 0.108976
rank3 -1.1645 0.3804 -3.061 0.002203 **
rank4 -1.5642 0.4756 -3.289 0.001005 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 404.39 on 324 degrees of freedom
Residual deviance: 371.81 on 320 degrees of freedom
AIC: 381.81

Number of Fisher Scoring iterations: 4

#Prediction
p1 <- predict(mymodel, train, type = 'response')
head(p1)

1 2 3 4 6 7
0.2822956 0.2992879 0.6828897 0.1290134 0.2354735 0.3466234

Note that observation 5 is missing from the output because it was assigned to the test set when the data were partitioned.

The result shows that for applicant 1 the predicted probability of admission is only about 28.2%.

head(train)

admit gre gpa rank
1 0 380 3.61 3
2 1 660 3.67 3
3 1 800 4.00 1
4 1 640 3.19 4
6 1 760 3.00 2
7 1 560 2.98 1

Probability calculation by hand, using the fitted coefficients:
For applicant #1 (gpa = 3.61, rank = 3)
> y <- -4.7270 + (1.3735*3.61) + (1*-1.1645)
> y
[1] -0.933165
> exp(y)/(1+exp(y))
[1] 0.282283

For applicant #3 (gpa = 4.00, rank = 1; rank 1 is the baseline level, so no rank term appears)
> y <- -4.7270 + (1.3735*4)
> exp(y)/(1+exp(y))
[1] 0.6828716
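
The same numbers can be obtained without typing the coefficients by hand, using coef() and R's built-in inverse logit plogis():

# plogis(y) = exp(y)/(1+exp(y)); rank 1 is the baseline level, so it has no coefficient
b <- coef(mymodel)
plogis(b["(Intercept)"] + b["gpa"]*3.61 + b["rank3"])   # applicant #1 (gpa 3.61, rank 3)
plogis(b["(Intercept)"] + b["gpa"]*4.00)                # applicant #3 (gpa 4.00, rank 1)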

#Misclassification error - train data
pred1 <- ifelse(p1>0.5, 1, 0)
tab1 <- table(Predicted = pred1, Actual = train$admit)
tab1

Actual
Predicted 0 1
0 208 73
1 15 29
The table above is called a confusion matrix. The 208 means that 208 applicants were predicted not to be admitted and truly were not (predicted = actual), while 29 were predicted to be admitted and truly were. The 73 were predicted not to be admitted but actually were admitted, and the 15 were predicted to be admitted but actually were not.

So the misclassification error on the training data is 27.08%:

1 - sum(diag(tab1))/sum(tab1)

[1] 0.2707692
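
The same confusion matrix also gives sensitivity and specificity (an optional breakdown, not in the original): the model is good at identifying non-admitted applicants but misses most admitted ones.

# Sensitivity: admitted applicants correctly predicted; specificity: non-admitted correctly predicted
tab1["1", "1"] / sum(tab1[, "1"])   # 29/102, about 0.28
tab1["0", "0"] / sum(tab1[, "0"])   # 208/223, about 0.93
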
#Misclassification error - test data
p2 <- predict(mymodel, test, type = 'response')
pred2 <- ifelse(p2>0.5, 1, 0)
tab2 <- table(Predicted = pred2, Actual = test$admit)
tab2

Actual
Predicted 0 1
0 48 20
1 2 5

1 - sum(diag(tab2))/sum(tab2)

[1] 0.2933333
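
For context (not part of the original analysis), compare this with the naive baseline of always predicting "not admitted" on the test set; the model improves on it only modestly.

# Baseline error: always predict the majority class (0 = not admitted)
1 - max(table(test$admit)) / nrow(test)   # 25/75, about 0.333, vs the model's 0.293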

# Goodness-of-fit test
with(mymodel, pchisq(null.deviance - deviance, df.null-df.residual,
lower.tail = F))

[1] 1.450537e-06

Because this p-value is very low, we conclude that the model is a statistically significant improvement over the null model (the intercept-only model).
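
The same p-value can be reproduced by hand from the two deviances reported in summary(mymodel):

# Chi-square statistic = null deviance - residual deviance, on the difference in degrees of freedom
pchisq(404.39 - 371.81, df = 324 - 320, lower.tail = FALSE)   # about 1.45e-06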
