222ECO01 Anand Advanced Econometrics Activity1

This document describes a dataset on diabetes patients and examines various regression models. It provides background on the data and defines the dependent and explanatory variables. Both a linear probability model and logit model are estimated and their coefficients are interpreted.


222ECO01__Anand__advanced_econometrics_activity1.R
andand

2024-02-02
Question 1:

a) Source of data: This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases.

b) Time period of data: The data were collected between 1965 and 1988.

c) Dependent variable: The dependent variable is “Outcome”. This is a binary (dummy) variable that takes the value 1 if a patient is Diabetic and 0 if a patient is Not Diabetic.

d) Explanatory variables: The explanatory variables are Pregnancies, Glucose, Blood Pressure, Skin Thickness, Insulin, BMI, Diabetes Pedigree Function, and Age of the patients.

The objective of the dataset is to predict whether or not a patient has diabetes, based on the explanatory variables included in the dataset.

library(dplyr)

## Warning: package 'dplyr' was built under R version 4.3.2

##
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
##
##     filter, lag

## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union

library(purrr)

## Warning: package 'purrr' was built under R version 4.3.2

library(Ecdat)

## Warning: package 'Ecdat' was built under R version 4.3.2

## Loading required package: Ecfun

## Warning: package 'Ecfun' was built under R version 4.3.2

##
## Attaching package: 'Ecfun'

## The following object is masked from 'package:base':
##
##     sign

##
## Attaching package: 'Ecdat'

## The following object is masked from 'package:datasets':
##
##     Orange

library(broom)

## Warning: package 'broom' was built under R version 4.3.2

library(aod)

## Warning: package 'aod' was built under R version 4.3.2

library(margins)

## Warning: package 'margins' was built under R version 4.3.2

library(lmtest)

## Warning: package 'lmtest' was built under R version 4.3.2

## Loading required package: zoo

## Warning: package 'zoo' was built under R version 4.3.2

##
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
##
##     as.Date, as.Date.numeric

library(sandwich)

## Warning: package 'sandwich' was built under R version 4.3.2

library(DescTools)

## Warning: package 'DescTools' was built under R version 4.3.2

##
## Attaching package: 'DescTools'
## The following object is masked from 'package:Ecfun':
##
## BoxCox

library(mfx)

## Warning: package 'mfx' was built under R version 4.3.2

## Loading required package: MASS

##
## Attaching package: 'MASS'

## The following object is masked from 'package:Ecdat':
##
##     SP500

## The following object is masked from 'package:dplyr':
##
##     select

## Loading required package: betareg

## Warning: package 'betareg' was built under R version 4.3.2

library(brant)

## Warning: package 'brant' was built under R version 4.3.2

library(tidyr)

## Warning: package 'tidyr' was built under R version 4.3.2

library(janitor)

## Warning: package 'janitor' was built under R version 4.3.2

##
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
##
##     chisq.test, fisher.test

library(nnet)

## Warning: package 'nnet' was built under R version 4.3.2

library(readxl)

## Warning: package 'readxl' was built under R version 4.3.2

# Question 2) Run a Linear Probability Model and interpret the coefficients
diabetes <- read_excel("C:\\Users\\andand\\Desktop\\diabetes.xlsx")
# Recode the outcome label as a 0/1 dummy (0 = "NotDiabetic", 1 = any other label)
diabetes <- diabetes %>%
  mutate(Outcome_num = ifelse(Outcome == "NotDiabetic", 0, 1))

model_lpm <- lm(Outcome_num ~ Pregnancies + Glucose + BloodPressure +
                  SkinThickness + Insulin + BMI + DiabetesPedigreeFunction +
                  Age, data = diabetes)
summary(model_lpm)

##
## Call:
## lm(formula = Outcome_num ~ Pregnancies + Glucose + BloodPressure +
##     SkinThickness + Insulin + BMI + DiabetesPedigreeFunction +
##     Age, data = diabetes)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.01348 -0.29513 -0.09541  0.32112  1.24160
##
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)
## (Intercept)              -0.8538943  0.0854850  -9.989  < 2e-16 ***
## Pregnancies               0.0205919  0.0051300   4.014 6.56e-05 ***
## Glucose                   0.0059203  0.0005151  11.493  < 2e-16 ***
## BloodPressure            -0.0023319  0.0008116  -2.873  0.00418 **
## SkinThickness             0.0001545  0.0011122   0.139  0.88954
## Insulin                  -0.0001805  0.0001498  -1.205  0.22857
## BMI                       0.0132440  0.0020878   6.344 3.85e-10 ***
## DiabetesPedigreeFunction  0.1472374  0.0450539   3.268  0.00113 **
## Age                       0.0026214  0.0015486   1.693  0.09092 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4002 on 759 degrees of freedom
## Multiple R-squared: 0.3033, Adjusted R-squared: 0.2959
## F-statistic: 41.29 on 8 and 759 DF, p-value: < 2.2e-16
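
The estimates above depend on Outcome_num being coded correctly. The recoding `ifelse(Outcome == "NotDiabetic", 0, 1)` assigns 1 to every label other than "NotDiabetic", including misspellings. A minimal sketch of a stricter recoding follows; the `toy` data frame and the "Diabetic" label are assumptions for illustration, not the contents of the actual spreadsheet.

```r
# Toy data standing in for the real spreadsheet (hypothetical values).
toy <- data.frame(
  Outcome = c("Diabetic", "NotDiabetic", "NotDiabetic", "diabetic ", NA),
  stringsAsFactors = FALSE
)

# Explicit lookup table: labels that are not listed become NA instead of 1.
codes <- c(NotDiabetic = 0, Diabetic = 1)
toy$Outcome_num <- unname(codes[toy$Outcome])

# Cross-tabulate labels against codes, keeping missing values visible.
print(table(toy$Outcome, toy$Outcome_num, useNA = "ifany"))

# Number of rows that could not be recoded and need manual inspection.
n_unmatched <- sum(is.na(toy$Outcome_num))
print(n_unmatched)
```

With this approach a mistyped label shows up as a missing value in the cross-tabulation rather than being counted silently as a diabetic patient.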

Interpretation of coefficients

i) A one-unit increase in Pregnancies increases the probability of the patient being Diabetic by 0.0205919 (about 2.06 percentage points), holding other variables constant.

ii) A one-unit increase in Glucose increases the probability of the patient being Diabetic by 0.0059203 (about 0.59 percentage points), holding other variables constant.

iii) A one-unit increase in Blood Pressure decreases the probability of the patient being Diabetic by 0.0023319 (about 0.23 percentage points), holding other variables constant.

iv) A one-unit increase in Skin Thickness increases the probability of the patient being Diabetic by 0.0001545 (about 0.02 percentage points), holding other variables constant.

v) A one-unit increase in Insulin decreases the probability of the patient being Diabetic by 0.0001805 (about 0.02 percentage points), holding other variables constant.

vi) A one-unit increase in BMI increases the probability of the patient being Diabetic by 0.0132440 (about 1.32 percentage points), holding other variables constant.

vii) A one-unit increase in Diabetes Pedigree Function increases the probability of the patient being Diabetic by 0.1472374 (about 14.72 percentage points), holding other variables constant.

viii) A one-unit increase in Age increases the probability of the patient being Diabetic by 0.0026214 (about 0.26 percentage points), holding other variables constant.

The coefficients on Skin Thickness (p = 0.890) and Insulin (p = 0.229) are not statistically significant at conventional levels, and the coefficient on Age (p = 0.091) is significant only at the 10% level.
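
Two caveats apply when reading linear probability estimates. The residual range reported in the summary (-1.01 to 1.24) implies that some fitted probabilities lie outside [0, 1], and the error variance of a binary outcome depends on the regressors, so the default standard errors may be unreliable. The sketch below illustrates both points on simulated data; the variable names and parameter values are made up and are not taken from the diabetes file. It computes HC1 heteroskedasticity-robust standard errors by hand in base R. With the sandwich package loaded earlier, `vcovHC(fit, type = "HC1")` should give the same covariance matrix.

```r
set.seed(1)

# Simulated binary outcome driven by one regressor (illustrative only).
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * x))

# Linear probability model: OLS on a 0/1 outcome.
fit <- lm(y ~ x)

# Caveat 1: fitted "probabilities" are not constrained to [0, 1].
p_hat <- fitted(fit)
share_outside <- mean(p_hat < 0 | p_hat > 1)

# Caveat 2: HC1 robust covariance,
# (X'X)^-1 [sum of u_i^2 x_i x_i'] (X'X)^-1 * n / (n - k).
X <- model.matrix(fit)
u <- resid(fit)
k <- ncol(X)
bread <- solve(crossprod(X))
meat <- crossprod(X * u)  # each row of X is scaled by its residual
vcov_hc1 <- bread %*% meat %*% bread * n / (n - k)

se_table <- cbind(
  estimate   = coef(fit),
  default_se = sqrt(diag(vcov(fit))),
  robust_se  = sqrt(diag(vcov_hc1))
)
print(round(se_table, 4))
print(share_outside)
```

The same two checks can be run on model_lpm by replacing `fit` with the fitted object.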

# Question 3) Run a Logit/Probit model and interpret the coefficients

model_logit <- glm(Outcome_num ~ Pregnancies + Glucose + BloodPressure +
                     SkinThickness + Insulin + BMI + DiabetesPedigreeFunction +
                     Age, data = diabetes, family = binomial(link = "logit"))

summary(model_logit)

##
## Call:
## glm(formula = Outcome_num ~ Pregnancies + Glucose + BloodPressure +
##     SkinThickness + Insulin + BMI + DiabetesPedigreeFunction +
##     Age, family = binomial(link = "logit"), data = diabetes)
##
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)
## (Intercept)              -8.4046964  0.7166359 -11.728  < 2e-16 ***
## Pregnancies               0.1231823  0.0320776   3.840 0.000123 ***
## Glucose                   0.0351637  0.0037087   9.481  < 2e-16 ***
## BloodPressure            -0.0132955  0.0052336  -2.540 0.011072 *
## SkinThickness             0.0006190  0.0068994   0.090 0.928515
## Insulin                  -0.0011917  0.0009012  -1.322 0.186065
## BMI                       0.0897010  0.0150876   5.945 2.76e-09 ***
## DiabetesPedigreeFunction  0.9451797  0.2991475   3.160 0.001580 **
## Age                       0.0148690  0.0093348   1.593 0.111192
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 993.48 on 767 degrees of freedom
## Residual deviance: 723.45 on 759 degrees of freedom
## AIC: 741.45
##
## Number of Fisher Scoring iterations: 5
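
The deviances reported above are enough to assess overall fit. For ungrouped binary data the deviance equals minus twice the log-likelihood, so the drop from the null deviance to the residual deviance is a likelihood-ratio statistic with 8 degrees of freedom (767 - 759), and one minus their ratio is McFadden's pseudo R-squared. A short sketch using only the numbers printed in the summary:

```r
# Values copied from the glm summary above.
null_dev  <- 993.48
null_df   <- 767
resid_dev <- 723.45
resid_df  <- 759

# Likelihood-ratio test of the hypothesis that all slopes are zero.
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# McFadden's pseudo R-squared: 1 - logLik(model) / logLik(null model).
mcfadden_r2 <- 1 - resid_dev / null_dev

print(c(lr_stat = lr_stat, lr_df = lr_df, mcfadden_r2 = round(mcfadden_r2, 4)))
print(p_value)
```

Because these inputs are rounded to two decimals, the results may differ slightly in the last digit from values computed directly on model_logit.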

Interpretation of the coefficients:

i) A one-unit increase in Pregnancies increases the log odds of a patient being Diabetic by 0.1231823, holding other variables constant.

ii) A one-unit increase in Glucose increases the log odds of a patient being Diabetic by 0.0351637, holding other variables constant.

iii) A one-unit increase in Blood Pressure decreases the log odds of a patient being Diabetic by 0.0132955, holding other variables constant.

iv) A one-unit increase in Skin Thickness increases the log odds of a patient being Diabetic by 0.0006190, holding other variables constant.

v) A one-unit increase in Insulin decreases the log odds of a patient being Diabetic by 0.0011917, holding other variables constant.

vi) A one-unit increase in BMI increases the log odds of a patient being Diabetic by 0.0897010, holding other variables constant.

vii) A one-unit increase in Diabetes Pedigree Function increases the log odds of a patient being Diabetic by 0.9451797, holding other variables constant.

viii) A one-unit increase in Age increases the log odds of a patient being Diabetic by 0.0148690, holding other variables constant.

The coefficients on Skin Thickness (p = 0.929), Insulin (p = 0.186), and Age (p = 0.111) are not statistically significant at the 10% level.
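
Log odds are hard to read directly. Exponentiating a coefficient gives an odds ratio, and multiplying a coefficient by p(1 - p) gives the approximate change in probability at a given predicted probability p. The sketch below applies both to the coefficients printed in the logit summary. The patient profile `x_example` is a made-up illustration, not a row from the data.

```r
# Coefficients copied from the logit summary above.
b <- c(
  "(Intercept)"            = -8.4046964,
  Pregnancies              =  0.1231823,
  Glucose                  =  0.0351637,
  BloodPressure            = -0.0132955,
  SkinThickness            =  0.0006190,
  Insulin                  = -0.0011917,
  BMI                      =  0.0897010,
  DiabetesPedigreeFunction =  0.9451797,
  Age                      =  0.0148690
)

# Odds ratios: multiplicative change in the odds per one-unit increase.
odds_ratio <- exp(b[-1])

# Hypothetical patient (assumed values, for illustration only).
x_example <- c(
  Pregnancies = 2, Glucose = 120, BloodPressure = 70, SkinThickness = 20,
  Insulin = 80, BMI = 32, DiabetesPedigreeFunction = 0.5, Age = 35
)

# Predicted probability for that patient: inverse logit of the linear index.
xb <- b["(Intercept)"] + sum(b[names(x_example)] * x_example)
p  <- unname(plogis(xb))

# Marginal effect of each regressor at this profile: beta_j * p * (1 - p).
marginal_effect <- b[-1] * p * (1 - p)

print(round(cbind(odds_ratio, marginal_effect), 4))
print(round(p, 3))
```

An odds ratio above 1 (for example, Pregnancies) means the odds of being Diabetic rise with the regressor, and one below 1 (Blood Pressure, Insulin) means they fall. The marginal effects depend on the chosen profile, unlike the constant effects of the linear probability model.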
