Bioestadistica: Clara Carner 2023-05-29

install.packages("HardyWeinberg")
library("HardyWeinberg")
x <- c(MM = 298, MN = 489, NN = 213)

HW.test <- HWChisq(x, cc = 0, verbose = TRUE)
##LOGISTIC REGRESSION
#SIMULATION
We check whether the estimated p matches the p used in the simulation.
set.seed(2) #fix the seed so the same random numbers are generated on every run

d<-rbinom(50,1,0.3) #50 Bernoulli draws with success probability 0.3
phat<-mean(d) #estimated p
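As a small extension of this check, the sketch below repeats the draw for several sample sizes and compares the gap between the estimate and the true p = 0.3 with the theoretical standard error sqrt(p(1-p)/n). The names `true_p`, `sizes`, `phat_by_n` and `se_by_n`, and the chosen sample sizes, are illustrative assumptions and not part of the original notes.

```r
set.seed(2)                      # reproducible draws
true_p <- 0.3                    # probability used in the simulation
sizes  <- c(50, 500, 5000)       # sample sizes to compare (assumed values)

# Estimated p for each sample size
phat_by_n <- sapply(sizes, function(n) mean(rbinom(n, 1, true_p)))

# Theoretical standard error of a sample proportion
se_by_n <- sqrt(true_p * (1 - true_p) / sizes)

data.frame(n = sizes,
           phat = phat_by_n,
           abs_error = abs(phat_by_n - true_p),
           se = se_by_n)
```

The estimate is not expected to equal 0.3 exactly; its typical distance from 0.3 shrinks as n grows, at the rate given by the `se` column.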
Simulation under a model:
set.seed(2)
X1<-c(rep(1,500),rep(0,500))
X2<-c(rep(0,250),rep(1,500),rep(0,250)) #all four combinations of X1 and X2, 250 observations each
z<-0.1+0.5*X1+0.7*X2
p<-exp(z)/(1+exp(z)) #logistic function: probability of having the disease given X1 and X2
Y<-rbinom(1000,1,p)
output<-glm(Y~X1+X2, family=binomial) #glm is used to fit generalized linear models
summary(output)
##
## Call:
## glm(formula = Y ~ X1 + X2, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7555 -1.2216 0.6943 0.9345 1.1338
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.1034 0.1125 0.919 0.358210
## X1 0.4989 0.1369 3.645 0.000268 ***
## X2 0.6976 0.1373 5.079 3.79e-07 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1279.4 on 999 degrees of freedom
## Residual deviance: 1240.0 on 997 degrees of freedom
## AIC: 1246
##
## Number of Fisher Scoring iterations: 4
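Because the data were simulated, the estimates in the summary can be compared with the values used to generate them (0.1, 0.5 and 0.7). The sketch below refits the same model and also exponentiates the coefficients so that they read as odds ratios. The name `true_beta` is introduced here for illustration only; `confint.default()` is used to obtain Wald-type intervals.

```r
set.seed(2)
X1 <- c(rep(1, 500), rep(0, 500))
X2 <- c(rep(0, 250), rep(1, 500), rep(0, 250))
z  <- 0.1 + 0.5 * X1 + 0.7 * X2
p  <- exp(z) / (1 + exp(z))
Y  <- rbinom(1000, 1, p)
output <- glm(Y ~ X1 + X2, family = binomial)

# Values used to simulate the data, named like the fitted coefficients
true_beta <- c("(Intercept)" = 0.1, X1 = 0.5, X2 = 0.7)
cbind(true = true_beta, estimate = coef(output))

# Odds ratios with Wald 95% confidence intervals
exp(cbind(OR = coef(output), confint.default(output)))
```

If the simulation and the fit are both correct, each true value should fall inside or close to the corresponding interval once the interval is taken back to the log scale.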
glm: fits generalized linear models, specified by giving a symbolic description of the linear predictor and a
description of the error distribution. Given the y and the x values, we look for the betas.
- The Estimate column gives the estimated values of the betas.
- z value -> Wald test. For example, to test B2 = 0: z value = estimate of B2 / std. error of B2.
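The Wald test in the last bullet can be reproduced by hand from the coefficient table. The sketch below is one way to do it, assuming the same simulated data as in the model above; the names `fit`, `coefs`, `z_manual` and `p_manual` are illustrative.

```r
set.seed(2)
X1 <- c(rep(1, 500), rep(0, 500))
X2 <- c(rep(0, 250), rep(1, 500), rep(0, 250))
Y  <- rbinom(1000, 1, plogis(0.1 + 0.5 * X1 + 0.7 * X2))  # plogis(z) = exp(z)/(1+exp(z))
fit <- glm(Y ~ X1 + X2, family = binomial)

# Coefficient matrix: Estimate, Std. Error, z value, Pr(>|z|)
coefs <- coef(summary(fit))

# Wald statistic: estimate divided by its standard error
z_manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# Two-sided p-value from the standard normal distribution
p_manual <- 2 * pnorm(-abs(z_manual))

cbind(z_manual, z_reported = coefs[, "z value"],
      p_manual, p_reported = coefs[, "Pr(>|z|)"])
```

The manual and reported columns should agree, which confirms that the z value column is simply the ratio described above.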
#EXERCISES
##EXERCISE 5##
#(d) Test the null hypothesis of HWE using R (see lecture)
#150 GG, 40 GA, 10 AA
#install the library if it is not already available
if(!require(HardyWeinberg)){ install.packages("HardyWeinberg"); require(HardyWeinberg) }
#vector of genotype counts
x<-c(GG=150, GA=40, AA=10)
#Perform the test
HW.test<-HWChisq(x,cc=0, verbose=TRUE) #this call did not run for me
#H0 is rejected at the 5% level
#The HW equilibrium does not hold
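If the HardyWeinberg package is not available, the same chi-square test without continuity correction can be computed in base R. This is a minimal sketch under that assumption, not the package's implementation; the names `p_G`, `expected`, `chisq` and `p_value` are illustrative.

```r
x <- c(GG = 150, GA = 40, AA = 10)   # genotype counts from the exercise
n <- sum(x)

# Estimated frequency of allele G: each GG carries two copies, each GA one
p_G <- unname((2 * x["GG"] + x["GA"]) / (2 * n))

# Expected genotype counts under Hardy-Weinberg equilibrium
expected <- n * c(GG = p_G^2, GA = 2 * p_G * (1 - p_G), AA = (1 - p_G)^2)

# Pearson chi-square statistic (no continuity correction), 1 degree of freedom
chisq   <- sum((x - expected)^2 / expected)
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)

c(chisq = chisq, p_value = p_value)
```

The statistic is about 9.3, well above the 5% critical value of 3.84 for one degree of freedom, which agrees with the conclusion that H0 is rejected.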
##Exercise 4##
#clean the R environment
rm(list=ls())
#Exposure probability
p_exp<-0.25
#Disease probability given the exposure
p_d_exp<-0.25
#Disease probability given the subject is not exposed
p_d_notexp<-0.5
Small simulation study in R. Consider a hypothetical disease and exposure.
#In a population, P(E = 1)=0.25, P(D = 1|E = 1)=0.25 and P(D = 1|E = 0)=0.5
(a) Give the odds ratio that D will occur for E versus non E in this
population.
odds_ratio<-(p_d_exp/(1-p_d_exp))/(p_d_notexp/(1-p_d_notexp))
(b) Compute the probability of the disease in this population.
#Use the law of total probability
#P(D)=P(D|E)P(E)+P(D|not E)P(not E)
p_d<-p_d_exp*p_exp+p_d_notexp*(1-p_exp)
(c) Compute the following probabilities p1=P(E = 1|D = 1) and p2=P(E = 1|D = 0).
#p1=P(E = 1|D = 1) = P(D = 1|E = 1)P(E = 1)/P(D = 1)
p1<-(p_d_exp*p_exp)/p_d
#p2=P(E = 1|D = 0) = P(D = 0|E = 1)P(E = 1)/P(D = 0)
p2<-((1-p_d_exp)*p_exp)/(1-p_d)
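Parts (a) to (c) can be checked together. The sketch below recomputes the three probabilities and then verifies that the odds ratio of exposure (cases versus controls) equals the odds ratio of disease (exposed versus unexposed), which is the property that lets a case-control sample estimate the odds ratio of interest. The names `or_disease` and `or_exposure` are illustrative.

```r
p_exp      <- 0.25   # P(E = 1)
p_d_exp    <- 0.25   # P(D = 1 | E = 1)
p_d_notexp <- 0.5    # P(D = 1 | E = 0)

p_d <- p_d_exp * p_exp + p_d_notexp * (1 - p_exp)   # law of total probability
p1  <- p_d_exp * p_exp / p_d                        # P(E = 1 | D = 1), Bayes' rule
p2  <- (1 - p_d_exp) * p_exp / (1 - p_d)            # P(E = 1 | D = 0), Bayes' rule

# Odds ratio of disease, exposed versus unexposed
or_disease  <- (p_d_exp / (1 - p_d_exp)) / (p_d_notexp / (1 - p_d_notexp))
# Odds ratio of exposure, cases versus controls
or_exposure <- (p1 / (1 - p1)) / (p2 / (1 - p2))

c(p_d = p_d, p1 = p1, p2 = p2, or_disease = or_disease, or_exposure = or_exposure)
```

With these settings P(D) = 0.4375, p1 = 1/7, p2 = 1/3, and both odds ratios equal 1/3.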
(d) You can use R to obtain observations from distributions.
Try out the functions rbinom() and rnorm() for the binomial and the
normal distribution respectively. For example, generate a series of
ones and zeros of size 1000 with a probability of a one of 0.3 and
check whether indeed about 30% of your sample is one.
Do a similar exercise for the normal distribution.

x<-rbinom(1000,1,0.3)
p<-sum(x)/1000 #the sample proportion is close to the true probability 0.3
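The analogous exercise for the normal distribution is not worked out in these notes. A minimal sketch, assuming a mean of 10 and a standard deviation of 2 (both values chosen here only for illustration), could look like this:

```r
set.seed(2)
y <- rnorm(1000, mean = 10, sd = 2)   # assumed mean and sd, for illustration only

# Sample mean and standard deviation should be close to 10 and 2
c(sample_mean = mean(y), sample_sd = sd(y))

# Proportion of draws within one standard deviation of the mean;
# for a normal distribution this is about 0.68
within_one_sd <- mean(abs(y - 10) < 2)
within_one_sd
```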
(e) Now generate data for a case control study. Assume you have 100 cases and 100 controls.
Code to generate the exposure variables for cases and controls is as follows (you need
to fill in numbers for p1 and p2)

exposure<-as.vector(c(rbinom(100,1,p1),rbinom(100,1,p2)))
outcome<-as.vector(c(rep(1,100),rep(0,100))) #the 100 ones are the cases, the 100 zeros are the controls
data<-cbind(outcome, exposure)
colnames(data)<-c("outcome","exposure")
data<- as.data.frame(data)
(f) Check whether the probability of E = 1 in the cases and in the controls agrees with your simulation settings.
#estimated prob of exposure among the cases
prob_cases<-mean(data$exposure[1:100])
prob_cases
print(prob_cases-p1)
#estimated prob of exposure among the controls
prob_controls<-mean(data$exposure[101:200])
prob_controls
print(prob_controls-p2)
(g) Use your sample to estimate the odds ratio of interest.

#Odds of exposure among the cases
odds_cases<-prob_cases/(1-prob_cases)
#Odds of exposure among the controls
odds_controls<-prob_controls/(1-prob_controls)
#Estimated odds ratio from the generated sample
odds_ratio_est<-odds_cases/odds_controls
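The same odds ratio can also be obtained from a logistic regression of outcome on exposure, which ties this exercise back to the glm() section. The sketch below assumes p1 = 1/7 and p2 = 1/3, the values that follow from part (c) with the stated population settings; `fit` and `odds_ratio_glm` are illustrative names.

```r
set.seed(2)
p1 <- 1/7   # P(E = 1 | D = 1), from part (c)
p2 <- 1/3   # P(E = 1 | D = 0), from part (c)

exposure <- c(rbinom(100, 1, p1), rbinom(100, 1, p2))
outcome  <- c(rep(1, 100), rep(0, 100))   # 100 cases followed by 100 controls
data <- data.frame(outcome, exposure)

# Odds ratio from the sample proportions, as in part (g)
prob_cases     <- mean(data$exposure[data$outcome == 1])
prob_controls  <- mean(data$exposure[data$outcome == 0])
odds_ratio_est <- (prob_cases / (1 - prob_cases)) /
                  (prob_controls / (1 - prob_controls))

# Odds ratio from a logistic regression: exp of the exposure coefficient
fit <- glm(outcome ~ exposure, family = binomial, data = data)
odds_ratio_glm <- unname(exp(coef(fit)["exposure"]))

c(table_estimate = odds_ratio_est, glm_estimate = odds_ratio_glm, true_value = 1/3)
```

The two estimates coincide up to numerical tolerance, because with a single binary covariate the fitted model reproduces the observed 2x2 table. Both vary around the true value 1/3 from one simulated sample to the next.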
