Biostatistics: Clara Carner 2023-05-29



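install.packages("HardyWeinberg") # only needed the first time
library("HardyWeinberg")
x <- c(MM = 298, MN = 489, NN = 213) # genotype counts at a biallelic M/N locus

HW.test <- HWChisq(x, cc = 0, verbose = TRUE) # chi-square test for HWE, cc = 0: no continuity correction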
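As a cross-check of what `HWChisq(x, cc = 0)` computes, the statistic can be reproduced in base R. This is a minimal sketch assuming the standard Pearson chi-square test without continuity correction: estimate the allele frequency from the genotype counts, derive the expected counts under HWE (p², 2pq, q²) and compare them with the observed counts, with 1 degree of freedom for a biallelic locus.

```r
# Observed genotype counts (same vector as above)
x <- c(MM = 298, MN = 489, NN = 213)
n <- sum(x)

# Estimated frequency of allele M: each MM carries two copies, each MN one
p_M <- (2 * x[["MM"]] + x[["MN"]]) / (2 * n)
q_N <- 1 - p_M

# Expected genotype counts under Hardy-Weinberg equilibrium
expected <- n * c(MM = p_M^2, MN = 2 * p_M * q_N, NN = q_N^2)

# Pearson chi-square statistic without continuity correction (cc = 0), 1 df
chi2 <- sum((x - expected)^2 / expected)
p_value <- pchisq(chi2, df = 1, lower.tail = FALSE)

round(expected, 2)
round(c(chi2 = chi2, p_value = p_value), 4)
```

For these counts the statistic is about 0.22, well below the 5% critical value of 3.84, so HWE is not rejected.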
##LOGISTIC REGRESSION
#SIMULATION
We check whether the estimated p agrees with the p used to simulate the data.

set.seed(2) # fix the seed so the same random numbers are generated every run

d<-rbinom(50,1,0.3) # 50 Bernoulli draws (0 or 1) with success probability 0.3
phat<-mean(d) # estimated p
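A single sample of 50 values gives only one estimate. A minimal sketch, assuming we repeat the same experiment many times (10,000 replications is an arbitrary choice), shows that the estimates are centred on the true p = 0.3 and spread out by roughly sqrt(p(1 - p)/n), the standard error of a sample proportion.

```r
set.seed(2)
n <- 50        # size of each simulated sample
p_true <- 0.3  # success probability used to generate the data
n_rep <- 10000 # number of repeated experiments (arbitrary)

# Each replication: draw 50 Bernoulli(0.3) values and keep their mean (the p estimate)
phats <- replicate(n_rep, mean(rbinom(n, 1, p_true)))

mean(phats)                     # close to p_true
sd(phats)                       # empirical standard error of phat
sqrt(p_true * (1 - p_true) / n) # theoretical standard error, about 0.065
```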

Simulation under a logistic regression model:

set.seed(2)
X1<-c(rep(1,500),rep(0,500))
X2<-c(rep(0,250),rep(1,500),rep(0,250)) # together with X1, each (X1, X2) combination of 0s and 1s appears 250 times
z<-0.1+0.5*X1+0.7*X2 # linear predictor with true betas 0.1, 0.5 and 0.7
p<-exp(z)/(1+exp(z)) # logistic function: probability of having the disease given X1 and X2
Y<-rbinom(1000,1,p)
output<-glm(Y~X1+X2, family=binomial) # glm is used to fit generalized linear models
summary(output)

##
## Call:
## glm(formula = Y ~ X1 + X2, family = binomial)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.7555  -1.2216   0.6943   0.9345   1.1338
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   0.1034     0.1125   0.919 0.358210
## X1            0.4989     0.1369   3.645 0.000268 ***
## X2            0.6976     0.1373   5.079 3.79e-07 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1279.4 on 999 degrees of freedom
## Residual deviance: 1240.0 on 997 degrees of freedom
## AIC: 1246
##
## Number of Fisher Scoring iterations: 4
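The difference between the null deviance (intercept-only model) and the residual deviance is a likelihood ratio statistic for testing β1 = β2 = 0 jointly, with degrees of freedom equal to the difference in residual degrees of freedom (999 - 997 = 2). With the deviances printed above this is 1279.4 - 1240.0 = 39.4 on 2 df, p ≈ 3e-9. A sketch that regenerates the simulated data (same seed and model) and computes this test, once by hand and once with `anova()` against an explicit intercept-only model:

```r
# Regenerate the simulated data of the logistic model above
set.seed(2)
X1 <- c(rep(1, 500), rep(0, 500))
X2 <- c(rep(0, 250), rep(1, 500), rep(0, 250))
z <- 0.1 + 0.5 * X1 + 0.7 * X2
p <- exp(z) / (1 + exp(z))
Y <- rbinom(1000, 1, p)
output <- glm(Y ~ X1 + X2, family = binomial)

# Likelihood ratio test of H0: beta1 = beta2 = 0
lr_stat <- output$null.deviance - output$deviance
lr_df <- output$df.null - output$df.residual # 999 - 997 = 2
lr_p <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(statistic = lr_stat, df = lr_df, p_value = lr_p)

# Same test via a comparison with the intercept-only model
null_model <- glm(Y ~ 1, family = binomial)
lrt_table <- anova(null_model, output, test = "Chisq")
lrt_table
```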

glm: generalized linear models, specified by giving a symbolic description of the linear predictor and a description of the error distribution. Given the y and the x, we estimate the betas.
- The Estimate column gives the estimated betas.

- z value: the Wald test statistic. For example, to test H0: β2 = 0, z = estimate of β2 / standard error of β2 (here 0.6976/0.1373 ≈ 5.08).
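The Wald computation described above can be reproduced from the coefficient table. A minimal sketch, regenerating the same simulated data: it divides each estimate by its standard error, takes the two-sided normal p-value, and (as an illustration, not part of the original notes) also exponentiates the betas and their 95% Wald limits to read them as odds ratios.

```r
# Regenerate the simulated data and refit the model
set.seed(2)
X1 <- c(rep(1, 500), rep(0, 500))
X2 <- c(rep(0, 250), rep(1, 500), rep(0, 250))
z <- 0.1 + 0.5 * X1 + 0.7 * X2
Y <- rbinom(1000, 1, exp(z) / (1 + exp(z)))
output <- glm(Y ~ X1 + X2, family = binomial)

# Coefficient table: columns Estimate, Std. Error, z value, Pr(>|z|)
coef_table <- coef(summary(output))
est <- coef_table[, "Estimate"]
se <- coef_table[, "Std. Error"]

# Wald statistic for H0: beta_j = 0 and its two-sided p-value
wald_z <- est / se
wald_p <- 2 * pnorm(-abs(wald_z))

# 95% Wald limits for beta, then odds ratios exp(beta) on the original scale
ci_low <- est - qnorm(0.975) * se
ci_high <- est + qnorm(0.975) * se
cbind(z = wald_z, p = wald_p, OR = exp(est), OR_low = exp(ci_low), OR_high = exp(ci_high))
```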

#EXERCISES
##EXERCISE 5##
#(d) Test the null hypothesis of HWE using R (see lecture)
#150 GG, 40 GA, 10 AA
#install the library if it is not available yet
if(!require(HardyWeinberg)){ install.packages("HardyWeinberg"); require(HardyWeinberg)}
#vector of genotype counts
x<-c(GG=150, GA=40, AA=10)
#perform the test (it did not work for me)
HW.test<-HWChisq(x, cc=0, verbose=TRUE)
#H0 is rejected at the 5% level: the HW equilibrium does not hold
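Since the package call did not work in these notes, here is a base-R sketch that does not depend on HardyWeinberg. It goes through the estimated inbreeding coefficient F = 1 - H_obs/H_exp (observed vs. expected heterozygosity); for a biallelic locus and without continuity correction, the Pearson chi-square statistic for HWE equals n·F², so this gives the same test while also showing the direction of the deviation (F > 0 means too few heterozygotes).

```r
# Genotype counts from Exercise 5
x <- c(GG = 150, GA = 40, AA = 10)
n <- sum(x)

# Allele frequencies estimated from the genotype counts
p_G <- (2 * x[["GG"]] + x[["GA"]]) / (2 * n)
p_A <- 1 - p_G

# Observed and expected (under HWE) proportion of heterozygotes
het_obs <- x[["GA"]] / n
het_exp <- 2 * p_G * p_A

# Inbreeding coefficient: positive values indicate a heterozygote deficit
F_hat <- 1 - het_obs / het_exp

# Chi-square statistic (no continuity correction) and p-value with 1 df
chi2 <- n * F_hat^2
p_value <- pchisq(chi2, df = 1, lower.tail = FALSE)
c(F = F_hat, chi2 = chi2, p_value = p_value)
```

Here p(G) = 0.85, so the expected counts are 144.5 GG, 51 GA and 4.5 AA: only 40 heterozygotes are observed instead of 51. The statistic is about 9.30 (p ≈ 0.002), which agrees with rejecting HWE at the 5% level.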
##Exercise 4##
#clean the R environment
rm(list=ls())
#Exposure probability
p_exp<-0.25
#Disease probability given the exposure
p_d_exp<-0.25
#Disease probability given the subject is not exposed
p_d_notexp<-0.5

Small simulation study in R. Consider a hypothetical disease and exposure.
#In a population the P(E = 1)=0.25, P(D = 1|E = 1)=0.25 and P(D = 1|E = 0)=0.5

(a) Give the odds ratio that D will occur for E versus non-E in this population.
odds_ratio<-(p_d_exp/(1-p_d_exp))/(p_d_notexp/(1-p_d_notexp))
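Plugging the given probabilities into this formula:

```latex
\mathrm{OR} = \frac{P(D=1 \mid E=1)\,/\,\{1 - P(D=1 \mid E=1)\}}{P(D=1 \mid E=0)\,/\,\{1 - P(D=1 \mid E=0)\}}
            = \frac{0.25/0.75}{0.5/0.5} = \frac{1/3}{1} = \frac{1}{3}
```

So in this population the odds of disease among the exposed are one third of the odds among the non-exposed.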

(b) Compute the probability of the disease in this population.
#Use the law of total probability: P(D)=P(D|E)P(E)+P(D|not E)P(not E)
p_d<-p_d_exp*p_exp+p_d_notexp*(1-p_exp)
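With the values of this exercise, the law of total probability gives:

```latex
P(D=1) = P(D=1 \mid E=1)\,P(E=1) + P(D=1 \mid E=0)\,P(E=0)
       = 0.25 \cdot 0.25 + 0.5 \cdot 0.75 = 0.0625 + 0.375 = 0.4375
```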

(c) Compute the following probabilities p1 = P(E = 1|D = 1) and p2 = P(E = 1|D = 0).
#p1 = P(E = 1|D = 1) = P(D = 1|E = 1)P(E = 1)/P(D = 1)
p1<-(p_d_exp*p_exp)/p_d
#p2 = P(E = 1|D = 0) = P(D = 0|E = 1)P(E = 1)/P(D = 0)
p2<-((1-p_d_exp)*p_exp)/(1-p_d)
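By Bayes' rule, p1 = 0.0625/0.4375 = 1/7 ≈ 0.143 and p2 = (0.75 · 0.25)/0.5625 = 1/3. As a check in the spirit of the simulation section, a sketch that simulates a large hypothetical population (the size and the seed are arbitrary assumptions), generating exposure first and then disease given exposure, and reads off the conditional proportions:

```r
set.seed(123) # arbitrary seed, only for reproducibility
N <- 1e6      # size of the simulated population (arbitrary)

# Exposure first, then disease with a probability that depends on exposure
E <- rbinom(N, 1, 0.25)
D <- rbinom(N, 1, ifelse(E == 1, 0.25, 0.5))

p1_sim <- mean(E[D == 1]) # estimate of P(E = 1 | D = 1)
p2_sim <- mean(E[D == 0]) # estimate of P(E = 1 | D = 0)
c(p_d_sim = mean(D), p1_sim = p1_sim, p2_sim = p2_sim)
c(p_d = 7/16, p1 = 1/7, p2 = 1/3) # exact values from the formulas above
```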

(d) You can use R to obtain observations from distributions. Try out the functions rbinom() and rnorm() for the binomial and the normal distribution respectively. For example, generate a series of ones and zeros of size 1000 with a probability of a one of 0.3 and check whether indeed about 30% of your sample is one. Do a similar exercise for the normal distribution.

x<-rbinom(1000,1,0.3)
p<-sum(x)/1000 #the sample proportion of ones is close to 0.3
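The normal-distribution part of (d) is not worked out in these notes. A possible version, assuming a standard normal (mean 0, sd 1; other values work the same way), draws 1000 values with `rnorm()` and checks the sample mean, the standard deviation and the proportion of values within ±1.96, which should be about 95%.

```r
set.seed(2)
y <- rnorm(1000, mean = 0, sd = 1) # 1000 draws from N(0, 1)

mean(y)             # should be close to 0
sd(y)               # should be close to 1
mean(abs(y) < 1.96) # proportion within +-1.96, should be close to 0.95
```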

(e) Now generate data for a case-control study. Assume you have 100 cases and 100 controls. Code to generate the exposure variables for cases and controls is as follows (you need to fill in numbers for p1 and p2):

exposure<-as.vector(c(rbinom(100,1,p1),rbinom(100,1,p2)))
outcome<-as.vector(c(rep(1,100),rep(0,100)))
# the first 100 (outcome = 1) are the cases, the last 100 (outcome = 0) are the controls
data<-cbind(outcome, exposure)
colnames(data)<-c("outcome","exposure")
data<-as.data.frame(data)

(f) Check whether the probability of E = 1 in the cases and in the controls agrees with your simulation settings.

#estimated prob of exposure among the cases
prob_cases<-mean(data$exposure[1:100])
prob_cases
print(prob_cases-p1)
#estimated prob of exposure among the controls
prob_controls<-mean(data$exposure[101:200])
prob_controls
print(prob_controls-p2)
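Differences such as `prob_cases - p1` are almost never exactly zero with 100 subjects per group. One way to judge whether they are compatible with the simulation settings is an exact binomial test of the observed exposure count against the assumed probability. A sketch, assuming p1 = 1/7 and p2 = 1/3 (the values from part (c)) and an arbitrary seed, regenerating the exposure data so it runs on its own:

```r
set.seed(2) # arbitrary seed
p1 <- 1/7   # P(E = 1 | D = 1) from part (c)
p2 <- 1/3   # P(E = 1 | D = 0) from part (c)
exposure <- c(rbinom(100, 1, p1), rbinom(100, 1, p2))

# Exact binomial tests of H0: exposure probability equals p1 (cases) or p2 (controls)
test_cases <- binom.test(sum(exposure[1:100]), 100, p = p1)
test_controls <- binom.test(sum(exposure[101:200]), 100, p = p2)

# Large p-values and 95% intervals containing p1 / p2 mean the sample agrees with the settings
res <- rbind(cases = c(test_cases$p.value, test_cases$conf.int),
             controls = c(test_controls$p.value, test_controls$conf.int))
colnames(res) <- c("p_value", "ci_low", "ci_high")
res
```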

(g) Use your sample to estimate the odds ratio of interest.


#Odds of exposure among the cases and among the controls
odds_cases<-prob_cases/(1-prob_cases)
odds_controls<-prob_controls/(1-prob_controls)
#Estimated odds ratio from the generated sample
odds_ratio_est<-odds_cases/odds_controls
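This links back to the logistic regression section: for a single binary exposure, exponentiating the slope of `glm(outcome ~ exposure, family = binomial)` reproduces the sample odds ratio computed by hand. Because the odds ratio is symmetric, the exposure odds ratio estimated from a case-control sample targets the same disease odds ratio as in part (a), 1/3. A sketch, regenerating the data with p1 = 1/7, p2 = 1/3 and an arbitrary seed:

```r
set.seed(2) # arbitrary seed
p1 <- 1/7
p2 <- 1/3
exposure <- c(rbinom(100, 1, p1), rbinom(100, 1, p2))
outcome <- c(rep(1, 100), rep(0, 100)) # 100 cases followed by 100 controls
data <- data.frame(outcome = outcome, exposure = exposure)

# Odds ratio by hand, as in (g)
prob_cases <- mean(data$exposure[data$outcome == 1])
prob_controls <- mean(data$exposure[data$outcome == 0])
odds_ratio_est <- (prob_cases / (1 - prob_cases)) / (prob_controls / (1 - prob_controls))

# Same odds ratio from a logistic regression of outcome on exposure
fit <- glm(outcome ~ exposure, family = binomial, data = data)
odds_ratio_glm <- exp(coef(fit)[["exposure"]])

c(by_hand = odds_ratio_est, glm = odds_ratio_glm, true = 1/3)
```

With only 100 subjects per group the estimate can be noticeably far from 1/3; repeating the simulation with larger groups brings it closer.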
