STA255 - Statistical Theory: Assignment #1 (/36 Marks)

This document describes an assignment for a statistics course. It includes 3 questions assessing students' abilities to: 1) Analyze health data using R by comparing height distributions between genders and summarizing BMI distributions. 2) Calculate properties of a probability distribution describing drug approval times, generate random values in R to estimate these properties, and compare the estimates. 3) Simulate hypergeometric and binomial distributions in R to compare their probability estimates under different sampling scenarios.

Uploaded by

Tom Qi

STA255 Statistical Theory

Assignment #1 (/36 marks)
Due through a Portal Test at 10 pm, Wednesday, Feb 8, 2017


Question 1 (11 marks)

The file body.csv* (posted on Portal) contains several measurements on 507 physically active adults (247 men
and 260 women), most considered to be within a healthy weight range. The measurements include:
Age (years)
Weight (kg)
Height (cm)
Gender (coded as 1 = male, 0 = female)
Save these data to your computer and load them into R. Use R to create appropriate numerical and graphical
summaries to explore distributions and associations between variables.
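
As a starting point, loading the file and taking a first look could be sketched as follows. This is only an illustration: the toy values and the column names (Age, Weight, Height, Gender) are assumptions, since the actual header of body.csv is not shown here, and the temporary file stands in for the real one.

```r
# Toy stand-in for body.csv. The values and the column names
# (Age, Weight, Height, Gender) are assumptions for illustration only.
toy <- data.frame(
  Age    = c(21, 23, 28, 25, 22, 31),
  Weight = c(65.6, 71.8, 80.7, 51.6, 59.0, 49.2),        # kg
  Height = c(174.0, 175.3, 193.5, 161.2, 167.5, 159.5),  # cm
  Gender = c(1, 1, 1, 0, 0, 0)                           # 1 = male, 0 = female
)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# With the real file, replace `path` with the location of body.csv.
body_data <- read.csv(path)

str(body_data)             # variable types and first few values
print(summary(body_data))  # five-number summary and mean of each column

# Pairwise scatterplots give a quick view of associations.
pairs(body_data[, c("Age", "Weight", "Height")])
```

With the real data, `str()` is a quick way to confirm that all 507 rows were read and that Gender came in as a 0/1 numeric column.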

a. (4 marks) Compare the distributions of heights for males and females in the sample. Be sure to make
explicit references to the appropriate R output/plots you created (i.e., state exactly what
output/plot(s) you used as well as your interpretation of them).
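
One possible way to produce output for this comparison is grouped summaries plus side-by-side boxplots. The sketch below uses made-up heights and assumes columns named Height and Gender; the derived Sex factor is a hypothetical helper, not part of the data file.

```r
# Toy data; values and column names are illustrative assumptions.
body_data <- data.frame(
  Height = c(174.0, 175.3, 193.5, 186.5, 161.2, 167.5, 159.5, 157.0),
  Gender = c(1, 1, 1, 1, 0, 0, 0, 0)
)

# Label the 0/1 coding so that output and plots are readable.
body_data$Sex <- factor(body_data$Gender, levels = c(0, 1),
                        labels = c("Female", "Male"))

# Numerical summaries of height within each group.
height_summary <- tapply(body_data$Height, body_data$Sex, summary)
height_sd      <- tapply(body_data$Height, body_data$Sex, sd)
print(height_summary)
print(height_sd)

# Side-by-side boxplots put both distributions on a common scale.
boxplot(Height ~ Sex, data = body_data,
        ylab = "Height (cm)", main = "Height by gender")
```

Histograms per group (for example `hist(body_data$Height[body_data$Sex == "Male"])`) are a reasonable alternative when the shape of each distribution matters more than the direct comparison.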

b. (5 marks) Body Mass Index (BMI) can be computed as mass (or, weight in the data) in kilograms
divided by height in metres squared ($\mathrm{BMI} = \mathrm{mass} / \mathrm{height}^2$). Use the data to compute and store BMIs
for everyone in the sample in a new R variable called BMI and produce appropriate numerical and
graphical summaries of BMI.
i. Describe the centre, spread and shape of the BMI distribution in the sample. Be sure to make
explicit references to the appropriate R output/plots you created (i.e., state exactly what
output/plots you used as well as your interpretation of them).
ii. BMI is often used to classify a person as underweight (BMI < 18.5 kg/m²), normal weight (18.5
kg/m² ≤ BMI < 25 kg/m²), overweight (25 kg/m² ≤ BMI < 30 kg/m²), or obese (BMI ≥ 30 kg/m²).
Use R to determine how many people in the sample would be classified as being in the normal
weight range. Report this number. Knowing that most of the individuals in the sample were
considered to be within a healthy (i.e., normal) weight range, what does this suggest about the
BMI classifications?
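
The BMI computation and the classification count can be sketched as below. The data values and the column names Weight and Height are assumptions for illustration; the one detail taken from the data description is that heights are in centimetres, so they need converting to metres first.

```r
# Toy data; values and column names are illustrative assumptions.
body_data <- data.frame(
  Weight = c(65.6, 71.8, 80.7, 51.6, 59.0, 49.2),        # kg
  Height = c(174.0, 175.3, 193.5, 161.2, 167.5, 159.5)   # cm
)

# Heights are recorded in cm, so convert to metres before squaring.
BMI <- body_data$Weight / (body_data$Height / 100)^2

# Centre, spread and shape.
print(summary(BMI))
print(sd(BMI))
hist(BMI, xlab = "BMI (kg/m^2)", main = "Distribution of BMI")

# right = FALSE makes each interval closed on the left: [18.5, 25), [25, 30), ...
bmi_class <- cut(BMI, breaks = c(-Inf, 18.5, 25, 30, Inf), right = FALSE,
                 labels = c("Underweight", "Normal", "Overweight", "Obese"))
print(table(bmi_class))

# The same count for the normal range, written as a logical condition.
n_normal <- sum(BMI >= 18.5 & BMI < 25)
print(n_normal)
```

Using `right = FALSE` matters at the boundaries: with the default `right = TRUE`, a BMI of exactly 25 would be placed in the normal class rather than the overweight class.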

c. (2 marks) Copy and paste all your R code and output from your R console window that you used to
answer parts a-c of this question. Include only the working code/output (i.e., remove any lines of code
that didn't work as well as their error messages) and only include output that directly relates to your
answers to this question.

*Data adapted from Journal of Statistics Education Data Set and Story http://ww2.amstat.org/publications/jse/datasets/body.txt.
STA255 Assignment 1 Spring 2017

Question 2 (14 marks)

The maximum patent life for a new drug is seventeen years but the testing and approval processes with
the US Food and Drug Administration (FDA) can take years. The actual patent life for the drug (i.e., the length
of time that the pharmaceutical company has to recover costs and to make a profit) can be thought of as
maximum patent life less approval process time. Suppose the following table summarizes the distribution of
actual patent lives for all new drugs.

x (years)   3     4     5     6     7     8     9     10    11    12    13
p(x)        0.03  0.05  0.07  0.10  0.14  0.20  0.18  0.12  0.07  0.03  0.01

a. (6 marks) Compute μ, σ², and P(X 10). Explain your steps (e.g., since we are limited to text in the
Portal test, you can describe your calculations in words, or express the formulas using plain text,
brackets and arithmetic operations).
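
For a discrete distribution, μ is the sum of x p(x) and σ² is the sum of (x − μ)² p(x). A hand calculation can be checked numerically with a short sketch like the one below. The variable names are illustrative, and because the direction of the inequality in P(X 10) is not legible here, the last step assumes P(X ≤ 10); use `x >= 10` for the other reading.

```r
# Distribution of actual patent life, taken from the table in Question 2.
x <- 3:13
p <- c(0.03, 0.05, 0.07, 0.10, 0.14, 0.20, 0.18, 0.12, 0.07, 0.03, 0.01)

# Sanity check: the probabilities should sum to 1.
stopifnot(isTRUE(all.equal(sum(p), 1)))

# Mean: each value weighted by its probability.
mu <- sum(x * p)

# Variance from the definition, and from the shortcut E(X^2) - mu^2.
sigma2     <- sum((x - mu)^2 * p)
sigma2_alt <- sum(x^2 * p) - mu^2

# ASSUMPTION: the event is read as X <= 10.
prob_le_10 <- sum(p[x <= 10])

print(c(mu = mu, sigma2 = sigma2, sigma2_alt = sigma2_alt,
        prob_le_10 = prob_le_10))
```

Computing the variance two ways is a cheap consistency check: the two values should agree up to floating-point error.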

b. (6 marks) Use R to generate observed values of the random variable that follows the probability
distribution given above and estimate E(X), V(X) and P(X 10) based on 10, 100, and 10,000
repetitions of the experiment (i.e., fill in the following table).

Estimate of   10 repetitions   100 repetitions   10,000 repetitions
E(X)
V(X)
P(X 10)

Report your results (complete the table above and copy and paste it directly into the Portal test) and
comment on how each set of estimates compares to the values you computed in Question 2 part a.
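
One way to generate such values is `sample()` with the `prob` argument. The following is a minimal sketch rather than a prescribed method: the seed is arbitrary, `estimate()` is a hypothetical helper, and the probability again assumes the event X ≤ 10.

```r
# Distribution of actual patent life, taken from the table in Question 2.
x <- 3:13
p <- c(0.03, 0.05, 0.07, 0.10, 0.14, 0.20, 0.18, 0.12, 0.07, 0.03, 0.01)

set.seed(255)  # arbitrary seed, only so that a run can be reproduced

# Hypothetical helper: draw `reps` values of X and summarise them.
estimate <- function(reps) {
  draws <- sample(x, size = reps, replace = TRUE, prob = p)
  c(E = mean(draws),
    V = var(draws),            # sample variance (n - 1 denominator)
    P = mean(draws <= 10))     # ASSUMPTION: event read as X <= 10
}

# One column per number of repetitions, matching the table layout.
results <- sapply(c(10, 100, 10000), estimate)
colnames(results) <- c("10", "100", "10000")
print(round(results, 3))
```

Note that `replace = TRUE` is required: each repetition is an independent draw from the same distribution, and without it `sample()` could return at most 11 values.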

c. (2 marks) Copy and paste all your R code and output from your R console window that you used to
answer part b of this question. Include only the working code/output (i.e., remove any lines of code
that didn't work as well as their error messages) and only include output that directly relates to your
answers to this question.

Question 3 (11 marks)

Suppose we are sampling individuals randomly without replacement from a population of N = 100 individuals
with proportion of successes p. X follows a hypergeometric distribution with pmf:

\[
p(x) =
\begin{cases}
\dfrac{\binom{Np}{x}\binom{N - Np}{n - x}}{\binom{N}{n}}, & \max(0,\ n - N + Np) \le x \le \min(n,\ Np) \\[2ex]
0, & \text{otherwise}
\end{cases}
\]

How well do Binomial probabilities approximate Hypergeometric probabilities? Conduct a simulation study
in R to compare probability estimates based on the Binomial Distribution to those based on the Hypergeometric
Distribution (the true distribution of the number of successes in the sample in this situation). In this simulation
study, you will vary the sample size (n), relative to the population size (N) and estimate P(X > np).
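
Simulated estimates can be sanity-checked against exact tail probabilities from R's built-in distribution functions. The sketch below is an optional cross-check under the assumption that the target quantity is P(X > np); it does not replace the simulation, and the choice n = 10, p = 0.5 is only an example.

```r
N <- 100
n <- 10          # example sample size (10% of the population)
p <- 0.5         # example proportion of successes
m <- round(p * N)   # number of successes in the population

# X is integer-valued, so P(X > np) equals P(X > floor(np)).
q <- floor(p * n)

# lower.tail = FALSE returns P(X > q) rather than P(X <= q).
exact_hyper <- phyper(q, m, N - m, n, lower.tail = FALSE)
exact_binom <- pbinom(q, n, p, lower.tail = FALSE)

print(c(Hypergeometric = exact_hyper, Binomial = exact_binom))
```

In `phyper(q, m, n, k)` the second and third arguments are the numbers of successes and failures in the population and the fourth is the sample size, which is the same ordering used by `rhyper()`.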

a. (6 marks) Use R to simulate repetitions of Hypergeometric and Binomial experiments and use the
observed values of the random variables to estimate P(X > np). Use 5000 repetitions. Fill in each of
the tables below and copy and paste your tables into the Portal assignment.

Table 3.1: Low proportion of successes in the population (p = 0.2)
Estimate of       n/N =  0.01   0.05   0.1   0.25   0.5   0.75   0.9
P(X > np)
Hypergeometric
Binomial

Table 3.2: Moderate proportion of successes in the population (p = 0.5)
Estimate of       n/N =  0.01   0.05   0.1   0.25   0.5   0.75   0.9
P(X > np)
Hypergeometric
Binomial

Table 3.3: High proportion of successes in the population (p = 0.8)
Estimate of       n/N =  0.01   0.05   0.1   0.25   0.5   0.75   0.9
P(X > np)
Hypergeometric
Binomial

Here is some sample R code that randomly generates 5000 observations of a Binomial random variable and
5000 observations of a Hypergeometric random variable and uses them to obtain estimates of the probability
P(X > np). Modify this code, as necessary, to fill in the above tables.
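# Population size and sample size (here n is 1% of N)
N <- 100
n <- round(0.01 * N, 0)
# Proportion of successes and the threshold np
p <- 0.5
pn <- p * n
# 5000 Binomial draws (sampling with replacement); estimate P(X > np)
binomreps <- rbinom(5000, n, p)
binomp <- mean(binomreps > pn)
# 5000 Hypergeometric draws (sampling without replacement) from a population
# with round(p*N) successes and N - round(p*N) failures; estimate P(X > np)
hyperreps <- rhyper(5000, round(p * N, 0), N - round(p * N, 0), n)
hyperp <- mean(hyperreps > pn)
hyperp
binomp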
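
Rather than editing n and p by hand for each cell, the sample code can be wrapped in a loop over the seven sample fractions and the three proportions. The following is one possible sketch, not the required approach: `simulate_table()` is a hypothetical helper and the seed is arbitrary.

```r
set.seed(2017)  # arbitrary seed, only so that a run can be reproduced

N         <- 100
reps      <- 5000
fractions <- c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9)  # n as a fraction of N

# Hypothetical helper: one 2 x 7 table of estimates of P(X > np) for a given p.
simulate_table <- function(p) {
  out <- sapply(fractions, function(f) {
    n  <- round(f * N)    # sample size
    pn <- p * n           # threshold np
    m  <- round(p * N)    # successes in the population
    c(Hypergeometric = mean(rhyper(reps, m, N - m, n) > pn),
      Binomial       = mean(rbinom(reps, n, p) > pn))
  })
  colnames(out) <- fractions
  out
}

# One table per proportion of successes (Tables 3.1 to 3.3).
tables <- lapply(c(0.2, 0.5, 0.8), simulate_table)
names(tables) <- c("p = 0.2", "p = 0.5", "p = 0.8")
print(lapply(tables, round, 3))
```

Each estimate is a proportion out of 5000 repetitions, so small differences between the two rows can be simulation noise; rerunning with a different seed gives a feel for how much the estimates vary.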

b. (3 marks) Review your simulation results in Question 3 part a. Based on these, what can you conclude
about the appropriateness of using the Binomial Distribution (i.e., treating the experiment as sampling
with replacement) when the true distribution of the number of successes is Hypergeometric (i.e., when
you are actually sampling without replacement)? Refer to any patterns that you observed in your
simulation to justify your answer.

c. (2 marks) Copy and paste all your R code and output from your R console window that you used to
answer part a of this question. Include only the working code/output (i.e., remove any lines of code
that didn't work as well as their error messages) and only include output that directly relates to your
answers to this question.
