University of Geneva GSEM

Statistics I Fall 2017


Prof. Eva Cantoni Practical 5

Point Estimation - Sampling distributions

Goals: This practical has two objectives. First, we shall review some notions that are useful
in point estimation theory. Second, we shall use R to simulate the sampling distribution of
the median.

1 Theoretical Exercises
1.1 Exercise 1
Which of the following are random variables?

1. Population mean.

2. Population size.

3. Sample size.

4. Sample mean.

5. Variance of the sample mean.

6. Maximum of the sample.

7. Variance of the population.

8. Estimated variance of the sample mean.

1.2 Exercise 2
Suppose we are interested in estimating the proportion of households living below the poverty
line in a given Swiss canton. For this purpose, a random sample of households is drawn from
the population.

1. What computations should we carry out on the sample in order to answer our
question? Formalize these steps by proposing an estimator and specifying its distribution.

2. If the proportion of households living below the poverty line is equal to 0.15 at the
population level, what should the sample size be so that the standard deviation of the
estimator defined in point 1 is at most 0.02?


1.3 Exercise 3
The sample mean X̄ of a sample of size n is used to estimate the population mean µ. We
would like to find n such that the absolute error of estimation |X̄ − µ| is at most equal
to a fixed value d with a (large) probability 1 − α (α given). Assume we have a sequence of
independent and identically distributed (i.i.d.) random variables X1, ..., Xn drawn from a
N(µ, 4) distribution.

1. Does such a value of n exist for every value of α?

2. Find n if α = 0.05 and d = 1.

2 R simulation
2.1 Distributions in R
R has built-in functions to evaluate quantities associated with many common probability
distributions. You can compute values of the cumulative distribution function (cdf) using
functions with the prefix "p" and quantiles using the prefix "q". Moreover, we can evaluate
probability density functions (pdf, for continuous distributions) or probability mass functions
(pmf, for discrete distributions) with the prefix "d", and randomly generate observations
drawn from a given distribution using the prefix "r".
The following table summarizes the available functions for some common probability
distributions.

Distribution        F(q) = P(X ≤ q)    F⁻¹(p) = Q(p)      f(x) or P(X = x)   Simulation

Uniform U(a, b)     punif(q, a, b)     qunif(p, a, b)     dunif(x, a, b)     runif(n, a, b)
Normal N(µ, σ²)     pnorm(q, µ, σ)     qnorm(p, µ, σ)     dnorm(x, µ, σ)     rnorm(n, µ, σ)
Binomial B(m, p)    pbinom(q, m, p)    qbinom(p, m, p)    dbinom(x, m, p)    rbinom(n, m, p)
Poisson P(λ)        ppois(q, λ)        qpois(p, λ)        dpois(x, λ)        rpois(n, λ)
Exponential E(λ)    pexp(q, λ)         qexp(p, λ)         dexp(x, λ)         rexp(n, λ)
Chi-square χ²_df    pchisq(q, df)      qchisq(p, df)      dchisq(x, df)      rchisq(n, df)
Student St_df       pt(q, df)          qt(p, df)          dt(x, df)          rt(n, df)
Fisher F_df1,df2    pf(q, df1, df2)    qf(p, df1, df2)    df(x, df1, df2)    rf(n, df1, df2)

where:

• p: the probability at which to evaluate a quantile;

• q: the quantile at which to evaluate a lower-tail probability;

• x: the value at which to evaluate the density or probability;

• n: the size of the sample to simulate.
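As a minimal sketch of the four prefixes, the calls below use the standard normal, binomial and uniform distributions. All numeric arguments are illustrative choices and are not taken from the exercises.

```r
# The four prefixes illustrated on distributions from the table above.
# All numeric arguments are illustrative choices.

# p: cumulative distribution function, P(X <= q), for X ~ N(0, 1)
p_val <- pnorm(1.96, mean = 0, sd = 1)

# q: quantile function, the inverse of the cdf (recovers 1.96)
q_val <- qnorm(p_val, mean = 0, sd = 1)

# d: probability mass function, P(X = 3), for X ~ B(10, 0.5)
d_val <- dbinom(3, size = 10, prob = 0.5)

# r: random generation, 5 draws from U(0, 1)
set.seed(1)  # arbitrary seed, only for reproducibility
r_val <- runif(5, min = 0, max = 1)

print(c(p = p_val, q = q_val, d = d_val))
print(r_val)
```

Applying the "q" function to the output of the "p" function returns the original quantile, which is a quick way to check that the two are inverses for a continuous distribution.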


The R implementations of these probability distributions use the following parameters
(argument names and default values in parentheses):

• a and b (min = 0 and max = 1): the parameters of the Uniform distribution (beginning and
end of the interval);

• µ and σ (mean = 0 and sd = 1): mean and standard deviation of the Normal
distribution;

• m and p (size and prob, no default values): number of trials and probability of success
for the Binomial distribution;

• λ (lambda for the Poisson distribution, no default value; rate = 1 for the Exponential
distribution): mean of the Poisson distribution and rate of the Exponential distribution;

• df (df, or df1 and df2 for the Fisher distribution; no default values): degrees of freedom
of the Student, Chi-square and Fisher distributions.
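One point that is easy to miss: the notation N(µ, σ²) gives the variance as the second parameter, whereas the R functions expect the standard deviation in sd. The sketch below uses an illustrative N(0, 9) distribution; the values are assumptions chosen for the example only.

```r
# N(0, 9) in the notation of this practical: variance 9, so sd = 3.
variance <- 9
sigma <- sqrt(variance)

# Correct call: pass the standard deviation.
p_correct <- pnorm(3, mean = 0, sd = sigma)

# Standardizing gives the same probability: P(Z <= (3 - 0) / 3), Z ~ N(0, 1).
p_standard <- pnorm((3 - 0) / sigma)

# Passing the variance as sd silently evaluates a different distribution.
p_wrong <- pnorm(3, mean = 0, sd = variance)

print(c(correct = p_correct, standardized = p_standard, wrong = p_wrong))
```

R raises no error in the last call, so this kind of mistake only shows up in the numerical result.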

Consider the following set of applications of the above:

1. For X ∼ N(10, 4), compute P(X < 12) and P(10 < X < 12).

2. Compute the 90% quantile of an E(1/2) distribution.

3. For X ∼ P(4), compute P(X = 2) and P(X ≤ 2).

4. Generate a random sample of size 200 from a N(1.5, 4) distribution with:

norm.sample <- rnorm(n = 200, mean = 1.5, sd = 2)

Draw a kernel density plot of the sample (see Practical 3). Add the probability density
function of a N(1.5, 4) distribution to the same plot with:

# Sort the sample so that lines() joins the points from left to right
sorted.norm.sample <- sort(norm.sample)
lines(sorted.norm.sample,
      dnorm(sorted.norm.sample, mean = 1.5, sd = 2), col = "red")
# Try lines(norm.sample, dnorm(norm.sample, mean = 1.5, sd = 2),
# col = "red") instead of the above. What happens?

2.2 Sampling distribution of the median


As an example, we shall study the behavior of the sampling distribution of the sample
median, median(X1, ..., Xn), when the observations are independent and identically
distributed (i.i.d.) draws from an exponential distribution, i.e. X1, ..., Xn ∼ E(λ) i.i.d.
Remember that, with the parametrization used by R, the cumulative distribution function
of such a random variable is, for x ≥ 0:

F (x) = 1 − exp(−λx)
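The parametrization can be checked numerically. The sketch below compares pexp() with the closed-form cdf; the rate λ = 2 and the grid of x values are illustrative assumptions, not values used in the practical.

```r
# Compare R's pexp() with the closed-form cdf F(x) = 1 - exp(-lambda * x).
# lambda and the grid of x values are illustrative choices.
lambda <- 2
x <- c(0, 0.5, 1, 2, 5)

cdf_formula <- 1 - exp(-lambda * x)
cdf_r <- pexp(x, rate = lambda)

print(cbind(x, cdf_formula, cdf_r))
print(all.equal(cdf_formula, cdf_r))
```

The two columns agree, which confirms that the second argument of pexp() is the rate λ and not the mean 1/λ.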


1. Give the population mean E(X) and population median m(λ) of a random variable
X ∼ E(λ).

2. To look at the sampling distribution of the median when λ = 1/2 and the sample size is
100, we simulate 500 samples:
# prepare a vector of size 500 to store the results
Exp.median <- numeric(500)
for (i in 1:500) {
  # draw one sample of size 100 and store its median
  Exp.median[i] <- median(rexp(100, rate = 1 / 2))
}

Based on the above simulated object Exp.median:

(a) Explore graphically the sampling distribution of the median (with histograms,
boxplots, etc.).
(b) Can the sampling distribution be considered normal? Check graphically with the
appropriate tool.
(c) Around which value is the sampling distribution of median(X1, ..., Xn)
concentrated?

Repeat the steps above for the 90% quantile.
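The simulation loop can also be written more compactly with replicate(), which makes it easy to swap the median for another statistic. The following is only a sketch: simulate_statistic() is a hypothetical helper name introduced here, and the seed is an arbitrary choice.

```r
# simulate_statistic() is a hypothetical helper, not part of the practical.
# It draws n_rep samples of size n from E(rate) and applies `stat` to each.
simulate_statistic <- function(stat, n_rep = 500, n = 100, rate = 1 / 2) {
  replicate(n_rep, stat(rexp(n, rate = rate)))
}

set.seed(2017)  # arbitrary seed, only for reproducibility

# Same simulation as the for loop: 500 sample medians
Exp.median <- simulate_statistic(median)

# 500 sample 90% quantiles, for the last question
Exp.q90 <- simulate_statistic(
  function(x) quantile(x, probs = 0.9, names = FALSE)
)

print(summary(Exp.median))
print(summary(Exp.q90))
```

Both objects are numeric vectors of length 500, so the graphical tools used for Exp.median apply unchanged to Exp.q90.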
