Statistics I: Introduction and Distributions of Sampling Statistics
Petar Popovski
Assistant Professor
Antennas, Propagation and Radio Networking (APNET)
Department of Electronic Systems
Aalborg University
e-mail: [email protected]
lecture outline
introduction
descriptive statistics
– description and summarization of data sets
– Chebyshev’s inequality and the weak law of large numbers
– normal data sets
– sample correlation coefficient
introduction
description of data sets (1)
description of data sets (2)
histograms
– bins, class intervals, left-end inclusion convention
description of data sets (3)
ogive = cumulative frequency plot
– approximates the cumulative distribution function of the given pdf
sample mean, median and mode
sample mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
linear property: $\forall i,\ y_i = a x_i + b \Rightarrow \bar{y} = a\bar{x} + b$
calculation with frequencies: $\bar{x} = \sum_{i=1}^{k} \frac{v_i f_i}{n}$, where value $v_i$ occurs with frequency $f_i$ (relation to the mean value of a random variable)
sample median
– if n is odd, it is the (n+1)/2-th value
– if n is even, it is the average of the values in positions n/2 and n/2+1
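The definitions above can be sketched directly in code; this is a minimal illustration with hypothetical data, not part of the slides.

```python
# sample mean and median, following the slide's definitions
def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_median(xs):
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]          # the (n+1)/2-th smallest value
    return (s[n // 2 - 1] + s[n // 2]) / 2  # average of positions n/2 and n/2+1

xs = [3, 1, 4, 1, 5, 9, 2]                  # hypothetical data
print(sample_mean(xs))    # 25/7 ≈ 3.571
print(sample_median(xs))  # 3
```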
sample variance and standard deviation
variance and standard deviation
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
computational identity: $\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n \bar{x}^2$
linear transformation: $\forall i,\ y_i = a + b x_i \Rightarrow s_y^2 = b^2 s_x^2$
example (with $n = 9$, $\sum x_i = 35$, $\sum x_i^2 = 203$)
$s^2 = \frac{203 - 9 \left( \frac{35}{9} \right)^2}{8} = 8.361$
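The example on this slide gives only the sums (n = 9, Σxᵢ = 35, Σxᵢ² = 203); the dataset below is a hypothetical one constructed to match those sums, used to check both forms of the variance formula.

```python
# hypothetical data chosen so that n = 9, sum = 35, sum of squares = 203
xs = [1, 1, 2, 2, 4, 4, 4, 8, 9]
n = len(xs)
mean = sum(xs) / n

# definition: s^2 = (1/(n-1)) * sum of (x_i - mean)^2
s2_def = sum((x - mean) ** 2 for x in xs) / (n - 1)

# shortcut identity: sum of (x_i - mean)^2 = sum of x_i^2 - n * mean^2
s2_short = (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)

print(round(s2_def, 3), round(s2_short, 3))  # both 8.361
```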
sample percentiles
example
let the sample size be n=33. the sample 10th percentile is the 4th smallest value, since ⌈33·0.1⌉ = 4
quartiles
– first (25%), second (50%), third (75%)
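The percentile rule above can be sketched as follows; this follows only the ceiling convention stated on the slide (the refinement for the case where n·p is an integer is not shown here), on hypothetical data.

```python
import math

# sample 100p percentile: the ceil(n*p)-th smallest value
# (the slide's convention; the integer-n*p averaging case is omitted)
def sample_percentile(xs, p):
    s = sorted(xs)
    k = math.ceil(len(s) * p)
    return s[k - 1]

xs = list(range(1, 34))              # hypothetical data, n = 33
print(sample_percentile(xs, 0.10))   # the 4th smallest value, i.e. 4
```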
Chebyshev’s inequality
for any value of $k \ge 1$, more than $100\left(1 - \frac{1}{k^2}\right)$ percent of the data lie within the interval $(\bar{x} - ks,\ \bar{x} + ks)$
– it is universal, but for that reason the bound can be loose
there is also a probability version of Chebyshev’s inequality
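The data version of the inequality can be checked numerically; the dataset below is hypothetical, chosen to include an outlier so that the looseness of the bound is visible.

```python
import statistics

# check the data version of Chebyshev's inequality on hypothetical data:
# more than 100*(1 - 1/k^2) percent of values lie in (mean - k*s, mean + k*s)
xs = [2, 4, 4, 4, 5, 5, 7, 9, 1, 30]    # hypothetical data with an outlier
mean = statistics.mean(xs)
s = statistics.stdev(xs)                # sample standard deviation (n - 1 divisor)

k = 2.0
inside = sum(mean - k * s < x < mean + k * s for x in xs)
fraction = inside / len(xs)
print(fraction, 1 - 1 / k**2)           # observed fraction vs. the 75% bound
```

Here 90% of the values fall inside the interval, comfortably above the guaranteed 75%, which illustrates why the universal bound is often loose in practice.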
normal and skewed data sets
[figures: a normal histogram and an approximately normal histogram]
sample correlation coefficient
the statistical data may be given as pairs of values, and we want to determine whether there is a relation between those values
$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{j=1}^{n} (y_j - \bar{y})^2}}$
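The formula can be computed directly; the paired data below are hypothetical, chosen to show a strong positive relation.

```python
import math

# sample correlation coefficient (second form from the slide)
def sample_corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)
                    * sum((y - my) ** 2 for y in ys))
    return num / den

# hypothetical paired data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 7]
print(sample_corr(xs, ys))  # ≈ 0.87
```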
a sampling statistic is a function of the sample: $Y = f(X_1, X_2, \ldots, X_n)$
sample mean
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
$E[\bar{X}] = \mu \qquad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$
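A quick simulation illustrates the two moments of the sample mean; the uniform population is a hypothetical choice.

```python
import random
import statistics

# simulation sketch: Var(sample mean) ≈ sigma^2 / n, E[sample mean] = mu
# (hypothetical Uniform(0,1) population, so mu = 0.5 and sigma^2 = 1/12)
random.seed(1)
n, trials = 25, 20000
sigma2 = 1 / 12

means = [statistics.mean(random.random() for _ in range(n)) for _ in range(trials)]
print(statistics.mean(means))        # ≈ 0.5
print(statistics.variance(means))    # ≈ sigma2 / n ≈ 0.00333
```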
central limit theorem (1)
a fundamental result in probability theory
$Z = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma \sqrt{n}}$ is approximately standard normal, with density $p_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$
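The theorem can be seen at work in a short simulation; uniform summands are a hypothetical choice, not taken from the slide.

```python
import random
import statistics

# CLT sketch: the standardized sum of iid Uniform(0,1) variables is ≈ N(0,1)
random.seed(7)
n, trials = 30, 50000
mu, sigma = 0.5, (1 / 12) ** 0.5

zs = [(sum(random.random() for _ in range(n)) - n * mu) / (sigma * n ** 0.5)
      for _ in range(trials)]

# for a standard normal, P(Z <= 1) ≈ 0.8413
print(statistics.mean(zs), statistics.stdev(zs))   # ≈ 0 and ≈ 1
print(sum(z <= 1 for z in zs) / trials)            # ≈ 0.84
```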
central limit theorem (2)
example
$E\left[\sum_{i=1}^{n} X_i - W\right] = 3n - 400 \qquad \mathrm{Var}\left(\sum_{i=1}^{n} X_i - W\right) = 0.09n + 1600$
$Z = \frac{\sum_{i=1}^{n} X_i - W - (3n - 400)}{\sqrt{0.09n + 1600}}$
$\frac{400 - 3n}{\sqrt{0.09n + 1600}} \le 1.28 \;\Rightarrow\; n \ge 117$
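The slide does not show what the variables Xᵢ and W stand for, but the final numerical step can be verified on its own: find the smallest n satisfying the inequality.

```python
import math

# verify the slide's final step: smallest n with
# (400 - 3n) / sqrt(0.09n + 1600) <= 1.28
def lhs(n):
    return (400 - 3 * n) / math.sqrt(0.09 * n + 1600)

n = 1
while lhs(n) > 1.28:
    n += 1
print(n)  # 117
```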
central limit theorem (3)
an important application of the central limit theorem is
for binomial random variables
$X = X_1 + X_2 + \cdots + X_n \qquad X_i = \begin{cases} 1 & \text{with prob. } p \\ 0 & \text{with prob. } 1 - p \end{cases}$
X is a random variable that represents the number of
successes in n trials, where the probability of success
in each trial is p
$E[X_i] = p \qquad \mathrm{Var}(X_i) = p(1 - p)$
the central limit theorem states that $Z = \frac{X - np}{\sqrt{np(1 - p)}}$
is approximately a standard normal random variable
see problem 15 of chapter 6
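The normal approximation to the binomial can be compared against the exact distribution; the example parameters (n = 100, p = 0.5, k = 55) and the continuity correction are illustrative choices, not taken from the slide.

```python
import math

# normal approximation to the binomial via the CLT:
# P(X <= k) ≈ Phi((k + 0.5 - n*p) / sqrt(n*p*(1-p))), with continuity correction
def phi(z):
    """standard normal cdf"""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def binom_cdf(n, p, k):
    """exact binomial cdf, for comparison"""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p, k = 100, 0.5, 55
approx = phi((k + 0.5 - n * p) / math.sqrt(n * p * (1 - p)))
print(approx, binom_cdf(n, p, k))   # both ≈ 0.86
```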
sample variance
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
$E[S^2] = \sigma^2$
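The unbiasedness of S² (the n−1 divisor) can be illustrated by simulation; the normal population with σ² = 4 is a hypothetical choice.

```python
import random
import statistics

# simulation sketch: S^2 (with the n-1 divisor) is unbiased for sigma^2
random.seed(3)
n, trials, sigma2 = 5, 40000, 4.0

s2s = [statistics.variance(random.gauss(0, sigma2 ** 0.5) for _ in range(n))
       for _ in range(trials)]
print(statistics.mean(s2s))   # ≈ 4.0
```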
sampling from normal population (1)
$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$
to find the distribution of the sample variance
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
recall the chi-square distribution
– $Y = Z_1^2 + Z_2^2 + \cdots + Z_n^2$ has a chi-square distribution with n degrees of freedom
sampling from normal population (2)
the variable $\frac{(n-1) S^2}{\sigma^2}$ has a chi-square distribution with $n - 1$ degrees of freedom
then it follows that $\sqrt{n}\, \frac{\bar{X} - \mu}{S}$ has a t-distribution with $n - 1$ degrees of freedom
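A simulation sketch can check the chi-square claim through its first two moments (a chi-square variable with n−1 degrees of freedom has mean n−1 and variance 2(n−1)); the sample size and σ below are hypothetical.

```python
import random
import statistics

# simulation sketch: (n-1) S^2 / sigma^2 from a normal sample has
# mean n - 1 and variance 2(n - 1), consistent with a chi-square
# distribution with n - 1 degrees of freedom
random.seed(5)
n, trials, sigma = 6, 40000, 2.0

chis = [(n - 1) * statistics.variance(random.gauss(0, sigma) for _ in range(n)) / sigma**2
        for _ in range(trials)]
print(statistics.mean(chis))      # ≈ n - 1 = 5
print(statistics.variance(chis))  # ≈ 2(n - 1) = 10
```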
sampling from a finite population
random sample from a population of N elements
– each of the $\binom{N}{n}$ subsets is equally likely to be the sample
consider the case where a fraction p of the population has some feature, i.e. Np elements in total
– let $X_i$ be the indicator variable
$X = X_1 + X_2 + \cdots + X_n$
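Sampling without replacement from a finite population can be sketched directly; the population size, sample size, and feature fraction below are hypothetical. Note that E[X] = np holds even without replacement, although the variance differs from the binomial case.

```python
import random

# sampling without replacement from a finite population of N elements:
# each of the C(N, n) subsets is equally likely; X counts sampled
# elements that have the feature (Np such elements in the population)
random.seed(9)
N, n, p = 50, 10, 0.3
population = [1] * int(N * p) + [0] * (N - int(N * p))

trials = 20000
counts = [sum(random.sample(population, n)) for _ in range(trials)]
print(sum(counts) / trials)   # ≈ n * p = 3
```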