Data Analysis
ex: $P(65 \le X \le 66) = F(66) - F(65) = P(X \le 66) - P(X \le 65)$
More generally, $P(a \le X \le b) = F(b) - F(a) = P(X \le b) - P(X \le a)$
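The identity P(a ≤ X ≤ b) = F(b) − F(a) can be checked numerically. The sketch below is only an illustration: the distribution of X is not given here, so a normal model with mean 65 and SD 2 is assumed, and `interval_prob` is a hypothetical helper name.

```python
from statistics import NormalDist

# Assumption for illustration only: X ~ N(mu=65, sigma=2).
X = NormalDist(mu=65, sigma=2)

def interval_prob(dist, a, b):
    """Return P(a <= X <= b) as the difference of two CDF values, F(b) - F(a)."""
    return dist.cdf(b) - dist.cdf(a)

p_65_66 = interval_prob(X, 65, 66)
print(f"P(65 <= X <= 66) = {p_65_66:.4f}")
```

Any distribution object exposing a `cdf` method could be passed in place of `X`.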
1. Continuous Random variables
Assume F(x) differentiable and let f(x) denote the derivative of F(x). f(x) is
called the probability density function (pdf) of X.
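Probabilities are areas under the density: $P(a \le X \le b) = \int_a^b f(x)\,dx$ and $F(a) = \int_{-\infty}^{a} f(x)\,dx$,
with by construction
$\int_{-\infty}^{+\infty} f(x)\,dx = 1$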
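A minimal numerical sketch of these two facts, assuming a standard normal density as the example pdf (any other density would do) and a simple midpoint rule; `integrate` is a hypothetical helper, not part of any library.

```python
from statistics import NormalDist

Z = NormalDist()  # example density: standard normal (an assumption)

def integrate(g, lo, hi, n=20_000):
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

# The total area under the density should be 1 (tails beyond +/-10 are negligible).
total_area = integrate(Z.pdf, -10.0, 10.0)

# The area between a and b should match F(b) - F(a).
a, b = -0.5, 1.2
area_ab = integrate(Z.pdf, a, b)
cdf_diff = Z.cdf(b) - Z.cdf(a)

print(total_area, area_ab, cdf_diff)
```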
Discrete case: $E[X] = \sum_i p_i x_i$ and $\sigma_X^2 = V[X] = \sum_i p_i (x_i - E[X])^2$
Continuous case: $E[X] = \int x f(x)\,dx$ and $\sigma_X^2 = E[(X - E[X])^2] = \int (x - E[X])^2 f(x)\,dx$
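The parallel between sums (discrete case) and integrals (continuous case) can be sketched as follows. The fair die and the N(6, 2) density are illustrative choices, and `integrate` is a hypothetical midpoint-rule helper.

```python
from statistics import NormalDist

# Discrete case: a fair six-sided die (illustrative choice).
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
mean_d = sum(p * x for p, x in zip(probs, values))
var_d = sum(p * (x - mean_d) ** 2 for p, x in zip(probs, values))

# Continuous case: X ~ N(mu=6, sigma=2), integrated numerically.
dist = NormalDist(mu=6, sigma=2)

def integrate(g, lo, hi, n=40_000):
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

lo, hi = 6 - 20, 6 + 20  # +/- 10 SD, where the tails are negligible
mean_c = integrate(lambda x: x * dist.pdf(x), lo, hi)
var_c = integrate(lambda x: (x - mean_c) ** 2 * dist.pdf(x), lo, hi)

print(mean_d, var_d)  # sums
print(mean_c, var_c)  # integrals
```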
The normal distribution appears in the central limit theorem, which is the basis for calculating confidence
intervals and testing hypotheses.
1. Continuous Random variables 2.1 The density function
2. The normal distribution 2.2 Computing probabilities
3. The standard normal distribution
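$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
Properties: $E[X] = \mu$ and $V[X] = \sigma^2$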
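As a sketch, the standard formula for the normal density with mean µ and standard deviation σ can be coded by hand and compared with Python's `statistics.NormalDist`. The function name `normal_pdf` and the test points are illustrative.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 6, 2
reference = NormalDist(mu, sigma)

# Largest gap between the hand-written formula and the library value.
max_gap = max(abs(normal_pdf(x, mu, sigma) - reference.pdf(x)) for x in (0, 3, 6, 7.5, 12))
print(max_gap)
```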
[Figure: normal densities f(x) with σ = 3, centered at µ = 2 and at µ = 6; x from -6 to 13.]
The curve is flatter as σ increases.
[Figure: normal densities f(x) with µ = 6 and σ = 2, 3 and 5; x from 0 to 12.]
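One way to see why the curve flattens as σ increases: the maximum of the density, reached at x = µ, equals 1 / (σ√(2π)), so it shrinks as σ grows while the total area stays equal to 1. A short sketch for µ = 6 and σ = 2, 3, 5:

```python
from statistics import NormalDist

mu = 6
sigmas = (2, 3, 5)

# Height of the density at its peak (x = mu) for each sigma.
peaks = [NormalDist(mu, s).pdf(mu) for s in sigmas]

for s, peak in zip(sigmas, peaks):
    print(f"sigma = {s}: peak height = {peak:.4f}")
```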
Computing probabilities and illustrating them graphically is made easy using applets
like stapplets.
$X \sim N(\mu = 6, \sigma = 2)$
$P(4 \le X \le 6)$
$P(2 \le X \le 10)$
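As an alternative to an applet, these two probabilities can be computed with Python's standard library. This is a sketch that reads the second parameter of N(6, 2) as the standard deviation, which is an assumption about the notation.

```python
from statistics import NormalDist

# Assumption: N(6, 2) means mean 6 and standard deviation 2.
X = NormalDist(mu=6, sigma=2)

p_4_6 = X.cdf(6) - X.cdf(4)    # P(4 <= X <= 6): from 1 SD below the mean up to the mean
p_2_10 = X.cdf(10) - X.cdf(2)  # P(2 <= X <= 10): within 2 SD of the mean

print(f"P(4 <= X <= 6)  = {p_4_6:.4f}")
print(f"P(2 <= X <= 10) = {p_2_10:.4f}")
```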
$P(-1.5 \le Z \le 0)$
$P(0 \le Z \le 1.5)$
$P(Z \le -1.5) = P(Z \ge 1.5)$
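The symmetry of the standard normal distribution around 0 can be checked numerically. A minimal sketch, assuming Z ~ N(0, 1):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1

left_half = Z.cdf(0) - Z.cdf(-1.5)   # P(-1.5 <= Z <= 0)
right_half = Z.cdf(1.5) - Z.cdf(0)   # P(0 <= Z <= 1.5)
left_tail = Z.cdf(-1.5)              # P(Z <= -1.5)
right_tail = 1 - Z.cdf(1.5)          # P(Z >= 1.5)

print(left_half, right_half)   # equal by symmetry
print(left_tail, right_tail)   # equal by symmetry
```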
1. Continuous Random variables 3.1 Definition
2. The normal distribution 3.2 Determining z-values
3. The standard normal distribution
$P(-1 \le Z \le 1)$
$P(-2 \le Z \le 2)$
$P(-3 \le Z \le 3)$
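These three probabilities (within 1, 2 and 3 SD of the mean) can be evaluated as follows. The helper name `central_prob` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def central_prob(k):
    """P(-k <= Z <= k): probability of lying within k SD of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

within = {k: central_prob(k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"P(-{k} <= Z <= {k}) = {p:.4f}")
```

The results are approximately 0.68, 0.95 and 0.997.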
Previous use
▪ $z_0$ known. Find a probability
$z_{0.05} = 1.645$
1.645: distance from the mean, measured in SD units, which has a 5% chance of being
exceeded (on the right).
Put another way, there is a 5% chance for Z to exceed 1.645, that is, to lie
more than 1.645 SD above the mean.
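Going from a probability back to a z-value uses the inverse of the cumulative distribution function. A minimal sketch with `statistics.NormalDist.inv_cdf`; the helper name `upper_z` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def upper_z(alpha):
    """Value exceeded with probability alpha on the right: P(Z > z) = alpha."""
    return Z.inv_cdf(1 - alpha)

z_05 = upper_z(0.05)
print(f"z_0.05 = {z_05:.3f}")

# Check: the right-tail probability beyond z_05 is 5%.
print(f"P(Z > z_0.05) = {1 - Z.cdf(z_05):.3f}")
```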
1. Continuous Random variables 3.1 Definition
2. The normal distribution 3.2 Determining z-values
3. The standard normal distribution 3.3 A useful notation
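Due to the symmetry of the normal distribution, it follows that $-z_\alpha$ is negative and
satisfies
$P(Z \le -z_\alpha) = \alpha$
$-z_\alpha$: negative z-value (distance from the mean, measured in SD), which has a probability α
of being exceeded (on the left)
[Figure: standard normal density with $-z_\alpha$, 0 and $z_\alpha$ marked on the axis.]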
$-z_{0.3}$
Distance from the mean (measured in SD) which
has a 30% chance of being exceeded on the left
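[Figure: standard normal density with an area of α/2 in each tail, to the left of $-z_{\alpha/2}$ and to the right of $z_{\alpha/2}$.]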
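The left-tail value and the two-sided cut-offs can be sketched in the same way. Here z_α is taken to mean the value with probability α to its right, and α = 0.05 is an illustrative choice for the two-sided case.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def upper_z(alpha):
    """z_alpha: value with probability alpha to its right."""
    return Z.inv_cdf(1 - alpha)

# Left tail: the value with 30% of the probability to its left is -z_0.3.
left_30 = Z.inv_cdf(0.30)
print(f"-z_0.3 = {left_30:.4f}, z_0.3 = {upper_z(0.30):.4f}")

# Two-sided case: alpha/2 in each tail leaves 1 - alpha in the middle.
alpha = 0.05
z_half = upper_z(alpha / 2)
central = Z.cdf(z_half) - Z.cdf(-z_half)
print(f"z_(alpha/2) = {z_half:.3f}, central area = {central:.3f}")
```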
1. Continuous Random variables 3.1 Definition
2. The normal distribution 3.2 Determining z-values
3. The standard normal distribution 3.3 A useful notation
3.4 Standardization
with µX = 180 cm and σX = 10 cm
Instead of saying that an individual is 190 cm tall, we can say that this
individual is "located" at +1 SD from the expected height in the distribution.
Q) Another example. If someone is 165 cm tall, it can be said that his/her height is
……
$\frac{165 - 180}{10} = -1.5$
that is, 1.5 SD below the mean (180 cm)
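The two height examples amount to computing z = (x − µX) / σX. A small sketch, with `standardize` as a hypothetical helper name:

```python
def standardize(x, mu, sigma):
    """Number of standard deviations between x and the mean (negative if below)."""
    return (x - mu) / sigma

mu_x, sigma_x = 180, 10  # heights in cm

z_190 = standardize(190, mu_x, sigma_x)
z_165 = standardize(165, mu_x, sigma_x)

print(z_190)  # 1.0  -> 1 SD above the mean
print(z_165)  # -1.5 -> 1.5 SD below the mean
```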
If furthermore $X \sim N(\mu_X, \sigma_X)$ then $Z = \frac{X - \mu_X}{\sigma_X} \sim N(0 ; 1)$,
with $\mu_Z = 0$ and $\sigma_Z = 1$
$X \sim N(100 ; 15)$
$P(Z \ge Q_1) = 1 - P(Z \le Q_1)$
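The slide content here is only partly recoverable. Assuming the exercise is to find the first quartile Q1 of X ~ N(100 ; 15), one possible route is to find the quartile of Z first and then undo the standardization. The sketch below follows that assumed reading.

```python
from statistics import NormalDist

Z = NormalDist()         # standard normal
mu_x, sigma_x = 100, 15  # X ~ N(100 ; 15), second parameter read as the SD

# First quartile of Z: the value with 25% of the probability to its left.
z_q1 = Z.inv_cdf(0.25)

# Undo the standardization: x = mu + z * sigma.
q1 = mu_x + sigma_x * z_q1

# Cross-check by asking for the quartile of X directly.
q1_direct = NormalDist(mu_x, sigma_x).inv_cdf(0.25)

print(f"z for Q1 = {z_q1:.4f}, Q1 of X = {q1:.2f}")
```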
More about the Normal approximation of the binomial distribution
Then $\lim_{n \to \infty} \frac{\mathrm{pmf}_X(x)}{f_{np}(x)} = 1$ (Laplace-De Moivre theorem)
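A numerical sketch of this limit, under stated assumptions: X is binomial with parameters n and p, the normal density has mean np and SD √(np(1 − p)), and the ratio is evaluated at x = np. The choice p = 0.5 and the function names are illustrative. The binomial pmf is computed in log space to avoid overflow for large n.

```python
import math

def binom_pmf(x, n, p):
    """Binomial pmf via logarithms (lgamma), which stays stable for large n."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
               + x * math.log(p) + (n - x) * math.log(1 - p))
    return math.exp(log_pmf)

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

p = 0.5
ratios = []
for n in (10, 100, 1000, 10000):
    x = n // 2                          # evaluate at the mean, x = n * p
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    ratios.append(binom_pmf(x, n, p) / normal_pdf(x, mu, sigma))

for n, r in zip((10, 100, 1000, 10000), ratios):
    print(f"n = {n:>5}: pmf / density = {r:.6f}")
```

The ratio moves toward 1 as n grows, which is what the theorem states.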