Data Analysis

English course of Data Analysis. Quick reminder

Uploaded by

simon.abitbol01
Session 3: Continuous random variable and the Normal distribution

1. Continuous Random variables

Let X be a continuous random variable

ex: X = weight of a random student, with $R_X = [30, 200]$

▪ $P(X = x) = 0$ for all x. ex: $P(X = 65.0000000000... \text{ kg}) = 0$

▪ Probabilities are then calculated over an interval using the cumulative probability function of X, defined by $F(x) = P(X \le x)$

ex: $P(65 \le X \le 66) = F(66) - F(65)$, where $F(66) = P(X \le 66)$ and $F(65) = P(X \le 65)$

▪ More generally, $P(a \le X \le b) = F(b) - F(a)$, where $F(b) = P(X \le b)$ and $F(a) = P(X \le a)$

Assume F(x) is differentiable and let f(x) denote the derivative of F(x). f(x) is
called the probability density function (pdf) of X.

Then, according to the fundamental theorem of calculus,

$$P(a \le X \le b) = F(b) - F(a) = \int_a^b f(x)\,dx$$
Graphically, the probability that X takes a value in an interval is now represented by
the area under f(x) over this interval
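(Figure: two plots of f(x) against x. In the first, the shaded area between a and b is $P(a \le X \le b) = \int_a^b f(x)\,dx$. In the second, the shaded area up to a is $P(X \le a) = \int_{-\infty}^{a} f(x)\,dx$.)

with, by construction, $\int_{-\infty}^{+\infty} f(x)\,dx = 1$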
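The link between area under f(x) and differences of F can be checked numerically. The sketch below uses only the Python standard library; the weight model N(70, 12) and the helper `integrate` are assumptions made for illustration and are not part of the course.

```python
from statistics import NormalDist

def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# Hypothetical weight model (kg), chosen only for illustration.
weight = NormalDist(mu=70, sigma=12)

area = integrate(weight.pdf, 65, 66)         # area under f(x) between 65 and 66
cdf_diff = weight.cdf(66) - weight.cdf(65)   # F(66) - F(65)

# Integrating over a very wide range stands in for the whole real line.
total = integrate(weight.pdf, 70 - 10 * 12, 70 + 10 * 12)

print(area, cdf_diff, total)
```

The first two numbers should agree to many decimals, and the third should be very close to 1.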

Mathematical expectation and standard deviation of a continuous random variable


Same interpretations but different expressions

For a discrete RV:

$$E[X] = \sum_i p_i x_i \qquad \sigma_X = \sqrt{V[X]} = \sqrt{\sum_i p_i (x_i - E[X])^2}$$

For a continuous RV:

$$E[X] = \int_{-\infty}^{+\infty} x f(x)\,dx \qquad \sigma_X = \sqrt{E\left[(X - E[X])^2\right]} = \sqrt{\int_{-\infty}^{+\infty} (x - E[X])^2 f(x)\,dx}$$

(expressions not directly used in this course)
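To see that the two sets of expressions play the same role, here is a small Python sketch that applies the discrete formulas to a fair die and the continuous formulas to a density. The die, the N(70, 12) density and the helper `integrate` are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def integrate(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# Discrete RV: a fair six-sided die (illustrative choice).
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
mean_d = sum(p * x for p, x in zip(probs, values))
sd_d = sqrt(sum(p * (x - mean_d) ** 2 for p, x in zip(probs, values)))

# Continuous RV: density of a hypothetical N(70, 12), integrated numerically
# over a range wide enough to stand in for the whole real line.
f = NormalDist(mu=70, sigma=12).pdf
lo, hi = 70 - 120, 70 + 120
mean_c = integrate(lambda x: x * f(x), lo, hi)
sd_c = sqrt(integrate(lambda x: (x - mean_c) ** 2 * f(x), lo, hi))

print(mean_d, sd_d)   # about 3.5 and 1.708
print(mean_c, sd_c)   # close to 70 and 12
```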

There are infinitely many continuous probability distributions. Their shapes depend on
the random experiment considered.

The most important one is the normal distribution

It appears in the central limit theorem, which is the basis for calculating confidence
intervals and testing hypotheses.
2. The normal distribution

2.1 The density function

Definition: A random variable X follows a normal distribution with parameters µ and σ, written $X \sim N(\mu, \sigma)$, if its density function is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right)$$

where $E[X] = \mu$ and $\sqrt{V(X)} = \sigma$

Properties:

▪ f(x) is symmetrical and centered around $E[X] = \mu$ (= median = mode)

▪ f(x) is bell-shaped
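The density formula can be typed in directly and compared with a library implementation. This is a sketch in Python; the function name `normal_pdf` and the parameter values µ = 6, σ = 2 are chosen here for illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma), written term by term from the definition."""
    return 1 / (sigma * sqrt(2 * pi)) * exp(-0.5 * ((x - mu) / sigma) ** 2)

mu, sigma = 6, 2   # illustrative parameter values

# The hand-written formula should agree with the standard-library density.
for x in (2, 4, 6, 8, 10):
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))

# Symmetry around mu: the density is the same at mu - d and mu + d.
left = normal_pdf(mu - 1.5, mu, sigma)
right = normal_pdf(mu + 1.5, mu, sigma)
print(left, right)
```

The largest value is reached at x = µ, which is why the mean is also the mode.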

▪ Graphically, µ defines the position of the curve on the horizontal axis

Q: When µ increases, the curve moves … to the right

(Figure: f(x) against x for µ = 2 and µ = 6, both with σ = 3.)

▪ The curve is flatter as σ … increases

(Figure: f(x) against x for σ = 2, σ = 3 and σ = 5, all with µ = 6.)
2.2 Computing probabilities

Computing probabilities and illustrating them graphically is made easy using applets
like Stapplets.

$X \sim N(\mu = 6, \sigma = 2)$

$P(X \le 4)$?    $P(X < \mu) = P(X > \mu) = 0.5$



$P(4 \le X \le 6)$

Due to the symmetry



$P(2 \le X \le 10)$

$X \sim N(\mu = 6, \sigma = 2)$
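Without an applet, the same probabilities can be obtained from the cumulative function F. The Python sketch below assumes the first question is the left-tail probability P(X ≤ 4), and uses the standard library's `NormalDist`.

```python
from statistics import NormalDist

X = NormalDist(mu=6, sigma=2)

p_below_4 = X.cdf(4)               # P(X <= 4)
p_4_to_6 = X.cdf(6) - X.cdf(4)     # P(4 <= X <= 6) = F(6) - F(4)
p_2_to_10 = X.cdf(10) - X.cdf(2)   # P(2 <= X <= 10) = F(10) - F(2)

print(round(p_below_4, 4))   # 0.1587
print(round(p_4_to_6, 4))    # 0.3413
print(round(p_2_to_10, 4))   # 0.9545
```

Since P(X < µ) = 0.5, the second value is also 0.5 minus the first.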

Probabilities of standard intervals

$P(\mu - \sigma \le X \le \mu + \sigma) = 68.27\%$
$P(\mu - 2\sigma \le X \le \mu + 2\sigma) = 95.45\%$
$P(\mu - 3\sigma \le X \le \mu + 3\sigma) = 99.73\%$
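These three percentages do not depend on µ or σ. A short Python check, where the helper name `prob_within` and the second parameter pair (µ = 6, σ = 2) are illustrative choices:

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma <= X <= mu + k*sigma) for X ~ N(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    # The two columns use different (mu, sigma) pairs and should match.
    standard = round(100 * prob_within(k), 2)
    shifted = round(100 * prob_within(k, mu=6, sigma=2), 2)
    print(k, standard, shifted)
```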


3. The standard normal distribution

3.1 Definition

Standard normal distribution: Normal distribution with mean = 0 and SD = 1


The random variable following such a standard normal distribution is denoted Z.
$Z \sim N(\mu_Z = 0, \sigma_Z = 1)$

Z = 1 means that Z is located at +1 SD from the mean (0)

Z = -3 means that Z is located at -3 SD from the mean (0)
In a standard normal distribution, the value taken by the random variable Z is a
measure of the distance, in SD units, from the mean.

$P(-1.5 \le Z \le 0)$
$P(0 \le Z \le 1.5)$

$P(Z \le -1.5)$    $P(Z \ge 1.5)$

$P(-1 \le Z \le 1)$

$P(-2 \le Z \le 2)$

$P(-3 \le Z \le 3)$
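The symmetry of the standard normal distribution shows up directly in the numbers. A Python sketch for the value 1.5, reading the two one-sided probabilities as a lower tail below -1.5 and an upper tail above 1.5 (an assumption about the slide):

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, SD 1

left_half = Z.cdf(0) - Z.cdf(-1.5)    # P(-1.5 <= Z <= 0)
right_half = Z.cdf(1.5) - Z.cdf(0)    # P(0 <= Z <= 1.5)
lower_tail = Z.cdf(-1.5)              # P(Z <= -1.5)
upper_tail = 1 - Z.cdf(1.5)           # P(Z >= 1.5)

print(round(left_half, 4), round(right_half, 4))    # 0.4332 0.4332
print(round(lower_tail, 4), round(upper_tail, 4))   # 0.0668 0.0668
```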
3.2 Determining z-values

Previous use
▪ z0 known. Find a probability

Example: $P(Z \le z_0)$? or $P(-z_0 \le Z \le 0)$?

Other possible question

▪ Assume a probability is given (known). Find z0

Example: Find z0 such that $P(Z \le z_0) = p_{given}$



Example 1: Find z0 such that $P(Z \le z_0) = 0.1$



Example 2: Find z0 such that $P(-z_0 \le Z \le z_0) = 0.95$



Example 3: Find z0 such that $P(-z_0 \le Z \le z_0) = 0.99$
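Going from a probability back to a z-value is the job of the inverse cumulative function. The Python sketch below is one way to work the three examples. It reads Example 1 as a left-tail area (an assumption) and also prints the right-tail answer, which is the mirror image. The helper `z_for_central_area` is a name introduced here.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal

def z_for_central_area(p):
    """z0 > 0 such that P(-z0 <= Z <= z0) = p.

    The two tails share the remaining 1 - p equally,
    so P(Z <= z0) = p + (1 - p) / 2.
    """
    return Z.inv_cdf(p + (1 - p) / 2)

# One-sided case, read as a left-tail area: P(Z <= z0) = 0.1.
z_left = Z.inv_cdf(0.1)
# By symmetry, the right-tail version P(Z >= z0) = 0.1 has the opposite sign.
z_right = Z.inv_cdf(1 - 0.1)

print(round(z_left, 3), round(z_right, 3))   # -1.282 1.282
print(round(z_for_central_area(0.95), 3))    # 1.96
print(round(z_for_central_area(0.99), 3))    # 2.576
```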


3.3 A useful notation

Let $\alpha < 0.5$ and denote $z_\alpha$ the positive z-value satisfying $P(Z \ge z_\alpha) = \alpha$

Best quick interpretation of $z_\alpha$: positive z-value (distance from the mean, in SD units) which has a probability α of being exceeded (on the right)

$z_{0.05} = 1.645$

1.645: distance from the mean, measured in SD units, which has a 5% chance of being
exceeded (on the right).
Put another way, there is a 5% chance for Z to exceed 1.645, meaning to be located
at more than 1.645 SD from the mean.
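The notation translates into a one-line computation: an upper-tail area of α corresponds to a cumulative area of 1 - α. A Python sketch, where the function name `z_alpha` is introduced here for illustration:

```python
from statistics import NormalDist

def z_alpha(alpha):
    """Positive z-value with P(Z >= z) = alpha, for 0 < alpha < 0.5."""
    if not 0 < alpha < 0.5:
        raise ValueError("alpha must lie strictly between 0 and 0.5")
    # P(Z >= z) = alpha is the same statement as P(Z <= z) = 1 - alpha.
    return NormalDist().inv_cdf(1 - alpha)

z = z_alpha(0.05)
print(round(z, 3))                         # 1.645
print(round(1 - NormalDist().cdf(z), 4))   # 0.05
```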

Due to the symmetry of the normal distribution, it follows that $-z_\alpha$ is negative and
satisfies

$P(Z \le -z_\alpha) = \alpha$

$-z_\alpha$: negative z-value (distance from the mean, measured in SD) which has a probability α of being exceeded (on the left)

$-z_{0.3}$? Negative z-value which satisfies $P(Z \le -z_{0.3}) = 0.3$

$-z_{0.3} = -0.524$: distance from the mean (measured in SD) which
has a 30% chance of being exceeded on the left

In a standard normal distribution, there is a 30% chance for Z to be lower than
-0.524, meaning to be located more than 0.524 SD below the mean.

Interpreting $z_{\alpha/2}$ (with $\alpha < 0.5$)

▪ positive z-value which has a probability α/2 of being exceeded

Due to the symmetry, one can also define $z_{\alpha/2}$ as
▪ positive z-value such that the probability for Z to be outside
$[-z_{\alpha/2}, +z_{\alpha/2}]$ is equal to α

$P(Z < -z_{\alpha/2} \text{ or } Z > z_{\alpha/2}) = \alpha$, with α/2 in each tail
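The two-sided reading can be verified by adding up both tails. A Python sketch, where `z_half_alpha` is a helper name introduced here and the three α values are common illustrative choices:

```python
from statistics import NormalDist

Z = NormalDist()

def z_half_alpha(alpha):
    """Positive z-value leaving alpha / 2 in each tail of the standard normal."""
    return Z.inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    z = z_half_alpha(alpha)
    outside = Z.cdf(-z) + (1 - Z.cdf(z))   # P(Z < -z or Z > z)
    print(alpha, round(z, 3), round(outside, 4))
```

For each α, the last column should reproduce α itself.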
3.4 Standardization

Let X be the height of an individual randomly chosen in the population (not
necessarily normally distributed), with µX = 180 cm and σX = 10 cm

Instead of saying that an individual is 190 cm tall, we can say that this
individual is "located" at +1 SD from the expected height in the distribution.
Q) Other example. If someone is 165 cm tall, it can be said that his/her height is
……

$\frac{165 - 180}{10} = -1.5$, i.e. 1.5 SD below the mean (180 cm)
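The height example amounts to one subtraction and one division. A minimal Python sketch, with `standardize` as an illustrative function name:

```python
def standardize(x, mu, sigma):
    """Number of SDs separating x from the mean mu (negative when x is below it)."""
    return (x - mu) / sigma

mu_x, sigma_x = 180, 10   # height parameters from the example, in cm

print(standardize(190, mu_x, sigma_x))   # 1.0
print(standardize(165, mu_x, sigma_x))   # -1.5
```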

Let X be a random variable with mean µX and standard deviation σX.

Denote $Z = \frac{X - \mu_X}{\sigma_X}$

Z re-expresses any possible value taken by X as a number of SD away from the mean.
Z is called a standardized variable.
We can check that $E[Z] = 0$ and $\sigma_Z = 1$

Optional: Following the properties of the mean and the SD, and since µX and σX are constants, we can write

$$E[Z] = E\left[\frac{X - \mu_X}{\sigma_X}\right] = \frac{1}{\sigma_X}\left(E[X] - E[\mu_X]\right) = \frac{1}{\sigma_X}(\mu_X - \mu_X) = 0$$

$$V(Z) = V\left(\frac{X - \mu_X}{\sigma_X}\right) = \frac{1}{\sigma_X^2} V(X) = 1 \;\Rightarrow\; \sigma_Z = 1$$

If furthermore $X \sim N(\mu_X, \sigma_X)$ then $Z \sim N(0; 1)$, i.e. $\mu_Z = 0$ and $\sigma_Z = 1$

Very useful for interpretations and calculations

Example: Let X be the IQ of an individual randomly chosen in a population.

$X \sim N(100; 15)$

$P(X \ge 130)$?

$$P(X \ge 130) = P\left(\frac{X - 100}{15} \ge \frac{130 - 100}{15}\right) = P(Z \ge 2)$$
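A quick Python check that working on X directly and working on the standardized variable Z give the same answer. The sketch assumes the question concerns the upper tail (an IQ of 130 or more); the lower-tail probability is simply one minus the result.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)
Z = NormalDist()

z = (130 - 100) / 15             # standardized value of 130

p_direct = 1 - iq.cdf(130)       # P(X >= 130), computed on X directly
p_standardized = 1 - Z.cdf(z)    # P(Z >= 2), computed on the standard normal

print(z)                                              # 2.0
print(round(p_direct, 4), round(p_standardized, 4))   # 0.0228 0.0228
```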
4. Normal approximation of the binomial law

Assume $X \sim B(n, p = 0.6)$ and let us increase n using Stapplets.

(Figure: pmf of X for n = 5, n = 30 and n = 300.)

Using the same scale, the distribution would be shifted to the right and flatter.

The graph of the pmf of X looks bell-shaped. It reflects a more fundamental property:

$X \overset{approx}{\sim} N\left(np, \sqrt{np(1 - p)}\right)$ as n grows and tends towards +∞ (for 0 < p < 1)

Rule of thumb for a not too bad approximation: $np \ge 5$ and $n(1 - p) \ge 5$
(or 10 for a better approximation)
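The rule of thumb and the quality of the approximation can be explored without an applet. The Python sketch below compares the binomial pmf with the approximating normal density at a value near the centre, for the three values of n shown. The helper names `binom_pmf` and `rule_of_thumb` are introduced here for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def rule_of_thumb(n, p, threshold=5):
    """True when both np and n(1 - p) reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

p = 0.6
for n in (5, 30, 300):
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    x = round(n * p)   # a value near the centre of the distribution
    print(n, rule_of_thumb(n, p), binom_pmf(x, n, p), approx.pdf(x))
```

With n = 5 the rule fails, since n(1 - p) = 2, and the two printed values differ noticeably; they get closer as n grows.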
Additional comments (optional)
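zα is a quantile

Note: $z_\alpha$ is the $(1 - \alpha)$-th quantile of the distribution

Denote $Q_{1-\alpha}$ the $(1 - \alpha)$-th quantile of the distribution, which by definition satisfies

$P(Z \le Q_{1-\alpha}) = 1 - \alpha$

$\Leftrightarrow P(Z \ge Q_{1-\alpha}) = \alpha$, so $Q_{1-\alpha} = z_\alpha$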
More about the Normal approximation of the binomial distribution

Assume $X \sim B(n, p)$, with $E[X] = np$ and $\sigma_X = \sqrt{np(1 - p)}$. Denote $pmf_X(x)$ its probability mass function:

$$pmf_X(x) = \binom{n}{x} p^x (1 - p)^{n - x} \text{ for } x \le n, \qquad \binom{n}{x} = \frac{n!}{x!(n - x)!}$$

Denote $f_{np}(x)$ the probability density function of a normal distribution of parameters $\mu = np$ and $\sigma = \sqrt{np(1 - p)}$:

$$f_{np}(x) = \frac{1}{\sqrt{2\pi\, np(1 - p)}} \exp\left(-\frac{1}{2}\left(\frac{x - np}{\sqrt{np(1 - p)}}\right)^2\right)$$

Then $\lim_{n \to \infty} \frac{pmf_X(x)}{f_{np}(x)} = 1$ (Laplace-De Moivre theorem)

As a result, for n large, $pmf_X(x) \approx f_{np}(x)$

which implies $X \overset{approx}{\sim} N\left(np, \sqrt{np(1 - p)}\right)$

The convergence is faster when p is closer to 0.5.
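The ratio in the theorem can be watched as n grows. The Python sketch below evaluates it at x = np only, which is a choice made here for illustration, and compares p = 0.5 with p = 0.1. Logarithms are used so that large factorials do not overflow.

```python
from math import exp, lgamma, log, pi, sqrt

def binom_pmf(x, n, p):
    """Binomial pmf computed through logarithms to stay stable for large n."""
    log_coeff = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    return exp(log_coeff + x * log(p) + (n - x) * log(1 - p))

def normal_density(x, n, p):
    """Density of N(np, sqrt(np(1 - p))) evaluated at x."""
    var = n * p * (1 - p)
    return exp(-0.5 * (x - n * p) ** 2 / var) / sqrt(2 * pi * var)

def ratio_at_mean(n, p):
    """pmf / density at x = np (n and p chosen so that np is a whole number)."""
    x = round(n * p)
    return binom_pmf(x, n, p) / normal_density(x, n, p)

for n in (10, 100, 1000):
    print(n, ratio_at_mean(n, 0.5), ratio_at_mean(n, 0.1))
```

Both columns approach 1, and for each n the p = 0.5 column is the closer of the two.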
