Notes-04-Random variables
Random Variables
In set 3 of the notes, we studied basic probability theory, where the outcome of an experiment can
be anything. To analyze an experiment mathematically, one needs to transform its outcomes into
numerical values.
Random variable:
For a given sample space S of some experiment, a random variable (rv) is any rule that
associates a number with each outcome in S. In mathematical language, a random variable is
a function whose domain is the sample space and whose range is the set of real numbers.
We usually let X stand for the random variable.
Examples:
Consider the experiment of tossing a fair coin three times independently. Define the random variable
X to be the number of heads obtained in the three tosses. A complete enumeration of the value of X
for each point in the sample space is:
s     HHH  HHT  HTH  THH  TTH  THT  HTT  TTT
X(s)  3    2    2    2    1    1    1    0
Consider the experiment in which we toss a coin until the first head shows up. Define the random
variable X to be the number of tails before the first head shows up.
s     H  TH  TTH  TTTH  TTTTH  ...
X(s)  0  1   2    3     4      ...
Consider the experiment of rolling a fair die twice independently. Define the random variable X
to be the sum of the two values.
Consider the experiment where we measure a chemical reaction time. Since the outcome is already a
number, we can define the random variable by the identity function, X(s) = s.
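The dice example above can be tabulated mechanically. The following is a minimal sketch in R (the language these notes use for sample code later on); the variable names are illustrative only. It enumerates the 36 equally likely outcomes and applies the rule X(s) = first roll + second roll.

```r
# Sample space for two independent rolls of a fair die: 36 ordered pairs
S <- expand.grid(first = 1:6, second = 1:6)

# The random variable X maps each outcome s to a number: the sum of the two rolls
X <- S$first + S$second

# How many of the 36 outcomes are mapped to each possible value of X
freq <- table(X)
print(freq)

# Share of outcomes giving each value (each outcome has probability 1/36)
prob <- freq / nrow(S)
print(round(prob, 4))
```

Different outcomes (for example (1, 6) and (3, 4)) are mapped to the same number, which is why X is a function on S rather than a relabeling of it.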
Continuous: an rv whose possible values consist either of all numbers in a single interval on the
number line or of all numbers in a disjoint union of such intervals, and for which no possible value
has positive probability, that is, P(X = c) = 0 for every possible c.
STAT 511 Course Notes set 4
Probability distribution: The probability distribution of a random variable describes the randomness
of this random variable. This randomness is completely determined by the probability function on S
and the random variable function X(s).
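Definition: The probability mass function (pmf) of a discrete rv X is p(x) = P(X = x) = P({s in S : X(s) = x}), defined for every number x.
Note: The probability distribution of a discrete rv also satisfies all the axioms of probability.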
Examples:
Consider again the experiment of tossing a fair coin three times independently. Define the random
variable X to be the number of heads obtained in the three tosses. The rv is:
s     HHH  HHT  HTH  THH  TTH  THT  HTT  TTT
X(s)  3    2    2    2    1    1    1    0
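The pmf of this rv follows by counting outcomes. The sketch below is one way to do that in R; it assumes the eight outcomes are equally likely (fair coin, independent tosses) and cross-checks against R's built-in binomial pmf `dbinom`, a family that these notes introduce later.

```r
# All 8 equally likely outcomes of three fair-coin tosses (1 = head, 0 = tail)
S <- expand.grid(t1 = 0:1, t2 = 0:1, t3 = 0:1)
X <- rowSums(S)                        # X(s) = number of heads in outcome s

# pmf: p(x) = P(X = x) = (number of outcomes with X(s) = x) / 8
p <- as.numeric(table(X)) / nrow(S)
names(p) <- 0:3
print(p)                               # values 0.125 0.375 0.375 0.125

# Same values from the built-in binomial pmf with n = 3 and p = 0.5
print(dbinom(0:3, size = 3, prob = 0.5))
```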
Consider again the experiment in which we toss a coin until the first head shows up. Define the random
variable X to be the number of tails before the first head shows up.
s     H  TH  TTH  TTTH  TTTTH  ...
X(s)  0  1   2    3     4      ...
Consider a group of five potential blood donors (a, b, c, d, and e) of whom only a and b have type
O+ blood. Five blood samples, one from each individual, will be typed in random order until an O+
individual is identified. Let the rv Y be the number of typings necessary to identify an O+ individual.
Then the pmf of Y is
y     1    2    3    4
p(y)  .4   .3   .2   .1
For example, p(2) = P(first sample is not O+ and second sample is O+) = (3/5)(2/4) = .3.
Note: the pmf completely determines the randomness of the discrete random variable. Or we
can claim that we know everything about this rv once we have the knowledge of its pmf.
Cumulative distribution function (cdf): F(x) of a discrete rv X with pmf p(x) is defined for
every number x by
$F(x) = P(X \le x) = \sum_{y \le x} p(y)$
Note: the cdf is defined not only at the possible values of X, but at every real number.
Example:
Consider again the experiment of tossing a fair coin three times independently. Define the random
variable X to be the number of heads obtained in the three tosses. The cdf of this rv is
x      x < 0   0 ≤ x < 1   1 ≤ x < 2   2 ≤ x < 3   x ≥ 3
F(x)   0       1/8         4/8         7/8         1
Note: the cdf is an exactly equivalent representation of the pmf. Given the cdf, we can retrieve the pmf
using $p(x) = F(x) - F(x^-)$, which is the size of the jump at x in the cdf plot.
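The two-way relationship between pmf and cdf can be checked numerically. This is a small sketch in R for the three-toss example; `cdf` and `Fstep` are illustrative names, and `stepfun` is used here only as a convenient way to build a right-continuous step function.

```r
x <- 0:3
p <- c(1, 3, 3, 1) / 8          # pmf of the number of heads in three fair tosses

cdf <- cumsum(p)                # F(x) at the jump points x = 0, 1, 2, 3
print(cdf)                      # 0.125 0.500 0.875 1.000

# F is defined for every real number: a right-continuous step function
Fstep <- stepfun(x, c(0, cdf))
print(Fstep(c(-1, 0.5, 2, 2.7, 10)))   # 0.000 0.125 0.875 0.875 1.000

# Recover the pmf as the jump sizes F(x) - F(x-)
print(diff(c(0, cdf)))          # 0.125 0.375 0.375 0.125
```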
Example
Let X be the number of children born up to and including the first boy. Assume p is the probability of
having a boy in each birth; then $p(x) = p(1-p)^{x-1}$ for every positive integer x. Then the mean is
$E(X) = \sum_{x=1}^{\infty} x\,p(1-p)^{x-1} = \frac{1}{p}.$
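The value 1/p can be checked numerically. The sketch below uses the pmf p(x) = p(1 − p)^(x − 1) on x = 1, 2, ... (the form that matches the variance example later in these notes), an assumed value p = 0.5, and a truncated sum in place of the infinite series.

```r
p <- 0.5                          # assumed probability of a boy at each birth
x <- 1:1000                       # truncation point; the neglected tail is negligible
pmf <- p * (1 - p)^(x - 1)

total <- sum(pmf)                 # total probability, numerically 1
EX <- sum(x * pmf)                # E(X), numerically 1/p = 2
print(c(total = total, EX = EX))

# R's dgeom counts failures before the first success, so shift by one
print(all.equal(pmf, dgeom(x - 1, prob = p)))
```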
Sometimes interest will focus on the expected value of some function h(X) rather than on just E(X).
Define a new rv Y=h(X), then
$E(Y) = \sum_{y} y\,P(Y = y) = \sum_{y} y\,P(X \in \{x : h(x) = y\}) = \sum_{x} h(x)\,p(x),$ where the sums run over all possible values of y and of x, respectively.
Example:
Consider also the experiment of tossing a fair coin three times independently. Define the random
variable X to be the number of heads obtained in the three tosses. What is E(3-X)?
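A short numerical sketch of this calculation in R, applying the formula E(h(X)) = Σ h(x) p(x) with h(x) = 3 − x (the function name `h` is illustrative):

```r
x <- 0:3
p <- c(1, 3, 3, 1) / 8       # pmf of the number of heads in three fair tosses
h <- function(x) 3 - x       # Y = h(X) is the number of tails

EX <- sum(x * p)             # E(X) = 1.5
EY <- sum(h(x) * p)          # E(h(X)) = sum over x of h(x) p(x) = 1.5
print(c(EX = EX, EY = EY))
```

Here E(3 − X) = 3 − E(X), as expected for a linear function h.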
Population variance
Definition of the variance of a discrete rv:
$\mathrm{Var}(X) = \sigma_X^2 = E[(X - \mu_X)^2] = \sum_{x} (x - \mu_X)^2\,p(x),$ where the sum runs over all possible x.
Alternative formula: $\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \sum_{x} x^2\,p(x) - \mu_X^2.$
Example
Let X be the number of children born up to and including the first boy. Assume p is the probability of
having a boy in each birth; then $p(x) = p(1-p)^{x-1}$ for every positive integer x. Then the variance is
$\mathrm{Var}(X) = \sum_{x=1}^{\infty} \left(x - \frac{1}{p}\right)^2 p(1-p)^{x-1} = \frac{1-p}{p^2}.$
Proposition: $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$.
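The proposition can be illustrated on the three-toss pmf. In this sketch, `var_pmf` is a hypothetical helper written for the example, and the constants a and b are arbitrary.

```r
x <- 0:3
p <- c(1, 3, 3, 1) / 8                  # pmf of the number of heads in three fair tosses

# Variance of a discrete rv taking values v with probabilities p
var_pmf <- function(v, p) sum((v - sum(v * p))^2 * p)

a <- -2; b <- 7                         # arbitrary illustrative constants
v_x  <- var_pmf(x, p)                   # Var(X) = 0.75
v_ax <- var_pmf(a * x + b, p)           # Var(aX + b) = 3
print(c(v_x, v_ax, a^2 * v_x))          # 0.75 3.00 3.00

# The alternative formula E(X^2) - (E(X))^2 gives the same Var(X)
print(sum(x^2 * p) - sum(x * p)^2)      # 0.75
```

The shift b drops out because it moves every value and the mean by the same amount.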
Suppose a pmf is completely determined by some quantity. Such a quantity is called a
parameter of the distribution. The collection of all probability distributions for different values of the
parameter is called a family of probability distributions. In this section, we study some popular discrete
distribution families.
Bernoulli trial: A Bernoulli trial is an experiment with two, and only two, possible outcomes. A
random variable X has a Bernoulli(p) distribution if its pmf is
P(X = 1) = p, P(X = 0) = 1 − p,
where 0 and 1 stand for the two outcomes (usually called failure and success, respectively).
The mean and variance of a Bernoulli(p) random variable are easily seen to be EX = (1)(p) + (0)(1 −
p) = p and $\mathrm{Var}X = (1 - p)^2 p + (0 - p)^2 (1 - p) = p(1 - p)$.
Binomial experiment: consists of n independent Bernoulli trials with the same parameter p. Let X be the
number of successes among these n trials; then X has a Binomial distribution, denoted Bin(n,
p).
In the urn setting used below, an urn contains N balls, M of which count as successes; n balls are
drawn and X is the number of successes among them.
If you put the ball back into the urn after each draw, i.e. draw with
replacement, then X follows a Bin(n, M/N) distribution.
If you do not put the ball back into the urn after each draw, i.e. draw without
replacement, then X follows the so-called hypergeometric distribution hm(n,M,N).
pmf of hm(n,M,N):
$p(x; n, M, N) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}, \quad \text{if } \max(0, n-N+M) \le x \le \min(n, M).$
Mean and variance of hm(n,M,N):
$E(X) = \frac{nM}{N}; \quad \mathrm{Var}(X) = n\,\frac{M}{N}\left(1 - \frac{M}{N}\right)\left(\frac{N-n}{N-1}\right).$
How do these differ from the mean and variance of a binomial rv?
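One way to explore this question is to compute both sets of moments numerically. The sketch below uses assumed values N = 25, M = 5, n = 10; note that R's `dhyper(x, m, n, k)` takes the number of successes in the urn (`m`), the number of failures in the urn (`n`), and the number of draws (`k`), which is a different argument order from the hm(n, M, N) notation.

```r
N <- 25; M <- 5; n <- 10                 # illustrative values
x <- 0:n

ph <- dhyper(x, m = M, n = N - M, k = n) # drawing without replacement
pb <- dbinom(x, size = n, prob = M / N)  # drawing with replacement

mean_h <- sum(x * ph); var_h <- sum((x - mean_h)^2 * ph)
mean_b <- sum(x * pb); var_b <- sum((x - mean_b)^2 * pb)

print(c(mean_h, mean_b))                 # both equal n * M / N = 2
print(var_h / var_b)                     # equals (N - n) / (N - 1) = 0.625
```

The means agree, while the hypergeometric variance is the binomial variance multiplied by the factor (N − n)/(N − 1), which is below 1 whenever n > 1.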
Example: Five individuals from an animal population thought to be near extinction in a certain region
have been caught, tagged, and released to mix into the population. After they have had an
opportunity to mix, a random sample of 10 of these animals is selected.
(1) Let X be the number of tagged animals in the second sample. If there are actually 25 animals of
this type in the region, what is the probability that (a) X = 2 ? (b) X is less than 2 ?
(2) Suppose the population size N is not actually known and the value x is observed. Propose a
way to estimate N based on x.
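Part (1) can be computed with R's hypergeometric functions, as sketched below with M = 5 tagged animals, N = 25, and n = 10. For part (2), the helper `estimate_N` is a hypothetical illustration of one possible answer (equating the sample fraction x/n to the population fraction M/N); the notes do not prescribe an estimator.

```r
M <- 5; N <- 25; n <- 10                        # tagged, population size, sample size

p_eq_2 <- dhyper(2, m = M, n = N - M, k = n)    # (a) P(X = 2)
p_lt_2 <- phyper(1, m = M, n = N - M, k = n)    # (b) P(X < 2) = P(X <= 1)
print(round(c(p_eq_2, p_lt_2), 4))              # 0.3854 0.3134

# (2) One possible estimator (an assumption, not stated in the notes):
# set x / n equal to M / N and solve for N
estimate_N <- function(x, M, n) M * n / x
print(estimate_N(x = 2, M = 5, n = 10))         # 25
```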
Negative Binomial experiment: consists of an indefinite number of independent Bernoulli trials with the
same parameter p. The experiment stops when r successes have been observed.
Let X be the number of failures that precede the rth success; then X has a Negative Binomial
distribution, denoted NBin(r, p).
pmf of NBin(r, p):
$p(x; r, p) = \binom{x + r - 1}{r - 1} p^r (1-p)^x, \quad x = 0, 1, 2, \ldots$
Mean and variance of NBin(r, p):
$E(X) = \frac{r(1-p)}{p}; \quad \mathrm{Var}(X) = \frac{r(1-p)}{p^2}.$
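The mean and variance formulas can be checked numerically. This sketch assumes illustrative values r = 3 and p = 0.4, uses the standard pmf for the number of failures before the rth success, and truncates the infinite sums. R's `dnbinom(x, size, prob)` uses the same "number of failures" convention.

```r
r <- 3; p <- 0.4                          # illustrative parameter values
x <- 0:500                                # failures before the r-th success (truncated)

# Standard pmf for the number of failures before the r-th success
pmf <- choose(x + r - 1, r - 1) * p^r * (1 - p)^x
print(all.equal(pmf, dnbinom(x, size = r, prob = p)))

m <- sum(x * pmf)
v <- sum((x - m)^2 * pmf)
print(c(m, r * (1 - p) / p))              # both 4.5
print(c(v, r * (1 - p) / p^2))            # both 11.25
```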
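Poisson distribution: an rv X has a Poisson distribution with parameter μ > 0 if its pmf is
$p(x; \mu) = \frac{e^{-\mu}\mu^{x}}{x!}, \quad x = 0, 1, 2, \ldots$
Note: This is a legitimate pmf, which follows from the Taylor expansion of the exponential function, $\sum_{x=0}^{\infty} \mu^x / x! = e^{\mu}$.
Mean and variance: EX = VarX = μ.
It is additive: the sum of two independent Poisson rv's is again a Poisson rv.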
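Additivity can be verified numerically by convolving two Poisson pmfs. The sketch below uses illustrative means 2 and 3.5 and checks that the pmf of the sum matches a Poisson pmf whose parameter is the sum of the two means.

```r
mu1 <- 2; mu2 <- 3.5                       # illustrative Poisson means
k <- 0:15

# P(X + Y = k) = sum over j of P(X = j) P(Y = k - j) for independent X and Y
conv <- sapply(k, function(kk) sum(dpois(0:kk, mu1) * dpois(kk - 0:kk, mu2)))
print(all.equal(conv, dpois(k, mu1 + mu2)))

# Total probability, mean, and variance for Poisson(mu1), sums truncated at 200
x <- 0:200
total <- sum(dpois(x, mu1))
m <- sum(x * dpois(x, mu1))
v <- sum((x - m)^2 * dpois(x, mu1))
print(c(total, m, v))                      # 1 2 2
```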
Poisson process.
R provides four utility functions for each of the many commonly used distributions:
r- for data simulation,
d- for the probability mass function (pmf) or probability density function (pdf),
p- for the cumulative distribution function (cdf), and
q- for quantiles (inverse of the cdf).
Sample code:
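The original sample code is not preserved in these notes; the following is a minimal illustration of the four prefixes, using the binomial family with n = 3 and p = 0.5 as an assumed example.

```r
set.seed(511)                                   # fixed seed so the simulation is reproducible

d_val <- dbinom(2, size = 3, prob = 0.5)        # d-: pmf, P(X = 2) = 0.375
p_val <- pbinom(2, size = 3, prob = 0.5)        # p-: cdf, P(X <= 2) = 0.875
q_val <- qbinom(0.5, size = 3, prob = 0.5)      # q-: smallest x with F(x) >= 0.5, here 1
r_val <- rbinom(5, size = 3, prob = 0.5)        # r-: five simulated values of X

print(c(d_val, p_val, q_val))
print(r_val)

# The same prefixes work for other families, e.g. Poisson: P(X = 0) when the mean is 2
print(dpois(0, lambda = 2))
```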
Definition: Given a continuous rv X, its probability density function (pdf) is the nonnegative
function f that satisfies
$P(X \in (a, b)) = \int_a^b f(x)\,dx$, for any real values b > a.
Geometrically, the probability value is the area under the density curve.
Properties:
• $\int_{-\infty}^{\infty} f(x)\,dx = P(X \in \mathbb{R}) = 1$
• $P(X \in [a, b]) = \int_a^b f(x)\,dx$.
Example: Let X be the distance between two randomly chosen consecutive cars on a freeway.
Assume that X has a pdf as:
$f(x) = \lambda e^{-\lambda(x-a)}$ if x > a; and f(x) = 0 otherwise.
Cumulative distribution function (cdf): F(x) of a continuous rv X with pdf f(x) is defined for
every number x by
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy$
Propositions:
• F(x) is a continuous, non-decreasing function;
• F(∞) = 1 and F(−∞) = 0.
Example: Let X be the distance between two randomly chosen consecutive cars on a freeway.
Assume that X has a pdf as:
$f(x) = \lambda e^{-\lambda(x-a)}$ if x > a; and f(x) = 0 otherwise.
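For this pdf, integrating from a to x gives the cdf F(x) = 1 − e^{−λ(x − a)} for x > a and F(x) = 0 otherwise. The sketch below checks this numerically with R's `integrate`, using assumed values λ = 2 and a = 1; the function names are illustrative.

```r
lambda <- 2; a <- 1                                   # illustrative parameter values
f <- function(x) ifelse(x > a, lambda * exp(-lambda * (x - a)), 0)

# Total area under the density is 1 (the density is zero to the left of a)
total <- integrate(f, lower = a, upper = Inf)$value

# cdf in closed form and by numerical integration of the pdf
F_closed <- function(x) ifelse(x > a, 1 - exp(-lambda * (x - a)), 0)
F_num    <- function(x) integrate(f, lower = a, upper = x)$value

print(total)                                          # numerically 1
print(c(F_num(1.8), F_closed(1.8)))                   # both about 0.798
```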
Its variance is defined as $\mathrm{Var}(X) = \int [x - E(X)]^2 f(x)\,dx$. Its standard deviation is defined as
$\sqrt{\mathrm{Var}(X)}$.
Uniform distribution on (a, b):
$f(x; a, b) = \frac{1}{b-a}$ if a < x < b, and f(x; a, b) = 0 otherwise.
$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}$ for all x,
where μ is any real value, and σ is positive.
[Figure: normal density curves φ_{μ,σ²}(x) for (μ, σ²) = (0, 0.2), (0, 1.0), (0, 5.0), and (−2, 0.5).]
The normal distribution has a symmetric, bell-shaped density curve.
The normal distribution is one of the most important distributions. However, the probability
calculation, i.e. the cdf of a normal rv, does not have a closed-form formula.
Calculation for standard Normal distribution N(0, 1)
Normal Z-Table: $\Phi(z) = \int_{-\infty}^{z} \phi(u)\,du$ cannot be computed by hand. The Z-table provides values of
Φ(z) for z given to 2 decimal places.
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.00 .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359
0.10 .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753
0.20 .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141
0.30 .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517
0.40 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879
Probability calculation:
Percentile calculation:
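Both kinds of calculation can also be done in R instead of the Z-table: `pnorm` evaluates Φ(z) and `qnorm` inverts it. The following sketch uses values chosen to match entries of the table excerpt above; the variable names are illustrative.

```r
# Probability calculation: Phi(z) = P(Z <= z)
phi_023 <- pnorm(0.23)                      # 0.5910, row 0.20 / column .03 of the Z-table
upper   <- pnorm(0.23, lower.tail = FALSE)  # P(Z > 0.23) = 1 - Phi(0.23)
middle  <- pnorm(0.45) - pnorm(-0.45)       # P(-0.45 <= Z <= 0.45)
print(round(c(phi_023, upper, middle), 4))  # 0.5910 0.4090 0.3473

# Percentile calculation: find z with Phi(z) equal to a given probability
z_med <- qnorm(0.5)                         # 50th percentile, 0
z_975 <- qnorm(0.975)                       # 97.5th percentile, about 1.96
print(c(z_med, z_975))
```

Reading the table backwards (find the probability in the body, read z from the margins) is the by-hand counterpart of `qnorm`.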
Calculation for General Normal distribution $N(\mu, \sigma^2)$
Probability calculation:
Example Assume that the wingspan of adult dragonflies is normally distributed with μ = 4 inches and
σ = 0.25 inches. Let X represent the wingspan of a randomly chosen adult dragonfly.
a) Find the probability the wingspan of a randomly selected adult dragonfly is less than 4.3 inches.
First calculate the z-score: $z = \frac{4.3 - 4.0}{0.25} = 1.20$. Then look up the cumulative probability for 1.20.
b) Find the probability that a randomly selected adult dragonfly has a wingspan of more than 4.3
inches.
c) Find the probability that a randomly selected adult dragonfly has a wingspan between 3.61 and
4.73 inches.
d) Find the probability that a randomly selected adult dragonfly has a wingspan less than 3.8 or
greater than 4.2 inches.
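Parts a) to d) can be checked with `pnorm`, either by standardizing first or by passing the mean and standard deviation directly. This is a sketch of one way to set up each part; the variable names are illustrative.

```r
mu <- 4; sigma <- 0.25

z   <- (4.3 - mu) / sigma                                    # z-score 1.2
p_a <- pnorm(z)                                              # a) P(X < 4.3)
p_b <- 1 - p_a                                               # b) P(X > 4.3)
p_c <- pnorm(4.73, mu, sigma) - pnorm(3.61, mu, sigma)       # c) P(3.61 < X < 4.73)
p_d <- pnorm(3.8, mu, sigma) +
       pnorm(4.2, mu, sigma, lower.tail = FALSE)             # d) P(X < 3.8 or X > 4.2)

print(round(c(a = p_a, b = p_b, c = p_c, d = p_d), 4))       # 0.8849 0.1151 0.9389 0.4237
```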
Example
What wingspan is longer than 90% of all adult dragonfly wingspans?
The wingspan that is longer than 90% of all adult dragonfly wingspans is ____ inches.
Between what two lengths do the middle 95% of all adult dragonfly wingspans fall?
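Percentile questions like these reverse the previous calculation: a probability is given and a wingspan is sought. A sketch using `qnorm` with the same μ = 4 and σ = 0.25 follows; the "middle 95%" is interpreted here as the interval between the 2.5th and 97.5th percentiles.

```r
mu <- 4; sigma <- 0.25

q90   <- qnorm(0.90, mean = mu, sd = sigma)                # 90th percentile
mid95 <- qnorm(c(0.025, 0.975), mean = mu, sd = sigma)     # bounds of the middle 95%
print(round(q90, 2))       # 4.32
print(round(mid95, 2))     # 3.51 4.49

# Same answer by hand: x = mu + z * sigma, with z = qnorm(0.90), about 1.28
by_hand <- mu + qnorm(0.90) * sigma
print(by_hand)
```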
Let X be a binomial random variable Bin(n, p). Then X has approximately a normal distribution with
$\mu = np$ and $\sigma = \sqrt{np(1-p)}$ if the sample size is large enough, namely np ≥ 10 and n(1 − p) ≥ 10.
Example: A coin is tossed 100 times. Estimate the probability that the number of heads lies between
40 and 60 (the word “between” in mathematics means inclusive of the endpoints).
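Solution: The expected number of heads is 100 · 1/2 = 50, and the variance of the number of heads is
100 · 1/2 · 1/2 = 25, so the standard deviation is 5. Thus, since n = 100 is reasonably large, we have (using a continuity correction of 0.5)
$P(40 \le X \le 60) \approx P(39.5 \le N(50, 5^2) \le 60.5) = P(-2.1 \le Z \le 2.1) \approx 0.9642$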
The actual value is .96480, to five decimal places.
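The exact binomial probability and the normal approximation can be compared directly in R. In this sketch the approximation uses mean 50 and standard deviation 5 (variance 25) together with the continuity-corrected endpoints 39.5 and 60.5.

```r
n <- 100; p <- 0.5
mu <- n * p                           # 50
sigma <- sqrt(n * p * (1 - p))        # sqrt(25) = 5

exact  <- sum(dbinom(40:60, n, p))                          # exact P(40 <= X <= 60)
approx <- pnorm(60.5, mu, sigma) - pnorm(39.5, mu, sigma)   # with continuity correction

print(round(exact, 5))     # 0.9648
print(round(approx, 4))    # 0.9643 (0.9642 when Phi(2.1) is read from a 4-decimal table)
```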
Var ( X )=
F( x)=
M=
Memoryless property:
$P(X \ge t + t_0 \mid X \ge t_0) = P(X \ge t)$
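The memoryless property can be checked numerically. The surviving text does not name the distribution, so this sketch assumes X is exponential with rate λ, for which P(X ≥ x) = e^{−λx}; the values of λ, t, and t0 are illustrative, and `surv` is a hypothetical helper.

```r
lambda <- 0.5; t <- 2; t0 <- 3           # illustrative values (exponential rate is an assumption)

# Survival function P(X >= x) of an exponential rv with rate lambda
surv <- function(x) pexp(x, rate = lambda, lower.tail = FALSE)

lhs <- surv(t + t0) / surv(t0)           # P(X >= t + t0 | X >= t0)
rhs <- surv(t)                           # P(X >= t)
print(c(lhs, rhs))                       # both exp(-lambda * t), about 0.368
```

Having already waited t0 does not change the distribution of the remaining waiting time.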
Mean, variance
EX = αβ
$\mathrm{Var}(X) = \alpha\beta^2$
Probability Plots
Question: How can you decide whether a data set follows a normal distribution or an exponential distribution?
• It is risky to assume that data follow a certain distribution without actually inspecting the data.
• Histograms and box plots can help, since they reveal the shape of the data distribution.
• However, sometimes we need a more sensitive way to judge the adequacy of a Normal model
(say, if we assume the data follow a normal distribution).
• The most useful tool for assessing it is another graph, the quantile plot (or probability plot).
Sample Percentile
Order the n sample observations from smallest to largest. Then the ith smallest observation in the list
is taken to be the [100(i-0.5)/n]th percentile.
Probability plot
Plot the [100(i-0.5)/n]th percentile of the distribution against the ith smallest sample observation, for i = 1, ..., n.
Diagnose
If the sample percentiles are close to the corresponding population distribution percentiles, the plotted
points will then fall close to a 45-degree diagonal line. Substantial deviations of the plotted points
from a diagonal line cast doubt on the assumption that the distribution under consideration is the
correct one.
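The construction described above can be carried out by hand in a few lines of R. This is a sketch on simulated data: the sample, its size, and the hypothesized N(10, 2²) distribution are all assumptions chosen for illustration.

```r
set.seed(511)
x <- rnorm(50, mean = 10, sd = 2)         # simulated sample; replace with real data

n <- length(x)
i <- seq_len(n)
probs <- (i - 0.5) / n                    # ith smallest value is the 100(i - 0.5)/n th percentile
theo <- qnorm(probs, mean = 10, sd = 2)   # matching percentiles of the hypothesized distribution

plot(theo, sort(x),
     xlab = "Distribution percentile", ylab = "Sample percentile")
abline(0, 1)                              # 45-degree line expected if the model is right
```

Replacing `qnorm` with another quantile function (for example `qexp`) gives the probability plot for a different hypothesized distribution.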
Diagnose
If the true distribution is indeed a normal distribution, not necessarily the standard normal, the plotted
points will fall close to a straight line. Substantial deviations of the plotted points from a straight
line cast doubt on the assumption that the distribution under consideration is the correct one.
Sample code:
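x <- rnorm(100)   # simulate 100 standard normal observations
qqnorm(x)         # normal probability (Q-Q) plot of the sample
qqline(x)         # add the reference line through the first and third quartiles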