3 Expectation
[Figure: a random variable X(·) maps each sample-space outcome s to a real value x = X(s)]
Expectation
• Example
• “Expected value” for the uniform random variable modelling a die roll
• Values on the die are {1, 2, 3, 4, 5, 6}, each with probability 1/6
• E[X] = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5
• Expectation of a uniform random variable (discrete case)
• If X has a uniform distribution over the n consecutive integers {a, a+1, …, b},
then E[X] = (a+b)/2 (a quick check in code follows below)
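A minimal Python sketch (not from the slides; the values and names are mine) that checks E[X] = (a+b)/2 by summing value × probability for the die-roll case:

# Sketch: expectation of a discrete uniform random variable over {a, ..., b}.
a, b = 1, 6                      # die roll: values 1..6
n = b - a + 1                    # number of equally likely values
expectation = sum(v * (1.0 / n) for v in range(a, b + 1))   # sum of value * probability
print(expectation)               # 3.5
print((a + b) / 2)               # closed form, also 3.5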
Expectation
• Example
• Expectation of a binomial random variable (when n=1, this is Bernoulli)
• E[X] = Σ_{k=0}^{n} k·C(n,k)·p^k·(1−p)^{n−k} = np·Σ_{k=1}^{n} C(n−1, k−1)·p^{k−1}·(1−p)^{n−k}
• Substituting j := k − 1 and m := n − 1:
• E[X] = np·Σ_{j=0}^{m} C(m,j)·p^j·(1−p)^{m−j} = np·(p + (1−p))^m = np
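A small numerical confirmation of E[X] = np (my own sketch; n and p are arbitrary example values):

from math import comb

# Sketch: sum k * P(X = k) over the binomial PMF and compare with n*p.
n, p = 10, 0.3
expectation = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
print(expectation)   # ≈ 3.0
print(n * p)         # 3.0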
Expectation
• Example
• Expectation of a Poisson random variable
• Consider random arrivals/hits occurring at a constant average rate λ > 0,
i.e., λ arrivals/hits (typically) per unit time
• Then X ~ Poisson(λ) with PMF P(X=k) = e^{−λ}·λ^k / k!, and
E[X] = Σ_{k=0}^{∞} k·e^{−λ}·λ^k / k! = λ·Σ_{j=0}^{∞} e^{−λ}·λ^j / j! = λ
Expectation
• For a continuous random variable, E[X] := ∫ x·P(x) dx = ∫ X(s)·P(s) ds
• Intuition remains the same as in the discrete case
• Using probability-mass conservation: the mass P(x)Δx over a small interval [x, x+Δx] is approximated by P(s1)Δs1 + P(s2)Δs2 + …, where [s1, s1+Δs1], [s2, s2+Δs2], … are the sample-space intervals that X(·) maps into [x, x+Δx]
• Thus, x·P(x)Δx is approximated by X(s1)·P(s1)Δs1 + X(s2)·P(s2)Δs2 + …
• A more rigorous proof needs advanced results in real analysis
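A minimal sketch (my own illustration, not from the slides) of the x·P(x)·Δx intuition: approximating E[X] = ∫ x·P(x) dx by a Riemann sum for a uniform PDF P(x) = 1/(b−a) on [a, b]:

# Sketch: Riemann-sum approximation of the continuous expectation.
a, b = 2.0, 5.0
dx = 1e-4
n_steps = int((b - a) / dx)
expectation = sum((a + i * dx) * (1.0 / (b - a)) * dx for i in range(n_steps))
print(expectation)        # ≈ 3.5 = (a + b) / 2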
Expectation
• Mean as the center of mass
• By definition, mean m := E[X] := ∫ x·P(x) dx
• Thus, ∫ (x − m)·P(x) dx = ∫ x·P(x) dx − m·∫ P(x) dx = m − m·1 = 0
• The mass P(x)dx placed around location ‘x’ applies a torque ∝ P(x)dx·(x − m) at the fulcrum placed at location ‘m’
• Because the integral ∫ (x − m)·P(x) dx is zero, the net torque around the fulcrum ‘m’ is zero
• Hence, ‘m’ is the center of mass
Expectation
• Example
• Expectation of a uniform random variable (continuous case)
• If X has a uniform distribution over [a, b], i.e., P(x) = 1/(b−a) on [a, b] and 0 elsewhere,
then E[X] = ∫_a^b x/(b−a) dx = (b² − a²) / (2(b−a)) = (a+b)/2
Expectation
• Example
• Expectation of an exponential random variable
• Consider random arrivals/hits occurring at a constant average rate λ > 0
• Define β := 1/λ
• PDF: P(x) = 0 for all x < 0; P(x) = λ·exp(−λx) for all x ≥ 0
• CDF: f(x) = 0 for all x < 0; f(x) = 1 − exp(−λx) for all x ≥ 0
• E[X] = ∫_0^∞ x·λ·exp(−λx) dx = 1/λ = β (a sampling check follows below)
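A small sampling check (my own sketch; it assumes NumPy, whose exponential sampler is parameterised by the scale β = 1/λ rather than the rate):

import numpy as np

# Sketch: the sample mean of exponential draws should approach E[X] = beta = 1/lambda.
lam = 0.5
beta = 1.0 / lam
rng = np.random.default_rng(0)
samples = rng.exponential(scale=beta, size=1_000_000)
print(samples.mean())   # ≈ 2.0 = beta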
[Figure: as n grows, the binomial PMF tends to a “Gaussian” form; a diagram of the arrival process over time relates X(·) and Y(·) through the bars P(x=1), P(x=2), P(x=3), P(x=4), … along the x and t axes]
Expectation in Life
• Action without expectation → Happiness [Indian Philosophy]
Quantile, Quartile
• Definition: For a discrete/continuous random variable
with a PMF/PDF P(.), the q-th quantile
(where 0<q<1) is any real number ‘xq’
such that P(X≤xq) ≥ q and P(X≥xq) ≥ 1-q
• Quartiles: q = 0.25 (1st quartile), q = 0.5 (2nd), q = 0.75 (3rd)
• Percentiles
• q = 0.25 → 25th percentile
• Box plot (box-and-whisker plot): a plot of the quartiles
• Inter-Quartile Range (IQR) = 3rd quartile − 1st quartile
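A short sketch (my own example data, assuming NumPy) computing empirical quartiles and the IQR:

import numpy as np

# Sketch: empirical quartiles and inter-quartile range from sample data.
rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=10_000)
q1, q2, q3 = np.quantile(data, [0.25, 0.5, 0.75])   # 1st, 2nd (median), 3rd quartiles
iqr = q3 - q1                                       # inter-quartile range
print(q1, q2, q3, iqr)
# q = 0.25 corresponds to the 25th percentile; q = 0.5 is the median.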
Quantile, Median
• Definition:
For a discrete/continuous random variable with a PMF/PDF P(.),
the median is any real number ‘m’
such that P(X≤m) ≥ 0.5 and P(X≥m) ≥ 0.5
• Median = second quartile
• Definition:
For a continuous random variable with a PDF P(.),
the median is any real number ‘m’
such that P(X≤m) = P(X>m)
• Equivalently, the CDF satisfies fX(m) = 0.5
• A PDF can be associated with multiple medians
Mode
• For discrete X
• Mode m is a value for which the PMF value P(X=m) is maximum
• A PMF can have multiple modes
• For continuous X
• Mode ‘m’ is any local maximum of the PDF P(.)
• A PDF can have multiple modes
• Unimodal PDF: a PDF having only 1 local maximum
• Bimodal PDF: 2 local maxima
• Multimodal PDF: 2 or more local maxima
Mean, Median, Mode
• For continuous X, for unimodal and symmetric distributions,
mode = mean = median
• Assuming symmetry around the mode, the mass on the left of the mode = the mass on the right of the mode
• So, mode = median
• Assuming symmetry around the mode, every P(x)dx mass on the left of the mode is matched by an equal P(x)dx mass at the mirror-image location on the right of the mode
• So, mode = mean
Variance
• Definition: Var(X) := E[(X − E[X])²]
• A measure of the spread of the mass (in PMF or PDF) around the mean
• Property: Variance is always non-negative
• Property: Var(X) = E[X²] − (E[X])²
• Proof: LHS = E[(X − E[X])²]
= E[X² + (E[X])² − 2·X·E[X]]
= E[X²] + (E[X])² − 2(E[X])²
= E[X²] − (E[X])² = RHS
• Definition: Standard deviation is the square root of the variance
• Units of variance = square of units of values taken by random variable
• Units of standard deviation = units of values taken by random variable
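A brief check (my own sketch, assuming NumPy) that the two forms of the variance agree, using fair-die samples:

import numpy as np

# Sketch: Var(X) from the definition vs. E[X^2] - (E[X])^2, plus the standard deviation.
rng = np.random.default_rng(0)
x = rng.integers(1, 7, size=1_000_000).astype(float)    # fair die rolls, values 1..6
var_def = np.mean((x - x.mean())**2)                     # E[(X - E[X])^2]
var_alt = np.mean(x**2) - x.mean()**2                    # E[X^2] - (E[X])^2
print(var_def, var_alt, np.sqrt(var_alt))                # ≈ 2.917, ≈ 2.917, ≈ 1.708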
Variance
• Variance of a Uniform Random Variable
• Discrete case
• X has a uniform distribution over the n integers {a, a+1, …, b}
• Here, n = b − a + 1
• Variance = (n² − 1) / 12
Variance
• Variance of a Binomial Random Variable
• Var(X) = E[X²] − (E[X])², where E[X] = np
Variance
• Variance of a Binomial Random Variable
• Var(X) = E[X²] − (E[X])², where E[X] = np
• So, E[X²]
= np(mp + 1), where m := n − 1
= np((n−1)p + 1)
= (np)² + np(1 − p)
• Thus, Var(X) = np(1 − p) = npq
• Interpretation
• When p = 0 or p = 1, then Var(X) = 0, which is the minimum possible
• When p = q = 0.5, then Var(X) is maximized
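A small simulation sketch (my own example; n and the p values are arbitrary) comparing the sample variance of binomial draws with np(1−p), including the extreme and maximizing cases of p:

import numpy as np

# Sketch: empirical vs. theoretical variance of Binomial(n, p); the variance peaks at p = 0.5.
rng = np.random.default_rng(0)
n = 20
for p in [0.0, 0.1, 0.5, 0.9, 1.0]:
    x = rng.binomial(n, p, size=200_000)
    print(p, x.var(), n * p * (1 - p))   # empirical variance vs. n*p*(1-p)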
Variance
• Variance of a Poisson Random Variable
• Var(X) = E[X²] − (E[X])², where E[X] = λ
Variance
• Variance of a Poisson Random Variable
• Var(X) = E[X²] − (E[X])², where E[X] = λ
• So, E[X²]
= λ(λ + 1)
= λ² + λ
• Thus, Var(X) = λ² + λ − λ² = λ
• Interpretation
• Mean of Poisson random variable was also λ
• Standard deviation of a Poisson random variable is √λ
• As the mean increases, so does the variance (and the standard deviation)
• When the mean increases by a factor of N (i.e., an N-times larger signal = number of arrivals/hits),
the standard deviation (spread) increases only by a factor of √N
• So, as N increases,
the variability in the number of arrivals/hits, relative to the average arrival/hit rate, decreases
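A short sketch (my own illustration, assuming NumPy) of this scaling: for Poisson(λ), the spread grows like √λ, so the relative variability SD/mean shrinks as the mean grows:

import numpy as np

# Sketch: mean ≈ lambda, SD ≈ sqrt(lambda), so SD/mean ≈ 1/sqrt(lambda).
rng = np.random.default_rng(0)
for lam in [1, 100, 10_000]:
    x = rng.poisson(lam, size=200_000)
    print(lam, x.mean(), x.std(), x.std() / x.mean())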
Variance
• Variance of a Uniform Random Variable
• Continuous case
• X has uniform distribution over [a,b]
• Variance = (b − a)² / 12
Variance
• Variance of an Exponential Random Variable
• PDF: P(x) = 0 for all x < 0; P(x) = λ·exp(−λx) for all x ≥ 0
• CDF: f(x) = 0 for all x < 0; f(x) = 1 − exp(−λx) for all x ≥ 0
• Var(X) = E[X²] − (E[X])², where E[X] = β := 1/λ
• E[X²] = ∫_0^∞ x²·λ·exp(−λx) dx = 2/λ²
• Thus, Var(X) = 2/λ² − (1/λ)² = 1/λ² = β²
Variance
• Example
• Variance of a limiting case of binomial
• As n tends to infinity, the binomial tends to a Gaussian
• Reference: www.nature.com/articles/nmeth.2613
Covariance
• For random variables X and Y, consider the joint PMF/PDF P(X,Y)
• Covariance: A measure of how the values taken by X and Y vary
together (“co”-“vary”)
• Definition: Cov(X,Y) := E[(X – E[X])(Y – E[Y])]
• Interpretation:
• Define U(X) := X – E[X] and V(Y) := Y – E[Y] (Note: U and V have expectation 0)
• In the joint distribution P(U,V), if larger (more +ve) values of U typically correspond to larger values of V, and smaller (more −ve) values of U typically correspond to smaller values of V, then U and V co-vary positively
• In the joint distribution P(U,V), if larger values of U typically correspond to smaller values of V, and …, then U and V co-vary negatively
• Property: Symmetry: Cov(X,Y) = Cov(Y,X)
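A minimal sketch (my own example data, assuming NumPy) computing the covariance directly from the definition and comparing with NumPy's estimate; Y is constructed to co-vary positively with X:

import numpy as np

# Sketch: Cov(X,Y) = E[(X - E[X])(Y - E[Y])] from samples.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2.0 * x + rng.normal(size=100_000)         # larger X typically means larger Y
cov_def = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_def)                                  # ≈ 2.0
print(np.cov(x, y, bias=True)[0, 1])            # NumPy's (population) covariance, ≈ 2.0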
Covariance
• Examples
Covariance
• Property: Cov(X,Y) = E[XY] – E[X]E[Y]
• Proof:
• Cov(X,Y) = E[(X – E[X])(Y – E[Y])] = E[XY] – E[X]E[Y] – E[X]E[Y] + E[X]E[Y] = E[XY] – E[X]E[Y]
• So, Var(X+Y) = E[(X+Y)²] − (E[X+Y])² = Var(X) + Var(Y) + 2(E[XY] − E[X]E[Y]) = Var(X) + Var(Y) + 2Cov(X,Y)
• Also, when X and Y are independent, then Cov(X,Y) = 0
• Property: When Var(X) and Var(Y) are finite, and one of them is 0,
then Cov(X,Y)=0
• Property: When Y := mX + c (with finite m), what is Cov(X,Y) ?
• Cov(X,Y) = E[XY] − E[X]E[Y]
= E[mX² + cX] − E[X](m·E[X] + c)
= m·E[X²] − m(E[X])² = m·Var(X)
• When Var(X) > 0, the covariance is ∝ the line-slope ‘m’, and has the same sign as m
Covariance
• Bilinearity of Covariance
• Let X, X1, X2, Y, Y1, Y2 be random variables. Let c be a scalar constant.
• Property: Cov(X1 + X2, Y) = Cov(X1, Y) + Cov(X2, Y) = Cov(Y, X1 + X2)
• Proof (first part; second part follows from symmetry):
Cov(X1 + X2, Y) = E[(X1 + X2)Y] − E[X1 + X2]E[Y] = (E[X1Y] − E[X1]E[Y]) + (E[X2Y] − E[X2]E[Y]) = Cov(X1, Y) + Cov(X2, Y)
• Standardized variable X* := (X − E[X]) / SD(X), where SD = standard deviation
• X* is unit-less
• X* is obtained by:
• First shifting/translating X to make the mean 0, and
• Then scaling the shifted variable to make the variance 1
Correlation
• For covariance, the magnitude isn’t easy to interpret (unlike its sign)
• Correlation: A measure of how the values taken by X and Y vary
together (“co”-“relate”) obtained by rescaling covariance
• Pearson’s correlation coefficient
• Assuming X and Y are linearly related, correlation magnitude shows the
strength of the (functional/deterministic) relationship between X and Y
• Let ‘SD’ = standard deviation
• Definition: Cor(X,Y) := Cov(X,Y) / (SD(X)·SD(Y)) = E[X*Y*]
• Property: −1 ≤ Cor(X,Y) ≤ 1
• First inequality: 0 ≤ E[(X* + Y*)²] = E[(X*)²] + E[(Y*)²] + 2E[X*Y*] = 2(1 + Cor(X,Y)), so Cor(X,Y) ≥ −1
• Second inequality: 0 ≤ E[(X* − Y*)²] = E[(X*)²] + E[(Y*)²] − 2E[X*Y*] = 2(1 − Cor(X,Y))
• So, Cor(X,Y) ≤ 1 (a computation sketch follows below)
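A small sketch (my own example, assuming NumPy) computing Pearson's correlation as Cov(X,Y)/(SD(X)·SD(Y)) and comparing with np.corrcoef; the value stays in [−1, 1]:

import numpy as np

# Sketch: Pearson correlation from the definition vs. NumPy.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = -3.0 * x + rng.normal(size=100_000)
cov = np.mean((x - x.mean()) * (y - y.mean()))
cor = cov / (x.std() * y.std())
print(cor)                         # ≈ -0.95 (strong negative linear relation)
print(np.corrcoef(x, y)[0, 1])     # same value from NumPy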
Correlation
• Property: If X and Y are linearly related, i.e., Y = mX + c,
and are non-constant (i.e., SD(X)>0 and SD(Y)>0),
then |Cor(X,Y)| = 1
• Proof:
• When Y = mX + c, then SD(Y) = |m| SD(X)
• Cor(X,Y)
= Cov(X,Y) / (SD(X)·SD(Y))
= m·Var(X) / (SD(X)·|m|·SD(X))
= m / |m| = ±1
= sign of the slope m
Correlation
• Property: If |Cor(X,Y)| = 1, then X and Y are linearly related
• Proof:
• If Cor(X,Y) = 1, then E[(X* − Y*)²] = 2(1 − Cor(X,Y)) = 0
• For discrete X,Y: this must imply X* = Y* for all (x’,y’) where P(X=x’, Y=y’) > 0
• Else the summation underlying the expectation cannot be zero
• For continuous X,Y: this must imply X* = Y* wherever the joint PDF assigns positive probability
• X* and Y* can differ only on a set of probability zero
• Else the integral underlying the expectation cannot be zero
• If Cor(X,Y) = (−1), then E[(X* + Y*)²] = 2(1 + Cor(X,Y)) = 0
• For discrete X,Y: this must imply X* = (−Y*) for all (x’,y’) where P(X=x’, Y=y’) > 0
• For continuous X,Y: this must imply X* = (−Y*) wherever the joint PDF assigns positive probability
• Inequality can hold only on a set of probability zero
• If X* = ±Y*, then Y must be of the form mX+c
Correlation
• If |Cor(X,Y)|=1 (or Y=mX+c), then
how to find the equation of the line from data {(xi,yi): i=1,…,n}?
• By the way: line must pass through (E[X],E[Y])
• Because, when X=E[X], value of Y must be mE[X]+c, but that also equals E[Y]
• We proved that: if Y=mX+c, then |Cor(X,Y)|=1 and Y* = ±X* = Cor(X,Y) X*
• So, (Y – E[Y]) / SD(Y) = Cor(X,Y) (X – E[X]) / SD(X)
• So, Y = E[Y] + SD(Y) Cor(X,Y) (X – E[X]) / SD(X)
• So, Y = E[Y] + Cov(X,Y) (X – E[X]) / Var(X)
• This gives the equation of the line with:
• Slope m := Cov(X,Y) / Var(X)
• Intercept c := E[Y] – Cov(X,Y) E[X] / Var(X)
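A short sketch (my own example data; here Y is exactly linear in X, so |Cor| = 1) recovering the slope and intercept from samples using the formulas above:

import numpy as np

# Sketch: slope m = Cov(X,Y)/Var(X), intercept c = E[Y] - m*E[X].
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50_000)
y = 2.5 * x + 4.0                                   # exactly linear data
m = np.cov(x, y, bias=True)[0, 1] / x.var()
c = y.mean() - m * x.mean()
print(m, c)                                          # ≈ 2.5, ≈ 4.0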
Correlation
• Examples
Correlation
• Four sets of data with the same correlation of 0.816
• Blue line indicates the line passing through (E[X],E[Y]) with slope = 0.816
(more on this when we study estimation)
• So, a correlation of 0.816 doesn’t always mean that the data lies along a line of slope 0.816
• This indicates the likely misinterpretation of correlation when the variables underlying the data aren’t linearly related
Correlation
• Zero correlation doesn’t imply independence
• Example: if X is uniform on [−1, 1] and Y := X², then Cov(X,Y) = E[X³] − E[X]·E[X²] = 0, so Cor(X,Y) = 0, yet Y is completely determined by X (see the sketch below)
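A small sketch of this standard counterexample (my own code, assuming NumPy): Y = X² is fully dependent on X, yet the covariance and correlation are (numerically) zero:

import numpy as np

# Sketch: zero correlation without independence.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=1_000_000)
y = x**2                                 # Y is a deterministic function of X
print(np.cov(x, y, bias=True)[0, 1])     # ≈ 0
print(np.corrcoef(x, y)[0, 1])           # ≈ 0, despite full dependence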