
CS 215

Data Analysis and Interpretation


Expectation
Suyash P. Awate
Expectation
• “Expectation” of the random variable;
“Expected value” of the random variable;
“Mean” of the random variable.
• “Expected value” isn’t necessarily the value
that is most likely to be observed in the
random experiment
• Can think of it as the center of mass of
the probability mass/density function
Expectation
• Definition:
Expectation of a Discrete Random Variable: E[X] := ∑_i x_i P(X = x_i)
• Frequentist interpretation of probabilities and expectation
• If a random experiment is repeated infinitely many times,
then the proportion of number of times event E occurs is the probability P(E)
• If a random experiment underlying a discrete random variable X
is repeated infinitely many times,
then the proportion of number of experiments when X takes value x is P(X=x)
• So, in N→∞ experiments, number of times X takes value xi will → N.P(X=xi)
• So, across all N→∞ experiments,
arithmetic average of observed values will
→ (1/N) ∑_i x_i · (N·P(X=x_i)) = ∑_i x_i P(X=x_i)
= E[X]
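The following is a minimal numerical sketch of the frequentist interpretation above (not from the slides; Python/NumPy, the seed, and the sample sizes are illustrative assumptions): averaging many simulated die rolls approaches E[X] = ∑_i x_i P(X=x_i) = 3.5.

```python
# Minimal sketch (illustrative assumption, not from the slides): frequentist
# view of expectation. The arithmetic average of N simulated fair-die rolls
# should approach E[X] = 3.5 as N grows.
import numpy as np

rng = np.random.default_rng(0)
values = np.arange(1, 7)                      # faces {1,...,6}, each with P = 1/6
expectation = np.sum(values * (1.0 / 6.0))    # E[X] = sum_i x_i P(X = x_i)

for n in [100, 10_000, 1_000_000]:
    rolls = rng.integers(1, 7, size=n)        # n independent die rolls
    print(n, rolls.mean())                    # sample average -> 3.5

print("E[X] =", expectation)                  # 3.5
```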
Expectation
• Another Formulation of Expectation
• Recall:
• Discrete random variable X is a function defined on a probability space {Ω,ℬ,P}
• Function X:Ω→R, maps each element in sample space Ω to a single numerical value
belonging to the set of real numbers

[Figure: the random variable as a map X(·), taking s ∈ Ω to x = X(s) ∈ R]
• E[X] := ∑_i x_i P(X = x_i) = ∑_{s∈Ω} X(s) P(s)
Expectation
• Example
• “Expected value” for the uniform random variable modelling die roll
• Values on die are {1,2,3,4,5,6}
• E[X] = 3.5
• Expectation of a uniform random variable (discrete case)
• If X has uniform distribution over n consecutive integers over [a,b],
then E[X] = (a+b)/2
Expectation
• Example
• Expectation of a binomial random variable (when n=1, this is Bernoulli)

• E[X] = ∑_{k=0}^{n} k · C(n,k) p^k (1−p)^{n−k}
= np ∑_{j=0}^{m} C(m,j) p^j (1−p)^{m−j}   (substituting j := k − 1, m := n − 1)
= np   (the remaining sum adds a Binomial(m,p) PMF over its full support, so it equals 1)
Expectation
• Example
• Expectation of a Poisson random variable
• Consider random arrivals/hits occurring at a constant average rate λ>0,
i.e., λ arrivals/hits (typically) per unit time
• E[X] = ∑_{k=0}^{∞} k · e^{−λ} λ^k / k! = λ e^{−λ} ∑_{j=0}^{∞} λ^j / j! = λ   (substituting j := k − 1)
• This gives meaning to parameter λ as the average number of arrivals in unit time


Expectation
• Definition:
"
Expectation of a Continuous Random variable: E[X] :=∫!" 𝑥𝑃 𝑥 𝑑𝑥
• Frequentist interpretation of probabilities and expectation
• If a random experiment underlying a continuous random variable X
is repeated N→∞ times,
then,
for a tiny interval [x,x+Δx],
the proportion of time X takes values within interval is approximately P(x)Δx
• So, in N→∞ experiments,
number of times we will get X within [xi,xi+Δx] is approximately N.P(xi)Δx
• So, across all N→∞ experiments,
arithmetic average of all observed values is
approximately (1/N) ∑i (xi) (N.P(xi)Δx)
• In the limit that Δx→0, this average→E[X]
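A small sketch of the discretization argument above, assuming Python/NumPy and an arbitrary Gaussian PDF with mean μ = 2 (both are my assumptions, not from the slides): the Riemann-style sum ∑_i x_i P(x_i) Δx approximates ∫ x P(x) dx.

```python
# Minimal sketch (illustrative assumption): approximate E[X] = ∫ x P(x) dx by
# the discretization sum_i x_i P(x_i) Δx used in the frequentist argument,
# here for a Gaussian PDF with mean mu = 2.
import numpy as np

mu, sigma = 2.0, 1.0
dx = 1e-3
x = np.arange(mu - 10 * sigma, mu + 10 * sigma, dx)   # grid covering the mass
pdf = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

approx_E = np.sum(x * pdf * dx)   # sum_i x_i P(x_i) Δx
print(approx_E)                   # ≈ 2.0, i.e., ≈ mu
```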
Expectation
• Another Formulation of Expectation
[Figure: the random variable as a map X(·), taking s ∈ Ω to x = X(s)]
• Recall:
• Random variable X is a function defined on a probability space {Ω,ℬ,P}
• Function X:Ω→R, maps each element in sample space Ω to a single numerical value
belonging to the set of real numbers

• E[X] := ∫_{−∞}^{+∞} x P(x) dx = ∫_Ω X(s) P(s) ds
• Intuition remains the same as in the discrete case
• Using probability-mass conservation:
P(x)Δx over a tiny interval [x, x+Δx] is approximated by P(s1)Δs1 + P(s2)Δs2 + …,
where [s1, s1+Δs1], [s2, s2+Δs2], … are the intervals in Ω that X(·) maps into [x, x+Δx]
• Thus, x·P(x)Δx is approximated by X(s1)·P(s1)Δs1 + X(s2)·P(s2)Δs2 + …
• A more rigorous proof needs advanced results in real analysis
Expectation
• Mean as the center of mass

[Figure: PDF P(x) balanced at a fulcrum placed at m]
• By definition,
mean m := E[X] := ∫_x x P(x) dx
• Thus, ∫_x (x − m) P(x) dx = 0
• Mass P(x)dx
placed around location ‘x’
applies a torque ∝ P(x)dx·(x − m)
at the fulcrum placed at location ‘m’
• Because the integral ∫_x (x − m) P(x) dx is zero,
the net torque around the fulcrum ‘m’ is zero
• Hence, ‘m’ is the center of mass
Expectation
• Example
• Expectation of a uniform random variable (continuous case)
• If X is uniform over [a,b], then E[X] = ∫_a^b x · 1/(b−a) dx = (a+b)/2
Expectation
• Example
• Expectation of an exponential random variable
• PDF: P(x) = 0 for all x < 0; P(x) = λ·exp(−λx) for all x ≥ 0
• CDF: F(x) = 0 for all x < 0; F(x) = 1 − exp(−λx) for all x ≥ 0
• Consider random arrivals/hits occurring
at a constant average rate λ > 0
• Define β := 1/λ

• E[X] = ∫_0^∞ x · λ exp(−λx) dx = 1/λ = β   (integrate by parts)
• This gives meaning to parameter β as the average inter-arrival time
• A larger arrival/hit rate λ leads to a shorter average inter-arrival time β
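A hedged numerical check, assuming Python/NumPy (the rate λ = 2.5 is an arbitrary choice, not from the slides): both numerical integration of x·λexp(−λx) and averaging simulated inter-arrival times give ≈ β = 1/λ.

```python
# Minimal sketch (illustrative assumption): E[X] = β = 1/λ for an exponential
# random variable, checked by numerical integration and by sampling.
import numpy as np

lam = 2.5                      # arrival rate λ (arbitrary choice)
beta = 1.0 / lam               # claimed mean inter-arrival time

dx = 1e-4
x = np.arange(0.0, 50.0 / lam, dx)
pdf = lam * np.exp(-lam * x)
print(np.sum(x * pdf) * dx)    # ≈ 0.4 = 1/λ

rng = np.random.default_rng(1)
samples = rng.exponential(scale=beta, size=1_000_000)
print(samples.mean())          # ≈ 0.4 as well
```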
Expectation
• Example
• Expectation of a Gaussian random variable
• E[X] = ∫_{−∞}^{+∞} x · N(x; μ, σ2) dx = μ, since the PDF is symmetric about μ
Expectation
• Example
• Expectation of a limiting case of binomial
• As n tends to infinity, the binomial PMF tends to a “Gaussian” form
• Gaussian expectation μ (= np here) is consistent with binomial expectation np
Expectation
• Linearity of Expectation
• For both discrete and continuous random variables
• For random variables X and Y having a joint probability space (Ω,ẞ,P),
the following rules hold:
• E[X + Y] = E[X] + E[Y]
• Either LHS = ∑_x ∑_y (x+y) P(x,y) = ∑_x x [∑_y P(x,y)] + ∑_y y [∑_x P(x,y)] = E[X] + E[Y] = RHS
• Or LHS = ∫_x ∫_y (x+y) P(x,y) dx dy = ∫_x x [∫_y P(x,y) dy] dx + ∫_y y [∫_x P(x,y) dx] dy = RHS
• E[X + c] = E[X] + c, where ‘c’ is a constant

• E[a X] = a E[X], where ‘a’ is a scalar constant

• This generalizes to: E[a1X1 + … + anXn + c] = a1E[X1] + … + anE[Xn] + c
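A quick sketch of linearity on simulated data, assuming Python/NumPy (the distributions and constants are arbitrary illustrations, not from the slides); note that no independence between X and Y is needed.

```python
# Minimal sketch (illustrative assumption): linearity of expectation checked
# on samples of two dependent random variables X and Y.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(1.0, 2.0, size=1_000_000)
y = 3.0 * x + rng.uniform(-1.0, 1.0, size=x.size)   # Y depends on X

a, c = 4.0, -2.0
print((x + y).mean(), x.mean() + y.mean())          # E[X+Y] = E[X] + E[Y]
print((a * x + c).mean(), a * x.mean() + c)         # E[aX+c] = aE[X] + c
```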


Expectation
• Expectation of a “function of a random variable”
• Let us define values y := Y(x), or “Y(.) is a function of the random variable X”

[Figure: composition of maps: s —X(·)→ x := X(s) —Y(·)→ y := Y(x) = Y(X(s))]

• Discrete random variable: E[Y(X)] := E_{P(X)}[Y(X)] := ∑_i Y(x_i) P(x_i)


• Continuous random variable: E[Y(X)] := E_{P(X)}[Y(X)] := ∫_x Y(x) P(x) dx
• Property:
• Just as EP(S)[X(S)] = EP(X)[X], …
• … we get EP(X)[Y(X)] = EP(Y)[Y]
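A minimal sketch of the “function of a random variable” formula, assuming Python/NumPy and taking Y := X^2 for a fair die (my choice of example): ∑_i Y(x_i) P(x_i) matches the sample average of Y, i.e., E_{P(Y)}[Y].

```python
# Minimal sketch (illustrative assumption): E[Y(X)] = sum_i Y(x_i) P(x_i)
# for a fair die and Y := X^2; averaging Y over simulated rolls agrees.
import numpy as np

x_vals = np.arange(1, 7)
p_x = np.full(6, 1.0 / 6.0)

E_Y_via_PX = np.sum((x_vals ** 2) * p_x)   # sum_i Y(x_i) P(x_i) = 91/6
print(E_Y_via_PX)                           # ≈ 15.1667

rng = np.random.default_rng(3)
rolls = rng.integers(1, 7, size=1_000_000)
print((rolls ** 2).mean())                  # ≈ 15.17, i.e., E_{P(Y)}[Y]
```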
Expectation
• Expectation of a function of multiple random variables
• Definition: When we have multiple random variables X1,…,Xn with
a joint PMF/PDF P(X1,…,Xn) and
a function of the multiple random variables g(X1,…,Xn),
then we define the expectation of g(X1,…,Xn) as:
E[g(X1,…,Xn)] := ∑_{x1,…,xn} g(x1,…,xn) P(X1=x1,…,Xn=xn)
or
E[g(X1,…,Xn)] := ∫_{x1,…,xn} g(x1,…,xn) P(x1,…,xn) dx1…dxn

• If X and Y are independent, then E[XY] = E[X] E[Y]


• Proof:
• ∑_{x,y} x y P(X=x, Y=y) = ∑_{x,y} x y P(X=x) P(Y=y) = [∑_x x P(X=x)] [∑_y y P(Y=y)] = E[X] E[Y]
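A small numerical illustration of the product rule, assuming Python/NumPy with arbitrarily chosen distributions (not from the slides): the rule holds for independent X and Y and visibly fails for dependent ones.

```python
# Minimal sketch (illustrative assumption): E[XY] = E[X]E[Y] for independent
# X and Y; for dependent variables it generally fails.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(2.0, 1.0, size=1_000_000)
y = rng.exponential(scale=3.0, size=x.size)     # independent of x

print((x * y).mean(), x.mean() * y.mean())      # nearly equal

z = x + rng.normal(0.0, 0.1, size=x.size)       # strongly dependent on x
print((x * z).mean(), x.mean() * z.mean())      # differ by ≈ Var(X) = 1
```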
Expectation
• Tail-sum formula
• Let X be a discrete random variable taking values in the set of natural numbers
• Then, E[X] = ∑_{k=1}^{∞} P(X ≥ k)
• Proof: arrange x copies of P(X=x) in row x of a triangular array:
P(X=1)
P(X=2)  P(X=2)
P(X=3)  P(X=3)  P(X=3)
P(X=4)  P(X=4)  P(X=4)  P(X=4)  …
Sum over rows (row number = x): ∑_x x·P(X=x) = E[X]
Sum over columns (column number = k): ∑_k P(X ≥ k)
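A hedged check of the tail-sum formula on a fair die, assuming Python/NumPy (the example distribution is my choice, not from the slides).

```python
# Minimal sketch (illustrative assumption): tail-sum formula
# E[X] = sum_{k>=1} P(X >= k) for a discrete X on {1,...,6} (a fair die).
import numpy as np

p = np.full(6, 1.0 / 6.0)            # P(X = x), x = 1..6
x = np.arange(1, 7)

direct = np.sum(x * p)               # sum_x x P(X=x) = 3.5
tails = np.sum([p[x >= k].sum() for k in range(1, 7)])   # sum_k P(X >= k)
print(direct, tails)                 # both 3.5
```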


Expectation
• Tail-sum formula
• Let X be a continuous random variable taking non-negative values
• Notation: For random variable X, PDF is fX(.) and CDF is FX(.)
• Then, E[X] = ∫_0^∞ P(X > t) dt = ∫_0^∞ (1 − F_X(t)) dt
• Proof: ∫_0^∞ P(X > t) dt = ∫_{t=0}^∞ ∫_{x=t}^∞ f_X(x) dx dt
= ∫_{x=0}^∞ [∫_{t=0}^x dt] f_X(x) dx   (swap the order of integration over the region 0 ≤ t ≤ x)
= ∫_{x=0}^∞ x f_X(x) dx = E[X]
Expectation in Life
• Action without expectation → Happiness [Indian Philosophy]
Quantile, Quartile
• Definition: For a discrete/continuous random variable
with a PMF/PDF P(.), the q-th quantile
(where 0<q<1) is any real number ‘xq’
such that P(X≤xq) ≥ q and P(X≥xq) ≥ 1-q
• Quartiles: q = 0.25 (1st quartile),
q = 0.5 (2nd), q = 0.75 (3rd)
• Percentiles
• q = 0.25 → 25th percentile
• Box plot,
box-and-whisker plot
• Inter-Quartile Range
(IQR)
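A minimal sketch of sample quartiles and the IQR, assuming Python/NumPy (the Gaussian data and its parameters are illustrative assumptions); np.quantile computes the q-th sample quantile.

```python
# Minimal sketch (illustrative assumption): sample quartiles and the
# inter-quartile range (IQR), the quantities summarized by a box plot.
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(loc=10.0, scale=2.0, size=100_000)

q1, q2, q3 = np.quantile(data, [0.25, 0.50, 0.75])
print("quartiles:", q1, q2, q3)      # ≈ 8.65, 10.0, 11.35 for N(10, 2^2)
print("IQR:", q3 - q1)               # ≈ 2 * 1.349 ≈ 2.70
```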
Quantile, Median
• Definition:
For a discrete/continuous random variable with a PMF/PDF P(.),
the median is any real number ‘m’
such that P(X≤m) ≥ 0.5 and P(X≥m) ≥ 0.5
• Median = second quartile
• Definition:
For a continuous random variable with a PDF P(.),
the median is any real number ‘m’
such that P(X≤m) = P(X>m)
• Equivalently, CDF F_X(m) = 0.5
• A PDF can be associated with multiple medians
Mode
• For discrete X
• Mode m is a value for which the PMF value P(X=m) is maximum
• A PMF can have multiple modes
• For continuous X
• Mode ‘m’ is any local maximum of the PDF P(.)
• A PDF can have multiple modes
• Unimodal PDF = A PDF having only 1 local maximum
• Bimodal PDF:
2 local maxima
• Multimodal PDF:
2 or more
local maxima
Mean, Median, Mode
• For continuous X, for unimodal and symmetric distributions,
mode = mean = median
• Assuming symmetry
around mode,
mass on left of mode =
mass on right of mode
• So, mode = median
• Assuming symmetry
around mode,
every P(x)dx mass on left of mode
is matched by
a P(x)dx mass on right of mode
• So, mode = mean
Variance
• Definition: Var(X) := E[(X-E[X])2]
• A measure of the spread of the mass (in PMF or PDF) around the mean
• Property: Variance is always non-negative
• Property: Var(X) = E[X2] – (E[X])2
• Proof: LHS =
E[(X-E[X])2]
= E[ X2 + (E[X])2 – 2.X.E[X] ]
= E[X2] + (E[X])2 – 2(E[X])2
= E[X2] – (E[X])2 = RHS
• Definition: Standard deviation is the square root of the variance
• Units of variance = square of units of values taken by random variable
• Units of standard deviation = units of values taken by random variable
Variance
• Variance of a Uniform Random Variable
• Discrete case
• X has uniform distribution over n integers {a, a+1, …, b}
• Here, n = b–a+1
• Variance = (n2 – 1) / 12
Variance
• Variance of a Binomial Random Variable
• Var(X) = E[X2] – (E[X])2 , where E[X] = np
Variance
• Variance of a Binomial Random Variable
• Var(X) = E[X2] – (E[X])2 , where E[X] = np
• So, E[X2]
= np (mp + 1)
= np ((n–1)p + 1)
= (np)2 + np(1-p)
• Thus, Var(X) = np(1–p) = npq
• Interpretation
• When p=0 or p=1,
then Var(X) = 0,
which is the minimum possible
• When p=q=0.5,
then Var(X) is maximized
Variance
• Variance of a Poisson Random Variable
• Var(X) = E[X2] – (E[X])2 , where E[X] = λ
Variance
• Variance of a Poisson Random Variable
• Var(X) = E[X2] – (E[X])2 , where E[X] = λ
• So, E[X2]
= λ (λ.1 + 1)
= λ2 + λ
• Thus, Var(X) = λ
• Interpretation
• Mean of Poisson random variable was also λ
• Standard deviation of Poisson random variable is λ^0.5
• As mean increases, so does variance (and standard deviation)
• When the mean increases by a factor of N (i.e., N times larger signal = number of arrivals/hits),
the standard deviation (spread) increases only by a factor of N^0.5
• So, as N increases,
the variability in the number of arrivals/hits, relative to the average arrival/hit rate, decreases
Variance
• Variance of a Uniform Random Variable
• Continuous case
• X has uniform distribution over [a,b]
• Variance = (b – a)2 / 12
Variance
• Variance of an Exponential Random Variable
• PDF: P(x) = 0 for all x < 0; P(x) = λ·exp(−λx) for all x ≥ 0
• CDF: F(x) = 0 for all x < 0; F(x) = 1 − exp(−λx) for all x ≥ 0
• Var(X) = E[X2] – (E[X])2, where E[X] = β := 1/λ

• E[X2] = ∫_0^∞ x2 · λ exp(−λx) dx = 2/λ2 = 2β2   (integrate by parts twice)
• So, Var(X) = 2β2 − β2 = β2, i.e., β = E[X] = SD(X); unlike the Poisson, where Var(X) equals the mean


Variance
• Variance of a Gaussian Random Variable
• Var(X) = E[X2] – (E[X])2, where E[X] = μ
Variance
• Variance of a Gaussian Random Variable
• Var(X) = E[X2] – (E[X])2 , where E[X] = μ

• Derivation hint: write t2·exp(−t2) as t · (t·exp(−t2)) and integrate by parts; this yields Var(X) = σ2
Variance
• Example
• Variance of a limiting case of binomial
• As n tends to infinity, the binomial PMF tends to a Gaussian form
• Gaussian variance σ2 (= npq in this case) is consistent with binomial variance npq
Variance
• Property: Var(aX+c) = a2Var(X)
• Adding a constant to a random variable doesn’t change the variance (spread)
• This only shifts the PDF/PMF
• If Y := X + c, then Var(Y) = Var(X)
• If we scale a random variable by ‘a’, then the variance gets scaled by a2
• If Y := aX, then Var(Y) = a2Var(X)
• Proof: Var(aX+c) = E[(aX + c − aE[X] − c)2] = E[a2(X − E[X])2] = a2Var(X)
Variance
• Property: Var(X+Y) = Var(X) + Var(Y) + 2(E[XY] – E[X]E[Y])
• Proof:
Var(X+Y) = E[(X+Y)2] − (E[X+Y])2
= E[X2] + E[Y2] + 2E[XY] − (E[X])2 − (E[Y])2 − 2E[X]E[Y]
= Var(X) + Var(Y) + 2(E[XY] − E[X]E[Y])
• If X and Y are independent,
then E[XY] = E[X] E[Y], and so Var(X+Y) = Var(X) + Var(Y)
• If X,Y,Z are independent, then
Var(X+Y+Z) = Var(X+Y) + Var(Z) = Var(X) + Var(Y) + Var(Z)
• For independent random variables X1, …, Xn;
Var(X1 + … + Xn) = Var(X1) + … + Var(Xn)
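A quick numerical check, assuming Python/NumPy with arbitrary distributions (not from the slides): the variance of a sum needs the cross term 2(E[XY] − E[X]E[Y]) unless X and Y are independent.

```python
# Minimal sketch (illustrative assumption): Var(X+Y) equals
# Var(X) + Var(Y) + 2(E[XY] − E[X]E[Y]); the cross term vanishes when X and Y
# are independent.
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(0.0, 1.0, size=1_000_000)
y = rng.normal(0.0, 2.0, size=x.size)            # independent of x

print(np.var(x + y), np.var(x) + np.var(y))      # ≈ 5 in both cases

w = x + rng.normal(0.0, 0.5, size=x.size)        # dependent on x
cross = (x * w).mean() - x.mean() * w.mean()     # ≈ Var(X) = 1
print(np.var(x + w), np.var(x) + np.var(w) + 2 * cross)
```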
Markov’s Inequality
• Theorem: Let X be a random variable with PDF P(.).
Let u(.) be a non-negative-valued function.
Let ‘c’ be a positive constant.
Then, P(u(X) ≥ c) ≤ E[u(X)] / c
• Proof:
• E[u(X)] = ∫x:u(x)≥c u(x) P(x) dx + ∫x:u(x)<c u(x) P(x) dx
• Because u(.) takes non-negative values, each integral above is non-negative
• So, E[u(X)] ≥ ∫x:u(x)≥c u(x) P(x) dx
≥ c ∫x:u(x)≥c P(x) dx
= c P(u(X) ≥ c)
• Because c>0, we get E[u(X)]/c ≥ P(u(X) ≥ c)
• Special case → when X takes non-negative values and u(x) := x, we get P(X ≥ c) ≤ E[X]/c
Chebyshev’s Inequality
• Recall Markov’s inequality: P(u(X) ≥ c) ≤ E[u(X)] / c
• Theorem: Let X be a random variable with PDF P(.),
finite expectation E[X], and finite variance Var(X).
Then, P(|X-E[X]| ≥ a) ≤ Var(X) / a2
• Proof:
• Define random variable u(X) := (X-E[X])2
• Then, by Markov’s inequality, P(u(X) ≥ a2) ≤ E[u(X)] / a2
• LHS = P(|X-E[X]| ≥ a)
• RHS = Var(X) / a2
• Q.E.D.
• Corollary: If random variable X has standard deviation σ, then
P(|X-E[X]| ≥ kσ) ≤ 1/k2
• This is consistent with the notion of standard deviation (σ) or variance (σ2)
measuring the spread of the PDF around the mean (center of mass)
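A hedged empirical check of both inequalities, assuming Python/NumPy and an exponential example of my choosing (E[X] = Var(X) = 1), not taken from the slides.

```python
# Minimal sketch (illustrative assumption): empirical check of Markov's and
# Chebyshev's inequalities on an exponential random variable.
import numpy as np

rng = np.random.default_rng(7)
x = rng.exponential(scale=1.0, size=1_000_000)   # E[X] = 1, Var(X) = 1, X >= 0

c = 3.0
print((x >= c).mean(), x.mean() / c)             # P(X >= c)  <=  E[X]/c

k, mu, sigma = 2.0, x.mean(), x.std()
print((np.abs(x - mu) >= k * sigma).mean(), 1.0 / k**2)   # P(|X-mu| >= k*sigma) <= 1/k^2
```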
Chebyshev’s Inequality
Chebyshev
• Pafnuty Chebyshev
• Founding father of Russian mathematics
• Students: Lyapunov, Markov
• First person to think
systematically in terms of
random variables and their
moments and expectations
Markov
• Andrey Markov
• Russian mathematician best known for
his work on stochastic processes
• Advisor: Chebyshev
• Students: Voronoy
• One year after doctoral defense,
appointed extraordinary professor
• He figured out that he could use chains to model
the alliteration of vowels and consonants
in Russian literature
Jensen’s Inequality
• Theorem: Let X be any random variable; f(.) be any convex function.
Then, E[f(X)] ≥ f(E[X])
(A real-valued function is called convex if the line segment between any two points
on the graph of the function lies above/never below the graph between the two points.)
• Proof:
• Let m := E[X], can be anywhere on real line
• Consider a tangent (subderivative line) to f(.) at [m,f(m)]
• This line is, say, Y = aX+b,
which lies at/below (never above) f(X)
• Then, f(m) = am+b
• Then,
E[f(X)] ≥ E[aX+b]
= aE[X] + b
= f(E[X])
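A minimal sketch of Jensen’s inequality, assuming Python/NumPy and two standard convex functions (x^2 and exp) as illustrative choices not taken from the slides.

```python
# Minimal sketch (illustrative assumption): Jensen's inequality
# E[f(X)] >= f(E[X]) for convex f, checked with f(x) = x^2 and f(x) = exp(x).
import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(0.5, 1.0, size=1_000_000)

print((x ** 2).mean(), x.mean() ** 2)        # E[X^2] >= (E[X])^2
print(np.exp(x).mean(), np.exp(x.mean()))    # E[exp(X)] >= exp(E[X])
```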
Jensen’s Inequality
• Corollary: Let X be any random variable; g(.) be any concave function.
Then, E[g(X)] ≤ g(E[X])
(A real-valued function is called concave if the line segment between any two points
on the graph of the function lies below/never above the graph between the two points.)
• Proof:
• Let m := E[X], can be anywhere on real line
• Consider a tangent (subderivative line) to g(.) at [m,g(m)]
• This line is, say, Y = aX+b,
which lies at/above (never below) g(X)
• Then, g(m) = am+b
• Then,
E[g(X)] ≤ E[aX+b]
= aE[X] + b
= g(E[X])
Jensen
• Johan Jensen
• Danish mathematician and engineer
• President of the Danish Mathematical Society
from 1892 to 1903
• Never held any academic position
• Engineer for Copenhagen Telephone Company
• Became head of its technical department
• Learned advanced math topics by himself
• All his mathematics research
was carried out in his spare time
Minimizer of Expected Absolute Deviation
• Theorem: E[|X – c|] is minimum when c = Median(X)
• Case 1: Let c ≤ m := Median(X)
• E[|X – c|] = ∫_{−∞}^{c} (c − x) P(x) dx + ∫_{c}^{∞} (x − c) P(x) dx   (say, A + B)
• A = ∫_{−∞}^{m} (c − x) P(x) dx − ∫_{c}^{m} (c − x) P(x) dx   (say, A1 – A2)
• B = ∫_{c}^{m} (x − c) P(x) dx + ∫_{m}^{∞} (x − c) P(x) dx   (say, B1 + B2)
• Now, B1 – A2 = 2 ∫_{c}^{m} (x − c) P(x) dx ≥ 0
• A1 = ∫_{−∞}^{m} (c − m) P(x) dx + ∫_{−∞}^{m} (m − x) P(x) dx   (say, A11 + A12)
• B2 = ∫_{m}^{∞} (x − m) P(x) dx + ∫_{m}^{∞} (m − c) P(x) dx   (say, B21 + B22)
• Now, A11 + B22 = –(m–c) (1–P(X≥m)) + (m–c) P(X≥m) = (m–c) (2P(X≥m)–1) ≥ 0
• Now, A12 + B21 = E[|X – m|]
• So, A+B = E[|X – m|] + (m–c) (2P(X≥m) – 1) + 2 ∫_{c}^{m} (x − c) P(x) dx
• Value of c minimizing A+B is c = m
Minimizer of Expected Absolute Deviation
• Theorem: E[|X – c|] is minimum when c = Median(X)
• Case 2: Let m := Median(X) ≤ c
• E[|X – c|] = ∫_{−∞}^{c} (c − x) P(x) dx + ∫_{c}^{∞} (x − c) P(x) dx   (say, A + B)
• A = ∫_{−∞}^{m} (c − x) P(x) dx + ∫_{m}^{c} (c − x) P(x) dx   (say, A1 + A2)
• B = − ∫_{m}^{c} (x − c) P(x) dx + ∫_{m}^{∞} (x − c) P(x) dx   (say, – B1 + B2)
• Now, A2 – B1 = 2 ∫_{m}^{c} (c − x) P(x) dx ≥ 0
• A1 = ∫_{−∞}^{m} (c − m) P(x) dx + ∫_{−∞}^{m} (m − x) P(x) dx   (say, A11 + A12)
• B2 = ∫_{m}^{∞} (x − m) P(x) dx + ∫_{m}^{∞} (m − c) P(x) dx   (say, B21 + B22)
• Now, A11 + B22 = (c–m) P(X≤m) – (c–m) (1–P(X≤m)) = (c–m) (2P(X≤m)–1) ≥ 0
• Now, A12 + B21 = E[|X – m|]
• So, A+B = E[|X – m|] + (c–m) (2P(X≤m) – 1) + 2 ∫_{m}^{c} (c − x) P(x) dx
• Value of c minimizing A+B is c = m
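A small numerical sketch of the theorem, assuming Python/NumPy and a skewed exponential sample (my choice, so that mean and median differ): scanning candidate constants c shows E[|X − c|] is smallest near the sample median.

```python
# Minimal sketch (illustrative assumption): among all constants c, the sample
# median minimizes the mean absolute deviation E[|X − c|].
import numpy as np

rng = np.random.default_rng(9)
x = rng.exponential(scale=1.0, size=200_000)     # skewed, so mean != median

cs = np.linspace(0.0, 3.0, 601)                  # candidate constants c
mad = [np.abs(x - c).mean() for c in cs]         # mean absolute deviation per c
best_c = cs[int(np.argmin(mad))]

print(best_c, np.median(x))                      # both ≈ ln 2 ≈ 0.693
print(x.mean())                                  # ≈ 1.0, clearly different
```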
Mean, Median, Standard Deviation
• Theorem:
Mean(X) and Median(X) are within a distance of SD(X) of each other
• Proof:
• Distance between mean and median
= |E[X] – Median(X)|
= |E[X – Median(X)]|
This is |E[.]|, where |.| is a convex function. Apply Jensen’s inequality.
≤ E[|X – Median(X)|]
≤ E[|X – E[X]|] (because Median(X) minimizes expected absolute deviation)
= E[Sqrt{ (X – E[X])2 }]
This is E[Sqrt(.)], where Sqrt(.) is a concave function. Apply Jensen’s inequality.
≤ Sqrt{ E[ (X – E[X])2 ] }
= Sqrt{ Var(X) } = SD(X)
Law of Large Numbers
• This justifies why the expectation is motivated as an average over a
large number of random experiments (“long-term average”)
• Let random variables X1, …, Xi, …, Xn be ‘n’ independent and identically
distributed (i.i.d.), each with mean μ=E[Xi] and finite variance v=Var(Xi)
• Let the average, over ‘n’ experiments, be modeled by
a random variable X̄ := (X1 + … + Xn) / n
• Then, the expected average E[X̄] = μ, by the linearity of expectation
• But, in specific runs, how close is X̄ to the expectation μ ?
• So, we analyze the spread of X̄ around μ
• Var(X̄) = Var(X1/n) + … + Var(Xn/n) = n(v/n2) = v/n
Law of Large Numbers
• This justifies why the expectation is motivated as an average over a
large number of random experiments
• Law of large numbers: For all ε > 0, as n→∞, P(|X̄ – μ| ≥ ε) → 0
• Proof: Using Chebyshev’s inequality,
P(|X̄ – μ| ≥ ε)
≤ Var(X̄) / ε2
= v / (nε2)
→ 0, as n→∞
• Thus, as the average X̄ uses data from a larger number of experiments ‘n’,
the event of “X̄ being farther from μ than ε” has a probability that tends to 0
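A hedged simulation of the law of large numbers, assuming Python/NumPy and Uniform[0,1] data (μ = 0.5, v = 1/12); the run counts and ε are arbitrary choices, not from the slides. The empirical probability of |X̄ − μ| ≥ ε falls with n and stays below the Chebyshev bound v/(nε²).

```python
# Minimal sketch (illustrative assumption): law of large numbers for
# Uniform[0,1] data, where μ = 0.5 and v = Var(X_i) = 1/12.
import numpy as np

rng = np.random.default_rng(10)
mu, v, eps, runs = 0.5, 1.0 / 12.0, 0.05, 5000

for n in [10, 100, 1000]:
    xbar = rng.uniform(0.0, 1.0, size=(runs, n)).mean(axis=1)  # X̄ over 'runs' repetitions
    p_far = (np.abs(xbar - mu) >= eps).mean()                  # empirical P(|X̄ − μ| ≥ ε)
    print(n, p_far, v / (n * eps ** 2))                        # vs Chebyshev bound v/(nε²)
```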
Law of Large Numbers
• Example
• This also gives us a way to
compute an “estimate” of
the expectation μ of a
random variable X
from “observations”/data
• What is the estimate?
• X̄, the average of the observed values
Law of Large Numbers

[Figure source: www.nature.com/articles/nmeth.2613]
Covariance
• For random variables X and Y, consider the joint PMF/PDF P(X,Y)
• Covariance: A measure of how the values taken by X and Y vary
together (“co”-“vary”)
• Definition: Cov(X,Y) := E[(X – E[X])(Y – E[Y])]
• Interpretation:
• Define U(X) := X – E[X] and V(Y) := Y – E[Y] (Note: U and V have expectation 0)
• In the joint distribution P(U,V),
if larger (more +ve) values of U typically correspond to larger values of V, and
smaller (more –ve) values of U typically correspond to smaller values of V,
then U and V co-vary positively
• In the joint distribution P(U,V),
if larger values of U typically correspond to smaller values of V, and …
then U and V co-vary negatively
• Property: Symmetry: Cov(X,Y) = Cov(Y,X)
Covariance
• Examples
Covariance
• Property: Cov(X,Y) = E[XY] – E[X]E[Y]
• Proof:
• Cov(X,Y) = E[(X – E[X])(Y – E[Y])] = E[XY] – E[X]E[Y] – E[X]E[Y] + E[X]E[Y] = E[XY] – E[X]E[Y]
• So, Var(X+Y) = Var(X) + Var(Y) + 2(E[XY] – E[X]E[Y]) = Var(X) + Var(Y) + 2Cov(X,Y)
• Also, when X and Y are independent, then Cov(X,Y) = 0
• Property: When Var(X) and Var(Y) are finite, and one of them is 0,
then Cov(X,Y)=0
• Property: When Y := mX + c (with finite m), what is Cov(X,Y) ?
• Cov(X,Y) = E[XY] – E[X]E[Y]
= E[mX2 + cX] – E[X](m.E[X] + c)
= m.E[X2] – m(E[X])2 = m.Var(X)
• When Var(X)>0, covariance is ∝ line-slope ‘m’, and has same sign as that of m
Covariance
• Bilinearity of Covariance
• Let X, X1, X2, Y, Y1, Y2 be random variables. Let ‘a’ be a scalar constant.
• Property: Cov(X1 + X2, Y) = Cov(X1, Y) + Cov(X2, Y) = Cov(Y, X1 + X2)
• Proof (first part; second part follows from symmetry):
• Cov(X1 + X2, Y) = E[(X1 + X2)Y] − E[X1 + X2]E[Y]
= (E[X1Y] − E[X1]E[Y]) + (E[X2Y] − E[X2]E[Y]) = Cov(X1, Y) + Cov(X2, Y)
• Property: Cov(aX, Y) = a.Cov(X, Y) = Cov(X, aY)


• Proof (first part):
• Cov(aX, Y)
= E[ aXY ] − E[ aX ]E[ Y ]
= a (E[ XY ] − E[ X ]E[ Y ])
= a Cov(X,Y)
Standardized Random Variable
• Definition:
If X is a random variable, then its standardized form is given by
X* := (X – E[X]) / SD(X), where SD(.) gives the standard deviation
• Property: E[X*] = 0, Var(X*) = 1
• Proof:
E[X*] = (E[X] − E[X]) / SD(X) = 0;   Var(X*) = Var(X − E[X]) / SD(X)2 = Var(X)/Var(X) = 1
• X* is unit-less
• X* is obtained by:
• First shifting/translating X to make mean 0, and
• Then scaling the shifted variable to make variance 1
Correlation
• For covariance, the magnitude isn’t easy to interpret (unlike its sign)
• Correlation: A measure of how the values taken by X and Y vary
together (“co”-“relate”) obtained by rescaling covariance
• Pearson’s correlation coefficient
• Assuming X and Y are linearly related, correlation magnitude shows the
strength of the (functional/deterministic) relationship between X and Y
• Let ‘SD’ = standard deviation
• Definition: Cor(X,Y) := Cov(X,Y) / (SD(X)·SD(Y))
• Thus, Cor(X,Y) = E[X*Y*], where X* and Y* are the standardized variables
= E[X*Y*] – E[X*]E[Y*]   (since E[X*] = E[Y*] = 0)
= Cov(X*,Y*)
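A minimal sketch, assuming Python/NumPy and an arbitrary linear-plus-noise pair (X, Y) of my choosing: the average of the product of the standardized variables matches NumPy’s Pearson correlation.

```python
# Minimal sketch (illustrative assumption): Pearson's correlation as the
# covariance of the standardized variables, Cor(X,Y) = E[X*Y*].
import numpy as np

rng = np.random.default_rng(11)
x = rng.normal(0.0, 2.0, size=500_000)
y = 0.7 * x + rng.normal(0.0, 1.0, size=x.size)

xs = (x - x.mean()) / x.std()       # standardized X*
ys = (y - y.mean()) / y.std()       # standardized Y*

print((xs * ys).mean())             # E[X*Y*]
print(np.corrcoef(x, y)[0, 1])      # NumPy's Pearson correlation, same value
```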
Correlation
• Property: -1 ≤ Cor(X,Y) ≤ 1
• Proof:
• First inequality
• 0 ≤ E[(X*+Y*)2]
= E[(X*)2] + E[(Y*)2] + 2E[X*Y*]
= 2(1 + Cor(X,Y))
• So, –1 ≤ Cor(X,Y)

• Second inequality
• 0 ≤ E[(X*–Y*)2]
= E[(X*)2] + E[(Y*)2] – 2E[X*Y*]
= 2(1 – Cor(X,Y))
• So, Cor(X,Y) ≤ 1
Correlation
• Property: If X and Y are linearly related, i.e., Y = mX + c,
and are non-constant (i.e., SD(X)>0 and SD(Y)>0),
then |Cor(X,Y)| = 1
• Proof:
• When Y = mX + c, then SD(Y) = |m| SD(X)
• Cor(X,Y)
= Cov(X,Y) / (SD(X) SD(Y))
= mVar(X) / (SD(X) |m|SD(X))
= ±1
= sign of the slope m
Correlation
• Property: If |Cor(X,Y)| = 1, then X and Y are linearly related
• Proof:
• If Cor(X,Y) = 1, then E[(X*–Y*)2] = 2(1 – Cor(X,Y)) = 0
• For discrete X,Y: this must imply X*=Y* for all (x’,y’) where P(X=x’,Y=y’) > 0
• Else the summation underlying the expectation cannot be zero
• For continuous X,Y: this must imply X*=Y* for all measures (dx’,dy’) where P(dx’,dy’) > 0
• X* and Y* can be unequal only on a countable set of isolated points where P(dx’,dy’) > 0
• Else the integral underlying the expectation cannot be zero
• If Cor(X,Y) = (–1), then E[(X*+Y*)2] = 2(1 + Cor(X,Y)) = 0
• For discrete X,Y: this must imply X*=(–Y*) for all (x’,y’) where P(X=x’,Y=y’) > 0
• For continuous X,Y: this must imply X*=(–Y*) for all measures (dx’,dy’) where P(dx’,dy’) > 0
• Inequality can hold only on a countable set of isolated points where P(dx’,dy’) > 0
• If X* = ±Y*, then Y must be of the form mX+c
Correlation
• If |Cor(X,Y)|=1 (or Y=mX+c), then
how to find the equation of the line from data {(xi,yi): i=1,…,n}?
• By the way: line must pass through (E[X],E[Y])
• Because, when X=E[X], value of Y must be mE[X]+c, but that also equals E[Y]
• We proved that: if Y=mX+c, then |Cor(X,Y)|=1 and Y* = ±X* = Cor(X,Y) X*
• So, (Y – E[Y]) / SD(Y) = Cor(X,Y) (X – E[X]) / SD(X)
• So, Y = E[Y] + SD(Y) Cor(X,Y) (X – E[X]) / SD(X)
• So, Y = E[Y] + Cov(X,Y) (X – E[X]) / Var(X)
• This gives the equation of the line with:
• Slope m := Cov(X,Y) / Var(X)
• Intercept c := E[Y] – Cov(X,Y) E[X] / Var(X)
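A quick sketch of recovering the line from data, assuming Python/NumPy and an exactly linear example Y = 2X + 3 of my choosing (not from the slides).

```python
# Minimal sketch (illustrative assumption): recover Y = mX + c from data using
# slope m = Cov(X,Y)/Var(X) and the fact that the line passes through (E[X], E[Y]).
import numpy as np

rng = np.random.default_rng(12)
x = rng.uniform(-5.0, 5.0, size=100_000)
y = 2.0 * x + 3.0                     # exactly linear, so |Cor(X,Y)| = 1

cov_xy = ((x - x.mean()) * (y - y.mean())).mean()
m_hat = cov_xy / np.var(x)            # slope estimate
c_hat = y.mean() - m_hat * x.mean()   # intercept estimate

print(m_hat, c_hat)                   # ≈ 2.0 and 3.0
print(np.corrcoef(x, y)[0, 1])        # ≈ 1.0
```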
Correlation
• Examples
Correlation
• Four sets of data with the same correlation of 0.816
• Blue line indicates the line passing through (E[X],E[Y]) with slope = 0.816
(more on this when we study estimation)
• So, correlation = 0.816
doesn’t always mean that data
lies along a line of slope 0.816
• This indicates the likely
misinterpretation of correlation
when variables underlying data
aren’t linearly related
Correlation
• Zero correlation doesn’t imply independence

• We showed that independence implies zero covariance/correlation,
but the converse isn’t always true
• Example: Let X be uniformly distributed within [-1,+1]. Let Y := X2.
• Cov(X,X2) = E[X.X2] – E[X]E[X2] = E[X3] – 0.E[X2] = 0
• Thus, Cov(X,Y) = 0 = Cor(X,Y) even though Y is a deterministic function of X
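A small numerical confirmation of this example, assuming Python/NumPy (seed and sample size are arbitrary).

```python
# Minimal sketch (illustrative assumption): zero correlation does not imply
# independence. With X uniform on [-1, 1] and Y := X^2, Y is a deterministic
# function of X, yet Cov(X, Y) = 0.
import numpy as np

rng = np.random.default_rng(13)
x = rng.uniform(-1.0, 1.0, size=1_000_000)
y = x ** 2

cov = (x * y).mean() - x.mean() * y.mean()
print(cov)                         # ≈ 0 (E[X^3] = 0 by symmetry)
print(np.corrcoef(x, y)[0, 1])     # ≈ 0 as well, despite full dependence
```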
Correlation
• Non-zero correlation doesn’t imply causation
• https://hbr.org/2015/06/beware-spurious-correlations
• https://science.sciencemag.org/content/348/6238/980.2
• http://www.tylervigen.com/spurious-correlations
