Intro To Data Science Lecture 2
OUTLINE
1. Probability and Random Variables
2. Probability Distributions
3. Estimation
   Desirable Properties of Estimators
   Maximum Likelihood Estimator
4. Hypothesis Testing and Confidence Intervals
Lecture 1b. Review of Probability and Statistics
Fall 2022
Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with applications in R" (Springer, 1st Edition, 2013; 2nd Edition, 2021) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani.
1. PROBABILITY AND RANDOM VARIABLES

PROBABILITY
A measure of the expectation that an event will occur. The probability P(E) of an event E is a real number in the range of 0 to 1.

Axioms of Probability
For a universal set 𝒴 and an event B ⊆ 𝒴:
  0 ≤ P(B) ≤ 1
  P(𝒴) = 1
[Venn diagram: universal set 𝒴 containing subsets A and B]

Definition Based on Relative Frequency
An experiment is repeated n trials, and yᵢ is the outcome of the ith trial. Let y be defined on 𝒴, and B be a subset of 𝒴. The probability of B is the limiting relative frequency with which the outcomes fall in B.

Set Operations
Let 𝒴 be a universal set, and A and B be two subsets of 𝒴.
  Union of A and B: A ∪ B
If a and b are constants, and X and Y are random variables:
  E(a) = a,  Var(a) = 0
  E(aX + b) = aE(X) + b
  Var(aX + b) = a²Var(X)
  E(X + Y) = E(X) + E(Y)
  E(X − Y) = E(X) − E(Y)
  Var(X) = E(X²) − μ² = E(X²) − [E(X)]²
  Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
  Var(X − Y) = Var(X) + Var(Y) − 2Cov(X, Y)
  If X and Y are uncorrelated, Var(X ± Y) = Var(X) + Var(Y)
  E(X − μ) = 0, or E(X − E(X)) = 0
  E((aX)²) = a²E(X²)
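These identities can be checked numerically. The sketch below uses NumPy; the constants a, b, the distributions, and the sample size are arbitrary choices for illustration.

```python
# Numerical sanity check of the expectation/variance rules above.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
X = rng.normal(2.0, 3.0, n)
Y = 0.5 * X + rng.normal(0.0, 1.0, n)   # deliberately correlated with X
a, b = 2.0, 5.0

# E(aX + b) = aE(X) + b
assert abs(np.mean(a * X + b) - (a * X.mean() + b)) < 1e-9
# Var(aX + b) = a^2 Var(X)
assert abs(np.var(a * X + b) - a**2 * np.var(X)) < 1e-6
# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
cov = np.cov(X, Y, bias=True)[0, 1]     # population covariance (divide by n)
lhs = np.var(X + Y)
rhs = np.var(X) + np.var(Y) + 2 * cov
assert abs(lhs - rhs) < 1e-6
```

Note that `np.var` and `bias=True` both divide by n, so the identities hold exactly up to floating-point error rather than only approximately.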
9/6/2022
NORMAL DISTRIBUTION
A normal random variable X with mean μ and standard deviation σ is written X ~ N(μ, σ²). Its p.d.f. is

  f(x) = 1/(σ√(2π)) · exp(−(x − μ)²/(2σ²))

[Figure: p.d.f. f(x) of the unit normal distribution, plotted for x from −4 to 4]
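The p.d.f. above can be written out directly and checked against `scipy.stats.norm`; the test points and parameters below are arbitrary choices.

```python
# The normal p.d.f. written out explicitly, checked against scipy.stats.
import math
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """f(x) = 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))"""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# At x = mu the density peaks at 1/(sigma*sqrt(2*pi))
assert abs(normal_pdf(0.0, 0.0, 1.0) - 1 / math.sqrt(2 * math.pi)) < 1e-12
# Matches the library implementation at an arbitrary point
assert abs(normal_pdf(1.5, 2.0, 0.5) - norm.pdf(1.5, loc=2.0, scale=0.5)) < 1e-12
```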
PROPERTIES OF NORMAL DISTRIBUTIONS
If X ~ N(μ, σ²), then aX + b ~ N(aμ + b, a²σ²).
A linear combination of independent, identically distributed (i.i.d.) normal random variables will also be normally distributed.
If Y₁, Y₂, ⋯, Yₙ are i.i.d. and ~ N(μ, σ²), then

  Ȳ ~ N(μ, σ²/n)

CHI-SQUARE DISTRIBUTION
A chi-square distribution is defined as the sum of the squares of n independent unit normal distributions (n > 0), and n is named the degrees of freedom of the chi-square distribution. That is, if

  X = Σᵢ₌₁ⁿ Zᵢ²

in which Z₁, ⋯, Zₙ are independent, unit normal random variables, then X follows a chi-square distribution with n degrees of freedom, denoted as X ~ χ²ₙ.
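The definition can be checked by simulation: summing squares of unit normals should reproduce the chi-square mean n and variance 2n. A sketch; the choice n = 5 and the trial count are arbitrary.

```python
# Empirical check that the sum of squares of n unit normals behaves like
# a chi-square with n degrees of freedom (mean n, variance 2n).
import numpy as np

rng = np.random.default_rng(1)
n, trials = 5, 200_000
Z = rng.standard_normal((trials, n))
X = (Z ** 2).sum(axis=1)          # each row: sum of n squared unit normals

assert abs(X.mean() - n) < 0.05   # E[chi2_n] = n
assert abs(X.var() - 2 * n) < 0.3 # Var[chi2_n] = 2n
assert X.min() >= 0               # supported on the nonnegative region
```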
Some characteristics of chi-square distributions:
  The distribution is over the nonnegative region.
  With the increase of the number of degrees of freedom, the distribution shifts to the right (larger values) and tends to be more symmetric and bell-shaped (towards a normal distribution, based on the central limit theorem).

CENTRAL LIMIT THEOREM
If Y is the sum of n independent and identically distributed (i.i.d.) random variables X₁, X₂, ⋯, Xₙ, then the distribution of Y can be well-approximated by a normal distribution when n is large.
If Xᵢ (i = 1, 2, ⋯, n) has a mean μ and variance σ², then Y = Σᵢ Xᵢ has a mean nμ and a variance nσ².
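A quick CLT sketch: the sum of n i.i.d. Uniform(0, 1) variables (μ = 1/2, σ² = 1/12) has mean n·μ and variance n·σ², and its histogram looks increasingly normal as n grows. The values n = 30 and the trial count are arbitrary choices.

```python
# CLT sketch: sum of 30 uniforms has mean n*mu = 15 and variance n/12 = 2.5.
import numpy as np

rng = np.random.default_rng(2)
n, trials = 30, 100_000
Y = rng.random((trials, n)).sum(axis=1)   # each row: sum of 30 uniforms

assert abs(Y.mean() - n * 0.5) < 0.05     # n * mu
assert abs(Y.var() - n / 12.0) < 0.1      # n * sigma^2
```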
T DISTRIBUTION
A random variable T that follows a t-distribution (sometimes also called Student's t-distribution) with n degrees of freedom can be written as the quotient of two independent variables Z and R, where Z is unit normal and R is the root mean square of n other independent unit normal variables; that is,

  t = Z/R = Z / √(χ²ₙ/n)

Note that χ²ₙ is a chi-square distribution with n degrees of freedom; Z and χ²ₙ are statistically independent.

Characteristics of the t-distribution similar to the standardized normal distribution:
  Its p.d.f. is bell-shaped, symmetric about the mean (0).
  The mean, median, and mode of a t-distribution equal 0.
  Its p.d.f. curve never touches the x axis.

Characteristics of the t-distribution that differ from the standardized normal distribution:
  The variance of a t-distribution is greater than 1.
  The t-distribution is a family of curves based on the concept of degrees of freedom.

[Figure: p.d.f. f(x) of the unit normal compared with t-distributions for n = 1, 2, 5, 10]
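The "variance greater than 1" property can be verified with `scipy.stats.t`, whose variance is n/(n − 2) for n > 2 and approaches 1 as n grows. A sketch; the degrees of freedom below are arbitrary choices.

```python
# The t-distribution's variance n/(n-2) exceeds 1 and approaches 1 as n grows.
from scipy.stats import t, norm

for n in (3, 5, 10, 30):
    var = t(df=n).var()            # equals n / (n - 2)
    assert abs(var - n / (n - 2)) < 1e-9
    assert var > 1.0               # heavier tails than the unit normal

assert abs(norm().var() - 1.0) < 1e-12
```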
F DISTRIBUTION
A random variable F that follows an F distribution with degrees of freedom of n₁ and n₂ can be written as the quotient of two independent variables R₁ and R₂, where R₁ is the mean square of n₁ independent unit normal variables, and R₂ is the mean square of n₂ other independent unit normal variables; that is,

  F(n₁, n₂) = R₁/R₂ = (χ²ₙ₁/n₁) / (χ²ₙ₂/n₂)

Note that χ²ₙ₁ and χ²ₙ₂ are chi-square distributions with n₁ and n₂ degrees of freedom, respectively; χ²ₙ₁ and χ²ₙ₂ are statistically independent.
In hypothesis testing of the parameters of a linear regression model, the F distribution is used for testing a hypothesis involving multiple parameters. In contrast, the t-distribution is used for testing a hypothesis involving one parameter.

3. ESTIMATION
For a population with unknown distribution parameters (e.g., mean μ and variance σ²), we take a sample of N data points and obtain estimates of population characteristics from the sample.
  Estimator: a rule that gives a sample estimate.
  Estimate: a number that is calculated from the sample based on the estimator.
For example, let X represent the roughness of pavement in Florida. X can be treated as a R.V. (why?)
We are interested in the mean of X (i.e., E(X) = μ = ?).
We may measure the roughness at N randomly selected locations across the Florida pavement network, and get N samples of roughness, (X₁, X₂, ⋯, X_N). X₁, X₂, ⋯, X_N are i.i.d. (why?)
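The quotient definition of F can be checked by simulation against `scipy.stats.f`. A sketch; n₁ = 4, n₂ = 20 and the trial count are arbitrary choices.

```python
# Empirical sketch: the ratio of two independent mean-square chi-squares
# follows an F distribution (here, only the mean is compared).
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(7)
n1, n2, trials = 4, 20, 200_000
R1 = (rng.standard_normal((trials, n1)) ** 2).mean(axis=1)  # chi2_{n1}/n1
R2 = (rng.standard_normal((trials, n2)) ** 2).mean(axis=1)  # chi2_{n2}/n2
F = R1 / R2

# E[F(n1, n2)] = n2/(n2 - 2) for n2 > 2
assert abs(f(n1, n2).mean() - n2 / (n2 - 2)) < 1e-9
assert abs(F.mean() - f(n1, n2).mean()) < 0.05
```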
If X₁, X₂, ⋯, X_N are i.i.d. with variance σ², the variance of the sample mean is

  var(X̄) = var((1/N) Σᵢ Xᵢ) = (1/N²) var(Σᵢ Xᵢ) = (1/N²) Σᵢ var(Xᵢ) = (1/N²) · Nσ² = σ²/N
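The σ²/N result is easy to confirm by simulation: draw many samples of size N, compute each sample mean, and look at the variance of those means. The values of σ, N, and the repetition count below are arbitrary choices.

```python
# Simulation check that var(X_bar) = sigma^2 / N.
import numpy as np

rng = np.random.default_rng(3)
sigma, N, reps = 2.0, 25, 200_000
samples = rng.normal(0.0, sigma, (reps, N))
means = samples.mean(axis=1)              # one sample mean per repetition

assert abs(means.var() - sigma**2 / N) < 0.01   # sigma^2 / N = 0.16
```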
Example: Bernoulli. For N samples X = (X₁, ⋯, X_N) with N₁ = Σᵢ Xᵢ successes, the likelihood is

  L(X; p) = ∏ᵢ p^(Xᵢ) (1 − p)^(1−Xᵢ) = p^(ΣᵢXᵢ) (1 − p)^(N−ΣᵢXᵢ) = p^(N₁) (1 − p)^(N−N₁)

The log-likelihood is ℒ(X; p) = log L(X; p) = N₁ log p + (N − N₁) log(1 − p)

  dℒ(X; p)/dp = N₁/p − (N − N₁)/(1 − p) = 0  ⇒  p̂ = N₁/N = (1/N) Σᵢ Xᵢ = X̄

The MLE estimator happens to be the sample mean, which is unbiased and efficient.

Example: Normal. Setting the derivatives of the log-likelihood with respect to μ and σ² to zero gives two equations; the second is

  dl/d(σ²) = −n/(2σ²) + (1/(2σ⁴)) Σᵢ₌₁ⁿ (yᵢ − μ)² = 0

Fortunately, the first of these equations can be solved without knowledge of the second one. If we then use the result of the first solution in the second equation (substitute μ by its estimate μ̂), we can solve the second equation also. The result is the sample variance:

  s² = (1/n) Σᵢ₌₁ⁿ (yᵢ − μ̂)²

Note it is divided by n, not (n − 1), so this is a biased estimator!
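Both MLE examples above can be sketched in a few lines. The data are simulated; p, μ, σ, and the sample sizes are arbitrary choices for illustration.

```python
# MLE sketches for the Bernoulli and normal examples above.
import numpy as np

rng = np.random.default_rng(4)

# Bernoulli: the MLE of p is N1/N, i.e. the sample mean.
x = rng.binomial(1, 0.3, 10_000)
p_hat = x.mean()
assert abs(p_hat - 0.3) < 0.03            # close to the true p

# Normal: the MLE of sigma^2 divides by n (biased), unlike the
# usual unbiased estimator that divides by n - 1.
y = rng.normal(5.0, 2.0, 1_000)
s2_mle = ((y - y.mean()) ** 2).sum() / len(y)   # divide by n
assert abs(s2_mle - np.var(y)) < 1e-9           # np.var defaults to ddof=0
assert s2_mle < np.var(y, ddof=1)               # biased (smaller) than the /(n-1) version
```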
4. HYPOTHESIS TESTING AND CONFIDENCE INTERVALS

If the estimate is less than −θ_c or greater than θ_c (the critical values), we reject H₀. We know the sample mean is an unbiased and efficient estimator of the population mean.

If we select a level of statistical significance of α = 0.01, from a unit normal distribution table we may find that Z_(α/2) = 2.5758.

We compute the sample mean X̄ from samples (X₁, X₂, ⋯, X_N). If Z = (X̄ − μ₀)/(σ/√N) is in the acceptance region [−Z_(α/2), Z_(α/2)], we accept H₀.

Remember our test statistic Z = (X̄ − μ₀)/(σ/√N). If we replace σ with S, the test statistic does not follow a standardized normal distribution any more. Instead, it follows a t-distribution with N − 1 degrees of freedom.
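The two-sided test can be sketched as follows: with known σ, compare the Z statistic against ±Z_(α/2); with σ replaced by the sample standard deviation S, use a t-test with N − 1 degrees of freedom (here via `scipy.stats.ttest_1samp`). The values of μ₀, σ, N, and the simulated data are arbitrary choices.

```python
# Sketch of the two-sided test described above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu0, sigma, N = 10.0, 2.0, 50
x = rng.normal(mu0, sigma, N)          # data generated under H0

# Known sigma: Z = (X_bar - mu0) / (sigma / sqrt(N))
z = (x.mean() - mu0) / (sigma / np.sqrt(N))
z_crit = stats.norm.ppf(1 - 0.01 / 2)  # alpha = 0.01 -> 2.5758
assert abs(z_crit - 2.5758) < 1e-3
accept_z = -z_crit <= z <= z_crit      # accept H0 if inside [-Z_crit, Z_crit]

# Unknown sigma: replace sigma with S -> t-distribution with N-1 dof
t_stat, p_value = stats.ttest_1samp(x, popmean=mu0)
accept_t = p_value > 0.01              # accept H0 at alpha = 0.01
```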
The probability of making a Type I error is α. With α = 0.01, each tail of the rejection region carries α/2 = 0.005.

1 − β is the probability of correctly rejecting the null hypothesis when it is false. It is called the power of the test.

[Figure: p.d.f. f(x) of the unit normal with acceptance region [−Z_(α/2), Z_(α/2)] of probability 1 − α, and rejection regions of probability α/2 in each tail; the test statistic Z is computed from N samples]
CONFIDENCE INTERVAL
With known σ, the (1 − α) confidence interval for μ follows from

  P( X̄ − Z_(α/2)·σ/√n ≤ μ ≤ X̄ + Z_(α/2)·σ/√n ) = 1 − α

[Figure: unit normal p.d.f. with central probability 1 − α between −Z_(α/2) and Z_(α/2), and probability α/2 in each tail]
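Computing this interval takes only a few lines. A sketch; σ, n, α, and the simulated data are arbitrary choices.

```python
# (1 - alpha) confidence interval for mu with known sigma:
# [X_bar - Z_{alpha/2}*sigma/sqrt(n), X_bar + Z_{alpha/2}*sigma/sqrt(n)]
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
sigma, n, alpha = 2.0, 100, 0.05
x = rng.normal(10.0, sigma, n)

z = stats.norm.ppf(1 - alpha / 2)          # about 1.9600 for alpha = 0.05
half_width = z * sigma / np.sqrt(n)
lo, hi = x.mean() - half_width, x.mean() + half_width

assert abs(z - 1.95996) < 1e-4
assert abs((hi - lo) - 2 * half_width) < 1e-9
```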