Lec 1
2023
1.1. Concept
1.2. Branches of Statistics
1.3. Data Sources
1.4. Population and Sample
1.5. Structure of Classical Data
1.6. Types of Variable
1.7. Revision of Probability
Reference
Book [1], Chapter 1, pp. 1-9
Revision of Probability: Book [1], Chapters 2, 3, 4, 5
August 9, 2023 4 / 30
1.1. Concept
Opinion: It seems that at NEU the number of female students is greater than that of male students.
Statistics: Of 16,000 students in total, 10,000 are female (62.5%) and 6,000 are male (37.5%).
https://ptkt.vietstock.vn/
1.2. Branches of Statistics
Inferential Statistics
predict, forecast, and verify knowledge by analyzing data
1.3. Data Sources
1.4. Population and Sample
            Population                         Sample
Definition  Set of all elements of interest    Subset of the population
Size        N, possibly infinite               n, finite
Value       Parameter                          Statistic
[Figure: a population of N = 100 elements, with Sample 1 and Sample 2 drawn as two different subsets]
Variable
1.5. Classical Data Set
Observations (in row); variables (in column); values (in cell)
Qualitative vs Quantitative
Qualitative
Data obtained from categorical questions; values are words (or codes), and the "difference" between values has no measurable meaning.
Nominal (values cannot be ranked), e.g., names, addresses
Ordinal (values can be ranked), e.g., product quality rating (1: poor; 2: average; 3: good)
Quantitative
Data obtained from numerical variables; values are numbers, and the "difference" between values has a measurable meaning.
Interval: values are measured relative to an arbitrarily chosen zero point, e.g., temperature (degrees Celsius), time (Gregorian calendar)
Ratio: values have a true zero, so the ratio of two measurements is meaningful, e.g., weight, age
1.7. Revision of Probability
Probability of intersection
P(A ∩ B) = P(A)P(B|A)
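The multiplication rule can be checked by brute-force enumeration. The sketch below uses two fair dice; the events A (first die is even) and B (the sum is 8) are illustrative choices that do not come from the lecture.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair dice (illustrative assumption).
outcomes = list(product(range(1, 7), repeat=2))


def in_a(o):
    return o[0] % 2 == 0  # A: the first die is even


def in_b(o):
    return o[0] + o[1] == 8  # B: the sum is 8


def prob(event, space):
    """Probability of an event when all outcomes in `space` are equally likely."""
    return Fraction(sum(1 for o in space if event(o)), len(space))


p_a = prob(in_a, outcomes)
p_a_and_b = prob(lambda o: in_a(o) and in_b(o), outcomes)

# P(B|A): restrict the sample space to the outcomes where A occurs.
outcomes_in_a = [o for o in outcomes if in_a(o)]
p_b_given_a = prob(in_b, outcomes_in_a)

print(p_a_and_b, p_a * p_b_given_a)  # both sides of P(A ∩ B) = P(A) P(B|A)
```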
1.7. Revision: Random Variable (r.v)
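Discrete r.v. with PMF $p_x = P(X = x)$; continuous r.v. with PDF $f(x)$:
$$\sum_i p_{x_i} = 1 ; \qquad \int_{-\infty}^{+\infty} f(x)\,dx = 1$$
Expected value:
$$E(X) = \sum_i x_i p_{x_i} ; \qquad E(X) = \int_{-\infty}^{+\infty} x f(x)\,dx$$
Standard deviation:
$$\sigma_X = \sqrt{V(X)}$$
Covariance:
$$\mathrm{Cov}(X, Y) = E\big[(X - E(X))(Y - E(Y))\big] = E(XY) - E(X)E(Y)$$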
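To make the definitions concrete, here is a minimal sketch that computes the expected value, standard deviation, and covariance of a small discrete joint distribution. The joint PMF `joint` is a made-up example, not data from the lecture.

```python
import math

# Hypothetical joint PMF of (X, Y): keys are (x, y) pairs, values are probabilities.
joint = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}


def expect(g):
    """E[g(X, Y)] for the discrete joint distribution above."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


total_prob = sum(joint.values())           # must equal 1
e_x = expect(lambda x, y: x)               # E(X)
e_y = expect(lambda x, y: y)               # E(Y)
v_x = expect(lambda x, y: (x - e_x) ** 2)  # V(X)
sigma_x = math.sqrt(v_x)                   # standard deviation of X

# Covariance computed two ways: from the definition and from the shortcut formula.
cov_definition = expect(lambda x, y: (x - e_x) * (y - e_y))
cov_shortcut = expect(lambda x, y: x * y) - e_x * e_y

print(e_x, e_y, sigma_x, cov_definition, cov_shortcut)
```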
1.7. Revision: Random Variable (cont.)
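Correlation coefficient
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
Moments ($\mu_k$: raw moment of order $k$; $\mu^c_k$: central moment of order $k$)
$$E(X) = \mu_1 ; \qquad V(X) = \mu^c_2 = \mu_2 - \mu_1^2$$
Skewness and Kurtosis
$$\mathrm{Skew} = \frac{E\big[(X - E(X))^3\big]}{\sigma^3} ; \qquad \mathrm{Kurt} = \frac{E\big[(X - E(X))^4\big]}{\sigma^4}$$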
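As an illustration of the moment formulas, the following sketch computes the raw and central moments of a fair six-sided die (an assumed example) and checks that the variance equals $\mu_2 - \mu_1^2$. Exact fractions are used for the moments; skewness and kurtosis are converted to floats because they involve a square root.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, each face with probability 1/6.
values = range(1, 7)
p = Fraction(1, 6)


def raw_moment(k):
    """Raw moment of order k: E(X^k)."""
    return sum(Fraction(x) ** k * p for x in values)


def central_moment(k):
    """Central moment of order k: E[(X - E(X))^k]."""
    mean = raw_moment(1)
    return sum((x - mean) ** k * p for x in values)


mu1 = raw_moment(1)
mu2 = raw_moment(2)
variance = central_moment(2)          # equals mu2 - mu1^2
sigma = float(variance) ** 0.5

skew = float(central_moment(3)) / sigma ** 3   # 0 for a symmetric distribution
kurt = float(central_moment(4)) / sigma ** 4

print(mu1, variance, skew, kurt)
```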
1.7. Properties of Parameters
$E(c) = c$; $V(c) = 0$
$E(X + c) = E(X) + c$; $V(X + c) = V(X)$
$E(cX) = cE(X)$; $V(cX) = c^2 V(X)$
$E(X \pm Y) = E(X) \pm E(Y)$; $V(X \pm Y) = V(X) + V(Y) \pm 2\,\mathrm{Cov}(X, Y)$
$E\left(\sum_i X_i\right) = \sum_i E(X_i)$; $V\left(\sum_i X_i\right) = \sum_i V(X_i)$ if the $X_i$ are independent
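These properties can be verified numerically on any small distribution. The sketch below does so with exact fractions for a made-up joint PMF of two dependent variables; the distribution and the constant `c` are assumptions chosen only for illustration.

```python
from fractions import Fraction as F

# Hypothetical joint PMF of (X, Y); X and Y are deliberately not independent.
joint = {(0, 0): F(1, 5), (0, 1): F(3, 10), (1, 0): F(1, 10), (1, 1): F(2, 5)}


def E(g):
    """Expected value of g(X, Y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


def V(g):
    """Variance of g(X, Y), computed as E(g^2) - E(g)^2."""
    return E(lambda x, y: g(x, y) ** 2) - E(g) ** 2


def X(x, y):
    return x


def Y(x, y):
    return y


cov = E(lambda x, y: x * y) - E(X) * E(Y)
c = 3  # arbitrary constant

var_sum = V(lambda x, y: x + y)     # V(X + Y)
var_diff = V(lambda x, y: x - y)    # V(X - Y)
var_scaled = V(lambda x, y: c * x)  # V(cX)
var_shifted = V(lambda x, y: x + c) # V(X + c)

print(var_sum, V(X) + V(Y) + 2 * cov)
print(var_diff, V(X) + V(Y) - 2 * cov)
```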
Example
Example 1.1
Players A and B play a game that has no draws, and each is equally likely to
win any match. They intend to play 9 matches, and whoever wins more takes a
prize of 1 thousand USD. However, play stops after 7 matches with the score A:B at 4:3.
How should the money be divided?
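One possible approach to Example 1.1, sketched below, is to divide the prize in proportion to each player's probability of winning had the remaining matches been played. This fairness criterion is an assumption (it is the classical treatment of this kind of problem), not something the slide states. The code enumerates the equally likely outcomes of the 2 remaining matches.

```python
from fractions import Fraction
from itertools import product

PRIZE = 1000                      # USD
a_wins, b_wins, remaining = 4, 3, 2

# Every sequence of winners for the remaining matches is equally likely.
sequences = list(product("AB", repeat=remaining))

a_takes_prize = 0
for seq in sequences:
    final_a = a_wins + seq.count("A")
    final_b = b_wins + seq.count("B")
    if final_a > final_b:
        a_takes_prize += 1

p_a = Fraction(a_takes_prize, len(sequences))  # probability that A ends up ahead
share_a = PRIZE * p_a
share_b = PRIZE - share_a

print(p_a, share_a, share_b)
```

Under this assumption, B takes the prize only by winning both remaining matches, so A's share is three quarters of the prize.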
Example 1.2
A couple have an online appointment between 0:00 and 1:00; whoever arrives first
waits only 20 minutes. Find the probability that they meet.
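A minimal simulation sketch for Example 1.2 follows. It assumes that the two arrival times are independent and uniformly distributed over the hour, which the slide does not state explicitly. Under that assumption the two people meet when their arrival times differ by at most 20 minutes, and the geometric answer is $1 - (40/60)^2$.

```python
import random

random.seed(1)      # fixed seed so the run is reproducible

TRIALS = 200_000
WINDOW = 60.0       # length of the appointment window, in minutes
WAIT = 20.0         # how long the first arrival waits, in minutes

meetings = 0
for _ in range(TRIALS):
    t1 = random.uniform(0, WINDOW)  # assumed: uniform, independent arrivals
    t2 = random.uniform(0, WINDOW)
    if abs(t1 - t2) <= WAIT:
        meetings += 1

estimate = meetings / TRIALS
# Geometric answer: 1 minus the area of the two "miss" triangles in the unit square.
exact = 1 - ((WINDOW - WAIT) / WINDOW) ** 2

print(estimate, exact)
```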
Example 1.3
Consider the Vietnamese gambling game “danh de” and its profit X.
Find E(X) and V(X) when staking 1 (mil. VND) in one day;
Compare staking 10 mil. in one day with staking 1 mil. per day for 10 days.
1.7. Revision: Common Discrete Distributions
1.7. Revision: Common Continuous Distribution
1.7. Revision: Normal Distribution
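$$X \sim N(\mu, \sigma^2) \;\Rightarrow\; Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
$$P(X < b) = P\left(Z < \frac{b - \mu}{\sigma}\right) ; \qquad P(Z < b^\star) = P(X < \mu + b^\star \sigma)$$
[Figure: densities of $Z \sim N(0, 1)$ and $X \sim N(\mu, \sigma^2)$, linked by $Z = (X - \mu)/\sigma$ and $X = \mu + Z\sigma$; the point $b$ on the x-axis corresponds to $b^\star = (b - \mu)/\sigma$ on the z-axis, and $b = \mu + b^\star \sigma$]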
1.7. Revision: Critical Value
Definition
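The critical value at level $\alpha$ of the Z distribution, denoted $z_\alpha$, is the number such that
$$P(Z > z_\alpha) = \alpha$$
$z_{1-\alpha} = -z_\alpha$
$z_0 = +\infty$; $z_1 = -\infty$; $z_{0.5} = 0$
$z_{0.05} = 1.645$; $z_{0.025} = 1.96$
[Figure: density of $Z \sim N(0, 1)$ with upper-tail area $\alpha$ to the right of $z_\alpha$, and $z_{1-\alpha} = -z_\alpha$ marked on the left of 0]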
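Critical values of the standard normal distribution can be computed rather than read from a table. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_crit` is hypothetical. Because $z_\alpha$ is defined through the upper tail, it equals the quantile of order $1 - \alpha$.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_crit(alpha):
    """Upper-tail critical value z_alpha with P(Z > z_alpha) = alpha, for 0 < alpha < 1."""
    return Z.inv_cdf(1 - alpha)


for alpha in (0.05, 0.025):
    print(alpha, round(z_crit(alpha), 3))

# Symmetry of the normal density gives z_{1 - alpha} = -z_alpha.
print(z_crit(0.95), -z_crit(0.05))
```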
Example
Example 1.4
Income X is normally distributed with a mean of 500 USD and a variance of
400 USD².
(a) Find the probability that X > 510
(b) With probability 0.95, find the upper limit of X
(c) With probability 0.95, find the lower limit of X
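A possible numerical check for Example 1.4 is sketched below with `statistics.NormalDist`. It reads "upper limit" and "lower limit" as one-sided bounds, that is, $P(X < \text{upper}) = 0.95$ and $P(X > \text{lower}) = 0.95$; this reading is an assumption about what the exercise intends.

```python
from statistics import NormalDist

# Variance 400 USD^2 means a standard deviation of 20 USD.
income = NormalDist(mu=500, sigma=400 ** 0.5)

# (a) P(X > 510) = P(Z > 0.5)
p_above_510 = 1 - income.cdf(510)

# (b) One-sided upper limit: P(X < upper) = 0.95, i.e. mu + z_0.05 * sigma
upper = income.inv_cdf(0.95)

# (c) One-sided lower limit: P(X > lower) = 0.95, i.e. mu - z_0.05 * sigma
lower = income.inv_cdf(0.05)

print(round(p_above_510, 4), round(upper, 1), round(lower, 1))
```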
Example 1.5
X ∼ N(µ, σ 2 ). With probability of (1 − α)
(a) Find the upper limit of X
(b) Find the lower limit of X
(c) Find an interval around the mean that X falls into
1.7. Revision: Chi-squared Distribution
Definition
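If $Z_i \sim N(0, 1)$ are independent, then $X = \sum_{i=1}^{v} Z_i^2$ is chi-squared
distributed with $v$ degrees of freedom, denoted $X \sim \chi^2(v)$.
[Figure: density of $\chi^2(v)$ with upper-tail area $\alpha$ to the right of the critical value $\chi^2_{(v)\alpha}$]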
1.7. Revision: Student Distribution
Definition
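If $Z \sim N(0, 1)$ and $X \sim \chi^2(v)$ are independent, then $T = \dfrac{Z}{\sqrt{X/v}}$ is Student
distributed with $v$ degrees of freedom, denoted $T \sim T(v)$.
[Figure: Student densities for $v = 2$ and $v = 100$, with the critical value $t_{(100)\alpha}$ marked to the right of 0]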
1.7. Revision: Fisher Distribution
Definition
If $X_1 \sim \chi^2(v_1)$ and $X_2 \sim \chi^2(v_2)$ are independent, then $F = \dfrac{X_1 / v_1}{X_2 / v_2}$ is
Fisher distributed with $v_1, v_2$ degrees of freedom, denoted $F \sim F(v_1, v_2)$.
Reference
For the $\chi^2$, T, and F distributions, see Book [1], pp. 315-325.
Example
Example 1.6
Find the following critical values and explain their meaning in terms of probability:
$\chi^2_{(20)0.05}$, $\chi^2_{(20)0.95}$
$t_{(20)0.05}$, $t_{(20)0.95}$
$t_{(200)0.025}$
$f_{(2,10)0.05}$, $f_{(10,2)0.05}$, $f_{(2,10)0.95}$
Example 1.7
The error of a measurement is N(0, 1), and the repair cost is the square of the error.
With probability 0.95, find the upper limit of the cost of one measurement.
Find the probability that the total cost of 3 independent measurements is
greater than 6.49.
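Example 1.7 can be checked with the standard library alone, as sketched below. The cost of one measurement is $Z^2 \sim \chi^2(1)$, so $P(Z^2 < c) = P(|Z| < \sqrt{c})$ and the 0.95 upper limit is $z_{0.025}^2$. The total cost of 3 independent measurements is $\chi^2(3)$; the helper `chi2_3_sf` (a hypothetical name) uses the closed form of the chi-squared CDF that holds for exactly 3 degrees of freedom, $F(x) = \operatorname{erf}\left(\sqrt{x/2}\right) - \sqrt{2x/\pi}\, e^{-x/2}$, and does not apply to other degrees of freedom.

```python
import math
from statistics import NormalDist

# Part 1: cost = Z^2 with Z ~ N(0, 1).
# P(Z^2 < c) = 0.95  <=>  P(|Z| < sqrt(c)) = 0.95  <=>  sqrt(c) = z_0.025
z_0025 = NormalDist().inv_cdf(0.975)
cost_upper = z_0025 ** 2


# Part 2: the total cost of 3 independent measurements is chi-squared with 3 d.o.f.
def chi2_3_sf(x):
    """P(X > x) for X ~ chi2(3), using the closed-form CDF valid for 3 degrees of freedom."""
    cdf = math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    return 1 - cdf


p_total = chi2_3_sf(6.49)

print(round(cost_upper, 2), round(p_total, 3))
```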
1.7. Revision: Central Limit Theorem
Theorem (simplified)
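If $X_1, X_2, \ldots, X_n$ are independent and identically distributed with mean $\mu$ and
variance $\sigma^2$, then
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \;\xrightarrow{\,n \to \infty\,}\; \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$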
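The theorem can be illustrated by simulation. The sketch below draws many samples from an Exponential(1) distribution (mean 1, variance 1), which is clearly not normal, and checks that the sample means have mean close to $\mu$, variance close to $\sigma^2 / n$, and roughly 95% of their values within 1.96 standard errors of $\mu$. The distribution, sample size, and number of repetitions are illustrative assumptions.

```python
import random
from statistics import fmean, pvariance

random.seed(0)  # fixed seed so the run is reproducible

N = 50                  # sample size n (illustrative choice)
REPS = 10_000           # number of simulated samples
MU, SIGMA2 = 1.0, 1.0   # mean and variance of the Exponential(1) distribution

# Draw REPS samples of size N and record each sample mean.
sample_means = [
    fmean([random.expovariate(1.0) for _ in range(N)])
    for _ in range(REPS)
]

mean_of_means = fmean(sample_means)     # should be close to MU
var_of_means = pvariance(sample_means)  # should be close to SIGMA2 / N

# Share of sample means within 1.96 standard errors of MU;
# about 0.95 if the normal approximation is adequate.
half_width = 1.96 * (SIGMA2 / N) ** 0.5
coverage = sum(abs(m - MU) < half_width for m in sample_means) / REPS

print(mean_of_means, var_of_means, coverage)
```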
Practice: Microsoft Excel and R
Exercise