
MATHEMATICAL STATISTICS

Bachelor Program in Actuary and DSEB

Dong Xuan Bach


https://sites.google.com/view/xuanbachdong/home

2023

Dong Xuan Bach Mathematical Statistics 2023 1 / 30


Course Information

Total time: Lecture: 45 hours; Tutorial: 15 hours


Software: Microsoft Excel, R
Evaluation:
Attendance: 10%
Individual quizzes: 20%
Computer test: 20%
Final exam: 50%



References

1 Jay L. Devore, Kenneth N. Berk (2012), Modern Mathematical Statistics with Applications, 2nd Edition, Springer.
2 Irwin Miller, Marylees Miller (2014), John E. Freund's Mathematical Statistics with Applications, 8th Edition, Pearson.
3 Robert V. Hogg, Joseph W. McKean, Allen T. Craig (2013), Introduction to Mathematical Statistics, 7th Edition, Pearson.
4 Paul Newbold, William L. Carlson, Betty M. Thorne (2013), Statistics for Business and Economics, 8th Edition, Pearson.



Lec01. Introduction

1.1. Concept
1.2. Branches of Statistics
1.3. Data Sources
1.4. Population and Sample
1.5. Structure of Classical Data
1.6. Types of Variable
1.7. Revision of Probability

Reference
Book [1], Chapter 1, pp. 1-9
Revision of Probability: Book [1], Chapters 2-5

August 9, 2023 4 / 30
1.1. Concept
Opinion: It seems that at NEU there are more female students than male students.
Statistics: Of 16,000 students in total, 10,000 are female (62.5%) and 6,000 are male (37.5%).

Opinion: In general, the older the customers are, the less they spend.
Statistics:
Age     20-29   30-39   40+
Spend   35      28      23

Opinion: The growth trend will continue!
Statistics: [Figure: chart, source https://ptkt.vietstock.vn/]

1.2. Branches of Statistics

Two main branches:


Descriptive Statistics
organize, summarize, and present data in a convenient and informative way

Inferential Statistics
predict, forecast, and verify knowledge by analyzing data

1.3. Data Sources

                Primary data                        Secondary data
From            Self-conducted surveys,             Other parties,
                questionnaires, records             official reports, publications
Advantages      Relevant to the purpose,            Official, high accuracy,
                flexible, in-depth information      larger data sets, lower cost
Disadvantages   Costly, non-response,               Not completely relevant,
                missing information,                no further information available
                measurement errors,
                methodological errors, ...

1.4. Population and Sample

An investigation will typically focus on a well-defined collection of objects constituting a population of interest, e.g., the first-year students at NEU.
When the desired information is available for all objects in the population, we have what is called a census.
Constraints on time, money, and other scarce resources usually make a census impractical or infeasible.
Instead, a subset of the population, called a sample, is selected.

1.4. Population and Sample

                Population                        Sample
Definition      Set of all elements of interest   Subset of the population
Size            N, possibly infinite              n, finite
Value           Parameter                         Statistic


[Figure: a population of N = 100 elements, with two subsets of elements marked as Sample 1 and Sample 2.]
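
The distinction between a parameter (computed from the whole population) and a statistic (computed from a sample) can be sketched in a few lines. This is only an illustration in Python's standard library rather than the course's Excel or R; the population of ages is made up, and the seed and sample size n = 10 are arbitrary assumptions.

```python
import random
import statistics

# Hypothetical population: N = 100 made-up ages (not data from the lecture).
rng = random.Random(2023)  # fixed seed so the run is reproducible
population = [rng.randint(18, 25) for _ in range(100)]

# Parameter: computed from every element of the population.
population_mean = statistics.mean(population)

# Statistics: computed from samples of n = 10 drawn without replacement.
sample_1 = rng.sample(population, 10)
sample_2 = rng.sample(population, 10)

print("parameter (population mean):", population_mean)
print("statistic from sample 1:    ", statistics.mean(sample_1))
print("statistic from sample 2:    ", statistics.mean(sample_2))
```

The parameter is a single fixed number, while the statistic generally changes from one sample to the next.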

Variable

We are usually interested only in certain characteristics of the objects in a population, e.g., the amount of vitamin C in the pill, the gender of a mathematics graduate, the age at which the individual graduated, etc.
A variable is any characteristic whose value may change from one object to another in the population.
Denote variables by lowercase letters, e.g., x = the amount of vitamin C in a pill, y = the gender of a mathematics graduate, and z = the age at which an individual graduated.

1.5. Classical Data Set
Observations are in rows, variables in columns, and values in cells.

No.   Name       Sex      Age   Eng. mark   Math score   ...
1     Anderson   Male     20    B           73           ...
2     Berky      Female   19    A           80           ...
3     Charles    Male     20    C           72           ...
...   ...        ...      ...   ...         ...          ...

Observations on 1 variable: univariate data, e.g., the ages of the students in the table above
Observations on 2 variables: bivariate data, e.g., (Eng. mark, Math score)
Observations on more than 2 variables: multivariate data, e.g., (Name, Sex, Age)
Big data: methodology for data of many more types, with complex structure or no structure (unstructured data)
1.6. Classification of variables

One method of classification refers to the type and amount of information contained in the data: categorical or numerical variables
Another method is to classify data by levels of measurement: qualitative or quantitative variables

Qualitative vs Quantitative

Qualitative
Data obtained from categorical questions, values are words, and there is
no measurable meaning to the “difference” in numbers
Nominal (incomparable values), e.g., names, addresses
Ordinal (comparable values), e.g., product quality rating (1: poor; 2:
average; 3: good)

Quantitative
Data obtained from numerical variables, values are numerical, and there is
a measurable meaning to the “difference” in numbers
Interval: data are given relative to an arbitrarily determined benchmark, e.g., temperature (degrees Celsius), time (Gregorian calendar)
Ratio: the ratio of two measurements is meaningful, e.g., weight, age

1.7. Revision of Probability

Probability of intersection

P(A ∩ B) = P(A)P(B|A)

P(A ∩ B) = P(A)P(B) ⇔ A, B independent


Probability of union

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

P(A ∪ B) = P(A) + P(B) ⇔ A, B mutually exclusive


Both rules can be extended to n events A1, A2, ..., An
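
The intersection and union rules can be checked by brute-force counting on a small sample space. The sketch below uses two fair dice and exact fractions; the events A, B and D are illustrative choices, not taken from the lecture.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] == 6        # first die shows 6
B = lambda w: w[1] % 2 == 0    # second die is even (independent of A)
D = lambda w: w[0] == 1        # first die shows 1 (mutually exclusive with A)

p_a, p_b, p_d = prob(A), prob(B), prob(D)
p_a_and_b = prob(lambda w: A(w) and B(w))
p_a_or_b = prob(lambda w: A(w) or B(w))
p_a_or_d = prob(lambda w: A(w) or D(w))
p_b_given_a = p_a_and_b / p_a  # conditional probability P(B|A)

print(p_a_and_b == p_a * p_b_given_a)        # multiplication rule
print(p_a_and_b == p_a * p_b)                # A, B independent
print(p_a_or_b == p_a + p_b - p_a_and_b)     # addition rule
print(p_a_or_d == p_a + p_d)                 # A, D mutually exclusive
```

All four comparisons hold exactly because the probabilities are kept as fractions instead of floating-point numbers.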

1.7. Revision: Random Variable (r.v)
Discrete r.v.: PMF $p_x = P(X = x)$; continuous r.v.: PDF $f(x)$

$$\sum_i p_{x_i} = 1 \quad ; \quad \int_{-\infty}^{+\infty} f(x)\,dx = 1$$

Expected value:

$$E(X) = \sum_i x_i\, p_{x_i} \quad ; \quad E(X) = \int_{-\infty}^{+\infty} x f(x)\,dx$$

Variance and Standard Deviation

$$V(X) = E\big[(X - E(X))^2\big] = E(X^2) - \big[E(X)\big]^2$$

$$\sigma_X = \sqrt{V(X)}$$

Covariance

$$\mathrm{Cov}(X, Y) = E\big[(X - E(X))(Y - E(Y))\big] = E(XY) - E(X)E(Y)$$
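
As a small worked check of the expected value and variance formulas in the discrete case, the sketch below uses the PMF of a fair six-sided die (an illustrative choice, not an example from the lecture) and computes the variance both from its definition and from the shortcut form.

```python
from fractions import Fraction
from math import sqrt

# PMF of a fair six-sided die: p_x = 1/6 for x = 1, ..., 6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
assert sum(pmf.values()) == 1  # a PMF must sum to 1

mean = sum(x * p for x, p in pmf.items())                      # E(X)
var_def = sum((x - mean) ** 2 * p for x, p in pmf.items())     # E[(X - E(X))^2]
var_short = sum(x**2 * p for x, p in pmf.items()) - mean**2    # E(X^2) - [E(X)]^2
sd = sqrt(var_def)                                             # standard deviation

print("E(X) =", mean)
print("V(X) =", var_def, "=", var_short)
print("sd   =", round(sd, 4))
```

Both variance expressions give the same exact fraction, which is the identity stated on the slide.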

1.7. Revision: Random Variable (cont.)

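Correlation coefficient

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

$X, Y$ are independent $\Rightarrow \mathrm{Cov}(X, Y) = \rho_{X,Y} = 0$

Moment: raw moment and central moment of order $k$

$$\mu_k = E(X^k) \quad ; \quad \mu^c_k = E\big[(X - E(X))^k\big]$$

$$E(X) = \mu_1 \quad ; \quad V(X) = \mu^c_2 = \mu_2 - \mu_1^2$$

Skewness and Kurtosis

$$\mathrm{Skew} = \frac{E\big[(X - E(X))^3\big]}{\sigma^3} \quad ; \quad \mathrm{Kurt} = \frac{E\big[(X - E(X))^4\big]}{\sigma^4}$$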

1.7. Properties of Parameters

Properties of Expected value and Variance, with constants $a$, $b$, $c$

$E(c) = c$ ; $V(c) = 0$
$E(X + c) = E(X) + c$ ; $V(X + c) = V(X)$
$E(cX) = cE(X)$ ; $V(cX) = c^2 V(X)$
$E(X \pm Y) = E(X) \pm E(Y)$ ; $V(X \pm Y) = V(X) + V(Y) \pm 2\,\mathrm{Cov}(X, Y)$
$E\big(\sum_i X_i\big) = \sum_i E(X_i)$ ; $V\big(\sum_i X_i\big) = \sum_i V(X_i)$ if the $X_i$ are independent

$V(aX \pm bY) = a^2 V(X) + b^2 V(Y) \pm 2ab\,\mathrm{Cov}(X, Y)$

$\rho(aX, bY) = \rho_{X,Y}$ for $ab > 0$ (the sign flips when $ab < 0$)
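
The variance rule for a linear combination of two dependent variables can be verified numerically. In this sketch the variables are an assumed example: X is the first of two fair dice, Y is the sum of both dice (so X and Y are correlated), and the constants a = 2, b = -3 are arbitrary.

```python
from fractions import Fraction
from itertools import product

# Joint sample space of two fair dice, each outcome with probability 1/36.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(outcomes))

def E(g):
    """Expected value of g(d1, d2) over the joint sample space."""
    return sum(g(d1, d2) * p for d1, d2 in outcomes)

def V(g):
    """Variance of g(d1, d2), from the definition E[(g - E(g))^2]."""
    m = E(g)
    return E(lambda d1, d2: (g(d1, d2) - m) ** 2)

X = lambda d1, d2: d1         # first die
Y = lambda d1, d2: d1 + d2    # sum of both dice, correlated with X

cov_xy = E(lambda d1, d2: X(d1, d2) * Y(d1, d2)) - E(X) * E(Y)

a, b = 2, -3
lhs = V(lambda d1, d2: a * X(d1, d2) + b * Y(d1, d2))    # V(aX + bY) directly
rhs = a**2 * V(X) + b**2 * V(Y) + 2 * a * b * cov_xy     # via the formula

print("Cov(X, Y) =", cov_xy)
print("V(aX + bY) =", lhs, "; formula gives", rhs)
```

With a negative b the covariance term enters with a negative sign, which corresponds to the minus case of the rule.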

Example

Example 1.1
Players A and B play a game that has no draws, and they are equally likely to win each match. They intend to play 9 matches, and whoever wins more takes a prize of 1 thousand USD. The game stops after 7 matches, with A leading B by 4:3.
How should the money be distributed?
Example 1.2
Two people have an online appointment between 0:00 and 1:00; whoever arrives first waits only 20 minutes. Find the probability that they meet each other.
Example 1.3
Consider the Vietnamese gamble “danh de” and its profit X
Find E(X) and V(X) when playing 1 (mil. VND) in one day;
Compare playing 10 mil. in one day with playing for 10 days at 1 mil. each day.
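
One common way to approach Example 1.1 is to split the prize in proportion to each player's probability of ending up ahead if the remaining matches were played (the classical "problem of points" convention). That fairness rule is an assumption of this sketch, not something the slide states. The code enumerates every equally likely continuation of the two unplayed matches.

```python
from fractions import Fraction
from itertools import product

wins_a, wins_b = 4, 3    # score after 7 matches
remaining = 9 - 7        # matches not yet played
prize = 1000             # USD

# Every sequence of winners for the remaining matches is equally likely.
continuations = list(product("AB", repeat=remaining))

a_ahead = 0
for outcome in continuations:
    final_a = wins_a + outcome.count("A")
    final_b = wins_b + outcome.count("B")
    if final_a > final_b:
        a_ahead += 1

p_a = Fraction(a_ahead, len(continuations))   # probability that A finishes ahead
share_a = prize * p_a
share_b = prize * (1 - p_a)

print("P(A wins) =", p_a)
print("A receives", share_a, "USD; B receives", share_b, "USD")
```

B finishes ahead only by winning both remaining matches, so under this convention A receives three quarters of the prize.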

1.7. Revision: Common Discrete Distributions

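Distribution: probability mass function; support; $E(X)$; $V(X)$

Bernoulli $B(1, p)$: $P(X = x) = p^x (1-p)^{1-x}$; $x = 0, 1$; $E(X) = p$; $V(X) = p(1-p)$

Binomial $B(n, p)$: $P(X = x) = C_n^x\, p^x (1-p)^{n-x}$; $x = 0, 1, 2, \dots, n$; $E(X) = np$; $V(X) = np(1-p)$

Poisson $P(\lambda)$: $P(X = x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$; $x = 0, 1, 2, \dots$; $E(X) = \lambda$; $V(X) = \lambda$

Geometric $G(p)$: $P(X = x) = (1-p)^{x-1} p$; $x = 1, 2, \dots$; $E(X) = \dfrac{1}{p}$; $V(X) = \dfrac{1-p}{p^2}$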
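
The binomial row of the table can be checked numerically by summing over the PMF. The values n = 10 and p = 0.3 below are arbitrary illustrative choices.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(probs)                                              # should be 1
mean = sum(x * q for x, q in enumerate(probs))                  # should be n*p
var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))     # should be n*p*(1-p)

print("sum of PMF:", round(total, 10))
print("E(X):", round(mean, 10), "vs np =", n * p)
print("V(X):", round(var, 10), "vs np(1-p) =", n * p * (1 - p))
```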

1.7. Revision: Common Continuous Distribution

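Distribution: probability density function $f(x) > 0$; support; $E(X)$; $V(X)$

Uniform $U(a, b)$: $f(x) = \dfrac{1}{b-a}$; $x \in (a, b)$; $E(X) = \dfrac{a+b}{2}$; $V(X) = \dfrac{(b-a)^2}{12}$

Exponential $E(\lambda)$: $f(x) = \lambda e^{-\lambda x}$; $x > 0$; $E(X) = \dfrac{1}{\lambda}$; $V(X) = \dfrac{1}{\lambda^2}$

Normal $N(\mu, \sigma^2)$: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$; $x \in \mathbb{R}$; $E(X) = \mu$; $V(X) = \sigma^2$

Standardized Normal $N(0, 1)$: $f(z) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$; $z \in \mathbb{R}$; $E(Z) = 0$; $V(Z) = 1$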

1.7. Revision: Normal Distribution

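$$X \sim N(\mu, \sigma^2) \Rightarrow Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$

$$P(X < b) = P\left(Z < \frac{b - \mu}{\sigma}\right) \quad ; \quad P(Z < b^\star) = P(X < \mu + b^\star \sigma)$$

[Figure: densities of $Z \sim N(0, 1)$ and $X \sim N(\mu, \sigma^2)$, linked by $Z = (X - \mu)/\sigma$ and $X = \mu + Z\sigma$; the point $b = \mu + b^\star\sigma$ on the $x$ axis corresponds to $b^\star = (b - \mu)/\sigma$ on the $z$ axis.]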

1.7. Revision: Critical Value
Definition
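The critical value at level $\alpha$ of the standard normal distribution, denoted by $z_\alpha$, is the number such that

$$P(Z > z_\alpha) = \alpha$$

$z_{1-\alpha} = -z_\alpha$
$z_0 = +\infty$; $z_1 = -\infty$; $z_{0.5} = 0$
$z_{0.05} = 1.645$; $z_{0.025} = 1.96$

[Figure: density of $Z \sim N(0, 1)$; the area to the right of $z_\alpha$ is $\alpha$, and $z_{1-\alpha} = -z_\alpha$ lies symmetrically to the left of 0.]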
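
The critical values quoted above can be reproduced with an inverse-CDF call. The sketch uses `statistics.NormalDist` from Python's standard library instead of the course's Excel or R; the helper name `z_crit` is made up for this illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def z_crit(alpha):
    """Upper critical value z_alpha, defined by P(Z > z_alpha) = alpha."""
    return Z.inv_cdf(1 - alpha)

print(round(z_crit(0.05), 3))    # 1.645
print(round(z_crit(0.025), 3))   # 1.96
print(round(z_crit(0.95), 3))    # by symmetry of N(0, 1), equals -z_crit(0.05)
```

Note that `inv_cdf` only accepts probabilities strictly between 0 and 1, so the limiting cases $z_0$ and $z_1$ cannot be computed this way.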

Example

Example 1.4
Income X is normally distributed with a mean of 500 USD and a variance of 400 USD².
(a) Find the probability that X > 510
(b) With probability 0.95, find the upper limit of X
(c) With probability 0.95, find the lower limit of X

Example 1.5
X ∼ N(µ, σ 2 ). With probability of (1 − α)
(a) Find the upper limit of X
(b) Find the lower limit of X
(c) Find an interval around the mean that X falls into
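
A possible numerical treatment of Example 1.4 is sketched below with Python's standard library. It reads "upper limit with probability 0.95" as the value u with P(X < u) = 0.95 and "lower limit" as the value l with P(X > l) = 0.95; that reading is an assumption of this sketch.

```python
from statistics import NormalDist

# Variance 400 USD^2 means a standard deviation of 20 USD.
income = NormalDist(mu=500, sigma=20)

p_above_510 = 1 - income.cdf(510)     # (a) P(X > 510)
upper_limit = income.inv_cdf(0.95)    # (b) P(X < upper_limit) = 0.95
lower_limit = income.inv_cdf(0.05)    # (c) P(X > lower_limit) = 0.95

print("P(X > 510)  =", round(p_above_510, 4))
print("upper limit =", round(upper_limit, 1))
print("lower limit =", round(lower_limit, 1))
```

The two limits are the standardized form of Example 1.5 with α = 0.05: they equal μ + z_α σ and μ − z_α σ, so they sit symmetrically around the mean of 500.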

1.7. Revision: Chi-squared Distribution
Definition
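If $Z_1, \dots, Z_v \sim N(0, 1)$ are independent, then $X = \sum_{i=1}^{v} Z_i^2$ is Chi-squared distributed with $v$ degrees of freedom, denoted by $X \sim \chi^2(v)$.

Critical value at level $\alpha$, denoted by $\chi^2_{(v)\alpha}$ (see [1], Table A6, p. 796)

$$P\big(X > \chi^2_{(v)\alpha}\big) = \alpha$$

[Figure: density of $\chi^2(v)$; the area to the right of $\chi^2_{(v)\alpha}$ is $\alpha$.]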

1.7. Revision: Student Distribution

Definition
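If $Z \sim N(0, 1)$ and $X \sim \chi^2(v)$ are independent, then $T = \dfrac{Z}{\sqrt{X/v}}$ is Student distributed with $v$ degrees of freedom, denoted by $T \sim T(v)$.

Critical value $t_{(v)\alpha}$: $P(T > t_{(v)\alpha}) = \alpha$ (see [1], Table A5, p. 795)

$$T(v) \xrightarrow{v \to \infty} N(0, 1) \quad ; \quad t_{(v)\alpha} \xrightarrow{v \to \infty} z_\alpha$$

[Figure: densities of $T(v)$ for $v = 2$ and $v = 100$, with $t_{(100)\alpha}$ marked on the horizontal axis.]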

1.7. Revision: Fisher Distribution

Definition
If $X_1 \sim \chi^2(v_1)$ and $X_2 \sim \chi^2(v_2)$ are independent, then $F = \dfrac{X_1/v_1}{X_2/v_2}$ is Fisher distributed with $(v_1, v_2)$ degrees of freedom, denoted by $F \sim F(v_1, v_2)$.

Critical value at level $\alpha$, denoted by $f_{(v_1,v_2)\alpha}$ (see [1], Table A8, p. 799)

$$P\big(F > f_{(v_1,v_2)\alpha}\big) = \alpha$$

$$f_{(v_1,v_2)1-\alpha} = \frac{1}{f_{(v_2,v_1)\alpha}}$$

Reference
For the $\chi^2$, $T$, and $F$ distributions, see Book [1], pp. 315-325.

Example

Example 1.6
Find the following critical values and interpret each one as a probability statement
$\chi^2_{(20)0.05}$, $\chi^2_{(20)0.95}$
$t_{(20)0.05}$, $t_{(20)0.95}$
$t_{(200)0.025}$
$f_{(2,10)0.05}$, $f_{(10,2)0.05}$, $f_{(2,10)0.95}$

Example 1.7
The error of a measurement is N(0, 1), and the repair cost is the square of the error.
With probability 0.95, find the upper limit of the cost of one measurement
Find the probability that the total cost of 3 independent measurements is greater than 6.49
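
Example 1.7 can be sketched with the standard library only. The cost of one measurement is Z², which is χ²(1), so its 0.95 upper limit follows from the normal distribution. The total cost of 3 independent measurements is χ²(3); for that case the sketch relies on the closed-form upper tail of a chi-squared variable with exactly 3 degrees of freedom, P(X > x) = erfc(√(x/2)) + √(2x/π)·e^(−x/2). The helper name `chi2_sf_3df` is made up here, and in practice one would more likely use the Excel or R functions for the chi-squared distribution.

```python
from math import erfc, exp, pi, sqrt
from statistics import NormalDist

Z = NormalDist()  # measurement error ~ N(0, 1)

# Part 1: cost = Z^2. P(Z^2 < c) = P(-sqrt(c) < Z < sqrt(c)) = 0.95,
# so sqrt(c) is the 0.975 quantile of N(0, 1).
cost_limit = Z.inv_cdf(0.975) ** 2

def chi2_sf_3df(x):
    """Upper tail P(X > x) for X ~ chi-squared with 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

# Part 2: total cost of 3 independent measurements ~ chi-squared(3).
p_total_above = chi2_sf_3df(6.49)

print("0.95 upper limit of one cost:", round(cost_limit, 3))
print("P(total cost > 6.49):", round(p_total_above, 3))
```

The first result is the square of $z_{0.025}$, and the second probability comes out at roughly 0.09.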

1.7. Revision: Central Limit Theorem

Theorem (simplified)
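If $X_1, X_2, \dots, X_n$ are independent and identically distributed with mean $\mu$ and variance $\sigma^2$, then

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \xrightarrow{n \to \infty} N\left(\mu, \frac{\sigma^2}{n}\right)$$

In practice, $n > 30$ is commonly taken as enough to apply the CLT.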


Example 1.8
The weight of an egg is uniformly distributed on the interval (50, 62) g. Find
the probability that the average weight of 100 eggs is less than 56.5 g.
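
Example 1.8 can be approximated with the CLT as follows. The sketch takes the mean and variance of U(a, b) from the formulas (a + b)/2 and (b − a)²/12, and treats the sample mean of n = 100 eggs as approximately normal, which is the simplification the theorem suggests rather than an exact result.

```python
from math import sqrt
from statistics import NormalDist

a, b, n = 50, 62, 100

mu = (a + b) / 2              # mean of U(50, 62): 56
var = (b - a) ** 2 / 12       # variance of U(50, 62): 12

# CLT approximation: the sample mean is roughly N(mu, var / n).
sample_mean = NormalDist(mu=mu, sigma=sqrt(var / n))
p_lighter = sample_mean.cdf(56.5)

print("P(average weight < 56.5 g) is about", round(p_lighter, 4))
```

The standard deviation of the sample mean is √0.12 ≈ 0.346 g, so 56.5 g lies about 1.44 standard deviations above the mean.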

Practice: Microsoft Excel and R

Distribution   Value        Excel 2016                   R
N(µ, σ²)       f(x)         norm.dist(x, µ, σ, 0)        dnorm(x, µ, σ)
               P(X < x)     norm.dist(x, µ, σ, 1)        pnorm(x, µ, σ)
               qβ           norm.inv(β, µ, σ)            qnorm(β, µ, σ)
               xα           norm.inv(1 − α, µ, σ)        qnorm(1 − α, µ, σ)
N(0, 1)        P(Z < z)     norm.dist(z, 0, 1, 1)        pnorm(z)
               qβ           norm.inv(β, 0, 1)            qnorm(β)
               zα           norm.inv(1 − α, 0, 1)        qnorm(1 − α)
χ²(v)          P(X < x)     chisq.dist(x, v, 1)          pchisq(x, v)
               χ²(v)α       chisq.inv(1 − α, v)          qchisq(1 − α, v)
T(v)           P(X < x)     t.dist(x, v, 1)              pt(x, v)
               t(v)α        t.inv(1 − α, v)              qt(1 − α, v)
F(v1, v2)      P(X < x)     f.dist(x, v1, v2, 1)         pf(x, v1, v2)
               f(v1,v2)α    f.inv(1 − α, v1, v2)         qf(1 − α, v1, v2)

Note: in both Excel and R, the third argument σ is the standard deviation, not the variance σ².

Exercise

Use Microsoft Excel or R to find the following probabilities and the corresponding critical values
$P[X < 125 \mid X \sim N(100, 20^2)]$
$P[X < 1.25 \mid X \sim N(0, 1)]$
$P[X < 1.25 \mid X \sim T(10)]$
$P[X < 25 \mid X \sim \chi^2(10)]$
$P[X < 2.5 \mid X \sim F(10, 20)]$
Use Microsoft Excel or R to find the following critical values and the corresponding probabilities
$z_{0.2}$
$t_{(10)0.15}$
$\chi^2_{(20)0.25}$
$f_{(20,30)0.12}$
