STATISTICS 101 - Day 3 - Discrete Probability Distribution + Normal - T3
PROBABILITY DISTRIBUTIONS
MODULE 2
WEEK  TOPIC                                                      DATE
1     Start of Classes                                           Monday, 8 May 2023
      I. REVIEW OF PROBABILITY                                   Thursday, 11 May 2023
      II. PROBABILITY DISTRIBUTIONS
        Concept of a Random Variable
2       Concept of a Probability Distribution
        Types of Probability Distributions                       Monday, 15 May 2023
        Mean of a Discrete Random Variable
        Variance of a Discrete Random Variable
        Properties of the Mean and Variance
                                                                 Thursday, 18 May 2023
      III. DISCRETE PROBABILITY DISTRIBUTIONS
        Uniform Distribution
3       Bernoulli and Binomial Distributions
        Hypergeometric Distribution                              Monday, 22 May 2023
        Negative Binomial and Geometric Distributions
        Poisson Distribution
      IV. CONTINUOUS PROBABILITY DISTRIBUTIONS
        Uniform Distribution                                     Thursday, 25 May 2023
        Normal Distribution
4       Areas under the Normal Curve
                                                                 Monday, 29 May 2023
      QUIZ 1                                                     Thursday, 1 June 2023
      V. SAMPLING AND SAMPLING DISTRIBUTIONS
        Sampling Distribution of the Sample Mean
5       Mean and Variance of the Sampling Distribution
        Central Limit Theorem
                                                                 Monday, 5 June 2023
      VI. ESTIMATION OF PARAMETERS
        Types of Estimates
        Estimating the Mean                                      Thursday, 8 June 2023
      INDEPENDENCE DAY                                           Monday, 12 June 2023
        Estimating the Difference Between Two Means**
6       Estimating a Proportion
        Estimating the Difference Between Two Proportions**      Monday, 19 June 2023
        Estimating the Variance
        Estimating the Ratio of Two Variances**
        Sample Size Determination                                Thursday, 22 June 2023
      VII. HYPOTHESIS TESTING
        Elements of a Statistical Test of Hypothesis
7       One-Tailed and Two-Tailed Tests and P-Value
        Steps in Testing Hypotheses
        Tests Concerning One Mean                                Monday, 26 June 2023
        Tests Concerning the Difference Between Two Means
        Tests Concerning a Proportion
        Tests Concerning the Difference Between Two Proportions  Thursday, 29 June 2023
        Tests Concerning the Variance
        Tests Concerning the Ratio of Two Variances
8                                                                Monday, 3 July 2023
      Exercises / Review                                         Thursday, 6 July 2023
9     QUIZ 2                                                     Monday, 10 July 2023
      VIII. CHI-SQUARE TESTS
        Test of Homogeneity of More Than Two Proportions
        Test for Independence
        Goodness-of-Fit Test
        Post Hoc Analysis: Tukey-Kramer Test                     Thursday, 13 July 2023
      IX. ANALYSIS OF VARIANCE
        Assumptions of the Analysis of Variance
        Test on the Equality of Several Variances
        One-Way Analysis of Variance
10                                                               Monday, 17 July 2023
        Two-Way Analysis of Variance**
      X. REGRESSION AND CORRELATION
        Correlation Analysis
        Pearson’s Correlation Coefficient
        Test of Significance of                                  Thursday, 20 July 2023
        Regression Analysis
        Simple Linear Regression Model
        Coefficient of Determination
        Test of Significance of β1
11                                                               Monday, 24 July 2023
      Quiz 3                                                     Thursday, 27 July 2023
MODULE 2
DISCRETE PROBABILITY DISTRIBUTIONS
SPECIAL PROBABILITY DISTRIBUTIONS
Discrete and Continuous Distributions
[Diagram: Discrete Distributions vs. Continuous Distributions]
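Example (discrete uniform): $P(X = x) = \frac{1}{28}$ for $x = 1, 2, \ldots, 28$, and $P(X = x) = 0$ otherwise; for instance, $P(X = 8) = \frac{1}{28}$.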
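A minimal Python sketch of this discrete uniform pmf, assuming (as the slide residue suggests) that X is equally likely to be any of 1, 2, …, 28. The function name uniform_pmf is illustrative, and exact fractions are used so the probabilities sum to exactly 1.

```python
from fractions import Fraction

def uniform_pmf(x: int, k: int = 28) -> Fraction:
    """P(X = x) for a discrete uniform X on {1, 2, ..., k}; 0 otherwise."""
    return Fraction(1, k) if 1 <= x <= k else Fraction(0)

print(uniform_pmf(8))                               # 1/28
print(uniform_pmf(0), uniform_pmf(29))              # 0 0 (outside the support)
print(sum(uniform_pmf(x) for x in range(1, 29)))    # 1 (a valid pmf)
```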
𝑋~𝐵𝑒(𝑝)
BERNOULLI TRIAL
• Suppose a newborn child is chosen at random. Let X = 0 if the chosen child is female, and X = 1 if male.
Outcome   M     F
X         1     0
P(X)      1/2   1/2

Outcome   Winning   Losing
X         1         0
P(X)      1/30      29/30
Bernoulli Distribution
A random variable X is said to be a Bernoulli random variable if its probability mass function is given by
$f(x) = f(x; p) = P[X = x] = \begin{cases} p^x (1-p)^{1-x}, & x = 0, 1 \\ 0, & \text{otherwise} \end{cases}$
X      0      1
P(X)   1-p    p
𝑋~𝐵𝑒(𝑝)
Bernoulli Distribution
• If X has a Bernoulli distribution, then
$E(X) = 0 \cdot q + 1 \cdot p = p$
$V(X) = E(X - p)^2 = (0 - p)^2 q + (1 - p)^2 p = p^2 q + q^2 p = pq(p + q) = pq$
$m_X(t) = q + pe^t$
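As a sanity check of the mean and variance derivation, the sketch below computes E(X) and V(X) directly from the two-point pmf by summing over x = 0, 1. The helper name bernoulli_moments is hypothetical; p = 1/30 is taken from the winning/losing table.

```python
def bernoulli_moments(p: float) -> tuple[float, float]:
    """Mean and variance of X ~ Be(p), computed directly from the pmf."""
    q = 1 - p
    pmf = {0: q, 1: p}                                    # P(X = 0) = q, P(X = 1) = p
    mean = sum(x * prob for x, prob in pmf.items())       # 0*q + 1*p
    var = sum((x - mean) ** 2 * prob for x, prob in pmf.items())  # E(X - p)^2
    return mean, var

mean, var = bernoulli_moments(1 / 30)    # winning/losing example
print(mean, var, (1 / 30) * (29 / 30))   # variance agrees with pq
```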
𝑋~𝐵𝑖(𝑛, 𝑝)
BINOMIAL EXPERIMENT
Consider the experiment consisting of n independent and identical Bernoulli trials,
and observe the total number of successes.
This experiment is called a binomial experiment.
Example: toss a coin 4 times and count how many times a head appears. The sample space has $n(S) = 2 \cdot 2 \cdot 2 \cdot 2 = 16$ equally likely outcomes.
x   Outcome                               P(x)
0   TTTT                                  1/16
1   HTTT, THTT, TTHT, TTTH                4/16
2   HHTT, HTHT, HTTH, THHT, THTH, TTHH    6/16
3   HHHT, HHTH, HTHH, THHH                4/16
4   HHHH                                  1/16
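The table can be reproduced by brute force: list all 16 sequences of H and T and count the heads in each. This Python sketch (not part of the original notes) builds the sample space with itertools.product.

```python
from collections import Counter
from itertools import product

# All 2**4 = 16 equally likely outcomes of tossing a fair coin 4 times
outcomes = ["".join(seq) for seq in product("HT", repeat=4)]
heads = Counter(o.count("H") for o in outcomes)   # X = number of heads

for x in range(5):
    print(x, f"{heads[x]}/{len(outcomes)}")       # 0 1/16, 1 4/16, 2 6/16, 3 4/16, 4 1/16
```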
𝑋~𝐵𝑖(𝑛, 𝑝)
BINOMIAL DISTRIBUTION
Model for outcomes limited to two choices (e.g., pass or fail, win or lose, sick or well, dead or alive):
1. Each trial has only two possible outcomes – success or failure.
2. The outcome of each trial is independent of the outcomes of any other trial.
3. Each trial is a Bernoulli trial.
4. The probability of success, p, is constant from trial to trial (so the probability of failure, q = 1 – p, is also constant).
𝑋~𝐵𝑖(𝑛, 𝑝)
BINOMIAL DISTRIBUTION
The probability of an event consisting of x successes out of n trials is
$P(X = x) = \binom{n}{x} p^x q^{n-x} = \frac{n!}{x!\,(n-x)!} p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n$
where
n = number of trials in an experiment
x = number of successes
n − x = number of failures
p = probability of success
q = 1 − p, probability of failure

For the coin example (n = 4, p = q = 1/2), the formula reproduces the probabilities found by listing outcomes:
x   Outcome                               P(x) from the formula                  P(x)
0   TTTT                                  $\binom{4}{0}(1/2)^0(1/2)^{4-0}$        1/16
1   HTTT, THTT, TTHT, TTTH                $\binom{4}{1}(1/2)^1(1/2)^{4-1}$        4/16
2   HHTT, HTHT, HTTH, THHT, THTH, TTHH    $\binom{4}{2}(1/2)^2(1/2)^{4-2}$        6/16
3   HHHT, HHTH, HTHH, THHH                $\binom{4}{3}(1/2)^3(1/2)^{4-3}$        4/16
4   HHHH                                  $\binom{4}{4}(1/2)^4(1/2)^{4-4}$        1/16
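A short Python sketch of the binomial pmf using exact Fraction arithmetic; with n = 4 and p = 1/2 it reproduces the 1/16, 4/16, 6/16, 4/16, 1/16 column of the coin example. binomial_pmf is an illustrative name, and math.comb needs Python 3.8+.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x: int, n: int, p: Fraction) -> Fraction:
    """P(X = x) = C(n, x) p^x q^(n - x) for X ~ Bi(n, p)."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

half = Fraction(1, 2)
table = [binomial_pmf(x, 4, half) for x in range(5)]
print([f"{int(prob * 16)}/16" for prob in table])  # ['1/16', '4/16', '6/16', '4/16', '1/16']
```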
𝑋~𝐵𝑖(𝑛, 𝑝)
Binomial Distribution
• Mean of a binomial random variable: μ = np
• Variance of a binomial random variable: σ² = npq
Example
The shooting average of LeBron James is 75%. Suppose that he attempts 12 shots.
1. What is the probability that he will make exactly 10 shots?
2. What is the probability that he will make at least 11 shots?
3. What is the probability that he will make more than two shots?
4. What is the expected number of shots made, and what is its variance?
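$P(X = x) = \binom{12}{x} (0.75)^x (0.25)^{12-x}$, for x = 0, 1, 2, …, 12
1. $P(X = 10) = \binom{12}{10} (0.75)^{10} (0.25)^{12-10}$
2. $P(X \ge 11) = \binom{12}{11} (0.75)^{11} (0.25)^{12-11} + \binom{12}{12} (0.75)^{12} (0.25)^{12-12}$
3. $P(X > 2) = 1 - P(X \le 2) = 1 - \left[\binom{12}{0} (0.75)^0 (0.25)^{12-0} + \binom{12}{1} (0.75)^1 (0.25)^{12-1} + \binom{12}{2} (0.75)^2 (0.25)^{12-2}\right]$
4. $E(X) = np = 12(0.75) = 9$, $V(X) = npq = 12(0.75)(0.25) = 9(0.25) = 2.25$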
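The four answers can be evaluated numerically with a sketch like the one below (Python, illustrative only). Question 3 goes through the complement, exactly as in the worked solution.

```python
from math import comb

n, p = 12, 0.75
q = 1 - p

def pmf(x: int) -> float:
    """P(X = x) for X ~ Bi(12, 0.75): number of shots made out of 12."""
    return comb(n, x) * p**x * q**(n - x)

p_exactly_10 = pmf(10)                                # question 1
p_at_least_11 = pmf(11) + pmf(12)                     # question 2
p_more_than_2 = 1 - sum(pmf(x) for x in range(3))     # question 3, via the complement
mean, var = n * p, n * p * q                          # question 4

print(round(p_exactly_10, 4), round(p_at_least_11, 4), round(p_more_than_2, 6))
print(mean, var)                                      # 9.0 2.25
```

Evaluated this way, P(X = 10) ≈ 0.2323, P(X ≥ 11) ≈ 0.1584 and P(X > 2) ≈ 0.99996.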
Example
𝑋~𝐵𝑖(𝑛, 𝑝)
Example
𝑋~𝐵𝑖(𝑛, 𝑝)
EXERCISES:
BINOMIAL DISTRIBUTION TABLE
OTHER USEFUL VIDEOS…
https://youtu.be/e04_wUoscBU
https://youtu.be/CcbhNYtNtn4
https://youtu.be/J8jNoF-K8E8
𝑋~𝐻𝑦(𝑁, 𝑛, 𝐾)
HYPERGEOMETRIC DISTRIBUTION
PROPERTIES of a HYPERGEOMETRIC DISTRIBUTION
• A random sample of size n is selected without replacement from N items.
• k of the N items may be classified as successes and N – k are classified as failures.
• The number X of successes of a hypergeometric experiment is called a
hypergeometric random variable.
K=8
N=12
n=6
HYPERGEOMETRIC DISTRIBUTION
• The probability distribution of the hypergeometric random variable X, the number of successes in a random sample of size n selected from N items of which k are labeled success and N – k labeled failure, is
$h(x; N, n, k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}, \quad \max(0,\, n-(N-k)) \le x \le \min(n, k)$
𝑋~𝐻𝑦(𝑁, 𝑛, 𝐾)
HYPERGEOMETRIC DISTRIBUTION
Example: N = 12 balls, of which K = 8 are green; a sample of n = 6 is drawn without replacement. Let X = number of green balls drawn.
$h(x; N, n, K) = h(x; 12, 6, 8) = \frac{\binom{8}{x}\binom{4}{6-x}}{\binom{12}{6}}$
$P(X = 3) = h(3; 12, 6, 8) = \frac{\binom{8}{3}\binom{4}{3}}{\binom{12}{6}} = \frac{\frac{8!}{3!\,5!}\cdot\frac{4!}{3!\,1!}}{\frac{12!}{6!\,6!}} = \frac{\frac{8\cdot 7\cdot 6}{3\cdot 2}\cdot\frac{4}{1}}{\frac{12\cdot 11\cdot 10\cdot 9\cdot 8\cdot 7}{6\cdot 5\cdot 4\cdot 3\cdot 2\cdot 1}} = \frac{224}{924} \approx 0.2424$
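A Python sketch to check the arithmetic of the ball example (N = 12, K = 8, n = 6). It computes C(8, 3)·C(4, 3) = 224 and C(12, 6) = 924, so P(X = 3) = 224/924 = 8/33 ≈ 0.2424. The helper hypergeom_pmf is hypothetical, not a library function.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x: int, N: int, n: int, K: int) -> Fraction:
    """h(x; N, n, K) = C(K, x) C(N - K, n - x) / C(N, n)."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))

# N = 12 balls, K = 8 green, n = 6 drawn without replacement; X = number of green balls
p3 = hypergeom_pmf(3, N=12, n=6, K=8)
print(comb(8, 3) * comb(4, 3), comb(12, 6), p3, float(p3))
```

math.comb returns 0 when asked to choose more items than exist, so values of x outside the support simply get probability 0.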
𝑋~𝐻𝑦(𝑁, 𝑛, 𝐾)
Examples of Hypergeometric Variable
• Suppose a committee has 20 members: 4 officers and 16 other members. A random sample of 5 will be chosen. Let X be the number of officers chosen.
• N = 20, n = 5, K = 4
• $h(x; 20, 5, 4) = \frac{\binom{4}{x}\binom{16}{5-x}}{\binom{20}{5}}$
• $E(X) = 5 \cdot \frac{4}{20} = 1$
• $V(X) = 5 \cdot \frac{4}{20} \cdot \frac{20-4}{20} \cdot \frac{20-5}{20-1} = 5 \cdot \frac{1}{5} \cdot \frac{16}{20} \cdot \frac{15}{19} = \frac{12}{19} \approx 0.63$
• $h(x; 10, 4, 1) = \frac{\binom{1}{x}\binom{9}{4-x}}{\binom{10}{4}}$
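To see where the committee's E(X) = 1 and V(X) = 12/19 come from, this sketch sums over the pmf directly and compares the results with the closed forms E(X) = nK/N and V(X) = n(K/N)((N − K)/N)((N − n)/(N − 1)). Python and the helper name h are illustrative choices.

```python
from fractions import Fraction
from math import comb

N, n, K = 20, 5, 4   # 20 committee members, 4 of them officers, sample of 5

def h(x: int) -> Fraction:
    """h(x; 20, 5, 4): probability that exactly x officers are chosen."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))

support = range(0, min(n, K) + 1)
mean = sum(x * h(x) for x in support)
var = sum((x - mean) ** 2 * h(x) for x in support)

# Compare with the closed-form mean and variance
print(mean, n * Fraction(K, N))                                               # 1 1
print(var, n * Fraction(K, N) * Fraction(N - K, N) * Fraction(N - n, N - 1))  # 12/19 12/19
```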
𝑋~𝐻𝑦(𝑁, 𝑛, 𝐾)
Exercise
A piece of electronic equipment contains six computer chips, two of which are defective. Three chips are selected at random, removed from the equipment, and inspected. N = 6, n = 3, K = 2
a. What is the probability that no defective computer chip is found?
b. What is the probability that at most one defective computer chip is found?
c. What is the expected number of defective computer chips selected?
$h(x; 6, 3, 2) = \frac{\binom{2}{x}\binom{4}{3-x}}{\binom{6}{3}}$
a. $P(X = 0) = h(0; 6, 3, 2) = \frac{\binom{2}{0}\binom{4}{3-0}}{\binom{6}{3}} = \frac{4}{20} = 0.2$
b. $P(X \le 1) = h(0; 6, 3, 2) + h(1; 6, 3, 2) = \frac{\binom{2}{0}\binom{4}{3-0}}{\binom{6}{3}} + \frac{\binom{2}{1}\binom{4}{3-1}}{\binom{6}{3}} = \frac{4 + 12}{20} = 0.8$
c. $E(X) = 3 \cdot \frac{2}{6} = 1$, $V(X) = 3 \cdot \frac{2}{6} \cdot \frac{6-2}{6} \cdot \frac{6-3}{6-1} = 0.4$
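The chip answers can be checked with exact fractions. This Python sketch (the helper h is illustrative) gives P(X = 0) = 1/5, P(X ≤ 1) = 4/5, E(X) = 1 and V(X) = 2/5.

```python
from fractions import Fraction
from math import comb

N, n, K = 6, 3, 2   # 6 chips, 2 defective, 3 selected without replacement

def h(x: int) -> Fraction:
    """h(x; 6, 3, 2): probability of finding exactly x defective chips."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))

p_none = h(0)                        # a. 4/20
p_at_most_one = h(0) + h(1)          # b. (4 + 12)/20
mean = n * Fraction(K, N)            # c. E(X) = nK/N
var = n * Fraction(K, N) * Fraction(N - K, N) * Fraction(N - n, N - 1)
print(p_none, p_at_most_one, mean, var)   # 1/5 4/5 1 2/5
```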
𝑋~𝐻𝑦(𝑁, 𝑛, 𝐾)
Example
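A state grand lottery is conducted in which six winning numbers are selected from a total of 55 numbers. Six numbers are randomly selected; let X be the number of winning numbers among them.
N = 55, n = 6, K = 6, X ~ Hy(55, 6, 6)
a. What is the pmf? $h(x) = \frac{\binom{6}{x}\binom{49}{6-x}}{\binom{55}{6}}, \quad x = 0, 1, \ldots, 6$
b. What is the probability that all six numbers are winning numbers? $h(6) = \frac{\binom{6}{6}\binom{49}{6-6}}{\binom{55}{6}} = \frac{1}{28\,989\,675}$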
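For part b, the denominator C(55, 6) is easier to compute than to expand by hand; a one-off Python sketch:

```python
from math import comb

total = comb(55, 6)                              # number of possible 6-number selections
p_all_six = comb(6, 6) * comb(49, 0) / total     # h(6) for X ~ Hy(55, 6, 6)
print(total, p_all_six)
```

So the chance that all six chosen numbers win is 1/28,989,675, roughly 3.4 × 10⁻⁸.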
POISSON DISTRIBUTION
Poisson Process:
Situations yielding numerical values of a random variable X, the number of outcomes occurring during a given time interval or in a specified region.
Examples:
- Counting the number of errors on a page
- Counting the number of storms in a year
- Counting the number of days in a month that classes are suspended due to typhoons and floods
- Counting the number of persons served in a day at the baggage counter of a supermarket
- Counting the number of cracks per kilometer of highway found by an engineer
- Counting the number of calls received in a day by a certain call center agent
PROPERTIES of a Poisson Process
𝑋~𝑃𝑜(λ)
Poisson Occurrences
• Consider a random variable that gives the number of occurrences over a specified time or space.
• X ~ Po(λ): X is a random variable with a Poisson distribution, where λ is the average number of occurrences in the specified time or space.
𝑋~𝑃𝑜(λ)
POISSON DISTRIBUTION
• The probability distribution of the Poisson random variable X, representing the number of outcomes occurring in a given time interval or specified region, is
$p(x; \lambda) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, 3, \ldots, \quad e \approx 2.71828$
Note: The mean and the variance of a Poisson random variable are equal; both are λ.
𝑋~𝑃𝑜(λ)
Poisson Distribution
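• It is closely related to the binomial distribution: it is the limit of Bi(n, p) when the number of trials n is large and λ = np is held fixed. That is, with p = λ/n,
$\lim_{n\to\infty}\binom{n}{x}p^x(1-p)^{n-x} = \lim_{n\to\infty}\frac{n(n-1)\cdots(n-x+1)}{x!}\left(\frac{\lambda}{n}\right)^x\left(1-\frac{\lambda}{n}\right)^{n-x}$
$= \lim_{n\to\infty}\frac{\lambda^x}{x!}\cdot\frac{n(n-1)\cdots(n-x+1)}{n^x}\left(1-\frac{\lambda}{n}\right)^{n}\left(1-\frac{\lambda}{n}\right)^{-x}$
$= \frac{\lambda^x}{x!}\lim_{n\to\infty}\frac{n(n-1)\cdots(n-x+1)}{n^x}\left(1-\frac{\lambda}{n}\right)^{n}\left(1-\frac{\lambda}{n}\right)^{-x}$
$= \frac{\lambda^x}{x!}\cdot 1\cdot e^{-\lambda}\cdot 1 = \frac{e^{-\lambda}\lambda^x}{x!}$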
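The limit can also be seen numerically. This Python sketch holds λ = np = 3 fixed, lets n grow, and compares the binomial probability of x = 5 successes with the Poisson value; the chosen λ, x and n values are arbitrary illustrations.

```python
from math import comb, exp, factorial

def binom_pmf(x: int, n: int, p: float) -> float:
    """C(n, x) p^x (1 - p)^(n - x)"""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x: int, lam: float) -> float:
    """e^(-λ) λ^x / x!"""
    return exp(-lam) * lam**x / factorial(x)

lam, x = 3.0, 5
for n in (10, 100, 1_000, 100_000):
    p = lam / n                               # keep np = λ fixed while n grows
    print(n, round(binom_pmf(x, n, p), 6))
print("Poisson limit:", round(poisson_pmf(x, lam), 6))
```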
𝑋~𝑃𝑜(λ)
Example
• On average, 3 traffic accidents per month occur at a certain intersection. What is the probability that in any given month at this intersection
$P(X = x) = \frac{e^{-3}3^x}{x!}$, x = 0, 1, 2, …
a. exactly 5 accidents will occur? $P(X = 5) = \frac{e^{-3}3^5}{5!}$
b. from 4 to 6 accidents will occur? $P(4 \le X \le 6) = \sum_{x=4}^{6}\frac{e^{-3}3^x}{x!}$
c. at least 2 accidents will occur? $P(X \ge 2) = 1 - P(X < 2) = 1 - [P(X = 0) + P(X = 1)]$
d. 5 or 6 accidents will occur in half a year? Let Y be the number of accidents in 6 months, so λ = 6 × 3 = 18 and $P(Y = y) = \frac{e^{-18}18^y}{y!}$; the answer is $P(Y = 5) + P(Y = 6)$.
e. What is the expected number of accidents in half a year? $E(Y) = \lambda = 18$
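A Python sketch that evaluates parts a through e, using the monthly rate λ = 3 and the half-year rate 18 from the example; poisson_pmf is an illustrative helper, not a library call.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """p(x; λ) = e^(-λ) λ^x / x!"""
    return exp(-lam) * lam**x / factorial(x)

lam = 3.0                                            # accidents per month
a = poisson_pmf(5, lam)
b = sum(poisson_pmf(x, lam) for x in range(4, 7))   # x = 4, 5, 6
c = 1 - (poisson_pmf(0, lam) + poisson_pmf(1, lam))  # complement of X < 2
lam_half_year = 6 * lam                              # 18 accidents per 6 months
d = poisson_pmf(5, lam_half_year) + poisson_pmf(6, lam_half_year)
e = lam_half_year                                    # E(Y) = λ for a Poisson variable
print(round(a, 4), round(b, 4), round(c, 4), round(d, 6), e)
```

This gives roughly 0.1008 (a), 0.3193 (b), 0.8009 (c) and 0.00096 (d), with an expected 18 accidents in half a year.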
POISSON TABLE
https://youtu.be/cPOChr_kuQs
𝑋~𝑃𝑜(λ)
Example
𝑋~𝐺𝑒(𝑝)
Geometric Distribution
• A process consists of independent trials, each resulting in either success or failure (dichotomous outcomes), and the probability of success, p, does not change from trial to trial.
• The random variable of interest, X, is the number of the trial on which the first success is observed. (Here we are no longer interested in the number of successes.) Such a random variable is called a geometric random variable.
Example
• The incidence of COVID-positive cases in the area is 10%.
• What is the probability that the 9th person you meet is the first one who is positive?
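Under the reading that X is the trial on which the first success occurs, the standard geometric pmf is P(X = x) = q^(x−1)p for x = 1, 2, …; that formula is not written on these slides, so treat it as an assumption here. A Python sketch for the COVID example:

```python
def geometric_pmf(x: int, p: float) -> float:
    """P(X = x) = q^(x - 1) * p: first success on trial x, for x = 1, 2, ..."""
    q = 1 - p
    return q ** (x - 1) * p

# COVID example: p = 0.10, the first positive person is the 9th person met
print(round(geometric_pmf(9, 0.10), 4))
```

Here P(X = 9) = 0.9⁸ × 0.1 ≈ 0.0430.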
𝑋~𝑁𝐵(𝑝, 𝑘)
𝑋~𝐺𝑒(𝑝)
Example
𝑋~𝐺𝑒(𝑝)
EXAMPLE
𝑋~𝐺𝑒(𝑝)
Distribution        Description of RV   PMF/PDF of X                                     E(X)   Var(X)
Bernoulli                               $P(X = x) = p^x q^{1-x}$, x = 0, 1
Binomial
Poisson                                 $P(X = x) = \frac{e^{-\mu}\mu^x}{x!}$              μ      μ
Hypergeometric
Geometric
Negative Binomial
Uniform
Normal
Homework #3
For each question below, identify the random variable and its probability distribution. Set up the probability mass function, then solve for the probabilities.
5.3 An employee is selected from a staff of 10 to supervise a certain project by selecting a tag at random from a box containing 10 tags numbered from 1 to 10. Find the formula for the probability distribution of X representing the number on the tag that is drawn. What is the probability that the number drawn is less than 4?
5.7 One prominent physician claims that 70% of those with lung cancer are chain smokers. If his assertion is correct,
(a) find the probability that of 10 such patients recently admitted to a hospital, fewer than half are chain smokers;
(b) find the probability that of 20 such patients recently admitted to a hospital, fewer than half are chain smokers.
5.10 A nationwide survey of college seniors by the University of Michigan revealed that almost 70% disapprove of daily pot smoking, according to a report in Parade. If 12 seniors are selected at random and asked their opinion, find the probability that the number who disapprove of smoking pot daily is
(a) anywhere from 7 to 9;
(b) at most 5;
(c) not less than 8.
5.30 To avoid detection at customs, a traveler places 6 narcotic tablets in a bottle containing 9 vitamin tablets that are similar in appearance. If the customs official selects 3 of the tablets at random for analysis, what is the probability that the traveler will be arrested for illegal possession of narcotics?
5.33 If 7 cards are dealt from an ordinary deck of 52 playing cards, what is the probability that
(a) exactly 2 of them will be face cards?
(b) at least 1 of them will be a queen?
5.54 According to a study published by a group of University of Massachusetts sociologists, about two-thirds of the 20 million persons in this country who take Valium are women. Assuming this figure to be a valid estimate, find the probability that on a given day the fifth prescription written by a doctor for Valium is
(a) the first prescribing Valium for a woman;
(b) the third prescribing Valium for a woman.
5.56 On average, 3 traffic accidents per month occur at a certain intersection. What is the probability that in any given month at this intersection
(a) exactly 5 accidents will occur?
(b) fewer than 3 accidents will occur?
(c) at least 2 accidents will occur?
5.74 It is known that 3% of people whose luggage is screened at an airport have questionable objects in their luggage. What is the probability that a string of 15 people pass through screening successfully before an individual is caught with a questionable object? What is the expected number of people to pass through before an individual is stopped?
MODULE 2
CONTINUOUS DISTRIBUTIONS
UNIFORM / NORMAL
PROPERTIES OF PDF
X~U(a,b)
CONTINUOUS UNIFORM DISTRIBUTION
[Figure: density f(x) of X ~ U(a, b), constant between a and b]
X~U(a,b)
Example
X~U(a,b)
Example
X~N(μ,σ)
• continuous distribution
• a random variable X is said to have a normal distribution if its density function is given by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x < \infty$
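A small Python sketch of the normal density formula, cross-checked against statistics.NormalDist from the standard library (Python 3.8+). The values μ = 1 and σ = 2 are arbitrary illustrations.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density f(x) = 1/(σ√(2π)) · exp(-(1/2)((x - μ)/σ)^2)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

mu, sigma = 1.0, 2.0          # arbitrary illustrative parameters
for x in (-1.0, 1.0, 4.0):
    # the last two columns should agree to floating-point precision
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```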
X~N(μ,σ)
Lower-Tail Probability: $P(X \le a)$ = area under the curve to the left of a
Upper-Tail Probability: $P(X > b)$ = area under the curve to the right of b
In-Between Probability: $P(a < X \le b)$ = area under the curve between a and b
Two-Tailed Probability: $P(X \le a \text{ or } X > b)$ = area under the curve outside the interval from a to b
➢ EMPIRICAL RULE:
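The content of this slide did not survive extraction; assuming it is the usual 68–95–99.7 rule, the sketch below computes the share of a normal distribution lying within 1, 2 and 3 standard deviations of the mean using statistics.NormalDist.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1
for k in (1, 2, 3):
    # P(μ - kσ < X < μ + kσ) equals P(-k < Z < k) for every normal distribution
    share = Z.cdf(k) - Z.cdf(-k)
    print(f"within {k} SD of the mean: {share:.4f}")
```

It prints about 0.6827, 0.9545 and 0.9973.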
The Normal Distribution
❖ There is not just one normal curve but rather a family of normal distribution curves, one for each combination of mean μ and standard deviation σ.
Z-score
$z = \frac{x - \mu}{\sigma}$
• standard score
• Z value
Standard Normal Distribution
• distribution of the normal random variable with mean 0 and variance 1
$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}, \quad -\infty < z < \infty$
Standard Normal Curve
The Standard Normal Distribution
$Z \sim \text{Normal}(\mu = 0, \sigma^2 = 1)$
❑ A specific value of z gives the signed distance between the mean and the point represented by z, measured in standard deviations.
z-scores
Standardization or Z-Transformation
❑ Any normally distributed random variable X with mean $\mu_X$ and variance $\sigma_X^2$ (i.e., $X \sim \text{Normal}(\mu_X, \sigma_X^2)$) can be transformed to the standard normal random variable Z by the formula:
$Z = \frac{X - \mu_X}{\sigma_X}$
Standardization or Z-Transformation
Purpose:
➢ to determine areas under the curve (i.e., compute probabilities)
$P(Z \le 2.41) = 0.9920$
Computing Probabilities
▪ $P(Z \le a) = \Phi(a)$
▪ $P(Z > a) = 1 - \Phi(a)$
▪ $P(a \le Z < b) = \Phi(b) - \Phi(a)$
▪ $P(Z \le a \text{ or } Z > b) = \Phi(a) + [1 - \Phi(b)]$
➢ $\Phi(a)$ gives the cumulative area under the standard normal curve to the left of the value a.
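In practice Φ is read from a z-table; the standard library's statistics.NormalDist().cdf gives the same values. This sketch (illustrative names; the cut-offs a, b and the values μ = 50, σ = 10, x = 62 are hypothetical) checks the table entry P(Z ≤ 2.41) = 0.9920, applies the four formulas, and confirms that standardizing X first and then applying Φ agrees with the cdf of X itself.

```python
from statistics import NormalDist

Phi = NormalDist().cdf   # Φ(a): area under the standard normal curve to the left of a

def standardize(x: float, mu: float, sigma: float) -> float:
    """Z-transformation: z = (x - μ) / σ."""
    return (x - mu) / sigma

print(f"{Phi(2.41):.4f}")                  # table value for P(Z <= 2.41)

a, b = -1.0, 2.41                          # arbitrary cut-offs for illustration
print(1 - Phi(a))                          # P(Z > a)
print(Phi(b) - Phi(a))                     # P(a <= Z < b)
print(Phi(a) + (1 - Phi(b)))               # P(Z <= a or Z > b)

# Hypothetical X ~ Normal(50, 10^2): standardizing first gives the same probability
mu, sigma, x = 50.0, 10.0, 62.0
print(Phi(standardize(x, mu, sigma)), NormalDist(mu, sigma).cdf(x))
```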
iv. Find the percentage of aptitude scores either below 95 or above 110.
v. What is the score that divides the distribution such that 99% of the area is below it?
vi. What are the scores that bound the middle 95% of the distribution?
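Questions v and vi run the table in reverse: find z from an area, then convert back with x = μ + zσ. The mean and standard deviation of the aptitude scores are not given in this excerpt, so the sketch below leaves them as parameters; score_at is a hypothetical helper.

```python
from statistics import NormalDist

Z = NormalDist()                                    # standard normal
z99 = Z.inv_cdf(0.99)                               # 99% of the area lies below this z
z_lo, z_hi = Z.inv_cdf(0.025), Z.inv_cdf(0.975)     # bounds of the middle 95%

def score_at(z: float, mu: float, sigma: float) -> float:
    """Undo the z-transformation: x = μ + zσ (μ and σ come from the problem)."""
    return mu + z * sigma

print(round(z99, 3), round(z_lo, 2), round(z_hi, 2))   # 2.326 -1.96 1.96
```

With the problem's μ and σ, score_at(z99, mu, sigma) answers v, and score_at(z_lo, mu, sigma) and score_at(z_hi, mu, sigma) answer vi.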
References: