L11 12 Notes

The document discusses probability distributions essential for inferential statistics, focusing on Binomial, Normal, and Standard Normal models. It explains key functions such as probability distribution, cumulative distribution function, and quantile function, along with practical examples for each model. The document also covers applications in various fields like inventory management, human resources, and investment analysis.

Uploaded by

Pragya Madaan
Copyright
© © All Rights Reserved

Probability Distributions

We need models from probability theory to be able to infer from the sample
to the underlying population. Such models are built on probability
distributions. The goal of inferential (parametric) statistics is to estimate
the (unknown) characteristics of the data-generating process or the
population. The population is assumed to follow some probability model or
distribution, so the task becomes estimating the unknown parameters of that
model from the given data.

In this module we will study the probability distribution, cumulative
distribution function and quantile function of the following probability
models:

1) Binomial
2) Normal
3) Standard Normal

Each of these models starts by defining a random variable (call it X) which
can take a set of possible values. For each model, three functions come in
handy to do probability calculations.

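Probability distribution (prefix d in R): is used to calculate the
probability of X taking a particular value, i.e., P(X = a), when X is
discrete. For a continuous X (such as the Normal) it returns the density at
a instead. The probability of X lying between a and b is obtained by adding
up these values (discrete case) or from the cumulative distribution function.
The desired value(s) of X is given; you have to calculate the probability.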

Cumulative distribution function (prefix p in R): is used to calculate
the probability of X being less than or equal to a particular value, i.e.,
P(X <= a). The desired value(s) of X is given; you have to calculate the
cumulative probability. Setting lower.tail = FALSE in the p function in R
computes P(X > a) instead of P(X <= a).
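As a small check of the relationship between the d and p functions, the sketch below uses an illustrative Binomial(10, 0.3) variable and a = 4 (values chosen here for illustration, not taken from the notes), together with dbinom and pbinom, the Binomial versions of these functions. It suggests that the p function is the running total of the d function, and that lower.tail = FALSE returns the complementary probability.

```r
# Illustrative values (assumed for this sketch): X ~ Binomial(10, 0.3), a = 4
n <- 10
p <- 0.3
a <- 4

# P(X <= a) as a running total of the point probabilities P(X = 0), ..., P(X = a)
cdf_by_sum <- sum(dbinom(0:a, size = n, prob = p))

# P(X <= a) directly from the cumulative distribution function
cdf_direct <- pbinom(a, size = n, prob = p)

# P(X > a): the upper tail, requested with lower.tail = FALSE
upper_tail <- pbinom(a, size = n, prob = p, lower.tail = FALSE)

all.equal(cdf_by_sum, cdf_direct)       # TRUE
all.equal(upper_tail, 1 - cdf_direct)   # TRUE
```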

Quantile function (prefix q in R): answers the question: given a
probability value p, what is the value a of the random variable X such that
P(X <= a) = p? It is the inverse of the cumulative distribution function:
the desired value of the cumulative probability is given, and you have to
calculate the corresponding quantile a of X. For a discrete X, where
P(X <= a) = p may have no exact solution, R returns the smallest a with
P(X <= a) >= p. Here a is called the pth quantile of X.
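The link between the quantile function and the cumulative distribution function can be sketched as follows. The probabilities and the Binomial(10, 0.3) variable are illustrative choices made here, not values from the notes.

```r
# Continuous case: qnorm() undoes pnorm() for the standard Normal
p <- 0.975
a <- qnorm(p)       # the 0.975 quantile
back <- pnorm(a)    # recovers p

# Discrete case (assumed example X ~ Binomial(10, 0.3)): no value a gives
# P(X <= a) = 0.5 exactly, so qbinom() returns the smallest a whose
# cumulative probability reaches 0.5
k <- qbinom(0.5, size = 10, prob = 0.3)
below <- pbinom(k - 1, size = 10, prob = 0.3)   # still under 0.5
at_k <- pbinom(k, size = 10, prob = 0.3)        # at or above 0.5

c(k = k, below = below, at_k = at_k)
```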

We will now learn to use these functions for each probability model.

1) Binomial Probability Model.

The Binomial model is appropriate for the number of “successes” in n
identical and independent trials where the outcome of each trial is binary
(e.g., success/failure; yes/no; 1/0) and the probability of “success” p is
the same in every trial. Define the random variable X as “the number of
successes in n trials”. Then the probability distribution of X can be
modelled as the Binomial model with two parameters n and p. This is written
as X ~ Binomial(n,p). The R abbreviation for the Binomial distribution is
binom. The parameters n and p are called size and prob in R. The three R
functions for the Binomial model are dbinom, pbinom and qbinom; we will use
them as needed in the questions below.

Probability function of X ~ Binomial(n,p) in R is
dbinom(x, size = n, prob = p)
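For reference, dbinom evaluates the usual Binomial formula P(X = x) = C(n, x) p^x (1 - p)^(n - x). The sketch below compares a hand computation with dbinom; the values n = 20, p = 0.01 and x = 1 are used purely as an illustration.

```r
# Illustrative values for this sketch
n <- 20
p <- 0.01
x <- 1

# Binomial formula written out: choose(n, x) counts the ways to place x successes
by_formula <- choose(n, x) * p^x * (1 - p)^(n - x)

# The same probability from the built-in function
by_dbinom <- dbinom(x, size = n, prob = p)

all.equal(by_formula, by_dbinom)   # TRUE
```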

Ex 1: We consider the production of bulbs where 1% of the bulbs are
defective. The quality inspector wants to check the quality of the last
batch (i.e., population). He randomly draws with replacement a sample of
size n=20 bulbs from the batch.

Ex 1.1: Provide an appropriate probability model for the random
variable X = the number of defective bulbs in the chosen sample of 20
bulbs.

Ans: X ~ Binomial(20,0.01)

Ex 1.2: How likely is it that the inspector draws exactly one
defective bulb? P(X=1) = ?

dbinom(1,size=20,prob=0.01)

[1] 0.1652337

Ex 1.3: At most how many defective bulbs can the inspector expect
with a probability of 99%?

qbinom(0.99,size=20,prob=0.01)

[1] 2
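To see why the answer is 2, one can list the cumulative probabilities and pick the first count at which they reach 0.99. This is only a sketch of that reasoning; the helper names cum and smallest are introduced here for illustration.

```r
# Cumulative probabilities P(X <= 0), ..., P(X <= 3) for X ~ Binomial(20, 0.01)
counts <- 0:3
cum <- pbinom(counts, size = 20, prob = 0.01)

# Smallest count whose cumulative probability is at least 0.99
smallest <- counts[which(cum >= 0.99)[1]]

round(cum, 4)
smallest   # 2, matching qbinom(0.99, size = 20, prob = 0.01)
```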

Ex 1.4: Find the probability of 0,1,2,3 defective bulbs.

dbinom(0:3,size=20,prob=0.01)

[1] 0.8179069 0.1652337 0.0158557 0.0009609

Ex 2: In 2023, the prevalence of diabetes among adults was about 9%,
where prevalence is the proportion of a population that has a disease. You
conduct a trial and randomly draw with replacement a sample of 50 persons.
The number of persons having diabetes in your sample is denoted by the
random variable X. How likely is it that your sample contains

2.1: No person with diabetes

dbinom(0,size=50,prob=0.09)

[1] 0.008955083

2.2: Exactly one person with diabetes

dbinom(1,size=50,prob=0.09)

[1] 0.04428338

2.3: At least two persons with diabetes: P(X>=2) = 1 – P(X<=1).
Either calculate P(X>=2) by hand or by using the cumulative
distribution function pbinom.

Ans: Method 1: Calculate by hand

P(X>=2) = 1-P(X=0)-P(X=1) = 1-0.0089-0.0443 =0.947

Method 2: Use the cumulative distribution function pbinom

P(X>=2)

=1 – P(X<=1)

1- pbinom(1,size=50,prob=0.09)

[1] 0.9467615
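A third route, sketched here as an alternative, asks pbinom for the upper tail directly. Since lower.tail = FALSE returns P(X > 1), and X takes whole-number values only, this equals P(X >= 2).

```r
# P(X > 1) = P(X >= 2) for X ~ Binomial(50, 0.09), using the upper tail
p_at_least_two <- pbinom(1, size = 50, prob = 0.09, lower.tail = FALSE)

# Same value as the complement used in Method 2
p_complement <- 1 - pbinom(1, size = 50, prob = 0.09)

all.equal(p_at_least_two, p_complement)   # TRUE
```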

2) Normal Probability Model: Its Properties

a) The Normal (also called Gaussian) Probability Model plays the central
role in classical statistics. A random variable X that can take any real
value, with values symmetrically distributed about a central value, can
be aptly modelled by the Normal probability model. Such a model has
only two parameters: mean (µ) which can be any real number and
standard deviation (σ) that can be any positive real number. This is
written as X ~ Norm(µ, σ²).

d) As the names of the parameters already indicate, the mean and
variance of

X ~ Norm(µ, σ²) are E(X) = µ and Var(X) = σ²

The R abbreviation for the Normal distribution is norm. The names of the
parameters are mean and sd. Note that sd is the standard deviation σ, not
the variance σ². The three R functions for the Normal
model are dnorm (probability density), pnorm (cumulative distribution)
and qnorm (quantile function). We will use these as needed in the
questions below.
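The statements E(X) = µ and Var(X) = σ² can be checked numerically by integrating against the density dnorm. The sketch below is only an approximation: it assumes illustrative parameters µ = 167 and σ = 6 and integrates over µ ± 10σ, outside of which the density is negligible.

```r
# Illustrative parameters (assumed for this sketch)
mu <- 167
sigma <- 6

# Integrate over mu +/- 10 sd; the area outside this range is negligible
lower <- mu - 10 * sigma
upper <- mu + 10 * sigma

# Total area under the density curve
area <- integrate(function(x) dnorm(x, mean = mu, sd = sigma), lower, upper)$value

# E(X): integral of x * f(x)
ex <- integrate(function(x) x * dnorm(x, mean = mu, sd = sigma), lower, upper)$value

# Var(X): integral of (x - mu)^2 * f(x)
vx <- integrate(function(x) (x - mu)^2 * dnorm(x, mean = mu, sd = sigma), lower, upper)$value

c(area = area, mean = ex, variance = vx)   # close to 1, 167 and 36
```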

Ex 3: The body height of adults in a country can be well described by a
normal distribution. For women in Germany, the mean height is about
167 cm with a standard deviation of about 6 cm. For men in Germany, the
mean is 180 cm and the standard deviation is about 6.5 cm.

3.1: Find the probability that a German woman’s height is
between 161 and 173 cm.

a) Calculate using the pnorm() function in R.

pnorm(161, mean = 167, sd = 6)

[1] 0.1586553

pnorm(173, mean = 167, sd = 6)

[1] 0.8413447

0.8413447 - 0.1586553

[1] 0.6826894

Ans: Probability (a German woman’s height is between 161 and 173 cm)
is 0.683

b) Calculate by using properties of the Normal probability model.

For the distribution of heights of German women, it is given that

mean = 167 cm; sd = 6 cm

mean – sd = 167-6 = 161

mean + sd = 167+6 = 173

The interval (161, 173) extends one standard deviation on either side of
the mean.

Ans: Hence, using property b) of the Normal Distribution (about 68.3% of
the values lie within one standard deviation of the mean), the
required probability is 68.3% or 0.683
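The one-standard-deviation figure is part of the familiar 68-95-99.7 pattern of the Normal model. As a sketch, the shares within 1, 2 and 3 standard deviations of the mean can be read off the standard Normal; they do not depend on the particular mean and sd.

```r
# Share of values within k standard deviations of the mean, for k = 1, 2, 3
k <- 1:3
within_k_sd <- pnorm(k) - pnorm(-k)
round(within_k_sd, 4)   # 0.6827 0.9545 0.9973

# The same one-sd share for the heights of German women (mean 167, sd 6)
women_one_sd <- pnorm(173, mean = 167, sd = 6) - pnorm(161, mean = 167, sd = 6)
all.equal(women_one_sd, within_k_sd[1])   # TRUE
```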

3.2: What is the proportion of women having height up to 175 cm?

pnorm(175, mean = 167, sd = 6)

[1] 0.9087888

3.3: What is the proportion of women taller than 175 cm?

Method 1:

1-pnorm(175, mean = 167, sd = 6)

[1] 0.09121122

Method 2:

pnorm(175, mean = 167, sd = 6, lower.tail = FALSE)

[1] 0.09121122

About 9.12% of women are taller than 175 cm.

3.4: Above what height are the tallest 5% of men?

qnorm(0.95, mean = 180, sd = 6.5)

[1] 190.6915

The tallest 5% of men are taller than about 190.7 cm.
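A quick way to sanity-check a quantile answer is to feed it back into pnorm. The sketch below recomputes the cutoff height and confirms that roughly 5% of men lie above it; the variable names are chosen here for illustration.

```r
# Cutoff height (cm) below which 95% of men fall
h <- qnorm(0.95, mean = 180, sd = 6.5)

# Share of men taller than the cutoff: should be approximately 0.05
share_taller <- pnorm(h, mean = 180, sd = 6.5, lower.tail = FALSE)

c(cutoff = h, share_taller = share_taller)
```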

Ex 4. Inventory Management

A retailer uses probability to decide his inventory level for a popular
product to minimize the risk of stockouts or excess inventory. The demand
for the product follows a Normal Distribution with a mean of 120 units and
a standard deviation of 20 units.

4.1 Calculate the probability of demand being between 100 and
150 units.

pnorm(150, mean = 120, sd = 20) - pnorm(100, mean=120, sd=20)

[1] 0.7745375

4.2 Calculate the probability of demand being less than or equal
to 120 units.

pnorm(120, mean = 120, sd = 20)

[1] 0.5

4.3 The retailer wants to set his inventory level so that he is 95%
confident of not having a stockout situation. What should be his
inventory level? <Hint: Find the demand level that has a 95%
probability of not being exceeded (i.e., there is a 95% probability that the
demand will not be more than that level).>

qnorm(0.95, mean = 120, sd = 20)

[1] 152.8971

Ans: The retailer can set his inventory level at 153 units.
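Because stock is held in whole units, the quantile is rounded up rather than to the nearest unit. The sketch below, which assumes demand is treated as continuous, compares the service level at 153 units with that at 152 units.

```r
# Round the 95% quantile up to a whole number of units
level <- ceiling(qnorm(0.95, mean = 120, sd = 20))

# Probability that demand does not exceed the chosen level
cover_at_level <- pnorm(level, mean = 120, sd = 20)       # slightly above 0.95
cover_one_less <- pnorm(level - 1, mean = 120, sd = 20)   # slightly below 0.95

c(level = level, at_level = cover_at_level, one_less = cover_one_less)
```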

Ex 5. Human Resource Management: Determining Hiring Standards

Problem: A company wants to set a minimum score on a job aptitude test
to hire only candidates in the top 10%. The test scores are normally
distributed with a mean of 80 and a standard deviation of 15. What should
the minimum score be? The top 10% of scores lie above the 90th percentile,
so the minimum score is the 0.90 quantile.

qnorm(0.90, mean = 80, sd = 15)

[1] 99.22327

Alternatively,

# Define parameters

mean_score <- 80

sd_score <- 15

percentile <- 0.90

# Calculate the minimum score

min_score <- qnorm(percentile, mean_score, sd_score)

# Print the minimum score

print(min_score)

[1] 99.22327

3) Standard Normal Probability Model

The Standard Normal Probability Model, also called the Standard Normal
Distribution (SND), is the Normal Probability Model with mean = 0
and sd = 1 (the default values of mean and sd in R's norm functions). The
SND variable is usually denoted by Z. We write Z ~ Norm(µ=0, σ²=1)

For X ~ Norm(µ, σ²)

• The formula to convert from X to Z is Z = (X-μ)/σ
• Use the formula X = μ + σZ to convert from Z to X
• To calculate the cumulative probability of the SND variable Z being less
than or equal to z use the function pnorm(z)
• To calculate the probability of the SND variable being greater
than or equal to z use 1-pnorm(z). You can also use pnorm(z, lower.tail
= FALSE).
• To find the pth quantile (the 100p-th percentile) of the SND use qnorm(p)
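The conversion formulas above can be sketched in R as follows. The numbers (mean 100, sd 10, x = 85, p = 0.05) are illustrative assumptions; the point is only that working on the X scale and working on the Z scale give the same answers.

```r
# Illustrative values (assumed for this sketch)
mu <- 100
sigma <- 10
x <- 85
p <- 0.05

# X to Z: standardize, then use the standard Normal CDF
z <- (x - mu) / sigma
prob_on_x_scale <- pnorm(x, mean = mu, sd = sigma)
prob_on_z_scale <- pnorm(z)

# Z to X: take the standard Normal quantile, then rescale
quantile_on_x_scale <- qnorm(p, mean = mu, sd = sigma)
quantile_via_z <- mu + sigma * qnorm(p)

all.equal(prob_on_x_scale, prob_on_z_scale)       # TRUE
all.equal(quantile_on_x_scale, quantile_via_z)    # TRUE
```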

Ex 6: Quality Control in Manufacturing

A manufacturing company produces light bulbs. The company wants to
ensure that 95% of the bulbs produced meet (or exceed) the minimum
brightness standard. The average brightness of the bulbs is 100 lumens,
with a standard deviation of 10 lumens (brightness is assumed to be
normally distributed). What should they set the minimum brightness
standard as?

Data:

We want the brightness standard x such that 95% of the bulbs produced
meet (or exceed) it. In other words, 95% of the values are above x and 5%
of the values are below x. So x is the 5th percentile.

• Mean (μ) = 100 lumens
• Standard Deviation (σ) = 10 lumens
• Desired percentile (p) = 0.05

Solution:

Method 1 using ND

# Using qnorm() to directly find the value x corresponding to p = 0.05
# (i.e., the 100p-th percentile of X)

qnorm(0.05,mean=100,sd=10)

[1] 83.55146

Method 2 using SND

# Using qnorm() to find the pth percentile of Z corresponding to p = 0.05

z_score <- qnorm(0.05)

z_score

[1] -1.644854

# Calculating the minimum brightness level

μ <- 100

σ <- 10

x <- μ + σ* z_score

print(x)

[1] 83.55146

Interpretation: The manufacturing company should set the minimum
brightness standard at approximately 83.55 lumens. This would ensure that
95% of the bulbs produced meet (or exceed) the brightness standard.

Ex 7: Investment Analysis

An investment advisor is evaluating a new stock. The expected return on
the stock is 12%, with a standard deviation of 4%. The advisor wants to
calculate the probability that the stock's return will be less than 8%.

Data:

• Mean (μ) = 12%
• Standard Deviation (σ) = 4%
• Desired probability (P) = P(X < 8%)

Solution:

Method 1 using ND

pnorm(0.08, mean=0.12, sd=0.04)

[1] 0.1586553

Method 2 using SND

# Standardizing the value 8%

μ<- 0.12

σ<- 0.04

z_score <- (0.08 - μ) / σ

# Using pnorm() to calculate the probability

probability <- pnorm(z_score)

print(probability)

[1] 0.1586553

Interpretation: The probability that the stock's return will be less than
8% is approximately 0.1587 or 15.87%.
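The result does not depend on whether returns are entered as fractions or as percentage points, as long as the value, mean and sd all use the same unit. A minimal sketch of that check:

```r
# Returns expressed as fractions (0.08 = 8%)
prob_fraction_units <- pnorm(0.08, mean = 0.12, sd = 0.04)

# Returns expressed in percentage points
prob_percent_units <- pnorm(8, mean = 12, sd = 4)

# Both are one standard deviation below the mean, i.e. z = -1
all.equal(prob_fraction_units, prob_percent_units)   # TRUE
all.equal(prob_percent_units, pnorm(-1))             # TRUE
```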

Ex 8: Customer Satisfaction

A customer satisfaction survey asks respondents to rate their satisfaction
on a scale of 1 to 10. The average rating is 8, with a standard deviation of
1.5. The company wants to know the proportion of customers who rate
their satisfaction above 9.

Data:

• Mean (μ) = 8
• Standard Deviation (σ) = 1.5
• Desired probability (P) = P(X > 9)

Solution:

Method 1 using ND

pnorm(9, mean=8, sd=1.5, lower.tail=FALSE)

[1] 0.2524925

Method 2 using SND

# Standardizing the value 9

# mu and sigma are used so that R's built-in mean() and sd() are not masked

mu <- 8

sigma <- 1.5

z_score <- (9 - mu) / sigma

# Using pnorm() to calculate the probability

probability <- 1-pnorm(z_score)

print(probability)

[1] 0.2524925

Ex 9: Inventory Management

The daily demand for a product is normally distributed with a mean of 50
units and a standard deviation of 10 units. The retailer wants to determine
the inventory level for the product so as to ensure that there is a 98%
probability of having sufficient inventory to meet daily demand. What
inventory level should he choose?

Data:

• Mean = 50 units
• Standard Deviation (sd) = 10 units
• Desired probability (P) = 0.98

Solution:

qnorm(0.98,mean=50,sd=10)
[1] 70.53749

OR

# Using qnorm() to find the z-score corresponding to P = 0.98

z_score <- qnorm(0.98)

# Calculating the optimal inventory level

inventory_level <- 50 + 10*z_score

print(inventory_level)

[1] 70.53749
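Since qnorm accepts a vector of probabilities, the retailer could compare several service levels in one call. The levels other than 0.98 are hypothetical alternatives added here for illustration.

```r
# Candidate service levels (0.98 is the one from the exercise; the rest are assumed)
service <- c(0.90, 0.95, 0.98, 0.99)

# Inventory level required for each service level
levels_needed <- qnorm(service, mean = 50, sd = 10)

data.frame(service = service, inventory = round(levels_needed, 1))
```

A higher service level always calls for more stock, and the extra stock needed grows faster as the service level approaches 1.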

Ex 10: Employee Performance

A company evaluates employee performance on a scale of 1 to 100. The
average performance score is 80, with a standard deviation of 15. The
company wants to identify the top 10% of employees based on their
performance scores. What cutoff score should be set to identify them?

Data:

• Mean (μ) = 80
• Standard Deviation (σ) = 15
• Desired upper-tail probability (P) = 0.10, i.e., cumulative probability 0.90

Solution:

qnorm(0.90,mean=80,sd=15)

[1] 99.22327

OR

# Using qnorm() to find the z-score corresponding to top 10%

z_score <- qnorm(0.90)

# Calculating the minimum score for the top 10%

min_score <- 80 + 15*z_score

print(min_score)

[1] 99.22327

Interpretation: Employees with a performance score of 99.22327 or
higher would be considered in the top 10% of performers.
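The same cutoff can be obtained by passing the upper-tail probability 0.10 to qnorm with lower.tail = FALSE, which avoids converting it to 0.90 by hand. The sketch below also shows, as a caveat of the Normal assumption, that the model places some probability above the top of the 1 to 100 scale.

```r
# Cutoff from the upper tail: P(X > cutoff) = 0.10
cutoff <- qnorm(0.10, mean = 80, sd = 15, lower.tail = FALSE)

# Check: share of scores above the cutoff
share_above <- pnorm(cutoff, mean = 80, sd = 15, lower.tail = FALSE)

# The Normal model is unbounded, so it also assigns probability to scores over 100
share_over_100 <- pnorm(100, mean = 80, sd = 15, lower.tail = FALSE)

c(cutoff = cutoff, share_above = share_above, share_over_100 = share_over_100)
```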
