Probability, Statistics and Reliability: (Module-4)
What is a Sample?
A population is a set of similar items or events that is of interest for some
question or experiment. Examples include a population of persons, families, farms,
cattle, houses or automobiles in a region, or a population of trees or birds in a
forest.
A population is said to be finite or infinite according to whether the number of
units in it is finite or infinite.
[Figure: the population of interest can be made even more narrow, for example only students enrolled in this course.]
The sample space, denoted S, is the collection of all possible outcomes of a random study.
Types of data:
• discrete
• continuous
• categorical
What percentage of college students are underprivileged?
[Figure: a population of students a, b, c, ..., x, y, z; a sample of them (i, j, k, ..., p, q, r) gives categorical responses such as Yes, Yes, Yes, No.]
How many hours do VIT students spend talking on
the phone?
[Figure: population, sampling frame, random sample, sample. Example responses in hours: 5, 3, 0, 0, 0, 50.]
Pumpkin A B C D E F
Weight 19 14 15 9 10 17
Sampling Distributions: Construction
• A population consists of six pumpkins.
• Rajesh is asked to estimate the average weight of six pumpkins
by taking a random sample of size 2 (without replacement) from
the population.
• Rita is asked to estimate the average weight of six pumpkins by
taking a random sample of size 5 (without replacement) from the
population.
• Demonstrate the sampling distribution of the sample mean in each case.
• Do you think the use of the sample to estimate the mean of
the population would involve some sampling error?
• Why? (since the sample mean is random)
Since we know the weights of the whole population, we can find the
population mean: μ = (19 + 14 + 15 + 9 + 10 + 17)/6 = 84/6 = 14.
To demonstrate the sampling distribution, let's start by listing all of the
possible samples of size n=2 from the population, sampling without
replacement. There are 15 such samples, each equally likely.
Sample   Weights   Sample mean x̄   Probability
A, B     19, 14    16.5             1/15
A, C     19, 15    17.0             1/15
A, D     19, 9     14.0             1/15
A, E     19, 10    14.5             1/15
A, F     19, 17    18.0             1/15
B, C     14, 15    14.5             1/15
B, D     14, 9     11.5             1/15
B, E     14, 10    12.0             1/15
B, F     14, 17    15.5             1/15
C, D     15, 9     12.0             1/15
C, E     15, 10    12.5             1/15
C, F     15, 17    16.0             1/15
D, E     9, 10     9.5              1/15
D, F     9, 17     13.0             1/15
E, F     10, 17    13.5             1/15
We can combine all of the values and create a table of the possible values and
their respective probabilities.
x̄      9.5   11.5  12.0  12.5  13.0  13.5  14.0  14.5  15.5  16.0  16.5  17.0  18.0
P(x̄)   1/15  1/15  2/15  1/15  1/15  1/15  1/15  2/15  1/15  1/15  1/15  1/15  1/15
Now that we have the sampling distribution of the sample mean, we can calculate
the mean of all the sample means. In other words, we can find the mean (or
expected value) of all the possible x̄'s.
The mean of the sample means is μ_x̄ = Σ x̄ · P(x̄) = 210/15 = 14, which equals the population mean μ = 14.
Even though each individual sample may give an answer involving some error, the
expected value is right on target: exactly the population mean. In other words, if
one repeats the experiment over and over again, the overall average of the sample
means is exactly the population mean.
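The construction above can be checked by brute force. The following is a minimal sketch, assuming only the six pumpkin weights given in the table; the variable names are illustrative. It lists every sample of size 2, tabulates the sample means with exact fractions, and compares the mean of the sample means with the population mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Pumpkin weights from the population table.
weights = {"A": 19, "B": 14, "C": 15, "D": 9, "E": 10, "F": 17}

# Every sample of size 2 drawn without replacement is equally likely.
samples = list(combinations(weights, 2))
means = [Fraction(weights[a] + weights[b], 2) for a, b in samples]

# Sampling distribution: P(x-bar = value) = count / number of samples.
counts = Counter(means)
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

population_mean = Fraction(sum(weights.values()), len(weights))
mean_of_sample_means = sum(m * p for m, p in distribution.items())

for m, p in distribution.items():
    print(f"{float(m):5.1f}  {p}")
print("population mean:", population_mean)
print("mean of sample means:", mean_of_sample_means)
```

Using `Fraction` avoids rounding, so the equality of the two means is exact rather than approximate.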
For samples of size n=5 there are 6 possible samples, each leaving out one pumpkin and each equally likely.
Sample          Weights              Sample mean x̄   Probability
A, B, C, D, E   19, 14, 15, 9, 10    13.4             1/6
A, B, C, D, F   19, 14, 15, 9, 17    14.8             1/6
A, B, C, E, F   19, 14, 15, 10, 17   15.0             1/6
A, B, D, E, F   19, 14, 9, 10, 17    13.8             1/6
A, C, D, E, F   19, 15, 9, 10, 17    14.0             1/6
B, C, D, E, F   14, 15, 9, 10, 17    13.0             1/6
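We can combine all of the values and create a table of the possible values and
their respective probabilities. The six possible samples of size 5 (each leaves out one pumpkin) give six distinct sample means, each with probability 1/6:
x̄      13.0  13.4  13.8  14.0  14.8  15.0
P(x̄)   1/6   1/6   1/6   1/6   1/6   1/6
The mean of these sample means is again 84/6 = 14, the population mean.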
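To compare Rajesh's samples (n=2) with Rita's (n=5), the sketch below builds both sampling distributions and reports their mean and variance. The helper names `sampling_distribution` and `mean_and_variance` are illustrative, not from the slides, and the code assumes the six pumpkin weights are the whole population.

```python
from fractions import Fraction
from itertools import combinations

weights = [19, 14, 15, 9, 10, 17]  # pumpkins A to F

def sampling_distribution(values, n):
    """Map each possible sample mean to its probability (no replacement)."""
    samples = list(combinations(values, n))
    dist = {}
    for s in samples:
        mean = Fraction(sum(s), n)
        dist[mean] = dist.get(mean, 0) + Fraction(1, len(samples))
    return dist

def mean_and_variance(dist):
    """Expected value and variance of a discrete distribution."""
    mu = sum(m * p for m, p in dist.items())
    var = sum((m - mu) ** 2 * p for m, p in dist.items())
    return mu, var

for n in (2, 5):
    dist = sampling_distribution(weights, n)
    mu, var = mean_and_variance(dist)
    print(f"n={n}: {len(dist)} distinct means, mean={mu}, variance={float(var):.3f}")
```

Both distributions are centred on 14, but the variance is much smaller for n=5, which is why the larger sample tends to involve less sampling error.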
Central limit theorem: given a population of any non-normal functional form with mean μ and finite
variance σ², the sampling distribution of the sample mean x̄, computed from samples of
size n from this population, will have mean μ and variance σ²/n and will be
approximately normally distributed when the sample size is large.
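A small simulation can illustrate this statement. The sketch below is only an illustration under assumed settings: the exponential population (mean 1, variance 1), the sample size of 50, the 5000 repetitions and the seed are all choices made here, not values from the slides.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 50       # sample size (assumed for illustration)
reps = 5000  # number of repeated samples (assumed)

# Exponential population with rate 1: mean 1, variance 1, clearly non-normal.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

# Under the normal approximation about 95% of sample means should fall
# within 1.96 standard errors of the population mean.
se = math.sqrt(1.0 / n)
coverage = sum(abs(m - 1.0) <= 1.96 * se for m in sample_means) / reps

print(f"mean of sample means: {mean_of_means:.3f} (theory: 1.000)")
print(f"variance of sample means: {var_of_means:.4f} (theory: {1.0 / n:.4f})")
print(f"share within 1.96 SE: {coverage:.3f} (normal theory: 0.950)")
```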
Standard Error
The equations for the standard error are identical to the equations for the
standard deviation, except for one thing - the standard error equations
use statistics where the standard deviation equations use parameters. The
standard error equations use p in place of P, and s in place of σ.
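As a rough illustration of replacing parameters with statistics, the sketch below computes two standard errors from the phone-hours responses shown earlier (5, 3, 0, 0, 0, 50). Treating those six values as a simple random sample, and using "zero hours" as the event for the proportion, are assumptions made here for the example.

```python
import math
import statistics

# Phone-hours responses; treated here as a simple random sample.
hours = [5, 3, 0, 0, 0, 50]
n = len(hours)

# Standard error of the sample mean: s / sqrt(n), with s in place of sigma.
s = statistics.stdev(hours)  # sample standard deviation (n - 1 divisor)
se_mean = s / math.sqrt(n)

# Standard error of a sample proportion: sqrt(p(1 - p) / n), with the
# sample proportion p in place of the population proportion P.
p = sum(h == 0 for h in hours) / n
se_prop = math.sqrt(p * (1 - p) / n)

print(f"s = {s:.3f}, SE of mean = {se_mean:.3f}")
print(f"p = {p:.2f}, SE of proportion = {se_prop:.3f}")
```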
The engines made by Ford for speedboats have an average power of 220
horsepower (HP) and a standard deviation of 15 HP. Assume the distribution
of power follows a normal distribution. Consumer Reports is testing the
engines and will dispute the company's claim if the sample mean is less than
215 HP.
a. If Consumer Reports takes a sample of 4 engines, what is the probability that the
sample mean is less than 215 HP?
b. If Consumer Reports samples 100 engines, what is the probability that the
sample mean will be less than 215 HP?
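One way to work both parts is to treat the sample mean as normal with mean 220 and standard error 15/√n. The sketch below uses Python's standard-library `statistics.NormalDist`; the function name `prob_mean_below` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 220, 15  # population mean and standard deviation (HP)
cutoff = 215         # Consumer Reports disputes the claim below this value

def prob_mean_below(n):
    """P(sample mean < cutoff) for a random sample of n engines."""
    se = sigma / sqrt(n)  # standard error of the sample mean
    return NormalDist(mu, se).cdf(cutoff)

p4 = prob_mean_below(4)      # part a: z = (215 - 220) / 7.5
p100 = prob_mean_below(100)  # part b: z = (215 - 220) / 1.5

print(f"n = 4:   P = {p4:.4f}")
print(f"n = 100: P = {p100:.5f}")
```

With 4 engines the probability is roughly 0.25, while with 100 engines it drops to well under 0.001, so a larger sample makes a false dispute of a true claim far less likely.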
The weights of baby giraffes are known to have a mean of 125 pounds and
a standard deviation of 15 pounds. If we obtained a random sample of 40
baby giraffes,
a. Does the problem indicate that the distribution of weights is normal?
b. What is the probability that the sample mean will be between 120 and
130 pounds?
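For part a, the problem does not say the weights are normal; the calculation in part b therefore leans on the central limit theorem, taking n = 40 as large enough. A minimal sketch of part b under that assumption:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 125, 15, 40  # pounds; values from the problem statement
se = sigma / sqrt(n)        # standard error of the sample mean

# n = 40 is treated as large, so x-bar is taken to be approximately normal.
x_bar = NormalDist(mu, se)
prob_between = x_bar.cdf(130) - x_bar.cdf(120)

print(f"SE = {se:.3f}")
print(f"P(120 < sample mean < 130) = {prob_between:.4f}")
```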
Suppose it is known that in a certain large human population cranial length
is approximately normally distributed with a mean of 185.6mm. What is the
probability that a random sample of size 10 from this population will have a
mean greater than 190? The standard error is known to be 4.02 mm.
ANS: The probability is approximately 0.1357 (from z = (190 − 185.6)/4.02 ≈ 1.1).
Sampling Distribution of the Sample Proportion
Before we begin, let’s make sure we review the terms and notation
associated with proportions:
p: the population proportion. It is a fixed value.
n: the size of the random sample.
p̂: the sample proportion. It varies from sample to sample.
In a particular family, there are five children. Their names are Alex (A),
Betina (B), Carly (C), Debbie (D), and Edward (E). The table below
shows each child's name and favorite color. We are interested in the
proportion of children in the family who prefer the color blue, and from
the table, we can see that p = 2/5 = 0.40 of the children prefer blue.
Name Alex (A) Betina (B) Carly (C) Debbie (D) Edward (E)
Color Green Blue Yellow Purple Blue
Let's say we didn't know the proportion of children who like blue as their
favorite color. We'll use repeated sampling to estimate the proportion.
Take repeated samples without replacement. Write all the
possible samples of size n=2 and of size n=4, the proportion of children who like
blue in each sample, and the respective probabilities. Demonstrate the sampling
distribution of the sample proportion for n=2 and n=4.
Samples of size n=2:
Sample   p̂ (Blue)   Probability
AB       1/2         1/10
AC       0           1/10
AD       0           1/10
AE       1/2         1/10
BC       1/2         1/10
BD       1/2         1/10
BE       1           1/10
CD       0           1/10
CE       1/2         1/10
DE       1/2         1/10
Sampling distribution of p̂ for n=2:
p̂ (Blue)      0      1/2    1
Probability    3/10   6/10   1/10
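The same enumeration can be carried out for both sample sizes. The sketch below assumes only the five children and colors from the table; the function name `p_hat_distribution` is illustrative. It returns the sampling distribution of the sample proportion as exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

colors = {"A": "Green", "B": "Blue", "C": "Yellow", "D": "Purple", "E": "Blue"}

def p_hat_distribution(n):
    """Sampling distribution of the proportion preferring blue (no replacement)."""
    samples = list(combinations(colors, n))
    counts = Counter(
        Fraction(sum(colors[c] == "Blue" for c in s), n) for s in samples
    )
    return {p: Fraction(k, len(samples)) for p, k in sorted(counts.items())}

for n in (2, 4):
    dist = p_hat_distribution(n)
    expected = sum(p * prob for p, prob in dist.items())
    print(f"n={n}:", {str(p): str(prob) for p, prob in dist.items()},
          "mean of p-hat =", expected)
```

For n=4 only the proportions 1/4 and 1/2 are possible, and in both cases the mean of the sample proportion equals the population proportion 2/5.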
Example: Who's Taller at Ten, Boys or Girls?
Standardize: when x̄_f − x̄_m = 0,
z = (0 − 0.70)/1.55 ≈ −0.45
Does the result above give us reason to doubt the researcher's stated results?
If the mean height of the boys is greater than the mean height
of the girls, then x̄_m > x̄_f, that is, x̄_f − x̄_m < 0. The result above shows
that there is about a 33% chance of getting a difference in
sample means that is negative just due to sampling variability.
This gives us little reason to doubt the researcher's claim.
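The 33% figure can be reproduced from the two numbers in the standardization, taking 0.70 as the mean and 1.55 as the standard deviation of the difference in sample means (both read off the z calculation; the variable names are illustrative).

```python
from statistics import NormalDist

mean_diff = 0.70  # mean of the difference in sample means (from the z calculation)
sd_diff = 1.55    # standard deviation of that difference (from the z calculation)

z = (0 - mean_diff) / sd_diff
prob_negative = NormalDist().cdf(z)  # P(difference in sample means < 0)

print(f"z = {z:.2f}, P(difference < 0) = {prob_negative:.3f}")
```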
The Sampling Distribution of a Difference Between Two Proportions
What can we say about the difference p̂₁ − p̂₂ in the sample proportions?
Example: Who Does More Homework?
Suppose that there are two large high schools, each with more than 2000 students, in a certain town. At School 1,
70% of students did their homework last night. Only 50% of the students at School 2 did their homework last
night. The counselor at School 1 takes an SRS of 100 students and records the proportion that did homework.
School 2's counselor takes an SRS of 200 students and records the proportion that did homework. School 1's
counselor and School 2's counselor meet to discuss the results of their homework surveys. After the meeting,
they both report to their principals that p̂₁ − p̂₂ = 0.10.
Describe the shape, center, and spread of the sampling distribution of p̂₁ − p̂₂.
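Because n₁p₁ = 100(0.70) = 70, n₁(1 − p₁) = 100(0.30) = 30, n₂p₂ = 200(0.50) = 100
and n₂(1 − p₂) = 200(0.50) = 100 are all at least 10, the sampling distribution
of p̂₁ − p̂₂ is approximately Normal.
Its mean is p₁ − p₂ = 0.70 − 0.50 = 0.20.
Its standard deviation is √(0.70(0.30)/100 + 0.50(0.50)/200) ≈ 0.058.
Standardizing the reported difference of 0.10:
z = (0.10 − 0.20)/0.058 ≈ −1.72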
The area to the left of z = −1.72 under the
standard Normal curve is 0.0427.
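The whole calculation for this example can be sketched as follows, using the proportions and sample sizes stated in the problem; the variable names are illustrative. Because the code does not round the standard deviation to 0.058, its z and probability differ slightly from the rounded slide values.

```python
from math import sqrt
from statistics import NormalDist

p1, n1 = 0.70, 100  # School 1: true proportion and sample size
p2, n2 = 0.50, 200  # School 2: true proportion and sample size
observed = 0.10     # reported difference in sample proportions

# Large-counts condition for the Normal approximation.
conditions_ok = min(n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)) >= 10

mean_diff = p1 - p2
sd_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z = (observed - mean_diff) / sd_diff
prob = NormalDist().cdf(z)  # P(difference in sample proportions <= 0.10)

print(f"conditions met: {conditions_ok}")
print(f"mean = {mean_diff:.2f}, sd = {sd_diff:.4f}")
print(f"z = {z:.2f}, P = {prob:.4f}")
```

A probability of roughly 4% means a difference as small as 0.10 would be unusual if the stated population proportions are correct.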