
Probability, Statistics and Reliability: (Module-4)

1) The document discusses populations, samples, parameters, and statistics. A population is a set of items of interest, while a sample is a subset of a population. Parameters describe characteristics of the entire population, while statistics describe characteristics of a sample. 2) It explains how to construct a sampling distribution by taking all possible samples of a given size from a population and calculating the statistic of interest (e.g. mean) for each sample. 3) Taking samples to estimate population characteristics inherently involves sampling error, as the sample statistic is random and may differ from the true population parameter.

Uploaded by

Anshika Mishra

Probability, Statistics and Reliability
(Module-4)

Sayed Mohammed Zeeshan


School of Applied Science and languages
VIT Bhopal University
What is a Population?

What is a Sample?

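What do we do when complete enumeration is not possible?

[Figure: a sample is drawn from the population by sampling; conclusions about the population are drawn from the sample by inference.]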
A population is a set of similar items or events that is of interest for some
question or experiment. Examples include a population of persons, families, farms,
cattle, houses or automobiles in a region, or a population of trees or birds in a
forest.
A population is said to be finite or infinite according as the number of units
in it is finite or infinite.

A finite subset of items in a population is called a sample, and the number
of individuals in the sample is called the sample size.
• For the purpose of determining population characteristics, only the
items in the sample are observed, instead of enumerating the entire
population.
• The sample characteristics are then used to estimate the corresponding
population characteristics.
• For example, on examining a sample of a particular product, we
arrive at a decision to purchase or reject that product.
• The error involved in such an approximation is known as sampling error
and is inherent and unavoidable in every sampling scheme.
What do you think about these research questions?

• What percentage of college students are underprivileged?
• How many hours do VIT students spend talking on the phone?
• What is the number of online purchases made by female students of this class?
What do you think about these research questions?

What percentage of college students are underprivileged?
The target population is all college students of India.

How many hours do VIT students spend talking on the phone?
The target population is VIT's students (a little narrower).

What is the number of online purchases made by female students of this class?
The target population is only the students enrolled in this course (even narrower).
The sample space, denoted S, is the collection of all possible outcomes of
a random study.

Types of data:
• discrete
• continuous
• categorical
What percentage of college students are underprivileged?

Population: the target population is all college students of India.

Questionnaire: Are you underprivileged?

S = {yes, no}

Sample: Yes, Yes, Yes, No

[Figure: the individual units of the population (a, b, c, ..., x, y, z) and the sampled units with their responses: Yes, Yes, Yes, No]
How many hours do VIT students spend talking on the phone?

Population: the target population is VIT's students.

Questionnaire: record the number of hours they talk.

S = {h: h ≥ 0 hours}

Sample: 1.5 hrs, 3 hrs, 6.32 hrs
What is the number of online purchases made by female students of this class?

Population: the target population is the students of this class.

Questionnaire: How many online purchases have you made?

S = {0, 1, 2, ...}

Sample: 5, 3, 0, 0, 0, 50
[Figure labels: sampling frame, units, sampling units, sample, random sample]

S.No.  Reg. No    Name       Gender  Age  Religion  Height  Weight
1      171046043  DEEP       1       25   1         160     75.2
2      181046001  SRINIVAS   1       25   3         172     83.5
3      181046002  VINEETHA   0       23   2         145     48.5
4      181046003  BASAVARAJ  1       19   2         154     68.4
5      181046004  GEETHA     0       19   2         148     52
6      181046005  PRABHU     1       24   1         158     60.4
7      181046006  NAIR       1       21   3         152     73.8
8      181046007  BABU       1       25   2         155     70
9      181046008  ROSE       0       17   1         151     60.2
10     181046009  KUMAR      1       23   1         162     60.2
Parameter and Statistic
In order to avoid verbal confusion, let us make clear that:

Population: the statistical constants of the population, such as the mean and
variance, are referred to as parameters.

Sample: statistical measures computed from the sample observations alone, such
as the mean and variance, are termed statistics.
Parameter versus Statistic

A parameter is the numerical summary of the population.
A statistic is the numerical summary based on the sample.

Population = all cars in the Bhopal district
Sample = 50 cars selected

The number of red cars in the Bhopal district is a PARAMETER.
The 15 red cars in our sample is a STATISTIC.
SAMPLING DISTRIBUTIONS

The distribution of all possible values that can be assumed by some statistic,
computed from samples of the same size randomly drawn from the same population,
is called the sampling distribution of that statistic.

Sampling Distributions: Important Characteristics

We usually are interested in knowing three things about a given sampling
distribution: its mean, its variance, and its functional form (how it looks
when graphed).
Sampling Distributions: Construction

To construct a sampling distribution we proceed as follows:

1. From a finite population of size N, randomly draw all possible
samples of size n.
2. Compute the statistic of interest (such as the mean or variance)
for each sample.
3. List in one column the distinct observed values of the statistic,
and in another column list the corresponding frequency of occurrence
of each distinct observed value of the statistic.
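
The three steps above can be sketched in Python. This is a minimal sketch, not the course's own code: the function name `sampling_distribution` and the four-value toy population are hypothetical, chosen only for illustration, and sampling is assumed to be without replacement.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from statistics import mean


def sampling_distribution(population, n, statistic=mean):
    """Return {value of the statistic: probability} over all samples of size n."""
    # Step 1: enumerate every possible sample of size n (without replacement).
    samples = list(combinations(population, n))
    # Step 2: compute the statistic of interest for each sample.
    values = [statistic(sample) for sample in samples]
    # Step 3: tabulate each distinct value with its relative frequency.
    counts = Counter(values)
    return {v: Fraction(c, len(samples)) for v, c in sorted(counts.items())}


# Hypothetical population of N = 4 values (not from the slides).
toy_population = [2, 4, 6, 8]
toy_dist = sampling_distribution(toy_population, 2)
for value, prob in toy_dist.items():
    print(value, prob)
```

Relative frequencies are kept as exact fractions so that the probabilities can be checked to sum to one.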
Sampling Distributions: Construction

Pumpkin  A   B   C   D  E   F
Weight   19  14  15  9  10  17

• A population consists of six pumpkins.
• Rajesh is asked to estimate the average weight of the six pumpkins
by taking a random sample of size 2 (without replacement) from
the population.
• Rita is asked to estimate the average weight of the six pumpkins by
taking a random sample of size 5 (without replacement) from the
population.
• Demonstrate the sampling distribution in each case.
• Do you think the use of the sample to estimate the mean of
the population would involve some sampling error?
• Why? Because the sample mean is random: it varies from sample to sample.
Since we know the weights from the population, we can find the
population mean:

$\mu = \frac{19 + 14 + 15 + 9 + 10 + 17}{6} = 14$

To demonstrate the sampling distribution, let's start by obtaining all of the
possible samples of size n=2 from the population, sampling without
replacement. Each of the 15 samples is equally likely.

Sample  Weights  Sample mean  Probability
A, B    19, 14   16.5         1/15
A, C    19, 15   17.0         1/15
A, D    19, 9    14.0         1/15
A, E    19, 10   14.5         1/15
A, F    19, 17   18.0         1/15
B, C    14, 15   14.5         1/15
B, D    14, 9    11.5         1/15
B, E    14, 10   12.0         1/15
B, F    14, 17   15.5         1/15
C, D    15, 9    12.0         1/15
C, E    15, 10   12.5         1/15
C, F    15, 17   16.0         1/15
D, E    9, 10    9.5          1/15
D, F    9, 17    13.0         1/15
E, F    10, 17   13.5         1/15
We can combine all of the values and create a table of the possible values and
their respective probabilities.

$\bar{x}$      9.5   11.5  12.0  12.5  13.0  13.5  14.0  14.5  15.5  16.0  16.5  17.0  18.0
$P(\bar{x})$   1/15  1/15  2/15  1/15  1/15  1/15  1/15  2/15  1/15  1/15  1/15  1/15  1/15

Now that we have the sampling distribution of the sample mean, we can calculate
the mean of all the sample means. In other words, we can find the mean (or
expected value) of all the possible $\bar{x}$'s.
The mean of the sample means is

$E(\bar{x}) = \sum \bar{x}\, P(\bar{x}) = \frac{210}{15} = 14 = \mu$

Even though each sample may give you an answer involving some error, the
expected value is right at the target: exactly the population mean. In other words, if
one repeats the experiment over and over again, the overall average of the sample
means is exactly the population mean.
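
The claim that the sample means average out to the population mean can be checked by brute force. The sketch below enumerates all samples of size 2 from the six pumpkin weights given earlier; the variable names are illustrative, and exact fractions are used so the comparison involves no rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Pumpkin weights from the example.
weights = {"A": 19, "B": 14, "C": 15, "D": 9, "E": 10, "F": 17}

population_mean = Fraction(sum(weights.values()), len(weights))

# All samples of size 2 drawn without replacement, with their sample means.
pairs = list(combinations(weights, 2))
sample_means = [Fraction(weights[a] + weights[b], 2) for a, b in pairs]

# Each sample is equally likely, so P(mean) = count / number of samples.
pmf = {m: Fraction(c, len(pairs)) for m, c in sorted(Counter(sample_means).items())}

# Expected value of the sample mean over its sampling distribution.
mean_of_means = sum(m * p for m, p in pmf.items())

for m, p in pmf.items():
    print(f"{float(m):5.1f}  {p}")
print("population mean:", population_mean)
print("mean of sample means:", mean_of_means)
```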
All possible samples of size n=5, sampling without replacement. Each of the 6
samples is equally likely.

Sample         Weights            Sample mean  Probability
A, B, C, D, E  19, 14, 15, 9, 10  13.4         1/6
A, B, C, D, F  19, 14, 15, 9, 17  14.8         1/6
A, B, C, E, F  19, 14, 15, 10, 17 15.0         1/6
A, B, D, E, F  19, 14, 9, 10, 17  13.8         1/6
A, C, D, E, F  19, 15, 9, 10, 17  14.0         1/6
B, C, D, E, F  14, 15, 9, 10, 17  13.0         1/6

We can combine all of the values and create a table of the possible values and
their respective probabilities.

$\bar{x}$      13.0  13.4  13.8  14.0  14.8  15.0
$P(\bar{x})$   1/6   1/6   1/6   1/6   1/6   1/6

The mean of the sample means is

$E(\bar{x}) = \frac{13.0 + 13.4 + 13.8 + 14.0 + 14.8 + 15.0}{6} = \frac{84}{6} = 14 = \mu$



The following dot plots show the distribution of the sample means
corresponding to sample sizes of n=2 and of n=5. Again, we see that
using the sample mean to estimate the population mean involves
sampling error. However, the error with a sample of size n=5 is on
average smaller than with a sample of size n=2.
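
One way to put a number on "smaller on average" is to compare the standard deviation of the sample mean for the two sample sizes. This is a sketch using the pumpkin weights from the example; the helper name `spread_of_sample_mean` is hypothetical.

```python
from itertools import combinations
from math import sqrt

weights = [19, 14, 15, 9, 10, 17]   # pumpkin weights from the example
mu = sum(weights) / len(weights)    # population mean


def spread_of_sample_mean(n):
    """Standard deviation of the sample mean over all samples of size n."""
    means = [sum(s) / n for s in combinations(weights, n)]
    return sqrt(sum((m - mu) ** 2 for m in means) / len(means))


sd_n2 = spread_of_sample_mean(2)
sd_n5 = spread_of_sample_mean(5)
print(f"n=2: {sd_n2:.3f}")
print(f"n=5: {sd_n5:.3f}")
```

The spread for n=5 comes out well below the spread for n=2, which matches the statement that the larger sample gives the smaller typical error.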
Sampling Distribution of mean when we do sampling from
Normally Distributed Populations

If the population is normally distributed with mean μ and standard deviation σ,
then the sampling distribution of the sample mean is also normally distributed,
no matter what the sample size is.

When sampling is from a normally distributed population, the distribution of
the sample mean will possess the following properties:
1. The distribution of sample mean will be normal.
2. The mean of the distribution of sample mean will be equal to the mean of
the population from which the samples were drawn.
3. The variance of the distribution of sample mean will be equal to the
variance of the population divided by the sample size.
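
In symbols, the three properties above can be summarized as follows. The notation $\bar{X}$ for the sample mean and $Z$ for its standardized value is an assumption, since the slide states the properties only in words.

```latex
\bar{X} \sim N\!\left(\mu,\ \frac{\sigma^2}{n}\right), \qquad
E(\bar{X}) = \mu, \qquad
\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}, \qquad
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)
```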
Sampling Distribution of mean when we do sampling from
Non-Normally Distributed Populations

The Central Limit Theorem

Given a population of any non-normal functional form with mean μ and finite
variance σ², the sampling distribution of the sample mean, computed from samples of
size n from this population, will have mean μ and variance σ²/n and will be
approximately normally distributed when the sample size is large.
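
The theorem can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings: the exponential population (mean 1, variance 1), the sample size n = 40, the number of replications, and the seed are all choices made here, not values from the slides.

```python
import random
from statistics import mean, pvariance

random.seed(0)  # fixed seed so the run is repeatable

n = 40                # sample size, assumed "large" for this sketch
replications = 20000  # number of simulated samples

# Exponential population with rate 1: mean 1, variance 1, strongly skewed.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(replications)
]

mean_of_means = mean(sample_means)     # should be close to mu = 1
var_of_means = pvariance(sample_means) # should be close to sigma^2 / n = 0.025

# Share of sample means within one standard error of mu;
# a normal distribution puts roughly 68% of its mass there.
se = (1.0 / n) ** 0.5
within_one_se = sum(abs(m - 1.0) <= se for m in sample_means) / replications

print(mean_of_means, var_of_means, within_one_se)
```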
Standard Error

The standard deviation of the sampling distribution of a statistic is
known as its standard error, abbreviated as S.E.

Statistic                                   Standard error
Sample mean                                 $\sigma/\sqrt{n}$
Sample proportion                           $\sqrt{P(1-P)/n}$
Difference of two sample means              $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
Difference between two sample proportions   $\sqrt{P_1(1-P_1)/n_1 + P_2(1-P_2)/n_2}$

In order to compute the standard deviation of a sample statistic (S.E.), you
must know the value of one or more population parameters. For example,
to compute the standard deviation of the sample mean, you need to know
the variance of the population.
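Standard Error
The values of population parameters are often unknown, making it
impossible to compute the standard error of a statistic exactly. The
standard error is then estimated from the sample:

Statistic                                   Standard error
Sample mean                                 $s/\sqrt{n}$
Sample proportion                           $\sqrt{p(1-p)/n}$
Difference of two sample means              $\sqrt{s_1^2/n_1 + s_2^2/n_2}$
Difference between two sample proportions   $\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$
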
The equations for the standard error are identical to the equations for the
standard deviation, except for one thing - the standard error equations
use statistics where the standard deviation equations use parameters. The
standard error equations use p in place of P, and s in place of σ.
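
The four standard errors named in the tables above can be written as small functions. This is a sketch using the usual textbook formulas for independent samples, which is an assumption wherever the slide's own formulas are not visible; the function names are hypothetical. The same functions work with population values (σ, P) or with sample estimates (s, p).

```python
from math import sqrt


def se_mean(sd, n):
    """Standard error of a sample mean."""
    return sd / sqrt(n)


def se_proportion(p, n):
    """Standard error of a sample proportion."""
    return sqrt(p * (1 - p) / n)


def se_diff_means(sd1, n1, sd2, n2):
    """Standard error of the difference of two independent sample means."""
    return sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of the difference of two independent sample proportions."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Inputs taken from the worked problems in this module.
print(se_mean(15, 4))
print(se_proportion(0.43, 50))
print(se_diff_means(2.7, 12, 3.8, 8))
print(se_diff_proportions(0.7, 100, 0.5, 200))
```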
The engines made by Ford for speedboats have an average power of 220
horsepower (HP) and a standard deviation of 15 HP. Assume the distribution
of power follows a normal distribution. Consumer Reports is testing the
engines and will dispute the company's claim if the sample mean is less than
215 HP.
a. If they take a sample of 4 engines, what is the probability that the
sample mean is less than 215 HP?
b. If Consumer Reports samples 100 engines, what is the probability that the
sample mean will be less than 215 HP?
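
A possible way to work both parts numerically, assuming Python 3.8+ for `statistics.NormalDist`. Because the population is stated to be normal, the sample mean is normal with standard error σ/√n for any n; the helper name `prob_mean_below` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 220.0, 15.0   # population mean and standard deviation (HP)
threshold = 215.0         # the claim is disputed below this sample mean


def prob_mean_below(n):
    """P(sample mean < threshold) for a sample of n engines."""
    se = sigma / sqrt(n)                      # standard error of the mean
    return NormalDist(mu, se).cdf(threshold)  # sample mean ~ N(mu, se)


p_a = prob_mean_below(4)
p_b = prob_mean_below(100)
print(f"a. n=4:   {p_a:.4f}")
print(f"b. n=100: {p_b:.5f}")
```

With n=4 the probability is roughly one in four, while with n=100 it drops to well under one in a thousand, because the standard error shrinks from 7.5 HP to 1.5 HP.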

The weights of baby giraffes are known to have a mean of 125 pounds and
a standard deviation of 15 pounds. Suppose we obtain a random sample of 40
baby giraffes.
a. Does the problem indicate that the distribution of weights is normal?
b. What is the probability that the sample mean will be between 120 and
130 pounds?
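
For part (a), the problem does not say the weights are normal; part (b) therefore leans on the Central Limit Theorem, treating n = 40 as large enough for the sample mean to be approximately normal. Under that assumption, a sketch of the calculation is:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 125.0, 15.0, 40

se = sigma / sqrt(n)        # standard error of the sample mean
xbar = NormalDist(mu, se)   # approximate distribution of the sample mean (CLT)

p_between = xbar.cdf(130) - xbar.cdf(120)
print(f"P(120 < sample mean < 130) = {p_between:.4f}")
```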
Suppose it is known that in a certain large human population cranial length
is approximately normally distributed with a mean of 185.6mm. What is the
probability that a random sample of size 10 from this population will have a
mean greater than 190? The standard error is known to be 4.02 mm.
ANS: probability is .1357
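
The stated answer can be reproduced as follows. Note the assumption about rounding: the value .1357 corresponds to a z-score rounded to 1.10, as when reading a printed normal table, while the unrounded z gives a slightly larger probability.

```python
from statistics import NormalDist

mu, se, cutoff = 185.6, 4.02, 190.0

z = (cutoff - mu) / se   # standardized value of the cutoff

# Using the unrounded z-score.
p_greater = 1 - NormalDist(mu, se).cdf(cutoff)

# Using z rounded to one decimal place, as with a printed table.
p_rounded = 1 - NormalDist().cdf(round(z, 1))

print(f"z = {z:.4f}")
print(f"unrounded: {p_greater:.4f}")
print(f"rounded z: {p_rounded:.4f}")
```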
Sampling Distribution of the Sample Proportion

Before we begin, let’s make sure we review the terms and notation
associated with proportions:
p: is the population proportion. It is a fixed value.
n: is the size of the random sample.
p̂ : is the sample proportion. It varies based on the sample.
In a particular family, there are five children. Their names are Alex (A),
Betina (B), Carly (C), Debbie (D), and Edward (E). The table below
shows the child’s name and their favorite color.  We are interested in the
proportion of children in the family who prefer the color blue, and from
the table, we can see that p=.40 of the children prefer blue.

Name Alex (A) Betina (B) Carly (C) Debbie (D) Edward (E)
Color Green Blue Yellow Purple Blue

Let's say we didn't know the proportion of children who like blue as their
favorite color. We'll use repeated sampling to estimate the proportion.
Let's take repeated samples of size n=2, without replacement. Write all the
possible samples of size n=2 and of size n=4, and the respective probabilities
of the proportion of children who like blue. Demonstrate the sampling
distribution for n=2 and n=4.
Sample p (Blue) Probability
AB 1/2 1/10
AC 0 1/10
AD 0 1/10
AE 1/2 1/10
BC 1/2 1/10
BD 1/2 1/10
BE 1 1/10
CD 0 1/10
CE 1/2 1/10
DE 1/2 1/10

The probability mass function (PMF) is

P(Blue) 0 1/2 1
Probability 3/10 6/10 1/10
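
The same enumeration can be done in code, which also covers the n=4 case asked for in the exercise. This is a sketch with illustrative names; the favorite colors are those in the table for the five children.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

favorite = {"A": "Green", "B": "Blue", "C": "Yellow", "D": "Purple", "E": "Blue"}


def proportion_pmf(n):
    """PMF of the sample proportion preferring blue, for samples of size n."""
    samples = list(combinations(favorite, n))  # without replacement
    p_hats = [
        Fraction(sum(favorite[child] == "Blue" for child in s), n)
        for s in samples
    ]
    counts = Counter(p_hats)
    return {p: Fraction(c, len(samples)) for p, c in sorted(counts.items())}


pmf_n2 = proportion_pmf(2)
pmf_n4 = proportion_pmf(4)

for label, pmf in (("n=2", pmf_n2), ("n=4", pmf_n4)):
    expected = sum(p * prob for p, prob in pmf.items())
    print(label, {str(p): str(prob) for p, prob in pmf.items()}, "mean:", expected)
```

For both sample sizes the expected value of the sample proportion equals the population proportion 2/5.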
Sampling Distribution of the Sample Proportion

The sampling distribution of the sample proportion is approximately
normal if np>5 and n(1-p)>5.

Suppose it is known that 43% of Americans own an iPhone. If a
random sample of 50 Americans were surveyed, what is the probability
that the proportion of the sample who owned an iPhone is between
45% and 50%?
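
A sketch of how this problem might be worked: first check the normal-approximation conditions stated above, then use the standard error √(p(1-p)/n), which is the usual textbook formula and is assumed here.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.43, 50

# Conditions for the normal approximation, as stated above.
normal_ok = n * p > 5 and n * (1 - p) > 5

se = sqrt(p * (1 - p) / n)   # standard error of the sample proportion
p_hat = NormalDist(p, se)    # approximate distribution of the sample proportion

prob = p_hat.cdf(0.50) - p_hat.cdf(0.45)
print(f"conditions met: {normal_ok}")
print(f"P(0.45 < p-hat < 0.50) = {prob:.4f}")
```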
The Sampling Distribution of a Difference Between Two Means
Who's Taller at Ten, Boys or Girls?
A researcher claims that the heights (in inches) of ten-year-old girls follow
a Normal distribution N(56.4, 2.7), that is, with mean 56.4 and standard
deviation 2.7. The heights (in inches) of ten-year-old boys follow a Normal
distribution N(55.7, 3.8). A researcher takes independent SRSs of 12 girls
and 8 boys of this age and measures their heights. After analysing the data,
the researcher reports that the sample mean height of the boys is larger than
the sample mean height of the girls.

Describe the shape, center, and spread of the sampling distribution of $\bar{x}_f - \bar{x}_m$.
Because both population distributions are Normal, the sampling distribution
of $\bar{x}_f - \bar{x}_m$ is Normal.

Its mean is $\mu_f - \mu_m = 56.4 - 55.7 = 0.7$ inches.

Its standard deviation is

$\sqrt{\frac{2.7^2}{12} + \frac{3.8^2}{8}} = 1.55$ inches.

Find the probability of getting a difference in sample means
$\bar{x}_f - \bar{x}_m$ that is less than 0.

Standardize: when $\bar{x}_f - \bar{x}_m = 0$,

$z = \frac{0 - 0.70}{1.55} = -0.45$

The area to the left of $z = -0.45$ under the standard Normal curve is 0.3264.

Does the result above give us reason to doubt the researcher's stated results?
If the mean height of the boys is greater than the mean height
of the girls, then $\bar{x}_m > \bar{x}_f$, that is, $\bar{x}_f - \bar{x}_m < 0$.
The result above shows that there is about a 33% chance of getting a difference
in sample means that is negative just due to sampling variability.
This gives us little reason to doubt the researcher's claim.
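
The calculation above can be reproduced without rounding the intermediate z-score. The variable names are illustrative; the inputs are the means, standard deviations, and sample sizes stated in the example.

```python
from math import sqrt
from statistics import NormalDist

mu_f, sd_f, n_f = 56.4, 2.7, 12   # girls
mu_m, sd_m, n_m = 55.7, 3.8, 8    # boys

mean_diff = mu_f - mu_m                              # center of the difference
se_diff = sqrt(sd_f ** 2 / n_f + sd_m ** 2 / n_m)    # spread of the difference

# Probability that the difference (girls minus boys) is negative.
p_negative = NormalDist(mean_diff, se_diff).cdf(0)

print(f"mean = {mean_diff:.2f}, sd = {se_diff:.3f}")
print(f"P(difference < 0) = {p_negative:.4f}")
```

The unrounded result is very close to the table value 0.3264; the small gap comes from rounding z to -0.45.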
The Sampling Distribution of a Difference Between Two Proportions

To explore the sampling distribution of the difference between two
proportions, let's start with two populations having a known proportion of
successes.
• At School 1, 70% of students did their homework last night.
• At School 2, 50% of students did their homework last night.
Suppose the counselor at School 1 takes an SRS of 100 students and records
the sample proportion that did their homework.
School 2's counselor takes an SRS of 200 students and records the sample
proportion that did their homework.

What can we say about the difference $\hat{p}_1 - \hat{p}_2$ in the sample proportions?



The Sampling Distribution of a Difference Between Two Proportions
Example: Who Does More Homework?
Suppose that there are two large high schools, each with more than 2000 students, in a certain town. At School 1,
70% of students did their homework last night. Only 50% of the students at School 2 did their homework last
night. The counselor at School 1 takes an SRS of 100 students and records the proportion that did homework.
School 2's counselor takes an SRS of 200 students and records the proportion that did homework. School 1's
counselor and School 2's counselor meet to discuss the results of their homework surveys. After the meeting,
they both report to their principals that $\hat{p}_1 - \hat{p}_2 = 0.10$.

Describe the shape, center, and spread of the sampling distribution of $\hat{p}_1 - \hat{p}_2$.
Because $n_1 p_1 = 100(0.7) = 70$, $n_1(1 - p_1) = 100(0.30) = 30$, $n_2 p_2 = 200(0.5) = 100$
and $n_2(1 - p_2) = 200(0.5) = 100$ are all at least 10, the sampling distribution
of $\hat{p}_1 - \hat{p}_2$ is approximately Normal.

Its mean is $p_1 - p_2 = 0.70 - 0.50 = 0.20$.

Its standard deviation is

$\sqrt{\frac{0.7(0.3)}{100} + \frac{0.5(0.5)}{200}} = 0.058$.

Find the probability of getting a difference in sample proportions
$\hat{p}_1 - \hat{p}_2$ of 0.10 or less from the two surveys.

Standardize: when $\hat{p}_1 - \hat{p}_2 = 0.10$,

$z = \frac{0.10 - 0.20}{0.058} = -1.72$

The area to the left of $z = -1.72$ under the standard Normal curve is 0.0427.

Does the result above give us reason to doubt the counselors' reported value?

There is only about a 4% chance of getting a difference in sample proportions
as small as or smaller than the value of 0.10 reported by the counselors.
This does seem suspicious!
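
The homework example can be checked numerically in the same way. The sketch below uses the proportions and sample sizes stated in the example and keeps the z-score unrounded; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p1, n1 = 0.70, 100   # School 1
p2, n2 = 0.50, 200   # School 2

# Large-sample conditions used in the example (all counts at least 10).
counts = [n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)]
normal_ok = all(c >= 10 for c in counts)

mean_diff = p1 - p2
se_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Probability of a difference in sample proportions of 0.10 or less.
p_small = NormalDist(mean_diff, se_diff).cdf(0.10)

print(f"conditions met: {normal_ok}")
print(f"sd = {se_diff:.4f}")
print(f"P(difference <= 0.10) = {p_small:.4f}")
```

The unrounded probability is slightly below the table value 0.0427 but still about 4%, so the conclusion is unchanged.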
Critical values (zα) of Z

Critical values   Level of significance
                  1%          5%          10%
Two tailed        zα=±2.58    zα=±1.96    zα=±1.645
Right tailed      zα=2.33     zα=1.645    zα=1.28
Left tailed       zα=-2.33    zα=-1.645   zα=-1.28
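
The tabulated critical values are quantiles of the standard normal distribution and can be regenerated with `statistics.NormalDist.inv_cdf` (Python 3.8+). This is a sketch; the function name `critical_values` and the dictionary keys are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def critical_values(alpha):
    """Critical z values at significance level alpha."""
    return {
        "two_tailed": std_normal.inv_cdf(1 - alpha / 2),  # use as +/- this value
        "right_tailed": std_normal.inv_cdf(1 - alpha),
        "left_tailed": std_normal.inv_cdf(alpha),
    }


table = {alpha: critical_values(alpha) for alpha in (0.01, 0.05, 0.10)}
for alpha, row in table.items():
    print(f"{alpha:.0%}", {name: round(z, 3) for name, z in row.items()})
```

The computed values agree with the table after rounding to two or three decimal places.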
