Statistics Lecture Course 2022-2023


University of Technology

Civil Eng. Dept., all divisions

Second year, semester # 3

(CEPS 206)

Civil Engineering Department,

University of Technology,
Baghdad, Iraq.

Prof. Maan S. Hassan (PhD)

2022-2023

Partial List of Symbols

Introduction to Statistics

Definitions:

Statistics: The branch of scientific inquiry that provides methods for organizing
and summarizing data, and for using information in the data to draw various
conclusions.

Descriptive Statistics: The part of statistics that deals with methods for organization
and summarization of data. Descriptive methods can be used with a list of all
population members (a census), or when the data consist of a sample.

Inferential Statistics: The part of statistics used when the data are a sample and the
objective is to go beyond the sample to draw conclusions about the population based
on sample information.

Population: A population of participants or objects consists of all those participants
or objects that are relevant in a particular study.

Sample: A sample is any subset of the population of individuals or things under
study.

Probability function: A rule, denoted by p(x), that assigns numbers to elements
of the sample space.

Link between statistics and Probability

[Diagram: Population and Sample, linked by Probability (population to sample) and Statistics (sample to population).]

Three fundamental components of statistics
Statistics covers a wide range of goals, techniques and strategies. Three fundamental
components worth stressing are:

1. Design, meaning the planning and carrying out of a study.

2. Description, which refers to methods for summarizing data.

3. Inference, which refers to making predictions or generalizations about a population of individuals or
things based on a sample of observations available to us.

Numerical Summaries of Data

1.0 Summation notation

In symbols, adding the numbers X1, X2, . . . , Xn is denoted by

$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n,$$

where ∑ is an upper case Greek sigma. The subscript i is the index of summation,
and the 1 and n that appear respectively below and above the symbol ∑ designate
the range of the summation.

Example 1:

Measures of location:
The sample mean:
The first measure of location, called the sample mean, is just the average of the
values and is generally labeled X¯. The notation X¯ is read as X bar. In summation
notation,

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$

Example 1:

You sample ten married couples and determine the number of children they have.
The results are 0, 4, 3, 2, 2, 3, 2, 1, 0, 8.

The sample mean is: X¯ = (0+4+3+2+2+3+2+1+0+8)/10 = 2.5.

Of course, nobody has 2.5 children. The intention is to provide a number that is
centrally located among the 10 observations with the goal of conveying what is
typical.

Example 2

The salaries (in thousands of Iraqi dinars) of the 11 individuals currently working at the
company are:

300, 250, 320, 280, 350, 310, 300, 360, 290, 2000, 5000,

where the two largest salaries correspond to the vice president and president.

The average is about 887, but it gives a distorted sense of what is typical: nine of the
eleven salaries are 360 or less.

Outliers are values that are unusually large or small.
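The effect of the two outlying salaries on the mean can be checked numerically. The following is a minimal sketch in Python; the helper name `sample_mean` is an illustrative choice, not part of the course material.

```python
def sample_mean(values):
    """Return the sample mean: the sum of the values divided by their count."""
    return sum(values) / len(values)

# Salaries from Example 2, in thousands of Iraqi dinars.
salaries = [300, 250, 320, 280, 350, 310, 300, 360, 290, 2000, 5000]

mean_all = sample_mean(salaries)
# Drop the two largest values (vice president and president) to see their effect.
mean_without_top_two = sample_mean(sorted(salaries)[:-2])

print(round(mean_all, 1))              # 887.3
print(round(mean_without_top_two, 1))  # 306.7
```

Removing just two of the eleven values moves the mean from about 887 to about 307, which shows how sensitive the mean is to outliers.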

2.0 The median

Another important measure of location is called the sample median. The basic idea
is easily described using an example based on the weights of trout. The observed
weights were

1.1, 2.3, 1.7, 0.9, 3.1.

Putting the values in ascending order yields

0.9, 1.1, 1.7, 2.3, 3.1.

Notice that the value 1.7 divides the observations in the middle in the sense that
half of the remaining observations are less than 1.7 and half are larger.

If instead we have an even number of observations, there is no single middle value.
For example, consider the eight ordered values

0.8, 1.3, 1.8, 2.6, 2.7, 2.7, 3.1, 4.5

The sample median in this case is taken to be the average of the two middle values,
2.6 and 2.7, namely (2.6 + 2.7)/2 = 2.65.
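The two cases (odd and even number of observations) can be sketched in Python as follows. The function name `sample_median` is an illustrative assumption; the data are the trout weights, the eight values above, and the salaries from Example 2.

```python
def sample_median(values):
    """Return the middle value of the sorted data, or the average of the
    two middle values when the number of observations is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

trout = [1.1, 2.3, 1.7, 0.9, 3.1]
even_case = [0.8, 1.3, 1.8, 2.6, 2.7, 2.7, 3.1, 4.5]
salaries = [300, 250, 320, 280, 350, 310, 300, 360, 290, 2000, 5000]

print(sample_median(trout))                # 1.7
print(round(sample_median(even_case), 2))  # 2.65
print(sample_median(salaries))             # 310
```

For the salary data the median is 310, far closer to a typical salary than the mean of about 887, because the median is not pulled upward by the two outliers.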

Problems

4. Find the mean and median of the following sets of numbers. (a) −1, 03, 0, 2, −5.
(b) 2, 2, 3, 10, 100, 1,000.

5. The final exam scores for 15 students are 73, 74, 92, 98, 100, 72, 74, 85, 76, 94,
89, 73, 76, 99. Compute the mean and median.

6. The average of 23 numbers is 14.7. What is the sum of these numbers?

7. Consider the ten values 3, 6, 8, 12, 23, 26, 37, 42, 49, 63. The mean is X¯ = 26.9.

(a) What is the value of the mean if the largest value, 63, is increased to 100?
(b) What is the mean if 63 is increased to 1,000? (c) What is the mean if 63 is increased to
10,000?

8. Repeat the previous problem, only compute the median instead.

Measures of variation:
1.0 The range

The range is just the difference between the largest and smallest observations. In
symbols, it is X(n) − X(1), where X(1) and X(n) denote the smallest and largest of the
n observations.

2.0 The variance and standard deviation

Consider the following data, written in ascending order:

7.5, 8.0, 8.0, 8.5, 9.0, 11.0, 19.5, 19.5, 28.5, 31.0, 36.0.

The sample mean is X¯ ≈ 17 (rounded), so the deviation scores are

−9.5, −9.0, −9.0, −8.5, −8.0, −6.0, 2.5, 2.5, 11.5, 14.0, 19.0.

Deviation scores reflect how far each observation is from the mean, but often it is
best to find a single numerical quantity that summarizes the amount of variation in
our data.

One idea is to average the deviation scores, but the average difference from the
mean is always zero, so this approach is unsatisfactory.

Instead, the deviations are squared. The average squared difference from the mean,
computed with n − 1 rather than n in the denominator, is called the sample variance,
which is:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2.$$

The sample standard deviation is the (positive) square root of the variance, s.

Example 1

The following data are the sample test results

3,9,10,4,7,8,9,5,7,8.

The sample mean is X¯ = 7.

Listing each deviation Xi − X¯ and its square, the sum of the squared deviations is

∑(Xi − X¯)² = 48.

So, with n = 10 observations,
s² = 48/9 ≈ 5.33, and s = √5.33 ≈ 2.31.
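The calculation in this example can be reproduced step by step. This is a minimal sketch, assuming the divisor n − 1 used in the example (48/9); the function name `sample_variance` is illustrative.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

test_results = [3, 9, 10, 4, 7, 8, 9, 5, 7, 8]

mean = sum(test_results) / len(test_results)
sum_of_squares = sum((x - mean) ** 2 for x in test_results)
variance = sample_variance(test_results)
std_dev = math.sqrt(variance)

print(mean)                # 7.0
print(sum_of_squares)      # 48.0
print(round(variance, 2))  # 5.33
print(round(std_dev, 2))   # 2.31
```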

GRAPHICAL SUMMARIES OF DATA:

1.0 Relative frequencies

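The notation fx denotes the frequency, that is, the number of times the value x
occurs. The sample size is the sum of the frequencies,

n = ∑ fx,

and the relative frequency of the value x is fx/n.

Plots of relative frequencies help add perspective on the sample variance, mean
and median.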

Table 1: One hundred results

22222333333333333333333444444444444444444444
44455555555555555555555555556666666666666667
777777778888

Figure 1: Relative frequencies for the data in table 1.

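In terms of the frequencies, the sample mean and the sample variance can be written as

$$\bar{X} = \frac{1}{n}\sum_{x} x\, f_x, \qquad s^2 = \frac{1}{n-1}\sum_{x} f_x\,(x - \bar{X})^2,$$

where the sums run over the distinct observed values x.

The cumulative relative frequency distribution F(x) refers to the proportion of
observations less than or equal to a given value.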
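As a rough illustration, the frequency-based calculations can be sketched in Python. The data reused here are the ten test results from the variance example (3, 9, 10, 4, 7, 8, 9, 5, 7, 8), so the results can be compared with X¯ = 7 and s² ≈ 5.33; the variable and function names are illustrative assumptions.

```python
from collections import Counter

test_results = [3, 9, 10, 4, 7, 8, 9, 5, 7, 8]

freq = Counter(test_results)   # f_x: how many times each value x occurs
n = sum(freq.values())         # n is the sum of the frequencies
rel_freq = {x: f / n for x, f in sorted(freq.items())}

# Mean and variance computed from the frequencies rather than the raw list.
mean = sum(x * f for x, f in freq.items()) / n
variance = sum(f * (x - mean) ** 2 for x, f in freq.items()) / (n - 1)

def cumulative_relative_frequency(value):
    """F(value): proportion of observations less than or equal to value."""
    return sum(f for x, f in freq.items() if x <= value) / n

print(n, mean, round(variance, 2))       # 10 7.0 5.33
print(rel_freq[7])                       # 0.2
print(cumulative_relative_frequency(7))  # 0.5
```

The frequency-based formulas give the same mean and variance as the formulas applied to the raw observations, since they only group identical values together.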

Problems
1. Based on a sample of 100 individuals, the values 1, 2, 3, 4, 5 are observed with
relative frequencies 0.2, 0.3, 0.1, 0.25, 0.15. Compute the mean, variance and
standard deviation.

2. Fifty individuals are rated on how open minded they are. The ratings have the
values 1, 2, 3, 4 and the corresponding relative frequencies are 0.2, 0.24, 0.4, 0.16,
respectively. Compute the mean, variance and standard deviation.

3. For the values 0, 1, 2, 3, 4, 5, 6 the corresponding relative frequencies based on

a sample of 10,000 observations are 0.015625, 0.093750, 0.234375, 0.312500,
0.234375, 0.093750, 0.015625, respectively. Determine the mean, median,
variance, standard deviation and mode.

4. For a local charity, the donations in dollars received during the last month were
5, 10, 15, 20, 25, 50 having the frequencies 20, 30, 10, 40, 50, 5. Compute the
mean, variance and standard deviation.
5. The values 1, 5, 10, 20 have the frequencies 10, 20, 40, 30. Compute the mean,
variance and standard deviation.

Probability Theory

A random variable refers to a measurement or observation that cannot be known
in advance.

An experiment that can result in different outcomes, even though it is
repeated in the same manner every time, is called a random experiment.

A Roman letter is used to represent a random variable, the most common letter being X.
A lower-case x is used to represent an observed value corresponding to the random
variable X. So, the notation X = x means that the observed value of X is x.

The set of all possible outcomes of a random experiment, or equivalently the set of all
values of X we might observe, is called the sample space of the experiment. The sample
space is denoted as S.

EXAMPLE 1:
Consider an experiment in which you select a plastic pipe and measure its thickness.

The sample space can be taken as simply the positive real line, because a negative
value for thickness cannot occur:

S = R+ = { x | x > 0 }

If it is known that all connectors will be between 10 and 11 millimeters thick, the
sample space could be
S = { x | 10 < x < 11 }

If the objective of the analysis is to consider only whether a particular part is low,
medium, or high for thickness, the sample space might be taken to be the set of three
outcomes:

S = { low, medium, high }

If the objective of the analysis is to consider only whether or not a particular part
conforms to the manufacturing specifications, the sample space might be simplified
to the set of two outcomes,

S = { yes, no }

that indicate whether or not the part conforms.

A random variable is discrete if there are gaps between any value and the
next possible value.

A random variable is continuous if, for any two outcomes, any value
between these two values is possible.

EXAMPLE 2:

If two connectors are selected and measured, the sample space depends on the
objective of the study.

If the objective of the analysis is to consider only whether or not the parts conform
to the manufacturing specifications, either part may or may not conform. The sample
space can be represented by the four outcomes:

S = { yy, yn, ny, nn }

If we are only interested in the number of conforming parts in the sample, we

might summarize the sample space as

S = { 0, 1, 2 }

In random experiments in which items are selected from a batch, we will indicate
whether or not a selected item is replaced before the next one is selected. For
example, if the batch consists of three items {a, b, c} and our experiment is to select
two items without replacement, the sample space can be represented as

S_without = { ab, ac, ba, bc, ca, cb }

If the two items are selected with replacement, the sample space is

S_with = { aa, ab, ac, ba, bb, bc, ca, cb, cc }
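Both sample spaces can be enumerated mechanically. The sketch below uses Python's standard `itertools` module; the variable names are illustrative.

```python
from itertools import permutations, product

batch = ["a", "b", "c"]

# Without replacement: ordered pairs of two different items.
s_without = ["".join(pair) for pair in permutations(batch, 2)]

# With replacement: the same item may be selected twice.
s_with = ["".join(pair) for pair in product(batch, repeat=2)]

print(s_without)  # ['ab', 'ac', 'ba', 'bc', 'ca', 'cb']
print(s_with)     # ['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
```

With three items there are 3 × 2 = 6 outcomes without replacement and 3 × 3 = 9 outcomes with replacement.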

Events:

Often, we are interested in a collection of related outcomes from a random
experiment.

An event is a subset of the sample space of a random experiment.

Some of the basic set operations are summarized below in terms of events:

• The union of two events is the event that consists of all outcomes that are contained in
either of the two events. We denote the union as E1 ∪ E2.

• The intersection of two events is the event that consists of all outcomes that are contained
in both of the two events. We denote the intersection as E1 ∩ E2.

• The complement of an event in a sample space is the set of outcomes in the sample space
that are not in the event. We denote the complement of the event E as E′.

EXAMPLE 3:

Consider the sample space S = {yy, yn, ny, nn} in Example 2. Suppose that the set of
all outcomes for which at least one part conforms is denoted as E1. Then,

E1 = { yy, yn, ny }

The event in which both parts do not conform, denoted as E2, contains only the single
outcome, E2 = {nn}. Other examples of events are E3 = Ø, the null set, and E4 = S, the
sample space. If E5 = {yn, ny, nn}, then

E1 ∪ E5 = S
E1 ∩ E5 = { yn, ny }
E1′ = { nn }
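Because events are sets, the operations in this example map directly onto Python's built-in `set` type. A minimal sketch, with variable names chosen for illustration:

```python
# Sample space and events from Example 3.
S = {"yy", "yn", "ny", "nn"}
E1 = {"yy", "yn", "ny"}   # at least one part conforms
E5 = {"yn", "ny", "nn"}

union = E1 | E5           # outcomes in either event
intersection = E1 & E5    # outcomes in both events
complement_E1 = S - E1    # outcomes of S that are not in E1

print(union == S)            # True
print(sorted(intersection))  # ['ny', 'yn']
print(complement_E1)         # {'nn'}
```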

EXAMPLE 4:

Measurements of the time needed to complete a chemical reaction might be
modeled with the sample space S = R+, the set of positive real numbers. Let

E1 = { x | 1 ≤ x < 10 } and E2 = { x | 3 < x < 118 }

Then,
E1 ∪ E2 = { x | 1 ≤ x < 118 } and E1 ∩ E2 = { x | 3 < x < 10 }

Also,

E1′ = { x | 0 < x < 1 or x ≥ 10 } and E1′ ∩ E2 = { x | 10 ≤ x < 118 }

EXAMPLE 5:

Samples of concrete surface are analyzed for abrasion resistance and impact
strength. The results from 50 samples are summarized as follows:

                              impact strength
                              High    Low
abrasion resistance   High    40      4
                      Low     1       5

Let A denote the event that a sample has high impact strength, and
let B denote the event that a sample has high abrasion resistance.

Determine the number of samples in A ∩ B, A′, and A ∪ B.

The event A ∩ B consists of the 40 samples for which abrasion resistance and impact
strength are high. The event A′ consists of the 9 samples in which the impact strength
is low. The event A ∪ B consists of the 45 samples in which the abrasion resistance,
impact strength, or both are high.
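The three counts can be checked against the table with a short sketch. The dictionary layout and the helper `count_where` are illustrative assumptions, not part of the course material.

```python
# Counts from the Example 5 table, keyed by (abrasion resistance, impact strength).
counts = {
    ("high", "high"): 40,
    ("high", "low"): 4,
    ("low", "high"): 1,
    ("low", "low"): 5,
}

def count_where(condition):
    """Add up the cells of the table that satisfy the given condition."""
    return sum(n for (abrasion, impact), n in counts.items()
               if condition(abrasion, impact))

# A: high impact strength, B: high abrasion resistance.
n_a_and_b = count_where(lambda abrasion, impact: impact == "high" and abrasion == "high")
n_not_a = count_where(lambda abrasion, impact: impact != "high")
n_a_or_b = count_where(lambda abrasion, impact: impact == "high" or abrasion == "high")

print(n_a_and_b, n_not_a, n_a_or_b)  # 40 9 45
```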

Figure 1: Venn diagrams

Venn diagrams are often used to describe relationships between events and sets.

Two events, denoted as E1 and E2, such that

E1∩E2 = Ø

are said to be mutually exclusive.

The two events in Fig. 1(b) are mutually exclusive, whereas the two events in Fig. 1(a) are not. Additional results
involving events are summarized below. The definition of the complement of an event implies that

(E′)′ = E

The distributive law for set operations implies that

(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) and (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)

Table 1: Corresponding statements in set theory and probability theory (table not reproduced)

Probability is used to quantify the likelihood, or chance, that an outcome of a
random experiment will occur. "The chance of rain today is 30%" is a statement
that quantifies our feeling about the possibility of rain.

A probability of 0 indicates an outcome will not occur. A probability of 1 indicates an
outcome will occur with certainty.

Fig. 2: Probability of the event E is the sum of the probabilities of the outcomes in E.

For a discrete sample space, the probability of an event E, denoted as P(E),

equals the sum of the probabilities of the outcomes in E.

EXAMPLE 6:

A random experiment can result in one of the outcomes {a, b, c, d} with probabilities
0.1, 0.3, 0.5, and 0.1, respectively. Let A denote the event {a, b}, B the event {b, c,
d}, and C the event {d}. Then,

P(A) = 0.1 + 0.3 = 0.4
P(B) = 0.3 + 0.5 + 0.1 = 0.9
P(C) = 0.1

Also, P(A′) = 0.6, P(B′) = 0.1, and P(C′) = 0.9. Because A ∩ B = {b},
A ∪ B = {a, b, c, d}, and A ∩ C = Ø,

P(A ∩ B) = 0.3
P(A ∪ B) = 1
P(A ∩ C) = 0
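The rule that P(E) is the sum of the probabilities of the outcomes in E can be sketched as a small function. The name `P` and the rounding to ten decimals (used only to hide floating-point noise) are illustrative choices.

```python
from math import fsum

# Outcome probabilities from Example 6.
prob = {"a": 0.1, "b": 0.3, "c": 0.5, "d": 0.1}
S = set(prob)

def P(event):
    """Probability of an event: the sum of the probabilities of its outcomes."""
    return round(fsum(prob[outcome] for outcome in event), 10)

A = {"a", "b"}
B = {"b", "c", "d"}
C = {"d"}

print(P(A), P(B), P(C))              # 0.4 0.9 0.1
print(P(S - A), P(S - B), P(S - C))  # 0.6 0.1 0.9
print(P(A & B), P(A | B), P(A & C))  # 0.3 1.0 0.0
```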

EXAMPLE 7:
A visual inspection of defect locations in a concrete element manufacturing
process resulted in the following table:

Number of defects    Proportion of concrete elements

0                    0.4
1                    0.2
2                    0.15
3                    0.1
4                    0.05
5 or more            0.1

a) If one element is selected randomly from this process for inspection, what is
the probability that it contains no defects?

The event that there is no defect in the inspected concrete element, denoted as E1,
can be considered to be comprised of the single outcome,

E1 = {0}.

Therefore, P(E1) = 0.4

b) What is the probability that it contains 3 or more defects?

Let E2 denote the event that it contains 3 or more defects. Adding the probabilities
of the outcomes 3, 4, and 5 or more gives

P(E2) = 0.1 + 0.05 + 0.1 = 0.25

EXAMPLE 8:
Suppose that a batch contains six parts with part numbers {a, b, c, d, e, f}. Suppose
that two parts are selected without replacement. Let E denote the event that the part
number of the first part selected is a. Then E can be written as E = {ab, ac, ad, ae, af}.
The sample space can be counted: it has 6 × 5 = 30 outcomes. If each outcome is equally
likely,

P(E) = 5/30 = 1/6
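For a sample space of equally likely outcomes, the probability of an event is the number of outcomes in the event divided by the size of the sample space. A minimal sketch that enumerates this example, with illustrative variable names:

```python
from fractions import Fraction
from itertools import permutations

parts = "abcdef"

# All ordered selections of two different parts (without replacement).
sample_space = ["".join(pair) for pair in permutations(parts, 2)]

# Event E: the first part selected is a.
E = [outcome for outcome in sample_space if outcome[0] == "a"]

p_E = Fraction(len(E), len(sample_space))

print(len(sample_space))  # 30
print(E)                  # ['ab', 'ac', 'ad', 'ae', 'af']
print(p_E)                # 1/6
```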

ADDITION RULES

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

If A and B are mutually exclusive, then P(A ∩ B) = 0 and the rule reduces to
P(A ∪ B) = P(A) + P(B).

EXAMPLE 9:

The defects such as those described in Example 7 were further classified as either in
the "center" or at the "edge" of the concrete elements, and by the degree of damage.
The table "Defects Classified by Location and Degree" below shows the proportion of
elements in each category. What is the probability that an element has its defects at
the edge or that it contains four or more defects?

Location in Concrete Element Surface

Defects    Center    Edge    Total
Low        514       68      582
High       112       246     358
Total      626       314     940

Let E1 denote the event that an element contains four or more defects, and let E2 denote
the event that its defects are at the edge.

Defects Classified by Location and Degree

Number of defects    Center    Edge    Totals
0                    0.30      0.10    0.40
1                    0.15      0.05    0.20
2                    0.10      0.05    0.15
3                    0.06      0.04    0.10
4                    0.04      0.01    0.05
5 or more            0.07      0.03    0.10
Totals               0.72      0.28    1.00

The requested probability is P(E1 ∪ E2). Now, P(E1) = 0.05 + 0.10 = 0.15 and P(E2) = 0.28. Also,
from the table above, P(E1 ∩ E2) = 0.01 + 0.03 = 0.04.

Therefore, P(E1 ∪ E2) = 0.15 + 0.28 − 0.04 = 0.39

What is the probability that a concrete surface contains fewer than two defects (denoted
as E3) or that its defects are at the edge and it contains more than four defects (denoted as
E4)?

The requested probability is P(E3 ∪ E4). Now P(E3) = 0.40 + 0.20 = 0.6, and P(E4) = 0.03. Also,
E3 and E4 are mutually exclusive.

Therefore, E3 ∩ E4 = Ø, so P(E3 ∩ E4) = 0

and P(E3 ∪ E4) = 0.6 + 0.03 = 0.63
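The addition rule can be verified directly from the table of proportions. The sketch below is one possible encoding of that table; the dictionary layout and variable names are assumptions made for illustration.

```python
# Proportions from "Defects Classified by Location and Degree": (center, edge).
table = {
    "0": (0.30, 0.10),
    "1": (0.15, 0.05),
    "2": (0.10, 0.05),
    "3": (0.06, 0.04),
    "4": (0.04, 0.01),
    "5 or more": (0.07, 0.03),
}
four_or_more = {"4", "5 or more"}
fewer_than_two = {"0", "1"}

# E1: four or more defects, E2: defects at the edge.
p_e1 = sum(center + edge for k, (center, edge) in table.items() if k in four_or_more)
p_e2 = sum(edge for center, edge in table.values())
p_e1_and_e2 = sum(edge for k, (center, edge) in table.items() if k in four_or_more)
p_e1_or_e2 = p_e1 + p_e2 - p_e1_and_e2

# E3: fewer than two defects, E4: at the edge with more than four defects.
# The two events share no table cell, so their probabilities simply add.
p_e3 = sum(center + edge for k, (center, edge) in table.items() if k in fewer_than_two)
p_e4 = table["5 or more"][1]
p_e3_or_e4 = p_e3 + p_e4

print(round(p_e1_or_e2, 2))  # 0.39
print(round(p_e3_or_e4, 2))  # 0.63
```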

For the case of three events:

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)

EXAMPLE 9:

Let X denote the pH of a sample. Consider the event that X is greater than 6.5 but
less than or equal to 7.8. This probability is the sum of any collection of mutually
exclusive events with union equal to the same range for X. One example is:

Another example is

The best choice depends on the particular probabilities available.

CONDITIONAL PROBABILITY

In a manufacturing process, 10% of the parts contain visible surface flaws and 25%
of the parts with surface flaws are (functionally) defective parts. However, only 5%
of parts without surface flaws are defective parts. The probability of a defective part
depends on our knowledge of the presence or absence of a surface flaw.

Let D denote the event that a part is defective

and let F denote the event that a part has a surface flaw.

Then we denote the probability of D given, or assuming, that a part has a surface flaw as
P(D | F). This notation is read as the conditional probability of D given F, and it is
interpreted as the probability that a part is defective, given that the part has a surface
flaw. Here, P(D | F) = 0.25 and P(D | F′) = 0.05.

EXAMPLE 1:

Table 1 below provides an example of 400 parts classified by surface flaws and as
(functionally) defective. For this table the conditional probabilities match those
discussed previously in this section. For example, of the parts with surface flaws (40
parts) the number defective is 10.

Table 1: Parts Classified

                              Surface flaws
                              Yes (event F)    No     Total
Defective    Yes (event D)    10               18     28
             No               30               342    372
             Total            40               360    400

Therefore,

P(D | F) = 10/40 = 0.25

and of the parts without surface flaws (360 parts) the number defective is 18.
Therefore,

P(D | F′) = 18/360 = 0.05

Figure 1: Tree diagram for parts classified

Therefore, P(B | A) can be interpreted as the relative frequency of event B among
the trials that produce an outcome in event A.

EXAMPLE 2:

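Again, consider the 400 parts in Table 1 above (Example 1). From this table,

P(D | F) = P(D ∩ F) / P(F) = (10/400) / (40/400) = 10/40

Note that in this example all four of the following probabilities are different:

P(F) = 40/400, P(F | D) = 10/28, P(D) = 28/400, P(D | F) = 10/40

Here, P(D) and P(D | F) are probabilities of the same event, but they are computed
under two different states of knowledge.

Similarly, P(F) and P(F | D) are probabilities of the same event computed under two
different states of knowledge.

The tree diagram in Fig. 1 can also be used to display conditional probabilities.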
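The conditional probabilities can be computed as relative frequencies from the counts in Table 1. This is a minimal sketch; the dictionary keys and variable names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Counts from Table 1, keyed by (defective?, surface flaw?).
counts = {
    (True, True): 10,
    (True, False): 18,
    (False, True): 30,
    (False, False): 342,
}
total = sum(counts.values())                                 # 400
n_flawed = counts[(True, True)] + counts[(False, True)]      # 40
n_defective = counts[(True, True)] + counts[(True, False)]   # 28

p_f = Fraction(n_flawed, total)
p_d = Fraction(n_defective, total)
# Conditioning restricts attention to the parts in the given event.
p_d_given_f = Fraction(counts[(True, True)], n_flawed)
p_f_given_d = Fraction(counts[(True, True)], n_defective)
p_d_given_not_f = Fraction(counts[(True, False)], total - n_flawed)

print(p_f, p_d, p_d_given_f, p_f_given_d, p_d_given_not_f)
# 1/10 7/100 1/4 5/14 1/20
```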

Permutations

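Another useful calculation is the number of ordered sequences of the elements of a
set. Consider a set of elements, such as S = {a, b, c}. A permutation of the elements
is an ordered sequence of the elements. For example, abc, acb, bac, bca, cab, and
cba are all of the permutations of the elements of S. In general, the number of
permutations of n different elements is n! = n × (n − 1) × ... × 2 × 1.

In some situations, we are interested in the number of arrangements of only some of
the elements of a set. The following result also follows from the multiplication rule:
the number of permutations of subsets of r elements selected from a set of n different
elements is

$$P_r^n = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!}$$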

EXAMPLE 3:

A printed circuit board has eight different locations in which a component can be
placed. If four different components are to be placed on the board, how many
different designs are possible?

Each design consists of selecting a location from the eight locations for the first
component, a location from the remaining seven for the second component, a
location from the remaining six for the third component, and a location from the
remaining five for the fourth component. Therefore, by the multiplication rule,
8 × 7 × 6 × 5 = 8!/4! = 1680 different designs are possible.
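The count of ordered arrangements can be checked in two ways: with the closed-form count and by brute-force enumeration. The sketch below assumes Python 3.8 or later, where `math.perm` is available.

```python
import math
from itertools import permutations

# All orderings of the three elements of S = {a, b, c}.
print(["".join(p) for p in permutations("abc")])
# ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']

# Ordered placements of 4 different components in 8 locations: 8!/(8-4)!.
print(math.perm(8, 4))  # 1680

# The same count by enumerating every ordered choice of 4 of the 8 locations.
designs = sum(1 for _ in permutations(range(8), 4))
print(designs)          # 1680
```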

Combinations

Another counting problem of interest is the number of subsets of r elements that can
be selected from a set of n elements. Here, order is not important. The number of such
subsets, called combinations, is

$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$

EXAMPLE 4:

A printed circuit board has eight different locations in which a component can be
placed. If five identical components are to be placed on the board, how many
different designs are possible? Each design is a subset of the eight locations that are
to contain the components. From the equation above, the number of possible designs
is

$$\binom{8}{5} = \frac{8!}{5!\,3!} = 56$$

The following example uses the multiplication rule in combination with the above
equation to answer a more difficult, but common, question.

EXAMPLE 5:

A bin of 50 manufactured parts contains three defective parts and 47 non-defective
parts. A sample of six parts is selected from the 50 parts. Selected parts are not
replaced. That is, each part can only be selected once and the sample is a subset of
the 50 parts. How many different samples are there of size six that contain exactly
two defective parts?

A subset containing exactly two defective parts can be formed by first choosing the
two defective parts from the three defective parts. This first step can be completed in

$$\binom{3}{2} = \frac{3!}{2!\,1!} = 3 \text{ different ways.}$$

Then, the second step is to select the remaining four parts from the 47 acceptable
parts in the bin. The second step can be completed in

$$\binom{47}{4} = \frac{47!}{4!\,43!} = 178{,}365 \text{ different ways.}$$

Therefore, from the multiplication rule, the number of subsets of size six that contain
exactly two defective items is

3 × 178,365 = 535,095

As an additional computation, the total number of different subsets of size six is
found to be

$$\binom{50}{6} = \frac{50!}{6!\,44!} = 15{,}890{,}700$$

Therefore, the probability that a sample contains exactly two defective parts is

$$\frac{535{,}095}{15{,}890{,}700} \approx 0.034$$
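The counts in Examples 4 and 5 can be reproduced with the standard-library function `math.comb` (available from Python 3.8). The variable names below are illustrative.

```python
from math import comb

# Example 4: subsets of 5 locations chosen from 8.
print(comb(8, 5))  # 56

# Example 5: samples of six parts with exactly two defective parts.
ways_defective = comb(3, 2)    # choose 2 of the 3 defective parts
ways_good = comb(47, 4)        # choose 4 of the 47 non-defective parts
favourable = ways_defective * ways_good   # multiplication rule
total_samples = comb(50, 6)    # all subsets of size six

probability = favourable / total_samples

print(ways_defective, ways_good, favourable)  # 3 178365 535095
print(total_samples)                          # 15890700
print(round(probability, 3))                  # 0.034
```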

Distributions

Discrete Distributions:

Continuous Distributions:

Definition:

BINOMIAL DISTRIBUTION:

Definition:

EXAMPLE 1:

Each sample of water has a 10% chance of containing a particular organic pollutant.
Assume that the samples are independent with regard to the presence of the pollutant.
Find the probability that in the next 18 samples, exactly 2 contain the pollutant. Let
X = the number of samples that contain the pollutant in the next 18 samples analyzed.
Then X is a binomial random variable with p = 0.1 and n = 18. Therefore, using the
binomial probability mass function $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$,

$$P(X = 2) = \binom{18}{2}(0.1)^2(0.9)^{16} = 153\,(0.1)^2(0.9)^{16} \approx 0.284$$

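Determine the probability that at least four samples contain the pollutant.

The requested probability is

$$P(X \ge 4) = \sum_{x=4}^{18}\binom{18}{x}(0.1)^x(0.9)^{18-x}$$

However, it is easier to use the complementary event,

$$P(X \ge 4) = 1 - P(X < 4) = 1 - \sum_{x=0}^{3}\binom{18}{x}(0.1)^x(0.9)^{18-x} = 1 - (0.150 + 0.300 + 0.284 + 0.168) = 0.098$$

Determine the probability that 3 ≤ X < 7. Now

$$P(3 \le X < 7) = \sum_{x=3}^{6}\binom{18}{x}(0.1)^x(0.9)^{18-x} = 0.168 + 0.070 + 0.022 + 0.005 = 0.265$$

The mean and variance of a binomial random variable depend only on the parameters
p and n: E(X) = np and V(X) = np(1 − p).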
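The binomial probabilities in this example can be reproduced with a few lines of Python. This is a minimal sketch using the standard binomial probability mass function; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 18, 0.1

p_exactly_2 = binomial_pmf(2, n, p)
# Complement rule: P(X >= 4) = 1 - P(X <= 3).
p_at_least_4 = 1 - sum(binomial_pmf(x, n, p) for x in range(4))
# P(3 <= X < 7) sums the probabilities for x = 3, 4, 5, 6.
p_3_to_6 = sum(binomial_pmf(x, n, p) for x in range(3, 7))

print(round(p_exactly_2, 3))   # 0.284
print(round(p_at_least_4, 3))  # 0.098
print(round(p_3_to_6, 3))      # 0.265

# Mean np and variance np(1 - p).
print(round(n * p, 2), round(n * p * (1 - p), 2))  # 1.8 1.62
```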

EXERCISES:

1. For each scenario described below, state whether or not the binomial distribution is a reasonable
model for the random variable and why. State any assumptions you make.

(a) A production process produces thousands of temperature transducers. Let X denote the number
of nonconforming transducers in a sample of size 30 selected at random from the process.

(b) From a batch of 50 temperature transducers, a sample of size 30 is selected without

replacement. Let X denote the number of nonconforming transducers in the sample.

(c) Four identical electronic components are wired to a controller that can switch from a failed
component to one of the remaining spares. Let X denote the number of components that have failed
after a specified period of operation.

(d) Defects occur randomly over the surface of a semiconductor chip. However, only 80% of
defects can be found by testing. A sample of 40 chips with one defect each is tested. Let X denote
the number of chips in which the test finds a defect.

2. The random variable X has a binomial distribution with n=10 and p=0.5. Determine the
following probabilities:
(a) P(X = 5) (b) P(X ≤ 2) (c) P(X ≥ 9) (d) P (3 ≤ X < 5)

3. Sketch the probability mass function of a binomial distribution with n =10 and p = 0.01 and
comment on the shape of the distribution.
(a) What value of X is most likely? (b) What value of X is least likely?

4. Batches that consist of 50 concrete blocks from a production process are checked for
conformance to building requirements. The mean number of nonconforming concrete blocks in a
batch is 5. Assume that the number of nonconforming concrete blocks in a batch, denoted as X, is
a binomial random variable.
(a) What are n and p? (b) What is P(X ≤ 2)? (c) What is P(X ≥ 49)?

5. A manufacturing process has 100 customer orders to fill. Each order requires one component
part that is purchased from a supplier. However, typically, 2% of the components are identified as
defective, and the components can be assumed to be independent.
a) If the manufacturer stocks 100 components, what is the probability that the 100 orders can
be filled without reordering components?
b) If the manufacturer stocks 102 components, what is the probability that the 100 orders can
be filled without reordering components?
c) If the manufacturer stocks 105 components, what is the probability that the 100 orders can
be filled without reordering components?

(This exercise illustrates that poor quality can affect schedules and costs).

POISSON DISTRIBUTION:

EXAMPLE 2:

For the case of the thin copper wire, suppose that the number of flaws follows a
Poisson distribution with a mean of 2.3 flaws per millimeter. Determine the
probability of exactly 2 flaws in 1 millimeter of wire. Let X denote the number of
flaws in 1 millimeter of wire. Then, E(X) = λ = 2.3 flaws and, using the Poisson
probability mass function $P(X = x) = e^{-\lambda}\lambda^x / x!$,

$$P(X = 2) = \frac{e^{-2.3}\,2.3^{2}}{2!} \approx 0.265$$

Determine the probability of 10 flaws in 5 millimeters of wire. Let X denote the
number of flaws in 5 millimeters of wire. Then, X has a Poisson distribution with

E(X) = 5 mm × 2.3 flaws/mm = 11.5 flaws

Therefore,

$$P(X = 10) = \frac{e^{-11.5}\,11.5^{10}}{10!} \approx 0.113$$

Determine the probability of at least 1 flaw in 2 millimeters of wire. Let X denote
the number of flaws in 2 millimeters of wire. Then, X has a Poisson distribution with
E(X) = 2 mm × 2.3 flaws/mm = 4.6 flaws
Therefore,

$$P(X \ge 1) = 1 - P(X = 0) = 1 - e^{-4.6} \approx 0.9899$$
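The three Poisson probabilities in this example can be checked numerically. The sketch below uses the standard Poisson probability mass function; the function name `poisson_pmf` and the variable names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, mean):
    """P(X = x) for a Poisson random variable with the given mean."""
    return exp(-mean) * mean**x / factorial(x)

flaws_per_mm = 2.3

# The mean scales with the length of wire inspected.
p_2_in_1mm = poisson_pmf(2, flaws_per_mm * 1)
p_10_in_5mm = poisson_pmf(10, flaws_per_mm * 5)
p_at_least_1_in_2mm = 1 - poisson_pmf(0, flaws_per_mm * 2)

print(round(p_2_in_1mm, 3))           # 0.265
print(round(p_10_in_5mm, 3))          # 0.113
print(round(p_at_least_1_in_2mm, 4))  # 0.9899
```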

EXERCISES:

Figure: Density of a loading on a long, thin beam.

Figure: Probability determined from the area under f(x).

Definition:

For the density function of a loading on a long thin beam, because every point has
zero width, the loading at any point is zero. Similarly, for a continuous random
variable X and any value x,
P(X = x) = 0

EXAMPLE:

Let the continuous random variable X denote the diameter of a hole drilled in a sheet
metal component. The target diameter is 12.5 mm. Most random disturbances to the
process result in larger diameters. Historical data show that the distribution of X can
be modeled by a probability density function $f(x) = 20e^{-20(x-12.5)}$, x ≥ 12.5.

If a part with a diameter larger than 12.60 millimeters is scrapped, what proportion
of parts is scrapped? The density function and the requested probability are shown
in Fig. 2. A part is scrapped if X > 12.60. Now,

$$P(X > 12.60) = \int_{12.6}^{\infty} 20e^{-20(x-12.5)}\,dx = \left.-e^{-20(x-12.5)}\right|_{12.6}^{\infty} = e^{-2} \approx 0.135$$

What proportion of parts is between 12.5 and 12.6 millimeters? Now,

$$P(12.5 < X < 12.6) = \int_{12.5}^{12.6} 20e^{-20(x-12.5)}\,dx = 1 - e^{-2} \approx 0.865$$

Because the total area under f(x) equals 1, we can also calculate

P(12.5 < X < 12.6) = 1 − P(X > 12.6) = 1 − 0.135 = 0.865.
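Probabilities for a continuous random variable are areas under the density, so they can be approximated by numerical integration. The following sketch uses a simple midpoint rule (an illustrative choice, not a method from the course) and compares it with the closed-form value e^(−2).

```python
from math import exp

def density(x):
    """f(x) = 20 exp(-20 (x - 12.5)) for x >= 12.5, and 0 otherwise."""
    return 20 * exp(-20 * (x - 12.5)) if x >= 12.5 else 0.0

def integrate(f, a, b, steps=100_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

p_between = integrate(density, 12.5, 12.6)  # P(12.5 < X < 12.6)
p_scrapped = 1 - p_between                  # P(X > 12.6), total area is 1

print(round(p_between, 3))   # 0.865
print(round(p_scrapped, 3))  # 0.135
print(round(exp(-2), 3))     # 0.135
```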

Figure 2: Probability density function

EXERCISES:

NORMAL DISTRIBUTION:

Figure: Normal probability density functions for selected values of the parameters µ and σ²

Definition:

EXAMPLE 4:
Assume that the current measurements in a strip of wire follow a normal distribution
with a mean of 10 mA and a variance of 4 (mA)². What is the probability that a
measurement exceeds 13 mA?

Let X denote the current in mA. The requested probability can be represented as
P(X > 13).

This probability is shown as the shaded area under the normal probability density
function in Fig. 3.

Some useful results concerning a normal distribution are summarized below and in
Fig. 4. For any normal random variable,

P(µ − σ < X < µ + σ) = 0.6827
P(µ − 2σ < X < µ + 2σ) = 0.9545
P(µ − 3σ < X < µ + 3σ) = 0.9973

Definition:

Summary of Common Probability Distributions

Figure 5: Graphical displays for standard normal distributions.

EXAMPLE 6:

Suppose the current measurements in a strip of wire are assumed to follow a normal
distribution with a mean of 10 mA and a variance of 4 (mA)². What is the probability
that a measurement will exceed 13 mA?

Let X denote the current in mA.

The requested probability can be represented as P(X > 13).

Let Z = (X − 10)/2.

We note that X > 13 corresponds to Z > 1.5. Therefore, from Appendix Table II,

P(X > 13) = P(Z > 1.5) = 1 − P(Z ≤ 1.5) = 1 − 0.93319 = 0.06681

EXAMPLE 7: Continuing the previous example, what is the probability that a
current measurement is between 9 and 11 mA?

Standardizing both limits gives

P(9 < X < 11) = P((9 − 10)/2 < Z < (11 − 10)/2) = P(−0.5 < Z < 0.5)
= P(Z < 0.5) − P(Z < −0.5) = 0.69146 − 0.30854 = 0.38292

Determine the value for which the probability that a current measurement is below
this value is 0.98. The requested value is shown graphically in the figure below. We
need the value of x such that P(X < x) = 0.98. By standardizing, this probability
expression can be written as

P(X < x) = P(Z < (x − 10)/2) = 0.98

Appendix Table II is used to find the z-value such that P(Z < z) = 0.98. The nearest
probability from Table II results in

P(Z < 2.05) = 0.97982

Therefore, (x − 10)/2 = 2.05, and the standardizing transformation is used in reverse
to solve for x. The result is
x = 2(2.05) + 10 = 14.1 mA
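The table lookups in Examples 6 and 7 can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch for verification only; the course itself works from Appendix Table II.

```python
from statistics import NormalDist

# Current in mA: mean 10, variance 4, so standard deviation 2.
current = NormalDist(mu=10, sigma=2)

p_exceeds_13 = 1 - current.cdf(13)
p_between_9_and_11 = current.cdf(11) - current.cdf(9)
x_98 = current.inv_cdf(0.98)  # value below which 98% of measurements fall

print(round(p_exceeds_13, 5))        # 0.06681
print(round(p_between_9_and_11, 5))  # 0.38292
print(round(x_98, 1))                # 14.1
```

The inverse calculation uses the exact z-value (about 2.054) rather than the nearest table entry 2.05, so the two answers agree only after rounding to one decimal place.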

EXAMPLE 8: The diameter of a shaft in an optical storage drive is normally
distributed with mean 0.2508 inch and standard deviation 0.0005 inch. The
specifications on the shaft are 0.2500 ± 0.0015 inch. What proportion of shafts
conforms to specifications?

Let X denote the shaft diameter in inches. The requested probability is shown in the
figure below and

P(0.2485 < X < 0.2515) = P((0.2485 − 0.2508)/0.0005 < Z < (0.2515 − 0.2508)/0.0005)
= P(−4.6 < Z < 1.4) = P(Z < 1.4) − P(Z < −4.6) = 0.91924 − 0.0000 = 0.91924

Most of the nonconforming shafts are too large, because the process mean is located
very near to the upper specification limit. If the process is centered so that the process
mean is equal to the target value of 0.2500,

P(0.2485 < X < 0.2515) = P(−3 < Z < 3) = P(Z < 3) − P(Z < −3) = 0.99865 − 0.00135 = 0.9973

By recentering the process, the yield is increased to approximately 99.73%.
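The effect of recentering the process can be explored with a short sketch based on `statistics.NormalDist`. The function name `yield_fraction` is an illustrative assumption.

```python
from statistics import NormalDist

LOWER, UPPER = 0.2485, 0.2515   # specification limits: 0.2500 +/- 0.0015 inch

def yield_fraction(process_mean, std_dev=0.0005):
    """Proportion of shafts whose diameter falls inside the specification limits."""
    diameter = NormalDist(mu=process_mean, sigma=std_dev)
    return diameter.cdf(UPPER) - diameter.cdf(LOWER)

print(round(yield_fraction(0.2508), 4))  # 0.9192
print(round(yield_fraction(0.2500), 4))  # 0.9973
```

Shifting the process mean by only 0.0008 inch toward the target raises the yield from about 92% to about 99.7%.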

EXERCISES:

SAMPLING THEORY
Link between Population and Sampling:

[Diagram: Population and Sample, linked by Probability (population to sample) and Statistics (sample to population).]

1.0 SAMPLING DISTRIBUTIONS

Statistical inference is concerned with making decisions about a population based
on the information contained in a random sample from that population.

For instance, the mean fill volume of a can (population) is required to be 300 ml.

An engineer takes a random sample of 25 cans and computes the sample
average fill volume to be

x‾ = 298 ml

The engineer will probably decide that the population mean is µ = 300 ml,
even though the sample mean was 298 ml, because he or she knows that the sample
mean is a reasonable estimate of µ and that a sample mean of 298 ml is very likely
to occur, even if the true population mean is µ = 300 ml.

Test values of x‾ vary both above and below µ = 300 ml.

The sampling distribution of a statistic depends on:

• The distribution of the population,

• The size of the sample, and
• The method of sample selection.

2.0 SAMPLING METHODS:

1. Random sampling
2. Systematic sampling
3. Stratified sampling
4. Multi-stage sampling

3.0 SAMPLING DISTRIBUTIONS OF MEANS

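Suppose that a random sample of size n is taken from a normal population with
mean µ and variance σ².

Now each observation in this sample, say, X1, X2, X3, …, Xn, is a normally and
independently distributed random variable with mean µ and variance σ².

The sample mean:

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

has a normal distribution with mean:

$$\mu_{\bar{X}} = \mu$$

and variance:

$$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \quad \text{(for large } N\text{)}$$

$$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1} \quad \text{(for small } N\text{)}$$

where N is the population size.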

Theorem:

EXAMPLE 1:

An electronics company manufactures resistors that have a mean resistance of 100 ohms
and a standard deviation of 10 ohms. The distribution of resistance is normal.
Find the probability that a random sample of n= 25 resistors will have an average resistance
less than 95 ohms.

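Note that the sampling distribution of $\bar{x}$ is normal, with mean $\mu_{\bar{x}}$ = 100 ohms and a
standard deviation of:

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{10}{\sqrt{25}} = 2$ ohms

Therefore, the desired probability (shaded area) is shown in the figure below:

[Figure: normal density of $\bar{x}$ with mean 100 ohms and standard deviation 2 ohms; the area to the left of 95 ohms is shaded]

Standardizing the point $\bar{x}$ = 95 in the figure, we find that:

$z = \dfrac{95 - 100}{2} = -2.5$

and therefore,

$P(\bar{X} < 95) = P(Z < -2.5) = 0.0062$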
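The calculation in Example 1 can be checked numerically. The sketch below evaluates the standard normal CDF through the error function; the helper name `normal_cdf` is an assumption, not part of the notes.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n = 100.0, 10.0, 25      # values given in Example 1
se = sigma / math.sqrt(n)           # standard deviation of the sample mean
z = (95.0 - mu) / se                # standardized value of x-bar = 95
p = normal_cdf(z)                   # P(X-bar < 95)

print(f"standard error = {se:.1f} ohms, z = {z:.2f}, P = {p:.4f}")
```

The same three steps (standard error, standardize, look up the normal CDF) apply to the exercises at the end of this chapter.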

4.0 SAMPLING DISTRIBUTIONS OF DIFFERENCES & SUM:

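For two independent populations:

Let the first population have mean µ1 and variance σ1², and the second population have mean µ2 and
variance σ2². Suppose that both populations are normally distributed. Then the
sampling distribution of $\bar{x}_1 - \bar{x}_2$ is normal with mean:

$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$

and variance:

$\sigma_{\bar{x}_1 - \bar{x}_2}^2 = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$

For the sum $\bar{x}_1 + \bar{x}_2$, the mean is µ1 + µ2 and the variance is the same as for the difference.

If we have two independent populations with means µ1 and µ2 and variances σ1² and
σ2², and if $\bar{x}_1$ and $\bar{x}_2$ are the sample means of two independent random samples of
sizes n1 and n2 from these populations, then the sampling distribution of

$Z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}$

is approximately standard normal, with the condition n1, n2 ≥ 30.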

EXAMPLE 2:

The effective life of a component used in an engine is a random variable with mean 5000 hours
and standard deviation 40 hours. The distribution of effective life is fairly close to a normal
distribution.
The engine manufacturer introduces an improvement into the manufacturing process for
this component that increases the mean life to 5050 hours and decreases the standard deviation to
30 hours. Suppose that a random sample of n1= 16 components is selected from the “old”
process and a random sample of n2=25 components is selected from the “improved” process.
What is the probability that the difference in the two sample means $\bar{x}_2 - \bar{x}_1$ is at least 25
hours? Assume that the old and improved processes can be regarded as independent populations.

The distribution of $\bar{x}_1$ is normal with mean µ1 = 5000 hours and standard deviation

σ1/√n1 = 40/√16 = 10 hours,

and the distribution of $\bar{x}_2$ is normal with mean µ2 = 5050 hours and standard deviation

σ2/√n2 = 30/√25 = 6 hours.

Now the distribution of $\bar{x}_2 - \bar{x}_1$ is normal with mean

µ2 − µ1 = 5050 − 5000 = 50 hours

and variance

σ2²/n2 + σ1²/n1 = (6)² + (10)² = 136 hours².

This sampling distribution is shown in the figure below:

[Figure: The sampling distribution of $\bar{x}_2 - \bar{x}_1$ in Example 2]

The probability that $\bar{x}_2 - \bar{x}_1$ ≥ 25 hours is the shaded portion of the normal
distribution in this figure.

So,

$z = \dfrac{25 - 50}{\sqrt{136}} = -2.14$

and we find that:

$P(\bar{x}_2 - \bar{x}_1 \ge 25) = P(Z \ge -2.14) = 0.9838$
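Example 2 can be verified in the same way. This is a minimal sketch using only the values stated in the example; the helper name `normal_cdf` is an assumption. Because the code does not round z to two decimals, its result differs slightly from a table lookup at z = −2.14.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu1, sigma1, n1 = 5000.0, 40.0, 16   # "old" process
mu2, sigma2, n2 = 5050.0, 30.0, 25   # "improved" process

mean_diff = mu2 - mu1                          # mean of x2-bar minus x1-bar
var_diff = sigma2**2 / n2 + sigma1**2 / n1     # variances of the two means add
z = (25.0 - mean_diff) / math.sqrt(var_diff)
p = 1.0 - normal_cdf(z)                        # P(x2-bar - x1-bar >= 25)

print(f"mean = {mean_diff:.0f} h, variance = {var_diff:.0f} h^2")
print(f"z = {z:.2f}, P = {p:.4f}")
```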

EXERCISES:

1. PVC pipe is manufactured with a mean diameter of 1.01 inch and a standard
deviation of 0.003 inch. Find the probability that a random sample of n = 9 sections
of pipe will have a sample mean diameter greater than 1.009 inch and less than 1.012
inch.

2. A synthetic fiber used in manufacturing carpet has tensile strength that is normally
distributed with mean 75.5 psi and standard deviation 3.5 psi. Find the probability
that a random sample of n= 6 fiber specimens will have sample mean tensile strength
that exceeds 75.75 psi.

3. A random sample of size n1 = 16 is selected from a normal population with a mean
of 75 and a standard deviation of 8. A second random sample of size n2 = 9 is taken
from another normal population with mean 70 and standard deviation 12. Let $\bar{x}_1$ and
$\bar{x}_2$ be the two sample means. Find

a) The probability that $\bar{x}_1 - \bar{x}_2$ exceeds 4

b) The probability that 3.5 ≤ $\bar{x}_1 - \bar{x}_2$ ≤ 5.5

4. The elasticity of a polymer is affected by the concentration of a reactant. When
low concentration is used, the true mean elasticity is 55, and when high concentration
is used the mean elasticity is 60. The standard deviation of elasticity is 4, regardless
of concentration. If two random samples of size 16 are taken, find the probability
that $\bar{x}_{high} - \bar{x}_{low}$ ≥ 2.

REGRESSION & CORRELATION
Many problems in engineering and science involve exploring the relationships between
two or more variables. Regression analysis is a statistical technique that is very useful for these
types of problems.

For example, in a chemical process, suppose that the yield of the product is related to the
process-operating temperature. Regression analysis can be used to build a model to predict yield
at a given temperature level. This model can also be used for process optimization, such as finding
the level of temperature that maximizes yield, or for process control purposes.

1.0 SIMPLE LINEAR REGRESSION

The case of simple linear regression considers a single predictor (independent variable) x
and a dependent or response variable Y. Suppose that the true relationship between Y and x is a
straight line and that the observation Y at each level of x is a random variable.

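The expected value of Y at each level of x is

$E(Y \mid x) = \beta_0 + \beta_1 x$

so each observation Y can be described by the model:

$Y = \beta_0 + \beta_1 x + \varepsilon$

where the intercept β0 and the slope β1 are unknown regression coefficients, and
ε is a random error with mean zero.

Figure 2: Deviation of data from the estimated regression model

The estimates of β0 and β1 are chosen to minimize the sum of the squares of the vertical
deviations shown in Figure 2. We call this criterion for estimating the regression coefficients
the method of least squares. We may express the n observations in the sample as

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$

and the sum of the squares of the deviations of the observations from the true regression line is

$L = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$

The least squares estimators of β0 and β1, say $\hat{\beta}_0$ and $\hat{\beta}_1$, must satisfy

$\dfrac{\partial L}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$

$\dfrac{\partial L}{\partial \beta_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)\, x_i = 0$

Simplifying these two equations yields the least squares normal equations:

$n \hat{\beta}_0 + \hat{\beta}_1 \sum x_i = \sum y_i$

$\hat{\beta}_0 \sum x_i + \hat{\beta}_1 \sum x_i^2 = \sum x_i y_i$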

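The solution to the normal equations results in the least squares estimators $\hat{\beta}_0$ and $\hat{\beta}_1$:

$\hat{\beta}_0 = \dfrac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{n \sum x_i^2 - (\sum x_i)^2}$

$\hat{\beta}_1 = \dfrac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2}$

Note that each pair of observations satisfies the relationship:

$y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i, \quad i = 1, 2, \ldots, n$

where $e_i = y_i - \hat{y}_i$ is called the residual. The residual describes the error in the fit of the model to
the ith observation yi.

Let:

$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \dfrac{(\sum x_i)^2}{n}$

and

$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \dfrac{(\sum x_i)(\sum y_i)}{n}$

so that $\hat{\beta}_1 = S_{xy} / S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.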

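EXAMPLE 1: We will fit a simple linear regression model to the oxygen purity data in Table 1.
The following quantities may be computed:

$n = 20, \quad \sum x_i = 23.92, \quad \sum y_i = 1843.21, \quad \bar{x} = 1.196, \quad \bar{y} = 92.160$

$\sum x_i^2 = 29.2892, \quad \sum x_i y_i = 2214.6566$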

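Substituting these quantities into the least squares estimators gives:

$\hat{\beta}_0 = \dfrac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{n \sum x_i^2 - (\sum x_i)^2} = \dfrac{1843.21 \times 29.2892 - 23.92 \times 2214.6566}{20 \times 29.2892 - (23.92)^2} = 74.283$

$\hat{\beta}_1 = \dfrac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2} = \dfrac{20 \times 2214.6566 - 23.92 \times 1843.21}{20 \times 29.2892 - (23.92)^2} = 14.947$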
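As a double check, the fitted line must pass through the point ($\bar{x}$, $\bar{y}$):

$\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$

So,

$74.283 + 14.947 \times 1.196 = 92.160 = \bar{y}$

If the two sides agree, continue; if not, re-check your calculations.

The fitted simple linear regression model (with the coefficients reported to three decimal places)
is:

$\hat{y} = 74.283 + 14.947\, x$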

Using the regression model of Example 1, we would predict oxygen purity of $\hat{y}$ = 89.23%
when the hydrocarbon level is x = 1.00%.

The purity 89.23% may be interpreted as an estimate of the true population mean purity
when x=1.00%, or as an estimate of a new observation when x = 1.00%. These estimates are, of
course, subject to error; that is, it is unlikely that a future observation on purity would be exactly
89.23% when the hydrocarbon level is 1.00%. In subsequent sections we will see how to use
confidence intervals and prediction intervals to describe the error in estimation from a regression
model.
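The arithmetic of Example 1 can be reproduced from the summary sums alone. The sketch below uses the closed-form least squares estimators with the sums that appear in the example's substitution step; the function name `least_squares_from_sums` is an assumption made for illustration.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (b0, b1), the least squares intercept and slope,
    computed from the summary sums of the data."""
    denom = n * sum_x2 - sum_x**2
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    b1 = (n * sum_xy - sum_x * sum_y) / denom
    return b0, b1

# Summary sums used in Example 1 (oxygen purity data)
n, sum_x, sum_y = 20, 23.92, 1843.21
sum_x2, sum_xy = 29.2892, 2214.6566

b0, b1 = least_squares_from_sums(n, sum_x, sum_y, sum_x2, sum_xy)

# Double check: the fitted line passes through (x-bar, y-bar)
x_bar, y_bar = sum_x / n, sum_y / n
check = b0 + b1 * x_bar

# Predicted purity at a hydrocarbon level of x = 1.00 %
y_hat = b0 + b1 * 1.00

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"y-bar = {y_bar:.3f}, b0 + b1 * x-bar = {check:.3f}")
print(f"predicted purity at x = 1.00: {y_hat:.2f} %")
```

The same function can be applied to the homework data once the five sums have been tabulated by hand.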

2.0 CORRELATION

A measure of the linear relationship between two numerical variables is provided
by the correlation coefficient. A correlation coefficient takes a value from −1
(perfect negative correlation) to +1 (perfect positive correlation), with zero
representing no linear correlation.
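The notes do not give a formula for the correlation coefficient. A common choice is the Pearson sample correlation coefficient, r = Sxy / √(Sxx · Syy), and the sketch below assumes that definition. The function name `pearson_r` and the three small data sets are hypothetical, chosen only to show the values −1, 0 and +1.

```python
import math

def pearson_r(x, y):
    """Pearson sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative data (not taken from the notes)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_pos = pearson_r(x, [2.0, 4.0, 6.0, 8.0, 10.0])    # points on a rising line
r_neg = pearson_r(x, [10.0, 8.0, 6.0, 4.0, 2.0])    # points on a falling line
r_none = pearson_r(x, [1.0, 3.0, 2.0, 3.0, 1.0])    # no linear trend

print(f"rising line:  r = {r_pos:+.2f}")
print(f"falling line: r = {r_neg:+.2f}")
print(f"no trend:     r = {r_none:+.2f}")
```

Note that r = 0 only rules out a linear relationship; the third data set still has a clear pattern that a straight line cannot capture.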

H.W No. 1:

The accompanying data were taken from a published paper. The independent variable
is SO2 deposition rate (mg/m²/day) and the dependent variable is steel weight loss
(g/m²).

x: 14, 18, 40, 43, 45, 112

y: 280, 350, 470, 500, 560, 1200

a) Construct a scatter plot. Does the simple linear regression model appear to
be reasonable in this situation?
b) Calculate the equation of the estimated regression line.
c) Estimate the standard deviation of the observations about the true regression
line.

H.W No. 2:

The accompanying data resulted from a study carried out to examine the
relationship between a measure of the corrosion of reinforcement (y) and the
concentration of the corrosion inhibitor solution in concrete pores (x, in ppm):

x: 2.5, 5.03, 7.6, 11.6, 13, 19.6, 26.2, 33, 40, 50, 55

y: 7.68, 6.95, 6.3, 5.75, 5.01, 1.43, 0.93, 0.72, 0.68, 0.65, 0.56

a. Construct a scatter plot of the data. Does the simple linear regression model appear
to be reasonable?
b. Calculate the equation of the estimated regression line, use it to predict the
value of the corrosion rate that would be observed for a concentration of 33
ppm, and calculate the corresponding residual.
c. Estimate the standard deviation of the observations about the true regression line.

H.W No. 3:

The accompanying data resulted from a car factory study carried out to examine
the relationship between work hours (y) and work injuries (x). The data below
show the results:

x: 7, 12, 4, 14, 25, 30

y: 128, 213, 75, 250, 446, 540

a. Construct a scatter plot of the data. Does the simple linear regression model appear
to be reasonable?
b. Calculate the equation of the estimated regression line (Y = b0 + b1X).
c. Use it to predict the work hours that would be estimated for 20 work
injuries.
