Statistics Lecture Course 2022-2023
Civil Eng. Dept., all divisions
(CEPS 206)
2022-2023
Partial List of Symbols
Introduction to Statistics
Definitions:
Statistics: The branch of scientific inquiry that provides methods for organizing
and summarizing data, and for using information in the data to draw various
conclusions.
Descriptive Statistics: The part of statistics that deals with methods for organizing
and summarizing data. Descriptive methods can be used with a list of all
population members (a census) or when the data consist of a sample.
Inferential Statistics: The part of statistics used when the data are a sample and the
objective is to go beyond the sample and draw conclusions about the population
based on sample information.
[Figure: Population and Sample, linked by Probability and Statistics.]
Three fundamental components of statistics
Statistical techniques consist of a wide range of goals, techniques and strategies. Three fundamental
components worth stressing are:
Summation notation: $\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$,
where ∑ is an upper-case Greek sigma. The subscript i is the index of summation,
and the 1 and n that appear below and above the symbol ∑, respectively, designate
the range of the summation.
Example 1:
Measures of location:
The sample mean:
The first measure of location, called the sample mean, is just the average of the
values and is generally labeled X¯. The notation X¯ is read as X bar. In summation
notation,
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Example 1:
You sample ten married couples and determine the number of children they have.
The results are 0, 4, 3, 2, 2, 3, 2, 1, 0, 8, so the sample mean is
X¯ = (0 + 4 + 3 + 2 + 2 + 3 + 2 + 1 + 0 + 8)/10 = 25/10 = 2.5.
Of course, nobody has 2.5 children. The intention is to provide a number that is
centrally located among the 10 observations with the goal of conveying what is
typical.
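The calculation in this example can be reproduced with a few lines of Python. This is only a minimal sketch; the function name `sample_mean` is an illustrative choice, not part of the course material.

```python
# Number of children reported by the ten sampled couples (Example 1).
children = [0, 4, 3, 2, 2, 3, 2, 1, 0, 8]


def sample_mean(values):
    """Return the sample mean: the sum of the values divided by n."""
    return sum(values) / len(values)


x_bar = sample_mean(children)
print(x_bar)  # 2.5
```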
Example 2
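The salaries (in thousands of Iraqi dinars) of the 11 individuals currently working at the
company are:
300, 250, 320, 280, 350, 310, 300, 360, 290, 2000, 5000,
where the two largest salaries correspond to the vice president and president,
respectively. The sample mean is 9760/11 ≈ 887.3, which is larger than nine of the
eleven salaries, so it gives a poor sense of the typical salary.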
2.0 The median
Another important measure of location is called the sample median. The basic idea
is easily described using an example based on the weight of trout. The observed
weights were
1.1, 2.3, 1.7, 0.9, 3.1.
Putting the values in ascending order yields
0.9, 1.1, 1.7, 2.3, 3.1.
Notice that the value 1.7 divides the observations in the middle in the sense that
half of the remaining observations are less than 1.7 and half are larger, so the
sample median is 1.7.
When the number of observations is even, there is no single middle value. If the two
middle values are 2.6 and 2.7, for example, the sample median is taken to be their
average, namely (2.6 + 2.7)/2 = 2.65.
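As a rough check, the median of the trout weights can be computed with Python's standard `statistics` module. The sketch below also compares the mean and median of the salary data from Example 2, to illustrate how two very large values pull the mean but not the median.

```python
import statistics

# Trout weights from the median example.
trout = [1.1, 2.3, 1.7, 0.9, 3.1]
# Salaries (thousands of Iraqi dinars) from Example 2.
salaries = [300, 250, 320, 280, 350, 310, 300, 360, 290, 2000, 5000]

print(sorted(trout))                # [0.9, 1.1, 1.7, 2.3, 3.1]
print(statistics.median(trout))     # 1.7

salary_median = statistics.median(salaries)
salary_mean = statistics.mean(salaries)
print(salary_median)                # 310
print(round(salary_mean, 1))        # 887.3
```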
Problems
4. Find the mean and median of the following sets of numbers. (a) −1, 03, 0, 2, −5.
(b) 2, 2, 3, 10, 100, 1,000.
5. The final exam scores for 15 students are 73, 74, 92, 98, 100, 72, 74, 85, 76, 94,
89, 73, 76, 99. Compute the mean and median.
7. Consider the ten values 3, 6, 8, 12, 23, 26, 37, 42, 49, 63. The mean is X¯ = 26.9.
(a) What is the value of the mean if the largest value, 63, is increased to 100?
(b) What is the mean if 63 is increased to 1,000? (c) What is the mean if 63 is increased to
10,000?
Measures of variation:
1.0 The range
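The range is just the difference between the largest and smallest observations. In
symbols, it is X(n) − X(1), where X(1) and X(n) are the smallest and largest values.
As an illustration of deviation scores, consider the observations
7.5, 8.0, 8.0, 8.5, 9.0, 11.0, 19.5, 19.5, 28.5, 31.0, 36.0.
Subtracting the sample mean (approximately 17) from each value gives the deviation scores
−9.5, −9.0, −9.0, −8.5, −8.0, −6.0, 2.5, 2.5, 11.5, 14.0, 19.0.
Deviation scores reflect how far each observation is from the mean, but often it is
best to find a single numerical quantity that summarizes the amount of variation in
our data. The sample variance is
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$
The sample standard deviation is the (positive) square root of the variance, s.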
Example 1
3,9,10,4,7,8,9,5,7,8.
The sum of the squared deviations in the last column is ∑(Xi − X¯)² = 48.
So, with n = 10,
s² = 48/9 = 5.33.
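The variance calculation for the ten observations of Example 1 can be sketched in Python as follows. It assumes the usual sample variance with divisor n − 1, which matches the 48/9 shown above; the variable names are illustrative.

```python
import math
import statistics

# The ten observations from Example 1.
data = [3, 9, 10, 4, 7, 8, 9, 5, 7, 8]

n = len(data)
x_bar = sum(data) / n                              # sample mean, 7.0
squared_devs = [(x - x_bar) ** 2 for x in data]    # squared deviation scores
s2 = sum(squared_devs) / (n - 1)                   # sample variance
s = math.sqrt(s2)                                  # sample standard deviation

print(sum(squared_devs))  # 48.0
print(round(s2, 2))       # 5.33
print(round(s, 2))        # 2.31
```

The standard library function `statistics.variance(data)` uses the same n − 1 divisor and gives the same value.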
GRAPHICAL SUMMARIES OF DATA:
The notation fx is used to denote the frequency or number of times the value x
occurs.
Plots of relative frequencies help add perspective on the sample variance, mean
and median.
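The sample size is n = ∑ fx, where the sum is over all observed values x, and the
relative frequency of the value x is fx/n.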
22222333333333333333333444444444444444444444
44455555555555555555555555556666666666666667
777777778888
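The sample variance is
$s^2 = \frac{n}{n-1}\sum_{x}(x - \bar{X})^2\,\frac{f_x}{n}$, where $\bar{X} = \sum_{x} x\,\frac{f_x}{n}$ is the sample mean.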
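The frequency-based mean and variance can be sketched in Python using the run of digits listed above, treating each character as one observation. The formulas assumed here are the mean as ∑ x·(fx/n) and the variance as n/(n − 1) times ∑ (x − X¯)²·(fx/n); the variable names are illustrative.

```python
from collections import Counter
import statistics

# The observations listed above, one character per observation.
raw = (
    "22222333333333333333333444444444444444444444"
    "44455555555555555555555555556666666666666667"
    "777777778888"
)
values = [int(ch) for ch in raw]

freq = Counter(values)      # f_x: number of times each value x occurs
n = sum(freq.values())      # n is the sum of the frequencies

# Mean and variance computed from the relative frequencies f_x / n.
mean = sum(x * f / n for x, f in freq.items())
var = (n / (n - 1)) * sum((x - mean) ** 2 * f / n for x, f in freq.items())
std = var ** 0.5

for x in sorted(freq):
    print(x, freq[x], freq[x] / n)   # value, frequency, relative frequency
print(mean, var, std)
```

Working from frequencies gives the same answers as applying the ordinary formulas to the full list of observations, which is a convenient way to check the result.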
Problems
1. Based on a sample of 100 individuals, the values 1, 2, 3, 4, 5 are observed with
relative frequencies 0.2, 0.3, 0.1, 0.25, 0.15. Compute the mean, variance and
standard deviation.
2. Fifty individuals are rated on how open minded they are. The ratings have the
values 1, 2, 3, 4 and the corresponding relative frequencies are 0.2, 0.24, 0.4, 0.16,
respectively. Compute the mean, variance and standard deviation.
4. For a local charity, the donations in dollars received during the last month were
5, 10, 15, 20, 25, 50 having the frequencies 20, 30, 10, 40, 50, 5. Compute the
mean, variance and standard deviation.
5. The values 1, 5, 10, 20 have the frequencies 10, 20, 40, 30. Compute the mean,
variance and standard deviation.
Probability Theory
An upper-case Roman letter is used to represent a random variable, the most common letter being X.
The set of all possible outcomes or values of X we might observe is called the sample
space.
More generally, the set of all possible outcomes of a random experiment is called the
sample space of the experiment. The sample space is denoted as S.
EXAMPLE 1:
Consider an experiment in which you select a plastic pipe and measure its thickness.
The sample space can be taken as simply the positive real line, because a negative
value for thickness cannot occur:
S = R+ = { x │ x > 0 }
If it is known that all connectors will be between 10 and 11 millimeters thick, the
sample space could be
S = { x │ 10 < x < 11 }
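If the objective of the analysis is to consider only whether a particular part is low,
medium, or high for thickness, the sample space might be taken to be the set of three
outcomes:
S = { low, medium, high }
If the objective is to consider only whether or not a particular part conforms to the
manufacturing specifications, the sample space might be simplified to the set of two
outcomes:
S = { yes, no }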
A random variable is discrete if there are gaps between any value and the
next possible value.
A random variable is continuous if, for any two outcomes, any value
between these two values is possible.
EXAMPLE 2:
If two connectors are selected and measured, the sample space depends on the
objective of the study.
If the objective of the analysis is to consider only whether or not the parts conform
to the manufacturing specifications, either part may or may not conform. Writing y for
a part that conforms and n for a part that does not, the sample space can be
represented by the four outcomes:
S = { yy, yn, ny, nn }
If we are only interested in the number of conforming parts in the sample, we might
summarize the sample space as
S = { 0, 1, 2 }
In random experiments in which items are selected from a batch, we will indicate
whether or not a selected item is replaced before the next one is selected. For
example, if the batch consists of three items {a, b, c} and our experiment is to select
two items without replacement, the sample space can be represented as
S = { ab, ac, ba, bc, ca, cb }
Events:
Some of the basic set operations are summarized below in terms of events:
• The union of two events is the event that consists of all outcomes that are contained in
either of the two events. We denote the union as E1 ∪ E2.
• The intersection of two events is the event that consists of all outcomes that are contained
in both of the two events. We denote the intersection as E1 ∩ E2.
• The complement of an event in a sample space is the set of outcomes in the sample space
that are not in the event. We denote the complement of the event E as E′.
EXAMPLE 3:
Consider the sample space S = {yy, yn, ny, nn} in Example 2. Suppose that the set of
all outcomes for which at least one part conforms is denoted as E1. Then,
E1 = { yy, yn, ny }
The event in which both parts do not conform, denoted as E2, contains only the single
outcome, E2 = {nn}. Other examples of events are E3 = Ø, the null set, and E4 = S, the
sample space. If E5 = {yn, ny, nn}, then
E1 ∪ E5 = S,  E1 ∩ E5 = { yn, ny },  E1′ = { nn }
EXAMPLE 4:
Then,
E1 U E2 = { x │1 ≤ x < 118} and E1 ∩ E2 = { x │3 < x < 10}
Also,
EXAMPLE 5:
Samples of concrete surface are analyzed for abrasion resistance and impact
strength. The results from 50 samples are summarized as follows:
                                impact strength
                                High     Low
abrasion resistance    High      40       4
                       Low        1       5
Let A denote the event that a sample has high impact strength.
Let B denote the event that a sample has high abrasion resistance.
The event A ∩ B consists of the 40 samples for which abrasion resistance and impact
strength are high. The event A′ consists of the 9 samples in which the impact strength
is low. The event A ∪ B consists of the 45 samples in which the abrasion resistance,
impact strength, or both are high.
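The counts quoted in this example can be verified from the table with a short Python sketch. The dictionary layout and the helper `count` are illustrative choices, not part of the course material.

```python
# Counts from the table; keys are (abrasion resistance, impact strength).
counts = {
    ("high", "high"): 40,
    ("high", "low"): 4,
    ("low", "high"): 1,
    ("low", "low"): 5,
}


def count(event):
    """Number of samples whose (abrasion, impact) levels satisfy the event."""
    return sum(c for (abrasion, impact), c in counts.items()
               if event(abrasion, impact))


# A: high impact strength, B: high abrasion resistance.
a_and_b = count(lambda abrasion, impact: impact == "high" and abrasion == "high")
not_a = count(lambda abrasion, impact: impact != "high")
a_or_b = count(lambda abrasion, impact: impact == "high" or abrasion == "high")

print(a_and_b, not_a, a_or_b)  # 40 9 45
```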
Venn diagrams are often used to describe relationships between events and sets.
Two events E1 and E2 that have no outcomes in common, so that
E1 ∩ E2 = Ø,
are said to be mutually exclusive.
The two events in Fig. 1(b) are mutually exclusive, whereas the two events in Fig. 1(a) are not. Additional results
involving events are summarized below. The definition of the complement of an event implies that
(E′)′ = E
The distributive law for set operations implies that
(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)  and  (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
A 0 probability indicates an outcome will not occur. A probability of 1 indicates an
outcome will occur with certainty.
100 Elements
Fig. 2: Probability of the event E is the sum of the probabilities of the outcomes in E.
EXAMPLE 6:
A random experiment can result in one of the outcomes {a, b, c, d} with probabilities
0.1, 0.3, 0.5, and 0.1, respectively. Let A denote the event {a, b}, B the event {b, c,
d}, and C the event {d}. Then,
P(A) = 0.1 + 0.3 = 0.4,  P(B) = 0.3 + 0.5 + 0.1 = 0.9,  P(C) = 0.1
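The probability of each event in this example is the sum of the probabilities of its outcomes, which can be sketched in Python with sets. The helper name `P` and the use of Python set operators for union, intersection, and complement are illustrative choices.

```python
# Outcome probabilities from Example 6.
prob = {"a": 0.1, "b": 0.3, "c": 0.5, "d": 0.1}
S = set(prob)  # sample space


def P(event):
    """Probability of an event: the sum of the probabilities of its outcomes."""
    return sum(prob[outcome] for outcome in event)


A = {"a", "b"}
B = {"b", "c", "d"}
C = {"d"}

print(round(P(A), 2))      # 0.4
print(round(P(B), 2))      # 0.9
print(round(P(C), 2))      # 0.1
print(round(P(S - A), 2))  # complement of A: 0.6
print(round(P(A & B), 2))  # intersection {b}: 0.3
print(round(P(A | B), 2))  # union is the whole sample space: 1.0
print(P(A & C))            # A and C are mutually exclusive: 0
```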
EXAMPLE 7:
A visual inspection of defect locations in a concrete element manufacturing
process resulted in the following table:
The event that there is no defect in the inspected concrete elements, denoted as E1,
can be considered to be comprised of the single outcome,
E1= {0}.
EXAMPLE 8:
Suppose that a batch contains six parts with part numbers {a, b, c, d, e, f}. Suppose
that two parts are selected without replacement. Let E denote the event that the part
number of the first part selected is a. Then E can be written as E = {ab, ac, ad, ae, af}.
The sample space can be counted. It has 6 × 5 = 30 outcomes. If each outcome is equally
likely, P(E) = 5/30 = 1/6.
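The sample space for selecting two parts without replacement can be enumerated directly, which gives a simple way to check the count of 30 outcomes. This sketch uses `itertools.permutations` because the order of selection matters here; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

parts = "abcdef"

# Ordered selections of two distinct parts (sampling without replacement).
sample_space = ["".join(p) for p in permutations(parts, 2)]

# Event E: the first part selected is a.
E = [outcome for outcome in sample_space if outcome[0] == "a"]

p_E = Fraction(len(E), len(sample_space))  # equally likely outcomes

print(len(sample_space))  # 30
print(E)                  # ['ab', 'ac', 'ad', 'ae', 'af']
print(p_E)                # 1/6
```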
ADDITION RULES
P( A U B ) = P( A ) + P( B ) - P( A ∩ B )
EXAMPLE 9:
The concrete elements described in Example 7 were further classified by whether the
defects were in the “center” or at the “edge” of the element, and by the degree of
damage (number of defects). The following table shows the proportion of elements in
each category. What is the probability that an element has its defects at the edge or
that it contains four or more defects?
Let E1 denote the event that an element contains four or more defects, and let E2 denote
the event that the defects are at the edge.
The requested probability is P(E1 ∪ E2). Now, P(E1) = 0.15 and P(E2) = 0.28. Also,
from the table above, P(E1 ∩ E2) = 0.04. Therefore, by the addition rule,
P(E1 ∪ E2) = 0.15 + 0.28 − 0.04 = 0.39
What is the probability that a concrete element contains fewer than two defects (denoted
as E3) or that it is both at the edge and contains more than four defects (denoted as
E4)?
The requested probability is P(E3 ∪ E4). Now P(E3) = 0.6 and P(E4) = 0.03. Also,
E3 and E4 are mutually exclusive, so P(E3 ∩ E4) = 0 and
P(E3 ∪ E4) = 0.6 + 0.03 = 0.63
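Both parts of this example apply the addition rule P(A ∪ B) = P(A) + P(B) − P(A ∩ B). A minimal Python sketch follows; the function name `prob_union` is a hypothetical helper introduced only for illustration.

```python
def prob_union(p_a, p_b, p_a_and_b=0.0):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B).

    For mutually exclusive events the intersection has probability 0,
    which is the default used here.
    """
    return p_a + p_b - p_a_and_b


# E1: four or more defects, E2: at the edge, with P(E1 and E2) = 0.04.
p_e1_or_e2 = prob_union(0.15, 0.28, 0.04)
# E3 and E4 are mutually exclusive.
p_e3_or_e4 = prob_union(0.6, 0.03)

print(round(p_e1_or_e2, 2))  # 0.39
print(round(p_e3_or_e4, 2))  # 0.63
```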
EXAMPLE 10:
Let X denote the pH of a sample. Consider the event that X is greater than 6.5 but
less than or equal to 7.8. This probability is the sum of any collection of mutually
exclusive events with union equal to the same range for X. One example is:
Another example is
CONDITIONAL PROBABILITY
In a manufacturing process, 10% of the parts contain visible surface flaws and 25%
of the parts with surface flaws are (functionally) defective parts. However, only 5%
of parts without surface flaws are defective parts. The probability of a defective part
depends on our knowledge of the presence or absence of a surface flaw.
Let D denote the event that a part is defective and let F denote the event that a part
has a surface flaw. Then, we denote the probability of D given, or assuming, that a part
has a surface flaw as P(D│F). This notation is read as the conditional probability of D
given F, and it is interpreted as the probability that a part is defective, given that
the part has a surface flaw.
EXAMPLE 1:
Table 1 below provides an example of 400 parts classified by surface flaws and as
(functionally) defective. For this table the conditional probabilities match those
discussed previously in this section. For example, of the parts with surface flaws (40
parts) the number defective is 10.
Therefore,
P(D│F) = 10/40 = 0.25
and of the parts without surface flaws (360 parts) the number defective is 18.
Therefore,
P(D│F′) = 18/360 = 0.05
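The conditional probabilities for the 400 parts can be sketched in Python from the four cell counts implied by the text (10 and 18 defective parts among the 40 flawed and 360 unflawed parts). The dictionary layout and the names below are illustrative assumptions.

```python
from fractions import Fraction

# Counts of parts; keys are (has surface flaw, is defective).
parts = {
    (True, True): 10,     # surface flaw, defective
    (True, False): 30,    # surface flaw, not defective
    (False, True): 18,    # no surface flaw, defective
    (False, False): 342,  # no surface flaw, not defective
}

total = sum(parts.values())
flawed = parts[(True, True)] + parts[(True, False)]
unflawed = parts[(False, True)] + parts[(False, False)]

# Conditional probability: restrict attention to the conditioning group.
p_d_given_f = Fraction(parts[(True, True)], flawed)
p_d_given_not_f = Fraction(parts[(False, True)], unflawed)
# Unconditional probability of a defective part.
p_d = Fraction(parts[(True, True)] + parts[(False, True)], total)

print(float(p_d_given_f))      # 0.25
print(float(p_d_given_not_f))  # 0.05
print(float(p_d))              # 0.07
```

P(D) and P(D│F) describe the same event, a defective part, but the second is computed only over the 40 parts known to have a surface flaw.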
EXAMPLE 2:
Again, consider the 400 parts in Table 1 above (example 1). From this table
Note that in this example all four of the following probabilities are different:
Here, P (D) and P (D│F) are probabilities of the same event, but they are computed
under two different states of knowledge.
The tree diagram in Fig. 1 can also be used to display conditional probabilities.
Permutations
EXAMPLE 3:
A printed circuit board has eight different locations in which a component can be
placed. If four different components are to be placed on the board, how many
different designs are possible?
Each design consists of selecting a location from the eight locations for the first
component, a location from the remaining seven for the second component, a
location from the remaining six for the third component, and a location from the
remaining five for the fourth component. Therefore, the number of possible designs is
8 × 7 × 6 × 5 = 1680.
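The count of ordered placements in this example can be checked in Python. The sketch below multiplies the number of choices at each step and compares the result with `math.perm` (available in Python 3.8 and later).

```python
import math

locations = 8    # available locations on the board
components = 4   # different components to place

# Multiply the number of choices at each step: 8 * 7 * 6 * 5.
designs = 1
for step in range(components):
    designs *= locations - step

print(designs)                           # 1680
print(math.perm(locations, components))  # 1680, the same count as 8!/(8-4)!
```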
Combinations
Another counting problem of interest is the number of subsets of r elements that can
be selected from a set of n elements. Here, order is not important.
EXAMPLE 4:
A printed circuit board has eight different locations in which a component can be
placed. If five identical components are to be placed on the board, how many
different designs are possible? Each design is a subset of the eight locations that are
to contain the components. From the equation above, the number of possible designs
is 8!/(5! 3!) = 56.
The following example uses the multiplication rule in combination with the above
equation to answer a more difficult, but common, question.
EXAMPLE 5:
A bin of 50 parts contains three defective parts and 47 acceptable parts, and a sample of
six parts is selected without replacement.
A subset containing exactly two defective parts can be formed by first choosing the
two defective parts from the three defective parts. This first step can be completed in
3!/(2! 1!) = 3 different ways.
Then, the second step is to select the remaining four parts from the 47 acceptable
parts in the bin. The second step can be completed in
47!/(4! 43!) = 178,365 different ways.
Therefore, from the multiplication rule, the number of subsets of size six that contain
exactly two defective items is
3 × 178,365 = 535,095
As an additional computation, the total number of different subsets of size six is
found to be
50!/(6! 44!) = 15,890,700
Therefore, the probability that a sample contains exactly two defective parts is
535,095/15,890,700 = 0.034
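The combination counts in Examples 4 and 5 can be reproduced with `math.comb` (Python 3.8 and later). This sketch assumes the bin in Example 5 holds 50 parts in total (3 defective and 47 acceptable), which is what the figures in the text imply.

```python
from math import comb

# Example 4: choose 5 of the 8 locations for identical components.
designs = comb(8, 5)
print(designs)  # 56

# Example 5: samples of size six with exactly two defective parts.
defective_ways = comb(3, 2)    # choose 2 of the 3 defective parts
acceptable_ways = comb(47, 4)  # choose 4 of the 47 acceptable parts
favorable = defective_ways * acceptable_ways  # multiplication rule
total = comb(50, 6)            # all subsets of size six

print(defective_ways, acceptable_ways)  # 3 178365
print(favorable)                        # 535095
print(total)                            # 15890700
print(round(favorable / total, 3))      # 0.034
```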
Distributions
Discrete Distributions:
Continuous Distributions:
Definition:
BINOMIAL DISTRIBUTION:
Definition:
EXAMPLE 1:
Each sample of water has a 10% chance of containing a particular organic pollutant.
Assume that the samples are independent with regard to the presence of the pollutant.
Find the probability that in the next 18 samples, exactly 2 contain the pollutant. Let
X denote the number of samples that contain the pollutant in the next 18 samples analyzed.
Then X is a binomial random variable with p = 0.1 and n = 18. Therefore,
$P(X = 2) = \binom{18}{2}(0.1)^2(0.9)^{16} = 153\,(0.1)^2(0.9)^{16} = 0.284$
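Determine the probability that at least four samples contain the pollutant.
$P(X \ge 4) = 1 - P(X < 4) = 1 - \sum_{x=0}^{3}\binom{18}{x}(0.1)^x(0.9)^{18-x} = 1 - (0.150 + 0.300 + 0.284 + 0.168) = 0.098$
The mean and variance of a binomial random variable depend only on the parameters
p and n: µ = E(X) = np and σ² = V(X) = np(1 − p).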
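The binomial probabilities in Example 1 can be checked with a short Python sketch. The function `binomial_pmf` is an illustrative helper that implements the standard binomial probability mass function using `math.comb`.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 18, 0.1

p_exactly_2 = binomial_pmf(2, n, p)
# P(X >= 4) = 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3)]
p_at_least_4 = 1 - sum(binomial_pmf(x, n, p) for x in range(4))

mean = n * p                # np
variance = n * p * (1 - p)  # np(1 - p)

print(round(p_exactly_2, 3))               # 0.284
print(round(p_at_least_4, 3))              # 0.098
print(round(mean, 2), round(variance, 2))  # 1.8 1.62
```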
EXERCISES:
1. For each scenario described below, state whether or not the binomial distribution is a reasonable
model for the random variable and why. State any assumptions you make.
(a) A production process produces thousands of temperature transducers. Let X denote the number
of nonconforming transducers in a sample of size 30 selected at random from the process.
(c) Four identical electronic components are wired to a controller that can switch from a failed
component to one of the remaining spares. Let X denote the number of components that have failed
after a specified period of operation.
(d) Defects occur randomly over the surface of a semiconductor chip. However, only 80% of
defects can be found by testing. A sample of 40 chips with one defect each is tested. Let X denote
the number of chips in which the test finds a defect.
2. The random variable X has a binomial distribution with n=10 and p=0.5. Determine the
following probabilities:
(a) P(X = 5) (b) P(X ≤ 2) (c) P(X ≥ 9) (d) P (3 ≤ X < 5)
3. Sketch the probability mass function of a binomial distribution with n =10 and p = 0.01 and
comment on the shape of the distribution.
(a) What value of X is most likely? (b) What value of X is least likely?
4. Batches that consist of 50 concrete blocks from a production process are checked for
conformance to building requirements. The mean number of nonconforming concrete blocks in a
batch is 5. Assume that the number of nonconforming concrete blocks in a batch, denoted as X, is
a binomial random variable.
(a) What are n and p? (b) What is P(X ≤ 2)? (c) What is P(X ≥ 49)?
5. A manufacturing process has 100 customer orders to fill. Each order requires one component
part that is purchased from a supplier. However, typically, 2% of the components are identified as
defective, and the components can be assumed to be independent.
a) If the manufacturer stocks 100 components, what is the probability that the 100 orders can
be filled without reordering components?
b) If the manufacturer stocks 102 components, what is the probability that the 100 orders can
be filled without reordering components?
c) If the manufacturer stocks 105 components, what is the probability that the 100 orders can
be filled without reordering components?
(This exercise illustrates that poor quality can affect schedules and costs).
POISSON DISTRIBUTION:
EXAMPLE 2:
For the case of the thin copper wire, suppose that the number of flaws follows a
Poisson distribution with a mean of 2.3 flaws per millimeter. Determine the
probability of exactly 2 flaws in 1 millimeter of wire. Let X denote the number of
flaws in 1 millimeter of wire. Then, E(X) = 2.3 flaws and
$P(X = 2) = \frac{e^{-2.3}\,2.3^{2}}{2!} = 0.265$
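The Poisson probability in Example 2 can be evaluated numerically as a check. The function `poisson_pmf` below is an illustrative helper for the standard Poisson probability mass function with mean `lam`.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)


# Mean of 2.3 flaws per millimeter of wire; probability of exactly 2 flaws.
p_two_flaws = poisson_pmf(2, 2.3)
print(round(p_two_flaws, 3))  # 0.265
```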
EXERCISES:
[Figure: Density of a loading on a long, thin beam.]
[Figure: Probability determined from the area under f(x).]
Definition:
For the density function of a loading on a long, thin beam, because every point has
zero width, the loading at any point is zero. Similarly, for a continuous random
variable X and any value x,
P(X = x) = 0
EXAMPLE:
Let the continuous random variable X denote the diameter of a hole drilled in a sheet
metal component. The target diameter is 12.5 mm. Most random disturbances to the
process result in larger diameters. Historical data show that the distribution of X can
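be modeled by a probability density function $f(x) = 20e^{-20(x-12.5)}$, x ≥ 12.5.
If a part with a diameter larger than 12.60 millimeters is scrapped, what proportion
of parts is scrapped? The density function and the requested probability are shown
in Fig. 2. A part is scrapped if X > 12.60. Now,
$P(X > 12.60) = \int_{12.6}^{\infty} 20e^{-20(x-12.5)}\,dx = \left. -e^{-20(x-12.5)} \right|_{12.6}^{\infty} = e^{-2} = 0.135$
What proportion of parts is between 12.5 and 12.6 millimeters? Now,
$P(12.5 < X < 12.6) = \int_{12.5}^{12.6} 20e^{-20(x-12.5)}\,dx = 1 - e^{-2} = 0.865$
Because the total area under f(x) equals 1, we can also calculate
P(12.5 < X < 12.6) = 1 − P(X > 12.6) = 1 − 0.135 = 0.865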
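The areas under this density can be approximated numerically as a check on the integrals. The sketch below uses a simple midpoint rule; the helper `integrate` and the number of steps are illustrative choices, and the closed-form value e^(−2) is printed for comparison.

```python
import math


def f(x):
    """Density of the hole diameter: 20 * exp(-20 (x - 12.5)) for x >= 12.5."""
    return 20 * math.exp(-20 * (x - 12.5)) if x >= 12.5 else 0.0


def integrate(func, a, b, steps=10_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


p_between = integrate(f, 12.5, 12.6)  # area between 12.5 and 12.6
p_scrapped = 1 - p_between            # total area under f(x) equals 1

print(round(p_between, 3))      # 0.865
print(round(p_scrapped, 3))     # 0.135
print(round(math.exp(-2), 3))   # 0.135, the closed-form value
```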
EXERCISES:
NORMAL DISTRIBUTION:
Normal probability density functions for selected values of the parameters µ and σ²
Definition:
EXAMPLE 4:
Assume that the current measurements in a strip of wire follow a normal distribution
with a mean of 10 mA and a variance of 4 (mA)². What is the probability that a
measurement exceeds 13 mA?
Let X denote the current in mA. The requested probability can be represented as:
P(X > 13)
This probability is shown as the shaded area under the normal probability density
function in Fig. 3.
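[Fig. 3]
Some useful results concerning a normal distribution are summarized below and in
Fig. 4. For any normal random variable,
P(µ − σ < X < µ + σ) = 0.6827
P(µ − 2σ < X < µ + 2σ) = 0.9545
P(µ − 3σ < X < µ + 3σ) = 0.9973
[Fig. 4]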
Definition:
Summary of Common Probability Distributions
EXAMPLE 5:
The following calculations are shown pictorially in Fig. 5.
EXAMPLE 6:
Suppose the current measurements in a strip of wire are assumed to follow a normal
distribution with a mean of 10 mA and a variance of 4 (mA)². What is the probability
that a measurement will exceed 13 mA?
Let X denote the current in mA and let Z = (X − 10)/2. We note that X > 13 corresponds
to Z > 1.5. Therefore, from Appendix Table II,
P(X > 13) = P(Z > 1.5) = 1 − P(Z ≤ 1.5) = 1 − 0.93319 = 0.06681
EXAMPLE 7: Continuing the previous example, what is the probability that a
current measurement is between 9 and 11 mA?
P(9 < X < 11) = P((9 − 10)/2 < Z < (11 − 10)/2) = P(−0.5 < Z < 0.5) = 0.69146 − 0.30854 = 0.38292
Determine the value for which the probability that a current measurement is below
this value is 0.98. The requested value is shown graphically in the figure below. We
need the value of x such that P(X < x) = 0.98. By standardizing, this probability
expression can be written as
P(X < x) = P(Z < (x − 10)/2) = 0.98
Appendix Table II is used to find the z-value such that P(Z < z) = 0.98. The nearest
probability from Table II results in
P(Z < 2.05) = 0.97982
Therefore, (x − 10)/2 = 2.05, and x = 10 + 2(2.05) = 14.1 mA.
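The normal probabilities in Examples 6 and 7 can be checked without a table by using `statistics.NormalDist` from the Python standard library (Python 3.8 and later). This is only a sketch; the small differences from the table-based answers come from rounding z to two decimals when reading the table.

```python
from statistics import NormalDist

# Current in mA: mean 10 and variance 4, so the standard deviation is 2.
current = NormalDist(mu=10, sigma=2)

p_exceeds_13 = 1 - current.cdf(13)               # P(X > 13)
p_between = current.cdf(11) - current.cdf(9)     # P(9 < X < 11)
x_98 = current.inv_cdf(0.98)                     # x with P(X < x) = 0.98

print(round(p_exceeds_13, 4))  # 0.0668
print(round(p_between, 4))     # 0.3829
print(round(x_98, 1))          # 14.1
```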
EXAMPLE 8: The diameter of a shaft in an optical storage drive is normally
distributed with mean 0.2508 inch and standard deviation 0.0005 inch. The
specifications on the shaft are 0.2500 ± 0.0015 inch. What proportion of shafts
conforms to specifications?
Let X denote the shaft diameter in inches. The requested probability is shown in the
figure below and
P(0.2485 < X < 0.2515) = P((0.2485 − 0.2508)/0.0005 < Z < (0.2515 − 0.2508)/0.0005)
= P(−4.6 < Z < 1.4) = 0.91924 − 0.0000 = 0.91924
Most of the nonconforming shafts are too large, because the process mean is located
very near to the upper specification limit. If the process is centered so that the process
mean is equal to the target value of 0.2500,
P(0.2485 < X < 0.2515) = P(−3 < Z < 3) = 0.99865 − 0.00135 = 0.9973
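The effect of centering the process can be seen by computing the conforming proportion for both process means. The sketch below uses `statistics.NormalDist`; the helper `proportion_conforming` is a hypothetical name introduced for illustration.

```python
from statistics import NormalDist

LOWER, UPPER = 0.2500 - 0.0015, 0.2500 + 0.0015  # specification limits (inch)
SIGMA = 0.0005                                   # process standard deviation


def proportion_conforming(process_mean):
    """P(LOWER < X < UPPER) for a normal process with the given mean."""
    shaft = NormalDist(mu=process_mean, sigma=SIGMA)
    return shaft.cdf(UPPER) - shaft.cdf(LOWER)


p_current = proportion_conforming(0.2508)   # process mean near the upper limit
p_centered = proportion_conforming(0.2500)  # process centered on the target

print(round(p_current, 4))   # 0.9192
print(round(p_centered, 4))  # 0.9973
```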
EXERCISES:
SAMPLING THEORY
Link between Population and Sampling:
[Figure: Population and Sample, linked by Probability and Statistics.]
For instance, the mean fill volume of a can (population) is required to be 300 ml. An
engineer takes a random sample of cans and computes the sample mean fill volume
x‾ = 298 ml
The engineer will probably decide that the population mean is µ = 300 ml,
even though the sample mean was 298 ml, because he or she knows that the sample
mean is a reasonable estimate of µ and that a sample mean of 298 ml is very likely
to occur even if the true population mean is µ = 300 ml.
The sampling distribution of a statistic depends on:
1. Random sampling
2. Systematic sampling
3. Stratified sampling
4. Multi-stage sampling
Suppose that a random sample of size n is taken from a normal population with
mean µ and variance σ².
Now each observation in this sample, say, X1, X2, X3, …, Xn, is a normally and
independently distributed random variable with mean µ and variance σ². Therefore, the
sample mean x‾ = (X1 + X2 + … + Xn)/n has a normal distribution with mean:
µx‾ = µ
and variance:
σ²x‾ = σ²/n
(For large N)
Theorem:
EXAMPLE 1:
An electronics company manufactures resistors that have a mean resistance of 100 ohms
and a standard deviation of 10 ohms. The distribution of resistance is normal.
Find the probability that a random sample of n= 25 resistors will have an average resistance
less than 95 ohms.
Note that the sampling distribution of x‾ is normal, with mean µx‾ = 100 ohms and a
standard deviation of:
σx‾ = σ/√n = 10/√25 = 2 ohms
Standardizing the point x‾ = 95 in the figure, we find that:
z = (95 − 100)/2 = −2.5
and therefore,
P(x‾ < 95) = P(Z < −2.5) = 0.0062
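The resistor calculation can be sketched in Python as follows. It assumes, as the example states, that the population is normal, so the sample mean is normal with mean µ and standard deviation σ/√n; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100, 10, 25   # population mean, standard deviation, sample size

se = sigma / math.sqrt(n)    # standard deviation of the sample mean
z = (95 - mu) / se           # standardized value of x-bar = 95
p_below_95 = NormalDist(mu=mu, sigma=se).cdf(95)

print(se)                    # 2.0
print(z)                     # -2.5
print(round(p_below_95, 4))  # 0.0062
```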
Let the first population have mean µ1 and variance σ1², and let the second population have mean µ2 and
variance σ2². Suppose that both populations are normally distributed and that independent random
samples of sizes n1 and n2 are taken from them. Then, we can say that the
sampling distribution of (x1‾ − x2‾) is normal with mean:
µ1 − µ2
and variance:
σ1²/n1 + σ2²/n2
Page | 56
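If we have two independent populations with means µ1 and µ2 and variances σ1² and
σ2², and if x1‾ and x2‾ are the sample means of two independent random samples of
sizes n1 and n2 from these populations, then the sampling distribution of
Z = [(x1‾ − x2‾) − (µ1 − µ2)] / √(σ1²/n1 + σ2²/n2)
is approximately standard normal if the conditions of the central limit theorem apply,
and exactly standard normal if the two populations are normal.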
EXAMPLE 2:
The effective life of a component used in an engine is a random variable with mean 5000 hours
and standard deviation 40 hours. The distribution of effective life is fairly close to a normal
distribution.
The engine manufacturer introduces an improvement into the manufacturing process for
this component that increases the mean life to 5050 hours and decreases the standard deviation to
30 hours. Suppose that a random sample of n1= 16 components is selected from the “old”
process and a random sample of n2=25 components is selected from the “improved” process.
What is the probability that the difference in the two sample means x2‾ - x1‾ is at least 25
hours? Assume that the old and improved processes can be regarded as independent populations.
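The distribution of x1‾ is normal with mean µ1 = 5000 hours and standard deviation
σ1/√n1 = 40/√16 = 10 hours,
and the distribution of x2‾ is normal with mean µ2 = 5050 hours and standard deviation
σ2/√n2 = 30/√25 = 6 hours.
The distribution of x2‾ − x1‾ is therefore normal with mean µ2 − µ1 = 5050 − 5000 = 50 hours
and variance σ2²/n2 + σ1²/n1 = 6² + 10² = 136 hours².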
Page | 57
[Figure: The sampling distribution of x2‾ − x1‾ in Example 2.]
The probability that x2‾ − x1‾ ≥ 25 hours is the shaded portion of the normal
distribution in this figure.
So,
P(x2‾ − x1‾ ≥ 25) = P(Z ≥ (25 − 50)/√136) = P(Z ≥ −2.14) = 0.9838
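The probability in Example 2 can be checked with a short Python sketch. It assumes the two samples are independent and that both sample means are normal, as stated in the example; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# "Old" process: mean 5000 h, standard deviation 40 h, sample size 16.
mu1, sigma1, n1 = 5000, 40, 16
# "Improved" process: mean 5050 h, standard deviation 30 h, sample size 25.
mu2, sigma2, n2 = 5050, 30, 25

mean_diff = mu2 - mu1                       # mean of x2-bar minus x1-bar
var_diff = sigma2**2 / n2 + sigma1**2 / n1  # variances add for independent samples
diff = NormalDist(mu=mean_diff, sigma=math.sqrt(var_diff))

p_at_least_25 = 1 - diff.cdf(25)

print(mean_diff, var_diff)      # 50 136.0
print(round(p_at_least_25, 3))  # 0.984
```

The table-based answer of 0.9838 differs slightly in the fourth decimal because z is rounded to −2.14 before the table lookup.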
Page | 58
EXERCISES:
1. PVC pipe is manufactured with a mean diameter of 1.01 inch and a standard
deviation of 0.003 inch. Find the probability that a random sample of n = 9 sections
of pipe will have a sample mean diameter greater than 1.009 inch and less than 1.012
inch.
2. A synthetic fiber used in manufacturing carpet has tensile strength that is normally
distributed with mean 75.5 psi and standard deviation 3.5 psi. Find the probability
that a random sample of n= 6 fiber specimens will have sample mean tensile strength
that exceeds 75.75 psi.
3. A random sample of size n1= 16 is selected from a normal population with a mean
of 75 and a standard deviation of 8. A second random sample of size n2= 9 is taken
from another normal population with mean 70 and standard deviation 12. Let x1‾ and
x2‾ be the two-sample means. Find
Page | 59
REGRESSION & CORRELATION
Many problems in engineering and science involve exploring the relationships between
two or more variables. Regression analysis is a statistical technique that is very useful for these
types of problems.
For example, in a chemical process, suppose that the yield of the product is related to the
process-operating temperature. Regression analysis can be used to build a model to predict yield
at a given temperature level. This model can also be used for process optimization, such as finding
the level of temperature that maximizes yield, or for process control purposes.
The case of simple linear regression considers a single predictor independent variable x
and a dependent or response variable Y. Suppose that the true relationship between Y and x is a
straight line and that the observation Y at each level of x is a random variable.
Page | 60
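The expected value of Y can be described by the model:
E(Y│x) = β0 + β1 x
where the intercept β0 and the slope β1 are unknown regression coefficients. Each
observation Y can then be written as
Y = β0 + β1 x + ε
where ε is a random error with mean zero.
The regression coefficients are estimated so that the sum of the squares of the
deviations of the observations from the fitted line is as small as possible.
We call this criterion for estimating the regression coefficients the method of least
squares. We may express the n observations in the sample as
yi = β0 + β1 xi + εi,  i = 1, 2, …, n
and the sum of the squares of the deviations of the observations from the true regression line is
$L = \sum_{i=1}^{n}\varepsilon_i^2 = \sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2$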
Page | 61
The solution to the normal equations results in the least squares estimators β̂0 and β̂1:
$\hat{\beta}_0 = \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{n\sum x_i^2 - (\sum x_i)^2}$
$\hat{\beta}_1 = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2}$
Each observation in the sample then satisfies
yi = β̂0 + β̂1 xi + ei,  i = 1, 2, …, n
where ei = yi − ŷi is called the residual. The residual describes the error in the fit of the model to
the ith observation yi.
Let:
and
EXAMPLE 1: We will fit a simple linear regression model to the oxygen purity data in Table 1.
The following quantities may be computed:
Page | 62
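Substituting these quantities into the least squares formulas
$\hat{\beta}_0 = \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{n\sum x_i^2 - (\sum x_i)^2}$
$\hat{\beta}_1 = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2}$
gives
β̂0 = 74.283
β̂1 = 14.947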
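As a double check, the fitted line must pass through the point (x‾, y‾):
y‾ = β̂0 + β̂1 x‾
So, with y‾ = 92.160 and x‾ = 1.196,
74.283 + 14.947 × 1.196 = 92.160
The two sides agree, so we continue. If they do not agree, re-check your calculations.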
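The fitted simple linear regression model (with the coefficients reported to three decimal places)
is:
ŷ = 74.283 + 14.947x
Using this model, the predicted purity at a hydrocarbon level of x = 1.00% is
ŷ = 74.283 + 14.947(1.00) = 89.23%.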
The purity 89.23% may be interpreted as an estimate of the true population mean purity
when x=1.00%, or as an estimate of a new observation when x = 1.00%. These estimates are, of
course, subject to error; that is, it is unlikely that a future observation on purity would be exactly
89.23% when the hydrocarbon level is 1.00%. In subsequent sections we will see how to use
confidence intervals and prediction intervals to describe the error in estimation from a regression
model.
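The least squares formulas for the intercept and slope can be sketched in Python as follows. Because the oxygen purity data of Table 1 are not reproduced here, the sketch fits a small hypothetical data set (the `x` and `y` lists below are made up for illustration only) and then evaluates the fitted model quoted in the text at x = 1.00. The function name `least_squares` is an illustrative choice.

```python
def least_squares(x, y):
    """Return (b0, b1), the least squares intercept and slope, from the sums
    of x, y, x squared, and x times y."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(v * v for v in x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    denom = n * sum_xx - sum_x**2
    b0 = (sum_y * sum_xx - sum_x * sum_xy) / denom
    b1 = (n * sum_xy - sum_x * sum_y) / denom
    return b0, b1


# Hypothetical data, NOT the oxygen purity data from Table 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(x, y)
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)

# Double check: the fitted line passes through (x_bar, y_bar).
line_at_x_bar = b0 + b1 * x_bar
print(abs(y_bar - line_at_x_bar) < 1e-9)  # True

# Prediction from the fitted model quoted in the text, at x = 1.00.
purity_at_1 = 74.283 + 14.947 * 1.00
print(round(purity_at_1, 2))  # 89.23
```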
2.0 Correlation
H.W No. 1:
The accompanying data were taken from a published paper. The independent variable
is SO2 deposition rate (mg/m²/day) and the dependent variable is steel weight loss
(g/m²).
a) Construct a scatter plot. Does the simple linear regression model appear to
be reasonable in this situation?
b) Calculate the equation of the estimated regression line.
c) Estimate the standard deviation of the observations about the true regression
line.
H.W No. 2:
The accompanying data resulted from a study carried out to examine the
relationship between a measure of the corrosion of reinforcement (y) and the
concentration of the corrosion inhibitor solution in concrete pores (x, in ppm):
x: 2.5, 5.03, 7.6, 11.6, 13, 19.6, 26.2, 33, 40, 50, 55
y: 7.68, 6.95, 6.3, 5.75, 5.01, 1.43, 0.93, 0.72, 0.68, 0.65, 0.56
a. Construct a scatter plot of the data. Does the simple linear regression model appear
to be reasonable?
b. Calculate the equation of the estimated regression line, use it to predict the
value of the corrosion rate that would be observed for a concentration of 33
ppm, and calculate the corresponding residual.
c. Estimate the standard deviation of the observations about the true regression line.
H.W No. 3:
The accompanying data resulted from a car factory study carried out to examine
the relationship between work hours (y) and work injuries (x). The data below
show the results:
a. Construct a scatter plot of the data. Does the simple linear regression model appear
to be reasonable?
b. Calculate the equation of the estimated regression line (Y = b0 + b1X).
c. Use it to predict the value of the work hours that would be estimated for
20 work injuries.