Lecture 10 Regression Analysis
Chapter
13
11
ARSHAD ALI
Regression
One of the main purposes of curve fitting is to estimate one of the variables
(the dependent variable) from the other (the independent variable). The
process of estimation is often referred to as regression. If y is to be
estimated from x by means of some equation, we call the equation a
regression equation of y on x and the corresponding curve a regression
curve of y on x.
11/02/2015
Generally, more than one curve of a given type will appear to fit a set of data. To
avoid individual judgment in constructing lines, parabolas, or other approximating
curves, it is necessary to agree on a definition of a best-fitting line, best-fitting
parabola, etc.
To motivate a possible definition, consider the following figure, in which the data points
are (x1, y1), . . . , (xn, yn).
For a given value of x, say x1, there will be a difference between the value y1 and the
corresponding value as determined from the curve C. We denote this difference by d1,
which is sometimes referred to as a deviation, error, or residual and may be positive,
negative, or zero. Similarly, corresponding to the values x2, . . . , xn, we obtain the
deviations d2, . . . , dn.
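As a small illustration of the deviations defined above, the sketch below computes d_i = y_i - C(x_i) for a candidate curve, together with the sum of their squares, which is the quantity a least-squares fit makes as small as possible. The data points and the candidate line are hypothetical, chosen only for illustration.

```python
def deviations(points, curve):
    """Return d_i = y_i - curve(x_i) for every data point (x_i, y_i)."""
    return [y - curve(x) for x, y in points]


def sum_of_squares(devs):
    """Sum of squared deviations; a smaller value means a closer fit."""
    return sum(d * d for d in devs)


def candidate_line(x):
    # Hypothetical candidate curve C: y = 2x
    return 2.0 * x


# Hypothetical data points (x_i, y_i)
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

devs = deviations(points, candidate_line)  # one deviation per data point
total = sum_of_squares(devs)
print(devs, total)
```

A deviation is positive when the point lies above the curve and negative when it lies below, which is why the squares, not the raw deviations, are summed.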
Normal equations for the line y = a + bx:
\[ \sum y = na + b\sum x \]
\[ \sum xy = a\sum x + b\sum x^2 \]
CIVIL ENGINEERING BUITEMS, QUETTA
[Figures: scatter diagrams illustrating correlation; one panel is titled "Positive Correlation".]
Example Problem
Week   Average Hourly Temperature x (deg F)   Weekly Fuel Consumption y (MMcf)
 1       28.0                                   12.4
 2       28.0                                   11.7
 3       32.5                                   12.4
 4       39.0                                   10.8
 5       45.9                                    9.4
 6       57.8                                    9.5
 7       58.1                                    8.0
 8       62.5                                    7.5
[Figure: scatter plot of fuel consumption (FUELCON) against temperature (TEMP).]
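Start from the straight line
\[ y = a + bx \]
Multiplying through by x gives
\[ xy = ax + bx^2 \]
Summing each equation over the n data points gives the normal equations
\[ \sum y = na + b\sum x \]
\[ \sum xy = a\sum x + b\sum x^2 \]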
Example Problem
\[ \sum y = na + b\sum x \]
\[ \sum xy = a\sum x + b\sum x^2 \]
           y        x        x^2         xy
         12.4     28.0     784.00     347.20
         11.7     28.0     784.00     327.60
         12.4     32.5    1056.25     403.00
         10.8     39.0    1521.00     421.20
          9.4     45.9    2106.81     431.46
          9.5     57.8    3340.84     549.10
          8.0     58.1    3375.61     464.80
          7.5     62.5    3906.25     468.75
Total    81.7    351.8   16874.8     3413.1
Substituting the totals, with n = 8:
\[ 81.7 = 8a + 351.8b \]
\[ 3413.1 = 351.8a + 16874.8b \]
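Example Problem
\[ a = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - \left(\sum x\right)^2} = 15.84 \]
\[ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = -0.1279 \]
\[ y = 15.84 - 0.1279x \]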
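The calculation for the fuel-consumption example can be checked with a short script. This is a minimal sketch that rebuilds the sums from the eight data pairs in the table and solves the two normal equations; the variable names and the `predict` helper are illustrative and not part of the lecture.

```python
# Weekly data from the example: temperature x (deg F), fuel consumption y (MMcf)
temp = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
fuel = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]

n = len(temp)
sum_x = sum(temp)
sum_y = sum(fuel)
sum_xx = sum(x * x for x in temp)
sum_xy = sum(x * y for x, y in zip(temp, fuel))

# Solve  sum_y = n*a + b*sum_x  and  sum_xy = a*sum_x + b*sum_xx
denom = n * sum_xx - sum_x ** 2
b = (n * sum_xy - sum_x * sum_y) / denom
a = (sum_y - b * sum_x) / n


def predict(t):
    """Estimated fuel consumption at temperature t (illustrative helper)."""
    return a + b * t


print(f"a = {a:.2f}, b = {b:.4f}, prediction at 40 deg F = {predict(40):.2f}")
```

The slope comes out negative, so the estimated fuel consumption falls as temperature rises. Predictions are only meaningful inside the observed temperature range of 28 to 62.5 deg F.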
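Example Problem
2nd degree polynomial
\[ y = a + bx + cx^2 \]
Multiplying through by x, and then by x^2:
\[ xy = ax + bx^2 + cx^3 \]
\[ x^2 y = ax^2 + bx^3 + cx^4 \]
Summing each equation over the n data points gives the normal equations
\[ \sum y = na + b\sum x + c\sum x^2 \]
\[ \sum xy = a\sum x + b\sum x^2 + c\sum x^3 \]
\[ \sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4 \]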
For comparison, the straight-line fit to the same data is
\[ y = 15.84 - 0.1279x \]
           x       y       x^2         x^3           x^4          xy       x^2 y
         28.0    12.4     784.00    21952.00     614656.00     347.20     9721.6
         28.0    11.7     784.00    21952.00     614656.00     327.60     9172.8
         32.5    12.4    1056.25    34328.13    1115664.06     403.00    13097.5
         39.0    10.8    1521.00    59319.00    2313441.00     421.20    16426.8
         45.9     9.4    2106.81    96702.58    4438648.38     431.46    19804.01
         57.8     9.5    3340.84   193100.55   11161211.91     549.10    31737.98
         58.1     8.0    3375.61   196122.94   11394742.87     464.80    27004.88
         62.5     7.5    3906.25   244140.63   15258789.06     468.75    29296.88
Total   351.8    81.7   16874.8    867617.8    46911809.3     3413.1    156262.4
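Totals from the table:
\[ \sum x = 351.8, \quad \sum y = 81.7, \quad \sum x^2 = 16874.8, \quad \sum x^3 = 867617.8 \]
\[ \sum x^4 = 46911809.3, \quad \sum xy = 3413.1, \quad \sum x^2 y = 156262.4 \]
Normal equations:
\[ \sum y = na + b\sum x + c\sum x^2 \]
\[ \sum xy = a\sum x + b\sum x^2 + c\sum x^3 \]
\[ \sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4 \]
Substituting the totals, with n = 8, in matrix form:
\[ \begin{bmatrix} 8 & 351.8 & 16874.8 \\ 351.8 & 16874.8 & 867617.8 \\ 16874.8 & 867617.8 & 46911809.3 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} 81.7 \\ 3413.1 \\ 156262.4 \end{bmatrix} \]
Solving:
\[ a = 15.63, \quad b = -0.118, \quad c = -1.145 \times 10^{-4} \]
\[ y = a + bx + cx^2 = 15.63 - 0.118x - 1.145 \times 10^{-4} x^2 \]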
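The three normal equations for the second-degree polynomial can be solved numerically as a check. The sketch below is one possible way to do it, using determinants on the sums computed directly from the eight data pairs; the helper names `det3`, `solve3` and `power_sum` are illustrative. Because it works from the unrounded data, the last digits of c can differ slightly from a hand calculation based on rounded totals.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve a 3x3 linear system by determinants (Cramer's rule)."""
    d = det3(m)
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]  # swap one column for the right-hand side
        solution.append(det3(replaced) / d)
    return solution


temp = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
fuel = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
n = len(temp)


def power_sum(p, q=0):
    """Sum of x^p * y^q over the data."""
    return sum(x ** p * y ** q for x, y in zip(temp, fuel))


matrix = [
    [n, power_sum(1), power_sum(2)],
    [power_sum(1), power_sum(2), power_sum(3)],
    [power_sum(2), power_sum(3), power_sum(4)],
]
rhs = [power_sum(0, 1), power_sum(1, 1), power_sum(2, 1)]

a, b, c = solve3(matrix, rhs)
print(f"a = {a:.2f}, b = {b:.3f}, c = {c:.3e}")
```

The very small value of c indicates that the quadratic term adds little over the straight line for this data set.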
12x16y=5
16x+106y=47
PROBLEM
Ten steel wires of diameter 0.5 mm and length 2.5 m were extended in a laboratory
by applying vertical forces of varying magnitudes. Results are as follows:
Force F (kg):     15    19    25    35    42    48    53    56    62    65
Extension (mm):   1.7   2.1   2.5   3.4   3.9   4.9   5.4   5.7   6.6   7.2
(a)
Estimate the parameters of a simple linear regression model with force as the
explanatory variable.
(b)
(c)
Carry out the same calculations in Microsoft Excel and compare the results with the manual calculations.
USING MICROSOFT EXCEL
Step 1: Enter the data in Microsoft Excel.
Step 2: Go to Insert and select the data.
Step 3: Left-click on the scatter data, then right-click and select Add Data Labels.
Step 4: Right-click on the data points and select Add Trendline.
Step 5: Select the Linear model.
Step 6: Check Display Equation on chart.
Step 7: Check Display R-squared value on chart.
[Figure: scatter plot of extension (mm) against Force (kg) with data labels and the linear trendline.]
y = 0.1062x - 0.1212
R^2 = 0.9833
Linear Regression Equation
\[ y = a + bx \]
Regression Equations
\[ \sum y_i = na + b\sum x_i \]
\[ \sum x_i y_i = a\sum x_i + b\sum x_i^2 \]
         x_i     y_i     x_i y_i    x_i^2
          15     1.7      25.5       225
          19     2.1      39.9       361
          25     2.5      62.5       625
          35     3.4     119        1225
          42     3.9     163.8      1764
          48     4.9     235.2      2304
          53     5.4     286.2      2809
          56     5.7     319.2      3136
          62     6.6     409.2      3844
          65     7.2     468        4225
Total    420    43.4    2128.5     20518
Substituting Values
\[ 10a + 420b = \frac{217}{5} \]
\[ 420a + 20518b = \frac{4257}{2} \]
Multiply 10a + 420b = 217/5 by 42:
\[ 420a + 17640b = \frac{9114}{5} \]
Therefore the set of equations becomes
\[ 420a + 17640b = \frac{9114}{5} \]
\[ 420a + 20518b = \frac{4257}{2} \]
Subtracting the first equation from the second:
\[ 2878b = \frac{4257}{2} - \frac{9114}{5} = \frac{3057}{10} \]
\[ b = \frac{3057}{28780} = 0.10622 \]
Substituting b into 420a + 17640b = 9114/5:
\[ 420a + 17640 \cdot \frac{3057}{28780} = \frac{9114}{5} \]
\[ 420a = \frac{9114}{5} - \frac{2696274}{1439} \]
\[ a = -0.1212 \]
\[ y = a + bx = -0.1212 + 0.10622x \]
The same system can be solved with determinants (Cramer's rule). Multiplying the first equation by 5 and the second by 2 clears the fractions:
\[ 50a + 2100b = 217 \]
\[ 840a + 41036b = 4257 \]
\[ D = \begin{bmatrix} 50 & 2100 \\ 840 & 41036 \end{bmatrix}, \quad \det(D) = 50 \cdot 41036 - 2100 \cdot 840 = 287800 \]
\[ D_a = \begin{bmatrix} 217 & 2100 \\ 4257 & 41036 \end{bmatrix}, \quad \det(D_a) = 217 \cdot 41036 - 2100 \cdot 4257 = -34888 \]
\[ D_b = \begin{bmatrix} 50 & 217 \\ 840 & 4257 \end{bmatrix}, \quad \det(D_b) = 50 \cdot 4257 - 217 \cdot 840 = 30570 \]
\[ a = \frac{\det(D_a)}{\det(D)} = \frac{-34888}{287800} = -\frac{4361}{35975} = -0.1212 \]
\[ b = \frac{\det(D_b)}{\det(D)} = \frac{30570}{287800} = \frac{3057}{28780} = 0.10622 \]
\[ y = -0.1212 + 0.10622x \]
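The determinant solution above can be reproduced exactly with rational arithmetic. This is a small sketch, assuming the cleared-fraction system 50a + 2100b = 217 and 840a + 41036b = 4257; the function names `det2` and `cramer_2x2` are illustrative.

```python
from fractions import Fraction


def det2(m):
    """Determinant of a 2x2 matrix given as a list of rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def cramer_2x2(coeffs, rhs):
    """Solve a 2x2 linear system by determinants, returning exact fractions."""
    d = det2(coeffs)
    # Replace the first column with the right-hand side to get D_a
    d_a = det2([[rhs[0], coeffs[0][1]], [rhs[1], coeffs[1][1]]])
    # Replace the second column with the right-hand side to get D_b
    d_b = det2([[coeffs[0][0], rhs[0]], [coeffs[1][0], rhs[1]]])
    return Fraction(d_a, d), Fraction(d_b, d)


a, b = cramer_2x2([[50, 2100], [840, 41036]], [217, 4257])
print(a, float(a))
print(b, float(b))
```

Working in fractions avoids rounding until the very end, which makes it easy to compare against the reduced fractions obtained by hand.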
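Correlation
Defined as
\[ r = \frac{\operatorname{covariance}(x, y)}{\sqrt{\operatorname{Var}(x)\,\operatorname{Var}(y)}} \]
           X      Y       XY       X^2      X^2 Y        X^3         X^4
          15     1.7     25.5      225      382.5       3375       50625
          19     2.1     39.9      361      758.1       6859      130321
          25     2.5     62.5      625     1562.5      15625      390625
          35     3.4    119       1225     4165        42875     1500625
          42     3.9    163.8     1764     6879.6      74088     3111696
          48     4.9    235.2     2304    11289.6     110592     5308416
          53     5.4    286.2     2809    15168.6     148877     7890481
          56     5.7    319.2     3136    17875.2     175616     9834496
          62     6.6    409.2     3844    25370.4     238328    14776336
          65     7.2    468       4225    30420       274625    17850625
Total    420    43.4   2128.5    20518   113871.5    1090860    60844246
Coefficient of Correlation
\[ r = \frac{\dfrac{\sum x_i y_i}{n} - \dfrac{\sum x_i}{n}\,\dfrac{\sum y_i}{n}}{\sqrt{\left[\dfrac{\sum x_i^2}{n} - \left(\dfrac{\sum x_i}{n}\right)^2\right]\left[\dfrac{\sum y_i^2}{n} - \left(\dfrac{\sum y_i}{n}\right)^2\right]}} \]
Simplifies to
\[ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} \]
Substituting n = 10, \sum X = 420, \sum Y = 43.4, \sum XY = 2128.5, \sum X^2 = 20518, \sum Y^2 = 221.38:
\[ r = 0.9916 \text{ or } 99.16\% \]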
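The substitution can be verified from the raw steel-wire data. The sketch below builds each sum directly and applies the simplified formula for the coefficient of correlation; the variable names are illustrative.

```python
from math import sqrt

# Steel wire data: force X (kg) and extension Y (mm)
x = [15, 19, 25, 35, 42, 48, 53, 56, 62, 65]
y = [1.7, 2.1, 2.5, 3.4, 3.9, 4.9, 5.4, 5.7, 6.6, 7.2]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_xx = sum(xi * xi for xi in x)
sum_yy = sum(yi * yi for yi in y)

# r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
r = numerator / denominator

print(f"sum of Y^2 = {sum_yy:.2f}, r = {r:.4f}, r^2 = {r * r:.4f}")
```

Squaring r gives the R-squared value that a spreadsheet trendline reports for the same data.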
PROBLEM
Consider the lab-measured water content in randomly selected field soil specimens
from a particular site, and the corresponding water content estimated by a fast,
inexpensive method which measures the gas pressure created when the soil is
mixed with a chemical which reacts with water. If it is sufficiently accurate, the
second method will provide an inexpensive way to obtain more frequent
water-content samples during the quality control of soil compaction on a highway project.
The purpose is to predict the true water content given an observed value of the fast
test. Find the estimated regression line using the following data.
Regression equations:
\[ \sum y = na + b\sum x_i \]
\[ \sum xy = a\sum x_i + b\sum x_i^2 \]
Substituting Values
\[ n = 67, \quad \bar{x} = 13.8, \quad \sum x_i = 13.8 \times 67 = 924.6 \]
Similarly,
\[ \sum y = 13.8 \times 67 = 924.6, \quad \sum xy = 13145.5, \quad \sum x_i^2 = 13260 \]
We get
\[ 67a + 924.6b = 924.6 \]
\[ 924.6a + 13260b = 13145.5 \]
STEP 1: Get rid of the decimal fractions by multiplying the first equation by 5 and the second
equation by 10. After multiplying we have the following system:
\[ 335a + 4623b = 4623 \]
\[ 9246a + 132600b = 131455 \]
STEP 2: Find the coefficient matrix D and the matrices D_a and D_b, obtained by replacing the
first and second columns of D with the right-hand side. In this example we have
\[ D = \begin{bmatrix} 335 & 4623 \\ 9246 & 132600 \end{bmatrix}, \quad D_a = \begin{bmatrix} 4623 & 4623 \\ 131455 & 132600 \end{bmatrix}, \quad D_b = \begin{bmatrix} 335 & 4623 \\ 9246 & 131455 \end{bmatrix} \]
\[ \det(D) = 335 \cdot 132600 - 4623 \cdot 9246 = 1676742 \]
\[ \det(D_a) = 4623 \cdot 132600 - 4623 \cdot 131455 = 5293335 \]
\[ \det(D_b) = 335 \cdot 131455 - 4623 \cdot 9246 = 1293167 \]
\[ a = \frac{\det(D_a)}{\det(D)} = \frac{5293335}{1676742} = 3.1569 \]
\[ b = \frac{\det(D_b)}{\det(D)} = \frac{1293167}{1676742} = 0.771 \]
\[ y = a + bx = 3.1569 + 0.771x \]
PROBLEM
[Figure: Scatter Diagram of arsenic removed (%) against pH (x).]
         X (pH)    Y (arsenic removed, %)      XY         X^2        Y^2
          7.01       60                       420.6      49.1401     3600
          7.11       67                       476.37     50.5521     4489
          7.12       66                       469.92     50.6944     4356
          7.24       52                       376.48     52.4176     2704
          7.94       50                       397        63.0436     2500
          7.94       45                       357.3      63.0436     2025
          8.04       52                       418.08     64.6416     2704
          8.05       48                       386.4      64.8025     2304
          8.07       40                       322.8      65.1249     1600
          8.9        23                       204.7      79.21        529
          8.94       20                       178.8      79.9236      400
          8.95       40                       358        80.1025     1600
          8.97       31                       278.07     80.4609      961
          8.98       26                       233.48     80.6404      676
          9.85        9                        88.65     97.0225       81
          9.86       22                       216.92     97.2196      484
          9.86       13                       128.18     97.2196      169
          9.87        7                        69.09     97.4169       49
Total   152.7       671                      5380.84   1312.6764    31231
Regression equations:
\[ \sum y = na + b\sum x_i \]
\[ \sum x_i y = a\sum x_i + b\sum x_i^2 \]
With n = 18, \sum X = 152.7, \sum Y = 671, \sum XY = 5380.84 and \sum X^2 = 1312.6764:
\[ 671 = 18a + 152.7b \]
\[ 5380.84 = 152.7a + 1312.6764b \]
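Reorganizing Equations
\[ 18a + 152.7b = 671 \]
\[ 152.7a + 1312.6764b = 5380.84 \]
Written with fractions:
\[ 18a + \frac{1527}{10}b = 671 \]
\[ \frac{1527}{10}a + \frac{3281691}{2500}b = \frac{134521}{25} \]
STEP 1: Get rid of the fractions by multiplying the first equation by 10 and the second
equation by 2500:
\[ 180a + 1527b = 6710 \]
\[ 381750a + 3281691b = 13452100 \]
STEP 2: Find the coefficient matrix D and the matrices D_a and D_b, obtained by replacing the
first and second columns of D with the right-hand side. In this example we have
\[ D = \begin{bmatrix} 180 & 1527 \\ 381750 & 3281691 \end{bmatrix}, \quad D_a = \begin{bmatrix} 6710 & 1527 \\ 13452100 & 3281691 \end{bmatrix}, \quad D_b = \begin{bmatrix} 180 & 6710 \\ 381750 & 13452100 \end{bmatrix} \]
\[ \det(D) = 180 \cdot 3281691 - 1527 \cdot 381750 = 7772130 \]
\[ \det(D_a) = 6710 \cdot 3281691 - 1527 \cdot 13452100 = 1478789910 \]
\[ \det(D_b) = 180 \cdot 13452100 - 6710 \cdot 381750 = -140164500 \]
\[ a = \frac{\det(D_a)}{\det(D)} = \frac{1478789910}{7772130} = 190.27 \]
\[ b = \frac{\det(D_b)}{\det(D)} = \frac{-140164500}{7772130} = -18.03 \]
The same values follow from the direct formulas:
\[ a = \frac{\sum Y \sum X^2 - \sum X \sum XY}{n\sum X^2 - \left(\sum X\right)^2} = 190.268 \]
\[ b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = -18.034 \]
\[ y = a + bx = 190.268 - 18.034x \]
Correlation Coefficient
\[ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = -0.9505 \]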
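When only the column totals are available, the direct formulas for a, b and r can be evaluated in a few lines. The sketch below uses the totals of the arsenic-removal table (n = 18); the function names `fit_from_sums` and `r_from_sums` are illustrative, not part of the lecture.

```python
from math import sqrt


def fit_from_sums(n, sx, sy, sxy, sxx):
    """Intercept a and slope b of the least-squares line from column totals."""
    denom = n * sxx - sx ** 2
    a = (sy * sxx - sx * sxy) / denom
    b = (n * sxy - sx * sy) / denom
    return a, b


def r_from_sums(n, sx, sy, sxy, sxx, syy):
    """Coefficient of correlation from column totals."""
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


# Totals from the arsenic-removal table: X = pH, Y = arsenic removed (%)
n, sx, sy = 18, 152.7, 671
sxy, sxx, syy = 5380.84, 1312.6764, 31231

a, b = fit_from_sums(n, sx, sy, sxy, sxx)
r = r_from_sums(n, sx, sy, sxy, sxx, syy)
print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.4f}")
```

The slope and r share the same sign because they share the same numerator: arsenic removal decreases as pH increases in this data set.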
PROBLEM
[Figure: Scatter Plot of Y against X.]
     X        Y         XY          X^2         Y^2
   125.3     77.9     9760.87    15700.09     6068.41
    98.2     76.8     7541.76     9643.24     5898.24
   201.4     81.5    16414.1     40561.96     6642.25
   147.3     79.8    11754.54    21697.29     6368.04
   145.9     78.2    11409.38    21286.81     6115.24
   124.7     78.3     9764.01    15550.09     6130.89
   112.2     77.5     8695.5     12588.84     6006.25
   120.2     77       9255.4     14448.04     5929
   161.2     80.1    12912.12    25985.44     6416.01
   178.9     80.2    14347.78    32005.21     6432.04
   159.5     79.9    12744.05    25440.25     6384.01
   145.8     79      11518.2     21257.64     6241
    75.1     76.7     5760.17     5640.01     5882.89
   151.4     78.2    11839.48    22921.96     6115.24
   144.2     79.5    11463.9     20793.64     6320.25
   125       78.1     9762.5     15625        6099.61
   198.8     81.5    16202.2     39521.44     6642.25
   132.5     77      10202.5     17556.25     5929
   159.6     79      12608.4     25472.16     6241
   110.7     78.6     8701.02    12254.49     6177.96
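\[ n = 20, \quad \sum X = 2817.9, \quad \sum Y = 1574.8, \quad \sum XY = 222657.88, \quad \sum X^2 = 415949.85 \]
Regression equations:
\[ \sum y = na + b\sum x_i \]
\[ \sum x_i y = a\sum x_i + b\sum x_i^2 \]
Substituting the values:
\[ 20a + 2817.9b = 1574.8 \]
\[ 2817.9a + 415949.85b = 222657.88 \]
Written with fractions:
\[ 20a + \frac{28179}{10}b = \frac{7874}{5} \]
\[ \frac{28179}{10}a + \frac{8318997}{20}b = \frac{5566447}{25} \]
STEP 1: Get rid of the fractions by multiplying the first equation by 10 and the second
equation by 100:
\[ 200a + 28179b = 15748 \]
\[ 281790a + 41594985b = 22265788 \]
STEP 2: Multiply each equation by a number that will create opposite coefficients
for a. In this case we multiply the first equation by -28179 and the second equation
by 20.
STEP 3: Add the two equations so that a cancels:
\[ 37843659b = 1552868 \]
STEP 4: Solve for b:
\[ b = \frac{1552868}{37843659} = 0.041 \]
STEP 5: Substitute the value for b into the original equation to solve for a:
\[ 20a = 1574.8 - 2817.9 \cdot \frac{1552868}{37843659} \]
\[ a = \frac{1533899096}{21024255} = 72.96 \]
\[ y = a + bx = 72.96 + 0.041x \]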
PROBLEM
Values of modulus of elasticity (MOE, the ratio of stress, i.e., force per unit area, to
strain, i.e., deformation per unit length, in GPa) and flexural strength (a measure of
the ability to resist failure in bending, in MPa) were determined for a sample of
concrete beams of a certain type, resulting in the following data (read from a graph
in the article "Effects of Aggregates and Microfillers on the Flexural Properties of
Concrete," Magazine of Concrete Research, 1997: 81-98):
MOE: 29.8 33.2 33.7 35.3 35.5 36.1 36.2 36.3 37.5 37.7 38.7 38.8 39.6 41 42.8 42.8 43.5 45.6 46 46.9 48 49.3 51.7 62.6 69.8 79.5 80
Strength: 5.9 7.2 7.3 6.3 8.1 6.8 7.6 6.8 6.5 6.3 7.9 8.2 8.7 7.8
Use the accompanying Minitab output to obtain the equation of the least squares
line for predicting strength from modulus of elasticity, and then predict strength for a
beam whose modulus of elasticity is 40. Would you feel comfortable using the least
squares line to predict strength when modulus of elasticity is 100? Explain.
Problem Statement
12 30 36 40 45 57 62 67 71 78 93 94 100 105
3.3 3.2 3.4 2.8 2.9 2.7 2.6 2.5 2.6 2.2 2.3 2.1
Problem Statement
No-fines concrete, made from a uniformly graded coarse aggregate and a
cement-water paste, is beneficial in areas prone to excessive rainfall because of
its excellent drainage properties. The article "Pavement Thickness Design for
No-Fines Concrete Parking Lots" (J. of Trans. Engr., 1995: 476-484) employed a
least squares analysis in studying how y = porosity (%) is related to x = unit
weight (pcf) in concrete specimens. Consider the following representative data:
x:   99     101.1   102.7   103     105.4   107     108.7   110.8
y:   28.8    27.9    27      25.2    22.8    21.5    20.9    19.6
x:   112.1  112.4   113.6   113.8   115.1   115.4   120
y:   17.1    18.9    16      16.7    13      13.6    10.8
Obtain the equation of the estimated regression line. Then create a scatter plot of
the data and graph the estimated line. Does it appear that the model relationship
will explain a great deal of the observed variation in y? What happens if the
estimated line is used to predict porosity when unit weight is 135? Why is this not
a good idea?
Problem Statement
For the past decade, rubber powder has been used in asphalt cement to improve
performance. The article "Experimental Study of Recycled Rubber-Filled
High-Strength Concrete" (Magazine of Concrete Res., 2009: 549-556) includes a
regression of y = axial strength (MPa) on x = cube strength (MPa) based on the
following sample data. Obtain the regression line and the coefficient of correlation
from the given data.
x:   112.3   97     92.7   86     102     99.2   95.8   103.5   89     86.7
y:    75     71     57.7   48.7    74.3   73.3   68      59.3   57.8   48.5
Problem Statement
Wrinkle recovery angle and tensile strength are the two most important
characteristics for evaluating the performance of cross-linked cotton fabric. An
increase in the degree of crosslinking, as determined by ester carboxyl band
absorbance, improves the wrinkle resistance of the fabric (at the expense of
reducing mechanical strength). The accompanying data on x = absorbance
and y = wrinkle resistance angle was read from a graph in the paper "Predicting the
Performance of Durable Press Finished Cotton Fabric with Infrared Spectroscopy"
(Textile Res. J., 1999: 145-151).
x:   0.115   0.126   0.183   0.246   0.282   0.344   0.355   0.452   0.491   0.554   0.651
y:   334     342     355     363     365     372     381     392     400     412     420
Calcium phosphate cement is gaining increasing attention for use in bone repair
applications. The article "Short-Fibre Reinforcement of Calcium Phosphate Bone
Cement" (J. of Engr. in Med., 2007: 203-211) reported on a study in which
polypropylene fibers were used in an attempt to improve fracture behavior. The
following data on x = fiber weight (%) and y = compressive strength (MPa) was
provided by the article's authors.
x
1.25
1.25
1.25
9.94
11.67
11
13.44
9.2
9.92
9.79
1.25
2.5
2.5
2.5
2.5
8.69
9.91
10.45
2.5
7.5
7.5
7.5
7.5
10
10
10
10
10.25
7.89
7.61
8.07
9.04
6.63
6.43
7.03
7.63
7.35
6.94
7.02
7.67
a. Fit the simple linear regression model to this data. Then determine the
proportion of observed variation in strength that can be attributed to the model
relationship between strength and fiber weight.
b. The average strength values for the six different levels of fiber weight are 11.05,
10.51, 10.32, 8.15, 6.93, and 7.24, respectively. The cited paper included a
figure in which the average strength was regressed against fiber weight.
c. Obtain the equation of this regression line and calculate the corresponding
coefficient of determination. Explain the difference between the r2 value for this
regression and the r2 value obtained in (a).
Variations in clay brick masonry weight have implications not only for structural and
acoustical design but also for design of heating, ventilating, and air conditioning
systems. The article "Clay Brick Masonry Weight Variation" (J. of Architectural Engr.,
1996: 135-137) gave a scatter plot of y = mortar dry density (lb/ft^3) versus x = mortar
air content (%) for a sample of mortar specimens, from which the following
representative data was read:
x:   5.7     6.8     9.6     10      10.7    12.6    14.4    15
y:   119     121.3   118.2   124     112.3   114.1   112.2   115.1
x:   15      16.2    17.8    19      19.7    20.6    25
y:   111     107.2   108.9   108     111     106.2   105
Corrosion of steel reinforcing bars is the most important durability problem for
reinforced concrete structures. Carbonation of concrete results from a chemical
reaction that lowers the pH value by enough to initiate corrosion of the rebar.
Representative data on x = carbonation depth (mm) and y = strength (MPa) for a
sample of core specimens taken from a particular building follows (read from a plot in
the article "The Carbonation of Concrete Structures in the Tropical Environment of
Singapore," Magazine of Concrete Res., 1996: 293-300).
   x       y        x       y
   8      22.8     38      19.5
  15      27.2     40      12.4
  16.5    23.7     45      13.2
  20      17.1     50      11.4
  20      21.5     50      10.3
  27.5    18.6     55      14.1
  30      16.1     55       9.7
  30      23.4     59      12
  35      13.4     65       6.8
The accompanying data was read from a graph that appeared in the article
"Reactions on Painted Steel Under the Influence of Sodium Chloride, and
Combinations Thereof" (Ind. Engr. Chem. Prod. Res. Dev., 1985: 375-378). The
independent variable is SO2 deposition rate (mg/m^2/d), and the dependent
variable is steel weight loss (g/m^2).
x:   14     18     40     43     45     112
y:   280    350    470    500    560    1200
a) Construct a scatter plot. Does the simple linear regression model appear to be
reasonable in this situation?
b) Calculate the equation of the estimated regression line.
c) What percentage of observed variation in steel weight loss can be attributed to the
model relationship in combination with variation in deposition rate?
d) Because the largest x value in the sample greatly exceeds the others, this observation
may have been very influential in determining the equation of the estimated line. Delete
this observation and recalculate the equation.
e) Does the new equation appear to differ substantially from the original one (you might
consider predicted values)?
Problem Statement
During oil drilling operations, components of the drilling assembly may suffer from sulfide
stress cracking. The article "Composition Optimization of High-Strength Steels for
Sulfide Cracking Resistance Improvement" (Corrosion Science, 2009: 2878-2884) reported
on a study in which the composition of a standard grade of steel was analyzed. The
following data on y = threshold stress (% SMYS) and x = yield strength (MPa) was read
from a graph in the article (which also included the equation of the least squares line).
x:   635   644   711   708   836   820   810   870   856   923   878   937   948
y:   100    93    88    84    77    75    74    63    57    55    47    43    38
Problem Statement
The catch basin in a storm-sewer system is the interface between surface runoff and
the sewer. The catch-basin insert is a device for retrofitting catch basins to improve
pollutant removal properties. The article "An Evaluation of the Urban Stormwater
Pollutant Removal Efficiency of Catch Basin Inserts" (Water Envir. Res., 2005:
500-510) reported on tests of various inserts under controlled conditions for which inflow
is close to what can be expected in the field. Consider the following data, read from a
graph in the article, for one particular type of insert on x = amount filtered (1000s of
liters) and y = % total suspended solids removed.
x:   23     45     68     91     114    136    159    182    205    228
y:   53.3   26.9   54.8   33.8   29.9   8.2    17.2   12.2   3.2    11.1
Thanks
Note:
If you feel any difficulty in understanding the lectures or if you could not get
the soft copy of the lecture you can email me. I will try my best to answer.
Personal:
Official:
[email protected]
[email protected]