ISE 500 Fall 2018 Assignment 7: Regression Plot

This document contains a table listing Bill Rodgers' marathon results between 1971 and 2004, including place, time, age, 8-week mileage leading up to the race, course difficulty, weight, whether the race came after a heart attack, temperature, and the number of 20-mile training runs. Times improved through the late 1970s as training mileage increased and slowed again with age; races from 2000 onward are flagged as after a heart attack. The data is analyzed with multiple regression to understand the relationship between training regimen, age, and marathon performance.

Uploaded by

Ananth Ramesh

ISE 500 Fall 2018 Assignment 7

Submitted by: Ananth Ramesh (collaborated with Mazen)


Date: 8/10/2018

Observed X (Temp.)   Observed Y (Ratio)   Predicted Y   Residuals
170                  0.84                 0.769          0.071
172                  1.31                 0.9574         0.3526
173                  1.42                 1.0516         0.3684
174                  1.03                 1.1458        -0.1158
174                  1.07                 1.1458        -0.0758
175                  1.08                 1.24          -0.16
176                  1.04                 1.3342        -0.2942
177                  1.8                  1.4284         0.3716
180                  1.45                 1.711         -0.261
180                  1.6                  1.711         -0.111
180                  1.61                 1.711         -0.101
180                  2.13                 1.711          0.419
180                  2.15                 1.711          0.439
181                  0.84                 1.8052        -0.9652
181                  1.43                 1.8052        -0.3752
182                  0.9                  1.8994        -0.9994
182                  1.81                 1.8994        -0.0894
182                  1.94                 1.8994         0.0406
182                  2.68                 1.8994         0.7806
184                  1.49                 2.0878        -0.5978
184                  2.52                 2.0878         0.4322
185                  3                    2.182          0.818
186                  1.87                 2.2762        -0.4062
188                  3.08                 2.4646         0.6154
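The Predicted Y and Residuals columns come from a least-squares line through the 24 points. The sketch below refits that line in plain Python; the helper name `fit_line` is an illustrative assumption and not part of the original spreadsheet. With unrounded coefficients the prediction at a temperature of 192 comes out near 2.848, while the tabulated predictions match coefficients rounded to 0.0942 and -15.245.

```python
# Least-squares fit of Ratio (y) against Temp. (x) for the 24 tabulated points.
temps = [170, 172, 173, 174, 174, 175, 176, 177, 180, 180, 180, 180, 180,
         181, 181, 182, 182, 182, 182, 184, 184, 185, 186, 188]
ratios = [0.84, 1.31, 1.42, 1.03, 1.07, 1.08, 1.04, 1.80, 1.45, 1.60, 1.61,
          2.13, 2.15, 0.84, 1.43, 0.90, 1.81, 1.94, 2.68, 1.49, 2.52, 3.00,
          1.87, 3.08]


def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line.

    Hypothetical helper written for this example.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_line(temps, ratios)

# Residual = observed ratio minus the ratio predicted by the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(temps, ratios)]
prediction_192 = intercept + slope * 192

print(f"slope={slope:.6f} intercept={intercept:.4f} R^2={r_squared:.4f}")
print(f"predicted ratio at 192: {prediction_192:.4f}")
```

Note that 192 lies outside the observed temperature range (170 to 188), so this prediction is an extrapolation.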

Since the residual plot shows a random distribution of the residuals across the individual temperature values, a linear model is a reasonable fit.

Ratio point predicted at a temperature of 192 considering individual values = 2.8414

The regression shows that temperature explains only about 45% (R Square = 0.4514) of the variation in the ratio values. The remaining variation appears randomly distributed and does not follow a fixed pattern. Hence, it is difficult to accurately predict the ratio for any particular temperature value.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.671861547
R Square            0.4513979383
Adjusted R Square   0.426461481
Standard Error      0.4972447603
Observations        24

ANOVA
             df   SS             MS             F               Significance F
Regression   1    4.4757440972   4.4757440972   18.1019273124   0.0003239397
Residual     22   5.4395517361   0.2472523516
Total        23   9.9152958333

              Coefficients     Standard Error   t Stat          P-value        Lower 95%        Upper 95%
Intercept     -15.2449652778   3.9770484595     -3.8332359872   0.0009050335   -23.4928589677   -6.997071588
Temperature   0.0942361111     0.022149042      4.2546359788    0.0003239397   0.0483018094     0.1401704129

Averages of groups of 5

Temperatures   Ratio
172.6          1.134
177.6          1.394
180.4          1.632
182.4          1.764
185.75         2.6175

[Plot: Ratio vs. Temperature for the group averages, with linear trendline]
f(x) = 0.1041959875x - 17.0209287529
R² = 0.8518499355

Ratio at temp. 192 for averages of groups of 5 = 2.9854

I would not be satisfied with these results, as I would want a higher R-square value to get a more accurate prediction of any data. I would consider other methods which offer a clearer picture and better accuracy of the relation between the X values and the Y values.
Having an R-square value of 0.95 and above would help me to predict values with greater accuracy.
The Regression Plot shows a random distribution of the Y variables (Ratios) as compared to the X variables (Temperatures), and the plot is fitted with a linear trendline which is continuously increasing.

Regression Plot
[Plot: Ratio vs. Temperature, x-axis 165 to 190, with linear trendline]
f(x) = 0.0942361111x - 15.2449652778
R² = 0.4513979383

Residual Plot
[Plot: Residuals vs. Temperature, x-axis 165 to 190]
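The group-average fit can be reproduced by averaging the sorted observations in consecutive groups of 5 (the last group holds the remaining 4 points) and fitting a line through the averages. This is a minimal sketch under that assumption; `chunk_means` and `fit_line` are illustrative names, not taken from the spreadsheet.

```python
# Average the 24 (Temp., Ratio) points in consecutive groups of 5, then refit.
temps = [170, 172, 173, 174, 174, 175, 176, 177, 180, 180, 180, 180, 180,
         181, 181, 182, 182, 182, 182, 184, 184, 185, 186, 188]
ratios = [0.84, 1.31, 1.42, 1.03, 1.07, 1.08, 1.04, 1.80, 1.45, 1.60, 1.61,
          2.13, 2.15, 0.84, 1.43, 0.90, 1.81, 1.94, 2.68, 1.49, 2.52, 3.00,
          1.87, 3.08]


def chunk_means(values, size):
    """Mean of each consecutive chunk; the final chunk may be shorter."""
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    return [sum(chunk) / len(chunk) for chunk in chunks]


def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


avg_temps = chunk_means(temps, 5)    # 172.6, 177.6, 180.4, 182.4, 185.75
avg_ratios = chunk_means(ratios, 5)  # 1.134, 1.394, 1.632, 1.764, 2.6175

slope, intercept, r_squared = fit_line(avg_temps, avg_ratios)
prediction_192 = intercept + slope * 192

print(f"slope={slope:.6f} intercept={intercept:.4f} R^2={r_squared:.4f}")
print(f"predicted ratio at 192 from group averages: {prediction_192:.4f}")
```

The higher R² of the averaged fit reflects the scatter that averaging removes within each group. It describes how well the line tracks the five group means, not how well it predicts an individual ratio.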
DATE   PLACE   TIME   AGE   8 Week Mileage   Course DIFFICULTY   WEIGHT   After Heart Attack   TEMP   No. of 20 milers
3/10/1971 SANTA BAR 04:40:21 33.5 209 3 144 0 90 2
5/12/1971 CULVER CIT 04:58:00 33.6 296 1 142 0 70 3
05/27/72 PALOS VER 04:05:39 34.1 367 4 139 0 70 0
3/12/1972 CULVER CIT 03:43:00 34.6 478 1 135 0 65 1
9/6/1973 PALOS VER 04:10:56 35.2 477 4 140 0 65 2
06/14/75 PALOS VER 03:21:59 37.2 476 4 132 0 65 1
01/25/76 ORANGE 03:32:00 37.8 456 1 145 0 70 3
12/6/1976 PALOS VER 04:16:29 38.2 488 4 143 0 65 7
10/24/76 NEW YORK 03:09:00 38.5 637 1 140 0 60 1
11/20/76 ROSEBOWL 03:23:00 38.6 679 1 137 0 70 3
5/12/1976 CULVER CIT 03:28:35 38.6 657 1 137 0 65 4
11/6/1977 PALOS VER 03:48:00 39.2 328 4 140 0 65 0
01/14/78 SAN DIEGO 03:29:48 39.8 368 1 148 0 75 2
01/29/78 ORANGE 03:26:09 39.8 405 1 147 0 60 4
12/2/1978 HIDDEN VAL 03:29:14 39.8 458 5 145 0 55 5
11/3/1978 LOS ALAMIT 03:17:25 39.9 505 1 144 0 55 4
8/4/1978 ORANGE CO 03:16:05 40 540 1 141 0 55 2
06/17/78 PALOS VER 04:12:00 40.2 399 4 142 0 70 1
01/13/80 SAN DIEGO 04:15:03 41.8 322 1 139 0 60 1
1/3/1987 LOS ANGEL 04:05:03 48.9 362 1 143 0 75 2
3/5/1987 LONG BEAC 03:58:17 49.1 457 1 138 0 80 2
06/13/87 PALOS VER 04:28:00 49.2 407 4 141 0 70 3
07/19/87 SAN FRANSI 04:18:00 49.3 309 3 141 0 70 2
6/3/1988 LOS ANGEL 04:18:00 49.9 389 1 143 0 70 1
11/6/1988 PALOS VER 04:46:00 50.2 212 4 146 0 65 0
10/30/88 CHICAGO 04:42:00 50.6 328 1 143 0 45 0
4/3/1990 LOS ANGEL 04:31:00 51.9 420 1 151 0 80 0
8/12/1991 SAN DIEGO 04:24:00 53.7 475 3 149 0 55 2
01/24/93 SAN DIEGO 04:35:00 54.8 401 3 148 0 55 1
7/2/1993 LONG BEAC 04:19:00 54.8 407 1 146 0 55 2
7/3/1993 LOS ANGEL 04:36:00 54.9 372 1 146 0 95 4
03/20/93 CATALINA 04:42:00 54.9 378.5 7 143 0 65 4
04/17/93 LAKE POWE 04:24:38 55 406 5 138 0 65 3
12/6/1993 PALOS VER 04:56:55 55.2 404.5 4 137 0 80 1
07/18/93 SAN FRANSI 04:18:01 55.3 369 3 136 0 65 2
09/26/93 PORTLAND 04:07:16 55.5 418 1 137 0 65 4
10/31/93 CHICAGO 03:57:00 55.6 426 1 132 0 32 5
5/12/1993 CULVER CIT 04:11:10 55.7 445 1 131 0 55 3
01/23/94 SAN DIEGO 03:59:50 55.8 469 3 134 0 65 8
6/2/1994 LONG BEAC 04:18:00 55.8 461 1 133 0 60 6
6/3/1994 LOS ANGEL 04:13:25 55.9 440.5 1 132 0 70 4
03/19/94 CATALINA 04:32:10 55.9 401.5 7 133 0 60 3
04/16/94 LAKE POWE 04:23:37 56 431 5 136 0 75 4
07/31/94 SAN FRANSI 04:14:05 56.3 372.5 3 139 0 75 2
2/10/1994 PORTLAND 04:05:02 56.5 410 1 138 0 55 4
10/30/94 CHICAGO 04:28:16 56.6 380 1 136 0 60 4
4/12/1994 CULVER CIT 04:19:23 56.7 338.5 1 136 0 60 4
5/2/1995 LONG BEAC 04:22:16 56.8 323 1 144 0 50 1
5/3/1995 LOS ANGEL 04:40:00 56.9 336.5 1 144 0 75 2
03/18/95 CATALINA 04:50:00 56.9 330.5 7 144 0 65 2
04/30/95 Big Sur 04:25:00 57.1 312.5 5 144 0 55 2
3/6/1995 Nicene 04:47:00 57.2 347.5 7 146 0 60 2
9/7/1995 SAN FRANSI 04:40:00 57.3 283.5 3 147 0 70 1
04/28/96 Big Sur 05:01:00 58.1 257 5 151 0 65 0
4/5/1996 Shiprock 05:20:00 58.1 306 6 151 0 75 1
1/6/1996 Nicene 05:43:00 58.2 405 7 149 0 60 2
8/6/1996 PALOS VER 04:55:00 58.2 450 4 149 0 55 3
07/14/96 SAN FRANSI 04:50:00 58.3 497 3 150 0 60 3
2/3/1997 LOS ANGEL 04:39:19 58.9 309 1 146 0 60 2
03/19/97 CATALINA 05:01:00 58.9 370 7 146 0 60 3
04/27/97 Big Sur 04:47:00 59.1 371.5 5 146 0 55 5
3/5/1997 Shiprock 04:32:00 59.1 369 6 146 0 55 5
7/6/1997 PALOS VER 04:50:00 59.2 381 4 148 0 55 4
07/13/97 SAN FRANSI 04:42:43 59.3 378.5 3 148 0 50 2
09/28/97 PORTLAND 04:17:37 59.5 446 1 146 0 50 2
10/19/97 CHICAGO 04:14:00 59.5 410 1 146 0 50 3
03/14/98 CATALINA 05:20:11 59.9 419 7 155 0 55 1
03/29/98 LOS ANGEL 04:50:00 60 440 1 154 0 70 2
04/26/98 Big Sur 04:50:00 60.1 459 5 153 0 50 5
2/5/1998 Shiprock 05:06:00 60.1 450 6 152 0 65 4
7/6/1998 PALOS VER 05:29:00 60.2 460 4 152 0 55 3
12/7/1998 SAN FRANSI 05:08:00 60.3 440 3 152 0 70 3
3/12/2000 Tucson 04:58:00 62.7 419.5 1 153 1 75 1
8/7/2001 SAN FRANSI 05:58:00 63.3 211 3 158 1 55 0
3/3/2002 LOS ANGEL 05:53:00 63.9 353 1 162 1 60 1
05/18/02 PALOS VER 06:02:34 64.1 332 4 162 1 55 1
07/28/02 SAN FRANSI 05:56:57 64.3 315.4 2 160 1 70 0
6/10/2002 Detroit 05:25:16 64.5 269 1 162 1 60 2
1/8/2004 SAN FRANSI 06:40:59 66.3 241 3 177 1 50 0
10/9/2004 LONG BEAC 06:32:05 66.4 244 1 174 1 70 0

Best Time eliminating data sets closest to 0 is = ###
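The elimination referred to here drops, one predictor at a time, the variable whose t statistic is closest to zero (equivalently, the largest p-value) and then refits. A minimal sketch of that selection rule follows, using the p-values reported in the three regression outputs for this data. The 0.05 cutoff and the helper name `next_to_drop` are assumptions for illustration; the original analysis states only that the value closest to 0 is removed.

```python
# P-values as reported in the three successive regression outputs.
stages = [
    {"AGE": 0.00000, "8 Week Mileage": 0.00113, "Course DIFFICULTY": 0.00018,
     "WEIGHT": 0.00004, "TEMP": 0.10077, "No. of 20 milers": 0.62787,
     "After Heart Attack": 0.00745},
    {"AGE": 1.19e-08, "8 Week Mileage": 0.000292,
     "Course DIFFICULTY": 0.000177, "WEIGHT": 1.45e-05, "TEMP": 0.090644,
     "After Heart Attack": 0.005991},
    {"AGE": 4.22e-08, "8 Week Mileage": 8.98e-05,
     "Course DIFFICULTY": 0.000248, "WEIGHT": 2.03e-05,
     "After Heart Attack": 0.005831},
]


def next_to_drop(p_values, alpha=0.05):
    """Return the least significant predictor, or None if all pass alpha.

    alpha=0.05 is an assumed cutoff, not one stated in the assignment.
    """
    name, p_value = max(p_values.items(), key=lambda item: item[1])
    return name if p_value > alpha else None


dropped = [next_to_drop(stage) for stage in stages]
print(dropped)
```

Under this rule the first stage drops the number of 20-milers, the second drops temperature, and the third stops because every remaining predictor is below the cutoff.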


SUMMARY OUTPUT

Regression Statistics
Multiple R          0.91553
R Square            0.83819
Adjusted R Square   0.82246
Standard Error      0.01268
Observations        80

ANOVA
             df   SS        MS        F          Significance F
Regression   7    0.05994   0.00856   53.27975   0.00000
Residual     72   0.01157   0.00016
Total        79   0.07151

                     Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept            -0.04094       0.04001          -1.02338   0.30955   -0.12069    0.03881
AGE                  0.00128        0.00020          6.30413    0.00000   0.00087     0.00168
8 Week Mileage       -0.00007       0.00002          -3.39182   0.00113   -0.00011    -0.00003
Course DIFFICULTY    0.00302        0.00076          3.95530    0.00018   0.00150     0.00454
WEIGHT               0.00113        0.00026          4.38247    0.00004   0.00062     0.00164
TEMP                 0.00025        0.00015          1.66245    0.10077   -0.00005    0.00055
No. of 20 milers     -0.00049       0.00101          -0.48682   0.62787   -0.00252    0.00153
After Heart Attack   0.01989        0.00722          2.75403    0.00745   0.00549     0.03429

Regression Equation

Time-hat = -0.04094 + 0.00128*Age + (-0.00007*mileage) + 0.00302*difficulty + 0.00113*weight + 0.00025*temp + (-0.00049*20milers) + 0.01989*heartattack

Time is stored as a fraction of a day, so Time-hat is converted to hours, minutes and seconds.
For the first set of data, the time can be calculated from the equation:
Time-hat = 4:20:47 (h:mm:ss)

From the equation we can predict the best time taken to finish the marathon by simply plugging the assumptions into the equation above.
Assumptions:
Age                  80
Mileage              480
Difficulty           1
Weight               165
Temperature          60
20 milers            0
After Heart Attack   1

Best Time = 6:04:52 (h:mm:ss)

From the equation, the best time considering the assumptions is 6 hours, 4 minutes and 52 seconds. This is a point predicted value and may not be completely accurate due to the presence of some standard error.

From the regression analysis, by considering the t-stat, we can eliminate the data of 'no. of 20 milers', since its t-stat is the closest value to 0 and it has the least effect on the time.

Regression Analysis eliminating 20 milers

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.915234636
R Square            0.837654439
Adjusted R Square   0.824310968
Standard Error      0.012611033
Observations        80

ANOVA
             df   SS             MS         F          Significance F
Regression   6    0.0599030143   0.009984   62.77635   7.7652509E-27
Residual     73   0.0116097856   0.000159
Total        79   0.0715128

                     Coefficients   Standard Error   t Stat     P-value    Lower 95%        Upper 95%
Intercept            -0.04410231    0.0392689988     -1.12308   0.265083   -0.1223653123    0.03416069
AGE                  0.00125033     0.0001946124     6.424719   1.19E-08   0.0008624681     0.00163819
8 Week Mileage       -6.98189E-05   1.83465009E-05   -3.80557   0.000292   -0.0001063834    -3.3254E-05
Course DIFFICULTY    0.002995852    0.0007581172     3.951701   0.000177   0.0014849267     0.00450678
WEIGHT               0.001158065    0.0002491279     4.648476   1.45E-05   0.0006615538     0.00165458
TEMP                 0.000256928    0.0001498386     1.714696   0.090644   -4.1700206E-05   0.00055556
After Heart Attack   0.02023772     0.0071485367     2.83103    0.005991   0.0059907069     0.03448473

Again, eliminating the variable whose t-stat is closest to 0 (temperature) and performing the regression analysis:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.911655489
R Square            0.831115731
Adjusted R Square   0.819704632
Standard Error      0.012775286
Observations        80

ANOVA
             df   SS            MS         F          Significance F
Regression   5    0.059435413   0.011887   72.83398   3.5814508E-27
Residual     74   0.012077387   0.000163
Total        79   0.0715128

                     Coefficients   Standard Error   t Stat     P-value    Lower 95%       Upper 95%
Intercept            -0.01939797    0.0370063364     -0.52418   0.60172    -0.0931347019   0.05433877
AGE                  0.001160272    0.0001898321     6.112099   4.22E-08   0.0007820238    0.00153852
8 Week Mileage       -7.56719E-05   1.82609423E-05   -4.14392   8.98E-05   -0.0001120576   -3.9286E-05
Course DIFFICULTY    0.002956623    0.0007676415     3.851567   0.000248   0.0014270636    0.00448618
WEIGHT               0.0011489      0.0002523145     4.553445   2.03E-05   0.0006461528    0.00165165
After Heart Attack   0.02055557     0.0072392079     2.839478   0.005831   0.0061311336    0.03498001
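Plugging values into the seven-predictor equation can be sketched as follows. The coefficients are the rounded values from the first regression output, and the result is treated as a fraction of a day (the spreadsheet's time format), which is an interpretation of the output rather than something the assignment states. The names `predict_day_fraction` and `to_hms` are illustrative. With these rounded coefficients the sketch gives about 4:20:05 for the first race and 6:03:12 for the assumed scenario, slightly different from the spreadsheet's 4:20:47 and 6:04:52, presumably because the spreadsheet uses unrounded coefficients.

```python
# Rounded coefficients from the first (seven-predictor) regression output.
INTERCEPT = -0.04094
COEFFICIENTS = {
    "age": 0.00128,
    "mileage": -0.00007,
    "difficulty": 0.00302,
    "weight": 0.00113,
    "temp": 0.00025,
    "twenty_milers": -0.00049,
    "heart_attack": 0.01989,
}


def predict_day_fraction(race):
    """Predicted finishing time as a fraction of a 24-hour day."""
    return INTERCEPT + sum(COEFFICIENTS[k] * race[k] for k in COEFFICIENTS)


def to_hms(day_fraction):
    """Format a fraction of a day as h:mm:ss, rounded to the second."""
    total_seconds = round(day_fraction * 24 * 3600)
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# First row of the data table (3/10/1971).
first_race = {"age": 33.5, "mileage": 209, "difficulty": 3, "weight": 144,
              "temp": 90, "twenty_milers": 2, "heart_attack": 0}

# The assumed scenario listed in the text.
scenario = {"age": 80, "mileage": 480, "difficulty": 1, "weight": 165,
            "temp": 60, "twenty_milers": 0, "heart_attack": 1}

first_time = to_hms(predict_day_fraction(first_race))
scenario_time = to_hms(predict_day_fraction(scenario))
print(first_time, scenario_time)
```

The regression's standard error of 0.01268 days is roughly 18 minutes, which gives a sense of how far an individual race can fall from these point predictions.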
3

a) We can fit the data with a simple linear regression model, since the R-square shows that about 90% (0.9078) of the variation in the Y-values can be explained by the X-values.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.953
R Square            0.908
Adjusted R Square   0.902
Standard Error      27.270
Observations        18

ANOVA
             df   SS          MS          F        Significance F
Regression   1    117153.50   117153.50   157.54   0.00
Residual     16   11898.66    743.67
Total        17   129052.16

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept   98.454         13.440           7.326    0.000     69.963      126.945
Pressure    0.776          0.062            12.551   0.000     0.645       0.908

Regression Equation
Temperature-hat = 98.454 + 0.776*Pressure

b) For Pressure = 200, the point predicted value for temperature is 253.734

Observed X (Pressure)   Observed Y (Temperature)   Predicted Y   Residual Values
20                      44.9                       168.156       -123.256
40.4                    102.4                      179.376       -76.976
60.8                    142.3                      190.596       -48.296
80.2                    164.8                      201.266       -36.466
100.4                   192.2                      212.376       -20.176
120.3                   221.4                      223.321       -1.921
141.1                   228.4                      234.761       -6.361
161.4                   249.5                      245.926       3.574
181.9                   269.4                      257.201       12.199
201.4                   270.8                      267.926       2.874
220.8                   291.5                      278.596       12.904
241.8                   287.3                      290.146       -2.846
261.1                   313.3                      300.761       12.539
280.4                   322.3                      311.376       10.924
300.1                   325.8                      322.211       3.589
320.6                   337                        333.486       3.514
341.1                   332.6                      344.761       -12.161
360.8                   342.9                      355.596       -12.696

Residual Points
[Plot: Residuals vs. Pressure, x-axis 0 to 400, y-axis -140 to 40]

Regression
[Plot: Temperature vs. Pressure, x-axis 0 to 400, y-axis 0 to 400, with linear trendline]
f(x) = 0.7764007769x + 98.4541051008
R² = 0.907799581
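The fitted equation and the prediction at Pressure = 200 can be checked by refitting the 18 tabulated points. In the sketch below `fit_line` is an illustrative helper name, not part of the spreadsheet. The tabulated Predicted Y values follow 157.156 + 0.55x (for example 157.156 + 0.55 × 20 = 168.156) rather than the fitted equation, so the residuals computed here differ from the Residual Values column.

```python
# Refit Temperature (y) against Pressure (x) for the 18 tabulated points.
pressures = [20, 40.4, 60.8, 80.2, 100.4, 120.3, 141.1, 161.4, 181.9,
             201.4, 220.8, 241.8, 261.1, 280.4, 300.1, 320.6, 341.1, 360.8]
temperatures = [44.9, 102.4, 142.3, 164.8, 192.2, 221.4, 228.4, 249.5, 269.4,
                270.8, 291.5, 287.3, 313.3, 322.3, 325.8, 337.0, 332.6, 342.9]


def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


slope, intercept, r_squared = fit_line(pressures, temperatures)
prediction_200 = intercept + slope * 200

# Residuals relative to the fitted line (observed minus predicted).
residuals = [y - (intercept + slope * x)
             for x, y in zip(pressures, temperatures)]

# The line that the tabulated Predicted Y column appears to follow.
table_line = [157.156 + 0.55 * x for x in pressures]

print(f"Temperature-hat = {intercept:.3f} + {slope:.3f} * Pressure")
print(f"R^2 = {r_squared:.4f}, prediction at 200 = {prediction_200:.3f}")
for x, r in zip(pressures, residuals):
    print(f"{x:>6} {r:9.3f}")
```

Printing the residuals against pressure makes it easy to check whether they scatter randomly around zero or show a systematic pattern.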
ATT_TOT   Quiz Total   HW Total   MT1     Total Score
3.3       4.50         5.95       18.50   28.95
3.3       4.40         5.90       18.50   28.80
3.3       4.15         5.85       18.50   28.50
3.3       4.40         5.85       18.40   28.65
3.3       4.20         5.90       18.40   28.50
3         3.50         5.90       18.40   27.80
3         4.10         5.80       18.30   28.20
2.7       4.25         5.65       17.80   27.70
3.3       4.35         5.80       17.50   27.65
3         4.00         5.85       17.50   27.35
3.3       3.90         5.85       17.50   27.25
3.3       4.50         5.95       17.25   27.70
3.3       4.05         5.90       17.25   27.20
2.1       4.20         5.35       17.05   26.60
3.3       4.25         5.20       17.00   26.45
3.3       4.35         5.80       16.50   26.65
2.4       3.95         5.65       16.50   26.10
3.3       3.70         5.30       16.25   25.25
3         4.15         5.70       16.00   25.85
3         3.80         5.70       16.00   25.50
3.3       3.45         5.45       15.75   24.65
3.3       3.30         5.15       15.55   24.00
3         2.70         5.75       15.50   23.95
2.7       1.25         4.80       14.90   20.95
3.3       4.25         5.40       14.50   24.15
3.3       3.85         5.35       14.25   23.45
2.1       3.05         3.65       14.25   20.95
3.3       4.15         5.45       14.00   23.60
2.7       3.90         5.90       14.00   23.80
3.3       3.65         5.55       13.80   23.00
2.1       1.30         1.40       13.75   16.45
3.3       3.85         5.60       13.30   22.75
2.7       3.95         5.75       13.25   22.95
1.8       4.30         5.80       12.00   22.10
3.3       4.15         4.70       11.50   20.35
3         3.50         4.00       11.50   19.00
3         3.90         6.00       10.00   19.90
3         3.15         4.10       8.75    16.00
2.7       3.65         5.80       7.50    16.95

I would choose the probability plot to analyze the data, since the plot allows me to compare the scores of each student in the class and plot them against a normal distribution. This plot gives me the percentiles and the corresponding scores, along with the average, highest and lowest scores.

Assumption: Considering that attendance is the independent variable and the total score depends on the classes attended, we can use regression together with the probability plot to analyze the data.

It is clear from the data that 50% of the class got above 25.25 total points, considering the scores of the quizzes, homeworks and the midterm. The upper half of the class is tightly grouped, since the difference between this middle score and the highest value is only about 3 to 4 points, whereas the students with low scores fall as far as about 9 points below it.

Even though the class average is high, half of the class did not get high scores and were scattered away from the mean. So the students who did not do well on the test should try to get closer to the class average.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.3867597105
R Square            0.1495830736
Adjusted R Square   0.1265988324
Standard Error      3.3515447919
Observations        39

ANOVA
             df   SS               MS              F              Significance F
Regression   1    73.1042013889    73.1042013889   6.5080709857   0.0150066272
Residual     37   415.6155422009   11.2328524919
Total        38   488.7197435897

            Coefficients    Standard Error   t Stat         P-value        Lower 95%      Upper 95%
Intercept   14.4261752137   3.9861266758     3.6190960266   0.0008789864   6.3495153865   22.5028350408
ATT_TOT     3.3587962963    1.3166111388     2.5510921163   0.0150066272   0.6910887302   6.0265038624

PROBABILITY OUTPUT

Percentile      Total Score
1.2820512821    16
3.8461538462    16.45
6.4102564103    16.95
8.9743589744    19
11.5384615385   19.9
14.1025641026   20.35
16.6666666667   20.95
19.2307692308   20.95
21.7948717949   22.1
24.358974359    22.75
26.9230769231   22.95
29.4871794872   23
32.0512820513   23.45
34.6153846154   23.6
37.1794871795   23.8
39.7435897436   23.95
42.3076923077   24
44.8717948718   24.15
47.4358974359   24.65
50              25.25
52.5641025641   25.5
55.1282051282   25.85
57.6923076923   26.1
60.2564102564   26.45
62.8205128205   26.6
65.3846153846   26.65
67.9487179487   27.2
70.5128205128   27.25
73.0769230769   27.35
75.641025641    27.65
78.2051282051   27.7
80.7692307692   27.7
83.3333333333   27.8
85.8974358974   28.2
88.4615384615   28.5
91.0256410256   28.5
93.5897435897   28.65
96.1538461538   28.8
98.7179487179   28.95

Normal Probability Plot
[Plot: Total Score (y-axis, 0 to 40) vs. Sample Percentile (x-axis, 0 to 120)]