ISE 500 Fall 2018 Assignment 7: Regression Plot
Observed X (Temp.)  Observed Y (Ratio)  Predicted Y  Residuals
170                 0.84                0.769         0.071
172                 1.31                0.9574        0.3526
173                 1.42                1.0516        0.3684
174                 1.03                1.1458       -0.1158
174                 1.07                1.1458       -0.0758
175                 1.08                1.24         -0.16
176                 1.04                1.3342       -0.2942
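The Predicted Y and Residuals columns above can be checked with a few lines of Python. This is only a sketch: it assumes the spreadsheet rounded the fitted line to 0.0942x - 15.245 before computing the predictions (an inference from the tabulated values, not something the assignment states), and the names `rows`, `predict` and `check` are illustrative.

```python
# Rows copied from the table: (temperature, observed ratio, predicted ratio, residual)
rows = [
    (170, 0.84, 0.7690, 0.0710),
    (172, 1.31, 0.9574, 0.3526),
    (173, 1.42, 1.0516, 0.3684),
    (174, 1.03, 1.1458, -0.1158),
    (174, 1.07, 1.1458, -0.0758),
    (175, 1.08, 1.2400, -0.1600),
    (176, 1.04, 1.3342, -0.2942),
]

# ASSUMPTION: coefficients rounded to 4 and 3 decimals, inferred from the table
SLOPE, INTERCEPT = 0.0942, -15.245


def predict(temp):
    """Predicted ratio from the (rounded) fitted line."""
    return SLOPE * temp + INTERCEPT


def check(rows, tol=1e-6):
    """Return True if every tabulated prediction and residual is reproduced."""
    for temp, observed, predicted, residual in rows:
        if abs(predict(temp) - predicted) > tol:
            return False
        # A residual is the observed value minus the predicted value
        if abs((observed - predicted) - residual) > tol:
            return False
    return True


print(check(rows))
```

Using the unrounded coefficients from the chart trendline would give slightly different predictions (about 0.775 instead of 0.769 at 170), so the rounding assumption matters when reproducing the table.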
Since the residual plot shows a random distribution of the residuals for the individual temperature values, it is a good fit for
The regression shows that only about 45% (R² = 0.4514) of the variation in the ratio is explained by the temperature. This indicates that much of the ratio's variation is scattered randomly and does not follow a fixed pattern.
Hence, it is difficult to accurately predict the ratio for any particular temperature value.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.671861547
R Square 0.4513979383
Adjusted R Square 0.426461481
Standard Error 0.4972447603
Observations 24
ANOVA
df SS MS F
Regression 1 4.4757440972 4.4757440972 18.1019273124
Residual 22 5.4395517361 0.2472523516
Total 23 9.9152958333
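As a cross-check, the Regression Statistics can be recomputed from the ANOVA sums of squares using the standard identities for a least-squares fit. The sketch below only uses numbers copied from the table; the variable names are illustrative.

```python
import math

# Values copied from the ANOVA table above
ss_regression = 4.4757440972
ss_residual = 5.4395517361
df_regression = 1
df_residual = 22

ss_total = ss_regression + ss_residual   # total sum of squares
n = df_regression + df_residual + 1      # number of observations

r_square = ss_regression / ss_total      # share of variation explained
ms_residual = ss_residual / df_residual  # mean squared error
f_stat = (ss_regression / df_regression) / ms_residual
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
std_error = math.sqrt(ms_residual)       # standard error of the regression

print(n, r_square, adj_r_square, std_error, f_stat)
```

The results agree with the reported R Square, Adjusted R Square, Standard Error and F to the precision shown in the summary output.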
1
Ratio at temp. 192 for averages of groups of 5 = 2.9854
I would not be satisfied with these results as I would want a higher R-square value to get a more accurate prediction of any d
which offers me a clearer picture and better accuracy of the relations between X values and Y values.
An R-square value of 0.95 or above would help me predict values with greater accuracy.
The Regression Plot shows a random distribution of the Y variables (Ratios) as compared to the X variables (Temperatures) a
Regression Plot (Ratio vs. Temperature): f(x) = 0.0942361111x - 15.2449652778, R² = 0.4513979383
Residual Plot (Residuals vs. Temperature)
Significance F 0.0003239397
Regression Statistics
Multiple R 0.91553
R Square 0.83819
Adjusted R Square 0.82246
Standard Error 0.01268
Observations 80.00000
ANOVA
df SS MS F Significance F
Regression 7.00000 0.05994 0.00856 53.27975 0.00000
Residual 72.00000 0.01157 0.00016
Total 79.00000 0.07151
Regression Equation
Time-hat = -0.04094 + 0.00128*Age - 0.00007*Mileage + 0.00302*Difficulty + 0.00113*Weight + 0.00025*Temp - 0.00049*20milers + 0.01989*HeartAttack
For the first set of data, the time can be calculated from the equation:
Time-hat = 4:20:47 (hours:minutes:seconds)
From the equation we can predict the best time taken to finish the marathon by simply plugging in the assumptions.
Assumptions:
Age 80
Mileage 480
Difficulty 1
Weight 165
Temperature 60
20 milers 0
After Heart Attack 1
From the equation the best time considering the assumptions is 6 hours, 4 minutes and 52 seconds. This is a point
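Plugging the assumptions into the regression equation can be sketched in Python as follows. Two things here are assumptions rather than facts from the assignment: that Time-hat is expressed as a fraction of a 24-hour day (inferred from the spreadsheet showing it in clock format), and the dictionary keys and function names, which are made up for illustration.

```python
# Rounded coefficients copied from the regression equation
COEFFS = {
    "intercept": -0.04094,
    "age": 0.00128,
    "mileage": -0.00007,
    "difficulty": 0.00302,
    "weight": 0.00113,
    "temp": 0.00025,
    "twenty_milers": -0.00049,
    "heart_attack": 0.01989,
}


def predict_time(runner):
    """Point prediction of the finishing time (ASSUMED unit: fraction of a day)."""
    total = COEFFS["intercept"]
    for name, value in runner.items():
        total += COEFFS[name] * value
    return total


def to_hms(day_fraction):
    """Convert a fraction of a day to (hours, minutes, seconds)."""
    seconds = round(day_fraction * 86400)
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return hours, minutes, secs


# Values from the Assumptions list
runner = {"age": 80, "mileage": 480, "difficulty": 1, "weight": 165,
          "temp": 60, "twenty_milers": 0, "heart_attack": 1}

time_hat = predict_time(runner)
hms = to_hms(time_hat)
print(time_hat, hms)
```

With the rounded coefficients this gives roughly 6 hours 3 minutes, a little under the 6:04:52 reported above. The gap is presumably because the spreadsheet uses unrounded coefficients; the mileage coefficient in particular is shown to only one significant digit.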
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.915234636
R Square 0.837654439
Adjusted R Square 0.824310968
Standard Error 0.012611033
Observations 80
ANOVA
df SS MS F Significance F
Regression 6 0.0599030143 0.009984 62.77635 7.7652509E-27
Residual 73 0.0116097856 0.000159
Total 79 0.0715128
Again, eliminating the data set closest to 0 (temperature) and performing the regression analysis:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.911655489
R Square 0.831115731
Adjusted R Square 0.819704632
Standard Error 0.012775286
Observations 80
ANOVA
df SS MS F Significance F
Regression 5 0.059435413 0.011887 72.83398 3.5814508E-27
Residual 74 0.012077387 0.000163
Total 79 0.0715128
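One common way to compare the three fits (7, 6 and 5 predictors) is adjusted R-square, which penalizes extra predictors. The sketch below recomputes it from the reported R Square values; the function name and model labels are illustrative, and the inputs are copied from the summary outputs.

```python
def adjusted_r_square(r_square, n, k):
    """Adjusted R-square for n observations and k predictors."""
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)


N = 80  # observations in every model

# (label, reported R Square, number of predictors)
models = [
    ("7 predictors", 0.83819, 7),
    ("6 predictors", 0.837654439, 6),
    ("5 predictors", 0.831115731, 5),
]

results = {label: adjusted_r_square(r2, N, k) for label, r2, k in models}

for label, value in results.items():
    print(f"{label}: adjusted R-square = {value:.6f}")
```

The values match the reported Adjusted R Square figures: it rises slightly when the first variable is dropped (0.8225 to 0.8243) and falls when the second is dropped (to 0.8197).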
Upper 95%
0.03416069
0.00163819
-3.3254E-05
0.00450678
0.00165458
0.00055556
0.03448473
Upper 95% Lower 95.0% Upper 95.0%
0.05433877 -0.09313 0.054339
0.00153852 0.000782 0.001539
-3.9286E-05 -0.00011 -3.93E-05
0.00448618 0.001427 0.004486
0.00165165 0.000646 0.001652
0.03498001 0.006131 0.03498
f some standard error.
3
a) We can fit the data with a simple linear regression model since the R-square shows that about 90% (R² = 0.9078) of the variation in the Y-values can be related to the X-values.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.953
R Square 0.908
Adjusted R Square 0.902
Standard Error 27.270
Observations 18.000
ANOVA
df SS MS F Significance F
Regression 1 117153.50 117153.50 157.54 0.00
Residual 16 11898.66 743.67
Total 17 129052.16
Coefficients Standard Error t Stat P-value Lower 95%
Intercept 98.454 13.440 7.326 0.000 69.963
Pressure 0.776 0.062 12.551 0.000 0.645
Regression Equation
Temperature-hat= 98.454+0.776*Pressure
b) For Pressure = 200, the point predicted value for temperature is 253.734
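The point prediction in b) can be reproduced from the regression equation. This is a minimal sketch using the coefficients as printed (rounded to three decimals); the function name is illustrative.

```python
# Coefficients as printed in the summary output (rounded to 3 decimals)
INTERCEPT = 98.454
SLOPE = 0.776


def predict_temperature(pressure):
    """Point prediction from Temperature-hat = intercept + slope * pressure."""
    return INTERCEPT + SLOPE * pressure


prediction = predict_temperature(200)
print(round(prediction, 3))
```

The rounded coefficients give about 253.65, slightly below the 253.734 stated in b). A slope near 0.7764 would reproduce the stated value, so the difference is most likely rounding in the printed coefficient; that explanation is an assumption, since the unrounded slope is not shown.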
Observed X (Pressure)  Observed Y (Temperature)  Predicted Y  Residual Values
20                     44.9                      168.156      -123.256
40.4                   102.4                     179.376       -76.976
60.8                   142.3                     190.596       -48.296
80.2                   164.8                     201.266       -36.466
100.4                  192.2                     212.376       -20.176
120.3                  221.4                     223.321        -1.921
141.1                  228.4                     234.761        -6.361
161.4                  249.5                     245.926         3.574
181.9                  269.4                     257.201        12.199
Upper 95%: Intercept 126.945; Pressure 0.908
Residual Points (plot)
Regression (plot)
ATT_TOT  Quiz Total  HW Total  MT1    Total Score
3.3      4.50        5.95      18.50  28.95
3.3      4.40        5.90      18.50  28.80
3.3      4.15        5.85      18.50  28.50
3.3      4.40        5.85      18.40  28.65
3.3      4.20        5.90      18.40  28.50
3        3.50        5.90      18.40  27.80
3        4.10        5.80      18.30  28.20
2.7      4.25        5.65      17.80  27.70
3.3      4.35        5.80      17.50  27.65
3        4.00        5.85      17.50  27.35
3.3      3.90        5.85      17.50  27.25
3.3      4.50        5.95      17.25  27.70
3.3      4.05        5.90      17.25  27.20
2.1      4.20        5.35      17.05  26.60
3.3      4.25        5.20      17.00  26.45
3.3      4.35        5.80      16.50  26.65
2.4      3.95        5.65      16.50  26.10
3.3      3.70        5.30      16.25  25.25
3        4.15        5.70      16.00  25.85
3        3.80        5.70      16.00  25.50
3.3      3.45        5.45      15.75  24.65
3.3      3.30        5.15      15.55  24.00
3        2.70        5.75      15.50  23.95
2.7      1.25        4.80      14.90  20.95
3.3      4.25        5.40      14.50  24.15
3.3      3.85        5.35      14.25  23.45
2.1      3.05        3.65      14.25  20.95
3.3      4.15        5.45      14.00  23.60
2.7      3.90        5.90      14.00  23.80
3.3      3.65        5.55      13.80  23.00
2.1      1.30        1.40      13.75  16.45
3.3      3.85        5.60      13.30  22.75
2.7      3.95        5.75      13.25  22.95
1.8      4.30        5.80      12.00  22.10
3.3      4.15        4.70      11.50  20.35
3        3.50        4.00      11.50  19.00
3        3.90        6.00      10.00  19.90
3        3.15        4.10      8.75   16.00
2.7      3.65        5.80      7.50   16.95
I would choose the probability plot to analyze the data since the plot will allow me to compare the scores of each student in the class and plot it as a normal distribution. This plot will give me the percentiles and the corresponding score along with the average, highest and lowest scores.
Assumption: Considering that the attendance will be an independent variable and the total score is dependent on the classes attended, we can use regression with the probability plots to analyze the data.
It is clear from the data that 50% of the class got above 25.25 total points considering the scores of the quizzes, homeworks and the midterms. This shows that 50% of the students have got high scores since the difference between the mean and the highest value is only 3 points, whereas the students who have got low scores are far away from the mean, up to 9 points.
Even though the class average is high, half of the class did not get high scores and were scattered from the mean. So the students who did not do well on the test should try to get closer to the class average.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.3867597105
R Square 0.1495830736
Adjusted R Square 0.1265988324
Standard Error 3.3515447919
Observations 39
ANOVA
df SS MS F Significance F
Regression 1 73.1042013889 73.1042013889 6.5080709857 0.0150066272
Residual 37 415.6155422009 11.2328524919
Total 38 488.7197435897
Coefficients
Intercept 14.4261752137
ATT_TOT 3.3587962963
PROBABILITY OUTPUT
Percentile Total Score
1.2820512821 16
3.8461538462 16.45
6.4102564103 16.95
8.9743589744 19
11.5384615385 19.9
14.1025641026 20.35
16.6666666667 20.95
19.2307692308 20.95
21.7948717949 22.1
24.358974359 22.75
26.9230769231 22.95
29.4871794872 23
32.0512820513 23.45
34.6153846154 23.6
37.1794871795 23.8
39.7435897436 23.95
42.3076923077 24
44.8717948718 24.15
47.4358974359 24.65
50 25.25
52.5641025641 25.5
55.1282051282 25.85
57.6923076923 26.1
60.2564102564 26.45
62.8205128205 26.6
65.3846153846 26.65
67.9487179487 27.2
70.5128205128 27.25
73.0769230769 27.35
75.641025641 27.65
78.2051282051 27.7
80.7692307692 27.7
83.3333333333 27.8
85.8974358974 28.2
88.4615384615 28.5
91.0256410256 28.5
93.5897435897 28.65
96.1538461538 28.8
98.7179487179 28.95
Normal Probability Plot (Total Score vs. Sample Percentile)
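The Percentile column in the PROBABILITY OUTPUT appears to follow the midpoint formula 100 * (i - 0.5) / n for the i-th smallest score out of n = 39. The formula is inferred from the listed values rather than stated in the assignment, and the function name is illustrative; a short sketch:

```python
def sample_percentiles(n):
    """Midpoint percentiles 100 * (i - 0.5) / n for ranks i = 1..n."""
    return [100 * (i - 0.5) / n for i in range(1, n + 1)]


percentiles = sample_percentiles(39)

# The 20th of 39 sorted scores sits at the 50th percentile,
# which is why 25.25 is read off as the middle (median) score.
print(percentiles[0], percentiles[19], percentiles[-1])
```

Pairing these percentiles with the total scores sorted in ascending order gives the points shown in the normal probability plot.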