Ass Part2 Task4 Multiple - Regression

Uploaded by

张凝玉
CP2403 - Assignment – Part 2 – Task 4: Multiple Regression
First Name:景娇
Last Name:李

1: Data Selection
- Data selected: bottle.csv
- Response variable: T_degC
- Explanatory variable 1: Depthm
- Explanatory variable 2: Salnty
- Explanatory variable 3: O2ml_L
2: Scatter plots between each explanatory variable and response variable
Scatter plot 1 & r-value

Scatter plot 2 & r-value


Scatter plot 3 & r-value
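
A minimal sketch of how an r-value for one explanatory variable could be computed, assuming the columns have been loaded as numeric arrays. The helper name `pearson_r` and the sample values are hypothetical stand-ins, not data from bottle.csv; rows with a missing value are dropped before the calculation.

```python
import numpy as np

def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Drop pairs where either value is missing (NaN)
    keep = ~(np.isnan(x) | np.isnan(y))
    x, y = x[keep], y[keep]
    # Centre both variables, then divide the cross-product by the norms
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Synthetic stand-in values (NOT taken from bottle.csv)
depth = np.array([0.0, 10.0, 50.0, 100.0, 200.0, 500.0])
temp = np.array([18.0, 17.5, 15.0, 12.0, 9.0, 6.0])

r_depth = pearson_r(depth, temp)
print(f"r(Depthm, T_degC) = {r_depth:.3f}")
```

The same call can be repeated for Salnty and O2ml_L against T_degC to obtain the other two r-values.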

3: Summary of your pre-testing plan - List possible candidate combinations of individual regression models to compose a multiple regression model and your justification (e.g. why did you decide to apply such combination strategy?)
Candidate combinations:
- Model 1: T_degC ~ Depthm
- Model 2: T_degC ~ Salnty
- Model 3: T_degC ~ Depthm + Salnty
Justification: These combinations were chosen because the scatter plots show moderate to strong correlations, some negative and some positive, between the explanatory variables and T_degC.

4: Pre-testing Regression analysis results (for each candidate (multiple) regression model)
5: Pre-testing Regression equation/line (for each candidate (multiple) regression model)
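
A minimal sketch of how the three candidate models could be fitted and their regression equations printed. It uses ordinary least squares via NumPy's `lstsq`, which is an assumption about tooling rather than the method used in the assignment. The helper `fit_ols` is hypothetical, and the data are synthetic stand-ins for bottle.csv, so the printed coefficients are not the assignment's results.

```python
import numpy as np

def fit_ols(y, *predictors):
    """Fit y = b0 + b1*x1 + ... by least squares; return (coefficients, R^2)."""
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for the intercept, then one column per predictor
    X = np.column_stack([np.ones(len(y))] + [np.asarray(p, dtype=float) for p in predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot

# Synthetic stand-in data (NOT taken from bottle.csv)
rng = np.random.default_rng(0)
n = 200
depth = rng.uniform(0.0, 500.0, n)
salnty = rng.uniform(33.0, 35.0, n)
temp = 20.0 - 0.02 * depth - 1.5 * (salnty - 33.0) + rng.normal(0.0, 1.0, n)

candidates = {
    "Model 1": (("Depthm", depth),),
    "Model 2": (("Salnty", salnty),),
    "Model 3": (("Depthm", depth), ("Salnty", salnty)),
}

results = {}
for name, terms in candidates.items():
    coef, r2 = fit_ols(temp, *(values for _, values in terms))
    results[name] = (coef, r2)
    rhs = " ".join(f"{b:+.4f}*{label}" for b, (label, _) in zip(coef[1:], terms))
    print(f"{name}: T_degC = {coef[0]:.4f} {rhs}  (R^2 = {r2:.3f})")
```

Each printed line gives the intercept, one slope per explanatory variable, and the R-squared value for that candidate model.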

6: Q-Q plot for each candidate (multiple) regression model

7: Conclusion from Q-Q plots


Model 3 appears to have residuals that are closer to a normal distribution than Models 1
and 2, suggesting it might be a better fit.
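
A minimal sketch of how "closer to a normal distribution" could be checked numerically alongside the Q-Q plots. It computes the points of a normal Q-Q plot and the correlation between theoretical and sample quantiles; a value nearer 1 means the points lie closer to a straight line. The helper `qq_points`, the plotting positions (i - 0.5)/n, and the synthetic residuals are assumptions for illustration, not output from the candidate models.

```python
import numpy as np
from statistics import NormalDist

def qq_points(residuals):
    """Return (theoretical, sample) quantiles for a normal Q-Q plot."""
    r = np.sort(np.asarray(residuals, dtype=float))
    n = len(r)
    # Standardise so that the reference line is y = x
    sample = (r - r.mean()) / r.std(ddof=1)
    # Plotting positions (i - 0.5) / n, one common convention
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theoretical, sample

# Synthetic residuals (NOT from the fitted models): one normal set, one skewed set
rng = np.random.default_rng(1)
examples = {
    "normal residuals": rng.normal(0.0, 1.0, 500),
    "skewed residuals": rng.exponential(1.0, 500),
}

qq_r = {}
for label, res in examples.items():
    theo, samp = qq_points(res)
    # Correlation of the Q-Q points: closer to 1 means a straighter plot
    qq_r[label] = float(np.corrcoef(theo, samp)[0, 1])
    print(f"{label}: Q-Q correlation = {qq_r[label]:.4f}")
```

Applied to the residuals of Models 1 to 3, the model with the highest Q-Q correlation would be the one whose plot is closest to the reference line.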

8: Residual Plot for each candidate model


For each candidate model:
- Standardised Residual plot
- Percentage of observations with an absolute standardised residual greater than 2
- Percentage of observations with an absolute standardised residual greater than 2.5
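
A minimal sketch of how the two percentages could be computed from a model's residuals. The helpers `standardise` and `pct_beyond` are hypothetical, and the residuals are synthetic rather than taken from the candidate models. For reference, normally distributed residuals would give roughly 5% beyond 2 and roughly 1% beyond 2.5.

```python
import numpy as np

def standardise(residuals):
    """Convert raw residuals to z-scores (mean 0, standard deviation 1)."""
    r = np.asarray(residuals, dtype=float)
    return (r - r.mean()) / r.std(ddof=1)

def pct_beyond(std_resid, threshold):
    """Percentage of standardised residuals with absolute value above threshold."""
    z = np.asarray(std_resid, dtype=float)
    return 100.0 * float(np.mean(np.abs(z) > threshold))

# Synthetic residuals (NOT from the fitted models)
rng = np.random.default_rng(2)
raw_resid = rng.normal(0.0, 3.0, 10_000)

z = standardise(raw_resid)
pct_2 = pct_beyond(z, 2.0)
pct_25 = pct_beyond(z, 2.5)
print(f"|z| > 2:   {pct_2:.2f}%")
print(f"|z| > 2.5: {pct_25:.2f}%")
```

Percentages well above these reference levels would point to more extreme residuals than a normal error model expects.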

9: Conclusion from Standardised Residual plots


Model 3 seems to have the best fit among the three, with residuals more randomly distributed around
the zero line.

10: Conclusion Overall


- Can you select one best model among your candidate models?
Based on the R-squared values, Q-Q plots, and standardized residual plots,
Model 3 (T_degC ~ Depthm + Salnty) is selected as the best model.
- Justify your selection.
This model not only has a higher R-squared value but also shows residuals that
are closer to a normal distribution and are more randomly scattered around the zero line,
indicating a better fit and less violation of the assumptions of linear regression.
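
Because adding an explanatory variable can never lower R-squared in least-squares regression, a comparison between Model 3 and the single-variable models may be fairer with adjusted R-squared, which penalises extra predictors. The sketch below shows the standard formula; the function name and the input values are hypothetical and are not the assignment's results.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical R^2 values for illustration only
n_obs = 1000
one_predictor = adjusted_r2(0.60, n_obs, 1)
two_predictors = adjusted_r2(0.65, n_obs, 2)
print(f"1 predictor:  adjusted R^2 = {one_predictor:.4f}")
print(f"2 predictors: adjusted R^2 = {two_predictors:.4f}")
```

If the two-variable model still has the higher adjusted value, the extra predictor is contributing more than its penalty.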
