Final Task X
Background:
Precipitation, a key component of the Earth's climate system, is crucial to ecosystem dynamics,
agricultural output, and water resource management. The dynamic character of precipitation
patterns, as well as their link with climate indices such as the Southern Oscillation Index (SOI), has piqued the interest
of scientists and policymakers alike.
The SOI measures the air pressure difference between Tahiti and Darwin, allowing the current state of the
El Niño-Southern Oscillation (ENSO) to be identified. ENSO has two extreme phases, El Niño and La Niña, each
with its own influence on global weather patterns. El Niño is often associated with lower rainfall in
some regions, including eastern Australia, whereas La Niña is associated with above-average precipitation.
This study examines the link between seasonal precipitation and the mean seasonal SOI value
using historical meteorological data from numerous locations in the Brisbane area. The purpose
is to identify any observable patterns or trends and examine the SOI's effect on seasonal
precipitation in the area.
[1] V. W. Keener, K. T. Ingram, B. Jacobson, J. W. Jones. Transactions of the ASABE, 50(6): 2081-2089, 2007. doi: 10.13031/2013.24110
# Download the station metadata once, then reuse the cached CSV
if (!file.exists(ghcnd_meta_data_csv)) {
  ghcnd_meta_data <- ghcnd_stations()
  write_csv(ghcnd_meta_data, ghcnd_meta_data_csv)
} else {
  ghcnd_meta_data <- read_csv(ghcnd_meta_data_csv)
}
# Download daily precipitation (PRCP) for the chosen stations, cached the same way
if (!file.exists(station_data_csv)) {
  station_data <- meteo_pull_monitors(monitors = station_ids,
                                      keep_flags = TRUE,
                                      var = "PRCP")
  write_csv(station_data, station_data_csv)
} else {
  station_data <- read_csv(station_data_csv)
}
Question 1.2 (3 marks) Convert the relevant variables to factors in your dataset. Be sure to set
the factor levels appropriately for later analysis. Show your code and show the factor levels.
# Convert relevant variables to factors
combined_data <- combined_data %>%
  mutate(across(where(is.character), as.factor))
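Note that as.factor() orders levels alphabetically, so the first level (the baseline in later regressions) is whichever label sorts first. The sketch below shows one way to set the order explicitly with factor(). The data frame `toy`, the season labels and their order are assumptions made for illustration; the Phase labels follow those used later in this report.

```r
# Toy data standing in for combined_data (values are illustrative only)
toy <- data.frame(
  Season = c("Winter", "Summer", "Spring", "Autumn"),
  Phase  = c("Neutral", "LaNina", "ElNino", "Neutral"),
  stringsAsFactors = FALSE
)

# Assumed level names; adjust to the labels actually present in the data
season_levels <- c("Summer", "Autumn", "Winter", "Spring")
phase_levels  <- c("ElNino", "Neutral", "LaNina")

toy$Season <- factor(toy$Season, levels = season_levels)
toy$Phase  <- factor(toy$Phase,  levels = phase_levels)

# Show the factor levels; the first level is the baseline used by lm()
levels(toy$Season)
levels(toy$Phase)
```

With this ordering ElNino is the reference category, which matches the baseline used in the categorical regression of Question 5.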
Question 2.1 (5 marks) Create a visualisation that explores the relationship between
SeasonalSOI and total_seasonal_prcp for each season. Use geom_smooth() to add the null model
and the linear model from equation (1).
# Fit the linear regression model
model <- lm(total_seas_prcp ~ SeasonalSOI, data = combined_data)
Question 3.3 (2 marks) Provide a 95% confidence interval for the parameter estimates.
A 95% confidence interval for the parameter estimates can be obtained using the confint function
in R.
# Confidence interval
conf_interval <- confint(model)
cat("Confidence Interval (95%) for the parameter estimates:\n")
print(conf_interval)
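As a check on what confint() reports, the interval for each parameter can be rebuilt by hand as the estimate plus or minus a t quantile times the standard error. The sketch below does this on simulated data; `toy`, the seed and the simulated coefficients are assumptions for illustration and are not the Brisbane observations.

```r
set.seed(1)
# Simulated stand-in for the seasonal data (not the Brisbane observations)
toy <- data.frame(SeasonalSOI = rnorm(60, mean = 0, sd = 8))
toy$total_seas_prcp <- 300 + 5 * toy$SeasonalSOI + rnorm(60, sd = 120)

fit <- lm(total_seas_prcp ~ SeasonalSOI, data = toy)

# Estimate +/- t quantile * standard error, using the residual degrees of freedom
est    <- coef(summary(fit))[, "Estimate"]
se     <- coef(summary(fit))[, "Std. Error"]
t_crit <- qt(0.975, df = df.residual(fit))
manual_ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

manual_ci
confint(fit, level = 0.95)  # same limits as manual_ci
```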
Question 3.4 (1 mark) How much variability in the data is explained by this model?
The linear regression model used to estimate seasonal rainfall totals based on the mean seasonal
SOI value explains only a small amount of the data variability. The mean seasonal SOI accounts
for roughly 5.679% of the variability in total seasonal precipitation, with an R-squared value of
0.05679. This implies that there are additional variables not accounted for in the model that
significantly contribute to the observed seasonal rainfall changes. As a result, while the model
gives some insight, it may not be a good predictor of seasonal rainfall totals, and other factors
should be taken into account when studying and forecasting such patterns.
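The R-squared value quoted above is the share of the total sum of squares that the model removes, that is, 1 - RSS/TSS. A minimal sketch on simulated data follows; `toy` and the simulated coefficients are assumptions for illustration only.

```r
set.seed(2)
# Simulated stand-in for the seasonal data (not the Brisbane observations)
toy <- data.frame(SeasonalSOI = rnorm(80, sd = 8))
toy$total_seas_prcp <- 300 + 4 * toy$SeasonalSOI + rnorm(80, sd = 130)

fit <- lm(total_seas_prcp ~ SeasonalSOI, data = toy)

# Residual sum of squares and total sum of squares about the mean
rss <- sum(residuals(fit)^2)
tss <- sum((toy$total_seas_prcp - mean(toy$total_seas_prcp))^2)

r_squared <- 1 - rss / tss
r_squared
summary(fit)$r.squared  # the value reported by summary()
```

For a regression with a single predictor, this value also equals the squared correlation between the predictor and the response.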
Question 3.5 (4 marks) Visualise the fitted values compared with the residuals, and visualise the
standardised quantiles of the residuals compared with the theoretical quantiles. Discuss the
validity of the underlying assumptions of linear regression.
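The two diagnostic plots named in the question can be sketched as follows. The example uses simulated data (`toy` and its coefficients are assumptions for illustration) and rstandard() to obtain standardised residuals, since the question asks for standardised rather than raw quantiles.

```r
set.seed(3)
# Simulated stand-in for the seasonal data (not the Brisbane observations)
toy <- data.frame(SeasonalSOI = rnorm(80, sd = 8))
toy$total_seas_prcp <- 300 + 4 * toy$SeasonalSOI + rnorm(80, sd = 130)

fit <- lm(total_seas_prcp ~ SeasonalSOI, data = toy)

# Fitted values against residuals: look for curvature or a funnel shape
plot(fitted(fit), residuals(fit),
     xlab = "Fitted Values", ylab = "Residuals",
     main = "Fitted Values vs Residuals")
abline(h = 0, lty = 2)

# Standardised residuals against theoretical normal quantiles
std_res <- rstandard(fit)
qqnorm(std_res,
       xlab = "Theoretical quantiles", ylab = "Standardised residuals",
       main = "Normal Q-Q plot of standardised residuals")
qqline(std_res)
```

Roughly constant spread around zero in the first plot is consistent with linearity and constant variance, and points close to the line in the second plot are consistent with normally distributed errors. Seasonal rainfall totals are bounded below by zero and often right-skewed, so departures in the upper tail would not be surprising.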
# Fitted values vs residuals
plot(model$fitted.values, model$residuals,
     xlab = "Fitted Values", ylab = "Residuals",
     main = "Fitted Values vs Residuals")
Question 3.6 (4 marks) Print out a summary of your linear model and interpret the results. As
part of this you must discuss the physical meaning of the model, which parameters are significant
and whether the linear model is significantly different compared to the null model.
According to the model summary, there is a statistically significant linear relationship between the
mean seasonal SOI value and total seasonal precipitation. The intercept is the expected total
seasonal precipitation when the mean seasonal SOI is zero, and the slope is the expected change in
precipitation per unit increase in the mean seasonal SOI. Nonetheless, the low R-squared value
shows that the model explains only a small part of the variation in the data, indicating that other
variables also affect total seasonal precipitation.
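The comparison with the null model asked for in the question can be made explicit by fitting an intercept-only model and comparing the two with anova(). The sketch below uses simulated data; `toy`, `null_fit` and `linear_fit` are names introduced for illustration and do not come from the original analysis.

```r
set.seed(4)
# Simulated stand-in for the seasonal data (not the Brisbane observations)
toy <- data.frame(SeasonalSOI = rnorm(80, sd = 8))
toy$total_seas_prcp <- 300 + 4 * toy$SeasonalSOI + rnorm(80, sd = 130)

# Null model: the mean only; linear model: mean plus an SOI slope
null_fit   <- lm(total_seas_prcp ~ 1, data = toy)
linear_fit <- lm(total_seas_prcp ~ SeasonalSOI, data = toy)

# F-test of the linear model against the null model
cmp <- anova(null_fit, linear_fit)
cmp

# The same F statistic appears in the last line of summary(linear_fit)
summary(linear_fit)$fstatistic["value"]
```

A small p-value in the second row of the table indicates that adding the SOI slope improves on the null model.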
Question 3.7 (2 marks) Discuss whether your fitted model is a good model to use to predict
seasonal rainfall totals using the mean seasonal SOI.
The fitted linear regression model shows a statistically significant relationship between the mean
seasonal SOI value and total seasonal precipitation. The intercept estimates the baseline
precipitation (at a mean seasonal SOI of zero), and the slope estimates the change in precipitation
per unit change in the mean seasonal SOI.
However, the model's R-squared value is low, so the model explains only a small amount of the
variability in the data. Variables other than the mean seasonal SOI clearly affect seasonal rainfall
totals, and the mean seasonal SOI alone does not reflect the full complexity of precipitation
patterns.
The model can therefore estimate seasonal rainfall totals from the mean seasonal SOI, but its
predictions should be used with caution. It may be worth investigating additional explanatory
variables or other models that capture the effect of other factors on seasonal precipitation. The
model's performance should also be validated against observed data before it is used in practice.
Question 4.1 (2 marks) For BRISBANE AERO and the season you chose earlier, fit a linear
regression using polynomial explanatory variables of up to order 2 and the SeasonalSOI. Write
down the equation with your estimated parameter values.
# Fit a polynomial regression model
model_poly <- lm(total_seas_prcp ~ poly(SeasonalSOI, 2), data = combined_data)
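One caveat when writing down the equation: by default poly() builds orthogonal polynomials, so the coefficients it reports are not the coefficients of SeasonalSOI and SeasonalSOI^2 themselves. A sketch of one way to recover the raw coefficients, using poly(..., raw = TRUE) on simulated data, follows; `toy` and the simulated coefficients are assumptions for illustration.

```r
set.seed(5)
# Simulated stand-in for the seasonal data (not the Brisbane observations)
toy <- data.frame(SeasonalSOI = rnorm(80, sd = 8))
toy$total_seas_prcp <- 300 + 4 * toy$SeasonalSOI +
  0.2 * toy$SeasonalSOI^2 + rnorm(80, sd = 100)

# Orthogonal polynomial basis (the default) and raw powers of SeasonalSOI
fit_orth <- lm(total_seas_prcp ~ poly(SeasonalSOI, 2), data = toy)
fit_raw  <- lm(total_seas_prcp ~ poly(SeasonalSOI, 2, raw = TRUE), data = toy)

# Raw coefficients can be substituted directly into the quadratic equation
b <- unname(coef(fit_raw))
cat(sprintf("total_seas_prcp = %.2f + %.2f * SOI + %.3f * SOI^2\n",
            b[1], b[2], b[3]))
```

Both parameterisations give identical fitted values, so the model summary, R-squared and F-tests are unaffected by the choice; only the individual coefficient values differ.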
Question 4.2 (3 marks) Print out a summary of your fitted model, interpret the significance of
the results and explain the related physical meaning.
# Print the model summary
summary(model_poly)
According to the polynomial regression model, the SeasonalSOI variable and its quadratic term
have some explanatory value for total seasonal precipitation. The low R-squared value, however,
indicates that the model explains only a small amount of the variability, suggesting that factors
not included in the model also affect total seasonal precipitation.[2]
Question 4.3 (3 marks) Create a prediction interval for a mean seasonal SOI value of 25 and a
mean seasonal SOI value of -25. Comment on using this model for prediction in relation to your
physical understanding of rainfall and your understanding of extrapolation.
[2] A. Colin Cameron, Frank A. G. Windmeijer. An R-squared measure of goodness of fit for some common nonlinear regression models.
# Create new data points
new_data_poly <- data.frame(SeasonalSOI = c(25, -25))
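Prediction intervals for new SOI values can be obtained from predict() with interval = "prediction". The sketch below uses simulated data in which the SOI lies between -20 and 20, so that 25 and -25 fall outside the observed range; `toy`, that range and the simulated coefficients are assumptions for illustration.

```r
set.seed(6)
# Simulated stand-in for the seasonal data (not the Brisbane observations)
toy <- data.frame(SeasonalSOI = runif(80, min = -20, max = 20))
toy$total_seas_prcp <- 300 + 4 * toy$SeasonalSOI +
  0.2 * toy$SeasonalSOI^2 + rnorm(80, sd = 100)

fit_poly <- lm(total_seas_prcp ~ poly(SeasonalSOI, 2), data = toy)

# New SOI values at which to predict
new_points <- data.frame(SeasonalSOI = c(25, -25))

# 95% prediction intervals: columns fit, lwr and upr
pred <- predict(fit_poly, newdata = new_points,
                interval = "prediction", level = 0.95)
cbind(new_points, pred)

# Compare the new values with the range of SOI actually observed
range(toy$SeasonalSOI)
```

Values outside the printed range are extrapolations, where the quadratic term dominates and the interval widens. A lower limit below zero would also be physically impossible for a rainfall total, which is one more reason to treat such predictions with caution.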
Question 4.4 (3 marks) Decide whether a linear or polynomial regression is preferred using a
statistical test. Be sure to describe your statistical test in detail.
We used an F-test to decide whether a linear or a polynomial regression is preferable. Because the
linear model is nested within the polynomial model (it is the special case in which the quadratic
coefficient is zero), the F-test assesses whether the polynomial model significantly improves the
fit over the simpler linear model.[3]
The test compares the residual sum of squares (RSS) of the two models, which measures the
discrepancy between observed and fitted values. The F-statistic is the reduction in RSS per
additional parameter, divided by the residual mean square of the polynomial model (its RSS
divided by its residual degrees of freedom). The p-value is the probability of observing an
F-statistic at least this large if the linear model is sufficient.
The polynomial regression model has a lower RSS (10389018) than the linear regression model
(10554075). The F-statistic was 4.5915, with a p-value of 0.03297. Because the p-value is less
than 0.05, we reject the null hypothesis that the quadratic coefficient is zero and conclude that
the polynomial model fits significantly better than the linear model at the 5% level.
We therefore prefer the polynomial regression model: including the quadratic term explains
significantly more of the variability in total seasonal precipitation.
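The F-test for nested models can be reproduced by hand from the two residual sums of squares and compared with the output of anova(). The sketch below uses simulated data; `toy`, `fit_lin`, `fit_poly` and the simulated coefficients are assumptions for illustration and will not reproduce the figures reported in this section.

```r
set.seed(7)
# Simulated stand-in for the seasonal data (not the Brisbane observations)
toy <- data.frame(SeasonalSOI = rnorm(100, sd = 8))
toy$total_seas_prcp <- 300 + 4 * toy$SeasonalSOI +
  0.3 * toy$SeasonalSOI^2 + rnorm(100, sd = 100)

fit_lin  <- lm(total_seas_prcp ~ SeasonalSOI, data = toy)
fit_poly <- lm(total_seas_prcp ~ poly(SeasonalSOI, 2), data = toy)

# Residual sums of squares and residual degrees of freedom
rss_lin  <- sum(residuals(fit_lin)^2)
rss_poly <- sum(residuals(fit_poly)^2)
df_lin   <- df.residual(fit_lin)
df_poly  <- df.residual(fit_poly)

# F = (drop in RSS per extra parameter) / (residual mean square of larger model)
f_stat  <- ((rss_lin - rss_poly) / (df_lin - df_poly)) / (rss_poly / df_poly)
p_value <- pf(f_stat, df1 = df_lin - df_poly, df2 = df_poly, lower.tail = FALSE)

c(F = f_stat, p = p_value)
anova_toy <- anova(fit_lin, fit_poly)  # same F and p-value in the second row
anova_toy
```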
Question 5.1 (3 marks) Consider again BRISBANE AERO and your chosen season. To better
understand the role of different Phases of ENSO, fit a categorical regression model of the form
Question 5.2 (1 mark) Write down the linear model substituting the parameter values into the
equation.
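# Linear model with the estimated parameter values substituted in (baseline: ElNino).
# Assumes the fitted model object is model_categorical and Phase is a factor in combined_data.
beta <- coef(model_categorical)
total_seas_prcp_hat <- beta["(Intercept)"] +
  beta["PhaseNeutral"] * as.numeric(combined_data$Phase == "Neutral") +
  beta["PhaseLaNina"] * as.numeric(combined_data$Phase == "LaNina")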
Question 5.3 (3 marks) Interpret the significance of the results of your fitted model and explain
the related physical meaning.
The fitted categorical regression model compares the Phase levels Neutral and LaNina with the
baseline category (ElNino). Each coefficient estimates the difference in mean total seasonal
precipitation between that phase and the baseline.
The intercept is the estimated mean total seasonal precipitation during the El Niño phase. The
coefficient for the LaNina phase (126.73) is statistically significant (p < 0.001), indicating that
total seasonal precipitation during La Niña is on average 126.73 units higher than during El Niño.
The Neutral phase coefficient (30.20) is not statistically significant (p = 0.298), so there is no
clear evidence of a difference in total seasonal precipitation between the Neutral and El Niño
phases.
Overall, these findings indicate that the ENSO phase has a considerable influence on total seasonal
precipitation: La Niña seasons are markedly wetter than El Niño seasons, while Neutral seasons
cannot be distinguished from El Niño seasons.
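The reading of the coefficients as differences from the baseline can be checked directly: with a single categorical predictor, the intercept equals the baseline group mean and each remaining coefficient equals that group's mean minus the baseline mean. The sketch below uses simulated data; `toy`, `phase_mean` and the simulated group means are assumptions for illustration.

```r
set.seed(8)
phase_levels <- c("ElNino", "Neutral", "LaNina")

# Simulated stand-in for the seasonal data (not the Brisbane observations)
toy <- data.frame(
  Phase = factor(sample(phase_levels, 90, replace = TRUE), levels = phase_levels)
)
phase_mean <- c(ElNino = 250, Neutral = 280, LaNina = 380)  # assumed true means
toy$total_seas_prcp <- unname(phase_mean[as.character(toy$Phase)]) +
  rnorm(90, sd = 100)

# ElNino is the first level, so it is the baseline category
fit_cat <- lm(total_seas_prcp ~ Phase, data = toy)
coef(fit_cat)

# Group means, and their differences from the ElNino mean
group_means <- tapply(toy$total_seas_prcp, toy$Phase, mean)
group_means
group_means - group_means["ElNino"]
```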
Question 5.4 (3 marks) Given the results of this categorical regression, is this in support of your
choice of linear or polynomial regression?
No, the categorical regression findings do not support the use of linear or polynomial regression.
According to the categorical regression model, the three ENSO phases (El Niño, La Niña, and
Neutral) do not affect total seasonal precipitation in equal steps: La Niña differs significantly
from El Niño, whereas Neutral does not. This suggests that a linear or polynomial regression,
which assumes a smooth, continuous relationship between the SOI and the response variable, may
fail to reflect the ENSO-precipitation relationship.
The linear and polynomial regression models may therefore not fully capture the distinct effect of
each ENSO phase on precipitation patterns. The categorical regression model, which treats the
ENSO phases as categories, is better matched to the data and better suited to assessing the
influence of the individual phases on total seasonal precipitation.
Conclusion:
In conclusion, our study examined the link between seasonal rainfall and the mean seasonal
Southern Oscillation Index (SOI). The linear regression model revealed a statistically significant
but weak relationship between mean seasonal SOI and total seasonal precipitation, explaining only
a small portion of the observed variability.
The polynomial regression model fitted significantly better and provided a more accurate
description of the relationship. However, as the prediction intervals show, its predictive power
beyond the observed data range is limited.
The categorical regression analysis showed how the distinct phases of the El Niño-Southern
Oscillation (ENSO) affect seasonal precipitation. Total seasonal precipitation was significantly
higher in the La Niña phase than in the baseline El Niño phase, whereas the Neutral phase did not
differ significantly from the baseline.
Overall, our findings emphasize the complex link between ENSO phases, mean seasonal SOI,
and total seasonal rainfall. While the models used in this investigation gave useful insights,
additional processes are likely at work that contribute to the observed variability in precipitation
patterns. More research and inquiry are required to find other factors and mechanisms that
control seasonal rainfall, which will aid in our knowledge and prediction of these crucial climatic
events.
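Code:
library(rnoaa)
library(tidyverse)
# Station metadata: download once, then reuse the cached CSV
if (!file.exists(ghcnd_meta_data_csv)) {
  ghcnd_meta_data <- ghcnd_stations()
  write_csv(ghcnd_meta_data, ghcnd_meta_data_csv)
} else {
  ghcnd_meta_data <- read_csv(ghcnd_meta_data_csv)
}
# (the start of the station ID vector is missing from the source)
"ASN00040096", "ASN00040214")
# Daily precipitation for the selected stations, cached the same way
if (!file.exists(station_data_csv)) {
  station_data <- meteo_pull_monitors(monitors = station_ids,
                                      keep_flags = TRUE,
                                      var = "PRCP")
  write_csv(station_data, station_data_csv)
} else {
  station_data <- read_csv(station_data_csv)
}
# (the construction of combined_data is missing from the source)
head(combined_data, 3)
# Convert relevant variables to factors
combined_data <- combined_data %>%
  mutate(across(where(is.character), as.factor))
# (the definition of factor_levels is missing from the source)
print(factor_levels)
# (the opening ggplot() call is missing from the source)
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "blue", formula = y ~ x) +
  facet_wrap(~Season, ncol = 2) +
  theme_minimal()
# Fit the linear regression model
model <- lm(total_seas_prcp ~ SeasonalSOI, data = combined_data)
summary(model)
# (the definition of predicted_values is missing from the source)
print(predicted_values)
# Confidence interval
conf_interval <- confint(model)
print(conf_interval)
# Fitted values vs residuals
plot(model$fitted.values, model$residuals,
     xlab = "Fitted Values", ylab = "Residuals",
     main = "Fitted Values vs Residuals")
# (the call that draws the Q-Q plot is missing from the source)
qqline(model$residuals)
# Model Summary
summary(model)
# Fit a polynomial regression model
model_poly <- lm(total_seas_prcp ~ poly(SeasonalSOI, 2), data = combined_data)
summary(model_poly)
# Create new data points
new_data_poly <- data.frame(SeasonalSOI = c(25, -25))
# (the definitions of predictions_poly, anova_result and model_categorical are missing from the source)
print(predictions_poly)
print(anova_result)
summary(model_categorical)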