
Final Task X

- The document examines the link between seasonal precipitation and the Southern Oscillation Index (SOI) in the Brisbane region of Australia. The SOI acts as an indicator of El Niño/Southern Oscillation (ENSO) conditions, which impact global weather patterns.
- Historical meteorological data from numerous Brisbane-area locations is analyzed to identify any patterns or trends between seasonal precipitation amounts and the mean seasonal SOI value.
- Visualization of the data shows a stronger linear relationship between seasonal SOI and precipitation totals for the winter season compared to other seasons. However, the linear model explains only a small portion of the variability in seasonal rainfall amounts.

Uploaded by

Dirty Rajan

Introduction:

Understanding weather patterns and anticipating future conditions requires an analysis of
climatic factors and their interdependence. This study investigates the link between seasonal
precipitation and the Southern Oscillation Index (SOI) in the Brisbane region. The SOI, a
pressure index, acts as an indicator of the El Niño-Southern Oscillation (ENSO) phenomenon,
which has a global impact on weather patterns. [1]

Background:
Precipitation, a key component of the Earth's climate system, is crucial to ecosystem dynamics,
agricultural output, and water resource management. The dynamic character of precipitation
patterns, as well as their link with other climate indices such as the SOI, has attracted the
interest of scientists and policymakers alike.
The SOI measures the air pressure difference between Tahiti and Darwin, allowing the current
ENSO condition to be identified. ENSO presents itself as two extreme phases, El Niño and La
Niña, each with its own influence on global weather patterns. El Niño is often associated with
lower rainfall in specific places, whereas La Niña is associated with above-average precipitation.
This study examines the link between seasonal precipitation and the mean seasonal SOI value
using historical meteorological data from numerous locations in the Brisbane area. The purpose
is to identify any observable patterns or trends and examine the SOI's effect on seasonal
precipitation in the area.

Question 1.1 (1 mark) Download the files total_seasonal_rainfall.csv and
seasonal_soi_data.csv from Blackboard. Combine all the variables from these two datasets into
the single dataset total_seasonal_rainfall. Print the first three rows to show the form of your new
dataset.
# Load the packages used below (rnoaa for GHCN-D access, tidyverse for read_csv, dplyr and ggplot2)
library(rnoaa)
library(tidyverse)

# Download the meta data if it doesn't exist
ghcnd_meta_data_csv <- "ghcnd_meta_data.csv"

if (!file.exists(ghcnd_meta_data_csv)) {
  ghcnd_meta_data <- ghcnd_stations()
  write_csv(ghcnd_meta_data, ghcnd_meta_data_csv)
} else {
  ghcnd_meta_data <- read_csv(ghcnd_meta_data_csv)
}

[1] V. W. Keener, K. T. Ingram, B. Jacobson, J. W. Jones. Transactions of the ASABE 50(6): 2081-2089, 2007. doi: 10.13031/2013.24110

# Define the station IDs
station_ids <- c("ASN00046037", "ASN00048014", "ASN00056207",
                 "ASN00058063", "ASN00017028", "ASN00044004",
                 "ASN00040096", "ASN00040214")

# Download the station data if it doesn't exist
station_data_csv <- "station_data.csv"

if (!file.exists(station_data_csv)) {
  station_data <- meteo_pull_monitors(monitors = station_ids,
                                      keep_flags = TRUE,
                                      var = "PRCP")
  write_csv(station_data, station_data_csv)
} else {
  station_data <- read_csv(station_data_csv)
}

# Load the additional datasets
total_seasonal_rainfall <- read_csv("total_seasonal_rainfall.csv")
seasonal_soi_data <- read_csv("seasonal_soi_data.csv")

# Join datasets based on "Year" and "Season" columns
combined_data <- total_seasonal_rainfall %>%
  inner_join(seasonal_soi_data, by = c("Year", "Season"))

# Convert relevant variables to factors
combined_data <- combined_data %>%
  mutate(across(where(is.character), as.factor))

# Print the first three rows of the combined dataset
head(combined_data, 3)

Question 1.2 (3 marks) Convert the relevant variables to factors in your dataset. Be sure to set
the factor levels appropriately for later analysis. Show your code and show the factor levels.
# Convert relevant variables to factors
combined_data <- combined_data %>%
  mutate(across(where(is.character), as.factor))

# Show factor levels for "Phase" variable
factor_levels <- levels(combined_data$Phase)
print(factor_levels)
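
Because as.factor() orders levels alphabetically, it can help to set the levels explicitly so that the baseline category is chosen on purpose. The following is a minimal sketch on a toy data frame, not the assignment data. The Phase labels follow the ones used in the Question 5 code ("ElNino", "Neutral", "LaNina"); the Season labels are an assumption, since the CSV contents are not shown here.

```r
# Toy stand-in for combined_data; the Season labels are assumed, not read from the CSV files
toy <- data.frame(
  Season = c("Winter", "Summer", "Spring", "Autumn"),
  Phase  = c("LaNina", "ElNino", "Neutral", "ElNino"),
  stringsAsFactors = FALSE
)

# Order the levels explicitly so that the first level becomes the regression baseline
toy$Phase  <- factor(toy$Phase,  levels = c("ElNino", "Neutral", "LaNina"))
toy$Season <- factor(toy$Season, levels = c("Summer", "Autumn", "Winter", "Spring"))

print(levels(toy$Phase))
print(levels(toy$Season))
```

With "ElNino" listed first, lm() would treat it as the reference level, which matches the indicator coding for Neutral and LaNina in Question 5.1.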

Question 2.1 (5 marks) Create a visualisation that explores the relationship between
SeasonalSOI and total_seasonal_prcp for each season. Use geom_smooth() to add the null model
and the linear model from equation (1).
# Fit the linear regression model
model <- lm(total_seas_prcp ~ SeasonalSOI, data = combined_data)

# Create a scatter plot with the linear model (blue) and the null model (red, dashed)
ggplot(combined_data, aes(x = SeasonalSOI, y = total_seas_prcp)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "blue", formula = y ~ x) +
  geom_smooth(method = "lm", se = FALSE, color = "red", linetype = "dashed", formula = y ~ 1) +
  facet_wrap(~Season, ncol = 2) +
  labs(x = "Seasonal SOI", y = "Total Seasonal Precipitation") +
  theme_minimal()
Question 2.2 (3 marks) Using your visualisation, for which seasons would you expect there to be
a significant linear relationship between total seasonal precipitation and the mean seasonal SOI
value? Give detailed reasoning.
The scatter plot shows "Seasonal SOI" on the x-axis against "Total Seasonal Precipitation" on the
y-axis, with one panel per season.
The blue line is the fitted linear regression model. It shows the general trend and direction of
the relationship, and its slope is the change in precipitation for every unit change in the
"Seasonal SOI" variable.
The red dashed line is the null model, fitted with the formula y ~ 1. Because this formula
contains only an intercept, the predicted value of "Total Seasonal Precipitation" is constant for
any value of "Seasonal SOI": it is the average precipitation level expected if "Seasonal SOI" has
no effect.
In the winter panel the points are grouped most tightly around the blue line, indicating a
stronger linear relationship between "Seasonal SOI" and "Total Seasonal Precipitation" during the
winter season than in the other seasons.
Because the red dashed line does not depend on "Seasonal SOI", the distance of the points from it
does not by itself reveal a seasonal SOI effect. A significant linear relationship would instead
show up as a blue line whose slope clearly departs from the flat red line.
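
The visual impression can be checked numerically by fitting one simple linear regression per season and comparing the slope p-values. The sketch below uses simulated data, not the assignment data: the season labels are assumed, and only the hypothetical "Winter" rows are generated with a real SOI effect.

```r
set.seed(42)
seasons <- c("Summer", "Autumn", "Winter", "Spring")  # assumed labels
toy <- data.frame(
  Season      = rep(seasons, each = 40),
  SeasonalSOI = runif(160, -20, 20),
  stringsAsFactors = FALSE
)

# Simulated rainfall: only the "Winter" rows are given a non-zero SOI slope
true_slope <- ifelse(toy$Season == "Winter", 5, 0)
toy$total_seas_prcp <- 300 + true_slope * toy$SeasonalSOI + rnorm(160, sd = 50)

# Fit one regression per season and collect the estimated slope and its p-value
season_fits <- do.call(rbind, lapply(split(toy, toy$Season), function(d) {
  s <- summary(lm(total_seas_prcp ~ SeasonalSOI, data = d))$coefficients
  data.frame(Season  = d$Season[1],
             slope   = s["SeasonalSOI", "Estimate"],
             p_value = s["SeasonalSOI", "Pr(>|t|)"])
}))
print(season_fits)
```

A season whose p-value falls below 0.05 is one where the blue line differs significantly from the flat null model.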
Question 3.1 (2 marks) Fill in the blanks in the following sentence so that it refers to the terms in
the regression model.
For the BRISBANE AERO Station and the winter season, a linear model was specified to model
how the total seasonal precipitation, denoted y_i, is related to the mean seasonal SOI value,
denoted x_i. The parameter β1 (the slope, or regression coefficient) describes the rate of change
in the total seasonal precipitation with an increase in mean seasonal SOI value. The parameter β0
(the intercept, or regression constant) represents the total seasonal precipitation when the mean
seasonal SOI value is 0.
Question 3.2 (2 marks) Write down your linear model substituting the parameter values into the
equation.
The linear model equation with parameter values substituted in is:
total_seas_prcp = [β0] + [β1] * SeasonalSOI
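
The placeholders in the equation can be filled from the fitted model with coef(). The sketch below shows the mechanics on simulated data, so the printed numbers are not the estimates for this study; toy and toy_model are hypothetical names.

```r
set.seed(1)
# Simulated stand-in for the seasonal data
toy <- data.frame(SeasonalSOI = runif(100, -20, 20))
toy$total_seas_prcp <- 300 + 4 * toy$SeasonalSOI + rnorm(100, sd = 80)

toy_model <- lm(total_seas_prcp ~ SeasonalSOI, data = toy)
b <- coef(toy_model)

# Build the fitted equation as text from the estimated intercept and slope
equation <- sprintf("total_seas_prcp = %.2f %s %.2f * SeasonalSOI",
                    b[["(Intercept)"]],
                    ifelse(b[["SeasonalSOI"]] < 0, "-", "+"),
                    abs(b[["SeasonalSOI"]]))
cat(equation, "\n")
```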

Question 3.3 (2 marks) Provide a 95% confidence interval for the parameter estimates.
A 95% confidence interval for the parameter estimates can be obtained using the confint function
in R.
# Confidence interval
conf_interval <- confint(model)
cat("Confidence Interval (95%) for the parameter estimates:\n")
print(conf_interval)

Question 3.4 (1 mark) How much variability in the data is explained by this model?
The linear regression model used to estimate seasonal rainfall totals based on the mean seasonal
SOI value explains only a small amount of the data variability. The mean seasonal SOI accounts
for roughly 5.679% of the variability in total seasonal precipitation, with an R-squared value of
0.05679. This implies that there are additional variables not accounted for in the model that
significantly contribute to the observed seasonal rainfall changes. As a result, while the model
gives some insight, it may not be a good predictor of seasonal rainfall totals, and other factors
should be taken into account when studying and forecasting such patterns.
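
The R-squared value quoted above is the proportion of the total sum of squares that the model removes. As a minimal sketch on simulated data (not the assignment data), the value reported by summary() can be reproduced by hand:

```r
set.seed(1)
# Simulated stand-in for the seasonal data
toy <- data.frame(SeasonalSOI = runif(100, -20, 20))
toy$total_seas_prcp <- 300 + 4 * toy$SeasonalSOI + rnorm(100, sd = 80)
toy_model <- lm(total_seas_prcp ~ SeasonalSOI, data = toy)

# R-squared as reported by the model summary
r2_summary <- summary(toy_model)$r.squared

# R-squared by hand: 1 - RSS / TSS
rss <- sum(residuals(toy_model)^2)
tss <- sum((toy$total_seas_prcp - mean(toy$total_seas_prcp))^2)
r2_manual <- 1 - rss / tss

print(c(summary = r2_summary, manual = r2_manual))
```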
Question 3.5 (4 marks) Visualise the fitted values compared with the residuals, and visualise the
standardised quantiles of the residuals compared with the theoretical quantiles. Discuss the
validity of the underlying assumptions of linear regression.
# Fitted values vs residuals
plot(model$fitted.values, model$residuals, xlab = "Fitted Values", ylab = "Residuals",
     main = "Fitted Values vs Residuals")

# QQ plot of standardised residuals (rstandard() rescales the raw residuals)
qqnorm(rstandard(model), main = "Normal Q-Q Plot of Standardised Residuals")
qqline(rstandard(model))

Question 3.6 (4 marks) Print out a summary of your linear model and interpret the results. As
part of this you must discuss the physical meaning of the model, which parameters are significant
and whether the linear model is significantly different compared to the null model.
According to the model summary, the linear regression shows a significant relationship between
the mean seasonal SOI value and total seasonal precipitation. The estimated intercept is the
expected total seasonal precipitation when the mean seasonal SOI value is 0, and the estimated
slope is the rate at which precipitation changes per unit change in the mean seasonal SOI value.
Nonetheless, the low R-squared value shows that the model accounts for only a small part of the
variation in the data, which suggests that other variables also influence total seasonal precipitation.
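
The comparison with the null model asked for in the question can be made explicit with anova(). The sketch below uses simulated data, not the assignment data. With a single predictor, the F-statistic of this comparison equals the F-statistic in summary() and the square of the slope's t-value.

```r
set.seed(1)
# Simulated stand-in for the seasonal data
toy <- data.frame(SeasonalSOI = runif(100, -20, 20))
toy$total_seas_prcp <- 300 + 4 * toy$SeasonalSOI + rnorm(100, sd = 80)

null_model <- lm(total_seas_prcp ~ 1, data = toy)            # intercept only
toy_model  <- lm(total_seas_prcp ~ SeasonalSOI, data = toy)  # linear in SOI

# F-test of the linear model against the null model
cmp <- anova(null_model, toy_model)
print(cmp)

f_from_anova   <- cmp$F[2]
f_from_summary <- summary(toy_model)$fstatistic[["value"]]
t_slope        <- summary(toy_model)$coefficients["SeasonalSOI", "t value"]
```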
Question 3.7 (2 marks) Discuss whether your fitted model is a good model to use to predict
seasonal rainfall totals using the mean seasonal SOI.
According to the fitted linear regression model, there is a significant relationship between the
mean seasonal SOI value and total seasonal precipitation. The intercept estimates the baseline
precipitation, and the SOI coefficient estimates the rate of change in precipitation as the mean
seasonal SOI value varies.
However, the model's R-squared value is quite low, so the model explains only a small amount of
the variability in the data. Variables other than the mean seasonal SOI therefore influence
seasonal rainfall totals, and the mean seasonal SOI alone may not capture the full complexity of
precipitation patterns.
Consequently, while the model can estimate seasonal rainfall totals from the mean seasonal SOI,
it should be used with caution. It may be worthwhile to investigate additional variables or other
models capable of capturing the effect of other factors on seasonal precipitation. The model's
performance should also be validated against observed data before it is relied on in practice.
Question 4.1 (2 marks) For BRISBANE AERO and the season you chose earlier, fit a linear
regression using polynomial explanatory variables of up to order 2 and the SeasonalSOI. Write
down the equation with your estimated parameter values.
# Fit a polynomial regression model
model_poly <- lm(total_seas_prcp ~ poly(SeasonalSOI, 2), data = combined_data)
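
One caveat when writing down the equation: poly() uses orthogonal polynomials by default, so its coefficients do not multiply SeasonalSOI and SeasonalSOI^2 directly. A minimal sketch on simulated data (not the assignment data) shows that a fit with raw = TRUE gives the same fitted values with coefficients that can be substituted into the quadratic equation:

```r
set.seed(1)
# Simulated stand-in for the seasonal data, with a mild quadratic effect
toy <- data.frame(SeasonalSOI = runif(100, -20, 20))
toy$total_seas_prcp <- 300 + 4 * toy$SeasonalSOI + 0.2 * toy$SeasonalSOI^2 +
  rnorm(100, sd = 80)

fit_orth <- lm(total_seas_prcp ~ poly(SeasonalSOI, 2), data = toy)
fit_raw  <- lm(total_seas_prcp ~ poly(SeasonalSOI, 2, raw = TRUE), data = toy)

# Both parameterisations describe the same fitted curve
same_fit <- isTRUE(all.equal(unname(fitted(fit_orth)), unname(fitted(fit_raw))))

# Only the raw coefficients plug straight into b0 + b1 * x + b2 * x^2
b <- unname(coef(fit_raw))
manual <- b[1] + b[2] * toy$SeasonalSOI + b[3] * toy$SeasonalSOI^2

print(same_fit)
print(round(b, 3))
```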

Question 4.2 (3 marks) Print out a summary of your fitted model, interpret the significance of
the results and explain the related physical meaning.
# Print the model summary
summary(model_poly)

According to the polynomial regression model, the SeasonalSOI variable and its quadratic term
have some explanatory value for predicting total seasonal precipitation. The low R-squared
value, on the other hand, indicates that the model only explains a tiny amount of the variability,
suggesting that additional factors not included in the model may impact total seasonal
precipitation. [2]

Question 4.3 (3 marks) Create a prediction interval for a mean seasonal SOI value of 25 and a
mean seasonal SOI value of -25. Comment on using this model for prediction in relation to your
physical understanding of rainfall and your understanding of extrapolation.

[2] A. Colin Cameron, Frank A. G. Windmeijer. An R-squared measure of goodness of fit for some common nonlinear regression models.
# Create new data points
new_data_poly <- data.frame(SeasonalSOI = c(25, -25))

# Make predictions with 95% prediction intervals
predictions_poly <- predict(model_poly, newdata = new_data_poly, interval = "prediction")

# Print the predictions with prediction intervals
print(predictions_poly)
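
Whether SOI values of 25 and -25 count as extrapolation depends on the range of SOI values the model was fitted on. The sketch below uses simulated data whose SOI range of -15 to 15 is an assumption, not the range in the assignment data. It flags new points outside the fitted range and checks whether a prediction interval reaches below zero, which is physically impossible for rainfall.

```r
set.seed(1)
# Simulated stand-in; the SOI range here is assumed, not taken from the data
toy <- data.frame(SeasonalSOI = runif(100, -15, 15))
toy$total_seas_prcp <- 300 + 4 * toy$SeasonalSOI + 0.2 * toy$SeasonalSOI^2 +
  rnorm(100, sd = 80)
fit <- lm(total_seas_prcp ~ poly(SeasonalSOI, 2), data = toy)

new_soi <- data.frame(SeasonalSOI = c(25, -25))
pred <- predict(fit, newdata = new_soi, interval = "prediction", level = 0.95)

# Flag points that lie outside the SOI range used to fit the model
soi_range <- range(toy$SeasonalSOI)
outside <- new_soi$SeasonalSOI < soi_range[1] | new_soi$SeasonalSOI > soi_range[2]

# A lower bound below zero signals a physically impossible rainfall total
negative_lower <- pred[, "lwr"] < 0

print(cbind(new_soi, pred, extrapolated = outside, negative_lower = negative_lower))
```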

Question 4.4 (3 marks) Decide whether a linear or polynomial regression is preferred using a
statistical test. Be sure to describe your statistical test in detail.
We used an F-test to compare the fit of two nested models, a linear regression model and a
polynomial regression model, and so decide which is preferable. The F-test determines whether
the polynomial model significantly improves the fit over the simpler linear model. [3]
The test compares the residual sum of squares (RSS) of the two models, which measures the
discrepancy between observed and fitted values. The F-statistic is the reduction in RSS per
additional parameter, divided by the residual mean square of the polynomial model (its RSS
divided by its residual degrees of freedom). The p-value is the probability of observing an
F-statistic at least this large if the linear model were sufficient.
In our analysis the polynomial regression model has a lower RSS (10389018) than the linear
regression model (10554075). The computed F-statistic was 4.5915, with a p-value of 0.03297.
Because the p-value is less than 0.05, we conclude that the quadratic term significantly
improves the fit.
We therefore prefer the polynomial regression model on the basis of this test, because the
added quadratic term explains additional variability in total seasonal precipitation.
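
The F-statistic described above can be computed by hand and compared with anova(). The sketch below uses simulated data, so its numbers differ from those reported above; fit_lin and fit_poly are hypothetical names for the two nested models.

```r
set.seed(1)
# Simulated stand-in for the seasonal data
toy <- data.frame(SeasonalSOI = runif(100, -20, 20))
toy$total_seas_prcp <- 300 + 4 * toy$SeasonalSOI + 0.2 * toy$SeasonalSOI^2 +
  rnorm(100, sd = 80)

fit_lin  <- lm(total_seas_prcp ~ SeasonalSOI, data = toy)
fit_poly <- lm(total_seas_prcp ~ poly(SeasonalSOI, 2), data = toy)

# Residual sums of squares and residual degrees of freedom
rss_lin  <- sum(residuals(fit_lin)^2)
rss_poly <- sum(residuals(fit_poly)^2)
df_lin   <- df.residual(fit_lin)
df_poly  <- df.residual(fit_poly)

# F = (drop in RSS per extra parameter) / (residual mean square of the larger model)
f_stat  <- ((rss_lin - rss_poly) / (df_lin - df_poly)) / (rss_poly / df_poly)
p_value <- pf(f_stat, df_lin - df_poly, df_poly, lower.tail = FALSE)

# The same test via anova()
tab <- anova(fit_lin, fit_poly)
print(tab)
print(c(F = f_stat, p = p_value))
```

Applied to the figures reported above (RSS of 10554075 and 10389018, F = 4.5915), the same formula is consistent with roughly 289 residual degrees of freedom for the polynomial model. That figure is inferred from the arithmetic here and is not stated in the report.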
Question 5.1 (3 marks) Consider again BRISBANE AERO and your chosen season. To better
understand the role of different Phases of ENSO, fit a categorical regression model of the form

$y_i = \beta_0 + \beta_1 I(\text{Phase} = \text{Neutral}) + \beta_2 I(\text{Phase} = \text{LaNina}) + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma^2). \qquad (2)$

Print the model summary.

# Fit the categorical regression model
model_categorical <- lm(total_seas_prcp ~ as.factor(Phase), data = combined_data)

# Print the model summary
summary(model_categorical)

[3] Cyril Goutte, Eric Gaussier. A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation.

Question 5.2 (1 mark) Write down the linear model substituting the parameter values into the
equation.
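# Linear model with the estimated parameter values substituted in
b <- coef(model_categorical)  # coefficient names follow the as.factor(Phase) term
fitted_prcp <- b["(Intercept)"] +
  b["as.factor(Phase)Neutral"] * as.numeric(combined_data$Phase == "Neutral") +
  b["as.factor(Phase)LaNina"] * as.numeric(combined_data$Phase == "LaNina")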
Question 5.3 (3 marks) Interpret the significance of the results of your fitted model and explain
the related physical meaning.
The fitted categorical regression model shows how the Phase levels (Neutral and LaNina) compare
with the baseline category (ElNino). The coefficient estimate for each Phase level is the
expected difference in total seasonal precipitation relative to the baseline category.
The intercept is the average total seasonal precipitation during the El Niño phase. The
coefficient for the LaNina phase (126.73) is statistically significant (p < 0.001), indicating
that total seasonal precipitation during LaNina is on average 126.73 units greater than during
ElNino. However, the Neutral phase coefficient (30.20) is not statistically significant
(p = 0.298), so there is no clear evidence of a difference in total seasonal precipitation
between the Neutral and ElNino phases.
Overall, these findings indicate that the ENSO phase has a considerable influence on total
seasonal precipitation: La Niña is associated with markedly higher rainfall, while the effect of
the Neutral phase is unclear.
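
The interpretation of the coefficients as differences from the baseline can be verified directly: in a regression on a single factor, the intercept is the baseline group mean and each other coefficient is the difference between that group's mean and the baseline. The sketch below uses simulated data with made-up group means, not the estimates from this study.

```r
set.seed(7)
phases <- c("ElNino", "Neutral", "LaNina")
# Simulated stand-in: 30 seasons per phase, with invented mean rainfall per phase
toy <- data.frame(Phase = factor(rep(phases, each = 30), levels = phases))
toy$total_seas_prcp <- c(250, 280, 380)[as.integer(toy$Phase)] + rnorm(90, sd = 60)

fit <- lm(total_seas_prcp ~ Phase, data = toy)
b <- coef(fit)

# Observed mean rainfall per phase
group_means <- tapply(toy$total_seas_prcp, toy$Phase, mean)

# Means implied by the regression coefficients
implied <- c(ElNino  = b[["(Intercept)"]],
             Neutral = b[["(Intercept)"]] + b[["PhaseNeutral"]],
             LaNina  = b[["(Intercept)"]] + b[["PhaseLaNina"]])

print(rbind(group_means = as.numeric(group_means), implied = implied))
```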
Question 5.4 (3 marks) Given the results of this categorical regression, is this in support of your
choice of linear or polynomial regression?
No, the categorical regression findings do not by themselves support the use of linear or
polynomial regression. According to the categorical regression model, the three ENSO phases
(El Niño, La Niña, and Neutral) differ in their effect on total seasonal precipitation, with
La Niña clearly distinct from the El Niño baseline. This suggests that a linear or polynomial
regression, which assumes a smooth continuous relationship between the predictor and the
response variable, may fail to reflect the phase-specific differences in the ENSO-precipitation
relationship. As a result, the categorical regression model appears more suitable for these
data, which emphasizes the need to account for the categorical character of the ENSO phases
when examining their influence on total seasonal precipitation.
The linear and polynomial regression models may not fully capture the distinct effect of each
ENSO phase on precipitation patterns, so the categorical regression model is better suited to
assessing the influence of the individual phases on total seasonal precipitation.

Conclusion:

In summary, our study examined the link between seasonal rainfall and the mean seasonal
Southern Oscillation Index (SOI). While the linear regression model revealed a modest
relationship between mean seasonal SOI and total seasonal precipitation, it explained only a
small portion of the observed variability.
In contrast, the polynomial regression model gave a significantly better fit. However, as the
prediction intervals show, its predictive power beyond the observed data range is limited.
Furthermore, the categorical regression analysis provided information on the impact of
distinct phases of the El Niño-Southern Oscillation (ENSO) on seasonal precipitation. The
La Niña phase was associated with significantly higher total seasonal precipitation than the
baseline El Niño phase, whereas the Neutral phase was not.
Overall, our findings emphasize the complex link between ENSO phases, mean seasonal SOI,
and total seasonal rainfall. While the models used in this investigation gave useful insights,
additional processes are likely at work that contribute to the observed variability in
precipitation patterns. More research is required to identify the other factors and mechanisms
that control seasonal rainfall, which will aid our understanding and prediction of these
climatic events.
Code:
library(rnoaa)
library(tidyverse)

# Set working directory if necessary

# Download the meta data if it doesn't exist
ghcnd_meta_data_csv <- "ghcnd_meta_data.csv"

if (!file.exists(ghcnd_meta_data_csv)) {
  ghcnd_meta_data <- ghcnd_stations()
  write_csv(ghcnd_meta_data, ghcnd_meta_data_csv)
} else {
  ghcnd_meta_data <- read_csv(ghcnd_meta_data_csv)
}

# Define the station IDs
station_ids <- c("ASN00046037", "ASN00048014", "ASN00056207",
                 "ASN00058063", "ASN00017028", "ASN00044004",
                 "ASN00040096", "ASN00040214")

# Download the station data if it doesn't exist
station_data_csv <- "station_data.csv"

if (!file.exists(station_data_csv)) {
  station_data <- meteo_pull_monitors(monitors = station_ids,
                                      keep_flags = TRUE,
                                      var = "PRCP")
  write_csv(station_data, station_data_csv)
} else {
  station_data <- read_csv(station_data_csv)
}

# Load the additional datasets
total_seasonal_rainfall <- read_csv("total_seasonal_rainfall.csv")
seasonal_soi_data <- read_csv("seasonal_soi_data.csv")

# Join datasets based on "Year" and "Season" columns
combined_data <- total_seasonal_rainfall %>%
  inner_join(seasonal_soi_data, by = c("Year", "Season"))

# Print the first three rows of the combined dataset
head(combined_data, 3)

# Convert relevant variables to factors
combined_data <- combined_data %>%
  mutate(across(where(is.character), as.factor))

# Show factor levels for "Phase" variable
factor_levels <- levels(combined_data$Phase)
print(factor_levels)

# Fit the linear regression model
model <- lm(total_seas_prcp ~ SeasonalSOI, data = combined_data)

# Create a scatter plot with the linear model (blue) and the null model (red, dashed)
ggplot(combined_data, aes(x = SeasonalSOI, y = total_seas_prcp)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "blue", formula = y ~ x) +
  geom_smooth(method = "lm", se = FALSE, color = "red", linetype = "dashed", formula = y ~ 1) +
  facet_wrap(~Season, ncol = 2) +
  labs(x = "Seasonal SOI", y = "Total Seasonal Precipitation") +
  theme_minimal()

# Print the model summary
summary(model)

# Access the estimated coefficients
coefficients <- coef(model)
beta0 <- coefficients[1]
beta1 <- coefficients[2]

# Make predictions for new data
new_data <- data.frame(SeasonalSOI = c(0.5, 1.2, -0.8)) # Example values of xi
predicted_values <- predict(model, newdata = new_data)

# Print the predicted values
print(predicted_values)

# Confidence interval
conf_interval <- confint(model)
cat("Confidence Interval (95%) for the parameter estimates:\n")
print(conf_interval)

# Fitted values vs residuals
plot(model$fitted.values, model$residuals, xlab = "Fitted Values", ylab = "Residuals",
     main = "Fitted Values vs Residuals")

# QQ plot of standardised residuals (rstandard() rescales the raw residuals)
qqnorm(rstandard(model), main = "Normal Q-Q Plot of Standardised Residuals")
qqline(rstandard(model))

# Model Summary
cat("Summary of the Linear Model:\n")
summary(model)

# Fit a polynomial regression model
model_poly <- lm(total_seas_prcp ~ poly(SeasonalSOI, 2), data = combined_data)

# Print the model summary
summary(model_poly)

# Access the estimated coefficients
coefficients_poly <- coef(model_poly)
beta0_poly <- coefficients_poly[1]
beta1_poly <- coefficients_poly[2]
beta2_poly <- coefficients_poly[3]

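# Equation for the polynomial regression model
# Note: poly() fits orthogonal polynomials by default, so the coefficients above multiply
# the orthogonal basis columns, not SeasonalSOI and SeasonalSOI^2 directly. The form
# total_seas_prcp = beta0_poly + beta1_poly * SeasonalSOI + beta2_poly * SeasonalSOI^2
# applies to the coefficients of a fit with poly(SeasonalSOI, 2, raw = TRUE).

# Create new data points
new_data_poly <- data.frame(SeasonalSOI = c(25, -25))

# Make predictions with 95% prediction intervals
predictions_poly <- predict(model_poly, newdata = new_data_poly, interval = "prediction")

# Print the predictions with prediction intervals
print(predictions_poly)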

# Perform ANOVA test between linear and polynomial models
anova_result <- anova(model, model_poly)

# Print the ANOVA table
print(anova_result)

# Fit the categorical regression model
model_categorical <- lm(total_seas_prcp ~ as.factor(Phase), data = combined_data)

# Print the model summary
summary(model_categorical)

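# Linear model with the estimated parameter values substituted in
b <- coef(model_categorical)  # coefficient names follow the as.factor(Phase) term
fitted_prcp <- b["(Intercept)"] +
  b["as.factor(Phase)Neutral"] * as.numeric(combined_data$Phase == "Neutral") +
  b["as.factor(Phase)LaNina"] * as.numeric(combined_data$Phase == "LaNina")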
