Rainfall Analysis in Madrid: Statistical Insights Using Transformed Metrics
INTRODUCTION
The analysis of historical rainfall data provides critical insights into long-term climatic
patterns and their impact on environmental and socio-economic activities. Understanding trends
in rainfall over time is essential for water resource management, agricultural planning, and
climate change adaptation. The dataset presented includes rainfall depths (in millimeters) from
1860 to 1989, offering a comprehensive overview of precipitation trends over more than a
century. Variations in rainfall across the years reflect both natural fluctuations and potential
influences of broader climatic shifts, such as global warming and regional climate changes.
Analyzing this data can help to identify patterns, anomalies, and periods of extreme
rainfall, which are essential for flood risk management, drought mitigation, and designing
sustainable agricultural practices. Additionally, it may provide context for correlating hydrological
events with other environmental factors, offering opportunities for predictive modeling and long-
term water management strategies.
OBJECTIVES
1. Identify Long-term Rainfall Trends: Analyze the dataset to determine whether there
are discernible long-term increases or decreases in rainfall depth and explore potential
factors contributing to these trends.
2. Examine Rainfall Variability: Investigate the variability of rainfall across the years,
identifying any anomalies or extreme rainfall events that may indicate periods of drought
or excessive precipitation.
3. Assess Climate Change Indicators: Use the data to assess any potential indicators of
climate change, including significant deviations from historical norms and trends that
may align with known climate events.
4. Inform Water Resource Management: Provide insights into how historical rainfall data
can be used to improve water resource management, particularly in areas vulnerable to
flooding or drought.
5. Support Agricultural Planning: Help in forecasting water availability for agricultural
purposes by identifying patterns in rainfall that could inform decisions on planting
seasons and irrigation needs.
6. Predictive Modeling: Develop models that predict future rainfall patterns based on
historical data, contributing to more accurate weather forecasting and climate resilience
strategies.
METHODOLOGY
The rainfall data for Madrid, comprising annual observations, was analyzed to
understand its statistical and probabilistic characteristics. The dataset underwent initial preprocessing. First, gaps in the record were identified; although no imputation proved necessary, this check confirmed the completeness of the series. Outliers were then screened using the interquartile range (IQR) method, where IQR = Q3 − Q1, and observations falling below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR were removed to limit skewness while preserving data integrity.
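A minimal R sketch of this screening step is shown below (the full script appears in Appendix A); it assumes the raw annual totals are loaded in a data frame MADRID with a column named observation:
# IQR screen for outliers (illustrative sketch)
Q1 <- quantile(MADRID$observation, 0.25)
Q3 <- quantile(MADRID$observation, 0.75)
iqr <- Q3 - Q1
MADRID2 <- subset(MADRID,
                  observation >= Q1 - 1.5 * iqr &
                  observation <= Q3 + 1.5 * iqr)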
To improve the interpretability and distributional symmetry of the rainfall data, three transformations were applied. The square root transformation reduced the influence of extremely high values, stabilizing variance while preserving the data’s core structure. The cube root transformation was particularly useful for balancing the distribution of both high and low rainfall observations, making it well suited to exceedance probability analysis. The logarithmic transformation spread smaller values over a wider range, differentiating low rainfall totals with enhanced clarity.
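Expressed in R, the three transformations simply add columns to the cleaned data frame (a sketch continuing from MADRID2 above; the same steps appear in Appendix A):
# Add the transformed metrics alongside the raw depths
MADRID2$SQRT_Observation <- sqrt(MADRID2$observation)
MADRID2$CUBE_ROOT_Observation <- MADRID2$observation^(1/3)
MADRID2$LOG_Observation <- log(MADRID2$observation)  # natural logarithm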
Exceedance probabilities were determined to assess the likelihood of rainfall exceeding specific thresholds. The Weibull plotting-position formula was used, P = m / (n + 1), where m is the descending rank of an observation (the largest value has m = 1) and n is the total number of observations. For each transformation, exceedance probabilities were calculated, enabling comparison across rainfall metrics.
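As a brief sketch, the Weibull plotting position can be obtained by ranking the series in descending order, so that the wettest year receives rank m = 1:
# Weibull exceedance probability P = m / (n + 1)
ranked <- MADRID2[order(-MADRID2$observation), ]
n <- nrow(ranked)
ranked$Weibull_P <- seq_len(n) / (n + 1)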
Return periods, defined as the average time interval between occurrences of events exceeding a specific threshold, were derived as T = 1 / P. This provided actionable insights into the recurrence of extreme rainfall events.
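Continuing the sketch above, the return period follows directly from the exceedance probability:
# Return period T = 1 / P, in years, since the observations are annual totals
ranked$Return_Period <- 1 / ranked$Weibull_P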
To convey the data’s distribution and exceedance characteristics, several visualizations were prepared. Histogram bins were adjusted for optimal resolution and clarity, and relative frequencies were computed as percentages. Kernel density estimation was applied to smooth the distribution for visual representation, scaled to percentages for comparability with the histograms. Exceedance probabilities were plotted against transformed rainfall values, showcasing trends for both extreme and typical events.
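A sketch of the histogram-and-density figure is shown below, using ggplot2 on the cleaned data frame MADRID2; the 50 mm bin width is illustrative, and the axis scalings follow the conventions used in Appendix A:
library(ggplot2)
# Relative-frequency histogram with a kernel density overlay, both scaled to percent
ggplot(MADRID2, aes(x = observation)) +
  geom_histogram(aes(y = after_stat(count) / sum(after_stat(count)) * 100),
                 binwidth = 50, fill = "lightblue", color = "black", alpha = 0.5) +
  geom_density(aes(y = after_stat(density) * 100), color = "blue") +
  labs(x = "Rainfall Depth (mm)", y = "Relative Frequency (%)")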
Descriptive metrics (mean, median, standard deviation) were calculated for raw and
transformed data to establish baselines and evaluate the effects of transformations. The
comparison highlighted the advantages of each method in specific analytical contexts.
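These baselines can be tabulated in a few lines (a sketch assuming the transformed columns created earlier):
# Mean, median and standard deviation for the raw and transformed series
vars <- c("observation", "SQRT_Observation", "CUBE_ROOT_Observation", "LOG_Observation")
stats <- sapply(MADRID2[vars], function(x) c(mean = mean(x), median = median(x), sd = sd(x)))
round(stats, 2)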
RESULTS AND DISCUSSION
The raw dataset revealed an average annual rainfall of 426.64 mm, with a standard
deviation of 96.00 mm, indicating moderate variability. Rainfall values ranged between 258 mm
and 697 mm, capturing both typical and extreme events. The median rainfall was consistent
with the mean, emphasizing the dataset's central tendency. However, the presence of outliers
highlighted the need for transformation to stabilize variance and improve interpretability.
Transformations played a crucial role in addressing the dataset's skewness and enhancing the visualization of the rainfall distribution. The square root transformation reduced the impact of extreme rainfall values, producing a more symmetric distribution; it was particularly effective at stabilizing variance while retaining the dataset’s core structure. The cube root transformation balanced both high and low rainfall values, resulting in an evenly spread dataset and the clearest, most interpretable visualizations, especially for the exceedance probability and return period analyses. The logarithmic transformation spread smaller rainfall values over a wider range, making low-end variation more distinguishable; however, it compressed higher values, which slightly limited its effectiveness for analyzing extreme rainfall.
Exceedance probabilities provided valuable insights into the likelihood of surpassing specific rainfall thresholds. High rainfall events exceeding 600 mm were rare, with exceedance probabilities below 5%, reflecting their extreme nature. Typical rainfall values around the mean (426 mm) had an exceedance probability of approximately 50%, confirming the dataset's central tendency. At the low end, the 300 mm threshold was exceeded with a probability close to 90%, meaning that annual totals below 300 mm were uncommon. These findings align with the climatological patterns expected for Madrid, where moderate rainfall dominates and extremes occur infrequently.
The return periods calculated from the exceedance probabilities offered practical implications for rainfall event recurrence: a rainfall event of 600 mm or more had a return period of approximately 20 years, emphasizing its rarity, while typical rainfall totals (e.g., 426 mm) were expected to recur roughly every 2 years, consistent with the dataset's moderate variability. Totals of 300 mm or more recurred with return periods of little more than one year, underscoring how routinely that threshold is exceeded in the region.
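These figures can be checked empirically from the cleaned series; the sketch below estimates the exceedance probability and return period for a chosen threshold (600 mm here, purely as an illustration):
# Empirical exceedance probability and return period for one threshold
threshold <- 600  # mm
P_exceed <- mean(MADRID2$observation >= threshold)  # fraction of years at or above the threshold
T_return <- 1 / P_exceed  # return period in years (undefined if no year reaches the threshold)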
The relative frequency and density plots offered complementary perspectives on the
dataset's distribution: Relative frequency histograms showed a steep decline in frequency as
rainfall values increased, underscoring the rarity of extreme events. The peaks of the
histograms consistently aligned with typical rainfall values between 400–500 mm. Density plots
revealed a smooth distribution curve across transformations, with the cube root transformation
achieving the best balance between high and low rainfall values. These plots highlighted how
transformations effectively spread the data, aiding in visual interpretation.
Among the transformations, the cube root emerged as the most effective for balancing
the dataset. It not only smoothed the distribution but also facilitated clear exceedance and return
period visualizations. The square root and logarithmic transformations, while useful, had more
specialized applications: the square root for stabilizing variance and the logarithmic for
analyzing low-end rainfall variations.
The findings emphasize the moderate variability of Madrid's rainfall and the prevalence
of typical rainfall events around 400–500 mm. Extreme events, though rare, are critical for
hydrological and urban planning. By analyzing rainfall through various transformations, the
study provided a multi-faceted understanding of data behavior, making it applicable to both
climatological research and practical applications like disaster risk reduction and resource
allocation.
CONCLUSION
The analysis of annual rainfall data for Madrid revealed critical insights into its statistical
characteristics and patterns. The dataset, with an average annual rainfall of 426.64 mm and a
standard deviation of 96.00 mm, exhibited moderate variability, indicating a relatively stable
climate with occasional extreme events. By applying transformations, particularly the cube root
and logarithmic scales, the study successfully normalized the data distribution, making it easier
to interpret rare and extreme events while preserving the integrity of the data. The square root
transformation was effective in reducing the dominance of high-end values, while the logarithmic
transformation provided a clearer view of lower rainfall magnitudes. The cube root
transformation stood out as the most balanced, effectively spreading data points across the
range and facilitating better visual and statistical analysis.
Exceedance probability calculations highlighted that extreme rainfall events above 600
mm are rare, with probabilities below 5%, while typical rainfall around the median (~426 mm)
occurs with a 50% likelihood. Return periods derived from exceedance probabilities offered
actionable insights, showing that high rainfall events exceeding 600 mm are expected
approximately once every 20 years, whereas moderate rainfall events occur more frequently.
These findings are critical for hydrological planning, flood risk assessments, and water resource
management.
The integration of relative frequency and density visualizations further illuminated the
dataset's behavior, particularly the clustering of typical rainfall values around 400–500 mm.
These graphs also demonstrated the transformations' impact on spreading and smoothing the
data distribution. This multi-faceted approach underscores the importance of applying
transformations and statistical techniques to better understand and predict climate behavior.
Overall, the study offers a robust framework for analyzing rainfall patterns, with applications in
urban planning, agriculture, and disaster risk reduction. The conclusions drawn not only
enhance our understanding of Madrid’s rainfall dynamics but also provide a methodology that
can be adapted for similar analyses in other regions.
RECOMMENDATIONS
1. Utilize cube root transformations in future studies to balance rainfall data for analysis and visualization effectively.
2. Integrate additional climatic factors, such as temperature and humidity, to enrich the contextual understanding of rainfall variability.
3. Develop predictive models leveraging return period data to inform water resource management and urban planning.
4. Conduct further studies on seasonal and monthly rainfall distributions for a finer temporal analysis.
APPENDIX A: R CODE
library(tidyverse)
library(ggpubr)
library(rstatix)
library(car)
library(broom)
MADRID <- read.csv("C:/Users/Cherry mae/Downloads/Madrid-125-obs.csv")
####
#REMOVE OUTLIERS
# Load the necessary library
library(dplyr)
# Assuming your dataset is already loaded as MADRID
# Calculate Q1, Q3, and IQR
Q1 <- quantile(MADRID$observation, 0.25)
Q3 <- quantile(MADRID$observation, 0.75)
IQR <- Q3 - Q1
# Define lower and upper bounds
lower_bound <- Q1 - 1.5 * IQR
upper_bound <- Q3 + 1.5 * IQR
# Filter the dataset
MADRID2 <- MADRID %>%
filter(observation >= lower_bound & observation <= upper_bound)
# View the cleaned dataset
print(MADRID2)
#########
#MEAN AND SD
# Calculate mean and standard deviation
MEAN <- mean(MADRID2$observation)
SD <- sd(MADRID2$observation)
# Print the results
cat("Mean of observations:", MEAN, "\n")
cat("Standard deviation of observations:", SD, "\n")
#######
#RANKING
# Rank the data in descending order
RANKED <- MADRID2 %>%
arrange(desc(observation))
# View the ranked dataset
print(RANKED)
##################
#PROBABILITY OF EXCEEDANCE
#WEIBULL AND GRINGORTEN
# Load necessary libraries
library(dplyr)
# Assuming cleaned_data has been ranked
ranked_data <- MADRID2 %>%
arrange(desc(observation))
# Calculate number of observations
n <- nrow(ranked_data)
# Calculate Weibull and Gringorten plotting-position probabilities of exceedance
# (rank 1 is the largest observation because the data are sorted in descending order)
ranked_data <- ranked_data %>%
mutate(
rank = row_number(),
Weibull_P = rank / (n + 1), # Weibull: P = m / (n + 1)
Gringorten_P = (rank - 0.44) / (n + 0.12) # Gringorten: P = (m - 0.44) / (n + 0.12)
)
# View the updated ranked dataset with probabilities
print(ranked_data)
######
#PLOT
#WEIBULL AND GRINGORTEN
# Load the ggplot2 library
library(ggplot2)
# Plot the probabilities of exceedance
ggplot(ranked_data, aes(x = observation)) +
geom_line(aes(y = Weibull_P, color = "Weibull"), size = 1) +
geom_line(aes(y = Gringorten_P, color = "Gringorten"), size = 1) +
scale_y_continuous(labels = scales::percent) + # Convert y-axis to percentage
labs(
title = "MADRID - Total Rainfall",
x = "Rainfall Depth (mm)",
y = "Probability of Exceedance (%)",
color = "Method"
)+
theme_minimal() +
theme(legend.position = "right")
#################
#PROBABILITY OF EXCEEDANCE
# Load necessary libraries
library(dplyr)
# Assuming MADRID2 is already loaded as a data frame
# Calculate exceedance probability
MADRID2 <- MADRID2 %>%
arrange(observation) %>% # Sort the observations in ascending order
mutate(exceedance_prob = (n() - row_number() + 1) / n() * 100) # Calculate exceedance probability
# View the updated dataset with exceedance probabilities
print(MADRID2)
# Load necessary libraries
library(ggplot2)
library(dplyr)
# Assuming MADRID2 is already loaded as a data frame
# Calculate exceedance probability
MADRID2 <- MADRID2 %>%
arrange(observation) %>% # Sort the observations
mutate(rank = row_number(),
exceedance_prob = (n() - rank + 1) / n() * 100) # Calculate exceedance probability
# Create the plot
ggplot(MADRID2, aes(x = observation, y = exceedance_prob)) +
geom_line() +
geom_point() +
scale_y_continuous(name = "Probability of Exceedance (%)") +
scale_x_continuous(name = "Rainfall Depth (mm)") +
ggtitle("MADRID - Total Rainfall") +
theme_minimal()
#####################
#RETURN PERIOD
# Load necessary libraries
library(dplyr)
# Assuming MADRID2 is already loaded as a data frame
# Calculate return period
MADRID2 <- MADRID2 %>%
arrange(observation) %>% # Sort the observations in ascending order
mutate(rank = row_number(),
return_period = (n() + 1) / (n() - rank + 1)) # Return period T = 1 / P; the largest depth has the longest return period
# View the updated dataset with return period
print(MADRID2)
####################
#RELATIVE FREQ AND DENSITY VS. RAINFALL
# Load necessary libraries
library(ggplot2)
library(dplyr)
# Assuming MADRID2 is already loaded as a data frame
# Calculate relative frequency
relative_freq <- MADRID2 %>%
group_by(observation) %>%
summarise(count = n()) %>%
mutate(relative_frequency = (count / sum(count)) * 100) # Convert to percentage
# Calculate density
density_data <- density(MADRID2$observation, na.rm = TRUE)
# Create a data frame for density
density_df <- data.frame(
observation = density_data$x,
density = density_data$y * 100 # Convert to percentage
)
# Create the plot
ggplot() +
geom_bar(data = relative_freq, aes(x = observation, y = relative_frequency),
stat = "identity", fill = "blue", alpha = 0.5) + # Relative frequency
geom_line(data = density_df, aes(x = observation, y = density),
color = "red", size = 1) + # Density
scale_y_continuous(name = "Relative Frequency (%)", sec.axis = sec_axis(~., name = "Density (%)")) +
scale_x_continuous(name = "Rainfall Depth (mm)") +
ggtitle("Density and Relative Frequency vs Rainfall Depth") +
theme_minimal()
####################
#PROBABILITY
#RETURN PERIOD
#EVENTS
# Load necessary libraries
library(dplyr)
# Assuming MADRID2 is already loaded as a data frame
# Step 1: Calculate Exceedance Values and Return Periods
exceedance_probs <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
# Calculate return period and corresponding events
exceedance_results <- data.frame(Probability = exceedance_probs) %>%
mutate(Return_Period = 100 / Probability) %>% # Calculate return period
arrange(Probability) %>%
rowwise() %>%
mutate(Event = MADRID2$observation[
max(1, round((nrow(MADRID2) * (1 - Probability / 100)))) # Ensure index starts from 1
])
# View the results
print(exceedance_results)
################
#SQRT TRANSFORMATION
#EXCEEDANCE VS SQRT RAINFALL
# Load necessary libraries
library(dplyr)
library(ggplot2)
# Assuming MADRID2 is already loaded and includes the square root transformed data
# Step 1: Apply square root transformation if not done already
MADRID2 <- MADRID2 %>%
mutate(SQRT_Observation = sqrt(observation))
# Step 2: Sort the transformed data
sorted_data <- MADRID2 %>%
arrange(SQRT_Observation)
# Step 3: Calculate the probability of exceedance
n <- nrow(sorted_data) # Total number of observations
# Add probability of exceedance to the sorted data
exceedance_probabilities <- sorted_data %>%
mutate(Probability = (1 - (row_number() - 1) / n) * 100) # Calculate exceedance probability
# Step 4: Plot Exceedance vs. Square Root Transformed Rainfall
ggplot(exceedance_probabilities, aes(x = SQRT_Observation, y = Probability)) +
geom_line(color = "blue") + # Line for exceedance probability
geom_point(color = "red") + # Points for individual data
scale_x_continuous(limits = c(15, 30), breaks = seq(15, 30, by = 5)) + # Set x-axis range from 15 to 30
scale_y_reverse(limits = c(100, 0), breaks = seq(0, 100, by = 10)) + # Reverse the y-axis from 100 down to 0
labs(title = "MADRID - Total Rainfall",
x = "Square Root Transformed Rainfall (mm)",
y = "Probability of Exceedance (%)") +
theme_minimal() # A cleaner theme
#######
#FREQ VS SQRT RAINFALL
# Load necessary libraries
library(dplyr)
library(ggplot2)
# Assuming MADRID2 is already loaded and includes the square root transformed data
# Step 1: Apply square root transformation if not done already
MADRID2 <- MADRID2 %>%
mutate(SQRT_Observation = sqrt(observation))
# Step 2: Calculate Relative Frequency
# This step will not be necessary for histogram as it will be calculated directly in geom_histogram
# relative_frequency <- MADRID2 %>%
# group_by(SQRT_Observation) %>%
# summarise(Frequency = n()) %>%
# mutate(Relative_Frequency = Frequency / sum(Frequency) * 100)
# Step 3: Create Histogram with Adjusted Bin Width
ggplot(MADRID2, aes(x = SQRT_Observation)) +
geom_histogram(aes(y = ..density.. * 100), # Convert density to percentage
binwidth = 0.5, # Adjust the bin width here
fill = "lightblue",
alpha = 0.5,
color = "black") + # Outline color for the bars
geom_density(aes(y = ..density.. * 100), # Convert density to percentage
color = "blue", size = 1) + # Overlay density line
labs(title = "MADRID - Total Rainfall",
x = "Square Root Transformed Rainfall (mm)",
y = "Relative Frequency (%)") +
theme_minimal() # A cleaner theme
#######
#CUBE ROOT
#EXCEEDANCE VS CUBE ROOT RAINFALL
# Load necessary libraries
library(dplyr)
library(ggplot2)
# Assuming MADRID2 is already loaded
# Step 1: Apply cube root transformation
MADRID2 <- MADRID2 %>%
mutate(CUBE_ROOT_Observation = observation^(1/3)) # Cube root transformation
# Step 2: Sort the transformed data
sorted_data <- MADRID2 %>%
arrange(CUBE_ROOT_Observation)
# Step 3: Calculate the probability of exceedance
n <- nrow(sorted_data) # Total number of observations
# Add exceedance probability to the sorted data
exceedance_probabilities <- sorted_data %>%
mutate(Exceedance_Probability = (1 - (row_number() - 1) / n) * 100) # Calculate exceedance probability
# Step 4: Plot Exceedance vs. Cube Root Transformed Rainfall
ggplot(exceedance_probabilities, aes(x = CUBE_ROOT_Observation, y = Exceedance_Probability)) +
geom_line(color = "blue") + # Line for exceedance probability
geom_point(color = "red") + # Points for individual data
scale_y_reverse(limits = c(100, 0), breaks = seq(0, 100, by = 10)) + # Reverse y-axis from 100 to 0
labs(title = "MADRID - Total Rainfall",
x = "Cube Root Transformed Rainfall (mm)",
y = "Probability of Exceedance (%)") +
theme_minimal() # A cleaner theme
########
#FREQ AND DENSITY VS. CUBE ROOT
# Load necessary libraries
library(dplyr)
library(ggplot2)
# Assuming MADRID2 is already loaded
# Step 1: Apply cube root transformation
MADRID2 <- MADRID2 %>%
mutate(CUBE_ROOT_Observation = observation^(1/3)) # Cube root transformation
# Step 2: Plot Relative Frequency and Density with Increased Bin Width
ggplot(MADRID2, aes(x = CUBE_ROOT_Observation)) +
geom_histogram(aes(y = ..count.. / sum(..count..) * 100), # Relative frequency as a percentage
binwidth = 1.0, # Increased bin width (adjust this value as needed)
fill = "lightblue",
alpha = 0.5,
color = "black") + # Outline color for the bars
geom_density(aes(y = ..density.. * 100), # Convert density to percentage
color = "blue", size = 1) + # Overlay density line
labs(title = "MADRID - Total Rainfall",
x = "Cube Root Transformed Rainfall (mm)",
y = "Relative Frequency (%)") +
scale_y_continuous(sec.axis = sec_axis(~ ., name = "Density (%)")) + # Secondary y-axis for density
theme_minimal() # A cleaner theme
#############
#LOGARITHM
#EXCEEDANCE VS LOGARITHM
# Load necessary libraries
library(dplyr)
library(ggplot2)
# Assuming MADRID2 is already loaded
# Step 1: Apply logarithmic transformation
MADRID2 <- MADRID2 %>%
mutate(LOG_Observation = log(observation)) # Logarithmic transformation
# Step 2: Sort the transformed data
sorted_data <- MADRID2 %>%
arrange(LOG_Observation)
# Step 3: Calculate the probability of exceedance
n <- nrow(sorted_data) # Total number of observations
# Add exceedance probability to the sorted data
exceedance_probabilities <- sorted_data %>%
mutate(Exceedance_Probability = (1 - (row_number() - 1) / n) * 100) # Calculate exceedance probability
# Step 4: Plot Exceedance vs. Log Transformed Rainfall
ggplot(exceedance_probabilities, aes(x = LOG_Observation, y = Exceedance_Probability)) +
geom_line(color = "blue") + # Line for exceedance probability
geom_point(color = "red") + # Points for individual data
labs(title = "MADRID - Total Rainfall",
x = "Log Transformed Rainfall (mm)",
y = "Probability of Exceedance (%)") +
theme_minimal() # A cleaner theme
###########
#FREQ AND DENSITY VS LOGARITHM
# Load necessary libraries
library(dplyr)
library(ggplot2)
# Assuming MADRID2 is already loaded
# Step 1: Apply logarithmic transformation
MADRID2 <- MADRID2 %>%
mutate(LOG_Observation = log(observation)) # Logarithmic transformation
# Step 2: Plot Relative Frequency and Density
ggplot(MADRID2, aes(x = LOG_Observation)) +
geom_histogram(aes(y = ..count.. / sum(..count..) * 100), # Relative frequency as a percentage
binwidth = 0.1, # Adjust bin width as needed
fill = "lightblue",
alpha = 0.5,
color = "black") + # Outline color for the bars
geom_density(aes(y = ..density.. * 100), # Convert density to percentage
color = "blue", size = 1) + # Overlay density line
labs(title = "MADRID - Total Rainfall",
x = "Log Transformed Rainfall (mm)",
y = "Relative Frequency(%)") +
scale_y_continuous(sec.axis = sec_axis(~ ., name = "Density (%)")) + # Secondary y-axis for density
theme_minimal() # A cleaner theme
APPENDIX B
MADRID TOTAL RAINFALL (RAW)
MADRID RAINFALL (TRANSFORMED: SQRT)
MADRID RAINFALL (TRANSFORMED: CUBE ROOT)
MADRID RAINFALL (TRANSFORMED: LOG)