SASA2023 Proceedings
SASA2023 Proceedings
29 November –
1 December 2023
Durban
Proceedings of the 64th Annual Conference of the
South African Statistical Association for 2023
(SASA 2023)
ISBN 978-0-7961-3746-3
Editor
Charl Pretorius North-West University
Assistant Editors
Sugnet Lubbe Stellenbosch University
Andréhette Verster University of the Free State
Managing Editor
Charl Pretorius North-West University
Review Process
Eight (8) manuscripts were submitted for possible inclusion in the Proceedings of the 64th Annual
Conference of the South African Statistical Association. All submitted papers were assessed by the
editorial team for suitability, after which all papers were sent to be reviewed by two independent
reviewers each. Papers were reviewed according to the following criteria: relevance to conference
themes, relevance to audience, standard of writing, originality and critical analysis. After consid-
eration and incorporation of reviewer comments, four manuscripts were judged to be suitable for
inclusion in the proceedings of the conference.
Reviewers
The editorial team would like to thank the following reviewers:
Renette Blignaut University of the Western Cape
Jan Blomerus University of the Free State
Warren Bretteny Nelson Mandela University
Humphrey Brydon University of the Western Cape
Allan Clark University of Cape Town
Legesse Debusho University of South Africa
Tertius de Wet University of Stellenbosch
Victoria Goodall VLG Statistical Services
Gerrit Grobler North-West University
Johané Nienkemper-Swanepoel Stellenbosch University
Ibidun Obagbuwa Sol Plaatje University
Etienne Pienaar University of Cape Town
Gary Sharp Nelson Mandela University
Neill Smit North-West University
Vaughan van Appel University of Johannesburg
Sean van der Merwe University of the Free State
Stephan van der Westhuizen Stellenbosch University
Tanja Verster North-West University
Contact Information
Queries can be sent by email to the Managing Editor ([email protected]).
Table of Contents
Directional Gaussian spatial processes for South African wind data 1
J. S. Blom, P. Nagar and A. Bekker
Information transmission between Bitcoin and other asset classes on the Johannesburg Stock Exchange 13
K. Els, C. Mills, W. Turkington and C.-S. Huang
Bandwidth selection in a generic similarity test for spatial data when applied to unmarked spatial point patterns 43
J. Nel, R. Stander and I. N. Fabris-Rotelli
Proceedings of the 64th Annual Conference of the South African Statistical Association
© 2023 South African Statistical Association
1. Introduction
The objectives, tactics, long-term aspirations, and growth trajectory for renewable energy
under Sustainable Development Goal 7 (SDG-7) of the United Nations’ 2030
Sustainable Development Goals1 (SDGs) are designed to facilitate universal access to power, clean
cooking fuels, and advanced technologies. A concise overview of the latest findings and method-
ologies pertaining to the conversion of energy derived from renewable sources into usable forms is
presented by Trinh and Chung (2023). Over the past decade, there has been a notable growth in the
proportion of the worldwide population that has obtained access to electricity, marking a significant
milestone. However, it is worth noting that the number of individuals lacking access to electricity
in Sub-Saharan Africa has experienced a concerning rise during the same period². South Africa
must take measures toward the implementation of renewable energy initiatives in a global context
where the popularity of fossil fuels is waning and climate action is viewed as an absolute necessity.
Wind power could provide a remedy to South Africa’s persistent energy shortages. Nevertheless,
harnessing wind energy is a complex endeavour that requires a nuanced understanding of a variety
of factors. The study of wind energy holds significant relevance in promoting the four key aspects of
energy access, energy efficiency, renewable energy, and international collaboration, hence facilitating
the advancement of Sustainable Development Goals. Therefore, modelling wind patterns is crucial
in modern society for multiple reasons, including renewable energy, weather forecasting, air quality,
and aviation.
Numerical models for weather forecasts require statistical post-processing. Linear variables such
as wind speed can be post-processed in different ways as shown in Jona-Lasinio et al. (2007), Kalnay
(2002) and Wilks (2006), whereas a circular (or angular) variable like wind direction cannot be
post-processed using standard methods (Engel and Ebert, 2007; Bao et al., 2010). Bias correction
and ensemble calibration techniques for determining the direction of wind are discussed in Bao et al.
(2010). For the bias correction, Bao et al. (2010) considered a circular-circular regression model as
proposed in Kato et al. (2008) and for the ensemble calibration a Bayesian model averaging with the
von Mises distribution was considered. However, this study did not consider the spatial configuration
in the data. The challenge is incorporating structured dependence into directional data. Directional
statistics has been developed over many years, starting as early as 1961, with early work on circular
distributions laying its theoretical foundations (Watson, 1961; Stephens, 1963;
Kent, 1978). Various approaches to dealing with circular data, distribution theory and inference can
be found in Ley and Verdebout (2017), Jupp and Mardia (2009) and Mardia (1972). Previous studies
conducted by Rad et al. (2022) and Arashi et al. (2020) explore the feasibility of predicting wind
direction in South Africa. Nevertheless, the inclusion of the spatial component in these studies was
also lacking.
In the past, spatial models were employed to model wind patterns, but they had challenges with
accounting for wind’s nonlinear and complicated behaviour. Due to the spatial dependence structure
that arises in wind data, a straightforward linear model cannot be used to model wind patterns,
as discussed in Jona-Lasinio et al. (2012). Coles (1998) proposed a wrapped Gaussian model for
modelling wind directions. The approach assumed an unspecified covariance matrix and independent
angular information, working in low dimensions. However, an extension to a spatial framework was
briefly discussed. This extension was later introduced by Casson and Coles (1998) where the circular
variables were considered to be conditionally independent von Mises distributed. More recently,
Jona-Lasinio et al. (2012) introduced a model to analyse wave direction data using a wrapped
Gaussian spatial process (WGSP). The WGSP takes into account the spatial structure of directional
variables with a potential for high-dimensional multivariate observations which are driven by a spatial
process. The methodology allows for the implementation of spatial prediction of the mean direction
and concentration while also capturing the dependence structure.
In this paper, we consider the WGSP and projected Gaussian spatial process (PGSP) for modelling
wind patterns in South Africa. These models account for the highly complex dependence structure
² https://www.iea.org/reports/sdg7-data-and-projections/access-to-electricity [Accessed 31 October 2023]
that arises in wind data as well as the periodic nature of directional data as developed by Jona-Lasinio
et al. (2012) (see also Ley and Verdebout, 2018). There are significant distinctions between the
two approaches. The wrapping approach constructs a circular distribution that is generally similar
to its real line counterpart. In other words, if the real line distribution is symmetric and unimodal
then the wrapped distribution will have the same characteristics (Jammalamadaka and SenGupta,
2001). The projected Gaussian model, however, may result in differing characteristics from the
real line counterpart. For example, the projected Gaussian model can be asymmetric and bimodal.
The main justification for proposing these two techniques is that spatial dependence is simple to
introduce in both. The wrapping produces results that are relatively simple to interpret in terms
of phenomenon behaviour, whereas the projection is extremely useful when interpretation is less
critical and a highly flexible model is required, as stated in Mastrantonio et al. (2016).
The remainder of the paper is structured as follows. Section 2 explores a South African wind
data set to monitor the wind behaviour over the course of a day. Section 3 outlines the WGSP and
the PGSP models. Section 4 examines the behaviour of two distinct methodologies employed for
evaluating the wind direction over multiple locations in South Africa. Section 5 offers concluding
remarks and potential avenues for future research.
³ http://wasadata.csir.co.za/wasa1/WASAData [Accessed July 2023]
Figure 1. Map of South Africa with region under consideration indicated with dots.
Table 1. Circular descriptive statistics of the wind direction over the entire region under consideration
for each time period.
Time of day mean direction median direction variance standard deviation
05:00 0.62854 0.55833 0.23251 0.72750
11:00 0.51324 0.52081 0.10676 0.47517
17:00 0.14884 0.11990 0.05948 0.35019
23:00 0.20445 0.18064 0.07174 0.38585
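As an illustrative aside (not part of the original analysis), the circular summaries reported in Table 1 can be computed from first principles in R; the directions below are placeholder values rather than the observed wind data.

theta <- runif(97, 0, 2 * pi)                # placeholder wind directions (radians)
z     <- mean(exp(1i * theta))               # mean resultant vector
Rbar  <- Mod(z)                              # mean resultant length
circ_mean <- Arg(z) %% (2 * pi)              # circular mean direction
circ_var  <- 1 - Rbar                        # circular variance
circ_sd   <- sqrt(-2 * log(Rbar))            # circular standard deviation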
3. Methodology
3.1 Wrapped Gaussian Spatial Process
In the linear domain, suppose we define a multivariate distribution for Y = (𝑌1 , 𝑌2 , ..., 𝑌 𝑝 ) with
Y ∼ 𝑔(·), where 𝑔(·) is a 𝑝-variate distribution on R 𝑝 indexed by 𝜔; a sensible choice for 𝑔(·)
would be a 𝑝-variate Gaussian distribution. Let K = (𝐾1 , 𝐾2 , ..., 𝐾 𝑝 ) be such that Y = X + 2𝜋K.
Then X = (𝑋1 , 𝑋2 , ..., 𝑋 𝑝 ) is defined as a wrapped multivariate distribution induced from Y with the
transformation X = Y mod 2𝜋. If the linear variable Y is defined on ℝ𝑝, then the induced wrapped
variable X is defined on [0, 2𝜋)𝑝 (Jupp and Mardia, 2009). The wrapped Gaussian
process will be fitted within a Bayesian framework using Markov Chain Monte Carlo (MCMC)
methods. For further details the reader is referred to Jona-Lasinio et al. (2012).
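A minimal sketch of the wrapping construction in R, assuming a bivariate Gaussian Y simulated with MASS::mvrnorm; the mean and covariance below are illustrative only and are not those of the fitted model.

library(MASS)
mu    <- c(0.5, 1.0)
Sigma <- matrix(c(1, 0.6, 0.6, 1), 2, 2)
Y <- mvrnorm(n = 1000, mu = mu, Sigma = Sigma)   # linear (unwrapped) variable
K <- floor(Y / (2 * pi))                         # winding numbers
X <- Y %% (2 * pi)                               # wrapped circular variable on [0, 2*pi)
all.equal(Y, X + 2 * pi * K)                     # verifies Y = X + 2*pi*K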
For the interpolation step, kriging will be used to make predictions at unobserved locations.
Consider a Gaussian process (GP) in a spatial setting, we have locations 𝑠1 , 𝑠2 , ..., 𝑠 𝑝 where 𝑠 ∈ R𝑑 and
𝑌 (𝑠) is a GP with mean 𝜇(𝑠) and an exponential covariance function 𝜎 2 𝜌(𝑠 − 𝑠′ ; 𝜙) where 𝜙 is known
as the decay parameter. We then have that X = (𝑋 (𝑠1 ), 𝑋 (𝑠2 ), ..., 𝑋 (𝑠 𝑝 )) follows a wrapped Gaussian
distribution with parameters µ = (𝜇(𝑠1 ), ..., 𝜇(𝑠 𝑝 )) and 𝜎 2 R(𝜙) where 𝑅(𝜙)𝑖 𝑗 = 𝜌(𝑠𝑖 − 𝑠 𝑗 ; 𝜙) as
defined in Jona-Lasinio et al. (2012). Suppose we have observations, X = (𝑋 (𝑠1 ), 𝑋 (𝑠2 ), ..., 𝑋 (𝑠 𝑝 )),
and would like to predict a new value 𝑋 (𝑠0 ) at an unobserved location 𝑠0 . The point of departure
follows similarly to a GP in the linear domain. The joint distribution for the linear observations
Figure 2. Rose diagrams of wind direction over the entire region for the four time periods: (a) 05:00, (b) 11:00, (c) 17:00 and (d) 23:00.
From (1), the conditional distribution of 𝑌 (𝑠0 )|Y, θ can be obtained. The wrapped Gaussian
distribution of 𝑋 (𝑠0 )|X, K, θ, and thus 𝐸 (𝑒 𝑖𝑋 (𝑠0 ) |X, K; θ), can then easily be derived. To obtain
𝐸 (𝑒 𝑖𝑋 (𝑠0 ) |X, K; θ) it is necessary to marginalise over the distribution of K|X, θ which will require
an 𝑛-fold sum over a multivariate discrete distribution which is problematic even when considering
truncation. Thus, we consider a Bayesian framework to fit the wrapped GP model which will
induce posterior samples (θ𝑏∗ , K∗𝑏 ), 𝑏 = 1, 2, ..., 𝐵. Using Monte Carlo integration the following
approximation is obtained:
E\left(e^{iX(s_0)} \mid \mathbf{X}\right) \approx \frac{1}{B} \sum_{b=1}^{B} \exp\!\left(-\sigma^2(s_0, \boldsymbol{\theta}_b^*)/2 + i\,\tilde{\mu}(s_0, \mathbf{X} + 2\pi\mathbf{K}_b^*; \boldsymbol{\theta}_b^*)\right). \qquad (2)
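For illustration, the Monte Carlo approximation in (2) reduces to a short calculation in R once posterior draws of the conditional mean and variance at the new site are available; the vectors mu_b and sig2_b below are hypothetical stand-ins for those draws.

B      <- 5000
mu_b   <- rnorm(B, 0.2, 0.05)     # hypothetical posterior conditional means at s0
sig2_b <- rgamma(B, 2, 10)        # hypothetical posterior conditional variances at s0

Ee   <- mean(exp(-sig2_b / 2) * exp(1i * mu_b))  # Monte Carlo estimate of E(e^{iX(s0)} | X)
pred <- Arg(Ee) %% (2 * pi)                      # predicted mean direction at s0
conc <- Mod(Ee)                                  # associated concentration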
Africa, specifically the WrapSp, ProjSp, WrapKrigSp and ProjKrigSp functions. CircSpaceTime
was specifically developed for the implementation of Bayesian models for spatial interpolation of
directional data using the wrapped Gaussian distribution and the projected Gaussian distribution.
Firstly, the WrapSp function was applied to estimate the wrapped Gaussian posterior distribution
for the given wind data. The WrapSp function can run multiple MCMC chains, storing the
posterior samples for 𝜇 (circular mean), 𝜎 2 (variance) and 𝜙 (spatial correlation decay parameter).
Based on the data described in Section 2, there were 97 observations (𝑛 = 97), 87 of which were
used for the modelling, while the other 10 observations were our validation set. The validation set,
consisting of 10 randomly selected points from the 97 observations, was used for prediction and
model diagnostics. The WrapSp function requires the specification of prior distributions and a few
parameters for the MCMC computation. The prior distribution values were chosen based on the data
exploration as discussed in Section 2 and Table 1.
An exponential covariance function was considered. The prior for 𝜇 was a wrapped Gaussian
distribution, for 𝜎 2 an informative inverse gamma prior, and for the decay parameter 𝜙 a uniform
prior which is weakly informative. The details of the model specification are provided for the 23:00
time period only. The remaining time periods follow similarly. Therefore, the prior distribution
values applied for the 23:00 time period data were
• 𝜇 ∼ WN(0, 2),
• 𝜎 2 ∼ IG(7, 0.5),
• 𝜙 ∼ U(0.001, 0.9).
The MCMC ran with two chains in parallel for 100 000 iterations with a burn-in of 30 000, thinning
of 10 and a target acceptance probability of 0.234 following Jona-Lasinio et al. (2012). The adaptive
process of the Metropolis-Hastings step starts at the 100th iteration and ends at the 10 000th iteration.
It is important that the adaptive procedure ends before the burn-in is initiated to guarantee that the
saved samples are drawn from the correct posterior distribution, as in Jona-Lasinio et al. (2020). The
ConvCheck function was used to check for convergence and to obtain graphs of the MCMC. Figure
3 illustrates the traces and densities of the MCMC. A traceplot is an essential plot for evaluating
convergence and diagnosing chain problems. It shows the time series of the sampled values, and the
expected outcome is a trace that fluctuates randomly around a stable level. The traceplots and the estimated
posterior density plots of the generated samples are shown in Figure 3 for each of the parameters.
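Independently of the ConvCheck wrapper, equivalent diagnostics can be obtained with the coda package in R; the two chains below are hypothetical placeholders for the stored posterior samples of 𝜇, 𝜎² and 𝜙.

library(coda)
chain1 <- matrix(rnorm(3 * 7000), ncol = 3,
                 dimnames = list(NULL, c("mu", "sigma2", "phi")))
chain2 <- matrix(rnorm(3 * 7000), ncol = 3,
                 dimnames = list(NULL, c("mu", "sigma2", "phi")))
post <- mcmc.list(mcmc(chain1), mcmc(chain2))

gelman.diag(post)     # potential scale reduction factors (values near 1 indicate convergence)
effectiveSize(post)   # effective sample sizes
plot(post)            # trace and density plots, as in Figure 3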
Using our fitted model, WrapKrigSp was applied for the interpolation. The function produces
posterior spatial predictions on the unobserved locations across all posterior samples, together with
the mean and variance of the corresponding linear Gaussian process. Once the predictions were
obtained, the average prediction error (APE) – defined as the average circular distance – and circular
continuous ranked probability score (CRPS) were computed for the model; see Jona-Lasinio et al.
(2012) and Jona-Lasinio et al. (2020).
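A brief sketch of the APE computation, assuming the circular distance d(α, β) = 1 − cos(α − β) and hypothetical observed and predicted directions at the ten validation sites:

obs  <- runif(10, 0, 2 * pi)      # held-out wind directions (placeholder)
pred <- runif(10, 0, 2 * pi)      # kriged predictions at the same sites (placeholder)
ape  <- mean(1 - cos(obs - pred)) # average circular distance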
From Table 2 we observe the 95% credible intervals for 𝜇̂, 𝜎² and 𝜙, as well as the APE and CRPS for
the wrapped Gaussian model. One must take into account that 𝜇̂ is a directional variable. The
APE scores can be attributed to the fact that only 97 observations were considered. The APE score
demonstrates sensitivity to the selection of prediction points, resulting in variability when different
coordinates were used in the validation set. The APE is very dependent on the number of observations
Figure 3. Traces and densities from the MCMC run for the wrapped Gaussian spatial model: (a) trace plots and (b) density plots for 𝜇 (top), 𝜎² (middle) and 𝜙 (bottom).
considered and the prior selection for 𝜙. These results align with the conclusion in Riha (2020), who
emphasises the importance of hyper-parameter settings for the prior distributions of the spatial decay
parameter 𝜙 and the variance 𝜎 2 for spatial interpolation with wrapped Gaussian process models. We
note that the APE is affected by the data’s variability. As depicted in Figure 4 and observed in Tables
1 and 2, there is a noticeable contrast in data variance between the time periods 05:00 and 23:00.
Specifically, at 05:00, wind directions exhibit significant variability, whereas at 23:00, they tend to
align in a more consistent direction. Consequently, this disparity in data variability contributes to the
difference in APE scores between these time periods. A similar pattern emerges when comparing
the conditions at 11:00 and 17:00. It can be noted that the two morning time periods have much
more variability than the two evening time points, with 17:00 having the lowest variance of 0.05948,
yielding the lowest APE of 0.12439 as well.

Table 2. The 95% credible intervals for 𝜇̂, 𝜎² and 𝜙, and the APE and CRPS for the WGSP model for the
different time periods.

Time of day   𝜇̂ 95% C.I.            𝜎² 95% C.I.           𝜙 95% C.I.            APE       CRPS
05:00         (0.45406; 0.76958)     (0.40963; 0.74301)     (0.02375; 0.87666)     0.66563   0.45200
11:00         (0.41816; 0.62425)     (0.17859; 0.31959)     (0.01624; 0.58459)     0.58396   0.47646
17:00         (0.07741; 0.22503)     (0.09163; 0.15986)     (0.01686; 0.58482)     0.12439   0.06647
23:00         (0.13639; 0.30191)     (0.11557; 0.20323)     (0.02382; 0.87776)     0.21914   0.14669
Next we fit the PGSP model to the wind data observed at the 23:00 time period. Note the PGSP
is more sensitive to the choice of priors, specifically for the decay parameter. The details of the
model specification are provided for the 23:00 time period only; the remaining time periods follow
similarly. The prior distribution values used in the PGSP model were
• µ ∼ N((0, 1)′, diag(10, 10)),
• 𝜎 2 ∼ IG(7, 0.5),
• 𝜙 ∼ U(0.001, 0.9),
• 𝜏 ∼ U(−1, 1).
An exponential covariance function was again specified. The prior for 𝜇 was a bivariate
Gaussian distribution, for 𝜎 2 an informative inverse gamma prior, for 𝜏 a uniform prior and for
the decay parameter 𝜙 a uniform prior which is weakly informative. The remainder of the function
specification was the same as the WGSP model. From the convergence check (including the traceplots
that were not reported) we see that the chains reached convergence. The PGSP’s flexibility allows a
better fit of the model with an APE of 0.03874 and a CRPS of 0.02506 for the 23:00 time period.
Table 3 reports the results of the APE and CRPS for both the WGSP and PGSP models. It is clear
that for the South African wind data the PGSP model outperforms the WGSP and better captures the
structure of the data. The WGSP model was computationally less demanding and allowed the choice
of less informative priors, as well as for the parameters to be easily interpretable, which is not the
case for the projected Gaussian model.
5. Conclusion
This paper explored the potential of utilising directional statistics within spatial analysis to model
wind patterns in South Africa, drawing on methods developed in Jona-Lasinio et al. (2012). The
Table 3. Goodness-of-fit measures for the wrapped Gaussian model (WGSP) and projected Gaussian
model (PGSP) for the South African wind data.
Time of day   Model   APE       CRPS
05:00         WGSP    0.66563   0.45200
05:00         PGSP    0.12156   0.09955
11:00         WGSP    0.58396   0.47646
11:00         PGSP    0.06419   0.04361
17:00         WGSP    0.12439   0.06647
17:00         PGSP    0.04839   0.04319
23:00         WGSP    0.21914   0.14669
23:00         PGSP    0.03874   0.02506
wrapped Gaussian model and projected Gaussian model were considered to account for the cyclic
nature of the wind directions while also accounting for the spatial dependence. Based on the APE
and CRPS, we conclude that the projected Gaussian process is an effective and precise approach to
modelling wind patterns in South Africa. The model can adeptly manage directional data indexed
by space, capturing the spatial structure among these observations. Looking ahead, enhancements to
this model can be made through a more refined specification of the prior distributions, and
by incorporating a more extensive set of locations to represent a broader area. Additionally, there is
potential to expand this model into a spatio-temporal model, accounting for time as well. Another
avenue for future work resides in accounting for the wind speed (and other wind characteristics)
along with the wind directions.
In closing, the application of directional Gaussian processes in tandem with the capabilities of
the CircSpaceTime package in R presents a compelling avenue for enhancing the accuracy and
reliability of wind direction modelling. As the world increasingly recognises the critical role of
sustainable energy sources, such as wind power, refining our understanding of wind behaviour
becomes paramount, especially in South Africa given the country’s current electricity supply challenges. Better and
more accurate understanding of wind behaviour can improve the design and optimisation of wind
farms, thus ensuring efficient and effective harnessing of wind energy.
Acknowledgements. The authors would like to thank the anonymous reviewers for their insightful
comments which led to an improvement of this paper. This work was based upon research supported
in part by the National Research Foundation (NRF) of South Africa (Grant SRUG2204203965), as
well as DSI-NRF Centre of Excellence in Mathematical and Statistical Sciences (CoE-MaSS). This
project falls under the ethics application NAS116/2019.
References
Arashi, M., Nagar, P., and Bekker, A. (2020). Joint probabilistic modeling of wind speed and wind
direction for wind energy analysis: A case study in Humansdorp and Noupoort. Sustainability,
12, 4371.
Bao, L., Gneiting, T., Grimit, E. P., Guttorp, P., and Raftery, A. E. (2010). Bias correction
and Bayesian model averaging for ensemble forecasts of surface wind direction. Monthly Weather
Review, 138, 1811–1821.
Casson, E. and Coles, S. (1998). Extreme hurricane wind speeds: estimation, extrapolation and
spatial smoothing. Journal of Wind Engineering and Industrial Aerodynamics, 74, 131–140.
Coles, S. (1998). Inference for circular distributions and processes. Statistics and Computing, 8,
105–113.
Engel, C. and Ebert, E. (2007). Performance of hourly operational consensus forecasts (OCFs) in
the Australian region. Weather and Forecasting, 22, 1345–1359.
Jammalamadaka, S. R. and SenGupta, A. (2001). Topics in Circular Statistics, volume 5. World
Scientific.
Jona-Lasinio, G., Gelfand, A., and Jona-Lasinio, M. (2012). Spatial analysis of wave direction
data using wrapped Gaussian processes. The Annals of Applied Statistics, 6, 1478–1498.
Jona-Lasinio, G., Orasi, A., Divino, F., and Conti, P. L. (2007). Statistical contributions to
the analysis of environmental risks along the coastline. Società Italiana di Statistica-rischio e
previsione. Venezia, 6–8.
Jona-Lasinio, G., Santoro, M., and Mastrantonio, G. (2020). CircSpaceTime: An R package
for spatial and spatio-temporal modelling of circular data. Journal of Statistical Computation and
Simulation, 90, 1315–1345.
Jupp, P. E. and Mardia, K. V. (2009). Directional Statistics. John Wiley & Sons.
Kalnay, E. (2002). Atmospheric Modeling, Data Assimilation and Predictability. Cambridge University Press.
Kato, S., Shimizu, K., and Shieh, G. S. (2008). A circular-circular regression model. Statistica
Sinica, 633–645.
Kent, J. (1978). Limiting behaviour of the von Mises-Fisher distribution. In Mathematical Proceed-
ings of the Cambridge Philosophical Society, volume 84. Cambridge University Press, 531–536.
Ley, C. and Verdebout, T. (2017). Modern Directional Statistics. CRC Press.
Ley, C. and Verdebout, T. (2018). Applied Directional Statistics: Modern Methods and Case
Studies. CRC Press.
Mardia, K. V. (1972). Statistics of Directional Data. Academic Press.
Mastrantonio, G., Jona-Lasinio, G., and Gelfand, A. E. (2016). Spatio-temporal circular models
with non-separable covariance structure. Test, 25, 331–350.
R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for
Statistical Computing, Vienna, Austria.
URL: https://www.R-project.org/
Rad, N. N., Bekker, A., and Arashi, M. (2022). Enhancing wind direction prediction of South
Africa wind energy hotspots with Bayesian mixture modeling. Scientific Reports, 12, 11442.
Riha, A. E. (2020). Hyperprior Sensitivity of Bayesian Wrapped Gaussian Processes with an Appli-
cation to Wind Data. Master’s thesis, Humboldt-Universität zu Berlin.
Stephens, M. A. (1963). Random walk on a circle. Biometrika, 50, 385–390.
Trinh, V. and Chung, C. (2023). Renewable energy for SDG-7 and sustainable electrical production,
integration, industrial application, and globalization. Cleaner Engineering and Technology, 15,
100657.
Watson, G. S. (1961). Goodness-of-fit tests on a circle. Biometrika, 48, 109–114.
Wilks, D. S. (2006). Comparison of ensemble-MOS methods in the Lorenz ’96 setting. Meteoro-
logical Applications, 13, 243–256.
Proceedings of the 64th Annual Conference of the South African Statistical Association
© 2023 South African Statistical Association
1. Introduction
The infamous 2008 global financial crisis eroded investor confidence in the traditional
centralised financial system. This ultimately catapulted demand for cryptocurrencies, which are
not subject to a governing body or interferences from a central bank. In addition, cryptocurrencies
exhibit low trading costs and all transactions remain fully anonymous (Coeckelbergh and Reijers,
2016), which renders them suitable as an alternative currency or investment asset. Since the inception
of Bitcoin as the first cryptocurrency in 2009, more than 22,000 different types of cryptocurrency
have been created. In a short span of 14 years, the market capitalisation of cryptocurrencies has
grown to an astonishing $1.19 trillion in comparison to the global equity market of $107 trillion
(Maheshwari, 2023). This extraordinary growth has also drawn significant attention from both
practitioners and academics alike. It has also become increasingly crucial to grasp the behaviours
of cryptocurrencies, as well as their level of integration with other assets, which may help provide
regulatory bodies and policy makers with adequate guidance on cryptocurrencies as an investment
tool (Vardar and Aydogan, 2019).
Speculators and investors have been attracted to cryptocurrencies due to their abnormal returns and
high volatility levels. With reference to Bitcoin, the largest cryptocurrency by market capitalisation,
the average volatility level since 2010 is at 114%, almost 10 times the volatility realised by typical
equities and commodities, while obtaining annual returns reaching as high as approximately 254%
(Blokland, 2021). These statistics indicate the high risk-reward characteristic of cryptocurrencies.
Moreover, empirical evidence reveals that cryptocurrencies’ high Sharpe Ratio, accompanied by low
correlation with traditional asset classes, creates the potential for sizeable diversification and hedging
benefits from holding cryptocurrencies in a traditional investment portfolio (Blokland, 2021).
Existing literature has predominantly focused on the characteristics of cryptocurrencies and how
they compare to other asset classes, such as equities, foreign exchange and commodities (see, among
others, Dyhrberg, 2016a; Pieters and Vivanco, 2017; Polasik et al., 2015; Yermack, 2015). On the
other hand, the work of O’Dwyer (2015) primarily focused on the capacity of cryptocurrencies to
create an alternative monetary system, given their characteristics of a more efficient, cheaper, and
unregulated market space for transferring money (Vardar and Aydogan, 2019). However, a plethora
of other studies from authors such as Wu et al. (2014) also advocated for cryptocurrencies to be
considered as a completely new asset class that is independent of the behaviours of a traditional
currency.
The motivations behind regarding cryptocurrencies, such as Bitcoin, as an alternative asset class
instead of a traditional currency, is mainly premised on the discovery of typical stylised facts
embedded within their empirical price returns data. For instance, evidence of leptokurtic behaviour
was presented by Chan et al. (2017). Subsequently, the presence of heteroscedasticity and long
memory properties were identified in the works of Gkillas and Katsiampa (2018) and Phillip et al.
(2019), respectively. Such findings further advocate for the use of GARCH-type models to
estimate Bitcoin volatility (see, among others, Bouoiyour et al., 2016; Bouri et al., 2017; Dyhrberg,
2016a,b).
Notably, prior literature also provided evidence of low correlations between Bitcoin and other major
financial asset classes (Baur et al., 2018). Such a phenomenon prompted overwhelming interest in
utilising Bitcoin as a potential diversification and hedging tool to manage financial risks within
existing investment portfolios (see, for instance, Briere et al., 2015; Dyhrberg, 2016b; Aslanidis
et al., 2019; Fakhfekh and Jeribi, 2020). The evidence suggests that adding a small portion of
cryptocurrencies to a diversified portfolio, made up of traditional assets, can substantially reduce the
overall risk for a given level of expected return.
In addition to the above, practitioners and academics alike have been interested in the ability of
cryptocurrencies to act as safe-havens during periods of market distress. Klein et al. (2018) made
a valuable contribution to the existing literature in this regard. Using the celebrated BEKK-GARCH
model, the authors demonstrated that gold plays an important role in financial markets with ‘flight-
to-quality’ in times of market distress. This is somewhat similar to Bitcoin as the cryptocurrency’s
returns are negatively correlated to downward moving markets (see, for instance, Klein et al., 2018).
Such work has also led to further investigations regarding the ability of cryptocurrencies to act as
a hedging strategy under different market conditions. Given the growing acceptance of cryptocur-
rencies, their information transmission with traditional financial markets is becoming increasingly
series were calculated with the usual natural logarithmic procedure, 𝑟 𝑡 = ln(𝑆𝑡 ) − ln(𝑆𝑡 −1 ), where 𝑆𝑡
is the spot price of the financial asset at time 𝑡.
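In R this amounts to a one-line transformation of each price series; the prices below are placeholders.

prices  <- c(101.2, 102.5, 101.9, 103.4, 104.0)   # placeholder daily closing prices S_t
returns <- diff(log(prices))                      # r_t = ln(S_t) - ln(S_{t-1})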
While the ALBI is an adequate representative of the bonds asset class, the Top 40 is primarily
used as a benchmark for equities. All listed entities on the JSE are categorised into one of the three
sectors of Resources, Financials and Industrials as per the Industry Classification Benchmark (ICB),
based on their revenue. In particular, RESI includes the largest 20 entities by market capitalisation
that are identified as Basic Materials and Energy, while FINI comprises the largest 15 entities
by market capitalisation that are characterised as Financials and Real Estate. Lastly, the largest 25
entities by market capitalisation in the remaining pool not classified as above are absorbed by the
INDI. This distinct separation on the JSE provides us with the opportunity to further examine the
ability of Bitcoin to act as a possible hedging or safe-haven tool against different market sectors apart
from just asset classes.
In this paper, we deploy the multivariate vector autoregressive GARCH framework with BEKK
specifications (VAR-BEKK-GARCH), as proposed by Engle and Kroner (1995). An important
feature of the VAR-BEKK-GARCH model is the absence of restrictions imposed on the correlation
structure between the variables in question. Moreover, the BEKK specification has the advantage
of allowing for information spillover to be observed from both directions of the time series pair in
question.
The VAR specification in the conditional mean equations allows us to evaluate the spillover in
mean. Through minimising the Akaike Information Criterion, we select the following VAR(1) model
to represent the BEKK-GARCH-in-mean equation:
𝑅𝑡 = 𝜇 + Φ𝑅𝑡 −1 + 𝜖 𝑡 , (1)
where 𝑅𝑡 = (𝑟 𝑡𝑐 , 𝑟 𝑡𝑠 ) ′ . We denote 𝑟 𝑡𝑐 and 𝑟 𝑡𝑠 as the logarithmic return of Bitcoin and the logarithmic
return of a chosen financial index at time 𝑡, respectively. Specifically, the financial index is one of
the TOP40, RESI, FINI, INDI or ALBI. 𝜇 = (𝜇1 , 𝜇2 ) ′ is a vector of the constant terms of
the conditional mean equations. The (2 × 2) matrix of coefficients for the lag variables in the VAR(1)
mean specification is denoted by
\Phi = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix}.
Lastly, 𝜖 𝑡 = (𝜖1,𝑡 , 𝜖2,𝑡 ) ′ is a vector of residuals for the cryptocurrency and a financial index, respec-
tively, and are both assumed to be normally distributed with a mean of 0.
The conditional variance-covariance matrix (𝐻𝑡 ) of the residuals is defined as follows:
\epsilon_t \mid \Omega_{t-1} \sim N(\mathbf{0}, H_t), \qquad H_t = \begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix}, \qquad (2)
where Ω𝑡 −1 is the set of all information up until time 𝑡 −1. The conditional covariances, represented by
ℎ12 and ℎ21 , capture the relationship between Bitcoin and the financial index in question. Specifically,
the BEKK-GARCH(1,1) model can be expressed as
𝐻𝑡 = 𝐶 ′ 𝐶 + 𝐴′ 𝜖 𝑡 −1 𝜖 𝑡′−1 𝐴 + 𝐵′ 𝐻𝑡 −1 𝐵, (3)
where 𝐶 is a (2 × 2) upper triangular matrix of constants for the cryptocurrency and stock index
pair; 𝐴 is the (2 × 2) matrix of ARCH coefficients that capture the effects of local and cross-market
shocks, while 𝐵 is the corresponding (2 × 2) matrix of GARCH coefficients that capture the effect of
own market volatility persistence and the cross-market volatility transmissions, i.e. between Bitcoin
and a financial index. Specifically, our binary BEKK-GARCH(1,1) model, as per expression (3),
may be expanded as follows:
! !′ ! !′ ! !′ !
ℎ11 ℎ12 𝑐 11 𝑐 12 𝑐 11 𝑐 12 𝑎 11 𝑎 12 𝜀1,𝑡 −1 𝜀 1,𝑡 −1 𝑎 11 𝑎 12
= +
ℎ21 ℎ22 0 𝑐 22 0 𝑐 22 𝑎 21 𝑎 22 𝜀2,𝑡 −1 𝜀 2,𝑡 −1 𝑎 21 𝑎 22
!′ ! !
𝑡
𝑏 11 𝑏 12 ℎ11 ℎ12 𝑏 11 𝑏 12
+ .
𝑏 21 𝑏 22 ℎ21 ℎ22 𝑏 21 𝑏 22
𝑡 −1
(4)
We may further express (4) with the following set of equations:

h_{11,t} = c_{11}^2 + a_{11}^2\varepsilon_{1,t-1}^2 + 2a_{11}a_{21}\varepsilon_{1,t-1}\varepsilon_{2,t-1} + a_{21}^2\varepsilon_{2,t-1}^2 + b_{11}^2 h_{11,t-1} + 2b_{11}b_{21}h_{12,t-1} + b_{21}^2 h_{22,t-1},
h_{12,t} = c_{11}c_{12} + a_{11}a_{12}\varepsilon_{1,t-1}^2 + (a_{11}a_{22} + a_{12}a_{21})\varepsilon_{1,t-1}\varepsilon_{2,t-1} + a_{21}a_{22}\varepsilon_{2,t-1}^2 + b_{11}b_{12}h_{11,t-1} + (b_{11}b_{22} + b_{12}b_{21})h_{12,t-1} + b_{21}b_{22}h_{22,t-1},
h_{22,t} = c_{12}^2 + c_{22}^2 + a_{12}^2\varepsilon_{1,t-1}^2 + 2a_{12}a_{22}\varepsilon_{1,t-1}\varepsilon_{2,t-1} + a_{22}^2\varepsilon_{2,t-1}^2 + b_{12}^2 h_{11,t-1} + 2b_{12}b_{22}h_{12,t-1} + b_{22}^2 h_{22,t-1}, \qquad (5)

where ℎ11,𝑡 and ℎ22,𝑡 are the conditional variances of Bitcoin and a financial index, respectively.
Similarly, ℎ12,𝑡 and ℎ21,𝑡 represent the conditional covariances across the two respective assets. The
VAR-BEKK-GARCH model parameters (𝜇, Φ, 𝐶, 𝐴, 𝐵) may be estimated using the quasi-maximum
likelihood method, whereby the log-likelihood function for a sample of 𝑇 observations is given by
(Engle and Kroner, 1995)
\log L = -\frac{1}{2}\sum_{t=1}^{T}\left( k\log(2\pi) + \log|H_t| + \boldsymbol{\epsilon}_t' H_t^{-1}\boldsymbol{\epsilon}_t \right), \qquad (6)
where 𝐿 denotes the likelihood function used to estimate the vector of unknown model parameters,
and 𝑘 the number of variables (𝑘 = 2 for bi-variate form).
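As a hedged sketch of the estimation step, the following base-R function evaluates the negative quasi-log-likelihood in (6) for a bivariate BEKK-GARCH(1,1), using the recursion in (3), and is then passed to a general-purpose optimiser; the parameter ordering, starting values and simulated residuals are assumptions of this sketch and not the authors' estimation code.

bekk_negloglik <- function(par, eps) {
  C <- matrix(c(par[1], 0, par[2], par[3]), 2, 2)   # upper-triangular constant matrix
  A <- matrix(par[4:7], 2, 2)                       # ARCH (shock spillover) coefficients
  B <- matrix(par[8:11], 2, 2)                      # GARCH (volatility spillover) coefficients
  H <- cov(eps)                                     # initialise H_1 at the sample covariance
  ll <- 0
  for (t in 2:nrow(eps)) {
    e <- eps[t - 1, , drop = FALSE]                 # 1 x 2 lagged residual vector
    H <- crossprod(C) + t(A) %*% crossprod(e) %*% A + t(B) %*% H %*% B   # recursion (3)
    dH <- det(H)
    if (!is.finite(dH) || dH <= 0) return(1e10)     # penalise non-positive-definite H_t
    ll <- ll + 2 * log(2 * pi) + log(dH) + drop(eps[t, ] %*% solve(H) %*% eps[t, ])
  }
  0.5 * ll                                          # minus the log-likelihood in (6)
}

set.seed(1)
eps   <- matrix(rnorm(1000, sd = 0.02), ncol = 2)   # placeholder residuals from the VAR(1) step
start <- c(0.01, 0, 0.01, 0.3, 0, 0, 0.3, 0.9, 0, 0, 0.9)
fit   <- optim(start, bekk_negloglik, eps = eps, control = list(maxit = 5000))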
Equation set (5) also demonstrates that the conditional variances and covariances of the time
series pair are influenced not only by the lagged residuals of the two time series but also by the
lagged conditional variances and covariances (i.e. ℎ11,𝑡−1 , ℎ12,𝑡−1 and ℎ22,𝑡−1 ). To determine the volatility spillover effects, we observe
the resulting ARCH and GARCH effects, as well as the asymmetric effects of both positive and
negative shocks. Specifically, when 𝑎 12 = 𝑏 12 = 0 the conditional variance of the chosen financial
index is only affected by its own lagged squared residuals and lagged conditional variance, implying
that Bitcoin has no volatility spillover effects on the chosen financial index. Similarly, 𝑎 21 = 𝑏 21 = 0
suggests that the chosen financial index has no volatility spillover effects on Bitcoin. Hence, utilising
the significance of the coefficients from the VAR-BEKK-GARCH model, we may interrogate the
mean and volatility spillover effects between Bitcoin and other financial sectors and asset classes.
Lastly, through conditional covariances of the VAR-BEKK-GARCH model, the dynamic correla-
tion between Bitcoin and other asset classes considered in this study can be obtained as follows:
\rho_t = \frac{h_{12,t}}{\sqrt{h_{11,t}\, h_{22,t}}}. \qquad (7)
The dynamic conditional correlation (7) may be utilised to observe correlation fluctuations and its
varying characteristics, lending itself as a useful risk-measuring tool. Following Baur and Lucey
(2010), we define assets that are uncorrelated (negatively correlated) with another asset or portfolio
in periods of market crisis as a weak (strong) safe-haven tool.
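Given the fitted conditional variance and covariance series, the dynamic conditional correlation in (7) follows directly; the three vectors below are hypothetical placeholders for the elements of the estimated H_t.

h11 <- runif(500, 0.5, 1.5)        # conditional variance of Bitcoin (placeholder)
h22 <- runif(500, 0.5, 1.5)        # conditional variance of the index (placeholder)
h12 <- runif(500, -0.3, 0.3)       # conditional covariance (placeholder)
rho_t <- h12 / sqrt(h11 * h22)     # dynamic conditional correlation (7)
summary(rho_t)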
clearly indicates an absence of mean spillover effects from each financial index into Bitcoin. However,
the rejections of the null hypotheses across the various 𝜙21 , except for RESI, show that Bitcoin tends
to impose a mean spillover effect on the different financial indices at a 10% level of significance.
Hence, all cross-mean effects are dominated by unilateral positive spillovers from Bitcoin to the
different financial indices during the COVID-19 period. More concretely, the current-period returns
across the various financial indices are influenced by the previous-period returns of Bitcoin. As a
result, opportunities may exist for market participants to utilise Bitcoin returns to predict returns
across the different asset classes and market sectors on the JSE.
From Panel B of Table 3, we use 𝑎 12 , 𝑎 21 , 𝑏 12 and 𝑏 21 to interrogate the shocks and volatility
spillovers across Bitcoin and financial indices. Notably, we omit results from matrix 𝐶 as it does not
influence the volatility spillover effects. Failures in the rejection of the null for 𝑎 12 and 𝑎 21 for Bitcoin
against the FINI, INDI and ALBI are empirical evidence to suggest an absence of shock spillover
(or ARCH) effects from Bitcoin to the three indices and vice versa. However, a unilateral shock
spillover effect from TOP40 on Bitcoin was detected at the 5% level of significance. Additionally,
the rejection of the null for both 𝑎 12 and 𝑎 21 between Bitcoin and RESI implies significant bilateral
shock spillovers between the pair.
The resulting values of 𝑏 12 and 𝑏 21 paint a similar picture from the perspective of volatility
spillovers. A significant bilateral volatility transmission between Bitcoin and RESI was detected at
the 1% level of significance. Similarly, there is an unilateral volatility spillover (or GARCH effect)
from TOP40 into Bitcoin. However, an opposite unidirectional volatility spillover effect was found
between Bitcoin and INDI at the 5% level of significance. Finally, there is a clear absence of volatility
transmissions between Bitcoin and two indices, namely, FINI and ALBI, advocating the possibility
of using Bitcoin as a safe-haven tool for entities categorised as financials or for South African bonds.
Summaries of our directional spillover results are illustrated in Table 4.
stronger relationship between Bitcoin and TOP40 than other indices during a period of market crisis.
This is consistent with our observations in Figure 2(a), where the dynamic conditional correlation
between Bitcoin and TOP40 experienced a significant upward spike immediately following the
COVID-19 outbreak. This further suggests Bitcoin to be an inadequate safe-haven tool for TOP40.
The mean conditional correlation between Bitcoin and RESI is relatively high. This can be
expected due to the closeness in relationship between Bitcoin and exhaustible resource commodities,
as well as precious metals, as advocated by an existing line of research (see, Gronwald, 2019; Mensi
et al., 2019). Our empirical findings provide further evidence in support of such a phenomenon, which
stays persistent even during periods of market crisis such as the recent COVID-19 pandemic. This is
also observable through our dynamic conditional correlation plot in Figure 2(b).
In line with our unconditional correlation in Table 2, lower means in conditional correlation are
observed between Bitcoin and the three indices, namely, FINI, INDI and ALBI. However, as illustrated in
Figures 2(d) and 2(e), Bitcoin may be inadequate to serve as a safe-haven tool for both INDI and ALBI. Both
pairs are affected by upticks and prolonged positive trends in the dynamic conditional correlation
following the COVID-19 crisis. The findings for the ALBI are intriguing as they contradict prior
studies that motivated Bitcoin as a hedging tool for bonds (see, Kang et al., 2020; Wang
et al., 2019). Interestingly, with the lowest mean conditional correlation, we observe significant
Notes: The ↔ represents a bidirectional spillover, whereas → or ← indicates a unilateral transmission. We use − to show
an absence of transmission. Specifically a → demonstrates that Bitcoin is a transmitter, while ← indicates that Bitcoin is a
receiver.
downward ticks and ensuing negative correlations between Bitcoin and FINI following the COVID-
19 crisis. Moreover, the dynamic conditional correlation remained low even after reverting to positive
trends, suggesting that Bitcoin may act as a possible strong safe-haven tool for JSE entities
categorised as financials. Notably, the dynamic conditional correlation between the pair also exhibits
the lowest standard deviation, indicating the least violent fluctuations in comparison to the movements
in conditional correlation between Bitcoin and other indices.
4. Conclusion
The debate of whether cryptocurrencies may act as an adequate hedging or safe-haven tool for
traditional financial assets remains a contentious one for academics and practitioners alike. In this
paper, we further contributed to the literature by investigating the safe-haven characteristics of Bitcoin
for some traditional asset classes on the Johannesburg Stock Exchange. Specifically, our empirical
analysis is performed over the recent infamous period of COVID-19, an unprecedented period of
major financial market turmoil since the inception of cryptocurrencies. We provide evidence to
further demonstrate the close relationship between Bitcoin and commodities, as represented by the
RESI index, and showed the consistency of such interconnectedness between the pair even during
periods of extreme market distress. In addition, our results contradicted the existing acceptance
that Bitcoin is an adequate hedging tool for bonds. During periods of market crisis, bonds may
indeed be inadequate to act as a safe-haven tool. Finally, our empirical analysis showed that JSE
entities classified as financial and real estate, as per the ICB, may turn to Bitcoin as a potential
strong safe-haven tool. Limitations of our study may be addressed by first including other widely
traded cryptocurrencies that have already gained significant market capitalisation, and analysing the
adequacy of these cryptocurrencies to act as potential safe havens for traditional asset classes during
periods of market crisis. Moreover, in-depth investigations on the effect of extreme quantiles of
traditional asset classes and the subsequent effect on their dynamic conditional correlations with
different cryptocurrencies may add compelling evidence to the ongoing debate.
References
Abdelmalek, W. and Benlagha, N. (2023). On the safe-haven and hedging properties of Bitcoin:
New evidence from COVID-19 pandemic. The Journal of Risk Finance, 24, 145–168.
Aslanidis, N., Bariviera, A. F., and Martínez-Ibañez, O. (2019). An analysis of cryptocurrencies
conditional cross correlations. Finance Research Letters, 31, 130–137.
Baur, D. G., Hong, K., and Lee, A. D. (2018). Bitcoin: Medium of exchange or speculative assets?
Journal of International Financial Markets, Institutions and Money, 54, 177–189.
Baur, D. G. and Lucey, B. M. (2010). Is gold a hedge or a safe haven? An analysis of stocks, bonds
and gold. Financial Review, 45, 217–229.
Blokland, J. (2021). Bitcoin as digital gold – a multi-asset perspective.
URL: https:// www.robeco.com/ en-za/ insights/ 2021/ 04/ bitcoin-as-digital-gold-a-multi-asset-
perspective
Bouoiyour, J., Selmi, R., et al. (2016). Bitcoin: A beginning of a new phase. Economics Bulletin,
36, 1430–1440.
Bouri, E., Jalkh, N., Molnár, P., and Roubaud, D. (2017). Bitcoin for energy commodities before
and after the December 2013 crash: Diversifier, hedge or safe haven? Applied Economics, 49,
5063–5073.
Briere, M., Oosterlinck, K., and Szafarz, A. (2015). Virtual currency, tangible return: Portfolio
diversification with Bitcoin. Journal of Asset Management, 16, 365–373.
Chan, S., Chu, J., Nadarajah, S., and Osterrieder, J. (2017). A statistical analysis of cryptocur-
rencies. Journal of Risk and Financial Management, 10, 12.
Coeckelbergh, M. and Reijers, W. (2016). Cryptocurrencies as narrative technologies. ACM
SIGCAS Computers and Society, 45, 172–178.
Conlon, T., Corbet, S., and McGee, R. J. (2020). Are cryptocurrencies a safe haven for equity
markets? An international perspective from the COVID-19 pandemic. Research in International
Business and Finance, 54, 101248.
Cont, R. (2001). Empirical properties of asset returns: Stylized facts and statistical issues. Quanti-
tative Finance, 1, 223.
Dyhrberg, A. H. (2016a). Bitcoin, gold and the Dollar – A GARCH volatility analysis. Finance
Research Letters, 16, 85–92.
Dyhrberg, A. H. (2016b). Hedging capabilities of Bitcoin. Is it the virtual gold? Finance Research
Letters, 16, 139–144.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of
United Kingdom inflation. Econometrica: Journal of the Econometric Society, 987–1007.
Engle, R. F. and Kroner, K. F. (1995). Multivariate simultaneous generalized ARCH. Econometric
Theory, 11, 122–150.
Fakhfekh, M. and Jeribi, A. (2020). Volatility dynamics of crypto-currencies’ returns: Evidence
from asymmetric and long memory GARCH models. Research in International Business and
Finance, 51, 101075.
Gkillas, K. and Katsiampa, P. (2018). An application of extreme value theory to cryptocurrencies.
Economics Letters, 164, 109–111.
Wu, C. Y., Pandey, V. K., and Dba, C. (2014). The value of Bitcoin in enhancing the efficiency of
an investor’s portfolio. Journal of Financial Planning, 27, 44–52.
Yermack, D. (2015). Is Bitcoin a real currency? An economic appraisal. In Handbook of Digital
Currency. Elsevier, 31–43.
Proceedings of the 64th Annual Conference of the South African Statistical Association
© 2023 South African Statistical Association
1. Introduction
Variable selection and regularisation in quantile regression (𝑄𝑅) have been topical in recent years,
especially in the presence of collinearity. The adverse effects of collinearity in regression analysis are
wrong signs of parameter estimates, erroneous interpretation of parameter estimates, and estimates
with disproportionately large variances, amongst others (Hoerl and Kennard, 1970). The phenomenon
of collinearity occurs when at least two predictor variables are intercorrelated, resulting in an almost
impossible separation of coefficient influences in the regression equation. In the literature, population
characteristics, deficiencies in sampling, and over-defined models are major sources of collinearity
(see Gunst and Mason, 1980; Montgomery, 2017; Adkins et al., 2015). These collinearity challenges
have been mitigated via variable selection and regularisation, with varying degrees of success. In the
literature, to circumvent the problem of collinearity, the ridge regression (𝑅𝑅) (Hoerl and Kennard,
1970), the 𝐿 𝐴𝑆𝑆𝑂 regression (Tibshirani, 1996), and their mixture version, namely the elastic net
(𝐸-𝑁 𝐸𝑇) (Zou and Hastie, 2005), have been suggested.
The least absolute deviation (𝐿 𝐴𝐷) procedure (Norouzirad et al., 2018) is a robust procedure
that generalises to 𝑄𝑅 at any quantile level of interest. In the literature, like the 𝐿 𝐴𝐷 procedure,
the 𝐿 𝐴𝑆𝑆𝑂 procedure is based on the ℓ1 -norm penalty; hence, it was conveniently modified to
the least absolute deviation 𝐿 𝐴𝑆𝑆𝑂 (𝐿 𝐴𝐷-𝐿 𝐴𝑆𝑆𝑂) and weighted least absolute deviation 𝐿 𝐴𝑆𝑆𝑂
(𝑊 𝐿 𝐴𝐷-𝐿 𝐴𝑆𝑆𝑂), which have oracle properties when appropriate tuning parameters are chosen
(Arslan, 2012). In a similar fashion, 𝑄𝑅-𝐿 𝐴𝑆𝑆𝑂 and 𝑄𝑅-𝐸-𝑁 𝐸𝑇 procedures have been suggested
as variable selection and regularisation in the 𝑄𝑅 framework (Ranganai and Mudhombo, 2021).
Although 𝐿𝐴𝑆𝑆𝑂 regression (Tibshirani, 1996) performs parameter shrinkage and variable selection
simultaneously, and is appropriate for variable selection and regularisation, it falls short in the
presence of collinearity. 𝐿 𝐴𝑆𝑆𝑂 tends to over-penalise coefficients, especially in the presence of
collinearity, where all coefficients in a group of correlated variables are penalised to zero except
one. On the contrary, ridge regression (Hoerl and Kennard, 1970) is far less "greedy" as it tends to
select all coefficients and result in a complex model. The 𝐸-𝑁 𝐸𝑇 (see Zou and Hastie, 2005) was
proposed in response to the challenges of the 𝐿 𝐴𝑆𝑆𝑂 and the 𝑅𝑅s, and is a compromise between
the two procedures. The 𝐿 𝐴𝑆𝑆𝑂 regularisation method, which has an ℓ1 -norm penalty, is dominated
in prediction performance by the ridge procedure (Zou and Hastie, 2005). The 𝐿 𝐴𝑆𝑆𝑂 and 𝐸-
𝑁 𝐸𝑇 regularisation procedures have been extended to their adaptive scenarios, namely; the adaptive
𝐿 𝐴𝑆𝑆𝑂 (𝐴𝐿 𝐴𝑆𝑆𝑂) and adaptive elastic net (𝐴𝐸-𝑁 𝐸𝑇), respectively, as solutions to problems posed
by collinearity in data sets (see Zou, 2006; Zou and Zhang, 2009). In the literature, to circumvent
the problem of collinearity, adaptive penalised variable selection and regularisation procedures are
suggested, such as adaptive ridge regression (𝐴𝑅) (Frommlet and Nuel, 2016), 𝐴𝐿 𝐴𝑆𝑆𝑂 (Zou,
2006), and the adaptive elastic net (𝐴𝐸-𝑁 𝐸𝑇) (Zou and Zhang, 2009). The 𝐴𝐿 𝐴𝑆𝑆𝑂 was proposed
by Zou (2006), and it allows different tuning parameters for different coefficients. The suggested
𝐴𝐿𝐴𝑆𝑆𝑂 uses ridge regression coefficient estimates to form adaptive weights.
The performance of variable selection and regularisation procedures heavily depends on the ap-
propriate selection of the tuning parameters. For these procedures, the true model is identified
consistently depending on the appropriate tuning parameter selection (see Fan and Li, 2001; Zou,
2006). In the literature, methods such as 𝐶𝑝, the Akaike information criterion (𝐴𝐼𝐶), the Bayesian infor-
mation criterion (𝐵𝐼𝐶), cross-validation (𝐶𝑉), and bootstrap have been used for variable selection
and choosing tuning parameters in regularisation techniques (Hastie et al., 2009). The 𝐶 𝑝 , the 𝐴𝐼𝐶,
and the 𝐵𝐼𝐶 methods are estimators of in-sample prediction error. The 𝐶𝑝 criterion adjusts the
training error in proportion to the number of basis functions used, whereas the 𝐴𝐼𝐶 criterion uses
a log-likelihood loss function instead. Unlike the 𝐴𝐼𝐶, the 𝐵𝐼𝐶 gives preference to uncomplicated
models in variable selection over complex ones, which are penalised heavily. In contrast, some
out-of-sample estimators of prediction errors include the 𝐶𝑉 and bootstrap methods as examples.
The method of 𝐶𝑉 is widely used to choose the tuning parameters (𝜆 𝑚𝑖𝑛 ) in the literature. In the
regularisation and penalisation techniques, some criteria are used with 𝐶𝑉 criteria to select variables.
In the 𝐶𝑉 technique, estimates from the training set are compared to the rest of the data (validation
set).
• We carry out a detailed comparative study of the performances of 𝑅𝑅-based adaptive weights
and 𝑄𝑅𝑅-based adaptive weights under different levels of collinearity at different distribution
scenarios, namely:
– mixed collinearity (three predictor variables are highly correlated and the other two are
not);
– moderate collinearity (all five predictor variables have moderate correlations);
– high collinearity (all five predictor variables have high or severe correlations, i.e., above
0.80).
• The adaptive weights are based on the 𝑅𝑅 and 𝑄𝑅𝑅 coefficients. The 𝑅𝑅-based adaptive
weight is a global estimate, as suggested in the literature, compared to the 𝑄𝑅𝑅-based adaptive
weights, which are local. The 𝑄𝑅𝑅-based adaptive weights are different at each quantile level.
The adaptive variable selection and regularisation procedures based on these adaptive weights
are the 𝑄𝑅-𝐴𝑅, 𝑄𝑅-𝐴𝐿 𝐴𝑆𝑆𝑂, and 𝑄𝑅-𝐴𝐸-𝑁 𝐸𝑇 procedures.
• We use simulation studies and an example from the literature to carry out a comparative study
of adaptive weights using penalised variable selection and regularisation procedures in the 𝑄𝑅
framework, namely; 𝑄𝑅-𝐴𝑅, 𝑄𝑅-𝐴𝐿 𝐴𝑆𝑆𝑂, and 𝑄𝑅-𝐴𝐸-𝑁 𝐸𝑇. A better performance by the
regularisation procedure translates to a better performance by the adaptive weights.
The rest of the article is organised as follows. Section 2 reviews the adaptive weights for penalised
procedures, namely, 𝑅𝑅-based adaptive weights and 𝑄𝑅𝑅-based adaptive weights. In Section 2.1,
we review the adaptive penalised 𝑄𝑅 variable selection and regularisation techniques, namely, 𝑄𝑅-
𝐴𝑅, 𝑄𝑅-𝐴𝐿 𝐴𝑆𝑆𝑂, and 𝑄𝑅-𝐴𝐸-𝑁 𝐸𝑇. Simulations are done in Section 3, with simulation results
discussed in Sections 3.1 and 3.2 and examples discussed in Section 3.3. We conclude with a
discussion in Section 4.
Consider the linear regression model

𝑦𝑖 = x𝑖′β + 𝜖𝑖 , 𝑖 = 1, 2, . . . , 𝑛, (1)

where 𝑦𝑖 is the 𝑖th entry of the response vector Y, x𝑖′ the 𝑖th row vector of the 𝑛 × 𝑝 design matrix
X, β is the vector of parameters to be estimated from the data, and 𝜖 𝑖 ∼ 𝐹, the 𝑖th error term. The
𝑅𝑅 estimator with an ℓ2 penalty (Hoerl and Kennard, 1970) for the coefficient vector β in (1), is
given by the minimisation problem
\hat{\boldsymbol{\beta}}_{RR} = \underset{\boldsymbol{\beta} \in \mathbb{R}^p}{\operatorname{argmin}} \sum_{i=1}^{n} (y_i - \mathbf{x}_i'\boldsymbol{\beta})^2 + n\lambda \sum_{j=1}^{p} \beta_j^2, \quad j = 1, 2, \dots, p, \; i = 1, 2, \dots, n, \qquad (2)
where 𝜆 is a positive tuning parameter in the range 0 < 𝜆 < 1, the second term is the penalty term,
and β is a vector of parameters, found using the ridge trace. The 𝑅𝑅 estimator is the most popular
regularisation procedure that deals with collinearity, though its drawbacks are bias and instability,
culminating from its dependence on 𝜆 (Muniz and Kibria, 2009). As 𝜆 → 0, 𝛽(𝜆) → 𝛽𝐿𝑆 , the least
squares estimator, which is unbiased for 𝛽. The best 𝜆 value is when the system stabilises with orthogonal
characteristics and the issue of incorrect signs of coefficients and the inflated sum of squared errors
(𝑆𝑆𝐸) is resolved.
Consider the 𝑅𝑅 solution to (1); the first adaptive weight (𝑅𝑅𝑊) is then given by

\omega_j = \left( |\hat{\beta}_{RR,j}| + 1/n \right)^{-\gamma}, \quad j = 1, 2, \dots, p, \qquad (3)

where \hat{\beta}_{RR,j} is the 𝑗th 𝑅𝑅 parameter estimate, 1/𝑛 is added to avoid division by a near-zero term, and 𝛾 > 0. Frommlet and Nuel (2016) proposed the adaptive weights \omega_j = \left( |\hat{\beta}_{RR,j}|^{\gamma} + \delta^{\gamma} \right)^{(\theta - 2)/\gamma}, which translate to (3) when 𝜃 = 1, 𝛿 = 1/𝑛 and 𝛾 = 1.
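A short sketch of (3) in R, assuming the ridge coefficients are obtained with cv.glmnet (alpha = 0) and 𝛾 = 1; the design matrix and response below are placeholders.

library(glmnet)
set.seed(1)
n <- 60; p <- 5
X <- matrix(rnorm(n * p), n, p)                          # placeholder design matrix
y <- drop(X %*% c(0.9, 0, 0, 0, 0.5)) + rnorm(n)         # placeholder response

cv_rr   <- cv.glmnet(X, y, alpha = 0)                    # 10-fold CV ridge fit
beta_rr <- as.numeric(coef(cv_rr, s = "lambda.min"))[-1] # drop the intercept
gamma   <- 1
omega   <- (abs(beta_rr) + 1 / n)^(-gamma)               # RR-based adaptive weights (3)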
We introduce the second adaptive weights by first stating a 𝑅𝑅 penalised quantile regression (𝑄𝑅𝑅)
(Hoerl and Kennard, 1970; Koenker and Bassett Jr, 1978). The 𝑄𝑅𝑅 procedure is the minimisation
problem
\hat{\boldsymbol{\beta}}(\tau)_{QRR} = \underset{\boldsymbol{\beta} \in \mathbb{R}^p}{\operatorname{argmin}} \sum_{i=1}^{n} \rho_\tau\!\left( y_i - \mathbf{x}_i'\boldsymbol{\beta}(\tau) \right) + n\lambda \sum_{j=1}^{p} \beta_j^2, \qquad (4)
where β̂(𝜏)𝑄𝑅𝑅 𝑗 is the 𝑗th coefficient estimate at the 𝜏th regression quantile (𝑅𝑄) level, and 𝜆 is
the tuning parameter. The check function,
\rho_\tau(u) = \begin{cases} \tau u, & u \ge 0, \\ (\tau - 1)u, & u < 0, \end{cases}

assigns asymmetric weights to positive and negative residuals. By analogy with (3), the second (𝑄𝑅𝑅-based) adaptive weight is

\tilde{\omega}_j = \left( |\hat{\beta}(\tau)_{QRR,j}| + 1/n \right)^{-\gamma}, \quad j = 1, 2, \dots, p, \qquad (5)

where 𝜔̃𝑗 are the 𝑄𝑅𝑅-based adaptive weights at a specified 𝜏 quantile level and other terms are
defined in (4). The adaptive weights, 𝜔˜ 𝑗 can be adjusted to a particular distribution and to all 𝜏
quantile levels.
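Analogously, a base-R sketch of the QRR minimisation in (4) and the corresponding 𝑄𝑅𝑅-based adaptive weights 𝜔̃𝑗; the objective is minimised with a general-purpose optimiser purely for illustration, and the data are placeholders.

set.seed(1)
n <- 60; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(0.9, 0, 0, 0, 0.5)) + rnorm(n)

check <- function(u, tau) ifelse(u >= 0, tau * u, (tau - 1) * u)      # check function rho_tau

qrr_obj <- function(beta, X, y, tau, lambda) {
  sum(check(y - drop(X %*% beta), tau)) + nrow(X) * lambda * sum(beta^2)  # objective in (4)
}

fit_qrr     <- optim(rep(0, p), qrr_obj, X = X, y = y, tau = 0.5, lambda = 0.1)
omega_tilde <- (abs(fit_qrr$par) + 1 / n)^(-1)                        # QRR-based weights, gamma = 1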
A penalty in regularisation procedures takes the form of a bridge penalty term, \sum_{j=1}^{p} |\beta_j|^q. When
𝑞 = 1 and 𝑞 = 2, the bridge penalty becomes the 𝐿𝐴𝑆𝑆𝑂 and 𝑅𝑅 penalties, respectively, as special
cases. A combination of the 𝐿𝐴𝑆𝑆𝑂 and 𝑅𝑅 penalties results in the 𝐸-𝑁𝐸𝑇 penalty, which inherits
their respective properties. The inclusion of the adaptive weights from (3) and (5) results in the adaptive
bridge penalty, \sum_{j=1}^{p} \varphi |\beta_j|^q. The 𝐴𝐿𝐴𝑆𝑆𝑂 and 𝐴𝑅 penalties are special cases when 𝑞 = 1 and
𝑞 = 2, and their combination results in the 𝐴𝐸-𝑁𝐸𝑇, where 𝜑 ∈ {𝜔𝑗 , 𝜔̃𝑗} is the adaptive weight. These
adaptive weights can be applied to both the least squares (𝐿𝑆) and 𝑄𝑅 scenarios.
(the 𝐿𝑆 and 𝑄𝑅-based adaptive weights). For further reading on adaptive weights, the reader is
referred to 𝐴𝐿 𝐴𝑆𝑆𝑂 (Zou, 2006) and 𝐴𝐸-𝑁 𝐸𝑇 (Zou and Zhang, 2009).
We now present the adaptive penalised $QR$ regularisation and variable selection procedures with the adaptive weights given in (3) and (5). Consider a $QR$ with an $AE$-$NET$ penalty, denoted by $QR$-$AE$-$NET$ (see also Zou and Zhang, 2009, for the $LS$ version of the $AE$-$NET$ regularisation procedure). The $QR$-$AE$-$NET$ procedure has both the $\ell_1$ and ridge penalties, hence it is an extension of both the adaptive $LASSO$ penalised $QR$ ($QR$-$ALASSO$) and the $AR$ penalised $QR$ ($QR$-$AR$) procedures, and it inherits some attractive properties from both. The $QR$-$AE$-$NET$ procedure is given by the minimisation problem
$$\hat{\boldsymbol{\beta}}(\tau) = \underset{\boldsymbol{\beta}\in\mathbb{R}^p}{\arg\min} \sum_{i=1}^{n}\rho_\tau\left(y_i - \mathbf{x}_i'\boldsymbol{\beta}(\tau)\right) + \alpha\lambda\sum_{j=1}^{p}\varphi_j|\beta_j| + (1-\alpha)\lambda\sum_{j=1}^{p}\varphi_j\beta_j^2, \qquad (6)$$
where $\varphi_j$ is one of the adaptive weights $\omega_j$ and $\tilde{\omega}_j$, $\alpha \in [0, 1]$ is the mixing parameter, giving $QR$-$AR$ ($\alpha = 0$) and $QR$-$ALASSO$ ($\alpha = 1$) as special cases, and $\lambda$ is the tuning parameter for the two adaptive penalties. In this article, the $\ell_1$ and ridge penalties receive equal weight in $QR$-$AE$-$NET$ by setting the mixing parameter to $\alpha = 0.50$. The effective tuning parameter $\lambda_j = \lambda\varphi_j$ varies over $j = 1, 2, \ldots, p$, so coefficients are shrunk towards zero by different amounts. Equations (1)–(5) define the other terms. The $QR$-$ALASSO$ and $QR$-$AE$-$NET$ procedures inherit the desired optimal minimax bound from the $ALASSO$ (see Zou, 2006), and the procedures are also robust in the presence of collinearity. Under suitable conditions, these variable selection and regularisation techniques satisfy the sparsity property, and the estimators are asymptotically normal in the $QR$ setting.
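The following base R sketch writes the $QR$-$AE$-$NET$ objective in (6) as a function that could, in principle, be minimised directly; a general-purpose optimiser will not produce exact zeros, so this only illustrates the structure of (6), not the variable selection behaviour or the solver used in the study.

rho_tau <- function(u, tau) ifelse(u >= 0, tau * u, (tau - 1) * u)

# Objective in (6) for given adaptive weights phi (omega_j or omega-tilde_j)
qr_aenet_obj <- function(b, X, y, tau, lambda, phi, alpha = 0.5) {
  sum(rho_tau(y - X %*% b, tau)) +
    alpha * lambda * sum(phi * abs(b)) +        # weighted l1 (ALASSO) part
    (1 - alpha) * lambda * sum(phi * b^2)       # weighted ridge (AR) part
}

# alpha = 0 gives QR-AR, alpha = 1 gives QR-ALASSO, alpha = 0.5 weights both equally
fit_qr_aenet <- function(X, y, tau, lambda, phi, alpha = 0.5) {
  optim(rep(0, ncol(X)), qr_aenet_obj, X = X, y = y, tau = tau,
        lambda = lambda, phi = phi, alpha = alpha,
        control = list(maxit = 10000))$par
}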
3. Simulation study
In this section, we compare the performances of 𝑅𝑅 and 𝑄𝑅𝑅-based adaptive weights (𝜔 𝑗 and
𝜔˜ 𝑗 ) under penalised 𝑄𝑅 procedures. These adaptive weights are compared in terms of their ability
to improve the performance of 𝐴𝑅, 𝐴𝐿 𝐴𝑆𝑆𝑂, and 𝐴𝐸-𝑁 𝐸𝑇 penalised 𝑄𝑅 procedures (𝑄𝑅-𝐴𝑅,
𝑄𝑅-𝐴𝐿 𝐴𝑆𝑆𝑂, and 𝑄𝑅-𝐴𝐸-𝑁 𝐸𝑇) in variable selection and regularisation at 𝜏 ∈ (0.25, 0.50, 0.75)
𝑅𝑄 levels. The simulation results are summarised in terms of the 𝑀 𝐴𝐷 of test errors, the percentage
of correctly fitted models, and the average of correct zero coefficients.
(1) Generate the matrix $\mathbf{Z}$ with five variables (Gibbons, 1981), where
$$Z_{ij} \sim N(0, 1), \quad i = 1, 2, \ldots, n, \; j = 1, 2, \ldots, 5, \qquad (7)$$
and
$$X_{2ij} = (1 - \theta^{*2})^{1/2} Z_{ij} + \theta^* Z_{i5}, \quad i = 1, 2, \ldots, n, \; j = 4, 5. \qquad (9)$$
(3) Form the $60 \times 5$ design matrix $\mathbf{X} = (\mathbf{X}_{1ij}, \mathbf{X}_{2ij})$, resulting in severe/high collinearity ($\theta = \theta^* = 0.90$), moderate collinearity ($\theta = \theta^* = 0.7$), and mixed collinearity ($\theta = 0.90$ and $\theta^* = 0.1$), where $\theta$ is the theoretical correlation between any pair of the first three variables, and $\theta^*$ is the theoretical correlation between $X_4$ and $X_5$. The coefficients are such that $\beta_0 = 0$, and $\boldsymbol{\beta}$ is the eigenvector corresponding to the largest eigenvalue of $\mathbf{X}^{*\prime}\mathbf{X}^*$, where $\mathbf{X}^*$ is the standardised design matrix and $\mathbf{X}^{*\prime}\mathbf{X}^*$ is in correlation form.
The response is generated from the model in (1), $y_i = \mathbf{x}_i'\boldsymbol{\beta} + \epsilon_i$, where $n = 60$, $\epsilon_i \sim t_d$ is the error term ($d$ the degrees of freedom, with $d \in \{6, 20\}$), and $\mathbf{x}_i'$ is the $i$th row of the design matrix $\mathbf{X}$. The coefficient vector $\boldsymbol{\beta}$ is given by $\boldsymbol{\beta} = (0.9, 0, 0, 0, 0.5)$ for the mixed collinearity scenario, $\boldsymbol{\beta} = (0.9, 0, 0.7, 0, 0.6)$ for the moderate collinearity scenario, and $\boldsymbol{\beta} = (0.9, 0.7, 0, 0, 0.6)$ for the high collinearity scenario. $QR$ is robust to outliers, since the $RQ$ influence functions are bounded in the response variable, and $QR$ is designed to handle heavy-tailed distributions such as $t_d$. We employed 200 simulation runs and 10-fold cross-validation to obtain the tuning parameters; a sketch of one simulation run is given below.
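A minimal sketch of one simulation run for the high collinearity scenario, assuming the stated interpretation of $\theta$ and $\theta^*$ as pairwise correlations; MASS::mvrnorm is used here as a convenient stand-in for the Gibbons (1981) construction, and the seed and values shown are illustrative.

library(MASS)
set.seed(1)
n <- 60; theta <- 0.9; theta_star <- 0.9        # high collinearity scenario
Sigma <- diag(5)
Sigma[1:3, 1:3] <- theta; diag(Sigma) <- 1      # correlation theta among X1, X2, X3
Sigma[4, 5] <- Sigma[5, 4] <- theta_star        # correlation theta* between X4 and X5
X <- mvrnorm(n, mu = rep(0, 5), Sigma = Sigma)

beta <- c(0.9, 0.7, 0, 0, 0.6)                  # high collinearity coefficient vector
d <- 6                                          # degrees of freedom of the t errors
y <- X %*% beta + rt(n, df = d)                 # responses for one simulation run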
3.2 Results
We compare the performance of two adaptive weights (𝜔 𝑗 and 𝜔˜ 𝑗 ) applied to penalised adaptive
𝑄𝑅 techniques in variable selection and regularisation in the presence of collinearity. The simulated
results are summarised and discussed in this section. Tables 1, 2, 3 and 4 show the performance
of two adaptive weights (𝜔 𝑗 and 𝜔˜ 𝑗 ) when applied to different variable selection and regularisation
procedures ($QR$-$AR$, $QR$-$ALASSO$, and $QR$-$AE$-$NET$) using the $MAD$ of test errors, the percentage of correctly fitted models, and the average number of correct zero coefficients at the $\tau \in \{0.25, 0.50, 0.75\}$ $RQ$ levels and $d \in \{6, 20\}$ degrees of freedom (see also Figure 1). The performance of these penalised $QR$ techniques gauges the performance of the corresponding adaptive weights. The $MAD$ of test errors is given by $MAD = 1.4826\,\mathrm{median}_i\left(|e_i - \mathrm{median}_i\{e_i\}|\right)$, for $1 \leq i \leq n$.
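For reference, this is equivalent to R's mad() with its default consistency constant; the helper below is a direct transcription of the formula.

mad_test_error <- function(e) 1.4826 * median(abs(e - median(e)))   # MAD of test errors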
the two adaptive weights perform equally well in correctly fitting the models 100% of the time under
the 𝑄𝑅-𝐴𝑅 technique.
Remark 1. Tables 1, 2, 3, and 4 show the performance of the adaptive weights $\omega_j$ and $\tilde{\omega}_j$ in penalised $QR$ procedures at the $\tau \in \{0.25, 0.50, 0.75\}$ $RQ$ levels. A better performance by a penalised $QR$ procedure indicates a better performance by the corresponding adaptive weights.
3.3 Examples
We compare the performance of the adaptive weights $\omega_j$ and $\tilde{\omega}_j$ under the adaptive $QR$ variable selection and regularisation procedures $QR$-$ALASSO$ and $QR$-$AE$-$NET$, using the Jet-Turbine Engine (Montgomery et al., 2009) data set. The 40-observation Jet-Turbine Engine data are known to be correlated (see also Bagheri and Midi, 2012). In this data set, primary speed of rotation ($X_1$), secondary speed of rotation ($X_2$), fuel flow rate ($X_3$), pressure ($X_4$), exhaust temperature ($X_5$), and ambient temperature at time of test ($X_6$) are the predictor variables, with a response variable ($Y$). We generate the response variable by $Y_i = \mathbf{X}_i'\boldsymbol{\beta} + \epsilon_i$, where $\epsilon_i \sim t_d$ ($d \in \{6, 20\}$) is the error term, $\mathbf{X}_i'$ is the $i$th row of the design matrix $\mathbf{X}$, which is in correlation form, and $\boldsymbol{\beta} = (0, 0, 0, 6, 0, -3)'$ is the vector of parameters. Results are reported only at $\tau \in \{0.25, 0.50\}$, since similar results were found at the $\tau = 0.75$ $RQ$ level.
The estimated $QR$ coefficients of the $QR$-$ALASSO$ and $QR$-$AE$-$NET$ procedures based on the adaptive weights $\omega_j$ and $\tilde{\omega}_j$, together with the coefficient biases, are presented in Tables 5 and 6. Zero coefficients are shrunk to zero or near zero in both scenarios (100% of the time) for all adaptive weights. The adaptive weights $\omega_j$ yield marginally better results than $\tilde{\omega}_j$ under $QR$-$ALASSO$ when $d \in \{6, 20\}$ at the $\tau = 0.25$ $RQ$ level. At the same $RQ$ level, $\tilde{\omega}_j$ yields marginally better results than $\omega_j$ under the $QR$-$AE$-$NET$ scenario.
Table 1. Performance of adaptive weights in 𝑄𝑅-𝐴𝐿 𝐴𝑆𝑆𝑂 at mixed, moderate, and high collinearity
scenarios under the heavy-tailed t-distributions when 𝑑 = 6 & 𝑑 = 20 degrees of freedom.
Adaptive weight   Median (MAD) test error   Correctly fitted   Average no. of correct zeros   Average no. of incorrect zeros   Median(𝜆)
𝑑 = 6, 𝜏 = 0.25
𝜔𝑗 0.78(1.17) 71.00 2.94 0.29 0.01
Mixed collinearity
𝜔˜ 𝑗 0.76(1.18) 58.00 2.90 0.42 0.00
𝜔𝑗 0.82(1.27) 58.50 1.70 0.20 0.01
Moderate collinearity
𝜔˜ 𝑗 0.81(1.29) 38.50 1.44 0.27 0.01
𝜔𝑗 0.83(1.30) 51.00 1.94 0.50 0.01
High collinearity
𝜔˜ 𝑗 0.83(1.30) 40.00 1.94 0.66 0.01
𝑑 = 6, 𝜏 = 0.50
𝜔𝑗 -0.04(1.16) 74.00 2.98 0.27 0.01
Mixed collinearity
𝜔˜ 𝑗 -0.04(1.16) 77.50 2.96 0.22 0.01
𝜔𝑗 0.01(1.22) 58.50 1.66 0.13 0.01
Moderate collinearity
𝜔˜ 𝑗 0.00(1.22) 79.00 1.87 0.09 0.01
𝜔𝑗 0.02(1.27) 52.50 1.94 0.43 0.01
High collinearity
𝜔˜ 𝑗 0.02(1.27) 58.00 1.98 0.41 0.02
𝑑 = 6, 𝜏 = 0.75
𝜔𝑗 -0.88(1.24) 58.00 2.96 0.52 0.01
Mixed collinearity
𝜔˜ 𝑗 -0.88(1.24) 60.00 2.95 0.48 0.01
𝜔𝑗 -0.81(1.24) 54.00 1.68 0.25 0.01
Moderate collinearity
𝜔˜ 𝑗 -0.80(1.26) 57.00 1.68 0.20 0.01
𝜔𝑗 -0.78(1.31) 48.00 1.87 0.45 0.01
High collinearity
𝜔˜ 𝑗 -0.77(1.30) 52.50 1.83 0.39 0.01
𝑑 = 20, 𝜏 = 0.25
𝜔𝑗 0.75(1.13) 62.00 2.98 0.39 0.01
Mixed collinearity
𝜔˜ 𝑗 0.73(1.14) 65.00 2.92 0.30 0.01
𝜔𝑗 0.71(1.23) 47.50 1.42 0.08 0.01
Moderate collinearity
𝜔˜ 𝑗 0.72(1.23) 47.50 1.42 0.06 0.01
𝜔𝑗 0.25(1.16) 61.00 1.85 0.29 0.01
High collinearity
𝜔˜ 𝑗 0.74(1.16) 66.50 1.93 0.30 0.01
𝑑 = 20, 𝜏 = 0.50
𝜔𝑗 0.00(1.14) 69.00 2.97 0.30 0.01
Mixed collinearity
𝜔˜ 𝑗 0.00(1.13) 73.00 2.98 0.27 0.01
𝜔𝑗 -0.02(1.18) 53.00 1.51 0.03 0.01
Moderate collinearity
𝜔˜ 𝑗 -0.01(1.19) 65.00 1.66 0.04 0.01
𝜔𝑗 0.00(1.18) 70.50 1.88 0.21 0.01
High collinearity
𝜔˜ 𝑗 0.00(1.17) 76.50 1.91 0.16 0.02
Table 2. Performance of adaptive weights in 𝑄𝑅-𝐴𝐸-𝑁 𝐸𝑇 at mixed, moderate, and high collinearity
scenarios under the heavy-tailed t-distributions when 𝑑 = 6 & 𝑑 = 20 degrees of freedom.
Adaptive weight   Median (MAD) test error   Correctly fitted   Average no. of correct zeros   Average no. of incorrect zeros   Median(𝜆)
𝑑 = 6, 𝜏 = 0.25
𝜔𝑗 0.81(1.21) 62.00 2.80 0.26 0.01
Mixed collinearity
𝜔˜ 𝑗 0.80(1.21) 46.50 2.70 0.36 0.01
𝜔𝑗 0.81(1.31) 21.00 0.94 0.04 0.02
Moderate collinearity
𝜔˜ 𝑗 0.82(1.31) 13.00 0.86 0.05 0.02
𝜔𝑗 0.85(1.31) 13.50 0.74 0.01 0.02
High collinearity
𝜔˜ 𝑗 0.85(1.30) 10.50 0.63 0.01 0.02
𝑑 = 6, 𝜏 = 0.50
𝜔𝑗 -0.03(1.19) 69.00 2.84 0.20 0.01
Mixed collinearity
𝜔˜ 𝑗 -0.03(1.18) 72.00 2.88 0.20 0.01
𝜔𝑗 0.00(1.25) 22.50 0.97 0.01 0.02
Moderate collinearity
𝜔˜ 𝑗 -0.01(1.25) 53.00 1.46 0.01 0.02
𝜔𝑗 0.01(1.28) 12.50 0.68 0.00 0.02
High collinearity
𝜔˜ 𝑗 0.02(1.27) 36.50 1.33 0.00 0.03
𝑑 = 6, 𝜏 = 0.75
𝜔𝑗 -0.90(1.26) 57.50 2.86 0.44 0.01
Mixed collinearity
𝜔˜ 𝑗 -0.89(1.25) 62.50 2.90 0.42 0.01
𝜔𝑗 -0.83(1.29) 28.50 1.04 0.03 0.02
Moderate collinearity
𝜔˜ 𝑗 -0.81(1.28) 28.50 1.17 0.02 0.02
𝜔𝑗 -0.79(1.33) 11.50 0.70 0.00 0.02
High collinearity
𝜔˜ 𝑗 -0.77(1.32) 9.00 0.80 0.00 0.02
𝑑 = 20, 𝜏 = 0.25
𝜔𝑗 0.75(1.15) 65.00 2.92 0.31 0.01
Mixed collinearity
𝜔˜ 𝑗 0.74(1.16) 53.50 2.69 0.25 0.01
𝜔𝑗 0.71(1.24) 15.50 0.86 0.01 0.02
Moderate collinearity
𝜔˜ 𝑗 0.71(1.25) 11.50 0.70 0.01 0.02
𝜔𝑗 0.75(1.15) 10.00 0.60 0.01 0.02
High collinearity
𝜔˜ 𝑗 0.75(1.15) 29.50 1.05 0.00 0.02
𝑑 = 20, 𝜏 = 0.50
𝜔𝑗 0.01(1.14) 70.00 2.90 0.23 0.01
Mixed collinearity
𝜔˜ 𝑗 -0.01(1.14) 70.00 2.95 0.26 0.01
𝜔𝑗 -0.01(1.18) 13.50 0.80 0.00 0.02
Moderate collinearity
𝜔˜ 𝑗 -0.02(1.18) 22.00 0.95 0.00 0.02
𝜔𝑗 0.00(1.17) 7.00 0.53 0.00 0.02
High collinearity
𝜔˜ 𝑗 0.00(1.16) 14.00 0.67 0.00 0.03
Table 3. Performance of adaptive weights of the 𝑄𝑅-𝐴𝑅 procedure at mixed, moderate, and high
collinearity scenarios under the heavy-tailed t-distributions when 𝑑 = 6 & 𝑑 = 20 degrees of freedom.
Figure 1. The stacked bar charts show the performance of the weights 𝜔 𝑗 and 𝜔˜ 𝑗 at different collinearity levels. For each pair of stacked bars, the first bar represents the performance of 𝜔 𝑗 (𝑅𝑅𝑊) and the second represents the performance of 𝜔˜ 𝑗 (𝑄𝑅𝑅𝑊). The second graph shows the performance of the two weights, where the blue line is for 𝜔 𝑗 and the red line is for 𝜔˜ 𝑗 .
Table 4. Performance of adaptive weights of 𝑄𝑅-𝐴𝐿 𝐴𝑆𝑆𝑂, 𝑄𝑅-𝐴𝐸-𝑁 𝐸𝑇, and 𝑄𝑅-𝐴𝑅 procedures
at mixed, moderate, and high collinearity scenarios under the heavy-tailed 𝑡-distributions when
𝑑 = 20 degrees of freedom.
Adaptive weight   Median (MAD) test error   Correctly fitted   Average no. of correct zeros   Average no. of incorrect zeros   Median(𝜆)
𝑄𝑅-𝐴𝐿 𝐴𝑆𝑆𝑂, 𝑑 = 20, 𝜏 = 0.75
𝜔𝑗 -0.74(1.14) 63.50 2.96 0.38 0.01
Mixed collinearity
𝜔˜ 𝑗 -0.73(1.14) 65.50 2.92 0.32 0.00
𝜔𝑗 -0.78(1.19) 52.50 1.49 0.08 0.01
Moderate collinearity
𝜔˜ 𝑗 -0.79(1.19) 53.00 1.58 0.13 0.01
𝜔𝑗 -0.74(1.18) 54.00 1.86 0.37 0.01
High collinearity
𝜔˜ 𝑗 -0.74(1.18) 55.50 1.80 0.31 0.01
𝑄𝑅-𝐴𝐸-𝑁 𝐸𝑇, 𝑑 = 20, 𝜏 = 0.75
𝜔𝑗 -0.75(1.16) 68.50 2.92 0.29 0.01
Mixed collinearity
𝜔˜ 𝑗 -0.75(1.16) 62.50 2.81 0.26 0.00
𝜔𝑗 -0.78(1.20) 13.00 0.83 0.01 0.02
Moderate collinearity
𝜔˜ 𝑗 -0.78(1.19) 21.50 1.00 0.02 0.02
𝜔𝑗 -0.75(1.17) 10.50 0.57 0.01 0.02
High collinearity
𝜔˜ 𝑗 -0.75(1.17) 5.00 0.44 0.01 0.02
𝑄𝑅-𝐴𝑅, 𝑑 = 20, 𝜏 = 0.75
𝜔𝑗 -0.76(1.19) 0.00 0.51 0.00 0.03
Mixed collinearity
𝜔˜ 𝑗 -0.75(1.19) 0.00 0.55 0.00 0.01
𝜔𝑗 -0.85(1.24) 0.00 0.03 0.00 0.05
Moderate collinearity
𝜔˜ 𝑗 -0.85(1.24) 0.00 0.02 0.00 0.07
𝜔𝑗 -0.80(1.21) 0.00 0.00 0.00 0.05
High collinearity
𝜔˜ 𝑗 -0.80(1.21) 0.00 0.00 0.00 0.04
Table 5. Estimated coefficients and biases for the Jet-Turbine Engine data set with 𝑑 = 6.
Adaptive weight   𝛽   QR-ALASSO 𝛽(Bias), 𝜏 = 0.25   QR-AE-NET 𝛽(Bias), 𝜏 = 0.25   QR-ALASSO 𝛽(Bias), 𝜏 = 0.50   QR-AE-NET 𝛽(Bias), 𝜏 = 0.50
-0.72 -3.20(2.48) -11.08(10.36) 35.71(-36.43) 35.71(-36.43)
0.00 0.00(0.00) 0.04(-0.04) 0.01(-0.01) 0.01(0.01)
0.00 0.00(0.00) 0.00(0.00) 0.00(0.00) 0.00(0.00)
𝜔𝑗 0.00 0.00(0.00) 0.00(0.00) 0.00(0.00) 0.00(0.00)
6.00 6.00(0.00) 5.62(0.38) 5.95(0.05) 5.95(0.05)
0.00 0.00(0.00) -0.03(0.03) 0.01(-0.01) 0.01(-0.01)
-3.00 -2.97(-0.03) -3.11(0.11) -2.95(-0.05) -2.95(-0.05)
0.00 -7.30(7.30) -11.51(11.51) 35.71(-35.71) 35.71(-35.71)
0.00 0.00(0.00) 0.04(-0.04) 0.01(-0.01) 0.01(-0.01)
0.00 0.00(0.00) 0.00(0.00) 0.00(0.00) 0.00(0.00)
𝜔˜ 𝑗 0.00 0.00(0.00) 0.00(0.00) 0.00(0.00) 0.00(0.00)
6.00 6.00(0.00) 5.62(0.38) 5.96(0.04) 5.95(0.05)
0.00 0.00(0.00) -0.03(0.03) 0.01(-0.01) 0.01(-0.01)
-3.00 -2.93(0.07) -3.10(0.10) -2.95(-0.05) -2.95(-0.05)
1 The coefficients are estimated at 𝜏 = (0.25, 0.50) 𝑅𝑄 levels for each of the adaptive weights 𝜔 𝑗 and 𝜔˜ 𝑗 of the penalised
𝑄𝑅 procedures.
However, at 𝜏 = 0.50 and 𝑑 ∈ {6, 20}, the two adaptive weights perform the same.
4. Discussion
This article compared the 𝑄𝑅𝑅-based adaptive weights 𝜔˜ 𝑗 and the 𝑅𝑅-based adaptive weights 𝜔 𝑗 .
These adaptive weights are used to formulate some variable selection and regularisation procedures
(𝑄𝑅-𝐴𝐿 𝐴𝑆𝑆𝑂, 𝑄𝑅-𝐴𝐸-𝑁 𝐸𝑇, and 𝑄𝑅-𝐴𝑅). The adaptive weights 𝜔˜ 𝑗 have the advantage that each
weight is different at each 𝑅𝑄 level as compared to constant weights for all quantile levels in the case
of 𝜔 𝑗 (Mudhombo and Ranganai, 2022).
A simulation study is used to compare the adaptive weights based on their performance in the
mixed, moderate, and high collinearity scenarios. We compare the performance of the adaptive
weights 𝜔 𝑗 and 𝜔˜ 𝑗 by checking the performance of the 𝑄𝑅-𝐴𝑅, 𝑄𝑅-𝐴𝐿 𝐴𝑆𝑆𝑂 and 𝑄𝑅-𝐴𝐸-𝑁 𝐸𝑇
procedures in variable selection and regularisation.
In the presence of mixed collinearity (a combination of very high and very low collinearity), the
adaptive weights 𝜔˜ 𝑗 outperform the weights 𝜔 𝑗 at the median quantiles, while the latter is better in
the lower quantiles in terms of prediction under 𝑄𝑅-𝐴𝐿 𝐴𝑆𝑆𝑂 procedure. The 𝑄𝑅𝑅-based adaptive
weights are superior in correctly fitting models and in correctly shrinking zero coefficients. When
the 𝑄𝑅-𝐴𝐸-𝑁 𝐸𝑇 procedure is used, 𝜔˜ 𝑗 outperforms 𝜔 𝑗 in prediction. The adaptive weights perform
the same in prediction under the 𝑄𝑅-𝐴𝑅 scenario.
In the moderate collinearity situation under the 𝑄𝑅-𝐴𝐿 𝐴𝑆𝑆𝑂 scenario, the two adaptive weights
Table 6. Estimated coefficients and biases for the Jet-Turbine Engine data set with 𝑑 = 20.
Adaptive weight   𝛽   QR-ALASSO 𝛽(Bias), 𝜏 = 0.25   QR-AE-NET 𝛽(Bias), 𝜏 = 0.25   QR-ALASSO 𝛽(Bias), 𝜏 = 0.50   QR-AE-NET 𝛽(Bias), 𝜏 = 0.50
-0.72 1.06(-1.78) 24.17(-24.89) -35.37(34.65) -35.37(34.65)
0.00 0.00(0.00) 0.04(-0.04) 0.02(-0.02) 0.02(-0.02)
0.00 0.00(0.00) 0.00(0.00) 0.00(0.00) 0.00(0.00)
𝜔𝑗 0.00 0.00(0.00) 0.00(0.00) 0.00(0.00) 0.00(0.00)
6.00 5.93(0.07) 5.55(0.45) 5.84(0.16) 5.84(0.16)
0.00 0.01(-0.01) 0.01(-0.01) 0.00(0.00) 0.00(0.00)
-3.00 -3.11(0.11) -3.22(0.22) -3.16(0.16) -3.16(0.16)
0.00 -4.03(4.03) 54.84(-54.84) -35.37(35.37) -35.37(35.37)
0.00 0.00(0.00) 0.04(-0.04) 0.02(-0.02) 0.02(-0.02)
0.00 0.00(0.00) 0.00(0.00) 0.00(0.00) 0.00(0.00)
𝜔˜ 𝑗 0.00 0.00(0.00) 0.00(0.00) 0.00(0.00) 0.00(0.00)
6.00 5.92(0.08) 5.56(0.44) 5.84(0.16) 5.84(0.16)
0.00 0.01(-0.01) 0.01(-0.01) 0.00(0.00) 0.00(0.00)
-3.00 -3.14(0.14) -3.23(0.22) -3.16(0.16) -3.16(0.16)
1 The coefficients are estimated at 𝜏 = (0.25, 0.50) 𝑅𝑄 levels for each of the adaptive weights 𝜔 𝑗 and 𝜔˜ 𝑗 of the penalised
𝑄𝑅 procedures.
perform similarly in prediction performance. Although 𝜔 𝑗 performs better in correctly fitting models
and correctly shrinking zero coefficients at lower quantile levels, 𝜔˜ 𝑗 performs better at 𝜏 = 0.50.
The 𝑄𝑅-𝐴𝐸-𝑁 𝐸𝑇 scenario shows the 𝑅𝑅-based adaptive weights outperforming the 𝑄𝑅𝑅-based adaptive weights in the majority of cases in prediction, though 𝜔˜ 𝑗 is better at correctly fitting models.
The adaptive weights have similar prediction performance most of the time in the presence of
high collinearity, although 𝜔˜ 𝑗 is better at correctly fitting models in the 𝑄𝑅-𝐴𝐿 𝐴𝑆𝑆𝑂 scenario.
The adaptive weights are comparatively similar in the percentage of correctly fitted models in all
scenarios.
References
Adkins, L. C., Waters, M. S., Hill, R. C., et al. (2015). Collinearity diagnostics in gretl. Economics
Working Paper Series, 1506, 1–28.
Arslan, O. (2012). Weighted LAD-LASSO method for robust parameter estimation and variable
selection in regression. Computational Statistics & Data Analysis, 56, 1952–1965.
Bagheri, A. and Midi, H. (2012). On the performance of the measure for diagnosing multiple high
leverage collinearity-reducing observations. Mathematical Problems in Engineering, 2012.
Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle
properties. Journal of the American Statistical Association, 96, 1348–1360.
Frommlet, F. and Nuel, G. (2016). An adaptive ridge procedure for 𝑙 0 regularization. PloS One, 11, e0148620.
Gibbons, D. G. (1981). A simulation study of some ridge estimators. Journal of the American
Statistical Association, 76, 131–139.
Gunst, R. and Mason, R. (1980). Regression analysis and its applications.
Hastie, T., Tibshirani, R., and Friedman, J. H. (2009). The Elements of Statistical Learning: Data
Mining, Inference, and Prediction, volume 2. Springer.
Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal
problems. Technometrics, 12, 55–67.
Koenker, R. and Bassett Jr, G. (1978). Regression quantiles. Econometrica: Journal of the Econometric Society, 46, 33–50.
Montgomery, D. C. (2017). Design and Analysis of Experiments. John Wiley & Sons.
Montgomery, D. C., Runger, G. C., and Hubele, N. F. (2009). Engineering Statistics. John Wiley
& Sons.
Mudhombo, I. and Ranganai, E. (2022). Robust variable selection and regularization in quantile
regression based on adaptive-LASSO and adaptive E-NET. Computation, 10, 203.
Muniz, G. and Kibria, B. G. (2009). On some ridge regression estimators: An empirical compar-
isons. Communications in Statistics – Simulation and Computation, 38, 621–630.
Norouzirad, M., Hossain, S., and Arashi, M. (2018). Shrinkage and penalized estimators
in weighted least absolute deviations regression models. Journal of Statistical Computation and
Simulation, 88, 1557–1575.
Ranganai, E. and Mudhombo, I. (2021). Variable selection and regularization in quantile regression
via minimum covariance determinant based weights. Entropy, 23, 33.
Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal
Statistical Society. Series B (Methodological), 58, 267–288.
Yi, C. (2017). hqreg: Regularization Paths for Lasso or Elastic-Net Penalized Huber Loss Regression
and Quantile Regression. R package version 1.4.
URL: https://CRAN.R-project.org/package=hqreg
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical
Association, 101, 1418–1429.
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of
the Royal Statistical Society, Series B, 67, 301–320.
Zou, H. and Zhang, H. H. (2009). On the adaptive elastic-net with a diverging number of parameters.
Annals of Statistics, 37, 1733–1751.
Proceedings of the 64th Annual Conference of the South African Statistical Association
© 2023 South African Statistical Association
The similarity between spatial point pattern data sets is crucial for evaluating the
quality and changes in spatial data. A generic similarity test has been developed that
is able to handle any type of spatial data. When comparing unmarked point patterns,
the generic test starts by calculating the kernel density estimate which requires a
bandwidth value. In this research, we test the similarity between unmarked point
patterns using this recently proposed generic similarity test. The focus of this work
is to assess the effect that the bandwidth choice has on this similarity test. A
simulation study is done to evaluate the effect of the different bandwidths on the
similarity test. From the simulation study, it is seen that the similarity test can be sensitive to the choice of bandwidth, depending on the number of points being compared and whether the patterns are compared on the same window, but that it is robust in general.
Keywords: Bandwidth, Generic similarity test, Point patterns, Spatial similarity
1. Introduction
Spatial data is data that references a specific location and contains information about variables at
that location. It can take on different forms, namely geostatistical data, lattice data, or point patterns
(Cressie, 2015). Geostatistical data are measurements of spatial data that have been collected at
predetermined locations. Lattice data are observations recorded on a subset of a spatial domain.
Point patterns are the collection of events that take place in a finite number of locations. Point
patterns may be marked or unmarked. If attributes are observed at each location, the data is known
as a marked point pattern, and if only the location is known the data is an unmarked point pattern.
Spatial data sets are declared similar when the spatial data sets originate from the same stochastic
process in terms of their spatial structure (Borrajo et al., 2020). Spatial point patterns and the
similarity between them have become of interest to many researchers, and several tests have been proposed, notably by Andresen (2009) and Alba-Fernández et al. (2016). These tests may
be used to determine how similar spatial point patterns of interest and the population at risk are,
to compare two spatial point patterns of interest, or to compare the similarity between one event
measured at different time points. This research will specifically focus on a recently developed
similarity test for unmarked spatial point patterns by Kirsten and Fabris-Rotelli (2021).
Andresen (2009) developed a test that evaluates the similarity between two different point patterns
using a non-parametric approach, known as the spatial point pattern test. The test results in a local
measure, as well as a global measure, of spatial similarity. The local measure of similarity is used to
indicate the locations of significantly higher, significantly lower, and insignificant differences in the
concentration of a spatial point pattern. The output of the local measure of the test can be mapped
which makes it a popular test to use. To perform the proposed test, an index of similarity is calculated for each spatial unit, e.g. grid cells. The proportion of spatial units with a similar spatial
pattern for both sets of data is represented by the 𝑆-index, which is the global similarity measure.
The spatial point pattern test has been used to test the spatial similarity of crime data by Andresen
(2009), Andresen and Linning (2012), Andresen and Malleson (2013a,b, 2014), and Linning (2015).
Kirsten and Fabris-Rotelli (2021) proposed a generic spatial similarity test that can handle more
than one type of spatial data. This test consists of three significant steps. First, a pixel image
representation of both data sets must be obtained. Secondly, the structural similarity index (SSIM
index) is calculated for each pixel (Wang et al., 2004). In the third step, a global similarity index is
calculated based on Andresen’s 𝑆-index (Andresen, 2009).
To obtain a pixel image representation of unmarked point patterns in the generic spatial similarity test, kernel density estimation (KDE) is used. Kirsten and Fabris-Rotelli (2021) used Diggle's bandwidth and focused on how the similarity test handles various types of spatial data. In this research, we specifically apply the similarity test to unmarked point patterns, with the focus on investigating the effect of different bandwidths on the performance of the test.
Using the individual locations of sample data, kernel density estimation results in a smooth empir-
ical distribution function (Węglarczyk, 2018). Węglarczyk (2018) explores the different symmetric
and asymmetric kernels that can be used in one-dimensional non-spatial data, such as Gaussian,
Epanechnikov, biweight, triangular, gamma, and rectangular. These kernel functions can be ex-
tended to spatial data as well, in other words, bivariate data. The type of kernel chosen is not of
too much importance, however, the chosen bandwidth plays a fundamental role in kernel density
estimation. The bandwidth of the kernel is known as the standard deviation of the kernel or it can
be seen as the smoothing parameter of the kernel (Kirsten and Fabris-Rotelli, 2021). There are
various bandwidths that can be used when estimating the KDE for unmarked point patterns such as
Diggle’s bandwidth (Berman and Diggle, 1989), likelihood cross-validation method (Loader, 2006)
and Scott’s rule of thumb (Odell-Scott, 1992).
Selecting the most suitable bandwidth is not an easy task. Kuter et al. (2011) studied the effects of
different bandwidth choices and kernel density functions using Turkish fire density mapping based
on forest fire records at the forest sub-district level. Heidenreich et al. (2013) did a simulation study
to find a data-driven optimal bandwidth focusing on small and moderate sample sizes and smooth
densities. They found that the choice of bandwidth does, in fact, matter in terms of the quality of the
density estimation. It was found that different bandwidths are preferred in different situations. This
brings us back to the problem at hand, to assess the effect of different bandwidths on the robustness
of the similarity test proposed by Kirsten and Fabris-Rotelli (2021).
Section 2 will discuss the methodology used to perform the similarity test as well as introduce
different possible bandwidths to be considered. In Section 3 the method will be tested using the
different bandwidths with a simulation study. Section 4 will discuss the results of the simulation
study. Section 5 will be the conclusion.
BANDWIDTH SELECTION IN A GENERIC SIMILARITY TEST FOR SPATIAL DATA 45
Figure 1. Different types of spatial point patterns: (a) complete spatial random point pattern, (b) clustered point pattern, (c) regular point pattern.
2. Methodology
2.1 Point pattern theory
A point process 𝑋 = {𝑋1 , 𝑋2 , . . . 𝑋𝑛 } with 𝑋𝑖 ∈ 𝐷 ⊂ R𝑑 is a stochastic model governing the location
of events in a subset of the spatial domain 𝐷 (Cressie, 2015). Point processes are stochastic models
consisting of irregular point patterns (Illian et al., 2008). A spatial point pattern, 𝑥 = {𝑥1 , 𝑥2 , . . . 𝑥 𝑛 },
is a collection of points giving the observed spatial locations of objects or occurrences (Baddeley
et al., 2015). A point pattern is interpreted as a sample from a point process (Illian et al., 2008). In
point pattern data analyses 𝑋𝑖 ∈ 𝐷 would usually be in two or three dimensions. This could be the
locations of earthquakes, trees in a forest, road accidents and many more. An example of a point
process is a spatial Poisson process (Cox and Isham, 1980).
There are three classifications of point patterns data namely, complete spatial random (CSR),
clustered and regular. These are illustrated in Figure 1. A CSR pattern occurs when the locations of
the points are randomly distributed in space. A clustered pattern occurs when the points are grouped
together in certain regions of space. A regular point pattern occurs when spatial points inhibit each
other. If the point pattern is modelled as a Poisson process with parameter 𝜆, where 𝜆 is the intensity
of the process, then the expected number of points per unit of space for CSR pattern is equal to 𝜆,
𝐸 [𝑋] = 𝜆, for a clustered pattern the expected number of points per unit of space is greater than 𝜆,
𝐸 [𝑋] > 𝜆, and for a regular pattern it is smaller than 𝜆, 𝐸 [𝑋] < 𝜆.
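As an illustrative sketch, the three types of point patterns in Figure 1 can be simulated with the spatstat package roughly as follows; the intensity and interaction settings, and the argument names shown, are assumptions based on recent spatstat versions.

library(spatstat)
set.seed(1)
csr_pp     <- rpoispp(lambda = 100)                         # complete spatial randomness
cluster_pp <- rMatClust(kappa = 10, scale = 0.05, mu = 10)  # Matern cluster (clustered)
regular_pp <- rSSI(r = 0.05, n = 100)                       # simple sequential inhibition (regular)
plot(csr_pp); plot(cluster_pp); plot(regular_pp)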
which is estimated using numerical integration. The numerical integration is done by dividing the spatial domain into a finer $g \times g$ grid. The centroids of the $Q = g^2$ grid cells are denoted as the spatial locations $v = \{v_1, v_2, \ldots, v_Q\}$. In order to calculate (2) through numerical integration, the differences $d_e = \{d_1, d_2, \ldots, d_Q\}$ between the coordinates of each observation in the spatial point pattern, $x_i$, $i = 1, \ldots, n$, and the spatial locations $v_k$, $k = 1, \ldots, Q$, need to be calculated. The edge correction factor is then calculated as
$$e(x_i) = \frac{\mathrm{area}(D)}{Q}\sum_{k=1}^{Q} f(d_k), \qquad (3)$$
where 𝑓 (𝑑 𝑘 ) is the bivariate Gaussian density. An illustration of a point pattern and the resulting
pixel image representation for 𝑚 = 5 and 𝑚 = 15 is given in Figure 3.
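A base R sketch of the edge correction factor in (3) is given below; the unit square window, the grid resolution $g$ and the bandwidth $\sigma$ are illustrative assumptions.

edge_correction <- function(xi, sigma, g = 32) {
  cents <- seq(1 / (2 * g), 1 - 1 / (2 * g), length.out = g)  # centroid coordinates
  v <- expand.grid(x = cents, y = cents)                      # Q = g^2 grid centroids
  d <- sqrt((v$x - xi[1])^2 + (v$y - xi[2])^2)                # distances d_k
  f <- exp(-d^2 / (2 * sigma^2)) / (2 * pi * sigma^2)         # bivariate Gaussian density
  area_D <- 1                                                 # area of the unit square window
  area_D / nrow(v) * sum(f)                                   # equation (3)
}
edge_correction(xi = c(0.05, 0.05), sigma = 0.1)              # point near the boundary
edge_correction(xi = c(0.50, 0.50), sigma = 0.1)              # interior point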
Figure 2. Illustration of how the spatial domain is divided into pixels for two values of parameter 𝑚: (a) 𝑚 = 5, (b) 𝑚 = 15. The 𝑢 𝑗 are represented by the dots.
where $\alpha > 0$, $\beta > 0$, $\gamma > 0$, and $y_{ij}$ are the values contained in sliding window $j$ of data set $i$. Wang et al. (2004) suggest $\alpha = \beta = \gamma = 1$, which gives equal weight to each term. The components of the SSIM value are calculated as follows:
$$\text{Luminance: } l(y_{1j}, y_{2j}) = \frac{2\mu_{y_{1j}}\mu_{y_{2j}} + C_1}{\mu_{y_{1j}}^2 + \mu_{y_{2j}}^2 + C_1},$$
$$\text{Contrast: } c(y_{1j}, y_{2j}) = \frac{2\sigma_{y_{1j}}\sigma_{y_{2j}} + C_2}{\sigma_{y_{1j}}^2 + \sigma_{y_{2j}}^2 + C_2},$$
$$\text{Structure: } s(y_{1j}, y_{2j}) = \frac{\sigma_{y_{1j},y_{2j}} + C_3}{\sigma_{y_{1j}}\sigma_{y_{2j}} + C_3}.$$
The $C_1$, $C_2$ and $C_3$ values are constants used to avoid instability (Wang et al., 2004). In the literature, the constants are calculated as $C_1 = (K_1 L)^2$, $C_2 = (K_2 L)^2$ and $C_3 = C_2/2$, where $K_1 = 0.01$, $K_2 = 0.03$ and $L$ is the difference between the maximum pixel value and the minimum pixel value over the two images (Wang et al., 2004). The $\mu_{ij}$ are the mean values of the $Y_i$ and the $\sigma_{ij}$ are the standard deviations of the $Y_i$.
$$GS = \frac{1}{M}\sum_{j=1}^{M} SSIM(u_j), \qquad (5)$$
where $SSIM(u_j)$ is the SSIM value for the pixel with centroid $u_j$ and $M$ is the number of pixels in the pixel image. This provides a mean similarity value within the domain, instead of a proportion of similar areas as in Andresen's $S$-index (Andresen, 2009), which is expected to improve accuracy.
Diggle’s bandwidth
The algorithm used to calculate Diggle's bandwidth uses a method by Berman and Diggle (1989) to compute the quantity
$$M(\sigma) = \frac{MSE(\sigma) - \lambda\, d(0)}{\lambda^2}, \qquad (6)$$
where $\sigma$ is the bandwidth, $\lambda$ the mean intensity, and $MSE(\sigma) = E\{[\tilde{\lambda}_\sigma(x_i) - \Lambda(x_i)]^2\}$ is the mean squared error at bandwidth $\sigma$, with $\tilde{\lambda}_\sigma(x_i) = N(B_\sigma(x_i))/|B_\sigma(x_i)|$. Diggle's bandwidth assumes a stationary Cox process, and $\Lambda(x_i)$ is the rate process of the Cox process (Cressie, 2015). $B_\sigma(x_i)$ is the $d$-dimensional sphere of radius $\sigma$ centred at $x_i$, and $N(B_\sigma(x_i))$ denotes the number of points of the underlying Cox process in the $d$-dimensional sphere of radius $\sigma$, as defined by Berman and Diggle (1989). The bandwidth $\sigma$ is chosen to minimise the mean square error criterion by direct inspection or numerical integration (Diggle, 1985).
Likelihood cross-validation
This method determines an acceptable bandwidth $\sigma$ for the kernel density estimate of a point process intensity, using a kernel-smoothed intensity function for which $\sigma$ maximises the point process likelihood cross-validation criterion (Loader, 2006)
$$LCV(\sigma) = \sum_{i}\log\left(\hat{\lambda}_{-i}(x_i)\right) - \int_{D}\hat{\lambda}(u)\,du,$$
where $x_i$ are the point locations of the point pattern, $u$ are the spatial locations of the centroids of the grid, $D$ is the spatial domain, and $\hat{\lambda}_{-i}(x_i)$ is the leave-one-out kernel-smoothing estimate of the intensity at $x_i$ with smoothing bandwidth $\sigma$. The kernel-smoothing estimate of the intensity at a spatial location $u$ with smoothing bandwidth $\sigma$ is $\hat{\lambda}(u)$ (Loader, 2006).
Scott's rule of thumb
Scott's rule of thumb chooses a bandwidth proportional to the sample size,
$$\sigma \propto n^{-1/(d+4)},$$
where $n$ is the number of points and $d$ the number of spatial dimensions; in most cases $d = 2$. This rule can be calculated relatively quickly. Compared to Diggle's bandwidth, it often produces a larger bandwidth.
3. Simulation study
A simulation study is conducted in this section to evaluate the performance of the proposed spatial
similarity test on different spatial point patterns with various bandwidth choices. The purpose of
the simulation study is to generate data, apply the spatial similarity test to the data, and investigate
which choice of bandwidth yields the best set of results that are in line with the expected results. The
robustness of the choice of the bandwidth value on the similarity test will be evaluated by comparing
the similarity scores each bandwidth yields and the expected similarity score.
The regular spatial point patterns are simulated using the rSSI function and the clustered spatial point patterns are simulated using the rMatClust function (Baddeley et al., 2015). The built-in R functions used for the bandwidths are bw.diggle, bw.ppl, bw.CvL, bw.scott, bw.abram.ppp, bw.frac and bw.stoyan (Baddeley et al., 2015).
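As a sketch of how these selectors are applied, the code below computes several of the listed bandwidths for a simulated CSR pattern and the corresponding kernel density estimates; the pattern and intensity are illustrative, and bw.abram.ppp is omitted here because it returns per-point bandwidths.

library(spatstat)
set.seed(1)
pp <- rpoispp(lambda = 100)                     # illustrative CSR pattern on the unit square

sigmas <- c(diggle = bw.diggle(pp),             # Diggle's bandwidth
            ppl    = bw.ppl(pp),                # likelihood cross-validation
            CvL    = bw.CvL(pp),                # Cronie & van Lieshout
            scott  = mean(bw.scott(pp)),        # Scott's rule (one value per dimension)
            stoyan = bw.stoyan(pp),             # Stoyan's rule of thumb
            frac   = bw.frac(pp))               # based on the window geometry
round(sigmas, 4)

kdes <- lapply(sigmas, function(s) density(pp, sigma = s))   # pixel image per bandwidth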
Summary statistics of the results obtained from the simulation study are given in Table 3. A
constant intensity refers to a homogeneous point pattern and a non-constant intensity refers to an
inhomogeneous point pattern.
The bandwidth $\sigma$ is calculated as a quantile of the distance between two independent random locations in the window, with the lower quartile of the distribution used as the default. Suppose $F(\sigma)$ is the cumulative distribution function of the distance between two independent uniformly distributed random points in the window; the value returned is the quantile with probability $f$. Thus, the bandwidth is the value $\sigma$ such that $F(\sigma) = f$.
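A base R Monte Carlo sketch of this quantile rule on the unit square is shown below; it mimics the description above rather than the exact spatstat implementation, and the number of simulated point pairs is an arbitrary choice.

set.seed(1)
f  <- 0.25                                 # default: lower quartile
u1 <- matrix(runif(2e5), ncol = 2)         # first set of uniform random points
u2 <- matrix(runif(2e5), ncol = 2)         # second set of uniform random points
d  <- sqrt(rowSums((u1 - u2)^2))           # distances between paired points
sigma <- unname(quantile(d, probs = f))    # bandwidth: the value with F(sigma) = f
sigma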
the similarity test performs well. The standard deviations of these bandwidths are all quite similar
except for bandwidth selection on a geometry window.
The similarity test yields larger than expected similarity values for the second simulation method.
Abramson’s adaptive bandwidth performs best in terms of mean (0.8455, 0.8616, 0.8800) and median
(0.8640, 0.9062, 0.9151). It is still higher than expected when looking at 70% and 80% identical point
patterns, but lower than the rest of the bandwidths. However, it has the largest standard deviation
(0.1350, 0.1316, 0.1188) and coefficient of variation (0.1597, 0.1527, 0.1350). The reason for this
might be that this bandwidth determines a bandwidth for each point in the spatial data set and a pixel
image representation is obtained for each point. The rest of the bandwidths yields large similarity
values (mean and median) with small standard deviations and coefficients of variation. Stoyan’s
rule of thumb yields the second closest mean (0.8790, 0.9172, 0.9606) and median (0.8733, 0.9160,
0.9611) to what is expected and small standard deviation (0.0699, 0.0517, 0.0794) and coefficient of
variation (0.0794, 0.0564, 0.0267).
The third method yields higher similarity values than expected, particularly when considering the
mean and median values. The standard deviations and coefficients of variation are small. Stoyan's rule
of thumb yields the closest to expected similarity values where the means are 0.9092, 0.9333 and
0.9681, and the medians are 0.9229, 0.9464 and 0.9714. Note these values are still very high and
might be because this case is highly theoretical.
Overall for the simulation study all bandwidths, except bandwidth selection based on a geometry
window, perform quite well for the similarity test. Diggle’s bandwidth performed best for the noisy
patterns. Abramson’s adaptive bandwidth performed best for point patterns that have uneven sample
sizes. It can still be investigated how a change in constants for Abramson’s adaptive bandwidth and
Stoyan’s rule of thumb influences the result of the similarity test as well as whether a change in the
probability value 𝑓 for bandwidth selection based on a geometry window will yield better results.
5. Conclusion
The robustness of the proposed spatial similarity test (Kirsten and Fabris-Rotelli, 2021) to different
bandwidths was tested. Diggle’s bandwidth (Diggle, 1985), likelihood cross-validation (Loader,
2006), Cronie & van Lieshout (Cronie and Van Lieshout, 2018), Scott’s rule of thumb (Odell-
Scott, 1992), Abramson’s adaptive bandwidths (Abramson, 1982), bandwidth selection based on a
geometry window (Baddeley et al., 2015) and Stoyan’s rule of thumb (Stoyan and Stoyan, 1994)
were the different bandwidths used to compute the pixel image representation in Step 1 of the spatial
similarity test in order to test the robustness of the test. A suggestion for future work is to investigate
how a change in constants for Abramson’s adaptive bandwidth and Stoyan’s rule of thumb influences
the result of the similarity test as well as whether a change in the probability value 𝑓 for bandwidth
selection based on a geometry window will yield better results. Another suggestion for future work
is a further investigation on the negative similarity values obtained.
The applications in Section 5 also provided a real data case for testing similarity across different
windows. It was observed that different bandwidths perform differently for point patterns of different
sizes and point patterns with different windows.
Table 3. Summary statistics of the results from the proposed spatial similarity test.
Diggle   Likelihood cross-validation   Cronie & van Lieshout   Scott   Abramson   Geometry window   Stoyan
Method one
Mean
70% 0.7316 0.7782 0.7793 0.7941 0.7746 0.8973 0.6652
80% 0.8047 0.8365 0.8458 0.8525 0.8379 0.9283 0.7548
90% 0.8899 0.9062 0.9179 0.9244 0.9071 0.9686 0.8539
Median
70% 0.7387 0.7819 0.7919 0.8119 0.7824 0.9330 0.6880
80% 0.8271 0.8670 0.8793 0.8888 0.8642 0.9572 0.7889
90% 0.9221 0.9453 0.9497 0.9582 0.9492 0.9872 0.8905
Standard deviation
70% 0.1401 0.1585 0.1346 0.1373 0.1645 0.0980 0.1447
80% 0.1486 0.1519 0.1537 0.1513 0.1547 0.1083 0.1621
90% 0.1478 0.1246 0.1378 0.1307 0.1351 0.0690 0.1725
Coefficient of variation
70% 0.1914 0.2037 0.1727 0.1729 0.2124 0.1092 0.2175
80% 0.1847 0.1815 0.1817 0.1775 0.1846 0.1167 0.2148
90% 0.1661 0.1375 0.1501 0.1414 0.1489 0.0712 0.2020
Method two
Mean
70% 0.9186 0.9421 0.9362 0.9424 0.8455 0.9749 0.8790
80% 0.9443 0.9616 0.9561 0.9598 0.8616 0.9839 0.9172
90% 0.9704 0.9799 0.9780 0.9806 0.8800 0.9930 0.9606
Median
70% 0.9377 0.9616 0.9499 0.9516 0.8640 0.9856 0.8733
80% 0.9542 0.9752 0.9693 0.9691 0.9062 0.9925 0.9160
90% 0.9818 0.9877 0.9855 0.9858 0.9151 0.9972 0.9611
Standard deviation
70% 0.0680 0.0601 0.0567 0.0465 0.1350 0.0320 0.0699
80% 0.0459 0.0405 0.0416 0.0383 0.1316 0.0228 0.0517
90% 0.0339 0.0230 0.0235 0.0193 0.1188 0.0089 0.0794
Coefficient of variation
70% 0.0740 0.0638 0.0606 0.0493 0.1597 0.0328 0.0794
80% 0.0486 0.0421 0.0435 0.0399 0.1527 0.0232 0.0564
90% 0.0350 0.0235 0.0240 0.0197 0.1350 0.0089 0.0267
Method three
Mean
70% 0.9627 0.9567 0.9672 0.9682 0.9326 0.9842 0.9092
80% 0.9667 0.9640 0.9685 0.9734 0.9465 0.9874 0.9333
90% 0.9814 0.9827 0.9842 0.9867 0.9750 0.9926 0.9681
Median
70% 0.9824 0.9697 0.9797 0.9800 0.9557 0.9913 0.9229
80% 0.9836 0.9752 0.9810 0.9856 0.9636 0.9927 0.9464
90% 0.9836 0.9892 0.9933 0.9938 0.9841 0.9968 0.9714
Standard deviation
70% 0.0360 0.0360 0.0312 0.0313 0.0589 0.0174 0.0659
80% 0.0394 0.0307 0.0291 0.0258 0.0464 0.0127 0.0489
90% 0.0163 0.0155 0.0200 0.0150 0.0249 0.0106 0.0229
Coefficient of variation
70% 0.0374 0.0377 0.0323 0.0324 0.0632 0.0177 0.0725
80% 0.0408 0.0319 0.0300 0.0265 0.0491 0.0129 0.0524
90% 0.0166 0.0158 0.0203 0.0152 0.0256 0.0107 0.0237
References
Abramson, I. S. (1982). On bandwidth variation in kernel estimates: A square root law. The Annals
of Statistics, 10, 1217–1223.
Alba-Fernández, M., Ariza-López, F., Jiménez-Gamero, M. D., and Rodríguez-Avi, J. (2016).
On the similarity analysis of spatial patterns. Spatial Statistics, 18, 352–362.
Andresen, M. A. (2009). Testing for similarity in area-based spatial patterns: A nonparametric
Monte Carlo approach. Applied Geography, 29, 333–345.
Andresen, M. A. and Linning, S. J. (2012). The (in)appropriateness of aggregating across crime
types. Applied Geography, 35, 275–282.
Andresen, M. A. and Malleson, N. (2013a). Crime seasonality and its variations across space.
Applied Geography, 43, 25–35.
Andresen, M. A. and Malleson, N. (2013b). Spatial heterogeneity in crime analysis. In Crime
Modeling and Mapping Using Geospatial Technologies. Springer, 3–23.
Andresen, M. A. and Malleson, N. (2014). Police foot patrol and crime displacement: A local
analysis. Journal of Contemporary Criminal Justice, 30, 186–199.
Baddeley, A., Rubak, E., and Turner, R. (2015). Spatial Point Patterns: Methodology and
Applications with R. CRC press.
Berman, M. and Diggle, P. (1989). Estimating weighted integrals of the second-order intensity of
a spatial point process. Journal of the Royal Statistical Society: Series B (Methodological), 51,
81–92.
Borrajo, M., González-Manteiga, W., and Martínez-Miranda, M. (2020). Testing for signifi-
cant differences between two spatial patterns using covariates. Spatial Statistics, 40, 100379.
Cox, D. R. and Isham, V. (1980). Point Processes, volume 12. CRC Press.
Cressie, N. (2015). Statistics for Spatial Data. John Wiley & Sons.
Cronie, O. and Van Lieshout, M. N. M. (2018). A non-model-based approach to bandwidth
selection for kernel estimators of spatial intensity functions. Biometrika, 105, 455–462.
Diggle, P. (1985). A kernel method for smoothing point process data. Journal of the Royal Statistical
Society: Series C (Applied Statistics), 34, 138–147.
Guan, Y. (2007). A least-squares cross-validation bandwidth selection approach in pair correlation
function estimations. Statistics & Probability Letters, 77, 1722–1729.
Hall, P. and Marron, J. (1988). Variable window width kernel estimates of probability densities.
Probability Theory and Related Fields, 80, 37–49.
Heidenreich, N.-B., Schindler, A., and Sperlich, S. (2013). Bandwidth selection for kernel
density estimation: A review of fully automatic selectors. AStA Advances in Statistical Analysis,
97, 403–433.
Illian, J., Penttinen, A., Stoyan, H., and Stoyan, D. (2008). Statistical Analysis and Modelling
of Spatial Point Patterns. John Wiley & Sons.
Kirsten, R. and Fabris-Rotelli, I. N. (2021). A generic test for the similarity of spatial data. South
African Statistical Journal, 55, 55–71.
Kuter, S., Usul, N., and Kuter, N. (2011). Bandwidth determination for kernel density analysis