AI and Machine Learning Report Sample 5
I confirm that the work was solely undertaken by myself and that no help was provided from
any other sources than those permitted. All sections of the thesis that use quotes or describe
an argument or concept developed by another author have been referenced, including all
secondary literature used, to show that this material has been adopted to support my work.
Contents
Abstract
Acknowledgements
1 Introduction
2 Related Work
2.1 Stylised facts
2.2 Financial Time Series Generation
2.3 Other related works
3 Dataset
3.1 Data acquisition and preparation for generative models
4 Methodology
4.1 Variational Autoencoders and Generation
4.1.1 Variational Autoencoders
4.1.2 Vector Quantized Variational Autoencoder
4.2 Final Generative Models
4.3 Long Short-Term Memory Neural Networks for Stock movement prediction
A Appendix
A.1 Repository
A.2 Example of a bad Generative Model
List of Figures
3.1 Close log return price time window samples of length 256.
A.1 Mean moment comparison between VAE Dense Layer 2 generation and four real batches.
A.2 Variance moment comparison between VAE Dense Layer 2 generation and four real batches.
A.3 Skewness moment comparison between VAE Dense Layer 2 generation and four real batches.
A.4 Kurtosis moment comparison between VAE Dense Layer 2 generation and four real batches.
A.5 Probability density function comparison between VAE Dense Layer 2 generation and four real batches.
A.6 Autocorrelation plot obtained on a batch of synthetic samples generated by VAE Dense layer 2 model.
A.7 Absolute autocorrelation plot obtained on a batch of synthetic samples generated by VAE Dense layer 2 model.
List of Tables
6.1 Alpha exponents of the power-law for Heavy-tailed distribution
6.2 Summary of the qualitative results of the generative models
6.3 Summary of the quantitative results of the generative models
6.4 Base line accuracy test of training on Real Data: 0.7384
Acronyms
GAN Generative Adversarial Networks
JS Jensen-Shannon
KL Kullback-Leibler
Abstract
In recent times, the generation of financial time series data has attracted significant attention
due to its usefulness as a data augmentation technique in scenarios where data updates occur
on a daily basis. The primary objective is to generate synthetic data that faithfully replicate
well-known financial properties or stylized facts, such as central moments, autocorrelation, and
clustered volatility. While several models have been proposed to address this challenge, the
majority have focused on Generative Adversarial Networks (GANs).
Motivated by the success of image generation techniques, this paper explores the application of
Vector Quantised Variational Autoencoders (VQ-VAE) for the generation of financial time series
data. My research reveals that increasing the dimensionality within the embedding space
significantly enhances the quality of time series reconstructions, a factor closely tied to
subsequent generation performance. The proposed model, coupled with a straightforward latent
autoregressive sampling method, not only satisfies qualitative evaluation metrics but also
demonstrates competitive quantitative performance compared to VAE-based models, effectively
capturing the essential stylized facts of financial time series data. Furthermore, the accuracy
score improved by 0.84% and 0.98% over the baseline after training with synthetic and augmented
data, respectively.
Acknowledgements
I would like to express my heartfelt gratitude to the individuals who have played pivotal roles
in shaping my academic and professional journey in Artificial Intelligence. First and foremost, I
owe an immeasurable debt of gratitude to my family, especially my mother, Esther, father,
Ricardo, sister Juliana, grandparents, Santos and Isabel, and our dogs Nash and Copito for their
unwavering support, encouragement, and sacrifices that have enabled me to pursue my dreams.
Your love and belief in me have been my constant source of strength. I love you all! I extend my
sincere appreciation to my close friends from Colombia, who have been with me through thick
and thin, providing emotional support during this journey and listening countless times
to my plans. I am deeply thankful to the University of Birmingham for providing me with
the opportunity to pursue my studies in Artificial Intelligence. My supervisor, Peter Tino, the
School of Computer Science staff, and fellow students have been instrumental in my academic
growth. I am grateful for the knowledge and experiences gained during my time here. Lastly,
I want to acknowledge Un Ticket para el futuro Scholarship, Globant and especially Hector
Zárate, David Ibañez, Gustavo Montamat, and Andres F. Sierra, who have supported me
throughout my professional career in Artificial Intelligence. Your mentorship, collaboration,
and guidance have been invaluable. Thank you so much!
CHAPTER 1
Introduction
Financial time series generation is gaining attention as a means to augment the limited real
data available in the financial sector. After augmenting the training data, a model may improve
its generalisation ability by reducing overfitting. Moreover, this practice finds applications in
training and evaluating trading strategies, forecasting, enhancing accessibility, and providing
privacy guarantees. However, reproducing the dynamic nature of financial assets over time,
including the so-called stylised facts, presents a significant challenge. Historically, parametric
approaches, such as the Generalised Autoregressive Conditional Heteroskedasticity (GARCH) models
[1]–[3], attempted to model dependent structures, like volatility, through precise mathematical
formulation. Stylised facts can also emerge from agent-based approaches, but these are complex
to design [4]–[6].
As a data-driven, assumption-free approach, generative deep learning has recently attracted
increasing interest. Trained generators model the distribution 𝑃(𝑋) of a given data set 𝑋 in some
high-dimensional space where time series features can be learned to generate new series. Inspired
by their success in various domains, most solutions in the field focus on Generative Adversarial
Networks (GAN) [7] combined with recurrent neural networks [8] and convolutional layers [9].
However, finding an optimal architecture is still an open problem. GANs face issues like mode
collapse and non-convergence, while Recurrent Neural Networks (RNN) struggle to model
long-distance dependencies. Highlighting the discrepancy between different generative models
applied to financial data, a recent study [10] favoured the Variational Autoencoder (VAE)
architecture for S&P 500 data∗ after an extensive evaluation. Nevertheless, these models could
only capture some of the stylised facts.
Motivated by its success in image generation, this project aims to generate synthetic financial
time series that meet the stylised facts through the Vector Quantised Variational Autoencoder
(VQ-VAE) [11]. VQ-VAE learns a discrete representation in the latent space, which is a more
natural fit for the daily returns on the market [11]. Following the original two-stage modelling
approach, different combinations based on fully dense and convolutional layers and varying
embedding space dimensions are explored in the first stage. Next, the generative stage employs
a simple autoregressive sampling instead of training a second model in the latent space. Given
the VAE performance shown for the S&P 500 data in [10], the encoder-decoder option for the
time series case was preferred over other VQ-based options, such as ViT-VQGAN [12] and VQ-GAN
[13], which can be further explored.
The evaluation uses multiple qualitative and quantitative metrics to contrast synthetic batches
against real S&P 500 data. Finally, the prediction accuracy obtained when training on real data
is compared with that obtained with synthetic and augmented data. The final proposed model,
coupled with a straightforward latent autoregressive sampling method, meets qualitative
evaluation metrics and demonstrates competitive quantitative performance, effectively capturing
essential stylised facts of financial time series data. Furthermore, the accuracy score improved
by 0.84% and 0.98% over the baseline after training with synthetic and augmented data,
respectively.
∗ https://www.spglobal.com/ratings/en
Overall, the small number of papers and the limited algorithmic resources available make it
clear that this is a young topic that has only lately begun to gain popularity. This paper builds
upon the work of [11], [14] and [10], following the same batch-feeding and evaluation strategies,
with a special focus on VAEs. Moreover, it extends that work by applying VQ-VAE to the domain,
as it has not previously been explored specifically for financial time series generation.
The rest of this paper is organised as follows. Section 2 reviews the statistical properties of
financial time series and the related work on generative models in the field. Section 3 discusses
the dataset and its preprocessing. In Section 4, the deep learning models used, namely VAE,
VQ-VAE, and Long Short-Term Memory (LSTM), are revisited, together with a detailed explanation
of the final architectures and training hyperparameters. Section 5 presents the evaluation
metrics. Results are discussed in Section 6. Finally, Section 7 provides the conclusion and
further work.
• Employing simple autoregressive sampling from the discrete encoded latent space yields
impressive results, eliminating the need for a secondary complex model.
• Training with synthetic and augmented data generated by the VQ-VAE Convolutional model
improved the predictive accuracy. The enhancement is a slight but noticeable 0.84%
and 0.98%, respectively, compared to training exclusively with real data, showing the
potential of these models for predictive modelling.
CHAPTER 2
Related Work
This chapter discusses the stylised facts of financial time series and reviews the current
state-of-the-art models for time series generation, along with emerging practices and
challenges.
The time series under study is the returns of an asset, that is, the stock’s performance over
a given period, computed in this case from the daily closing price. A stock’s price at the end
of a trading day represents the company’s market value. However, as closing prices may have
different currencies and magnitudes, using ratios or returns allows market performance to be
compared between companies. Besides, calculating an asset return acts as a preprocessing step,
as it reduces the intra-variation of the time series and normalises it. A stock’s returns are
measured as $\frac{p_t - p_{t-1}}{p_{t-1}}$, but if the price changes are small, they are
commonly approximated by the log return:
$$r_t = \log(p_t) - \log(p_{t-1}) \qquad (2.1)$$
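As a minimal sketch of the two return definitions, assuming the approximation referred to above is the usual log return (the dataset chapter describes the series as close log return prices), the snippet below computes both on a short list of closing prices. The prices and the function names `simple_returns` and `log_returns` are hypothetical and chosen only for illustration.

```python
import math

def simple_returns(prices):
    """Simple returns (p_t - p_{t-1}) / p_{t-1} for consecutive prices."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

def log_returns(prices):
    """Log returns log(p_t) - log(p_{t-1}) for consecutive prices."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

# Hypothetical closing prices, for illustration only.
closes = [100.0, 101.0, 100.5, 102.0]
simple = simple_returns(closes)
logret = log_returns(closes)

for s, l in zip(simple, logret):
    print(f"simple={s:+.5f}  log={l:+.5f}")
```

For small daily moves the two columns nearly coincide. Log returns are also additive over time, so their sum equals the log of the total price ratio.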
Stylised facts of asset returns are empirical observations and patterns that have been consistently
observed in different financial instruments and markets over time [15], [16]. These stylised
facts provide valuable insights into the characteristics of financial data and have important
implications for financial modelling and risk management. Hence, the generated time series
must exhibit these properties not to be considered merely random sequences. Some of the key
stylised facts of asset returns include:
• Volatility Clustering: Volatility tends to cluster, meaning that periods of high volatility
are often followed by high volatility, and periods of low volatility by low volatility.
Volatility clustering is linked with the presence of long-range temporal dependence. To
observe this property, the auto-correlation function of absolute returns should decay
slowly as a function of the time lag.
• Leverage effect: There is often a negative correlation between asset returns and
changes in volatility. When asset prices decline, volatility tends to increase.
• Gain/loss asymmetry: Large drawdowns are observed, but not equally large upward
movements.
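One way to check the volatility clustering property numerically is to compare the sample autocorrelation of raw and absolute returns. The sketch below is illustrative only: the series is a synthetic toy example with alternating calm and turbulent regimes, not market data, and `autocorr` is a hypothetical helper rather than the evaluation code used in this study.

```python
import random

def autocorr(series, lag):
    """Sample autocorrelation of `series` at a non-negative `lag`."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var

# Toy returns: zero-mean noise whose standard deviation switches every 200 steps.
rng = random.Random(0)
returns = []
for block in range(20):
    sigma = 0.5 if block % 2 == 0 else 2.0
    returns.extend(rng.gauss(0.0, sigma) for _ in range(200))

abs_returns = [abs(r) for r in returns]
lags = [1, 5, 20]
acf_raw = [autocorr(returns, k) for k in lags]      # expected to stay near zero
acf_abs = [autocorr(abs_returns, k) for k in lags]  # expected to stay clearly positive
```

In a series with clustered volatility, the absolute-return autocorrelation remains positive over many lags while the raw-return autocorrelation does not, which is the pattern the bullet above describes.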
Generating synthetic financial data is important because financial datasets often contain
sensitive and personally identifiable information, making them unsuitable for public sharing
due to legal regulations and privacy concerns. Thus, it is desirable to create realistic datasets
that mirror the properties of real financial data while protecting the privacy of customers and
entities involved, enabling data sharing within and outside of the organisation, such as the
research community, without compromising the privacy requirements [18].
In one of the early studies on financial time series generation [14] through neural networks,
it was found that a GAN with Multilayer Perceptron layers was effective in producing time
series data that effectively replicated the stylised facts. Interestingly, in contrast to common
practices in Image Generation (IMG), the research highlighted that the application of batch
normalisation had a detrimental impact on the generation of realistic financial time series.
However, it is important to note that their early results are only qualitative over a small set of
long generated series of 8,192 steps, which limited the dataset variability by not using rolling
window samples.
Since then, GANs have become the preferred architecture. For instance, the work of [19]
explored various GAN variations, including conditional GANs and the gradient penalty option.
Their research supports the notion that generated synthetic data could enhance the
generalisation capabilities of predictive models. Another example is Quant GANs [9]. This
approach incorporates temporal convolutional layers within the GAN architecture to capture
long-range dependencies such as volatility clustering. A key insight is the effectiveness of the
Lambert W transformation in generating heavy-tailed stock price returns. Similarly, in the study
conducted by [20], attention and transformer layers are utilised to simulate long-range
dependencies. After comparing multiple models and providing a training framework that
incorporates sparse attention, the temporal attention GANs are shown to effectively replicate
the stylised facts while smoothing the autocorrelation of returns.
∗ https://www.bloomberg.com/professional
In addition, in a recent study by [10], various generative model architectures, including GANs,
Matching Statistics Moments, and VAEs, were meticulously compared using both qualitative
and quantitative metrics. One of the main contributions is to allow the capture of
cross-correlations between stocks on the training dataset. The researchers emphasise the
effectiveness of the Jensen-Shannon divergence as a valuable indicator for distinguishing
between good and bad models. Furthermore, contrary to the GANs’ predominance, the VAE-based
family of models consistently outperformed the other architectures across most evaluation
metrics, showcasing their performance in generating realistic synthetic financial time series
data. It is worth mentioning that many of the previously mentioned frameworks were evaluated
using the S&P 500 dataset.
Lastly, as this is a recent research field, the discussion of common practices is still active.
For example, [14], [20], [21] agree regarding the detrimental effect of batch normalisation on
generating realistic financial time series. In contrast, [10] argues that, when cross-sectional
batches are kept, normalisation hurts the generation, as prices across a single day would have
similar trends. In other words, cross-sectional batches result in similar means and variances
between series. This is not true when the batch is built from random time series samples, where
batch normalisation effectively reduces the introduced variance.
Aside from stock returns time series, other valuable quantitative financial applications focus
on probability distribution generation. [22] employ Wasserstein GANs to generate stock
market orders that closely resemble the original orders. In contrast, [23] propose Restricted
Boltzmann Machines for market scenario generations that replicate the probability distribution of
market data. This approach competes directly with traditional Value-at-Risk (VaR) simulation
techniques. Additionally, [24] demonstrate that conditional GANs can effectively model
heavy-tailed distributions with specific categorical conditions. In addition, [25] compares their
proposed SigWGAN, based on stochastic differential equations and the Wasserstein-1 𝑊1
metric, with conventional quantitative risk management models.
Finally, several models have emerged as valuable tools for general Time Series Generation
(TSG). Within the literature, datasets can be composed of multivariate time series from
multiple sources, including repositories like the UCR Time Series Archive. Several advanced
evaluation metrics should be considered beyond the common approach of training on synthetic
data and making predictions on real data. These encompass sample likelihood, maximum
mean discrepancy, scores reminiscent of the Fréchet Inception Distance, and visualisations
using techniques such as t-SNE and PCA. Most of these metrics come from image generation
evaluation. Among the advanced architectures, which can be further explored on financial time
series, there are models such as Recurrent GANs, Recurrent conditional GANs [8], PSA-GAN
[26], TimeGAN [27], TimeVAE [28], and TimeVQVAE [29]. It is worth noting that TimeVQVAE
is the first to explore VQ-VAE for TSG and employs a similar architecture to this study.
CHAPTER 3
Dataset
Each country’s financial market exhibits unique characteristics due to multiple
context-specific factors. These factors include economic conditions, regulatory frameworks,
political stability, and foreign exchange risk, all of which influence market size, liquidity
and investors’ behaviour. Moreover, global economic events such as the 2008 financial crisis
and the recent COVID-19 pandemic have profoundly impacted financial markets worldwide. The
dynamic nature of these conditions constantly challenges the development and evaluation of
financial algorithms. One of the most closely monitored equity indices globally and the
predominant dataset for benchmarking financial algorithms is the S&P 500, which tracks 500 of
the largest publicly traded companies in the United States.
For this research, the daily stock training dataset comprises publicly listed companies within the
S&P 500 index from July 1, 2009, to December 1, 2019, serving as a representative sample for
training a generative market model. This period has relatively lower volatility since it started
after the 2008 financial crisis and precedes the COVID-19 pandemic. However, major events
also caused significant uncertainty in the financial market during these years, such as the
European debt and U.S. Debt ceiling crises, the Arab Spring, Brexit and the U.S.-China Trade
War. The daily periodicity minimises intra-day noise without smoothing important trends. The
final time series dataset comprises the close log return prices, as determined by Equation 2.1.
The primary source is the publicly available data from Yahoo Finance∗.
In order to maintain uniform batch sizes, the dataset only keeps companies continuously
listed during the specified period, resulting in $C = 434$ companies in total. Additionally, any
occurrences of ’nan’ values are dropped by this filter. Subsequently, each company’s return
time series is divided into fixed time windows of $L_w = 256$ samples each. This 256-length
window size approximates one working year and aligns well with convolutional-based
architectures. It is also long enough to capture seasonal patterns yet short enough to avoid
significant changes observed in longer periods. A sliding window step of six working weeks, or
30 trading days, is employed, resulting in 79 subsets or 34,286 series in total (79 × 434).
Equation 3.1 describes a subset $k$.
∗ https://uk.yahoo.com
Fig. 3.1. Close log return price time window samples of length 256.
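$$X_{batch_k} = \begin{bmatrix}
x_{1,1} & x_{2,1} & \dots & x_{L_w,1} \\
x_{1,2} & x_{2,2} & \dots & x_{L_w,2} \\
\vdots & \vdots & \ddots & \vdots \\
x_{1,j} & x_{2,j} & \dots & x_{L_w,j} \\
\vdots & \vdots & \ddots & \vdots \\
x_{1,C} & x_{2,C} & \dots & x_{L_w,C}
\end{bmatrix}; \quad L_w = 256,\ C = 434,\ \forall k \in [1, 79] \qquad (3.1)$$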
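The windowing described above can be sketched as follows. This is an illustrative sketch, not the repository code: `sliding_windows` is a hypothetical helper, the returns are random toy values, and the series length of 2,600 days and the five companies are assumptions chosen so that the example runs quickly and yields the same 79 subsets reported in the text.

```python
import random

def sliding_windows(series, window=256, step=30):
    """Cut `series` into overlapping windows, dropping an incomplete tail."""
    last_start = len(series) - window
    return [series[s:s + window] for s in range(0, last_start + 1, step)]

rng = random.Random(0)
n_companies, n_days = 5, 2600  # toy sizes; the study uses C = 434 companies
returns = [[rng.gauss(0.0, 0.01) for _ in range(n_days)] for _ in range(n_companies)]

per_company = [sliding_windows(r) for r in returns]
n_subsets = len(per_company[0])

# batches[k][j] is the k-th window of company j, i.e. a C x L_w matrix per subset.
batches = [[per_company[j][k] for j in range(n_companies)] for k in range(n_subsets)]
print(n_subsets, len(batches[0]), len(batches[0][0]))
```

Every row of a given subset covers the same calendar window, which is what allows a batch to preserve the cross-sectional relationships between companies.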
Lastly, as discussed in Section 2.2, the cross-correlation between stock prices over time is an
important factor that should also be considered in batch formation to produce more realistic
generations. Moreover, understanding the relationships between stocks is vital for portfolio
optimisation, risk management, market sentiment analysis, and how major economic events
impact certain industries or the entire market. For example, the COVID-19 pandemic increased
demand for healthcare services, e-commerce, and technology while restricting tourism and
outdoor activities and affecting manufacturing.
Following the approach proposed in [10], batch formation preserves cross-correlated samples
by avoiding mixing sliding window subsets from different time periods. Consequently, shuffling
is allowed within each batch and between batches for data feeding, but samples from different
periods are not mixed. This practice ensures that each batch exhibits a similar mean and
variance range among series, eliminating the need for normalisation. As this is a generative
task, the complete dataset serves as the training dataset. Figure 3.1 shows samples of the
dataset’s close log return price time series.
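The batch-feeding rule described above (shuffle the order of batches and the rows inside each batch, but never mix rows from different periods) can be sketched as below. The generator name `epoch_batches` and the tagged toy data are hypothetical; this only illustrates the constraint and is not the data loader used in this study.

```python
import random

def epoch_batches(batches, rng):
    """Yield (period index, shuffled rows), visiting periods in random order."""
    order = list(range(len(batches)))
    rng.shuffle(order)            # shuffle between batches
    for k in order:
        rows = list(batches[k])   # copy so the stored batch is not modified
        rng.shuffle(rows)         # shuffle within the batch
        yield k, rows

# Toy data: each "series" is just a tag (period, company) so mixing is easy to spot.
toy = [[(k, j) for j in range(4)] for k in range(3)]
epoch = list(epoch_batches(toy, random.Random(0)))

for k, rows in epoch:
    print(k, rows)
```

Each yielded batch contains only rows tagged with its own period, so the cross-sectional structure of a time window is kept intact during training.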
CHAPTER 4
Methodology
In broader terms, the main objective is to obtain a fixed-length series of consecutive samples
simultaneously through a generative neural network. Thus, in this paper, the training dataset
is a collection of time window samples of the same length extracted from the financial asset
time series under study, as explained in Chapter 3. The generated sequences are entirely novel
time series, closely resembling the training series but not mere transformed copies. It is
important to highlight that this approach differs from traditional forecasting methods, where
predictions are based on recent historical data. Given the VAE performance for the S&P 500
data in [10], the encoder-decoder option for the time series case was preferred over other
neural network architectures. This study aims to apply the VQ-VAE model and compare it
with similar VAE models as baselines rather than testing different layers or other generative
architectures. Thus, this research applies multilayer perceptron and convolutional layers,
although other VQ-based alternatives and more advanced layers can be further explored. This
chapter discusses the VAE and VQ-VAE architectures, the generation phase, details of the
development of the final generative models, and the LSTM model used for the predictive
accuracy test.
Variational autoencoders are powerful generative models that learn a probabilistic distribution
within a latent space, employing an encoder-decoder architecture. The encoder network, given
an input $x$, produces a lower-dimensional representation, parametrising the posterior
distribution $Q_\phi(z|x)$ of the latent variable $z$, whose prior distribution is $P_\theta(z)$.
Each dimension of this representation corresponds to a learned feature of the data. In contrast,
the decoder network reconstructs these encodings to the original data dimension with a
distribution $p(x|z)$ over the input $x$.
The VAE incorporates probabilistic modelling by encoding two vectors: the mean $\mu$ and
standard deviation $\sigma$ of a multivariate Gaussian distribution within the latent space. A
vector $z$ is sampled from this Gaussian and subsequently passed to the decoder, introducing
variability within the latent space. The reparameterisation trick (Equation 4.1) plays a key
role in facilitating the backpropagation process during training.
Fig. 4.1. Variational Autoencoder architecture
Nevertheless, to enhance the post-generation results and prevent widely separated means $\mu$
and collapsed standard deviations $\sigma$, the Kullback-Leibler (KL) divergence [30] term is
introduced into the loss (Equation 4.2). The KL divergence quantifies the divergence between
two probability distributions and, by minimising it relative to a standard Gaussian
distribution, encourages the learned latent space distribution to approximate a standard
Gaussian distribution closely.
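$$z = \mu + \sigma \cdot \epsilon; \quad \epsilon \sim \mathcal{N}(0, I) \qquad (4.1)$$
$$\mathcal{L}_{KL} = D_{KL}[Q_\phi(z|x) \,||\, \mathcal{N}(0, I)] = \frac{1}{2} \sum_{j=1}^{d} \left(\mu_j^2 + \sigma_j^2 - \log(\sigma_j^2) - 1\right) \qquad (4.2)$$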
The final loss function balances reconstruction and regularisation, aiming to generate
reconstructions that closely resemble the original input data while maintaining a continuous,
dense latent space where similar encodings cluster together (Equation 4.4). Both networks are
jointly optimised by minimising the KL divergence and the reconstruction error, which in this
case corresponds to the mean-squared error (Equation 4.3) [31]–[33].
$$\mathcal{L}_{recon} = \frac{1}{N} \sum_{i=1}^{N} |X_i - \hat{X}_i|^2 \qquad (4.3)$$
After training, the frozen decoder is capable of generating new data when fed samples drawn
from a multivariate standard normal distribution with the same dimension as the latent space,
as the decoder has learned to decode such samples. The VAE algorithm is presented in
Algorithm 1 and is illustrated in Figure 4.1.
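The three ingredients of the VAE objective can be illustrated numerically with the short sketch below. It uses the standard closed form of the KL divergence between a diagonal Gaussian and a standard Gaussian, $\frac{1}{2}\sum_j(\mu_j^2 + \sigma_j^2 - \log\sigma_j^2 - 1)$, and works with the log-variance, as in Algorithm 1. The function names, the toy encoder outputs, and the unweighted sum used as the total loss are assumptions made for illustration, not the configuration of this study.

```python
import math
import random

def reparameterise(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I) and sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL divergence between N(mu, diag(sigma^2)) and N(0, I)."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def mse(x, x_hat):
    """Mean-squared reconstruction error between two equal-length series."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

# Hypothetical encoder outputs for a two-dimensional latent space.
mu, log_var = [0.3, -0.1], [0.0, -0.5]
z = reparameterise(mu, log_var, random.Random(0))

# Assumed total loss: reconstruction plus KL, with no weighting between the terms.
loss = mse([0.01, -0.02], [0.0, -0.01]) + kl_to_standard_normal(mu, log_var)
```

The KL term is zero exactly when the encoder outputs a zero mean and unit variance, which is why minimising it pulls the latent distribution towards the standard Gaussian used for sampling at generation time.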
Algorithm 1: Variational Autoencoder Training
Input: Training data $X$, latent dimension $d$, learning rate $\alpha$, batch size $B$, number
of epochs $E$
Output: Trained VAE model
Initialize the encoder neural network $Q_\phi$ and decoder neural network $P_\theta$;
Initialize the parameters $\phi$ and $\theta$ randomly;
Initialize an Adam optimizer for $\phi$ and $\theta$;
for $epoch \leftarrow 1$ to $E$ do
    Shuffle the order of the batches and of the samples inside batches;
    for each batch $X_{batch}$ do
        Compute the mean $\mu$ and log-variance $\log(\sigma^2)$ of the encoder output for each
        sample series in $X_{batch}$ using $Q_\phi$;
        Sample latent variables $z$ from the reparameterized Gaussian distribution;
        Compute the reconstruction loss;
        Compute the KL divergence loss;
        Compute the total loss;
        Update $\phi$ and $\theta$ using gradient descent with respect to $\mathcal{L}_{total}$;
return Trained VAE model $Q_\phi$ and $P_\theta$;
The VQ-VAE was introduced by [11] and is an adaptation of the VAE model. In contrast to
VAE, VQ-VAE incorporates vector quantisation to learn a discrete latent representation of
data instead of a continuous one. Additionally, the prior is learned rather than static. This
approach might be a more natural choice for representation learning of discrete time series
such as the log returns. The final model generation involves two stages. Firstly, the encoder,
decoder, and codebook are trained on a given dataset. Subsequently, in the second stage, a
discrete prior distribution is learned from the discrete latent space. However, in this study,
a straightforward autoregressive sampling approach over the discrete latent space proved
sufficient to generate high-quality financial time series.
Following the methodology in the original paper, once the input $x$ passes to the encoder
producing $z_e(x)$, the discrete latent variables $z$ are learned by identifying the nearest
embeddings within the latent embedding space or codebook. This codebook comprises $K$
embeddings, each of dimensionality $D$ (as indicated in Equation 4.5). The posterior
categorical distribution $q(z|x)$ is learned by finding the closest embeddings. Figure 4.2
illustrates the quantisation process.
Consequently, vector quantisation effectively discretises the continuous latent space outputted
by the encoder by selecting the closest embeddings $e_k$ from Equation 4.6. The authors [11]
note that, as an initial experimental approach, during the backpropagation process, copying
gradients from the decoder input $z_q(x)$ to the encoder output $z_e(x)$ serves as a reasonable
gradient approximation. Figure 4.3 compares side-by-side the encoder output and its
quantisation results for a given sample after passing through the VQ-VAE Convolutional model.
$$q(z = k|x) = \begin{cases} 1 & \text{for } k = \operatorname{argmin}_j ||z_e(x) - e_j||_2 \\ 0 & \text{otherwise} \end{cases} \qquad (4.5)$$
Fig. 4.2. VQ-VAE architecture and embedding space.
Fig. 4.3. Encoder output and quantised representation from the VQ-VAE Convolutional model.
Equation 4.7 shows the objective function to minimise. The first term represents the
reconstruction loss, as explained in Equation 4.3. The second term quantifies the $l_2$ norm
between the output of the frozen encoder, denoted as $z_e(x)$, and the embeddings $e$. This term
encourages the embeddings to be close to the encoder output. The notation $sg$ corresponds
to the stop-gradient operator. The last term, the commitment loss, helps the embeddings to
train as fast as the encoder parameters. Algorithm 2 presents the complete VQ-VAE algorithm;
for a comprehensive understanding, the reader is referred to the original paper [11].
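The nearest-embedding lookup and the two embedding-related loss terms can be sketched on a toy codebook as follows. The helper names, the three two-dimensional embeddings, and the weight `beta = 0.25` are illustrative assumptions and are not taken from this study. The sketch computes forward values only, so the stop-gradient operator appears only as a comment.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantise(z_e, codebook):
    """Map each encoder vector to the index and value of its nearest embedding."""
    indices = [min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
               for v in z_e]
    return indices, [codebook[k] for k in indices]

def vq_terms(z_e, z_q, beta=0.25):
    """Forward values of the codebook term and the weighted commitment term.

    The codebook term ||sg[z_e] - e||^2 and the commitment term ||z_e - sg[e]||^2
    have the same forward value; the stop-gradient only decides which side
    (embeddings or encoder) receives the gradient during backpropagation.
    """
    dist = sum(sq_dist(a, b) for a, b in zip(z_e, z_q)) / len(z_e)
    return dist, beta * dist

codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]   # K = 3 embeddings, D = 2
z_e = [[0.1, 0.0], [0.9, 1.1], [-1.2, -0.8]]        # toy encoder outputs
indices, z_q = quantise(z_e, codebook)
codebook_loss, commitment_loss = vq_terms(z_e, z_q)
```

Here each toy encoder vector snaps to a different embedding, and the index sequence returned by `quantise` is the discrete representation from which the generation stage samples.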
Finally, once the network and embeddings are optimised, the generation process typically
involves learning a categorical prior distribution over the discrete latent variable 𝑝(𝑧). However,
in this paper, a simpler approach is adopted. Once the parameters are fitted, and the dataset is
quantised, the embedding occurrences of each position within the quantisation representation
across the entire dataset are counted. Subsequently, these counts serve as the probabilities for
randomly sampling embeddings, filling each position autoregressively with a lag of 1.
This autoregressive sampling procedure continues until a sequence of embeddings of the same
dimension as the encoder output is obtained. The new quantisation sequence is then decoded
with the learned weights to generate a new time series, thus completing the generation process.
It is important to highlight that within batches produced by the VQ-VAE Convolutional and
Dense Layers models, there were no instances of sampled embedding sequences that duplicated
the quantised sequences from the dataset, indicating that the generated series are not mere
reconstructions.
Algorithm 2: Vector Quantised Variational Autoencoder (VQ-VAE) Training
Input: Training data $X$, $Codebook$ with dimensions $K \times D$, learning rate $\alpha$, batch size $B$, number of epochs $E$
Output: Trained VQ-VAE model
1 Initialize the encoder neural network $Q_\phi$ and decoder neural network $P_\theta$;
2 Initialize the parameters $\phi$ and $\theta$ randomly;
3 Initialize an Adam optimizer for $\phi$ and $\theta$;
4 for $epoch \leftarrow 1$ to $E$ do
5     Shuffle the order of the batches and of the samples inside each batch;
6     for each batch $X_{batch}$ do
7         Compute the encoder output $z_e(x)$ for each sample series in $X_{batch}$ using $Q_\phi$;
8         Compute the quantised representation $z_q(x) = e_k$, where $k = \arg\min_j \|z_e(x) - e_j\|_2$, for each sample series in $X_{batch}$;
9         Compute the total loss following Equation 4.7;
10        Update $\phi$, $\theta$, and $Codebook$ using gradient descent with respect to $L_{total}$;
11 return Trained VQ-VAE model $Q_\phi$, $P_\theta$, and $Codebook$;

For example, in the VQ-VAE Convolutional model, the embedding dictionary comprises 128 embeddings, each of dimension 16. After the dataset is quantised, embedding 70 occupies the first position 4,299 times among the 34,286 series. In simpler terms, embedding 70 has a probability of approximately 12.5% of being in the first position of a quantised sequence. Filtering the quantised dataset to the instances where embedding 70 occupies the first position, the most probable embeddings in the following four positions are 70, 118, 35, and 65.
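The position-wise sampling described above can be sketched in a few lines. This is a minimal illustration under one reading of the text, in which each position is drawn conditioned on its position and on the embedding drawn for the previous position (lag 1). The function names and the toy data are hypothetical and are not taken from the thesis repository.

```python
import random
from collections import Counter, defaultdict


def fit_counts(sequences):
    """Count first-position indices and position-wise lag-1 transitions."""
    first = Counter(seq[0] for seq in sequences)
    # (position, index at previous position) -> counts of the index at this position
    transitions = defaultdict(Counter)
    for seq in sequences:
        for pos in range(1, len(seq)):
            transitions[(pos, seq[pos - 1])][seq[pos]] += 1
    return first, transitions


def draw(counter, rng):
    """Sample one key with probability proportional to its count."""
    keys = list(counter)
    return rng.choices(keys, weights=[counter[k] for k in keys])[0]


def sample_sequence(first, transitions, length, rng):
    """Fill each position conditioned on the index drawn for the previous one."""
    seq = [draw(first, rng)]
    for pos in range(1, length):
        seq.append(draw(transitions[(pos, seq[-1])], rng))
    return seq


# Toy quantised dataset (assumed): three series of four embedding indices each.
quantised = [[70, 70, 118, 35], [70, 118, 35, 65], [5, 70, 118, 35]]
first, transitions = fit_counts(quantised)
new_sequence = sample_sequence(first, transitions, length=4, rng=random.Random(0))
print(new_sequence)
```

The sampled index sequence would then be mapped to its embeddings and passed to the trained decoder. Because every sampled index was observed at its position in the data, a follow-up transition always exists as long as all quantised sequences share the same length.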
This strategy was adopted after carefully examining the quantised dataset produced by the VQ-VAE. The analysis revealed that specific embeddings tend to occupy particular positions within the quantised sequences. As a consequence, when a randomly generated embedding sequence is decoded, the resulting data often does not resemble a typical financial time series. Instead, it exhibits numerous abrupt movements, similar to the PIXELCNN and LSTM generation in Figure 6.9.
Finding the underlying data-generating process is not a trivial task. After limiting the types of layers tested to dense and convolutional, and choosing the ReLU activation function between layers and the Adam optimiser for all models, the next step is to define model selection criteria suited to the generation task. As a key selection criterion, models capable of generating diverse series exhibited a lower reconstruction error and reached a stable loss convergence (Figures 4.5 and 4.4). Smaller learning rates, such as 0.00001, ensure more stable convergence. In contrast, when models collapse, they generate flat lines (Figure 4.7), while others merely alter the level of an indexed time series (Figure 4.8). Thus, visual inspection played a pivotal role in model selection, comparing the original time series with the model reconstruction and generation (Figure 4.6).
For each family of generative models, VAE-based and VQ-VAE-based, a range of model configurations, such as the number of layers, neurons, and filters, and of hyperparameter settings was tested. Training times were relatively short because the dataset is small: it consists of 79 batches, each with only 434 companies observed across 256 days. The short training times made it possible to iterate efficiently through various model configurations, facilitating the identification of the most effective generative models for the financial dataset.
Fig. 4.4. Training Losses VQ-VAE Convolutional model.
Starting from the vanilla VAE architectures as the experimental baseline, two final models are presented. The first, VAE-Dense Layers 1, is proposed in this study, while VAE-Dense Layers 2 is similar to the VAE-MLP from the referenced paper [10]. The latter failed to capture any essential stylised facts. The dataset size is a possible reason, as it spans a decade and comprises only 434 companies, in contrast to the 1506 companies over 20 years in the source paper. Consequently, in this study, the most effective VAE models tended to have fewer layers and parameters. For all the VAE models, $\alpha$ from Equation 4.4 is set to 0.5, giving equal importance to the reconstruction and generation tasks. In the architecture descriptions, $L_w$ represents the 256-length window, while $LS$ abbreviates the latent space dimension. The numbers in between represent the number of neurons in each dense layer.
Convolutional VAE
As with VAE-Dense Layers 2, the VAE-FC from [10] was replicated, but it collapsed to a flat-line reconstruction. Thus, VAE Convolutional 1 and 2 are proposed. The numbers in between are the number of filters. For all models, the kernel size, stride, and padding of each convolutional layer were 3, 2, and 1, respectively. The Unflatten layer only adds an extra dimension to the input so that it can be processed by the convolutional layer. In addition, a dense layer with 256 units, equal to $L_w$, was added to prevent the model from collapsing.
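To make the effect of the stated kernel size, stride, and padding concrete, the sketch below applies the standard output-length formula for a one-dimensional convolution (dilation of 1 assumed). The helper name is hypothetical. With these settings each convolutional layer halves the sequence length.

```python
def conv1d_output_length(length, kernel_size=3, stride=2, padding=1):
    """Output length of a 1-D convolution with dilation 1."""
    return (length + 2 * padding - kernel_size) // stride + 1


window = 256  # L_w, the length of one return window
after_one_layer = conv1d_output_length(window)
after_two_layers = conv1d_output_length(after_one_layer)
print(after_one_layer, after_two_layers)  # 128 64
```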
Fig. 4.5. Time series reconstruction from VQ-VAE Convolutional model.
Fig. 4.7. Example of a bad reconstruction by a VAE model.
• VAE-Convolutional 1 (VAE-Conv1): $L_w$ - 16 - flatten layer - ($LS$: 32) - unflatten layer - 16 - dense layer - $L_w$
VQ VAE
Next, to create the VQ-VAE models, vector quantisation was introduced to replace the conventional continuous latent space of the base VAE models shown previously. Initially, a one-dimensional embedding dictionary $ED$ was tested. For example, the proposed VQ-VAE Dense Layers model, which shows the best visual generation results, has a dictionary of size 256x1. The $\beta$ parameter that weights the commitment cost in Equation 4.7 is set to 0.25 for both VQ models, following [11]. The numbers in between represent the number of neurons in the dense layer.
For the Convolutional architecture, and similarly to the VAE convolutional models, the encoder output was flattened before quantisation so that it could be represented by embeddings from one-dimensional dictionaries. However, it became apparent that these latent space dimensions were inadequate to capture the various patterns inherent in financial time series processed by the convolutional layers. As a result, none of the previous VAE Convolutional base architectures, once transformed to VQ-VAE, produced generations as visually appealing as those of VQ-VAE Dense Layers. The final VQ-VAE Convolutional architecture, which showed superior performance, comes from IMG: it is a PyTorch adaptation of the original architecture [11], the Keras documentation example [34], and the DeepMind tutorial [35] for generating images with VQ-VAE.
Consequently, the high performance of the VQ-VAE Convolutional model, with its 128x16 embedding dictionary, shows that increasing the embedding dimension remarkably improves reconstruction and generation (Figures 4.5 and 4.6). Because the original architecture was designed for IMG, the dictionary size was then tuned for this dataset: the numbers of embeddings tested were 64, 128, and 256, and the embedding dimension was increased systematically in steps of two, starting at two. After each training session, the number of embeddings actually used when quantising the dataset was checked. For example, training with a dictionary size of 256x16 revealed that only 120 embeddings were used.
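The check on how many embeddings are actually used can be illustrated as follows. This is a simplified sketch in plain Python with hypothetical names and toy values. It assigns each encoder output vector to its nearest dictionary entry by Euclidean distance, as in the quantisation step of Algorithm 2, and counts the distinct entries selected.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: squared_distance(vector, codebook[k]))


def used_codes(encoder_outputs, codebook):
    """Set of codebook indices selected at least once."""
    return {nearest_code(z, codebook) for z in encoder_outputs}


# Toy dictionary with three 2-dimensional embeddings (assumed values).
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
encoder_outputs = [[0.1, 0.0], [0.9, 1.2], [0.2, -0.1]]
utilised = used_codes(encoder_outputs, codebook)
print(len(utilised), "of", len(codebook), "embeddings used")
```

A dictionary with many unused entries, such as the 256x16 case above, suggests that a smaller number of embeddings may be sufficient.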
Fig. 4.9. LSTM cell source: ’Understanding LSTM Networks – Colah’s Blog.’
In the final stage of the modelling exploration, the state-of-the-art TimeVAE base architecture [28] was employed for financial time series generation. This architecture consists of three convolutional layers in both the encoder and the decoder, with Rectified Linear Unit (ReLU) activation functions, and it concludes with a time-distributed, fully connected layer. It is important to note that this model was not fine-tuned specifically for the dataset in question. Instead, the hyperparameters from the original repository were used, including a latent dimension of 8∗. This approach ensures consistency with the original model configuration and enables a comparative evaluation of its performance on the dataset.
For the predictive accuracy test, multiple LSTM networks were trained. The Long Short-Term Memory (LSTM) architecture was a significant advance over traditional Recurrent Neural Networks (RNNs). Its key innovations were the introduction of a specialised memory cell [36] and a forget gate [37], so that the LSTM architecture comprises an input gate, forget gate, cell state, and output gate. This structure enables the LSTM to handle long data sequences that exhibit dependencies on preceding observations, making it particularly well suited to stock movement classification. The baseline architecture employed is an LSTM model featuring two layers with 128 hidden cell units, a dropout rate of 0.2, and a learning rate of 0.0001. The LSTM cell unit is shown in Figure 4.9† and described below. Parameters $W$ and $b$ denote the weight matrices and bias vectors.
∗ https://github.com/abudesai/timeVAE
† LSTM cell source: 'Understanding LSTM Networks – Colah's Blog', https://colah.github.io/posts/2015-08-Understanding-LSTMs/

• Forget gate: The forget gate selects which information to retain and which to discard. It evaluates the previous hidden state, denoted as $h_{t-1}$, and the current input, $x_t$, and applies a sigmoid function $\sigma$ to produce an output ranging from 0 (indicating information to forget) to 1 (indicating information to retain).
• Input Gate: The input gate, denoted as $i_t$, assesses the significance of the current input vector by computing a sigmoid function based on the past hidden state and the current input.

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{4.9}$$

• Output Gate: The output gate, labelled as $o_t$, employs the results from the sigmoid functions to identify which portions of the cell output should be exposed and passed on to the next time step.

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{4.10}$$

• Input Modulation Gate: This gate applies a hyperbolic tangent (tanh) activation function to determine how much information should be written to the cell.
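The gates listed above combine into a single cell update. The following sketch, written in plain Python for readability, follows the standard LSTM formulation from the cited blog post rather than the thesis code. The parameter names `W_f`, `W_c`, `b_f`, and `b_c` are assumptions that extend the notation of Equations 4.9 and 4.10, and the toy weights are illustrative only.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, b, v):
    """Compute W · v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM cell update; `p` maps (hypothetical) names to weights and biases."""
    concat = h_prev + x_t  # the concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(v) for v in affine(p["W_f"], p["b_f"], concat)]    # forget gate
    i_t = [sigmoid(v) for v in affine(p["W_i"], p["b_i"], concat)]    # input gate (4.9)
    o_t = [sigmoid(v) for v in affine(p["W_o"], p["b_o"], concat)]    # output gate (4.10)
    g_t = [math.tanh(v) for v in affine(p["W_c"], p["b_c"], concat)]  # input modulation gate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]  # new cell state
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                  # new hidden state
    return h_t, c_t


# Hidden size 1 and input size 1 with all weights zero: every sigmoid gate equals 0.5.
params = {name: [[0.0, 0.0]] for name in ("W_f", "W_i", "W_o", "W_c")}
params.update({name: [0.0] for name in ("b_f", "b_i", "b_o", "b_c")})
h_t, c_t = lstm_step([0.3], [0.1], [2.0], params)
print(h_t, c_t)
```

With zero weights the forget gate keeps half of the previous cell state and nothing new is written, which makes the role of each gate easy to verify by hand.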
CHAPTER 5
Generated Data Evaluation
Evaluating generated data remains an open challenge within the domains of natural language processing (NLP), IMG, and TSG. Judging the quality of generated content is intrinsically subjective, which makes it difficult to establish universally accepted evaluation metrics. For example, when presenting the VQ-VAE [11], the authors compared original and generated images, audio, and videos visually and aurally, but no metrics were used. Nevertheless, several metrics have been proposed for evaluation purposes, such as the Inception Score [38] and Fréchet Inception Distance [39] for IMG, as well as metrics like log-likelihood and Kullback-Leibler divergence [30] for density estimation. Even so, it is essential to recognise that, for the generation task, good performance on one metric does not imply good performance on another [40].
In the context of Financial Time Series Generation, studies have employed a multifaceted evaluation approach that includes qualitative analysis, quantitative analysis, and predictive accuracy testing. Qualitative evaluation mainly relies on visual inspection and on metrics regarding the stylised facts. To introduce quantitative analysis, [10] suggested applying information theory metrics. Additionally, predictive accuracy testing involves comparing models trained on real and synthetic data [19]. This study adopts an evaluation procedure similar to that proposed by [10], augmenting it with L1 and L2 norms to compare probability density functions between synthetic and real data, and with an assessment of the presence of a fat-tailed distribution through the power-law decay, mirroring the approach of [14]. For the different evaluations, each model generated a synthetic batch of 434 time series of length 256. These synthetic batches were compared with random real batches, allowing an aggregate evaluation rather than one focused on single generations.
• Central Moments: The first approach to evaluating the similarity between two distributions is to analyse their statistical central moments: mean, variance, skewness, and kurtosis. After calculating the moments of each series, the synthetic batches were compared visually with four randomly selected real batches through scatter plots, looking for similarity in where the two point clouds concentrate.
• Absence of autocorrelations: The autocorrelation function (ACF) calculates the correlation between a return time series and its lagged versions. The function at lag $k$ is defined as:

$$ACF(k) = \frac{\sum_{t=1}^{T-k} (r_t - \bar{r})(r_{t+k} - \bar{r})}{\sum_{t=1}^{T} (r_t - \bar{r})^2}$$

Where $r_t$ represents the value of the return time series at time $t$, $T$ is the total number of data points, $k$ is the lag, and $\bar{r}$ is the mean of the time series.
The autocorrelation at each lag is calculated for every series and summarised in a boxplot to check the linear unpredictability property in the batch. Figure 5.1 shows the autocorrelation plot for a random real batch across the first 24 lags, where no significant autocorrelation appears.
Fig. 5.1. Autocorrelation plot obtained on a real random batch.
$$P(r) \propto r^{-\alpha} \tag{5.1}$$

Typically, the exponent lies in the range $3 \leq \alpha \leq 5$. Over the 79 real batches, the exponent has an average of 4.33, a standard deviation of 0.5, and a median of 4.1.
• Cumulative sum: The cumulative sum of returns for a specific company should result in a non-monotonic curve exhibiting changing shapes over time. The cumulative sum of a company's returns up to day $i$ is calculated as $cumsum_i = \sum_{t=1}^{i} r_t$. Figure 5.3 shows the cumulative sum of four random real return series.
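The qualitative checks in this list can be computed per series with a few small helpers. The sketch below is illustrative only and uses hypothetical function names. It assumes population (biased) moments and non-excess kurtosis, which the text does not specify, and it implements the ACF exactly as defined above.

```python
from itertools import accumulate


def central_moments(r):
    """Mean, variance, skewness and (non-excess) kurtosis of one return series."""
    n = len(r)
    mean = sum(r) / n
    var = sum((x - mean) ** 2 for x in r) / n
    std = var ** 0.5
    skew = sum((x - mean) ** 3 for x in r) / (n * std ** 3)
    kurt = sum((x - mean) ** 4 for x in r) / (n * std ** 4)
    return mean, var, skew, kurt


def acf(r, k):
    """Autocorrelation at lag k, following the definition in the text."""
    n = len(r)
    mean = sum(r) / n
    numerator = sum((r[t] - mean) * (r[t + k] - mean) for t in range(n - k))
    denominator = sum((x - mean) ** 2 for x in r)
    return numerator / denominator


def cumulative_sum(r):
    """Running sum of returns."""
    return list(accumulate(r))


toy = [1.0, -1.0, 1.0, -1.0]  # toy series, not real returns
print(central_moments([1.0, 2.0, 3.0, 4.0, 5.0]))
print(acf(toy, 1))            # -0.75
print(cumulative_sum(toy))    # [1.0, 0.0, 1.0, 0.0]
```

Applying `acf` to the absolute values of a series, for example `acf([abs(x) for x in r], k)`, gives the absolute autocorrelation that is used to look for slowly decaying dependence in the magnitude of returns.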
Fig. 5.2. Autocorrelation plot obtained on real random batch.
Fig. 5.3. Cumulative sum plot obtained on four random real returns.
5.2 | Quantitative Analysis
Every metric is calculated for the synthetic batch with respect to each real batch, and the average over the real batches is reported.
Where $i$ indexes the outcomes in the probability distributions $P$ of the synthetic data and $Q$ of the real data.

$$JSD(P \,\|\, Q) = \frac{1}{2} KL(P \,\|\, M) + \frac{1}{2} KL(Q \,\|\, M)$$

Where $P$ and $Q$ are the two probability distributions, and $M$ is their midpoint, defined as $M = \frac{1}{2}(P + Q)$. The JSD ranges from 0 (when $P$ and $Q$ are identical) to a maximum of $\log 2$ (when $P$ and $Q$ have disjoint supports).
Where 𝐹sample (𝑥) is the ECDF of the sample, 𝐹reference (𝑥) is the CDF of the reference
distribution, and 𝑥 represents the data values. The KS test is often used to test whether
a sample follows a particular theoretical distribution, and the value of 𝐷 is compared to
critical values to determine the test’s significance.
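As an illustration of the statistic $D$, the sketch below computes the largest absolute gap between two empirical CDFs. It assumes a two-sample variant in which the reference CDF is approximated by the ECDF of a real sample, which may differ from the exact procedure used in this study. The function name is hypothetical, and no p-value is computed.

```python
from bisect import bisect_right


def ks_statistic(sample, reference):
    """Largest absolute difference between the ECDFs of two samples."""
    sample = sorted(sample)
    reference = sorted(reference)
    d = 0.0
    # The ECDFs are step functions, so the maximum gap occurs at a data point.
    for x in sample + reference:
        f_sample = bisect_right(sample, x) / len(sample)
        f_reference = bisect_right(reference, x) / len(reference)
        d = max(d, abs(f_sample - f_reference))
    return d


print(ks_statistic([1, 2, 3], [1, 2, 3]))  # 0.0
print(ks_statistic([1, 2, 3], [4, 5, 6]))  # 1.0
```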
$$EMD(P, Q) = \min \sum_{i,j} f_{ij} \cdot d_{ij}$$
Where 𝑃 and 𝑄 are the two probability distributions, 𝑓𝑖 𝑗 represents the amount of
probability mass to be moved from a point in 𝑃 to a point in 𝑄, and 𝑑𝑖 𝑗 is the cost of
moving mass from point 𝑖 in 𝑃 to point 𝑗 in 𝑄.
$$\|\mathbf{x}\|_1 = \sum_{i=1}^{n} |x_i|$$

The $L_2$ norm, also known as the Euclidean norm, calculates the square root of the sum of the squares of the vector.

$$\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$$
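For two discrete distributions defined on the same bins, the metrics of this section can be sketched as follows. This is a minimal illustration with hypothetical function names, not the implementation used for the reported results. It assumes natural logarithms, applies the norms to the difference between the two probability vectors, and uses the one-dimensional closed form of the Earth Mover's distance (the summed absolute difference of the cumulative distributions) with equally spaced bins.

```python
import math


def kl(p, q):
    """KL(P || Q); requires q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence using the midpoint distribution M."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def l1_distance(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def l2_distance(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def emd_1d(p, q, bin_width=1.0):
    """Earth Mover's distance between two histograms on the same equally spaced bins."""
    total, cumulative_gap = 0.0, 0.0
    for pi, qi in zip(p, q):
        cumulative_gap += pi - qi
        total += abs(cumulative_gap)
    return total * bin_width


p = [1.0, 0.0, 0.0]
q = [0.0, 0.0, 1.0]
print(jsd(p, q))          # log(2), the maximum for disjoint supports
print(l1_distance(p, q))  # 2.0
print(emd_1d(p, q))       # 2.0: all mass moves two bins
```

Unlike the KL divergence, the JSD, the norms, and the EMD remain finite when one distribution assigns zero probability to a bin.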
These synthetic batches were subsequently subjected to visual comparison with four randomly
selected real batches through scatter plots, focusing on each series’s first four distribution
moments. First, the underlying probability density functions were estimated through histogram
and kernel density estimation methods to measure the similarity between the synthetic and
real probability density distributions using Jensen-Shannon (JS) and Kullback-Leibler (KL)
divergence.
It is important to note that these divergence metrics assume that the reference distribution, in
this case, the real one, must be strictly positive over the support or domain of the synthetic
distribution. However, in practice, after fitting the density function using the histogram
approach, the bins between the mass and the extreme values were empty for all the synthetic
batches. This presence of empty bins introduced bias in the divergence calculation as it
assigned a zero probability to those values within the domain. Thus, outliers were initially
removed by determining the upper and lower boundaries using a range of three standard
deviations from the mean. Next, the common domain was established between the real and
synthetic data’s maximum and minimum values.
Specifically for the histogram approach, the tail probabilities were extended by replicating the values of the first and last bins, so that the density is defined for values outside the estimated range. It is important to acknowledge that this final adjustment at the distribution's tails may introduce bias into the results. Because of this potential bias, L1 and L2 norms were also calculated, wherein the likelihood for values outside the established domain was treated as zero.
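The preprocessing steps described above can be sketched as follows. This is a simplified illustration with hypothetical names. It covers the three-standard-deviation outlier filter and a histogram estimate over shared bin edges, and it leaves out the tail-bin replication. The domain bounds are passed in explicitly because the text does not fully specify how the common domain was derived.

```python
import statistics


def remove_outliers(values, n_std=3.0):
    """Keep values within n_std standard deviations of the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    lower, upper = mean - n_std * std, mean + n_std * std
    return [v for v in values if lower <= v <= upper]


def histogram_probabilities(values, lower, upper, bins):
    """Bin probabilities on [lower, upper]; values outside the domain are ignored."""
    width = (upper - lower) / bins
    counts = [0] * bins
    for v in values:
        if lower <= v <= upper:
            index = min(int((v - lower) / width), bins - 1)  # put the right edge in the last bin
            counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts]


data = [0.0] * 20 + [100.0]  # toy data with one extreme value
filtered = remove_outliers(data)
print(len(filtered))  # 20
print(histogram_probabilities([0.0, 0.1, 0.9, 1.0], 0.0, 1.0, bins=2))  # [0.5, 0.5]
```

Estimating the real and synthetic histograms on the same bin edges is what allows bin-by-bin divergences and norms to be computed afterwards.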
Fig. 5.4. Data set split for accuracy test
The last aspect of assessing the quality of synthetic batches, and a crucial validation step, involves a predictive accuracy test. The fundamental concept is to compare the performance of models on real test data after training on real data, on synthetic data, and on a combination of both (augmented data). To conduct this sanity test, the synthetic batches and a randomly selected real batch are temporally split into training, validation, and testing subsets, with respective proportions of 50%, 30%, and 20%, equivalent to 128, 78, and 51 days of the 256-day window. The objective is to predict, from the last 30 daily returns, whether the next day's return will be greater than the current day's. This setup turns the test into a binary classification problem.
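The construction of classification samples from one return series can be sketched as below. The helper name and the toy data are hypothetical, and the exact windowing used in this study may differ slightly. Each sample pairs the previous 30 returns with a binary label that indicates whether the next return exceeds the most recent one.

```python
def make_samples(returns, lookback=30):
    """Build (window, label) pairs: label is 1 if the next return exceeds the latest one."""
    features, labels = [], []
    for t in range(lookback, len(returns)):
        features.append(returns[t - lookback:t])               # last `lookback` returns
        labels.append(1 if returns[t] > returns[t - 1] else 0)  # next day vs. current day
    return features, labels


toy_series = [0.001 * day for day in range(51)]  # strictly increasing toy returns
X, y = make_samples(toy_series)
print(len(X), len(X[0]), set(y))  # 21 30 {1}
```

Under this windowing, a 51-day test split yields 21 samples per series.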
Following the preprocessing of the sequences, the resulting splits contain 42,098 samples for the training dataset, 1,950 for validation, and 9,114 for testing, as depicted in Figure 5.4. After preprocessing, the labels 0 and 1 are balanced in the three real data sets. The baseline LSTM model described in Section 4.3 is applied to all real, synthetic, and augmented datasets without hyperparameter tuning. This approach ensures consistency when comparing performance across datasets. The best model for each training scenario is the one that achieves the lowest cross-entropy (Equation 5.2) on the validation set during training. The evaluation then reports each scenario's accuracy score (Equation 5.3).
$$L(y, \hat{y}) = -\sum_{i=1}^{2} y_i \log \hat{y}_i \tag{5.2}$$

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{5.3}$$
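Where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.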
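Equations 5.2 and 5.3 translate directly into code. The sketch below uses hypothetical function names and toy values. It assumes one-hot labels over the two classes for the cross-entropy and hard 0/1 predictions for the accuracy.

```python
import math


def cross_entropy(y_true, y_prob):
    """Equation 5.2 for one sample: y_true is one-hot, y_prob holds class probabilities."""
    return -sum(y * math.log(p) for y, p in zip(y_true, y_prob) if y > 0)


def accuracy(labels, predictions):
    """Equation 5.3 computed from the confusion-matrix counts."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return (tp + tn) / (tp + tn + fp + fn)


print(cross_entropy([0, 1], [0.2, 0.8]))       # -log(0.8)
print(accuracy([1, 0, 1, 0], [1, 0, 0, 0]))    # 0.75
```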
CHAPTER 6
Results and Discussion
Among the seven models, five successfully replicated the mean moment. However, only the VAE Dense Layers and VQ-VAE Convolutional models demonstrated similarity in variance compared to the real batches. Notably, the VQ-VAE Convolutional model exhibited the closest resemblance in skewness and kurtosis, as depicted in Figure 6.1. These findings were further validated by visualising real and synthetic probability density distributions, as shown in Figure 6.2. In contrast, the VAE Dense Layers 2 model (Figures A.1 to A.5) did not meet any of these metrics when compared with the four randomly chosen real batches (Appendix A.2).
Subsequently, the autocorrelation plot analysis revealed that, except for VAE Dense Layers 2, there was no significant linear autocorrelation from lag 1 to 25 within any of the synthetic batches. Figure 6.3 illustrates the boxplot representation of the autocorrelation for each lag in the VQ-VAE Convolutional synthetic series. Similarly, when examining the presence of volatility clustering, only the VQ-VAE Convolutional model displayed a slow decay in the autocorrelation function of absolute returns over time lags, indicative of long-range temporal dependence (Figure 6.4). This observation remained consistent when considering the square of returns. The autocorrelation and absolute autocorrelation plots for the VAE Dense Layers 2 model are shown in Figures A.6 and A.7.
Furthermore, to assess the fat-tailed property, the $\alpha$ exponent corresponding to the power-law decay at the tails of the density function is obtained. Empirically, this exponent ranges from 3 to 5. Among the 79 real batches, the average $\alpha$ value was 4.3. Remarkably, the VQ-VAE Convolutional generation fell within this range, as shown in Table 6.1a. It is also worth noting that all generated batches exhibit non-monotonic curves when plotting the cumulative sum of the returns for four randomly selected synthetic samples, as illustrated in Figure 6.5. Table 6.2 summarises the qualitative findings. Finally, the appendix presents the same qualitative metrics for the VAE Dense Layers 2 model as an example of a bad generative model. Although this model passed the model selection criteria and generated diverse time series, its generated series do not exhibit temporal structure and are far from the real probability distribution.
Fig. 6.1. Distribution moments of synthetic samples generated by VQ-VAE Convolutional
model.
Table 6.1
Alpha exponents of the power law for heavy-tailed distributions
(a) For financial returns, the exponent typically lies in the range 3 ≤ 𝛼 ≤ 5. The real batches present an average 𝛼 of 4.33, with a standard deviation of 0.5 and a median of 4.1.
Table 6.2
Summary of the qualitative results of the generative models
Fig. 6.2. Probability density function of synthetic samples generated by VQ-VAE Convolutional
model.
Fig. 6.3. Autocorrelation plot obtained on a batch of synthetic samples generated by VQ-VAE
Convolutional model.
Fig. 6.4. Absolute autocorrelation plot obtained on a batch of synthetic samples generated
by VQ-VAE Convolutional model.
Fig. 6.5. Cumulative sum plot obtained on four synthetic samples generated by VQ-VAE
Convolutional model.
6.2 | Quantitative Results
For a robust quantitative analysis, the metrics reported in Table 6.3 represent the resulting
means obtained when comparing the synthetic batches with each of the 79 real batches. First,
the underlying probability density functions were estimated through histogram and kernel
density estimation methods to measure the similarity between the synthetic and real probability
density distributions using JS and KL divergence.
Table 6.3 provides an overview of the divergence metrics, showing that, except for the VQ-VAE Convolutional model, they vary little between the histogram and kernel estimation approaches. However, Figure 6.2 and, more notably, Figure 6.6 after the removal of outliers highlight that the density function produced by the VQ-VAE Convolutional model is not as smooth as those of the data generated by the other models. This lack of smoothness can likely be attributed to the pronounced effects of the discrete latent space created by the vector quantisation, together with the choice of embedding dimension. It is worth noting that the VQ-VAE Dense Layers model, in contrast, exhibits a smooth density function but features a one-dimensional embedding dictionary. As a result, the disparities in the divergence results can be attributed to the greater smoothness achieved by kernel density estimation, as seen in Figure 6.7, in comparison to the histogram method, as depicted in Figure 6.8.
The remaining metrics, such as the Earth Mover's distance and the L1 and L2 norms, consistently indicate that the synthetic batch generated by the VAE Dense Layers 1 model closely approximates the probability distribution of the real data. However, this model does not capture certain stylised facts such as volatility clustering and heavy tails. In turn, the performance of the VQ-VAE Convolutional model closely aligns with that of the VAE Dense Layers 1 model in the quantitative metrics. In the Kolmogorov-Smirnov test, none of the models obtained a p-value greater than 0.05. In all cases, the null hypothesis was rejected, indicating that the generated samples do not follow the real distribution. However, it is important to recognise that this test is highly sensitive to sample size: with larger samples, it becomes more likely to detect even minor discrepancies between the sample data and the assumed distribution [42]. For example, in the qualitative probability density function comparison illustrated in Figure 6.2, the real batch encompassed extreme values ranging from -0.34 to 0.34, while the model with the widest range, the VAE Convolutional model, exhibited a range of -0.18 to 0.17. This sensitivity can make it challenging to pass the test.
The quantitative and qualitative performance contrast between the VQ VAE Convolutional
and VAE Dense Layers 1 model underscores a critical point in financial time series generation:
it is insufficient to solely generate time series data that, in the aggregate, closely resembles
the real distribution. Equally important is to evaluate the presence of temporal structures
mirroring those in real data. The probability distribution comparison metrics may only partially
capture the specific statistical properties of the financial time series.
Nevertheless, it is important to emphasise that these quantitative metrics are valuable tools for distinguishing between good and bad generative models. For instance, during the initial training stages, visual inspection revealed that the VAE Dense Layers 2 model consistently generated diverse time series. However, it failed to meet any stylised facts and consistently exhibited the worst quantitative metrics. This early differentiation, facilitated by the metrics, aids in identifying promising models and guiding the refinement of the generative process by setting a baseline threshold.
Fig. 6.6. Probability density function of synthetic samples generated by VQ-VAE Convolutional model after removing outliers.
TIME-VAE BASE   0.2552   0.2377   1868.4517   1906.2902   0.1656   0.0066   3742.9767   240.5911
VAE-CONV1       0.2075   0.1907   1313.7562   1447.2064   0.1293   0.0057   2969.7313   174.9255
VAE-CONV2       0.2727   0.2646   2132.299    1836.7015   0.1721   0.0107   3929.7425   193.0259
VAE-DL1         0.0683   0.07      201.4196    290.3397   0.0437   0.002     870.909     47.742
VAE-DL2         0.3266   0.3009   3237.5054   3597.4851   0.2202   0.0078   5065.0014   357.9087
VQ-VAE-CONV     0.0906   0.1756    232.5357    682.3586   0.0553   0.0019   2314.2570    69.7095
VQ-VAE-DL       0.1684   0.1546    858.4102    891.333    0.0989   0.0048   2250.7675   121.2894
Table 6.3
Summary of the quantitative results of the generative models
Table 6.4 provides an overview of the accuracy scores for each model on the real test set. The baseline accuracy score, obtained by exclusively training and testing on real data, is 0.7384. In most cases, training exclusively with synthetic data or with augmented data results in a modest improvement in accuracy. The highest improvement, 1.15% over the baseline, is achieved by training exclusively with the synthetic data from the VAE Convolutional 1 model. Yet this model ranks fourth in the quantitative analysis and has poor qualitative results. One possible reason is that the window of 30 days used for the prediction may be too short to capture temporal structures. For the VQ-VAE Convolutional model, the results with synthetic and augmented data are close to the best result, with improvements of 0.84% and 0.98% over the baseline, respectively. It is worth noting that this test serves as a basic benchmark, as no alternative model options were explored and the LSTM models were not extensively fine-tuned. Nevertheless, the observed improvement supports the notion that synthetic series enhance the ability of machine learning models to generalise effectively on testing data, highlighting the added value provided by generative models and synthetic data.
Fig. 6.7. Probability and cumulative density functions by Kernel estimations of synthetic samples generated by VQ-VAE Convolutional model after removing outliers.
Fig. 6.8. Probability and cumulative density functions by the histogram of synthetic samples generated by the VQ-VAE Convolutional model after removing outliers.
Table 6.4
Baseline accuracy test, training on real data: 0.7384
6.4 | Findings
• Generative Variational Autoencoder architectures for financial time series face issues such as collapsing to a single point or generating repetitive series with different indices.
• In this context, some of the most effective VAE models have fewer layers, emphasising
that there is no optimal architecture for financial time series generation due to the
dependence on training data characteristics.
• Metrics for comparing probability distributions play an essential role in identifying
promising generative models, although they do not substitute for the qualitative analysis
of stylised facts.
Advancements in Prediction:
• Training exclusively with synthetic data from the VAE Convolutional 1 model improved predictive accuracy by a slight but noticeable 1.15% compared to training exclusively with real data.
• Training with synthetic data and with augmented data generated by the VQ-VAE Convolutional model improved the predictive accuracy by a slight but noticeable 0.84% and 0.98%, respectively, compared to training exclusively with real data.
• Both training with synthetic data and augmenting real data with synthetic data produced results comparable to training solely with real data, showing the potential of these models in financial time series generation and predictive modelling.
6.5 | Open challenges
Establishing a rigorous yet simple standardised evaluation for the generation of financial return time series is necessary for the future advancement of the field. On the one hand, probability distribution comparison metrics can contribute partially to this objective, while on the other hand, qualitative analysis can be enriched by incorporating metrics tailored to each stylised fact. Furthermore, using traders' experiential knowledge to evaluate the generated series holds potential, although it requires specialised surveys within the field, which may be expensive to implement. A standardised evaluation would facilitate the eventual development of a joint optimisation framework for balancing reconstruction and generation, achieved through the design of a dedicated cost function. Notably, all these open challenges are shared with other generative domains, such as image and text generation. It is worth mentioning that, as the VQ-VAE architecture discretises the latent space, it provides stability for the hypothetical "generation loss" in the envisioned framework, an attribute not present in the traditional VAE.
Among the other ideas explored, several models were tested for the generation phase of the VQ-VAE-based models. These included PIXELCNN [45], the Neural Autoregressive Distribution Estimator [11], and an LSTM network. In each case, the training dataset is the quantised representation of the complete dataset, and the predictions are new representations to pass to the fitted decoder.
However, none of these models generated series visually similar to real return time series. As illustrated in Figure 6.9, the series generated by PIXELCNN displayed patterns, such as abrupt movements, that differ significantly from those typical of a normal return time series, and showed no volatility clustering. In the case of the LSTM, the aim was to predict the next embedding based on the previous 32 embeddings. After training the model, the recurrent prediction generated a new embedding order, or quantisation. This exercise provided insight into the importance of how the embeddings were ordered and learned by the VQ-VAE, leading to the adoption of the simple autoregressive sampling strategy.
Furthermore, the state-of-the-art models TimeGAN [27] and the interpretable version of TimeVAE [28] were fitted to this dataset. However, training collapsed, producing a horizontal line. As with the TimeVAE base version, no changes were made to the original hyperparameters or to the architecture of any of these models.
Fig. 6.9. Sample generated by PIXELCNN over the VQ-VAE Convolutional model
CHAPTER 7
Conclusion and Further Work
In conclusion, this research applies the VQ-VAE architecture to generate realistic financial time series that reproduce well-known stylised facts. Combined with simple autoregressive sampling over the learned latent space, the final architecture generated series that satisfy the qualitative evaluation of these properties and present an aggregate distribution similar to that of the real data. Adopting higher-dimensional embeddings in the VQ-VAE Convolutional model significantly improved reconstruction and generation, yielding strong qualitative and quantitative performance across multiple metrics compared to the VAE baseline models. Moreover, the predictive accuracy test shows a modest improvement over the baseline of 0.84% and 0.98% after training with synthetic and augmented data, respectively, a promising result that encourages further research on this architecture in the field. As a general conclusion, metrics for comparing probability distributions are essential in identifying promising financial generative models, although they do not substitute for the qualitative analysis of stylised facts.
However, it is important to acknowledge several limitations of this study and directions for
further work. First, the presence of cross-correlation in the batch formation raises the need to
evaluate this factor in the generated data. This aspect has not yet been explored, and further
research is required to assess its presence between generated series. Second, as a stress test
of the proposed architecture, a post-generation evaluation can be conducted by training
with different volatility cohorts of the S&P 500 dataset. The test would help to understand
the model’s capacity to learn and adapt to high volatility in data.
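One simple way to start quantifying the cross-correlation mentioned above is the mean absolute pairwise Pearson correlation within a batch of series. The sketch below is an assumption about how such a check could look; the function name and the `(n_series, length)` batch layout are hypothetical and not taken from the project code.

```python
import numpy as np


def mean_abs_cross_correlation(batch):
    """Mean absolute Pearson correlation over all distinct pairs of series.

    `batch` is assumed to have shape (n_series, length), one series per row.
    """
    batch = np.asarray(batch, dtype=float)
    corr = np.corrcoef(batch)                      # (n_series, n_series) matrix
    upper = np.triu_indices_from(corr, k=1)        # distinct pairs only
    return float(np.mean(np.abs(corr[upper])))
```

Computing this value on a real batch and on a generated batch of the same size gives a first indication of whether the generator preserves, loses, or exaggerates the dependence between series.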
Third, this study explored a limited number of architecture configurations and layers. A natural
extension would incorporate recurrent, temporal convolutional, attention, and transformer
layers in the encoder-decoder structures. Additionally, testing other VQ-based alternatives,
such as ViT-VQGAN [12] and VQ-GAN [13], is worth considering. Furthermore, it is important
to note that the state-of-the-art time series generation models were not
fine-tuned or adapted to the dataset, which limits the scope of comparison.
Fourth, this study focused on replicating the main stylised facts associated with financial
returns. However, the generated data was not evaluated against other
financial properties, such as the leverage effect, coarse-fine volatility correlation, gain/loss
asymmetry and trend ratios. As mentioned before, the field lacks a standardised evaluation
framework that assesses all properties, and filling this gap would contribute
to its advancement. In addition, a more rigorous assessment of the predictive
accuracy test, where multiple models are fine-tuned and trained on the real data to be further
applied to the generated data, would provide stronger evidence regarding the enhancement of
machine learning models through the incorporation of synthetic data.
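Of the additional properties listed above, the leverage effect is straightforward to measure. The sketch below uses one common simplified definition, the correlation between the return at time t and the squared return at time t + k, which is an assumption of this example and not a method used in this study. Negative values at small lags indicate that negative returns tend to be followed by higher volatility.

```python
import numpy as np


def leverage_effect(returns, max_lag=20):
    """Correlation between r_t and r_{t+k}^2 for k = 1..max_lag.

    Simplified, correlation-based definition chosen for illustration.
    """
    r = np.asarray(returns, dtype=float)
    out = np.empty(max_lag)
    for k in range(1, max_lag + 1):
        # Pair each return with the squared return k steps later.
        out[k - 1] = np.corrcoef(r[:-k], r[k:] ** 2)[0, 1]
    return out
```

Applying the function to each real and generated series and comparing the averaged curves would extend the qualitative evaluation to this property.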
Lastly, recent studies have employed representation learning techniques, applying
embeddings to segment stocks into industry sectors and to lower portfolio volatility [46],
[47]. Consequently, the discrete representations acquired through the VQ-VAE model have
further applications in financial portfolio and index construction. These learned embeddings,
which effectively represent stocks using VQ-VAE Convolutional, can be leveraged to compute
distances between stocks and facilitate the application of hierarchical clustering methods to
enhance hedging strategies and portfolio compositions. Furthermore, they can be utilised to
propose innovative indexes tailored to specific cohorts within the financial domain.
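The clustering idea above can be sketched as follows, assuming each stock has already been summarised as one embedding vector. How that vector is obtained from the VQ-VAE codes is left open here, and the function name, the cosine metric and the average linkage are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_stock_embeddings(embeddings, n_clusters=4, metric="cosine"):
    """Group stocks by hierarchical clustering of their embedding vectors.

    `embeddings` is assumed to have shape (n_stocks, embedding_dim).
    Returns one integer cluster label per stock.
    """
    emb = np.asarray(embeddings, dtype=float)
    distances = pdist(emb, metric=metric)          # condensed pairwise distances
    tree = linkage(distances, method="average")    # agglomerative merge tree
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

The resulting labels could then be compared with industry sectors, or used to pick stocks from different clusters when composing a diversified portfolio.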
Bibliography
[1] Bollerslev, T. “Generalized autoregressive conditional heteroskedasticity.” Journal of
Econometrics. Vol. 31. No. 3, 307–327. Apr. 1986.
[2] Engle, R. F. “Autoregressive Conditional Heteroscedasticity with Estimates of the
Variance of United Kingdom Inflation.” Econometrica. Vol. 50. No. 4, 987–1007. 1982.
JSTOR: 1912773.
[3] M. Casas & E. Cepeda, ARCH, GARCH and EGARCH Models: Applications to
Financial Series, SSRN Scholarly Paper, Rochester, NY, Aug. 2008.
[4] Challet, D., & Y.-C. Zhang. “Emergence of cooperation and organization in an
evolutionary game.” Physica A: Statistical Mechanics and its Applications. Vol. 246.
No. 3, 407–418. Dec. 1997.
[5] Lux, T., & M. Marchesi. “Scaling and criticality in a stochastic multi-agent model of a
financial market.” Nature. Vol. 397. No. 6719, 498–500. Feb. 1999.
[6] Chen, J.-J., B. Zheng & L. Tan. “Agent-based model with asymmetric trading and
herding for complex financial systems.” PloS One. Vol. 8. No. 11, e79531. 2013.
[7] Radford, A., L. Metz & S. Chintala, Unsupervised Representation Learning with Deep
Convolutional Generative Adversarial Networks. Jan. 2016. arXiv: 1511.06434 [cs].
[8] Esteban, C., S. L. Hyland & G. Rätsch, Real-valued (Medical) Time Series Generation
with Recurrent Conditional GANs. Dec. 2017. arXiv: 1706.02633 [cs, stat].
[9] Wiese, M., R. Knobloch, R. Korn & P. Kretschmer. “Quant GANs: Deep generation of
financial time series.” Quantitative Finance. Vol. 20. No. 9, 1419–1440. Sep. 2020.
[10] Dogariu, M., L.-D. Ştefan, B. A. Boteanu, C. Lamba, B. Kim & B. Ionescu. “Generation
of Realistic Synthetic Financial Time-series.” ACM Transactions on Multimedia
Computing, Communications, and Applications. Vol. 18. No. 4, 96:1–96:27. Mar. 2022.
[11] A. van den Oord, O. Vinyals & K. Kavukcuoglu, Neural Discrete Representation
Learning, May 2018. arXiv: 1711.00937 [cs].
[12] Yu, J., X. Li, J. Y. Koh, H. Zhang, R. Pang, J. Qin, A. Ku, Y. Xu, J. Baldridge &
Y. Wu, Vector-quantized Image Modeling with Improved VQGAN. Jun. 2022. arXiv:
2110.04627 [cs].
[13] Esser, P., R. Rombach & B. Ommer, Taming Transformers for High-Resolution Image
Synthesis. Jun. 2021. arXiv: 2012.09841 [cs].
[14] Takahashi, S., Y. Chen & K. Tanaka-Ishii. “Modeling financial time-series with generative
adversarial networks.” Physica A: Statistical Mechanics and its Applications. Vol. 527,
121261. Aug. 2019.
[15] R. Cont, “Empirical properties of asset returns: Stylized facts and statistical issues,”
Quantitative Finance.
[16] Chakraborti, A., I. M. Toke, M. Patriarca & F. Abergel. “Econophysics review: I.
Empirical facts.” Quantitative Finance. Vol. 11. No. 7, 991–1012. Jul. 2011.
[17] Fama, E. F. “Efficient Capital Markets: A Review of Theory and Empirical Work.” The
Journal of Finance. Vol. 25. No. 2, 383–417. 1970. JSTOR: 2325486.
[18] Assefa, S., Generating Synthetic Data in Finance: Opportunities, Challenges and Pitfalls.
SSRN Scholarly Paper. Rochester, NY, Jun. 2020.
[19] F. De Meer Pardo, “Enriching Financial Datasets with Generative Adversarial Networks,”
2019.
[20] Fu, W., A. Hirsa & J. Osterrieder. “Simulating financial time series using attention.”
2022.
[21] Dogariu, M., L.-D. Ştefan, B. A. Boteanu, C. Lamba & B. Ionescu, “Towards Realistic
Financial Time Series Generation via Generative Adversarial Learning.” 2021 29th
European Signal Processing Conference (EUSIPCO). Aug. 2021, 1341–1345.
[22] J. Li, X. Wang, Y. Lin, A. Sinha & M. P. Wellman, “Generating Realistic Stock
Market Order Streams,” Sep. 2018.
[23] Kondratyev, A., & C. Schwarz, The Market Generator. SSRN Scholarly Paper. Rochester,
NY, May 2019.
[24] Fu, R., J. Chen, S. Zeng, Y. Zhuang & A. Sudjianto, Time Series Simulation by
Conditional Generative Adversarial Net. Apr. 2019. arXiv: 1904.11419 [cs, eess,
stat].
[25] Ni, H., L. Szpruch, M. Sabate-Vidales, B. Xiao, M. Wiese & S. Liao, “Sig-wasserstein
GANs for time series generation.” Proceedings of the Second ACM International
Conference on AI in Finance. Ser. ICAIF ’21. New York, NY, USA, Association for Computing
Machinery, May 2022, 1–8.
[26] J. Paul, B.-S. Michael, M. Pedro, K. Shubham, S. N. Rajbir, F. Valentin, G. Jan &
J. Tim, “PSA-GAN: Progressive Self Attention GANs for Synthetic Time Series,” Aug.
2021.
[27] J. Yoon, D. Jarrett & M. van der Schaar, “Time-series Generative Adversarial Networks,”
in Advances in Neural Information Processing Systems, vol. 32, Curran Associates, Inc.,
2019.
[28] A. Desai, C. Freeman, Z. Wang & I. Beaver, TimeVAE: A Variational Auto-Encoder
for Multivariate Time Series Generation, Dec. 2021. arXiv: 2111.08095 [cs].
[29] Lee, D., S. Malacarne & E. Aune, Vector Quantized Time Series Generation with a
Bidirectional Prior Model. Apr. 2023. arXiv: 2303.04743 [cs, stat].
[30] Kullback, S., & R. A. Leibler. “On Information and Sufficiency.” The Annals of
Mathematical Statistics. Vol. 22. No. 1, 79–86. Mar. 1951.
[31] Kingma, D. P., & M. Welling. “An Introduction to Variational Autoencoders.”
Foundations and Trends® in Machine Learning. Vol. 12. No. 4, 307–392. 2019. arXiv:
1906.02691 [cs, stat].
[32] Kingma, D. P., & M. Welling, Auto-Encoding Variational Bayes. Dec. 2022. arXiv:
1312.6114 [cs, stat].
[33] Doersch, C., Tutorial on Variational Autoencoders. Jan. 2021. arXiv: 1606.05908 [cs,
stat].
[34] K. Team, Keras documentation: Vector-Quantized Variational Autoencoders, https://keras.io/examples/g
[35] Sonnet/sonnet/examples/vqvae example.ipynb at v1 · google-deepmind/sonnet, https://github.com/goo
deepmind/sonnet/blob/v1/sonnet/examples/vqvae example.ipynb.
[36] Hochreiter, S., & J. Schmidhuber. “Long Short-Term Memory.” Neural Computation.
Vol. 9. No. 8, 1735–1780. Nov. 1997.
[37] Gers, F. A., J. Schmidhuber & F. Cummins. “Learning to forget: Continual prediction
with LSTM.” Neural Computation. Vol. 12. No. 10, 2451–2471. Oct. 2000.
[38] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen & X. Chen,
“Improved Techniques for Training GANs,” in Advances in Neural Information Processing
Systems, vol. 29, Curran Associates, Inc., 2016.
[39] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler & S. Hochreiter, “GANs Trained
by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium,” in Advances
in Neural Information Processing Systems, vol. 30, Curran Associates, Inc., 2017.
[40] Theis, L., A. van den Oord & M. Bethge, A note on the evaluation of generative models.
Apr. 2016. arXiv: 1511.01844 [cs, stat].
[41] Divergence measures based on the Shannon entropy, IEEE Journals & Magazine,
IEEE Xplore, https://ieeexplore.ieee.org/document/61115.
[42] Massey, F. J. “The Kolmogorov-Smirnov Test for Goodness of Fit.” Journal of the
American Statistical Association. Vol. 46. No. 253, 68–78. Mar. 1951.
[43] Mathematical Methods of Organizing and Planning Production, Semantic Scholar,
https://www.semanticscholar.org/paper/Mathematical-Methods-of-Organizing-and-Planning-Kantorovich/b63665e46a1c8e2286e7f88418fac29e3db8858a.
[44] Horn, R. A., & C. R. Johnson, Matrix Analysis. Cambridge, England: Cambridge
University Press, 1990, ch. Norms for Vectors and Matrices.
[45] Oord, A. van den, N. Kalchbrenner & K. Kavukcuoglu, Pixel Recurrent Neural Networks.
Aug. 2016. arXiv: 1601.06759 [cs].
[46] Dolphin, R., B. Smyth & R. Dong. “Stock Embeddings: Representation Learning for
Financial Time Series.” Engineering Proceedings. Vol. 39. No. 1, 30. 2023.
[47] Sokolov, A., J. Mostovoy, B. Parker & L. Seco. “Neural Embeddings of Financial
Time-Series Data.” The Journal of Financial Data Science. Vol. 2. No. 4, 33–43. Oct.
2020.
APPENDIX A
Appendix
A.1 | Repository
The project repository (https://git.cs.bham.ac.uk/projects-2022-23/drz236) contains various
Jupyter Notebook files and directories related to the dissertation project. Below is a brief
description of each file and directory:
Files
1) data/download data/download.ipynb
Description: This Jupyter Notebook file downloads the S&P 500 stock data from Yahoo
Finance.
Output: data/download data/sp500 joined closes.csv
2) preprocess.ipynb
Description: This Jupyter Notebook file contains the data preprocessing
steps. It calculates the stocks’ returns and generates the batches fed to the models.
Input: data/download data/sp500 joined closes.csv
Output: master data/data.npy
3) VAE pytorch.ipynb
Description: This Jupyter Notebook file contains code and plots related to the Variational
Autoencoder models implemented using PyTorch. It explores the VAE architecture, training,
and generation.
Input: master data/data.npy
Outputs: • generated data/vae[MODEL NAME].npy
• models/vae[MODEL NAME].pth
• plots/training/[MODEL NAME] *.png
4) VQ-VAE pytorch.ipynb
Description: This Jupyter Notebook file contains code and plots related to the Vector
Quantised Variational Autoencoder models implemented using PyTorch. It explores the
VQ-VAE architecture, training, and generation.
Input: master data/data.npy
Outputs: • generated data/vqvae[MODEL NAME].npy
• models/vqvae[MODEL NAME].pth
• plots/training/[MODEL NAME] *.png
5) qualitative evaluation.ipynb
Description: This Jupyter Notebook file contains code to calculate the alpha exponent of
fat-tailed distributions for the generated data.
6) quantitative evaluation.ipynb
Description: This Jupyter Notebook file contains code related to quantitative evaluation
metrics. It includes the code to compare the probability distributions between generated
and real data.
Inputs: • master data/data.npy
• generated data/[MODEL NAME].npy
Outputs: plots/results/*
Directories
generated data
master data
Description: This directory contains the preprocessed batches to train the models.
plots
Description: This directory contains the different plots shown in the report and more.
results
Please refer to the specific files and directories for more detailed information on each aspect
of the project. For questions or further assistance, please contact the author.
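The preprocessing step described in the file list (computing returns and cutting them into fixed-length windows) can be sketched as follows. This is a minimal illustration, not the code in preprocess.ipynb: the function names and the non-overlapping stride are assumptions, while the use of log returns and a window length of 256 follows the caption of Fig. 3.1.

```python
import numpy as np


def log_returns(prices):
    """Log returns along the time axis (axis 0) of a close-price array."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices), axis=0)


def make_windows(series, window=256, stride=256):
    """Cut a 1-D return series into windows of fixed length.

    stride == window gives non-overlapping windows (an assumption here);
    a smaller stride gives overlapping ones. Incomplete tails are dropped.
    """
    series = np.asarray(series, dtype=float)
    n = (len(series) - window) // stride + 1
    if n <= 0:
        return np.empty((0, window))
    return np.stack([series[i * stride : i * stride + window] for i in range(n)])
```

For example, a single stock with 1025 daily closes yields 1024 log returns and therefore four non-overlapping windows of length 256.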
A.2 | Example of a bad Generative Model
This section presents the qualitative plots for the VAE Dense Layer 2 model, the worst generative
model in this research. The plots show that the generated series fail to reproduce the
central moments and the autocorrelation behaviour of the real data.
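The quantities behind the plots in this section can be computed along the following lines. This is a hedged sketch, assuming batches of shape `(n_samples, length)`; the function names are hypothetical and the code is not taken from the evaluation notebooks.

```python
import numpy as np
from scipy.stats import kurtosis, skew


def central_moments(batch):
    """Per-sample mean, variance, skewness and excess kurtosis."""
    batch = np.asarray(batch, dtype=float)
    return {
        "mean": batch.mean(axis=1),
        "variance": batch.var(axis=1),
        "skewness": skew(batch, axis=1),
        "kurtosis": kurtosis(batch, axis=1),  # Fisher definition: normal -> 0
    }


def autocorrelation(x, max_lag=50):
    """Sample autocorrelation of a 1-D series for lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])
```

Calling `autocorrelation(np.abs(x))` gives the absolute-return autocorrelation used to inspect volatility clustering, while `autocorrelation(x)` on the raw returns is expected to stay close to zero for realistic series.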
Fig. A.1. Mean moment comparison between VAE Dense Layer 2 generation and four real
batches.
Fig. A.2. Variance moment comparison between VAE Dense Layer 2 generation and four real
batches.
Fig. A.3. Skewness moment comparison between VAE Dense Layer 2 generation and four
real batches.
Fig. A.4. Kurtosis moment comparison between VAE Dense Layer 2 generation and four real
batches.
Fig. A.5. Probability density function comparison between VAE Dense Layer 2 generation
and four real batches.
Fig. A.6. Autocorrelation plot obtained on a batch of synthetic samples generated by the VAE
Dense Layer 2 model.
Fig. A.7. Absolute autocorrelation plot obtained on a batch of synthetic samples generated
by the VAE Dense Layer 2 model.