Timeseries Augmentation and Model Selection
Context: While gradient boosting models like XGBoost serve as powerful baselines for tabular and time-series forecasting tasks such as the Enefit competition's, achieving a competitive edge often necessitates exploring techniques beyond standard approaches. This report
addresses the need for advanced strategies, specifically focusing on enhancing the
available training data through augmentation and synthetic generation, exploring
alternative and potentially more powerful modeling architectures, and leveraging
ensemble methods to boost predictive performance. The aim is to provide an
expert-level guide tailored to the nuances of the Enefit competition, offering
actionable strategies for participants seeking to improve their model accuracy.
Scope: This report delves into several key areas. It begins by defining time series data
augmentation, outlining its goals and inherent challenges. Common augmentation
techniques and generative models for creating synthetic data are then discussed,
followed by an evaluation of their applicability to the specific data structures within
the Enefit competition. Subsequently, the report explores advanced forecasting
models beyond XGBoost, including other Gradient Boosting Machines (LightGBM,
CatBoost), various deep learning architectures (LSTMs, GRUs, TCNs, Transformers),
and relevant statistical models. A comparative analysis assesses these models based
on criteria pertinent to the competition context. Strategies for combining model
predictions through ensembling and hybrid architectures are examined. Finally,
insights gleaned from general Kaggle competition practices and specific energy
forecasting challenges are presented, culminating in concrete recommendations for
tackling the Enefit competition.
II. Enhancing Training Data: Time Series Augmentation Strategies
A. Definition and Goals of Time Series Data Augmentation (DA)
Data Augmentation (DA) encompasses a collection of techniques designed to
artificially expand the size and enhance the diversity of a training dataset.8 This is
achieved either by creating modified copies of existing data samples or by generating
entirely new synthetic data points based on the original dataset.8 The primary
objectives of applying DA, particularly in the context of time series analysis, are
multi-faceted:
1. Expand Limited Datasets: Real-world time series datasets, especially in
specialized domains, can be scarce or expensive to acquire.12 DA provides a
cost-effective means to increase the volume of training data, which is often
crucial for training complex machine learning models effectively.8 Limited data
can lead to poor model generalization and overfitting.9
2. Improve Model Robustness and Generalization: By exposing models to a wider
variety of data instances during training, DA helps them learn more robust
representations and generalize better to unseen data.8 This involves making the
model less sensitive to minor variations, noise, or shifts that might occur in
real-world operational data.8
3. Reduce Overfitting: Overfitting occurs when a model learns the training data too
well, including its noise and specific idiosyncrasies, leading to poor performance
on new data. Increasing data diversity through augmentation acts as a regularizer,
mitigating the risk of overfitting.8
4. Address Data Imbalance: While less directly relevant to a regression task like Enefit's, DA techniques such as SMOTE are often used in classification to oversample minority classes, improving model performance on imbalanced datasets.10
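To make the "modified copies" family concrete, the sketch below applies three common transformation-based augmentations (jittering, scaling, window slicing) to a toy hourly profile using plain NumPy. The function names, parameter values, and data are illustrative choices, not a fixed library API.

```python
import numpy as np

rng = np.random.default_rng(42)

def jitter(x: np.ndarray, sigma: float = 0.03) -> np.ndarray:
    """Add small Gaussian noise to every time step."""
    return x + rng.normal(0.0, sigma * x.std(), size=x.shape)

def scale(x: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """Multiply the whole series by a random factor near 1."""
    return x * rng.normal(1.0, sigma)

def window_slice(x: np.ndarray, ratio: float = 0.9) -> np.ndarray:
    """Crop a random contiguous window and stretch it back to full length."""
    n = len(x)
    w = int(n * ratio)
    start = rng.integers(0, n - w + 1)
    window = x[start : start + w]
    return np.interp(np.linspace(0, w - 1, n), np.arange(w), window)

# Example: augment one day (24 hourly values) of a toy consumption profile.
day = np.sin(np.linspace(0, 2 * np.pi, 24)) + 1.5
augmented = [jitter(day), scale(day), window_slice(day)]
```

For energy data, transformations should respect physical constraints (e.g., solar production cannot go negative at night), so magnitudes like `sigma` need to be chosen conservatively.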
The power of generative models, especially conditional variants 34, lies in their ability
to learn the underlying data generation process, p(data | context). This contrasts with
basic augmentation techniques that merely transform existing samples. By learning
this conditional distribution, generative models can synthesize data for specific,
potentially rare or even unseen, contexts (like unusual weather combinations or new
prosumer characteristics).34 This capability is particularly valuable for improving model
robustness and generalization in forecasting competitions where future conditions
might differ from the training data. For the Enefit competition, this could mean
generating plausible energy profiles for weather scenarios or customer segments
underrepresented in the historical data. However, the increased complexity of these
models necessitates significant expertise, computational resources, and rigorous
evaluation to ensure the generated data is realistic and beneficial for the downstream
MAE task.35
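As a deliberately crude stand-in for a learned conditional generator, the sketch below approximates p(profile | temperature) by bucketing historical daily profiles into temperature terciles and resampling, with mild jitter, from the requested bucket. All data, bin counts, and names here are toy assumptions; a real conditional GAN, VAE, or diffusion model would learn this distribution rather than bucket it, but the interface — "give me plausible profiles for this context" — is the same.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy history: 200 days x 24 hours of consumption, plus a daily context value.
n_days = 200
profiles = rng.gamma(2.0, 1.0, size=(n_days, 24))   # hourly kWh, toy data
daily_temp = rng.normal(5.0, 8.0, size=n_days)      # daily mean temp, toy data

# "Learn" p(profile | context) the crude way: bucket days by temperature tercile.
bins = pd.qcut(daily_temp, q=3, labels=False)

def sample_profile(target_temp: float, n_samples: int = 5) -> np.ndarray:
    """Draw synthetic daily profiles for a requested temperature context
    by resampling (with jitter) from the matching historical bucket."""
    edges = np.quantile(daily_temp, [1 / 3, 2 / 3])
    bucket = int(np.searchsorted(edges, target_temp))
    candidates = profiles[bins == bucket]
    idx = rng.integers(0, len(candidates), size=n_samples)
    noise = rng.normal(1.0, 0.05, size=(n_samples, 24))  # mild multiplicative jitter
    return candidates[idx] * noise

synthetic = sample_profile(target_temp=-10.0)  # profiles for a cold-day scenario
```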
The landscape of time series forecasting models offers a spectrum of options. The choice involves navigating trade-offs between established robustness and speed (GBMs), explicit sequence modeling capabilities (RNNs, TCNs), potential for capturing very long-range patterns (Transformers), and interpretability or ease of use (statistical models). For a competition like Enefit, with its mix of sequential targets and rich tabular features, GBMs like LightGBM and CatBoost represent a highly effective starting point. Deep learning models, particularly TCNs and Transformers, offer avenues for potentially higher performance by directly learning complex temporal dynamics, but demand greater investment in tuning and computational resources. Statistical models primarily serve as valuable benchmarks or components in feature engineering pipelines rather than as standalone contenders for top performance.
The Enefit dataset's structure, featuring extensive exogenous information (weather, client details) alongside the target time series, naturally favors models adept at integrating diverse data types, a strength of GBMs.73 However, DL models can also effectively incorporate these exogenous inputs through appropriate architectural design.101 The success of patching techniques in recent Transformer models 99 underscores the potential importance of capturing long-range temporal context, which might give DL approaches an edge if implemented effectively. CatBoost's specialized handling of categorical features 84 directly addresses a key characteristic of the Enefit client and segment data. Ultimately, the prevalence of LightGBM in Kaggle successes 77 highlights that its practical balance of speed and performance often proves decisive, enabling the rapid iteration that is critical under competition constraints.
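A minimal LightGBM baseline in this spirit might look like the sketch below: an L1 (MAE) objective to match the competition metric, and pandas category dtype so LightGBM applies its native categorical handling. The frame, feature names, and hyperparameters are illustrative stand-ins for real Enefit features, not the competition pipeline.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb

rng = np.random.default_rng(1)
n = 5_000

# Toy frame mimicking the mix of lagged targets, weather, and client categoricals.
df = pd.DataFrame({
    "target_lag_48h": rng.gamma(2.0, 1.0, n),
    "temperature": rng.normal(5.0, 8.0, n),
    "hour": rng.integers(0, 24, n),
    "county": pd.Categorical(rng.integers(0, 16, n)),
    "product_type": pd.Categorical(rng.integers(0, 4, n)),
    "is_business": pd.Categorical(rng.integers(0, 2, n)),
})
y = df["target_lag_48h"] * 0.8 + df["temperature"] * 0.1 + rng.normal(0, 0.3, n)

# "l1" trains directly on MAE; category-dtype columns are picked up automatically
# by LightGBM's native categorical-split handling.
model = lgb.LGBMRegressor(objective="l1", n_estimators=500, learning_rate=0.05)
model.fit(df, y)
preds = model.predict(df.tail(100))
```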
(Note: The accompanying comparison table of suitability scores is omitted here; the scores were subjective estimates based on this analysis for the Enefit context, not measured benchmarks.)
A. Ensemble Methods
Ensemble learning combines predictions from several individual base models (often
diverse in their algorithms, hyperparameters, or training data subsets) to produce a
final prediction that is often more accurate and robust than any single base model.118
This is a cornerstone technique in competitive machine learning.79
1. Simple Averaging / Weighted Averaging: The most basic ensemble approach
involves simply averaging the predictions made by multiple base models for a
given input.119 For regression tasks like Enefit, this means averaging the predicted
target values. A refinement is weighted averaging, where predictions from models
deemed more reliable (e.g., based on their validation set performance) are given
higher weights in the final average.119 This is easy to implement and can provide a quick performance boost if the base models are reasonably accurate and diverse.123 (A combined code sketch of weighted averaging and stacking appears after this list.)
2. Blending: Blending is a more sophisticated approach that uses a meta-model to
learn how to best combine the predictions of base models.118 The process
typically involves:
○ Splitting the training data into a sub-training set and a hold-out validation set.
○ Training each base model on the sub-training set.
○ Making predictions with each trained base model on the validation set and
the test set.
○ Training a meta-model (e.g., linear regression, a simple neural network, or
another GBM) using the predictions on the validation set as input features
and the true validation set labels as the target.
○ Using the trained meta-model to combine the predictions made by the base
models on the test set to generate the final submission. Blending is simpler to
implement than stacking as it avoids complex cross-validation folds for
meta-feature generation.118
3. Stacking (Stacked Generalization): Stacking is conceptually similar to blending but uses cross-validation, rather than a single hold-out split, to generate the predictions used to train the meta-model.118 The process involves:
○ Splitting the training data into K folds.
○ For each fold k: Train each base model on the other K-1 folds and make
predictions on fold k.
○ Concatenate the out-of-fold predictions for each base model across all K
folds. These concatenated predictions form the input features for the
meta-model. The original training labels are the target for the meta-model.
○ Train the meta-model on these out-of-fold predictions.
○ Train each base model on the entire original training dataset.
○ Make predictions with these fully trained base models on the test set.
○ Use the trained meta-model to combine the base model predictions on the
test set for the final output. Stacking generally uses the training data more
efficiently than blending but is more complex and computationally intensive.118
Stacking can also involve multiple layers, where the outputs of one level of
meta-models become inputs for the next.118
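The sketch below illustrates both combination styles on toy data: a fixed-weight average of out-of-fold predictions, and a Ridge meta-model trained on the same out-of-fold matrix (stacking). Blending would differ only in building the meta-features from a single hold-out split instead of cross-validation folds. Models, weights, and data are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

X, y = make_regression(n_samples=2_000, n_features=20, noise=10.0, random_state=0)

base_models = {
    "gbm": GradientBoostingRegressor(random_state=0),
    "rf": RandomForestRegressor(n_estimators=200, random_state=0),
}

# Out-of-fold predictions: each training row is predicted by a model that never
# saw it, which is what keeps the meta-features honest (stacking). Blending
# would instead predict on one held-out split.
oof = np.column_stack([
    cross_val_predict(m, X, y, cv=5) for m in base_models.values()
])

# 1) Weighted average: weights could come from each model's validation MAE.
weights = np.array([0.6, 0.4])
blend_pred = oof @ weights

# 2) Stacking: a simple meta-model learns the combination from the OOF matrix.
meta = Ridge(alpha=1.0).fit(oof, y)

# Refit base models on all data; at test time, stack their predictions the same way.
for m in base_models.values():
    m.fit(X, y)
X_test = X[:5]  # stand-in for real test rows
test_meta_features = np.column_stack([m.predict(X_test) for m in base_models.values()])
final_pred = meta.predict(test_meta_features)
```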
● Considerations: The success of ensembling relies heavily on the diversity of the
base models. Combining models that make different types of errors is more
beneficial than combining highly correlated models. Diversity can be achieved by
using different algorithms (e.g., LGBM, CatBoost, TCN), different feature subsets,
different hyperparameters, or different training data samples (as in bagging).
Care must be taken to avoid overfitting the meta-model, especially with stacking
where the meta-features are derived from the training data itself. Ensembles
inevitably increase computational cost and complexity.118
● Enefit Relevance: Extremely high. Given the multifaceted nature of the Enefit
data (temporal patterns, weather influences, client characteristics, price signals),
different models might excel at capturing different aspects. An ensemble
combining a strong GBM (like LGBM or CatBoost, good with tabular features) with
a strong DL model (like a TCN or Transformer, potentially better at long temporal
dependencies) is a highly promising strategy.81 Blending or stacking are the
standard techniques to implement such combinations effectively.
B. Hybrid Models
Hybrid models differ from ensembles in that they integrate distinct architectural
components within a single model structure, aiming to leverage the complementary
strengths of different approaches internally.91
1. Concept: The goal is to create a unified model that benefits from the specific
capabilities of its constituent parts. Common strategies involve using one
component for feature extraction or data preprocessing and another for the core
prediction task 91, or combining statistical rigor with machine learning flexibility.105
2. Examples Relevant to Time Series:
○ Decomposition-Based Hybrids: A common approach involves first
decomposing the time series into simpler components like trend, seasonality,
and residuals using statistical methods (e.g., moving averages, STL
decomposition).67 Then, different models can be applied to each component –
perhaps a simple linear model for the trend, deterministic functions for
seasonality, and a complex ML/DL model (like LSTM or GBM) for the
harder-to-predict residual component.67 Models like N-BEATS and N-HiTS embody this philosophy. (A minimal decomposition-plus-GBM sketch follows this list.)
○ CNN-LSTM Models: These models use Convolutional Neural Network (CNN)
layers initially to act as feature extractors, identifying local patterns or spatial
correlations within windows of the sequence. The outputs of the CNN layers
are then fed into LSTM layers, which model the longer-term temporal
dependencies among the extracted features.91
○ LSTM-XGBoost Models: Various combinations exist. One approach uses LSTM
to process the sequential aspects of the data and generate hidden states or
preliminary predictions, which are then used as input features (along with
other static/exogenous features) for an XGBoost model that makes the final
prediction.92 Another approach replaces the final dense output layer of an
LSTM network with an XGBoost model, potentially leveraging XGBoost's
effectiveness on the final regression task.90
○ Transformer Hybrids: Transformers can be combined with other architectures.
For instance, convolutional layers might be used for initial patching or local
feature extraction before feeding into Transformer blocks 91, or RNN layers
might be integrated alongside attention mechanisms.81
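A minimal decomposition-based hybrid, assuming statsmodels' STL and LightGBM are available: STL strips out trend and daily seasonality, a GBM models the residual from simple lag features, and the components are recombined. The series, lags, and hyperparameters below are toy choices, and the recombination is in-sample for brevity.

```python
import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor
from statsmodels.tsa.seasonal import STL

rng = np.random.default_rng(2)

# Toy hourly series: slow trend + daily seasonality + noise.
idx = pd.date_range("2023-01-01", periods=24 * 90, freq="h")
hours = np.arange(len(idx))
y = pd.Series(
    0.001 * hours + 2 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.3, len(idx)),
    index=idx,
)

# 1) Split the series into trend, daily seasonal, and residual components.
stl = STL(y, period=24).fit()

# 2) Model only the hard part (the residual) with a GBM on simple lag features.
resid = stl.resid
lags = pd.concat({f"lag_{k}": resid.shift(k) for k in (1, 2, 24)}, axis=1).dropna()
target = resid.loc[lags.index]
gbm = LGBMRegressor(objective="l1", n_estimators=300).fit(lags, target)

# 3) Recombine: prediction = trend + seasonal + GBM residual estimate.
resid_hat = gbm.predict(lags)
fitted = stl.trend.loc[lags.index] + stl.seasonal.loc[lags.index] + resid_hat
```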
● Enefit Relevance: Moderate to High. Hybrid approaches offer tailored solutions.
For Enefit, decomposing the energy series first might help isolate predictable
seasonal patterns from more volatile weather-driven components, allowing
specialized models for each. An LSTM-XGBoost architecture 92 could potentially
leverage LSTM's sequence handling for lagged target/weather features while
using XGBoost's power to integrate all available static client information and price
data effectively for the final prediction.
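A sketch of the first LSTM-XGBoost pattern described above, under toy assumptions (random data, and an untrained LSTM encoder standing in for one that would first be trained with a forecasting head and an L1 loss): the LSTM compresses each history window into a hidden state, which is concatenated with static features and passed to XGBoost with an absolute-error objective.

```python
import numpy as np
import torch
import torch.nn as nn
from xgboost import XGBRegressor

torch.manual_seed(0)
rng = np.random.default_rng(3)

# Toy data: 1,000 samples, each with a 48-step history and 5 static features.
n, seq_len, n_static = 1_000, 48, 5
seqs = torch.randn(n, seq_len, 1)            # lagged target/weather history
static = rng.normal(size=(n, n_static))      # client/price features, toy values
y = seqs[:, -24:, 0].mean(dim=1).numpy() + static[:, 0] * 0.5

# Stage 1: an LSTM encodes each history into a fixed-length hidden state.
# (Untrained here for brevity; in practice it would be trained first.)
encoder = nn.LSTM(input_size=1, hidden_size=16, batch_first=True)
with torch.no_grad():
    _, (h_n, _) = encoder(seqs)              # h_n has shape (1, n, 16)
seq_embedding = h_n.squeeze(0).numpy()

# Stage 2: XGBoost consumes [LSTM embedding | static features] for the final
# prediction, trained directly on absolute error to match the MAE metric.
X = np.hstack([seq_embedding, static])
model = XGBRegressor(objective="reg:absoluteerror", n_estimators=300)
model.fit(X, y)
```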
Final Thoughts
Achieving success in a challenging time series forecasting competition like Enefit
rarely stems from a single "silver bullet" model or technique. Instead, it typically
results from a meticulous, iterative process combining deep data understanding,
creative feature engineering, careful model selection and tuning, robust validation,
and the intelligent combination of multiple approaches through ensembling. While
starting with efficient and powerful GBMs like LightGBM and CatBoost is pragmatic,
systematically exploring data augmentation and advanced deep learning models,
followed by sophisticated ensembling, offers the most promising path to potentially
achieving top-tier performance.
Works cited
1. Predict Energy Behavior of Prosumers - Enefit - Kaggle, accessed on April 16, 2025, https://www.kaggle.com/competitions/predict-energy-behavior-of-prosumers/code
2. Predict Energy Behavior of Prosumers - Enefit - Kaggle, accessed on April 16, 2025, https://www.kaggle.com/competitions/predict-energy-behavior-of-prosumers/data
3. Enefit - Predict Energy Behavior of Prosumers - Kaggle, accessed on April 16, 2025, https://www.kaggle.com/competitions/predict-energy-behavior-of-prosumers
4. Predict Energy Behavior of Prosumers, accessed on April 16, 2025, https://www.nbi.dk/~petersen/Teaching/ML2024/FinalProject/FinalProject09_Prosumers_TheoXaverInigoAlicja.pdf
5. Enefit - Kaggle, accessed on April 16, 2025, https://www.kaggle.com/code/farzonaeraj/enefit
6. DylanTartarini1996/enefit_challenge: Repository created for the Enefit Kaggle Challenge - GitHub, accessed on April 16, 2025, https://github.com/DylanTartarini1996/enefit_challenge/
7. dextercorley19/Enefit-Kaggle-Competition: The goal of the competition is to create an energy prediction model of prosumers to reduce energy imbalance costs. - GitHub, accessed on April 16, 2025, https://github.com/dextercorley19/Enefit-Kaggle-Competition
8. Basic Data Augmentation Method Applied to Time Series - Mad Devs, accessed on April 16, 2025, https://maddevs.io/writeups/basic-data-augmentation-method-applied-to-time-series/
9. A Comprehensive Survey on Data Augmentation - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2405.09591v2
10. A Comprehensive Survey on Data Augmentation - arXiv, accessed on April 16, 2025, https://arxiv.org/pdf/2405.09591
11. Data Augmentation: Techniques, Examples & Benefits - CCS Learning Academy, accessed on April 16, 2025, https://www.ccslearningacademy.com/what-is-data-augmentation/
12. Data Augmentation techniques in time series domain: A survey and taxonomy - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2206.13508v4
13. Data Augmentation techniques in time series domain: a survey and taxonomy - ResearchGate, accessed on April 16, 2025, https://www.researchgate.net/publication/369505251_Data_Augmentation_techniques_in_time_series_domain_a_survey_and_taxonomy
14. An empirical survey of data augmentation for time series classification with neural networks - PMC, accessed on April 16, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC8282049/
15. Data Augmentation for Time-Series Classification: a Comprehensive Survey - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2310.10060v5
16. Data Augmentation for Time-Series Classification: a Comprehensive Survey - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2310.10060v4
17. Data Augmentation for Multivariate Time Series Classification: An Experimental Study - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2406.06518v1
18. 10 Ways to Master Data Augmentation for Incredible Results - Data Science Dojo, accessed on April 16, 2025, https://datasciencedojo.com/blog/understanding-data-augmentation/
19. Overview of Data Augmentation Techniques in Time Series Analysis - The Science and Information (SAI) Organization, accessed on April 16, 2025, https://thesai.org/Downloads/Volume15No1/Paper_118-Overview_of_Data_Augmentation_Techniques.pdf
20. Data Augmentation with Suboptimal Warping for Time-Series Classification - MDPI, accessed on April 16, 2025, https://www.mdpi.com/1424-8220/20/1/98
21. Class-Based Time Series Data Augmentation to Mitigate Extreme Class Imbalance for Solar Flare Prediction - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2405.20590v1
22. Time Series Augmentations - Towards Data Science, accessed on April 16, 2025, https://towardsdatascience.com/time-series-augmentations-16237134b29b/
23. www.ijcai.org, accessed on April 16, 2025, https://www.ijcai.org/proceedings/2021/0631.pdf
24. Time Series Data Augmentation - tsai - GitHub Pages, accessed on April 16, 2025, https://timeseriesai.github.io/tsai/data.transforms.html
25. A Deep Dive Into Data Augmentation Techniques - EMB Global, accessed on April 16, 2025, https://blog.emb.global/a-deep-dive-into-data-augmentation-techniques/
26. An introduction to Dynamic Time Warping - Romain Tavenard, accessed on April 16, 2025, https://rtavenar.github.io/blog/dtw.html
27. DTW Explained - Dynamic Time Warping - Papers With Code, accessed on April 16, 2025, https://paperswithcode.com/method/dtw
28. Dynamic time warping - Wikipedia, accessed on April 16, 2025, https://en.wikipedia.org/wiki/Dynamic_time_warping
29. Dynamic Time Warping Algorithm Review - CSDL, accessed on April 16, 2025, https://csdl.ics.hawaii.edu/techreports/2008/08-04/08-04.pdf
30. Time Series Data Augmentation for Neural Networks by Time Warping with a Discriminative Teacher - arXiv, accessed on April 16, 2025, https://arxiv.org/pdf/2004.08780
31. Data Augmentation Time Series Python - Restackio, accessed on April 16, 2025, https://www.restack.io/p/data-augmentation-answer-time-series-python-cat-ai
32. How is data augmentation applied to time-series data? - Milvus, accessed on April 16, 2025, https://milvus.io/ai-quick-reference/how-is-data-augmentation-applied-to-timeseries-data
33. tsgm/tutorials/augmentations.ipynb at main - GitHub, accessed on April 16, 2025, https://github.com/AlexanderVNikitin/tsgm/blob/main/tutorials/augmentations.ipynb
34. CENTS: Generating synthetic electricity consumption time series for rare and unseen scenarios - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2501.14426v3
35. Survey of Time Series Data Generation in IoT - PMC, accessed on April 16, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC10422358/
36. SeriesGAN: Time Series Generation via Adversarial and Autoregressive Learning - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2410.21203v1
37. A Synthetic Time-Series Generation Using a Variational Recurrent Autoencoder with an Attention Mechanism in an Industrial Control System - MDPI, accessed on April 16, 2025, https://www.mdpi.com/1424-8220/24/1/128
38. IH-TCGAN: Time-Series Conditional Generative Adversarial Network with Improved Hausdorff Distance for Synthesizing Intention Recognition Data - MDPI, accessed on April 16, 2025, https://www.mdpi.com/1099-4300/25/5/781
39. Data Augmentation for Pseudo-Time Series Using Generative Adversarial Networks - CEUR-WS.org, accessed on April 16, 2025, https://ceur-ws.org/Vol-3498/paper5.pdf
40. papers.neurips.cc, accessed on April 16, 2025, http://papers.neurips.cc/paper/8789-time-series-generative-adversarial-networks.pdf
41. [1811.08295] T-CGAN: Conditional Generative Adversarial Network for Data Augmentation in Noisy Time Series with Irregular Sampling - arXiv, accessed on April 16, 2025, https://arxiv.org/abs/1811.08295
42. [2006.16477] Conditional GAN for timeseries generation - ar5iv - arXiv, accessed on April 16, 2025, https://ar5iv.labs.arxiv.org/html/2006.16477
43. Generative Adversarial Network for Synthetic Time Series Data Generation in Smart Grids - OSTI.GOV, accessed on April 16, 2025, https://www.osti.gov/servlets/purl/1607585
44. Probabilistic Net Load Forecasting for High-Penetration RES Grids Utilizing Enhanced Conditional Diffusion Model - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2503.17770v1
45. TemperatureGAN: generative modeling of regional atmospheric temperatures | Environmental Data Science | Cambridge Core, accessed on April 16, 2025, https://www.cambridge.org/core/journals/environmental-data-science/article/temperaturegan-generative-modeling-of-regional-atmospheric-temperatures/1B55A7DF1CCFACE1A89FE4653D3FCA22
46. [2202.02691] TTS-GAN: A Transformer-based Time-Series Generative Adversarial Network - arXiv, accessed on April 16, 2025, https://arxiv.org/abs/2202.02691
47. Evaluation is Key: A Survey on Evaluation Measures for Synthetic Time Series - ResearchGate, accessed on April 16, 2025, https://www.researchgate.net/publication/373853713_Evaluation_is_Key_A_Survey_on_Evaluation_Measures_for_Synthetic_Time_Series
48. Smart Home Energy Management: VAE-GAN synthetic dataset generator and Q-learning - arXiv, accessed on April 16, 2025, https://arxiv.org/abs/2305.08885
49. Smart Home Energy Management: VAE-GAN synthetic dataset generator and Q-learning - arXiv, accessed on April 16, 2025, https://arxiv.org/pdf/2305.08885
50. (PDF) Smart Home Energy Management: VAE-GAN synthetic dataset generator and Q-learning - ResearchGate, accessed on April 16, 2025, https://www.researchgate.net/publication/370814682_Smart_Home_Energy_Management_VAE-GAN_synthetic_dataset_generator_and_Q-learning
51. Time Weaver: A Conditional Time Series Generation Model - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2403.02682v1
52. Diffusion Model for Time Series and Spatio-Temporal Data - GitHub, accessed on April 16, 2025, https://github.com/yyysjz1997/Awesome-TimeSeries-SpatioTemporal-Diffusion-Model
53. [2006.16477] Conditional GAN for timeseries generation - arXiv, accessed on April 16, 2025, https://arxiv.org/abs/2006.16477
54. TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2305.11567v2
55. How to evaluate synthetic data quality - Syntheticus, accessed on April 16, 2025, https://syntheticus.ai/blog/how-to-evaluate-synthetic-data-quality
56. Advancing Retail Data Science: Comprehensive Evaluation of Synthetic Data - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2406.13130v1
57. How to evaluate the quality of the synthetic data - measuring from the perspective of fidelity, utility, and privacy | AWS Machine Learning Blog, accessed on April 16, 2025, https://aws.amazon.com/blogs/machine-learning/how-to-evaluate-the-quality-of-the-synthetic-data-measuring-from-the-perspective-of-fidelity-utility-and-privacy/
58. A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2404.14445v1
59. Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study - PMC, accessed on April 16, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC9030990/
60. Evaluating Synthetic Data Generation from User Generated Text - MIT Press Direct, accessed on April 16, 2025, https://direct.mit.edu/coli/article/doi/10.1162/coli_a_00540/124625/Evaluating-Synthetic-Data-Generation-from-User
61. How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models, accessed on April 16, 2025, https://proceedings.mlr.press/v162/alaa22a/alaa22a.pdf
62. ENEFIT_Predict_energy_behavi, accessed on April 16, 2025, https://www.kaggle.com/code/serdargundogdu/enefit-predict-energy-behavior-eda
63. Synthetic Random Environmental Time Series Generation with Similarity Control, Preserving Original Signal's Statistical Characteristics - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2502.02392v1
64. Agent-based Modeling in Energy Systems - SmythOS, accessed on April 16, 2025, https://smythos.com/ai-industry-solutions/energy/agent-based-modeling-in-energy-systems/
65. Agent Based Modelling for Smart Grids | JRC SES, accessed on April 16, 2025, https://ses.jrc.ec.europa.eu/agent-based-modelling-smart-grids
66. (PDF) Agent Based Models in Power Systems: A Literature Review - ResearchGate, accessed on April 16, 2025, https://www.researchgate.net/publication/351244257_Agent_Based_Models_in_Power_Systems_A_Literature_Review
67. QuLTSF.pdf - Nanyang Technological University, accessed on April 16, 2025, https://personal.ntu.edu.sg/ariel.neufeld/QuLTSF.pdf
68. A Hybrid Loss Framework for Decomposition-based Time Series Forecasting Methods: Balancing Global and Component Errors - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2411.11340
69. Resampling Methods that Generate Time Series Data to Enable Sensitivity and Model Analysis in Energy Modeling - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2502.08102v1
70. [2502.08102] Resampling Methods that Generate Time Series Data to Enable Sensitivity and Model Analysis in Energy Modeling - arXiv, accessed on April 16, 2025, https://arxiv.org/abs/2502.08102
71. Optimal starting point for time series forecasting - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2409.16843v1
72. Benchmarking state-of-the-art gradient boosting algorithms for classification - arXiv, accessed on April 16, 2025, https://arxiv.org/pdf/2305.17094
73. Electricity Load and Peak Forecasting: Feature Engineering, Probabilistic LightGBM and Temporal Hierarchies - GitHub Pages, accessed on April 16, 2025, https://ecml-aaltd.github.io/aaltd2023/papers/Electricity%20Load%20and%20Peak%20Forecasting_%20Feature%20Engineering,%20Probabilistic%20LightGBM%20and%20Temporal%20Hierarchies.pdf
74. XGBoost vs LightGBM vs CatBoost vs AdaBoost - Kaggle, accessed on April 16, 2025, https://www.kaggle.com/code/faressayah/xgboost-vs-lightgbm-vs-catboost-vs-adaboost
75. Performance Comparison: CatBoost vs XGBoost and CatBoost vs LightGBM | Towards Data Science, accessed on April 16, 2025, https://towardsdatascience.com/performance-comparison-catboost-vs-xgboost-and-catboost-vs-lightgbm-886c1c96db64/
76. When to Choose CatBoost Over XGBoost or LightGBM [Practical Guide] - Neptune.ai, accessed on April 16, 2025, https://neptune.ai/blog/when-to-choose-catboost-over-xgboost-or-lightgbm
77. Lightgbm vs xgboost vs catboost - Data Science Stack Exchange, accessed on April 16, 2025, https://datascience.stackexchange.com/questions/49567/lightgbm-vs-xgboost-vs-catboost
78. Comparison and Explanation of Forecasting Algorithms for Energy Time Series - MDPI, accessed on April 16, 2025, https://www.mdpi.com/2227-7390/9/21/2794
79. Kaggle forecasting competitions: An overlooked learning opportunity | Request PDF - ResearchGate, accessed on April 16, 2025, https://www.researchgate.net/publication/344096318_Kaggle_forecasting_competitions_An_overlooked_learning_opportunity
80. The ASHRAE Great Energy Predictor III competition: Overview and results - ResearchGate, accessed on April 16, 2025, https://www.researchgate.net/publication/343855462_The_ASHRAE_Great_Energy_Predictor_III_competition_Overview_and_results
81. NN Transformer using LGBM Knowledge Distillation - American Express - Default Prediction | Kaggle, accessed on April 16, 2025, https://www.kaggle.com/competitions/amex-default-prediction/discussion/347641?ref=localhost
82. M5_Forecasting with LSTM and LightGBM | Kaggle, accessed on April 16, 2025, https://www.kaggle.com/code/surekharamireddy/m5-forecasting-with-lstm-and-lightgbm
83. Enefit pebop submission- change HPs - Kaggle, accessed on April 16, 2025, https://www.kaggle.com/code/nischaymahamana/enefit-pebop-submission-change-hps/code
84. Choosing Between XGBoost, LightGBM and CatBoost - Kaggle, accessed on April 16, 2025, https://www.kaggle.com/discussions/questions-and-answers/544999
85. CatBoost For Accurate Time-Series Predictions: Here's How - AI, accessed on April 16, 2025, https://aicompetence.org/catboost-for-accurate-time-series-predictions/
86. Forecasting with XGBoost, LightGBM and other Gradient Boosting models - skforecast, accessed on April 16, 2025, https://skforecast.org/0.11.0/user_guides/forecasting-xgboost-lightgbm
87. A Comparative Study of Detecting Anomalies in Time Series Data Using LSTM and TCN Models - arXiv, accessed on April 16, 2025, https://arxiv.org/pdf/2112.09293
88. Unlocking the Power of LSTM for Long Term Time Series Forecasting - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2408.10006v1
89. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling - arXiv, accessed on April 16, 2025, https://arxiv.org/pdf/1803.01271
90. (PDF) Short-Term Traffic Flow Prediction Based on LSTM-XGBoost Combination Model - ResearchGate, accessed on April 16, 2025, https://www.researchgate.net/publication/346287900_Short-Term_Traffic_Flow_Prediction_Based_on_LSTM-XGBoost_Combination_Model
91. Hybrid deep learning models for time series forecasting of solar power - ResearchGate, accessed on April 16, 2025, https://www.researchgate.net/publication/378395432_Hybrid_deep_learning_models_for_time_series_forecasting_of_solar_power
92. (PDF) Load forecasting for energy communities: a novel LSTM-XGBoost hybrid model based on smart meter data - ResearchGate, accessed on April 16, 2025, https://www.researchgate.net/publication/363397991_Load_forecasting_for_energy_communities_a_novel_LSTM-XGBoost_hybrid_model_based_on_smart_meter_data
93. Electricity consumption forecasting with Transformer models - NTNU Open, accessed on April 16, 2025, https://ntnuopen.ntnu.no/ntnu-xmlui/bitstream/handle/11250/3095097/no.ntnu%3Ainspera%3A142737689%3A34440404.pdf?sequence=1&isAllowed=y
94. Key takeaways from Kaggle's most recent time series competition - Ventilator Pressure Prediction | Towards Data Science, accessed on April 16, 2025, https://towardsdatascience.com/key-takeaways-from-kaggles-most-recent-time-series-competition-ventilator-pressure-prediction-7a1d2e4e0131/
95. Development and Comparative Analysis of Temporal Convolutional Network for Time Series Data Classification | Journal of Neonatal Surgery, accessed on April 16, 2025, https://www.jneonatalsurg.com/index.php/jns/article/view/3195
96. (PDF) Temporal Convolutional Networks Applied to Energy-related Time Series Forecasting - ResearchGate, accessed on April 16, 2025, https://www.researchgate.net/publication/339745646_Temporal_Convolutional_Networks_Applied_to_Energy-related_Time_Series_Forecasting
97. Temporal Convolutional Networks and Forecasting - Unit8, accessed on April 16, 2025, https://unit8.com/resources/temporal-convolutional-networks-and-forecasting/
98. [2112.09293] A Comparative Study of Detecting Anomalies in Time Series Data Using LSTM and TCN Models - arXiv, accessed on April 16, 2025, https://arxiv.org/abs/2112.09293
99. DeformableTST: Transformer for Time Series Forecasting without Over-reliance on Patching - NIPS papers, accessed on April 16, 2025, https://proceedings.neurips.cc/paper_files/paper/2024/file/a0b1082fc7823c4c68abcab4fa850e9c-Paper-Conference.pdf
100. PSformer: Parameter-efficient Transformer with Segment Attention for Time Series Forecasting - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2411.01419v1
101. Deep Learning for Time Series Forecasting: A Survey - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2503.10198v1/
102. Generative Pretrained Hierarchical Transformer for Time Series Forecasting - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2402.16516v2
103. Unified Training of Universal Time Series Forecasting Transformers - arXiv, accessed on April 16, 2025, https://arxiv.org/pdf/2402.02592
104. A Comprehensive Survey of Time Series Forecasting: Architectural Diversity and Open Challenges - arXiv, accessed on April 16, 2025, https://arxiv.org/pdf/2411.05793
105. A Survey of Deep Learning and Foundation Models for Time Series Forecasting - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2401.13912v1
106. Transfer Learning with Foundational Models for Time Series Forecasting using Low-Rank Adaptations - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2410.11539v1
107. Fredformer: Frequency Debiased Transformer for Time Series Forecasting - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2406.09009v4
108. Comparative Analysis of ARIMA, SARIMA and Prophet Model in Forecasting, accessed on April 16, 2025, https://www.sciencepublishinggroup.com/article/10.11648/j.rd.20240504.13
109. ARIMA vs Prophet vs LSTM for Time Series Prediction - Neptune.ai, accessed on April 16, 2025, https://neptune.ai/blog/arima-vs-prophet-vs-lstm
110. A Comparative Study of ARIMA and SARIMA Models to Forecast Lockdowns due to SARS-CoV-2 - Longdom Publishing SL, accessed on April 16, 2025, https://www.longdom.org/open-access/a-comparative-study-of-arima-and-sarima-models-to-forecast-lockdowns-due-to-sarscov2-98209.html
111. (PDF) Comparative Analysis of ARIMA, SARIMA and Prophet Model in Forecasting - ResearchGate, accessed on April 16, 2025, https://www.researchgate.net/publication/385157901_Comparative_Analysis_of_ARIMA_SARIMA_and_Prophet_Model_in_Forecasting
112. A Comparison of Time Series Forecast Models for Predicting the Outliers Particles in Semiconductor Cleanroom - Korean Institute of Information Technology, accessed on April 16, 2025, https://ki-it.com/xml/34789/34789.pdf
113. A Review of ARIMA vs. Machine Learning Approaches for Time Series Forecasting in Data Driven Networks - MDPI, accessed on April 16, 2025, https://www.mdpi.com/1999-5903/15/8/255
114. Statistical comparison of Prophet and ARIMA/SARIMA models. - ResearchGate, accessed on April 16, 2025, https://www.researchgate.net/figure/Statistical-comparison-of-Prophet-and-ARIMA-SARIMA-models_fig1_363701373
115. Statistical Comparison of Time Series Models for Forecasting Brazilian Monthly Energy Demand Using Economic, Industrial, and Climatic Exogenous Variables - MDPI, accessed on April 16, 2025, https://www.mdpi.com/2076-3417/14/13/5846
116. A Comprehensive Survey of Time Series Forecasting: Architectural Diversity and Open Challenges - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2411.05793v1
117. ddz16/TSFpaper: This repository contains a reading list of papers on Time Series Forecasting/Prediction (TSF) and Spatio-Temporal Forecasting/Prediction (STF). These papers are mainly categorized according to the type of model. - GitHub, accessed on April 16, 2025, https://github.com/ddz16/TSFpaper
118. Stacking & Blending in ML from scratch in Python - Kaggle, accessed on April 16, 2025, https://www.kaggle.com/code/egazakharenko/stacking-blending-in-ml-from-scratch-in-python
119. Ensemble Learning Techniques Tutorial - Kaggle, accessed on April 16, 2025, https://www.kaggle.com/code/pavansanagapati/ensemble-learning-techniques-tutorial
120. 1-Guide to Ensembling methods - Kaggle, accessed on April 16, 2025, https://www.kaggle.com/code/amrmahmoud123/1-guide-to-ensembling-methods
121. Ensemble Learning: Bagging, Boosting & Stacking - Kaggle, accessed on April 16, 2025, https://www.kaggle.com/code/satishgunjal/ensemble-learning-bagging-boosting-stacking
122. What methods do top Kagglers employ for score gain? - Data Science Stack Exchange, accessed on April 16, 2025, https://datascience.stackexchange.com/questions/124709/what-methods-do-top-kagglers-employ-for-score-gain
123. Blending Ensemble for Regression Problems - Kaggle, accessed on April 16, 2025, https://www.kaggle.com/code/ahmedabdulhamid/blending-ensemble-for-regression-problems
124. Tips for stacking and blending - Kaggle, accessed on April 16, 2025, https://www.kaggle.com/code/zaochenye/tips-for-stacking-and-blending
125. Matt Motoki | Grandmaster - Kaggle, accessed on April 16, 2025, https://www.kaggle.com/mmotoki/discussion
126. [2503.10198] Deep Learning for Time Series Forecasting: A Survey - arXiv, accessed on April 16, 2025, https://arxiv.org/abs/2503.10198
127. Two-stage hybrid models for enhancing forecasting accuracy on heterogeneous time series - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2502.08600v1
128. Stock-Price Forecasting Based on XGBoost and LSTM, accessed on April 16, 2025, https://cdn.techscience.cn/ueditor/files/csse/TSP_CSSE-40-1/TSP_CSSE_17685/TSP_CSSE_17685.pdf
129. Stock Price Prediction based on LSTM and XGBoost Combination Model, accessed on April 16, 2025, https://wepub.org/index.php/TCSISR/article/view/90
130. Forecast of LSTM-XGBoost in Stock Price Based on Bayesian Optimization, accessed on April 16, 2025, https://www.techscience.com/iasc/v29n3/43035/html
131. A Reference Guide to Feature Engineering Methods - Kaggle, accessed on April 16, 2025, https://www.kaggle.com/code/prashant111/a-reference-guide-to-feature-engineering-methods
132. Use XGBoost for Time-Series Forecasting - Analytics Vidhya, accessed on April 16, 2025, https://www.analyticsvidhya.com/blog/2024/01/xgboost-for-time-series-forecasting/
133. Feature Engineering for Time Series - Kaggle, accessed on April 16, 2025, https://www.kaggle.com/code/patrickurbanke/feature-engineering-for-time-series
134. Practical Guide for Feature Engineering of Time Series Data - dotData, accessed on April 16, 2025, https://dotdata.com/blog/practical-guide-for-feature-engineering-of-time-series-data/
135. Ensemble Methodology: Innovations in Credit Default Prediction Using LightGBM, XGBoost, and LocalEnsemble - arXiv, accessed on April 16, 2025, https://arxiv.org/html/2402.17979
136. Kaggle Past Solutions, accessed on April 16, 2025, https://ndres.me/kaggle-past-solutions/
137. The State of Machine Learning Competitions - ML Contests, accessed on April 16, 2025, https://mlcontests.com/state-of-machine-learning-competitions-2024/
138. Enefit - Predict Energy Behavior of Prosumers | Kaggle, accessed on April 16, 2025, https://www.kaggle.com/competitions/predict-energy-behavior-of-prosumers/discussion/462266
139. This Competition has an Official Discord Channel - Enefit - Predict Energy Behavior of Prosumers | Kaggle, accessed on April 16, 2025, https://www.kaggle.com/competitions/predict-energy-behavior-of-prosumers/discussion/452899
140. Kaggle Solutions, accessed on April 16, 2025, https://farid.one/kaggle-solutions/
141. Winning solutions of kaggle competitions - Kaggle, accessed on April 16, 2025, https://www.kaggle.com/code/sudalairajkumar/winning-solutions-of-kaggle-competitions
142. Kaggle Winning Solution Methods Review - Kaggle, accessed on April 16, 2025, https://www.kaggle.com/code/thedrcat/kaggle-winning-solution-methods-review