
Benchmarking Time Series Forecasting Models:

From Statistical Techniques to Foundation Models


in Real-World Applications
Issar Arab
Ruhrdot GmbH
Munich, Germany
Email: [email protected]

Rodrigo Benitez
Ruhrdot GmbH
Berlin, Germany
Email: [email protected]

Abstract: Time series forecasting is essential for operational intelligence in the hospitality industry, and it is particularly challenging in large-scale, distributed systems. This study evaluates statistical, machine learning (ML), deep learning, and foundation models on forecasting hourly sales over a 14-day horizon. The evaluation uses real-world data from a network of thousands of restaurants across Germany. The forecasting solution includes features such as weather conditions, calendar events, and time-of-day patterns. The results demonstrate the strong performance of ML-based meta-models. They also highlight the emerging potential of foundation models such as Chronos and TimesFM, which deliver competitive performance with minimal feature engineering, using only the pretrained model (zero-shot inference). Additionally, a hybrid PySpark-Pandas approach proves to be a robust way to achieve horizontal scalability in large-scale deployments.

Keywords: Time Series Forecasting, Multi-Time Series, Statistical Models, Machine Learning, Transformers, Foundation Models, Hospitality Industry.

I. INTRODUCTION
AI and machine learning have shown remarkable potential in solving complex problems across diverse domains. They have revolutionized fields such as natural language processing [1], protein sequence analysis [2][3][4], mass spectrometry proteomics [5], drug discovery [6][7][8], software testing and fault localization [9], and the broad field of time series analysis [10]. Time series forecasting has evolved significantly, moving from classical statistical approaches such as ARIMA [11] to advanced machine learning and deep learning techniques [12], including RNNs, LSTMs [13], and transformers [14][15]. Recent advances in foundation models open promising avenues for multivariate and univariate time series forecasting with minimal feature engineering.

This study addresses the challenge of forecasting hourly sales for thousands of restaurants across Germany over the next 14 days. The inputs are two years of historical sales data, weather data, and location-specific calendar features. The objective is to optimize each restaurant's staffing and procurement decisions based on its forecast hourly sales volume. Using a sample of four restaurants located in different regions of the country, we benchmarked statistical, ML-based, conventional deep learning, and foundation models. We analyzed their performance, scalability, and practicality for real-world, large-scale production deployment.

II. METHODS
A. Feature generation
To train our forecasting models, we compiled a set of features that expand the available information. These features give the models a comprehensive view of the factors that could influence a restaurant's sales volume. The independent variables fall into the following groups:

Group                       Features
Weather-Related Features    is_rainy, is_snowy, is_clear, humidity, temperature
Time of Day Features        hour_sin, hour_cos
Day of the Week Features    day_of_week_sin, day_of_week_cos, is_monday, is_tuesday,
                            is_wednesday, is_thursday, is_friday, is_saturday, is_sunday
Week of Year Features       week_of_year_sin, week_of_year_cos
Event/Activity Features     is_peak_hour, is_dinner, is_lunch, is_weekend
Holiday/Calendar Features   is_christmas_day, is_all_saints_day, is_ascension_day,
                            is_whit_monday, is_corpus_christi, is_easter_monday,
                            is_easter_sunday, is_epiphany, is_german_unity_day,
                            is_good_friday, is_international_womens_day, is_labor_day,
                            is_new_years_day, is_reformation_day,
                            is_repentance_and_prayer_day, is_second_day_of_christmas
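The cyclical and calendar features in the table can be generated per hourly timestamp along the following lines. This is a minimal sketch in plain Python, not the authors' pipeline. Three choices in it are our own assumptions: the 52-week period for the week-of-year encoding, the lunch and dinner hour windows, and the restriction to fixed-date holidays. Easter-based holidays (Good Friday, Easter Sunday and Monday, Ascension Day, Whit Monday, Corpus Christi) and Repentance and Prayer Day would need additional date logic, as would state-specific holiday rules. Weather features would come from an external data source.

```python
from __future__ import annotations

import math
from datetime import datetime

# Fixed-date German holidays only (assumption). Movable feasts and
# state-specific rules are not handled in this sketch.
FIXED_HOLIDAYS = {
    (1, 1): "is_new_years_day",
    (1, 6): "is_epiphany",
    (3, 8): "is_international_womens_day",
    (5, 1): "is_labor_day",
    (10, 3): "is_german_unity_day",
    (10, 31): "is_reformation_day",
    (11, 1): "is_all_saints_day",
    (12, 25): "is_christmas_day",
    (12, 26): "is_second_day_of_christmas",
}

DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def cyclical(value: float, period: float) -> tuple[float, float]:
    """Map a periodic value onto the unit circle so that, e.g., hour 23 lies next to hour 0."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def calendar_features(ts: datetime) -> dict[str, float]:
    """Build time-of-day, day-of-week, week-of-year, event and holiday features for one timestamp."""
    features: dict[str, float] = {}
    features["hour_sin"], features["hour_cos"] = cyclical(ts.hour, 24)
    dow = ts.weekday()  # Monday = 0 ... Sunday = 6
    features["day_of_week_sin"], features["day_of_week_cos"] = cyclical(dow, 7)
    week = ts.isocalendar()[1]  # ISO week number, 1..53
    # Assumed 52-week period; ISO week 53 wraps around close to week 1.
    features["week_of_year_sin"], features["week_of_year_cos"] = cyclical(week, 52)
    # One-hot day-of-week flags: is_monday ... is_sunday.
    for i, name in enumerate(DAY_NAMES):
        features[f"is_{name}"] = float(dow == i)
    features["is_weekend"] = float(dow >= 5)
    # Hypothetical meal-time windows; the paper does not define them.
    features["is_lunch"] = float(11 <= ts.hour < 14)
    features["is_dinner"] = float(18 <= ts.hour < 22)
    # Every holiday column is present; at most one is set to 1.0.
    for name in FIXED_HOLIDAYS.values():
        features[name] = 0.0
    holiday = FIXED_HOLIDAYS.get((ts.month, ts.day))
    if holiday is not None:
        features[holiday] = 1.0
    return features
```

For example, `calendar_features(datetime(2024, 12, 25, 19))` sets `is_wednesday`, `is_christmas_day` and `is_dinner` to 1.0. The sketch uses sine/cosine pairs rather than raw integers so that a model sees 23:00 and 00:00, or Sunday and Monday, as neighbours.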
B. Forecast Models Analyzed
To evaluate a broad range of forecasting techniques, we benchmarked the following models across different categories:

Category             Model                                       Library
Statistical Models   Prophet, multivariate Prophet               Prophet
Machine Learning     Linear Regression (baseline), XGBoost,      scikit-learn, MLForecast
                     LightGBM
Deep Learning        N-Beats [16]                                N-Beats
Foundation Models    TimesFM [14] (multivariate transformer),    TimesFM [torch], AutoGluon
                     Chronos [15] (univariate and multivariate
                     transformers)

The benchmark covers a diverse set of models, from statistical approaches to cutting-edge foundation models, allowing a comprehensive comparison. Among transformer-based models, we evaluated TimesFM and four sizes of the pretrained Chronos-Bolt foundation model (Tiny, Mini, Small, and Base). The four Chronos-Bolt models were evaluated using zero-shot inference, taking only the time series data as input. The exception is Chronos-Bolt (Base), which was also assessed with the regressors option.

C. Evaluation Metrics
To assess the forecasting models, we used a set of standard evaluation metrics. They were selected to give a holistic view of each model's performance, robustness, and overall predictive quality. Each metric captures a distinct aspect of model performance, enabling a well-rounded comparison. The table below lists the metrics and their descriptions:

Metric   Description                                                Desired Trend
MSE      Mean Squared Error: the average squared difference         Lower
         between actual and predicted values.
MAE      Mean Absolute Error: the average absolute difference       Lower
         between actual and predicted values.
RMSE     Root Mean Squared Error: the square root of MSE, giving    Lower
         the error in the same units as the target variable.
MAPE     Mean Absolute Percentage Error: the error expressed as a   Lower
         percentage of the actual values.
R²       Coefficient of Determination: the proportion of variance   Higher
         in the target explained by the model.
Corr     Correlation Coefficient: the strength and direction of     Higher
         the linear relationship between actual and predicted
         values.

III. RESULTS AND DISCUSSION
We discuss the results from four perspectives: dataset structure, performance analysis, scalability, and recommendations.

A. A Representative Diverse Dataset
The dataset in this study comprised time series data from four geographically distributed restaurants. Each was randomly selected from a different restaurant chain. To maintain confidentiality, we refer to them as Restaurants A, B, C, and D. This selection ensured diversity in time series patterns, covering various trends, seasonality, and unique operational dynamics. Forecasts were generated for the next 14 days for each restaurant, and the results were benchmarked across all models. Because the restaurants exhibit different patterns, the dataset provided a comprehensive evaluation of the forecasting models.

B. Performance Analysis
The performance analysis, visualized in Figures 1 to 5, highlighted several key findings:
• Machine Learning Models:
  o ML-based models consistently achieved the highest performance across all restaurants, particularly those implementing gradient boosting algorithms. They used Pandas DataFrames as training input.
  o Scalability for Retraining: A hybrid Spark-Pandas solution was deployed to enable accurate periodic retraining at scale. Through user-defined functions (UDFs), it combined Spark's scalability for distributed data processing with Pandas' flexibility for training. The hybrid method demonstrated excellent stability, horizontal scalability, ease of implementation, and robust performance.
  o These properties make ML-based solutions the most effective and practical choice for large-scale deployment.
• Foundation Models:
  o Chronos-Bolt foundation models provided competitive performance, matching the accuracy of ML-based models for 3 of the 4 restaurants.
  o Their zero-shot inference capability allowed them to capture trends, seasonality, and special patterns directly from the data. They required no additional regressors or extensive feature engineering.
  o While foundation models strike a good balance between computational complexity, performance, and feature engineering, they are GPU-dependent, which may limit their scalability in
    environments with constrained computational resources.
• Statistical and Deep Learning Models:
  o Prophet models captured seasonality effectively but underperformed both ML-based and foundation models.
  o The N-Beats deep learning model showed promise but required extensive computational resources and feature engineering, which made it less efficient for this use case.

C. Scalability
• ML-Based Models:
  o The combination of Spark's scalability and Pandas' flexibility enables efficient horizontal scaling, making ML-based models well suited for large-scale, distributed deployment.
  o The hybrid solution ensures accurate periodic retraining of models at scale, providing robustness and stability.
• Foundation Models:
  o Foundation models such as Chronos-Bolt scale by adding GPUs and require minimal preprocessing. This simplicity makes them attractive for environments with abundant GPU resources. However, CPU execution yielded suboptimal results, and this dependency on GPUs could limit their widespread deployment in resource-constrained settings.

D. Recommendations
Based on the results and analysis, we propose the following recommendations:
• For Scalable, Accurate Systems:
  o ML-based meta-models are the optimal choice for environments requiring highly scalable and robust solutions. The hybrid Spark-Pandas approach combines the best of both worlds. It handles large-scale data efficiently while maintaining high accuracy and stability, all without relying on GPU-enabled or GPU-dependent clusters.
• For Simplicity and Rapid Deployment:
  o Foundation models, particularly Chronos-Bolt, offer a compelling alternative for environments with abundant GPU resources. They forecast accurately without additional regressors or extensive feature engineering, which makes them ideal for simplified deployment pipelines.

IV. CONCLUSION
This study highlights the potential of hybrid Spark-Pandas solutions and foundation models in large-scale multi-time series forecasting. ML-based models, particularly gradient boosting algorithms, remain the better choice for accuracy and robustness. Pretrained foundation models, however, represent a promising direction for scenarios that prioritize simplicity and minimal preprocessing (zero-shot inference). The results point to a balance among scalability, computational complexity, and performance. The hybrid Spark-Pandas solution emerges as a practical and effective choice for large-scale, accurate deployment, particularly when advanced computational resources such as GPUs are unavailable.

REFERENCES
[1] Hapke, H., Howard, C., & Lane, H. (2019). Natural Language Processing in Action: Understanding, analyzing, and generating text with Python. Simon and Schuster.
[2] Arab, I. (2023, August). PEvoLM: Protein Sequence Evolutionary Information Language Model. In 2023 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB) (pp. 1-8). IEEE.
[3] Arab, I. (2020). Variational Inference to Learn Representations for Protein Evolutionary Information. mediatum.ub.tum.de, https://mediatum.ub.tum.de/1579236
[4] Arab, I. (2023, December). EPSAPG: A Pipeline Combining MMseqs2 and PSI-BLAST to Quickly Generate Extensive Protein Sequence Alignment Profiles. In Proceedings of the IEEE/ACM 10th International Conference on Big Data Computing, Applications and Technologies (pp. 1-9).
[5] Arab, I., Fondrie, W. E., Laukens, K., & Bittremieux, W. (2023). Semisupervised machine learning for sensitive open modification spectral library searching. Journal of Proteome Research, 22(2), 585-593.
[6] Arab, I., Egghe, K., Laukens, K., Chen, K., Barakat, K., & Bittremieux, W. (2023). Benchmarking of small molecule feature representations for hERG, Nav1.5, and Cav1.2 cardiotoxicity prediction. Journal of Chemical Information and Modeling, 64(7), 2515-2527.
[7] Arab, I., & Barakat, K. (2021). ToxTree: descriptor-based machine learning models for both hERG and Nav1.5 cardiotoxicity liability predictions. arXiv preprint arXiv:2112.13467.
[8] Arab, I., Laukens, K., & Bittremieux, W. (2024). Semisupervised Learning to Boost hERG, Nav1.5, and Cav1.2 Cardiac Ion Channel Toxicity Prediction by Mining a Large Unlabeled Small Molecule Data Set. Journal of Chemical Information and Modeling, 64(16), 6410-6420.
[9] Arab, I., & Magel, K. (2025). Machine Learning-Based Prediction of Software Fault Localization. arXiv preprint.
[10] Masini, R. P., Medeiros, M. C., & Mendes, E. F. (2023). Machine learning advances for time series forecasting. Journal of Economic Surveys, 37(1), 76-111.
[11] Mehrmolaei, S., & Keyvanpour, M. R. (2016, April). Time series forecasting using improved ARIMA. In 2016 Artificial Intelligence and Robotics (IRANOPEN) (pp. 92-97). IEEE.
[12] Lim, B., & Zohren, S. (2021). Time-series forecasting with deep learning: a survey. Philosophical Transactions of the Royal Society A, 379(2194), 20200209.
[13] Abbasimehr, H., & Paki, R. (2022). Improving time series forecasting using LSTM and attention models. Journal of Ambient Intelligence and Humanized Computing, 13(1), 673-691.
[14] Das, A., Kong, W., Sen, R., & Zhou, Y. (2023). A decoder-only foundation model for time-series forecasting. arXiv preprint arXiv:2310.10688.
[15] Ansari, A. F., Stella, L., Turkmen, C., Zhang, X., Mercado, P., Shen, H., ... & Wang, Y. (2024). Chronos: Learning the language of time series. arXiv preprint arXiv:2403.07815.
[16] Oreshkin, B. N., Carpov, D., Chapados, N., & Bengio, Y. (2019). N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. arXiv preprint arXiv:1905.10437.
Figure 1: 14-day forecasts from Chronos-Bolt (Base) for all four restaurants, using only the time series data.

Figure 2: Benchmarking results of the 12 evaluated models for the 14-day forecast horizon in Restaurant A.
Figure 3: Benchmarking results of the 12 evaluated models for the 14-day forecast horizon in Restaurant B.
Figure 4: Benchmarking results of the 12 evaluated models for the 14-day forecast horizon in Restaurant C.
Figure 5: Benchmarking results of the 12 evaluated models for the 14-day forecast horizon in Restaurant D.
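As a companion to the benchmarking figures, the six metrics from Section II-C can be computed for one restaurant's hourly forecast over the 14-day horizon. A minimal plain-Python sketch follows, using the textbook definitions. It is an illustration rather than the code behind the reported results. Two behaviours are our assumptions, because the paper does not state how these cases were handled: MAPE skips hours with zero actual sales, and R² and Corr return NaN for constant series.

```python
from __future__ import annotations

import math
from statistics import fmean


def forecast_metrics(actual: list[float], predicted: list[float]) -> dict[str, float]:
    """Compute MSE, MAE, RMSE, MAPE, R2 and Pearson correlation for one forecast."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = fmean(e * e for e in errors)
    mae = fmean(abs(e) for e in errors)
    # MAPE is undefined where actual == 0 (e.g. closed hours); those points
    # are skipped here, which is an assumption of this sketch.
    pct = [abs(e / a) for e, a in zip(errors, actual) if a != 0]
    mape = 100.0 * fmean(pct) if pct else math.nan
    mean_a = fmean(actual)
    mean_p = fmean(predicted)
    ss_tot = sum((a - mean_a) ** 2 for a in actual)  # total variance of actuals
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else math.nan
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    corr = cov / math.sqrt(ss_tot * var_p) if ss_tot > 0 and var_p > 0 else math.nan
    return {"MSE": mse, "MAE": mae, "RMSE": math.sqrt(mse),
            "MAPE": mape, "R2": r2, "Corr": corr}
```

For example, `forecast_metrics([10, 20, 30, 40], [12, 18, 33, 37])` gives MSE 6.5, MAE 2.5, MAPE 11.875 (percent) and R² 0.948.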
