LSTM Model Architecture for Rare Event Time Series Forecasting - MachineLearningMastery.com

The document discusses a scalable end-to-end LSTM model for time series forecasting, particularly for rare events like public holidays. It highlights the model's architecture, which includes separate autoencoder and forecasting sub-models, and its effectiveness in multivariate, multi-step forecasting across various cities. The findings suggest that the proposed LSTM architecture outperforms existing models and can be reused for different forecasting problems.

LSTM Model Architecture for Rare Event Time Series Forecasting

by Jason Brownlee on August 5, 2019 in Deep Learning for Time Series

Time series forecasting with LSTMs directly has shown little success.

This is surprising as neural networks are known to be able to learn complex non-linear relationships and the LSTM is perhaps the most successful
type of recurrent neural network that is capable of directly supporting multivariate sequence prediction problems.

A recent study performed at Uber AI Labs demonstrates how both the automatic feature learning capabilities of LSTMs and their ability to handle input
sequences can be harnessed in an end-to-end model for forecasting ride-sharing demand during rare events like public holidays.

In this post, you will discover an approach to developing a scalable end-to-end LSTM model for time series forecasting.

After reading this post, you will know:

The challenge of multivariate, multi-step forecasting across multiple sites, in this case cities.
An LSTM model architecture for time series forecasting comprised of separate autoencoder and forecasting sub-models.
The skill of the proposed LSTM architecture at rare event demand forecasting and the ability to reuse the trained model on unrelated
forecasting problems.

Kick-start your project with my new book Deep Learning for Time Series Forecasting, including step-by-step tutorials and the Python source code
files for all examples.

Let’s get started.

Overview
In this post, we will review the 2017 paper titled “Time-series Extreme Event Forecasting with Neural Networks at Uber” by Nikolay Laptev, et al.
presented at the Time Series Workshop, ICML 2017.

This post is divided into four sections; they are:

1. Motivation
2. Datasets
3. Model
4. Findings

Motivation
The goal of the work was to develop an end-to-end forecast model for multi-step time series forecasting that can handle multivariate inputs (i.e.,
multiple input time series).

The intent of the model was to forecast driver demand at Uber for ride sharing, specifically to forecast demand on challenging days such as holidays
where the uncertainty for classical models was high.

Generally, this type of demand forecasting for holidays belongs to an area of study called extreme event prediction.


Extreme event prediction has become a popular topic for estimating peak electricity demand, traffic jam severity and surge pricing for
ride sharing and other applications. In fact there is a branch of statistics known as extreme value theory (EVT) that deals directly with
this challenge.

— Time-series Extreme Event Forecasting with Neural Networks at Uber, 2017.

Two existing approaches were described:

Classical Forecasting Methods: Where a model was developed per time series, perhaps fit as needed.
Two-Step Approach: Where classical models were used in conjunction with machine learning models.

The difficulty of working with these existing approaches, such as developing and maintaining a separate model per time series, motivated the desire for a single end-to-end model.

Further, a model was required that could generalize across locales, specifically across data collected for each city. That is, a single model would be
trained on some or all cities with available data and then used to make forecasts for some or all cities.

We can summarize this as the general need for a model that supports multivariate inputs, makes multi-step forecasts, and generalizes across multiple
sites, in this case cities.

Datasets
The model was fit on a proprietary Uber dataset composed of five years of anonymized ride-sharing data across top cities in the US.


A five year daily history of completed trips across top US cities in terms of population was used to provide forecasts across all major
US holidays.

— Time-series Extreme Event Forecasting with Neural Networks at Uber, 2017.

The input to each forecast consisted of information about each ride as well as weather, city, and holiday variables.

To circumvent the lack of data we use additional features including weather information (e.g., precipitation, wind speed, temperature)
 and city level information (e.g., current trips, current users, local holidays).

— Time-series Extreme Event Forecasting with Neural Networks at Uber, 2017.

The figure below taken from the paper provides a sample of six variables for one year.
Scaled Multivariate Input for Model
Taken from “Time-series Extreme Event Forecasting with Neural Networks at Uber”.

A training dataset was created by splitting the historical data into sliding windows of input and output variables.

The specific sizes of the look-back window and forecast horizon used in the experiments were not specified in the paper.

Sliding Window Approach to Modeling Time Series

Taken from “Time-series Extreme Event Forecasting with Neural Networks at Uber”.
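The sliding-window framing described above can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's code: the helper name `make_windows`, the look-back of 3 steps and the horizon of 2 steps are illustrative assumptions, since the paper does not state the sizes it used. The first column is assumed to be the target series (e.g., completed trips), and all columns are used as inputs.

```python
import numpy as np

def make_windows(data, n_in, n_out, target_col=0):
    """Split a (time, features) array into sliding input/output windows.

    Hypothetical helper for illustration. Returns X with shape
    (samples, n_in, features) and y with shape (samples, n_out),
    where y is taken from the target column only.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data[:, None]  # treat a univariate series as one feature
    X, y = [], []
    # Window start t gives inputs [t, t+n_in) and outputs [t+n_in, t+n_in+n_out)
    for t in range(len(data) - n_in - n_out + 1):
        X.append(data[t:t + n_in])
        y.append(data[t + n_in:t + n_in + n_out, target_col])
    return np.array(X), np.array(y)

# Toy multivariate series: 10 days, 2 variables (e.g., trips and temperature)
series = np.column_stack([np.arange(10), np.arange(10) * 10])
X, y = make_windows(series, n_in=3, n_out=2)
print(X.shape, y.shape)  # (6, 3, 2) (6, 2)
print(y[0])              # [3. 4.]
```

Each window advances by one step, so consecutive samples overlap; a larger stride would give fewer, less correlated samples.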

Time series data was scaled by normalizing observations per batch of samples and each input series was de-trended, but not deseasonalized.

Neural networks are sensitive to unscaled data, therefore we normalize every minibatch. Furthermore, we found that de-trending the
 data, as opposed to de-seasoning, produces better results.

— Time-series Extreme Event Forecasting with Neural Networks at Uber, 2017.
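A minimal sketch of the two preprocessing steps described above follows. It assumes a least-squares linear trend removed per series and z-score normalization computed over each minibatch; the paper does not say which de-trending or normalization formulas were used, so both choices (and the helper names) are assumptions.

```python
import numpy as np

def detrend(series):
    """Remove a least-squares linear trend from a 1D series (assumed method).

    Seasonality is deliberately left in place, mirroring the choice to
    de-trend but not de-seasonalize.
    """
    t = np.arange(len(series))
    slope, intercept = np.polyfit(t, series, deg=1)
    return series - (slope * t + intercept)

def normalize_batch(batch, eps=1e-8):
    """Z-score a minibatch of windows, shape (samples, steps, features),
    using per-feature statistics computed over this minibatch only."""
    mean = batch.mean(axis=(0, 1), keepdims=True)
    std = batch.std(axis=(0, 1), keepdims=True)
    return (batch - mean) / (std + eps)

# Weekly seasonality on top of an upward trend
t = np.arange(56)
series = 0.5 * t + 10 * np.sin(2 * np.pi * t / 7)
flat = detrend(series)
print(np.polyfit(t, flat, 1)[0])  # residual slope is numerically zero

batch = np.random.default_rng(0).normal(5.0, 3.0, size=(32, 14, 2))
z = normalize_batch(batch)
print(z.mean(axis=(0, 1)), z.std(axis=(0, 1)))
```

Because the statistics come from the minibatch, the same raw value can map to slightly different scaled values in different batches; at prediction time, the scaling must be inverted with the statistics used for that batch.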

Model
Vanilla LSTMs were evaluated on the problem and showed relatively poor performance.

This is not surprising as it mirrors findings elsewhere.


Our initial LSTM implementation did not show superior performance relative to the state of the art approach.

— Time-series Extreme Event Forecasting with Neural Networks at Uber, 2017.

A more elaborate architecture was used, comprised of two LSTM models:

Feature Extractor: Model for distilling an input sequence down to a feature vector that may be used as input for making a forecast.
Forecaster: Model that uses the extracted features and other inputs to make a forecast.

An LSTM autoencoder model was developed for use as the feature extraction model and a Stacked LSTM was used as the forecast model.


We found that the vanilla LSTM model’s performance is worse than our baseline. Thus, we propose a new architecture, that leverages
an autoencoder for feature extraction, achieving superior performance compared to our baseline.

— Time-series Extreme Event Forecasting with Neural Networks at Uber, 2017.

When making a forecast, time series data is first provided to the autoencoders, which compress it into multiple feature vectors; these are averaged, and
the result is concatenated with new input. The combined vector is then provided as input to the forecast model to make a prediction.
… the model first primes the network by auto feature extraction, which is critical to capture complex time-series dynamics during
 special events at scale. […] Features vectors are then aggregated via an ensemble technique (e.g., averaging or other methods). The
final vector is then concatenated with the new input and fed to LSTM forecaster for prediction.

— Time-series Extreme Event Forecasting with Neural Networks at Uber, 2017.
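The average-then-concatenate step quoted above can be sketched as follows. This is an illustration under stated assumptions: the encoders are stand-in random projections rather than trained LSTM autoencoders, the "ensemble" is simply several such encoders whose outputs are averaged, and `new_input` is assumed to be the flattened last 7 days of observations, since the paper does not define it. Only the wiring is the point.

```python
import numpy as np

rng = np.random.default_rng(42)
n_steps, n_features, code_size, n_encoders = 28, 3, 32, 5

# Stand-in encoders: each maps a (n_steps, n_features) window to a code vector.
# In the paper these would be trained LSTM autoencoders; random projections
# keep this example small and self-contained.
encoders = [rng.normal(size=(n_steps * n_features, code_size)) for _ in range(n_encoders)]

def encode(window, weights):
    """Hypothetical encoder: flatten the window and project it to code_size dims."""
    return np.tanh(window.reshape(-1) @ weights)

window = rng.normal(size=(n_steps, n_features))          # recent history for one city
codes = np.stack([encode(window, w) for w in encoders])  # (n_encoders, code_size)
feature_vector = codes.mean(axis=0)                      # ensemble by averaging

new_input = window[-7:].reshape(-1)                      # assumed "new input": last 7 days
forecaster_input = np.concatenate([feature_vector, new_input])
print(codes.shape, feature_vector.shape, forecaster_input.shape)
```

The concatenated vector would then be passed to the stacked LSTM forecaster; how the paper arranges it as a sequence for the LSTM is not specified.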

It is not clear what exactly is provided to the autoencoder when making a prediction, although we may guess that it is a multivariate time series for the
city being forecasted with observations prior to the interval being forecasted.

A multivariate time series as input to the autoencoder will result in multiple encoded vectors (one for each series) that could be concatenated. It is not
clear what role averaging may play at this point, although we may guess that it is an averaging over multiple models performing the autoencoding
process.

Overview of Feature Extraction Model and Forecast Model

Taken from “Time-series Extreme Event Forecasting with Neural Networks at Uber.”

The authors comment that it would be possible to make the autoencoder a part of the forecast model, and that this was evaluated, but the separate
model resulted in better performance.


Having a separate auto-encoder module, however, produced better results in our experience.

— Time-series Extreme Event Forecasting with Neural Networks at Uber, 2017.

More details of the developed model were made available in the slides used when presenting the paper.

The input layer of the autoencoder had 512 LSTM units, and the bottleneck used to create the encoded feature vectors had 32 or 64
LSTM units.
Details of LSTM Autoencoder for Feature Extraction
Taken from “Time-series Extreme Event Forecasting with Neural Networks at Uber.”
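To make these layer sizes concrete, here is a minimal NumPy sketch of the encoder half only: a first LSTM layer with 512 units reads the window, and a second LSTM layer with 32 units (the bottleneck) compresses it into a single feature vector, its final hidden state. The weights are random and untrained, and the window length (28 steps), the number of input variables (6), and the way the two layers are stacked are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_layer(x_seq, units, rng):
    """Run one LSTM layer with random (untrained) weights over x_seq.

    x_seq has shape (steps, input_dim); returns all hidden states with
    shape (steps, units). Gates are stacked as [input, forget, cell, output].
    """
    input_dim = x_seq.shape[1]
    W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
    U = rng.normal(scale=0.1, size=(units, 4 * units))
    b = np.zeros(4 * units)
    h = np.zeros(units)
    c = np.zeros(units)
    states = []
    for x_t in x_seq:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:units])               # input gate
        f = sigmoid(z[units:2 * units])      # forget gate
        g = np.tanh(z[2 * units:3 * units])  # candidate cell state
        o = sigmoid(z[3 * units:])           # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
window = rng.normal(size=(28, 6))        # assumed: 28 days x 6 variables
hidden = lstm_layer(window, 512, rng)    # first encoder layer, 512 units
code = lstm_layer(hidden, 32, rng)[-1]   # bottleneck: last state of a 32-unit layer
print(hidden.shape, code.shape)          # (28, 512) (32,)
```

In a trained autoencoder, a decoder would reconstruct the window from `code`, and the reconstruction loss is what makes the code informative. Changing 32 to 64 gives the other bottleneck size mentioned.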

The encoded feature vectors are provided to the forecast model with ‘new input’, although it is not specified what this new input is; we could guess
that it is a time series, perhaps a multivariate time series of the city being forecasted with observations prior to the forecast interval. Or it may be
features extracted from this series, as the blog post on the paper suggests (although I’m skeptical, as the paper and slides contradict this).

The model was trained on a lot of data, which is a general requirement of stacked LSTMs or perhaps LSTMs in general.

The described production Neural Network Model was trained on thousands of time-series with thousands of data points each.

— Time-series Extreme Event Forecasting with Neural Networks at Uber, 2017.

The model is not retrained when making new forecasts.

An interesting approach to estimating forecast uncertainty was also implemented that used the bootstrap.

It involved estimating model uncertainty and forecast uncertainty separately, using the autoencoder and the forecast model respectively. Inputs were
provided to a given model and dropout of the activations (as commented in the slides) was used. This process was repeated 100 times, and the model
and forecast error terms were used in an estimate of the forecast uncertainty.

Overview of Forecast Uncertainty Estimation

Taken from “Time-series Extreme Event Forecasting with Neural Networks at Uber.”

This approach to forecast uncertainty may be better described in the 2017 paper “Deep and Confident Prediction for Time Series at Uber.”
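The repeated-dropout procedure can be sketched as follows, loosely following the decomposition in "Deep and Confident Prediction for Time Series at Uber": run many stochastic forward passes with dropout left on, take the spread of the predictions as model uncertainty, estimate the inherent noise from residuals on held-out data, and add the two variances. This sketch collapses the autoencoder and forecaster into one toy network; the network, its random weights, the dropout rate of 0.2, the stand-in validation residuals, and the 95% normal interval are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(10, 64))
W2 = rng.normal(size=(64, 1)) / 8.0

def predict(x, drop_rate, rng):
    """One forward pass of a toy network with dropout kept ON at inference."""
    h = np.maximum(0.0, x @ W1)               # ReLU hidden layer
    mask = rng.random(h.shape) >= drop_rate   # drop each unit with prob drop_rate
    h = h * mask / (1.0 - drop_rate)          # inverted dropout scaling
    return (h @ W2).item()

x = rng.normal(size=10)                       # one input sample
samples = np.array([predict(x, 0.2, rng) for _ in range(100)])  # 100 passes
model_var = samples.var()                     # model uncertainty

# Assumed stand-in for the model's residuals on a held-out validation set
val_residuals = rng.normal(scale=0.5, size=200)
noise_var = np.mean(val_residuals ** 2)       # inherent noise estimate

mean = samples.mean()
total_std = np.sqrt(model_var + noise_var)
lower, upper = mean - 1.96 * total_std, mean + 1.96 * total_std
print(f"forecast {mean:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

Adding variances assumes the two error sources are independent, which is the simplifying assumption that makes this decomposition convenient.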
Findings
The model was evaluated with a special focus on demand forecasting for U.S. holidays by U.S. city.

The details of the model evaluation were not specified.

The new generalized LSTM forecast model was found to outperform the existing model used at Uber, which may be impressive if we assume that the
existing model was well tuned.

The results presented show a 2%-18% forecast accuracy improvement compared to the current proprietary method comprising a
 univariate timeseries and machine learned model.

— Time-series Extreme Event Forecasting with Neural Networks at Uber, 2017.

The model trained on the Uber dataset was then applied directly to a subset of the M3-Competition dataset comprised of about 1,500 monthly
univariate time series forecasting datasets.

This is a type of transfer learning, a highly desirable goal that allows deep learning models to be reused across problem domains.

Surprisingly, the model performed well: not great compared to the top-performing methods, but better than many sophisticated models. The result
suggests that, perhaps with fine-tuning (e.g. as is done in other transfer learning case studies), the model could be reused and be skillful.

Performance of LSTM Model Trained on Uber Data and Evaluated on the M3 Datasets
Taken from “Time-series Extreme Event Forecasting with Neural Networks at Uber.”

Importantly, the authors suggest that deep LSTM models are perhaps most beneficial for time series forecasting in situations where:

There are a large number of time series.
There are a large number of observations for each series.
There is a strong correlation between time series.


From our experience there are three criteria for picking a neural network model for time-series: (a) number of timeseries (b) length of
time-series and (c) correlation among the time-series. If (a), (b) and (c) are high then the neural network might be the right choice,
otherwise classical timeseries approach may work best.

— Time-series Extreme Event Forecasting with Neural Networks at Uber, 2017.

This is summarized well by a slide used in the presentation of the paper.

Lessons Learned Applying LSTMs for Time Series Forecasting
Taken from “Time-series Extreme Event Forecasting with Neural Networks at Uber” Slides.
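The three criteria can be turned into a quick, rough diagnostic on your own data: count the series, measure their length, and compute the average pairwise correlation between them. The helper name and the sample data are hypothetical, and neither the paper nor the slides give numeric thresholds, so this sketch only reports the three quantities rather than making the decision for you.

```python
import numpy as np

def nn_suitability_stats(panel):
    """Summarize a panel of aligned series, shape (n_series, n_steps).

    Returns (a) the number of series, (b) their length, and (c) the
    mean absolute pairwise Pearson correlation between series.
    """
    n_series, n_steps = panel.shape
    corr = np.corrcoef(panel)                       # (n_series, n_series)
    off_diag = corr[~np.eye(n_series, dtype=bool)]  # drop self-correlations
    return n_series, n_steps, float(np.mean(np.abs(off_diag)))

rng = np.random.default_rng(7)
common = rng.normal(size=365).cumsum()              # shared driver across cities
panel = common + rng.normal(scale=2.0, size=(20, 365))
print(nn_suitability_stats(panel))
```

Correlation here is measured on raw levels; for trending series you may prefer to correlate differenced or de-trended values so that a shared trend alone does not inflate criterion (c).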

Further Reading
This section provides more resources on the topic if you are looking to go deeper.

Time-series Extreme Event Forecasting with Neural Networks at Uber, 2017.
Engineering Extreme Event Forecasting at Uber with Recurrent Neural Networks, 2017.
Time-Series Modeling with Neural Networks at Uber, Slides, 2017.
Time-series Extreme Event Forecasting Case study, Slides, 2018.
Time Series Workshop, ICML 2017.
Deep and Confident Prediction for Time Series at Uber, 2017.

Summary
In this post, you discovered a scalable end-to-end LSTM model for time series forecasting.

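Specifically, you learned:

The challenge of multivariate, multi-step forecasting across multiple sites, in this case cities.
An LSTM model architecture for time series forecasting composed of separate autoencoder and forecasting sub-models.
The skill of the proposed LSTM architecture at rare event demand forecasting and the ability to reuse the trained model on unrelated
forecasting problems.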

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

About Jason Brownlee

Jason Brownlee, PhD is a machine learning specialist who teaches developers how to get results with modern machine learning methods via hands-on tutorials.

71 Responses to LSTM Model Architecture for Rare Event Time Series Forecasting

Valentin Nagacevschi November 2, 2018 at 6:33 pm

Hi,

Is there a way to identify and remove outliers from data sets without affecting rare events?
Or, how can I avoid mistaking outliers for rare events?
Thanks

Vali

Jason Brownlee November 3, 2018 at 7:01 am

You must carefully define what you mean by “outlier” and “rare event” so that the methods that detect the former don’t detect the latter.

Valentin Nagacevschi November 3, 2018 at 4:52 pm

Outliers are usually anomalies, i.e., values outside a normal distribution, something like mean +/- 2*std. In a time series,
outliers are spikes, with much higher frequency than the normal signal, even with rare events.
For instance, Black Friday is a rare event but fits in the normal frequency, whereas an outlier has a much higher frequency.
So how can I bring the frequency part into the equation?
Thanks

Jason Brownlee November 4, 2018 at 6:25 am

Good question, I don’t have material on this topic so I can’t give you good off the cuff advice.

I may cover the topic in the future.

Kevin Van Horn May 23, 2019 at 7:25 am

“Higher frequency” means more often. I think you mean that true outliers have a much *lower* frequency. But even that isn’t
necessarily accurate. I’ve seen web traffic time series that have occasional spikes that correspond to no known event, occurring in some
cases more commonly than the few known special events.

Ian Downard November 7, 2018 at 4:45 am

Thanks for the post. Do you know where an implementation for this algorithm can be found?

Jason Brownlee November 7, 2018 at 6:13 am

Not at this stage. Although this might help as a start:

https://machinelearningmastery.com/lstm-autoencoders/

MANISH KUMAR November 17, 2018 at 12:01 pm

I don’t understand this paper, as it includes terms like time series, multivariate, LSTM, and recurrent model.

Jason Brownlee November 18, 2018 at 6:37 am

Perhaps start with something simpler, for example:

https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/

Juninho December 20, 2018 at 3:52 am

Hi,
Thanks for this article. I’m trying to implement this paper using the TensorFlow low-level API.
Can you explain more about the confidence interval computation, please?

I mean, once you have the uncertainty error and the irreducible error, how can you get the interval through MC Dropout?

Thanks a lot

Jason Brownlee December 20, 2018 at 6:30 am

Perhaps check the paper or contact the author of the paper, it has been months since I read the paper.

sophia December 24, 2018 at 12:09 pm

Very well explained, as always! A lot of your other articles contain code that helps us understand the concepts better. I’m sure you’re very
busy, but it’d be great if you could add code to this post, or point me to some articles/repos that have some code related to this post. Thanks,

Jason Brownlee December 25, 2018 at 7:17 am

Thanks, you can learn more about LSTMs here:

https://machinelearningmastery.com/start-here/#lstm

Bob July 30, 2020 at 10:09 pm

I would like that too!

André January 9, 2019 at 1:35 pm

Hi Jason,

I’m a Master’s degree student and I got interested in applying this approach to climate data series. In my research I have an additional challenge
(dimensions): the latitude and longitude of an extreme or rare event.

I created a time series by downloading 10 years of ERA Interim, Daily: Pressure and surface data from ECMWF
(https://apps.ecmwf.int/datasets/data/interim-full-daily/levtype=sfc/).

So, as an example, I’m interested in predicting extreme rain (> 50 mm in 24 h) for a selected area (0.75 resolution): latitude from -18.75 to -20.25 and
longitude from 315.0 to 316.5. It’s a 3 x 3 grid, i.e., 9 cells.

The rain (total precipitation in mm) doesn’t have a Gaussian distribution, so there are a lot of 0 mm days, and the rain time series is not a
continuous sequence.

In your experience, can this “Uber” approach fit despite the distribution problem? I have some doubts about the approach, like how this “LSTM
Autoencoder for Feature Extraction” works. Do you expect to code a complete example like this “Uber” approach?

Jason Brownlee January 10, 2019 at 7:45 am

I don’t know how this approach will fare with your data; perhaps try it and see?

Savan Gowda January 10, 2019 at 10:52 pm

Hi Jason,

Thank you for the explanation of this paper.

I have one question and maybe you could help me with that. The LSTM Autoencoder that I created looks like this:

from tensorflow.keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
from tensorflow.keras.models import Model

n_steps, input_dim = 10, 1  # example values: window length and number of features

inputs = Input(shape=(n_steps, input_dim))

# tf.keras LSTM runs on the cuDNN kernel on GPU when default arguments are used
encoder1 = LSTM(128, return_sequences=True)(inputs)
encoder2 = LSTM(64, return_sequences=True)(encoder1)
encoder3 = LSTM(32)(encoder2)  # bottleneck: one 32-dim vector per sample

repeat = RepeatVector(n_steps)(encoder3)  # tile the code once per output time step

decoder1 = LSTM(32, return_sequences=True)(repeat)
decoder2 = LSTM(64, return_sequences=True)(decoder1)
decoder3 = LSTM(128, return_sequences=True)(decoder2)
dense1 = TimeDistributed(Dense(100, activation='relu'))(decoder3)
dense2 = TimeDistributed(Dense(input_dim))(dense1)  # reconstruct every input feature
sequence_autoencoder = Model(inputs, dense2)
# 'repeat' only copies encoder3's output n_steps times: shape (samples, n_steps, 32)
encoder_model = Model(inputs, repeat)

Should we extract the feature from the “repeat” layer or the “encoder3” layer?

Could you please give me a hint for plotting/visualizing the extracted features?

Thanks & Regards

Savan

Jason Brownlee January 11, 2019 at 7:51 am

I’m eager to help, but I don’t have the capacity to debug your code, sorry.

Savan Gowda January 11, 2019 at 10:19 pm

Thank you for the answer, Jason! You need not be sorry 🙂 Do you have any example code, or could you suggest some methods
with which I can visualize the feature vectors?

Thank you 🙂

Jason Brownlee January 12, 2019 at 5:41 am

You can use a PCA to visualize high-dimensional vectors.
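A minimal sketch of the PCA suggestion: project the feature vectors onto their first two principal components using NumPy's SVD, then scatter-plot the two columns (for example with matplotlib). The random `features` array is an assumed stand-in for real encodings. If the vectors come from a model whose output is the `repeat` layer, as in the code above, every time step holds the same vector, so take one step first, e.g. `encoded[:, 0, :]`.

```python
import numpy as np

def pca_2d(features):
    """Project (n_samples, n_dims) vectors onto their top-2 principal components.

    Returns the 2D coordinates and the fraction of variance each component explains.
    """
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, sorted by singular value
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return centered @ vt[:2].T, explained[:2]

features = np.random.default_rng(3).normal(size=(200, 32))  # stand-in encodings
coords, ratio = pca_2d(features)
print(coords.shape, ratio.round(3))
# Then, for example: plt.scatter(coords[:, 0], coords[:, 1])
```

Coloring the points by a known label (e.g., holiday vs. regular day) is usually what makes such a plot informative.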

manish February 9, 2019 at 5:34 pm

Where can I find the dataset for this Uber paper? Could you please send it to me,
and explain how to implement this?

Jason Brownlee February 10, 2019 at 9:40 am

All code and data is here:

https://github.com/M4Competition/M4-methods

MANISH KUMAR April 26, 2019 at 2:12 am

Please give me the implementation with results for this dataset.

MANISH KUMAR February 10, 2019 at 1:41 am

Please send me the dataset for this paper. I need it desperately for my research work, please help me.

Jason Brownlee February 10, 2019 at 9:44 am

It was Uber data and was not released.

This is the closest we have:

https://github.com/M4Competition/M4-methods

MANISH KUMAR February 18, 2019 at 4:32 pm

Please provide me any downloaded data file and how to implement it.

Jason Brownlee February 19, 2019 at 7:21 am

If you’re looking for datasets, perhaps start here:

https://machinelearningmastery.com/faq/single-faq/where-can-i-get-a-dataset-on-___

MANISH KUMAR May 11, 2019 at 3:28 pm

Can you provide me thesis work related to this topic of rare events? Please help me with the implementation.

Jason Brownlee May 12, 2019 at 6:38 am

I cannot, sorry.

Ahmad May 19, 2019 at 9:57 pm

Hi Jason, thank you for the post. Please, what is the difference between Monte Carlo dropout and normal dropout? Do you have a link to any
tutorial that shows how to add Monte Carlo dropout to an LSTM model implementation?

Thank you!

Jason Brownlee May 20, 2019 at 6:29 am

What is monte carlo dropout?

Ahmad May 20, 2019 at 10:09 am

It is a stochastic dropout used as a Bayesian approximation for model uncertainty estimation. It is equivalent to performing T
stochastic forward passes through the neural network and averaging the results. It can also be approximated by averaging the weights of the
NN (i.e., multiplying each weight by a probability p at test time). MC dropout is used for model uncertainty estimation in the paper you
reviewed and in the one you provided as a reference (“Deep and Confident Prediction for Time Series at Uber”) in this post.

Jason Brownlee May 20, 2019 at 2:36 pm

Thanks!

Ahmad May 20, 2019 at 6:35 pm

Please help me with any tutorial that shows how it can be implemented using an LSTM model.

Jason Brownlee May 21, 2019 at 6:30 am

Thanks for the suggestion, I may be able to cover it in the future.

Marco Cerliani May 22, 2019 at 4:57 pm

I made a post where I replicate these results. You can find the article here: https://towardsdatascience.com/extreme-event-forecasting-with-lstm-autoencoders-297492485037 (with Python code)

Jason Brownlee May 23, 2019 at 5:55 am

Well done, thanks for sharing.

Ahmad Idris Tambuwal November 4, 2019 at 11:50 pm

@Marco. In one of your posts (https://towardsdatascience.com/anomaly-detection-with-lstm-in-keras-8d8d7e50ab1b) you used quantile
regression for anomaly detection. Is it possible to use quantile regression in extreme event forecasting with an LSTM autoencoder to identify
anomalies? If yes, how can I update it?

Marco May 25, 2019 at 5:11 am

Good job. But I must say that I’m sick of reading this incomplete paper. The “new input” is not clearly specified anywhere in the
paper. They publish a paper and they hide some details or make them obscure. What’s the point?

Jason Brownlee May 25, 2019 at 7:54 am

Papers are always incomplete, they are just enough to give you a rough idea – which might be enough.

It’s a pain. And unless a paper has associated code it is almost fraud – they can make up anything.

Thankfully, most good papers have associated github project – this never used to be the case.

Dong Jae Kim June 24, 2021 at 11:10 pm

What is this ‘new input’? I am stuck here.

Jason Brownlee June 25, 2021 at 6:16 am

Perhaps this will help:

https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/

dong jae kim June 26, 2021 at 5:27 am

If you are transforming new input data from different series, then averaging and concatenating it prior to making a new prediction, then why
do they add another new input? Is this new input the same input as the one before it is transformed by the encoder? I still do not
understand this.

Jason Brownlee June 27, 2021 at 4:31 am

What do you mean by another new input?

I’d recommend reading the paper, it may have more details and make the situation clearer.

Parth June 30, 2019 at 11:24 pm

How many time series are sufficient to train this kind of network? (The authors suggest that a larger number of time series is needed for this
type of network to succeed, but how many?)

Can you please give some number to have rough idea?

Thanks for very insightful post!

Jason Brownlee July 1, 2019 at 6:35 am

It really depends.

If you don’t have a lot of data, you can avoid overfitting with regularization:
https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/

Nour Attaallah August 9, 2019 at 7:55 am

Is there a way to separate overlapped events in a time series trace ?

Jason Brownlee August 9, 2019 at 8:19 am

I guess it depends on the data.

Nouraldin Attaallah August 13, 2019 at 2:54 am

So each individual event in the trace has its unique duration and volume (y-value). An overlapped event will look like a block of stacked
rectangular events. Any suggestions ? Thanks

Jason Brownlee August 13, 2019 at 6:14 am

Not off hand, some research may be required. Perhaps try some searches on scholar.google.com

Nouraldin Attaallah August 14, 2019 at 4:55 am

Okay
Thanks 😀

Amelie October 2, 2019 at 12:00 am

I want to make a t+1 prediction every 15 minutes (in real time, running forever).

Would it be better to find a good model and then predict the next step (offline), or, at each prediction, to update my model with the new prediction (online)?

Jason Brownlee October 2, 2019 at 7:59 am

Test a few approaches and see what works best for your specific dataset.

Emily October 18, 2019 at 5:18 pm

Hi Jason,

I have to perform anomaly detection and I only have univariate time series data (~1 year).
Does it make sense to create lagged and derived features from the same time series (such as mean, min, max, sd, deviation etc. for different
windows) and train an LSTM autoencoder model on it?
The idea is that if I score/predict the new data point using the lagged and derived features and the reconstruction error is > threshold, then it’s an
anomaly. Do you recommend this approach?
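Emily's idea, flagging new windows whose reconstruction error exceeds a threshold, can be sketched independently of the model. Here `reconstruct` is a hypothetical stand-in (a 3-point moving-average smoother) for a trained LSTM autoencoder's predict call, and setting the threshold at the 99th percentile of training errors is one common convention, not a rule.

```python
import numpy as np

def reconstruct(windows):
    """Hypothetical stand-in for autoencoder.predict: a 3-point moving average
    along each window, with edges padded by repeating the end values."""
    padded = np.pad(windows, ((0, 0), (1, 1)), mode="edge")
    return (padded[:, :-2] + padded[:, 1:-1] + padded[:, 2:]) / 3.0

def reconstruction_errors(windows):
    """Mean squared reconstruction error per window."""
    return np.mean((windows - reconstruct(windows)) ** 2, axis=1)

rng = np.random.default_rng(5)
train = rng.normal(size=(500, 24))       # normal behavior, 24-step windows
threshold = np.percentile(reconstruction_errors(train), 99)

new = rng.normal(size=(3, 24))
new[1, 12] += 15.0                        # inject a spike into window 1
flags = reconstruction_errors(new) > threshold
print(flags)
```

With a 99th-percentile threshold, about 1% of normal windows will be flagged by construction, so the percentile directly sets the false-alarm rate you are willing to accept.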

Jason Brownlee October 19, 2019 at 6:29 am

I recommend testing a suite of framings of the problem and models in order to discover what works best.

Emily October 20, 2019 at 3:55 am

Hi Jason,

Thanks for your reply.

But it is not conceptually wrong if I create features that are essentially from 1 univariate time series and then use autoencoder right?

Just want to confirm my understanding.

Regards,
Emily

Jason Brownlee October 20, 2019 at 6:23 am

Hmmm, there is no real right and wrong, there are only models that work and ones that do not.

Your idea is complex, but perhaps it will work – give it a shot. The great thing about these libraries is that testing ideas is very fast – like just a
few minutes.

Emily October 20, 2019 at 7:11 pm

Thanks a lot!

Adi November 21, 2019 at 2:25 am

Hi Jason,

I’m working on a problem where I have a daily time series with a set of ~100 features associated with every day. There is also a 0/1 event associated
with each day. There are days missing in the data. I want to predict, based on the future features, whether or not the event will occur on that day. I’m
not really sure how I can do it.

Thanks a lot!

Jason Brownlee November 21, 2019 at 6:10 am

I recommend testing many different framings of the dataset and see what works.

Adi November 21, 2019 at 7:09 am

Thanks for the prompt reply. I’m not sure what might be a good approach to start with. Should I frame it as a simple classification
problem or use a time series approach?

Jason Brownlee November 21, 2019 at 1:23 pm

Start simple, then go to complex.

Start with classification, e.g. the prior day’s features as input and today’s label as output, or something. Perhaps explore feature selection on this.

Then start adding in more history to some/all features, for different prior intervals. Discover what results in skillful models on your data.

Play the scientist.
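The suggested framing (prior days' features as input, today's label as output) can be sketched like this. The helper name, the 3-day history, and the toy data are assumptions; the result is a plain tabular dataset that any classifier can be fit on, and `n_lags` is the knob for adding more history. With missing days, the data would first need to be reindexed to a complete daily calendar so that "previous row" really means "previous day".

```python
import numpy as np

def lagged_classification(features, labels, n_lags):
    """Build X from the previous n_lags days of features, y from today's label.

    features: (n_days, n_features); labels: (n_days,) of 0/1.
    Row k of X is features[t - n_lags : t] flattened, with y[k] = labels[t],
    for t = n_lags, ..., n_days - 1.
    """
    X, y = [], []
    for t in range(n_lags, len(labels)):
        X.append(features[t - n_lags:t].reshape(-1))
        y.append(labels[t])
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
features = rng.normal(size=(30, 100))   # 30 days x ~100 features, as in the question
labels = rng.integers(0, 2, size=30)    # daily 0/1 event
X, y = lagged_classification(features, labels, n_lags=3)
print(X.shape, y.shape)  # (27, 300) (27,)
```

Because rows are ordered in time, evaluate with a chronological train/test split rather than a random shuffle, so the model never trains on days after the ones it is tested on.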

Bandeep January 15, 2020 at 9:27 pm

Hello Jason,

Is a univariate LSTM RNN capable of giving good results with 1,200 observations of daily sales data, where sales happened on 20 percent of the days
and no sales happened on the other 80 percent (recorded as zero)? The sales data is the daily number of units sold.
I am trying this model. Is there any other time series model you can suggest for this kind of problem, where sales happen on only a few days?
An accuracy of 60 percent would be a good start.

I have divided the problem into two parts:

1. When sales happened, and
2. How much was sold.

For part 1, I have given a 0 flag to days where no sales happened and 1 to days where sales happened, irrespective of how much.
Is a univariate LSTM helpful in recognizing the pattern of 0s and 1s? After that I will go for part 2, i.e., how much. If you have any other technique, let me
know.
Regards
Bandeep
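The two-part framing above (first whether a sale happens, then how much) is sometimes called a hurdle model. A minimal sketch of the data split, assuming a daily units-sold array: part 1 gets a 0/1 target for every day, and part 2 gets only the days with sales. The toy numbers are illustrative.

```python
import numpy as np

units = np.array([0, 0, 3, 0, 0, 0, 5, 1, 0, 0])  # daily units sold (toy data)

occurred = (units > 0).astype(int)   # part 1 target: did a sale happen?
sale_days = np.flatnonzero(units)    # indices of days with sales
amounts = units[sale_days]           # part 2 target: how much, on sale days only

# Combined forecast = P(sale) * E[amount | sale]; here using empirical rates
expected_units = occurred.mean() * amounts.mean()
print(occurred, sale_days, amounts, round(expected_units, 3))
```

With empirical rates the product equals the overall mean of daily units; in practice, each part would get its own model, and their predictions would be multiplied per day.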

Jason Brownlee January 16, 2020 at 6:13 am

Perhaps test the model on your data and evaluate the result?

I would strongly encourage you to test other models as LSTMs are generally terrible at univariate time series forecasting.

Roger Pacey February 23, 2020 at 12:07 am

Thanks for this, and the many other useful articles that you publish.

In the Uber study, did your network identify spikes and dips NOT associated with events known beforehand: public holidays?

I am working towards a network that identifies rare events (demand spikes) before they occur. I have demand spikes that just seem to appear out of
the blue.

I have a simple neural network that predicts when an order is coming in, but predicting whether the next order is a spike has resisted analysis thus far.
A runs test shows that order size is not random and intuition after many years in the business tells me there’s a model out there somewhere.
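For readers unfamiliar with the runs test Roger mentions, here is a minimal sketch of the Wald-Wolfowitz runs test. It dichotomizes the values above and below their median and uses the normal approximation. This is a generic textbook version, not Roger's code. The function name and toy data are illustrative.

```python
import math

def runs_test(values):
    """Wald-Wolfowitz runs test about the median (normal approximation).

    Returns (number_of_runs, z_score, two_sided_p_value).
    Values equal to the median are dropped before counting runs.
    """
    ordered = sorted(values)
    n = len(ordered)
    median = (ordered[(n - 1) // 2] + ordered[n // 2]) / 2
    signs = [v > median for v in values if v != median]
    n1 = sum(signs)           # observations above the median
    n2 = len(signs) - n1      # observations below the median
    if n1 == 0 or n2 == 0 or n1 + n2 < 3:
        raise ValueError("need enough values on both sides of the median")
    # A new run starts whenever the above/below label changes
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    z = (runs - mean) / math.sqrt(var)
    # Two-sided p-value from the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return runs, z, p

alternating = [1, 9, 1, 9, 1, 9, 1, 9, 1, 9]
clustered = [1, 1, 1, 1, 1, 9, 9, 9, 9, 9]
print(runs_test(alternating))  # 10 runs, z ≈ +2.68
print(runs_test(clustered))    # 2 runs, z ≈ -2.68
```

A large positive z means more runs than chance would produce, which suggests alternation. A large negative z means fewer runs, which suggests clustering, such as spikes arriving together. Either case points to structure that a forecasting model might exploit.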

Jason Brownlee February 23, 2020 at 7:29 am

I believe they were anticipated, but I’m not confident on that guess. Perhaps double check the paper?

Roger Pacey March 8, 2020 at 11:41 pm

As luck would have it, a vanilla LSTM network gave astonishingly good results on my data: really exciting. I'm guessing that if I can do it, an expert can do it even better.

Accordingly, I think the guys working for Uber would have forecast random demand spikes not related to holidays. Perhaps it's so obvious they didn't feel the need to mention it.

As with their model, the uncertainty in the actual level of demand shows up in mine. It's a less urgent issue for me, but further improvement gives me a chance to upgrade my skills.

bita September 5, 2020 at 1:53 am

Hi,
my dataset is 11,000 × 6.

Is it possible for an RNN's accuracy to be equal to or greater than an LSTM's?

If you know of a source on this topic, please let me know.

Thank you.

Jason Brownlee September 5, 2020 at 6:51 am

LSTM is an RNN.

Perhaps test a suite of models on your dataset and discover what works best.
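One way to "test a suite of models" is walk-forward validation, which scores every candidate on the same one-step-ahead forecasts. The sketch below is minimal. The two candidates (persistence and a moving average) are simple hypothetical stand-ins. A simple RNN and an LSTM could be wrapped the same way, each exposed as a function that maps the history to a next-step forecast.

```python
import numpy as np

def walk_forward_mae(series, forecast_fn, n_test):
    """Mean absolute error of one-step forecasts over the last n_test points."""
    series = np.asarray(series, dtype=float)
    errors = []
    for t in range(len(series) - n_test, len(series)):
        history = series[:t]               # only data before time t is visible
        prediction = forecast_fn(history)  # one-step-ahead forecast for time t
        errors.append(abs(series[t] - prediction))
    return float(np.mean(errors))

def persistence(history):
    return history[-1]

def moving_average(history, window=3):
    return float(np.mean(history[-window:]))

series = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16]
models = {"persistence": persistence, "moving_average": moving_average}
scores = {name: walk_forward_mae(series, fn, n_test=4) for name, fn in models.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(name, round(score, 3))
```

Keeping the evaluation loop fixed and swapping only `forecast_fn` ensures every model is compared on identical test points.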

Motilal November 6, 2020 at 6:49 am

How many data points must a daily time series have to be called a long time series?

Jason Brownlee November 6, 2020 at 7:32 am

It really depends.

Generally, an LSTM is limited to 200-400 time steps per sample.
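Following that rule of thumb, a long daily series is usually split into many shorter samples rather than fed in as one long sequence. Here is a minimal sketch. It assumes a window length `n_steps`, chosen within the 200-400 range mentioned above. The output uses the `[samples, time steps, features]` shape that Keras LSTM layers expect as input. The function name is hypothetical.

```python
import numpy as np

def split_into_windows(series, n_steps, stride=None):
    """Cut a series into windows of n_steps time steps (hypothetical helper).

    stride defaults to n_steps (non-overlapping windows); a smaller stride
    gives overlapping windows. Leftover observations at the end are dropped.
    Returns an array of shape (n_samples, n_steps, n_features).
    """
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:
        series = series.reshape(-1, 1)  # univariate series -> one feature column
    stride = stride or n_steps
    starts = range(0, len(series) - n_steps + 1, stride)
    return np.stack([series[s:s + n_steps] for s in starts])

daily = np.arange(1000)  # e.g. roughly 2.7 years of daily values
windows = split_into_windows(daily, n_steps=200)
print(windows.shape)      # (5, 200, 1)
overlapping = split_into_windows(daily, n_steps=200, stride=100)
print(overlapping.shape)  # (9, 200, 1)
```

Overlapping windows give the model more training samples from the same history. The cost is that neighbouring samples are correlated.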


Machine Learning Mastery is part of Guiding Tech Media, a leading digital media publisher focused on helping people figure out technology.

