Autoregressive (AR) Model For Time Series Forecasting - GeeksforGeeks

The document provides a comprehensive guide on Autoregressive (AR) models for time series forecasting, detailing their mathematical foundation, types, and implementation steps using Python. It explains the significance of autocorrelation in model selection and evaluation, along with practical examples of predicting temperature data. Additionally, it discusses the benefits and drawbacks of AR models, emphasizing their applicability in stationary time series analysis and the importance of lag order selection.

Autoregressive (AR) Model for Time Series Forecasting
Last Updated : 13 Dec, 2023
Autoregressive models, often abbreviated as AR models, are a fundamental
concept in time series analysis and forecasting. They have widespread
applications in various fields, including finance, economics, climate science, and
more. In this comprehensive guide, we will explore autoregressive models, how
they work, their types, and practical examples.

Autoregressive Models
Autoregressive models belong to the family of time series models. These
models capture the relationship between an observation and several lagged
observations (previous time steps). The core idea is that the current value of a
time series can be expressed as a linear combination of its past values, with
some random noise.

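Mathematically, an autoregressive model of order p, denoted as AR(p), can be
expressed as:

$$X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \varepsilon_t$$

Where:

$X_t$ is the value at time t.
c is a constant.
$\phi_1, \phi_2, \dots, \phi_p$ are the model parameters.
$X_{t-1}, X_{t-2}, \dots, X_{t-p}$ are the lagged values.
$\varepsilon_t$ represents white noise (random error) at time t.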
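
To make the AR(p) definition concrete, the following is a minimal sketch that simulates such a process with NumPy. The function name simulate_ar, the burn-in length, and the parameter values (c = 2, coefficients 0.6 and 0.2) are illustrative assumptions, not part of the original article.

```python
import numpy as np

def simulate_ar(c, phi, n, sigma=1.0, seed=0):
    """Simulate n values of X_t = c + sum_i phi[i] * X_{t-1-i} + eps_t."""
    rng = np.random.default_rng(seed)
    p = len(phi)
    burn_in = 100  # discard early values so the arbitrary zero start has little effect
    x = np.zeros(n + burn_in + p)
    eps = rng.normal(0.0, sigma, size=x.size)
    for t in range(p, x.size):
        # x[t-p:t][::-1] is (X_{t-1}, ..., X_{t-p}), matching phi_1..phi_p
        x[t] = c + np.dot(phi, x[t - p:t][::-1]) + eps[t]
    return x[-n:]

# Hypothetical AR(2) parameters chosen for illustration
series = simulate_ar(c=2.0, phi=[0.6, 0.2], n=500)
print(series[:5])
# For a stationary AR(p), the theoretical mean is c / (1 - sum(phi)) = 10 here
print("sample mean:", series.mean())
```

The sample mean should land near the theoretical mean, which is one quick sanity check that the simulated series behaves like the equation says it should.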


Autocorrelation (ACF) in Autoregressive Models

Autocorrelation, often denoted as “ACF” (Autocorrelation Function), is a
fundamental concept in time series analysis and autoregressive models. It
refers to the correlation between a time series and a lagged version of itself. In
the context of autoregressive models, autocorrelation measures how closely
the current value of a time series is related to its past values, specifically those
at different time lags.

Here’s a breakdown of the concept of autocorrelation in autoregressive models:

Autocorrelation involves calculating the correlation between a time series
and a lagged version of itself. The “lag” represents the number of time units
by which the series is shifted. For example, a lag of 1 corresponds to
comparing the series with its previous time step, while a lag of 2 compares it
with the time step before that, and so on. Lag values help you calculate
autocorrelation, which measures how each observation in a time series is
related to previous observations.
The autocorrelation at a particular lag provides insights into the temporal
dependence of the data. If the autocorrelation is high at a certain lag, it
indicates a strong relationship between the current value and the value at
that lag. Conversely, if the autocorrelation is low or close to zero, it suggests
a weak or no relationship.
To visualize autocorrelation, a common approach is to create an ACF plot.
This plot displays the autocorrelation coefficients at different lags. The
horizontal axis represents the lag, and the vertical axis represents the
autocorrelation values. Significant peaks or patterns in the ACF plot can
reveal the underlying temporal structure of the data. Autocorrelation plays a
pivotal role in autoregressive models.
In an autoregressive model of order p, the current value of the time series is
expressed as a linear combination of its past p values, with coefficients
determined through methods like least squares or maximum likelihood
estimation. The selection of the lag order (p) often relies on the ACF plot
together with the partial autocorrelation (PACF) plot: for an AR(p) process
the ACF decays gradually, while the PACF drops to near zero after lag p.
Autocorrelation can also be used to assess whether a time series is
stationary. In a stationary time series, autocorrelation should fall toward
zero fairly quickly as the lag increases. Autocorrelation that stays high over
many lags might indicate non-stationarity.
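
The points above can be sketched in a few lines of NumPy. This is an illustrative example, not the article's code: the helper autocorr and the two synthetic series (white noise and an AR(1)-style series with coefficient 0.8) are assumptions made for demonstration.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag (standard ACF estimator)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.sum(d[lag:] * d[:-lag]) / np.sum(d * d)

rng = np.random.default_rng(1)
noise = rng.normal(size=2000)

# An AR(1)-style series: each value keeps 80% of the previous one plus noise
ar1 = np.zeros(2000)
for t in range(1, 2000):
    ar1[t] = 0.8 * ar1[t - 1] + noise[t]

# Columns: lag, autocorrelation of the AR(1) series, autocorrelation of pure noise
for lag in (1, 2, 3):
    print(lag, round(autocorr(ar1, lag), 3), round(autocorr(noise, lag), 3))
```

For an AR(1) process with coefficient 0.8 the theoretical autocorrelation at lag k is 0.8^k, so the first column of values should decay roughly geometrically, while the white-noise values should stay close to zero at every lag.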

Types of Autoregressive Models

AR(1) Model:
In the AR(1) model, the current value depends only on the previous value.
It is expressed as:

$$X_t = c + \phi_1 X_{t-1} + \varepsilon_t$$

AR(p) Model:
The general autoregressive model of order p includes p lagged values.
It is expressed as shown in the introduction.
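
As a short worked consequence of the AR(1) form, the derivation below assumes the standard textbook setup: a stationary process with $|\phi_1| < 1$ and zero-mean noise. It shows where the long-run mean and the lag-k autocorrelation come from; the numbers in the example are hypothetical.

```latex
\begin{aligned}
X_t &= c + \phi_1 X_{t-1} + \varepsilon_t, \qquad |\phi_1| < 1,\quad \mathbb{E}[\varepsilon_t] = 0 \\
\mu &= \mathbb{E}[X_t] = c + \phi_1 \mu \;\Longrightarrow\; \mu = \frac{c}{1 - \phi_1} \\
\rho(k) &= \phi_1^{\,k}, \qquad k = 0, 1, 2, \dots
\end{aligned}
```

For example, with c = 2 and $\phi_1 = 0.8$ the series fluctuates around $\mu = 2 / (1 - 0.8) = 10$, and its autocorrelation is 0.8 at lag 1 and $0.8^4 \approx 0.41$ at lag 4.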

Implementing AR Model for predicting Temperature

Step 1: Importing Data


In the first step, we import the required libraries and the temperature dataset.

Python

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Set a random seed for reproducibility
np.random.seed(0)

# Load your temperature dataset with columns "Date" and "Temperature"
data = pd.read_excel('Data.xlsx')

# Make sure your "Date" column is in datetime format
data['Date'] = pd.to_datetime(data['Date'])

# Sort the data by date (if not already sorted)
data = data.sort_values(by='Date')

# Use the "Date" column as the index
data.set_index('Date', inplace=True)

# Drop rows with missing values
data.dropna(inplace=True)

The data is visualized in this step.

Python

# Visualize the data
plt.figure(figsize=(12, 6))
plt.plot(data['Temperature '], label='Data')
plt.xlabel('Date')
plt.ylabel('Temperature')
plt.legend()
plt.title('Temperature Data')
plt.show()

Output:

Step 2: Data Preprocessing

Now that the data is loaded, we need to preprocess it. We’ll create
lag features, split the data into training and testing sets, and format it for
modeling.

In the first step, the lag features are added to the data frame.
Then the rows with null values are completely removed.
The data is then split into training and testing datasets.
The target variable is defined for each split.

Python3

# Adding lag features to the DataFrame
for i in range(1, 6):  # Creating lag features up to 5 days
    data[f'Lag_{i}'] = data['Temperature '].shift(i)

# Drop rows with NaN values resulting from creating lag features
data.dropna(inplace=True)

# Split the data into training and testing sets (chronological, no shuffling)
train_size = int(0.8 * len(data))
train_data = data[:train_size]
test_data = data[train_size:]

# Define the target variable; AutoReg builds its own lags from this series
y_train = train_data['Temperature ']
y_test = test_data['Temperature ']
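
To see what shift, dropna and the chronological split actually do, here is a small self-contained sketch on a toy DataFrame. The six temperature values, the column name without a trailing space, and the 75% split are hypothetical choices made only so the result is easy to inspect by eye.

```python
import pandas as pd

# Hypothetical toy data standing in for the temperature column
toy = pd.DataFrame(
    {"Temperature": [20.0, 21.5, 19.0, 22.0, 23.5, 21.0]},
    index=pd.date_range("2023-01-01", periods=6, freq="D"),
)

# shift(i) moves values down by i rows, so row t holds the value from t - i
for i in range(1, 3):
    toy[f"Lag_{i}"] = toy["Temperature"].shift(i)

print(toy)

# The first rows have no history, so they contain NaN and are dropped
toy = toy.dropna()

# Chronological split: the test rows must come after all training rows
train_size = int(0.75 * len(toy))
train, test = toy.iloc[:train_size], toy.iloc[train_size:]
print(len(train), len(test))
```

Two rows are lost because two lags were created, which is why the number of lag features directly reduces the amount of usable data. The split is done by position rather than at random so that the model is never evaluated on dates earlier than the ones it was trained on.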

ACF Plot
The Autocorrelation Function (ACF) plot is a graphical tool used to visualize
and assess the autocorrelation of a time series at different lags. The ACF
plot helps you understand how the current value of a time series is correlated
with its past values. You can create an ACF plot in Python using the plot_acf
function from the statsmodels library.

Python3

from statsmodels.graphics.tsaplots import plot_acf

series = data['Temperature ']
plot_acf(series)
plt.show()

Output:
ACF Plot

The graph shows the autocorrelation values for the first 20 lags, with lags
on the x-axis and autocorrelation values on the y-axis. It helps us identify
the significant lags, where the autocorrelation values fall outside the
confidence interval (indicated by the shaded region).

We can observe a significant correlation from lag=1 to lag=4. We check the
correlation of the lagged values using the approach mentioned below:

Python3

data['Temperature '].corr(data['Temperature '].shift(1))

Output:

0.7997281316018658

Lag=1 provides us with the highest correlation value, about 0.80. Similarly, we
have checked with lag = 2, 3, 4. For the shift set to 4, we get a correlation of
0.31.

Step 3: Model Building
We’ll build an autoregressive model using the AutoReg class from statsmodels.

We import the required libraries to create the autoregressive model.
Then we train the autoregressive model on the train data.

Python

from statsmodels.tsa.ar_model import AutoReg
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Create and train the autoregressive model
lag_order = 1  # Adjust this based on the ACF plot
ar_model = AutoReg(y_train, lags=lag_order)
ar_results = ar_model.fit()
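
To show what fitting an AR(1) model amounts to, here is a minimal sketch that estimates the constant and the lag coefficient by ordinary least squares with NumPy. It is not the AutoReg implementation; statsmodels describes AutoReg's estimator as conditional maximum likelihood, which for this model corresponds to a least squares regression of each value on its lags. The simulated series and the parameter values (c = 3, coefficient 0.7) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a hypothetical AR(1) series with known parameters
true_c, true_phi = 3.0, 0.7
n = 1000
x = np.empty(n)
x[0] = true_c / (1 - true_phi)  # start at the process mean
for t in range(1, n):
    x[t] = true_c + true_phi * x[t - 1] + rng.normal()

# Regress X_t on [1, X_{t-1}]: the design matrix holds an intercept and one lag
X = np.column_stack([np.ones(n - 1), x[:-1]])
y = x[1:]
(c_hat, phi_hat), *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"estimated c = {c_hat:.3f}, estimated phi = {phi_hat:.3f}")
```

Because the true parameters are known here, the estimates can be compared against them directly, which is not possible with real temperature data.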

Step 4: Model Evaluation


Evaluate the model’s performance using Mean Absolute Error (MAE) and Root
Mean Squared Error (RMSE).

We then make predictions using the AutoReg model and label it as y_pred.
MAE and RMSE metrics are calculated to evaluate the performance of
AutoReg model.

Python

# Make predictions on the test set
# end is inclusive, so it points at the last test observation
y_pred = ar_results.predict(start=len(train_data), end=len(train_data) + len(test_data) - 1, dynamic=False)
#print(y_pred)

# Calculate MAE and RMSE
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f'Mean Absolute Error: {mae:.2f}')
print(f'Root Mean Squared Error: {rmse:.2f}')

Output:
Mean Absolute Error: 1.59
Root Mean Squared Error: 2.30
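
For readers who want to see the arithmetic behind the two metrics, the following sketch computes MAE and RMSE by hand with NumPy. The four actual and predicted values are hypothetical and unrelated to the temperature dataset.

```python
import numpy as np

# Hypothetical actual and predicted temperatures
y_true = np.array([20.0, 22.0, 19.0, 24.0])
y_hat = np.array([21.0, 21.0, 21.0, 21.0])

errors = y_true - y_hat                 # [-1, 1, -2, 3]
mae = np.mean(np.abs(errors))           # (1 + 1 + 2 + 3) / 4 = 1.75
rmse = np.sqrt(np.mean(errors ** 2))    # sqrt((1 + 1 + 4 + 9) / 4) = sqrt(3.75)

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
```

RMSE is never smaller than MAE, and the gap widens when a few errors are much larger than the rest, because squaring gives those errors more weight.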

In the code, ar_results is the fitted AutoReg model for our time series data. To
make predictions on the test set, we use the predict method of the fitted
model. Here’s how it works:

start specifies the starting point for prediction. In this case, we start the
prediction right after the last data point in our training data, which is
equivalent to the first data point in our test set.
end specifies the ending point for prediction. We set it to the last data point
in our test set.
dynamic=False means that predictions inside the training sample use the
observed values of the previous observations. Every point in the test range
lies beyond the training data, so the model has no observed values there and
each forecast is built from the forecasts before it.
The predictions are stored in y_pred, which contains the forecasted values
for the test set.
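
Forecasting beyond the training data can be sketched by hand as a recursion in which each forecast is fed back in as the next input. The function recursive_forecast and the AR(1) parameters below (c = 4, coefficient 0.8, last observed value 25) are hypothetical; this illustrates the idea rather than reproducing the statsmodels code.

```python
import numpy as np

def recursive_forecast(c, phi, history, steps):
    """Forecast `steps` values ahead of an AR(p) model, feeding forecasts back in."""
    p = len(phi)
    window = list(history[-p:])  # the p most recent values, oldest first
    forecasts = []
    for _ in range(steps):
        # The newest value pairs with phi[0], so reverse the window
        nxt = c + float(np.dot(phi, window[::-1]))
        forecasts.append(nxt)
        window = window[1:] + [nxt]  # slide the window: the forecast replaces data
    return np.array(forecasts)

# Hypothetical AR(1) parameters: the long-run mean is 4 / (1 - 0.8) = 20
preds = recursive_forecast(c=4.0, phi=[0.8], history=[25.0], steps=5)
print(np.round(preds, 3))
```

Each step moves the forecast a fixed fraction of the way toward the long-run mean of 20, which is why multi-step forecasts from a low-order AR model tend to flatten out quickly.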

Step 5: Visualization
Visualize the model’s predictions against the actual temperature data using the
Matplotlib library.

Actual Predictions Plot:

Python

# Visualize the results
# "Date" was set as the index in Step 1, so the dates come from test_data.index
plt.figure(figsize=(12, 6))
plt.plot(test_data.index, y_test, label='Actual Temperature')
plt.plot(test_data.index, y_pred, label='Predicted Temperature', linestyle='--')
plt.xlabel('Date')
plt.ylabel('Temperature')
plt.legend()
plt.title('Temperature Prediction with Autoregressive Model')
plt.show()

Output:

Forecast Plot:

Python

# Define the number of future time steps you want to predict (1 week)
forecast_steps = 7

# Extend the predictions past the end of the test set by forecast_steps
future_predictions = ar_results.predict(
    start=len(train_data),
    end=len(train_data) + len(test_data) + forecast_steps - 1,
    dynamic=False,
)

# Create date indices for the future predictions
# (assumes daily data; the first date is the day after the last test date)
future_dates = pd.date_range(start=test_data.index[-1], periods=forecast_steps + 1, freq='D')[1:]

# Plot the actual data, existing predictions, and one week of future predictions
plt.figure(figsize=(12, 6))
plt.plot(test_data.index, y_test, label='Actual Temperature')
plt.plot(test_data.index, y_pred, label='Predicted Temperature', linestyle='--')
plt.plot(future_dates, future_predictions[-forecast_steps:], label='Future Predictions')
plt.xlabel('Date')
plt.ylabel('Temperature')
plt.legend()
plt.title('Temperature Prediction with Autoregressive Model')
plt.show()

Output:
Benefits and Drawbacks of Autoregressive Models

Autoregressive models (AR models) are a class of time series models that have
their own set of benefits and drawbacks. Understanding these can help in
choosing when to use them and when to consider alternative modeling
approaches.

Benefits of Autoregressive Models:

Simplicity: AR models are relatively simple to understand and implement.
They rely on past values of the time series to predict future values, making
them conceptually straightforward.
Interpretability: The coefficients in an AR model have clear interpretations.
They represent the strength and direction of the relationship between past
and future values, making it easier to derive insights from the model.
Useful for Stationary Data: AR models work well with stationary time
series data. Stationary data have stable statistical properties over time,
which is an assumption that AR models are built upon.
Efficiency: AR models can be computationally efficient, especially for short
time series or when you have a reasonable amount of data.
Modeling Temporal Patterns: AR models are good at capturing short-term
temporal dependencies and patterns in the data, which makes them
valuable for short-term forecasting.

Drawbacks of Autoregressive Models:


Stationarity Assumption: AR models assume that the time series is
stationary, meaning that its statistical properties do not change over time. In
practice, many real-world time series are non-stationary, requiring
preprocessing steps like differencing.
Limited to Short-Term Dependencies: AR models are not well-suited for
capturing long-term dependencies in data. They are primarily designed for
modeling short-term temporal patterns.
Lag Selection: Choosing the appropriate lag order (p) in an AR model can
be challenging. Selecting too few lags may lead to underfitting, while
selecting too many may lead to overfitting. Techniques like ACF and PACF
plots are used to determine the lag order.
Sensitivity to Noise: AR models can be sensitive to random noise in the
data. This sensitivity can lead to overfitting, especially when dealing with
noisy or irregular time series.
Limited Forecast Horizon: AR models are generally not suitable for long-
term forecasting as they are designed for capturing short-term
dependencies. For long-term forecasting, other models like ARIMA,
SARIMA, or machine learning models may be more appropriate.
Data Quality Dependence: The effectiveness of AR models is highly
dependent on data quality. Outliers, missing values, or data irregularities
can significantly affect the model’s performance.
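
The lag selection problem described above can also be approached with an information criterion instead of reading plots. The following is a rough sketch, not the article's method: it fits AR(1) through AR(6) by least squares on a simulated AR(2) series and compares them with a simple AIC formula. The helper fit_ar_ols, the simulated coefficients (0.5 and 0.3), and the maximum order of 6 are all assumptions for illustration.

```python
import numpy as np

def fit_ar_ols(x, p, max_p):
    """Fit AR(p) by least squares on the last len(x) - max_p points; return RSS."""
    n = len(x)
    y = x[max_p:]
    # Columns: intercept, then X_{t-1}, ..., X_{t-p}
    cols = [np.ones(n - max_p)] + [x[max_p - k:n - k] for k in range(1, p + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(7)
n = 2000
x = np.zeros(n)
for t in range(2, n):
    # Hypothetical AR(2) process used only to exercise the selection logic
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + rng.normal()

max_p = 6
m = n - max_p  # every candidate order is scored on the same m observations
aic = {}
for p in range(1, max_p + 1):
    rss = fit_ar_ols(x, p, max_p)
    aic[p] = m * np.log(rss / m) + 2 * (p + 1)  # p lags plus the intercept

best_p = min(aic, key=aic.get)
print({p: round(v, 1) for p, v in aic.items()})
print("order with lowest AIC:", best_p)
```

The AIC should drop sharply from order 1 to order 2 and then change very little, since extra lags add a penalty without reducing the error much. AIC can occasionally favor an order above the true one, so the result is best read alongside the ACF and PACF plots rather than as a replacement for them.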

Conclusion

Autoregressive (AR) models provide a powerful framework for analyzing and
forecasting time series data. We explored the fundamental concepts of AR
models, from understanding autocorrelation to fitting models and making
future predictions. Using a temperature dataset, we were able to apply AR
modeling in practice. AR models are particularly useful when dealing
with stationary time series data, where past values influence future
observations. The choice of lag order is a crucial step, and it can be determined
by examining the Autocorrelation Function (ACF) plot.

As we demonstrated, AR models offer a practical approach to forecasting.
However, they have their limitations and are most effective when the
underlying data exhibits some degree of autocorrelation. For more complex
time series data, other models like ARIMA or SARIMA may be more
appropriate.

The ability to make accurate forecasts is a valuable asset in various domains,
from finance to economics and beyond. By mastering Autoregressive models
and understanding their applications, analysts and data scientists can make
informed decisions based on historical data, helping to anticipate future trends
and make better choices.
