Arima Notes

Uploaded by gaurav kumar
Time Series Forecasting

Time series forecasting is the process of using a statistical model to predict future values of a time series based on past results.

Some use cases:
- To predict the number of incoming or churning customers.
- To explain seasonal patterns in sales.
- To detect unusual events and estimate the magnitude of their effect.
- To estimate the effect of a newly launched product on the number of units sold.

ARIMA

ARIMA stands for Auto-Regressive Integrated Moving Average. Three integers (p, d, q) are used to parametrize ARIMA models, so a non-seasonal ARIMA model is denoted ARIMA(p, d, q):

- p is the number of autoregressive terms (AR part). It allows us to incorporate the effect of past values into the model. Intuitively, this is similar to stating that it is likely to be warm tomorrow if it has been warm for the past 3 days.
- d is the number of non-seasonal differences needed for stationarity. Intuitively, this is similar to stating that the temperature tomorrow is likely to be the same if the difference in temperature over the last three days has been very small.
- q is the number of lagged forecast errors in the prediction equation (MA part). This allows us to set the error of the model as a linear combination of the error values observed at previous time points.

When dealing with seasonal effects, as in our example, seasonal ARIMA is used, denoted ARIMA(p, d, q)(P, D, Q)s. Here, (p, d, q) are the non-seasonal parameters described above, (P, D, Q) follow the same definitions but are applied to the seasonal component of the time series, and s is the periodicity of the time series.

Mathematically, an autoregressive model of order p, denoted AR(p), can be expressed as

X_t = c + b_1 * X_{t-1} + b_2 * X_{t-2} + ... + b_p * X_{t-p} + e_t

where:
- X_t is the value at time t,
- c is a constant,
- b_1, ..., b_p are the model parameters,
- X_{t-1}, ..., X_{t-p} are the lagged values,
- e_t represents white noise (random error) at time t.
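The AR(p) equation above can be made concrete by simulating it directly. A minimal NumPy sketch for p = 2 (the constant and coefficients below are illustrative assumptions, not values from these notes):

```python
import numpy as np

# Simulate an AR(2) process: X_t = c + b1*X_{t-1} + b2*X_{t-2} + e_t
# c, b1, b2 are illustrative assumptions chosen so the process is stationary.
c, b1, b2 = 1.0, 0.5, 0.2
n = 500
rng = np.random.default_rng(0)
e = rng.normal(0.0, 1.0, n)  # white-noise term e_t

x = np.zeros(n)
for t in range(2, n):
    x[t] = c + b1 * x[t - 1] + b2 * x[t - 2] + e[t]

# For a stationary AR(2), the long-run mean is c / (1 - b1 - b2)
print(round(c / (1 - b1 - b2), 2))  # → 3.33
```

The sample mean of a long simulated path hovers around this long-run mean, which is one quick sanity check that the chosen coefficients give a stationary process.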
Autocorrelation involves calculating the correlation between a time series and a lagged version of itself. The "lag" represents the number of time units by which the series is shifted. For example, a lag of 1 corresponds to comparing the series with its previous time step, while a lag of 2 compares it with the time step before that, and so on.

Lag values help you calculate autocorrelation, which measures how each observation in a time series is related to previous observations. The autocorrelation at a particular lag provides insight into the temporal dependence of the data. If the autocorrelation is high at a certain lag, it indicates a strong relationship between the current value and the value at that lag. Conversely, if the autocorrelation is low or close to zero, it suggests a weak or no relationship.

To visualize autocorrelation, a common approach is to create an ACF plot. This plot displays the autocorrelation coefficients at different lags: the horizontal axis represents the lag, and the vertical axis represents the autocorrelation values. Significant peaks or patterns in the ACF plot can reveal the underlying temporal structure of the data.

Autocorrelation plays a pivotal role in autoregressive models. In an autoregressive model of order p, the current value of the time series is expressed as a linear combination of its past p values, with coefficients determined through methods like least squares or maximum likelihood estimation. The selection of the lag order (p) in the AR model often relies on analysis of the ACF plot.

Autocorrelation can also be used to assess whether a time series is stationary. In a stationary time series, autocorrelation should gradually decrease as the lag increases.
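The sample autocorrelation described above can be computed by hand: demean the series, then correlate it with a shifted copy of itself. A small sketch (the helper name `acf` and the trend example are illustrative, not from the notes):

```python
import numpy as np

def acf(x, max_lag):
    # Sample autocorrelation: correlation of the demeaned series
    # with a copy of itself shifted by k time steps.
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return [np.dot(x[:-k], x[k:]) / denom if k else 1.0
            for k in range(max_lag + 1)]

# A strongly trending series has high autocorrelation at small lags.
trend = np.arange(50, dtype=float)
print(round(acf(trend, 3)[1], 2))  # → 0.94
```

Plotting these coefficients against the lag gives exactly the ACF plot discussed above (statsmodels provides `plot_acf` for this).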
Deviations from this behavior might indicate non-stationarity.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.ar_model import AutoReg
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Load the CSV file, skipping the first row if it contains metadata
df = pd.read_csv(r'C:\Users\kanis\Downloads\international-airline-passengers.csv')

# Rename columns if necessary
df.columns = ['Month', 'Passenger']

# Convert the 'Month' column to datetime format
df['Month'] = pd.to_datetime(df['Month'], format='%Y-%m')

# Set the 'Month' column as the index
df.set_index('Month', inplace=True)

# Split data into train and test sets (80% train, 20% test)
split_point = int(len(df) * 0.8)
train_data, test_data = df.iloc[:split_point], df.iloc[split_point:]

# Fit the AR model
ar_model = AutoReg(train_data['Passenger'], lags=5)  # Adjust the lag order as needed
ar_results = ar_model.fit()

# Make predictions on the test set
y_pred = ar_results.predict(start=len(train_data), end=len(train_data) + len(test_data) - 1)
```

Note: fitting may emit a ValueWarning ("No frequency information was provided, so inferred frequency MS will be used") because the datetime index has no explicit frequency; it is harmless here.
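The snippet above depends on a local CSV file. As a self-contained illustration of what `AutoReg` does under the hood, here is a least-squares AR(p) fit and recursive forecast in plain NumPy. This is a sketch, not statsmodels' exact estimator, and the helper names `fit_ar` and `forecast` are hypothetical:

```python
import numpy as np

def fit_ar(x, p):
    # Fit AR(p) by ordinary least squares:
    # regress x_t on [1, x_{t-1}, ..., x_{t-p}].
    x = np.asarray(x, dtype=float)
    rows = [np.concatenate(([1.0], x[t - p:t][::-1])) for t in range(p, len(x))]
    X, y = np.array(rows), x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [c, b_1, ..., b_p]

def forecast(x, coef, steps):
    # Forecast recursively, feeding each prediction back in as a lagged value.
    p = len(coef) - 1
    hist = list(np.asarray(x, dtype=float)[-p:])
    out = []
    for _ in range(steps):
        nxt = coef[0] + np.dot(coef[1:], hist[::-1][:p])
        out.append(nxt)
        hist.append(nxt)
    return out

# Demo on a noiseless AR(1) series x_t = 0.5 + 0.8*x_{t-1}:
# least squares recovers the generating coefficients exactly.
x = [0.0]
for _ in range(19):
    x.append(0.5 + 0.8 * x[-1])
c_hat, b_hat = fit_ar(x, 1)
print(round(float(c_hat), 3), round(float(b_hat), 3))  # → 0.5 0.8
```

On real data with noise, the recovered coefficients are estimates rather than exact, and the lag order p would be chosen from the ACF plot as described earlier.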