Practical 9 - Time-Series Forecasting

The document outlines a practical exercise in time-series forecasting using the AirPassengers dataset, which contains monthly totals of international airline passengers from 1949 to 1960. It details various R functions for analyzing the dataset, including visualizations, trend analysis, and preprocessing steps to achieve stationarity before fitting an ARIMA model for predictions over the next ten years. The final predictions are converted back from logarithmic form to the original scale for interpretation.

Uploaded by

yashhmehtaa1807
Copyright
© All Rights Reserved

Practical 9 - Time-series forecasting

# The dataset consists of monthly totals of international airline passengers, 1949 to 1960.
# The main aim is to predict the next ten years.
# Each observation is the total number of passengers (in thousands) for one month of one year.
AirPassengers
View(AirPassengers)
# This tells us that the data series is in time series format
class(AirPassengers) # Returns "ts": AirPassengers is a time series object in R.
str(AirPassengers)
start(AirPassengers)
end(AirPassengers)
# The start() and end() functions give the start and end times of a time series object.
# The AirPassengers dataset contains monthly airline passenger numbers from January 1949 to December 1960.
frequency(AirPassengers)
# frequency(AirPassengers) returns the number of observations per unit of time
# in a time series object. Here the unit is one year, so the frequency is 12 (one observation per month).
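The time-index functions above can be cross-checked with a few lines of base R. This is a minimal sketch that assumes only the built-in AirPassengers dataset; the names n_obs, n_years and first_year are illustrative and do not come from the practical.

```r
# Count the observations and derive the number of years from the frequency
n_obs   <- length(AirPassengers)              # total monthly observations
n_years <- n_obs / frequency(AirPassengers)   # observations / 12 per year

# window() extracts a sub-series by (year, month) start and end points
first_year <- window(AirPassengers, start = c(1949, 1), end = c(1949, 12))

print(c(n_obs = n_obs, n_years = n_years))
print(first_year)
```

window() keeps the "ts" class and the monthly frequency, so the extracted year can be passed to the same plotting and modelling functions as the full series.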
summary(AirPassengers)
#Exploring the data
# Visualization of data
plot(AirPassengers)
• This command plots the AirPassengers dataset as a time series graph.
• The x-axis represents time (years from 1949 to 1960).
• The y-axis represents the number of airline passengers.
• The plot shows an increasing trend and seasonal fluctuations (higher passenger
numbers in certain months).
• Upward trend: The number of passengers increases over time.
• Seasonality: Recurring peaks indicate seasonal patterns in air travel.
• Variability increases: The fluctuations become larger as the number of passengers
increases.
# The abline() function can be used to add vertical, horizontal or regression lines to a plot.
abline(reg=lm(AirPassengers~time(AirPassengers)))
• Adds a regression (trend) line to the existing plot(AirPassengers).
• Uses lm(AirPassengers ~ time(AirPassengers)) to fit a linear model where:
  • AirPassengers is the dependent variable.
  • time(AirPassengers) provides the time index as the independent variable.
• abline() then draws the fitted regression line on the plot.
• This trend line helps visualize the overall growth in airline passengers over
time.
• An upward-sloping line confirms an increasing trend in air travel.
• Since AirPassengers exhibits seasonality and roughly exponential growth, a simple
linear trend may not fit the data well. For a better fit, consider a log
transformation or exponential smoothing models.
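The point about the linear trend can be illustrated by fitting the same straight line to the raw series and to its logarithm. This is a sketch under the assumption that R-squared is an acceptable rough measure of fit here; the names t_index, fit_lin, fit_log and growth are illustrative.

```r
# Time index in fractional years (1949.000, 1949.083, ...)
t_index <- time(AirPassengers)

# Straight-line trend on the original scale and on the log scale
fit_lin <- lm(AirPassengers ~ t_index)
fit_log <- lm(log(AirPassengers) ~ t_index)

r2_lin <- summary(fit_lin)$r.squared
r2_log <- summary(fit_log)$r.squared

# On the log scale the slope is a growth rate per year:
# exp(slope) - 1 is the implied average yearly percentage growth
growth <- exp(coef(fit_log)[["t_index"]]) - 1

print(c(r2_linear = r2_lin, r2_log_linear = r2_log, yearly_growth = growth))
```

Neither fit accounts for seasonality, so both leave a strong monthly pattern in the residuals; the comparison only concerns the shape of the long-term trend.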

#cycle gives the positions in the cycle of each observation


cycle(AirPassengers)
• The cycle() function shows the position of each observation within a yearly cycle.
• Since AirPassengers is a monthly time series (frequency = 12), it returns values
from 1 to 12, representing January to December.
• In the printed output, each row represents a year (1949–1960).
• Each column represents a month (1 = Jan, 12 = Dec).
• This confirms the dataset follows a monthly seasonal cycle.
#Check general trend.
plot(aggregate(AirPassengers,FUN=mean))
• Aggregates the AirPassengers dataset by year and computes the mean number of
passengers for each year.
• Plots the yearly mean values as a time series.
• The x-axis represents years (1949–1960).
• The y-axis represents the average number of passengers per year.
• The plot shows a clear increasing trend, meaning the number of airline passengers
increased over time.
#Let’s use the boxplot function to see any seasonal effects.
boxplot(AirPassengers~cycle(AirPassengers))
• Creates a boxplot to visualize the seasonal variation in air passengers across
months.
• Groups data by cycle(AirPassengers), which represents the month (1 = Jan, 12 =
Dec).
• Each box represents the distribution of passenger counts for that month across
years (1949–1960).
• Higher median values in mid-year months (June–August) → Peak travel season.
• Lower median values in early (Jan–Feb) and late (Nov–Dec) months → Off-season
travel.
• Variability (box height and whiskers) shows how passenger numbers fluctuate
within a month over the years.
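The medians that the boxplot draws can also be computed directly, which makes the peak and off-season months easy to read off. This is a minimal sketch using base R only; monthly_median and peak_month are illustrative names.

```r
# Month position (1..12) of every observation
month_id <- cycle(AirPassengers)

# Median passenger count per calendar month across all years
monthly_median <- tapply(as.numeric(AirPassengers),
                         as.integer(month_id),
                         median)
names(monthly_median) <- month.abb   # label 1..12 as Jan..Dec

# Month with the highest median
peak_month <- names(which.max(monthly_median))

print(monthly_median)
print(peak_month)
```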

#Preprocessing the data


acf(AirPassengers)
# Many spikes cross the blue dotted lines and the correlations decay slowly, so the data is not stationary
• Plots the Autocorrelation Function (ACF) for the AirPassengers dataset.
• Measures how current values of the time series are correlated with past values
(lags).
• Helps identify seasonality and trends.
• A lag of 12 months shows a strong correlation → Indicates yearly seasonality (passenger
numbers repeat patterns every 12 months). For a monthly ts object, acf() labels the
lag axis in years, so 12 months appears at lag 1.0.
• Gradual decline in correlations → Suggests a long-term trend (passengers
increasing over time).
• Significant spikes at multiples of 12 months (24, 36, etc.) → Confirms seasonal patterns.
These longer lags are only drawn when lag.max is increased, e.g. acf(AirPassengers, lag.max = 48).
• The data is non-stationary, indicating the need for differencing or trend adjustment
before forecasting.
# Non-stationarity is a condition where the mean, variance, or autocorrelation of a time series
# changes over time.
acf(log(AirPassengers)) # Log transform to stabilize the variance
• This command takes the natural logarithm of the AirPassengers data to compress the
exponential growth and stabilize the variance.
• It then computes and plots the Autocorrelation Function (ACF) of the log-transformed
dataset.
• Stabilizes variance: The seasonal swings in AirPassengers grow with the level of the
series. On the log scale they are roughly the same size every year, making the series
easier to model.
• Linearizes the trend: The log transformation turns the roughly exponential growth into a
roughly linear trend. It does not remove the trend, so the log series is still
non-stationary.
• The ACF plot will show similar seasonal patterns (correlation peaks at lags of 12, 24, 36
months, etc.) and will still decay slowly, because the trend is still present. Differencing
is needed to remove it.
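The variance-stabilizing effect of the log transformation can be checked numerically by comparing the within-year spread before and after the transform. This is a sketch, assuming the yearly standard deviation is a reasonable summary of spread; sd_raw, sd_log and the ratio names are illustrative.

```r
# Standard deviation of the 12 monthly values within each year
sd_raw <- aggregate(AirPassengers, FUN = sd)
sd_log <- aggregate(log(AirPassengers), FUN = sd)

# How much the spread changes between the calmest and the most variable year
ratio_raw <- max(sd_raw) / min(sd_raw)
ratio_log <- max(sd_log) / min(sd_log)

print(round(cbind(sd_raw, sd_log), 3))
print(c(ratio_raw = ratio_raw, ratio_log = ratio_log))
```

A ratio closer to 1 means the spread is more even across years, which is what a variance-stabilizing transform is meant to achieve.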
acf(diff(log(AirPassengers))) # Used to choose q = 1 in the order c(p, d, q)
• diff(): Computes the first difference of the log-transformed series, which removes
the trend and makes the data more stationary by focusing on changes between
consecutive values.
• Trend Removal: The log transformation linearizes the growth, and first differencing
removes that trend. Seasonality remains after a first difference; the ARIMA model
fitted later handles it with a seasonal difference (period = 12).
• Stationarity: Differencing helps in making the time series stationary, which is a
requirement for many time series models like ARIMA.
• p, d, q:
When analysing time series data with ARIMA models, "p" is the order of the
autoregressive component, "d" is the differencing order (how many times you
difference the data), and "q" is the order of the moving average component.
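A first difference deals with the trend, while a seasonal pattern calls for a difference at the seasonal lag. The sketch below compares the lag-12 autocorrelation before and after adding a seasonal difference. It assumes base R only; the object names are illustrative, and the choice of lag.max = 24 is arbitrary.

```r
log_ap <- log(AirPassengers)

d1     <- diff(log_ap)         # first difference: removes the trend
d1_d12 <- diff(d1, lag = 12)   # seasonal difference: this month vs. same month last year

acf_d1     <- acf(d1,     lag.max = 24, plot = FALSE)
acf_d1_d12 <- acf(d1_d12, lag.max = 24, plot = FALSE)

# Element 1 of $acf is lag 0, so a lag of 12 months is element 13
lag12_d1     <- acf_d1$acf[13]
lag12_d1_d12 <- acf_d1_d12$acf[13]

print(c(first_diff_only = lag12_d1, with_seasonal_diff = lag12_d1_d12))
```

This corresponds to the two differencing orders in a seasonal ARIMA specification: d = 1 in the non-seasonal part and a seasonal difference of order 1 with period 12.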

pacf(diff(log(AirPassengers))) # Used to choose p = 0
# Here the first lag is clearly outside the significance limits and the second is also
# outside them, but only slightly, so this practical takes the order p as 0.
The command pacf(diff(log(AirPassengers))) in R is used to compute and
plot the Partial Autocorrelation Function (PACF) of the differenced
logarithm of the AirPassengers dataset. Here’s a breakdown of the steps:
1. log(AirPassengers)
   • Takes the natural logarithm of the monthly international airline
     passenger numbers (1949–1960) to stabilize variance (reduce
     exponential growth effects).
2. diff(log(AirPassengers))
   • Computes the first difference to make the time series more stationary by
     removing the trend.
3. pacf(diff(log(AirPassengers)))
   • Plots the PACF to analyze the lagged dependencies after controlling
     for intermediate lags.

plot(diff(log(AirPassengers))) # Roughly stationary: approximately constant mean and variance


# AutoRegressive Integrated Moving Average (ARIMA) model fitting
(fit <- arima(log(AirPassengers), c(0, 1, 1),
              seasonal = list(order = c(0, 1, 1), period = 12)))
pred <- predict(fit, n.ahead = 10*12) # Forecasts on the log scale
# The predicted values above are on the logarithmic scale; to convert them to the original
# scale we need to transform them back.
pred1 <- round(exp(pred$pred), 0) # exp() inverts log(); round to whole numbers
• The command pred1 <- round(exp(pred$pred), 0) converts the log-transformed
predictions back to the original scale. exp() is the exact inverse of log().
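predict() on an arima fit returns standard errors alongside the point forecasts, so the back-transformation can also give an approximate interval. The sketch below refits the same model under new, illustrative names (fit_demo, pred_demo) and assumes approximately normal forecast errors on the log scale, which is why 1.96 is used for a rough 95% band.

```r
# Same model as in the practical, refit under a separate name
fit_demo <- arima(log(AirPassengers), order = c(0, 1, 1),
                  seasonal = list(order = c(0, 1, 1), period = 12))

# 12 months ahead: $pred holds log-scale forecasts, $se their standard errors
pred_demo <- predict(fit_demo, n.ahead = 12)

# Back-transform the point forecast and an approximate 95% band with exp()
point <- exp(pred_demo$pred)
lower <- exp(pred_demo$pred - 1.96 * pred_demo$se)
upper <- exp(pred_demo$pred + 1.96 * pred_demo$se)

forecast_1961 <- round(cbind(lower, point, upper))
print(forecast_1961)
```

Because the interval is built on the log scale and then exponentiated, it is asymmetric around the point forecast on the original scale.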

pred1 # Predictions for the next 10 years (1961-1970)


ts.plot(AirPassengers,pred1, log = "y", lty = c(1,3))
# In the above graph, the solid line shows the original values and the dotted line the predicted values
# Get only the 1961 values
data1<-head(pred1,12)
data1
# We now take the data only up to 1959, predict the values for 1960, and validate them
# against the actual 1960 values already in the dataset
datawide <- ts(AirPassengers, frequency = 12, start=c(1949,1), end=c(1959,12))
datawide
# Create the model
fit1 <- arima(log(datawide),c(0,1,1),seasonal = list(order=c(0,1,1),period=12))
pred <- predict(fit1,n.ahead=10*12) # Predict 120 months: 1960 to 1969
pred1 <- exp(pred$pred) # Back to the original scale
pred1 # Output for 1960 to 1969
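The validation step described above can be finished by comparing the 1960 forecasts with the observed 1960 values. This is a sketch under stated assumptions: window() is used to split the series, and mean absolute percentage error (MAPE) is one possible accuracy measure, not one named in the practical. All object names are illustrative.

```r
# Training data: 1949-1959; hold-out data: the 12 months of 1960
train  <- window(AirPassengers, end = c(1959, 12))
actual <- window(AirPassengers, start = c(1960, 1))

# Same model specification as the practical, fitted on the training data only
fit_holdout <- arima(log(train), order = c(0, 1, 1),
                     seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast the 12 months of 1960 and return to the original scale
pred_1960 <- exp(predict(fit_holdout, n.ahead = 12)$pred)

# Mean absolute percentage error over the hold-out year
mape <- mean(abs(as.numeric(actual) - as.numeric(pred_1960)) /
             as.numeric(actual)) * 100

print(round(cbind(actual, predicted = pred_1960)))
print(mape)
```

A small MAPE on the held-out year suggests the model generalizes reasonably, although a single hold-out year is limited evidence.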
