
Expert Systems With Applications 175 (2021) 114794


Dynamic ticket pricing of airlines using variant batch size interpretable multi-variable long short-term memory
Ismail Koc, Emel Arslan *
Department of Computer Engineering, Istanbul University-Cerrahpasa, Avcilar, Istanbul, 34320, Turkey

A R T I C L E I N F O

Keywords: Air Transportation, Dynamic Ticket Pricing, Neural Networks, Deep learning, Long short term memory, Forecasting

A B S T R A C T

Research of airlines shows that seat inventory control, and therefore revenue management, is based not on a systematic analysis but more on human judgement. Machine learning models have been developed and applied to support decisions for pricing tickets dynamically. However, conventional models and approaches yield low statistical evaluation scores. In this study, the features used in other studies were explored, and the cost available seat kilometer (CASK) value and target revenue features, which are essential components of the ticket price decision, were included for the first time to the best of our knowledge. Real data from a low-cost carrier airline in Turkey were collected, and the observation data were split into two in order to study the highest-profit sale data. Then the outliers were filtered to let the models learn from and generate better price offerings businesswise. The observation datasets obtained in each step were recorded to be tested. 7 different model techniques were simulated and tested with 4 different datasets according to 6 different statistical evaluation criteria. A new approach to the Interpretable Multi-Variable Long Short-Term Memory (IMV-LSTM) model was proposed by taking every flight and its sales as an independent series, that is, by assigning a dynamic batch size. Extensive experiments on real datasets reveal enhanced statistical evaluation scores using the proposed approach and model. The proposed model can be used by airlines to mitigate human judgement in ticket pricing, to manage their price offerings to reach their target revenues and to increase their profits. The model can also be used by other business cases that have similar historical data with an overlapping windows structure.

1. Introduction

Airline managers need to match their supply of the seating capacity, which they can fully control, with the demand that they cannot control. The definition of revenue management can be selling the product to the right customer at the right time and at the right price in order to make maximum profit from a limited capacity product, that is, the seat inventory. Revenue management is a discipline that has attracted significant attention in recent years, especially with the increasing competition in the aviation industry, low-cost airlines entering the market, and the continuous advancement of decision support systems and computer science. Many solution models and approaches have been developed by researchers since Littlewood (1972) first introduced a method.

In recent years, the airline industry experienced a high growth rate. However, profitability remained marginal. Determining the cost differences is a critical issue for airlines. Ticket prices have been decreasing throughout the industry's known history and continue to decrease by an average of 2 percent in the last two decades. Saxon and Weber (2017) have stated that larger aircraft, new technologies and ever more efficient operations reduce the cost of an airline company. Cook (2014) has emphasized that airlines do best practice in pricing and revenue management because they have invested heavily in developing advanced systems to forecast demand, monitor and respond to market competitors' prices, and manage inventory availability, which serves them well in the pursuit of competitive advantage and higher yields. Brons, Pels, Nijkamp, & Rietveld (2002) reported that, due to various problems such as data availability on prices, the number of passengers and so on, estimating price elasticities in aviation could, however, be rather tricky. There are some commercial products available in the market to support the price setting of the airlines. Nevertheless, these products only implement various algorithms introduced in the literature, given the past data as input. Current application research of airlines shows that seat inventory control is based not on a systematic analysis but more on human judgement. However, under the same circumstances, the price assigned to a route by revenue managers may differ.

* Corresponding author.
E-mail addresses: [email protected] (I. Koc), [email protected] (E. Arslan).

https://doi.org/10.1016/j.eswa.2021.114794
Received 29 June 2020; Received in revised form 24 February 2021; Accepted 24 February 2021
Available online 3 March 2021
0957-4174/© 2021 Elsevier Ltd. All rights reserved.

The appointed price to the same flight may also vary according to different revenue managers. The price may change according to competitor price changes, which makes it more challenging to manage seat capacity and load factor. Panic price changes may cause early price reduction and reduce the total revenue of the flight. The lack of necessary features that contribute to ticket price assignment, together with conventional models and approaches, generates low statistical evaluation scores.

This paper deals with this problem by considering the costs and employs a decision support system using Machine Learning algorithms to produce useful information regarding the price offering intervals and the trends for data analysis and revenue managers, which can be used to maximize the profit. In this work, the data have been collected from various airlines operating in Turkey. The previous research has been reviewed, and a different approach is introduced, which yields better results.

The rest of this paper is organized as follows. Section 2 presents a literature review on airline ticket pricing and the usage of Machine Learning algorithms in these studies. Section 3 analyzes the features to be used as input to the model. Section 4 describes the materials and models used and the results. Section 5 has the concluding remarks and future suggestions.

2. Literature review

Airlines' forecasting of demand for adjusting prices is based on a fixed seat inventory. Varedi (2010) has underlined the difficulty in forecasting passenger demand since the knowledge about the customers' preferences is limited, and the business environment rapidly changes. Chiou and Liu (2016) have studied the advance purchase behaviors of air passengers and noted that this behavior might shift with the dynamic pricing strategies of the airlines.

According to Groves and Gini (2011), the studies that have been done in this area to develop mathematical methods have focused on large-scale optimization models and simplified illustrations of the problem. Therefore, there is a need for practical solution approaches involving quantitative decision tools. On the other hand, many studies have been carried out to maximize revenues.

Since the successful management of the effective yield may increase the revenues, many airlines have considered improving every aspect of the seat inventory control process. In what has become a competition to find better ways to manage the sale of their seat inventories, airlines are expanding and reorganizing the departments responsible, upgrading reservation systems, and developing sophisticated decision support systems. The yield management in the airline industry is in a transitional phase, evolving from art that relies almost exclusively on human expertise to science that employs more systematic analysis and decision techniques (Groves and Gini, 2011).

Wen & Chen (2017) underlined the high elasticity of the low-cost airlines' fares. For this reason, these operators sometimes provide ticket promotions, and therefore fares may vary every day of the week. Brons et al. (2002) have discussed a range of economic, demographic, and geographical determinants of passenger air transport demand and its associated price elasticity.

Machine Learning is one of the most important research topics in computer science and engineering, which can be applied to many different disciplines. It provides a set of algorithms, methods and tools that make it possible to embody intelligence in machines. The strength of Machine Learning is the modelling tools provided through a learning procedure that can be trained with a set of data describing a particular problem and respond consistently to unseen data. Chen, Cao, Feng, & Tan (2015) studied three models using observations relating to 110 days and found that Learn++.NSE outperforms KNN and Passive-Aggressive. Lantseva, Mukhina, Nikishova, Ivanov, & Knyazkov (2015) used an empirical data-driven regression model, with 75 and 90 days-to-departure data collected from two different Global Distribution Systems. The model predicts the price per kilometer for a given flight within 90 days before the departure date. Li & Li (2018) proposed a combination of ARMA with a random forest algorithm. Abdella, Zaki, & Shuaib (2019) surveyed airline ticket price and demand prediction studies and tabulated the summary of their survey. Chung, Ma, Hansen, & Choi (2020) made a study about the usage of data science and analytics in four different areas, including Machine Learning.

Tziridis, Kalampokas, Papakostas, & Diamantaras (2017) studied eight state-of-the-art Machine Learning algorithms and compared their results. Artificial Neural Networks (ANN) is an algorithmic theory of Machine Learning that mimics the human brain based on experience and knowledge. Commonly used in business, ANN is a powerful forecasting tool and can provide better predictions for solving complex problems than traditional models.

Recently, many powerful models have also been introduced. Panda & Panda (2020) introduced backpropagation learning rules to accelerate model training. Altan, Karasu, & Bekiros (2019) developed a hybrid forecasting model based on a Long Short-Term Memory (LSTM) neural network and Empirical Wavelet Transform (EWT) decomposition along with the Cuckoo Search (CS) algorithm. Karasu, Altan, Bekiros, & Ahmad (2020) studied a new forecasting model based on Support Vector Regression (SVR) with a wrapper-based feature selection approach using a multi-objective optimization technique. Altan and Karasu (2020) developed a hybrid model consisting of two-dimensional (2D) curvelet transformation, the Chaotic Salp Swarm Algorithm (CSSA) and a deep learning technique to determine whether a patient is infected with coronavirus pneumonia from X-ray images. Qin et al. (2017) introduced the Dual-Stage Attention-based Recurrent Neural Network (DA-RNN). RNNs are a type of ANN which consider the current state as well as the input. Using two different attention mechanisms at the encoder and decoder, the DA-RNN adaptively selects the most relevant input features and captures the long-term temporal dependencies of a time series. Qin et al. (2017) compared the DA-RNN findings with ARIMA, NARX RNN and Encoder-Decoder algorithms and showed that DA-RNN outperforms the others.

LSTMs were introduced as a solution for the vanishing gradient problem since they can learn long-term dependencies. Guo, Lin, & Antulov-Fantulin (2019) introduced the Interpretable Multi-Variable Long Short-Term Memory (IMV-LSTM) and reported that it outperforms structural time series models, ARIMAX, Random Forests, Extreme Gradient Boosting, Elastic-Net, RNN and DA-RNN algorithms.

A significant number of current research works have proposed prediction models for dynamic pricing for airlines. To train a predictor model, multiple determinants can be used as exogenous time series, input to the model to predict the target series. Papadakis (2012) counted several features but considered only vacant seats, competition on the considered route, number of days to departure, the price changes on the ticket, and the ticket price fluctuations. McHugh (2018) used the origin airport, destination airport, day of the week, week of the year, duration, and days remaining until departure. Varedi (2010) used data provided by a North American airline which includes the origin and the destination airport, flight number, flight departure date, flight departure time, aircraft capacity, snapshot date, the count of bookings per day, total cancellations, and total bookings for regular customers. Tziridis et al. (2017) considered the features departure time, arrival time, free baggage number (0, 1 or 2), number of days to departure, number of transfers, whether the flight date is a holiday or not, whether the flight is overnight or not, and day of the week. Brons et al. (2002) discussed various economic, demographic, and geographical determinants related to passenger air transport demand and price flexibility.

In this study, we have considered the mentioned models and determinants. Additionally, we have also included the cost available seat kilometer (CASK) value and target revenue features, which have not been considered in previous studies. The observation data were split into two to mitigate human judgement and let the model learn from the best sale performances to generate a better result businesswise.


Fig. 1. The cumulative income graph example for a low season.

Fig. 2. The cumulative income graph example for a high season.

Fig. 3. RASK-CASK value comparisons of routes from İstanbul.


Table 1
Features used for dynamic ticket pricing.
Feature | McHugh (2018) | Varedi (2010) | Tziridis et al. (2017) | Mottini & Acuna-Agost (2017) | Airline Webs | Papadakis (2012) | This paper
Origin airport
Destination airport
Day of week
Week of year
Duration
Days to departure
The sequence of previously sold seats
The sequence of previous prices
The sequence of price changes
Departure Time
Arrival Time
Holiday
Price
Available Seat Capacity
Competitor Price
CASK
Target Revenue

We have studied Autoregressive Integrated Moving Average (ARIMA), Random Forests, Extreme Gradient Boosting, Deep Neural Network (DNN), Cellular Neural Network (CNN), and IMV-LSTM algorithms. Various hyperparameters are used to model an LSTM. One of these parameters is the batch size, where accuracy is expected to be higher if this parameter is taken larger. Due to the nature of airline ticket sales, the historical data is a daily time series with overlapping windows. We propose a new approach to the model, Variant Batch Size IMV-LSTM (VBS IMV-LSTM), to address this issue by taking every flight and its sales as an independent batch, that is, by assigning a dynamic batch size to the model. Although the batch size number gets low, we got better results.

3. Materials and methods

3.1. Materials

Mainly, the airline costs are classified under DOC (Direct Operating Costs: Handling, Landing, Fuel, Enroute), ACMI (Aircraft, Crew, Maintenance, Insurance), and IOC (Indirect Operating Costs: Buildings, Equipment, Transport, Ground staff, and so on). From all these parameters, the airlines calculate their CASK (Cost per Available Seat Kilometer) value for a flight. Although the supply is the available seat capacity, the load factor itself is a function of price. Therefore, the revenue system should cover all these costs and, in addition, it should yield a contribution amount. The sum of the cost and the contribution amount determines the Target Revenue. This amount is calculated according to the expected yearly performance, and therefore, it can happen that this amount may be below zero during low seasons.
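A minimal sketch of how these two cost-side features could be derived is given below. The figures, function names and the additive contribution policy are illustrative assumptions for exposition, not the airline's actual accounting.

```python
# Hedged sketch: deriving CASK and Target Revenue for a single flight.
# All cost figures and the contribution policy are illustrative assumptions.

def cask(doc, acmi, ioc, seats, distance_km):
    """Cost per Available Seat Kilometer: total cost spread over seat-kilometers."""
    return (doc + acmi + ioc) / (seats * distance_km)

def target_revenue(doc, acmi, ioc, contribution):
    """Target Revenue = total cost plus a contribution amount; the contribution is
    set from the expected yearly performance and can be negative in low season."""
    return doc + acmi + ioc + contribution

# Example with made-up numbers for a 189-seat, 2,000 km flight
print(round(cask(6000.0, 8000.0, 2000.0, 189, 2000.0), 4))   # cost per seat-km
print(target_revenue(6000.0, 8000.0, 2000.0, -500.0))        # target can fall below cost
```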

Fig. 4. Sales graph low season.


Fig. 5. Sales graph high season.

Two examples are given in Figs. 1 and 2, where the cumulative income of a single flight in the low and high seasons, from 61 days prior to the departure date, is calculated. The size of the dots denotes the number of seats that have been sold. The coloured dots indicate the current amount of the cumulative revenue of that flight. The red dots mean that cumulative income is under IOC, the yellow dots represent that cumulative income is under DOC, the blue dots mean that cumulative income is under ACMI, the green dots indicate that cumulative income is under Target Revenue, and the purple dots mean that cumulative income is above Target Revenue.

In general, a high total revenue for a specific route does not imply that it is the most profitable route. In Fig. 3, the blue line represents the RASK (Revenue per Available Seat Kilometer), the red line represents the CASK, and the green line represents the difference between RASK and CASK. (These graphics had been drawn using the data collected from the flights from Istanbul to 17 other cities/airports.) It can be seen from Fig. 3 that Odessa flights have the highest revenue, but Bodrum has the highest profit. For this reason, the CASK data are also considered in this study for profit maximization rather than revenue maximization. In light of the studies mentioned above in the Literature Review section, it is appropriate to use the regressors listed below, adding some cost and revenue related features. We have studied only the ISTANBUL-DUSSELDORF (IST - DUS) route. There is only one flight daily, so there is no need to consider departure time, arrival time, and the duration of the flight.

• Origin Airport (İstanbul, Turkey)
• Destination Airport (Dusseldorf, Germany)
• Day of the week
• Week of the year
• Days to Departure (Sale Date – Flight Date)
• Is Holiday (Yes or No)
• Price (since there is only one flight in a day, ticket prices are taken into consideration)
• Available Seat Capacity
• Competitor 1, 2, 3 Price (three other airlines are competing on this route; they have more than one cycle for a given date; their average weekly minimum is taken as a feature; there are two airports in Istanbul, and the departures from both airports are considered)
• CASK Value: Cost per Available Seat Kilometer for the given route and date
• Target Revenue: the target revenue value

Table 2
A sample observation data for the flight date June 20, 2017.
Day of Week | Week of Year | Days to Flight | Class | Available Capacity | Price | CASK Value | Target Revenue | Is Holiday | Price THY | Price Atlas | Price Pegasus

3 25 42 82 217 33.96 0.04 83 0 12.99 58 15.48


3 25 40 82 213 19.41 0.04 83 0 13.02 58 22.17
3 25 35 82 210 35.05 0.04 83 0 13.02 0 22.17
3 25 26 79 209 33.14 0.04 83 0 13.12 58 22.34
3 25 25 79 208 32.00 0.04 83 0 13.12 58 22.34
3 25 22 79 206 23.74 0.04 83 0 13.12 58 22.34
3 25 21 79 200 32.43 0.04 83 0 13.12 58 22.34
3 25 19 79 197 32.23 0.04 83 0 12.98 58 21.20
3 25 18 79 195 32.90 0.04 83 0 12.98 58 21.20
3 25 17 79 191 38.25 0.04 83 0 12.98 58 21.20
3 25 14 79 184 46.15 0.04 83 0 0.00 0 0.00
3 25 12 79 183 37.38 0.04 83 0 14.00 58 23.13
3 25 11 79 181 32.83 0.04 83 0 14.00 58 23.13
3 25 10 88 174 41.91 0.04 83 0 14.00 58 23.13
3 25 10 79 168 21.99 0.04 83 0 14.00 58 23.13
3 25 9 82 165 18.89 0.04 83 0 14.00 58 23.13
3 25 8 88 163 41.56 0.04 83 0 14.00 58 23.13
3 25 8 82 160 22.20 0.04 83 0 14.00 58 23.13
3 25 7 82 157 24.81 0.04 83 0 14.00 58 23.13
3 25 6 82 147 20.09 0.04 83 0 14.05 58 23.27
3 25 6 79 135 40.92 0.04 83 0 14.05 58 23.27
3 25 5 82 134 20.71 0.04 83 0 14.05 58 23.27
3 25 4 79 129 37.56 0.04 83 0 14.05 58 23.27
3 25 3 79 124 31.89 0.04 83 0 14.05 58 23.27
3 25 2 82 116 27.44 0.04 83 0 14.05 58 23.27
3 25 1 82 109 25.10 0.04 83 0 14.05 58 23.27
3 25 1 79 108 34.14 0.04 83 0 14.05 58 23.27
3 25 1 88 106 46.37 0.04 83 0 14.05 58 23.27
3 25 0 88 87 38.15 0.04 83 0 14.05 58 23.27


Table 3
Augmented Dickey-Fuller (unit root) test.
                  Raw        Without Outliers
ADF Statistic:    −3.496144  −3.523462
p-value:          0.008087   0.007405
Critical Values:
  1%:             −3.431     −3.431
  5%:             −2.862     −2.862
  10%:            −2.567     −2.567

Fig. 6. The outliers' graph.

The features used in other studies and in our paper are summarized in Table 1, where, in particular, the last two features consider the Cost and Target Revenue, which had not been considered in the previous literature studies to the best of our knowledge.

Real data of a low-cost carrier airline based in Turkey is used for this study. The competitor weekly ticket prices have been obtained and used. The observation data consists of 24,264 rows. Figs. 4 and 5 are two examples that represent the sales starting from 61 days before the departure date. The size of the dots denotes the number of seats sold. Figs. 4 and 5 had been obtained using the same data which has been used to obtain Figs. 1 and 2. The trendline of degree 5 is also drawn and is expected to increase steadily as the flight date approaches, instead of the waveform shown. The waveforms occur due to rapid response to competitor price changes or due to panic price positioning, which may cause a considerable amount of revenue loss.

The study is based on the data collected between 15 June 2017 and 12 June 2019. The data of the airline belongs to the IST - DUS route and only one-way tickets are taken into consideration. For the same route, there are three competitors, named Turkish Airlines, Atlas Global, and Pegasus, and they all have more than one cycle per day. Since the competitor prices are the weekly average minimums, these minimum prices of each competitor are selected as the features.

3.2. Preprocessing

The collected data mentioned in the Materials section is called Raw for the rest of this paper. To generate a better model businesswise, we split the observation data into two to train our model with the highest-profit sale data. For example, for every month, we have taken Monday flights, sorted, and compared them. We have considered the best two weeks (or three weeks if the third-best week's profit figure is closer to the second rather than the fourth). We have done this for every day of the week and every month of the year of the observation data. Although fewer observation data remained to train and test the models, the intention is to produce a better decision support system with a better price offering. The split data consists of 12,575 observations.

Alekseev and Seixas (2009) state the existence of noise in time series, and it is this noise that makes it challenging to figure out the pattern of the series. Often the series analysis technique includes some form of filtering the noise to make the design more prominent (Hill, O'Connor, & Remus, 1996).

In order to be able to narrow the price prediction interval to be offered to revenue managers, the outlier price offerings are identified. There are different reasons for the price of a ticket to be an outlier. Global Distribution System (GDS) sales and group sales are examples of such sales. A sample of the data is given in Table 2 for the flight date June 20, 2017. Two new datasets are prepared for the study, called Dataset20 and Dataset10. Dataset20 contains the observations whose price difference to the trendline is less than 20 percent of its value. Dataset10 contains the observations whose price difference to the trendline is less than 10 percent of its value. For the given month, as in Fig. 6, the red dots are excluded from Raw to obtain Dataset20 and the green dots are excluded from Dataset20 to obtain Dataset10. The two new datasets have 10,684 and 7,294 observations, respectively.
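The filtering step can be illustrated with a short sketch. This is a hedged reconstruction, not the authors' code: the column names (flight_date, days_to_departure, price) and the per-flight grouping are assumptions about the dataset layout.

```python
# Hedged sketch of the outlier filtering: fit the degree-5 trendline over a flight's
# sale history and keep only prices within a given relative band around it.
import numpy as np
import pandas as pd

def filter_by_trendline(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    kept = []
    for _, flight in df.groupby("flight_date"):         # assumed flight identifier
        x = flight["days_to_departure"].to_numpy(float)
        y = flight["price"].to_numpy(float)
        trend = np.polyval(np.polyfit(x, y, 5), x)       # degree-5 trendline
        rel_diff = np.abs(y - trend) / np.abs(trend)     # relative distance to trend
        kept.append(flight[rel_diff < threshold])
    return pd.concat(kept, ignore_index=True)

# dataset20 = filter_by_trendline(raw, 0.20)        # keep prices within 20% of the trend
# dataset10 = filter_by_trendline(dataset20, 0.10)  # keep prices within 10% of the trend
```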
The high fluctuations in the data prove the difficulty of seat and price management up to the departure date. To reduce subjectivity, the trend is studied by smoothing the data. Smoothing is useful as a data preparation technique. It is used to reduce the random variation in the observations, to reduce the volatility and to reduce the noise, so that we can better reveal the structure of the underlying causal processes. The moving average is used as a data preparation technique to create a smoothed version of the original dataset.

Before applying the smoothing to our series, we applied the Augmented Dickey-Fuller (unit root) test to check if our data is stationary. Etienne (2019) defines stationarity as the property of exhibiting constant statistical properties such as mean, variance and autocorrelation. Unpredictable results can be obtained if there exist unit roots in time series analysis. Table 3 shows the results for both datasets. Since the p-value is less than the specified significance level (0.01), we can reject the null hypothesis.
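Both the stationarity check and the smoothing can be expressed in a few lines with the libraries listed in Table 4. The following is a hedged sketch under assumed variable and column names, not the authors' exact code; the paper describes the smoothed value at time t as the mean of the previous five observations, which corresponds to a trailing moving average of window 5.

```python
# Hedged sketch of the preprocessing checks; "raw" and the column names are
# assumptions about how the dataset is held in memory.
from statsmodels.tsa.stattools import adfuller

adf_stat, p_value, _, _, critical_values, _ = adfuller(raw["price"].dropna())
print(f"ADF Statistic: {adf_stat:.6f}")
print(f"p-value:       {p_value:.6f}")
for level, value in critical_values.items():
    print(f"Critical value {level}: {value:.3f}")
# A p-value below the chosen significance level (0.01 in the paper) rejects the
# unit-root null hypothesis, so the series can be treated as stationary.

# Trailing moving average with window 5, applied per flight (grouping is assumed).
smoothed = raw.copy()
smoothed["price"] = (
    smoothed.groupby("flight_date")["price"]
            .transform(lambda s: s.rolling(window=5, min_periods=1).mean())
)
```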
We have used Locally Weighted Smoothing (Loess) with a window size of 5, that is, the transformed value at a time (t) is calculated as the mean value of the previous five observations. The dataset with the smoothed price values is called Smoothed for the rest of this paper.

To illustrate the difference evidently, the last 400 predictions of the raw and smoothed data are shown in Figs. 7 and 8, respectively.

Previous studies have been reviewed, and statistical evaluation criteria were used to evaluate the developed models. These criteria consist of RMSE, a10-index, a20-index, RMSPE, and Accuracy, given by the following equations:

RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}   (1)

a10\text{-}index = \frac{m_{10}}{M}   (2)

a20\text{-}index = \frac{m_{20}}{M}   (3)

R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}   (4)

RMSPE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\hat{y}_i - y_i}{y_i}\right)^2} \times 100\%   (5)

MAPE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|   (6)

Accuracy = 100 - 100 \cdot MAPE   (7)

where \hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n are predicted values, y_1, y_2, \ldots, y_n are observed values, \bar{y} is the mean of the y values, and n is the number of observations. M is the number of dataset samples, m_{10} is the number of samples with an experimental value/predicted value ratio between 0.90 and 1.10, and m_{20} is the number of samples with an experimental value/predicted value ratio between 0.80 and 1.20.
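A compact way to compute these criteria is shown below as a hedged numpy sketch; it is one possible implementation of Eqs. (1)-(7), not the authors' own code.

```python
# Evaluation criteria of Eqs. (1)-(7) expressed with numpy; y holds observed prices
# and y_hat the model predictions (both numpy arrays of equal length).
import numpy as np

def rmse(y, y_hat):
    return np.sqrt(np.mean((y_hat - y) ** 2))                          # Eq. (1)

def a_index(y, y_hat, tol):
    """Share of samples whose observed/predicted ratio lies in [1-tol, 1+tol];
    tol=0.10 gives the a10-index (Eq. 2), tol=0.20 the a20-index (Eq. 3)."""
    ratio = y / y_hat
    return np.mean((ratio >= 1 - tol) & (ratio <= 1 + tol))

def r2(y, y_hat):
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)  # Eq. (4)

def rmspe(y, y_hat):
    return 100 * np.sqrt(np.mean(((y_hat - y) / y) ** 2))              # Eq. (5)

def mape(y, y_hat):
    return np.mean(np.abs((y - y_hat) / y))                            # Eq. (6)

def accuracy(y, y_hat):
    return 100 - 100 * mape(y, y_hat)                                  # Eq. (7)
```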


Fig. 7. Last 400 predictions with raw data.

Fig. 8. Last 400 predictions with smoothed data.

In this study, we have used Tableau, Spyder (Python) and the Colab environment. Colab is a Google research project created to help disseminate education and research related to machine learning. It is a Jupyter notebook environment using popular libraries. The libraries used during this research are listed in Table 4.

Table 4
Used environments.
a) Colab libraries used: python 3.6.9, tensorflow 2.3.0, torch 1.7.0+cu101, pandas 1.1.4, numpy 1.18.5, matplotlib 3.2.2, sklearn 0.22.2.post1, statsmodels 0.10.2, seaborn 0.11.0, tqdm 4.41.1, keras 2.4.3, livelossplot 0.5.3, xgboost 0.90
b) Spyder libraries used: python 3.7.3, IPython 7.6.1

3.3. Models

The models we have tested are:

ARIMA: The AR part of ARIMA indicates that the evolving variable of interest is regressed on its own lagged (i.e., prior) values. The MA part indicates that the regression error is a linear combination of error terms whose values occurred contemporaneously and at various times in the past. The I (for "integrated") indicates that the data values have been replaced with the difference between their values and the previous values (and this differencing process may have been performed more than once). The purpose of each of these features is to make the model fit the data as well as possible (Autoregressive integrated moving average, 2020).

Random Forests: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest (Breiman, 2001).

Extreme Gradient Boosting: Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data and then taking a weighted majority vote of the sequence of classifiers thus produced (Friedman, Hastie, & Tibshirani, 2000).


Fig. 9. Adjusting data batches.

DNN: A Deep Neural Network is a collection of neurons organized in a sequence of multiple layers, where neurons receive as input the neuron activations from the previous layer and perform a simple computation (e.g. a weighted sum of the input followed by a nonlinear activation) (Montavon, Samek, & Müller, 2018).

CNN: Cellular Neural Networks consist of a massive aggregate of regularly spaced circuit clones, called cells, which communicate with each other directly only through their nearest neighbors (Chua and Yang, 1988).
IMV-LSTM: The Long Short-Term Memory neural network, a particular type of RNN, which has a strong ability to handle long-term and short-term dependency problems with its success in processing nonlinear sequential data, can keep not only adjacent ad-hoc information but also control long-term details. The core of the LSTM neural network is the memory cell that replaces the hidden layers of conventional neurons (Li, Wu, & Liu, 2018). The LSTM model was introduced by Hochreiter & Schmidhuber (1997) and its unit is composed of a cell, an input gate, an output gate and a forget gate.

input gate's activation vector: i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)   (5)

output gate's activation vector: o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)   (6)

forget gate's activation vector: f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)   (7)

cell input activation vector: \tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)   (8)

cell state vector: c_t = f_t * c_{t-1} + i_t * \tilde{c}_t   (9)

hidden state vector: h_t = o_t * \tanh(c_t)   (10)
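As a concrete illustration of Eqs. (5)-(10), a single step of this standard LSTM cell can be written in a few lines of numpy. This is only an illustrative sketch of the cell the equations describe, not the authors' IMV-LSTM implementation; the weight layout is an assumption.

```python
# One step of the standard LSTM cell of Eqs. (5)-(10), written with numpy.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """W and b are dicts holding the weight matrices / bias vectors for the
    input (i), output (o), forget (f) and cell-input (c) transforms."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    i_t = sigmoid(W["i"] @ z + b["i"])       # Eq. (5): input gate
    o_t = sigmoid(W["o"] @ z + b["o"])       # Eq. (6): output gate
    f_t = sigmoid(W["f"] @ z + b["f"])       # Eq. (7): forget gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # Eq. (8): cell input
    c_t = f_t * c_prev + i_t * c_tilde       # Eq. (9): cell state
    h_t = o_t * np.tanh(c_t)                 # Eq. (10): hidden state
    return h_t, c_t
```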
The idea of IMV-LSTM is to make use of a hidden state matrix and to develop an associated update scheme, such that each element (e.g. row) of the hidden matrix encapsulates information exclusively from a particular variable of the input (Guo et al., 2019).

the hidden state update: \tilde{j}_t = \tanh(W_j \circledast \tilde{h}_{t-1} + U_j \circledast x_t + b_j)   (11)

Fig. 10. Every batch has its own size and look back shifts.

Fig. 11. Pseudocode of the code change.

Table 5
Hyperparameter settings of the model and used time steps.
a) Hyperparameter settings of the VBS IMV-LSTM model: Number of Layers: 1; Number of Neurons: 256; Prediction Horizon: 1; Learning Rate: 0.001; Gamma: 0.9; Epochs: 500; Patience: 30; Step Size: 20; Batch Size: dynamic (number of sale dates).
b) Look back time steps: Raw: 2; Dataset20: 2; Dataset10: 1; Smoothed: 2.
Batch Size Dynamic (Number of sales date)


Fig. 12. Regression coefficients for VBS IMV-LSTM.

Table 6
Output calculation formulae in VBS IMV-LSTM.
Phase      | Raw                 | Dataset20          | Dataset10          | Smoothed
Training   | 0.92*Target + 11.56 | 0.96*Target + 6.27 | 0.97*Target + 4.14 | 0.99*Target + 0.53
Validation | 0.97*Target + 12.33 | 1.06*Target − 3.96 | 1.07*Target − 5.57 | 1.02*Target − 0.26
Test       | 0.95*Target + 6.29  | 1.01*Target + 2.45 | 1.06*Target − 2.60 | 1.02*Target − 0.94

gate vectors: [i_t, f_t, o_t]^{\top} = \sigma(W[x_t \oplus vec(\tilde{h}_{t-1})] + b)   (12)

cell state vector: c_t = f_t \odot c_{t-1} + i_t \odot vec(\tilde{j}_t)   (13)

hidden state vector: \tilde{h}_t = matricization(o_t \odot \tanh(c_t))   (14)

where ⊕ is the concatenation operation and ⊙ is element-wise multiplication.

One of the hyperparameters used by the models is the batch size. Batch size denotes the subset size of the training sample (e.g. 256 out of 1000) which is going to be used to train the network during its learning process. Each batch trains the network in successive order, considering the updated weights coming from the application of the previous batch.

The batch size in the conventional algorithms has been kept fixed. Due to the nature of the flight date and the sale patterns, a variant batch-size approach is developed and applied. All the sales of a specific flight are considered as a single batch. Since the number of sale dates differs from flight to flight, the batch size itself will also vary. The look back adjustment is done for every batch independently of the others, since the values of the regressors and the target value reset for a new flight. This resetting causes the data to be daily time series with overlapping windows, with a size equal to the count of a flight's sale dates. The consideration of batches as an overlapping windows concept is shown in Fig. 9, where the look back period is taken as 3 as an example. The look back period of the data is colored in green and its size can be tuned after experiments.
cell state vector : ct = ft ⊙ ct− 1 + it ⊙ vec ̃jt (13) This resetting causes the data to be daily time series with overlapping
windows with a size of sale dates count of a flight. The consideration of


Table 7
Empirical analysis.
Data Model RMSE a10-index a20-index R2 RMSPE (%) Accuracy (%)
Raw ARIMA 26.21 0.41 0.69 0.71 109.19 28.15
Random Forests 23.01 0.42 0.70 0.74 33.27 78.83
Extreme Gradient Boosting 21.67 0.44 0.70 0.76 34.15 79.83
DNN 35.72 0.19 0.39 0.68 26.56 77.14
CNN 34.74 0.20 0.40 0.69 32.56 75.34
IMV-LSTM 25.75 0.40 0.66 0.67 37.53 77.59
VBS IMV-LSTM 21.97 0.43 0.71 0.82 25.50 82.76
Dataset20 ARIMA 23.29 0.49 0.81 0.77 89.74 39.44
Random Forests 23.12 0.46 0.73 0.75 27.31 81.78
Extreme Gradient Boosting 19.84 0.45 0.75 0.78 23.90 83.84
DNN 32.62 0.20 0.42 0.72 27.14 76.87
CNN 33.37 0.20 0.42 0.71 29.19 77.57
IMV-LSTM 22.15 0.52 0.79 0.77 26.16 85.22
VBS IMV-LSTM 18.80 0.54 0.81 0.87 16.04 88.05
Dataset10 ARIMA 23.79 0.65 0.88 0.74 89.29 41.25
Random Forests 24.49 0.51 0.78 0.74 25.39 83.87
Extreme Gradient Boosting 20.08 0.48 0.76 0.79 19.32 86.47
DNN 32.37 0.18 0.33 0.73 32.19 73.06
CNN 32.94 0.24 0.45 0.72 32.08 77.46
IMV-LSTM 22.67 0.62 0.87 0.75 27.73 87.31
VBS IMV-LSTM 15.41 0.68 0.90 0.91 12.93 90.74
Smoothed ARIMA 5.88 0.92 0.99 0.98 77.39 47.73
Random Forests 38.54 0.30 0.56 0.42 42.14 72.26
Extreme Gradient Boosting 28.85 0.32 0.60 0.53 31.44 78.33
DNN 41.02 0.15 0.31 0.52 32.18 72.95
CNN 25.62 0.24 0.45 0.81 20.95 81.70
IMV-LSTM 5.94 0.90 0.99 0.98 6.54 95.88
VBS IMV-LSTM 5.32 0.95 0.99 0.99 5.57 96.25

Fig. 13. Heat map result of VBS IMV-LSTM for the dataset Smoothed with time step 2.

Fig. 14. Price corridors obtained versus actuals.


We call this new approach Variant Batch Size IMV-LSTM (VBS IMV-LSTM). The adjusted batches are used in the VBS IMV-LSTM model as shown in Fig. 10.

The pseudocode of the changes made on the IMV-LSTM Python code is presented in Fig. 11.
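The change can also be pictured with a short data-preparation sketch. This is a hedged reconstruction of the idea, not the code of Fig. 11: the column names, the feature list and the training call are assumptions.

```python
# Hedged sketch of variant batch sizing: each flight's sale history becomes one
# batch, and look-back windows are built only inside that flight, so windows never
# cross into another flight's series.
import numpy as np
import pandas as pd

def flight_batches(df: pd.DataFrame, feature_cols, target_col="price", look_back=2):
    """Yield one (X, y) batch per flight; the batch size equals that flight's number
    of sale dates minus the look-back period, so it varies from flight to flight."""
    for _, flight in df.groupby("flight_date"):                  # assumed flight id
        flight = flight.sort_values("days_to_departure", ascending=False)
        feats = flight[feature_cols].to_numpy(dtype=float)
        target = flight[target_col].to_numpy(dtype=float)
        if len(flight) <= look_back:
            continue                                             # too short for a window
        X = np.stack([feats[i:i + look_back]                     # overlapping windows
                      for i in range(len(flight) - look_back)])
        y = target[look_back:]
        yield X, y

# for X_batch, y_batch in flight_batches(smoothed,
#         ["days_to_departure", "price", "cask", "target_revenue"]):
#     model.train_on_batch(X_batch, y_batch)   # hypothetical per-batch training call
```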
tested to predict the best profitable ticket price.
4. Results

Raw, Dataset20, Dataset10 and Smoothed are tested with the models mentioned in the Models section. RMSE, a10-index, a20-index, R2, RMSPE, and Accuracy values were recorded as statistical performance criteria. The datasets were partitioned as 80% training, 10% validation, and 10% for testing. If a model does not use validation, the partitioning was done as 90% training and 10% testing. The characteristics of the proposed approach (VBS IMV-LSTM) and the best values for the look back time step are presented in Table 5.
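A sketch of that partitioning is given below, assuming a simple index-based split over the prepared observations (the paper does not state whether the split is chronological or random).

```python
# Hedged sketch of the 80% / 10% / 10% partitioning described above;
# "observations" stands for one of the four prepared datasets.
n = len(observations)
train_end, val_end = int(0.80 * n), int(0.90 * n)

train_set = observations[:train_end]
val_set   = observations[train_end:val_end]   # skipped for models without validation
test_set  = observations[val_end:]            # final 10% held out for testing
```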
The proposed model's regression coefficient numerical values for the training, validation, and test phases for all datasets are given in Fig. 12, and the output calculation formulae are given in Table 6.

Empirical analysis reveals that the models yield different performances according to different datasets. The best results obtained are tabulated in Table 7 and the best scores are in bold. A detailed comparison of IMV-LSTM and VBS IMV-LSTM with different time steps is shown in Table A1.

The heat map result of the VBS IMV-LSTM for the dataset Smoothed is shown in Fig. 13. The contribution of the proposed features CASK and Target Revenue can be seen from the map.

Fig. 14 shows the results obtained with the applied thresholds for the last 30 observations. The dotted line is the actual price offerings, the red corridor is obtained with raw data, the light green corridor is obtained using filtered data, and finally, the green corridor is obtained with smoothed data. It can be seen that the corridor obtained with filtered data covers the actual price offerings best. The green corridor can be considered the trend of the market to be taken into account for future price settings.

5. Conclusion

In this paper, investigations have been carried out into the revenue management of an airline and how Machine Learning models can be applied for ticket pricing assignment as a decision support system. The real data of a low-cost carrier based in Istanbul, Turkey and its competitors' ticket prices are used. The ISTANBUL – DUSSELDORF route is studied, and only one-way ticket prices are considered. Different from the other studies in the literature, Cost per Available Seat Kilometer and Target Revenue features are included. The conventional and recently proposed Machine Learning models are explored and compared. The conventional approaches yield low statistical criteria scores. A new approach to the model is proposed that has outperformed the others. The observation data is divided into two for the models to be trained and tested to predict the most profitable ticket price.

Since the target value significantly depends on human judgements, we aimed to predict a price interval to be offered to the revenue managers. To reveal the structure of the price better, the raw sale data with the highest profit is used, the outliers are filtered, and then the remaining data is smoothed. The data features got more consistent with the target value, and the empirical studies at every stage exhibited better results than the previous ones. Our model yields better results than conventional models.

As the fluctuation of the prices reduces and as the sale graph approaches a steady shape, the statistical performance of the proposed model increases. By making use of these price offerings and the trend as a decision support system, we believe that the subjectivity and the panic price assignment may be reduced and better management may be achieved. Besides, by collecting the less subjective data, the model can be trained better to predict with smaller error percentages over time.

In this study, the weekly minimum prices of the competitors had been collected from the data provider company. However, the use of daily minimum data would probably give improved results.

Reducing human judgement on ticket price assignment, avoiding missing data and the usage of daily data will increase the model performance, and the price offering interval will be narrowed over time. Future work will be to optimize the study to offer a price rather than a price interval. Dynamic batch sizing will be applied to other models like the Gated Recurrent Unit (GRU).

The model can be used by other business cases that have similar historical data with an overlapping windows structure.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix

Table A1
EXPERIMENTS with IMV-LSTM and VBS IMV-LSTM with various time steps. The best scores are in bold.
Dataset : Raw
IMV-LSTM VBS IMV-LSTM
Time Step MSE RMSE a10 index a20 index R2 RMSPE Accuracy MSE RMSE a10 index a20 index R2 RMSPE Accuracy

t-1 751.72 27.42 0.37 0.62 0.63 39.28 76.35 638.19 25.26 0.41 0.65 0.80 27.93 80.78
t-2 684.07 26.15 0.41 0.67 0.71 36.55 78.07 482.75 21.97 0.43 0.71 0.82 25.50 82.76
t-3 662.88 25.75 0.40 0.66 0.67 37.53 77.59 542.50 23.29 0.45 0.69 0.81 26.83 82.24
t-4 684.18 26.16 0.41 0.67 0.69 35.72 78.65 586.27 24.21 0.42 0.69 0.80 28.63 81.16
Dataset : Dataset20
IMV-LSTM VBS IMV-LSTM
Time Step MSE RMSE a10index a20index R2 RMSPE Accuracy MSE RMSE a10index a20index R2 RMSPE Accuracy
t-1 602.26 24.54 0.49 0.75 0.76 28.19 83.42 369.34 19.22 0.53 0.80 0.87 16.53 87.50
t-2 536.33 23.16 0.51 0.79 0.78 26.31 84.89 353.60 18.80 0.54 0.81 0.87 16.04 88.05
t-3 490.46 22.15 0.52 0.79 0.77 26.16 85.22 394.44 19.86 0.52 0.80 0.86 17.06 87.33
t-4 507.65 22.53 0.50 0.80 0.77 27.07 85.28 412.02 20.30 0.52 0.79 0.85 17.65 86.95
Dataset : Dataset10
IMV-LSTM VBS IMV-LSTM
Time Step MSE RMSE a10index a20index R2 RMSPE Accuracy MSE RMSE a10index a20index R2 RMSPE Accuracy
t-1 552.59 23.51 0.60 0.86 0.78 27.24 86.74 237.34 15.41 0.68 0.90 0.91 12.93 90.74
t-2 555.21 23.56 0.62 0.83 0.76 28.62 86.63 291.69 17.08 0.63 0.87 0.89 14.43 89.54
t-3 514.00 22.67 0.62 0.87 0.75 27.73 87.31 272.99 16.52 0.65 0.97 0.89 13.99 89.81
t-4 546.41 23.38 0.59 0.83 0.74 30.56 85.77 367.80 19.18 0.42 0.81 0.84 16.99 86.59
Dataset : Smoothed
IMV-LSTM VBS IMV-LSTM
Time Step MSE RMSE a10index a20index R2 RMSPE Accuracy MSE RMSE a10index a20index R2 RMSPE Accuracy
t-1 88.22 9.39 0.81 0.96 0.95 9.33 93.52 74.13 8.61 0.80 0.98 0.97 8.45 93.72
t-2 45.33 6.73 0.87 0.94 0.98 8.78 94.79 28.25 5.32 0.95 0.99 0.99 5.57 96.25
t-3 47.48 6.89 0.87 0.94 0.97 9.41 94.40 39.26 6.27 0.90 0.99 0.98 6.69 95.37
t-4 35.32 5.94 0.90 0.99 0.98 6.54 95.88 35.12 5.93 0.92 0.99 0.98 5.52 95.98

References

Abdella, A., Zaki, N., & Shuaib, K. (2019). Airline ticket price and demand prediction: A survey. Journal of King Saud University - Computer and Information Sciences. https://doi.org/10.1016/j.jksuci.2019.02.001
Alekseev, K. C. P., & Seixas, J. M. (2009). A multivariate neural forecasting modeling for air transport - Preprocessed by reorganizing decomposition: A Brazilian application. Journal of Air Transport Management, 15, 212–216. https://doi.org/10.1016/j.jairtraman.2008.08.008
Altan, A., Karasu, S., & Bekiros, S. (2019). Digital currency forecasting with chaotic meta-heuristic bio-inspired signal processing techniques. Chaos, Solitons and Fractals, 126, 325–336. https://doi.org/10.1016/j.chaos.2019.07.011
Altan, A., & Karasu, S. (2020). Recognition of COVID-19 disease from X-ray images by hybrid model consisting of 2D curvelet transform, chaotic salp swarm algorithm and deep learning technique. Chaos, Solitons and Fractals, 140, 110071. https://doi.org/10.1016/j.chaos.2020.110071
Autoregressive integrated moving average. (2020). Retrieved from https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average. Accessed December 9, 2020.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Brons, M., Pels, E., Nijkamp, P., & Rietveld, P. (2002). Price elasticities of demand for passenger air travel: A meta-analysis. Journal of Air Transport Management, 8(3), 165–175. https://doi.org/10.1016/S0969-6997(01)00050-3
Chen, Y., Cao, J., Feng, S., & Tan, Y. (2015). An ensemble learning based approach for building airfare forecast service. In 2015 IEEE International Conference on Big Data (Big Data), Santa Clara, CA, pp. 964–969. https://doi.org/10.1109/BigData.2015.7363846
Chiou, Y. C., & Liu, C. H. (2016). Advance purchase behaviors of air passengers: A continuous logit model. Transportation Research Part E, 93, 474–484. https://doi.org/10.1016/j.tre.2016.07.001
Chua, L. O., & Yang, L. (1988). Cellular neural networks: Theory. IEEE Transactions on Circuits and Systems, 35(10). https://doi.org/10.1109/31.7600
Chung, S.-H., Ma, H.-L., Hansen, M., & Choi, T.-M. (2020). Data science and analytics in aviation. Transportation Research Part E, 134, 101837. https://doi.org/10.1016/j.tre.2020.101837
Cook, S. (2014). Opportunities for the Airline and Transport Sectors. Deloitte. https://www2.deloitte.com/content/dam/Deloitte/au/Documents/consumer-business/deloitte-au-cb-have-you-flown-off-course-with-your-approach-revenue-management-220914.pdf. Accessed September 17, 2020.
Etienne, B. (2019). Time Series in Python — Exponential Smoothing and ARIMA processes. https://towardsdatascience.com/time-series-in-python-exponential-smoothing-and-arima-processes-2c67f2a52788. Accessed December 23, 2020.
Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion and a rejoinder by the authors). The Annals of Statistics, 28(2), 337–407.
Groves, W., & Gini, M. L. (2011). A regression model for predicting optimal purchase timing for airline tickets. Technical Report, University of Minnesota, Minneapolis, USA. http://hdl.handle.net/11299/215872
Guo, T., Lin, T., & Antulov-Fantulin, N. (2019). Exploring Interpretable LSTM Neural Networks over Multi-Variable Data. https://arxiv.org/abs/1905.12034
Hill, T., O'Connor, M., & Remus, W. (1996). Neural network models for time series forecasts. Management Science, 42(7), 1082–1092. https://doi.org/10.1287/mnsc.42.7.1082
Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Karasu, S., Altan, A., Bekiros, S., & Ahmad, W. (2020). A new forecasting model with wrapper-based feature selection approach using multi-objective optimization technique for chaotic crude oil time series. Energy, 212, 118750. https://doi.org/10.1016/j.energy.2020.118750
Lantseva, A., Mukhina, K., Nikishova, A., Ivanov, S., & Knyazkov, K. (2015). Procedia Computer Science, 66, 267–276. https://doi.org/10.1016/j.procs.2015.11.032. ISSN 1877-0509.
Li, Y., & Li, Z. (2018). Design and implementation of ticket price forecasting system. AIP Conference Proceedings, 1967, Article 040009. https://doi.org/10.1063/1.5039083
Li, Y., Wu, H., & Liu, H. (2018). Multi-step wind speed forecasting using EWT decomposition, LSTM principal computing, RELM subordinate computing and IEWT reconstruction. Energy Conversion and Management, 167, 203–219. https://doi.org/10.1016/j.enconman.2018.04.082
Littlewood, Ken (1972). Forecasting and control of passenger bookings. AGIFORS 12th Annual Symposium Proceedings, 95–128.
McHugh, K. (2018). PricingNet: Modelling the global airline industry with neural networks. https://hackernoon.com/pricingnet-modelling-the-global-airline-industry-with-neural-networks-833844d20ea6. Accessed December 6, 2020.
Montavon, G., Samek, W., & Müller, K. R. (2018). Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73, 1–15. https://doi.org/10.1016/j.dsp.2017.10.011
Mottini, A., & Acuna-Agost, R. (2017). Deep Choice Model Using Pointer Networks for Airline Itinerary Prediction. https://doi.org/10.1145/3097983.3098005
Panda, S., & Panda, G. (2020). Fast and improved backpropagation learning of multi-layer artificial neural network using adaptive activation function. Expert Systems. https://doi.org/10.1111/exsy.12555
Papadakis, M. (2012). Predicting Airfare Prices. https://www.semanticscholar.org/paper/Predicting-Airfare-Prices-Papadakis/28b471f2ee92e6eabc1191f80f8c5117e840eadb?p2df. Accessed November 6, 2020.
Qin, Y., Song, D., Chen, H., Cheng, W., Jiang, G., & Cottrell, G. W. (2017). A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction. https://arxiv.org/abs/1704.02971. Accessed August 22, 2020.
Saxon, S., & Weber, M. (2017). A Better Approach to Airline Costs. https://www.mckinsey.com/industries/travel-transport-and-logistics/our-insights/a-better-approach-to-airline-costs. Accessed September 22, 2020.
Tziridis, K., Kalampokas, T., Papakostas, G. A., & Diamantaras, K. I. (2017). Airfare prices prediction using machine learning techniques. 25th European Signal Processing Conference (EUSIPCO).
Varedi, M. (2010). Forecasting seat sales in passenger airlines: Introducing the round-trip model. Thesis (MSc), University of Waterloo, Ontario, Canada. http://hdl.handle.net/10012/5715. Accessed December 5.
Wen, C. H., & Chen, P. H. (2017). Passenger booking timing for low-cost airlines: A continuous logit approach. Journal of Air Transport Management, xxx, 1–9. https://doi.org/10.1016/j.jairtraman.2017.06.030
