T.C. Ankara Yildirim Beyazit University Graduate School of Social Sciences
PHD THESIS
Fikriye KARACAMEYDAN
Ankara, 2023
T.C.
ANKARA YILDIRIM BEYAZIT UNIVERSITY
GRADUATE SCHOOL OF SOCIAL SCIENCES
PHD THESIS
Fikriye KARACAMEYDAN
Supervisor
Assoc. Prof. Dr. Erhan
Ankara, 2023
PAGE OF APPROVAL
I approve that this thesis satisfies all the requirements to be accepted as a thesis for
the degree of doctor of philosophy at the Department of Finance and Banking of the Institute
I hereby declare that all information in this thesis has been obtained and presented in
accordance with academic rules and ethical conduct. I also declare that, as required by these
rules and conduct, I have fully cited and referenced all materials and results that are not
original to this work; otherwise I accept all legal responsibility. (08/02/2023)
Fikriye KARACAMEYDAN
ACKNOWLEDGEMENTS
I would like to express my deepest gratitude to my supervisor, Assoc. Prof. Dr. Erhan, and to the Council of Higher Education for their support and permission during my studies.
ABSTRACT
All the variables selected in the present study consist of monthly closing prices for the period January 2001 to January 2022. The effects of macroeconomic indicators consisting of the Dollar Rate (ERUSD, TL/$), Money Supply (M2), Producer Price Index (PPI), Industrial Production Index (IPI), Active Bond Interest Rate (TYBY), Brent Oil Price (BRT), and Gold Prices (GP, TL/Gr) on the BIST100 index are examined with both qualitative response (Logit and Probit) regression models and a deep learning model, a machine learning method built on the LSTM (Long Short-Term Memory) architecture. The performances of the methods are compared with each other. The research reveals that the effects of the selected macroeconomic variables on the BIST100 index increase during crisis periods and during the COVID-19 pandemic. The change in the prices of stocks in the BIST100 is largely explained by the money supply, the exchange rate, the industrial production index, and the consumer price index; changes in the real sector have a significant impact on the capital sector; and the deep learning method is more successful in predicting financial crises.
The results of the analysis revealed that the LSTM deep learning model, developed
to analyze the effects of macroeconomic indicators on the Borsa Istanbul (BIST100) index,
has a very low error level and gives more effective and satisfactory results than Logit and
Probit regression models.
TABLE OF CONTENTS
ABSTRACT.................................................................................................................... i
TABLE OF CONTENTS ............................................................................................iii
LIST OF ABBREVIATIONS .................................................................................... vii
LIST OF FIGURES ..................................................................................................... ix
LIST OF TABLES ...................................................................................................... xi
1. INTRODUCTION .................................................................................................... 1
6. ARTIFICIAL NEURAL NETWORK .................................................................... 35
6.5.3. Artificial Neural Network Classification Based on the Number of Layers ...... 51
6.11. Selection of Network Structure..................................................................... 72
6.11.1. Determining the Number of Input Neurons and Output Neurons .................. 73
6.11.2. Determination of the Hidden Layer and the Number of Hidden Neurons ..... 73
10. EMPIRICAL ANALYSIS .................................................................................. 121
10.4.2. Deep Learning Network Architecture Used in the Study ............................. 135
LIST OF ABBREVIATIONS
MPT : Modern Portfolio Theory
PPI : Producer Price Index (Domestic Producer Price Index; formerly the Wholesale Price Index)
RBF : Radial Basis Functions
SML : Security Market Line
SPK : Capital Markets Board of Türkiye
SOM : Self-Organizing Feature Maps
TCCB :
TUIK : Turkish Statistical Institute
UCITS : Undertaking for Collective Investment in Transferable Securities
LSTM : Long Short Term Memory
ML : Machine Learning
MAPE : Mean Absolute Percentage Error
MLP : Multi Layer Perceptron
RNN : Recurrent Neural Network
SE : Stock Exchange
NYSE : New York Stock Exchange
SAE : Sparse Autoencoder
VAE : Variational Autoencoder
LIST OF FIGURES
Figure 1.3. Market capitalization of stock exchange companies (million dollars) ............. 11
Figure 2.4. The conditional expected value of the dependent variable ............................... 17
Figure 2.10. The probability that the threshold value is less than or equal to the utility
index .................................................................................................................................... 20
Figure 2.12. The graphs of the Logit and Probit models ..................................................... 21
Figure 9.3. ISE100-M2 ...................................................................................................... 112
Figure 10.8. Train and test results with adam ................................................................... 143
Figure 10.10. Train and test results with sgdm .................................................................. 144
LIST OF TABLES
Table 10.1. Some descriptive statistics for the dependent and independent variables ...... 122
Table 10.7. Results of the Logit model & the Probit model .............................................. 132
Table 10.8. Correct classification rate table for the Logit model ...................................... 133
Table 10.9. Correct classification rate table for the Probit model ................................ 133
1. INTRODUCTION
The main purpose of economics and economic activity, which try to find the cause-and-effect relationships related to the subjects they deal with and to reveal them in the form of scientific rules, is to increase the level of social welfare. Social welfare and economic development are indicators of the social and economic progress of societies. The easier
it is for individuals in a society to reach basic needs such as social security, education, health,
and housing opportunities, the more they feel safe and free, and the more savings turn into
investments, the higher the level of social welfare in that society. For this reason, the concept
of social welfare has become one of the issues that play an important role in scientific
platforms as well as in the political field for both developed and developing countries.
Social welfare and economic development are closely related to the growth of the money and capital markets that constitute a country's financial markets. Although there is no common view on what determines the extent of capital market development, a good financial system has five components: reliable public finance and public debt management; reliable and stable financial regulations; the presence of various banks, national, international, or both; a central bank that can balance domestic finances and manage international financial relations; and a well-functioning stock market. A regular financial system with these components plays an important role in the economic growth and development of countries (Rousseau & Sylla, 2003, pp. 374-375). As can be understood from these components, the more developed a country's financial market, the lower the level of risk and vulnerability in its society. In addition, this predictable structure plays an important role in the development of countries, thereby increasing their social welfare; by encouraging the private sector, savings are transformed into investments, leading to an increase in economic growth.
Developed and developing countries have made various reforms in order to protect their economies and minimize the effects of financial fluctuations caused by the economic crises experienced around the world from the 1980s to the 1990s. These reforms took the form of practices that include financial liberalization policies such as liberalizing interest rates, removing credit ceilings, reducing or completely removing the deposit reserve ratios that banks have to keep at the Central Bank, opening the banking sector to both foreign and domestic participants, and liberalizing capital movements (p. 134). It has been observed that the financial crises experienced in this period, especially in developing countries, arose from the financial liberalization steps taken to sustain growth and the hot money that left the country because of economic and political risks, or from international capital movements, excessive borrowing, exchange rate policies, and high inflation rates. Important examples of such crises, and of the costs they impose on countries' economies when they arise from speculative capital movements, are the crises experienced in Turkey in 1994 and 2001, the Asian crisis in 1997, and the Russian crisis of 1997-98.
In addition to these economic crises, the mortgage crisis, which emerged in the USA in 2007, then spread to England, and turned into a global crisis as of the second quarter of 2008, is an important example of how crises are reflected around the world (p. 53). This and the other financial crises mentioned above have shown that, regardless of their level of development, countries' markets greatly affect each other, that crises cause great financial fluctuations and damage around the world, and that financial shocks in large economies can lead to the bankruptcy of various financial institutions, especially banks, and even to the bankruptcy of countries.
These developments in the world financial market have shown how important it is to
forecast the possible chaos and uncertainties in financial markets and the fluctuations that
may occur in micro and macroeconomic factors affecting price movements, in order to
prevent and/or reduce the effects of financial breaks and uncertainties in the markets. This
situation has revealed the fact that economic decision-makers can gain significant
advantages and opportunities if they can accurately forecast the future value of capital
market instruments and asset prices.
The unexpected changes in stock prices in the United States, many European
countries, and Japan in the 1980s and 1990s attracted the attention of researchers and it was
concluded that these fluctuations may be due to macroeconomic factors (Kaymaz & Yilmaz,
2022). On the other hand, Fama (1990), Barro (1990), and Schwert (1990) concluded in their
studies that the changes in the prices of stocks representing the firm value are mainly caused
by the changes in the future cash flows and the changes in the discount rates (Barro, 1990;
Fama, 1990; Schwert, 1990). Similarly, studies by Chakravarty and by Ilahi et al. have concluded that basic macroeconomic variables affect stock prices (Chakravarty, 2005; Ilahi et al., 2015).
For this reason, methods based on computer technologies that can capture the complex structure of financial data have been widely used in recent years (Henrique et al., 2019).
The fact that accurate estimations reduce financial uncertainty and costs, play an important role in reducing the problems and costs that may arise for both investors and governments, and allow benefit functions to be maximized has increased interest in forecast modeling and led to the emergence of new forecasting techniques. The Artificial Neural Networks (ANN) method from AI technologies, which is one of these new prediction techniques, has the ability to learn, generalize, and work with incomplete, faulty, flawed, and even wrong data. It has become a prominent method because it gives very successful results in classification, optimization, and pattern recognition processes, and especially in estimating nonlinear time series (Cheng et al., 2022a).
Financial time series are formed by ordering the data observed in financial markets according to their order of occurrence; data such as stock prices, stock market indices, exchange rates, and interest rates constitute financial time series, which are a main source of econometric studies. Time series are defined as numerical quantities in which the values of the variables are observed consecutively from one period to the next. Financial time series are not stationary, because they are statistically composed of trends, seasonal movements, cyclical movements, and random movements (Kennedy, 1998, p. 288). For this reason, it is important to be able to make forecasts based on the behavior of the series by using methods that purify the series of these components. In this context, AI technologies have become a very popular method used in financial analysis.
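For example, one standard way of removing the trend component from such a price series, given here only as an illustration and not as the specific transformation applied later in this thesis, is to work with logarithmic returns rather than price levels:

$$r_t = \ln P_t - \ln P_{t-1}$$

where $P_t$ denotes the closing price or index value in month $t$.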
Unlike traditional statistical and econometric methods, ANNs produce successful models using complex, incomplete, faulty, and flawed data, both qualitative and quantitative, and for analyses where traditional methods cannot find a solution or produce weak or ineffective results. For the reasons listed above, ANNs have started to be used as the best alternative for data analysis (Kamruzzaman et al., 2006a, pp. 2-3). They are still successfully applied in many fields such as engineering, industry, business, finance, education, military defense, and health (Henrique et al., 2019; Quazi, 2022).
ANNs, which find a wide area of use in economics and finance as well as in many other fields, have become a very interesting method thanks to features such as measuring and analyzing without the need for prior knowledge, easily estimating continuous functions, and making generalizations. Because of these features, they have started to be used in many studies (Henrique et al., 2019; Patil et al., 2021). This interest in ANNs, which have been shown to give successful results in modeling and estimating complex problems, has also led to the diversification and enrichment of AI applications.
It is very important for investors who want to turn their savings into investments to make the right investment decisions. Therefore, they need data or a signal to help them decide. Accurate preliminary indicators for estimating the prices of stocks traded on the stock exchange, or the direction of price movements, offer many opportunities to investors. Political events, general economic conditions, basic economic data, and investors' expectations are important factors affecting the prices of instruments traded on the stock exchange. However, simplifying the analysis by assuming that each variable behaves linearly prevents us from reaching correct results. This situation leads to the use of the more accurate calculation and estimation methods offered by advanced computer technologies in time series analysis, because such series have non-linear, high-frequency, polynomial, dynamic, and complex structures. Thus, in order to solve complex real-world problems, it is attempted to overcome the shortcomings of a single method by utilizing the various advantages of AI technologies; at the same time, efforts are made to eliminate the disadvantages arising from the use of hybrid models and techniques. Therefore, the number of such studies is increasing (Enke et al., 2011, pp. 201-206; Enke & Mehdiyev, 2013, p. 636; Quazi, 2022).
Two types of risk affect the capital markets: systematic risk and unsystematic risk. Financial risk, management risk, and industry risk are included among unsystematic risks, while inflation risk, interest rate risk, political risk, market risk, and currency risk are regarded as systematic risks (Hull, 2007, pp. 9-12). Macroeconomic variables are among the most important indicators for understanding the effects of these risk factors. Because of these characteristics, studies on stock market indices in the literature frequently use fundamental macroeconomic indicators such as inflation, interest rates, the money supply, the industrial production index, the gross national product, gold prices, oil prices, and exchange rates.
In this study, the predictive efficiency of these methods will be tested by using the Logit and Probit models, which are traditional econometric methods, and a deep learning neural network model from AI technologies, in order to determine whether the changes in the value of the Istanbul Stock Exchange 100 (ISE100) index can be explained by the selected macroeconomic variables. Thus, it will be revealed which of the modeling techniques used can best predict changes in economic and financial system dynamics.
1.1. Stock Exchanges
Stock exchanges are markets where long-term investment instruments such as stocks
and bonds are bought and sold. The direct confrontation between fund suppliers and fund
demanders is the main feature of these markets. Fund transfers are made in a transparent
environment using standard methods in exchanges that have been established and operate
according to certain rules. Therefore, stock markets are a safe and easy investment area for
investors (Karan, 2013, p. 37). The fact that investors can take risks easily is very important
for the development of the economies of countries. In this way, the necessary transfer of funds is provided so that savings, which form the basis of economic activity, are transformed into investments, and investments into production.
Although there are different opinions about the establishment of modern stock exchanges, stock market activities started in the early 1600s with the trading of shares of the United East India Company (Vereenigde Oost-Indische Compagnie, VOC), which is considered the first multinational company in the world and was the first company to issue stock. In the 1700s, London emerged as the financial and stock market center, followed by America and New York in the early 1800s. Stock markets, which started to develop all over the world after the 1950s, gained a global character after the 1990s (Karan, 2013, p. 37). The increase in communication opportunities and the developments in computer technologies brought about by technological progress have led to the development and growth of financial institutions and the expansion of the borders within which they serve markets. In addition, the globalization of financial markets has increased both capital movements and the diversity of securities, and thus the depth of the capital markets.
The world's most important stock exchanges are located in a group of countries in Europe, America, and Asia that have come a long way in the industrialization process. As of 2022, the world's four largest stock exchanges in terms of trading volume can be listed as follows. The New York Stock Exchange (NYSE) in the United States, with a valuation of about $20 trillion, is in first place by a wide margin. The National Association of Securities Dealers Automated Quotations (NASDAQ), an over-the-counter exchange founded in 1971, is in second place. In third place is the London Stock Exchange, founded in 1801, and the fourth largest is the Tokyo Stock Exchange (TSE) in the Far East, which was established in the 1870s.
Emerging stock markets are the stock markets of the developing BRICS countries
(Brazil, Russia, India, China, and South Africa). These countries are rich in natural resources
such as oil and natural gas and have high growth potential. A significant part of the basic
capital needs of developing countries is met from stock portfolio investments, that is, from
stock market activities. In other words, businesses, especially public joint stock companies,
meet their medium and long-term funding needs from the capital markets. Thus, stock
exchanges, which enable individuals to transfer their savings to companies as capital, also
prevent companies from turning to high-interest, high-risk, and short-term loans. In this case,
resources turn into investments and contribute to the national economy. In addition, the book
values and equities of companies listed on the stock exchange are increasing.
The chart below, Figure 1.1, shows a performance comparison of selected world stock markets between 2021/12 and 2022/03. In this chart, it can be seen that the stock markets of Brazil, Indonesia, Mexico, and Luxembourg increased, and that the biggest increase, at a rate of 20%, was experienced by the ISE100.
On the other hand, it is observed that the performance of stock markets such as NASDAQ, NYSE, Hong Kong, Shanghai, and Deutsche Börse, which are among the few world stock exchanges with high trading volumes, dropped considerably, in the range of 1% to 16%. This can be interpreted as a reflection of the Covid-19 pandemic experienced all over the world and of its effects, on world economies already struggling with economic crises, on the capital markets.
Figure 1.1. Performance comparison of selected world stock exchanges (2021/12 to 2022/03); the exchanges compared include the Athens Exchange, Mexican Exchange, Tel Aviv SE, Philippine SE, Lima SE, Luxembourg SE, Thailand SE, Warsaw SE, Shanghai SE, Budapest SE, Irish SE, Colombo SE, Indonesia SE, Bursa Malaysia, Egyptian Exchange, Shenzhen SE, BSE India, TMX Group, NYSE, and the Merval (Buenos Aires).
Although stock market indices are a basic indicator for national economies, it is
necessary to know what the factors affecting the index are before making an evaluation.
Internal factors such as the economic data of the countries, unemployment figures, interest
rates, geopolitical position, and risks, and external factors such as global data and the
economic relations of the countries with each other, are the factors that affect the economic
course of the countries and, of course, the stock market index.
The markets where precious metals, financial instruments, and capital market
instruments are bought and sold in a reliable and audited environment based on laws and
certain legislation and prices are formed in this transparent environment, are called the Stock
Exchange. After the financial liberalization movements that started in Turkey in the 1980s,
the Istanbul Stock Exchange was established in 1986. Since its establishment, it has
developed continuously and has become one of the important stock exchanges in the regional
area. With the Capital Markets Law dated December 30, 2012, and numbered 6362, the capital markets, the Istanbul Stock Exchange, the Izmir Futures and Options Exchange, and the Istanbul Gold Exchange were gathered under one roof as Borsa Istanbul. With this law, the Istanbul Stock Exchange was removed from the status of a public institution as of April 3, 2013, and has continued its activities as an autonomous and independent company (n.d.).
Statistics from the Central Securities Depository system show that in July 2022 there were 28,941,853 investors who transacted a total of 20,386,350 million TL in market value. A total of 449 companies are listed on the Istanbul Stock Exchange. The indices created to measure the sectoral and overall performance of the prices and returns of these stocks are also alternative investment tools for investors' portfolios. Therefore, individual and corporate portfolio managers who carry out their investments with an index-linked portfolio strategy tend to create their mutual funds according to the returns of the ISE index.
Source: https://www.mkk.com.tr/en/trade-repository-services/e-veri/fund-management-fees
The market capitalization values of the companies listed on the Istanbul stock market are given by year in Figure 1.3. Market capitalization is the value of a company as determined by the stock market. It is observed that the market value of companies trended continuously upward from 1997 before starting to decrease significantly after 2012. The value of all stocks in actual circulation, which saw decreases and increases between 2013 and 2019, increased again in 2020. Due to the Covid-19 pandemic experienced all over the world in 2020, there were significant reflections on the Turkish market as well as on world markets, and there were decreases in the capital markets. The data for 2020-2022 show that the Covid-19 pandemic was reflected in the Turkish stock market as a decline.
Figure 1.3. Market capitalization of stock exchange companies by year (million dollars)
Source: https://cmb.gov.tr
AI, which became an industry in 1980 and emerged as a science in 1987, has started to permeate every aspect of our lives with the developments in computer technologies in today's world. It is widely and effectively used in many fields, from health to the military and defense industry and industrial applications, and even from the construction industry to archeology. In the field of finance, AI applications have become more popular in recent years thanks to their increasing accuracy and speed and to growing data sizes.
The aim of this research is to predict the long-term relationship between the ISE100 index value and the selected macroeconomic variables in the January 2001 to January 2022 period by using Logit and Probit analysis and the deep neural network method, and to compare the estimation performance of the results obtained with both approaches. Thus, the effectiveness in time series analysis of AI technologies, which emerged from developments in traditional econometric methods and computer technologies and have continued to develop rapidly in recent years, will also be measured. For the models used in the research, the variables most commonly used in studies on market indices in the literature were selected (W. Chen et al., 2017; Cheng et al., 2022a; Donaldson & Kamstra, 1999; Kara et al., 2011; Staub et al., 2015), the significant variables obtained from the resulting models were given as input to the models created with ANN and DL methods, and the effectiveness of the modeling methods was tested.
Especially after the global economic crisis of 2008, researchers, policymakers, and
investors have realized how important financial forecasts are in preventing financial crises
and uncertainties. Therefore, economic decision makers needed modeling in order to develop
policies that would minimize future uncertainties and create a portfolio.
Although there are many empirical studies in the literature examining the relationship between stock market indices and macroeconomic factors (Henrique et al., 2019, pp. 226-251), studies estimating whether macroeconomic variables have any long-term effects on stock market index values using the Logit and Probit regression methods together with the deep neural network method remain superficial and limited in number (Aggarwal, 2020; Davis & Karim, 2008; Kantar & Akkaya, 2018).
In the empirical study, both the degree of interaction between the real sector and the financial sector in Turkey is measured, and the DL method, a ML method from AI technologies and a relatively new estimation technique, is compared with the qualitative response Logit and Probit regression models from econometrics. The results of the econometric and AI models created for this purpose are presented comparatively. Thus, a decision-making system that can be used to predict the short-term movement, trend, and price level of the stock market has been developed and evaluated. The study also offers a survey of the literature on the use of AI in financial analysis, with an emphasis mostly on the modeling process.
1.5. Definition and Limits of the Problem
In this research, the effectiveness of the DL modeling technique is tested. The DL modeling technique is a ML method that has become increasingly important in our country and around the world in recent years; it is widely used in many fields but only very rarely in financial time series estimation for Turkey.
Qualitative response regression models were created using the monthly values of seven macroeconomic variables in order to show the relationship between ISE100 index values and the macroeconomic variables over the 21-year (253-month) period between January 2001 and January 2022 in Turkey. Then, DL models were created using the same macroeconomic variables.
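The thesis states that Eviews12, STATA15, and MATLAB R2021b are used in the application part; purely as an illustration of the general workflow described above, the sketch below shows how a comparable LSTM forecaster could be set up in Python with Keras. The file name, column names, look-back window, and network size are assumptions made only for this example, not values taken from the thesis.

```python
# Minimal illustrative sketch of an LSTM forecaster for a stock index from
# monthly macro indicators. Not the author's MATLAB implementation.
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

LOOKBACK = 12  # use the previous 12 months to predict the next month (assumption)

df = pd.read_csv("macro_bist100.csv")  # hypothetical file with monthly data
features = ["ERUSD", "M2", "PPI", "IPI", "TYBY", "BRT", "GP"]  # names mirror the thesis
target = "BIST100"

scaler = MinMaxScaler()
data = scaler.fit_transform(df[features + [target]].values)

# Build (samples, timesteps, features) windows and the next-month target.
X, y = [], []
for t in range(LOOKBACK, len(data)):
    X.append(data[t - LOOKBACK:t, :-1])   # past macro indicators
    y.append(data[t, -1])                 # next index value (scaled)
X, y = np.array(X), np.array(y)

split = int(len(X) * 0.8)                 # simple chronological split
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = Sequential([
    LSTM(50, input_shape=(LOOKBACK, len(features))),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")  # the thesis also reports sgdm runs
model.fit(X_train, y_train, epochs=100, batch_size=16,
          validation_data=(X_test, y_test), verbose=0)

pred = model.predict(X_test).ravel()
mape = np.mean(np.abs((y_test - pred) / np.maximum(y_test, 1e-8))) * 100
print(f"Test MAPE (on scaled values): {mape:.2f}%")
```

A chronological train/test split is used in the sketch because shuffling monthly observations would leak future information into training, which matters for time series forecasting.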
The first parts of the research are formed as a result of theoretical and literature research. The conceptual and theoretical framework of financial time series, ANNs, and the DL methodology is drawn from the information obtained from the Turkish and international literature, such as articles, theses, books, papers, and reports published in these fields.
In the application part, statistical analysis will be made using the ISE100 index and
macroeconomic variable data. Eviews12, STATA15, and MATLAB R2021b package
programs will be used in the application.
The data sets used in the analysis were compiled as monthly series from the Central
Bank of the Republic of Turkey Electronic Data Distribution System (EVDS), the Turkish
Statistical Institute (TUIK) statistics, the monthly statistical bulletins of the Capital Markets
Board, the statistics of the Ministry of Treasury and Finance, and the Eurostat databases.
The macroeconomic variables used in this study were chosen by considering the
studies by Chen, Roll, and Ross (1986) and Prantik and Vani (2004). The studies of Chen,
Roll and Ross (1986) show that macroeconomic variables such as long- and short-term
interest rates, expected and unexpected inflation rates, industrial production, and price
indices systematically affect securities prices and, thus, net asset values and returns. In
addition, studies conducted in recent years have shown that various macroeconomic
variables such as exchange rates, oil prices, gold prices, interest rates, production and price
indices, and money supply significantly affect stock market indices (Aggarwal, 2020;
Henrique et al., 2019; Kantar & Akkaya, 2018; Peng et al., 2021).
The ISE100 index was determined as the dependent variable in the established models, and seven macroeconomic variables were determined as the independent variables. The selected variables are the main variables most widely used in the literature, and their direct impact on the pricing of capital market instruments has been documented there. In addition, these macroeconomic variables are also used as indicators (benchmarks or fund benchmarks) for the ISE100 index.
In the first part of the research, in addition to the introduction, a brief overview of the world stock markets is given, along with information about the Istanbul Stock Exchange. After the purpose, scope, methodology, and data set of the research are briefly introduced, the second part gives information about the Logit and Probit models, the qualitative response regression models and econometric modeling techniques that form part of the application in this study.
From the third chapter to the end of the seventh chapter, the concept of AI and its history, the development of AI, AI technologies, and the more recent DL method are covered. In these sections, ANNs, which constitute the basic working system of the DL method, are first examined theoretically: their definition, history, general structure, types, learning algorithms, and learning rules are given. Then, following the steps that show how to design an ANN model, information is provided on how to determine the input, output, and hidden layers of the network and the number of neurons in these layers, how to enter data, and how to determine the parameters of the network, or, in other words, how the structure of the network can be created. In addition, the application areas, advantages, and disadvantages of ANNs are also given in these sections. In the seventh chapter, which discusses the DL method based on ANNs, the elements that make up the architecture of the network and the methods used in learning are given wide coverage.
In the eighth chapter, detailed and current literature studies are given, and the
references to the current thesis study are examined. Thus, a wide literature review, including
financial studies using AI technologies, is included. Especially within the scope of the thesis,
financial time series and AI techniques were analyzed, and the methods to be used in the
thesis were determined.
In the last chapters, after the methodology and data set of the study are introduced, the analysis techniques are applied. Borsa Istanbul index prediction models are established, Logit and Probit regression analyses are made, and a DL design is made using the variables used in these models. The effectiveness of the techniques is tested with the findings obtained from the prediction models created with these analysis techniques. A performance evaluation is carried out within the framework of the purpose of the study, and the prediction results of the models used are presented comparatively.
As a general evaluation, this thesis tests the ability of the DL method to forecast the future trend of the nonlinear, variable, and complex ISE100 index. The DL method is one of the AI technologies that has become increasingly important all over the world in recent years and is widely used in many fields, but it is still used only to a limited extent in our country for financial time series estimation. It has also been observed that DL methods are used to a limited extent, or not at all, in the field of finance, and this fact opens the way for new approaches in the field.
2. QUALITATIVE RESPONSE MODELS
Linear probability models are simple models with a two-valued dependent variable, and they can be estimated by the ordinary least squares method. They indicate the probability of choosing one of the two options presented. Since the dependent variable takes the binary values 0 or 1, it is a restricted qualitative variable defined as a probability.
The model, expressed as

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

looks like a typical linear regression model, but it is called the Linear Probability Model (LPM) because the dependent variable has only two choices. When $X_i$ is given, the conditional expected value of $Y_i$, $E(Y_i \mid X_i)$, can be interpreted as the conditional probability of the occurrence of the event when $X_i$ is given, that is, $P(Y_i = 1 \mid X_i)$. Here, when $Y_i = 1$ the first option is preferred and when $Y_i = 0$ the second option is preferred, so the dependent variable can take only the values 0 and 1. Assuming $E(u_i) = 0$ when the expected value is taken, the obtained model is as follows:

$$E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i$$

In this model, the probability of choosing the first option is $P_i = P(Y_i = 1)$, and if the second option is preferred, $1 - P_i = P(Y_i = 0)$. In this case, the expected value of the dependent variable is (Gujarati & Porter, 2009, p. 543):

$$E(Y_i) = 1 \cdot P_i + 0 \cdot (1 - P_i) = P_i$$
In classical regression models, the parameters are found by the least squares method. However, in the Linear Probability Model (LPM), like the dependent variable, the error term also follows the Bernoulli distribution, as it only takes the binary values 0 and 1. Therefore, although the model parameters can be estimated by the least squares method, they do not satisfy the properties of best linear unbiased estimators. For this reason, there are some criticisms regarding the estimation and interpretation of this model. Because there is a varying variance (heteroscedasticity) problem in the error terms, the parameter estimates will not be efficient, and this makes traditional significance tests questionable. For these reasons, the Logit and Probit models discussed in the literature are preferred over linear probability models as alternative options for modeling binary variables (Gujarati, 2011, p. 154).
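To make the varying-variance point concrete, the following standard textbook derivation (a supplement added here, not a quotation from the thesis) shows why the LPM error variance depends on $X_i$. Since $Y_i$ is Bernoulli, the error $u_i = Y_i - \beta_1 - \beta_2 X_i$ takes only two values:

$$u_i = \begin{cases} 1 - \beta_1 - \beta_2 X_i, & \text{with probability } P_i \\ -\beta_1 - \beta_2 X_i, & \text{with probability } 1 - P_i \end{cases}$$

so that

$$\operatorname{Var}(u_i) = P_i (1 - P_i) = E(Y_i \mid X_i)\,\big[1 - E(Y_i \mid X_i)\big],$$

which changes with $X_i$; hence the OLS estimator, while unbiased, is not efficient.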
Although the linear probability model is simple to estimate and use, it is thought that the limitations arising from its disadvantages, such as fitted probabilities that can be less than zero or greater than one and the partial effect of any explanatory variable being constant, can be overcome by using more flexible binary response models (Wooldridge, 2010, p. 584).
When the linear probability model described in the previous section is expressed with the (cumulative) logistic distribution function, the probability $P_i$ that the i-th individual makes one of the two choices is

$$P_i = E(Y_i = 1 \mid X_i) = \frac{1}{1 + e^{-(\beta_1 + \beta_2 X_i)}} = \frac{1}{1 + e^{-Z_i}}, \qquad Z_i = \beta_1 + \beta_2 X_i$$

Figure 2.5. Logistic distribution function

Here, $e \approx 2.7182$. While $Z_i$ takes values in the range $-\infty$ to $+\infty$, $P_i$ takes values between 0 and 1, so the relationship between $P_i$ and $Z_i$ (or $X_i$) is not linear and does not meet the two requirements mentioned earlier. For this reason, the relationship needs to be converted into a linear form in order to be estimated, and it can be written as

$$\frac{P_i}{1 - P_i} = \frac{1 + e^{Z_i}}{1 + e^{-Z_i}} = e^{Z_i}$$

This ratio, the probability of the event occurring relative to the probability of it not occurring, is called the odds ratio. Taking the natural logarithm of this ratio gives

$$L_i = \ln\!\left(\frac{P_i}{1 - P_i}\right) = Z_i = \beta_1 + \beta_2 X_i$$
The resulting model is called the Logit regression model. The logarithm $L_i$ of the odds ratio is linear not only with respect to $X$ but also with respect to the population coefficients (in terms of coefficient estimation). $L_i$ is called the logit. While estimating the parameters of this model, if a particular choice is made the logit becomes $L_i = \ln(1/0)$, and if it is not, $L_i = \ln(0/1)$. These expressions are obviously undefined, so Logit models cannot be estimated using the standard least squares method. In this case, the maximum likelihood method can be used to estimate the population parameters (Gujarati & Porter, 2009, p. 556).
Under the assumption of normality, the probability that the threshold value $I_i^*$ is less than or equal to the utility index $I_i = \beta_1 + \beta_2 X_i$ is expressed with the standardized normal cumulative probability distribution and calculated as follows:

$$P_i = P(Y_i = 1 \mid X_i) = P(I_i^* \le I_i) = P(Z_i \le \beta_1 + \beta_2 X_i) = F(\beta_1 + \beta_2 X_i)$$

Figure 2.10. The probability that the threshold value is less than or equal to the utility index

Here, $Z_i$ is the standardized normal variable, that is, $Z \sim N(0, 1)$. $F$, being the standard normal cumulative distribution function, can be written as

$$F(I_i) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{I_i} e^{-z^2/2}\, dz$$
As with the Logit model, it is not appropriate to use the classical least squares method to estimate the Probit model. With the maximum likelihood method, it is possible to obtain consistent estimates of the parameters of the Probit model (Gujarati & Porter, 2009, p. 567).
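As a supplementary illustration (standard in the econometrics literature rather than quoted from the thesis), both models are estimated by maximizing the same form of log-likelihood, with $F$ taken as the logistic CDF for the Logit model and as the standard normal CDF for the Probit model:

$$\ln L(\beta_1, \beta_2) = \sum_{i=1}^{n} \Big[ Y_i \ln F(\beta_1 + \beta_2 X_i) + (1 - Y_i) \ln\big(1 - F(\beta_1 + \beta_2 X_i)\big) \Big]$$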
Although the Logit and Probit models give similar results, the main difference between the two models is that the logistic distribution has slightly thicker tails. A logistically distributed random variable has a variance of about $\pi^2/3$, whereas a standard normally distributed variable has a variance of 1. However, in practice there is no compelling reason to prefer one over the other, and the Logit model is often preferred over the Probit model because of its comparative mathematical simplicity (Gujarati, 2011, p. 163). The response probabilities of these models are complex and more difficult to estimate because of their nonlinear nature, but they have become popular recently (Wooldridge, 2010, p. 596).
Figure 2.12. The graphs of the Logit and Probit models
While natural logarithms of odds ratios are used in Logistic regression analysis, the
cumulative normal distribution is used in Probit regression analysis.
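As a small illustration of how such models are estimated in practice, the sketch below fits Logit and Probit models with the Python statsmodels package on simulated data; the variable names echo those used in the thesis, but the data, the 0/1 definition of the dependent variable, and the 0.5 classification threshold are assumptions made only for this example.

```python
# Minimal illustrative sketch of fitting Logit and Probit models with statsmodels.
# The data are simulated; this is not the thesis data set.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 253  # number of monthly observations in the study period

X = pd.DataFrame({
    "ERUSD": rng.normal(size=n),
    "M2":    rng.normal(size=n),
    "IPI":   rng.normal(size=n),
})
# Hypothetical binary dependent variable: 1 if the index rose that month, else 0.
latent = 0.8 * X["M2"] - 0.5 * X["ERUSD"] + rng.normal(size=n)
y = (latent > 0).astype(int)

X_const = sm.add_constant(X)  # add an intercept term

logit_res = sm.Logit(y, X_const).fit(disp=0)
probit_res = sm.Probit(y, X_const).fit(disp=0)

print(logit_res.summary())
print(probit_res.summary())

# Correct classification rate at a 0.5 threshold, analogous to the thesis's tables.
pred = (logit_res.predict(X_const) > 0.5).astype(int)
print("Logit correct classification rate:", (pred == y).mean())
```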
3. ARTIFICIAL INTELLIGENCE
The work of human beings for self-knowledge has made significant progress with
technological developments. At the latest point reached today, an AI chat robot called
ChatGPT has been put into use by a research company called OpenAI. In the modern world,
where computers and computer systems have become an indispensable part of life, major
developments have occurred in software and hardware systems and in data science. These developments have led to the widespread use of AI technologies, which can process information, learn, reason, solve problems, and make decisions quickly, accurately, and effectively, in the many areas where more information is obtained every day and where computer technologies are used.
Studies of ANNs in neurobiology were drawn to the field of engineering with the
creation of the first artificial nerve cell in 1943 by neurologist Warren McCulloch and
mathematician Walter Pitts (Haykin, 1999, p. 38). The concept of AI has also taken its place
in the literature after this study. The concept of AI, which was found to be able to solve problems using the abilities to think and learn that are among the most important features of the human brain, has led to the emergence of new and more complex technologies thanks to the intense interest it has received from both researchers and commercial vendors.
The first attempt at learning in neural networks was the Hebbian learning rule, a
neurophysiological theory developed by Donald Hebb in 1949. According to the theory,
learning occurs through the exchange of messages between neurons in the human brain. In
other words, he suggested that learning ability can be assigned to network structures by
performing logical operations through networks of nerve cells (Haykin, 1999, p. 38).
Neural networks (NN), or more precisely ANN, are a technology with roots in a
variety of fields, including neurological sciences, mathematics, statistics, physics, finance,
computer science, and engineering (Haykin, 1999, p. 1). These computer systems, called
intelligent systems, have continued their development with the contribution of AI. The
developments in these two areas trigger each other, and their development processes
continue.
ANNs are neural networks, also known as simulated neural networks. They are a subset of ML and are at the heart of DL algorithms. These systems were inspired by the human brain; they took their names and formed their structures by mimicking the way biological nerve cells (neurons) send signals to each other. Artificial neural networks (ANNs), Machine Learning (ML), and Deep Learning (DL) are collectively referred to as artificial intelligence (AI) (Quazi, 2022). These three concepts are shown in Figure 3.1 below.
AI and machine learning (ML) are two emerging and interrelated technologies in
finance. Concepts such as AI, ML, DL, and data science are confusing and intertwined.
ANNs, a sub-branch of AI, are a statistical approach created to develop forecasting models.
Basically, ML is a particular embodiment of AI that develops techniques and uses algorithms
that enable machines to recognize patterns in datasets. On the other hand, DL is a subset of
ML that equips machines with the techniques needed to solve complex problems. Data
science is a separate field of study that applies AI, ML, and DL to solve complex problems
(p. 15).
AI, defined as understanding the human thinking structure and attempting to develop
computer operations that will reveal the likes of it, is an attempt to give programmed
computers the ability to think. AI studies, which have been going on since the 1950s for the
development of systems that think and act like human beings, have spread to fields such as
engineering, neurology, and psychology since they were aimed at imitating humans at one
point. Despite significant advances, the point reached today in studies for developing
systems that can think and behave like humans is that AI has not yet been fully developed.
Leaving aside the discussions on the possibility of AI, studies on this subject continue along
with studies in different fields that support this field.
ANNs technology is one of the various fields that emerged within the scope of AI
studies and that at some point provided support to those studies. Therefore, ANNs, which is
a sub-branch of AI, forms the basis of systems that can learn. ANNs that imitate the neuron,
which is the basic processing element of the human brain, in a formal and functional way,
are programs created for a simple simulation of the biological nervous system in this way.
Therefore, AI technology, which is thought to be able to transfer the ability to learn by experimenting (experience) to the computer environment, provides a computer system with a remarkable capacity to learn from input data and offers many advantages. This technology, which offers various advantages and develops day by day, is used in economics and statistics, as it is in many other fields today. ANNs are frequently used in areas that require identifying the structure contained in the data, such as forecasting and prediction, because they are known as universal function approximators.
Technically, the most basic task of an ANN is to determine an output set that can
correspond to an input set shown to it. In order to do this, the network is trained with
examples of the relevant event and gains the ability to generalize. ANNs are also called
correlated networks, parallel distributed networks, and neuromorphic systems (2012, p. 30).
4. DEVELOPMENT PROCESS OF ARTIFICIAL
INTELLIGENCE
The history of AI begins with an interest in neurobiology and the application of the
obtained information to computer science. When AI studies are examined, it is discovered
that the majority of the research is a continuation of each other. In other words, the oldest
developed AI structures and learning algorithms are still used today, and developments in
this field are strongly related to previous developments. Due to the succession of these
studies, AI studies developed rapidly, but some studies conducted later showed that previous
studies were insufficient, which caused the studies in the field of AI and the support given
to these studies to pause until these problems were resolved mel, 2006, p. 37).
For many years, scientists have studied how the human brain works and its functions.
The first work providing information on brain functions was published in 1890. It is known
that before 1940, some scientists such as Helmholtz, Pavlov, and Poincare worked on the
concept of an ANN. However, it cannot be said that these studies had engineering value. The first foundations of ANNs were laid in an article published in
1943 by neuropsychologist McCulloch and mathematician Pitts, who started their research in the early 1940s. McCulloch and Pitts made a model in the form of a simple mathematical representation of the biological neuron (Zurada, 1992, p. 30). Thus, inspired by the
computational ability of the human brain, the first neuron model or simple neural network
was modeled, and the first mathematical model of the artificial nervous system was
developed. They tried to determine the learning rules by revealing that cells work in parallel
with each other (McCulloch & Pitts, 1943, p. 118).
developed later and formed the basis of multi-layer perceptrons, which were revolutionary
in ANNs (Zurada, 1992, p. 19).
With the development of computers in the 1950s, it became possible to model the
foundations of theories about human thought. Nathanial Rochester, a researcher at IBM
research labs, led efforts to create a neural network simulation. Although the first attempt
was unsuccessful, subsequent attempts were successful. After this stage, the search for traditional computational methods gave way to the investigation of neural computation methods. Until then, Rochester and his team had defended thinking machines by citing evidence from their own work. The Dartmouth Summer Research Project on Artificial Intelligence, held in 1956, and the work that followed increased the discussion of, and therefore the evidence on, AI and ANNs. One of the results of this project is that it encouraged interest in both AI and ANN research (Anderson & McNeill, 1992, p. 17).
In 1959, Bernard Widrow from Stanford University and his student Marcian Hoff
developed two Perceptron-like models, which they named ADALINE (Adaptive Linear
Elements) and MADALINE (Multiple Adaptive Linear Elements) (p. 8).
The work done by Marvin Minsky and Seymour Papert in 1969 turned into a book
called Perceptrons . The authors specifically claimed in this book that sensors based on
ANNs have no scientific value and cannot solve nonlinear problems, and they used the
example of the XOR1 logic problem, which such networks could not solve, to prove their point. This situation has
caused AI studies to enter a period of stagnation. Although the studies were interrupted in
1969 and the necessary financial support was cut, some scientists continued their studies. In
particular, the studies of researchers such as Shun-Ichi Amari, Stephen Grossberg, Gail A.
Carpenter, Teuvo Kohonen, and James A. Anderson came to fruition in the 1980s, and new
studies on AI began to be put forward. Adaptive Resonance Theory (ART), developed by
Grossberg and Carpenter in 1976, has been the most complex artificial neural network
developed for unsupervised (teacherless) learning.
Studies by Hopfield in 1982 and 1984 showed that ANNs can be generalized and can
produce solutions to problems that are difficult to solve, especially with traditional computer
programming. Hopfield used similar studies created in 1972 by electrical engineer Kohonen
and neuropsychologist Anderson, working in different disciplines and unaware of each other.
In order to solve technical problems such as optimization, he developed the nonlinear dynamic Hopfield network and the Kohonen Self-Organizing Feature Maps (SOM) network, which is an unsupervised (teacherless) learning network (Akel & Karacameydan, 2018, p. 53).
Hopfield also revealed the mathematical foundations of ANNs in these studies (Anderson &
McNeill, 1992, p. 18). These studies formed the basis for unsupervised learning rules that
will be developed later.
In 1986, David Rumelhart and James McClelland, in their work called Parallel
Distributed Processing , showed that the previously claimed flaws on this subject could be
overcome by developing a new learning model, the Back Propagation Algorithm, a learning model for feedforward networks. Indeed, the XOR problem, which the single-layer perceptron could not solve, was solved with the discovery of multi-layer perceptrons. Today, various versions of this learning method are used in a variety of settings (p. 38). The development of this algorithm, which is still one of the most used training
algorithms, has revolutionized the field of ANNs (Zurada, 1992, p. 20). These developments
have led to an increased interest in ANN.
1 XOR (Exclusive OR) is a logic problem that can be defined as a special case of the problem of classifying points in a unit hypercube. Each point in the hypercube belongs to either class 1 or class 0. For detailed information, see (Zurada, 1992).
As a result of the increasing interest in AI, various conferences were held on this subject; the US-Japan Joint Conference on Neural Networks was held with many participants in Kyoto. During the conference, the Japanese presented their fifth-generation research; thereupon, there was a fear among Americans that they were falling behind Japan. Because of this fear, funds were quickly transferred to AI research in the United States, and AI was developed on a regular basis (Anderson & McNeill, 1992, p. 18).
The widespread use of ANNs in the field of finance is occurring in parallel with the
development of the backpropagation algorithm by Rumelhart and McClelland. Multi-layered
networks are needed in the solution of financial problems since financial problems include
a linear as well as a non-linear structure. Although there are multi-layered networks that have
been developed before, these networks have not been trained. However, with this algorithm
developed by Rumelhart and McClelland, this problem has been overcome, and ANNs have
attracted great interest in the field of finance. Today, ANNs are effectively applied in financial forecasting, for example for financial indices, stock prices, portfolio diversification, bond valuation, loan repayment rates, real estate prices, and the bankruptcy of enterprises.
In summary, AI was used in customer orders by DEC (Digital Equipment
Corporation), an American computer and technology company, and when it was seen that
40 million dollars were saved, the interest in AI increased again, and countries started to
make serious investments in this field. In 1975, Holland produced the genetic algorithm, based on the principle of natural evolution in living things, and in 1976 the Sejnowski Boltzmann Machine was developed and the backpropagation algorithm was applied to it. In 1978, Sutton and Barto developed a reinforcement learning model, and in 1982 hundreds of studies in this field were presented at the international ANN conference; Hecht-Nielsen's TRW MARK III became the first modern PC-based electronic neurocomputer (Kubat, 2019, p. 621).
Martens (2010) and Martens and Sutskever (2011) trained powerful deep and recurrent neural network (RNN) models with Hessian-Free optimization; it was later found that momentum stochastic gradient descent, on datasets with well-designed long-term dependencies (Long Short-Term Memory, LSTM), showed higher training performance than that obtained from Hessian-Free optimization (Sutskever et al., n.d., pp. 1-9). AI is a structure that is directly affected by the many disciplines it interacts with, such as computer science, engineering, biology, psychology, mathematics, logic, philosophy, business, finance, and linguistics. For this reason, its applications are seen in many fields, from medicine to the military and from economics to meteorology. Its use in so many different fields appears impressive, but it has found such a wide range of applications as a result of years of research from the 1940s to the present (2021, p. 3).
Table 4.1. Some important advances in neural network research
Researcher Contribution
In Table 4.1, the main developments in AI studies are summarized by year. After
the 2000s, advances in AI research accelerated. The increasing interest in this field has led
to the emergence of new AI techniques based on ANNs. One of them is DL technology.
However, in order to understand DL, it is necessary to understand the structure and workings
of ANNs. These issues will be discussed in detail in the following sections.
5. ARTIFICIAL INTELLIGENCE TECHNOLOGIES
5.1. Expert Systems
Expert systems are computer programs that have been developed to solve a problem in the same way that experts in a certain field would solve it, based on their expertise. Their main fields of application are medicine and biomedicine (Staub et al., 2015, p. 1478). Experts use their knowledge and experience to solve problems, and the computer should understand and store this knowledge and experience. Expert systems are used in applications that need both machine and human intervention. These systems consist of three parts: the rule base, the database, and the rule analyzer (Kubat, 2019, p. 625).
5.2. Machine Learning
ML is the field of computer science that creates algorithms to process large amounts of data and learn from them. It can learn the relationships between the inputs and outputs of events using examples. By interpreting the learned information and similar events, decisions are made or problems are solved. In order to understand ML, it is necessary to understand learning. Learning is the process of improving behaviors through the discovery of new information over time (p. 21). With classical programming in AI technology, linear or non-linear relationships cannot be established between the data and the results, and rules cannot be created; in that case, a solution cannot be found. For this reason, ML algorithms, which in recent years have mostly been used in military and commercial fields, are divided into three subgroups according to how they establish a relationship between data and results: supervised learning, unsupervised learning, and reinforcement learning algorithms.
5.3. Fuzzy Logic
Fuzzy logic, first introduced by L. Zadeh in 1965, is based on combining the advantages of ANNs, such as learning and decision-making. In addition, fuzzy-logic neural networks are among the most suitable methods for time series calculations (Staub et al., 2015, p. 1478). Today, many events take place under uncertain conditions. For events that are not known with certainty, experts use definitions such as "normal", "high", "approximate", and "low". Fuzzy sets were developed to represent and use non-statistical uncertainties in data and information. Every problem has its own uncertainty, and this method tries to describe that uncertainty mathematically. Therefore, these systems are suitable for uncertain or approximate inference, and especially for systems whose model cannot be expressed mathematically (Kubat, 2019, p. 623). They are technologies that make it easier to process uncertain information and to make decisions in situations where precise numbers cannot be given.
5.4. Genetic Algorithms
The genetic algorithm process, which is likened to the process of natural evolution, includes operators used in natural evolution such as reproduction, crossover, and mutation. It is a technology used in solving complex optimization problems. In order to solve a problem, random initial solutions are determined, and better solutions are sought by matching these solutions with each other. This search continues until the best result is produced. In genetic algorithm technology, it is accepted that the features that will produce the desired result in solving the problem pass from the initial solutions to the new solutions obtained from them, and from those to later solutions, through inheritance. In other words, it is a method based on natural genetics and natural selection mechanisms, which codes parameter sets and uses objective-function information. Machine learning, economics, the social sciences, and information systems are among its application areas (Staub et al., 2015, p. 1478).
5.5. Artificial Neural Networks
ANNs, or in other words neural networks, are systems that model the structure of the human brain, consisting of neurons and learning methods. In the 19th century, the studies of psychologists and neuropsychologists to understand the human brain formed the basis of ANNs. The first scientific study on artificial neural networks started with McCulloch and W. Pitts, as mentioned earlier. These systems, developed by taking into account the brain structure and the interaction of neurons with each other, try to fulfill the functions of the brain (Kubat, 2019, p. 623). These systems also focus on learning. They learn the relationships between events from examples and then make decisions, using the information they have learned, about examples they have never seen.
All of these AI technologies were developed to serve people in their daily lives and are still developing rapidly. The basic structure underlying them is the ANN, which is created by simulating the way a system of simple biological nerve cells (neurons) works, and which also forms the basis of artificial intelligence studies in general. ANNs, which enable computers to learn, learn to operate by trial and error and use this knowledge in problem-solving.
In the literature, it is seen that ANNs are used in many fields, from medicine to
defense, from psychology to finance, due to their superior success in forecasting and
classification studies. ANNs, which are briefly introduced here, will be discussed in more
detail in a separate section, since DL, which forms the basis of our study, is a sub-branch of
ANNs and works according to this system.
5.6. Deep Learning
Since DL technology and the ANNs on which it is based are the main subjects of the
study, they will be explained in detail in the following sections.
6. ARTIFICIAL NEURAL NETWORK
There is no single, agreed-upon definition for ANNs. Many definitions can be found
in broad and narrow contexts. In fact, some researchers argue that instead of giving a general
definition for ANN, ANN types should be defined within themselves. However, some of the
definitions in the literature related to ANN are as follows:
ANNs are logical software developed to imitate the working mechanism of the
human brain and to perform the basic functions of the brain such as learning, remembering,
and deriving new information through generalization (Yazici et al., 2007, p. 65).
In the shortest and simplest terms, without going into technical detail, an ANN can be defined as a computer program implementing a mathematical formula whose parameters are adapted with the help of a set of examples (Anderson & McNeill, 1992, p. 4).
An ANN is a system formed by connecting, in different ways, neurons that are simple processing elements and that, in essence, mimic the way the human brain works. Each neuron receives signals from other neurons or from outside, combines them, transforms them, and produces a numerical result (G. Zhang et al., 1998, p. 37).
This model, inspired by the biological working of the human brain, is an information-processing model that describes processes by means of examples. An ANN is a network of highly connected neurons organized in layers (G. P. Zhang, 2003, pp. 163–164).
An ANN is a parallel and distributed single- or multi-layered computing system
consisting of many simple processing elements interconnected with one-way signal channels
in a weighted form (Kamruzzaman et al., 2006a, p. 3).
The human nervous system has a very complex structure. The brain is the central element of this system; it is estimated to contain about 10^11 neurons (nerve cells) connected to each other through subnetworks, with more than 6×10^13 connections. Nerve cells are specialized cells that carry information through an electrochemical process. These nerve cells come in different shapes and sizes: some are only 4 microns (4/1000 of a millimeter) wide, while others are 100 microns wide. Although different types of nerve cells differ in shape and function, as shown in Figure 6.1 they all consist of four different regions: the dendrites, the nucleus and soma (cell body), the axon, and the junctions or synapses (Anderson & McNeill, 1992, p. 3).
Synapses can be viewed as connections between nerve cells. These are not physical connections but rather gaps that allow electrical signals to pass from one cell to another. These signals go to the soma, which processes them. The nerve cell creates an electrical signal and sends it via the axon to the dendrites of other cells. Dendrites form the input channels of a cell: they convert these signals into small electric currents and transmit them to the cell body. The cell body processes the signals coming through the dendrites and converts them into an output. The outputs produced by the cell body are sent via the axon to serve as inputs to other neurons (Anderson & McNeill, 1992, p. 4).
Figure 6.1. The structure of a biological nerve cell. Source: https://en.wikipedia.org/wiki/Dendrite
In a neuron, the dendrites receive input signals, the soma processes them, the axon converts them into output signals, and the synapses provide electrochemical contact between neurons. When this structure is adapted to ANNs, each element is mapped as follows: the synapses correspond to the weights, the dendrites to the summation function, the soma to the activation function, and the axon to the output.
An ANN consists of a large number of simple processing units called neurons, units,
cells, nodes, and process elements. Just as biological neural networks have nerve cells,
artificial neural networks also have artificial nerve cells. ANNs are a program designed to
simulate the way the simple biological nervous system works. Artificial neural networks
contain simulated nerve cells, and these cells connect to each other in different ways to form
the network. Figure 6.2 presents the structure of a general artificial neuron (Kamruzzaman
et al., 2006a, p. 3).
In the artificial neuron in Figure 6.2, the inputs are denoted Xi (i = 0, 1, …, n) and the corresponding weights Wji. Each input value is multiplied by its link weight. In the simplest form, these products are summed and sent to a transfer function, where the result is produced; this result is then converted to the output (Y). Here, the input vector is written as {X1, X2, …, Xn} and the corresponding weight vector as {Wj1, Wj2, …, Wjn}.
Figure 6.2. The structure of a general artificial neuron: inputs, weights, summation function, activation function, and output.
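As a concrete illustration of this processing element, the short Python sketch below (NumPy assumed; the values are arbitrary) computes the weighted sum of the inputs and passes it through a sigmoid activation function to produce the output.

```python
import numpy as np

def artificial_neuron(x, w, bias=0.0):
    """One processing element: weighted sum of the inputs followed by a sigmoid activation."""
    net = np.dot(w, x) + bias          # summation function: NET = sum(w_i * x_i) + bias
    return 1.0 / (1.0 + np.exp(-net))  # activation function (sigmoid) -> output Y

x = np.array([0.5, 0.3, 0.2])          # inputs  {X1, X2, X3}
w = np.array([0.4, 0.7, 0.2])          # weights {Wj1, Wj2, Wj3}
print(artificial_neuron(x, w))         # output of the neuron
```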
ANNs, inspired by the biological nervous system, have the capacity to learn,
memorize, and reveal the relationship between data. Just as biological nervous systems are
made up of nerve cells, ANNs are made up of artificial nerve cells. An artificial nerve is a
simple structure that mimics the basic functions of a biological nerve. Neurons in the
network take one or more inputs and give a single output. This output can be output to the
outside of the artificial neural network, or it can be used as an input to other neurons.
Artificial nerve cells are called processing elements (neurons) in engineering science. Although there are differences among the cell models that have been developed, in general each processing element consists of five basic components: the inputs (X), the weights (W), the summation function, the activation function, and the cell output (Y). These components can be explained as follows (Haykin, 1999, pp. 11–51):
6.3.1. Inputs
It is the information coming to an artificial nerve cell from the outside world or other
cells. These inputs are determined by the examples the network is asked to learn. The
artificial neuron can receive information from the outside world, as well as from other cells
or from itself. A neuron can usually receive many simultaneous inputs. Each input has its
own relative weight.
6.3.2. Weights
Inputs show the information coming to the artificial neuron, while weights show the
importance of the information coming to the artificial neuron and its effect on the cell. The
weight 1 (w1) in Figure 6.2 shows the effect of input 1 (x1) on the cell. The fact that the
weights are large or small does not mean that they are important or unimportant. A weight
value of zero may be the most important event for that network. Weights can have positive
or negative values. A weight of value plus or minus indicates whether the effect is positive
or negative, while a zero indicates that there is no effect. Weights can have variable or fixed
values. All connections that provide the transmission of inputs between neurons in the
artificial neural network have different weight values. Thus, the weights act on each input of
each neuron. The information (inputs) received from the external environment reaches a cell together with its weights. The net input is calculated from these values (different methods can be used), passed through the activation function, and the net output value is obtained. The important point is that the error between the actual values and the output values should be as small as possible.
6.3.3. Summation Function
The summation function is the function that calculates the net input to a cell. Although
different functions are used for this, the most common one is to find the weighted sum. Here,
each incoming input value is multiplied by its own weight. Thus, the net incoming input is
found. However, in some cases, the addition function may not be such a simple operation.
Instead, it can be much more complex, such as minimum (min), maximum (max), mode,
product, majority, or a few normalization functions. The algorithm that will combine the
inputs is usually also determined by the chosen network architecture. These functions can
generate values differently, and these values are forwarded. The total input of any neuron is
equal to the weighted sum of the values from other neurons and the threshold value
(Kamruzzaman et al., 2006a, p. 3).
Figure 6.3. The summation function
6.3.4. Activation Function
It is the function that determines the output the cell will produce by processing the net input obtained from the summation function. The purpose of the activation function is to allow the output of the summation function to change when time is involved. As with the summation function, various types of activation functions can be used, depending on the task the cell is to perform. For example, in multilayer perceptron models the activation function must be a differentiable function.
Once the NET value has been obtained by the summation function, the next step, the activation function, is applied. It determines the output to be produced by the current network structure by processing the NET input value with the selected activation function. By means of nonlinear functions, which can be used instead of a threshold function or a linear function of the inputs, it is possible to model nonlinear input-output relations with ANNs. The activation function is therefore the function that provides the linear or nonlinear mapping between the input and the output.
The choice of the activation function largely depends on the data of the neural network and on what it is desired to learn. Generally, a nonlinear activation function is chosen rather than a linear one, because with linear functions the output is simply proportional to the input. The most suitable function for a given problem is determined by trial and error; some commonly used activation functions are given in Table 6.2. The sigmoid function (Logsig) or the hyperbolic tangent function (Tansig) is preferred as the activation function in the Multilayer Perceptron (MLP) model, which is the most widely used model today. There is no strict rule about which summation or activation function to use; the researcher decides which one to use in which layer by trial and error.
Table 6.2. Some activation functions

Function | F(NET)
Step (Heaviside) function | F(NET) = 1 if NET ≥ 0; 0 otherwise
Sigmoid function (Logsig) | F(NET) = 1 / (1 + e^(-NET))
Hyperbolic tangent function (Tansig) | F(NET) = (e^NET - e^(-NET)) / (e^NET + e^(-NET))
Threshold value function | F(NET) = 0 if NET ≤ 0; NET if 0 < NET < 1; 1 if NET ≥ 1

Source: Howard Demuth and Mark Beale (2004), pp. 1-7; Kamruzzaman, Joarder, Rezaul K. Begg, and Ruhul Sarker (2006), p. 4.
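The functions in Table 6.2 can be written directly in code. The following Python sketch (NumPy assumed) implements the step, sigmoid, and hyperbolic tangent functions listed above.

```python
import numpy as np

def step(net):            # step (Heaviside) function
    return np.where(net >= 0, 1.0, 0.0)

def sigmoid(net):         # sigmoid (logsig): output in (0, 1)
    return 1.0 / (1.0 + np.exp(-net))

def tansig(net):          # hyperbolic tangent (tansig): output in (-1, 1)
    return np.tanh(net)

net = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (step, sigmoid, tansig):
    print(f.__name__, f(net))
```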
6.3.5. Output
It is the output value determined by the activation function. The output produced is
sent to the outside world or to another cell. The cell can also send its own output as input to
itself. In fact, there is only one output value from a neuron. The same value goes to more
than one neuron as input. An artificial neural network can have a single or multi-layered
structure. The number of cells in the layers may differ from each other. As the number of
cells in the intermediate layers increases, the process may become more difficult, but the
result may be better. Therefore, the number of neurons to be used in the study, the number
of layers, the number of neurons in the layers, and the selection of the addition and activation
functions will enable the network to learn and make the closest estimation to the truth with
the least error. This is possible only by trial and error. This procedure is carried out using
computer programs that have been created. Matlab, Python, and Neurosolution are some of
these programs.
The easiest way to design a structure is to separate the elements into layers. There
are three parts to the layering here. These are grouping neurons into layers, grouping
connections between layers, and finally grouping summation and transfer functions. This
situation is illustrated in detail in Figure 6.4. In other words, the coming together of nerve
cells cannot be random. In general, they come together in 3 layers: the input layer, hidden
layer or intermediate layer, and output layer, and in each layer, in parallel, they form the
network (Kubat, 2019, p. 668). These layers are:
6.4.1. Input Layer
Neurons in this layer receive information from the outside world and transfer it to the
hidden layers. In other words, inputs are information coming to neurons. In some networks,
there is no information processing at the input layer. They just pass the input values to the
next layer. For this reason, some researchers do not add this layer to the number of layers of
networks.
6.4.2. Hidden (Intermediate) Layer
Information from the input layer is processed and sent to the output layer. The
processing of this information is carried out in hidden layers. In a network, the input and
output layers consist of a single layer, while the hidden layer can consist of more than one
layer. The hidden layers contain a large number of neurons, and these neurons are fully
connected with other neurons. The selection of the number of neurons in the hidden layer is
very important in terms of defining the size of the network and knowing its performance. In
addition, increasing or decreasing the number of neurons and layers in the hidden layer
affects whether the network is simple or complex.
6.4.3. Output Layer
The neurons in this layer process the information from the hidden layer and produce
the output that the network should produce for the input set (sample) presented from the
input layer. The output produced is sent to the outside world.
Figure 6.4. The layered structure of an artificial neural network (inputs, input layer, hidden layer, output layer, outputs). Source: p. 53.
The neurons in each of these three layers and the relationships between the layers are
shown in Figure 6.4. The round shapes in the figure show neurons. There are parallel
elements in each layer, and the lines connecting these elements show the connections of the
network. Neurons and their connections form an artificial neural network. The weights of these connections are determined during learning.
6.5. Architectural Structures of the Artificial Neural Networks
In general, ANNs are classified according to three main criteria. One of these criteria
is the connection structure of ANNs, which is also called the architecture of the network.
The artificial neural network's connection structure differs from one another based on the
directions of the connections between the neurons or the flow directions of the signals in the
network. While some networks are configured as feedforward, others contain a feedback (recurrent) structure. The second criterion is the learning strategy used while the network is being trained. The third classification criterion is the number of layers; according to this classification, networks are divided into single-layer and multi-layer networks.
Feedforward Networks
Neurons in a feedforward network are usually divided into layers. Neurons in each
layer are associated with neurons in the next layer by connection weights. However, the
layers do not have any connection among themselves. In feedforward networks, information
flow is carried out in one direction from the input layer to the output layer without feedback
(Fathi & Maleki Shoja, 2018a, p. 251). This is also called the activation direction. An
example of this type of artificial neural network is a single- or multi-layer perceptron. Such
networks are trained using the supervised learning technique.
Figure 6.5. The structure of a feedforward artificial neural network: input layer (x1, x2, x3, …, xn), hidden layer (H1, H2, …, Hp), output layer (Y), bias units (+1), and connection weights vij and wjk.
In Figure 6.5, the hidden neurons are shown as {H1, H2, …, Hp}, Y represents the output neuron, vij is the weight of the connection from input neuron i to hidden neuron j, and wjk is the weight of the connection from hidden neuron j to output neuron k. The units shown as +1 are the threshold (bias) values.
The input layer transmits the information it receives from the external environment to the neurons in the hidden layer without making any changes. This information is processed in the hidden layer and the output layer to determine the network output. With this structure,
networks perform a nonlinear static function. The most well-known backpropagation
learning algorithm is used effectively in the training of this type of artificial neural network,
and sometimes these networks are also called backward propagation networks. The
feedforward backpropagation architecture was developed in the 1970s. Generally, a
backpropagation network has an input layer, an output layer, and at least one hidden layer.
There is no restriction on the number of hidden layers, but usually, one or two hidden layers
are used (Haykin, 1999, p. 22).
In backward propagation networks, the number of layers and the selection of the
number of neurons in each layer are important decisions in terms of affecting the
performance of the network. There are no clear selection criteria for what these numbers will
be. However, there are general rules that must be followed. These rules can be expressed as:
1. As the relationship between input data and output becomes more complex,
the number of neurons in the hidden layers also increases.
2. If the subject matter can be divided into several stages, it may be necessary
to increase the number of layers.
3. The amount of training data used in the network constitutes the upper-limit criterion for the number of processing elements (neurons) in the hidden layers.
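As an illustrative sketch only, a feedforward backpropagation network of the kind described above can be defined in a few lines with the Keras library in Python; the data, layer sizes, and training parameters below are arbitrary and would in practice be chosen by trial and error, as noted above.

```python
# Illustrative only: a small feedforward (backpropagation) network in Keras.
import numpy as np
from tensorflow.keras import layers, models

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))                 # 4 input neurons
y = np.sin(X).sum(axis=1, keepdims=True)      # 1 output neuron (an arbitrary nonlinear target)

model = models.Sequential([
    layers.Input(shape=(4,)),
    layers.Dense(8, activation="tanh"),       # hidden layer with 8 neurons
    layers.Dense(1, activation="linear"),     # output layer
])
model.compile(optimizer="sgd", loss="mse")    # training by backpropagation (gradient descent)
model.fit(X, y, epochs=50, batch_size=32, verbose=0)
print("training MSE:", model.evaluate(X, y, verbose=0))
```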
Recurrent Networks
A recurrent neural network is a network structure in which the outputs in the output
layer and hidden layers are recurrent to the input units or previous intermediate layers. In
this network, there are feedback (recurrent) connections between neurons, and such networks are therefore said to have dynamic memory. In recurrent networks, the output of any cell can be sent directly back to the input layer and used as an input again. Recurrence can occur between cells within a layer as well as between neurons in different layers. ANNs with recurrent connections exhibit nonlinear dynamic
behavior. As examples of these networks, Hopfield, Elman, and Jordan nets can be given.
Figure 6.6 shows the structure of a recurrent artificial neural network structure (Haykin,
1999, p. 24).
Figure 6.6. The structure of a recurrent artificial neural network: the outputs (z1, z2, …, zk) are fed back to the inputs through unit-delay operators (z^-1).
Recurrent ANNs can be developed with models with richer dynamics than
feedforward ANNs. However, feedforward networks are more widely applied than recurrent networks in both academic and practical fields. This is because recurrent networks are difficult to
implement in practice. In particular, the fact that recurrent networks can be created with
many different structures may prevent specialization in a particular model structure and
make the training phase difficult due to the inconsistency of the training algorithms (Kaastra
& Boyd, 1996, p. 217). In addition, the training of recurrent networks takes a long time. In
particular, as the number of data points in the training set increases, the training time gets
longer. For this reason, a feedforward network structure is preferred for solving problems
related to multivariate and long time series.
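For comparison with the feedforward sketch, the illustrative Python fragment below defines a small recurrent network with the Keras library; an LSTM layer, the recurrent architecture employed later in this study, reads a short window of past values of a series and produces a one-step-ahead output. All sizes and parameters here are arbitrary.

```python
# Illustrative only: a small recurrent (LSTM) network for a univariate time series in Keras.
import numpy as np
from tensorflow.keras import layers, models

series = np.sin(np.arange(300) / 10.0)                  # an arbitrary example series
window = 12
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X.reshape((-1, window, 1))                          # (samples, time steps, features)

model = models.Sequential([
    layers.Input(shape=(window, 1)),
    layers.LSTM(16),                                    # recurrent hidden layer
    layers.Dense(1),                                    # one-step-ahead output
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20, batch_size=32, verbose=0)
```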
The main feature of ANNs is their ability to learn. The basic philosophy of learning
is to learn the relationships between the inputs and outputs of an event by using actual
examples of that event, and then determine the outputs of new examples that will occur later
according to these relationships. Here, it is accepted that the relationship between the inputs
and outputs of the examples related to an event contains information that will represent the
overall event (p. 24).
The process of determining the weight values of the connections between the neurons
of an artificial neural network is called training the network. The learning process is started
by randomly assigning these weight values at the beginning. As the determined samples are
introduced to the network, the weight values of the network change, and the network
continues this process until it produces the desired outputs. Training of the network is
complete when the network is able to produce the desired outputs and make generalizations
about the events it represents. This is called network learning. Learning styles that enable the best weight set to be found during the solution of the problem are divided into three categories: supervised (with a teacher), unsupervised (without a teacher), and reinforcement learning (p. 26). In the literature, these learning methods are also referred to as machine learning algorithms, which are frequently used in time series data analysis and are also a sub-branch of ANNs.
Supervised Learning
In supervised learning, a teacher or supervisor helps the learning system
to learn the event. The supervisor illustrates the event that will be taught to the system as an
input/output set. In other words, for each sample, both the inputs and the outputs that need
to be generated in return for those inputs are sent to the system. The task of the system is to
map the inputs to the outputs determined by the supervisor. In this way, the relationships
between the inputs and outputs of the event are learned.
In this learning, the difference between the outputs produced by the network and the
target outputs is considered an error, and this error is minimized. The process is repeated until acceptable accuracy is reached. In these trials, the weights of the connections are changed so as to give the most appropriate output. If the intended output cannot be produced for a given input, the connection weights are changed to minimize the error in the output value of the network. For this reason, supervised learning requires a supervisor, who evaluates the performance of the network and guides the learning process so that performance gradually improves. Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are the performance criteria used in supervised learning algorithms. Examples of supervised learning algorithms are the Widrow-Hoff Delta Rule and Rumelhart and McClelland's Generalized Delta Rule (the backpropagation algorithm). The operation of a supervised learning algorithm is shown below.
Figure 6.7. The operation of a supervised learning algorithm: the inputs and the expected outputs are presented together, and the training algorithm adjusts the weights according to the error between them.
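The two performance criteria mentioned above can be computed directly. The short Python sketch below (NumPy assumed) shows the definitions of MAE and RMSE used to compare expected and produced outputs.

```python
import numpy as np

def mae(expected, produced):
    """Mean Absolute Error between target outputs and network outputs."""
    return np.mean(np.abs(np.asarray(expected) - np.asarray(produced)))

def rmse(expected, produced):
    """Root Mean Squared Error between target outputs and network outputs."""
    return np.sqrt(np.mean((np.asarray(expected) - np.asarray(produced)) ** 2))

expected = [1.0, 0.0, 1.0, 1.0]
produced = [0.9, 0.2, 0.7, 1.1]
print("MAE :", mae(expected, produced))
print("RMSE:", rmse(expected, produced))
```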
Unsupervised Learning
In unsupervised learning, there is no teacher helping the system learn. Only input
values are shown to the system. The system is expected to learn the relationships between
the parameters in the examples by itself. This is an algorithm mostly used for classification
problems. Only after the learning of the system is finished can the labeling that shows what the outputs mean be done by the user. Examples of this type of learning are the Hebb, Hopfield, and
Kohonen learning algorithms and ART networks. The forecasting of such networks is not
evaluated by how well the network predicts a particular readily observed target variable
(Anderson & McNeill, 1992, p. 41).
Reinforcement Learning
In this type of learning, a teacher assists the network. However, instead of showing the system the output set that should be produced for each input set, the supervisor expects the system to produce the output corresponding to the inputs shown to it and generates a signal indicating whether the produced output is correct or incorrect. The system continues the learning process by taking this signal from the supervisor into account. An example of a network using this learning method is the Learning Vector Quantization (LVQ) model. It is used in many fields such as game theory, statistics, information theory,
simulation-based optimization, and genetic algorithms.
6.5.3. Artificial Neural Network Classification Based on the Number of Layers
According to the number of layers, ANNs are divided into two types: single-layer
neural networks and multilayer neural networks.
Single-layer neural networks consist of only an input and an output layer. Every network has one or more inputs and outputs, the output units are connected to all input units, and every link has a weight. In these networks there is also a threshold (bias) value that prevents the values of the neurons, and thus the output of the network, from being zero; the input of the threshold value is always +1. Because there is no hidden layer to provide nonlinearity, this type of network is mostly used for linear problems.
In single-layer networks, each network has one or more inputs (Xi; i = 1, 2, …, n) and only one output (Zk; k = 1), and each link has a weight (Wik; i = 1, 2, …, n; k = 1). The artificial neural network model in Figure 6.8 has a single-layer structure.
Figure 6.8. A single-layer artificial neural network: inputs x1, x2, …, xn, a bias unit (+1), connection weights, and a single output zk.
The output of the network is found by summing the weighted input values and the threshold (bias) value. This sum is passed through a transfer function to calculate the output of the network. The process is formulated as follows (p. 60):

$$z_k = f\left(\sum_{i=1}^{m} w_i x_i + \theta\right)$$
In single-layer perceptrons, the output function is linear. In other words, the examples shown to the network are divided between two classes, and the network tries to find the line that separates the two classes. For this reason, the threshold value function is used. The most
important single-layer ANNs are the Simple Perceptron Model (SPM), the Adaptive Linear
Element Model (ADALINE), and the Multiple Adaptive Linear Element Model
(MADALINE).
The Simple Perceptron Model was first developed by Rosenblatt in 1958 to classify
successive processes in a certain order. It is a single-layered, trainable, and single-output
model that emerged as a result of studies carried out to model brain functions. It is based on
the principle that a nerve cell takes more than one input and produces an output. The output
of the network is a logical value consisting of one or zero. The threshold function is used to
calculate the value of the output (p. 61).
It is the first artificial neural network to be trainable. In the simple perceptron model,
the output value is derived when the net value, which is the sum of the products of the inputs
and their corresponding weight values, exceeds the threshold value. Learning about the
artificial neural network is possible by changing these weights. The most important feature
of the simple perceptron model is that it converges to the correct weights for the input variables presented to the network, provided that a solution to the problem exists. The simple perceptron model is the basis of the multilayer networks that were developed later and that proved revolutionary for artificial neural network models.
Multilayer perceptrons consist of an input layer, an output layer, and one or more hidden layers between these two layers. This
feature is the main feature that distinguishes multi-layer perceptrons from single-layer
perceptrons. Multilayer perceptrons are used in solving complex problems, especially in
predictions. Because this network has the ability to automatically transform into a non-linear
structure with a series of operations performed in the hidden layer of its structure (G. Zhang
et al., 1998, pp. 37 38).
The inadequacy of single-layer perceptrons in solving nonlinear problems led to the idea that ANNs do not work. However, the multilayer perceptrons developed as a result of studies to solve the XOR problem were an important step in the historical development of ANNs. This research resulted in the emergence of the MLP and its learning rule, the backpropagation algorithm. The Multilayer Perceptron Model (MLP) is also called the Error Propagation Model or the Back Propagation Model.
The multi-layer artificial neural network structure is given in Figure 6.10. Most ANNs used in practice have an MLP structure. The first layer of the MLP is the input layer and the last layer is the output layer.
Between the input layer and the output layer, there is at least one hidden layer with hidden
neurons. Each neuron in the hidden layer has a nonlinear transfer function, and the results
obtained through these functions provide input to the next neurons (Smith & Gupta, 2000,
p. 1025). The two most commonly used transfer functions in MLP are the sigmoid function
and the hyperbolic tangent function.
Figure 6.10. A multilayer (backpropagation) network structure with an input layer, a hidden layer, an output layer, and bias units.
Figure 6.10 illustrates a simple backpropagation network structure that includes
an input layer, a hidden layer, and an output layer. Circles arranged in layers represent
processing elements, or neurons. There are three neurons in the input layer, and three
variables are introduced as input to the network. There are three neurons in the hidden layer
and three neurons in the output layer. Therefore, three variables are outputs from the
network. The values transmitted from the input layer to the hidden layer and the values
transmitted from the hidden layer to the output layer are weighted with a weight set. In the
network structure, the thick arrows represent the information flow during recall. Recall is the process of presenting new input data to a trained network and obtaining its output; therefore, backpropagation is not used during recall. Backpropagation is used only in the training process, so the information flow during training is indicated by all the arrows in the figure. There are no connections between neurons in the same layer, and the links only go forward.
6.6. Learning Algorithms in Artificial Neural Networks
Before an MLP can be used for forecasting, it must first be trained. In MLPs, the connection weights that allow the expected output to be calculated are determined during the training phase. In linear regression, this computation is easily done by minimizing the sum of the squared errors. ANNs, however, involve non-linear optimization by nature, which makes the training process more difficult and complex than linear regression. Although many training algorithms have been developed in the literature, the most widely used among them is the backpropagation algorithm developed by Werbos (1974) and Rumelhart et al. (1986) (G. P. Zhang, 2004a, p. 5).
The backpropagation algorithm has been one of the most important developments in
the history of ANNs. This algorithm can be applied to multilayer perceptrons consisting of
neurons with a continuous, differentiable activation function. The backpropagation
algorithm minimizes total error by adjusting weights with a gradient descent approach or
gradual decrease (G. P. Zhang, 2004b, p. 5).
The learning rule of the backpropagation network is a generalization of the Delta learning rule, which is based on the least squares method. For this reason, the learning rule is also called the Generalized Delta Rule (2012, p. 77).
The backpropagation algorithm allows one to find the weights that produce the most suitable solution for the given training set. It consists of two steps, forward computation and backward computation (2012, pp. 78–80; Smith & Gupta, 2000, p. 6): the forward computation stage computes the output of the network, while the backward computation stage rearranges the weights of the model based on the errors in the outputs. The algorithm thus finds the most suitable weight values for the training set presented to the network; the weights are adjusted by gradient descent.
The implementation process of the back propagation algorithm of a single hidden
layer and feed-forward MLP is as follows:
Step 1: After samples are collected and the topological structure of the network and
learning parameters are determined, the initial values of the weights are randomly assigned.
Step 2: Steps 3 through 9 are repeated until the necessary condition for the training
to stop is met.
Step 3: For each set of training data, steps 4 through 8 are repeated.
Step 4: The inputs (G1, G2, …, Gn) of a sample selected from the training set are presented to the input layer of the network. The inputs are transmitted to the intermediate (hidden) layer without being changed in the input layer; that is, the output of the k-th input neuron is $Ç_k^i = G_k$.
Step 5: Each input coming to the neurons in the hidden layer is multiplied by the corresponding weight {W1, W2, …, Wn}, and the net input of the j-th hidden neuron is calculated as follows:

$$NET_j^a = \sum_{k=1}^{n} w_{kj}\, Ç_k^i$$
The important thing here is that the activation function be differentiable; when the sigmoid function is used as the activation function, the output is as given below, where $\beta_j^a$ is the weight of the threshold value of the j-th element in the hidden layer.
$$Ç_j^a = \frac{1}{1 + e^{-(NET_j^a + \beta_j^a)}}$$
When these calculations are made by all neurons and finally the output values of the
output layer are found, the forward calculation phase ends. The activation function in the
hidden layer and the activation function in the output layer do not have to be the same. One
of them can be a sigmoid function and the other a tangent function or some other function.
Step 6: For the input presented to the network, the outputs produced by the network are compared with the expected outputs (E1, E2, …), and the difference, the error value (em), is distributed over the weight values of the network so that the error is reduced in the next iteration:

$$e_m = E_m - O_m$$

Figure 6.13. The error of an output unit
This value is the error obtained for a processing element (the neuron). To find the
Total Error (TE) for the output layer, all errors must be summed. The main purpose of
training the MLP network is to minimize this error. TE is formulated as:
$$TE = \frac{1}{2}\sum_{m} e_m^2$$
The error is distributed to the neurons in order to minimize the total error. This is done by changing the neuron weights: first the weights between the hidden layer and the output layer, and then the weights between the input layer and the hidden layer (or between hidden layers).
Step 7: Let $\Delta w_{jm}^{a}(t)$ denote the amount of change, in iteration t, of the connection weight between the j-th hidden neuron and the m-th output neuron. This change depends on the error factor $\delta_m$ of the output unit, which is obtained from the derivative of the activation function:

$$\delta_m = f'(NET)\, e_m$$

Figure 6.16. Output unit error factor

When the sigmoid function is used, this becomes:

$$\delta_m = O_m (1 - O_m)\, e_m$$

Figure 6.17. Output unit error factor for the sigmoid function

The amount of change is then computed from the error factor, the learning rate ($\lambda$), and the momentum coefficient ($\alpha$) as $\Delta w_{jm}^{a}(t) = \lambda\, \delta_m\, Ç_j^{a} + \alpha\, \Delta w_{jm}^{a}(t-1)$.

Step 8: After calculating the amount of change, the new values of the weights between the hidden layer and the output layer in iteration t are calculated as follows:

$$w_{jm}^{a}(t) = w_{jm}^{a}(t-1) + \Delta w_{jm}^{a}(t)$$

Figure 6.18. The new values of the weights
In addition, the weights of the threshold values are changed. If the threshold-value weight of the m-th neuron in the output layer is represented by $\beta_m^{ç}$, the amount of change is given as follows (since the output of the threshold unit is constant and equal to 1):

$$\Delta\beta_m^{ç}(t) = \lambda\, \delta_m + \alpha\, \Delta\beta_m^{ç}(t-1)$$

The new weight value of the threshold in iteration t is:

$$\beta_m^{ç}(t) = \beta_m^{ç}(t-1) + \Delta\beta_m^{ç}(t)$$
In the second case, the errors of all the elements in the output layer must be taken into account when changing the weights between the input layer and the intermediate layer (or between two intermediate layers). If this weight change between the input layer and the intermediate layer is represented by $\Delta w_{kj}^{i}(t)$, the amount of change is:

$$\Delta w_{kj}^{i}(t) = \lambda\, \delta_j^{a}\, Ç_k^{i} + \alpha\, \Delta w_{kj}^{i}(t-1)$$

The factor $\delta_j^{a}$ is of the form:

$$\delta_j^{a} = f'(NET) \sum_{m} \delta_m\, w_{jm}^{a}$$

Figure 6.22. The $\delta_j^{a}$ factor value

and when the sigmoid function is used, the factor value is:

$$\delta_j^{a} = Ç_j^{a}\,(1 - Ç_j^{a}) \sum_{m} \delta_m\, w_{jm}^{a}$$

Figure 6.23. The $\delta_j^{a}$ factor value for the sigmoid function
The new values of the weights are:

$$w_{kj}^{i}(t) = w_{kj}^{i}(t-1) + \Delta w_{kj}^{i}(t)$$

Figure 6.24. The new values of the weights

Similarly, the amount of change in the threshold-value weights of the hidden layer neurons is:

$$\Delta\beta_j^{a}(t) = \lambda\, \delta_j^{a} + \alpha\, \Delta\beta_j^{a}(t-1)$$

and the new values of these weights for iteration t are given by the equation below:

$$\beta_j^{a}(t) = \beta_j^{a}(t-1) + \Delta\beta_j^{a}(t)$$
As a result, the forward and backward calculation steps have been performed for one iteration. All of the weights are rearranged, and this adjustment of the weights continues until the termination criterion is met.
Step 9: If the error has reached the predetermined fault tolerance or the termination
criterion has been met, the training is stopped. Otherwise, the process is repeated starting
from Step 4.
The above-mentioned steps continue until the learning of the multilayer perceptron
is complete, that is, until the errors between the realized outputs and the expected outputs
are reduced to an acceptable level. For the network to learn, there must be a stopping
criterion. This is generally taken as the error falling below a certain level.
Input data is presented to the network and propagates throughout the network until it
reaches the output layer. This forward process produces the calculated output.
An error value is calculated for the network by comparing the calculated output with
the target output. To train the network, back propagation algorithms from supervised
learning algorithms are used. This algorithm propagates backwards through the network,
starting with the weights between the output layer neurons and the last hidden layer neurons.
When backpropagation is finished, the forward process starts again, and this cycle
continues until the error between the calculated output and the target output is minimized.
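The forward and backward computations described in Steps 1–9 can be condensed into a short NumPy sketch. The code below is an illustrative implementation of the generalized delta rule for a single hidden layer with sigmoid activations (trained here on the XOR problem mentioned earlier); it is not the code used in the empirical part of the study, and the notation is simplified.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# XOR problem: 2 inputs, 1 output (a classic test for a multilayer perceptron)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)   # input  -> hidden weights
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)   # hidden -> output weights
lr = 0.5                                                    # learning rate (lambda)

for epoch in range(20000):
    # forward computation
    H = sigmoid(X @ W1 + b1)                   # hidden layer outputs
    O = sigmoid(H @ W2 + b2)                   # network outputs
    E = T - O                                  # error e_m = E_m - O_m
    # backward computation (generalized delta rule)
    delta_o = O * (1 - O) * E                  # output layer error factor
    delta_h = H * (1 - H) * (delta_o @ W2.T)   # hidden layer error factor
    W2 += lr * H.T @ delta_o; b2 += lr * delta_o.sum(axis=0)
    W1 += lr * X.T @ delta_h; b1 += lr * delta_h.sum(axis=0)

print("total error:", 0.5 * np.sum(E ** 2))    # TE = 1/2 * sum(e_m^2)
print("outputs:", O.ravel().round(2))
```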
When the backpropagation algorithm is chosen as the learning algorithm, two parameters become important. The first of these is the learning rate (λ), and the other is the momentum coefficient (α). These two parameters are briefly described below.
Learning Rate (λ)
The learning rate has a significant impact on the performance of the network and determines the amount of change in the weight values of the links. This coefficient usually has a value between 0 and 1. When the learning rate is small, the training process takes a long time, while it takes a shorter time as this value grows. Very small values of the learning rate cause the learning process to slow down unacceptably, while very high values cause instability. When the learning rate is greater than 1, the network oscillates between local minima and no convergence occurs; in addition, a rate that is too low may not provide the opportunity to find the minimum (2012, p. 99; G. Zhang et al., 1998, p. 48).
Momentum Coefficient (α)
This coefficient, which affects the learning performance of the network, is defined as the addition of a certain proportion of the change in the previous iteration to the new amount of change (2012, p. 99). It has been proposed specifically to enable networks stuck in local solutions to escape with a jump and find better results, and it helps the network recover faster. Taking into account the momentum coefficient, which takes values between 0 and 1, reduces the number of steps and the total network error (G. Zhang et al., 1998, p. 48). The momentum coefficient helps to speed up the training process when a large learning rate is used, while minimizing the tendency to oscillate.
Determining the optimal learning rate and momentum coefficient is largely
experimental and heuristic. In addition, these parameters vary greatly depending on the
problem area of interest.
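The role of the two parameters can be seen directly in the weight-update rule. The fragment below is an illustrative Python sketch in which a fraction of the previous change (the momentum term) is added to the new change computed from the gradient; the numbers are arbitrary.

```python
import numpy as np

def update_with_momentum(w, grad, prev_delta, lr=0.1, momentum=0.8):
    """Weight update with momentum: delta(t) = -lr * grad + momentum * delta(t-1)."""
    delta = -lr * grad + momentum * prev_delta
    return w + delta, delta

w = np.array([0.2, -0.4])
prev_delta = np.zeros_like(w)
grad = np.array([0.05, -0.10])          # gradient of the error with respect to the weights
w, prev_delta = update_with_momentum(w, grad, prev_delta)
print(w, prev_delta)
```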
In the literature on neural networks, there are many learning rules used in learning
systems. The majority of these learning rules are based on the oldest and best-known Hebb
Learning Rule. Some important learning rules in use are given below.
6.7.1. The Hebb Learning Rule
It is the oldest and most famous learning rule, developed on a biological basis by
Canadian psychologist Donald Hebb in 1949. According to this rule, if a neuron receives
input from another neuron and both neurons are highly active (i.e., they have the same
mathematical sign), the weight of the connection between neurons must be increased. In
other words, if a cell is active, it tries to make the cell it is connected to active, and if it is
passive, it tries to make it passive (Hebb, n.d., pp. 60–78).
6.7.2. The Hopfield Learning Rule
This rule, which is similar to the Hebb rule, determines how much the connections
between the artificial neural network elements should be strengthened or weakened. The
main difference is that it also determines the size of the change to be made in the connection
weight. Accordingly, if both the input and the desired output are active or both are inactive,
the connection weight is increased by the learning coefficient; otherwise, it is decreased by
the learning coefficient. Strengthening or weakening of the weights is carried out with the
help of the learning rate. The learning coefficient is a fixed and positive value assigned by
the user, which generally takes a value between 0 and 1 (2012, p. 26).
6.7.3. The Delta Learning Rule
One of the most commonly used learning rules is the Delta Rule. This rule is an
improved version of the Hebb rule. It was developed by Widrow and Hoff (1960). It is based on adjusting the input connection weights in order to reduce the difference (delta) between the desired output and the actual output of the processing unit. For this reason, this algorithm is also known as the least-squares rule. The error is reduced by propagating it back from one layer to the previous layers; the process of correcting the error continues from the output layer back to the input layer (Anderson & McNeill, 1992, p. 30).
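A minimal Python sketch of the delta rule for a single linear processing unit is given below (illustrative only, NumPy assumed): the weights are shifted in proportion to the difference (delta) between the desired output and the actual output.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
d = X @ true_w                           # desired outputs

w = np.zeros(3)                          # initial weights
lr = 0.05                                # learning coefficient
for epoch in range(50):
    for x, target in zip(X, d):
        y = w @ x                        # actual output of the linear unit
        w += lr * (target - y) * x       # delta rule: reduce the difference (delta)
print(w)                                 # approaches the true weights
```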
6.7.4. The Gradient Descent Learning Rule
This rule is similar to the Delta rule. The adjustment of the weights can be performed
in a manner proportional to the first derivative of the error (gradient) between the desired
output and the actual output for a processing unit. The aim is to get rid of the local minimum
and catch the general minimum by reducing the error function (Kamruzzaman et al., 2006b,
p. 131). It is one of the most used optimization algorithms in ML. The magnitude of the learning rate is an important factor in achieving the optimal result. A low learning rate increases the processing time required to reach the optimum, whereas a learning rate that is too high does not reduce the error because the steps are too large and deviate from the optimum. Therefore, a suitably large learning rate for reaching the global optimum is chosen by trial and error.
Figure: The loss as a function of the training epoch under gradient descent. Source: https://medium.com/deep-learning-turkiye/gradient-descent-nedir-3ec6afcb9900
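The effect of the size of the learning rate can be illustrated with gradient descent on a simple one-dimensional error function; the sketch below is not tied to any particular network and uses arbitrary values.

```python
def gradient_descent(lr, steps=20, w=5.0):
    """Minimize E(w) = w**2 by gradient descent; dE/dw = 2*w."""
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

print(gradient_descent(lr=0.05))   # small rate: slow progress toward the minimum at 0
print(gradient_descent(lr=0.5))    # suitable rate: converges quickly
print(gradient_descent(lr=1.1))    # too large: the steps overshoot and the error grows
```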
6.7.5. The Kohonen Learning Rule
This rule was developed by Teuvo Kohonen (1982), inspired by learning in biological
systems. Neurons are thought to compete in order to learn to adjust their weights. According to this rule, the elements of the network compete with each other to have their weights changed: the cell producing the largest output becomes the winning cell (neuron), and its connection weights are updated. This means that this cell becomes stronger than the cells next to it, and it gains the capacity to excite or inhibit its neighbors. The Kohonen rule does not require a target output; therefore, it is an unsupervised learning method. This rule, also known as self-organizing, is used especially in studies on distributions. However, because its theoretical infrastructure is not fully developed, it has not yet become widespread in practice (Anderson & McNeill, 1992, p. 30; Kohonen, 2001, pp. 71–72).
ANNs, which have gained a wide application area in real-life problems, have been used to solve problems in many different areas, especially those that are difficult and complex, and generally very successful results have been achieved. They are used in many industries today, and there is no limit to their application area; however, they are mainly used in areas such as forecasting, modeling, and classification. ANNs, which emerged in the 1950s, became sufficiently powerful for general-purpose use only in the 1980s. Since they are the method that best describes the trend or structure in data, they are very suitable for estimation and forecasting operations. Numerous examples of the common real-life uses of ANNs are given in the literature (2012, pp. 203–206).
ANNs need special environments in which they can work due to their densely
connected and complicated processing structures. Therefore, ANNs run on computers with
special software prepared for this purpose. Today, special hardware is being developed to
run increasingly dense and complex neural networks and to process information faster.
ANNs can produce results from the information they hold, without any intervention, by performing learning, association, classification, generalization, prediction, feature determination, and optimization processes. They have the ability to make correct decisions for subsequent inputs by organizing themselves with the information given during the learning process. ANNs are used effectively in many fields today thanks to the following general features (2012, p. 31):
ANNs perform ML: The main function of networks is to enable computers to learn.
By learning from the events, it tries to make similar decisions in the face of similar events.
The working style of the programs is not similar to known programming methods: they use information-processing methods different from those applied in traditional programming and AI methods.
Nonlinearity: The neuron, which is the basic processing element of ANNs, is not linear; therefore ANNs, which are formed by combining neurons, are also nonlinear. Moreover, the use of a nonlinear activation function gives ANNs a nonlinear structure. For this reason, ANNs can be used to solve the nonlinear problems encountered in real life.
Learning: Neural networks learn and solve problems using examples from real
events. In order to learn about events, examples related to those events should be determined.
Artificial neural network gains the ability to make generalizations about the event by using
examples. In order for ANNs to show the desired behavior, they must be programmed in
accordance with their purpose. In other words, correct connections must be made between
neurons, and connections must have appropriate weights. Due to the complex nature of
neural networks, connections and weights cannot be preset or designed. As a result, ANN
should learn the problem by using training examples from the problem it is interested in, so
that it exhibits the desired behavior.
In order to operate ANNs safely, they must first be trained and their performance
tested: Training the network means showing the existing samples to the network one by one
and determining the relationships between the events in the example by running its own
mechanisms. The samples are divided into two sets, the training set and the test set. Each
network is first trained with the training set, and when the network starts to give correct
answers to all examples, the training work is complete.
Ability to generate information about unseen examples: The network can generate
information about examples it has not seen by making generalizations from the examples
shown to it.
Usability in detection events: Networks are mostly used to process information for
detection.
Ability to self-organize and learn: It is possible for the network to adapt to new
situations shown by examples and to learn about new events continuously.
Ability to work with incomplete information: Networks can work with incomplete
information after they are trained and can produce results even though there is missing
information in new samples. This does not degrade their performance. The network itself
can learn what information is important during training.
Having fault tolerance: Networks' ability to work with incomplete information ensures that they are fault-tolerant. Since an ANN consists of many neurons connected in
various ways, it has a parallel distributed structure, and the information of the network is
distributed over all connections in the network. Therefore, the inactivation of some
connections or even some neurons in a trained ANN does not significantly affect the
network's ability to produce accurate information. Therefore, their fault tolerance is higher than that of other methods.
Ability to process uncertain and incomplete information: After learning the events,
networks can make decisions by establishing relationships with the events they learned under
uncertainty.
Gradual degradation: The fact that networks are fault-tolerant causes their
degradation to be gradual. A network breaks down slowly and gently over time. This is due
to incomplete information or disruption of neurons. Networks do not degrade as soon as a
problem arises, they degrade gradually.
Distributed memory: In neural networks, information is spread over the network. The
values of the connections of neurons with each other indicate the information of the network.
A single link makes no sense. The network as a whole characterizes the entire event it learns.
Therefore, the information is distributed over the network, resulting in a distributed rather than a localized memory.
Hardware and Speed: ANNs can be implemented with very-large-scale integration (VLSI) technology due to their parallel structure. This feature is desirable in real-time applications, as it increases the fast information-processing capability of the neural network.
6.9. The Design of Artificial Neural Networks
In this part of the study, information will be given about how to design ANNs and
which criteria to consider during the creation of network architecture. The number of layers
in ANNs, how many neurons these layers will consist of, how the neurons should be
positioned relative to each other, and the flow directions of signals between neurons
determine the structure of the networks.
The process of determining the weight values of the connections between neurons in an ANN is called training the network. Initially, these values are randomly assigned, and the network changes them as examples are shown to it (2012, p. 55).
In the setup phase of the neural network estimator, the sample dataset is divided into
two datasets for training and testing of the network. There is no general rule for separating
data. However, the data type, amount of data, and the characteristics of the problem are
important factors in separating the data set. Inaccuracies in the selection of the training and
test dataset will affect the performance of the network (G. Zhang et al., 1998, p. 50). The
selected datasets should be at a level that can describe the sample space. There are few
suggestions in the literature for determining the training and test sets. Many researchers use 90% of the data as the training dataset, while the remaining 10% is used as the test dataset. Likewise, splits of 80%/20% or 70%/30% are frequently used in the literature to divide the data into the two subsets (G. Zhang et al., 1998, p. 50). The training samples are used to develop the artificial neural network model, while the test samples are used to evaluate the predictive ability of the developed model.
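In code, this separation is a single step. The illustrative Python sketch below sets aside the last 20% of a time-ordered dataset for testing, which respects the temporal order of observations such as monthly prices; the 80%/20% ratio is one of the splits mentioned above.

```python
import numpy as np

def train_test_split_ordered(X, y, test_ratio=0.2):
    """Split time-ordered data: the first part for training, the last part for testing."""
    split = int(len(X) * (1 - test_ratio))
    return X[:split], X[split:], y[:split], y[split:]

X = np.arange(100).reshape(-1, 1)          # placeholder features
y = np.arange(100, dtype=float)            # placeholder targets
X_train, X_test, y_train, y_test = train_test_split_ordered(X, y, test_ratio=0.2)
print(len(X_train), len(X_test))           # 80 training samples, 20 test samples
```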
In the learning process, weights are randomly assigned at the beginning, and the
weight values change as the samples are shown to the network according to the chosen
learning approach. The goal here is to find the weight values that will produce the correct
outputs for the examples shown to the network. A network that has reached the correct weight values is able to make generalizations about the event represented by the examples, and its learning is complete.
Figure 6.28. Representation of learning in error space: the error (E) as a function of the weight (W), with W* the weight giving the least error. Source: p. 82.
W* in the figure shows the weight vector with the least error. Multilayer networks are intended to reach this W* value, which represents the point at which the error for the problem is smallest. For this reason, a change of ΔW is made in each iteration so that the error is reduced by ΔE at each step.
It should be noted that the error surface of a problem will not always be as simple and two-dimensional as in Figure 6.28. Complex real-life problems have many minimum and maximum points on the error surface, and only one of these minima corresponds to the W* weight vector, the global minimum. As can be seen in Figure 6.28, although the weight vector W* gives the lowest error level, it may not always be possible to reach this point. This solution is the best the network can have, and the network tries to reach it during training. However, instead of stopping at the global minimum, which is the best solution point, the training process of multilayer networks may also stop at a minimum point (W1, W3) whose weight values give a higher error rate (Zurada, 1992, pp. 206–207). In this case, the multilayer network is said to fall into the local minimum trap during training (Kamruzzaman et al., 2006b, p. 131). The local minimum point may be at a level considerably higher than the global minimum level, or it may be close to it (2012, p. 83).
Figure 6.29. Local and global minima in the error space: the error (E) as a function of the weight (W).
As seen in Figure 6.29, although W* is the weight vector that gives the least error for solving the problem, it is often not possible to reach this error value in practice. Multilayer networks try to capture the W* solution during training, but they can sometimes get stuck at a different solution (a local minimum) from which the performance cannot be improved. For this reason, users accept a margin of error up to a certain level by specifying a tolerance value for the performance of the network. The problem is considered learned when the error
falls below the tolerance value. Since the errors of the W0 and W2 solutions in Figure 6.28
are above the acceptable error level, these solutions are unacceptable. These are called local
solutions. Although solutions W1, and W3 are not the best, their error level is lower than the
acceptable level. Although these are local solutions, they are acceptable solutions. As can be
seen, more than one solution can be produced for a problem. Therefore, it cannot be said that
multi-layer networks always produce the best solution; it is more accurate to say that they produce an acceptable one. Even if the solution produced is in fact the best solution, this is difficult to know, and in most cases it is not possible to know (p. 83).
Figure 6.28 also suggests the following possible reasons why the best result may not be found:
The training set presented to the network during training may not represent 100% of the problem space.
The correct parameters may not have been selected when the MLP was created.
The network's weights may not have been determined as desired at the start.
For these and similar reasons, the MLP may not reduce the error below a certain value during training. For example, the network may find the weights W1 and be unable to lower the error any further. W1 is then a local solution, not the best solution; it can be regarded as the local best solution, since the error has decreased to an acceptable level for the W1 vector. It may also be the case that a global solution cannot be reached at all. Everything depends on the design of the network, the nature of the samples, and the training process.
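As a simple numerical illustration of this local-minimum behaviour, the short Python sketch below (not taken from the thesis) runs plain gradient descent on a hypothetical one-dimensional error surface, E(w) = sin(3w) + 0.1w²; different starting weights settle into different minima of the same surface.

import numpy as np

# Illustrative error surface with several minima: E(w) = sin(3w) + 0.1 * w**2
def error(w):
    return np.sin(3 * w) + 0.1 * w ** 2

def error_gradient(w):
    return 3 * np.cos(3 * w) + 0.2 * w

def gradient_descent(w_start, learning_rate=0.01, iterations=500):
    w = w_start
    for _ in range(iterations):
        w -= learning_rate * error_gradient(w)   # small weight change that lowers the error
    return w

# Different starting weights settle into different minima of the same error surface
for w0 in (-2.0, 0.5, 2.5):
    w_final = gradient_descent(w0)
    print(f"start {w0:+.1f} -> weight {w_final:+.3f}, error {error(w_final):+.3f}")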
A network that memorizes its training set may also produce results during testing that are far from its performance during training. Users encountering this type of problem should investigate its cause (Kaastra & Boyd, 1996, pp. 229–231).
After the training of the network is completed, measuring whether it has learned (its performance) is called network testing. For testing, samples that the network has not seen during learning, that is, the test set, are used. The weight values of the network are not changed during testing. The inputs in the test set are given to the artificial neural network model, and the output value of the network is compared with the desired output value. The purpose of this process is to see whether the ANN model can make adequate generalizations. If the desired success is achieved in the training and testing stages, the artificial neural network model can be used.
While creating an ANN model, the stages of determining the structure and structural properties of the network, determining the properties of the functions in the neurons, and determining the parameters by selecting the learning algorithm should be carried out carefully and in accordance with the characteristics of the problem, because these stages are very important for the success of the results. When the learning algorithm to be used in the network is selected, the structure required by this algorithm will also be selected automatically. Table 6.3 shows the network types that are successful according to their intended use.
6.11.1. Determining the Number of Input Neurons and Output Neurons
It is easier to determine the number of output neurons than the number of neurons in the other layers. In problems involving the prediction of a time series, the number of output neurons is determined by the forecast horizon. In single-period forecasting, the number of output neurons is equal to 1; in multi-period forecasting, the forecast can be produced in two ways. The first is iterative forecasting, in which the estimated value of one period is used as an input for the next period; in this case, a single output neuron is sufficient. The second is the direct method, in which more than one period is estimated at the same time; in this case, the number of output neurons is equal to the number of periods to be estimated (G. Zhang et al., 1998, pp. 44–46).
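The difference between the two approaches can be sketched in Python with the tensorflow.keras API; this is only an illustration, and the layer sizes, the 12-lag window, and the 3-period horizon are assumptions rather than values taken from the thesis.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

n_lags, horizon = 12, 3          # 12 lagged inputs; forecast 3 periods ahead

# Iterative (one-period) forecasting: a single output neuron; the forecast
# is fed back as an input to produce the next period's forecast.
iterative_model = Sequential([
    Dense(8, activation="tanh", input_shape=(n_lags,)),
    Dense(1),
])

# Direct multi-period forecasting: one output neuron per forecast period.
direct_model = Sequential([
    Dense(8, activation="tanh", input_shape=(n_lags,)),
    Dense(horizon),
])

iterative_model.compile(optimizer="adam", loss="mse")
direct_model.compile(optimizer="adam", loss="mse")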
The hidden layer and the hidden neurons in this layer are of great importance in the
success of ANNs (G. Zhang et al., 1998, p. 42).
The number of hidden layers varies depending on the problem, the amount of data,
and the design. Usually one or two hidden layers (intermediate layers) are sufficient. Using
more hidden layers significantly reduces the speed of the network. It can also cause the
network to memorize rather than learn. Preferably, a three-layer structure consisting of input-
hidden-output layers can be used. If the result is not satisfactory, 2 or 3 intermediate layers
can be tried later. Some studies have shown that there is no need for structures with more
than two hidden layers to solve most prediction problems (G. Zhang et al., 1998, p. 44). In
addition, applications have shown that more than four layers in total adversely affect the
success of the network (Kaastra & Boyd, 1996, p. 225).
Increasing the number of hidden layers will increase the number of connections
between all processing elements in the network, thus increasing the risk of memorization,
leading to increased computation time for the network and poor prediction results. For this
reason, when determining the number of hidden layers, the most appropriate number should
be found by trial and error, taking the number of data into account.
Similarly, there is no magic formula for the number of hidden neurons; this task falls to the designer's skill. It is generally preferred to work with a small number of hidden neurons, because such networks have higher generalization ability. Nevertheless, some rules of thumb have been developed; they are not absolute, only commonly applied heuristics. For a 3-layer network with n input cells and m output cells, the number of neurons in the hidden layer can be taken as the square root of n·m. Depending on the type of problem and the amount of data, the number of hidden neurons can vary between 1.5 and 2 times this value. This is called the geometric pyramid rule. Baily and Thompson suggest that the number of hidden cells for a 3-layer network is 75% of the number of cells in the input layer (Kaastra & Boyd, 1996, p. 225).
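These rules of thumb can be written down as a small helper function. The sketch below is only illustrative; the call with 7 inputs and 1 output mirrors the variable counts used later in this study.

import math

def hidden_neuron_suggestions(n_inputs, n_outputs):
    """Rules of thumb from the text for a 3-layer network (heuristics, not hard rules)."""
    pyramid = math.sqrt(n_inputs * n_outputs)            # geometric pyramid rule
    return {
        "geometric_pyramid": round(pyramid),
        "pyramid_range": (round(1.5 * pyramid), round(2 * pyramid)),
        "baily_thompson_75pct": round(0.75 * n_inputs),   # 75% of input-layer size
    }

print(hidden_neuron_suggestions(n_inputs=7, n_outputs=1))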
Although there is no absolute rule for determining the number of neurons in the
hidden layer, the exact number depends on the structure of the network, the amount of data,
the type of problem, and the experience of the designer. It is determined by trial and error.
One of the most important factors affecting the performance of ANNs is the standardization of the data. The data can be standardized to reduce the influence of extreme values so that the available data can be modeled better. In addition, the method chosen for normalizing the data also affects the performance of the network. Data standardization prevents the negative effects of the cumulative totals of the data used during processing. The activation functions used in the hidden and output layers, which determine the normalization range, compress the output of a neuron into a range such as [0, 1] or [-1, 1]. Data standardization is carried out before the training process begins. The formulas frequently used in data standardization are as follows (G. Zhang et al., 1998, p. 49):
Linear transformation to the range [0, 1]: xn = (x0 − xmin) / (xmax − xmin)
Linear transformation to the range [a, b]: xn = (b − a)(x0 − xmin) / (xmax − xmin) + a
Statistical (z-score) standardization: xn = (x0 − x̄) / s
In these formulas, xn and x0 represent the normalized and original data, and xmin, xmax, x̄, and s represent the minimum, maximum, mean, and standard deviation values along the row or column, respectively.
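A minimal Python sketch of these three standardization approaches, applied to a hypothetical series, might look as follows.

import numpy as np

x = np.array([120.0, 135.0, 150.0, 90.0, 210.0])   # hypothetical raw series

# Linear transformation to [0, 1]
x_01 = (x - x.min()) / (x.max() - x.min())

# Linear transformation to an arbitrary range [a, b]
a, b = -1.0, 1.0
x_ab = (b - a) * (x - x.min()) / (x.max() - x.min()) + a

# Statistical (z-score) standardization using the mean and standard deviation
x_z = (x - x.mean()) / x.std()

print(x_01, x_ab, x_z, sep="\n")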
Although there are many performance measures for a neural network estimator, such
as modeling time or training time, the best and most important performance metric is the
accuracy of the estimation. The accuracy criterion is defined as the difference between the
true value and the predicted value. This difference is called the prediction error (G. Zhang et
al., 1998, p. 51).
MAE = Σ|et| / N
SSE = Σ(et)²
MSE = Σ(et)² / N
RMSE = √(Σ(et)² / N)
MAPE = (1/N) Σ|et / yt| × 100
Here, et is the estimation error, yt is the actual value, and N is the number of observations.
Among these measures of predictive accuracy, MSE is the most widely used. An important feature of this measure is that the prediction error can be decomposed into the sum of the variance and the squared bias. This shows that the MSE criterion depends only on the second moment of the joint distribution of realizations and predictions, and it is therefore a measure that provides useful information. It should be noted, however, that it does not provide complete information on the true distribution (G. Zhang et al., 1998, p. 52).
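For illustration, the error measures above can be computed with a few lines of NumPy; the true and predicted values below are hypothetical.

import numpy as np

def forecast_accuracy(y_true, y_pred):
    e = y_true - y_pred                       # prediction errors e_t
    n = len(e)
    mae = np.mean(np.abs(e))                  # mean absolute error
    sse = np.sum(e ** 2)                      # sum of squared errors
    mse = sse / n                             # mean squared error
    rmse = np.sqrt(mse)                       # root mean squared error
    mape = np.mean(np.abs(e / y_true)) * 100  # mean absolute percentage error
    return {"MAE": mae, "SSE": sse, "MSE": mse, "RMSE": rmse, "MAPE": mape}

y_true = np.array([1020.0, 1055.0, 990.0, 1110.0])
y_pred = np.array([1000.0, 1060.0, 1005.0, 1090.0])
print(forecast_accuracy(y_true, y_pred))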
6.12.1. Determining the Stopping Criteria
The stopping criterion can be determined in two ways: training is stopped either when the error falls below a certain value, that is, below an acceptable error level determined by the researcher, or when the network completes a specified number of iterations.
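A minimal sketch of these two stopping criteria, applied to a toy quadratic error surface rather than a real network, is given below.

import numpy as np

def train(weights, gradient_fn, error_fn, tolerance=1e-3, max_iterations=10_000,
          learning_rate=0.01):
    """Stop when the error falls below an acceptable level or when the
    specified number of iterations is completed, whichever comes first."""
    for iteration in range(max_iterations):
        weights = weights - learning_rate * gradient_fn(weights)
        if error_fn(weights) < tolerance:          # first stopping criterion
            return weights, iteration + 1, "error below tolerance"
    return weights, max_iterations, "iteration limit reached"

# Toy error surface: E(w) = (w - 2)^2 with gradient 2(w - 2)
w, iters, reason = train(np.array(0.0),
                         gradient_fn=lambda w: 2 * (w - 2),
                         error_fn=lambda w: (w - 2) ** 2)
print(w, iters, reason)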
The two most important factors that determine the success of an artificial neural
network application are the structure of the network and the learning algorithm. The network
structure plays a decisive role in the selection of the learning algorithm. There are many
learning algorithms to be used in the development of ANNs. It is known that some algorithms
are more suitable for some applications. The most used algorithm is the back propagation
algorithm (G. Zhang et al., 1998, pp. 47 48).
The advantages of ANNs are due to their non-linear structure and unique training
process. Generally, strengths of ANNs compared to other models include the ability to model
nonlinear structure, the ability to make generalizations, adaptability and flexibility,
information storage, error tolerance, and the absence of prerequisites and assumptions in
statistical or other modeling techniques (G. P. Zhang, 2003, p. 160).
The relationships between real-life events and the factors behind them are non-linear, and it is very difficult to model these relationships. ANNs, however, can handle such relationships more easily and can solve models that are difficult and complex to express mathematically (2012, p. 207). Thanks to the transfer functions they use, ANNs can produce models for nonlinear problems and perform effective predictive modeling. For this reason, ANNs are preferred over traditional prediction methods as an effective predictive tool.
Through their learning ability, ANNs can generalize, using known examples, about situations they have not encountered before. By using the data in the learning phase, the error between the input and the output is minimized, and an ANN model that gives the smallest error between the input and output variables can be established. Due to their adaptability and flexibility, ANNs can be retrained by changing the connection weights when new information arrives or the environment changes. This is one of the most basic features that distinguish ANNs from traditional statistical methods (G. Zhang et al., 1998, p. 36).
In order to solve a problem successfully using artificial neural networks, the problem must be modeled well. ANNs do not need any prior knowledge other than examples for modeling. ANN applications are both practical and relatively inexpensive; specifying examples and a simple program may be sufficient to solve the problem. The fact that ANNs can work in parallel facilitates their real-time use.
Apart from the advantages provided by artificial neural networks, there are also some disadvantages that should be considered (2012, p. 34). The hardware-dependent operation of ANNs is an important problem. Most of today's machines have serial processors, whereas ANNs are designed to work with parallel processors; because they perform parallel processing, they need very fast parallel hardware, and performing parallel operations on serial machines causes a waste of time. In addition, a different ANN structure should be developed for each problem, and this is done by trial and error. Neural networks also do not guarantee that the solution found is the best solution.
There is no certain set of rules in the creation of ANNs, in model selection, in
determining the topology of the network, in determining the number of layers, or in
determining parameters such as the learning rate and the number of neurons that should be
included in each layer. It is determined entirely based on the experience of the researcher.
This is also an important disadvantage. There is also no method to determine when the training of the network should end. Reducing the error of the network on the samples below a certain value is considered sufficient to complete the training; however, it cannot be said that optimal training has been achieved, and training may therefore take a long time.
The ANN does not give any information on how it transforms an input vector into an output vector. From an engineering point of view, neural networks can be seen as a "black box": the black box receives information from the outside and gives its outputs to the outside, but what happens inside is unknown (Cheng et al., 2022a, pp. 108–218). In other words, the general rule governing the connection between input and output is unknown, and the network has no ability to explain it. While this situation decreases trust in the obtained network, successful applications increase interest in ANNs.
In cases where it is difficult to find examples, or where examples that accurately represent the problem cannot be found, it is not possible to produce sound solutions. Despite all these disadvantages, it is possible to produce solutions for many problems and to create successful applications with ANNs. For ANNs to overcome these disadvantages and produce solutions, networks must be created meticulously. Having sufficient knowledge about both the problem to be solved and ANNs can provide successful results. It should be kept in mind that creating such a network is possible, but it is not an easy process (2012, p. 35).
7. DEEP LEARNING
DL is a branch of ML that uses ANNs and algorithms to process data in a way that
is inspired by the structure of human nerve cells. DL generates results by learning from the
processed data. Just as people learn from their life experiences, DL algorithms improve their
learning abilities by learning a little more each time and making changes thanks to their
many layers. Thus, they are able to produce results that were not possible before. The concept of "deep" here refers to the number of layers in the artificial neural network: unlike classical neural networks, deep neural networks consist of more layers. Since DL is a sub-branch of ML, the terms "machine learning" and "deep learning" are often used as if they were the same, but the two systems are different and have different capabilities.
The foundations of ANNs and DL, which are inspired by the structure of the human nerve cell, date back to the 1940s. Bengio et al. examined the general historical development of DL in three waves: the term "cybernetics" was used to describe the period between 1940 and 1960, "connectionism" the period between 1980 and 1990, and "deep learning" the period after 2006 (Bengio et al., 2015, pp. 11–12).
The concept of ANNs first emerged with the mathematical model developed by McCulloch and Pitts in 1943 on the basis of studies of the human brain. Later, with Turing's concept of machine learning in 1949, Rosenblatt's concept of the perceptron in 1958, and the discovery of the Adaptive Linear Element (ADALINE) by Widrow and Hoff in 1960, the period called cybernetics was experienced. However, since studies in this field declined until the 1980s, the period known as the first winter of artificial intelligence followed. The second wave, called connectionism, gained momentum after the 1980s, and the concepts of deep networks, backpropagation algorithms, and convolutional neural networks were introduced by LeCun (1998). In the 1990s, AI studies continued to grow, and with the Hebbian learning rule of 1949 and Schmidhuber's "very deep learning" concept of 1993, great steps were taken in the depth and complexity of neural networks. This process paved the way for the discovery of the Long Short-Term Memory (LSTM) network (Schmidhuber, 2015, pp. 93–94). During this period, deep networks were thought to be very difficult to train, because the existing hardware did not allow many experiments and the cost was too high. The third wave, which started in 2006 with Geoffrey Hinton's concept of deep belief networks, gradually gained speed with Goodfellow's Generative Adversarial Networks in 2014 and has continued its development (Bengio et al., 2015, pp. 11–17; Schmidhuber, 2015, pp. 90–98).
Deep learning architectures are used to process image, speech, text, and video data. The layers of a classical DL architecture are described below (p. 18).
Source: https://towardsdatascience.com/how-to-easily-draw-neural-network-architecture-diagrams-a6b6138ed875
An example of deep neural network structure is shown in Figure 7.2 above. A deep
neural network structure consists of four main layers: the convolutional layer, max pooling
layer, fully connected layer, and softmax layer.
The first layer in convolutional neural networks is the input layer. It is the layer where the data is standardized and presented to the network; data presented to the network is first served from this layer. If the data has already been subjected to a resizing process, it can be given to the network directly through this layer. If the data has no standard size, or if a particular size must be observed at the entrance of the network, the resizing is also performed in this layer.
In the convolution layer, a series of filters (kernels) is passed over the input data. The content, size, and number of the filters change depending on the application, and the size of the filters should be smaller than the size of the data. It is the layer where new feature matrices are generated by passing filters, whose dimensions and properties are determined by the developer, over the data coming from the previous layer.
The activation layer brings the matrix values obtained from the convolution layer into a range determined by the algorithm used; it is often used to convert negative values into positive ones. The activation functions most commonly used in the literature are the logistic sigmoid function, f(x) = 1 / (1 + e^(−x)), and the hyperbolic tangent function, f(x) = (e^x − e^(−x)) / (e^x + e^(−x)).
In many studies in the literature, the pooling layer is used after the activation layer. The basic purpose of this layer is to reduce the preceding data to matrices of smaller size so that the network works faster. However, this downscaling may also cause some loss of information; this is an expected loss that is normally tolerated. Several pooling operations, such as maximum, minimum, and mean pooling, can be used in this layer for data reduction.
In general, DL works with large data sets; when it has to work with a smaller data set, the artificial neural network may memorize the data used during training. In other words, since there is a risk of over-learning when the network has little data, a dropout layer is used to make the network "forget" part of the information and thus prevent it from memorizing the training set.
The flattening layer takes all the data from the previous layers and transforms it into a one-dimensional array, or vector. Data from this layer goes directly to the classification layer. It can be used after the pooling or dropout layer and before the classification layer.
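Assuming the tensorflow.keras API, the layer sequence described above (convolution with activation, pooling, dropout, flattening, and a softmax classification layer) could be sketched as follows; the input window of 24 time steps and the two output classes are illustrative assumptions only.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Dropout, Flatten, Dense

# Layer order follows the text: convolution (with activation), pooling,
# dropout against memorization, flattening, and a softmax classification layer.
model = Sequential([
    Conv1D(filters=16, kernel_size=3, activation="relu", input_shape=(24, 1)),
    MaxPooling1D(pool_size=2),
    Dropout(0.25),
    Flatten(),
    Dense(2, activation="softmax"),   # e.g., "index rises" vs. "index falls"
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()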
Since data science and the ease of accessing data increase the volume of data obtained, analyzing big data with classical programming methods is very difficult and time-consuming. For this reason, many new AI algorithms have been developed that can easily work with big data. The common feature of these algorithms is that they can generate rules by creating a link between the data and the result; this "rule" concept expresses the weight coefficients and the structure of the network. DL has found application in many areas where computer technologies are used and has been seen to give very successful results. This has led to an increase in the diversity of studies in this field, owing to the interest in the deep artificial neural network method (p. 20).
7.2.1. Multi Layer Perceptron (MLP)
The first single-layer perceptron model was developed by Rosenblatt in 1958. This model, called the perceptron, is based on the principle of a neuron taking multiple inputs and producing an output using a threshold function. A perceptron is the simplest form of neural network and consists of only one artificial neuron; the output of the network is a logical value of 1 or 0 (2012, p. 61). Later, in 1959, Widrow and Hoff developed a single-layer network called the Adaptive Linear Element (ADALINE), which learns using the least squares method. In 1990, Widrow and Lehr (1990) took this model one step further and developed Multiple Adaptive Linear Elements (MADALINE) networks, which are formed by combining more than one ADALINE unit (2012, pp. 68–74). These early ANN models were used to solve problems with a linear structure, but it was seen that they could not learn nonlinear relationships. For this reason, multilayer perceptrons were developed.
Multilayer perceptrons (MLP), also called feedforward neural networks or deep feedforward neural networks, are models consisting of three or more layers, or multiple perceptrons (McNelis, 2005, p. 251). The MLP is also known as the core architecture of deep neural networks or DL. The learning phase of multilayer DL networks is carried out with the backpropagation algorithm, a supervised learning technique, as in classical ANNs. Various optimization approaches, such as Stochastic Gradient Descent (SGD), Limited-Memory BFGS (L-BFGS), and Adaptive Moment Estimation (Adam), are applied during training. The output of an MLP network is determined using transfer functions such as ReLU, Tanh, Sigmoid, and Softmax, and the mean squared error or log-loss functions are used to calculate the loss (Sarker, 2021, pp. 6–7).
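As an illustrative sketch (not the model estimated in this thesis), scikit-learn's MLPClassifier supports the solvers 'sgd', 'lbfgs', and 'adam' and the activations 'relu', 'tanh', and 'logistic' mentioned above; the synthetic data below merely stand in for real inputs.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))                  # e.g., seven macroeconomic inputs
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic binary target

# Two hidden layers; Adam optimizer with ReLU activation, log-loss objective.
mlp = MLPClassifier(hidden_layer_sizes=(16, 8), activation="relu",
                    solver="adam", max_iter=1000, random_state=0)
mlp.fit(X, y)
print("training accuracy:", mlp.score(X, y))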
Due to the large number of layers, it is called the deep network and is used for solving
more complex problems. As seen in the examples given in the literature, deep neural
networks have been used successfully in time series analysis (Jing et al., 2021).
A convolutional neural network does not require a fully connected neural network design like multilayer networks. It creates the optimal parameters at each layer to achieve a meaningful output, thus reducing the model's complexity. It uses a dropout layer to reduce the memorization problems that can occur in a classical network. In addition, it reduces the output matrices by using the pooling layer and transforms the matrix into a one-dimensional vector by using the fully connected layer. Because of these features, convolutional networks are generally used in the fields of visual recognition, medical imaging, and natural language processing. Convolutional network architecture is used to build applications such as AlexNet, Xception, ResNet, and GoogLeNet (Sarker, 2021, p. 7).
Source: https://teknoloji.org/derin-ogrenme-nedir-yapay-sinir-aglari-ne-ise-yarar/#Multilayer_Perceptron_Cok_Katmanli_Algilayici
Recurrent neural networks are another popular network structure; they use sequential or time-series data and feed the output of the previous step as input to the next step (Sarker, 2021, p. 7). Like feedforward and convolutional networks, recurrent networks also learn from training inputs, but the difference in this structure is that an internal memory is used to process information from previous inputs. The purpose of the model is to build a training model using sequential information. Owing to their learning capabilities, these networks are widely used for complex tasks such as time-series analysis, handwriting recognition, and language recognition.
Unlike a typical deep neural network, which assumes that the inputs and outputs are independent of each other, the outputs in recurrent neural networks depend on previous elements of the sequence. However, the backward dependence of a standard recurrent network makes it difficult to train and causes the vanishing gradient problem. Because of the resulting loss of information in this network, whose main purpose is to capture long-term dependencies, it is difficult to store and learn information with the standard structure. For this reason, architectures such as the Long Short-Term Memory (LSTM) network, Bidirectional RNN/LSTM, and Gated Recurrent Units (GRUs) have been developed (Sarker, 2021, p. 7).
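Since the LSTM architecture is the deep learning model used later in this thesis, a minimal tensorflow.keras sketch is given below; the window length of 12 steps, the 32 memory units, and the random data are illustrative assumptions, not the configuration actually estimated.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Toy sequential data: 100 samples, each a window of 12 time steps with 1 feature
X = np.random.rand(100, 12, 1)
y = np.random.rand(100, 1)

model = Sequential([
    LSTM(32, input_shape=(12, 1)),   # gated memory cells mitigate vanishing gradients
    Dense(1),                        # one-step-ahead forecast
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=16, verbose=0)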
Each node makes stochastic decisions about whether to transmit the input. Certain threshold values are added to the inputs, which are multiplied by certain weights and passed through the activation function to form the output. Stacking several such independent unsupervised units sequentially produces a deeper model. In the reconstruction phase, the output is compared with the previous output by repeating this process, and the network tries to reach the output value that gives the smallest difference over multiple repetitions (Chong et al., 2017a, p. 190).
A deep autoencoder network, consisting of two parts (encoding and decoding), is a feed-forward but unsupervised learning model that sends data from the input layer to the output layer using the input itself as the label during training. Autoencoders are widely used in many unsupervised learning tasks, such as dimensionality reduction, feature extraction, efficient coding, generative modeling, noise removal, and anomaly or outlier detection. Methods developed on the basis of the deep autoencoder architecture include the Sparse Autoencoder (SAE), the Denoising Autoencoder (DAE), the Contractive Autoencoder (CAE), and the Variational Autoencoder (VAE) (Sarker, 2021, p. 9).
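A minimal sketch of the encoding-decoding idea, assuming the tensorflow.keras API and arbitrary layer sizes, is given below; during training the input itself would be passed as the target.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Encoder compresses 8 inputs to a 3-dimensional code; decoder reconstructs them.
autoencoder = Sequential([
    Dense(3, activation="relu", input_shape=(8,)),   # encoding part
    Dense(8, activation="linear"),                   # decoding part
])
# The input itself is used as the target during training (self-supervised reconstruction).
autoencoder.compile(optimizer="adam", loss="mse")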
8. LITERATURE REVIEW
After the great economic crises of 1929 and 2008, stock market activities all over the world developed rapidly as economies attempted to heal their wounds. Stock exchanges have a significant impact on the global economy because they contribute significantly to the economic development of countries by allowing a significant amount of savings to be transferred to companies as capital. In addition, through securities exchanges companies can obtain long-term, less risky, and less costly funds instead of short-term, high-interest, high-risk loans from the banking system. Undoubtedly, the transformation of these savings into investments is a serious growth driver for countries. The exchanges in which investors invest most worldwide are the New York Stock Exchange (NYSE); NASDAQ, an over-the-counter market; the Toronto Stock Exchange; the Amsterdam Stock Exchange, which some sources regard as the birthplace of the stock market; the Tokyo Stock Exchange; the Hong Kong Stock Exchange; and the Bombay Stock Exchange.
An index is a criterion used to measure the change arising from the movements of one or more variables; by linking many variables to a single one, it provides general information about complex events that would otherwise be difficult to follow (Karan, 2013, p. 60). In other words, an index is a statistical composite measure of movement in a market or industry. The price indices by which the stock prices of companies in a market or market segment are tracked are called stock market indices (Vui et al., 2013, p. 477). Since stock market indices reflect price movements in the sector, they are not linear in nature, and their volatility is high.
they are not linear in nature, and their volatility is high. Stock markets are highly volatile
due to the effects of both micro and macroeconomic variables of countries, as well as
psychological variables such as consumer behavior, expectations, and political risk. Due to
these features, they are also important market indicators. Owing to their dynamic, complex,
and non-parametric structure, stock market index prediction is difficult. Considering that the
estimated size of the world stock market was 36.6 trillion dollars during the global crisis
period in 2008, it can be understood why stock markets are so important for national
economies and investors. For this reason, stock market indexes and stock price predictions
have attracted a lot of attention from investors and researchers recently.
Modern portfolio theory, introduced by Markowitz in 1952, which assumes that investors choose the securities in their portfolios from a universe of risky assets, is the oldest and most popular framework for determining the returns on securities (Markowitz, 1952). This theory was later developed into the Capital Asset Pricing Model (CAPM) by Sharpe (1964), Lintner (1965), and Mossin (1966) and used in many studies (Lintner, 1965; Mossin, 1966; Sharpe, 1964). The fact that the CAPM considers the market factor as the only factor affecting asset returns makes the model easy to estimate. In addition, Treynor (1965, pp. 63–75), Sharpe (1966), and Jensen (1968) developed three different single-parameter performance evaluation models, named after themselves, which evaluate the performance of mutual funds using the CAPM and modern portfolio theory. However, the need for multi-factor models, since a single factor ignores many influences, led to the emergence of the Arbitrage Pricing Theory (APT) (Roll & Ross, 1980).
Chen, Roll, and Ross (1986) concluded in their study that the spread between long- and short-term interest rates, expected and unexpected inflation, industrial production, and the spread between high- and low-grade bonds (the risk premium) systematically affect stock returns, while oil prices have no effect on stock returns. In their linear regression analysis using monthly time series, they concluded that stock returns are affected by systematic economic news and changes in macroeconomic variables and are priced in line with these effects. In addition, they found that the effects of stock market indices such as the NYSE on asset prices are insignificant when compared with macroeconomic variables. Innovations in macroeconomic variables are risks and are priced meaningfully in the stock market (N.-F. Chen et al., 1986, pp. 383–403).
Fama (1990) examined the interaction between stock prices and real activities,
inflation, and money and found a strong positive correlation between stock returns and real
variables such as industrial production, GNP, money supply, interest rate, and lagging values
of inflation (Fama, 1990, pp. 1089 1108). However, most of the recent studies have drawn
attention to the short-term relationships between macroeconomic variables and stocks.
Schwert (1990) analyzed the relationship between real stock returns and real activity between 1889 and 1988, extending Fama's (1990) study of the 1953–1987 period by 65 years of data, and concluded that there is a high correlation between monthly, quarterly, and annual stock returns and future production growth rates. He also compared two measures of industrial production, the Miron-Romer index and the Babson index, and concluded that the newer Miron-Romer index of industrial production is less related to stock price movements than the older Babson measure (Schwert, 1990, pp. 1237–1257).
Mukherjee and Naka (1995) used the Vector Error Correction Model in their study
with stock prices and six macroeconomic indicators. They found that there is a long-term
equilibrium relationship between exchange rate, money supply, inflation, industrial
production, long-term government bond interest rates, demand loan interest rates, and the
Tokyo Stock Exchange Index, and that the variables are co-integrated (Mukherjee & Naka,
1995, pp. 223 237).
Wongbangpo and Sharma (2002) investigated stock prices in five ASEAN countries (Indonesia, Malaysia, Singapore, the Philippines, and Thailand) and concluded that stock prices are affected by variables such as gross national product, the consumer price index, money supply, the interest rate, and the exchange rate, and that there is a causal relationship between them. In the long run, a positive relationship was observed between growth in production and stock prices and an inverse relationship between inflation and stock prices. It was also observed that the effects of exchange rates and interest rates were negative or positive depending on the country (Wongbangpo & Sharma, 2002, pp. 27–51).
The three-factor model of Fama and French (1998) was applied to ISE firms operating in the 1993–1998 period, and it was concluded that firms with a low book-to-market ratio outperformed firms with a high book-to-market ratio. It was also observed that large firms performed better than small firms. These results showed that the findings of similar studies in developed and emerging markets differ because of differences in national capital market dynamics (Gonenc & Karan, 2003, pp. 1–25).
Chen, Leung, and Daouk (2003) conducted a study based on the idea that trading strategies guided by predictions of the direction of the Taiwan stock market, one of the fastest-growing stock markets in Asia, can lead to greater effectiveness and higher profits. In this study, they concluded that the performance of the Probabilistic Neural Network model they developed is better than that of the Generalized Method of Moments, the Kalman filter, and the random walk model (A.-S. Chen et al., 2003).
Chakravarty (2005) used monthly time series from 1991 to 2005 to investigate the
relationship between stock price and some basic macroeconomic variables in India. It has
been concluded that there is no causal relationship between stock price, gold price, and
exchange rate, but there is a one-way relationship with money supply, and industrial
production affects stock prices (Chakravarty, 2005, pp. 1 15).
Liu and Shrestha (2008) investigated the long-run relationship between the Chinese
stock market index and macroeconomic factors such as the exchange rate, inflation, money
supply, industrial production, and interest rate. They showed that there is a positive
relationship between money supply and industrial production and the stock market index and
a negative relationship between the exchange rate, inflation, and interest rates and the stock
market index (Liu & Shrestha, 2008, pp. 744 755).
Boyer and Zheng (2009) selected seven investor groups in the US capital market
during the 53-year period between 1952 and 2004 and found that there was a positive and
significant relationship between cash flows and stock returns of these groups, especially
mutual funds and foreign mutual fund groups (Boyer & Zheng, 2009, pp. 87 100).
Another study examined the effects of selected macroeconomic variables on stock returns between 1999 and 2006 in 11 developing countries, namely Turkey, Hungary, Poland, Russia, Argentina, Brazil, Chile, Mexico, Indonesia, Malaysia, and Jordan, using balanced panel data analysis. As a result, stock returns were found to be affected by the exchange rate, the inflation rate, and the Standard and Poor's 500 index, while no significant relationship was found with the interest rate, gross domestic product, money supply, or oil prices (p. 96).
In their study, Ilahi, Ali, and Jamil (2015) proved by linear regression analysis that
the Karachi Stock Market Index is weakly affected by changes in macroeconomic variables
such as the inflation rate, exchange rate, and interest rate (Ilahi et al., 2015, pp. 1 11).
In a study using monthly data from the period beginning in 2005, Muhammed (2016) examined the relationship between the Borsa Istanbul index and the interest rate, the exchange rate, export and import volumes, the industrial production index, and the gold price by means of a causality test and impulse-response functions. While a one-way causal relationship was observed from BIST to the industrial production index, exports, and imports, there was also a one-way causal relationship from the exchange rate to BIST (p. 74).
In another study, it was seen that the stock returns of the banks traded in Borsa Istanbul 100 were positively affected by changes in the Standard & Poor's 500 index, the exchange rate, and the US interest rate (2019, pp. 1–25).
Kocabiyik and Fattah (2020) analyzed the macroeconomic variables affecting the Borsa Istanbul 100 index and the New York Stock Exchange S&P 500 index for the period 2010–2019 with the Toda-Yamamoto model. They found a bidirectional causal relationship between the money supply and the exchange rate and the BIST100 index, while a causal relationship toward the S&P 500 index was seen only for the money supply. In addition, no causal relationship was found for the consumer price index, the industrial production index, the export-import coverage ratio, the interest rate, the oil price, or the gold price (Kocabiyik & Fattah, 2020, pp. 116–151).
After the Great Depression of 1929, the science of econometrics, which is based on testing economic theories and modeling them mathematically and statistically, gained importance, because the experience gained from crises in capital markets around the world showed that modeling and forecasting economic and financial variables is extremely important for national economies. Econometrics has therefore come to be used more by policymakers to guide economic policy. Since economic and financial variables are time series formed by ordering observation values according to time, the use of econometric methods in economic and financial modeling has gradually increased with the developments in time series analysis in recent years. In addition, AI technologies, which have found a wide area of application as a powerful statistical modeling technique, have become an alternative to econometric methods with comparable predictive performance. These methods, which can produce very successful results in the classification and prediction of time series, are also used extensively in the fields of statistics, economics, and finance (Kaastra & Boyd, 1996, p. 216).
There are many studies in the literature on the modeling and estimation of financial
variables using AI methods. These studies deal with the financial performance and financial
forecasting of the markets, the forecasting of financial crises, the forecasting of exchange
rates, and the forecasting of stock prices. The following studies and their findings are
examples of studies conducted with AI methods in the literature.
Wu and Lu (1993), in their study using ANNs and the Box-Jenkins ARIMA model to estimate the Standard & Poor's 500 index for the period 1971–1990, concluded that the ANNs outperformed the time series models (Wu & Lu, 1993, pp. 257–264).
Hsieh (1993) showed that artificial neural networks, which try to simulate the physical process on which intuition is based, or the adaptive biological learning process, and which can produce solutions even for uncertain and incomplete data, give very successful results in financial management (C. Hsieh, 1993, p. 12).
Yao, Poh and Jasic (1996) estimated the exchange rates between the US dollar and
the five main currencies of the period, using the exchange rates for 2,910 days for the period
from 18 May 1984 to 7 February 1995. They used the Japanese Yen, German Mark, British
Pound, Swiss Franc and Australian Dollar as the five basic currencies of the period. They
estimated the exchange rates using ARIMA and ANN models and found that the ANN model
gave more effective results in estimating exchange rates (Yao et al., 1996, pp. 754 759).
In recent studies, it has been seen that AI technologies are used in analysis together
with traditional statistical methods, so that better results can be obtained by reflecting the
strengths and advantageous aspects of both methods. One of the studies that gave these
successful results is the study of Hu and Tsoukalas in 1999, which estimated the volatility
of the European Monetary System exchange rates by combining the GARCH, EGARCH,
and IGARCH models with the moving average variance (MAV) model and an artificial
neural network model. In this study, it was concluded that ANNs outperformed the least
squares and simple averaging methods (Hu & Tsoukalas, 1999).
Donaldson and Kamstra (1999) estimated the price volatility of the S&P 500 stock
index using ANNs with the GARCH and moving average variance (MAV) models. They
showed that, compared to the traditional weighted least squares method, they could take into
account the interaction effects in time series estimations and that ANNs were more effective
in non-linear time series (Donaldson & Kamstra, 1999, pp. 227 236).
Dahlquist, Engstrom, and Soderlind (2000) evaluated fund performance between 1993 and 1997 using the alpha coefficient obtained from linear regressions of fund returns on various benchmarks and concluded that the funds generally did not perform well (Dahlquist, Engstrom & Soderlind, 2000, pp. 409–423).
Zhang (2001) tested the applicability of ANNs to linear time series problems. Zhang
compared eight different ARIMA models with nonlinear multilayer ANN models for
predicting IBM stock closing prices. As a result, the study revealed that ANNs are successful
in solving linear problems as well as nonlinear problems (G. P. Zhang, 2001, pp. 1183
1202).
Bollen and Busse (2001), in their study of 230 funds over the 1985–1995 period, investigated whether there is a difference in measured timing ability when daily rather than monthly fund returns are used. The regression approaches developed by Treynor and Mazuy (1966) and Henriksson and Merton (1981) were used as methods. With the Treynor-Mazuy method, 11.9% of the funds showed timing ability when monthly returns were used and 34.2% when daily returns were used; similar results were obtained with the method developed by Henriksson and Merton (Bollen & Busse, 2001, pp. 1075–1094).
Chen, Leung, and Daouk (2003) tested predictions of the direction of the Taiwan Stock Exchange Index, one of the fastest-growing financial markets among developing Asian countries, with probabilistic neural networks and compared them with parametric statistical methods, namely the Kalman filter, the generalized method of moments, and the random walk model. The probabilistic neural network model was observed to make predictions with superior performance compared to the other traditional methods, and higher returns were obtained from investment strategies based on this method (A.-S. Chen et al., 2003, pp. 901–923).
Kim, Oh, Sohn, and Hwang (2004) examined the effects of the financial crisis in
South Korea in 1997 on the economic structure of Korea and developed an early warning
system with multi-layer ANN models. They divided the year 1997 into three main periods:
the stable period between January 3 and September 18, the unstable period between
September 19 and October 21, and the crisis period between October 22 and December 27.
They used the KOSPI stock index as an input variable in the model, and, considering that
index volatility gives information about the direction of the market, they calculated the
index's end-of-day closing value, daily return, 10-day moving average, variance, and
variance ratio. In the study covering the period 1994-2001, they found that the volatility of
the KOSPI index was a harbinger of the crisis and realized that ANN models were
impressively successful in early detection of the 1997 economic crisis, in classifying the
market movements, and in following the fundamental trend of the economy (T. Y. Kim et
al., 2004, pp. 583 590).
Yu, Tresp, and Schwaighofer (2005) on the other hand, combined parametric linear
models with non-parametric Gaussian processes and concluded that nonlinear estimations
yield better results than traditional estimations (Yu et al., 2005, pp. 1012 1019).
Dutta, Jha, Kumar, and Mohan (2006) used the multi-layer ANN method in modeling
the Mumbai Stock Exchange index. They concluded that the ANN model successfully
predicted the index values (Dutta et al., 2006, pp. 283 295).
Panda and Narasimhan (2007) estimated the future value of exchange rates with a
linear autoregressive model, a random walk model, and an ANN model using weekly data
of Indian Rupee/US Dollar (INR/USD) exchange rates for the period January 1994-June 2003. They found that the ANN model gave more effective results for firms and investors
in estimating the exchange rate (Panda & Narasimhan, 2007, pp. 227 236).
Tseng, Cheng, Wang, and Peng (2008) estimated the volatility of Taiwan Stock Index
(TXO) prices with a new hybrid asymmetric volatility approach, the multi-layer ANN option
pricing model. They used EGARCH and Grey-GARCH models as comparison criteria. They
concluded that the ANN model predicts market volatility more effectively (Tseng et al.,
2008, pp. 3192 3200).
Liang, Zhang, Xiao, and Chen (2009) estimated option prices using Hong Kong
options stock market data for the period 2006 2007 using ANN, finite difference, and Monte
Carlo methods. Forecasting was done first using traditional option pricing methods, and then
ANN and support vector regression (SVR) were used to reduce forecast errors. Thus, future
option prices were estimated using parametric and non-parametric methods, and they
concluded that the ANN method showed superior forecasting performance (Liang et al.,
2009, pp. 3055 3065).
According to Tsai and Wang (2009), there are two methods used in estimating stock
prices in the literature. These are fundamental analysis, which uses information from the
company's financial statements, and technical analysis, which uses figures and graphs based
on historical data. However, since fundamental and technical analysis alone are not sufficient
to make the right decision, they used two computer-based analysis methods with better forecasting performance. With a hybrid model combining ANNs and decision trees (DT), they achieved forecasts with an accuracy of 77%, higher than that of the individual models (Tsai & Wang, 2009, p. 60).
At the beginning of October 2008, the size of the world stock market was estimated in the United States at approximately $36.6 trillion. Considering that the total world derivatives market is about 11 times the size of the entire world economy, with a nominal value of approximately 791 trillion dollars, it is clear why stock market forecasting is so important and why it is the subject of so many academic studies. Dase and Pawar (2010) compiled the studies in the literature that predict stock market indices with the artificial neural network method, which gives successful results on large data sets. These studies have shown that the artificial neural network method has quite high predictive performance in predicting a stock index and in predicting whether it is best to buy, hold, or sell stocks (Dase & Pawar, 2010, pp. 14–10).
Chen, Zhang, Yeo, Lau, and Lee (2017) used the parametric statistical methods of linear regression (LR) and Support Vector Regression (SVR) to predict the volatility of stocks on the Chinese stock exchange, which is very popular in both the business world and the academic community. They also produced estimates with Recurrent Neural Networks (RNN) and Gated Recurrent Units (GRU), which are AI methods, and the estimation results of the AI methods were more successful and showed higher estimation performance (W. Chen et al., 2017, pp. 1–6).
Chong, Han, and Park (2017) used three data representation methods (principal component analysis, an autoencoder, and a restricted Boltzmann machine) with a data set consisting of high-frequency stock returns, and applied a three-layer deep neural network (DNN) model to predict future stock returns on the Korean stock market. They observed that the DNN performed better than the linear autoregressive model on the training set, but that this advantage was mostly lost on the test set (Chong et al., 2017b, pp. 187–205).
Nunes, Gerding, McGroarty, and Niranjan (2019), in the first comprehensive study to use a multivariate linear regression model and a multilayer artificial neural network model to predict the European yield curve, showed that the multilayer ANN model gave better results. This result also paved the way for the development of better forecasting systems for fixed-income markets (Nunes et al., 2019, pp. 362–375).
Cao and Wang (2020) used principal component analysis and ANNs for an accurate
and effective stock prediction model and proved that the artificial neural network model they
created offers an effective stock selection strategy (Cao & Wang, 2020, pp. 7851 7860).
Jing, Wu, and Hefei (2021) created a hybrid model by combining the prices of stocks traded on the Shanghai Stock Exchange with investor sentiment analysis and deep learning, an AI method. Using a Long Short-Term Memory neural network to classify the hidden emotions of investors and a Convolutional Neural Network model to analyze technical indicators in the stock market, they concluded that the hybrid model outperformed the basic classifiers in classifying investor sentiment and outperformed the benchmark models in predicting stock prices (Jing et al., 2021, p. 115019).
Cheng, Yang, Xiang, and Liu (2022), noting that the market capitalization of listed companies in the US reached $30 trillion in 2019, more than 1.5 times US Gross Domestic Product, emphasized how important it is for both investors and financial institutions to predict the price movements of stocks in this huge but volatile market. In their analysis using Multimodal Graph Neural Networks (MAGNN), they explained the construction of lead-lag effects and heterogeneous graphs, a new approach to financial time series analysis that plays a major role in hedging market risks and optimizing investment decisions, and concluded that this method shows superior performance in market forecasting. They showed that the method provides investors with a profitable as well as interpretable option and enables them to make informed investment decisions (Cheng et al., 2022b, pp. 108–218).
Many studies have been conducted using various econometric analysis methods to
predict financial crises, detect chaos and uncertainties, and determine the mobility of capital
markets, and the effectiveness of the methods has been compared. In this sense, besides the
econometric methods mentioned above, studies using Logit and Probit models, although few
in number, have also taken their place in the literature. These studies are listed below.
Frankel and Rose (1996) used a probit model to identify leading indicators of
currency crises for more than 100 developing countries using annual data for the period from
1971 to 1992. They concluded that high domestic credit growth, high foreign interest rates,
and persistently low foreign direct investment to debt ratios indicate a high probability of a
collapse (Frankel & Rose, 1996, pp. 351 366).
Kaminsky and Reinhart (1999) examined the sources and extent of 76 currency crises and 26 banking crises using monthly data on 16 macroeconomic variables that they considered likely to be indicative. They concluded that financial liberalization leads to banking crises, that banking sector problems trigger currency crises, and that the currency crisis caused by the banking crisis further weakens an already weak banking sector, so that this mutually reinforcing process deepens the banking crisis and becomes a vicious circle. They also concluded that when currency and banking crises occur together, the impact is deeper than when they occur separately (Kaminsky & Reinhart, 1999, pp. 473–500).
Kim (2003) forecasted the stock price index, a financial time series, with a logistic
model and ANNs and concluded that ANNs are a promising method in stock market
forecasting (K. Kim, 2003, pp. 307 319).
Another study examined the indicators of corporate financial crises in emerging markets during periods of economic depression and emphasized that discriminant analysis, which was the primary method for predicting financial failure until the 1980s, has since been replaced by logistic regression, and that ANNs have yielded better results in predicting financial failure in recent years (& Aksoy, 2006, pp. 277–295).
Kaya and Yilmaz (2007) applied the logit and signal methods, which are widely used in estimating currency crises, to the leading indicators of the 1990–2002 period, using monthly data for the 2003–2005 period in Turkey. The findings of the signal analysis conducted with 29 variables were, with two exceptions, combined to form a common cluster, and five macroeconomic variables containing the findings of this common cluster, which they defined as "traditional indicators" for this period, were used to form a financial pressure index. They tested the performance of the models they established and concluded that both methods gave successful results, but that the leading indicators differed across periods and economies (Kaya & Yilmaz, 2007).
Davis and Karim (2008) used a multinomial logit model and an early warning system to identify leading indicators enabling the early detection of crises in the banking sector. They found that the multinomial logit model performed better in predicting global banking crises, while the early warning system did better for country-specific banking crises (Davis & Karim, 2008, pp. 89–120).
Lin, Khan, Chang, and Wang (2008), building on Kaminsky and Reinhart's 1999 study, created four models to predict currency crises using data from 20 countries between 1970 and 1998. They measured the performance of the early warning systems they created using four modeling techniques: the logistic regression model, which is one of the traditional methods, the KLR model, ANNs, and fuzzy logic. They concluded that the most successful models were, in order, fuzzy logic, ANNs, and the logistic model (C.-S. Lin et al., 2008, pp. 1098–1121).
Kantar and Akkaya (2018) estimated the effects of financial liberalization in Turkey with a financial pressure index and with logit and probit models created using 19 macroeconomic variables for the period January 2005-January 2017, and tried to reveal the leading indicators of financial crises. They concluded that an increase in deposit rates, an increase in the domestic debt stock, and a decrease in gross reserves increase the probability of a crisis (Kantar & Akkaya, 2018, pp. 575–590).
According to Akkaya and Kantar (2019), the importance of predicting banking crises, an issue that became more prominent after the 2008 global crisis, has increased because of their effects on the economy. For this reason, they examined the fragility of the Turkish banking sector using annual data for the period 1996-2017 and logit and probit models (limited dependent variable models). They reached the conclusion that the exchange rate and deposit interest variables, which have high explanatory power in all models, are statistically significant in the logit model, while the loan amount, deposit amount, and deposit interest variables are statistically significant in the probit model (Akkaya & Kantar, 2019, pp. 131–145).
Some studies on the Turkish capital markets also reveal that artificial neural network modeling produces more successful results than other models. Yildiz (2001) modeled the financial failure of firms between 1983 and 1997 that were subject to the Capital Markets Board and/or traded on the Istanbul Stock Exchange (ISE). He concluded that the ANN model is more successful than discriminant analysis in predicting financial failure (Yildiz, 2001, pp. 47–62).
Diler (2003) estimated the next-day direction of the ISE National-100 index using an ANN model trained with the error backpropagation method and concluded that the method can predict the next-day direction of the index with 60.81% accuracy (Diler, 2003, pp. 65–82).
Altay and Satman (2005) tried to estimate the returns of the ISE 30 and ISE All indexes using multilayer ANN and linear regression methods. It was concluded that while the ANN models do not give better results than linear regression for monthly and daily returns, they are quite successful in estimating the direction of index returns (Altay & Satman, 2005, pp. 18–33).
Altan (2008) estimated the exchange rate with an ANN and a vector autoregressive (VAR) model using monthly data on the exchange rate (TL/USD) for the period January 1987-September 2007. It was concluded that the predictions made with a multilayer feedforward ANN architecture trained by backpropagation yield very effective results (Altan, 2008, pp. 141–160).
Avci (2015) compared trading strategies based on ANN models with the buy-and-hold strategy and observed that the buy-and-hold strategy had a great advantage in most of the examined periods (Avci, 2015, pp. 443–461).
9. METHODOLOGY AND DATA
Especially since the global economic crisis of 2008, researchers, policymakers, and investors have understood how important financial forecasts are in preventing financial crises and uncertainty. Although there is no clear consensus on when the global economic crisis began, some researchers believe it began with the failure to repay subprime (low-quality) mortgage loans in the United States in 2006, while others believe it began with the collapse of the large banking company Lehman Brothers in 2008. In any case, the impact of the crisis expanded and became global in 2008, spreading to Europe and the rest of the world.
The aim of this study is to estimate the long-term relationship between the ISE100
index and the selected macroeconomic variables over the January 2001–January 2022 period
using Logit and Probit analysis and a deep neural network method, and to compare the
prediction performances of the two approaches. In this way, the effectiveness in time series
analysis of AI technologies, which have grown out of developments in traditional econometric
methods and computer technologies and have continued to develop rapidly in recent years, is
also measured. For the models used in the research, the variables most commonly employed
in the literature on market indices were selected (W. Chen et al., 2017; Cheng et al., 2022a;
Donaldson & Kamstra, 1999; Kara et al., 2011; Staub et al., 2015); the statistically significant
variables obtained from the estimated models were given as input to the models created with
ANNs and DL methods, and the effectiveness of the modeling techniques was tested.
Although price movements of financial assets are what is observed in financial markets,
returns rather than prices are generally used in empirical studies (Campbell et al., 1997, p. 9).
Campbell et al. present two main reasons for using returns in the analysis. First, the return on
a financial asset provides a complete and scale-free summary of the investment opportunity
for an investor: an investor who places 1000 TL in a financial asset with an annual return of
5% will have 1050 TL at the end of the year. Second, asset returns are more convenient than
prices for both empirical and theoretical reasons. Price series display a non-stationary
appearance because of stochastic and/or deterministic trends and persistent deviations from
the unconditional mean, whereas asset returns stay close to their mean and deviate from it
only in the short run. Therefore, return series are generally stationary.
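As a small illustration of this point, the sketch below computes simple and logarithmic returns from a monthly price series in MATLAB; the variable names (price, ret, logret) and the numbers are purely illustrative and are not taken from the thesis data.

    % Illustrative sketch: converting a monthly price series into returns.
    % 'price' is a hypothetical column vector of monthly closing prices.
    price = [100; 105; 103; 110; 108];

    % Simple returns: r_t = (P_t - P_{t-1}) / P_{t-1}
    ret = diff(price) ./ price(1:end-1);

    % Logarithmic returns: r_t = ln(P_t) - ln(P_{t-1}),
    % i.e. the first differences of the log series used later in the study.
    logret = diff(log(price));

    disp([ret logret]);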
The study consists of 253 monthly observations covering the period 2001:01–2022:01.
Seven macroeconomic variables were selected in order to estimate the value of the Borsa
Istanbul 100 index, chosen as the dependent (output) variable, or, in other words, to investigate
their effects on the stock market in Turkey. These are the Dollar Rate (TL/$), Money Supply,
Producer Price Index, Industrial Production Index, Gold Price (TL/Gr), Active Bond Interest
Rate, and Brent Oil Price. In addition, since the effects of the 2008 global economic crisis
were more limited in Turkey than in developed countries, the crisis periods were not excluded
from the data set; this preserves the integrity of the data and allows the effects of the 2001
crisis in Turkey to be observed.
The data sets used in the analysis were compiled as monthly series from the Central
Bank of the Republic of Turkey Electronic Data Distribution System (EVDS), the Turkish
Statistical Institute (TUIK) statistics, the monthly statistical bulletins of the Capital Markets
Board, the statistics of the Ministry of Treasury and Finance, and the Eurostat databases.
While the monthly values of the ISE100 index were used as the dependent variable
in the research, the monthly values of the producer price index, industrial production index,
money supply, gold bullion prices, dollar rate, Brent crude oil prices, and active bond interest
rates were used as independent variables. The independent variables used are described below:
Although stock market indices are a basic indicator for national economies, it is
necessary to know which factors affect the index before making an evaluation. Internal
factors such as a country's economic data, unemployment figures, interest rates, geopolitical
position, and risks, as well as external factors such as global data and economic relations
between countries, shape a country's economic course and, of course, its stock market indices.
Two-Year Bond Yield (TYBY) / Active Bond Interest Rate: This rate is the two-year
bond rate. Bonds are debt instruments used by governments, companies, or individual
borrowers in need of funds to obtain financing for a certain period at a variable or fixed
interest rate. Investors prefer this instrument to protect themselves from economic risks and
to obtain a fixed return above inflation. Rising interest rates and increasing economic
uncertainty under tight monetary policy direct investors toward bonds as protection from
these risks, whereas under stable economic growth interest rates tend to fall and reduced risk
leads investors toward stocks.
Interest can be described as the rent or price of money. When the interest rates that
steer the entire economy fall, an economic recovery follows, whereas rising interest rates
bring an economic slowdown, because when rates increase, fund owners prefer alternative
investment instruments that they expect to yield higher returns, and savings are not channeled
into production. This undoubtedly affects the banking sector as well. For these reasons, stock
returns significantly affect investors' investment decisions in the markets. When the market
interest rate increases, the bond price decreases. A stock, like a bond, is a security. Therefore,
when the active bond interest rate increases, the stock market index is expected to decrease;
in other words, there is a negative relationship between the two variables.
Since investments in stocks, bonds, and deposits are all fed from the same pool of
funds, the investor faces a choice: to buy bonds or bills, to earn interest income on bank
deposits, or to earn stock market income by purchasing stocks. Therefore, when interest rates
rise, the investor turns to bonds and bills in the belief that they offer protection from risk, and
when interest rates fall, the investor turns to stock investments. The graph in Figure 9.1 below
shows the direction of the relationship between the two variables.
Figure 9.1. ISE100-TYBY
The graph reflects the February 2001 financial crisis in Turkey. The crisis hit the money
market: there were sharp increases in interest rates and exchange rates because of the surge
in demand for TL liquidity and foreign currency and the inability to meet this demand. On
February 21, the overnight interest rate in the interbank market rose to 6200%. When the
central bank's foreign exchange reserves fell significantly, it was unable to withstand these
sharp speculative movements and left the exchange rate to float.
Exchange Rate: The exchange rate is a price that relates to the markets for goods. A
stable course of exchange rate changes affects economic stability positively. After the
liquidity crisis of November 22, 2000, and the currency crisis of February 19, 2001, Turkey
abandoned the disinflation program based on an adjustable fixed rate and switched to a
floating rate regime. The equilibrium exchange rate has a linear relationship with the ratio of
price levels. The exchange rate is also a macroeconomic variable used as a benchmark for
some securities. Since crises in the foreign exchange and stock markets create uncertainty in
both national economies and international markets, they have been the focus of many studies,
and the exchange rate is very important both for governments in setting policy and for
investors in directing their investments. When the exchange rate increases, stock returns
decrease as investors' funds shift to other financial instruments. Empirical studies have
accordingly shown a negative relationship between the two variables in the short term and a
positive relationship in the long term. A persistent rise in foreign exchange prices will lead
investors who hold their savings in stocks to sell them and turn to foreign exchange; as the
demand for stocks decreases, their returns will also decrease. The graph in Figure 9.2 below
shows the direction of the relationship between the two variables.
The financial crisis of February 2001 in Turkey was reflected in the money market:
interest rates rose sharply because of the increased demand for TL liquidity and foreign
currency and the inability to meet this demand. This situation is shown in Figure 9.2. Likewise,
there were serious increases in the exchange rate: while the dollar rate was 686,500 liras on
February 19, 2001, it rose to 960,000 liras, an increase of around 40% in a single day. The
Turkish lira depreciated by 11% in real terms, 19 banks were closed, the economy contracted,
and 1.5 million people lost their jobs.
The exchange rate, which started to increase after 2015, reached 14.00 TL in November
2021. Since 2022, when similar problems were experienced, exchange rates and inflation have
risen rapidly and the Central Bank's reserves have declined. To discourage investors from
moving into foreign exchange deposit accounts as the exchange rate rose, a new instrument
called the Currency Protected Account was introduced by the TCCB to support Turkish Lira
deposit and participation accounts opened with banks as of May 23, 2022. This instrument is
a type of deposit that aims to provide a higher return by paying, in addition to the interest on
a TL deposit account, the exchange-rate difference that accrues over the term. However, it can
be seen from this definition that the scheme does not differ from paying interest on interest.
It is observed that the rise in exchange rates also led to an upward movement in the stock
markets.
Money Supply / M2: The M2/GNP ratio is an indicator of the financial depth of a
country's economy and of the public's use of the banking system. When the money supply
increases, interest rates fall and the prices of bonds and stocks rise, so investment is directed
toward stocks; there is therefore a positive relationship between the two variables. The money
supply is a monetary aggregate that includes money in circulation together with time and
demand deposits.
The graph in Figure 9.3 below shows that the two variables move together in the same
direction. Following the failure of 19 banks during the 2001 financial crisis, the public's use
of the banking sector has increased dramatically in Turkey. It can be said that, as of January
2022, banking transactions had largely taken the place of stock market transactions.
IPI (Industrial Production Index): The Industrial Production Index is one of the
statistics announced by TURKSTAT. It is an index that tracks the volume of industrial
production. It is constructed by weighting the branches of industry according to production
classes, with manufacturing carrying roughly 85% of the weight and the remainder shared by
electricity and mining. An increase in the industrial production index means more production
and growth in the Turkish economy. When this index, which measures economic activity,
rises, Gross National Product (GNP) rises as well, implying that prices have risen. In this
case, the value of stocks also increases, so there is a positive relationship between them. The
relationship between the two variables can also be seen in the graph below.
PPI (Wholesale Price Index – Domestic Producer Price Index): In Turkey, the index
previously known as the Wholesale Price Index (WPI) has been published by the Turkish
Statistical Institute as the Producer Price Index (PPI) since November 10, 2005, and as the
Domestic Producer Price Index (D-PPI) since February 26, 2014. This index, which measures
the country's price level, covers goods and services offered for sale within the country; it is
therefore a measure of inflation showing the general level of prices. When the Domestic
Producer Price Index increases, stock returns are expected to increase as prices increase; in
other words, there is a positive relationship between the two variables. The graph in Figure
9.5 below shows the direction of the relationship between the two variables.
Gold Prices: The monthly selling price, in Turkish Lira, of one gram of gold bullion,
an alternative investment instrument, is used. Although gold has been traded for hundreds of
years, it has become one of the most preferred alternative investment instruments, especially
in recent years, because of its low risk. Since the variable enters the analysis in Turkish Lira,
it is possible to observe how changes in the Turkish Lira affect the stock market index. An
increase in gold prices leads investors to turn to gold as an alternative investment, and in an
inflationary environment investors prefer a low-risk, high-return investment instrument. Gold,
as a precious metal, has always remained a safe haven for investors. The graph in Figure 9.6
below shows the direction of the relationship between the two variables.
Brent Oil Price: Oil, one of the primary energy sources, is a basic and indispensable
input for many sectors and is therefore an important cost factor. An increase in oil prices
causes cost inflation and creates inflationary pressure. Inflationary pressure pushes prices up
and leads central banks to raise interest rates to suppress price increases, which in turn directs
investors toward Treasury bills and bonds and lowers stock prices. As can be seen, these
components form a loop with an intricate structure of interactions.
Oil is not only an input but also an investment instrument, because fluctuations in oil
prices cause changes in economic activity and stock prices. After the oil shock of 1973 and
its effects on national economies, oil began to be treated as an important factor in economic
analysis. It has been observed that changes in oil prices have a positive effect on the stock
market in the long run.
Movements in Brent oil prices affect national economies through channels such as the
foreign trade balance, monetary policy, and inflation. Rising oil prices worsen the foreign
trade balance of oil-importing countries and cause a wealth transfer to oil-exporting
countries. Since Turkey is an oil-importing country, when oil prices increase,
money demand increases, and if money supply does not increase in parallel with this
demand, interest rates increase and investment costs increase. The lack of this resource,
which is a very important input for many sectors, especially energy-intensive sectors, causes
a contraction in these sectors. Therefore, increased costs lead to shrinkage and inflation. This
continuity in price increases leads to a decrease in employment, which leads to a decrease in
demand, and a decrease in demand leads to a decrease in production and the continuation of
the spiral cycle. Again, this will lead to changes in the profitability ratios of the companies
and thus the values of the stocks. Therefore, oil-importing countries such as Turkey are very
sensitive to changes in oil prices. The graph below shows the direction of the relationship
between the two variables. In Figure 9.7, the high increases in Brent oil prices experienced
due to the global crisis in 2008 can be seen. It is seen that the price increase in 2011 almost
approached the increase in 2008. Brent oil prices, which had a decreasing trend in 2020,
increased rapidly in 2021 and continued to increase.
Economic models are established by taking into account the equilibrium relations
envisaged in economic theory. The existence of significant econometric and economic
relations between the variables in the established model also depends on the stationarity of
the series. Financial time series, on the other hand, are not stationary because they generally
show a volatile structure. Therefore, first of all, the variables to be used in the analysis should
be made stationary.
Determining the stationarity of the series is an important step in time series analysis,
since analysis of non-stationary series can suggest relationships that do not actually exist and
thus lead to the spurious regression problem. Whether the statistical tests in the modeling are
meaningful depends on this analysis. The Efficient Market Hypothesis for the stock market is
tested with unit root tests: if the ISE100 index contains a unit root, the efficient market
hypothesis holds; if the index is stationary, it does not. Unit root tests on financial time series
therefore investigate whether asset prices tend to revert to their mean.
Since the time series of the macroeconomic variables used in the analysis contain unit
roots and seasonal effects by nature, the Augmented Dickey-Fuller (ADF) and Phillips-Perron
(PP) unit root tests were used to determine whether the data sets are stationary.
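As a hedged sketch of how such tests can be run (here with MATLAB's Econometrics Toolbox functions adftest and pptest, rather than the Eviews workflow actually used in the thesis), the snippet below applies both tests to the first difference of a log series; the variable name lnISE100 and the placeholder data are illustrative assumptions.

    % Illustrative sketch: ADF and PP unit root tests (requires the Econometrics Toolbox).
    lnISE100 = log(100 + cumsum(abs(randn(253,1))));   % placeholder for the real log index series
    dlnISE100 = diff(lnISE100);                        % first difference of the log series

    % ADF test with constant and trend ('TS'); lag length 0 is used here as an example
    [hADF, pADF, statADF] = adftest(dlnISE100, 'Model', 'TS', 'Lags', 0);

    % Phillips-Perron test with the same deterministic terms
    [hPP, pPP, statPP] = pptest(dlnISE100, 'Model', 'TS');

    % h = 1 means the unit-root null is rejected, i.e. the series is treated as stationary
    fprintf('ADF: h=%d, p=%.4f | PP: h=%d, p=%.4f\n', hADF, pADF, hPP, pPP);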
In classical regression models, the dependent variable is random and the independent
variable or variables are treated as fixed. When the dependent variable is defined according
to whether a certain characteristic is present or not, it becomes a qualitative variable with a
binary structure, taking the value 0 or 1. Since such models cannot be estimated reliably with
the classical least squares method, estimation relies on the linear probability, Logit, and
Probit models.
In the study, both Logit and Probit models were estimated using the ISE100 index
value as the dependent (output) variable and the seven macroeconomic variables as
independent (input) variables. Since the dependent variable must be binary in these modeling
techniques, a threshold value was calculated by taking the geometric mean of the BIST100
index series. The variable was converted into a qualitative, binary form by assigning the value
0 to observations below the threshold and the value 1 to observations above it. The geometric
mean was preferred because it is a measure of central tendency based on the relative
(geometric) differences between observations rather than their arithmetic differences. The
resulting threshold covers all the observations in the data set, conforms to algebraic
operations, and allows relative numbers to be processed.
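A minimal MATLAB sketch of this transformation, and of fitting the two models by maximum likelihood, is given below. It assumes the index values are in a vector ise100 and the regressors in a matrix X (both placeholders here), and it uses geomean and fitglm from the Statistics and Machine Learning Toolbox; the thesis itself performed these estimations in Stata and Eviews.

    % Illustrative sketch: binarizing the index with its geometric mean and
    % fitting logit and probit models (hypothetical variable names and data).
    ise100 = 500 + cumsum(abs(randn(253,1)));   % placeholder for the monthly index series
    X = randn(253,7);                           % placeholder for the seven macro regressors

    threshold = geomean(ise100);                % geometric-mean threshold
    y = double(ise100 > threshold);             % 1 above the threshold, 0 below

    logitModel  = fitglm(X, y, 'Distribution', 'binomial', 'Link', 'logit');
    probitModel = fitglm(X, y, 'Distribution', 'binomial', 'Link', 'probit');

    disp(logitModel.Coefficients);
    disp(probitModel.Coefficients);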
Based on this information, the logit and probit regression models, constructed for the
probability that the index value in period i is greater than or less than the geometric mean,
are given in Figure 9.8.
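As a sketch of the standard forms such models take (the notation below is assumed, not copied from Figure 9.8), with X_i the vector of macroeconomic regressors in period i and \beta the coefficient vector, the logit and probit specifications are

\[
\text{Logit:}\quad P(Y_i = 1 \mid X_i) = \Lambda(X_i'\beta) = \frac{1}{1 + e^{-X_i'\beta}},
\qquad
\text{Probit:}\quad P(Y_i = 1 \mid X_i) = \Phi(X_i'\beta),
\]

where \Phi is the standard normal cumulative distribution function and Y_i = 1 if the BIST100 index in period i exceeds its geometric mean, Y_i = 0 otherwise.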
The time series of the variables used in the analysis were compiled as monthly series
from the Central Bank of the Republic of Turkey Electronic Data Distribution System
(EVDS), the statistics of the Turkish Statistical Institute (TUIK), the monthly statistical
bulletins of the Capital Markets Board, the bulletins of the Ministry of Treasury and Finance,
and the Eurostat databases. Eviews 12 was used for the statistical and econometric tests on
the time series of the dependent and independent variables for the period January 2001 to
January 2022, Stata 15 for the Logit and Probit analysis, and MATLAB R2021b for the DL
analysis.
In the DL model, the ISE100 index was used as the dependent (output) variable and
the seven macroeconomic variables as independent (input) variables. In the DL model, which
is based on an artificial neural network architecture, the data set is divided into two parts:
70% for training and 30% for testing. In the financial applications in the literature, AI studies
split the data set in various ratios, and this split, like the other parameters of the model, is
determined by trial and error (Chong et al., 2017a; Jing et al., 2021; C.-C. Lin et al., 2018).
Here, the 2001:01–2015:08 period, the first 177 observations, forms the training set, and the
last 76 observations, the 2015:09–2021:02 period, form the test set.
To solve the problems they study, researchers try to build the parameter sets that best
represent the problem before modeling and analysis. In work following the development of
the DL approach, questions such as how best to design a multi-layer artificial neural network,
how many layers and how many neurons it should contain, what the dropout value should be,
and which optimization algorithm or activation function to choose have proven important for
the solution. However, since there is no definitive rule for selecting these parameters, they
were obtained by trial and error according to the problem and the data set (Jing et al., 2021;
Kaastra & Boyd, 1996, pp. 220–224).
In the DL method, a machine learning method that learns from data, the parameters
most often used when building the artificial neural network architecture are explained in
detail in Chapter 7. Here, the key considerations in choosing these parameters are summarized
briefly (Deep Learning Toolbox, n.d.); a hedged MATLAB sketch of how such settings map
to training options follows the list.
The size and diversity of the data set: The larger the data set, the better the model
learns, although a large data set alone is not sufficient for a good model.
Mini-batch size: The larger the data set, the longer each calculation takes. For this
reason, the data is divided into small groups and learning proceeds over these mini-batches;
the mini-batch parameter specifies how many observations the model processes at the same
time.
Learning rate and momentum: In each iteration, the gradient of the error is computed
by the backpropagation method, the gradient is multiplied by the learning rate, and the result
is subtracted from the current weights to obtain the new weight values. The learning rate can
be fixed, decreased step by step, tied to a momentum term, or adapted during learning by
adaptive algorithms. In practice the learning rate often starts at a default of 0.01 and is reduced
to 0.001 after a certain number of epochs. The momentum coefficient is typically 0.9 and
usually varies in the range 0.8–0.99.
Epoch number: During training, the data is passed through the network in parts rather
than all at once, and the weights are updated by backpropagation according to the resulting
error. An epoch is one complete pass of the training data through this procedure.
Weight values and activation function: There are various weight initialization methods
that affect the learning speed of the model; these are explained in detail in Section 6.3.2. In
multilayer artificial neural network models, activation functions such as sigmoid, tanh, and
ReLU are mostly used in the hidden layers for nonlinear transformations, and their derivatives
are used when backpropagating through the hidden layers.
Dropout value: Randomly dropping (forgetting) a fraction of the connections during
training has been observed to increase the success of learning in fully connected networks.
The dropout rate is defined as a value in the range [0, 1], and 0.5 is commonly used.
Layers, number of neurons in each layer, and pooling: The feature that distinguishes
DL from classical ANNs is the large number of layers, which is why it is called deep learning.
As the number of layers increases, additional features of the data are learned at successive
stages, so learning is more successful. The number of neurons indicates the amount of
information stored in memory. The better the hardware and processor, the larger the kernels
that can operate on the data matrix in each layer, and the more effective the pooling (filtering)
applied to the kernel outputs.
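As announced above, the following is a minimal MATLAB (Deep Learning Toolbox) sketch of how hyperparameters of this kind are typically expressed as training options. The values follow the rules of thumb given in the list (initial rate 0.01 dropping toward 0.001, momentum 0.9, mini-batches, shuffling every epoch); they are examples, not necessarily the settings of the final thesis models.

    % Illustrative sketch: mapping the hyperparameters above to MATLAB training options.
    opts = trainingOptions('sgdm', ...
        'Momentum', 0.9, ...
        'InitialLearnRate', 0.01, ...
        'LearnRateSchedule', 'piecewise', ...   % reduce the learning rate during training
        'LearnRateDropFactor', 0.1, ...         % 0.01 -> 0.001 after the drop period
        'LearnRateDropPeriod', 50, ...
        'MaxEpochs', 100, ...
        'MiniBatchSize', 20, ...
        'Shuffle', 'every-epoch', ...
        'Plots', 'training-progress');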
10. EMPIRICAL ANALYSIS
In this section, the models used in the study are introduced. To measure how changes
in the selected macroeconomic variables explain changes in the Istanbul Stock Exchange
index, Logit and Probit models from the class of qualitative response regression models and
a DL model from AI technologies are estimated, and the effectiveness of these modeling
techniques is tested.
Descriptive statistics describe the general characteristics of a data set, summarize the
series, show the typical values around which the observations cluster, and provide information
about the distributions.
The standard deviation is the most widely used measure of dispersion because it
expresses how close the observations in a data set are to the mean. A small standard deviation
indicates that deviations from the mean are small; a large standard deviation indicates that the
data diverge from the mean. The variance is the sum of the squared deviations from the
arithmetic mean divided by the number of observations, and the standard deviation is the
square root of the variance.
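In symbols, for observations x_1, ..., x_n with arithmetic mean \bar{x} (the sample version divides by n-1 instead of n):

\[
s^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2,
\qquad
s = \sqrt{s^2}.
\]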
Summary statistics for the original series of the dependent and independent variables
are given in Table 10.1 below. The summary statistics show that the macroeconomic variables
were exposed to fluctuations over the 2001:01–2022:01 period, so their variability is high.
The series are skewed, leaning toward values higher than the average, partly because of the
effects of the economic crises; this indicates that the financial series depart from the normal
distribution. The averages of the macroeconomic variables differ, and their standard
deviations also show substantial differences.
Table 10.1. Some descriptive statistics for the dependent and independent variables
Time series are numerical quantities in which the values of variables are observed
consecutively from one period to the next. A time series is considered stationary if it does
not continuously increase or decrease over a given period and the data scatter around a
horizontal axis over time; in other words, a time series is stationary if its mean, variance, and
covariances do not change systematically.
The stationarity of the series is a necessary assumption for efficient and consistent
estimation. Economic time series, however, tend to grow over time and are therefore not
stationary. The assumptions of the regression model require both the dependent and the
independent series to be stationary and the errors to have zero mean and finite variance;
otherwise, regression analysis on non-stationary series may produce spurious regressions. A
spurious regression typically shows a high R2 and significant t-statistics, but the parameter
estimates are meaningless for economic interpretation. If the variables are cointegrated, then
even though the variables are not stationary individually, a linear combination of them
removes their common trends, the series becomes stationary, and the spurious regression
problem disappears. Cointegration means that the error term obtained from a linear
combination of non-stationary series is stationary. For cointegration to exist, the series must
be integrated of the same order and the error term obtained from their linear combination
must also be stationary.
While there are various methods for examining the stationarity of time series, the unit
root test is one of them; a unit root in a series means that the series is not stationary. The most
commonly used unit root tests are the Dickey-Fuller test and the Augmented Dickey-Fuller
test (ADF; Dickey & Fuller, 1979), developed as an extension of it. In addition, the
Phillips-Perron (1988) test, which adds a non-parametric correction to the error terms, is used
because these tests can be inadequate when economic time series are subject to structural
breaks (Wooldridge, 2010, pp. 640–642). This correction mechanism also incorporates AR
(autoregressive) and MA (moving average) corrections into the DF and ADF framework, so
the PP test can be said to rest on an ARMA (autoregressive moving average) process (Phillips
& Perron, 1988, pp. 335–346).
The unit root tests and hypotheses used in the study are as follows:
The starting point is the stochastic unit root process, which is a first-order
autoregressive process,

\[
Y_t = \rho\, Y_{t-1} + u_t ,
\]

where u_t is a white-noise error term. If \rho = 1, the series contains a unit root and the model
above is a non-stationary stochastic process. Since \rho cannot be tested directly with ordinary
least squares (OLS) estimators in that case, Y_{t-1} is subtracted from both sides of the model
to obtain the Dickey-Fuller regression (given in Figure 10.2),

\[
\Delta Y_t = (\rho - 1)\, Y_{t-1} + u_t = \delta\, Y_{t-1} + u_t ,
\]

with the hypotheses H_0: \delta = 0 (the series contains a unit root and is non-stationary)
against H_1: \delta < 0 (the series is stationary). At the \alpha = 0.05 significance level, if the
p-value is smaller than \alpha, H_0 is rejected and the series is said to be stationary;
equivalently, if the absolute value of the calculated tau statistic exceeds the MacKinnon DF
critical value, the null hypothesis of a unit root is rejected.
The Dickey-Fuller test assumes that the error terms are statistically independent and
have constant variance. In the Augmented Dickey-Fuller unit root test, lagged values of the
dependent variable are added to the regression, which eliminates the autocorrelation problem
in the error term. The Phillips-Perron unit root test instead uses non-parametric statistical
methods, without adding lagged difference terms, to account for serial correlation
(autocorrelation) in the error term (Gujarati & Porter, 2009, p. 758).
The critical values of the Phillips-Perron (PP) test are the same as those of the ADF
test. The PP unit root test relaxes the Dickey-Fuller assumptions on the error term. The
Phillips-Perron regression equations are expressed below (Phillips & Perron, 1988, pp.
335–346):
\[
Y_t = \hat{\mu} + \hat{\alpha}\, Y_{t-1} + \hat{u}_t
\qquad \text{and} \qquad
Y_t = \tilde{\mu} + \tilde{\beta}\,\bigl(t - T/2\bigr) + \tilde{\alpha}\, Y_{t-1} + \tilde{u}_t ,
\]

where u_t denotes a white-noise process with zero expected mean and T is the number of
observations. No assumption of serial independence or of homogeneity of the error terms is
required here. Phillips and Perron derived test statistics for hypotheses about the \alpha and
\beta coefficients; whereas the Dickey-Fuller unit root tests use the tau statistic, the
Phillips-Perron test uses its non-parametrically corrected Z counterparts (Phillips & Perron, 1988).
Economic models are established by taking into account the equilibrium relations
envisaged in economic theory, and the existence of significant econometric relations between
the variables in a model also depends on the stationarity of the series. In a stationary process,
the series fluctuates around a fixed long-term mean and the effect of any shock is not
permanent. A non-stationary series, by contrast, does not return to a long-term deterministic
path, and short-term shocks are carried permanently into its long-term values. Financial time
series are usually not stationary because they show a volatile structure; therefore, the variables
to be used in the analysis must first be made stationary.
A first indication of whether a time series is stationary can be obtained from its graph.
As can be seen in Figure 10.4, which shows the original series of the variables, none of the
dependent and independent variables used in the regression models is stationary; in other
words, the series contain unit roots, and trend or seasonal movements are observed in each
variable.
Although there are many unit root testing techniques, the most popular for determining
whether financial time series are stationary are the ADF (1981) and PP (1988) tests. The
stationarity of the series was tested using three specifications: a model with a constant, a
model with a constant and trend, and a model with neither constant nor trend. The results of
the stationarity analysis are shown in Table 10.2.
When the test statistics in Table 10.2 are examined, it is seen that the ADF and PP
statistics indicate that the original series of the dependent and independent variables are not
stationary: all variables except the active bond interest rate contain a unit root at level. The
graphs of the series in Figure 10.4 above also support this result. When the first-order
differences of the original series were taken, the series still did not all become stationary and
their orders of integration differed. For this reason, the stationarity analysis was repeated
using the natural logarithms of the series.
If a time series becomes stationary when differenced d times, the series is said to be
integrated of order d, denoted I(d). When the first differences of the series (except the active
bond interest rate) are taken, the series are not all integrated at the same level: the PPI series
is integrated at I(2), while the other series are integrated at I(1). If a series is not stationary,
forecasts cannot be made by generalizing from it. Even when the series are not individually
stationary, however, their linear combinations can be stationary. For this reason, after the
ADF test the Engle-Granger procedure was applied: the ADF test was run on the residuals
obtained from the cointegrating regression and the null hypothesis of a unit root was rejected.
Therefore, according to Engle-Granger, the series are cointegrated, and the regression reflects
a genuine long-run relationship rather than a spurious one.
Table 10.3. ADF and PP unit root test results for the first differences of the logarithmic series
Variable    ADF statistic [lag]          PP statistic           1% c.v.    5% c.v.    10% c.v.
dlISE100    -17.4022 [0] *, **, ***      -17.5399 *, **, ***    -3.9951    -3.4279    -3.1373
dlERUSD     -12.1062 [1] *, **, ***      -9.8556 *, **, ***     -3.9951    -3.4279    -3.1373
dlPPI       -7.2550 [1] *, **, ***       -5.6983 *, **, ***     -3.9951    -3.4279    -3.1373
dlIPI       -6.1426 [11] *, **, ***      -87.3719 *, **, ***    -3.9951    -3.4279    -3.1373
dlM2        -16.4110 [0] *, **, ***      -16.4004 *, **, ***    -3.9951    -3.4279    -3.1373
dlGP        -11.4887 [1] *, **, ***      -11.3133 *, **, ***    -3.9951    -3.4279    -3.1373
dlBRT       -13.7588 [0] *, **, ***      -13.8688 *, **, ***    -3.9951    -3.4279    -3.1373
Note: All three models were tried in the level values and first-order differences of the series, and constant term and
trend models were used in model selection as long as they were meaningful. The values in square brackets represent
the appropriate lag length of the variables determined according to the SIC; the minimum lag length at which
autocorrelation is removed was chosen. d stands for first-order difference. *, ** and *** indicate stationarity at the
1%, 5% and 10% significance levels, respectively.
When the test statistics in Table 10.3 are examined, it is seen that the ADF and PP
statistics for the first-order differences of the dependent and independent variables are
statistically significant. All variables are stationary at the I(1) level and their differenced
series do not contain unit roots. All series are integrated of order one, and the Phillips-Perron
results confirm the ADF results. For this reason, the first-order differences of the variables
are used in the estimation of the models.
As seen in Table 10.4, the correlation coefficients between the independent variables
remain below 0.73 and mostly demonstrate weak linear relationships.
In the linear probability model, the conditional expected value of the dependent
variable is the conditional probability of the event occurring given the values of the
independent variables. The linear probability model can be estimated with the classical least
squares method; however, the error term is not normally distributed and is heteroscedastic,
the fitted values can fall outside the 0–1 range, and the model generally has low explanatory
power. The Logit and Probit models were developed as alternatives to avoid these problems
(Bengio et al., 2015, p. 584; Gujarati & Porter, 2009, pp. 552–553). The difference between
the two models is that the logistic curve has thicker tails, approaching the axes (0 and 1) more
slowly than the probit curve; the difference between the coefficients obtained from the two
models arises from the different functions used for the probabilities (Gujarati & Porter, 2009,
p. 571). Which model to use in an application is left to the preference of the modeler.
The parameters of the Logit and Probit models are estimated with the maximum
likelihood method, which assumes a specific probability distribution for the error term
(normal for the probit, logistic for the logit). When the series are stationary, this distributional
assumption on the error term is satisfied. Therefore, before estimating the parameters of the
Logit and Probit regression models, which are qualitative response models, a stationarity
analysis of the variables used in the model is required.
Because the parameters of the Logit and Probit models cannot be estimated with the
classical least squares method, the maximum likelihood method is used. Under this method,
the standard normal Z statistic is used for the statistical significance of the individual
coefficients, while the joint significance of the coefficients is tested with the Wald, Score
(Lagrange Multiplier, LM), and Likelihood Ratio (LR) tests. In this analysis, the Z statistic
was used for the significance of the coefficients and the LR statistic for the overall
significance of the models.
For binary logit and probit models, the Stata program reports a Pseudo-R2 value as a
measure of goodness of fit, and the Eviews 12 program reports the McFadden R2. The
Pseudo-R2 was used in this study.
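For reference, a sketch of the two statistics mentioned here in standard notation, where \ln L_1 is the log-likelihood of the fitted model and \ln L_0 that of the intercept-only model:

\[
LR = -2\left(\ln L_0 - \ln L_1\right) \sim \chi^2_k,
\qquad
\text{McFadden pseudo-}R^2 = 1 - \frac{\ln L_1}{\ln L_0},
\]

with k the number of explanatory variables whose joint significance is tested.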
The coefficients obtained from Logit and Probit models cannot be interpreted directly
as in a linear regression model. However, the sign of a coefficient indicates the direction of
the relationship between the corresponding independent variable and the probability of the
event occurring: a negative sign indicates an inverse relationship and a positive sign a direct
relationship.
In this study, both Logit and Probit models were estimated with the macroeconomic
variables in question; for this purpose, a threshold value was calculated for the dependent
variable, the BIST100 index, so that it could be transformed into a binary variable with a
qualitative response. The geometric mean of the BIST100 series was taken, and the variable
was converted to a binary form taking the value 0 below the geometric mean and 1 above it.
The geometric mean is a measure of central tendency based on the relative (geometric) rather
than arithmetic differences between observations; it covers all observations in the data set,
conforms to algebraic operations, and allows relative numbers to be processed.
To investigate whether the variables are suitable for the Logit and the Probit model,
each independent variable was first added separately to a Logit and a Probit model with the
binary dependent variable, and the relationship between the dependent variable and each
independent variable was examined. The test results obtained in this way are given in Tables
10.5 and 10.6 below.
Table 10.5. Single-variable model results (dependent variable: BinaryISE100)
Variable    Coefficient   Prob.    Log-Likelihood   LR Statistic   Prob. (LR St.)
TYBY        0.763828      0.0000   -94.087715       147.70         0.0000
LnERUSD     24294.1       0.0000   -82.945841       169.99         0.0000
LnPPI       553424.8      0.0000   -40.502838       254.87         0.0000
LnIPI       4.48e+07      0.0000   -44.480443       246.92         0.0000
LnM2        1.33e+50      0.0000   -38.382249       259.11         0.0000
LnGP        136.1367      0.0000   -41.043973       253.79         0.0000
LnBRT       12.36735      0.0000   -134.60854       66.66          0.0000
Table 10.6. Single-variable model results (dependent variable: BinaryISE100)
Variable    Coefficient   Prob.    Log-Likelihood   LR Statistic   Prob. (LR St.)
TYBY        -0.160051     0.0000   -93.265235       149.35         0.0000
LnERUSD     5.747567      0.0000   -83.387448       169.10         0.0000
LnPPI       7.528884      0.0000   -39.850861       256.18         0.0000
LnIPI       9.398173      0.0000   -44.761852       246.36         0.0000
LnM2        66.07681      0.0000   -37.872111       260.13         0.0000
LnGP        2.776939      0.0000   -40.080625       255.72         0.0000
LnBRT       1.452791      0.0000   -135.30463       65.27          0.0000
As can be seen from the compatibility results in Tables 10.5 and 10.6 above, each
variable is compatible with the binary dependent variable ISE100 in both the Logit and the
Probit framework. A 1% change in each variable changes the probability that the ISE100 lies
above or below its geometric mean. In other words, each variable is statistically significant
and helps explain the dependent variable, and the results are in line with our economic
expectations. Therefore, the variables are suitable for both the Logit and the Probit model;
however, since the effects of the interest rate and gold prices in the Probit model are in line
with expectations, that model is considered more appropriate.
In this study, the Logit and Probit models were estimated with the maximum likelihood
method, taking into account the data and the evaluations above, and the results are summarized
in Table 10.7. The variables used in the models are statistically significant at the 1%, 5%, or
10% significance levels and are important determinants of the dependent variable.
To test the overall significance of the Logit and Probit models, the Hosmer-Lemeshow
and Pearson goodness-of-fit tests, which are right-tailed tests, were performed. The Akaike
Information Criterion (1973), an indicator of the goodness of fit of an estimated statistical
model, and the Schwarz or Bayesian Information Criterion (1978), which enables model
selection among a group of parametric models with different numbers of parameters, were
also calculated; according to these criteria, the model with the smallest AIC or BIC value is
the best model. In addition, the correct classification rates were found to be 95.6 percent and
94.8 percent, respectively.
Since each coefficient obtained from the Logit and Probit models is a log-odds value,
the coefficients cannot be interpreted directly as in linear regression models. However, their
signs indicate the direction of the relationship between the variables and the probability of
the event occurring: a negative coefficient implies an inverse relationship and a positive
coefficient a relationship in the same direction.
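As a brief reminder of the standard interpretation (not a result specific to this thesis), in the logit model the exponential of a coefficient is an odds ratio:

\[
\frac{P(Y_i = 1 \mid X_i)}{1 - P(Y_i = 1 \mid X_i)} = e^{X_i'\beta},
\qquad
e^{\beta_j} = \text{the factor by which the odds change when } x_j \text{ rises by one unit.}
\]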
Table 10.7. Results of the Logit model & the Probit model
Note: *, **, and *** represent the 1%, 5%, and 10% significance levels, respectively.
As seen in Table 10.7, the coefficients obtained in the Logit and Probit models are
statistically significant at the 5% and 10% significance levels, respectively. When the
coefficients are evaluated one by one, the results meet our expectations: the expected inverse
relationship with the active bond interest rate and with gold prices is observed, and the
positive relationships with the other variables also match expectations. As the active bond
interest rate increases, its return increases, so investor demand shifts from the stock market
to this instrument; investors react in the same way when gold prices increase, choosing gold
as a safe haven. In other words, there is a negative relationship between the stock market
index and both the two-year bond rate and gold prices. A 1% increase or decrease in the other
macroeconomic variables in the model raises or lowers the probability that the dependent
variable takes a value above its average, at rates determined by the coefficients of the
variables.
These results are confirmed by four separate tests of the overall significance of the
models. Since the probability values of the Hosmer-Lemeshow and Pearson tests are higher
than 0.05, the models are statistically adequate, and the AIC and BIC criteria are also low.
The correct classification rates were 95.65% for the Logit model and 94.86% for the Probit
model. In addition, the LR test statistics and their probability values show that the coefficients
in both the Logit and the Probit model are jointly significant. The Pseudo-R2 values reported
by Stata, which indicate the explanatory power of the whole model, were 0.865 for the Logit
model and 0.866 for the Probit model. The correct classification results for the two models
are shown in Table 10.8 and Table 10.9.
Table 10.8. Correct classification rate table for the Logit model
Classified       D      F      Total
Positive         151    5      156
Negative         6      91     97
Total            157    96     253
Correctly
Classified (%)   96.79  93.81  95.65

Table 10.9. Correct classification rate table for the Probit model
Classified       D      F      Total
Positive         150    6      156
Negative         7      90     97
Total            157    96     253
Correctly
Classified (%)   96.15  92.78  94.86
In the Probit model, unlike the Logit model, gold prices appear, in line with our
economic expectations, as a variable that affects the stock market index negatively, alongside
the two-year bond interest rate, and the variable is statistically significant. Gold is an
important investment instrument all over the world and has long been seen as a safe haven.
It is evident that many investors have been drawn to gold by the continuous increase in its
price in recent years. There is no doubt that speculative attacks on the stock market,
interventions in the capital market, the lack of transparency and reliability of the capital
market, and speculative news on these subjects all have a negative impact on investors.
Because these factors make small investors nervous, they push investors toward safer
instruments, and the failure to transform savings into investment leads to a contraction in the
economy. When this spiral-like mechanism operates in the opposite direction, it affects social
welfare negatively.
Comparing the performance of the Logit and Probit models, the goodness-of-fit values
of the two models are very close to each other; the main difference is that the logistic
distribution has slightly thicker tails, and in practice there is little reason to prefer one over
the other beyond comparative mathematical simplicity. The correct classification rate differs
only slightly, 95.65% for the Logit model versus 94.86% for the Probit model, a gap of about
8 per thousand. However, the variables in the Probit model were estimated in a way that
meets economic expectations, and the statistical tests of the variables included in the model
also indicate this. In addition, the Pseudo-R2 values of both models were above 86%.
In this study, it was investigated how the DL model, a method based on AI technologies
and built on ANNs, can be used to measure how well the selected macroeconomic variables
explain the Istanbul Stock Exchange index values. Many empirical studies have shown that
AI applications, especially in estimating non-linear time series, give results as good as or
better than well-known traditional econometric methods, and because of this success they are
used as effective forecasting tools (Akel & Karacameydan, 2018; Jing et al., 2021; G. Zhang
et al., 1998, p. 40).
The preprocessing stage consists of normalizing the training and test data in accordance
with the DL networks that have been set up. Normalization is one of the most important
factors determining the success of a DL model. Whether the data enter the system in raw or
processed form should also be decided according to the researcher's purpose (Kaastra &
Boyd, 1996, p. 222). Data normalization compresses the outputs of the processing elements
into the range [0, 1] or [-1, 1], in line with the activation functions used in the hidden and
output layers (Altan, 2008; Gomes et al., 2016; G. Zhang et al., 1998, p. 49).
The data used in this study consist of non-linear, seasonally unadjusted, trend-containing
time series, with the seven macroeconomic variables as independent variables and the Borsa
Istanbul index values as the dependent variable. For this reason, the series were normalized
to the range [-1, 1]. The graphs of the normalized independent variable series are given in
Figure 10.5 below.
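The exact normalization formula is not legible in the source; a common choice consistent with a [-1, 1] range is min-max scaling, sketched below in MATLAB as an assumption rather than as the thesis's own equation.

    % Hedged sketch: min-max normalization to [-1, 1] (an assumption, not the thesis formula).
    normalizeSeries = @(x) 2 * (x - min(x)) ./ (max(x) - min(x)) - 1;

    % Example on a hypothetical column vector 'm2' standing in for the money supply series:
    m2 = cumsum(abs(randn(253,1)));    % placeholder data
    m2norm = normalizeSeries(m2);      % values now lie in [-1, 1]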
Many different models can be produced with the DL method, which is based on the
working principles of ANNs, by using different criteria. The performance of the resulting
models depends on many parameters, such as the number of inputs, the number of input
layers, the number of hidden layers, the activation functions, the learning method, the learning
rate, the mini-batch size, the number of epochs, the weight values, the memorization (dropout)
value, and even the speed of the processor used in the analysis; it is closely tied to such
critical components of model capacity. The most widely used ANN model in the literature is
the multilayer perceptron, and the most widely used DL model is the LSTM (W. Chen et al.,
2017; Cheng et al., 2022a; Jing et al., 2021; Sarker, 2021).
DL, one of the AI approaches that makes it possible to design deep neural networks,
including with pre-trained models, is used on image, time series, and text data. Convolutional
neural networks (ConvNets, CNN) and Long Short-Term Memory (LSTM) networks can be
used for both classification and regression. An LSTM network is a recurrent neural network
(RNN) that processes input data by looping over the time steps and updating the network
state. The network carries information remembered from all previous time steps and, using
this information, predicts the value a time series may take in the next period. In this method,
there are two forecasting modes: open-loop and closed-loop forecasting.
The activation layer uses activation functions such as ReLU, sigmoid, and the step
function, which are the most common in the literature; the ReLU activation function is used
in this study. In the pooling layer, the usual choices are maximum, minimum, and average
pooling, with maximum pooling the most widely used. In the dropout layer, the algorithm
makes the network forget part of the data in order to prevent memorization during training;
for this reason, a dropout layer is used in the network architecture.
Since the model created here is an LSTM model, the fully connected (FC) layer and
the final layers of this architecture are also used. The data is transformed into a
one-dimensional vector and passed to the final layer. As in ANNs, the last layer of a DL
classification network evaluates the data coming from the preceding fully connected layer and
produces the outputs of the network; the SoftMax classifier, a probabilistic calculation
method, is generally used in this layer, producing values in the range 0–1 for each possible
class.
An LSTM model must have a number of input neurons matching the dimension of the
input data. In the LSTM network created here, there are seven input neurons, 200 hidden
units, fully connected layers of 50 units, a dropout rate of 50%, and an output layer. In the
training phase, after several trials, the adam and sgdm optimization functions were used with
100 epochs, a mini-batch size of 20, an initial learning rate of 0.01, and a gradient threshold
of 1. The data is shuffled at every epoch in order to prevent the network from memorizing.
The general structure of the DL prediction models created with these parameters is given in
Figure 10.6.
Figure 10.6. General structure of the LSTM DL prediction model: an input layer with seven
variables (TYBY, ERUSD, PPI, IPI, M2, GP, BRT), an LSTM layer with 200 hidden units, a
fully connected layer FullyCon1(50), a dropout layer (50%), a second fully connected layer
FullyCon2(50), and an output layer producing the ISE100 value.
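A hedged MATLAB (Deep Learning Toolbox) sketch of a layer stack consistent with this description and with Figure 10.6 is shown below. The exact ordering of the dropout and fully connected layers is an assumption based on the figure, and a regression output layer is used because the target (ISE100) is continuous; this is an illustration, not the thesis code.

    % Illustrative sketch of the described LSTM architecture.
    numFeatures = 7;     % TYBY, ERUSD, PPI, IPI, M2, GP, BRT
    numHidden   = 200;   % LSTM hidden units
    layers = [
        sequenceInputLayer(numFeatures)
        lstmLayer(numHidden, 'OutputMode', 'sequence')
        fullyConnectedLayer(50)          % FullyCon1(50) in Figure 10.6
        dropoutLayer(0.5)                % 50% dropout
        fullyConnectedLayer(50)          % FullyCon2(50) in Figure 10.6
        fullyConnectedLayer(1)           % single output: the ISE100 value
        regressionLayer];
    % Training would then call trainNetwork with appropriately shaped sequence data
    % and training options such as those sketched earlier.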
The models were developed in the MATLAB R2021b environment on hardware running
Windows 10 Pro 64-bit, with an Intel(R) Core(TM) i7-10510U CPU @ 1.80 GHz (8 CPUs, up
to 2.30 GHz), 16 GB of RAM, and an Intel(R) UHD Graphics card.
The parameters that make up the AI architecture of DL models are given in Table
10.10.
While the data used for the solution of the established DL model forms the training
set, the test set consists of data that the network did not see during training; it tests whether
the network has learned well. In the literature, training and test sets are split at various ratios,
such as 90%-10%, 80%-20%, or 70%-30%; the most common choice is 70%-30%, although
it varies with the problem (Chong et al., 2017a, pp. 187–205; Jing et al., 2021, pp. 5–8;
G. Zhang et al., 1998, p. 50). In some studies the data set is divided into three groups: training,
validation, and test sets (Kaastra & Boyd, 1996, p. 220). In this study, the data set was divided
into training and test sets at several different ratios, and models were created for each ratio
using the adam and sgdm optimization functions. Analyses were made with the models created
according to these parameters, and the findings are presented in the tables below for
comparison.
The data set was divided into a 70% training set and a 30% test set, the split most
widely used in the literature. The first 177 observations (the 2001:01–2015:08 period) form
the training set and the last 76 observations (the 2015:09–2021:02 period) form the test set.
Another reason for placing the test set in this range is to show whether the global crisis, which
started in 2008 and showed its effects until the beginning of 2009, and its seasonal effects
could be learned by the model.
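A minimal sketch of this chronological split in MATLAB (the variable names Xall and Yall and the placeholder data are illustrative):

    % Illustrative sketch: chronological 70%/30% split (177 training, 76 test observations).
    Xall = randn(253,7);  Yall = randn(253,1);   % placeholders for the normalized data

    nTrain = 177;
    XTrain = Xall(1:nTrain, :);      YTrain = Yall(1:nTrain);       % 2001:01-2015:08
    XTest  = Xall(nTrain+1:end, :);  YTest  = Yall(nTrain+1:end);   % remaining 76 observations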
To evaluate the models, the performance criteria explained in Section 6.12, which are
frequently used in performance measurements of MLPs, are used. These are metrics such as
R2, MSE (mean squared error), SSE (sum of squared errors), and RMSE (root mean squared
error). The most appropriate network structure was obtained by comparing the findings of the
training and testing phases in light of these performance criteria.
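A short MATLAB sketch of these metrics, assuming a vector y of actual values and a vector yhat of model predictions (both placeholders here):

    % Illustrative sketch: computing SSE, MSE, RMSE and R-squared for a set of predictions.
    y    = randn(76,1);            % placeholder actual values
    yhat = y + 0.1*randn(76,1);    % placeholder predictions

    sse  = sum((y - yhat).^2);                 % sum of squared errors
    mse  = mean((y - yhat).^2);                % mean squared error
    rmse = sqrt(mse);                          % root mean squared error
    r2   = 1 - sse / sum((y - mean(y)).^2);    % coefficient of determination

    fprintf('SSE=%.4f  MSE=%.4f  RMSE=%.4f  R2=%.4f\n', sse, mse, rmse, r2);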
Table 10.11. Simulation results for adam
The effectiveness of each of the seven DL models developed was evaluated using the
MSE and RMSE performance metrics. According to the results in Table 10.11, DL networks
were created with the unnormalized data set using the adam optimization algorithm, and, as
expected, the performance values differed according to how the data set was partitioned. The
training set reached its smallest MSE value at a 40% split; however, the test set did not reach
a global minimum and the error indicators remained high.
As can be seen in Table 10.12, seven DL network models were also created with the
unnormalized data set using the sgdm optimization algorithm. Again the training set reached
its smallest MSE at a 40% split, while the test set did not reach a global minimum and the
error indicators remained high. These two tables show that, if the analysis is performed
without normalizing the series, the error values of the DL network models remain high
regardless of the optimization algorithm.
Table 10.13 below provides a summary of the outcomes of the examination of DL
models, which were developed by normalizing the research's data set in accordance with the
normal distribution and employing the adam and sgdm optimization algorithms.
         Train (70%)          Test (30%)
         MSE      RMSE        MSE      RMSE
adam     0.0440   0.2099      0.1396   0.3736
sgdm     0.0448   0.2117      0.1444   0.3800
As can be seen from the tables above, the system that gives the best results is a linear
system, and the best results are achieved by normalization.
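Since Table 10.13 shows that normalization markedly improves the results, a plausible reading of "normalizing in accordance with the normal distribution" is z-score standardization, sketched below using the training-set statistics; whether the thesis used exactly this transformation is an assumption.

% Z-score standardization (assumed interpretation of normalizing the series).
mu    = mean(XTrain, 2);                 % per-variable mean over the training set
sigma = std(XTrain, 0, 2);               % per-variable standard deviation
XTrainNorm = (XTrain - mu) ./ sigma;     % standardized training inputs
XTestNorm  = (XTest  - mu) ./ sigma;     % test inputs scaled with training statistics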
Figure 10.7 shows how effectively the six-layer LSTM DL model learns during the training phase when the optimization algorithm is set to adam and the number of epochs to 100 (taking the other determined parameters into account). As the number of epochs increases, the error against the original series gradually decreases and moves toward the global minimum.
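Continuing the earlier sketches, the adam training run behind this training-progress figure could be configured roughly as follows. Only the solver name and the 100 epochs are taken from the text; the remaining option values and the sequence-to-sequence setup are illustrative assumptions.

% Training configuration for the adam run (sketch; MaxEpochs = 100 from the text,
% other options illustrative). layers, XTrain, and yTrain come from the sketches above.
options = trainingOptions('adam', ...
    'MaxEpochs', 100, ...
    'Shuffle', 'never', ...               % preserve the chronological order of the series
    'Plots', 'training-progress', ...     % progress plot comparable to the figure
    'Verbose', false);

net = trainNetwork(XTrain, yTrain, layers, options);   % fit the LSTM model
yPredTrain = predict(net, XTrain);                     % in-sample predictions
yPredTest  = predict(net, XTest);                      % out-of-sample predictions

Replacing 'adam' with 'sgdm' in trainingOptions would give the corresponding sgdm run discussed below.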
The graphs in Figure 10.8 below show the training and test curves obtained from the analysis using the adam optimization algorithm. As the degree of the mathematical expressions of the regression models chosen to fit the training and test curves increases, the complexity that the system can represent increases, and this turns into an advantage: an otherwise obscure system becomes identifiable with a simpler expression.
It can be seen that the LSTM model with two fully connected layers aids in the simplification of complex problems. The mathematical equation fitted to the LSTM model's curves is a fourth-order function, and the R2 value is quite high at 0.95. This shows that the model is significant and that the explanatory power of the variables is quite high. In other words, the network performed very successfully during the training and testing phases. The model is a good forecasting model, and the margin of error in the estimates was 0.0440.
Figure 10.9. Training progress with sgdm
Figure 10.9 shows how effectively the six-layer LSTM DL model learns during the training phase when the optimization algorithm is set to sgdm and the number of epochs to 100 (taking the other determined parameters into account). As the number of epochs increases, the error against the original series gradually decreases and moves toward the global minimum.
The graphs in Figure 10.10 below show the training and test curves obtained from the analysis using the sgdm optimization algorithm. As the degree of the mathematical expressions of the regression models chosen to fit the training and test curves increases, the complexity that the system can represent increases, and this turns into an advantage: an otherwise obscure system becomes identifiable with a simpler expression.
The correctness and sufficiency of the learning of an artificial neural network is tested with the test set. According to Figure 10.8 and Figure 10.10, which give the training and test phase results, the training and test phases were completed successfully. As can be seen from these graphs, which show the estimation values alongside the real values, the estimation errors are quite low and gradually approach zero. In order to measure the prediction success of the deep neural network models created after the training and testing phases, the findings in Table 10.14, which report the coefficient of determination (R2), the mean squared error, and the root mean squared error, are used.
         Train (70%)                    Test (30%)
         R2       MSE      RMSE         R2       MSE      RMSE
adam     0.9501   0.0440   0.2099       0.9156   0.1396   0.3736
sgdm     0.9759   0.0448   0.2117       0.9919   0.1444   0.3800
When the findings in Table 10.14 and the graphs showing the results of the training and testing phases are evaluated together, it is seen that the error margins of the networks in both phases are quite low and approach zero. In other words, the multilayer LSTM deep neural network models perform quite successfully in both the in-sample and out-of-sample periods. During the in-sample period, the explanatory power (R2) of the multilayer deep neural network model developed to estimate the ISE100 index value was 97.59%, while its MSE value was at its lowest level, 0.0448.
Out-of-sample predictions are of greater importance when comparing prediction accuracies, as they are made on new data that were not used in estimating the model. Considering this, the created multilayer deep neural network model explains the out-of-sample ISE100 values at a very high level of 99.19%, and its MSE value was at its lowest, 0.1444. In general, there is a strong and positive relationship between the independent variables and the ISE100 value; this relationship exceeds 50% for all variables, and the MSE in particular is at a very low level.
Although the variables were not seasonally adjusted and dummy variables were not used during the creation of the models, the deep neural network models performed quite successfully. In general, the developed deep neural network models captured well the effects on the macroeconomic variables and the ISE100 index of the economic crisis in Turkey in 2001 and of the global economic crisis that emerged in the USA in mid-2007 and adversely affected all world economies in 2008.
The findings in Table 10.14 and the training and testing phase graphics given above
show that the deep neural network models created gave very good results and that the
networks successfully completed the training and testing phases. The positive performance
indicators show that deep neural network models can be used for prediction. However, the
prediction results will be presented comparatively in the last section.
Time series are numerical quantities in which the values of the variables are observed consecutively from one period to the next. Statistically, economic time series have a structure consisting of trend, seasonal, cyclical, and random movements (Kennedy, 1998, p. 288); therefore, they are not stationary.
It is very important for decision makers to be able to predict the future performance and
behavior of economic time series due to uncertainties in the economy. In this case, the model
established for a time series based on the observed values is expected to have a performance
that can predict the possible values that the series may take in the future (one day, one month,
or one year later). In other words, the forecasting performance of the model is expected to
be high.
In many empirical studies in the literature, econometric models have been used as predictive models; in recent years, AI and deep neural network models have been used in comparison with econometric models because of their high prediction performance. This is because AI models can match and even outperform classical models in predicting nonlinear, seasonal, and trending time series.
In this part of the study, the Logit, Probit, and DL methods are compared on the basis of their estimation performance. The performances of the best Logit and Probit models were examined, and the prediction results obtained from the regression analyses are presented comparatively in Table 10.15 and Table 10.16.
Table 10.15. Logit model performance
Classified                  D       F       Total
Positive                    151     5       156
Negative                    6       91      97
Total                       157     96      253
Correctly Classified (%)    96.79   93.81   95.65
Sensitivity (%)             96.18
Specificity (%)             94.79
Error Rate (%)              4.35
Table 10.16. Probit model performance
Classified                  D       F       Total
Positive                    150     6       156
Negative                    7       90      97
Total                       157     96      253
Correctly Classified (%)    96.15   92.78   94.86
Sensitivity (%)             95.54
Specificity (%)             93.75
Error Rate (%)              5.14
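The classification measures in Tables 10.15 and 10.16 follow directly from the cells of each confusion matrix. The short sketch below reproduces the Logit figures of Table 10.15 as a worked check, with class D treated as the positive class.

% Worked check of the Logit performance measures in Table 10.15 (D taken as the positive class).
TP = 151;  FP = 5;      % class D / class F observations classified as positive
FN = 6;    TN = 91;     % class D / class F observations classified as negative
total       = TP + FP + FN + TN;           % 253 observations
sensitivity = 100 * TP / (TP + FN);        % 96.18
specificity = 100 * TN / (TN + FP);        % 94.79
correctPos  = 100 * TP / (TP + FP);        % 96.79 (correctly classified, positive row)
correctNeg  = 100 * TN / (TN + FN);        % 93.81 (correctly classified, negative row)
accuracy    = 100 * (TP + TN) / total;     % 95.65 (overall correctly classified)
errorRate   = 100 - accuracy;              % 4.35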
The DL model learned the in-sample period series correctly at a rate of 98.81% with
a loss of 1.19% during the training phase. In other words, the rate at which the model learned
about the crises experienced in Turkey in 2000 and 2001 and the reflections of the 2008
global crisis is 98.81%.
As can be seen from Table 10.17, the DL model correctly learned the in-sample
period series and predicted 100% of the periods in the test data set for the out-of-sample
prediction. After 2015, the effects of the developments in the markets due to the political
events in July 2016 in Turkey and the price increases due to the COVID-19 pandemic in
2019, and their reflections on the market indicators, have been estimated at 100%. In other
words, the validation rate of the model is 100%.
Although the validation rates of all the created models are high, the validation percentage of the DL model is 100%, with a zero margin of error; in the other models, the margin of error is between 4.35% and 5.14%.
11. DISCUSSION
The effectiveness of AI has been demonstrated once again in this study, in which the DL method, one of the new generation of AI technologies, was tested alongside traditional statistical and econometric methods such as the Logit and Probit regression modeling techniques. The study found that AI technologies outperform classical econometric methods in the analysis of financial time series with a dynamic and volatile structure, and it shows in which direction and with what strength changes in the seven selected macroeconomic variables affect the future values of the ISE100 index. In fact, studies have shown that even the new modeling techniques developed with AI technologies outperform one another in estimation. The new developments that have occurred all over the world in recent years, the economic crises ongoing since 2008, the COVID-19 pandemic that started in 2020, and the military crisis between Russia and Ukraine in 2022 have revealed the necessity of evaluating the world as a whole rather than as separate countries and borders.
12. CONCLUSION AND RECOMMENDATIONS
ANNs are computer programs developed to imitate the working mechanism of the human brain and to realize the brain's abilities, such as learning, remembering, and generating new information through generalization, without external assistance. Owing to this feature, AI applications are successfully used in many fields, such as industry, business, finance, education, the military, defense, and health. One of the application areas of AI is solving prediction problems.
ANNs have been able to obtain very successful results in non-linear time series analysis owing to their non-linear structure. In this regard, AI has become a frequently used method in many areas of finance. Although AI models have been applied in many studies in finance examining issues such as predicting financial crises, determining the direction of exchange rates and general price levels, and especially measuring and selecting stock performance, there are only limited studies on the estimation of stock market index values.
Linear stochastic regression models can have an advantage over other models if they can capture and explain the important relationships between variables. However, linear models are insufficient when the relationship between the variables in the studied problem is not linear. At this point, AI models can make successful predictions when an appropriate network structure for the nonlinear relationships is determined.
In this study, the aim is to estimate the value of the ISE100 index for the period January 2001–January 2022 by using the DL modeling technique, one of the non-linear estimation models, after the variables to be used as inputs are determined with the Logit and Probit methods. First, general information about the size and development of stock markets in Turkey and around the world is given. Then, AI technologies such as ANNs and, finally, DL modeling, a machine learning method, are examined in detail. The DL model was designed, and the ISE100 value was modeled using the appropriate network architectures and seven macroeconomic variables. The estimated models were evaluated within themselves, and performance comparisons were made by performing Logit and Probit analyses. In the Logit and Probit analyses, the ISE100 index was
used as the dependent variable, with seven independent variables: the Dollar Rate (TL/$),
the Money Supply, the Producer Price Index, the Industrial Production Index, Gold Prices
(TL/Gr), the Active Bond Interest Rate, and the Brent Oil Price.
When the Logit and Probit model results are examined, it is seen that the created models are the best-fitting models. Since the probability values of the Hosmer-Lemeshow and Pearson goodness-of-fit tests were higher than 0.05, the models were statistically significant; they also had the lowest AIC and BIC values, and the highest Pseudo R2 values were 0.865 for the Logit model and 0.866 for the Probit model. The correct classification rates were 95.65% and 94.86% for the Logit and Probit models, respectively. It has been concluded that the estimation results of the Logit and Probit models are very close to each other. According to the Logit and Probit analysis results, the coefficients obtained for each independent variable are significant, and the coefficient sizes and test statistic values are close to each other. It has been observed that the Probit model gives better results than the Logit model.
The fact that the predictions made with the Logit and Probit analyses show large deviations from the real values, while the deep neural networks achieve better prediction performance even though seasonal effects were not modeled, supports the generalization that nonlinear modeling of financial variables, that is, the deep neural network method, is more effective.
There are very few studies that use deep neural networks or DL models to predict
financial markets and stock markets around the world. The method of ANNs has been used
in several studies to predict the direction of the Turkish stock market, but the DL model has
not been used. In this study, it is thought that the use of Logit and Probit analysis and the DL
model to predict the direction of the ISE100 index and the large number and variety of
macroeconomic variables included in the analysis add originality to the study.
Today, developments in the fields of communication technology and finance, the global economic crises that started in the US credit market and affected the national markets of all the countries of the world, and the socio-economic processes experienced after epidemics such as the COVID-19 pandemic have shown that international capital markets have taken on the character of a single national capital market. Models created with AI techniques, which can make successful predictions, model non-linear relationships, and thus explain the changes in the capital market, are an effective estimation tool for investors who want to hedge risk and earn returns in such a wide market. Because of these features, artificial neural network techniques are becoming an increasingly important problem-solving tool in the field of finance.
Based on this study, the following can be suggested as future research topics for the ISE100 index: while the basket of macroeconomic variables that affect the ISE100 value is expanded, the set of stocks studied can be narrowed; in other words, a similar analysis can be carried out for specific stocks in Turkey. Analyzing the value of the Turkish stock market over a longer period by comparing other econometric models with AI methods and/or by using hybrid models can be another subject of study.
13. REFERENCES
Abraham, B., & Malik, H. J. (1973). Multivariate Logistic Distributions. CRC Press., 1(3),
Aggarwal, M. (2020). Probit and Nested Logit Models Based on Fuzzy Measure. I
131 145.
Aladag, C. H., Egrioglu, E., & Kadilar, C. (2009). Forecasting Nonlinear Time Series with
Alaloul, W. S., & Qureshi, A. H. (2020). Data Processing Using Artificial Neural
pay
, 10(2),
Article 2.
Altay, E., & Satman, M. H. (2005). Stock market forecasting: Artificial neural network and
Anderson, D., & McNeill, G. (1992). Artificial Neural Networks Technology. Kaman
Science Corporation,.
Avci, E. (2007). Forecasting Daily and Sessional Returns of the ISE-100 Index with Neural
Avci, E. (2015). Stock Return Forecasts with Artificial Neural Network Models. Marmara
, 26(1), Article 1.
Barro, R. J. (1990). The Stock Market and Investment. Review of Financial Studies, 3(1),
115 131.
Bengio, Y., Goodfellow, I., & Courville, A. (2015). Deep Learning: Methods and
, 16, 31 46.
Bollen, N. P. B., & Busse, J. A. (2001). On the Timing Ability of Mutual Fund Managers.
/tr/sayfa/471/borsa-istanbul-hakkinda.
Boyer, B., & Zheng, L. (2009). Investor flows and stock market returns. Journal of
Campbell, J. Y., Lo, A. W., & MacKinlay, A. C. (1997). The Econometrics of Financial
Financial-Markets-John-Campbell/dp/0691043019
Cao, J., & Wang, J. (2020). Exploration of Stock index Change Prediction Model Based on
Chakravarty, S. (2005). Stock Market and Macro Economic Behavior in India. Institute of
Chen, A.-S., Leung, M. T., & Daouk, H. (2003). Application of Neural Networks to an
Emerging Financial Market: Forecasting and Trading The Taiwan Stock Index.
Chen, N.-F., Roll, R., & Ross, S. A. (1986). Economic Forces and the Stock Market. The
Chen, W., Zhang, Y., Yeo, C. K., Lau, C. T., & Lee, B. S. (2017). Stock market prediction
using neural network through news on online social networks. 2017 International
Cheng, D., Yang, F., Xiang, S., & Liu, J. (2022a). Financial Time Series Forecasting with
Cheng, D., Yang, F., Xiang, S., & Liu, J. (2022b). Financial time series forecasting with
Chong, E., Han, C., & Park, F. C. (2017a). Deep Learning Networks for Stock Market
Chong, E., Han, C., & Park, F. C. (2017b). Deep learning networks for stock market
Finansal Ekonometri
Cowles, A. (1933). Can Stock Market Forecasters Forecast? Econometrica, 1(3), 309 324.
Dase, R. K., & Pawar, D. D. (2010). Application of Artificial Neural Network for Stock
Davis, E. P., & Karim, D. (2008). Comparing Early Warning Systems for Banking Crises.
https://www.mathworks.com/products/deep-learning.html
Diler, A. I. (2003). Forecasting the Direction of the ISE National-100 Index by Neural
65 82.
Donaldson, R. G., & Kamstra, M. (1999). Neural Network Forecast Combining with
Dutta, G., Jha, P., Laha, A. K., & Mohan, N. (2006). Artificial Neural Network Models for
. Remzi Kitabevi.
https://www.nadirkitap.com/kuresel-finans-krizi-piyasa-sisteminin-elestirisi-mahfi-
egilmez-kitap14581620.html
Enke, D., Grauer, M., & Mehdiyev, N. (2011). Stock Market Prediction with Multiple
Enke, D., & Mehdiyev, N. (2013). Stock Market Prediction Using a Combination of
a Fuzzy Inference Neural Network. Intelligent Automation & Soft Computing,
Fama, E. F. (1990). Stock Returns, Expected Returns, and Real Activity. The Journal of
Fama, E. F., & French, K. R. (1996). Multifactor Explanations of Asset Pricing Anomalies.
Fathi, E., & Maleki Shoja, B. (2018a). Deep Neural Networks for Natural Language
Fathi, E., & Maleki Shoja, B. (2018b). Chapter 9 Deep Neural Networks for Natural
Gomes, L. F. A. M., Machado, M. A. S., Caldeira, A. M., Santos, D. J., & Nascimento, W.
J. D. do. (2016). Time Series Forecasting with Neural Networks and Choquet
Gonenc, H., & Karan, M. B. (2003). Do Value Stocks Earn Higher Returns than Growth
Goodell, J. W., Kumar, S., Lim, W. M., & Pattnaik, D. (2021). Artificial intelligence and
100577.
Gujarati, D. N., & Porter, D. C. (2009). Basic Econometrics (5th ed.). McGraw-Hill Irwin.
, 27, Article 27.
irilerine Etkisi:
, 1(1 2),
Article 1 2.
Henrique, B. M., Sobreiro, V. A., & Kimura, H. (2019). Literature Review: Machine
Hsieh, J., Chen, T.-C., & Lin, S.-C. (2019). Credit constraints and growth gains from
Hu, M. Y., & Tsoukalas, C. (1999). Combining Conditional Volatility Forecasts Using
Hull, J. (2007). Risk Management and Financial Institutions. Pearson Prentice Hall.
https://books.google.com.tr/books?id=G4nCQgAACAAJ
Ilahi, I., Ali, M., & Jamil, R. (2015). Impact of Macroeconomic Variables on Stock Market
Jing, N., Wu, Z., & Wang, H. (2021). A Hybrid Model Integrating Deep Learning with
Investor Sentiment Analysis for Stock Price Prediction. Expert Systems with
Kaastra, I., & Boyd, M. (1996). Designing a Neural Network for Forecasting Financial and
Kaminsky, G. L., & Reinhart, C. M. (1999). The Twin Crises: The Causes of Banking and
Kamruzzaman, J., Begg, R. K., & Sarker, R. A. (2006a). Artificial Neural Networks in
Kamruzzaman, J., Begg, R. K., & Sarker, R. A. (2006b). Artificial Neural Networks in
Price Index Movement Using Artificial Neural Networks and Support Vector
Machines: The Sample of The Istanbul Stock Exchange. Expert Systems with
, 13(2), Article 2.
Paper.
change
Kim, K. (2003). Financial Time Series Forecasting Using Support Vector Machines.
Kim, T. Y., Oh, K. J., Sohn, I., & Hwang, C. (2004). Usefulness of Artificial Neural
Networks for Early Warning System of Economic Crisis. Expert Systems with
Kovacova, M., & Kliestik, T. (2017). Logit and Probit Application for The Prediction of
Kubat, C. (2019).
Liang, X., Zhang, H., Xiao, J., & Chen, Y. (2009). Improving Option Price Forecasts with
3055 3065.
Lin, C.-C., Chen, C.-S., & Chen, A.-P. (2018). Using intelligent computing and data
stream mining for behavioral finance associated with market profile and financial
Lin, C.-S., Khan, H. A., Chang, R.-Y., & Wang, Y.-C. (2008). A New Approach to
Lintner, J. (1965). Security Prices, Risk, and Maximal Gains From Diversification. The
McCulloch, W. S., & Pitts, W. (1943). A Logical Calculus of the Ideas Immanent in
(1st
Mossin, J. (1966). Equilibrium in a Capital Asset Market. Econometrica, 34(4), 768 783.
Nunes, M., Gerding, E., McGroarty, F., & Niranjan, M. (2019). A Comparison of
Multitask and Single Task Learning with Artificial Neural Networks for Yield
-price manipulation in an
emerging market: The case of Turkey. Expert Systems with Applications, 36(9),
11944 11949.
Panda, C., & Narasimhan, V. (2007). Forecasting Exchange Rate Better with Artificial
Patil, P. R., Parasar, D., & Charhate, S. (2021). A Literature Review on Machine Learning
Paul D. McNelis. (2005). Neural Networks in Finance: Gaining Predictive Edge in the
(EBSCOhost).
Peng, Y., Albuquerque, P. H. M., Kimura, H., & Saavedra, C. A. P. B. (2021). Feature
Selection and Deep Neural Networks for Stock Price Direction Forecasting Using
Phillips, P. C. B., & Perron, P. (1988). Testing for a unit root in time series regression.
Quazi, S. (2022). Artificial Intelligence and Machine Learning in Precision and Genomic
Roll, R., & Ross, S. A. (1980). An Empirical Investigation of the Arbitrage Pricing Theory.
Rousseau, P. L., & Sylla, R. (2003). Financial Systems, Economic Growth, and
of Chicago Press.
Sarker, I. H. (2021). Deep Learning: A Comprehensive Overview on Techniques,
BDDK
, 5(1), Article 1.
Schwert, G. W. (1990). Stock Returns and Real Activity: A Century of Evidence. The
Smith, K. A., & Gupta, J. N. D. (2000). Neural Networks in Business: Techniques and
Network and Agility. Procedia - Social and Behavioral Sciences, 195, 1477 1485.
Sutskever, I., Martens, J., & Dahl, G. (n.d.). On The Importance of Initialization and
Re , 43(1), 63 75.
Tsai, C. F., & Wang, S. P. (2009). Stock Price Forecasting by Hybrid Machine Learning
Tseng, C.-H., Cheng, S.-T., Wang, Y.-H., & Peng, J.-T. (2008). Artificial neural network
model of the hybrid EGARCH volatility of the Taiwan stock index option prices.
Turing, A. M. (1950). Computing Machinery and Intelligence. Mind, 59(236), 433–460.
Literature Review. American Journal of Trade and Policy, 4(3), 123 128.
Vui, C. S., Soon, G. K., On, C. K., Alfred, R., & Anthony, P. (2013). A review of stock
market prediction with Artificial neural network (ANN). 2013 IEEE International
Wang, L., Wang, Z., Qu, H., & Liu, S. (2018). Optimal Forecast Combination Based on
Neural Networks for Time Series Forecasting. Applied Soft Computing, 66, 1 17.
Wongbangpo, P., & Sharma, S. C. (2002). Stock Market and Macroeconomic Fundamental
27 51.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (The
Wu, S.-I., & Lu, R.-P. (1993). Combining Artificial Neural Networks and Statistics for
Yao, J., Poh, H., & Jasic, T. (1996). Foreign Exchange Rates Forecasting with Neural
Yolcu, U., Egrioglu, E., & Aladag, C. H. (2013). A New Linear & Nonlinear Artificial
Neural Network Model for Time Series Forecasting. Decision Support Systems,
Yu, K., Tresp, V., & Schwaighofer, A. (2005). Learning Gaussian Processes from Multiple
, 1012 1019.
Zhang, G., Eddy Patuwo, B., & Y. Hu, M. (1998). Forecasting with Artificial Neural
Networks: The State of the Art. International Journal of Forecasting, 14(1), 35–62.
Zhang, G. P. (2003). Time Series Forecasting Using a Hybrid ARIMA and Neural
Zhang, G. P. (2004b). Neural Networks in Business Forecasting. Idea Group Inc (IGI).