Machine Learning Methods in Finance
Abstract
We study how researchers can apply machine learning (ML) methods in finance. We first
establish that the three distinct categories of ML (supervised learning, unsupervised learn-
ing, reinforcement learning and others) address fundamentally different problems than tra-
ditional econometric approaches. Then, we review the current state of research of ML in
finance and identify three archetypes of applications: i) the construction of superior and
novel measures, ii) the reduction of prediction error, and iii) the extension of the standard
econometric toolset. With this taxonomy, we provide an outlook on potential future direc-
tions for both researchers and practitioners. We finally apply ML to typical problems in
finance. Our results suggest large benefits of ML methods compared to traditional ap-
proaches and indicate that ML holds great potential for future research in finance.
† We appreciate helpful comments and suggestions made by Andreas Benz, Martin Ruckes, and Fabian Silbereis.
∗ Hoang and Wiegratz are with the Karlsruhe Institute of Technology (KIT). Address correspondence to Daniel Hoang,
Institute of Finance, Banking, and Insurance, Karlsruhe Institute of Technology, Kaiserstr. 12, 76131 Karlsruhe, Ger-
many, Phone: +49 721 608-44768 or e-mail: [email protected].
1. Motivation
Artificial intelligence is increasingly entering our day-to-day life with impressive applications: face
detection enables safe and efficient airport travel, voice recognition allows us to talk to personal
assistants on our smartphones and smart home devices, and ever more firms are using chatbots
for quick customer support. Almost everyone interacts with modern artificial intelligence many
The main technology behind artificial intelligence is machine learning (ML). ML methods enable
machines to conduct such complex tasks as detecting faces, understanding speech, or answering
messages. Given the power of ML technology, it is natural to ask whether we can also apply ML
methods elsewhere. This paper addresses the use of ML to solve problems in financial economics.
Varian (2014) describes ML as an appropriate tool in the economic analysis of big data and
presents some ML methods with examples in economics. He further hints at potential ML appli-
cations in econometrics. Mullainathan and Spiess (2017) identify prediction problems as the main
use case of ML in economics and present different categories of existing and potential future
applications. Athey and Imbens (2019) illustrate the most relevant ML methods from an econo-
metric perspective. They also provide an overview of ML’s potential beyond pure prediction, for
instance, for causal inference.
In financial economics, while the number of published ML papers is still limited, an increasing
number of recent applications exploit the potential of ML. For instance, Bandiera et al. (2020)
analyze CEO behavior with ML methods. Gu, Kelly, and Xiu (2020) use ML to predict risk
premiums as a typical problem in empirical asset pricing. Bertsch et al. (2020) study bank mis-
conduct with an ML-based measure. Overall, the literature on ML in financial
economics has greatly expanded recently. However, it is still mostly unclear where and how to
apply ML in the field.
The contribution of this paper is threefold. First, we give a high-level introduction to ML aimed
at financial economists. We illuminate the different types of ML, their purposes and functionali-
ties, and the available methods for each type. Given our focus on financial economics, we place
special emphasis on the difference between traditional econometrics and ML. Our concise intro-
duction allows researchers in the field to quickly grasp the ML essentials that are relevant for
their own work. Second, we develop a taxonomy of ML applications in financial economics.
Given the increasing number of recent studies, earlier classifications do not capture existing ap-
plications well. We review the up-to-date literature in the field and divide it into three distinct
archetypes. Our taxonomy allows researchers to better understand the current state of the litera-
ture and how different contributions relate to each other. Furthermore, it serves as guidance for
future research.
Third, we apply ML to solve typical problems at the intersection of financial economics and real
estate finance: the pricing of real estate assets and credit risk prediction. To study the accuracy
and usefulness of ML predictions for heterogeneous real estate assets, we exploit a unique dataset
of more than four million German real estate properties listed for sale on five German property
portals and in major newspapers between 2000 and 2020. Our results show that the ML-based
price estimates for individual properties exhibit dramatically lower pricing errors than traditional
approaches such as hedonic pricing with ordinary least squares (OLS). Hence, ML can directly
help participants in real estate markets make more informed decisions and improve aggregate
market efficiency. In general, our application illustrates how ML adds value in solving a typical
problem in financial economics.
Traditional econometrics aims to provide causal explanations for economic phenomena by analyz-
ing relationships between economic variables. ML, in contrast, serves different purposes. There
are two major types of ML: supervised and unsupervised learning. Supervised learning provides
us with predictions that exhibit low out-of-sample prediction errors by automatically considering
nonlinearities and interaction effects. Unsupervised learning infers structural information from the
given data. Hence, ML is suited for different kinds of applications than traditional econometrics.
Based on our review of the financial economics literature, we classify ML applications into three
archetypes: 1) construction of superior and novel measures, 2) reduction of prediction
error in economic prediction problems, and 3) extension of the existing econometric toolset.
First, researchers can use ML to construct superior and novel measures. ML methods are able to
extract information from unconventional data such as text or images. The extracted information
can then serve as a superior or novel measure of an economic variable. Superior ML measures
exhibit lower measurement error and therefore allow more precise estimates of economic relation-
ships than traditional measures can. Novel measures allow us to conduct analyses with previously
unmeasurable economic aspects.
Second, researchers can use ML to reduce prediction error in economic prediction problems. There
are certain problems in financial economics that are prediction problems at their core. For in-
stance, the fundamental problem in credit risk is the prediction of credit default. Given that the
main functionality of supervised ML is prediction, ML methods are able to provide better results
than traditional approaches.
Third, researchers can use ML to extend the existing econometric toolset. Econometric tools often
contain a prediction component. For instance, the first stage of an instrumental variable design is
effectively a prediction problem. ML methods can enhance such existing econometric tools by
improving the performance of their prediction component. Furthermore, some ML methods them-
selves directly serve as new econometric tools. For instance, clustering methods from unsupervised
learning can group similar observations.
In our empirical application, we use ML for real
estate pricing, which is particularly relevant in the areas of household finance and real estate
economics. More specifically, we predict the prices of real estate assets in Germany with various
ML methods and compare their accuracy to estimates from traditional hedonic pricing with OLS.
Figure 1 illustrates our key results. The two charts compare the actual property prices with the
OLS estimates and with the price predictions of our best-performing ML method (boosted regres-
sion trees). On average, the price predictions from the ML approach are much closer to the actual
prices than the OLS estimates. The difference in pricing accuracy is especially pronounced at the
upper end of the price range: while the OLS estimates show large deviations from the actual
prices, the ML-based price predictions are much closer. As we show in more detail below, ML is
especially able to improve pricing accuracy for more expensive real estate assets, which traditional
hedonic regression cannot price well. Furthermore, our results indicate that nonlinearities and
interaction effects are most relevant for real estate assets at the upper end of the price range.
Figure 1. Comparison of pricing accuracy between hedonic pricing (OLS) and ML
This figure depicts the pricing accuracy of traditional hedonic pricing (OLS) and ML. On average, the
ML-based price estimates are much closer to the actual prices than the OLS estimates are. The benefit
of ML is most pronounced at the upper end of the price range, where OLS performs especially poorly.
The superior pricing power of ML compared to traditional hedonic pricing with OLS becomes
even more apparent if we look at different performance metrics. While hedonic pricing with OLS
can only explain approximately 40% of the price variation as measured by R², ML almost doubles
R² to approximately 77%. On average, the OLS estimates misprice real estate assets by almost
44%, which ML can reduce to less than 27%. In monetary terms, ML reduces the average mis-
pricing by more than 82,000 EUR. Given that the average property price in our sample is 393,000
EUR, the improvement in pricing accuracy is not only statistically significant but also economi-
cally meaningful.
In the top price quintile where hedonic pricing with OLS performs especially poorly, ML shows
even stronger performance. The ML-based price predictions exhibit an average pricing error of
less than 24% compared to over 50% for the OLS estimates. In monetary terms, ML reduces the
average mispricing by more than 240,000 EUR, which is economically very large, given the average
property price in this segment.
The remainder of this paper is organized as follows. Section 2 contains a high-level introduction
to ML. In Section 3, we present the three types of ML applications and review the corresponding
literature. In Section 4, we apply ML in real estate pricing to illustrate the benefits of ML. In
Section 5, we conclude.
2. Introduction to ML
In this section, we provide a high-level introduction to ML to facilitate a better understanding of
its applications in financial economics later in this paper. We focus on the fundamental problems
that ML can solve, how the different types of ML work, and which methods exist. Since our main
target audience is economists, we place special emphasis on the differences between ML and
traditional econometrics.
Most studies in empirical economics aim for causal explanations of economic phenomena by ana-
lyzing relationships between economic variables. For instance, we might want to explain the cross-
sectional differences in real estate prices. We then mainly care about how different influencing
factors, such as location or number of bedrooms, affect price. Traditional econometric methods
provide us with estimates 𝛽̂ for the direction and strength of these influencing factors.
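To make this concrete, the following minimal sketch estimates such 𝛽̂ coefficients with OLS on simulated housing data; the statsmodels library and all variable names and magnitudes are our own illustrative assumptions, not taken from the dataset used later in this paper.

import numpy as np
import statsmodels.api as sm

# Simulated housing data (illustrative assumption): price depends on the
# number of bedrooms and a location score.
rng = np.random.default_rng(0)
n = 1_000
bedrooms = rng.integers(1, 6, size=n)
location = rng.normal(size=n)
price = 100_000 + 25_000 * bedrooms + 40_000 * location + rng.normal(0, 30_000, n)

X = sm.add_constant(np.column_stack([bedrooms, location]))
results = sm.OLS(price, X).fit()
print(results.params)   # beta-hat: direction and strength of each factor
print(results.pvalues)  # statistical significance of each estimate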
ML, in contrast, serves different purposes. Instead of providing direct insights into the relation-
ships between economic variables, ML serves as a method for prediction or for data structure
inference. In prediction, we use the given observations to infer estimates for the dependent variable
𝑦̂ of new observations based on their covariates 𝑋. For instance, we might want to use the observed
prices and property characteristics in the real estate market to predict prices of previously unob-
served properties based on their characteristics. The first major type of ML, supervised learning,
provides us with such predictions.
In data structure inference, we derive different kinds of structural information from given data 𝑋.
For instance, we might want to identify clusters in the data to learn how different observations
relate to each other. The second major type of ML, unsupervised learning, provides us with
such structural information.
Figure 2 gives an overview of the differences between traditional econometrics and the two major
types of ML, supervised and unsupervised learning. Most importantly, the three approaches serve
different purposes. As explained above, traditional econometrics aims for explanations; that is, it
solves 𝛽̂-problems. Supervised learning provides
predictions; that is, it solves so-called 𝑦̂-problems (Mullainathan and Spiess, 2017). Unsupervised
learning infers the data structure from given data, so it solves 𝑋-problems.
Figure 2. Differences between traditional econometrics and the two major types of ML:
supervised and unsupervised learning
This figure gives an overview of how traditional econometrics and the two major types of ML, supervised
and unsupervised learning, differ with regard to their methodological process and purpose. Traditional
econometrics enables explanations of economic phenomena, while supervised learning provides predictions
and unsupervised learning infers data structure.
The three approaches also differ with regard to their methodological process. Every approach
starts from data. In traditional econometrics, we have a dependent variable 𝑦 and multiple inde-
pendent variables 𝑋. In ML jargon, we call such data “labeled data”, since there is a special label
𝑦 for each observation. The dominant method in traditional econometrics is linear regression,
mainly due to its simplicity and interpretability. We usually require unbiased estimates of the
strength and direction of economic relations; thus, the OLS estimator, as the best linear unbiased
estimator, is most common. As a result, we obtain an explanatory model in the form of a regres-
sion line and different metrics of statistical significance, such as t-values and p-values. Finally,
we use these results to explain the economic phenomenon of interest.
In supervised learning, we also start with labeled data. Here, the special label 𝑦 represents the
target variable that we want to predict based on the predictor variables 𝑋. Applying a supervised
ML method on the given data yields a prediction model as well as estimates for its expected
prediction performance. We can use the prediction model to make out-of-sample predictions, that
is, predictions for the value of the target variable of previously unseen examples based on their
characteristics.
In unsupervised learning, we start with unlabeled data, which is the defining distinction between
unsupervised and supervised learning in the literature (Hastie, Tibshirani, and Friedman, 2009,
pp. 485-486). Unlabeled data means that there is no special variable; all variables are considered
equal. Applying an unsupervised ML method to the given data provides us with a data structure
model and data structure characteristics. Finally, we can use both results to infer structural
information from the given data.1
In the following subsections, we describe the two major types of ML – supervised and unsupervised
learning – in more detail and give an overview of the relevant methods for each type. In the last
subsection, we briefly cover other types of ML, such as reinforcement learning.
2.1 Supervised Learning
The purpose of supervised learning is prediction. More specifically, we aim for out-of-sample pre-
dictions with high prediction performance. To accurately assess the expected prediction perfor-
mance on previously unseen observations, we typically split the given data into training data and
test data. We apply a supervised ML method on the training data to build a prediction model.
Then, we apply the prediction model on the test data to derive an estimate for the expected out-
of-sample prediction performance.
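The following sketch illustrates this workflow on simulated data; the scikit-learn estimator and all data are illustrative assumptions.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 10))                       # predictor variables
y = X @ rng.normal(size=10) + rng.normal(size=1_000)   # target variable

# Hold out 20% of the observations as test data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)       # build the prediction model

# Estimate of the expected out-of-sample prediction performance.
print(r2_score(y_test, model.predict(X_test)))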
To build a prediction model, there exist various supervised ML methods of differing complexity.
In general, more complex methods enable higher prediction performance but reduce interpreta-
bility. Hence, the choice of method involves a tradeoff between
prediction performance and interpretability. We can further distinguish between different classes
of methods based on similarities in their general approach and based on the data types for which
they are best suited.
The simplest method is linear regression with the OLS estimator. As stated before, OLS provides
unbiased estimates of linear relations but typically only limited prediction performance.
A simple way to improve the prediction performance of the linear OLS model would be to add
nonlinear transformations and interactions of the original predictor variables to the model speci-
fication. In many cases, however, it is ex ante unclear which nonlinearities and interactions are
actually relevant. Including all possible combinations is infeasible since it results in an exorbitant
1 The more abstract descriptions of the individual steps of unsupervised learning result from the fact that various
methods with many different goals fall under the umbrella term unsupervised learning. See Section 2.2 for an overview
of the different categories and methods in unsupervised ML.
number of variables, which quickly exceeds the number of observations. In many cases, the sheer
number of candidate terms renders this approach impractical.
Since OLS is only the best linear unbiased estimator (BLUE), a more feasible way to improve its
prediction performance is to give up unbiasedness. Prediction prob-
lems do not require unbiased variable coefficients. Instead, we only aim for maximal prediction
performance. Regularized linear methods offer a way to systematically introduce bias to improve
the prediction performance of OLS (Hastie, Tibshirani, and Friedman, 2009, pp. 61-79). More
specifically, regularization means that such methods shrink the coefficients of the predictor varia-
bles to increase prediction performance.2 The most common method for regularized linear regres-
sion is the least absolute shrinkage and selection operator (LASSO). LASSO works similarly to
OLS but introduces bias by adding a penalty term in its optimization function to penalize large
variable coefficients with little informational content. The specific functional form of the penalty
term tends to drive irrelevant coefficients to exactly zero.3 Hence, LASSO is often used for variable
selection in addition to pure prediction and thereby provides relatively good interpretability.
In addition to LASSO, there are other regularized linear methods that differ with regard to the
functional form of their penalty terms. Ridge regression uses a penalty term that does not drive
coefficients to exactly zero and is therefore less interpretable.4 However, ridge regression often
provides superior prediction performance compared to LASSO. Elastic net regression combines
the two methods (Zou and Hastie, 2005). Its penalty term is a linear combination of the penalty
terms of LASSO and ridge regression.5
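The following sketch contrasts the three regularized linear methods on simulated data; the scikit-learn estimators and the regularization strengths are illustrative assumptions.

import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=500)   # only 2 of 50 predictors matter

lasso = Lasso(alpha=0.1).fit(X, y)                     # alpha controls regularization
ridge = Ridge(alpha=1.0).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)   # mixes both penalty terms

# LASSO drives irrelevant coefficients to exactly zero (variable selection),
# whereas ridge only shrinks them toward zero.
print("zero coefficients, LASSO:", (lasso.coef_ == 0).sum())
print("zero coefficients, ridge:", (ridge.coef_ == 0).sum())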
In contrast to the linear methods just discussed, more complex ML methods automatically con-
sider relevant nonlinearities and interaction effects. For numerical data, tree-based ML methods
are widespread (Hastie, Tibshirani, and Friedman, 2009, pp. 305-334). The simplest tree-based
2 The introduction of bias can increase prediction performance because of the bias-variance tradeoff. See, for instance, Hastie, Tibshirani, and Friedman (2009, pp. 37-38, 219-228) for technical details.
3 LASSO uses the penalty term 𝛼 ∑ⱼ|𝛽ⱼ|, where 𝛼 is a parameter that controls the amount of regularization and 𝛽ⱼ are the variable coefficients.
4 Ridge regression uses the penalty term 𝛼 ∑ⱼ 𝛽ⱼ², where 𝛼 is a parameter that controls the amount of regularization and 𝛽ⱼ are the variable coefficients.
5 Elastic net regression uses the penalty term 𝛼₁ ∑ⱼ|𝛽ⱼ| + 𝛼₂ ∑ⱼ 𝛽ⱼ², where 𝛼₁ and 𝛼₂ are parameters that control the amount of regularization and the proportion of the two subparts, and 𝛽ⱼ are the variable coefficients.
method is the decision tree, which at the same time acts as the building block of all other tree-
based methods. Figure 4 depicts a simplified decision tree trained for house price prediction. It
consists of nodes at which the tree splits depending on the value of a certain predictor variable.
Decision trees typically contain multiple layers of nodes, so they implicitly consider interactions
between multiple variables. When the tree reaches a leaf node, that is, a node after which there
is no further split, the tree returns a prediction value. For more details on decision trees and how
to build them algorithmically from training data, see, for example, Loh (2011). Given that we can
observe the relevant predictor variables and thresholds in the splits, decision trees are character-
ized by high interpretability.
Random forests combine multiple decision trees (Breiman, 2001). More specifically, the random
forest method repeatedly draws bootstrap samples from the given data and builds a separate
decision tree from each sample. The prediction of a random forest is then the average prediction
value of the different trees. Random forests typically achieve much higher prediction performance
than single decision trees, at the cost of lower interpretability.
Figure 4. Illustrative depiction of a decision tree trained for house price prediction
This figure depicts a simplified version of a decision tree trained for house price prediction. Nodes repre-
sent splits according to the value of a certain predictor variable. Trained decision trees typically consist
of multiple layers, so they implicitly consider variable interactions. At leaf nodes, the decision tree returns
a prediction value. Decision trees in practice usually consist of many more layers than are shown in this
illustrative example.
Boosted regression trees extend the concept of random forests to further improve their prediction
performance (Hastie, Tibshirani, and Friedman, 2009, pp. 353-358). Instead of combining many
independent decision trees, the boosted regression tree method builds the trees iteratively and
considers which observations the previous trees could not predict well. Boosted regression trees
typically not only outperform random forests but often are among the winning algorithms in data
science competitions.
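The following sketch compares the three tree-based methods on simulated data with nonlinearities and interactions; hyperparameters and data are illustrative assumptions.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(2_000, 5))
# Target with an interaction (x0 * x1) and a nonlinearity (threshold in x2).
y = 10 * X[:, 0] * X[:, 1] + 5 * (X[:, 2] > 0.5) + rng.normal(scale=0.5, size=2_000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for model in (DecisionTreeRegressor(max_depth=4, random_state=0),
              RandomForestRegressor(n_estimators=300, random_state=0),
              GradientBoostingRegressor(n_estimators=300, random_state=0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, round(model.score(X_te, y_te), 3))  # out-of-sample R²

On such data, the out-of-sample R² typically increases from the single tree to the random forest to the boosted trees, mirroring the performance ordering described above.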
While tree-based ML methods and, in particular, boosted regression trees achieve state-of-the-art
prediction performance with numerical data, neural networks excel with unconventional data such
as text, images, or videos. Figure 5 depicts a small feed-forward neural network. A neural network
consists of neurons with links among each other and arranged in layers (Hastie, Tibshirani, and
Friedman, 2009, pp. 389-415). In their most basic version, each neuron can be thought of as a
single linear regression model combined with a nonlinear “activation” function. The links describe
the flow of data between the neurons. First, a neural network’s input layer receives the predictor
variables, for instance, pixel-level image data. Then, the hidden layers process the data and deliver
them to the output layer, which returns the final prediction value. The neural network in our
example is a highly simplified version of the neural networks used in practice. Neural networks
for real applications are much larger, with many hidden layers and millions of neurons and links.
Furthermore, they do not have to be fully connected, so not every neuron of a layer necessarily
links to every neuron of the next layer.
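Complementing the figure, the following sketch trains a small fully connected feed-forward network with scikit-learn; the layer sizes and data are illustrative assumptions.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 8))     # the input layer receives these predictors
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=1_000)

# Two hidden layers with 32 neurons each; every neuron is a linear model
# followed by a nonlinear activation (here ReLU).
net = MLPRegressor(hidden_layer_sizes=(32, 32), activation="relu",
                   max_iter=2_000, random_state=0).fit(X, y)
print(net.predict(X[:3]))           # the output layer returns the predictions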
Our exemplary neural network uses a simple feed-forward architecture, which means that the
neurons come in their most basic variant and that no backlinks exist; thus, data simply flows from
left to right. More advanced neural networks employ more complex neurons and architectures.
Recurrent neural networks (RNNs) are designed for sequential data such as text (Medsker and
Jain, 1999). The hidden layer neurons in RNNs have an additional memory feature that allows
them to accumulate information over multiple related observations (for instance, words in a sen-
tence). There are different ways to design the memory feature of the neu-
rons. Widespread design examples are gated recurrent units (GRU) and long short-term memory
(LSTM).
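As a minimal sketch of the memory feature, the following PyTorch snippet runs a batch of sequences through an LSTM layer; all sizes are illustrative assumptions.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
head = nn.Linear(32, 1)           # maps the final hidden state to a prediction

x = torch.randn(8, 20, 16)        # 8 sequences, 20 steps (e.g., words), 16 features
output, (h_n, c_n) = lstm(x)      # h_n: accumulated memory after the last step
prediction = head(h_n[-1])        # one prediction per sequence
print(prediction.shape)           # torch.Size([8, 1])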
Convolutional neural networks (CNNs) are even more complex neural networks whose general
architecture fits well with visual data such as images and videos (Albawi, Mohammed, and Al-
Zawi, 2017). Simply put, their hidden layers represent trainable filters that iteratively detect
increasingly complex structures. The architecture of CNNs is typically highly customized towards
a specific application. Adequately designed CNNs show outstanding performance for tasks such
as image classification and object recognition.
Due to their high complexity, neural networks are inherently hard to interpret. In general, we can
infer very little information from the hidden layers, which represent the learned knowledge of a
neural network. Improving the interpretability of neural networks is subject to ongoing research
in computer science.
In addition to the methods just discussed, there are older methods that typically achieve worse
prediction performance and/or provide lower interpretability than newer methods. Since many
early studies that applied ML in financial economics used these methods extensively, we also
briefly cover them here.
A widespread example is the naïve Bayes method (Rish, 2001), which uses Bayes’ theorem to
classify observations into categories. For instance, we might want to classify loan applications as
accept or reject. Naïve Bayes then calculates the probability of each possible classification cate-
gory (accept or reject) for a new loan application, conditional on its characteristics and the given training data. Its final
classification decision is the category with the highest probability. Modern methods such as
boosted regression trees typically outperform naïve Bayes by a wide margin, so it is much less
common in recent studies.
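A minimal sketch of such a classification, assuming scikit-learn's Gaussian naïve Bayes and simulated loan data:

import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))     # illustrative features: income, debt, age
y = (X[:, 0] - X[:, 1] + rng.normal(size=500) > 0).astype(int)  # 1 = accept

clf = GaussianNB().fit(X, y)
new_application = np.array([[1.2, -0.3, 0.5]])
print(clf.predict_proba(new_application))  # probability of reject vs. accept
print(clf.predict(new_application))        # category with the highest probability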
Methods based on the support vector machine (SVM) are also common among older studies
(Hastie, Tibshirani, and Friedman, 2009, pp. 417-455). In support vector classification (SVC), we
separate observations in different classification categories (for instance, positive and negative ex-
amples) with a hyperplane, the generalization of a line or plane to higher dimensions. We position the
hyperplane between the training examples in such a way that the margin between examples of
different categories is maximized. The hyperplane then allows us to classify new examples de-
pending on which side of the hyperplane they lie. Support vector regression (SVR) extends the
idea of SVM to regression problems, that is, predictions of continuous values instead of categories.
In general, SVM-based methods provide lower prediction performance than newer methods such
as boosted regression trees. Hence, we also see them less and less often in current studies.
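A minimal sketch of both variants, assuming scikit-learn and simulated data:

import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

svc = SVC(kernel="linear").fit(X, y)            # maximum-margin hyperplane
print(svc.predict([[1.0, 1.0], [-1.0, -1.0]]))  # side of the hyperplane decides

svr = SVR().fit(X, X[:, 0] + X[:, 1])           # same idea for continuous targets
print(svr.predict([[0.5, 0.5]]))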
2.2 Unsupervised Learning
The purpose of unsupervised learning is data structure inference. Since data structure subsumes
many different kinds of information, we divide the methods of unsupervised learning into different
categories. Figure 6 gives an overview of the most important categories in unsupervised learning.
There are two major categories: clustering and dimensionality reduction. Further categories in-
clude association rule mining, outlier detection, and synthetic data generation.
The first major category of unsupervised learning is clustering. Methods for clustering group the
given observations in a way that results in high within-group similarity and low cross-group sim-
ilarity. There exist various kinds of methods for clustering. Centroid-based methods form clusters
around centroids. After the initial positioning of the centroids, they iteratively update their posi-
tion to arrive at suitable clusters. A common example of a very early but still heavily used cen-
troid-based method is K-means (MacQueen, 1967). Density-based methods build clusters depend-
ing on the differing density in the space of observations. Simply put, they group observations with
many similar observations nearby into clusters. An example of a density-based clustering method
is DBSCAN from Ester et al. (1996), which is also one of the most famous clustering methods.
Distribution-based methods assign observations to clusters based on whether they likely belong
to the same statistical distribution. Hence, they require us to know the distribution of the under-
lying data process in advance. For normally distributed data, Gaussian mixture models are wide-
spread (Rasmussen, 1999). Finally, hierarchical methods construct clusters that consider the hi-
erarchical relationship in the data. They start with initial clusters, where each cluster consists of
a single observation. Then, they iteratively combine smaller clusters into larger clusters to build
a hierarchy. A common method for hierarchical clustering is BIRCH (Zhang, Ramakrishnan, and
Livny, 1996). While the method classes just discussed are most common for clustering, it should
be noted that there are additional but much less often used methods.
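The following sketch applies a centroid-based and a density-based method to simulated data with three groups; the cluster parameters are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans, DBSCAN

rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 5], [0, 5]])
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])

# K-means requires the number of clusters; DBSCAN infers it from density.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
print(np.unique(kmeans_labels), np.unique(dbscan_labels))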
The second major category is dimensionality reduction. Methods in this category try to increase
the information density of the given data by decreasing their dimensionality while retaining most
of the inherent information. There are various methods for dimensionality reduction; we cover
only the two most common ones. First, methods based on principal component analysis (PCA)
derive linear combinations of the original variables (“principal components”) that cover as much
of the data’s variance as possible. While the basic variant of PCA is inherently linear, nonlinear
generalizations also exist. For more details on the different PCA-based methods, see, for instance,
Hastie, Tibshirani, and Friedman (2009, pp. 534-552). Second, methods based on neural networks
reduce dimensionality with special architectures. A widely used method is the autoencoder neural
network (Goodfellow, Bengio, and Courville, 2016, pp. 499-523). An autoencoder consists of an
encoder network that creates a condensed representation of the input data and a subsequent
decoder network that reconstructs the original data from the condensed representation. A special
bottleneck layer connects the encoder and decoder networks to train them on given data. If the
autoencoder is able to reconstruct the original data well, then the condensed data representation
in the bottleneck layer has successfully retained most of the information in the data while reducing
its dimensionality.
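The following sketch compresses simulated data with PCA; the number of components and the data-generating process are illustrative assumptions.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(1_000, 3))    # 3 true underlying factors
X = latent @ rng.normal(size=(3, 20)) + rng.normal(scale=0.1, size=(1_000, 20))

pca = PCA(n_components=3).fit(X)
X_reduced = pca.transform(X)            # 1,000 x 3 condensed representation
print(pca.explained_variance_ratio_.sum())  # share of variance retained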
In addition to the two major categories, there are minor categories of unsupervised ML methods.
Association rule mining tries to identify relations between variables (Agrawal, Imieliński, and
Swami, 1993). For instance, it can learn from customer purchase data which products are often
bought together. The identified relations can often directly affect decision making. Outlier detec-
tion methods try to find observations that substantially differ from the remaining data. While
many traditional methods for outlier detection exist, ML-based methods often provide superior
performance, especially in high-dimensional settings (Domingues et al., 2018). In synthetic data
generation, we try to generate new data that satisfies certain requirements. Generative adversarial
networks, for instance, use neural networks to create new, synthetic data that closely mimics the
given training data (Goodfellow et al., 2020). Their neural network architecture makes them
especially useful for unconventional data, for example, to create artificial images that are similar
to existing images. While the categories just discussed are the most common ones in unsupervised
learning, it should be noted that there are even more but less commonly used categories and
methods.
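As one concrete example from these minor categories, the following sketch detects planted outliers with an isolation forest; the contamination rate and data are illustrative assumptions.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(size=(980, 10))
outliers = rng.normal(loc=6.0, size=(20, 10))   # 20 planted anomalies
X = np.vstack([normal, outliers])

labels = IsolationForest(contamination=0.02, random_state=0).fit_predict(X)
print((labels == -1).sum())                     # -1 marks detected outliers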
2.3 Other Types of ML
In addition to the two major types of ML, supervised and unsupervised learning, there are other
types of ML, which we briefly cover here. Figure 7 gives an overview of these additional types of
ML. One of the more common types is reinforcement learning, which is suitable for sequential
decision problems with a long-term goal. Such problems are common, for instance, in robotics
(Sutton and Barto, 2018). We usually model such problems with a Markov decision process that
consists of an environment and an agent whose actions change the environment and bring rewards.
Reinforcement learning methods then try to find a policy for the agent that maximizes the expected
cumulative reward.
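A minimal sketch of this idea is tabular Q-learning on a toy chain environment; the environment, rewards, and learning parameters are illustrative assumptions.

import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))   # expected reward of each action per state
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

for _ in range(2_000):                # episodes of interaction with the environment
    s = 0
    while s < n_states - 1:
        # Explore occasionally, otherwise act greedily.
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0    # reward only at the goal state
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))               # learned policy: always move right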
Semi-supervised learning combines supervised and unsupervised learning. It aims for prediction
as in supervised learning, but it uses data in a way more similar to unsupervised learning (Zhu,
2005). More specifically, the training data in semi-supervised learning consists of few labeled
examples and many unlabeled examples. While we cannot directly train a prediction model with
unlabeled data, we can still obtain information about the probability distribution of the data,
which can improve the prediction model.
Active learning is a closely related variant of semi-supervised learning (Settles, 2009). It is useful
in cases where obtaining additional labels is costly, for example, because an expert has to manually
assign labels to the examples. In active learning, we first train the model with the already labeled
examples. Then, we can calculate an importance score for each unlabeled example based on re-
latedness to other examples and prediction uncertainty. Finally, the expert must label only those
observations that help improve our prediction model the most, which can vastly reduce costs.
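A minimal sketch of uncertainty sampling, a common active learning strategy, assuming scikit-learn and simulated data:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 5))
y_true = (X[:, 0] + X[:, 1] > 0).astype(int)          # labels the expert could give

labeled = rng.choice(1_000, size=20, replace=False)   # few already labeled examples
model = LogisticRegression().fit(X[labeled], y_true[labeled])

unlabeled = np.setdiff1d(np.arange(1_000), labeled)
proba = model.predict_proba(X[unlabeled])[:, 1]
uncertainty = -np.abs(proba - 0.5)                    # close to 0.5 = most uncertain
ask_expert = unlabeled[np.argsort(uncertainty)[-10:]] # 10 most informative examples
print(ask_expert)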
Figure 7. Overview of reinforcement learning and other types of ML
This figure gives an overview of types of ML other than supervised and unsupervised learning. Among
these other ML types, reinforcement learning and semi-supervised learning are two of the more
common. Less common types include deductive learning, federated learning, and genetic algorithms.
In addition to these more common ML types, reinforcement learning and semi-supervised learning,
there are other types of ML. In deductive learning, we try to algorithmically infer valid logical
statements from other logical statements. Applications for deductive learning can be found, for
instance, in natural language processing (Cambria and White, 2014). In federated learning, we
train ML models across multiple machines that do not share the same data. Federated learning
offers benefits in special domains, for instance, if we need to preserve data privacy (Yang et al.,
2019). Finally, genetic algorithms use evolutionary principles such as mutation and selection mech-
anisms to derive optimal solutions to various kinds of problems (Mitchell, 1998). Practical appli-
cations of genetic algorithms include, for instance, complex optimization problems.
3. Applications of ML in Financial Economics
The use of ML in financial economics is growing rapidly, and ever more studies that apply ML
get published. However, many researchers are still unaware of how and where to apply ML
in the field of financial economics. In this section, we present a taxonomy of existing ML applica-
tions, which serves multiple purposes. First, it outlines where ML can add value in financial
economics research. Second, it structures the growing literature in the
field of financial economics. Third, it allows us to better understand new contributions and how
they relate to the existing literature. Finally, it guides researchers in discovering possible applica-
tions for their own research.
The workhorse model of financial economics research, linear regression with OLS, has one direct
purpose: the explanation of economic
phenomena. In contrast, ML provides us with predictions that minimize prediction error or infers
structural information from given data. Our taxonomy shows the categories in which these func-
tionalities add value.
We identify three archetypical applications of ML from the existing financial economics literature:
1) construction of superior and novel measures, 2) reduction of prediction error in economic pre-
diction problems, and 3) extension of the existing econometric toolset. Figure 8 illustrates our
taxonomy and depicts characteristic equations for each of the three application categories to
illustrate how they differ.
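Since Figure 8 is not reproduced here, one way to sketch such characteristic equations, under our own notation rather than the figure's exact formulation, is:

\begin{align*}
&\text{1) Measure construction:} && y = \alpha + \beta\,\hat{x}^{ML} + \varepsilon, \quad \hat{x}^{ML} \text{ constructed by ML} \\
&\text{2) Prediction:} && \hat{y} = \hat{f}^{ML}(X) \\
&\text{3) Toolset extension (e.g., IV first stage):} && \hat{x} = \hat{f}^{ML}(Z)
\end{align*}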
Studies in the first category use ML to construct a superior or novel measure for one of the
independent variables 𝑋. The main analysis still relies on a traditional linear model, which we
estimate with OLS. In the second category, studies use ML to reduce the prediction error of
predictions 𝑦̂ in economic prediction problems. Supervised ML methods achieve superior predic-
tion performance by using flexible functional forms 𝑓(⋅) in the prediction model. Studies in the
third category use ML to extend the existing econometric toolset. ML methods either serve as
new econometric methods themselves or optimize some part of a traditional econometric method.
In the following subsections, we review the relevant literature for each of the three categories of
ML applications.
3.1 Construction of Superior and Novel Measures
The first category of ML applications in financial economics is the construction of superior and
novel measures. Figure 9 shows the general methodological process of studies in this category.
The process starts with unconventional and nonnumerical data such as text, images, or videos.
To use the information from such data in econometric analyses, we need to construct a numerical
measure. For textual data, traditional approaches use word counts based on domain-specific dic-
tionaries.6 For image and video data, only human assessments have been available for a long time.
6 See Loughran and McDonald (2016) for an overview of mostly traditional text analytics methods in accounting and
finance.
ML-based approaches provide easier and, at the same time, more powerful access to the infor-
mation contained in unconventional data. All types of ML are applicable here. We can use pre-
dictions from supervised learning, data structure information from unsupervised learning, or re-
sults from other types of ML.
By applying ML to the given unconventional data, we can construct a superior or novel measure
for an economic variable. From the current literature, we identify three subcategories of typical
ML-based measures, which we present below.
Finally, the superior or novel measure serves as an independent variable in the main analysis of
an economic relation. Using superior measures with lower measurement error than existing
measures reduces attenuation bias, so we obtain more precise estimates for the parameters de-
scribing an economic relation. Novel measures allow us to conduct new analyses with previously
unmeasurable economic aspects. In the main analysis, most studies that construct ML-based
measures apply traditional econometric methods such as linear regression with OLS.
Figure 10 presents an overview of the relevant studies that use ML to construct superior or novel
measures. In the following, we present them in three subcategories: 1) measures of market senti-
ment, 2) measures of corporate executives’ personality and decisions, and 3) measures of firm
characteristics and corporate policies.
3.1.1 Measures of Market Sentiment
Measures of market sentiment describe opinions and moods of market-participating agents. Most
studies in this subcategory construct measures of market sentiment from textual data. There are
both traditional and ML-based approaches to extract market
sentiment from textual data. Loughran and McDonald (2011) present a dictionary approach to
derive market sentiment from financial texts. More specifically, they count negative words based
on a finance-specific word list. Dictionary approaches, however, miss the context of words within
a sentence (Loughran and McDonald, 2016). In contrast, flexible ML-based approaches can con-
sider not only the context of words within a sentence but also how different sentences interrelate
with each other. For an extensive review of sentiment measurement with traditional econometric and ML-based
approaches, we refer to the survey literature.
Figure 10. Overview of studies that use ML to construct superior and novel measures
This figure presents an overview of the relevant studies in financial economics that apply ML to construct
superior and novel measures. There are three main categories: measures of market sentiment, measures
of corporate executives’ personality and decisions, and measures of firm characteristics and corporate
policies.
Sentiment exists for many topics and is derived from many sources. In financial economics, our
interest lies in the sentiment of markets such as the stock market. There is, however, a large array
of potential sources. For instance, analyst reports are a common source of stock market sentiment.
Recently, novel sources of market sentiment, such as social media, have also become more perva-
sive. Kearney and Liu (2014) present different sources of textual sentiment in finance as well as
methods to extract it.
The most common ML-based measure of market sentiment in the literature relates to stock market
sentiment. Several studies construct a measure of stock market sentiment from social media.
Antweiler and Frank (2004) use the ML methods naïve Bayes and SVM to classify user posts on
the Yahoo Finance message board as positive or negative. Then, they aggregate their classifica-
tions to construct a measure of stock market sentiment. Renault (2017) similarly classifies user
posts on the finance-focused social network StockTwits to construct a measure of stock market
sentiment. Bartov, Faurel, and Mohanram (2017) derive stock market sentiment from user posts
on Twitter.
In addition to social media, news articles are another source of stock market sentiment. Barbon
et al. (2019) enhance the naïve Bayes method to build a stock market sentiment variable based
on firm-specific news. Ke, Kelly, and Xiu (2019) implement a customized ML-based approach that
specializes in extracting information that is relevant for stock returns. Their method then allows
them to extract a measure of stock market sentiment from Dow Jones Newswire articles.
Other studies use traditional analyst reports for stock market sentiment. For example, Huang,
Zang, and Zheng (2014) apply naïve Bayes to analyst reports to construct measures of stock
market sentiment.
The main analysis in all discussed studies concerns the effect of stock market sentiment on future
stock returns. Antweiler and Frank (2004) additionally analyze the effect on stock volatility. Re-
nault (2017) studies intraday stock index returns. Bartov, Faurel, and Mohanram (2017) focus on
returns around earnings announcements and study the effect of sentiment on quarterly earnings.
Huang, Zang, and Zheng (2014) additionally analyze sentiment’s effect on earnings growth. In the
study by Barbon et al. (2019), stock market sentiment only serves as a control variable in their
main analysis.
Manela and Moreira (2017) deviate from the traditional positive vs. negative market sentiment
measures. Instead, they construct a measure of stock market uncertainty from Wall Street Journal
front-page articles. Their novel measure allows them to analyze the effect of news on equity risk
premia. Vamossy (2020) measures investor emotions by extracting different emotional states from
StockTwits posts with textual analysis based on deep learning. He then studies the effect of
these investor emotions on stock returns.
Slightly different from the stock market sentiment discussed above, Liew and Wang (2016) con-
struct a measure of pre-IPO sentiment. They use a commercial ML-based service to extract sen-
timent information from Twitter posts. Finally, they study the effect of pre-IPO sentiment on
IPO returns.
While most studies that construct ML-based measures of market sentiment deal with stock market
sentiment, Tang (2018) examines product market sentiment. The study uses a commercial service
to create a measure of product and brand sentiment based on Twitter posts. The subsequent main
analysis then studies the effect of product market sentiment on firm sales.
3.1.2 Measures of Corporate Executives’ Personality and Decisions
The prominent role of a firm’s leadership has led to a vast amount of financial economics literature
that studies various aspects of corporate executives. Related to this stream of literature, ML can
help us construct superior and novel measures of executives’ personality and decisions. While
most measures in this subcategory are based on textual data, some studies construct measures
from image and video data.
Multiple studies construct ML-based measures of executives’ personality. Gow et al. (2016) use
ML to extract a measure of CEO personality from the Q&A part of conference call transcripts.
The extracted measure then allows them to analyze the effect of personality on financing and
investment choices and on operating performance. Similarly, Hrazdil et al. (2020) measure CEO
and CFO personality traits from conference calls by using the commercial service IBM Watson
Personality Insights. Based on the extracted personality traits, they construct a measure of risk
tolerance to analyze the effect of executives’ risk tolerance on audit fees. Hsieh et al. (2020)
leverage recent advances in ML-based face detection technology to extract a measure of trustwor-
thiness from executives’ business headshot images. More specifically, they detect and use certain
facial features (for instance, eyebrow angle or face roundness) to predict perceived trustworthiness.
Their main analysis then studies the effect of executives’ trustworthiness on audit fees. Du et al.
(2019) analyze the personality of mutual fund managers. They use mutual fund managers’ letters
to shareholders to construct a measure of managers’ level of confidence. Their main analysis then
studies how managers’ confidence relates to fund performance.
Rather than measuring executives’ personalities, several studies construct ML-based measures of
executives’ general decisions. Bandiera et al. (2020) use CEO survey data to construct a measure
of CEO behavior that captures whether CEOs perform more low-level or more high-level tasks.
In their main analysis, they use this measure to analyze the effect of CEO behavior on firm
performance. Barth, Mansouri, and Woebbeking (2020) study how executives withhold infor-
mation from shareholders. They create an ML-based measure of executives’ obstruction of infor-
mation from transcripts of earnings conference calls. More specifically, their measure captures how
executives answer questions in such calls. Finally, they analyze the effect of information obstruc-
tion by management on abnormal stock returns and implied volatility. Hu and Ma (2020) leverage
the fact that many venture capital firms have started to require startups to publish their appli-
cation pitch video on YouTube. The authors use a combination of different commercial services
on these publicly available pitch videos to construct measures of three distinct aspects: how the
founders speak, what they say, and how they present themselves visually. Finally, they analyze
the effect of the three dimensions of founders’ behavior on the probability of obtaining a venture
capital investment.
3.1.3 Measures of Firm Characteristics and Corporate Policies
Studies in the third subcategory construct measures of firm characteristics and corporate policies
with ML methods. We can further distinguish these measures by the company type involved:
corporates or financials. In the literature on corporates, Li et al. (2020) extract aspects of corpo-
rate culture from conference call transcripts with ML and build measures of five different corpo-
rate culture values. Using these measures allows them to analyze the effect of corporate culture
on firm policies such as executive compensation and risk-taking. Furthermore, they study the
effect on firm performance metrics such as operational efficiency and firm value. Buehlmaier and
Whited (2018) apply ML to annual reports to construct a measure of financial constraints. Their
ML-based measure achieves superior performance compared to existing measures of financial con-
straints. Lowry, Michaely, and Volkova (2020) analyze firms’ communications with the SEC prior
to IPOs. They apply ML to extract the discussed topics from SEC letters and construct a measure
of regulatory IPO concern. Finally, they use their measure to study the effect of regulatory concern
on IPO outcomes.
In the literature on financials, Hanley and Hoberg (2019) construct a measure of the aggregate
risk exposure of the financial sector from individual banks’ annual reports by using a commercial
ML-based service. They use their measure to study the effect of financial sector risk on banks’
stock returns and volatility as well as bank failure. Bubna, Das, and Prabhala (2020) study ven-
ture capital syndications and create a measure of venture capital relatedness. More specifically,
they cluster venture capital firms using ML to identify syndication groups. Finally, they analyze
the effect of venture capital relatedness on startup maturation and innovation. Bertsch et al.
(2020) construct an ML-based measure of bank misconduct from customer complaint texts sent
to the regulator. They use their measure to study how bank misconduct affects online lending
demand.
3.2 Reduction of Prediction Error in Economic Prediction Problems
The second category of ML applications in financial economics is the reduction of
prediction error in economic prediction problems. While many problems in economics require
estimates of causal relationships between economic variables, some problems directly require prediction. ML
can reduce the prediction error in such problems, that is, generate more accurate predictions than
simpler approaches such as fitted values from linear regression with OLS.
Figure 11 shows the general methodological process of studies in this category. We can create
predictions based on numerical data as well as unconventional data such as text, images, or videos.
Since the purpose of ML in this category is to minimize prediction error in economic prediction
problems, only supervised ML is directly applicable here. Given the large number of available ML
methods, most studies use a multitude of different methods to assess which method works best
on the given data. For numerical data, regularized linear methods (LASSO, ridge, and elastic
net), tree-based methods (random forest and boosted regression trees) and SVM-based methods
(SVC and SVR) are most common. Unconventional data such as text, images, and videos require
more complex methods, so neural network-based methods such as deep learning models are most
common here.
Applying supervised ML methods then results in predictions for an economic variable, which
directly helps in solving an economic prediction problem. From the literature, we identify three
subcategories of typical economic prediction problems, which we present below.
Finally, some studies also try to derive explanations from ML models and their predictions. ML
models are often known as black boxes; that is, they produce predictions, but we cannot directly
observe how the algorithm has generated them. The field of interpretable ML offers methods that
deliver explanations on how an ML model has derived its prediction results. There are three
main kinds of such methods. First, feature importance methods such as
permutation importance yield importance scores for the different predictor variables. For instance,
in predicting real estate prices, such methods can tell us whether the number of bedrooms or the
lot size is more important in the prediction of house prices. Second, feature dependence methods
(such as partial dependence plots) uncover the relations between predictor variables and the pre-
diction target. For instance, they can show the possibly nonlinear dependency between lot size
and house price. Third, single prediction analysis methods such as Shapley Additive Explanation
(SHAP) values disentangle the contribution of every predictor variable to a specific prediction
value. For instance, the SHAP value for the predictor variable swimming pool can tell us how
much the presence or absence of a swimming pool contributes to the price of a specific house.
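The following sketch applies one method from each of the three kinds to a fitted house-price model; scikit-learn's inspection tools, the third-party shap library, and the column meanings are illustrative assumptions.

import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance, partial_dependence

rng = np.random.default_rng(0)
X = rng.uniform(size=(1_000, 3))   # columns: 0 = bedrooms, 1 = lot size, 2 = pool
y = 20 * X[:, 0] + 50 * X[:, 1] ** 2 + 30 * (X[:, 2] > 0.5) + rng.normal(size=1_000)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# 1) Feature importance: which predictor matters most overall?
imp = permutation_importance(model, X, y, random_state=0)
print(imp.importances_mean)

# 2) Feature dependence: the (possibly nonlinear) relation of lot size to price.
dep = partial_dependence(model, X, features=[1])
print(dep["average"][0][:5])

# 3) Single prediction analysis: each variable's contribution to one prediction.
shap_values = shap.TreeExplainer(model).shap_values(X[:1])
print(shap_values)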
Figure 11. General methodological process of studies that use ML to reduce prediction
error in economic prediction problems
This figure depicts the general methodological process of studies in the second category of ML applica-
tions in financial economics. Based on conventional or unconventional data, ML methods create predic-
tions for an economic prediction problem. Some studies also try to derive explanations from ML models
and their predictions.
Figure 12 gives an overview of the relevant studies that use ML in economic prediction problems
to reduce prediction error. In the following, we present these studies in the three subcategories of
1) prediction of capital market behavior, 2) prediction of credit risk, and 3) prediction of firm
characteristics and corporate policies.
3.2.1 Prediction of Capital Market Behavior
In capital markets, we are most often interested in the future behavior of security prices. Hence,
capital markets provide many different kinds of prediction problems, in which ML can reduce the
prediction error. We can distinguish between predictions in five different areas of capital market
behavior: individual stock returns, the equity risk premium, the stochastic discount factor, option
prices, and other aspects of capital market behavior.
Figure 12. Overview of studies that use ML in economic prediction problems
This figure presents an overview of relevant studies in financial economics that apply ML in economic
prediction problems to reduce prediction error. There are three main categories of economic prediction
problems in which ML is relevant: prediction of capital market behavior, prediction of credit risk, and
prediction of firm characteristics and corporate policies.
Most common in ML-based prediction of capital market behavior is the prediction of individual
stock returns, which is closely related to the field of cross-sectional asset pricing. Rasekhschaffe
and Jones (2019) provide an overview of the use of ML for predicting the cross-section of stock
returns and for the selection of individual stocks. Martin and Nagel (2019) additionally present
the challenges of cross-sectional asset pricing with high-dimensional data. Gu, Kelly, and Xiu
(2020) directly predict future stock returns based on firm characteristics, historical returns, and
macroeconomic indicators. They use ML methods with varying complexity, from regularized linear
models to neural networks. Furthermore, they analyze which predictor variables are most informa-
tive to predict the cross-section of stock returns. Rossi (2018) predicts future stock returns and
future volatility based on established predictor variables from Welch and Goyal (2008). The stud-
ies from Moritz and Zimmermann (2016), Kelly, Pruitt, and Su (2019), Gu, Kelly, and Xiu (2019),
and Freyberger, Neuhierl, and Weber (2020) all predict future stock returns based on firm char-
acteristics and historical returns. However, they differ with respect to the specific ML methods
they apply. Grammig et al. (2020) construct a hybrid approach that combines traditional methods
based on financial theory with ML to predict future excess stock returns. Chinco, Clark‐Joseph,
and Ye (2019) predict ultra-short-term future stock returns based on the cross-section of ultra-
short-term historical returns with LASSO. Amel-Zadeh et al. (2020) predict abnormal stock re-
turns around earnings announcements based on financial statement variables. They use LASSO,
random forests, and neural networks, and they analyze which financial statement variables are
most informative.
In addition to predictions of individual stock returns, ML can reduce the prediction error in
predicting the aggregate stock market behavior, particularly the equity risk premium. Jacobsen,
Jiang, and Zhang (2019) predict the equity risk premium based on established stock market
predictor variables from Welch and Goyal (2008) with an ensemble of multiple ML models.
Routledge (2019) predicts the equity risk premium from macroeconomic indicators and FOMC
texts. Adämmer and Schüssler (2020) extract topics discussed in general news articles with ML
and use them to predict the equity risk premium.
Additionally, two studies directly predict the stochastic discount factor. Chen, Pelger, and Zhu
(2019) use generative adversarial networks based on deep neural networks on different predictors,
such as firm characteristics, historical returns, and macroeconomic indicators. Kozak, Nagel, and
Santosh (2018) develop a custom ML method based on Bayesian priors to predict the stochastic
discount factor.
The studies discussed above all predict future capital market behavior. However, predicting cur-
rent market prices can also be useful. In particular, the pricing of derivatives was a very early
application of ML in finance. Hutchinson, Lo, and Poggio (1994) predict option prices on the S&P
500 future based on the Black-Scholes variables with an early variant of neural networks. Similarly,
Yao, Li, and Tan (2000) predict option prices on the Nikkei 225 future. In more recent work,
Spiegeleer et al. (2018) find that ML methods can price derivatives much faster than advanced
traditional pricing techniques.
Some studies use ML to predict aspects of capital market behavior other than the aspects dis-
cussed above. Bianchi, Büchner, and Tamoni (2020) predict future excess returns of US treasury
bonds from general yield data and macroeconomic indicators. Reichenbacher, Schuster, and Uhrig‐
Homburg (2020) predict future bond liquidity using elastic net and random forests on bond trans-
action and characteristics data. Colombo, Forte, and Rossignoli (2019) predict the direction of
changes in exchange rates based on indicators of market uncertainty. Two studies focus on finan-
cial market volatility: Kogan et al. (2009) predict future stock volatility based on annual reports,
and Osterrieder et al. (2020) predict the intraday volatility index VIX from option prices. McInish
et al. (2019) focus on the market microstructure. More specifically, they predict the lifespan of
orders based on order characteristics and market data. Finally, Rossi and Utkus (2020) study
investors’ behavior and the effects of robo-advising. They apply boosted regression trees to predict
investors’ portfolio allocation and performance based on different investor characteristics. They
also study which characteristics are most important in the prediction and how exactly they affect
the predictions.
3.2.2 Prediction of Credit Risk
Credit risk is a typical economic prediction problem: we ultimately want to know which prospec-
tive borrowers will eventually default. As such, ML can help us lower prediction error and improve
decision making, for instance, in loan origination. We can divide the current literature on ML-
based predictions of credit risk into three areas: consumer credit risk, real estate credit risk, and
corporate credit risk.
Studies on consumer credit risk apply ML to make default predictions for any type of consumer
credit. Albanesi and Vamossy (2019) study general consumer credit default. They use advanced
ML methods such as boosted regression trees and deep neural networks to derive more accurate
predictions from credit bureau data compared to standard credit scoring models. Furthermore,
they analyze which predictors are most relevant with feature importance methods and how the
different predictors affect the predictions with feature dependence methods. Khandani, Kim, and
Lo (2010) predict consumer credit card default based on transaction data and traditional credit
bureau data. Similarly, Butaru et al. (2016) predict credit card default but consider more general
account data and macroeconomic indicators. They both use tree-based ML methods that auto-
matically consider nonlinearities and interactions between predictor variables. Butaru et al. (2016)
also use feature importance methods to identify which predictor variables drive default predic-
tions. Björkegren and Grissen (2018, 2019) focus on bill payment and use random forests on
mobile phone metadata to predict the payment of consumer bills in developing countries. Being
able to make credit risk predictions based on easily obtainable data from mobile phones can help
unbanked people in developing countries without a credit score obtain access to loans. Slightly
different from the studies above, Gathergood et al. (2019) use credit card transaction data to
predict credit card repayment patterns. They predict not whether customers pay their credit card
bills but how customers split repayment on multiple cards with different interest rates. They also
apply various ML methods and analyze which predictors are most informative.
Whenever algorithm-based decisions affect people, algorithmic bias is a potential issue. Since ML-
based predictions in consumer credit risk directly affect credit approval decisions, we need to
ensure that the algorithm does not discriminate against people based on attributes such as gender
or race. The literature does not paint a uniform picture of whether ML reduces or increases bias
in consumer credit decisions. Rambachan et al. (2020a) and Rambachan et al. (2020b) argue that
discrimination by algorithms crucially depends on the given data. Since algorithms base their
decisions on the data on which they have been trained, they might propagate biases present in
the data. Fuster et al. (2020) apply ML to a concrete dataset to create an ML model for credit
decisions. They find that ML increases the disparity between and within different groups relative
rithms as soon as their predictions influence decisions that directly affect people, such as lending.
On the other hand, there are also studies showing that ML use can decrease bias in consumer
credit decisions. Based on a theoretical model, Philippon (2019) shows how algorithms can reduce
discrimination in credit markets. Dobbie et al. (2018) train an ML model to maximize expected
profit from credit applications and find that the resulting lending decisions eliminate bias. Klein-
berg et al. (2018) show that including problematic variables such as gender and race in ML models
can actually reduce discrimination. To conclude the discussion on algorithmic bias in consumer
credit risk, there is no uniform picture in the literature yet. Some studies find that using ML to
determine consumer credit risk increases bias, while other studies find that it decreases bias.
The second area of ML-based credit risk predictions, real estate credit risk, involves the risk of
mortgages and commercial real estate loans. Sirignano, Sadhwani, and Giesecke (2018) use deep
neural networks for the prediction of mortgage loan risk from mortgage origination and perfor-
mance data as well as macroeconomic indicators. They also analyze which predictor variables are
most important and how they affect predictions. Cowden, Fabozzi, and Nazemi (2019) use various ML techniques to predict the default of commercial real estate properties.
Corporate credit risk is another area in which ML can provide superior credit risk predictions.
Jones, Johnstone, and Wilson (2015) predict firms’ credit rating changes based on firm funda-
mentals, analyst forecasts, and macroeconomic indicators. Gündüz and Uhrig-Homburg (2011)
predict CDS prices as market-based indicators of credit risk based on observed CDS prices for
other time horizons and from other firms. Tian, Yu, and Guo (2015) directly predict corporate
bankruptcy from firms’ financial statements and market data by using the LASSO method.
Lahmiri and Bekiros (2019) similarly predict bankruptcy from firm fundamentals but additionally
include general risk indicators. They use more sophisticated neural networks. Croux et al. (2020)
apply LASSO to predict fintech loan default from loan and borrower characteristics as well as
macroeconomic indicators. In contrast to the above studies, Nazemi and Fabozzi (2018) focus on
the time after credit default and predict the recovery rates of corporate bonds based on bond characteristics and macroeconomic variables.
Firm characteristics and corporate policies, as the fundamental subject of study in the field of
corporate finance, can also be the target of ML-based predictions. Depending on the specific
target of the prediction, we can divide the current literature in this category into three areas:
prediction of firm fundamentals, prediction of accounting fraud, and prediction of startups’ suc-
cess.
Two studies use ML to predict different firm fundamentals. Amini, Elmore, and Strauss (2019)
study firms’ capital structure as a typical problem in corporate finance. They predict corporate
leverage based on different hypothesized predictors from the literature (Frank and Goyal, 2009)
with various ML methods. Furthermore, they analyze which predictors are actually informative
for capital structure and how they influence predictions in detail. The study by Van Binsbergen,
Han, and Lopez-Lira (2020) applies ML to predict firms’ future earnings based on their accounting data.
Another typical prediction problem in the subcategory of firm characteristics and corporate poli-
cies is accounting fraud. While there are traditional approaches to predict accounting fraud (such
as the Beneish (1999) model for earnings manipulation), ML can provide superior prediction
accuracy. Bao et al. (2020) use boosted regression trees on raw financial statement variables to
predict accounting fraud. They find that the ML-based predictions outperform simpler existing
fraud models. Brown, Crowley, and Elliott (2020) apply ML to extract the topics discussed in
annual reports and use them to predict accounting fraud. They also employ feature importance
and feature dependence methods to further analyze which topics are most informative and how they affect the predictions.
Finally, studies in the field of entrepreneurial finance use ML to predict startups’ success. Xiang
et al. (2012) predict startup acquisitions based on firms’ fundamental data and firm-specific news.
Similarly, Ang, Chia, and Saghafian (2020) apply ML to predict startups’ valuations and their
probabilities of success.
Studies in the third category of ML applications extend the existing econometric toolset. Many
commonly used econometric methods contain a prediction component. For instance, the first stage
of instrumental variable regression with 2SLS is effectively a prediction problem, as only the fitted
(predicted) value of the instrumented variable enters the second stage. ML methods can provide
superior predictions and hence improve the capabilities of such econometric methods. On the
other hand, some ML methods already serve similar purposes as existing econometric methods.
For instance, clustering is a known problem in econometrics and in ML. ML-based methods often
provide superior performance, so they can directly extend the econometric toolset. Figure 13 gives an overview of this category, which comprises
causal ML that uses ML for the estimation of treatment effects and other isolated applications of
ML in econometrics. Within the subcategory of causal ML, we can further divide the literature
into ML-enhanced methods for instrumental variable regression, the novel methods of causal trees
and causal forests, and other approaches related to causal ML. In the following, we briefly review each of these strands.
Figure 13. Overview of ML-based methods that extend the existing econometric toolset
This figure shows the different categories of ML-based methods that extend the existing econometric
toolset. The largest subcategory is causal ML for the estimation of treatment effects. ML enhances exist-
ing methods, such as instrumental variable regression, or introduces new methods, such as causal trees
and causal forests. ML also provides other methods relevant for the estimation of treatment effects, such
as verifying the balance between treatment and control groups. The second subcategory includes special
applications of ML in econometric approaches in addition to treatment effects, such as the generation of
simulated data.
3.3.1 Causal ML
While traditional econometric methods aim for causality, ML methods are designed for prediction
or for data structure inference. The field of causal ML tries to combine the advantages of both to
create superior econometric methods suitable for causality and especially for the estimation of
treatment effects. The most developed methods within causal ML are ML-enhanced instrumental
variable regression and the novel methods of causal trees and forests.
As noted before, ML can directly improve the first stage of instrumental variable regression. By
providing better predictions for the instrumented variable, the coefficient of determination R² of
the first stage improves, resulting in more precise estimates in the second stage. Concrete imple-
mentations of this idea already exist for different ML methods, including LASSO (Belloni et al.,
2012), ridge regression (Carrasco, 2012; Hansen and Kozbur, 2014), and neural networks (Hart-
ford et al., 2017). However, Angrist and Frandsen (2019) argue that ML-enhanced instrumental
variable methods might not be superior to existing specialized approaches in selecting instrumen-
tal variables.
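To illustrate the idea, the following minimal sketch uses synthetic data and a naive plug-in approach: LASSO predicts the endogenous regressor from many candidate instruments, and OLS then runs on the fitted values. All variable names are hypothetical, and actual implementations such as Belloni et al. (2012) add safeguards (for example, sample splitting) against regularization bias.

```python
# A minimal sketch of an ML-enhanced first stage in 2SLS, on synthetic data.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(0)
n, n_instruments = 1000, 50
Z = rng.normal(size=(n, n_instruments))                  # many candidate instruments
u = rng.normal(size=n)                                   # unobserved confounder
x = Z[:, 0] + 0.5 * Z[:, 1] + u + rng.normal(size=n)     # endogenous regressor
y = 2.0 * x - u + rng.normal(size=n)                     # true coefficient is 2

# First stage: predict the endogenous regressor from the instruments with LASSO.
first_stage = LassoCV(cv=5).fit(Z, x)
x_hat = first_stage.predict(Z)

# Second stage: regress the outcome on the fitted (instrumented) values.
second_stage = LinearRegression().fit(x_hat.reshape(-1, 1), y)
print(second_stage.coef_)  # should be close to the true value of 2
```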
For the estimation of treatment effects with ML, causal trees and causal forests are other well-
developed methods. The seminal work from Athey and Imbens (2016) introduced the causal tree
approach, which uses tree-based ML methods to partition data into subpopulations with different
magnitudes of treatment effects. Causal forests from Athey and Wager (2019) extend this concept
by using an entire ensemble of causal trees. There already exist studies that apply causal forests
to concrete problems in financial economics. Gulen, Jens, and Page (2020) apply causal forests to
estimate heterogeneous treatment effects of debt covenant violations on firms’ investment levels.
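The following sketch conveys the intuition behind causal trees on synthetic data with randomized treatment: a tree partitions the covariate space, and within each leaf we compare the mean outcomes of treated and control observations. This naive variant is only an illustration; the actual method of Athey and Imbens (2016) relies on an honest splitting criterion and separate estimation samples.

```python
# A simplified illustration of the causal-tree idea on synthetic data:
# partition the covariate space with a tree, then estimate the treatment
# effect within each leaf as the treated-control difference in means.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
n = 5000
X = rng.uniform(-1, 1, size=(n, 2))
w = rng.integers(0, 2, size=n)             # random treatment assignment
tau = np.where(X[:, 0] > 0, 2.0, 0.5)      # heterogeneous true effect
y = X[:, 1] + tau * w + rng.normal(size=n)

# Grow a shallow tree on the outcome to define subpopulations.
tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
leaf = tree.apply(X)                       # leaf index per observation

for l in np.unique(leaf):
    mask = leaf == l
    effect = y[mask & (w == 1)].mean() - y[mask & (w == 0)].mean()
    print(f"leaf {l}: estimated treatment effect {effect:.2f}")
```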
In addition to causal trees and causal forests, other approaches use ML to improve the estimation
of treatment effects. Lee, Lessler, and Stuart (2010) estimate the propensity score with ML. Mul-
lainathan and Spiess (2017) suggest the use of ML for verifying the balance between treatment
and control groups. They argue that if we can predict the treatment assignment with ML, then
the split into treatment and control group cannot be balanced. However, this idea works in only
one direction: we can infer imbalance but not balance from applying ML to predict the treatment
assignment. It is always possible that our chosen ML methods are just not powerful enough to
predict the treatment assignment of imbalanced data.
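The following sketch illustrates this balance check on synthetic data in which treatment assignment depends on a covariate: a classifier that predicts assignment clearly better than chance (cross-validated AUC well above 0.5) reveals the imbalance, whereas an AUC near 0.5 would not certify balance.

```python
# A sketch of the balance check suggested by Mullainathan and Spiess (2017):
# if a classifier can predict treatment assignment from covariates out of
# sample, the groups are imbalanced (the converse does not hold).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=(n, 10))
# Imbalanced assignment: treatment probability depends on the first covariate.
p = 1 / (1 + np.exp(-X[:, 0]))
w = rng.binomial(1, p)

auc = cross_val_score(RandomForestClassifier(n_estimators=200), X, w,
                      cv=5, scoring="roc_auc").mean()
print(f"cross-validated AUC: {auc:.2f}")  # well above 0.5 -> imbalance
```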
Chernozhukov et al. (2017, 2018) directly calculate treatment effects from ML-based predictions for treatment assignment and outcome.
Finally, Athey et al. (2019a) predict the counterfactual with ensemble methods to estimate treatment effects.
While causal ML for the estimation of treatment effects is currently the most developed application in this category, ML also extends the econometric toolset in several other, more isolated ways.
3.3.2 Other Applications of ML in Econometrics
Above, we have presented how ML can create measures for economic variables. By generalizing
this concept, ML can also construct a predictability measure for entire economic theories.
Peysakhovich and Naecker (2017) introduce the notion that we can use ML to derive an upper
bound for the predictive power of theories: the explainable variation of the dependent variable
from a given dataset with ML methods. Fudenberg et al. (2019) extend this idea to construct a
completeness measure for economic theories. They calculate completeness by comparing two pre-
diction errors: the error achieved from using the model and variables hypothesized by an economic
theory and the error achievable with ML. In general, different datasets contain different levels of predictable variation. By benchmarking the prediction errors of theories against those achievable with ML methods, we can create a fairer and more informative measure of completeness.
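As a stylized illustration, a completeness measure along these lines can be computed from three prediction errors on the same held-out data; the numbers below are hypothetical.

```python
# A sketch of a completeness measure in the spirit of Fudenberg et al. (2019):
# the share of the achievable error reduction (naive baseline vs. best ML
# predictor) that the economic theory captures. All inputs are hypothetical.
def completeness(error_naive: float, error_theory: float, error_ml: float) -> float:
    """Fraction of the feasible improvement captured by the theory."""
    return (error_naive - error_theory) / (error_naive - error_ml)

# Hypothetical mean squared errors on a held-out test set:
print(completeness(error_naive=1.00, error_theory=0.40, error_ml=0.25))  # 0.8
```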
ML can also help with problems of imbalanced or censored data: in loan performance data, actual defaults are much rarer than uneventful repayments. Sigrist and
Hirnschall (2019) combine ML with traditional econometric methods to address such problem
types. More specifically, they use boosted regression trees to enhance the traditional Tobit model.
They also illustrate the advantages of their method in a concrete problem by applying it to loan
defaults in Switzerland.
In the field of simulation, Athey et al. (2019b) use generative adversarial networks instead of
traditional Monte Carlo methods to simulate data that more closely mimics real data. They
illustrate their method by using simulated data for performance comparisons between different
econometric estimators.
Finally, Ludwig, Mullainathan, and Spiess (2019) introduce ML-augmented pre-analysis plans for
the avoidance of p-hacking. They augment standard linear regression with new regressors from
ML. The new regressors aggregate many potentially relevant variables into a single index. Hence,
their method avoids the otherwise necessary pre-specification of concrete analysis choices in standard pre-analysis plans.
4. Real Estate Price Prediction
Real estate is one of the most important asset classes in the economy. In the US, the total value
of real estate assets is comparable to the size of the equities and fixed income markets. For most
households, real estate is the greatest source of wealth. The Global Financial Crisis in 2007/08
exemplified how spillover effects from the real estate sector can destabilize economies around the
world.
In contrast to other asset classes, real estate assets show a high level of heterogeneity, which makes
real estate pricing challenging. The traditional approach to derive price estimates for individual
properties is hedonic pricing. In hedonic pricing, we first regress the observed property prices on the property characteristics with OLS to obtain a linear pricing model. Then, we can use this
model to obtain price estimates for new, previously unobserved properties. We can also interpret
the regression coefficients as the characteristics’ shadow prices. However, hedonic pricing relies on
an inherently linear model and therefore does not directly consider nonlinearities and interaction
effects. For instance, we can assume relevant interactions between lot size and location: an addi-
tional m² in lot size for a property in a city center is likely worth more than in a suburb. While
we could manually add such specific effects to the linear model, there may exist a plethora of
unknown nonlinear and interaction effects. By ignoring these effects, the linear model of hedonic pricing likely produces imprecise price estimates. Supervised ML can potentially provide us with price predictions that exhibit lower pricing error than the linear model from hedonic pricing. In this section, we study whether and how ML provides superior price predictions for real estate.
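For concreteness, the following sketch runs a hedonic OLS regression on simulated listing data. All variable names and coefficients are hypothetical, and the simulated interaction between lot size and location is exactly the kind of effect the linear model cannot capture unless added by hand.

```python
# A minimal sketch of hedonic pricing with OLS on hypothetical data;
# the variable names (size, lot_size, city_center) are illustrative.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1000
df = pd.DataFrame({
    "size": rng.uniform(60, 250, n),        # living area in m²
    "lot_size": rng.uniform(100, 1500, n),  # lot size in m²
    "city_center": rng.integers(0, 2, n),   # location dummy
})
# Simulated log prices with an interaction the linear model ignores.
df["log_price"] = (10 + 0.006 * df["size"] + 0.0003 * df["lot_size"]
                   + 0.0004 * df["lot_size"] * df["city_center"]
                   + rng.normal(0, 0.2, n))

X = sm.add_constant(df[["size", "lot_size", "city_center"]])
hedonic = sm.OLS(df["log_price"], X).fit()
print(hedonic.params)  # coefficients = shadow prices of the characteristics
```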
We use a unique dataset of real estate listing data in Germany to train different ML models for
the prediction of individual property prices and compare them with the linear OLS model from
hedonic pricing. Figure 14 shows the key result of our analysis. ML methods strongly improve the
accuracy of price predictions over the OLS baseline. Our best-performing ML method, boosted
regression trees, dramatically increases out-of-sample R² to 77%, compared to 40% for OLS; thus,
it almost doubles the amount of explained price variation. On average, the predictions from
boosted regression trees deviate from the actual prices by approximately 27%, compared to 44%
for OLS. In monetary terms, the superior prediction performance of boosted regression trees
corresponds to an average pricing error of approximately 94,000 EUR, compared to 176,000 EUR
for OLS. Since the mean property price in our sample is 393,000 EUR, the improvements in
pricing accuracy from ML are not only statistically significant but also economically large.
Figure 14. Prediction performance of hedonic pricing (OLS) and different ML methods
This figure compares the prediction performance (R²) of traditional hedonic pricing (OLS) with different
ML methods. While most ML methods outperform OLS, the boosted regression trees method performs
best by far and almost doubles the OLS performance.
While the improvements in pricing accuracy induced by ML are already impressive on average,
their benefits become even more pronounced at the upper end of the price range. Figure 15 depicts
the prediction performance of the best-performing ML method, boosted regression trees, com-
pared to that of OLS in the five property price quintiles. The boosted regression trees method
outperforms OLS in all quintiles. While OLS performs worst at the extremes of the price range,
ML is especially useful in reducing the pricing error for the most expensive properties. In the
highest price quintile, the boosted regression trees method lowers the average pricing error to
24%, compared to 50% for OLS. In monetary units, the superior prediction performance of
boosted regression trees relative to that of OLS corresponds to a reduction of the average pricing
error by more than 240,000 EUR in the highest price quintile. Given that the average property
price in the top quintile is approximately 884,000 EUR, the improvements in pricing power from
ML are dramatic. Our results indicate that nonlinearities and interaction effects are relevant in
real estate pricing and especially important for the most expensive properties.
Existing research on real estate pricing almost exclusively focuses on the aggregate real estate
market instead of individual properties. Ghysels et al. (2013) forecast the aggregate price level in
the real estate market with mathematical forecasting methods such as GARCH or ARMA models.
Other studies investigate the relationship between aggregate real estate price levels and stock
prices (Quan and Titman, 1999), banks’ mortgage lending behavior (Hott, 2011), or subprime
lending (Pavlov and Wachter, 2011). The few studies that analyze individual property prices tend
to focus on the effect of specific influencing factors such as environmental variables (Din, Hoesli,
and Bender, 2001), spatial factors (Bourassa, Cantoni, and Hoesli, 2007), or climate change (Bal-
dauf, Garlappi, and Yannelis, 2020). Closest to our work is the study by Park and Bae (2015), who apply ML algorithms to predict individual housing prices in Fairfax County, Virginia.
Figure 15. Average pricing error of the best-performing ML method, boosted regression
trees, compared to that of OLS in the five property price quintiles
This figure shows the average pricing error (measured by mean absolute error (MAE)) for the best-
performing ML method, boosted regression trees, and for the OLS baseline in the five price quintiles. In
all quintiles, the boosted regression trees method significantly outperforms OLS. The reduction in pricing
error from ML is most pronounced in the highest price quintile, where OLS performs relatively poorly.
Our contribution in this section is threefold. First, we demonstrate the benefit of using ML to
reduce the prediction error in economic prediction problems. We present evidence that ML can
yield a statistically and economically significant reduction in prediction error compared to tradi-
tional linear regression with OLS in addressing the problem of real estate price prediction. By
using the most common ML methods and relevant metrics, we further illustrate how exactly
researchers can apply ML to solve such problems. Hence, our analysis can serve as a blueprint for future studies that want to apply ML to other economic prediction problems.
Second, we identify that nonlinearities and interaction effects are highly relevant in real estate
pricing. ML methods that automatically consider such effects significantly outperform the linear
OLS method traditionally used in hedonic pricing. We further find that the pricing effect of
nonlinearities and interaction effects is most pronounced at the upper end of the property price
range. While linear OLS struggles in the pricing of expensive properties, ML achieves much lower pricing errors in this segment.
Third, we provide researchers as well as practitioners with a real estate pricing model that yields
much more accurate price predictions than traditional methods. Researchers can use the predic-
tions from our model to impute prices at the individual property level for dates at which no
observed prices exist. We therefore facilitate subsequent studies that require estimates for indi-
vidual property prices, for instance, to study real estate price behavior or to use as control varia-
bles. For practitioners, multiple usage scenarios exist. Realtors and prospective sellers can use our
pricing model to obtain an immediate estimate of a reasonable price for the property they are
selling. Hence, they may be able to sell faster and avoid selling at too low a price. Our model can
also help prospective buyers assess whether a certain offer is reasonably priced. Finally, home-
owners can use our model to obtain more precise estimates for the value of their properties and
subsequently their net worth, for instance, to make more informed consumption decisions.
This section is organized as follows. In Subsection 4.1, we describe our dataset and our variable
construction process as well as the specific ML methods that we use. Subsection 4.2 then presents
our prediction results and their interpretation in detail. Subsection 4.3 concludes the section and provides an outlook on future research.
4.1 Data and Methodology
In contrast to other asset classes such as equities or fixed income, no regular market prices exist
in the real estate market. In many cases, transaction prices are also not publicly available, as they
are the result of private negotiations. Instead, we often have to rely on list prices to study real
estate price behavior. List prices are set by sellers or realtors to attract potential buyers and then
merely serve as a starting point for the subsequent negotiation process. As such, list prices deviate
from realized transaction prices. Empirical evidence from various real estate markets, however,
shows that the deviations between list and transaction prices are relatively small: on average, list
prices overestimate transaction prices by less than 10% (Yavas and Yang, 1995; Palmon, Smith,
and Sopranzetti, 2004; Haurin et al., 2010). Hence, we work with the assumption that, especially
over a longer time period, the bias from using list prices instead of transaction prices is negligible.
We construct our sample based on a unique, proprietary dataset from a specialized German real
estate data provider. The dataset consists of a comprehensive collection of detailed real estate
listings for the entire German residential market from five German property portals and major
newspapers between January 2000 and September 2020. We restrict our analyses to single-family
houses, as they are the most common property type in the dataset. Table A1 in the Appendix
gives an overview of the available variables. For our sample, we first eliminate observations with
missing values for any continuous variable. To reduce the influence of outliers and data errors, we
then truncate all continuous variables at the 0.01st and 99.99th percentiles.7 Our sample construction procedure yields the final dataset for our analyses.
For the prediction of real estate prices, we follow the literature (for example, Mullainathan and
Spiess, 2017) and choose the natural logarithm of the list price as our actual prediction target.8
In our main specification, we use the most relevant factors influencing real estate prices from the
given variables. Physical attribute variables describe the characteristics of the property: general
house type, size, number of rooms, lot size, and construction year. As a macro location variable,
we use a property’s county to describe its approximate location within Germany.9 The granular
location variables horizontal geocoordinates and vertical geocoordinates capture a property’s loca-
tion more precisely. Note that they describe not a property’s exact location but the approximate
center of the city district implied by the property’s zip code. They still capture, for instance,
whether a property is located in a city center or in a suburb. Finally, offer variables describe offer-
7 Given the huge size of our dataset and the already high data quality, we choose relatively small outlier percentiles. Using different percentiles has only a minor effect on the results.
8 In unreported analyses, we use price or price per square meter as the prediction target. These alternative specifications produce qualitatively similar results but slightly lower prediction performance.
9 In unreported analyses, we use city and state, in addition to and instead of county, to capture macro location effects. Prediction performance and conclusion are virtually unchanged.
specific features: offer year captures time trends and price-level effects, online listing indicates
whether the sale offer is listed on an online platform, and seller type describes who is selling the
property (realtor, developer, or private owner). We include all categorical variables as dummy
variables in our specification and finally arrive at 388 predictor variables. We also tested an alter-
native specification with a set of additional property characteristics, such as the number of bal-
conies or bathrooms (see Table A1 in the Appendix), which are available for only a limited subset
of our sample. The results were qualitatively similar to those of the main specification.
To accurately assess and compare the out-of-sample prediction performance of different prediction
methods, we divide the sample into two subsamples: training data and test data (also called hold-
out data). We train our prediction models on the training data and subsequently determine their
prediction performance on the test data. Since the algorithm has not seen the test data before,
the test performance provides an unbiased estimate of the expected out-of-sample prediction performance. Many studies that use cross-sectional data assign the dataset’s observa-
tions into training and test data at random. However, our data exhibits a time component. A
random assignment would imply that our ML models can learn from future information (look-
ahead bias): for instance, we would train on some observations from 2020 to predict prices from
2000. Hence, our measured prediction performance would be biased upwards. To avoid this issue,
we split our sample into disjoint time periods. We hereby follow common practice from panel
studies that also have to consider the temporal order in the data (for example, Gu, Kelly, and
Xiu, 2020). We assign observations from 2000 to 2019 as training data and observations from 2020
as test data. Thereby, we take the standpoint of a practitioner who uses all historical data to
learn the pricing mechanism and predicts prices for the most recent observations. We use sample
weights to take into account that the observations from 2019 are more informative for price pre-
dictions in 2020 than observations from 2000. More specifically, we weight the training data line-
arly depending on the offer year: observations from 2000 have a weight of 1, while observations from more recent years receive proportionally higher weights.10
10 In unreported analyses, we also use alternative weighting schemes such as hyperbolic weighting. The results remain qualitatively unchanged.
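The following sketch illustrates the temporal split and the linear weighting scheme on simulated data; the data-generating process and the exact weights are hypothetical stand-ins for our proprietary dataset.

```python
# A sketch of the temporal train/test split and linear sample weights.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical stand-in for the listing data used in the paper.
rng = np.random.default_rng(4)
n = 5000
df = pd.DataFrame({
    "offer_year": rng.integers(2000, 2021, n),
    "size": rng.uniform(60, 250, n),
    "rooms": rng.integers(2, 9, n),
})
df["log_price"] = (10 + 0.005 * df["size"]
                   + 0.02 * (df["offer_year"] - 2000)
                   + rng.normal(0, 0.3, n))
X_cols = ["offer_year", "size", "rooms"]

# Temporal split: train on 2000-2019, evaluate on 2020 (no look-ahead bias).
train, test = df[df["offer_year"] <= 2019], df[df["offer_year"] == 2020]

# Linear weights: observations from 2000 get weight 1, later years more.
weights = train["offer_year"] - 2000 + 1

model = GradientBoostingRegressor().fit(
    train[X_cols], train["log_price"], sample_weight=weights)
print(model.score(test[X_cols], test["log_price"]))  # out-of-sample R²
```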
Table 1 shows summary statistics for the continuous variables in the total sample, the training
sample, and the test sample. The differences in all variables other than list price are negligible.
List prices in the test sample, which covers the most recent observations from 2020, are much
higher than those in the training sample and total sample, which cover previous years, as a result
of price-level effects. We account for such price-level effects by having the offer year variable in
our specification and, as discussed above, by using year-dependent weights in the training data.
Table 1. Summary statistics for the continuous variables of the total sample and the training
and test samples
This table reports summary statistics for the continuous variables in our three samples: the total sample
of all observations, the training sample on which we train our prediction models, and the test sample on
which we evaluate prediction performance. The training sample consists of observations from 2000 to
2019, while the test sample covers 2020.
To predict real estate prices, we apply linear regression with OLS (traditional hedonic regression)
and various supervised ML methods. The pricing performance of the OLS estimates serves as our
baseline against which we compare the performance of the different ML methods. We choose
different classes of supervised ML methods that are widespread in the current literature. Regularized linear regression stays close to traditional OLS but introduces bias to potentially improve prediction performance. We apply the
most common methods of regularized linear regression: LASSO, ridge, and elastic net. Tree-based
methods are especially well suited for capturing nonlinearities and interaction effects. We also
apply the most common of these methods: decision tree, random forest, and boosted regression
trees. Finally, we leverage the common ensemble learning concept and build an ensemble model
that returns the unweighted average of all other models’ predictions.11 We derive suitable hy-
perparameters for each ML model by using fivefold cross-validation.12 For a detailed description of the individual methods, we refer to Section 2.
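In scikit-learn terms, the set of methods we compare looks roughly as follows; the hyperparameter grid shown for the boosted trees is illustrative rather than the one we actually tune.

```python
# The model classes used above, sketched with scikit-learn.
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

models = {
    "ols": LinearRegression(),                # hedonic pricing baseline
    "lasso": Lasso(),
    "ridge": Ridge(),
    "elastic_net": ElasticNet(),
    "tree": DecisionTreeRegressor(),
    "random_forest": RandomForestRegressor(),
    "boosted_trees": GridSearchCV(            # fivefold cross-validation
        GradientBoostingRegressor(),
        param_grid={"n_estimators": [100, 500], "max_depth": [3, 5]},
        cv=5),
}
# After fitting each model, the ensemble prediction is the unweighted
# average of the individual models' predictions.
```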
In addition to the abovementioned methods, there are many more ML methods to make predic-
tions. Currently, a very popular ML method is deep learning with neural networks. Neural net-
works, however, do not perform particularly well for pure prediction based on original numerical
data. Instead, they are the method of choice for unconventional data such as images, videos, or
text (see Section 2). In an unreported analysis, we nevertheless trained a basic feed-forward neural
network on our data. As expected, it not only achieved worse prediction performance compared
to the other methods but also required much higher computational effort.
Various metrics exist to assess prediction performance: R², mean squared error, mean absolute error, etc. Since R² is also a common metric in many empirical studies in economics and hence familiar to most readers, we use it as our primary measure of prediction performance. The different methods’ prediction performance on the test data is most meaningful in
assessing the expected out-of-sample prediction performance. To derive 95% confidence intervals
for the test data performance, we follow Mullainathan and Spiess (2017) and use bootstrap sam-
pling with fixed prediction functions (see their Online Appendix for a more detailed description
of the method). We further calculate the relative improvement of each method over the OLS
baseline by quintile of property price (based on mean squared error). In addition to reporting the
11 In an unreported analysis, we also build a more complex ensemble model that uses a weighted average of the other models’ predictions. We follow the linear regression approach from Mullainathan and Spiess (2017) to derive the optimal weights. The complex ensemble model puts a large weight on the boosted regression tree method and hence performs very similarly.
12 Common practice in the literature is five- to tenfold cross-validation. Given our large dataset and the resulting long computation times, we choose the computationally less demanding fivefold cross-validation.
test data metrics, we report the performance of each method on the training data (in-sample) to
allow comparisons with traditional studies and to illustrate the amount of overfitting.13
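A sketch of the bootstrap with fixed prediction functions described above: the model is estimated only once, and the test observations are resampled to recompute the metric; the function and variable names are hypothetical.

```python
# Bootstrap confidence intervals with a fixed prediction function:
# the model is NOT refit; only the test observations are resampled
# and the metric recomputed on each resample.
import numpy as np
from sklearn.metrics import r2_score

def bootstrap_r2_ci(y_true, y_pred, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample with replacement
        stats.append(r2_score(y_true[idx], y_pred[idx]))
    return np.percentile(stats, [2.5, 97.5])  # 95% confidence interval

# Usage: lo, hi = bootstrap_r2_ci(y_test.to_numpy(), model.predict(X_test))
```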
4.2 Prediction Results
Table 2 shows the prediction performance of the OLS baseline and the different ML methods. The
table reveals five main results. First, the prediction performance on the test data is lower than
that on the training data for every method. This observation illustrates the effect of overfitting:
prediction models can closely fit the given training data by picking up noise, so their performance
is lower for test data that the algorithm has not seen before. Thus, our results highlight the well-
established fact that the (in-sample) performance of prediction models on training data is upwards
biased, so we need to evaluate them on held-out test data to derive unbiased estimates for out-of-sample performance.
13 Overfitting refers to the phenomenon that complex prediction methods can flexibly adapt to the given data and possibly pick up noise that does not generalize beyond the training data.
Second, every ML method outperforms the OLS baseline on average, and more complex ML
models achieve higher prediction performance on the test data. Regularized linear regression
methods (LASSO, ridge, elastic net) and the simple decision tree method achieve only modest
improvements over the OLS baseline. More complex methods (random forest, boosted regression
trees, ensemble), however, can strongly improve the prediction performance. The performance
ranking of the different methods is also in line with typical expectations (see Figure 3 in Section
2). Our results strongly indicate that the nonlinearities and interaction effects captured by more complex methods are highly relevant for real estate pricing.
Third, most ML methods do not outperform OLS in every price quintile. Especially in the lowest
quintile, only the boosted regression trees method and the ensemble model achieve superior pre-
diction performance. Hence, we need complex ML methods to achieve not only maximal perfor-
mance on average but also consistent outperformance relative to OLS over the entire price range.
Fourth, the boosted regression trees method outperforms the OLS baseline and every other ML
method on average as well as in every price quintile. Given that boosted regression trees is a
highly optimized ML method that captures complex nonlinearities and interaction effects, this
result further strengthens our previous indication that nonlinear effects are relevant for real estate
pricing.
Fifth, the outperformance of our best-performing ML method, boosted regression trees, monoton-
ically increases by price quintile. For low-priced properties, the improvement induced by ML is
relatively modest even with the best ML method. For high-priced properties, however, the pre-
diction performance of ML dramatically improves over that of OLS. Hence, our results indicate
that nonlinearities and interaction effects are most relevant for properties at the upper end of the
price range.
Having established that advanced ML models outperform OLS in real estate price prediction, we
now analyze the economic magnitude of our findings. To make statements about economic rele-
vance, R² values are less suited. Instead, we use metrics that are more interpretable. First, the
mean absolute percentage error (MAPE) quantifies by what percent a model’s predictions deviate
from the actual prices on average. Based on the MAPE, we calculate the improvement of each
ML model over the OLS baseline on average and per price quintile, and we report their statistical significance. Table 3 shows our results.
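In code, the two interpretability metrics look as follows, assuming that the log-price predictions are first converted back to EUR price levels (an implementation detail not spelled out above).

```python
# MAPE and MAE, assuming log-price predictions are converted back to EUR.
import numpy as np

def mape(prices, predicted_log_prices):
    """Mean absolute percentage error, in percent."""
    predicted = np.exp(predicted_log_prices)
    return 100 * np.mean(np.abs(prices - predicted) / prices)

def mae(prices, predicted_log_prices):
    """Mean absolute error, in EUR."""
    predicted = np.exp(predicted_log_prices)
    return np.mean(np.abs(prices - predicted))
```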
Table 3. Improvements in prediction accuracy for different ML methods
This table shows the MAPE values (in %) for OLS and different ML methods as well as the improvements
over the OLS baseline on average and by quintile of property price. The numbers in brackets show the
respective t-values.
The results from using the MAPE metric are consistent with those from using R². Overly simple
methods (LASSO, ridge, elastic net, and decision tree) achieve a statistically significant but not
economically meaningful improvement over the OLS baseline, with a maximum improvement of
3.6 percentage points. The more complex methods perform much better. The improvements in
pricing accuracy from the best-performing ML method, boosted regression trees, are not only
statistically significant but also economically large. On average, we achieve a pricing error of
26.8% with boosted regression trees compared to 43.6% with OLS. While the average reduction
in pricing error by 16.8 percentage points is already highly meaningful, the improvements from
ML become even larger at the upper end of the price range. In the highest price quintile, boosted
regression trees reduce the average pricing error by 26.2 percentage points. Hence, complex ML
methods, especially the boosted regression trees method, yield price predictions with much higher
accuracy than the OLS approach from hedonic pricing by considering nonlinearities and interac-
tion effects.
To quantify the value of superior prediction performance in monetary units (EUR), we use the
mean absolute error (MAE) metric calculated from the different methods’ price predictions.
Again, we also determine the improvement of each ML method over the OLS baseline on average
and per price quintile and report their statistical significance. Table 4 shows our results. While
the OLS estimates exhibit an average pricing error of over 176,000 EUR, the boosted regression
trees predictions lower the error to approximately 94,000 EUR. Given that the average property
price in the sample is 393,000 EUR, the reduction in pricing error by more than 82,000 EUR is
economically very large. In the highest price quintile, boosted regression trees reduce the average
pricing error by more than 240,000 EUR for an average property price of approximately 884,000
EUR. Hence, we conclude that ML, especially the boosted regression trees method, is able to
reduce the prediction error in real estate pricing in a statistically significant and economically
meaningful way.
4.3 Concluding Remarks and Outlook
In this section, we applied state-of-the-art ML methods to predict real estate prices based on
property characteristics, location, and offer details. As our data source, we used a proprietary set
of real estate listings in Germany from various online and offline sources. While simple methods
such as LASSO or decision trees already outperform traditional hedonic pricing with OLS,
we found that the more complex boosted regression trees method yields much lower pricing errors.
While the average pricing error is almost 44% for OLS, boosted regression trees lowers that value
to less than 27%. In monetary units, the improved pricing accuracy corresponds to a reduction in
pricing error by approximately 82,000 EUR for an average property price of 393,000 EUR. We
infer that nonlinearities and interaction effects captured by complex ML methods are relevant for
real estate pricing. They become even more important at the upper end of the price range: in the
highest price quintile, ML reduces the average pricing error by more than 240,000 EUR for an average property price of approximately 884,000 EUR.
The biggest limitation of our approach is the reliance on listing data instead of transaction data.
List prices often serve as a mere starting point in the subsequent price negotiation. Depending on
the state of the market at the time of selling, the final transaction price might be higher or lower.
Furthermore, it is possible that certain listed properties do not get sold at all. Empirical evidence,
however, indicates that the differences between list prices and transaction prices are rather small
on average. Nevertheless, future studies might look into repeating our prediction exercise with actual transaction data.
Future research might also consider integrating further data sources to enhance prediction perfor-
mance. For instance, macroeconomic data such as GDP or inflation data could provide additional
information that is relevant for real estate prices but is not yet included in our dataset.
Another future research avenue is model interpretation. We only predicted prices without analyz-
ing how our ML models arrive at their predictions and which influencing factors are most im-
portant. To identify relevant predictor variables, feature importance methods such as permutation
importance have become common. However, most feature importance methods produce highly
misleading results, especially if there are strong dependencies between the predictor variables
(Hooker and Mentch, 2019). As we cannot rule out relevant dependencies between our real estate
variables (for instance, size and number of rooms are highly correlated), specialized methods such as those discussed by Hooker and Mentch (2019) would be required.
Finally, it might be interesting to see whether the large benefits of using ML over using OLS for
real estate price prediction also hold in countries other than Germany.
6. Conclusion
In this paper, we studied the question of how researchers can leverage ML technology in financial
economics. First, we identified that different types of ML solve different problems than traditional
linear regression with OLS does. While the properties of OLS are beneficial for explanation prob-
lems, supervised ML is the superior method for prediction problems. The other major type of
ML, unsupervised learning, deals with data structure inference. We also briefly covered other, less common types of ML, such as reinforcement learning.
In the second part of this paper, we identified the three main application categories of ML in financial economics: 1) construction of superior and novel measures, 2) reduction of the prediction
error in economic prediction problems, and 3) extension of the existing econometric toolset. For
each application category in our taxonomy, we further identified subcategories and reviewed the
existing literature.
In the third part, we applied ML in typical prediction problems in economics: real estate price
prediction and credit risk prediction. In real estate price prediction, we compared different ML
methods with OLS and found that the ML-based price predictions achieve dramatically lower
pricing errors than OLS. We also identified that nonlinearities and interaction effects are highly
relevant for real estate pricing, especially at the upper end of the price range. Our analysis can
further serve as a blueprint for studies that want to apply ML to other economic prediction
problems.
Given its already successful applications in practice, we expect ML to become much more wide-
spread in financial economics research in the coming years. ML is likely to contribute to all three
application categories identified above. In constructing superior and novel measures, ML can im-
prove traditional measures that do not work well, create novel measures from traditional datasets
already used in financial economics, or create novel measures from entirely new data sources
outside of traditional financial economics research. To use ML for the reduction of prediction error
in economic prediction problems, the identification of suitable prediction problems is crucial. Fi-
nally, multiple ML-enhanced econometric methods are already available for application in tradi-
tional and novel problems in financial economics and beyond. We hope that this paper can serve
as a guide for researchers who want to apply ML in financial economics and conduct some of the research outlined above.
References
Adämmer, P. and Schüssler, R.A., 2020. Forecasting the Equity Premium: Mind the News! Review
Agrawal, R., Imieliński, T., and Swami, A., 1993. Mining Association Rules between Sets of Items
in Large Databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Man-
Albanesi, S. and Vamossy, D.F., 2019. Predicting Consumer Default: A Deep Learning Approach.
Albawi, S., Mohammed, T.A., and Al-Zawi, S., 2017. Understanding of a Convolutional Neural
Algaba, A., Ardia, D., Bluteau, K., Borms, S., and Boudt, K., 2020. Econometrics Meets Senti-
ment: An Overview of Methodology and Applications. Journal of Economic Surveys 34, 512–47.
Amel-Zadeh, A., Calliess, J.-P., Kaiser, D., and Roberts, S., 2020. Machine Learning-Based Fi-
nancial Statement Analysis. SSRN Working Paper No. 3520684. Social Science Research Network.
Amini, S., Elmore, R., and Strauss, J., 2019. Can Machines Learn Capital Structure? SSRN
Ang, Y.Q., Chia, A., and Saghafian, S., 2020. Using Machine Learning to Demystify Startups
Funding, Post-Money Valuation, and Success. HKS Working Paper No. RWP20-028. Harvard
Kennedy School.
Angrist, J. and Frandsen, B., 2019. Machine Labor. NBER Working Paper No. 26584. National
Antweiler, W. and Frank, M.Z., 2004. Is All That Talk Just Noise? The Information Content of
Athey, S., Bayati, M., Imbens, G.W., and Qu, Z., 2019. Ensemble Methods for Causal Effects in
Athey, S. and Imbens, G.W., 2016. Recursive Partitioning for Heterogeneous Causal Effects. In
Athey, S. and Imbens G.W., 2019. Machine Learning Methods That Economists Should Know
Athey, S., Imbens, G.W., Metzger J., and Munro E.M., 2019. Using Wasserstein Generative Ad-
versarial Networks for the Design of Monte Carlo Simulations. NBER Working Paper No. 26566.
Athey, S. and Wager, S., 2019. Estimating Treatment Effects with Causal Forests: An Application.
Baldauf, M., Garlappi, L., and Yannelis, C., 2020. Does Climate Change Affect Real Estate
Prices? Only If You Believe In It. The Review of Financial Studies 33, 1256–95.
Bandiera, O., Prat, A., Hansen, S., and Sadun, R., 2020. CEO Behavior and Firm Performance.
Bao, Y., Ke, B., Li, B., Yu, Y.J., and Zhang, J., 2020. Detecting Accounting Fraud in Publicly
Traded U.S. Firms Using a Machine Learning Approach. Journal of Accounting Research 58, 199–
235.
Barbon, A., Di Maggio, M., Franzoni, F., and Landier, A., 2019. Brokers and Order Flow Leakage:
Barth, A., Mansouri, S., and Woebbeking, F., 2020. Econlinguistics. SSRN Working Paper No.
Bartov, E., Faurel, L., and Mohanram, P.S., 2017. Can Twitter Help Predict Firm-Level Earnings
Belloni, A., Chen, D., Chernozhukov, V., and Hansen, C., 2012. Sparse Models and Methods for
Beneish, M.D, 1999. The Detection of Earnings Manipulation. Financial Analysts Journal 55, 24–
36.
Bertsch, C., Hull, I., Qi, Y., and Zhang, X., 2020. Bank Misconduct and Online Lending. Journal
Bianchi, D., Büchner, M., and Tamoni, A., 2020. Bond Risk Premiums with Machine Learning.
Binsbergen, J.H.v., Han, X., and Lopez-Lira, A., 2020. Man versus Machine Learning: Earnings
Expectations and Conditional Biases. NBER Working Paper No. 27843. National Bureau of Eco-
nomic Research.
Björkegren, D. and Grissen, D., 2018. The Potential of Digital Credit to Bank the Poor. AEA
Björkegren, D. and Grissen, D., 2019. Behavior Revealed in Mobile Phone Usage Predicts Credit
Bourassa, S.C., Cantoni, E., and Hoesli, M., 2007. Spatial Dependence, Housing Submarkets, and
House Price Prediction. The Journal of Real Estate Finance and Economics 35, 143–60.
Brown, N.C., Crowley, R.M., and Elliott, W.B., 2020. What Are You Saying? Using Topic to
Bubna, A., Das, S.R., and Prabhala, N., 2020. Venture Capital Communities. Journal of Financial
Buehlmaier, M.M. and Whited, T.M., 2018. Are Financial Constraints Priced? Evidence from
Butaru, F., Chen, Q., Clark, B., Das, S., Lo, A.W., and Siddique, A., 2016. Risk and Risk Man-
agement in the Credit Card Industry. Journal of Banking & Finance 72, 218–39.
Cambria, E. and White, B., 2014. Jumping NLP Curves: A Review of Natural Language Pro-
Carrasco, M., 2012. A Regularization Approach to the Many Instruments Problem. Journal of
Chen, L., Pelger, M., and Zhu, J., 2019. Deep Learning in Asset Pricing. SSRN Working Paper
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., and Newey, W., 2017. Double/Debiased/Neyman Machine Learning of Treatment Effects. American Economic Review 107, 261–65.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins,
J., 2018. Double/Debiased Machine Learning for Treatment and Structural Parameters. The
Chinco, A., Clark‐Joseph, A.D., and Ye, M., 2019. Sparse Signals in the Cross-Section of Returns.
Colombo, E., Forte, G., and Rossignoli, R., 2019. Carry Trade Returns with Support Vector
Cowden, C., Fabozzi, F.J., and Nazemi, A., 2019. Default Prediction of Commercial Real Estate
Properties Using Machine Learning Techniques. The Journal of Portfolio Management, Forth-
coming.
Croux, C., Jagtiani, J., Korivi, T., and Vulanovic, M., 2020. Important Factors Determining
Fintech Loan Default: Evidence from a Lendingclub Consumer Platform. Journal of Economic
Din, A., Hoesli, M., and Bender, A., 2001. Environmental Variables and Real Estate Prices. Urban
Dobbie, W., Liberman, A., Paravisini, D., and Pathania, V., 2018. Measuring Bias in Consumer
Lending. NBER Working Paper No. 24953. National Bureau of Economic Research.
Domingues, R., Filippone, M., Michiardi, P., and Zouaoui, J., 2018. A Comparative Evaluation
of Outlier Detection Algorithms: Experiments and Analyses. Pattern Recognition 74, 406–421.
Du, Q., Jiao, Y., Ye, P., and Fan, W., 2019. When Mutual Fund Managers Write Confidently.
Ester, M., Kriegel, H.-P., Sander, J., and Xu, X., 1996. A Density-Based Algorithm for Discovering
Frank, M.Z. and Goyal, V.K., 2009. Capital Structure Decisions: Which Factors Are Reliably
Freyberger, J., Neuhierl, A., and Weber, M., 2020. Dissecting Characteristics Nonparametrically.
Fudenberg, D., Kleinberg, J., Liang, A., and Mullainathan, S., 2019. Measuring the Completeness
of Theories. SSRN Working Paper No. 3018785. Social Science Research Network.
Fuster, A., Goldsmith-Pinkham, P., Ramadorai, T., and Walther, A., 2020. Predictably Unequal?
The Effects of Machine Learning on Credit Markets. SSRN Working Paper No. 3072038. Social
Gathergood, J., Mahoney, N., Stewart, N., and Weber, J., 2019. How Do Individuals Repay Their
Ghysels, E., Plazzi, A., Valkanov, R., and Torous, W., 2013. Chapter 9 - Forecasting Real Estate
Prices. In Handbook of Economic Forecasting, edited by Graham Elliott and Allan Timmermann,
Goodfellow, I., Bengio, Y., and Courville, A., 2016. Deep Learning. Vol. 1. MIT Press, Cambridge,
MA, USA.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A.,
and Bengio, Y., 2020. Generative Adversarial Networks. Communications of the ACM 63, 139–
144.
Gow, I.D., Kaplan, S.N., Larcker, D.F., and Zakolyukina, A.A., 2016. CEO Personality and Firm
Policies. SSRN Working Paper No. 2805635. Social Science Research Network.
Grammig, J., Hanenberg, C., Schlag, C., and Sönksen, J., 2020. Diverging Roads: Theory-Based
vs. Machine Learning-Implied Stock Risk Premia. SSRN Working Paper No. 3536835. Social Sci-
Gu, S., Kelly, B.T., and Xiu, D., 2019. Autoencoder Asset Pricing Models. SSRN Working Paper
Gu, S., Kelly, B.T., and Xiu, D., 2020. Empirical Asset Pricing via Machine Learning. The Review
Gulen, H., Jens, C., and Page, T.B., 2020. An Application of Causal Forest in Corporate Finance:
How Does Financing Affect Investment? SSRN Working Paper No. 3583685. Social Science Re-
search Network.
Gündüz, Y. and Uhrig-Homburg, M., 2011. Predicting Credit Default Swap Prices with Financial
Hanley, K.W. and Hoberg, G., 2019. Dynamic Interpretation of Emerging Risks in the Financial
Hansen, C. and Kozbur, D., 2014. Instrumental Variables Estimation with Many Weak Instru-
Hartford, J., Lewis, G., Leyton-Brown, K., and Taddy, M., 2017. Deep IV: A Flexible Approach
Hastie, T., Tibshirani, R., and Friedman, J., 2009. The Elements of Statistical Learning: Data
Mining, Inference, and Prediction, Second Edition. Springer Science & Business Media, Luxem-
bourg.
Haurin, D.R., Haurin, J.L., Nadauld, T., and Sanders, A., 2010. List Prices, Sale Prices and
Marketing Time: An Application to U.S. Housing Markets. Real Estate Economics 38, 659–85.
Hooker, G. and Mentch, L., 2019. Please Stop Permuting Features: An Explanation and Alterna-
Hott, C., 2011. Lending Behavior and Real Estate Prices. Journal of Banking & Finance 35, 2429–
42.
Hrazdil, K., Novak, J., Rogo, R., Wiedman, C., and Zhang, R., 2020. Measuring Executive Per-
sonality Using Machine-Learning Algorithms: A New Approach and Audit Fee-Based Validation
Hsieh, T.-S., Kim, J.-B., Wang, R.R., and Wang, Z., 2020. Seeing Is Believing? Executives’ Facial
Trustworthiness, Auditor Tenure, and Audit Fees. Journal of Accounting and Economics 69,
101260.
Hu, A., and Ma, S., 2020. Human Interactions and Financial Investment: A Video-Based Ap-
proach. SSRN Working Paper No. 3583898. Social Science Research Network.
Huang, A.H., Zang, A.Y., and Zheng, R., 2014. Evidence on the Information Content of Text in
Hutchinson, J.M., Lo, A.W., and Poggio, T., 1994. A Nonparametric Approach to Pricing and
Hedging Derivative Securities Via Learning Networks. The Journal of Finance 49, 851–89.
Jacobsen, B., Jiang, F., and Zhang, H., 2019. Ensemble Machine Learning and Stock Return
Predictability. SSRN Working Paper No. 3310289. Social Science Research Network.
Jones, S., Johnstone, D., and Wilson, R., 2015. An Empirical Evaluation of the Performance of
Binary Classifiers in the Prediction of Credit Ratings Changes. Journal of Banking & Finance 56,
72–85.
Ke, Z.T., Kelly, B.T., and Xiu, D., 2019. Predicting Returns With Text Data. NBER Working
Kearney, C. and Liu, S., 2014. Textual Sentiment in Finance: A Survey of Methods and Models.
Kelly, B.T., Pruitt, S., and Su, Y., 2019. Characteristics Are Covariances: A Unified Model of
Khandani, A.E., Kim, A.J., and Lo, A.W., 2010. Consumer Credit-Risk Models via Machine-
Kleinberg, J., Ludwig, J., Mullainathan, S., and Sunstein, C.R., 2018. Discrimination in the Age
Kogan, S., Levin, D., Routledge, B.R., Sagi, J.S., and Smith, N.A., 2009. Predicting Risk from
Financial Reports with Regression. In Proceedings of Human Language Technologies: The 2009
Annual Conference of the North American Chapter of the Association for Computational Linguis-
tics, 272–280.
Kozak, S., Nagel, S., and Santosh, S., 2018. Shrinking the Cross Section. SSRN Working Paper
Lahmiri, S. and Bekiros, S., 2019. Can Machine Learning Approaches Predict Corporate Bank-
ruptcy? Evidence from a Qualitative Experimental Design. Quantitative Finance 19, 1569–77.
Lee, B.K., Lessler, J., and Stuart, E.A., 2010. Improving Propensity Score Weighting Using Ma-
Li, K., Mai, F., Shen, R., and Yan, X., 2020. Measuring Corporate Culture Using Machine Learn-
ing. SSRN Working Paper No. 3256608. Social Science Research Network.
Liew, J.K.-S. and Wang, G.Z., 2016. Twitter Sentiment and IPO Performance: A Cross-Sectional
Liu, L. and Özsu, M.T., 2009. Encyclopedia of Database Systems. Vol. 6. Springer, New York,
USA.
Loh, W.-Y., 2011. Classification and Regression Trees. WIREs Data Mining and Knowledge Dis-
covery 1, 14–23.
Loughran, T. and McDonald, B., 2011. When Is a Liability Not a Liability? Textual Analysis,
Loughran, T. and McDonald, B., 2016. Textual Analysis in Accounting and Finance: A Survey.
Lowry, M., Michaely, R., and Volkova, E., 2020. Information Revealed through the Regulatory
Process: Interactions between the SEC and Companies Ahead of Their IPO. The Review of Fi-
Ludwig, J., Mullainathan, S., and Spiess, J., 2019. Augmenting Pre-Analysis Plans with Machine
MacQueen, J., 1967. Some Methods for Classification and Analysis of Multivariate Observations.
In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability 1, 281–
297.
Manela, A. and Moreira, A., 2017. News Implied Volatility and Disaster Concerns. Journal of
Martin, I. and Nagel, S., 2019. Market Efficiency in the Age of Big Data. NBER Working Paper
McInish, T.H., Nikolsko‐Rzhevska, O., Nikolsko‐Rzhevskyy, A., and Panovska, I., 2019. Fast and
Medsker, L.R. and Jain, L.C., 1999. Recurrent Neural Networks: Design and Applications. CRC
Mitchell, M., 1998. An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA, USA.
Moritz, B. and Zimmermann, T., 2016. Tree-Based Conditional Portfolio Sorts: The Relation
between Past and Future Stock Returns. SSRN Working Paper No. 2740751. Social Science Re-
search Network.
Mullainathan, S. and Spiess, J., 2017. Machine Learning: An Applied Econometric Approach.
Nazemi, A. and Fabozzi, F.J., 2018. Macroeconomic Variable Selection for Creditor Recovery
O’Malley, T., 2020. The Impact of Repossession Risk on Mortgage Default. The Journal of Fi-
nance, Forthcoming.
Osterrieder, J., Kucharczyk, D., Rudolf, S., and Wittwer, D., 2020. Neural Networks and Arbi-
trage in the VIX. SSRN Working Paper No. 3591755. Social Science Research Network.
Palmon, O., Smith, B., and Sopranzetti, B., 2004. Clustering in Real Estate Prices: Determinants
Park, B. and Bae, J.K., 2015. Using Machine Learning Algorithms for Housing Price Prediction:
The Case of Fairfax County, Virginia Housing Data. Expert Systems with Applications 42, 2928–
34.
Pavlov, A. and Wachter S., 2011. Subprime Lending and Real Estate Prices. Real Estate Econom-
Peysakhovich, A. and Naecker, J., 2017. Using Methods from Machine Learning to Evaluate Be-
havioral Models of Choice under Risk and Ambiguity. Journal of Economic Behavior & Organi-
Philippon, T., 2019. On Fintech and Financial Inclusion. NBER Working Paper No. 26330. Na-
Quan, D.C. and Titman, S., 1999. Do Real Estate Prices and Stock Prices Move Together? An
Rambachan, A., Kleinberg, J., Ludwig, J., and Mullainathan, S., 2020. An Economic Perspective
Rambachan, A., Kleinberg, J., Mullainathan, S., and Ludwig, J., 2020. An Economic Approach
to Regulating Algorithms. NBER Working Paper No. 27111. National Bureau of Economic Re-
search.
Rasekhschaffe, K.C. and Jones, R.C., 2019. Machine Learning for Stock Selection. Financial Analysts Journal 75, 70–88.
Rasmussen, C., 1999. The Infinite Gaussian Mixture Model. Advances in Neural Information Processing Systems 12, 554–60.
Reichenbacher, M., Schuster, P., and Uhrig-Homburg, M., 2020. Expected Bond Liquidity. SSRN Working Paper. Social Science Research Network.
Renault, T., 2017. Intraday Online Investor Sentiment and Return Patterns in the U.S. Stock Market. Journal of Banking & Finance 84, 25–40.
Rish, I., 2001. An Empirical Study of the Naive Bayes Classifier. In IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence 3, 41–46.
Rossi, A.G., 2018. Predicting Stock Market Returns with Machine Learning. Working Paper.
Rossi, A.G. and Utkus, S.P., 2020. Who Benefits from Robo-Advising? Evidence from Machine
Learning. SSRN Working Paper No. 3552671. Social Science Research Network.
Routledge, B.R., 2019. Machine Learning and Asset Allocation. Financial Management 48, 1069–
94.
Settles, B., 2009. Active Learning Literature Survey. Technical Report. University of Wisconsin-
Madison.
Sigrist, F. and Hirnschall, C., 2019. Grabit: Gradient Tree-Boosted Tobit Models for Default Prediction. Journal of Banking & Finance 102, 177–92.
Sirignano, J., Sadhwani, A., and Giesecke, K., 2018. Deep Learning for Mortgage Risk. arXiv Working Paper No. 1607.02470.
Spiegeleer, J.D., Madan, D.B., Reyners, S., and Schoutens, W., 2018. Machine Learning for Quan-
titative Finance: Fast Derivative Pricing, Hedging and Fitting. Quantitative Finance 18, 1635–43.
Sutton, R.S. and Barto, A.G., 2018. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, USA.
Tang, V.W., 2018. Wisdom of Crowds: Cross-Sectional Variation in the Informativeness of Third-Party-Generated Product Information on Twitter. Journal of Accounting Research 56, 989–1034.
Tian, S., Yu, Y., and Guo, H., 2015. Variable Selection and Corporate Bankruptcy Forecasts. Journal of Banking & Finance 52, 89–100.
Vamossy, D.F., 2020. Investor Emotions and Earnings Announcements. SSRN Working Paper. Social Science Research Network.
Varian, H.R., 2014. Big Data: New Tricks for Econometrics. Journal of Economic Perspectives 28, 3–28.
Welch, I. and Goyal, A., 2008. A Comprehensive Look at The Empirical Performance of Equity Premium Prediction. The Review of Financial Studies 21, 1455–1508.
Xiang, G., Zheng, Z., Wen, M., Hong, J., Rose, C., and Liu, C., 2012. A Supervised Approach to
Predict Company Acquisition with Factual and Topic Features Using Profiles and News Articles
on TechCrunch. In Proceedings of the Sixth International AAAI Conference on Weblogs and Social
Media 4.
Yang, Q., Liu, Y., Chen, T., and Tong, Y., 2019. Federated Machine Learning: Concept and Applications. ACM Transactions on Intelligent Systems and Technology 10, 1–19.
Yao, J., Li, Y., and Tan, C.L., 2000. Option Price Forecasting Using Neural Networks. Omega 28,
455–66.
Yavas, A. and Yang, S., 1995. The Strategic Role of Listing Price in Marketing Real Estate: Theory and Evidence. Real Estate Economics 23, 347–68.
Zhang, T., Ramakrishnan, R., and Livny, M., 1996. BIRCH: An Efficient Data Clustering Method for Very Large Databases. ACM SIGMOD Record 25, 103–14.
Zhu, X.J., 2005. Semi-Supervised Learning Literature Survey. Technical Report. University of
Wisconsin-Madison.
Zou, H. and Hastie, T., 2005. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society: Series B 67, 301–20.
Appendix
Table A1. Overview of variables with definitions
This table gives an overview of the variables used in our analysis and provides their definitions. The target variable we predict is the list price. The predictor variables fall into four groups: physical attribute variables, macro location variables, granular location variables, and offer variables. Variables that are available for only a limited subset of the sample enter an additional specification.

Variable | Definition | Original German Variable Name

Target variable
List price | Price of the property in EUR as given in the listing | Preis

Physical attribute variables
House type | Type of house: detached, semi-detached, or terraced/townhouse | Haustyp: EFH, DHH/REH, RH
Size | Size of the property in m² | Wohnfläche
Rooms | Number of rooms of the property | Zimmer
Lot size | Lot size of the property in m² | Grundstücksfläche
Construction year | Construction year of the building | Baujahr

Macro location variables
County | The county where the property is located | Landkreis

Granular location variables
Horizontal geocoordinate | Precise latitude of the center of the city district implied by the property's zip code | Horizontale Geo-Koordinate
Vertical geocoordinate | Precise longitude of the center of the city district implied by the property's zip code | Vertikale Geo-Koordinate

Offer variables
Offer year | Year of the listing | Angebotsjahr
Online listing | Indicates whether the sale offer is listed on an online platform | Online-Angebot
Seller type | Seller type as stated in the listing: realtor, developer, or private owner | Verkäufer

Variables for additional specification (available for only a limited subset of the sample)
Patios | Number of the property's patios | Terrassen
Balconies | Number of the property's balconies | Balkone
Garages | Number of garages belonging to the property | Garagen
Parking lots | Number of parking lots belonging to the property | Kfz-Stellplätze
Bathrooms | Number of the property's bathrooms | Bäder
Renovation status | The property's status of renovation: necessary, partly renovated, or renovated | Zustand
Basement type | The property's type of basement: basement, livable, fully finished, or partially finished | Art des Kellers
Balcony type | The property's type of balcony: balcony or loggia | Art des Balkons
Leased lot | Indicates whether the property's lot is leased | Erbbaupacht
Wintergarden | Indicates whether the property has a wintergarden | Wintergarten
Rooftop | Indicates whether the property has a rooftop terrace | Dachterrasse
Solar | Indicates whether the property has solar panels on the roof | Solaranlage
Other usage | Indicates whether other usage (e.g., commercial) is possible for the property | Alternativnutzung
Hillside | Indicates whether the property is located on a hillside | Hanglage
Studio | Indicates whether the property's top floor is a studio | Dachstudio
Large kitchen | Indicates whether the property contains an extra-large kitchen | Wohnküche
Recreation room | Indicates whether there is a recreational room in the property | Hobbyraum
Sauna | Indicates whether there is a sauna in the property | Sauna
Gallery | Indicates whether the property contains a gallery | Galerie
Fireplace | Indicates whether there is a fireplace in the property | Kamin
Underfloor heat | Indicates whether underfloor heating is available in the property | Fußbodenheizung
Pool | Indicates whether there is a pool on the property | Schwimmbad
Hardwood floors | Indicates whether the property's rooms have hardwood floors | Parkett
Prefab | Indicates whether the building has been prefabricated | Fertighaus
Separate flat | Indicates whether a separate flat belongs to the property | Einliegerwohnung
Attic finished | Indicates whether the property's attic is finished | Ausgebautes Dachgeschoss
Garden | Indicates whether there is a garden on the property | Garten
Pond | Indicates whether there is a pond on the property | Teich
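To make the setup concrete, the following sketch shows how the variables in Table A1 could be assembled into a supervised learning pipeline that predicts list prices. It is a minimal illustration under stated assumptions, not our exact implementation: the file name listings.csv, the column names, the random forest learner, and all hyperparameters are placeholders chosen for exposition.

```python
# Minimal sketch of a list-price prediction pipeline based on the variables
# in Table A1. The file name "listings.csv", all column names, and the
# random forest hyperparameters are illustrative assumptions, not the
# paper's exact setup.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.read_csv("listings.csv")  # hypothetical export of the listings data

numerical = ["size", "rooms", "lot_size", "construction_year",
             "horizontal_geocoordinate", "vertical_geocoordinate",
             "offer_year"]
categorical = ["house_type", "county", "online_listing", "seller_type"]

# One-hot encode categorical predictors; numerical predictors pass through
# unchanged (tree-based learners need no feature scaling).
preprocess = ColumnTransformer(
    transformers=[("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",
)

model = Pipeline([
    ("prep", preprocess),
    ("forest", RandomForestRegressor(n_estimators=500, random_state=0)),
])

X = df[numerical + categorical]
y = df["list_price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model.fit(X_train, y_train)
print(f"Out-of-sample R^2: {model.score(X_test, y_test):.3f}")
```

One-hot encoding avoids imposing an artificial ordering on categorical variables such as house type or county; the additional-specification variables in the lower panel of Table A1 could be appended to the feature lists in the same way for the richer specification.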