Use Machine Learning To Forecast Future Earnings
Edward, XU Zhaoyu
Clair, CUI Xinyue
Ashlley, ZHOU Yue
Contents
I. Project Overview
II. Model Construction
III. Data and Experiment
IV. Results and Analysis
V. Conclusion
I. Project Overview
Project Overview
Overview
Objective
• We aim to select, adjust, and integrate a series of machine learning and deep learning models to comprehensively assess their feasibility and suitability for predicting company fundamentals (i.e., earnings).
Highlights
• Large Samples: selected from the top 3,000 US companies by market capitalization
• Multi-Class Prediction: 2 to 9 groups compared to peers
• Rolling-Window Validation: 40 overlapping sets
• Relative Earnings Change: year-over-year (YoY) or quarter-over-quarter (QoQ), classified relative to peers (sketched below):

$$E_t = \text{Classification}\left(\frac{\text{Net Income}_{t+1} - \text{Net Income}_t}{\text{Net Income}_t}\right)$$
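As a rough illustration of this labelling scheme (not the paper's actual code), the sketch below bins each company's net-income change into quantile classes relative to peers in the same period; the column names and quantile binning are assumptions.

```python
# Hypothetical sketch: label each (period, company) observation with a
# class 0..n_classes-1 based on its earnings change relative to peers.
import pandas as pd

def label_relative_change(df: pd.DataFrame, n_classes: int = 3) -> pd.Series:
    """Expects columns 'period', 'net_income', 'net_income_next'."""
    # Relative earnings change, per the formula above; dividing by the
    # absolute value is an assumption to handle negative net income.
    change = (df["net_income_next"] - df["net_income"]) / df["net_income"].abs()
    # Within each period, cut the changes into n_classes quantile groups.
    return change.groupby(df["period"]).transform(
        lambda s: pd.qcut(s, q=n_classes, labels=False, duplicates="drop")
    )

# Tiny usage example on made-up data:
df = pd.DataFrame({
    "period": ["2018"] * 4 + ["2019"] * 4,
    "net_income": [10.0, 20.0, 30.0, 40.0, 10.0, 20.0, 30.0, 40.0],
    "net_income_next": [12.0, 18.0, 45.0, 40.0, 5.0, 30.0, 33.0, 60.0],
})
df["class"] = label_relative_change(df, n_classes=2)
```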
Project Overview
Techniques
Main Classifier
• LightGBM
• Gradient Boosting Decision Tree (GBDT)
Hyperparameter Optimization
• Hyperopt
• Tree-structured Parzen Estimator (TPE) (tuning sketch below)
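A minimal sketch of the tuning setup named above: Hyperopt's TPE searching LightGBM hyperparameters. The synthetic data, search space, and cross-validation choices are illustrative assumptions, not the authors' configuration.

```python
import numpy as np
import lightgbm as lgb
from hyperopt import STATUS_OK, fmin, hp, tpe
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the earnings feature matrix and class labels.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, n_classes=3, random_state=0)

# Illustrative search space over a few key LightGBM parameters.
space = {
    "num_leaves": hp.quniform("num_leaves", 15, 255, 1),
    "learning_rate": hp.loguniform("learning_rate", np.log(0.01), np.log(0.3)),
    "min_child_samples": hp.quniform("min_child_samples", 5, 100, 1),
}

def objective(params):
    model = lgb.LGBMClassifier(
        num_leaves=int(params["num_leaves"]),
        learning_rate=params["learning_rate"],
        min_child_samples=int(params["min_child_samples"]),
        n_estimators=200,
    )
    # TPE minimizes the objective, so return negative mean CV accuracy.
    score = cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()
    return {"loss": -score, "status": STATUS_OK}

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50)
print(best)
```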
Project Overview
Variables
• Market Variables: abnormal returns of each company against the S&P 500 benchmark
Project Overview
Literature Review
• The most-cited papers on machine learning models in Accounting & Finance:
• Support vector machines (SVMs) (Kim, 2003; Pai and Lin, 2005)
• Neural networks (Campbell, 1987; Chang, Liu, Lin, Fan, & Ng, 2009; Chen, Leung, & Daouk, 2003)
• Autoregressive conditional heteroskedasticity (ARCH/GARCH) (Engle, 1982; Bollerslev, 1986)
• Gradient Boosting Decision Tree (GBDT) (Jones, Johnstone, & Wilson, 2015)
II. Model Construction
Model Construction
Dimension Reduction – Principal Component Analysis
Model Construction
Dimension Reduction – Principal Component Analysis
Although this idea is simple and plain, one has to realize that a "subspace" by definition contains only part of the original information; the rest is correspondingly lost in the reduced dimensions.
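A minimal sketch of this step, assuming scikit-learn's PCA on a standardized feature matrix; the 95% variance threshold and the random data are illustrative, not the paper's actual choices.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))             # stand-in for the feature matrix

X_std = StandardScaler().fit_transform(X)  # PCA is scale-sensitive
pca = PCA(n_components=0.95)               # keep 95% of the variance
X_reduced = pca.fit_transform(X_std)

# The information lost in the discarded dimensions:
print("dimensions kept:", pca.n_components_)
print("variance lost:", 1 - pca.explained_variance_ratio_.sum())
```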
Model Construction
Dimension Reduction – Principal Component Analysis
Trade-off between the number of dimensions and the amount of information retained.
Model Construction
Gradient Boosting Decision Tree – LightGBM
Model Construction
Gradient Boosting Decision Tree – LightGBM
Intuitively, we should expect that as such divisions develop (i.e., as the tree grows deeper), the 'purity' of the samples contained in each node increases gradually.

Ultimately, in the perfect case, only the truly outperforming stocks are classified into the final category, "Outperform (2)", and the purity of that class reaches 100% (i.e., 100% correct). A small illustration of purity follows.

[Figure: decision tree splitting on stock return (e.g., < 50%, > 5%), with node purity increasing toward the "Outperform" leaf]
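As a small illustration (not from the paper) of the 'purity' notion, Gini impurity falls to zero when a node contains only one class:

```python
import numpy as np

def gini_impurity(labels: np.ndarray) -> float:
    """Gini impurity of a node's labels; 0.0 means perfectly pure."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

mixed = np.array([0, 0, 1, 1, 2, 2])   # node early in the tree
pure = np.array([2, 2, 2, 2])          # ideal "Outperform (2)" leaf
print(gini_impurity(mixed))  # 0.667 -> impure
print(gini_impurity(pure))   # 0.0   -> 100% pure
```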
Model Construction
Gradient Boosting Decision Tree – LightGBM
What is LightGBM?
LightGBM grows trees vertically (leaf-wise), while other algorithms grow trees horizontally (level-wise). LightGBM chooses only the leaf with the maximum delta loss (another measure of 'purity') to grow. When growing the same leaf, a leaf-wise algorithm can reduce more loss (increase purity more) than a level-wise algorithm.
LightGBM is prefixed 'Light' because of the high speed this algorithm delivers. As a result, LightGBM can handle large datasets while using less memory. A parameter sketch follows.
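A minimal sketch of how leaf-wise growth is controlled in LightGBM; the parameter values and data are illustrative, not the paper's settings.

```python
import lightgbm as lgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=15, random_state=0)

clf = lgb.LGBMClassifier(
    num_leaves=31,    # leaf-wise: complexity bounded by leaf count, not depth
    max_depth=-1,     # unlimited depth (default); growth follows max delta loss
    n_estimators=100,
    learning_rate=0.1,
)
clf.fit(X, y)
```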
III. Data and Experiment
Data and Experiment
Sample Selection
Data and Experiment
Preprocessing
• YoY Prediction: growth of annual net income
• QoQ Prediction: growth of quarterly net income
(a preprocessing sketch follows)
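As a rough illustration (the column names and data layout are assumptions, not the paper's schema), the growth variables could be computed per company with pandas:

```python
import pandas as pd

# Made-up quarterly panel: two companies, eight quarters each.
df = pd.DataFrame({
    "ticker": ["AAA"] * 8 + ["BBB"] * 8,
    "quarter": list(pd.period_range("2016Q1", periods=8, freq="Q")) * 2,
    "net_income": [10, 12, 9, 15, 11, 14, 10, 18,
                   100, 90, 110, 120, 95, 105, 115, 130],
})

df = df.sort_values(["ticker", "quarter"])
grp = df.groupby("ticker")["net_income"]
df["qoq_growth"] = grp.pct_change(1)  # vs. prior quarter
df["yoy_growth"] = grp.pct_change(4)  # vs. same quarter one year earlier
```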
IV. Results and Analysis
Results and Analysis
Benchmark Comparison – Logistic Regression
• Our model outperforms the accuracy of both the logistic regression models proposed by Ou and Penman (1989) and by Hunt, Myers, & Myers (2019); a baseline sketch follows.
• All models here address the sign-of-earnings-change problem (increase / decrease).
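For context, a minimal sketch of a logistic-regression baseline on a synthetic sign-of-change dataset, assuming scikit-learn; this is not Ou and Penman's actual specification.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# y: 1 = earnings increase, 0 = decrease (synthetic stand-in data).
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("baseline accuracy:", accuracy_score(y_te, baseline.predict(X_te)))
```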
Results and Analysis
Benchmark Comparison – Consensus
• Our model improves on the accuracy of the I/B/E/S analysts' consensus prediction by evaluating the converging cases of the LightGBM models and the consensus prediction (i.e., when both assign the same class); a sketch of this evaluation follows.
• The 3-class quarter-over-quarter problem achieved the highest accuracy, 81.3%; comparatively, the 9-class YoY problem improved the most, by 29%.
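A hypothetical sketch of the converging-case evaluation: accuracy is computed only on observations where the LightGBM class and the consensus class agree. The arrays are placeholders for the actual model and consensus outputs.

```python
import numpy as np

model_pred = np.array([0, 1, 2, 2, 1, 0, 2])  # LightGBM class predictions
consensus = np.array([0, 1, 1, 2, 1, 2, 2])   # analyst consensus, same classes
actual = np.array([0, 1, 2, 2, 0, 0, 2])      # realized earnings classes

agree = model_pred == consensus                # the converging cases
converge_acc = (model_pred[agree] == actual[agree]).mean()
print(f"accuracy on converging cases: {converge_acc:.1%}")
```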
Results and Analysis
Benchmark Comparison – Consensus
Analysts' information advantages:¹
• News
• Language & sentiment data
• Information released after the prior fiscal year/quarter

• Our results have a 0.44 correlation with the consensus prediction.

[Figure: quarterly correlation, 2008-03-31 through 2017-12-31, for Consensus (Mean), Consensus (Median), LightGBM, and Random Walk; y-axis: correlation, 0 to 0.7]

¹ Fried & Givoly (1982); Das, Levine, & Sivaramakrishnan (1998)
V. Conclusion
Conclusion
Summary and Suggestions
Summary
• LightGBM is an innovative machine learning technique that has great potential in accounting and finance research.
• Our paper demonstrates its ability to predict future earnings and generate relatively accurate results compared to many other statistical models.
Suggestions
• Unfortunately, due to the constraints of time and data access, our paper failed to outperform the consensus.
• Future research may implement Natural Language Processing (NLP) to include non-financial data in the analysis and improve the results.
Thanks for Listening!
Edward, XU Zhaoyu
[email protected]