
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2021.3067691, IEEE Access


Equity research report-driven investment strategy in Korea using binary classification on stock price direction
POONGJIN CHO1, JI HWAN PARK2, AND JAE WOOK SONG2
1 KRiBLE, FnGuide, Seoul 07805, Republic of Korea
2 Department of Industrial Engineering, Hanyang University, Seoul 04763, Republic of Korea
Corresponding author: Jae Wook Song (e-mail: [email protected])

ABSTRACT This research proposes and examines an investment strategy that combines natural language processing of the equity research reports published in the Korean financial market with machine learning algorithms for binary classification. First, we extract the part-of-speech from each report using KoNLPy and Mecab. Then, we define 33 features as the input variables and perform a binary classification on the price direction of the stocks recommended in the reports using various machine learning algorithms. We investigate the model performance in detail by dividing the entire period into three sub-periods: pre-COVID-19 for the sideways market, COVID-19 for the crashing market, and post-COVID-19 for the extremely bullish market. We confirm that the random forest is the best classifier for all periods, so we utilize its positively predicted stocks in the test set as the investment universe for monthly re-balancing and buy-and-hold investment. The proposed strategy shows a significantly higher return on investment than the benchmarks during the pre-COVID-19 and COVID-19 periods, whereas its return is comparable to the benchmarks during the post-COVID-19 period.

INDEX TERMS Finance, Natural language processing, stock markets, Equity research reports, Binary
classification, Investment strategy

I. INTRODUCTION
Financial companies periodically issue research reports for investors. The report contents include analyses of companies, financial institutions, diplomatic issues between countries, and politics. Among them, this study focuses on the equity research report, which recommends a specific stock at a time. Usually, analysts write their perspective on a stock expected to show high returns in the future through various quantitative and qualitative analyses. However, the realized future profit varies across reports. One reason for such a result is that the person who writes the report may not be equipped with enough analytical skills, which leads to low-quality reports. In this study, we assume that the composition of the equity research reports, quantified through natural language processing (NLP), can distinguish the reliability of the stock recommendations.

In the 2010s, the volume of digital online content exploded, including market analysis reports, news articles, journal texts, online blogs, and social media. Accordingly, research on analyzing public sentiment, especially opinion mining in social media, has become essential. As market prediction using NLP algorithms has been studied in the financial field, a research field called natural language-based financial forecasting has gradually been established [1]–[4]. In particular, the stock market has received great attention in academia due to its sensitivity to market participants’ sentiment. That is, investors’ sentiment can change the overall trend of individual stocks and even the market. Many previous studies have analyzed investors’ opinions and market sentiment from social media posts regarding financial markets. Some studies extract and analyze the mood of text from social media such as Twitter [5]–[9], news [10], and bulletin boards [11], [12] and utilize them for market prediction. The main objective of such research is a binary classification of positive or negative moods using the text to discover its correlation with the market or a company’s stock price movement. The methods that have been used for analysis include the Naive Bayes approach [12]–[16], support vector machine [17]–[20], and decision tree [21]. In addition to these machine learning algorithms, many studies also have utilized


This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/


various deep learning algorithms such as the artificial neural network [22]–[24] and the recurrent neural network [25]–[27]. Depending on how the features of the text are extracted, they may not reflect the movement of the stock price well. Hence, the extraction of features representing the characteristics of natural language is an important topic in NLP. The feature engineering methods incorporate linguistic features [28], [29], keyword extraction [30], data reduction with a generative probabilistic model [31], and word embedding with n-grams [28], [32], TF-IDF [33], ensemble models [34], and deep learning [35].

Furthermore, there have been many studies on deriving investment strategies using NLP. Several studies have analyzed market sentiment information using a machine learning algorithm to construct a portfolio. Such studies design a neural network with an ensemble of evolving clustering and LSTM [36], propose a new follow-the-loser portfolio strategy from stock micro-blog posts using a semi-supervised learning method [37], or establish a trading strategy from new sentiment data using learning-to-rank algorithms [38]. Also, recently, a portfolio investment strategy that considers the shareholders’ confidence index by combining the existing random forest and sentiment analysis [39] and an investment strategy that encodes external information from financial news using reinforcement learning [40] have been proposed.

However, there have been limited efforts to establish an investment strategy based on the NLP of equity research reports published in the Korean financial market. Therefore, in this paper, we focus on analyzing the reports through NLP and investigate whether the induced information can be utilized for an investment strategy. First, the NLP elements are derived by quantifying the structure of the report in the form of part-of-speech (POS). Then, using the NLP elements as input features, we construct a binary classification model that predicts whether the stocks recommended in the report produce a positive or negative return. Several machine learning algorithms are applied, and the model with the best classification performance is selected for the experiment. Finally, we propose an investment strategy that buys the stocks predicted to yield a positive future return by the selected classification algorithm. To show the superiority of the proposed investment strategy, we compare its investment returns with two benchmarks: the strategy of investing in all the stocks recommended by the reports and the market index. Besides, to investigate whether the proposed investment strategy shows consistent performance in various market conditions, the investment returns in different periods are analyzed separately.

II. FRAMEWORK OF INVESTMENT STRATEGY
A. EQUITY RESEARCH REPORT
This study utilizes 34,780 equity research reports on stocks traded in Korean financial markets published from 2019-01-01 to 2020-06-12. Note that securities firm analysts are in charge of writing the reports and provide them in Portable Document Format (PDF) on the firm’s website. Each report recommends one stock at a time, providing information on the target price, the current status and direction of the underlying company’s business, and the degree of recommendation on the stock. Note that the number of reports varies across months, as summarized in Table 1.

A total of 1,118 stocks were recommended within the 34,780 research reports, and the reports are concentrated on a limited number of stocks, as presented in Figure 1. Specifically, the top 400 stocks account for 90% of all reports. One reason for such concentration could be investors’ preference for stocks with high market capitalization or thematic investing, which can promote readability and click rates on reports in favor of analysts. Due to the low diversity among the reports, many investors doubt the reports’ utility in predicting the future returns on the underlying stock. However, it is also true that most of the reports are carefully written through sufficient research and consist of the analyst’s sentimental but reasonable opinions on the company. In this context, we assume that valuable reports with rich information are written based on clear facts, so their composition and even sentence structure could differ from those of unhelpful reports with limited value. First, we define a report as valuable if it successfully recommends a stock with a positive return in the near future, and as unhelpful if the recommended stock yields a negative return in the near future. Then, we conduct a binary classification based on NLP and various machine learning algorithms to distinguish the composition of valuable reports.

B. FEATURE ENGINEERING WITH NLP
We utilize NLP to define the features for binary classification. First, the contents of each report written in Korean are divided into morpheme units. In English, meaning is decomposed based on spacing, whereas Korean text can be divided into morphemes containing two or more meanings without spacing. To analyze the Korean language, we employ KoNLPy [41], a Python package for NLP of the Korean language, and Mecab [42], a POS tagger that tags each morpheme with one of 43 detailed POS. However, 43 POS divide the sentence in too much detail and produce many features, which can cause overfitting. Therefore, in this study, the POS are integrated, and the 10 most used NLP elements are selected as summarized in Table 2.

Based on the ten selected POS, we utilize each POS frequency as a feature for the binary classification. Then, we create eight additional features that can represent the characteristics of an equity research report: number of morphemes (subwords), average number of morphemes per sentence (mean_subwords_per_sentence), standard deviation of morphemes per sentence (std_subwords_per_sentence), number of sentences ending with da (da), number of sentences (sentence), number of paragraphs (paragraph), number of pages (page), and number of pages with words (page_with_word). Note that we use optical character recognition (OCR) to count the number of pages with words since there exist pages with only tables or pictures. In Korean, a perfect sentence

Months Jan-2019 Feb-2019 Mar-2019 Apr-2019 May-2019 Jun-2019 Jul-2019 Aug-2019 Sep-2019
Number of reports 600 470 350 607 685 303 602 556 280
Months Oct-2019 Nov-2019 Dec-2019 Jan-2020 Feb-2020 Mar-2020 Apr-2020 May-2020 Jun-2020
Number of reports 644 702 179 468 492 364 628 593 173

TABLE 1: Number of equity research reports per month

FIGURE 1: Distribution of recommendation frequencies on each stock

Selected POS | Number of integrated POS | List of integrated POS
Noun | 6 | Common, Proper, Bound, Unit, Numerals, Pronoun
Adjective | 1 | Adjective
Verb | 1 | Verb
Adverb | 2 | Common, Connective
Postposition | 9 | Subjective, Auxiliary, Connective, Complement case, Adnominal case, Objective case, Adverbial case, Vocative case, Citing case
Determiner | 1 | Determiner
Ending | 4 | Sentence-closing, Connective, Nominal, Adnominal
Number | 1 | Number
English | 1 | English
Others | 1 | Auxiliary Predicate element
Not-in-use | 16 | Exclamation, Uninflected prefix, Noun-driven suffix, Verb-driven suffix, Adjective-driven suffix, Root, Period & Marks, Ellipsis, Opening parenthesis, Closing parenthesis, Delimiter, Dash, Symbol, Chinese character, Positive copula word, Negative copula word

TABLE 2: Selected 10 POS from 43 POS
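
The integration in Table 2 can be illustrated with a minimal sketch that counts integrated POS categories from a list of (morpheme, tag) pairs, which is the shape a POS tagger typically returns. The tag codes in `TAG_TO_GROUP`, the partial coverage of the mapping, and the choice to count every morpheme (including not-in-use tags) in `subwords` are assumptions made for illustration, not details given in the paper.

```python
from collections import Counter

# Hypothetical, partial mapping from mecab-ko style tag codes to the integrated
# categories of Table 2. The codes are assumptions and should be checked
# against the tag set of the tagger that is actually used.
TAG_TO_GROUP = {
    "NNG": "noun", "NNP": "noun", "NNB": "noun", "NR": "noun", "NP": "noun",
    "VA": "adjective",
    "VV": "verb",
    "MAG": "adverb", "MAJ": "adverb",
    "JKS": "postposition", "JKO": "postposition", "JX": "postposition",
    "MM": "determiner",
    "EF": "ending", "EC": "ending", "ETN": "ending", "ETM": "ending",
    "SN": "number",
    "SL": "english",
    "VX": "others",
}

GROUPS = ["noun", "adjective", "verb", "adverb", "postposition",
          "determiner", "ending", "number", "english", "others"]


def pos_frequencies(tagged):
    """Count morphemes per integrated category; unmapped tags are ignored."""
    counts = Counter(TAG_TO_GROUP[tag] for _, tag in tagged if tag in TAG_TO_GROUP)
    return {group: counts.get(group, 0) for group in GROUPS}


def pos_ratios(tagged):
    """Divide each category count by the total number of morphemes (subwords)."""
    subwords = len(tagged)  # assumption: not-in-use tags still count as morphemes
    freq = pos_frequencies(tagged)
    return {f"{g}_ratio": (freq[g] / subwords if subwords else 0.0) for g in GROUPS}


# Toy input in the (morpheme, tag) shape; the tags here are illustrative.
sample = [("삼성전자", "NNP"), ("는", "JX"), ("성장", "NNG"),
          ("하", "XSV"), ("ㄴ다", "EF"), (".", "SF")]
frequencies = pos_frequencies(sample)
ratios = pos_ratios(sample)
```

In this toy input, two of the six morphemes fall into the noun category, so `noun_ratio` is 2/6.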

ends with da; otherwise, the sentence has omitted elements. Note that a sentence that does not end with da only conveys some financial terms, providing limited implication for the investment. Therefore, we assume that da can be a feature representing an equity research report’s characteristic. In Figure 2a, the distributions of the ten selected POS and additional features are investigated, showing that all features are skewed. Therefore, we apply a log-transformation to all variables, as illustrated in Figure 2b. As a result, we obtain variables whose distributions are close to the normal distribution, which is preferred in machine learning for binary classification.

In addition, we include 15 different ratios based on the selected POS and additional features as follows: noun ratio (noun/subwords), adjective ratio (adjective/subwords), verb ratio (verb/subwords), adverb ratio (adverb/subwords), postposition ratio (postposition/subwords), determiner ratio (determiner/subwords), ending ratio (ending/subwords), number ratio (number/subwords), English ratio (english/subwords), others ratio (others/subwords), ratio of da in sentences (da/sentence), changes in number of morphemes (std_subwords_per_sentence/mean_subwords_per_sentence), morphemes per page (subwords/page), morphemes per sentence (subwords/sentence), and ratio of pages with words (page_with_word/page). Note that we also apply the log-transformation to the ratios. Finally, we apply min-max scaling to all 33 features to finalize the data pre-processing for the binary classification.

C. BINARY CLASSIFICATION & INVESTMENT STRATEGY
We propose a binary classification based on the pre-processed NLP-driven features, which predicts whether the stock suggested in the equity report will show a positive or negative

FIGURE 2: Distributions of selected POS and additional features based on NLP: (a) original features, (b) log-transformed features
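
The pre-processing behind Figure 2, a log-transformation followed by min-max scaling, might look like the sketch below. The use of log(1 + x) to keep zero counts finite, the helper names, and the toy values are assumptions, since the paper does not state how zeros are handled.

```python
import math


def log_transform(values):
    """Apply log(1 + x); the +1 is an assumption so that zero counts stay finite."""
    return [math.log1p(v) for v in values]


def min_max_scale(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


# Toy, right-skewed feature values (for example, noun counts per report).
noun_counts = [120, 450, 80, 2300, 610]
scaled = min_max_scale(log_transform(noun_counts))
```

In a train/test setting, the minimum and maximum would typically be taken from the training set only and then reused for the test set.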

return in the future. Specifically, we utilize five well-known models. First, we employ the k-Nearest Neighbors (k-NN) classifier. The k-NN algorithm hinges on the assumption that similar data points are located close to each other [43]. Therefore, it calculates the distance between the test data and the input, which can be obtained as follows:

$$ d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2} \tag{1} $$

where p and q refer to the data points that have coordinates of $(p_1, p_2, \ldots, p_n)$ and $(q_1, q_2, \ldots, q_n)$ in n dimensions, respectively.

Secondly, we utilize the logistic regression using the sigmoid function as follows [44]:

$$ cost(W) = \frac{1}{m} \sum c(H(x), y) \tag{2} $$

$$ c(H(x), y) = \begin{cases} -\log(H(x)), & y = 1 \\ -\log(1 - H(x)), & y = 0 \end{cases} \tag{3} $$

$$ H(x) = \frac{1}{1 + e^{-(Wx + b)}} \tag{4} $$

where H(x), W, and b correspond to the sigmoid function, weight, and bias, respectively. For y = 1, the value of the cost function decreases as H(x) approaches 1 and increases as H(x) approaches 0; the reverse holds for y = 0.

Thirdly, we utilize the decision tree, which analyzes and represents patterns between data as a combination of possible rules and is built top-down from the root node [45]. To build a decision tree, we use the entropy of an area to which data points belong, which can be calculated as follows:

$$ Entropy = -\sum_{k=1}^{m} p_k \log_2(p_k) \tag{5} $$

where m is the number of categories and $p_k$ refers to the percentage of the data points in the area belonging to category k. The tree is trained to increase the homogeneity of each area and reduce the impurity or uncertainty as much as possible; this reduction is called information gain.

Fourthly, we utilize the random forest. Since the decision tree has a limitation of overfitting, we employ an ensemble model that generates multiple decision trees and votes on each tree’s classification results. It can be obtained through bagging, which builds each decision tree with data sampled with replacement from the entire training data [46].

Lastly, we utilize gradient boosting, an ensemble model that produces a robust classifier by combining weak classifiers, typically decision trees [47]. It uses gradient descent: the loss function is differentiated with respect to the parameter to obtain the slope, and the parameter is adjusted so that the loss decreases. The loss function and its gradient are expressed as follows.

$$ L(y_i, f(x_i)) = \frac{1}{2} (y_i - f(x_i))^2 \tag{6} $$

$$ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} = \frac{\partial \left[ \frac{1}{2} (y_i - f(x_i))^2 \right]}{\partial f(x_i)} = f(x_i) - y_i \tag{7} $$

where L refers to the loss function. The negative gradient is therefore the residual $y_i - f(x_i)$.

For the experiment, we divide the data into the train (70%) and test (30%) sets. Note that we ensure the partitioned data carry equivalent distributional characteristics in terms of the number of equity research reports per month as well as the number of reports per stock. Although many prediction problems in financial time-series split the in-sample and out-of-sample sets by time, our model can utilize random sampling since its explanatory variables are not dependent on time. For 50 different random seeds, we compare the classification performances of the five models for different times after the report’s release. Based on the model with the best performance, we simulate the backtesting with monthly re-balancing and

simple buy-and-hold investment strategies for different investment horizons using the positively predicted stocks in the test set. Then, we compare the investment performance with other benchmarks. A step-by-step scenario of the proposed investment strategy is illustrated in Figure 3.

III. EMPIRICAL RESULTS AND DISCUSSIONS
A. BINARY CLASSIFICATION PERFORMANCE
As previously stated, we utilize five machine learning models to predict the price direction of the stocks recommended in the equity research reports, whose NLP elements are used as the features. Table 3 summarizes the hyper-parameters for each model. We compare the binary classification performance of each model for different times in the future in terms of prediction accuracy and area under the receiver operating characteristic curve (AUC). Note that this research’s main objective is to examine whether the equity research report’s NLP elements can be used to construct an investment strategy. Therefore, based on these two simple measures, we select the model with the highest classification performance, analyze its classification results in detail using the precision, recall, and F1-score, and utilize it to establish an investment strategy.

The models predict the direction of the stock price at 30, 60, 90, 120, 150, and 180 trading days after the report’s release. We call this the prediction time. We consider the equity research reports published from 2019-01-01 to 2020-06-12, during which the Korean financial market experienced a sideways period with low volatility (2019-01-01 to 2020-01-20), a collapsing period due to the outbreak of COVID-19 (2020-01-21 to 2020-03-29), and a soaring period with an extremely bullish market (2020-03-30 to 2020-06-30). Specifically, we divide the periods based on the highest and lowest points of KOSPI200, the representative financial market index of Korea, within the entire period. In this regard, the classification performance can be evaluated for different market conditions.

First, the average classification performances of each model over 50 different random seeds are summarized in Table 4. According to the results, the accuracy and AUC tend to increase as the prediction time increases for all models. It implies that a higher return can be expected when an investment strategy is established based on the prediction results for a long investment horizon. Finally, we choose the random forest as the primary classification model since it shows the highest accuracy and AUC for all prediction times.

The detailed classification performance of the random forest is summarized in Table 5. Compared to the accuracy and AUC, the F1-score is low and invariant for different prediction times, which reduces the utility of the prediction model. Specifically, the low F1-score is caused by the relatively low recall. Note that the precision shares the same pattern as the accuracy and AUC. However, such a result does not affect the random forest’s utility since the proposed investment strategy only utilizes the positively predicted stocks, whose return in the future is expected to be positive. High precision with low recall in a binary classification indicates relatively fewer false positives than false negatives. In this context, the stocks predicted to be positive are likely to move in the actual positive direction, although the model cannot detect all stocks with a positive direction. Therefore, we can infer that an investment strategy based on the stocks predicted to have positive returns can produce a high profit.

We further investigate how the random forest classification varies in different market conditions, as summarized in Table 6. For the reports published during the sideways period, the accuracy increases as the investment period increases, but the AUC remains around 0.5. Hence, the corresponding investment strategy is expected to produce only a small advantage over investing in all reports. During the collapsing period, both AUC and F1-score increase as the prediction time increases. Hence, the corresponding investment strategy is expected to produce a high profit over investing in all reports. During the soaring period, we observe a high accuracy and F1-score but relatively low AUC values. The high accuracy and F1-score are realized because the target variable is biased toward the positive direction during the recovery from the COVID-19 pandemic shock. Therefore, the corresponding investment strategy is expected to show no significantly different profit compared to investing in all reports.

B. FEATURE IMPORTANCE
Prior to applying the binary classification to the investment strategy, we investigate the feature importance based on the random forest results. The average importance of each NLP element in the random forest is summarized in Figure 4. For the total period in Figure 4a, the most significant feature is the English ratio. Note that a low feature importance indicates no significant influence on predicting the direction of the stock price. Specifically, based on the median of the English ratio, the average investment return at the 180-day prediction time for all reports with an English ratio lower than the median is -2.3%, whereas that for all reports with an English ratio higher than the median is 7.8%, which yields a difference of 10.1% in return. It implies that a report with a relatively high English ratio can be expected to show a positive return compared to a report with a low English ratio. Likewise, the noun ratio, the second most crucial variable, shows a 7.2% difference in investment return based on the median. In this context, we discover the NLP-elements that positively affect the investment return, which are English ratio, subwords per page, page word, and subwords, among the top 15 features showing high importance. Otherwise, for most NLP-elements, the lower the value, the higher the investment return. Interestingly, most of the ratios of NLP-elements show higher feature importance than the selected POS and additional features in Figure 4.

Furthermore, we examine the feature importance of NLP elements for different market conditions in Figures 4b, 4c, and 4d. Analogous to the total period, the ratios of NLP-elements show high feature importance in all periods. Therefore, we can conclude that the ratios of NLP-elements play a more important role than basic NLP elements regardless of market

FIGURE 3: Proposed investment framework using NLP-driven features from the equity research report
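
The last step of the framework in Figure 3, an equally weighted long position that is re-balanced every month, could be sketched as follows. The function names, the toy returns, and the choice to hold cash in a month with no positively predicted stock are assumptions made for illustration, not the authors' backtesting code.

```python
def equal_weight_return(stock_returns):
    """One-month return of an equally weighted long-only portfolio."""
    if not stock_returns:
        return 0.0  # assumption: stay in cash when no stock is predicted positive
    return sum(stock_returns) / len(stock_returns)


def cumulative_returns(monthly_universe_returns):
    """Compound the monthly portfolio returns into a cumulative return path."""
    wealth = 1.0
    path = []
    for stock_returns in monthly_universe_returns:
        wealth *= 1.0 + equal_weight_return(stock_returns)
        path.append(wealth - 1.0)
    return path


# Toy data: one list of realized one-month returns per re-balancing date,
# covering only the stocks the classifier predicted as positive.
months = [[0.02, 0.04], [-0.01, 0.03, 0.01], []]
path = cumulative_returns(months)
```

Selling everything and buying the new universe each month is what makes the monthly returns compound multiplicatively in `cumulative_returns`.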

Machine learning | Hyper-parameters
k-Nearest Neighbors | Metric(Manhattan), Number-of-neighbors(19), Weights(distance)
Logistic Regression | C(100), Penalty(L2), Solver(lbfgs)
Decision Tree | Min-impurity-decrease(0.2), Max-depth(320), Max-features(Sqrt), Min-samples-leaf(3)
Random Forest | Number-of-estimators(1200), Criterion(Gini), Max-depth(460), Max-features(Sqrt), Min-samples-leaf(2)
Gradient Boosting | Number-of-estimators(1000), Learning-rate(0.05), Sub-sample(0.75)

TABLE 3: Hyper-parameters of each machine learning algorithm for binary classification
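
The settings in Table 3 can be written down as keyword arguments. The sketch below assumes the models were built with scikit-learn, which the paper does not state; the dictionary keys are scikit-learn class names and the argument names follow that library's constructors, while the values are copied from Table 3.

```python
# Keyword arguments mirroring Table 3. Mapping the table labels to scikit-learn
# parameter names is an assumption about the library used by the authors.
HYPER_PARAMETERS = {
    "KNeighborsClassifier": {
        "metric": "manhattan", "n_neighbors": 19, "weights": "distance",
    },
    "LogisticRegression": {
        "C": 100, "penalty": "l2", "solver": "lbfgs",
    },
    "DecisionTreeClassifier": {
        "min_impurity_decrease": 0.2, "max_depth": 320,
        "max_features": "sqrt", "min_samples_leaf": 3,
    },
    "RandomForestClassifier": {
        "n_estimators": 1200, "criterion": "gini", "max_depth": 460,
        "max_features": "sqrt", "min_samples_leaf": 2,
    },
    "GradientBoostingClassifier": {
        "n_estimators": 1000, "learning_rate": 0.05, "subsample": 0.75,
    },
}
```

With scikit-learn installed, a model could then be created with, for example, `RandomForestClassifier(**HYPER_PARAMETERS["RandomForestClassifier"], random_state=seed)`, where `seed` is one of the 50 random seeds.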

Machine learning        Prediction time (trading days)
                        30      60      90      120     150     180
(a) Accuracy
k-Nearest Neighbors     0.5504  0.5581  0.5706  0.5959  0.6140  0.6252
Logistic Regression     0.5665  0.5800  0.6080  0.6385  0.6535  0.6506
Decision Tree           0.5279  0.5457  0.5416  0.5507  0.5593  0.5684
Random Forest           0.5750  0.5943  0.6196  0.6460  0.6555  0.6609
Gradient Boosting       0.5706  0.5880  0.6147  0.6410  0.6490  0.6532
(b) AUC
k-Nearest Neighbors     0.5492  0.5553  0.5633  0.5735  0.5860  0.6030
Logistic Regression     0.5619  0.5692  0.5840  0.5941  0.6021  0.6115
Decision Tree           0.5338  0.5337  0.5364  0.5421  0.5462  0.5622
Random Forest           0.5760  0.5913  0.5972  0.6022  0.6097  0.6256
Gradient Boosting       0.5656  0.5752  0.5900  0.5968  0.5974  0.6136

TABLE 4: Average classification performances of each machine learning algorithm for different prediction times
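
For reference, the two measures reported in Table 4 can be computed as in the sketch below. The AUC is obtained from its pairwise definition, the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The toy labels, scores, and the 0.5 decision threshold are illustrative assumptions.

```python
def accuracy(y_true, y_pred):
    """Share of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy labels (1 = positive return) and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.5, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

toy_accuracy = accuracy(y_true, y_pred)
toy_auc = auc(y_true, scores)
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch; a rank-based formula gives the same value more efficiently on large test sets.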

Performance measures Prediction time (Trading days)

30 60 90 120 150 180
F1-Score 0.5065 0.5111 0.4818 0.4507 0.4667 0.5038
Precision 0.5922 0.6055 0.6242 0.6403 0.6233 0.6501
Recall 0.4424 0.4421 0.3923 0.3477 0.3730 0.4112

TABLE 5: Average F1-score, precision, and recall of random forest for different prediction times
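
The three measures in Table 5 are linked by the harmonic-mean definition of the F1-score. The sketch below computes them from confusion-matrix counts and then plugs in the 30-day precision and recall from Table 5. Because the table reports averages over 50 seeds, the F1-score of the averaged precision and recall only needs to match the reported value approximately; the counts passed to `precision_recall_f1` are made-up toy numbers.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1-score from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy counts: 6 true positives, 4 false positives, 9 false negatives.
toy_precision, toy_recall, toy_f1 = precision_recall_f1(6, 4, 9)

# Precision and recall at the 30-day prediction time, taken from Table 5.
f1_30 = f1_from(0.5922, 0.4424)
```

With precision near 0.59 and recall near 0.44, the harmonic mean lands close to the 0.5065 reported for 30 trading days, which illustrates how the low recall pulls the F1-score down.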


Performance measures Prediction time (Trading days)

30 60 90 120 150 180
(a) Sideways period
Accuracy 0.5330 0.5603 0.6026 0.6493 0.6696 0.6680
AUC 0.5178 0.5271 0.5225 0.4146 0.5181 0.5212
F1-score 0.3771 0.3500 0.2597 0.1693 0.1805 0.1912
Precision 0.4955 0.4898 0.4510 0.4095 0.3958 0.4107
Recall 0.3044 0.2723 0.1824 0.1067 0.1169 0.1246
(b) Collapsing period
Accuracy 0.7099 0.6984 0.6044 0.5665 0.5608 0.5611
AUC 0.5393 0.5467 0.5594 0.5439 0.5537 0.5708
F1-score 0.2624 0.2873 0.3599 0.3636 0.4489 0.5301
Precision 0.2795 0.3533 0.5740 0.5567 0.5652 0.6334
Recall 0.2473 0.2420 0.2622 0.2700 0.3723 0.4557
(c) Soaring period
Accuracy 0.6452 0.6748 0.6909 0.6645 0.6414 0.6992
AUC 0.4904 0.5104 0.5045 0.5014 0.4885 0.5064
F1-score 0.7723 0.7945 0.8089 0.7869 0.7690 0.8155
Precision 0.7358 0.7611 0.7718 0.7603 0.7356 0.7751
Recall 0.8125 0.8309 0.8497 0.8156 0.8057 0.8602

TABLE 6: Average classification performance of random forest for different market periods
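
Splitting the reports into the three market periods of Table 6 only requires the publication date. A minimal sketch, using the period boundaries stated in the text and a hypothetical helper name `market_period`:

```python
from datetime import date

# Period boundaries as given in the text (both ends inclusive).
PERIODS = [
    ("sideways", date(2019, 1, 1), date(2020, 1, 20)),
    ("collapsing", date(2020, 1, 21), date(2020, 3, 29)),
    ("soaring", date(2020, 3, 30), date(2020, 6, 30)),
]


def market_period(published):
    """Return the market period of a report's publication date, or None if outside all periods."""
    for name, start, end in PERIODS:
        if start <= published <= end:
            return name
    return None


period_of_last_report = market_period(date(2020, 6, 12))
```

Grouping the test-set predictions by this label would then allow the measures to be computed per period.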

FIGURE 4: Feature importance for different market periods: (a) total period (2019-01-01 to 2020-06-30), (b) sideways period (2019-01-01 to 2020-01-20), (c) collapsing period (2020-01-21 to 2020-03-29), (d) soaring period (2020-03-30 to 2020-06-30)
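
The direction in which an important feature relates to returns is examined in the text by splitting the reports at the median of the feature and comparing average returns. A minimal sketch of that comparison follows; the function name, the exclusion of reports that sit exactly on the median, and the toy values are assumptions.

```python
from statistics import mean, median


def median_split_return_gap(feature_values, returns):
    """Average return above the feature median minus average return below it."""
    cut = median(feature_values)
    # Assumption: reports exactly at the median are left out of both groups.
    low = [r for f, r in zip(feature_values, returns) if f < cut]
    high = [r for f, r in zip(feature_values, returns) if f > cut]
    return mean(high) - mean(low)


# Toy data: a feature such as the English ratio and realized returns per report.
english_ratio = [0.1, 0.2, 0.3, 0.4]
realized_return = [-0.03, -0.01, 0.05, 0.09]
gap = median_split_return_gap(english_ratio, realized_return)
```

A positive gap means that reports with a higher feature value earned more on average, which is the pattern the text reports for the English ratio.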


conditions except for the determiner ratio and ending ratio. Also, subwords per page and number ratio show high feature importance regardless of market conditions. Note that the higher the subwords per page, the higher the investment return, while the lower the number ratio, the higher the investment return.

C. INVESTMENT PERFORMANCE
Finally, we perform the backtesting of two investment strategies based on the positively predicted stocks in the test set from the 50 random seeds as the investment universe. The first strategy is the monthly-rebalancing. At first, we take a long position on the positively predicted stocks on the test set with equal weight. Then, after a month, we sell all the stocks purchased and repeat the process of taking long position. Figure 5 shows the monthly average cumulative rate of return. The proposed strategy is a blue line, and the monthly cumulative returns of KOSPI200 and all stocks recommended from the equity research reports in the test set are provided as benchmarks with sky blue and gray lines, respectively. Note that the vertical lines indicate the three standard deviations of cumulative returns for each month. The result shows that the proposed strategy outperforms the returns of other benchmarks. Besides, the strategy of buying all the stocks recommended by the report slightly exceeds the KOSPI index, which ensures some degree of the reliability of the equity research report on recommending stocks.

In order to compensate for the limitation of the cumulative return, the average return on investment in different market conditions based on a buy-and-hold strategy is summarized in Table 7 for different investment horizons from 30 days to 180 days. The proposed investment strategy yields significantly higher returns for the total period than the benchmarks invested in all stocks recommended by the report for all investment horizons. Also, the difference in returns between the two investment strategies increases as the investment period increases. During the sideways period, the proposed

based on NLP-elements of the equity research report. To the best of our knowledge, this is the first attempt to utilize the NLP-elements of the equity research report in Korea to establish investment strategies. Therefore, this research’s novelty lies in providing the possible integration of NLP-elements of the equity research report in stock investment. Through the experiments, the random forest shows the best classification performance whose AUC of the random forest during the sideways period and the collapsing period is higher than 0.5. Therefore, we select the random forest as the binary classification algorithm. Then, we perform the backtesting based on classification results for monthly re-balancing and buy-and-hold for different investment horizons. As a result, we confirm that the proposed investment strategy generates higher returns than the benchmark during the sideways period and collapsing period. In an extreme bull market, selecting stocks with high expected return does not make much of a difference since any stock an investor chooses will yield a high return. However, an investment strategy that helps select stocks with a high return in the future during sideways or bearish markets has a significant implication in real-world investment practice. Therefore, for further research, we plan to utilize various portfolio theories in constructing efficient investment strategies rather than simple buy-and-hold by using the positively predicted stocks from the binary classification.

REFERENCES
[1] P. C. Tetlock, M. Saar-Tsechansky, and S. Macskassy, “More than words: Quantifying language to measure firms’ fundamentals,” The Journal of Finance, vol. 63, no. 3, pp. 1437–1467, 2008.
[2] R. P. Schumaker and H. Chen, “A discrete stock price prediction engine based on financial news,” Computer, vol. 43, no. 1, pp. 51–56, 2010.
[3] R. P. Schumaker, “Analyzing parts of speech and their impact on stock price,” Communications of the IIMA, vol. 10, no. 3, p. 1, 2010.
[4] F. Z. Xing, E. Cambria, and R. E. Welsch, “Natural language based financial forecasting: a survey,” Artificial Intelligence Review, vol. 50, no. 1, pp. 49–73, 2018.
[5] J. Si, A. Mukherjee, B. Liu, Q. Li, H. Li, and X. Deng, “Exploiting topic based twitter sentiment for stock prediction,” in Proceedings of the 51st
391 investment strategy shows slightly better returns than the Annual Meeting of the Association for Computational Linguistics (Volume
392 benchmark. However, the equity research report published 2: Short Papers), pp. 24–29, 2013.
[6] J. Smailović, M. Grčar, N. Lavrač, and M. Žnidaršič, “Stream-based
393 in the sideways period includes a collapsing period on the active learning for sentiment analysis in the financial domain,” Information
394 long-term investment horizon. Despite the sharp decline in sciences, vol. 285, pp. 181–203, 2014.
395 the market, the proposed strategy does not record negative [7] G. Ranco, D. Aleksovski, G. Caldarelli, M. Grčar, and I. Mozetič, “The
effects of twitter sentiment on stock price returns,” PloS one, vol. 10, no. 9,
396 returns except for the investment horizon of 120 trading days, p. e0138441, 2015.
397 which is very encouraging. During the collapsing period, it [8] A. Papana, C. Kyrtsou, D. Kugiumtzis, and C. Diks, “Detecting causality
398 yields significantly higher returns than the benchmark for in non-stationary time series using partial symbolic transfer entropy:
evidence in financial data,” Computational economics, vol. 47, no. 3,
399 all investment horizons. In particular, since the long-term pp. 341–365, 2016.
400 investment horizon includes a soaring period, the proposed [9] A. Tafti, R. Zotti, and W. Jank, “Real-time diffusion of information on
401 investment strategy can be considered to possess an ability twitter and the financial markets,” PloS one, vol. 11, no. 8, p. e0159226,
2016.
402 to detect stocks whose prices will rise rapidly during the [10] R. Akita, A. Yoshihara, T. Matsubara, and K. Uehara, “Deep learning
403 recovery of a financial market after the market crash. Finally, for stock prediction using numerical and textual information,” in 2016
404 during the soaring period, the presented model shows a IEEE/ACIS 15th International Conference on Computer and Information
Science (ICIS), pp. 1–6, IEEE, 2016.
405 similar investment return as the benchmark. [11] S. R. Das and M. Y. Chen, “Yahoo! for amazon: Sentiment extraction from
small talk on the web,” Management science, vol. 53, no. 9, pp. 1375–
406 IV. CONCLUSIONS 1388, 2007.
[12] T. H. Nguyen and K. Shirai, “Topic modeling based sentiment analysis
407 Throughout this research, we explore the possibility of devel- on social media for stock market prediction,” in Proceedings of the 53rd
408 oping an investment framework using a binary classification Annual Meeting of the Association for Computational Linguistics and


P. Cho et al.: Equity research report-driven investment strategy in Korea using binary classification on stock price direction

FIGURE 5: Cumulative investment return with monthly re-balancing strategy
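The monthly-rebalancing strategy behind Figure 5 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `monthly_rebalanced_cumulative`, the input layout (one dictionary of per-stock simple returns for each month), the compounding of monthly returns, and the choice to hold cash in a month with no positively predicted stock are all assumptions made for illustration, and the tickers and returns are made-up numbers.

```python
def monthly_rebalanced_cumulative(monthly_returns):
    """Cumulative return of an equal-weighted, monthly rebalanced long portfolio.

    monthly_returns: list with one dict per month, mapping a (hypothetical)
    ticker to that stock's simple return over the month it is held.
    Returns the cumulative return at the end of each month.
    """
    wealth = 1.0
    cumulative = []
    for month in monthly_returns:
        if month:
            # Equal weight: the portfolio return is the plain average.
            portfolio_return = sum(month.values()) / len(month)
        else:
            # Assumption: no positively predicted stock means holding cash.
            portfolio_return = 0.0
        # Assumption: proceeds are fully reinvested, so returns compound.
        wealth *= 1.0 + portfolio_return
        cumulative.append(wealth - 1.0)
    return cumulative


# Made-up example: two stocks in month 1, one in month 2, none in month 3.
example = [{"A": 0.10, "B": -0.02}, {"C": 0.05}, {}]
curve = monthly_rebalanced_cumulative(example)
print([round(value, 4) for value in curve])
```

Running the sketch once per random seed and averaging the resulting curves month by month would give a seed-averaged curve of the kind plotted in Figure 5, under the same assumptions.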

Investment returns                      Investment horizon (trading days)
                                              30      60      90     120     150     180
(a) Total period
Return of recommended from report (%)       0.60    0.98    2.00    1.58    1.26    2.57
Return of predicted positive (%)            3.82    6.93   11.87   17.53   18.74   21.21
(b) Sideways period
Return of recommended from report (%)      -0.13   -1.05   -2.61   -5.34   -6.56   -6.35
Return of predicted positive (%)            0.80    0.69    0.11   -2.60    0.05    0.51
(c) Collapsing period
Return of recommended from report (%)      -9.95   -9.93    0.62    7.70   12.42   19.55
Return of predicted positive (%)           -4.78   -2.53   12.05   21.53   23.76   30.74
(d) Soaring period
Return of recommended from report (%)       9.44   15.29   21.33   26.05   26.51   28.99
Return of predicted positive (%)            8.71   14.91   20.99   26.51   26.72   29.29

TABLE 7: Investment returns of predicted & all stocks from the equity research reports for different investment horizons
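The comparison that Table 7 supports can be made explicit by subtracting the benchmark row from the predicted-positive row for each period. The short Python sketch below does only that arithmetic; the return values are copied from Table 7, while the names `TABLE_7`, `HORIZONS`, and `excess_return` are illustrative and not from the paper.

```python
HORIZONS = [30, 60, 90, 120, 150, 180]  # investment horizons in trading days

# For each period: (return of all recommended stocks in %,
#                   return of positively predicted stocks in %), from Table 7.
TABLE_7 = {
    "total": ([0.60, 0.98, 2.00, 1.58, 1.26, 2.57],
              [3.82, 6.93, 11.87, 17.53, 18.74, 21.21]),
    "sideways": ([-0.13, -1.05, -2.61, -5.34, -6.56, -6.35],
                 [0.80, 0.69, 0.11, -2.60, 0.05, 0.51]),
    "collapsing": ([-9.95, -9.93, 0.62, 7.70, 12.42, 19.55],
                   [-4.78, -2.53, 12.05, 21.53, 23.76, 30.74]),
    "soaring": ([9.44, 15.29, 21.33, 26.05, 26.51, 28.99],
                [8.71, 14.91, 20.99, 26.51, 26.72, 29.29]),
}


def excess_return(period):
    """Predicted-positive return minus benchmark return, in percentage points."""
    recommended, predicted = TABLE_7[period]
    return [round(p - r, 2) for r, p in zip(recommended, predicted)]


for period in TABLE_7:
    print(period, dict(zip(HORIZONS, excess_return(period))))
```

For the total period the difference grows at every step, from 3.22 percentage points at 30 trading days to 18.64 at 180, which matches the statement that the gap widens with the investment horizon. For the soaring period the difference stays within about 0.73 percentage points in either direction.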

the 7th International Joint Conference on Natural Language Processing similarity,” Expert Systems with Applications, vol. 118, pp. 411–424,
(Volume 1: Long Papers), pp. 1354–1364, 2015. 2019.
[13] W. Antweiler and M. Z. Frank, “Is all that talk just noise? the information [21] B. Weng, M. A. Ahmed, and F. M. Megahed, “Stock market one-day ahead
content of internet stock message boards,” The Journal of finance, vol. 59, movement prediction using disparate data sources,” Expert Systems with
no. 3, pp. 1259–1294, 2004. Applications, vol. 79, pp. 153–163, 2017.
[14] F. Li, “The information content of forward-looking statements in cor- [22] S. K. Khatri and A. Srivastava, “Using sentimental analysis in prediction
porate filings—a naïve bayesian machine learning approach,” Journal of of stock market investment,” in 2016 5th International Conference on
Accounting Research, vol. 48, no. 5, pp. 1049–1102, 2010. Reliability, Infocom Technologies and Optimization (Trends and Future
[15] N. Jegadeesh and D. Wu, “Word power: A new approach for content Directions)(ICRITO), pp. 566–569, IEEE, 2016.
analysis,” Journal of financial economics, vol. 110, no. 3, pp. 712–729, [23] X. Zhang, S. Qu, J. Huang, B. Fang, and P. Yu, “Stock market predic-
2013. tion via multi-source multiple instance learning,” IEEE Access, vol. 6,
[16] A. H. Huang, A. Y. Zang, and R. Zheng, “Evidence on the information pp. 50720–50728, 2018.
content of text in analyst reports,” The Accounting Review, vol. 89, no. 6, [24] M. Shastri, S. Roy, and M. Mittal, “Stock price prediction using artificial
pp. 2151–2180, 2014. neural model: an application of big data,” EAI Endorsed Transactions on
[17] X. Li, H. Xie, L. Chen, J. Wang, and X. Deng, “News impact on stock Scalable Information Systems, vol. 6, no. 20, 2019.
price return via sentiment analysis,” Knowledge-Based Systems, vol. 69, [25] J. Li, H. Bu, and J. Wu, “Sentiment-aware stock market prediction: A deep
pp. 14–23, 2014. learning method,” in 2017 international conference on service systems and
[18] F. Xu and V. Keelj, “Collective sentiment mining of microblogs in 24- service management, pp. 1–6, IEEE, 2017.
hour stock price movement prediction,” in 2014 IEEE 16th Conference on [26] M. Kraus and S. Feuerriegel, “Decision support from financial disclosures
Business Informatics, vol. 2, pp. 60–67, IEEE, 2014. with deep neural networks and transfer learning,” Decision Support Sys-
[19] Y. Xie and H. Jiang, “Stock market forecasting based on text min- tems, vol. 104, pp. 38–48, 2017.
ing technology: A support vector machine method,” arXiv preprint [27] M.-Y. Chen, C.-H. Liao, and R.-P. Hsieh, “Modeling public mood and
arXiv:1909.12789, 2019. emotion: Stock market trend prediction with anticipatory computing ap-
[20] W. Long, L. Song, and Y. Tian, “A new graphic kernel method of stock proach,” Computers in Human Behavior, vol. 101, pp. 402–408, 2019.
price trend prediction based on financial news semantic and structural [28] A. Onan, “An ensemble scheme based on language function analysis and

VOLUME X, XXXX 9

P. Cho et al.: Equity research report-driven investment strategy in Korea using binary classification on stock price direction

feature engineering for text genre classification,” Journal of Information

Science, vol. 44, no. 1, pp. 28–47, 2018.
[29] O. Aytuğ, “Sentiment analysis on twitter based on ensemble of psycholog-
ical and linguistic feature sets,” Balkan Journal of Electrical and Computer
Engineering, vol. 6, no. 2, pp. 69–77, 2018.
[30] A. Onan, S. Korukoğlu, and H. Bulut, “Ensemble of keyword extraction
methods and classifiers in text classification,” Expert Systems with Appli-
cations, vol. 57, pp. 232–247, 2016.
[31] A. Onan, S. Korukoglu, and H. Bulut, “Lda-based topic modelling in
text sentiment classification: An empirical analysis.,” Int. J. Comput.
Linguistics Appl., vol. 7, no. 1, pp. 101–119, 2016.
[32] A. Onan and M. A. Toçoğlu, “A term weighted neural language model and
stacked bidirectional lstm based framework for sarcasm identification,”
IEEE Access, vol. 9, pp. 7701–7722, 2021.
[33] A. Onan, “Sentiment analysis on product reviews based on weighted word
embeddings and deep neural networks,” Concurrency and Computation:
Practice and Experience, p. e5909, 2020.
[34] A. Onan, “Two-stage topic extraction model for bibliometric data anal-
ysis based on word embeddings and clustering,” IEEE Access, vol. 7,
pp. 145614–145633, 2019.
[35] A. Onan, “Topic-enriched word embeddings for sarcasm identification,” in
Computer Science On-line Conference, pp. 293–304, Springer, 2019.
[36] F. Z. Xing, E. Cambria, L. Malandri, and C. Vercellis, “Discovering
bayesian market views for intelligent asset allocation,” in Joint European
Conference on Machine Learning and Knowledge Discovery in Databases,
pp. 120–135, Springer, 2018.
[37] S. Koyano and K. Ikeda, “Online portfolio selection based on the posts of
winners and losers in stock microblogs,” in 2017 IEEE Symposium Series
on Computational Intelligence (SSCI), pp. 1–4, IEEE, 2017.
[38] Q. Song, A. Liu, and S. Y. Yang, “Stock portfolio selection using learning-
to-rank algorithms with news sentiment,” Neurocomputing, vol. 264,
pp. 20–28, 2017.
[39] Y. Ye, H. Pei, B. Wang, P.-Y. Chen, Y. Zhu, J. Xiao, and B. Li,
“Reinforcement-learning based portfolio management with augmented as-
set movement prediction states,” in Proceedings of the AAAI Conference
on Artificial Intelligence, vol. 34, pp. 1112–1119, 2020.
[40] M. Chen, Z. Zhang, J. Shen, Z. Deng, J. He, and S. Huang, “A quantitative
investment model based on random forest and sentiment analysis,” in Jour-
nal of Physics: Conference Series, vol. 1575, p. 012083, IOP Publishing,
2020.
[41] E. L. Park and S. Cho, “Konlpy: Korean natural language processing in
python,” in Proceedings of the 26th Annual Conference on Human &
Cognitive Language Technology, (Chuncheon, Korea), October 2014.
[42] T. Kudo, K. Yamamoto, and Y. Matsumoto, “Applying conditional random
fields to japanese morphological analysis,” in Proceedings of the 2004
conference on empirical methods in natural language processing, pp. 230–
237, 2004.
[43] N. S. Altman, “An introduction to kernel and nearest-neighbor nonpara-
metric regression,” The American Statistician, vol. 46, no. 3, pp. 175–185,
1992.
[44] J. S. Cramer, “The origins of logistic regression,” 2002.
[45] J. R. Quinlan, “Induction of decision trees,” Machine learning, vol. 1, no. 1,
pp. 81–106, 1986.
[46] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32,
2001.
[47] J. H. Friedman, “Greedy function approximation: a gradient boosting
machine,” Annals of statistics, pp. 1189–1232, 2001.

10 VOLUME X, XXXX

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
