Combining bag-of-words and sentiment features of annual reports to predict abnormal stock returns
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216275%3A25410%2F18%3A39913373" target="_blank" >RIV/00216275:25410/18:39913373 - isvavai.cz</a>
Result on the web
<a href="https://link.springer.com/article/10.1007/s00521-017-3194-2" target="_blank" >https://link.springer.com/article/10.1007/s00521-017-3194-2</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/s00521-017-3194-2" target="_blank" >10.1007/s00521-017-3194-2</a>
Alternative languages
Result language
English
Original language name
Combining bag-of-words and sentiment features of annual reports to predict abnormal stock returns
Original language description
Automated textual analysis of firm-related documents has become an important decision support tool for stock market investors. Previous studies tended to adopt either a dictionary-based or a machine learning approach. Nevertheless, little is known about their concurrent use. Here we use the combination of financial indicators, readability, sentiment categories, and bag-of-words (BoW) to increase prediction accuracy. This paper aims to extract both sentiment and BoW information from the annual reports of US firms. The sentiment analysis is based on two commonly used dictionaries, namely a general dictionary Diction 7.0 and a finance-specific dictionary proposed by Loughran and McDonald (J Finance 66:35-65, 2011. doi:10.1111/j.1540-6261.2010.01625.x). The BoW are selected according to their tf-idf. We combine these features with financial indicators to predict abnormal stock returns using a multilayer perceptron neural network with dropout regularization and rectified linear units. We show that this method performs similarly to Naive Bayes and outperforms other machine learning algorithms (support vector machine, C4.5 decision tree, and k-nearest neighbour classifier) in predicting positive/negative abnormal stock returns in terms of ROC. We also show that the quality of the prediction significantly increased when using the correlation-based feature selection of BoW. This prediction performance is robust to industry categorization and event window.
Czech name
—
Czech description
—
Classification
Type
J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
<a href="/en/project/GA16-19590S" target="_blank" >GA16-19590S: Topic and sentiment analysis of multiple textual sources for corporate financial decision-making</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2018
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Neural Computing and Applications
ISSN
0941-0643
e-ISSN
—
Volume of the periodical
29
Issue of the periodical within the volume
7
Country of publishing house
US - UNITED STATES
Number of pages
16
Pages from-to
343-358
UT code for WoS article
000427799400005
EID of the result in the Scopus database
2-s2.0-85028574890