FiNER: Financial Named Entity Recognition Dataset and Weak-Supervision Model

Agam Shah*                          Ruchit Vithani*
Georgia Institute of Technology     Georgia Institute of Technology
USA                                 USA
[email protected]                [email protected]

Abhinav Gullapalli                  Sudheer Chava
Georgia Institute of Technology     Georgia Institute of Technology
USA                                 USA
[email protected]                [email protected]

* These authors contributed equally to this work.
ABSTRACT
The development of annotated datasets over the 21st century has helped us truly realize the power of deep learning. Most of the datasets created for the named-entity-recognition (NER) task are not domain specific. The finance domain presents specific challenges to the NER task, and a domain-specific dataset would help push the boundaries of finance research. In our work, we develop the first high-quality NER dataset for the finance domain. To set the benchmark for the dataset, we develop and test a weak-supervision-based framework for the NER task. We extend the current weak-supervision framework to make it employable for span-level classification. Our weak-supervision NER framework (https://github.com/gtfintechlab/FiNER) and the dataset (https://huggingface.co/datasets/gtfintechlab/finer-ord) are publicly available on GitHub and Hugging Face.

KEYWORDS
Dataset, Named Entity Recognition, Weak-Supervision

ACM Reference Format:
Agam Shah, Ruchit Vithani, Abhinav Gullapalli, and Sudheer Chava. 2023. FiNER: Financial Named Entity Recognition Dataset and Weak-Supervision Model. In ACM SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 23–27, 2023, Taipei, Taiwan. ACM, New York, NY, USA, 10 pages. https://doi.org/XXXXXXX.XXXXXXX

1 INTRODUCTION
The growth of technology and the web over the last several decades has led to a rapid increase in the generation of text data, especially in the financial domain. The abundance of financial texts, primarily in the form of news and regulatory writings, presents a valuable resource for analysts, researchers, and even individual investors to extract relevant information. However, processes for information retrieval must be efficient to make data-driven decisions in a timely manner. Given the abundance of text data, manually extracting relevant information is impossible and unsustainable for large-scale text datasets used for rapid downstream tasks. Natural language processing (NLP) techniques, however, can help automate the information retrieval process. Named entity recognition (NER) is one such NLP technique that serves as an important first step to identify named entities, such as persons, organizations, and locations, and to efficiently use available text data to ultimately drive downstream tasks and decisions.

While numerous studies have attempted to construct annotated datasets [9, 27, 35] and develop NER models [13, 33, 34, 36] for generic texts, the financial domain presents unique challenges that require domain-specific approaches and expertise. Models trained specifically for the financial domain [3, 31] show superior performance on finance-domain-specific tasks compared to their general-domain counterparts. NER datasets for financial texts differ from general NER datasets in several characteristics, which we discuss in Section 3.2. The current annotated dataset based on credit risk agreements [2] for financial NER has significant shortcomings, limiting the use of deep learning models for financial NER. We elaborate on these shortcomings in Section 3.2.

Weak-supervision provides an automated, efficient, and cheaper alternative to the tedious manual work performed by humans constructing annotated datasets [25]. With weak-supervision, pattern-based rules, known as heuristics, which are encoded in labeling functions (LFs), can be leveraged to quickly annotate large-scale unlabeled text datasets. The LFs can also encode the necessary domain expertise when applying weak-supervision to domain-specific texts and NER tasks. Notably, there has been limited use of weak-supervision-based frameworks [14–17] for NER due to a lack of available software packages for weak-supervision at the span level.

To address these challenges, this paper presents a new weak-supervision framework for span-level labeling, extending the existing Snorkel framework [25]. The extended framework allows users to write LFs for NER beyond the entities identified in this work and can be generalized to other span-level labeling tasks. We apply Snorkel to the NER task and build FiNER, a weak-supervision NER pipeline for financial texts. To evaluate the performance of FiNER, we develop the first high-quality financial NER open research dataset (FiNER-ORD) and benchmark the performance of various models
on this dataset. Our FiNER pipeline codebase and FiNER-ORD dataset are readily accessible under a CC-BY-NC 4.0 license on GitHub and Hugging Face, respectively, enabling easy usage for the users.
Through this work, we make a valuable contribution to the field of information retrieval by
• Extending Snorkel's existing weak-supervision framework for span-level labeling.
• Building the largest NER dataset (FiNER-ORD) for the financial domain.
• Developing a pipeline (FiNER) for weak-supervision-based NER which is extendable to entities beyond what is available in FiNER-ORD.
• Benchmarking the models on the FiNER-ORD dataset.

2 RELATED WORK
Named Entity Recognition. There is a significant body of early literature on named entity recognition (NER) that uses rule-based models. These models [10, 12, 21, 41] are based on a set of predetermined rules that are used to identify entities based on specific patterns and syntax. Another popular approach for NER is conditional random field (CRF) models. These models [6, 20, 29] use probabilistic graphical models to predict the named entities. The latest advancement in NER models is the use of transformer-based models. These models [1, 18, 33] use self-attention mechanisms to capture long-range dependencies and have shown significant improvements in NER performance.

In summary, NER models can be broadly classified into rule-based, CRF-based, and transformer-based models. While rule-based models rely on pre-defined rules, CRF-based models use probabilistic models to predict named entities, and transformer-based models use self-attention mechanisms to capture long-range dependencies. Each of these categories of models has its own strengths and limitations, and the choice of model depends on the specific NER task at hand.

The CoNLL 2003 [27] dataset established a standard for generic NER. However, the absence of a high-quality financial NER dataset has limited progress in this area. The Financial NER dataset from Alvarado et al. [2] was the first attempt to create a NER dataset for the finance domain, but in our assessment, it is skewed, limiting its usefulness. We discuss it in more detail in Section 3.2. There have been attempts to develop CRF-based [32] and transformer-based [31] models for the financial NER task.

Weak-Supervision. In order to satisfy the need for large labeled datasets to train deep learning models, weak-supervision provides an alternative approach to label the vast amount of data [26] with some noise. Crowd-sourced labels [38] and distant supervision [22] are often associated with weak-supervision-based models, but they provide limited coverage and accuracy [26]. Several recent papers, including Snorkel [25], Skweak [17], and PRBoost [40], have proposed frameworks for weak-supervision in machine learning. Snorkel [25] is a system for generating training data using weak-supervision signals, such as heuristics and rules. Yu et al. [37] take advantage of weak-supervision to fine-tune language models for various tasks. Skweak [17] is a toolkit for creating labeled training data for NLP using weak-supervision signals such as distant supervision and self-training. The survey paper [39] provides a comprehensive overview of weak-supervision methods in machine learning, including their strengths and limitations. Finally, PRBoost [40] improves interactive weakly-supervised learning through iterative prompt-based rule discovery and model boosting. Together, these frameworks provide a range of tools and techniques for using weak-supervision in information retrieval, enabling the creation of accurate models with limited labeled data. Despite the high potential, the use of weak-supervision has been limited [30] in finance.

Information Retrieval in Finance. Information retrieval has many useful applications in finance. Sentiment analysis of news can help rank the influence of financial events based on market-moving factors, which in turn helps with downstream tasks such as predicting the price movement of assets [11]. Analyzing tweets and news headlines can help automate stock trading, equity price movement, and volatility forecasts for financial risk prediction [28]. Numerical claim extraction from analysts' reports can help improve volatility forecasting on report release dates and earnings announcement dates [30]. Tools such as an interactive web interface to extract operating segments from earnings reports in real time can help understand company performance and sector-level trends [19]. Transformer-based models can be fine-tuned to extract price-change-related information from earnings call transcripts to measure firm-level inflation exposure [5]. Creating financial knowledge graphs that embed entities, relationships, and events extracted from text such as financial news articles can drive downstream quantitative algorithmic trading [7].

3 DATA

3.1 FiNER-ORD
The FiNER-Open Research Dataset (FiNER-ORD) consists of a manually annotated dataset of financial news articles (in English) collected from webz.io (https://webz.io/free-datasets/financial-news-articles/). In total, there are 47,851 news articles available in this data at the point of writing this paper. Each news article is available in the form of a JSON document with various metadata information like the source of the article, publication date, author of the article, and the title of the article. For the manual annotation of named entities in financial news, we randomly sampled 220 documents from the entire set of news articles. We observed that some articles were empty in our sample, so after filtering the empty documents, we were left with a total of 201 articles. We use Doccano, an open-source annotation tool, to ingest the raw dataset and manually label person (PER), location (LOC), and organization (ORG) entities. Figure 1 shows an example of manually annotated named entities in FiNER-ORD. For our experiments, we use the manually labeled FiNER-ORD to benchmark model performance. The output from Doccano contains span-level label information. This information is in the form of a list of lists containing the start character, end character, and label of each entity annotated by the manual annotator.

We would also like to highlight that there is a need to run post-processing on manual annotations because of the chance of errors in the human annotation. For example, there might be errors like a space accidentally included in an annotated entity span.
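To make this post-processing concrete, below is a minimal sketch (illustrative only, not the released FiNER pipeline code) that trims stray whitespace from Doccano-style character spans and converts them into token-level labels, assuming simple whitespace tokenization and the PER_B/PER_I-style tags used later in this paper.

def spans_to_token_labels(text, spans):
    """Convert Doccano-style (start_char, end_char, label) spans into
    token-level labels, trimming whitespace accidentally included by
    the annotator at the span boundaries."""
    tokens, offsets, pos = [], [], 0
    for tok in text.split():
        start = text.index(tok, pos)
        tokens.append(tok)
        offsets.append((start, start + len(tok)))
        pos = start + len(tok)

    labels = ["O"] * len(tokens)
    for start, end, label in spans:
        # Post-processing: drop leading/trailing spaces inside the span.
        while start < end and text[start].isspace():
            start += 1
        while end > start and text[end - 1].isspace():
            end -= 1
        # The first overlapping token gets the _B tag, the rest get _I.
        first = True
        for i, (t_start, t_end) in enumerate(offsets):
            if t_start < end and t_end > start:
                labels[i] = label + ("_B" if first else "_I")
                first = False
    return list(zip(tokens, labels))

# Example: "Morgan Stanley" annotated as an ORG span (with a stray trailing space).
print(spans_to_token_labels("Morgan Stanley raised its forecast.", [(0, 15, "ORG")]))
# [('Morgan', 'ORG_B'), ('Stanley', 'ORG_I'), ('raised', 'O'), ('its', 'O'), ('forecast.', 'O')]

The same routine also realizes the PER_B/PER_I and LOC_B/LOC_I tagging convention described in the annotation guide (Appendix A).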
[Figure 2: Overview of the FiNER weak-supervision pipeline. An unlabeled raw text dataset is passed through entity labeling functions ("Apply Labeling Functions for NER"), producing a labeling function prediction matrix; the labels are then aggregated ("Aggregate Labels") into weak-supervision entity labels, which constitute the final NER data.]
Although the generated labels are noisy and of low quality compared to manually labeled data, the weak-supervision approach provides three potential benefits over traditional hand labeling. (1) This approach is cheap, as we do not need to hire people to spend time hand-labeling the entire dataset. (2) This method allows us to generate training data programmatically, which is much quicker than manual labeling. (3) We can encode expert opinion in this approach in order to generate good-quality training data.

Figure 2 shows all the stages implemented in our pipeline. We rely on Snorkel (https://www.snorkel.org/), a weak-supervision framework, to generate training data programmatically. Snorkel is a Python library that provides functionalities to combine many sources of signals which are noisy when considered individually. These sources are represented in the form of labeling functions. When considered independently, each labeling function is only able to label a few data points based on some heuristic set of rules, but when we combine many such functions, we get a significantly better version of the training data with relatively lower noise. Labeling functions can be written based on simple heuristic rules. For example, entities like person can be detected based on a few keywords like Mr, Mrs, or Miss. Likewise, many organizations can be detected using keywords like Inc, LLC, Ltd, PLLC, etc.

Although Snorkel provides an easy-to-use API to generate labeled training data, it operates on sentence-level labeling, which is useful for tasks like sentiment analysis or any other sentence classification. NER, on the other hand, is a word/span-level labeling task in which the goal is to label each word in a sentence. The Snorkel library implements an abstraction called PandasLFApplier that can be used to combine labels on a sentence level. This applier subclasses a base applier called BaseLFApplier, which implements the necessary methods for executing a set of labeling functions on a data-point collection object like a DataFrame in the pandas Python library. In order to use Snorkel's functionalities, we design a new applier called PandasLFApplierForNER, which subclasses BaseLFApplier and is specifically designed for NER tasks to operate on the word/span level.

For each entity, we can write multiple labeling functions. Consider we have n sentences {s_1, s_2, ..., s_n} and m labeling functions {f_1, f_2, ..., f_m}. Also, for now, assume that each sentence has t words, such that the i-th sentence is tokenized as {w_i1, w_i2, ..., w_it}. This fixed-token-length assumption will be relaxed later. Our goal is to generate a label matrix that contains the label of each word of each sentence by each labeling function. Theoretically, this label matrix has a 3-dimensional shape (n, m, t). However, since Snorkel works on the sentence level (it expects a 2-dimensional label matrix), we cannot give this label matrix as input to the Snorkel model to combine the labels from all labeling functions. We therefore flatten this label matrix along the word dimension to convert our 3D label matrix to 2D. This has two potential benefits: first, we can now use Snorkel to combine our labels; second, we can relax the fixed-token-length assumption, since we no longer have a dimension along the length of the sentences. This way, our new matrix has the shape (n × t, m). In this matrix, rows are tuples representing (sentence, token) and columns represent labeling functions. Therefore, for each tuple (sentence, token), all m labeling functions generate a label for the token present in the tuple.

We subclass the BaseLFApplier class and implement the logic described above. Snorkel combines the labels from each of the m labeling functions and generates a unique label for each (sentence, token) tuple. This way, we achieve the goal of labeling each token of each sentence by combining the labels from all m labeling functions. Below, we list some unique features of the FiNER framework:

Scalability. Our framework is designed to be highly scalable in two dimensions. Firstly, it is straightforward to add new entities to the pipeline. Secondly, the number of labeling functions can be increased without any limitations. The framework maintains an updated list of entities that are currently supported in the pipeline, and new entities can be appended to this list with ease.

The framework provides an API that enables the labeling functions to be used to assign labels to the new entities. To add the new entity "DATE", it is necessary to append it to the list of entities currently supported in the pipeline. Once it has been added, all labeling functions will be able to access it through the API provided by our framework. One can write a labeling function for the "DATE" entity in a similar manner as shown in Listing 1, which demonstrates an example of a labeling function to detect the "PER" (person) entity.
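As a purely hypothetical illustration of that workflow (the names SUPPORTED_ENTITIES, register_entity, and get_entity_label below are stand-ins, not the framework's actual API), adding the "DATE" entity could look like this:

# Hypothetical sketch; the FiNER framework exposes its own API for this.
SUPPORTED_ENTITIES = ["PER", "LOC", "ORG"]

def register_entity(name):
    """Append a new entity type so that labeling functions can emit it."""
    if name not in SUPPORTED_ENTITIES:
        SUPPORTED_ENTITIES.append(name)

def get_entity_label(name):
    """Return the integer id used for this entity in the label matrix."""
    return SUPPORTED_ENTITIES.index(name)

register_entity("DATE")          # "DATE" becomes visible to every labeling function
DATE = get_entity_label("DATE")  # labeling functions for dates can now emit this id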
Open-source codebase. Our complete weak-supervision methodology for NER will be publicly released as open-source software. Interested developers are welcome to contribute to the repository by submitting change requests or improvements to the framework. The framework is available for non-commercial use and can be extended
by a global community of developers to incorporate a limitless number of exciting features.

Easy to use APIs. The process of writing labeling functions is straightforward. In this section, we present an illustrative example of a simple labeling function. Writing a labeling function is just like writing any other Python function, except for one restriction on the format of the return value. Specifically, each labeling function must conform to the format prescribed by our pipeline and return a tuple containing the original input text and the output of the "generate_labels" API. It is the responsibility of this API to enforce consistency in tokenization schemes among all labeling functions. The mechanics underlying this process are further explained in Section 4.3.

In Listing 1, person entities are recognized using the title prefixes specified by the labeling function. On line 8, the list "spans" is initialized, which collects all the spans of person entities identified by the labeling function. Lines 9 to 24 implement the primary logic for recognizing person entities using the specified prefixes. This logic involves verifying whether the title appears in the sentence and subsequently searching for words beginning with uppercase letters among the words following the captured title. This search continues as long as words starting with uppercase letters are found. Each captured word's span location is added to the "spans" list, which is ultimately passed to the "generate_labels" abstraction. This abstraction applies the common tokenization scheme to the labeling function output, as outlined in Section 4.3, and returns the unique ID associated with the input data point to track sentences in the label matrix.
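A minimal sketch of a labeling function in the spirit of that description is shown below; it does not reproduce Listing 1 (so the line numbers above do not map onto it), and the honorific list, the data-point fields, and the generate_labels signature are assumptions rather than the exact FiNER API.

PERSON_TITLES = ["Mr", "Mrs", "Miss", "Ms", "Dr"]   # assumed prefix list

def lf_person_titles(data_point):
    """Illustrative person-entity labeling function (not the original Listing 1).

    data_point is assumed to expose the raw sentence text; generate_labels is
    the pipeline-provided abstraction that maps character spans onto the common
    tokenization and ties the output to the data point's unique id."""
    text = data_point.text
    spans = []                          # (start_char, end_char) spans of PER entities

    # Whitespace tokens with character offsets.
    words, offsets, pos = text.split(), [], 0
    for w in words:
        start = text.index(w, pos)
        offsets.append((start, start + len(w)))
        pos = start + len(w)

    for i, w in enumerate(words):
        if w.rstrip(".") in PERSON_TITLES:
            j = i + 1
            # Capture consecutive capitalized words following the title.
            while j < len(words) and words[j][:1].isupper():
                spans.append(offsets[j])
                j += 1

    # Each labeling function returns the original text together with the
    # output of the generate_labels API, as required by the pipeline.
    return text, generate_labels(data_point, spans, entity="PER")

A labeling function for a newly registered entity such as "DATE" would follow the same pattern, only emitting a different entity label.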
Each labeling function can perform tokenization however it needs and then label each token based on its own tokenization. However, if we provide such flexibility to our labeling functions, we cannot perform the aggregation well. This is because we need all labeling functions to output labels for each token that follow the same tokenization scheme.

Our framework provides one more abstraction to deal with this issue. This abstraction assumes a common tokenization scheme for each labeling function and performs the necessary merging or splitting of tokens (which are tokenized by individual labeling functions) before passing them to the Snorkel aggregator.

Example Sentence:
"The F-150, which has historically been a key driver of Ford's profits, didn't contribute much to its 2Q15 profits."

Common Tokenization to enforce:
[The, F, -, 150, ,, which, has, historically, been, a, key, driver, of, Ford, 's, profits, ,, did, n't, contribute, much, to, its, 2Q15, profits, .]

Labeling function tokenization after enforcing the common tokenization scheme:
[The, F, -, 150, ,, which, has, historically, been, a, key, driver, of, Ford, 's, profits, ,, did, n't, contribute, much, to, its, 2Q15, profits, .]

It is evident that the token 'F-150' is a single token in a particular labeling function's tokenization scheme; however, the common tokenizer has split this token into three tokens. Consequently, to ensure consistency in the tokenization scheme, we partition the 'F-150' token into the distinct tokens "F", "-", "150". Furthermore, the assigned labels of the labeling functions are adjusted to reflect this alteration. Specifically, if any token is split into multiple tokens, the initial token is designated with a B- label and the succeeding tokens receive an I- label.

Likewise, a similar adjustment can be made in the reverse direction, in the case where the labeling function partitions a token that is not split by the common tokenizer. In this scenario, we combine the multiple tokens into a single token within the labeling function and assign the label of the initial token to the combined term.
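A minimal sketch of these two label adjustments, assuming the PER_B/PER_I-style tags used in this paper, is given below.

def split_label(label, n_pieces):
    """One LF token is split into n_pieces by the common tokenizer:
    the first piece keeps the B- variant, the rest become I-."""
    if label == "O" or n_pieces <= 1:
        return [label] * max(n_pieces, 1)
    entity = label.rsplit("_", 1)[0]            # e.g. "ORG_B" -> "ORG"
    return [entity + "_B"] + [entity + "_I"] * (n_pieces - 1)

def merge_labels(labels):
    """Several LF tokens collapse into one common token: the combined
    token takes the label of the initial piece."""
    return labels[0]

# A token labeled ORG_B that the common tokenizer splits into three pieces:
print(split_label("ORG_B", 3))    # ['ORG_B', 'ORG_I', 'ORG_I']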
The process of tokenizing the input sentence using a common tokenization scheme is demonstrated in the example above. Generally, we use a state-of-the-art tokenizer here, as we want our common tokenization to be as good as possible. Upon establishing the common tokenization scheme, the token spans labeled by each labeling function are updated to conform to the same tokenization scheme.

In cases where a word within a sentence is separated by the common tokenizer but not by the labeling function, the token span from the labeling function is split into two or more spans. Conversely, if a word is separated by the labeling function tokenizer but not by the common tokenizer, the token spans from the labeling function are merged into a single token span. This merging and splitting of token spans as required ensures that all labeling functions adopt the same tokenization.

Upon completion of the specified adjustments, a label matrix is obtained, which comprises a label assigned to each token by each labeling function. We then apply Snorkel, which executes the label aggregation process and generates predicted labels for each token. These labels are referred to as "Final NER Data" in our pipeline.
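A minimal sketch of this aggregation over the flattened (n × t, m) label matrix, using Snorkel's label model, is shown below; the toy matrix and the seven-class label encoding (O, PER_B, PER_I, LOC_B, LOC_I, ORG_B, ORG_I mapped to 0-6, with -1 for abstain) are illustrative assumptions.

import numpy as np
from snorkel.labeling.model import LabelModel, MajorityLabelVoter

# One row per (sentence, token) tuple, one column per labeling function;
# -1 means the labeling function abstained on that token.
L = np.array([
    [-1,  1, -1],    # only LF2 fires: PER_B
    [ 2, -1,  2],    # LF1 and LF3 agree: PER_I
    [ 0,  0, -1],    # LF1 and LF2 agree: O
    [ 5, -1,  3],    # conflict: ORG_B vs LOC_B
])

# FiNER-LFs + Snorkel WMV: the label model is trained on the label matrix
# itself, in an unsupervised way, for 1000 epochs.
label_model = LabelModel(cardinality=7, verbose=False)
label_model.fit(L_train=L, n_epochs=1000, seed=42)
wmv_labels = label_model.predict(L)

# FiNER-LFs + Majority Vote: a simple, unweighted majority vote.
mv_labels = MajorityLabelVoter(cardinality=7).predict(L)

In the pipeline, L is the matrix produced by PandasLFApplierForNER, and the predicted per-token labels are what we refer to as the Final NER Data.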
The baseline LF suite (Vanilla Flair) consists of three standard LFs that detect PER, LOC, and ORG entities, respectively. Each of these 3 LFs predicts mutually exclusive labels for the corresponding specific type of entity, so no aggregation is needed on the predicted labels generated by these 3 standard LFs. We test 2 weak-supervision models with the LF Suite and LF Aggregator components. Having multiple LFs for each entity type can be advantageous for recognizing an entity using different heuristics which cannot be encoded in a single LF. An aggregation method is required when applying FiNER-LFs, which contains multiple LFs for each entity type as shown in Table 3, because different LFs for each entity type could predict conflicting labels. We experiment with FiNER-LFs + Snorkel WMV, which aggregates labels using Snorkel's Weighted Majority Vote (WMV), and FiNER-LFs + Majority Vote, which aggregates labels using a simple majority vote. The Snorkel WMV model is trained on the generated label matrix in an unsupervised way. We used 1000 epochs to train the Snorkel WMV model.

5.2 Results
Vanilla Flair achieves an average weighted F1-score of 0.7924. As hypothesized, having multiple LFs for each entity type and aggregating the LFs helps recognize entity types with various heuristics encoded in multiple LFs, giving the weak-supervision model with aggregated LFs comparable or better performance than a weak-supervision model with only 3 mutually exclusive LFs. FiNER-LFs + Majority Vote achieves a comparable average weighted F1-score of 0.7934, and FiNER-LFs + Snorkel WMV achieves a slightly improved average weighted F1-score of 0.7948.

5.3 Ablation Study
To understand why performance for the location and organization categories is lower and how one can work towards improving it in future work, we present a confusion matrix for the best-performing model (FiNER-LFs + Snorkel WMV) in Table 5. Upon reviewing the matrix, we discovered that many location tokens were incorrectly labeled as organizations. This may be due to the fact that organization names often include location information in the training data, as well as the fact that the organization category has more than double the number of tokens compared to the location category. In order to improve the model's performance further, researchers can explore adding more labeling functions specifically designed to resolve location-organization conflicts.
Model (LF Suite + LF Aggregator) PER_B PER_I LOC_B LOC_I ORG_B ORG_I Weighted F-1
Vanilla Flair + NA 0.8943 0.9012 0.8361 0.6387 0.7652 0.7146 0.7924
FiNER-LFs + Majority Vote 0.9180 0.9322 0.8297 0.6455 0.7636 0.6919 0.7934
FiNER-LFs + Snorkel WMV 0.9196 0.9348 0.8297 0.6455 0.7610 0.6998 0.7948
Table 4: Performance comparison of various weak-supervision models tested on the FiNER-ORD dataset. Values are weighted F-1
scores averaged over three different random seed trials.
Actual \ Predicted   Other   PER_B   PER_I   LOC_B   LOC_I   ORG_B   ORG_I   Recall
Other                23792   4       1       19      3       149     150     0.9865
PER_B                7       263     2       1       0       11      2       0.9196
PER_I                5       8       165     0       0       0       5       0.9016
LOC_B                23      8       0       246     1       16      7       0.8173
LOC_I                25      0       1       19      61      2       12      0.5083
ORG_B                49      3       1       7       0       457     27      0.8401
ORG_I                37      0       0       0       4       22      310     0.8311
Precision            0.9939  0.9196  0.9706  0.8425  0.8841  0.6956  0.6043
Table 5: Confusion matrix for the best performing model (FiNER-LFs + Snorkel WMV) on the FiNER-ORD test sample for seed=42. Rows are actual labels; columns are predicted labels.
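For reference, per-label and weighted F-1 values of the kind reported in Table 4 can be computed from aligned token-level gold and predicted labels with standard tooling; the sketch below uses scikit-learn, with illustrative label lists, and assumes the weighted average is taken over the entity labels.

from sklearn.metrics import f1_score

# Aligned gold and predicted token-level labels (illustrative values only).
y_true = ["O", "PER_B", "PER_I", "O", "ORG_B", "LOC_B", "LOC_I"]
y_pred = ["O", "PER_B", "PER_I", "O", "LOC_B", "LOC_B", "LOC_I"]

entity_labels = ["PER_B", "PER_I", "LOC_B", "LOC_I", "ORG_B", "ORG_I"]

per_label_f1 = f1_score(y_true, y_pred, labels=entity_labels, average=None, zero_division=0)
weighted_f1 = f1_score(y_true, y_pred, labels=entity_labels, average="weighted", zero_division=0)

print(dict(zip(entity_labels, per_label_f1.round(4))), round(weighted_f1, 4))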
LIMITATIONS AND FUTURE WORK
Our NER model encounters difficulties in cases where the location (LOC) is part of the organization (ORG) entity. For instance, in the phrase "Google India," "India" is a location, but it is labeled as an organization in our framework, as we do not permit overlapping entity labels. Moreover, we do not incorporate a miscellaneous (MISC) label in our label set. Our pipeline has inherited limitations from Snorkel, including that a labeling function cannot leverage knowledge from sentences other than the one it is currently processing. Additionally, our labeling functions do not utilize the output of other labeling functions when assigning labels. As a result, we urge researchers to expand on our work and address these limitations.

ACKNOWLEDGMENTS
We appreciate the generous support of Azure credits from Microsoft made available for this research via the Georgia Institute of Technology Cloud Hub. We would like to thank Manan Jagani, Darsh Rank, Visaj Shah, Nitin Jain, Roy Gabriel, Olaolu Dada, and Gabriel Shafiq for their contribution to the project in the initial stage.

A ANNOTATION GUIDE
The manual annotation process to create FiNER-ORD consisted of ingesting the financial news articles in Doccano. Entities of the type person (PER), organization (ORG), and location (LOC) were identified according to the rules described below. Some well-known names for these entities were obvious, while others were confirmed by researching the names to identify the correct entity type.

A.1 Person Entities
PER entities were identified by their first name and/or last name. In the examples below, bolded spans represent a single person entity. In the case where a person was identified by their first and last name, the entire name was labeled as PER and the post-processing script tagged the first name as PER_B and the last name as PER_I. Words like President, Ms, and CEO were not labeled as part of the PER entity but help indicate a PER entity. In a context indicating possession with 's, the name up to the 's was tagged.
• President Obama
• CEO Phyllis Wakiaga
• Ms Wakiaga
• Bill Clinton's

A.2 Location Entities
LOC entities primarily consisted of names of continents, countries, states, cities, and addresses. In the examples below, bolded spans represent spans comprising LOC entities. Commas in addresses are not included in tagged LOC entities. In such cases, for tagging addresses, each complete span delimited by a comma was tagged as a LOC entity. In a context indicating possession with 's, the name up to the 's was tagged. In the case where a location was identified by multiple tokens delimited by a space, the entire name was labeled as LOC and the post-processing script tagged the first name as LOC_B and the last name as LOC_I. Words such as "Kenyan" were treated as adjectives and thus not labeled as a LOC entity. When discussing a lawmaker's political and location affiliation, examples such as R-Texas denoting "Republican from Texas" are encountered, in which only the location name, such as Texas, is tagged.
• Asia
• US
• India
• United States
• Beijing
• New York
• Redwood City, California
• Kenya's
• Mombasa Road
• R-Texas
A.3 Organization Entities
ORG entities consist of examples such as company names, news agencies, government entities, and abbreviations such as stock exchange names and company stock tickers. Punctuation marks such as hyphens are included when tagging an ORG entity. As designated, .com is included in the identified company's name. In a context indicating possession with 's, the name up to the 's was tagged.
• Wal-Mart
• China Resources SZITIC Trust Co Ltd
• Wall Street Journal
• Atlanta Federal Reserve
• Morgan Stanley's
• Delta Air Lines
• DAL
• NYSE
• Amazon.com

REFERENCES
[1] Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, and Roland Vollgraf. 2019. FLAIR: An easy-to-use framework for state-of-the-art NLP. In NAACL 2019, 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). 54–59.
[2] Julio Cesar Salinas Alvarado, Karin Verspoor, and Timothy Baldwin. 2015. Domain adaption of named entity recognition to support credit risk assessment. In Proceedings of the Australasian Language Technology Association Workshop 2015. 84–90.
[3] Dogu Araci. 2019. FinBERT: Financial sentiment analysis with pre-trained language models. arXiv preprint arXiv:1908.10063 (2019).
[4] Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python — Analyzing Text with the Natural Language Toolkit. O'Reilly Media.
[5] Sudheer Chava, Wendi Du, Agam Shah, and Linghang Zeng. 2022. Measuring firm-level inflation exposure: A deep learning approach. Available at SSRN 4228332 (2022).
[6] Aitao Chen, Fuchun Peng, Roy Shan, and Gordon Sun. 2006. Chinese named entity recognition with conditional probabilistic models. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing. 173–176.
[7] Dawei Cheng, Fangzhou Yang, Xiaoyang Wang, Ying Zhang, and Liqing Zhang. 2020. Knowledge Graph-Based Event Embedding Framework for Financial Quantitative Investments. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2221–2230.
[8] Walter Daelemans, Jakub Zavrel, Antal van den Bosch, and Ko van der Sloot. 2002. MBT: Memory-Based Tagger version 1.0 Reference Guide. ILK Technical Report ILK-0209, University of Tilburg, The Netherlands (2002).
[9] Leon Derczynski, Eric Nichols, Marieke Van Erp, and Nut Limsopatham. 2017. Results of the WNUT2017 shared task on novel and emerging entity recognition. In Proceedings of the 3rd Workshop on Noisy User-generated Text. 140–147.
[10] Dimitra Farmakiotou, Vangelis Karkaletsis, John Koutsias, George Sigletos, Constantine D Spyropoulos, and Panagiotis Stamatopoulos. 2000. Rule-based named entity recognition for Greek financial texts. In Proceedings of the Workshop on Computational lexicography and Multimedia Dictionaries (COMLEX 2000). 75–78.
[11] Fuli Feng, Moxin Li, Cheng Luo, Ritchie Ng, and Tat-Seng Chua. 2021. Hybrid Learning to Rank for Financial Event Ranking. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 233–243.
[12] Radu Florian, Abe Ittycheriah, Hongyan Jing, and Tong Zhang. 2003. Named entity recognition through classifier combination. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003. 168–171.
[13] Xiaoya Li, Xiaofei Sun, Yuxian Meng, Junjun Liang, Fei Wu, and Jiwei Li. 2019. Dice loss for data-imbalanced NLP tasks. arXiv preprint arXiv:1911.02855 (2019).
[14] Yinghao Li, Pranav Shetty, Lucas Liu, Chao Zhang, and Le Song. 2021. BERTifying the hidden Markov model for multi-source weakly supervised named entity recognition. arXiv preprint arXiv:2105.12848 (2021).
[15] Yinghao Li, Le Song, and Chao Zhang. 2022. Sparse Conditional Hidden Markov Model for Weakly Supervised Named Entity Recognition. arXiv preprint arXiv:2205.14228 (2022).
[16] Chen Liang, Yue Yu, Haoming Jiang, Siawpeng Er, Ruijia Wang, Tuo Zhao, and Chao Zhang. 2020. BOND: BERT-assisted open-domain named entity recognition with distant supervision. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1054–1064.
[17] Pierre Lison, Jeremy Barnes, and Aliaksandr Hubin. 2021. skweak: Weak Supervision Made Easy for NLP. arXiv preprint arXiv:2104.09683 (2021).
[18] Tianyu Liu, Yuchen Jiang, Nicholas Monath, Ryan Cotterell, and Mrinmaya Sachan. 2022. Autoregressive Structured Prediction with Language Models. arXiv:2210.14698 [cs.CL] https://arxiv.org/abs/2210.14698
[19] Zhiqiang Ma, Steven Pomerville, Mingyang Di, and Armineh Nourbakhsh. 2020. SPot: A Tool for Identifying Operating Segments in Financial Tables. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2157–2160.
[20] Andrew McCallum and Wei Li. 2003. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. (2003).
[21] Andrei Mikheev, Marc Moens, and Claire Grover. 1999. Named entity recognition without gazetteers. In Ninth Conference of the European Chapter of the Association for Computational Linguistics. 1–8.
[22] Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. 1003–1011.
[23] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 8024–8035. http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
[24] Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D Manning. 2020. Stanza: A Python natural language processing toolkit for many human languages. arXiv preprint arXiv:2003.07082 (2020).
[25] Alexander Ratner, Stephen H Bach, Henry Ehrenberg, Jason Fries, Sen Wu, and Christopher Ré. 2017. Snorkel: Rapid training data creation with weak supervision. In Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases, Vol. 11. NIH Public Access, 269.
[26] Alexander Ratner, Stephen H Bach, Henry Ehrenberg, Jason Fries, Sen Wu, and Christopher Ré. 2020. Snorkel: Rapid training data creation with weak supervision. The VLDB Journal 29, 2 (2020), 709–730.
[27] Erik F Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. arXiv preprint cs/0306050 (2003).
[28] Ramit Sawhney, Shivam Agarwal, Megh Thakkar, Arnav Wadhwa, and Rajiv Ratn Shah. 2021. Hyperbolic Online Time Stream Modeling. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1682–1686.
[29] Burr Settles. 2004. Biomedical named entity recognition using conditional random fields and rich feature sets. In Proceedings of the international joint workshop on natural language processing in biomedicine and its applications (NLPBA/BioNLP). 107–110.
[30] Pratvi Shah, Arkaprabha Banerjee, Agam Shah, Bhaskar Chaudhury, and Sudheer Chava. 2022. Numerical Claim Detection in Finance: A Weak-Supervision Approach. (2022).
[31] Raj Sanjay Shah, Kunal Chawla, Dheeraj Eidnani, Agam Shah, Wendi Du, Sudheer Chava, Natraj Raman, Charese Smiley, Jiaao Chen, and Diyi Yang. 2022. WHEN FLUE MEETS FLANG: Benchmarks and Large Pre-trained Language Model for Financial Domain. arXiv preprint arXiv:2211.00083 (2022).
[32] Shuwei Wang, Ruifeng Xu, Bin Liu, Lin Gui, and Yu Zhou. 2014. Financial named entity recognition based on conditional random fields and information entropy. In 2014 International Conference on Machine Learning and Cybernetics, Vol. 2. IEEE, 838–843.
[33] Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, and Kewei Tu. 2021. Automated Concatenation of Embeddings for Structured Prediction. In the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021). Association for Computational Linguistics.
[34] Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, and Kewei Tu. 2021. Improving named entity recognition by external context retrieving and cooperative learning. arXiv preprint arXiv:2105.03654 (2021).
[35] Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, and Ann Houston. 2013. OntoNotes Release 5.0. https://doi.org/11272.1/AB2/MKJJ2R
[36] Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, and Yuji Matsumoto. 2020. LUKE: Deep contextualized entity representations with entity-aware self-attention. arXiv preprint arXiv:2010.01057 (2020).
[37] Yue Yu, Simiao Zuo, Haoming Jiang, Wendi Ren, Tuo Zhao, and Chao Zhang. 2020. Fine-tuning pre-trained language model with weak supervision: A contrastive-regularized self-training approach. arXiv preprint arXiv:2010.07835 (2020).
[38] Man-Ching Yuen, Irwin King, and Kwong-Sak Leung. 2011. A survey of crowdsourcing systems. In 2011 IEEE third international conference on privacy, security, risk and trust and 2011 IEEE third international conference on social computing. IEEE, 766–773.
[39] Jieyu Zhang, Cheng-Yu Hsieh, Yue Yu, Chao Zhang, and Alexander Ratner. 2022. A survey on programmatic weak supervision. arXiv preprint arXiv:2202.05433 (2022).
[40] Rongzhi Zhang, Yue Yu, Pranav Shetty, Le Song, and Chao Zhang. 2022. PRBoost: Prompt-Based Rule Discovery and Boosting for Interactive Weakly-Supervised Learning. arXiv preprint arXiv:2203.09735 (2022).
[41] GuoDong Zhou and Jian Su. 2002. Named entity recognition using an HMM-based chunk tagger. In Proceedings of the 40th annual meeting of the association for computational linguistics. 473–480.

Received Under Review; revised Under Review; accepted Under Review