
FiNER: Financial Named Entity Recognition Dataset and Weak-Supervision Model

Agam Shah*, Ruchit Vithani*, Abhinav Gullapalli*, Sudheer Chava
Georgia Institute of Technology, USA

arXiv:2302.11157v1 [cs.CL] 22 Feb 2023
ABSTRACT
The development of annotated datasets over the 21st century has helped us truly realize the power of deep learning. Most of the datasets created for the named-entity-recognition (NER) task are not domain specific. The finance domain presents specific challenges to the NER task, and a domain-specific dataset would help push the boundaries of finance research. In our work, we develop the first high-quality NER dataset for the finance domain. To set a benchmark for the dataset, we develop and test a weak-supervision-based framework for the NER task. We extend the current weak-supervision framework to make it employable for span-level classification. Our weak-NER framework¹ and the dataset² are publicly available on GitHub and Hugging Face.

KEYWORDS
Dataset, Named Entity Recognition, Weak-Supervision

ACM Reference Format:
Agam Shah, Ruchit Vithani, Abhinav Gullapalli, and Sudheer Chava. 2023. FiNER: Financial Named Entity Recognition Dataset and Weak-Supervision Model. In ACM SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 23-27, 2023, Taipei, Taiwan. ACM, New York, NY, USA, 10 pages. https://doi.org/XXXXXXX.XXXXXXX

1 INTRODUCTION
The growth of technology and the web over the last several decades has led to a rapid increase in the generation of text data, especially in the financial domain. The abundance of financial texts, primarily in the form of news and regulatory writings, presents a valuable resource for analysts, researchers, and even individual investors to extract relevant information. However, information retrieval processes must be efficient to make data-driven decisions in a timely manner. Given the abundance of text data, manually extracting relevant information is impossible and unsustainable for large-scale text datasets used for rapid downstream tasks. Natural language processing (NLP) techniques, however, can help automate the information retrieval process. Named entity recognition (NER) is one such NLP technique that serves as an important first step to identify named entities, such as persons, organizations, and locations, and to efficiently use available text data to ultimately drive downstream tasks and decisions.

While numerous studies have attempted to construct annotated datasets [9, 27, 35] and develop NER models [13, 33, 34, 36] for generic texts, the financial domain presents unique challenges that require domain-specific approaches and expertise. Models trained specifically for the financial domain [3, 31] show superior performance on finance-domain-specific tasks compared to their general-domain counterparts. NER datasets for financial texts differ from general NER datasets in several characteristics, which we discuss in Section 3.2. The current annotated dataset for financial NER, based on Credit Risk Agreements [2], has significant shortcomings that limit the use of deep learning models for financial NER. We elaborate on these shortcomings in Section 3.2.

Weak-supervision provides an automated, efficient, and cheaper alternative to the tedious manual work performed by humans constructing annotated datasets [25]. With weak-supervision, pattern-based rules, known as heuristics, encoded in labeling functions (LFs), can be leveraged to quickly annotate large-scale unlabeled text datasets. The LFs can also encode the necessary domain expertise when applying weak-supervision to domain-specific texts and NER tasks. Notably, there has been limited use of weak-supervision-based frameworks [14-17] for NER due to a lack of available software packages for weak-supervision at the span level.

To address these challenges, this paper presents a new weak-supervision framework for span-level labeling, extending the existing Snorkel framework [25]. The framework allows users to write LFs for NER beyond the entities identified in this work and can be generalized to other span-level labeling tasks. We apply Snorkel to the NER task and build FiNER, a weak-supervision NER pipeline for financial texts. To evaluate the performance of FiNER, we develop the first high-quality financial NER open research dataset (FiNER-ORD) and benchmark the performance of various models on this dataset.

* These authors contributed equally to this work.
¹ https://github.com/gtfintechlab/FiNER
² https://huggingface.co/datasets/gtfintechlab/finer-ord

Our FiNER pipeline codebase and the FiNER-ORD dataset are readily accessible under a CC-BY-NC 4.0 license on GitHub and Hugging Face, respectively, enabling easy access for users.

Through this work, we make a valuable contribution to the field of information retrieval by:
• Extending Snorkel's existing weak-supervision framework for span-level labeling.
• Building the largest NER dataset (FiNER-ORD) for the financial domain.
• Developing a pipeline (FiNER) for weak-supervision-based NER which is extendable to entities beyond those available in FiNER-ORD.
• Benchmarking models on the FiNER-ORD dataset.

2 RELATED WORK
Named Entity Recognition. There is a significant body of early literature on named entity recognition (NER) that uses rule-based models. These models [10, 12, 21, 41] are based on a set of predetermined rules that identify entities based on specific patterns and syntax. Another popular approach for NER is conditional random field (CRF) models. These models [6, 20, 29] use probabilistic graphical models to predict named entities. The latest advancement in NER models is the use of transformer-based models. These models [1, 18, 33] use self-attention mechanisms to capture long-range dependencies and have shown significant improvements in NER performance.

In summary, NER models can be broadly classified into rule-based, CRF-based, and transformer-based models. While rule-based models rely on pre-defined rules, CRF-based models use probabilistic models to predict named entities, and transformer-based models use self-attention mechanisms to capture long-range dependencies. Each of these categories of models has its own strengths and limitations, and the choice of model depends on the specific NER task at hand.

The CoNLL 2003 [27] dataset established a standard for generic NER. However, the absence of a high-quality financial NER dataset has limited progress in this area. The financial NER dataset from Alvarado et al. [2] was the first attempt to create a NER dataset for the finance domain, but in our assessment it is skewed, which limits its usefulness. We discuss it in more detail in Section 3.2. There have been attempts to develop CRF-based [32] and transformer-based [31] models for the financial NER task.

Weak-Supervision. In order to satisfy the need for large labeled datasets to train deep learning models, weak-supervision provides an alternative approach that labels vast amounts of data [26] with some noise. Crowd-sourced labels [38] and distant supervision [22] are often associated with weak-supervision-based models, but they provide limited coverage and accuracy [26].

Several recent papers, including Snorkel [25], Skweak [17], and PRBoost [40], have proposed frameworks for weak-supervision in machine learning. Snorkel [25] is a system for generating training data using weak-supervision signals, such as heuristics and rules. Yu et al. [37] take advantage of weak-supervision to fine-tune language models for various tasks. Skweak [17] is a toolkit for creating weakly supervised training data using techniques such as distant supervision and self-training. The survey paper [39] provides a comprehensive overview of weak-supervision methods in machine learning, including their strengths and limitations. Finally, PRBoost [40] improved interactive weakly-supervised learning through iterative prompt-based rule discovery and model boosting. Together, these frameworks provide a range of tools and techniques for using weak-supervision in information retrieval, enabling the creation of accurate models with limited labeled data. Despite its high potential, the use of weak-supervision in finance has been limited [30].

Information Retrieval in Finance. Information retrieval has many useful applications in finance. Sentiment analysis of news can help rank the influence of financial events by market-moving factors, which in turn helps with downstream tasks such as predicting the price movements of assets [11]. Analyzing tweets and news headlines can help automate stock trading, equity price movement prediction, and volatility forecasting for financial risk prediction [28]. Numerical claim extraction from analysts' reports can improve volatility forecasting on report release dates and earnings announcement dates [30]. Tools such as an interactive web interface that extracts operating segments from earnings reports in real time can help in understanding company performance and sector-level trends [19]. Transformer-based models can be fine-tuned to extract price-change-related information from earnings call transcripts to measure firm-level inflation exposure [5]. Creating financial knowledge graphs that embed entities, relationships, and events extracted from text such as financial news articles can drive downstream quantitative algorithmic trading [7].

3 DATA

3.1 FiNER-ORD
The FiNER Open Research Dataset (FiNER-ORD) consists of a manually annotated set of financial news articles (in English) collected from webz.io³. In total, there are 47,851 news articles available in this data at the time of writing. Each news article is available as a JSON document with metadata such as the source of the article, the publication date, the author, and the title. For the manual annotation of named entities in financial news, we randomly sampled 220 documents from the entire set of news articles. We observed that some articles in our sample were empty, so after filtering out the empty documents, we were left with a total of 201 articles. We use Doccano⁴, an open-source annotation tool, to ingest the raw dataset and manually label person (PER), location (LOC), and organization (ORG) entities. Figure 1 shows an example of manually annotated named entities in FiNER-ORD. For our experiments, we use the manually labeled FiNER-ORD to benchmark model performance. The output from Doccano contains span-level label information in the form of a list of lists, recording the start character, end character, and label of each entity annotated by the manual annotator.

³ https://webz.io/free-datasets/financial-news-articles/
⁴ https://github.com/doccano/doccano



FiNER-ORD        Train    Validation   Test
# Articles         135            24     42
# Tokens        80,531        10,233  25,957
# LOC entities   1,255           267     428
# ORG entities   3,440           524     933
# PER entities   1,374           222     466
Table 1: Statistics for train, validation, and test splits of FiNER-ORD

Dataset          FiNER-ORD     CoNLL   Credit Risk
# Articles             201     1,393             8
# Tokens           116,721   301,418        54,262
# PER_B tokens       1,243    10,059           962
# PER_I tokens         819     6,991            57
# LOC_B tokens       1,459    10,645           208
# LOC_I tokens         491     5,289           227
# ORG_B tokens       2,844     9,323           295
# ORG_I tokens       2,053     1,671           203
Table 2: Comparison of our dataset with CoNLL-2003 (English) and Credit Risk Agreement
We would also like to highlight that post-processing must be run on the manual annotations because of the chance of errors in human annotation. For example, there might be errors like space characters being included in the entity span, or a few characters at a word boundary being missed from the entity span. Therefore, we run a post-processing step on the manual annotation output that performs the following tasks (a minimal sketch follows the list):
• Removes trailing spaces from the annotated entities.
• Extends token-level borders to non-space characters, for example fixing errors by changing the span "Sir Alex Ferguso" to "Sir Alex Ferguson".
• Cleans suffixes, for example removing apostrophes from entity suffixes.
• Adds positional information to the labeled entities by splitting multi-word spans into separate words and assigning B and I label suffixes to each token span.
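The snippet below is a minimal sketch of this post-processing, assuming Doccano-style character spans as input; the helper name, the whitespace-based word splitting, and the suffix pattern are illustrative rather than the exact FiNER implementation.

import re

def postprocess_annotation(text, start, end, label):
    """Clean one manual span and emit (token, start, end, tag) tuples with B/I suffixes."""
    # 1. Remove leading/trailing space characters from the annotated span.
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    # 2. Extend the span borders to the nearest non-space characters,
    #    so "Sir Alex Ferguso" grows to "Sir Alex Ferguson".
    while end < len(text) and not text[end].isspace():
        end += 1
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    # 3. Clean suffixes, e.g. a possessive apostrophe left at the end of the span.
    span = re.sub(r"['\u2019]s?$", "", text[start:end])
    # 4. Split multi-word spans and attach B/I positional suffixes.
    labeled, offset = [], start
    for i, tok in enumerate(span.split()):
        tok_start = text.index(tok, offset)
        labeled.append((tok, tok_start, tok_start + len(tok),
                        f"{label}_{'B' if i == 0 else 'I'}"))
        offset = tok_start + len(tok)
    return labeled

print(postprocess_annotation("CEO Sir Alex Ferguson spoke.", 4, 20, "PER"))
# [('Sir', 4, 7, 'PER_B'), ('Alex', 8, 12, 'PER_I'), ('Ferguson', 13, 21, 'PER_I')]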
We make a train, validation, and test split of FiNER-ORD. To avoid biased results, manual annotation is performed by annotators who have no knowledge of the labeling functions used in the weak-supervision framework. The train and validation sets are annotated by two separate annotators and validated by a third annotator. The test set is annotated by a fourth annotator. We present a manual annotation guide in Appendix A detailing the procedures used to create the manually annotated FiNER-ORD. Descriptive statistics for the resulting FiNER-ORD are available in Table 1.

Figure 1: Representative example of annotation in FiNER-ORD
3.2 Comparison with Related Datasets
In this section, we compare our proposed dataset (FiNER-ORD) with the CoNLL-2003 generic English NER dataset [27] and the Credit Risk Agreement NER dataset [2]. The CoNLL-2003 English dataset was created with manual expert annotation of generic Reuters news stories, which were pre-tokenized and part-of-speech (POS) tagged by the memory-based MBT tagger [8], and it is widely regarded as a general benchmark dataset for NLP tasks. In general, financial texts differ from general texts and contain a higher ratio of organization tokens relative to person and location tokens. For example, in the CoNLL dataset the ratio of person, location, and organization tokens is approximately 1:1:1, while in our FiNER-ORD dataset it is 1:1:2.5. If we look at the ratio of ORG_B to ORG_I tokens for FiNER-ORD (1.4:1) and CoNLL (5.6:1), we can infer that organization entities in financial text are more likely to span multiple words than in general text. This underscores the need for datasets specific to the financial domain. The Credit Risk Agreement dataset attempts to provide such a domain-specific dataset in CoNLL format, using manual expert annotation of eight English documents from U.S. Securities and Exchange Commission (SEC) filings that were pre-tokenized and POS tagged by NLTK [4]. Unfortunately, the annotation methodology for the Credit Risk Agreement dataset "automatically tagged all instances of the tokens lender and borrower as being of entity type PER" [2]. This approach is problematic because of the resulting skewed distribution of entity types in the dataset, which leads to confounded results. Our analysis of the Credit Risk Agreement dataset showed that in the FIN3 testing data, instances of the tokens lender and borrower represented 83.05% of all PER tokens and 44.95% of all tokens labeled as PER, ORG, MISC, or LOC. Similarly, in the FIN5 training data, instances of the tokens lender and borrower represented 90.04% of all PER tokens and 46.08% of all tokens labeled as PER, ORG, MISC, or LOC.

Thus, we believe this dataset is not of high enough quality to serve as a benchmark for specialized NLP tasks in the financial domain, motivating us to create a new high-quality financial domain-specific NER dataset. Our FiNER-ORD dataset has many more high-quality examples of each entity type: FiNER-ORD consists of 25x more articles, over 4x more LOC entities, nearly 10x more ORG entities, and over 2x more PER entities than the Credit Risk Agreement dataset. The statistics for the dataset comparison are available in Table 2.

4 FINER WEAK-SUPERVISION PIPELINE

4.1 Weak-Supervision Framework
Manual labeling of large training datasets can be time-consuming, expensive, and may require a domain expert. To overcome these difficulties, we rely on a weak-supervision approach to generate large labeled training datasets quickly and cheaply.

Figure 2: FiNER Weak-Supervision Pipeline (unlabeled raw text dataset + entity labeling functions → apply labeling functions for NER → label prediction matrix → aggregate labels → weak-supervision entity labels → final NER data)

While the data generated by the weak-supervision approach is relatively noisy and of lower quality than manually labeled data, weak-supervision offers three potential benefits over traditional hand labeling: (1) it is cheap, since we do not need to hire people to spend time hand-labeling the entire dataset; (2) it lets us generate training data programmatically, which is much quicker than manual labeling; and (3) it lets us encode expert opinion in order to generate good-quality training data.

Figure 2 shows all the stages implemented in our pipeline. We rely on Snorkel⁵, a weak-supervision framework, to generate training data programmatically. Snorkel is a Python library that provides functionality to combine many sources of signal that are noisy when considered individually. These sources are represented in the form of labeling functions. Considered independently, each labeling function is only able to label a few data points based on some heuristic set of rules, but when we combine many such functions, we get a significantly better version of the training data with relatively lower noise. Labeling functions can be written from simple heuristic rules. For example, person entities can be detected based on a few keywords like Mr, Mrs, or Miss. Likewise, many organizations can be detected using keywords like Inc, LLC, Ltd, PLLC, etc.

Although Snorkel provides an easy-to-use API to generate labeled training data, it operates on sentence-level labeling, which is useful for tasks like sentiment analysis or other sentence classification. NER, on the other hand, is a word/span-level labeling task in which the goal is to label each word in a sentence. The Snorkel library implements an abstraction called PandasLFApplier that can be used to combine labels at the sentence level. This applier subclasses a base applier called BaseLFApplier, which implements the methods necessary for executing a set of labeling functions on a data-point collection object such as a pandas DataFrame. In order to use Snorkel's functionality, we design a new applier called PandasLFApplierForNER, which subclasses BaseLFApplier and is specifically designed for NER tasks operating at the word/span level.
For each entity, we can write multiple labeling functions. Consider n sentences {s_1, s_2, ..., s_n} and m labeling functions {f_1, f_2, ..., f_m}. Also, for now, assume that each sentence has t words, such that the i-th sentence is tokenized as {w_i1, w_i2, ..., w_it}; this fixed-token-length assumption will be relaxed shortly. Our goal is to generate a label matrix containing the label assigned to each word of each sentence by each labeling function. Theoretically, this label matrix has a 3-dimensional shape (n, m, t). However, since Snorkel works at the sentence level (it expects a 2-dimensional label matrix), we cannot pass this label matrix as input to the Snorkel model to combine the labels from all labeling functions. We therefore flatten the label matrix along the word dimension to convert it from 3D to 2D. This has two benefits: first, we can now use Snorkel to combine our labels; second, we can relax the fixed-token-length assumption, since we no longer have a dimension along the length of the sentences. The new matrix has shape (n × t, m): rows are (sentence, token) tuples and columns are labeling functions, so for each (sentence, token) tuple, all m labeling functions generate a label for the token in that tuple. The reshape is sketched below.

We subclass the BaseLFApplier class and implement the logic described above. Snorkel then combines the labels from each of the m labeling functions and generates a unique label for each (sentence, token) tuple. This way, we achieve the goal of labeling each token of each sentence by combining the labels from all m labeling functions.
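The sketch below illustrates the flattening with NumPy on toy dimensions (n = 2 sentences, m = 3 labeling functions, t = 4 tokens; -1 denotes an abstaining labeling function). Only the (n, m, t) to (n × t, m) reshape follows the text; the toy values are illustrative.

import numpy as np

n, m, t = 2, 3, 4  # sentences, labeling functions, tokens per sentence
rng = np.random.default_rng(0)
# L[i, j, k] is labeling function j's vote for token k of sentence i
# (-1 = abstain, 0..2 = toy entity classes).
L = rng.integers(-1, 3, size=(n, m, t))

# Flatten along the word dimension: every (sentence, token) pair becomes a
# row, so Snorkel sees an ordinary 2-D label matrix of shape (n * t, m).
L_flat = L.transpose(0, 2, 1).reshape(n * t, m)
assert L_flat.shape == (n * t, m)
# Row r corresponds to token r % t of sentence r // t, which lets us map the
# aggregated predictions back to span positions afterwards.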
Below, we list some unique features of the FiNER framework:

Scalability. Our framework is designed to be highly scalable in two dimensions. First, it is straightforward to add new entities to the pipeline. Second, the number of labeling functions can be increased without limitation. The framework maintains an updated list of the entities currently supported in the pipeline, and new entities can be appended to this list with ease.

The framework provides an API that enables labeling functions to assign labels to new entities. To add a new entity such as "DATE", it is only necessary to append it to the list of entities currently supported in the pipeline. Once it has been added, all labeling functions can access it through the API provided by our framework. One can write a labeling function for the "DATE" entity (sketched below) in the same manner as Listing 1, which demonstrates an example labeling function for detecting the "PER" (person) entity.
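As an illustration of this extensibility, a hypothetical labeling function for the new "DATE" entity could look like the following sketch. It assumes the same @labeling_function decorator and the tokenize and generate_labels helpers used by Listing 1; the month-plus-year regex is only an example heuristic.

import re

MONTHS = {"January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"}

@labeling_function()  # decorator provided by the pipeline, as in Listing 1
def label_date_heuristic_month(x):
    # Tag "March 2023"-style expressions: a month name followed by a year.
    tokens = tokenize(x.text)  # List[Tuple[str, Tuple[int, int]]], as in Listing 1
    spans = []
    for i in range(len(tokens) - 1):
        word, span = tokens[i]
        next_word, next_span = tokens[i + 1]
        if word in MONTHS and re.fullmatch(r"(19|20)\d{2}", next_word):
            # Merge the month and the year into one character span.
            spans.append((span[0], next_span[1]))
    return x.uuid, generate_labels(x, spans, "DATE", "ENTITY")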
Open-source codebase. Our complete weak-supervision methodology for NER will be publicly released as open-source software. Interested developers are welcome to contribute to the repository by submitting change requests or improvements to the framework. The framework is available for non-commercial use and can be extended by a global community of developers to incorporate a limitless number of exciting features.

⁵ https://www.snorkel.org/

Easy to use APIs. The process of writing labeling functions is straightforward. In this section, we present an illustrative example of a simple labeling function. Writing a labeling function is just like writing any other Python function, except for one restriction on the format of the return value. Specifically, each labeling function must conform to the format prescribed by our pipeline and return a tuple containing the ID of the original input text and the output of the generate_labels API. It is the responsibility of this API to enforce consistency in tokenization schemes among all labeling functions. The mechanics underlying this process are further explained in Section 4.3.

Listing 1: Example of a Labeling Function

 1  @labeling_function()
 2  def label_per_heuristic_prefix(x):
 3      titles = {"Dr", "Mr", "Mrs", "Ms", "Prof"}
 4
 5      # tokenize takes a string as input and returns List[Tuple[str, Tuple[int, int]]]: each token with its (start_char, end_char)
 6      tokens = tokenize(x.text)
 7
 8      spans = []
 9      i = 0
10
11      while i < len(tokens):
12          if tokens[i][0] in titles:
13              idx = i + 1
14
15              while idx < len(tokens) and tokens[idx][0][:1].isupper():
16                  if idx == i + 1:
17                      spans.append(tokens[idx][1])
18                  else:
19                      spans[-1] = (spans[-1][0], tokens[idx][1][1])
20                  idx += 1
21
22              i = idx
23          else:
24              i += 1
25
26      return x.uuid, generate_labels(x, spans, "PER", "ENTITY")

In the example function above, we search for prefixes that potentially suggest a person entity after the prefix token. For example, if we encounter the token "Mr." in any sentence, it is very likely that the following words represent a person entity. This labeling function checks all subsequent tokens that start with an uppercase letter and assigns them the PER label.

The first line of the code snippet comprises a decorator supplied by the Snorkel library, which is applied to every labeling function we write. The decorator executes any preprocessing on the input data point before forwarding it to the labeling function, and it incorporates metadata utilized by the Snorkel aggregator in the downstream pipeline. On line 2, the labeling function accepts one input parameter, a pandas Series object that contains the sentence to be annotated; the labeling function can utilize all attributes of the input parameter to label the sentence. Line 3 defines the potential title prefixes for person entities. Line 6 tokenizes the input sentence according to the tokenization scheme specified by the labeling function. On line 8, the list "spans" is initialized; it collects all the spans of person entities identified by the labeling function. Lines 9 to 24 implement the primary logic for recognizing person entities using the specified prefixes: the function verifies whether a title appears in the sentence and then searches for words beginning with uppercase letters among the words following the captured title, continuing as long as such words are found. Each captured word's span location is added to the "spans" list, which is ultimately passed to the generate_labels abstraction. This abstraction applies the common tokenization scheme to the labeling function output, as outlined in Section 4.3, and returns the unique ID associated with the input data point to track sentences in the label matrix.

4.2 Labels and Labeling Functions
We use the open-source library Flair [1] for preprocessing the input text. Flair provides some good baseline labels such as person, location, organization, and product. However, the accuracy of extracting entities with these labels is not ideal, especially for a specialized project targeting financial documents, as these models are trained on generic NER datasets. Thus, we add more labeling functions to improve the accuracy of extracting person, location, and organization entities from financial documents. The details of the labeling functions used for the NER task are available in Table 3.

For preprocessing the input sentences, we apply a common tokenization function using the Stanza [24] Python library. This allows all labeling functions to work on a uniformly tokenized sentence when assigning labels, rather than inefficiently delegating the task of custom tokenization to each labeling function, and it makes the aggregated final results convenient to use and compare for benchmark accuracy. Through our set of labeling functions, we try to mitigate two complementary problems in weak-supervision frameworks:

Label Incompleteness. Most human-written labeling functions provide high precision but leave a considerable amount of data without any labels. To address this problem we add Flair, which significantly increases the total amount of data being annotated and increases overall recall.

Noisy Labels. To improve the precision of the overall weak-supervision given the noisy nature of some of the labeling functions, we modify the Flair labeling functions and add our own labeling functions, collectively referred to as FiNER-LFs.

4.3 Aggregation of Weak Sources of Signals
Once we apply all labeling functions to the input dataset, we use the Snorkel model to aggregate the labels. Before we can do any aggregation, however, there is one more challenge to address. As discussed before, each labeling function takes a sentence as input and generates a label for each token of the sentence. However, how an individual labeling function tokenizes the sentence before assigning labels to each token is not fixed by default. Labeling functions are usually written by users who generally do not know how the underlying aggregation works or how they should tokenize the input sentence.

LOC entities:
• label_loc_modified_flair (Flair): Modified version of Flair's location LF.
• label_loc_heuristic_keywords (Heuristic): Finds locations using keywords ("state of", "district of", "near", "based in").
• label_loc_senator (Heuristic): Finds descriptors ("D-location", "R-location") which indicate political and location affiliation for U.S. lawmakers.
• label_loc_based (Heuristic): Finds a "location-based" phrase as an indicator of a location.

ORG entities:
• label_org_modified_flair (Flair): Modified version of Flair's organization LF.
• label_org_heuristic_role (Heuristic): If a role ("CEO", "COO", "CTO", "CXO", "CFO", "CIO", "CCO", "CHRM", "CSO", "CGO", "CAO", "CMO", "CDO") is detected in "role of", the next two tokens are tagged as an organization.
• label_org_heuristic_abbr (Heuristic): Detects an abbreviation usually used at the end of an organization's legal name ("Assn", "Co", "Corp", "Dept", "Inc", "Ltd", "LLC", "St", "US", "Univ"); if any of these is detected, the previous two words are tagged as an organization.
• label_org_heuristic_partner (Heuristic): Detects the phrase "partnered with" and tags the next two words as an organization.
• label_org_heuristic_trademark (Heuristic): Detects the phrase "trademarks of" and tags the next two words as an organization.
• label_org_heuristic_suffix (Heuristic): Labels organizations by identifying common suffixes ("LLC", "Ltd", "Inc", "Co", "Bank", "Corporation", "Company", "Incorporated", "Limited", "Association", "Board").

PER entities:
• label_per_modified_flair (Flair): Modified version of Flair's person LF.
• label_per_heuristic_suffix (Heuristic): Detects a common suffix for names ("CPA", "DDS", "Esq", "JD", "Jr", "LLD", "MD", "PhD", "Ret", "RN", "Sr", "DO"); if any of these is detected, the previous two tokens are tagged as a person.
• label_per_heuristic_exec_title (Heuristic): Finds a person based on executive titles ("CEO", "CFO", "CTO", "CIO", "President", "Chairman").

Table 3: FiNER Labeling Functions grouped by entity types and labeling function types

We therefore give each labeling function the flexibility to perform tokenization however it needs and then to label each token based on its own tokenization.

However, if we provide such flexibility to our labeling functions, we cannot perform the aggregation well, because we need all labeling functions to output labels for tokens that follow the same tokenization scheme.

Our framework provides one more abstraction to deal with this issue. This abstraction assumes a common tokenization scheme and performs the necessary merging or splitting of the tokens produced by each individual labeling function. The following example demonstrates how this abstraction aligns the tokenization schemes of different labeling functions and then outputs the common tokenization for each function. Once all labeling functions follow the same tokenization scheme, we can easily aggregate them using the Snorkel aggregator.

Example sentence:
"The F-150, which has historically been a key driver of Ford's profits, didn't contribute much to its 2Q15 profits."

Common tokenization to enforce:
[The, F, -, 150, ,, which, has, historically, been, a, key, driver, of, Ford, 's, profits, ,, did, n't, contribute, much, to, its, 2Q15, profits, .]

Labeling function tokenization:
[The, F-150, ,, which, has, historically, been, a, key, driver, of, Ford, 's, profits, ,, did, n't, contribute, much, to, its, 2Q15, profits, .]

Labeling function tokenization after enforcing the common tokenization scheme:
[The, F, -, 150, ,, which, has, historically, been, a, key, driver, of, Ford, 's, profits, ,, did, n't, contribute, much, to, its, 2Q15, profits, .]

It is evident that the token 'F-150' is a single token in this particular labeling function's tokenization scheme; however, the common tokenizer has split it into three tokens. Consequently, to ensure consistency in the tokenization scheme, we partition the 'F-150' token into the distinct tokens "F", "-", and "150". Furthermore, the labels assigned by the labeling function are adjusted to reflect this alteration: if any token is split into multiple tokens, the initial token is designated with a B- label and the succeeding tokens receive an I- label.

Likewise, a similar adjustment can be made in the reverse direction, in the case where the labeling function partitions a token that is not split by the common tokenizer. In this scenario, we combine the multiple tokens into a single token within the labeling function output and assign the label of the initial token to the combined term.

The process of tokenizing the input sentence using a common tokenization scheme is demonstrated in the example above. Generally, we use a state-of-the-art tokenizer here, as we want our common tokenization to be as good as possible. Upon establishing the common tokenization scheme, the token spans labeled by each labeling function are updated to conform to it. In cases where a word within a sentence is separated by the common tokenizer but not by the labeling function, the token span from the labeling function is split into two or more spans. Conversely, if a word is separated by the labeling function tokenizer but not by the common tokenizer, the token spans from the labeling function are merged into a single token span. This merging and splitting of token spans as required ensures that all labeling functions adopt the same tokenization.

Upon completion of these adjustments, we obtain a label matrix that contains a label assigned to each token by each labeling function. We then apply Snorkel, which executes the label aggregation process and generates a predicted label for each token. These labels are referred to as "Final NER Data" in our pipeline. A minimal sketch of the span-splitting step is shown below.
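The split direction of the alignment can be sketched as follows (the merge direction is analogous); the function and variable names are illustrative rather than FiNER's exact code, and labels use the ORG_B/ORG_I-style tags of FiNER-ORD.

def split_to_common(lf_tokens, common_tokens, lf_labels):
    """Align LF labels to the common scheme when the common tokenizer splits finer.

    Both token lists hold (text, (start_char, end_char)) pairs over the same
    sentence; lf_labels holds one tag per LF token (e.g. "ORG_B" or "O").
    """
    aligned = []
    for text, (start, end) in common_tokens:
        label = "O"
        for (_, (lf_start, lf_end)), lf_label in zip(lf_tokens, lf_labels):
            if lf_start <= start and end <= lf_end:
                # Later pieces of a split token are demoted from B- to I-.
                if start > lf_start and lf_label.endswith("_B"):
                    label = lf_label[:-2] + "_I"
                else:
                    label = lf_label
                break
        aligned.append((text, label))
    return aligned

print(split_to_common([("F-150", (4, 9))],
                      [("F", (4, 5)), ("-", (5, 6)), ("150", (6, 9))],
                      ["ORG_B"]))
# [('F', 'ORG_B'), ('-', 'ORG_I'), ('150', 'ORG_I')]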
5 PERFORMANCE ANALYSIS

5.1 Experiments
We conduct three experiments with models tested on the FiNER-ORD test data. We use the "LF Suite + LF Aggregator" naming convention to indicate the components of the model architecture used in the weak-supervision experiments. Each weak-supervision model requires an LF Suite component, which indicates the set of labeling functions applied to the unlabeled raw text input. When different labeling functions may produce conflicting predictions for an entity type, the LF Aggregator indicates the voting technique applied to make a final label prediction for each token. We conduct all experiments using PyTorch [23] on an NVIDIA RTX A6000 GPU.

We test one weak-supervision model with only an LF Suite component. Vanilla Flair uses the three out-of-the-box Flair LFs, which label PER, LOC, and ORG entities, respectively. Each of these three LFs predicts mutually exclusive labels for its specific entity type, so no aggregation is needed on the labels they generate. We test two weak-supervision models with both LF Suite and LF Aggregator components. Having multiple LFs per entity type can be advantageous for recognizing an entity using different heuristics that cannot be encoded in a single LF. An aggregation method is required when applying FiNER-LFs, which contain multiple LFs for each entity type as shown in Table 3, because the different LFs for an entity type can predict conflicting labels. We experiment with FiNER-LFs + Snorkel WMV, which aggregates labels using Snorkel's Weighted Majority Vote (WMV), and FiNER-LFs + Majority Vote, which aggregates labels using a simple majority vote. The Snorkel WMV model is trained on the generated label matrix in an unsupervised way; we used 1000 epochs to train it. A sketch of both aggregation configurations follows.
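The two configurations can be sketched with Snorkel's off-the-shelf voters as below; the flattened label matrix is a toy example, while the 1000-epoch training follows the description above.

import numpy as np
from snorkel.labeling.model import LabelModel, MajorityLabelVoter

# Toy flattened (n_sentences * n_tokens, n_LFs) label matrix; -1 = abstain.
L_flat = np.array([
    [0, 0, -1],
    [1, -1, 1],
    [-1, 2, 2],
    [0, 1, 0],
])

# FiNER-LFs + Majority Vote: simple per-token majority over the LF votes.
majority_preds = MajorityLabelVoter(cardinality=3).predict(L=L_flat)

# FiNER-LFs + Snorkel WMV: the label model learns per-LF weights from the
# unlabeled matrix itself and is trained for 1000 epochs, as in the paper.
label_model = LabelModel(cardinality=3, verbose=False)
label_model.fit(L_train=L_flat, n_epochs=1000, seed=42)
wmv_preds = label_model.predict(L=L_flat)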
5.2 Results
Vanilla Flair achieves an average weighted F1-score of 0.7924. As hypothesized, having multiple LFs for each entity type and aggregating them helps recognize entity types via the various heuristics encoded in the LFs, giving the weak-supervision models with aggregated LFs comparable or better performance than the weak-supervision model with only three mutually exclusive LFs. FiNER-LFs + Majority Vote achieves a comparable average weighted F1-score of 0.7934, and FiNER-LFs + Snorkel WMV achieves a slightly improved average weighted F1-score of 0.7948.

5.3 Ablation Study
To understand why performance on the location and organization categories is lower, and how one might improve it in future work, we present a confusion matrix for the best-performing model (FiNER-LFs + Snorkel WMV) in Table 5. Upon reviewing the matrix, we discovered that many location tokens were incorrectly labeled as organizations. This may be because organization names often include location information in the training data, and because the organization category has more than double the number of tokens of the location category. To improve the model's performance further, researchers can explore adding more labeling functions specifically designed to resolve location-organization conflicts.

6 CONCLUSION
This paper presents a novel labeled named entity recognition (NER) dataset, FiNER-ORD, generated from financial news articles. We compare our dataset with the existing financial NER dataset and demonstrate its qualitative and quantitative importance. Additionally, we extend the existing open-source weak-supervision pipeline to span-level labeling. Our proposed FiNER weak-supervision framework is scalable and open-source, and it includes an easy-to-use API. To evaluate the proposed framework, we benchmark various configurations of weakly-supervised models on the newly created dataset. The performance analysis shows that the FiNER-LFs aggregated using Snorkel's weighted majority vote perform best, achieving a weighted average F-1 score of 0.7948.

Model (LF Suite + LF Aggregator)   PER_B   PER_I   LOC_B   LOC_I   ORG_B   ORG_I   Weighted F-1
Vanilla Flair + NA                 0.8943  0.9012  0.8361  0.6387  0.7652  0.7146  0.7924
FiNER-LFs + Majority Vote          0.9180  0.9322  0.8297  0.6455  0.7636  0.6919  0.7934
FiNER-LFs + Snorkel WMV            0.9196  0.9348  0.8297  0.6455  0.7610  0.6998  0.7948
Table 4: Performance comparison of various weak-supervision models tested on the FiNER-ORD dataset. Values are F-1 scores averaged over three different random seed trials; the last column is the weighted average F-1.

Actual \ Predicted    Other   PER_B   PER_I   LOC_B   LOC_I   ORG_B   ORG_I   Recall
Other                 23792       4       1      19       3     149     150   0.9865
PER_B                     7     263       2       1       0      11       2   0.9196
PER_I                     5       8     165       0       0       0       5   0.9016
LOC_B                    23       8       0     246       1      16       7   0.8173
LOC_I                    25       0       1      19      61       2      12   0.5083
ORG_B                    49       3       1       7       0     457      27   0.8401
ORG_I                    37       0       0       0       4      22     310   0.8311
Precision            0.9939  0.9196  0.9706  0.8425  0.8841  0.6956  0.6043
Table 5: Confusion matrix for the best performing model (FiNER-LFs + Snorkel WMV) on the FiNER-ORD test sample for seed=42.

LIMITATIONS AND FUTURE WORK
Our NER model encounters difficulties in cases where a location (LOC) is part of an organization (ORG) entity. For instance, in the phrase "Google India," "India" is a location, but it is labeled as part of an organization in our framework, as we do not permit overlapping entity labels. Moreover, we do not include a miscellaneous (MISC) label in our label set. Our pipeline also inherits limitations from Snorkel, including that a labeling function cannot leverage knowledge from sentences other than the one it is currently processing. Additionally, our labeling functions do not utilize the output of other labeling functions when assigning labels. We urge researchers to expand on our work and address these limitations.

ACKNOWLEDGMENTS
We appreciate the generous support of Azure credits from Microsoft made available for this research via the Georgia Institute of Technology Cloud Hub. We would like to thank Manan Jagani, Darsh Rank, Visaj Shah, Nitin Jain, Roy Gabriel, Olaolu Dada, and Gabriel Shafiq for their contributions to the project in its initial stage.

A ANNOTATION GUIDE
The manual annotation process to create FiNER-ORD consisted of ingesting the financial news articles into Doccano. Entities of type person (PER), organization (ORG), and location (LOC) were identified according to the rules described below. Some well-known names for these entities were obvious, while others were confirmed by researching the names to identify the correct entity type.

A.1 Person Entities
PER entities were identified by their first name and/or last name. In the examples below, bolded spans represent a single person entity. In the case where a person was identified by their first and last name, the entire name was labeled as PER, and the post-processing script tagged the first name as PER_B and the last name as PER_I. Words like President, Ms, and CEO were not labeled as part of the PER entity but help indicate a PER entity. In a context indicating possession with 's, the name up to the 's was tagged.
• President Obama
• CEO Phyllis Wakiaga
• Ms Wakiaga
• Bill Clinton's

A.2 Location Entities
LOC entities primarily consisted of names of continents, countries, states, cities, and addresses. In the examples below, bolded spans represent spans comprising LOC entities. Commas in addresses were not included in tagged LOC entities; when tagging addresses, each complete span delimited by a comma was tagged as a separate LOC entity. In a context indicating possession with 's, the name up to the 's was tagged. In the case where a location was identified by multiple space-delimited tokens, the entire name was labeled as LOC, and the post-processing script tagged the first token as LOC_B and the last as LOC_I. Words such as "Kenyan" were treated as adjectives and thus not labeled as LOC entities. When discussing a lawmaker's political and location affiliation, examples such as R-Texas (denoting "Republican from Texas") are encountered, in which only the location name, such as Texas, is tagged.
• Asia
• US
• India
• United States
• Beijing
• New York
• Redwood City, California
• Kenya's
• Mombasa Road
• R-Texas

A.3 Organization Entities
ORG entities consist of examples such as company names, news agencies, government entities, and abbreviations such as stock exchange names and company stock tickers. Punctuation marks such as hyphens are included when tagging an ORG entity. As designated, .com is included in the identified company's name. In a context indicating possession with 's, the name up to the 's was tagged.
• Wal-Mart
• China Resources SZITIC Trust Co Ltd
• Wall Street Journal
• Atlanta Federal Reserve
• Morgan Stanley's
• Delta Air Lines
• DAL
• NYSE
• Amazon.com

REFERENCES
[1] Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, and Roland Vollgraf. 2019. FLAIR: An easy-to-use framework for state-of-the-art NLP. In NAACL 2019, 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). 54-59.
[2] Julio Cesar Salinas Alvarado, Karin Verspoor, and Timothy Baldwin. 2015. Domain adaption of named entity recognition to support credit risk assessment. In Proceedings of the Australasian Language Technology Association Workshop 2015. 84-90.
[3] Dogu Araci. 2019. Finbert: Financial sentiment analysis with pre-trained language models. arXiv preprint arXiv:1908.10063 (2019).
[4] Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media.
[5] Sudheer Chava, Wendi Du, Agam Shah, and Linghang Zeng. 2022. Measuring firm-level inflation exposure: A deep learning approach. Available at SSRN 4228332 (2022).
[6] Aitao Chen, Fuchun Peng, Roy Shan, and Gordon Sun. 2006. Chinese named entity recognition with conditional probabilistic models. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing. 173-176.
[7] Dawei Cheng, Fangzhou Yang, Xiaoyang Wang, Ying Zhang, and Liqing Zhang. 2020. Knowledge Graph-Based Event Embedding Framework for Financial Quantitative Investments. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2221-2230.
[8] Walter Daelemans, Jakub Zavrel, Antal van den Bosch, and Ko van der Sloot. 2002. MBT: Memory-Based Tagger version 1.0 Reference Guide. ILK Technical Report ILK-0209, University of Tilburg, The Netherlands (2002).
[9] Leon Derczynski, Eric Nichols, Marieke Van Erp, and Nut Limsopatham. 2017. Results of the WNUT2017 shared task on novel and emerging entity recognition. In Proceedings of the 3rd Workshop on Noisy User-generated Text. 140-147.
[10] Dimitra Farmakiotou, Vangelis Karkaletsis, John Koutsias, George Sigletos, Constantine D Spyropoulos, and Panagiotis Stamatopoulos. 2000. Rule-based named entity recognition for Greek financial texts. In Proceedings of the Workshop on Computational Lexicography and Multimedia Dictionaries (COMLEX 2000). 75-78.
[11] Fuli Feng, Moxin Li, Cheng Luo, Ritchie Ng, and Tat-Seng Chua. 2021. Hybrid Learning to Rank for Financial Event Ranking. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 233-243.
[12] Radu Florian, Abe Ittycheriah, Hongyan Jing, and Tong Zhang. 2003. Named entity recognition through classifier combination. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003. 168-171.
[13] Xiaoya Li, Xiaofei Sun, Yuxian Meng, Junjun Liang, Fei Wu, and Jiwei Li. 2019. Dice loss for data-imbalanced NLP tasks. arXiv preprint arXiv:1911.02855 (2019).
[14] Yinghao Li, Pranav Shetty, Lucas Liu, Chao Zhang, and Le Song. 2021. BERTifying the hidden Markov model for multi-source weakly supervised named entity recognition. arXiv preprint arXiv:2105.12848 (2021).
[15] Yinghao Li, Le Song, and Chao Zhang. 2022. Sparse Conditional Hidden Markov Model for Weakly Supervised Named Entity Recognition. arXiv preprint arXiv:2205.14228 (2022).
[16] Chen Liang, Yue Yu, Haoming Jiang, Siawpeng Er, Ruijia Wang, Tuo Zhao, and Chao Zhang. 2020. Bond: Bert-assisted open-domain named entity recognition with distant supervision. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1054-1064.
[17] Pierre Lison, Jeremy Barnes, and Aliaksandr Hubin. 2021. skweak: Weak Supervision Made Easy for NLP. arXiv preprint arXiv:2104.09683 (2021).
[18] Tianyu Liu, Yuchen Jiang, Nicholas Monath, Ryan Cotterell, and Mrinmaya Sachan. 2022. Autoregressive Structured Prediction with Language Models. arXiv:2210.14698 [cs.CL]. https://arxiv.org/abs/2210.14698
[19] Zhiqiang Ma, Steven Pomerville, Mingyang Di, and Armineh Nourbakhsh. 2020. SPot: A Tool for Identifying Operating Segments in Financial Tables. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2157-2160.
[20] Andrew McCallum and Wei Li. 2003. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. (2003).
[21] Andrei Mikheev, Marc Moens, and Claire Grover. 1999. Named entity recognition without gazetteers. In Ninth Conference of the European Chapter of the Association for Computational Linguistics. 1-8.
[22] Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. 1003-1011.
[23] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 8024-8035. http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
[24] Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D Manning. 2020. Stanza: A Python natural language processing toolkit for many human languages. arXiv preprint arXiv:2003.07082 (2020).
[25] Alexander Ratner, Stephen H Bach, Henry Ehrenberg, Jason Fries, Sen Wu, and Christopher Ré. 2017. Snorkel: Rapid training data creation with weak supervision. In Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases, Vol. 11. NIH Public Access, 269.
[26] Alexander Ratner, Stephen H Bach, Henry Ehrenberg, Jason Fries, Sen Wu, and Christopher Ré. 2020. Snorkel: Rapid training data creation with weak supervision. The VLDB Journal 29, 2 (2020), 709-730.
[27] Erik F Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. arXiv preprint cs/0306050 (2003).
[28] Ramit Sawhney, Shivam Agarwal, Megh Thakkar, Arnav Wadhwa, and Rajiv Ratn Shah. 2021. Hyperbolic Online Time Stream Modeling. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1682-1686.
[29] Burr Settles. 2004. Biomedical named entity recognition using conditional random fields and rich feature sets. In Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP). 107-110.
[30] Pratvi Shah, Arkaprabha Banerjee, Agam Shah, Bhaskar Chaudhury, and Sudheer Chava. 2022. Numerical Claim Detection in Finance: A Weak-Supervision Approach. (2022).
[31] Raj Sanjay Shah, Kunal Chawla, Dheeraj Eidnani, Agam Shah, Wendi Du, Sudheer Chava, Natraj Raman, Charese Smiley, Jiaao Chen, and Diyi Yang. 2022. WHEN FLUE MEETS FLANG: Benchmarks and Large Pre-trained Language Model for Financial Domain. arXiv preprint arXiv:2211.00083 (2022).
[32] Shuwei Wang, Ruifeng Xu, Bin Liu, Lin Gui, and Yu Zhou. 2014. Financial named entity recognition based on conditional random fields and information entropy. In 2014 International Conference on Machine Learning and Cybernetics, Vol. 2. IEEE, 838-843.
[33] Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, and Kewei Tu. 2021. Automated Concatenation of Embeddings for Structured Prediction. In Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021). Association for Computational Linguistics.
[34] Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, and Kewei Tu. 2021. Improving named entity recognition by external context retrieving and cooperative learning. arXiv preprint arXiv:2105.03654 (2021).
[35] Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, and Ann Houston. 2013. OntoNotes Release 5.0. https://doi.org/11272.1/AB2/MKJJ2R

[36] Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, and Yuji Matsumoto. 2020. LUKE: Deep contextualized entity representations with entity-aware self-attention. arXiv preprint arXiv:2010.01057 (2020).
[37] Yue Yu, Simiao Zuo, Haoming Jiang, Wendi Ren, Tuo Zhao, and Chao Zhang. 2020. Fine-tuning pre-trained language model with weak supervision: A contrastive-regularized self-training approach. arXiv preprint arXiv:2010.07835 (2020).
[38] Man-Ching Yuen, Irwin King, and Kwong-Sak Leung. 2011. A survey of crowdsourcing systems. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing. IEEE, 766-773.
[39] Jieyu Zhang, Cheng-Yu Hsieh, Yue Yu, Chao Zhang, and Alexander Ratner. 2022. A survey on programmatic weak supervision. arXiv preprint arXiv:2202.05433 (2022).
[40] Rongzhi Zhang, Yue Yu, Pranav Shetty, Le Song, and Chao Zhang. 2022. PRBoost: Prompt-Based Rule Discovery and Boosting for Interactive Weakly-Supervised Learning. arXiv preprint arXiv:2203.09735 (2022).
[41] GuoDong Zhou and Jian Su. 2002. Named entity recognition using an HMM-based chunk tagger. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 473-480.
