A Survey On Deep Learning For Patent Analysis
ARTICLE INFO
Keywords: Deep learning; Patent analysis; Text mining; Natural language processing
ABSTRACT
Patent document collections are an immense source of knowledge for research and innovation communities worldwide. The rapid growth of the number of patent documents poses an enormous challenge for retrieving and analyzing information from this source in an effective manner. Based on deep learning methods for natural language processing, novel approaches have been developed in the field of patent analysis. The goal of these approaches is to reduce costs by automating tasks that previously only domain experts could solve. In this article, we provide a comprehensive survey of the application of deep learning for patent analysis. We summarize the state-of-the-art techniques and describe how they are applied to various tasks in the patent domain. In a detailed discussion, we categorize 40 papers based on the dataset, the representation, and the deep learning architecture that were used, as well as the patent analysis task that was targeted. With our survey, we aim to foster future research at the intersection of patent analysis and deep learning and we conclude by listing promising paths for future work.
1. The patent life-cycle and tasks for automation
The patent life-cycle requires experts with domain-specific knowledge at all its different stages. First, when companies or individual inventors wish to obtain patent protection for an invention, they need to specify the invention in textual form and as illustrations in a patent application. Typically, such a patent application is drafted by a patent attorney, who has both a technical and legal background. It contains claims that define the desired scope of protection for the invention. A patent will only be granted in case these claims specify a subject-matter that is new and inventive over prior art. Hence, already in this early stage of drafting a patent application, it is beneficial to conduct a cursory prior art search to assess the granting chances and eventually to adapt the claim wording.
However, it can be challenging to retrieve relevant prior art, i.e., previous publications related to the invention in question. Other inventors who had the same idea before might have used different words to describe it. A simple keyword search is therefore not necessarily successful. In particular, computer-implemented inventions as non-tangible products are often described by generic terms such as “system”, “means”, “modules” etc. In order to better structure the entirety of patent applications and granted patents, the patent offices classify newly filed patent applications based on the field of invention. As a result, every published patent application and granted patent is labeled with one or several class codes. Such hierarchical classification schemes, e.g., the International Patent Classification (IPC), may later help to limit the search to the relevant field of invention.
After a patent application has been filed with a patent office, a technically skilled examiner assesses the patentability of the described invention. In particular, the examiner carries out a prior art search and evaluates whether the invention defined in the claims is new and inventive (i.e., not obvious) over the found prior art. In this regard, the examiner mostly cites older published patents or patent publications rather than books or conference papers as prior art. The main reasons for citing mostly other patents and patent applications are the large size of patent data (according to the European Patent Office (EPO) currently 110 million documents¹), the standardized structure of the patent data in patent classes, and the unquestionable publication date of each patent document.
In case the examiner finds that the claimed invention is not new or not inventive, the applicant has the opportunity to further specify the invention defined in the claims. In this way, the claimed invention may
* Corresponding author.
E-mail addresses: [email protected] (R. Krestel), [email protected] (R. Chikkamath), [email protected] (C. Hewel), [email protected] (J. Risch).
1 https://worldwide.espacenet.com.
https://doi.org/10.1016/j.wpi.2021.102035
Received 14 October 2020; Received in revised form 18 February 2021; Accepted 7 March 2021
Available online 30 March 2021
0172-2190/© 2021 Elsevier Ltd. All rights reserved.
R. Krestel et al. World Patent Information 65 (2021) 102035
be sufficiently delimited from the relevant prior art. However, the applicant must not add any new information to the originally filed application. In case the patent examiner can be convinced, a patent is granted.
Independent of the outcome of the examination, a patent application is published 18 months after its filing date. Accordingly, the applicant has to disclose its invention to the public even if finally no patent might be granted. In case a patent is granted, said granted patent is published as an additional document. Moreover, since the applicant may obtain patent protection for an invention in several countries, multiple national patent applications and patents with substantially the same content may be published. A granted patent may be enforced against a competitor’s products in patent litigation proceedings. As a defence strategy, the competitor may attack the validity of the patent. In particular, the competitor usually tries to find relevant prior art that has not been revealed during the examination proceedings. In case a novelty-destroying prior art document is found, a patent can still be nullified in post-grant proceedings.
1.1. Patent analysis tasks
The described patent life-cycle comprises a set of tasks that can be (at least) partially automated. These tasks are often summarized under the term “patent analysis”. From the literature, we identified the most popular tasks for automatic patent analysis. They can be classified into:
1. Supporting tasks, such as pre-processing, extracting information for further analysis, or translating patents to other languages;
2. Patent classification, where patent documents are categorized hierarchically based on the field of invention;
3. Patent retrieval, which branches into prior art search, automated patent landscaping, infringement search, Freedom-to-Operate search, and passage retrieval;
4. Patent valuation and market value prediction, an innovative line of research where the content and bibliographic details of patents are analyzed with certain human-made protocols to assess the quality of patent applications; this analysis is further used to estimate market value and is solved as a regression problem;
5. Technology forecasting, where patents are used to assess a technology landscape and which helps researchers to capture new or trendy technologies;
6. Patent text generation, where the structure and styles incorporated in published patent documents are used to automate the process of writing patent claims;
7. Litigation analysis, which concerns legal processes where patents lead to a dispute or litigation between two companies by prohibiting the development of business strategies;
8. Computer vision tasks, which work with figures and drawings from patent documents instead of text.
1.2. Patent offices
The ever growing number of patent applications leaves especially the patent offices in dire need for automation. The potential of deep learning or machine learning in general to automate some of the necessary processing was recognized by major patent offices and AI initiatives were established.² Table 1 lists some of the efforts undertaken by EPO, United States Patent and Trademark Office (USPTO), and World Intellectual Property Organization (WIPO). Research on deep learning for the patent domain is thus not only conducted by academia and industry but is actively fueled by the major patent offices by providing benchmark datasets and organizing challenges and workshops.
1.3. Related surveys
To foster interdisciplinary research, a couple of surveys on machine learning in the patent domain have been published in the past. Joho et al. [43] provide a descriptive analysis of the patent literature and list various requirements of patent search and functionalities of patent users adapted during analysis. Their survey also identifies the demographics of patent specialists and their relation in performing patent search tasks. Based on the number of patent practitioners and approaches utilized in literature, they outline an ideal patent search system. Abbas et al. [1] summarize text-mining and visualization-based approaches used for patent analysis. Patent collections are rich in structured and unstructured text content which requires intelligent tools to accomplish efficient patent analysis. The study focuses on a taxonomy of tools and techniques to retrieve and analyze the data. The study also discloses the drawbacks associated with approaches that utilize only semantics-based techniques. Data mining techniques for patent analysis were surveyed by Zhang et al. [101], who critically point out technical issues associated with patent mining tasks. For classification, visualization, search, and evaluation tasks, not only technical issues but also solutions proposed in related work are comprehensively discussed. The authors also identified challenges for end-user applications related to patent mining. A recent survey reports on intellectual property analysis as a special type of data science [6]. It refers primarily to non-deep-learning machine learning approaches and introduces four main categories of analysis: knowledge management, technology management, information extraction, and economic value prediction. Another survey [88], which focuses on patent retrieval, quantitatively compares different approaches using benchmark datasets. The retrieval methods discussed are all query reformulation methods and are grouped into: keyword-based, pseudo-relevance feedback, semantics-based, metadata-based, and interactive methods. The retrieval tasks and datasets are described in-depth but deep learning approaches to tackle this task are not presented.
To the best of our knowledge, no survey focuses on deep learning approaches for different patent analysis tasks. Therefore, with this article, we intend to fill this gap and present an overview of datasets, text representation methods, and deep neural network architectures used for various patent analysis tasks.
2. Datasets
Training data are an essential ingredient for machine learning in general and for deep learning in particular. Publicly available datasets facilitate access and thus research in the patent domain. Table 2 gives an overview of which papers make use of which data collection. The different patent datasets are listed in Table 3. For a more detailed description of available datasets and benchmark collections, see, e.g., Refs. [88] or [66].
2.1. Granted patents and patent applications
The large patent offices, namely USPTO and EPO, provide various datasets in addition to the actual granted patents and applications. There are also specific collections serving as benchmark datasets, such as USPTO2M, which comprises the titles, abstracts, and class labels of 2 million granted patents and contains around 235 million tokens, available in JSON format. With 5 million full-text patent publications from 1976 to 2016, USPTO5M is an even larger dataset, which contains around 38 billion tokens. WIPO released excerpts extracted from patent documents in several datasets in XML format. Its most recent dataset, WIPO-delta-en, is from 2019 and comprises an unprecedented set of 55 million excerpts. EPO has patents in English, French, and German and publishes various data products for research. Other national patent offices also publish patent data, e.g., there has been research conducted involving documents from the Japanese, Chinese, and Russian patent offices.
2 https://www.wipo.int/about-ip/en/artificial_intelligence/search.jsp.
Table 1
Major patent offices have recognized the potential of deep learning and started initiatives to foster its application.
Task | EPO | USPTO | WIPO
Patent classification | Automatic pre-classification, re-classification using CPC scheme | Concept questioning using chat bots to assist automatic claim analysis and classification | Automatic categorization based on IPC main class, sub-class or main group
Prior art search | Gold standard generation, automatic annotation, search, and query generation | Search whole document against corpus of grants and pre-grants | Cognitive or semantic search using AI assisted tools
Data analysis | Develop open source libraries to explore data and technology trend analysis | Browser-based advanced patent analytics | Economic and strategic analysis
Patent examination | Automatic annotation and exclusion detection | Patent quality assurance using text analytics, advance data analytics for statistics description | Advance big data analysis to automate examination workflow
Image analysis | Automatic image and figure search | Image search for patents and trademarks | Image search within global brand database, patent search using images
Table 2
Data collections and the papers that make use of them.
USPTO | [3,14,18,23,24,32,34,36,39,41,45,49,54–59,61,75,80–82,87,102,104]
WIPO | [2,5,34,81,82]
EPO | [5,34,41]
NAT (national) | [34,47,48,60,63,67,70,103,105]
COLL (curated collections) | [2,40,41,58,86]
REP (reports) | [23,45,79,104]
LEGAL (post-grant documents) | [61,94]
Table 3
Sources of patent collections.
Dataset | Collection | Source
USPTO | Bulk | https://bulkdata.uspto.gov/
 | USPTO 2 M | http://mleg.cse.sc.edu/DeepPatent/index.html
WIPO | Bulk | https://www.wipo.int/classifications/ipc/en/ITsupport/Categorization/dataset/
EPO | Bulk | https://www.epo.org/searching-for-patents/data/bulk-data-sets.html
NAT | Chinese | https://oversea.cnki.net
 | Japanese | https://www.j-platpat.inpit.go.jp/
 | Russian | https://new.fips.ru/publication-web/?lang=en
COLL | NTCIR | http://research.nii.ac.jp/ntcir/data/data-en.html
 | CLEF-IP | http://www.ifs.tuwien.ac.at/~clef-ip/download-central.shtml
REP | KISTA | http://biz.kista.re.kr/patentmap/
 | Search EPO | https://www.epo.org/searching-for-patents/legal/register.html
LEGAL | Litigation | http://lexmachina.com; http://www.patexia.com/; http://legal.thomsonreuters.com
 | Various USPTO | https://developer.uspto.gov/data?search_api_fulltext=csv
Besides patent collections, there are also a couple of related documents that are explored in the context of patent analysis. Among them are trend and search reports, landscaping reports, or general reports about trending technologies.
KIPO Reports: In addition to the patent offices already mentioned, the Korean Intellectual Property Office (KIPO)⁶ is also active in patent analysis research. Along with patent documents, KIPO releases yearly trend reports (KISTA⁷) and landscaping reports (KIPRIS⁸).
Gartner’s Hype Cycle: Gartner’s Hype Cycle for Emerging Technologies⁹ describes technology trends and classifies them into different stages based on maturity. This can be used as ground truth for trend prediction [104].
EPO Search Reports: Search reports published by patent offices are a valuable resource especially for patent retrieval research. In particular, to make sure an invention is novel and non-obvious, search reports help by providing citations to the existing similar inventions [62]. These inventions may be described in patents or pending applications. The search reports can be utilized as a ground truth resource for training machine learning models for prior art search. They play an important role in the patenting process and document how the prior art search was conducted and what relevant documents were found by the examiner.¹⁰ According to the rich format citation standard of EPO and WIPO, the relevant passages are cited and the citation is categorized for each found prior art document. The categories indicate in which way the cited documents are relevant to the patent application’s claims, such as technical background (A), novelty-destroying (X), or inventive step (Y).¹¹
2.4. Post-grant documents
(e.g., the German federal court of justice). Post-grant office proceedings give third parties the possibility to challenge the validity of a granted patent, e.g., to request revocation due to novelty-destroying prior art which has not been found by the patent examiner. The documents originating from these proceedings may thus complement the search reports from the pre-grant examination proceedings. One example for these kinds of documents can be found in Rajshekhar et al. [77], who provide a data set of PTAB prior art references extracted from USPTO rulings.¹²
Post-grant court proceedings mostly concern litigation proceedings due to an alleged patent infringement. Accordingly, these documents relate to the question whether a contested product of a third party infringes a patent. Documents originating from court proceedings comprise court decisions (also referred to as case law) and party submissions (including evidence). Meanwhile, decisions are often publicly available (e.g., on a court’s website¹³), while party submissions are usually not published and only available under specific conditions (e.g., in Germany due to §299 ZPO). There also exist datasets related to patent litigation proceedings. For example, USPTO provides a dataset with detailed patent litigation data on more than 81,000 unique district court cases filed between 1963 and 2016.¹⁴ The data include a variety of information, including parties and attorneys, cause of action, location, and litigation history. In addition, commercial services exist that offer litigation data, which is used by researchers (see Table 3).
2.5. Copyright protection
Machine learning approaches rely on the availability of training data. Complex deep learning models require large amounts of data to learn and thus copyright issues need to be considered. The patent domain seems to be suited very well to deploy deep learning methods. On the one hand, there are huge amounts of publicly available patent documents that are often annotated by domain experts, e.g., with regard to the international patent classification (IPC) scheme. This saves additional, expensive, manual labeling of the data for training purposes. On the other hand, published patents and patent applications are generally excluded from copyright protection. That is to say, since such patent documents are published by a patent office, they have a public domain status which shall not be subject to any copyright claims. According to this general rule, anyone can reproduce published patent documents, at least as long as the content is not altered and the source is correctly cited, i.e., the patent (application) number.
However, there may exist exceptions to this general rule. Potential copyright claims may concern very different aspects in a patent document. Just to give some examples, ownership might be claimed for drawings of a third party, for specific boilerplate paragraphs used by a law firm, or for third-party text or drawings incorporated in a patent application which itself was protected by copyright. Moreover, even if there exist several multilateral international copyright treaties, copyrights basically constitute national rights. Hence, copyright protection can vary between the national jurisdictions (i.e., between the single countries). For example, any copyright claims with regard to a European patent application would be subject to the individual national laws of the member states of the European Patent Convention. A thorough analysis of the different national laws regarding potential copyright protection of patent documents goes beyond the scope of this survey. For this reason, only a brief summary of the regulations in some selected jurisdictions is given in the following, which confirm the above-mentioned general rule.
For example, in Germany and Switzerland patent documents (as published by the patent office) are explicitly excluded from copyright protection (Section 5(2) German UrhG, and Art. 5 d. Swiss CopA). In the UK, patent specifications published after 1989 may be reproduced for the purpose of “disseminating information”, but other uses are prohibited without a license from the copyright holder.¹⁵ In the USA, the text and drawings of a patent are typically not subject to copyright restrictions according to the USPTO,¹⁶ even if there are limited exceptions reflected in 37 CFR 1.71(d) & (e) and 1.84(s). At least for US patent applications it is thus possible to include a copyright notice or mask work notice (cf. 37 CFR 1.71 (d) and (e)). However, inclusion of such a copyright notice or mask work notice is only allowed if the applicant acknowledges at the same time that facsimile reproductions (i.e., photocopies) are allowed.
3. Representations
Patent documents consist of text data, metadata, such as citation information, and sometimes image data. These data need to be pre-processed into a suitable (vector) representation to allow deep learning models to use the data as input. Table 4 lists the various representations, which can be divided into representations learned by deep learning methods, such as word2vec [69], and “raw” input representations, as used primarily in traditional machine learning models.
Deep learning models can use raw input directly to learn a semantically meaningful representation given enough training data. The imprecise, labor-intensive step of extracting features is then unnecessary. These learned representations already contain valuable information so that they can be directly used for certain tasks and don’t require further processing by deep learning methods. Examples are clustering of patents based on their learned representations, or visualizing patent similarity by using dimensionality reduction [68] of the embedding space. Especially embeddings of words, i.e., the representation of words as dense vectors, can also be used as input for traditional machine learning methods [5,34,36,39,49,94,102,103].
Before deep learning revolutionized machine learning in many areas, extracting and selecting appropriate features was a major task and crucial processing step for traditional machine learning approaches. Especially text and image data needed to be transformed into features that could be used by methods such as support vector machines or decision trees. This type of features is still valuable when available, e.g., as metadata or citation information. However, in the context of deep learning, the raw input text or image data is typically represented by learned representations, usually called embeddings, which automates the cumbersome feature extraction and selection process. It is also possible to combine traditional features with embedding representations, either by combining the different input data or by combining the representations. In the deep learning for patent analysis literature, there are three types of traditional features used: numerical, citation, and raw image features. Alternatively and more common for deep learning approaches are different embeddings to represent the input data.
3.1. Numerical features
These are quantities that can be extracted either manually or automatically from patent data. Examples are citation counts [61,104], dates [61], number of claims [59,104] or other numeric quantities derived from metadata [59,61,104], but also categorical features represented as one-hot encoded vectors, e.g., Ref. [3] or class codes [3,61].
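As a rough illustration of the numerical and one-hot encoded features described in Section 3.1, the following Python sketch builds a single feature vector for one patent. The field names (`citation_count`, `num_claims`, `filing_year`, `class_code`), the example values, and the three-entry class vocabulary are assumptions made for this illustration only; they are not taken from any of the surveyed papers or datasets.

```python
from typing import Any, Dict, List, Sequence


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """Return a one-hot vector marking the position of `value` in `vocabulary`.

    A value that is not in the vocabulary yields an all-zero vector.
    """
    return [1.0 if value == entry else 0.0 for entry in vocabulary]


def build_feature_vector(patent: Dict[str, Any],
                         class_vocabulary: Sequence[str]) -> List[float]:
    """Concatenate numeric metadata with a one-hot encoded class code."""
    # Numeric quantities read from (hypothetical) metadata fields.
    numeric = [
        float(patent["citation_count"]),
        float(patent["num_claims"]),
        float(patent["filing_year"]),  # a date reduced to its year
    ]
    # Categorical feature: the class code over a fixed vocabulary.
    return numeric + one_hot(str(patent["class_code"]), class_vocabulary)


# Hypothetical example record and vocabulary (illustrative values only).
class_vocabulary = ["A61K", "G06F", "H04L"]
patent = {
    "citation_count": 12,
    "num_claims": 20,
    "filing_year": 2015,
    "class_code": "G06F",
}
features = build_feature_vector(patent, class_vocabulary)
print(features)
```

The sketch leaves the numeric values unscaled; whether and how such features are normalized before training depends on the model and is not specified here.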
12 https://data.world/wzadrozn/ptab-prior-art.
13 https://juris.bundesgerichtshof.de/cgi-bin/rechtsprechung/list.py?Gericht=bgh&Art=en&Sort=3.
14 https://www.uspto.gov/about-us/news-updates/patent-litigation-data-through-2016-now-available.
15 https://webarchive.nationalarchives.gov.uk/20140603113132/http://www.ipo.gov.uk/types/copy/c-other/c-other-faq/c-other-faq-type/c-other-faq-type-patspec.htm.
16 https://www.uspto.gov/terms-use-uspto-websites.
samples from real data [31]. Roughly speaking, the two sub-networks compete during the training process and thereby improve each other’s parameters. One major area of application for GANs is the generation of authentic looking, but artificially generated data. In the context of patent data, GANs have not been used for this task so far. One reason might be the low quality of the resulting texts. Even when solving the problem of the non-differentiable selection of the next token to generate a sentence with an RNN [100], the output of these GAN models is far away from genuine-looking texts. Besides generating texts, GANs can also be used to generate other types of data. There is an approach using GANs to generate the features of artificial samples to create more training data for standard machine learning approaches in the patent domain [104].
Autoencoder network: The main idea behind AEs [96] is to learn a condensed, lower-dimensional representation of the input data in an unsupervised fashion. To this end, the autoencoder tries to reconstruct the encoded data in a decoding step. It learns to represent the data in a way that keeps the reconstruction error minimal. In the patent domain, AEs have not been employed very frequently [45]. An extension to plain autoencoders are variational AEs [46], which allow generating data but haven’t been employed for patent data yet.
Transformer-based network: Transformer models [95] have been developed in the context of machine translation and consist of an encoder and a decoder. As such, they form the foundation of contextual word embedding models, such as BERT [27], based on a transformer’s encoder, or GPT [11], based on a transformer’s decoder. These models learn powerful representations (therefore categorized as “representation” in this survey). By exchanging the last layer of BERT with a task-specific layer, these transformer-based network architectures can be trained on different tasks. This step is called fine-tuning. Since the underlying representations are so powerful, one fully connected layer on top is sufficient to get very good results. In the patent domain, researchers have fine-tuned BERT for different tasks [23,54,56,79]. Some experiments have been conducted to use GPT to generate patent texts [55].
Besides these main concepts, there are a couple of further improvements and specializations. An important one is the attention mechanism [7], which allows neural networks to learn which part of the input is most relevant for the desired output. A side effect of the attention mechanism is that the words on which the attention is placed can serve as an explanation for the network’s output, e.g., to explain a classification decision. Attention can be used on top of convolutional or recurrent layers. It is especially popular in combination with sequence-to-sequence network architectures. The training of architectures based only on attention without an underlying LSTM or GRU layer can be better parallelized and thus enables the processing of more data in shorter time. Transformer-based networks make heavy use of attention, which allows them to learn huge models.
5. Patent analysis tasks
There are different patent analysis tasks that have been automated at least partially in the past. Table 6 lists different tasks together with the publications that propose deep learning methods to automate them.
5.1. Supporting tasks
There are a couple of tasks that can be considered supporting tasks, in the sense that they produce results that can be used to analyze patents in a later step. Among these supporting tasks, there are three broader areas that were investigated in the context of patent data employing deep learning methods: extraction of information from patents, segmentation of patent documents into semantically meaningful smaller parts, and translation of patents from a foreign language.
Extraction: Named entity extraction is a very important task in various domains, e.g., from news articles or tweets. In the patent domain, automatically extracting chemical named entities [102] or biomedical named entities [86] is particularly interesting. Not only entity mentions can be extracted, but also the relations between a pair of entities [18]. Besides entities, extracting general keywords from patent texts can also be very useful, e.g., for classification [39].
Segmentation: Patents or patent applications are semi-structured documents consisting of different sections. If patents are not available in electronic form, this structure might get lost. Deep learning based representations can be used to segment large OCRed text into predefined sections [14]. But even within a large section, such as the “description” section, text can often be further segmented, primarily into the part describing the invention and the part describing experiments [34].
Translation: If patents are only available in a certain language, translating these patents is the first step to further analyze them. Given the huge success of deep learning methods in the area of machine translation, it comes as no surprise that there has been research focusing on patent texts in the context of translation [40,47].
These supporting tasks are useful as a preparation step to then further analyze the results or use the results in subsequent steps, e.g., for classification.
5.2. Classification
Patent classification is the most prominent patent analysis task, where one needs to assign a classification code to a patent document based on IPC or CPC classification schemes. In practice, patent documents are analyzed manually and then the classification codes are assigned by the applicant and patent officers. These manual labeling tasks require domain expertise and are time-consuming. Besides traditional machine learning, deep learning techniques can be used for automatic patent classification. Since the classification schemes are hierarchical, different variants of the setting exist, e.g., only predicting the top-level class. This is the simplest version and in practice not very useful. More interesting settings require predicting the subclass up to a certain level. This setting was also used for large shared-task competitions (e.g., CLEF-IP 2010 [8]). Regarding the evaluation, there are also different variants possible. Given that a patent can have multiple subclasses assigned to it, three measures can be deduced [29]: The straightforward measure compares the top prediction with the main subclass assigned to the patent. Another measure compares the top three predictions with the main subclass. And the third measure compares the top prediction with the main class and the incidental subclasses assigned to the patent.
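The three evaluation measures for patent classification can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the evaluation code of [29] or of any surveyed paper: the function names, the dictionary keys (`ranked`, `main`, `incidental`), and the example subclass codes are hypothetical, and the third measure is read here as comparing the top prediction against the main subclass together with all incidental subclasses.

```python
from typing import Any, Dict, List, Sequence


def top_prediction_hit(ranked: Sequence[str], main: str) -> bool:
    """Measure 1: the top-ranked prediction equals the main subclass."""
    return len(ranked) > 0 and ranked[0] == main


def top_three_hit(ranked: Sequence[str], main: str) -> bool:
    """Measure 2: the main subclass is among the top three predictions."""
    return main in ranked[:3]


def any_assigned_hit(ranked: Sequence[str], main: str,
                     incidental: Sequence[str]) -> bool:
    """Measure 3: the top prediction equals the main or an incidental subclass."""
    return len(ranked) > 0 and ranked[0] in {main, *incidental}


def evaluate(samples: List[Dict[str, Any]]) -> Dict[str, float]:
    """Return the fraction of samples counted as correct under each measure."""
    if not samples:
        raise ValueError("at least one sample is required")
    hits = {"top_prediction": 0, "top_three": 0, "any_assigned": 0}
    for s in samples:
        hits["top_prediction"] += top_prediction_hit(s["ranked"], s["main"])
        hits["top_three"] += top_three_hit(s["ranked"], s["main"])
        hits["any_assigned"] += any_assigned_hit(
            s["ranked"], s["main"], s["incidental"])
    return {name: count / len(samples) for name, count in hits.items()}


# Hypothetical model output: predicted subclasses ranked by confidence.
samples = [
    {"ranked": ["H04L", "G06F", "H04W"], "main": "H04L", "incidental": ["G06F"]},
    {"ranked": ["G06F", "H04L", "G06N"], "main": "H04L", "incidental": ["G06F"]},
    {"ranked": ["A61K", "C07D", "A61P"], "main": "G06N", "incidental": []},
]
scores = evaluate(samples)
print(scores)
```

In this toy example the second patent is counted as wrong by the first measure but as correct by the other two, which shows how the more lenient measures credit predictions that a strict top-1 comparison would reject.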
Table 6
Patent analysis tasks and the papers that addressed them.
Analysis Task | Used in
SUP (supporting tasks) | [14,18,34,39,40,47,86,102,103]
CLASS (classification) | [2,32,56–58,63,67,80–82,87,105]
RETR (retrieval) | [3,5,23,36,45,49,57,63,75,79]
QUAL (quality analysis and market valuation) | [24,59]
TECH (technology forecasting) | [70,104]
GEN (data generation) | [54,55]
LIT (litigation analysis) | [61,94]
CV (computer vision) | [41,48,60]
In the context of machine learning for patent analysis, classification is by far the most popular task. One reason for this is the availability of large quantities of training data, i.e., patent documents with assigned class labels. Another reason is the straightforward setting: given a document, predict the subclass codes. The deep learning approaches differ therefore only slightly, e.g., with respect to the data they use as input (abstract, claims, metadata, etc.) or the network architecture (GRU, LSTM, etc.). Some approaches also explicitly model the hierarchy of the classification codes [80]. Besides predicting classes based on a classification scheme, other classification tasks are possible when analyzing patent data. One approach tries to classify citations into applicant-provided or examiner-provided [63]. Another approach uses
R. Krestel et al. World Patent Information 65 (2021) 102035
classification to train representations to improve clustering later on [75]. In general, predefined class labels can be used to learn semantically meaningful representations, since the class labels work as a very short summary of the patent itself [57].

5.3. Retrieval

Finding patents is important for a variety of reasons and with different intentions. In our description of the retrieval subtasks, we mostly follow Shalaby and Zadrozny [88], who provide a good overview of patent retrieval tasks. In addition, we include passage retrieval and clustering as further subtasks, since there are a couple of papers dedicated to them.
The most obvious subtask is finding prior art for a given patent (application). Finding patents that relate to a specific area or deal with a specific topic, often called landscaping, is another important retrieval task. Finding not patents as such but particular sentences or paragraphs within patent documents is called passage retrieval. Finally, a more general, indirect retrieval task is clustering patents: for each patent, the most similar patents need to be identified.
Prior art search: Prior art (in other words, state of the art or background art) is composed of all publicly available information that has been made accessible in any readable form prior to a given date and that might be similar to a patent’s claims. If prior art already describes the same invention as a newly filed application, then another patent on that invention cannot be granted. Patents mistakenly granted after the publication of such prior art can be revoked. There are different reasons for conducting a prior art search. Related work search needs to be done by the patent applicant to find and list related patents. Novelty detection, or patentability search, is carried out by patent applicants, patent examiners, patent attorneys, and patent agent professionals. They search in patents and patent application databases as well as in other scientific literature to determine the novelty of an invention. Novelty detection takes place before and after an inventor files a patent application. Validity detection tries to discover prior art overlooked by the patent examiner in order to invalidate a patent. A validity check is carried out by patent infringement entities or patent owners. They search in patents and patent application databases, other scientific literature, technical society websites, and archives, usually after a patent was issued. Infringement search, or freedom-to-operate search, is a special form of prior art search, where the purpose is to discover whether claims of patent applications and patents are infringed by any process or product. It is carried out by patent attorneys and professional patent searchers (often directed by attorneys), both in patents and patent application databases. This form of search is done before and after an inventor gets a grant. Traditionally, the whole process of prior art search was carried out through expert-generated queries or term-based search methods. These approaches consume a lot of human labour, require domain expertise, and are also often associated with sub-optimal results. To alleviate these problems, several deep learning approaches were proposed [5,36,49].
Landscaping: Related to prior art search is automated patent landscaping [3]. Patent landscaping helps in finding technology-related patents to avoid infringement issues and also to assess the trends in technology. A change in technology may have several implications for business, economy, and policies. Conducting a careful technology survey has been a complex and time-consuming process. Abood and Feltenberger [3] proposed an approach to patent landscaping using 5.9 million USPTO patent abstracts, citations, and CPC codes. The patent documents are further used to generate seed sets (curated by human experts), to perform feature extraction, and to create embeddings. Choi et al. [23] presented benchmark datasets from KISTA trend reports. They propose the use of graph embeddings based on metadata, such as USPC, IPC, or CPC codes. Another approach making use of citation information finds similar technology patents [63].
Passage Retrieval: The workshops of CLEF-IP 2013 and NTCIR 2005 provide datasets for passage retrieval. In this task, relevant passages (paragraphs) need to be retrieved from a patent document. For example, an instance of this task might consist of a patent application and a prior art patent. Only those passages of the patent that are relevant for judging the novelty of the application’s claims shall be retrieved. Since this task is very complex and very hard even for experts, few attempts have been made to automate it. Only one very recent approach has been proposed, which uses EPO search reports to learn to match claims and paragraphs [79]. It is based on contextual word embeddings and trained with positive and negative examples of matching paragraphs and claims.
Clustering: Clustering is the unsupervised grouping of patents based on a similarity measure. In contrast to classification, where class labels exist that can be learned, clustering does not need any labels. In the context of patent analysis, clustering is often used in combination with visualization methods, grouping similar patents close together in a vector space and then visualizing a 2-dimensional projection of this space. It is often possible to use the semantically loaded embeddings directly to compute similarity between words, sentences, or documents [57]. The evaluation of clustering is then more difficult, since no ground-truth labels exist. Reports from patent offices can be used to this end, e.g., from KIPO [45]. Simple algorithms, such as k-means, can be used to cluster the patents based on similarity, and the results can be visualized using dimensionality reduction methods [75], such as t-SNE [68].

5.4. Quality analysis and market valuation

Analysing the quality of patents plays a major role in determining the economic value of a patent portfolio. A quality analysis and evaluation typically relies on domain expertise, technical knowledge, and other factors, such as market and finance strategies. Such an analysis is of great interest for patent applicants, venture capitalists, policymakers, and business organisations. Although several metrics are widely agreed upon for measuring the quality of an invention, obtaining global metrics from the major patent offices is challenging. Major offices, such as USPTO and EPO, approach the problem with custom metrics. For instance, USPTO proposed indicators with respect to product, process, and perception.20 Others [35] identified forward citations in patents as a reliable indicator of the value of a patent. Other indicators include the quality of the claims, the family size of the patent, and the validity of the patent.
20 https://www.uspto.gov/patent/initiatives/quality-metrics-1.
Approaches using deep learning are still very rare. One approach focuses on the citation network of a patent to assess its value [59]. Another approach tries to predict the number of forward citations as an indicator of patent value using abstract and claim text as well as hand-crafted features [24].

5.5. Technology forecasting

Technology forecasting provides an opportunity for both public and private enterprises to predict upcoming technologies and to gain certainty about their capital investments. Patent documents are a major source on which to base such predictions. Technology forecasting started with unsupervised approaches, where text and data mining techniques were employed [17,20]. Several approaches [10,22] considered citation networks and Bayesian models for clustering to provide technology clusters. However, these unsupervised methods lack external domain knowledge and hence must incorporate domain experts’ interpretations in the end, which are time-consuming and costly.
One deep learning approach analyses citation networks and then tries to predict the number of future citations within a community using LSTM [70]. Zhou et al. [104] proposed an approach to augment training
data using GANs to make up for a lack of annotated data. To this end, features are extracted using classical methods, then synthetic data resembling the extracted feature combinations is generated, and finally classical machine learning methods are trained on the augmented data.

5.6. Data generation

The greatest gains from employing deep learning in the patent domain could come from automating the patent writing process itself. Advancements in language modeling and also GANs make it at least theoretically possible to generate patents or patent claims automatically given some seed information. Systems such as GPT-3 [11] have demonstrated their capabilities in generating text. The question remains how to improve the quality of the generated texts to make them interesting for the patent domain.
In preliminary work, Lee [54] proposed a transformer model for generating claims. He proposes to fine-tune a GPT-2 model by personalizing the training data. In follow-up work [55], the proposed model was realized and patent claims were actually generated. The reported results are still very poor, leaving a lot of room for improvement.

5.7. Litigation analysis

Patent litigation is a legal process in which patents lead to a dispute between two companies, for instance by prohibiting the development of business strategies. This process often helps the companies to protect their profits and other proprietary values. The identification of patents that might induce litigation between companies is a tedious, costly, and time-consuming process, which is carried out manually. Litigation is pursued with various intentions, such as protecting market shares and product features, but also fighting competitors (patent wars). In the patent domain, several feature engineering-based approaches were proposed to automate the detection of litigation risk, e.g., using collaborative filtering [42]. Litigation risk is highly related to patent quality and is therefore sometimes considered to be one facet of it. We treat it as a separate task, reflecting the different type of input data that is necessary to assess litigation risk, namely legal documents in addition to patents.
Combining legal documents and patents can be done using deep learning methods. Liu et al. [61] made use of USPTO patents and Patexia lawsuits21 to train a model to predict the risk of litigation, which influences the value of a patent. To this end, the authors proposed to combine network embeddings learned from hand-crafted features and word embeddings, followed by a CNN, to learn a representation for patents. Tensor factorization is used to predict the probability of litigation based on this learned representation. Closely related to patent litigation is trademark litigation. One approach uses word embeddings to represent trademark case judgments [94]. Exploring the learned space can help to find relevant precedents. Clustering using k-means is further proposed to facilitate this search process.
21 https://www.patexia.com/.

5.8. Computer vision

Inventors use figures, flowcharts, or workflows to depict their invention. Analysing such image data is a great challenge, and specific tasks include classification, image captioning, and image-based retrieval.
Classification: A first step in analyzing images is the classification of what is depicted in the figure into more specific classes, such as technical drawing, chemical structure, gene sequence, flowchart, etc. The results can be used to improve search or enable faceted search for particular figure types. One deep learning approach that tackles this task specifically for patents uses CNNs to classify patent images into different categories [48].
Captioning: Semantically understanding what is depicted in an image is a very active research field. One application is to automatically generate captions describing what can be seen in an image. If successful, these generated captions can be used for classification or retrieval, e.g., to find semantically similar images. Compared to standard image captioning, which is applied to photos, images in patents are much more difficult to describe meaningfully. One team of researchers proposed an image captioning model using a combination of a CNN and an LSTM [60]. The authors considered design patents to train an encoder-decoder model. The CNN is pre-trained on ImageNet [26] and is used to encode design patent images in 300-dimensional vectors. Further, the text descriptions of the images are encoded with word embeddings of the same dimensionality, and a mapping from image features to word embeddings is learned.
Image-Based Retrieval: Image data are also used for patent retrieval based on the content of images. Especially when dealing with design patents, images are the best cue to find relevant patents. This field is called content-based image retrieval [97]. Deep learning models can be utilized to improve over classical approaches. One approach [41] uses a dual VGG network [89] to learn the representation of two images by minimizing the cosine distance of similar images as defined by the IPC class labels.

6. Literature discussion

In this section, we summarize our findings on the literature discussed above. Table 7 gives an overview in matrix-style, highlighting the main ingredients of each identified related work article. The timeline on the left emphasizes the growing number of research papers using deep learning in the field of patent analysis.
Before the year 2016, automated patent analysis was conducted using traditional information retrieval and machine learning methods. While there are still some approaches using these traditional methods today, the vast majority of research in the patent domain happens in the field of deep learning. Deep learning models consistently outperform traditional approaches on perceptive tasks, i.e., tasks where semantic information, either from natural language or image data, plays an important role. Given the complexity of patents, especially compared to other textual data, such as news articles, product reviews, or tweets, this advantage becomes even larger. Deep learning methods are able to capture semantics much better and allow for a much more fine-grained analysis, e.g., in prior art search or classification. This comes at the cost of typically requiring much more annotated training data. This is especially true for the large contextual word embedding models, which have billions of parameters that need to be learned. One way to cope with this is transfer learning: Instead of training solely on the in-domain training data, these complex models can be trained on general text to capture common semantics, and then only need to be fine-tuned on domain-specific training data. This concept is responsible for the large success of contextual word embeddings, such as BERT, ELMo, and ULMFiT, for domain-specific tasks.
We already discussed the different ingredients of the considered papers. Nevertheless, we want to briefly summarize the insights with regard to Table 7. While most research has considered, and still considers, USPTO patents for its experiments, patents from other countries are not ignored. The advantage of English language patents, apart from their commercial international importance, is that there are already pre-trained models available for English and that it is easier to compare with other approaches when they report results on similar data. Further, the new possibilities of deep learning foster the combination of patent documents and non-patent literature, such as court documents or reports. We expect to see more of this, especially for technology forecasting and for quality and market valuation.
The vast majority of approaches use word embeddings as a deep learning method to represent patent documents. Some approaches combine word embeddings with other representations, such as graph
Table 7
Survey Summary. NAT (national: Chinese, Japanese, or Russian patents), COLL (curated collections: NTCIR, CLEF-IP, or
TREC-CHEM), REP (reports: Gartner, KISTA or EPO), LEGAL (post-grant documents), NUM (numeric features), CIT
(citation networks), IMG (image data), WE (word embeddings), DE (document/paragraph embeddings), GE (graph/
network embeddings), CTX (contextual word embeddings), FC (fully connected network), SEQ2SEQ (sequence-to-
sequence network), AE (autoencoder network), TRANS (transformer-based network), SUP (supporting task: extraction,
segmentation, or translation), CLASS (classification), RETR (retrieval: prior art search, landscaping, passage retrieval, or
clustering), QUAL (quality and market valuation), TECH (technology forecasting), GEN (data generation), LIT (litigation
analysis), CV (computer vision: captioning, classification, or image-based retrieval).
embeddings or document embeddings. Only a very small number of papers deal with non-textual input. This is either citation information or image data. Numeric or discrete input data, e.g., metadata of patents, is rarely used and, if so, only in combination with embeddings. This is consistent with the promise of deep learning not requiring feature engineering and being able to extract the crucial information from the raw (textual) input data.
The employed deep learning architectures are more diverse. The classic network architectures, CNN and LSTM, are the most popular for conducting patent analysis research, but the more complex architectures are gaining momentum. ENDEC and GAN architectures are more specialized architectures and therefore not suited for all tasks. Nevertheless, we expect to see more of those architectures, especially for difficult tasks. The large contextual word embedding models represent data extremely accurately and therefore only require simple dense neural network layers to fulfill their tasks. Often, these kinds of layers are also used to combine two architectures and different input data.
Classification is the most popular patent analysis task. In its basic form, it is also the easiest task and the task with the most annotated training data, given that all published patents have classes assigned to them. In addition, evaluating classifiers is rather simple since there are plentiful ground truth datasets available. Patent retrieval with all its subtasks is also very popular. The remaining identified tasks have been investigated only sparsely so far. One reason for this is the more difficult nature of these tasks. When even human experts do not agree on a solution, or the task requires a lot of common sense and background knowledge, then automatic methods still have a hard time. However, we expect that more complex deep learning methods will be able to handle these difficult tasks better in the future.
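The discussion notes that rich embeddings often need only a simple dense layer on top, and that such layers are also used to combine different input data. The following minimal sketch illustrates this under strong simplifying assumptions. The two-dimensional "text embeddings", the single metadata feature, and the labels are made-up toy values. The class `DenseSoftmax` is a hypothetical helper written for this illustration and does not reproduce any surveyed system.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class DenseSoftmax:
    """One fully connected layer followed by softmax, trained with plain SGD."""

    def __init__(self, n_in: int, n_out: int) -> None:
        self.W = [[0.0] * n_in for _ in range(n_out)]
        self.b = [0.0] * n_out

    def predict_proba(self, x: List[float]) -> List[float]:
        logits = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(self.W, self.b)]
        return softmax(logits)

    def predict(self, x: List[float]) -> int:
        probs = self.predict_proba(x)
        return max(range(len(probs)), key=probs.__getitem__)

    def fit(self, X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 200) -> None:
        for _ in range(epochs):
            for x, target in zip(X, y):
                probs = self.predict_proba(x)
                for k, p in enumerate(probs):
                    # Gradient of the cross-entropy loss with respect to logit k.
                    grad = p - (1.0 if k == target else 0.0)
                    self.b[k] -= lr * grad
                    for j, v in enumerate(x):
                        self.W[k][j] -= lr * grad * v


def accuracy(model: DenseSoftmax, X: List[List[float]], y: List[int]) -> float:
    return sum(model.predict(x) == t for x, t in zip(X, y)) / len(y)


# Toy stand-ins for frozen, pre-computed text embeddings (assumption: 2 dimensions).
text_emb = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.5, 0.5], [0.5, 0.5]]
# Toy stand-in for one numeric metadata feature per patent.
meta = [[0.0], [0.0], [1.0], [1.0], [0.0], [1.0]]
labels = [0, 0, 1, 1, 0, 1]

# Combining both inputs is a simple concatenation in front of the dense layer.
combined = [t + m for t, m in zip(text_emb, meta)]

text_only = DenseSoftmax(n_in=2, n_out=2)
text_only.fit(text_emb, labels)
both = DenseSoftmax(n_in=3, n_out=2)
both.fit(combined, labels)

acc_text = accuracy(text_only, text_emb, labels)
acc_both = accuracy(both, combined, labels)
print(acc_text, acc_both)
```

The last two toy patents have identical text embeddings but different labels, so the text-only layer cannot classify both correctly. Once the metadata feature is concatenated, the same dense layer can separate them. In a real system, the embeddings would come from a pre-trained model rather than from hand-written vectors.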
7. Trends and conclusions

Currently, there is a trend towards training more complex neural network architectures with a larger number of parameters and thus a need for larger training datasets. After bidirectional encoder representations from transformers (BERT) [27] and XLNet [98] with 340 million parameters, GPT-3 [11] pushed the limit to 175 billion parameters. Many architectures build on the underlying attention mechanism using transformer models [95]. These large models reveal their full potential for few-shot and zero-shot learning [76], e.g., with label embeddings [80]. It is a challenge to access and handle the required amounts of training data, e.g., more than 181 billion English words [11]. Thus, we see another trend: reducing the need for labeled task-specific training data with the help of transfer learning and training on auxiliary tasks. This development shows a direction from supervised to self-supervised/semi-supervised training.
For the specific research direction of patent analysis with deep learning, we envision new tasks. Patent text generation is a rather new task, and there are also almost no deep learning approaches for passage retrieval. The reason for the latter is the complexity of the task [79], which makes it much harder in comparison to standard document classification and requires other evaluation scenarios, e.g., a ranking of documents. Further, instead of full documents, subsections of documents need to be matched. Finally, there are no standard labels or annotations available. The extraction of information from search reports that can be used as labels is challenging.
Another new task that we expect to become more relevant is litigation analysis. While the vision of an artificial intelligence that can handle the entire patent life-cycle (AI patent lawyer) is emerging on the horizon, today’s machine-learned models can still be easily fooled if targeted. To the best of our knowledge, there is no work yet on adversarial attacks on, e.g., image classification or text classification, in the patent domain. We are confident that more semi-automated applications will be developed in research and will eventually find their way into industry in the near future.
With this survey, we summarized existing approaches which make use of deep learning for a variety of patent analysis tasks. While research in this area is still in its early stages, we outlined current trends of using various deep learning methods. We anticipate a shift in automated patent analysis away from classical machine learning to more and more deep learning. Further, we gave an overview of the available datasets for supervised learning needed by these methods. We hope that our work fosters interest in deep learning for patent analysis and serves as a comprehensive survey for researchers and practitioners from academia and industry.

Author statement

Ralf Krestel: Conceptualization, Writing - Original Draft, Writing - Review & Editing, Visualization, Supervision, Project administration.
Renukswamy Chikkamath: Writing - Original Draft.
Christoph Hewel: Writing - Original Draft.
Julian Risch: Writing - Original Draft, Writing - Review & Editing.

Declaration of competing interest

None.

References

[1] A. Abbas, L. Zhang, S.U. Khan, A literature review on the state-of-the-art in patent analysis, World Patent Inf. 37 (2014) 3–13.
[2] L. Abdelgawad, P. Kluegl, E. Genc, S. Falkner, F. Hutter, Optimizing neural networks for patent classification, in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2019, pp. 688–703.
[3] A. Abood, D. Feltenberger, Automated patent landscaping, Artif. Intell. Law 26 (2018) 103–125.
[4] M.Z. Alom, T.M. Taha, C. Yakopcic, S. Westberg, P. Sidike, M.S. Nasrin, M. Hasan, B.C. Van Essen, A.A. Awwal, V.K. Asari, A state-of-the-art survey on deep learning theory and architectures, Electronics 8 (2019) 1–66.
[5] H. Aras, R. Türker, D. Geiss, M. Milbradt, H. Sack, Get your hands dirty: evaluating word2vec models for patent data, in: Proceedings of the Posters and Demos Track of the International Conference on Semantic Systems (SEMPDF), 2018, pp. 1–4.
[6] L. Aristodemou, F. Tietze, The state-of-the-art on Intellectual Property Analytics (IPA): a literature review on artificial intelligence, machine learning and deep learning methods for analysing intellectual property (IP) data, World Patent Inf. (WPI) 55 (2018) 37–51.
[7] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, in: Proceedings of the International Conference on Learning Representations (ICLR), 2015, pp. 1–11.
[8] J. Beney, Lci-insa linguistic experiment for clef-ip classification track, in: Proceedings of the Conference and Labs of the Evaluation Forum (CLEF), 2010, pp. 1–11.
[9] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword information, Trans. Assoc. Comput. Linguistics (TACL) 5 (2017) 135–146.
[10] A. Breitzman, P. Thomas, The emerging clusters model: a tool for identifying emerging technologies across multiple patent systems, Res. Pol. (RP) 44 (2015) 195–205.
[11] T.B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D.M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language Models Are Few-Shot Learners, ArXiv E-Prints arXiv:2005.14165, 2020.
[12] H. Cai, V.W. Zheng, K.C.C. Chang, A comprehensive survey of graph embedding: problems, techniques, and applications, Trans. Knowl. Data Eng. 30 (2018) 1616–1637.
[13] J. Camacho-Collados, M.T. Pilehvar, From word to sense embeddings: a survey on vector representations of meaning, J. Artif. Intell. Res. 63 (2018) 743–788.
[14] D.S. de Carvalho, M.L. Nguyen, Efficient neural-based patent document segmentation with term order probabilities, in: Proceedings of European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), 2017, pp. 171–176.
[15] D. Cer, Y. Yang, S.y. Kong, N. Hua, N. Limtiaco, R.S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, et al., Universal Sentence Encoder, 2018 arXiv Preprint arXiv:1803.11175.
[16] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, I. Androutsopoulos, LEGAL-BERT: the muppets straight out of law school, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing: Findings (EMNLP), 2020, pp. 2898–2904.
[17] C.K. Chang, A. Breitzman, Using patents prospectively to identify emerging, high-impact technological clusters, Res. Eval. 18 (2009) 357–364.
[18] L. Chen, S. Xu, L. Zhu, J. Zhang, X. Lei, G. Yang, A deep learning based method for extracting semantic information from patent documents, Scientometrics 125 (2020) 289–312.
[19] Y.S. Chen, K.C. Chang, Exploring the nonlinear effects of patent citations, patent share and relative patent position on market value in the US pharmaceutical industry, Technol. Anal. Strat. Manag. 22 (2010) 153–169.
[20] D. Chiavetta, A. Porter, Tech mining for innovation management, Technol. Anal. Strat. Manag. 25 (2013) 617–618.
[21] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder–decoder for statistical machine translation, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1724–1734.
[22] S. Choi, S. Jun, Vacant technology forecasting using new bayesian patent clustering, Technol. Anal. Strat. Manag. (TA&SM) 26 (2014) 241–251.
[23] S. Choi, H. Lee, E. Park, S. Choi, Deep Patent Landscaping Model Using the Transformer and Graph Embedding, 2019. ArXiv e-prints arXiv:1903.05823.
[24] P. Chung, S.Y. Sohn, Early detection of valuable patents using a deep learning model: case of semiconductor industry, Technol. Forecast. Soc. Change 158 (2020) 120–146.
[25] A. Conneau, D. Kiela, H. Schwenk, L. Barrault, A. Bordes, Supervised learning of universal sentence representations from natural language inference data, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017, pp. 670–680.
[26] J. Deng, W. Dong, R. Socher, L.J. Li, K. Li, L. Fei-Fei, Imagenet: a large-scale hierarchical image database, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248–255.
[27] J. Devlin, M.W. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2019, pp. 4171–4186.
[28] K. Ethayarajh, How contextual are contextualized word representations? comparing the geometry of BERT, ELMo, and GPT-2 embeddings, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing and the International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 55–65.
[29] C.J. Fall, A. Törcsvári, K. Benzineb, G. Karetka, Automated categorization in the international patent classification, in: Proceedings of the Special Interest Group on Information Retrieval (SIGIR), 2003, pp. 10–25.
[30] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT press, 2016.
[31] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, [61] Q. Liu, H. Wu, Y. Ye, H. Zhao, C. Liu, D. Du, Patent litigation prediction: a
A. Courville, Y. Bengio, Generative adversarial nets, in: Proceedings of the convolutional tensor factorization approach, in: Proceedings of the International
Advances in Neural Information Processing Systems (NeurIPS), 2014, Joint Conferences on Artificial Intelligence (IJCAI), 2018, pp. 5052–5059.
pp. 2672–2680. [62] K. Loveniers, How to interpret EPO search reports, World Patent Inf. (WPI) 54
[32] M.F. Grawe, C.A. Martins, A.G. Bonfante, Automated patent classification using (2018) 23–28.
word embedding, in: Proceedings of the International Conference on Machine [63] Y. Lu, X. Xiong, W. Zhang, J. Liu, R. Zhao, Research on classification and
Learning and Applications (ICMLA), 2017, pp. 408–411. similarity of patent citation based on deep learning, Scientometrics (2020) 1–27.
[33] A. Grover, J. Leskovec, node2vec: scalable feature learning for networks, in: [64] M. Lupu, A. Fujii, D.W. Oard, M. Iwayama, N. Kando, Patent-related tasks at ntcir,
Proceedings of the International Conference on Knowledge Discovery and Data in: Current Challenges in Patent Information Retrieval, 2017, pp. 77–111.
Mining (KDD), 2016, pp. 855–864. [65] M. Lupu, J. Huang, J. Zhu, J. Tait, Trec-chem: large scale chemical information
[34] M. Habibi, A. Rheinlaender, W. Thielemann, R. Adams, P. Fischer, S. Krolkiewicz, retrieval evaluation at trec, in: ACM SIGIR Forum, 2009, pp. 63–70.
D.L. Wiegandt, U. Leser, Patseg: a sequential patent segmentation approach, Big [66] M. Lupu, F. Piroi, A. Hanbury, Aspects and analysis of patent test collections, in:
Data Res. 19–20 (2020) 100–133. Proceedings of the International Workshop on Patent Information Retrieval,
[35] D. Harhoff, F.M. Scherer, K. Vopel, Citations, family size, opposition and the 2010, pp. 17–22.
value of patent rights, Res. Pol. 32 (2003) 1343–1363. [67] L. Lyu, T. Han, A comparative study of Chinese patent literature automatic
[36] L. Helmers, F. Horn, F. Biegler, T. Oppermann, K.R. Müller, Automating the classification based on deep learning, in: Proceedings of the Joint Conference on
search for a patent’s prior art with a full text similarity search, PLoS One 14 Digital Libraries (JCDL), 2019, pp. 345–346.
(2019). [68] L.v.d. Maaten, G. Hinton, Visualizing data using t-sne, J. Mach. Learn. Res.
[37] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (JMLR) 9 (2008) 2579–2605.
(1997) 1735–1780. [69] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient Estimation of Word
[38] J. Howard, S. Ruder, Universal language model fine-tuning for text classification, Representations in Vector Space, 2013. ArXiv e-prints arXiv:1301.3781.
in: Proceedings of the Annual Meeting of the Association for Computational [70] K. Nakai, H. Nonaka, A. Hentona, Y. Kanai, T. Sakumoto, S. Kataoka, E.C.
Linguistics (ACL), 2018, pp. 328–339. A. Carreón, T. Hiraoka, Community detection and growth potential prediction
[39] J. Hu, S. Li, Y. Yao, L. Yu, G. Yang, J. Hu, Patent keyword extraction algorithm using the stochastic block model and the long short-term memory from patent
based on distributed representation for patent classification, Entropy 20 (2018) citation networks, in: Proceedings of the International Conference on Industrial
104–123. Engineering and Engineering Management (IEEM), 2018, pp. 1884–1888.
[40] L. Jehl, S. Riezler, Document-level information as side constraints for improved [71] J. Pennington, R. Socher, C. Manning, Glove: global vectors for word
neural patent translation, in: Proceedings of the Conference of the Association for representation, in: Proceedings of the Conference on Empirical Methods in
Machine Translation in the Americas (AMTA), 2018, pp. 1–12. Natural Language Processing (EMNLP), 2014, pp. 1532–1543.
[41] S. Jiang, J. Luo, G.R. Pava, J. Hu, C.L. Magee, A CNN-Based Patent Image [72] B. Perozzi, R. Al-Rfou, S. Skiena, Deepwalk: online learning of social
Retrieval Method for Design Ideation, 2020. ArXiv e-prints arXiv:2003.08741. representations, in: Proceedings of the International Conference on Knowledge
[42] B. Jin, C. Che, K. Yu, Y. Qu, L. Guo, C. Yao, R. Yu, Q. Zhang, Minimizing legal Discovery and Data Mining (KDD), 2014, pp. 701–710.
exposure of high-tech companies through collaborative filtering methods, in: [73] M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer,
Proceedings of the International Conference on Knowledge Discovery and Data Deep contextualized word representations, in: Proceedings of the Conference of
Mining (KDD), 2016, pp. 127–136. the North American Chapter of the Association for Computational Linguistics:
[43] H. Joho, L.A. Azzopardi, W. Vanderbauwhede, A survey of patent users: an Human Language Technologies (NAACL-HLT), 2018, pp. 2227–2237.
analysis of tasks, behavior, search functionality and system requirements, in: [74] F. Piroi, A. Hanbury, Evaluating information retrieval systems on european patent
Proceedings of the Symposium on Information Interaction in Context (IIiX), 2010, data: the clef-ip campaign, in: Current Challenges in Patent Information Retrieval,
pp. 13–24. 2017, pp. 113–142.
[44] J. Kim, S. Lee, Forecasting and identifying multi-technology convergence based [75] J. Qi, L. Lei, K. Zheng, X. Wang, Patent analytic citation-based vsm: challenges
on patent data: the case of IT and BT industries in 2020, Scientometrics 111 and applications, IEEE Access 8 (2020) 17464–17476.
(2017) 47–65. [76] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language models
[45] J. Kim, J. Yoon, E. Park, S. Choi, Patent document clustering with deep are unsupervised multitask learners, OpenAI Blog 1 (2019) 9.
embeddings, Scientometrics 123 (2020) 1–15. [77] K. Rajshekhar, W. Zadrozny, S.S. Garapati, Analytics of patent case rulings:
[46] D.P. Kingma, M. Welling, Auto-encoding variational bayes, in: Proceedings of the empirical evaluation of models for legal relevance, in: Proceedings of the
International Conference on Learning Representations (ICLR), 2014, pp. 1–14. International Conference on Artificial Intelligence and Law (ICAIL), 2017,
[47] S. Kinoshita, T. Oshio, T. Mitsuhashi, Comparison of smt and nmt trained with pp. 1–9.
large patent corpora: Japio at wat2017, in: Proceedings of the Workshop on Asian [78] N. Reimers, I. Gurevych, Sentence-BERT: sentence embeddings using siamese
Translation (WAT), 2017, pp. 140–145. BERT-networks, in: Proceedings of the Conference on Empirical Methods in
[48] A. Kravets, N. Lebedev, M. Legenchenko, Patents images retrieval and Natural Language Processing and the International Joint Conference on Natural
convolutional neural network training dataset quality improvement, in: Language Processing (EMNLP-IJCNLP), 2019, pp. 3982–3992.
Proceedings of the International Research Conference on Information [79] J. Risch, N. Alder, C. Hewel, R. Krestel, Patent Match: A Dataset for Matching
Technologies in Science, Management, Social Sphere and Medicine (ITSMSSM), Patent Claims with Prior Art, 2020. ArXiv e-prints arXiv:2012.13919.
2017, pp. 287–293. [80] J. Risch, S. Garda, R. Krestel, Hierarchical document classification as a sequence
[49] A. Krishna, Y. Jin, C. Foster, G. Gabel, B. Hanley, A. Youssef, Query Expansion for generation task, in: Proceedings of the Joint Conference on Digital Libraries
Patent Searching Using Word Embedding and Professional Crowdsourcing, 2019. (JCDL), 2020, pp. 147–155.
ArXiv e-prints arXiv:1911.11069. [81] J. Risch, R. Krestel, Learning patent speak: investigating domain-specific word
[50] M.N. Kyebambe, G. Cheng, Y. Huang, C. He, Z. Zhang, Forecasting emerging embeddings, in: Proceedings of the Thirteenth International Conference on
technologies: a supervised learning approach through patent analysis, Technol. Digital Information Management (ICDIM), 2018, pp. 63–68.
Forecast. Soc. Change (TF&SC) 125 (2017) 236–244. [82] J. Risch, R. Krestel, Domain-specific word embeddings for patent classification,
[51] Q. Le, T. Mikolov, Distributed representations of sentences and documents, in: Data Technol. Appl. (DTA) 53 (2019) 108–122.
Proceedings of the International Conference on Machine Learning (ICML), 2014, [83] J.Y. Rob Srebrovic, Leveraging the BERT Algorithm for Patents with TensorFlow
pp. 1188–1196. and Big Query, Technical Report, Google, 2020, https://services.google.com/fh
[52] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to /files/blogs/bert_for_patents_white_paper.pdf.
document recognition, Proc. IEEE 86 (1998) 2278–2324. [84] B. Rozemberczki, R. Sarkar, Fast sequence-based embedding with diffusion
[53] C. Lee, O. Kwon, M. Kim, D. Kwon, Early identification of emerging technologies: graphs, in: Proceedings of the International Workshop on Complex Networks,
a machine learning approach using multiple patent indicators, Technol. Forecast. 2018, pp. 99–107.
Soc. Change (TF&SC) 127 (2018) 291–303. [85] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-
[54] J. Lee, Patent transformer: a framework for personalized patent claim generation, propagating errors, Nature 323 (1986) 533–536.
in: Proceedings of the JURIX Doctoral Consortium, 2019, pp. 1–13. [86] F. Saad, Named entity recognition for biomedical patent text using bi-lstm
[55] J.S. Lee, J. Hsiang, Patent claim generation by fine-tuning openai gpt-2, World variants, in: Proceedings of the International Conference on Information
Patent Inf. (WPI) 62 (2020) 101983. Integration and Web-Based Applications & Services (iiWAS), 2019, pp. 617–621.
[56] J.S. Lee, J. Hsiang, Patent classification by fine-tuning bert language model, [87] M. Shalaby, J. Stutzki, M. Schubert, S. Günnemann, An LSTM approach to patent
World Patent Inf. (WPI) 61 (2020) 101965. classification based on fixed hierarchy vectors, in: Proceedings of the SIAM
[57] L. Lei, J. Qi, K. Zheng, Patent analytics based on feature vector space model: a International Conference on Data Mining (SDM), 2018, pp. 495–503.
case of iot, IEEE Access 7 (2019) 45705–45715. [88] W. Shalaby, W. Zadrozny, Patent retrieval: a literature review, Knowl. Inf. Syst.
[58] S. Li, J. Hu, Y. Cui, J. Hu, Deeppatent: patent classification with convolutional (KAIS) 61 (2019) 631–660.
neural networks and word embedding, Scientometrics 117 (2018) 721–744. [89] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale
[59] H. Lin, H. Wang, D. Du, H. Wu, B. Chang, E. Chen, Patent quality valuation with image recognition, in: Proceedings of the International Conference on Learning
deep learning models, in: Proceedings of the International Conference on Representations (ICLR), 2015, pp. 1–14.
Database Systems for Advanced Applications (DASFAA), 2018, pp. 474–490. [90] I. Sutskever, O. Vinyals, Q.V. Le, Sequence to sequence learning with neural
[60] H. Liu, Q. Dai, Y. Li, C. Zhang, S. Yi, T. Yuan, The design patent images networks, in: Proceedings of the Advances in Neural Information Processing
classification based on image caption model, in: Proceedings of the International Systems (NeurIPS), 2014, pp. 3104–3112.
Conference on Brain Inspired Cognitive Systems (BICS), 2019, pp. 353–362. [91] A.J. Trappey, F.C. Hsu, C.V. Trappey, C.I. Lin, Development of a patent document
classification and search platform using a back-propagation network, Expert Syst.
Appl. 31 (2006) 755–765.
R. Krestel et al. World Patent Information 65 (2021) 102035
[92] A.J. Trappey, C.V. Trappey, U.H. Govindarajan, J.J. Sun, Patent value analysis using deep learning models—the case of IoT technology mining for the manufacturing industry, Trans. Eng. Manag. (2019) 1–13.
[93] A.J. Trappey, C.V. Trappey, C.Y. Wu, C.W. Lin, A patent quality analysis for innovative technology and product development, Adv. Eng. Inf. 26 (2012) 26–34.
[94] C.V. Trappey, A.J. Trappey, B.H. Liu, Identify trademark legal case precedents—using machine learning to enable semantic analysis of judgments, World Patent Inf. (WPI) 62 (2020) 101980.
[95] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2017, pp. 5998–6008.
[96] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.A. Manzagol, L. Bottou, Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res. 11 (2010).
[97] S. Vrochidis, S. Papadopoulos, A. Moumtzidou, P. Sidiropoulos, E. Pianta, I. Kompatsiaris, Towards content-based patent image retrieval: a framework perspective, World Patent Inf. (WPI) 32 (2010) 94–106.
[98] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R.R. Salakhutdinov, Q.V. Le, XLNet: generalized autoregressive pretraining for language understanding, in: Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2019, pp. 5753–5763.
[99] T. Young, D. Hazarika, S. Poria, E. Cambria, Recent trends in deep learning based natural language processing, IEEE Comput. Intell. Mag. 13 (2018) 55–75.
[100] L. Yu, W. Zhang, J. Wang, Y. Yu, Seqgan: sequence generative adversarial nets with policy gradient, in: Proceedings of the Conference on Artificial Intelligence (AAAI), 2017, pp. 2852–2858.
[101] L. Zhang, L. Li, T. Li, Patent mining: a survey, SIGKDD Explor. Newslett. 16 (2015) 1–19.
[102] Y. Zhang, J. Xu, H. Chen, J. Wang, Y. Wu, M. Prakasam, H. Xu, Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning, Database 2016 (2016).
[103] Q. Zhong, X. Qiao, Y. Zhang, Automatic indexing of patent right-claiming document based on deep learning, in: Proceedings of the International Conference on Applied Mathematics, Modelling and Statistics Application (AMMSA), 2018, pp. 135–139.
[104] Y. Zhou, F. Dong, Y. Liu, Z. Li, J. Du, L. Zhang, Forecasting emerging technologies using data augmentation and deep learning, Scientometrics (2020) 1–29.
[105] H. Zhu, C. He, Y. Fang, B. Ge, M. Xing, W. Xiao, Patent automatic classification based on symmetric hierarchical convolution neural network, Symmetry 12 (2020) 1–12.

Dr. Ralf Krestel is a senior researcher and head of the Web Science Group at Hasso Plattner Institute at University of Potsdam. His research centers around text mining, information retrieval, recommender systems, natural language processing, and machine learning. He studied computer science at University of Karlsruhe and Concordia University in Montreal. In 2012, he received his Ph.D. from the University of Hannover, Germany for his work "On the Use of Language Models and Topic Models in the Web". Afterwards he spent two years as a postdoctoral research fellow at University of California, Irvine. From 2019 to 2020 he held the chair of Intelligent Systems at University of Passau, Germany. He co-authored more than 100 peer-reviewed articles and is reviewer for various journals and conferences.