
World Patent Information 65 (2021) 102035

Contents lists available at ScienceDirect

World Patent Information

journal homepage: http://www.elsevier.com/locate/worpatin

A survey on deep learning for patent analysis

Ralf Krestel a,*, Renukswamy Chikkamath b, Christoph Hewel c, Julian Risch a
a Hasso Plattner Institute, University of Potsdam, Germany
b University of Passau, Passau, Germany
c BETTEN und RESCH Patent- und Rechtsanwaelte PartGmbB, Germany

ARTICLE INFO

Keywords: Deep learning; Patent analysis; Text mining; Natural language processing

ABSTRACT

Patent document collections are an immense source of knowledge for research and innovation communities worldwide. The rapid growth of the number of patent documents poses an enormous challenge for retrieving and analyzing information from this source in an effective manner. Based on deep learning methods for natural language processing, novel approaches have been developed in the field of patent analysis. The goal of these approaches is to reduce costs by automating tasks that previously only domain experts could solve. In this article, we provide a comprehensive survey of the application of deep learning for patent analysis. We summarize the state-of-the-art techniques and describe how they are applied to various tasks in the patent domain. In a detailed discussion, we categorize 40 papers based on the dataset, the representation, and the deep learning architecture that were used, as well as the patent analysis task that was targeted. With our survey, we aim to foster future research at the intersection of patent analysis and deep learning, and we conclude by listing promising paths for future work.

* Corresponding author.
E-mail addresses: [email protected] (R. Krestel), [email protected] (R. Chikkamath), [email protected] (C. Hewel), [email protected] (J. Risch).

https://doi.org/10.1016/j.wpi.2021.102035
Received 14 October 2020; Received in revised form 18 February 2021; Accepted 7 March 2021
Available online 30 March 2021
0172-2190/© 2021 Elsevier Ltd. All rights reserved.

1. The patent life-cycle and tasks for automation

The patent life-cycle requires experts with domain-specific knowledge at all its different stages. First, when companies or individual inventors wish to obtain patent protection for an invention, they need to specify the invention in textual form and as illustrations in a patent application. Typically, such a patent application is drafted by a patent attorney, who has both a technical and a legal background. It contains claims that define the desired scope of protection for the invention. A patent will only be granted if these claims specify subject-matter that is new and inventive over the prior art. Hence, already at this early stage of drafting a patent application, it is beneficial to conduct a cursory prior art search to assess the chances of a grant and, if necessary, to adapt the claim wording.

However, it can be challenging to retrieve relevant prior art, i.e., previous publications related to the invention in question. Other inventors who had the same idea before might have used different words to describe it. A simple keyword search is therefore not necessarily successful. In particular, computer-implemented inventions, as non-tangible products, are often described by generic terms such as “system”, “means”, “modules”, etc. To better structure the entirety of patent applications and granted patents, the patent offices classify newly filed patent applications based on the field of invention. As a result, every published patent application and granted patent is labeled with one or several class codes. Such hierarchical classification schemes, e.g., the International Patent Classification (IPC), may later help to limit the search to the relevant field of invention.

After a patent application has been filed with a patent office, a technically skilled examiner assesses the patentability of the described invention. In particular, the examiner carries out a prior art search and evaluates whether the invention defined in the claims is new and inventive (i.e., not obvious) over the prior art found. In this regard, the examiner mostly cites older published patents or patent publications rather than books or conference papers as prior art. The main reasons for citing mostly other patents and patent applications are the large size of the patent data (according to the European Patent Office (EPO), currently 110 million documents1), the standardized structure of the patent data in patent classes, and the unquestionable publication date of each patent document.

1 https://worldwide.espacenet.com.

If the examiner finds that the claimed invention is not new or not inventive, the applicant has the opportunity to further specify the invention defined in the claims. In this way, the claimed invention may
be sufficiently delimited from the relevant prior art. However, the applicant must not add any new information to the originally filed application. If the patent examiner can be convinced, a patent is granted.

Independent of the outcome of the examination, a patent application is published 18 months after its filing date. Accordingly, the applicant has to disclose the invention to the public even if no patent is ultimately granted. If a patent is granted, the granted patent is published as an additional document. Moreover, since the applicant may obtain patent protection for an invention in several countries, multiple national patent applications and patents with substantially the same content may be published. A granted patent may be enforced against a competitor’s products in patent litigation proceedings. As a defence strategy, the competitor may attack the validity of the patent. In particular, the competitor usually tries to find relevant prior art that was not revealed during the examination proceedings. If a novelty-destroying prior art document is found, a patent can still be nullified in post-grant proceedings.

1.1. Patent analysis tasks

The patent life-cycle described above involves a set of tasks that can be (at least) partially automated. These tasks are often summarized under the term “patent analysis”. From the literature, we identified the most popular tasks for automatic patent analysis. They can be classified into:

1. Supporting tasks, such as pre-processing, extracting information for further analysis, or translating patents into other languages;
2. Patent classification, where patent documents are categorized hierarchically based on the field of invention;
3. Patent retrieval, which branches into prior art search, automated patent landscaping, infringement search, Freedom-to-Operate search, and passage retrieval;
4. Patent valuation and market value prediction, an innovative line of research in which the content and bibliographic details of patents are analyzed with human-made protocols to assess the quality of patent applications; this analysis is then used to estimate market value, typically framed as a regression problem;
5. Technology forecasting, where patents are used to assess a technology landscape, which helps researchers to identify new or trending technologies;
6. Patent text generation, where the structure and style of published patent documents are used to automate the process of writing patent claims;
7. Litigation analysis, which concerns legal processes in which patents lead to a dispute or litigation between two companies, e.g., by preventing the development of business strategies;
8. Computer vision tasks, which work with figures and drawings from patent documents instead of text.

1.2. Patent offices

The ever-growing number of patent applications leaves the patent offices in particular in dire need of automation. The potential of deep learning, or machine learning in general, to automate some of the necessary processing was recognized by the major patent offices, and AI initiatives were established.2 Table 1 lists some of the efforts undertaken by the EPO, the United States Patent and Trademark Office (USPTO), and the World Intellectual Property Organization (WIPO). Research on deep learning for the patent domain is thus not only conducted by academia and industry but is also actively fueled by the major patent offices, which provide benchmark datasets and organize challenges and workshops.

2 https://www.wipo.int/about-ip/en/artificial_intelligence/search.jsp.

1.3. Related surveys

To foster interdisciplinary research, a couple of surveys on machine learning in the patent domain have been published in the past. Joho et al. [43] provide a descriptive analysis of the patent literature and list various requirements of patent search as well as the functionalities that patent users rely on during analysis. Their survey also identifies the demographics of patent specialists and how these relate to the way they perform patent search tasks. Based on the number of patent practitioners and the approaches used in the literature, they outline an ideal patent search system. Abbas et al. [1] summarize text-mining and visualization-based approaches used for patent analysis. Patent collections are rich in structured and unstructured text content, which requires intelligent tools to accomplish efficient patent analysis. The study focuses on a taxonomy of tools and techniques to retrieve and analyze the data. The study also points out the drawbacks associated with approaches that utilize only semantics-based techniques. Data mining techniques for patent analysis were surveyed by Zhang et al. [101], who critically point out technical issues associated with patent mining tasks. For classification, visualization, search, and evaluation tasks, not only the technical issues but also the solutions proposed in related work are comprehensively discussed. The authors also identified challenges for end-user applications related to patent mining. A recent survey reports on intellectual property analysis as a special type of data science [6]. It refers primarily to non-deep-learning machine learning approaches and introduces four main categories of analysis: knowledge management, technology management, information extraction, and economic value prediction. Another survey [88], which focuses on patent retrieval, quantitatively compares different approaches using benchmark datasets. The retrieval methods discussed are all query reformulation methods and are grouped into keyword-based, pseudo-relevance feedback, semantics-based, metadata-based, and interactive methods. The retrieval tasks and datasets are described in depth, but deep learning approaches to tackle this task are not presented.

To the best of our knowledge, no survey focuses on deep learning approaches for different patent analysis tasks. Therefore, with this article, we intend to fill this gap and present an overview of datasets, text representation methods, and deep neural network architectures used for various patent analysis tasks.

2. Datasets

Training data are an essential ingredient for machine learning in general and for deep learning in particular. Publicly available datasets facilitate access and thus research in the patent domain. Table 2 gives an overview of which papers make use of which data collection. The different patent datasets are listed in Table 3. For a more detailed description of available datasets and benchmark collections, see, e.g., Refs. [88] or [66].

2.1. Granted patents and patent applications

The large patent offices, namely the USPTO and the EPO, provide various datasets in addition to the actual granted patents and applications. There are also specific collections serving as benchmark datasets, such as USPTO2M, which comprises the titles, abstracts, and class labels of 2 million granted patents, contains around 235 million tokens, and is available in JSON format. With 5 million full-text patent publications from 1976 to 2016, USPTO5M is an even larger dataset, which contains around 38 billion tokens. WIPO released excerpts extracted from patent documents in several datasets in XML format. Its most recent dataset, WIPO-delta-en, is from 2019 and comprises an unprecedented set of 55 million excerpts. The EPO has patents in English, French, and German and publishes various data products for research. Other national patent offices also publish patent data; e.g., research has been conducted involving documents from the Japanese, Chinese, and Russian patent offices.
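The token counts quoted above imply very different document lengths, which matters when choosing a model with a fixed maximum input length. As a rough illustration, the following Python sketch divides the rounded totals from the text (about 235 million tokens over 2 million USPTO2M records, about 38 billion tokens over 5 million USPTO5M publications). It assumes that both counts rely on comparable tokenization, which is not stated here, and an average of course hides the large spread between individual documents.

```python
# Rounded corpus statistics as quoted in the text above.
CORPORA = {
    "USPTO2M": {"documents": 2_000_000, "tokens": 235_000_000},  # titles + abstracts
    "USPTO5M": {"documents": 5_000_000, "tokens": 38_000_000_000},  # full text
}


def mean_tokens_per_document(documents: int, tokens: int) -> float:
    """Average number of tokens per document in a corpus."""
    return tokens / documents


averages = {name: mean_tokens_per_document(**stats) for name, stats in CORPORA.items()}
for name, avg in averages.items():
    print(f"{name}: ~{avg:,.1f} tokens per document")

# How much longer a full-text publication is than a title-plus-abstract record.
ratio = averages["USPTO5M"] / averages["USPTO2M"]
print(f"Full-text documents are ~{ratio:.0f}x longer on average")
```

On these rounded figures, a full-text publication is on average roughly 65 times longer than a title-plus-abstract record (about 7,600 versus 117.5 tokens).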

Table 1
Major patent offices have recognized the potential of deep learning and started initiatives to foster its application.

Task | EPO | USPTO | WIPO
Patent classification | Automatic pre-classification, re-classification using CPC scheme | Concept questioning using chat bots to assist automatic claim analysis and classification | Automatic categorization based on IPC main class, sub-class or main group
Prior art search | Gold standard generation, automatic annotation, search, and query generation | Search whole document against corpus of grants and pre-grants | Cognitive or semantic search using AI-assisted tools
Data analysis | Develop open source libraries to explore data and technology trend analysis | Browser-based advanced patent analytics | Economic and strategic analysis
Patent examination | Automatic annotation and exclusion detection | Patent quality assurance using text analytics, advanced data analytics for statistics description | Advanced big data analysis to automate examination workflow
Image analysis | Automatic image and figure search | Image search for patents and trademarks | Image search within global brand database, patent search using images

Table 2
Datasets and the papers that make use of them. NAT (national: Chinese, Japanese, or Russian patents), COLL (curated collection: NTCIR or CLEF-IP), REP (KISTA or EPO search reports), LEGAL (post-grant documents: lawsuits, litigation).

Dataset | Used in
USPTO | [3,14,18,23,24,32,34,36,39,41,45,49,54–59,61,75,80–82,87,102,104]
WIPO | [2,5,34,81,82]
EPO | [5,34,41]
NAT (national) | [34,47,48,60,63,67,70,103,105]
COLL (curated collections) | [2,40,41,58,86]
REP (reports) | [23,45,79,104]
LEGAL (post-grant documents) | [61,94]

Table 3
Sources of patent collections.

Dataset | Collection | Source
USPTO | Bulk | https://bulkdata.uspto.gov/
USPTO | 2M | http://mleg.cse.sc.edu/DeepPatent/index.html
WIPO | Bulk | https://www.wipo.int/classifications/ipc/en/ITsupport/Categorization/dataset/
EPO | Bulk | https://www.epo.org/searching-for-patents/data/bulk-data-sets.html
NAT | Chinese | https://oversea.cnki.net
 | Japanese | https://www.j-platpat.inpit.go.jp/
 | Russian | https://new.fips.ru/publication-web/?lang=en
COLL | NTCIR | http://research.nii.ac.jp/ntcir/data/data-en.html
 | CLEF-IP | http://www.ifs.tuwien.ac.at/~clef-ip/download-central.shtml
REP | KISTA | http://biz.kista.re.kr/patentmap/
 | Search EPO | https://www.epo.org/searching-for-patents/legal/register.html
LEGAL | Litigation | http://lexmachina.com; http://www.patexia.com/; http://legal.thomsonreuters.com
 | Various USPTO | https://developer.uspto.gov/data?search_api_fulltext=csv

2.2. Curated collections

To promote computer science research in the patent domain, different initiatives were founded. They provide a forum for researchers by defining shared patent analysis tasks, curating patent collections, and organizing workshops:

CLEF-IP organized shared tasks between 2009 and 2013 and thereby promoted several patent analysis tasks, in particular, cross-language patent analysis [74].3

NTCIR was the first initiative focusing on patent retrieval. Patents were a major focus from early on (2001), while in later years, the tasks shifted from retrieval to machine translation of patents [64].4

TREC-CHEM organized a shared task to identify chemical structures in patent documents from 2009 to 2011 [65].5

3 http://www.clef-initiative.eu/.
4 http://research.nii.ac.jp/ntcir/index-en.html.
5 https://trec.nist.gov/data/chem-ir.html.

2.3. Reports

Besides patent collections, there are also a couple of related documents that are explored in the context of patent analysis. Among them are trend and search reports, landscaping reports, and general reports about trending technologies.

KIPO Reports: In addition to the patent offices already mentioned, the Korean Intellectual Property Office (KIPO)6 is also active in patent analysis research. Along with patent documents, KIPO releases yearly trend reports (KISTA7) and landscaping reports (KIPRIS8).

Gartner’s Hype Cycle: Gartner’s Hype Cycle for Emerging Technologies9 describes technology trends and classifies them into different stages based on maturity. This can be used as ground truth for trend prediction [104].

EPO Search Reports: Search reports published by patent offices are a valuable resource, especially for patent retrieval research. In particular, to make sure an invention is novel and non-obvious, search reports help by providing citations to existing similar inventions [62]. These inventions may be described in patents or pending applications. The search reports can be utilized as a ground truth resource for training machine learning models for prior art search. They play an important role in the patenting process and document how the prior art search was conducted and which relevant documents were found by the examiner.10 According to the rich format citation standard of the EPO and WIPO, the relevant passages are cited and the citation is categorized for each prior art document found. The categories indicate in which way the cited documents are relevant to the patent application’s claims, such as technical background (A), novelty-destroying (X), or inventive step (Y).11

6 https://www.kipo.go.kr/.
7 http://biz.kista.re.kr/patentmap.
8 http://www.kipris.or.kr/khome/main.jsp.
9 Understanding Gartner’s Hype Cycles https://www.gartner.com/en/documents/3887767.
10 https://www.epo.org/applying/european/Guide-for-applicants/html/e/ga_c5_2_3.html.
11 https://www.epo.org/law-practice/legal-texts/html/epc/2013/e/ar54.html.

2.4. Post-grant documents

A further interesting data source originates from post-grant proceedings at courts and offices. Such proceedings include office proceedings (e.g., opposition proceedings in Europe, or post-grant review and inter partes review (IPR) in the US) and court proceedings at national courts
(e.g., the German Federal Court of Justice). Post-grant office proceedings give third parties the possibility to challenge the validity of a granted patent, e.g., to request revocation due to novelty-destroying prior art that was not found by the patent examiner. The documents originating from these proceedings may thus complement the search reports from the pre-grant examination proceedings. One example of this kind of document can be found in Rajshekhar et al. [77], who provide a dataset of PTAB prior art references extracted from USPTO rulings.12

Post-grant court proceedings mostly concern litigation proceedings due to an alleged patent infringement. Accordingly, these documents relate to the question whether a contested product of a third party infringes a patent. Documents originating from court proceedings comprise court decisions (also referred to as case law) and party submissions (including evidence). By now, decisions are often publicly available (e.g., on a court’s website13), while party submissions are usually not published and only available under specific conditions (e.g., in Germany due to §299 ZPO). There also exist datasets related to patent litigation proceedings. For example, the USPTO provides a dataset with detailed patent litigation data on more than 81,000 unique district court cases filed between 1963 and 2016.14 The data include a variety of information, including parties and attorneys, cause of action, location, and litigation history. In addition, commercial services exist that offer litigation data, which is used by researchers (see Table 3).

12 https://data.world/wzadrozn/ptab-prior-art.
13 https://juris.bundesgerichtshof.de/cgi-bin/rechtsprechung/list.py?Gericht=bgh&Art=en&Sort=3.
14 https://www.uspto.gov/about-us/news-updates/patent-litigation-data-through-2016-now-available.

2.5. Copyright protection

Machine learning approaches rely on the availability of training data. Complex deep learning models require large amounts of data to learn, and thus copyright issues need to be considered. The patent domain seems very well suited for deploying deep learning methods. On the one hand, there are huge amounts of publicly available patent documents that are often annotated by domain experts, e.g., with regard to the International Patent Classification (IPC) scheme. This saves additional, expensive manual labeling of the data for training purposes. On the other hand, published patents and patent applications are generally excluded from copyright protection. That is to say, since such patent documents are published by a patent office, they have a public domain status that is not subject to any copyright claims. According to this general rule, anyone can reproduce published patent documents, at least as long as the content is not altered and the source, i.e., the patent (application) number, is correctly cited.

However, there may be exceptions to this general rule. Potential copyright claims may concern very different aspects of a patent document. To give some examples, ownership might be claimed for drawings of a third party, for specific boilerplate paragraphs used by a law firm, or for third-party text or drawings incorporated in a patent application which were themselves protected by copyright. Moreover, even though several multilateral international copyright treaties exist, copyrights basically constitute national rights. Hence, copyright protection can vary between national jurisdictions (i.e., between individual countries). For example, any copyright claims with regard to a European patent application would be subject to the individual national laws of the member states of the European Patent Convention. A thorough analysis of the different national laws regarding potential copyright protection of patent documents goes beyond the scope of this survey. For this reason, only a brief summary of the regulations in some selected jurisdictions, which confirm the above-mentioned general rule, is given in the following.

For example, in Germany and Switzerland, patent documents (as published by the patent office) are explicitly excluded from copyright protection (Section 5(2) German UrhG, and Art. 5 d. Swiss CopA). In the UK, patent specifications published after 1989 may be reproduced for the purpose of “disseminating information”, but other uses are prohibited without a license from the copyright holder.15 In the USA, the text and drawings of a patent are typically not subject to copyright restrictions according to the USPTO,16 even if there are limited exceptions reflected in 37 CFR 1.71(d) & (e) and 1.84(s). At least for US patent applications, it is thus possible to include a copyright notice or mask work notice (cf. 37 CFR 1.71 (d) and (e)). However, inclusion of such a copyright notice or mask work notice is only allowed if the applicant acknowledges at the same time that facsimile reproduction (i.e., photocopying) is allowed.

15 https://webarchive.nationalarchives.gov.uk/20140603113132/http://www.ipo.gov.uk/types/copy/c-other/c-other-faq/c-other-faq-type/c-other-faq-type-patspec.htm.
16 https://www.uspto.gov/terms-use-uspto-websites.

3. Representations

Patent documents consist of text data, metadata, such as citation information, and sometimes image data. These data need to be pre-processed into a suitable (vector) representation to allow deep learning models to use the data as input. Table 4 lists the various representations, which can be divided into representations learned by deep learning methods, such as word2vec [69], and “raw” input representations, as used primarily in traditional machine learning models.

Deep learning models can use raw input directly to learn a semantically meaningful representation, given enough training data. The imprecise, labor-intensive step of extracting features then becomes unnecessary. These learned representations already contain valuable information, so that they can be used directly for certain tasks and do not require further processing by deep learning methods. Examples are clustering patents based on their learned representations, or visualizing patent similarity by applying dimensionality reduction [68] to the embedding space. Word embeddings in particular, i.e., the representation of words as dense vectors, can also be used as input for traditional machine learning methods [5,34,36,39,49,94,102,103].

Before deep learning revolutionized machine learning in many areas, extracting and selecting appropriate features was a major task and a crucial processing step for traditional machine learning approaches. Text and image data especially needed to be transformed into features that could be used by methods such as support vector machines or decision trees. These types of features are still valuable when available, e.g., as metadata or citation information. However, in the context of deep learning, the raw input text or image data is typically represented by learned representations, usually called embeddings, which automates the cumbersome feature extraction and selection process. It is also possible to combine traditional features with embedding representations, either by combining the different input data or by combining the representations. In the literature on deep learning for patent analysis, three types of traditional features are used: numerical, citation, and raw image features. Alternatively, and more commonly for deep learning approaches, different embeddings are used to represent the input data.

3.1. Numerical features

These are quantities that can be extracted either manually or automatically from patent data. Examples are citation counts [61,104], dates [61], number of claims [59,104], or other numeric quantities derived from metadata [59,61,104], but also categorical features represented as one-hot encoded vectors [3], e.g., class codes [3,61].
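A minimal sketch of how such numerical and categorical features might be assembled into a single input vector is given below. The record fields, the reference year, and the choice of the eight IPC sections (A to H) as the categorical vocabulary are illustrative assumptions, not taken from the cited papers. Because a patent can carry several class codes, the sketch uses a multi-hot rather than a strict one-hot encoding; in practice, the raw counts would usually also be normalized before being fed to a neural network.

```python
from dataclasses import dataclass

# The eight IPC sections (A-H), used here as a small categorical vocabulary.
IPC_SECTIONS = ["A", "B", "C", "D", "E", "F", "G", "H"]


@dataclass
class PatentRecord:
    # Hypothetical fields for illustration; real datasets name them differently.
    citation_count: int
    num_claims: int
    filing_year: int
    ipc_codes: list[str]


def encode_sections(ipc_codes: list[str]) -> list[float]:
    """Multi-hot encode the IPC sections of a patent (one slot per section)."""
    present = {code[0].upper() for code in ipc_codes if code}
    return [1.0 if section in present else 0.0 for section in IPC_SECTIONS]


def numerical_features(record: PatentRecord, base_year: int = 1976) -> list[float]:
    """Concatenate numeric quantities with the multi-hot class-code vector."""
    numeric = [
        float(record.citation_count),
        float(record.num_claims),
        float(record.filing_year - base_year),  # years since an arbitrary reference year
    ]
    return numeric + encode_sections(record.ipc_codes)


patent = PatentRecord(citation_count=12, num_claims=20, filing_year=2015,
                      ipc_codes=["G06F 16/33", "H04L 9/32"])
print(numerical_features(patent))
# [12.0, 20.0, 39.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
```

Only the first character of each code (the section) is used here; a finer granularity, such as the subclass, would simply enlarge the vocabulary.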

Table 4
Representations and the papers that make use of them.

Representation | Variant | Used in
NUM (numeric features) | | [3,59,61,104]
CIT (citation networks) | | [59,61,70,75]
IMG (image data) | | [41,48,60]
WE (word embeddings) | domain-specific, word2vec | [5,18,39,57,58,81,86]
 | domain-specific, fastText | [18,81,82]
 | domain-specific, integrated | [40,47]
 | general-purpose, word2vec | [2,3,5,14,23,24,32,34,36,49,59,61,63,67,75,94,102,103]
 | general-purpose, fastText | [2,23,49]
 | general-purpose, GloVe | [2,18]
DE (document embeddings) | | [14,23,36,45,63,87]
GE (graph embeddings) | | [23,59,61,75]
CTX (contextual word embeddings) | | [23,54–56,79]

3.2. Citation networks

Another valuable data source for patent analysis is the references. They can be extracted to construct a citation network. This network contains a lot of interesting information, which can be combined with other data, such as texts or images, within deep learning models. The citation network can be used directly as input for further processing, e.g., to detect communities [70], or as input to learn a representation using graph embedding methods [59,61,75].

3.3. Image data

Deep learning models do not require the extraction of features beforehand. This makes deep learning very successful when it comes to image data, where the extraction of salient features is especially cumbersome and using deep learning to learn suitable representations is extremely beneficial. The raw pixels of images with sizes between 100 × 100 and 300 × 300 pixels serve as input [41,48,60].
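The following sketch illustrates what using raw pixels as input amounts to: a grayscale drawing is resized to a fixed resolution within the range mentioned above, and its 8-bit intensities are scaled to [0, 1]. It assumes images given as nested lists of integers and uses simple nearest-neighbour resizing to stay self-contained; real pipelines would typically rely on an imaging library, and the convolutional network that consumes the resulting array is not shown.

```python
def resize_nearest(image: list[list[int]], height: int, width: int) -> list[list[int]]:
    """Nearest-neighbour resize of a 2-D grayscale image to height x width."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


def to_model_input(image: list[list[int]], size: int = 100) -> list[list[float]]:
    """Resize to size x size and scale 8-bit intensities (0-255) to [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in resize_nearest(image, size, size)]


# Synthetic 300 x 200 "drawing": a white diagonal stroke on a black background.
drawing = [[255 if abs(r - c) < 2 else 0 for c in range(200)] for r in range(300)]
pixels = to_model_input(drawing, size=100)
print(len(pixels), len(pixels[0]))  # 100 100
```

The fixed output size matters because a standard convolutional network expects every input in a batch to have the same shape, whereas patent drawings come in arbitrary dimensions.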

3.4. Word embeddings

Representing words in a meaningful way, in contrast to one-hot encoded bag-of-words vectors, is one of the reasons for the success of deep learning for natural language processing. Word embeddings [13] are dense vectors that are learned from large text collections by looking at the context of each word. The most widely used embedding method, in the patent domain and beyond, is word2vec [69]. It learns vector representations of words in an unsupervised fashion: by training on a large collection of documents, words with similar meanings are assigned similar vectors. Two other word embedding methods have been used for patents, namely fastText [9] and GloVe [71]. Instead of using pre-trained embeddings, it is also possible to integrate this step into a larger deep learning framework and learn embeddings implicitly [40,47]. Further, one can distinguish the approaches based on the document collection that was used to learn the word vectors. Domain-specific embeddings were trained on a collection of patents or related documents, while general embeddings were trained on huge amounts of general text, e.g., Wikipedia and news articles. These pre-trained general word vectors can be downloaded from the Web.17 Domain-specific word vectors have the advantage that they also contain representations of highly domain-specific words. This is especially important in the patent domain. Word vectors trained on millions of patent documents are also available for download18 and can be used directly to represent words in the patent domain for downstream tasks. Alternatively, one can learn domain-specific representations by training these methods on selected document collections. Table 4 lists the different methods used in the corresponding papers.

3.5. Document (sentence/paragraph) embeddings

When it comes to representing phrases, sentences, paragraphs, or whole documents, various methods have been proposed. The simplest approach takes the average of the word embedding vectors of a text, ignoring the order of the words in the text. Given that the meaning of a sentence is more than the sum of the meanings of its words, this simple approach cannot capture subtle semantic differences at the sentence level. More sophisticated approaches, such as doc2vec [51], try to capture the semantics not only of individual words but also of longer text parts (sentences, paragraphs, or documents). With the advent of more complex encoder neural network architectures, a variety of sentence embeddings [15,25,78] has been developed. In general, sentence embeddings outperform word embeddings on natural language processing downstream tasks [15]. Their disadvantage is the complexity of the underlying models and thus their need for larger training datasets and more computational power.

3.6. Graph embeddings

The idea of representing text data as semantically meaningful vectors has also been adopted to represent graph data [33,72]. Besides these general approaches to embedding graphs [12], researchers in the patent domain came up with their own methods to embed graph data prevalent in patent documents, e.g., in the shape of citation networks [59,61,75], or adopted other approaches [23,84].

3.7. Contextual word embeddings

The most recent deep learning methods to represent textual data are context-dependent word embeddings [28]. The most popular of these methods is BERT [27], which is based on the transformer architecture [95]. It learns an individual vector for each word in a sentence, dependent on the other words in the sentence. Alternative approaches are ULMFiT [38] and ELMo [73], and, especially for text generation, the different versions of GPT [11]. On the one hand, these contextual word embedding models have from roughly a hundred million to billions of parameters and need a lot of training data. On the other hand, they can be trained in parallel, and they represent textual data very accurately, which makes it easy to achieve strong results on downstream tasks, e.g., patent classification, by just adding one more (fully connected, dense) layer on top of the representation-generating deep neural network. This step is called fine-tuning and has been applied very successfully to many NLP tasks: you take a pre-trained (general) BERT model, replace its last layer, and fine-tune it on task-specific data. Google has recently released19 a BERT model pre-trained on over 100 million patent publications from the U.S. and other countries [83]. Close to the patent domain is LEGAL-BERT [16], which was pre-trained on court cases and
legislation documents. Besides BERT, which becomes more and more
popular in the patent domain [23,54,56,79], a recent approach tries to
17
word2vec: https://code.google.com/archive/p/word2vec/; fastText: use GPT to generate patent claims [55].
https://fasttext.cc/docs/en/crawl-vectors.html; GloVe: https://nlp.stanford.
edu/projects/glove/.
18 19
https://hpi.de/naumann/projects/web-science/deep-learning-for-text/pa https://cloud.google.com/blog/products/ai-machine-learning/how-ai-imp
tent-classification.html. roves-patent-analysis.
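The simplest document-embedding approach from Section 3.5, averaging word vectors, can be sketched in a few lines of plain Python. The word vectors below are hypothetical toy values (real word2vec, fastText, or GloVe vectors have a few hundred dimensions), and `average_embedding` and `cosine_similarity` are illustrative helper names, not functions of any library. The example also makes the limitation noted above visible: shuffling the words of a text does not change its averaged vector.

```python
import math

# Hypothetical toy word vectors (3 dimensions for readability); real
# word2vec, fastText, or GloVe vectors usually have a few hundred dimensions.
WORD_VECTORS = {
    "lithium": [0.7, 0.0, 0.3],
    "battery": [0.9, 0.1, 0.0],
    "cell": [0.8, 0.2, 0.1],
    "anode": [0.6, 0.1, 0.4],
    "engine": [0.0, 0.9, 0.2],
}


def average_embedding(tokens, vectors):
    """Document vector = mean of the vectors of all known tokens.

    Word order is ignored, and out-of-vocabulary tokens are skipped.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("none of the tokens has a word vector")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc = average_embedding("lithium battery cell".split(), WORD_VECTORS)
    shuffled = average_embedding("cell lithium battery".split(), WORD_VECTORS)
    # Same words in a different order: identical vector up to float rounding.
    print(cosine_similarity(doc, shuffled))
    # A related term scores higher than an unrelated one.
    print(cosine_similarity(doc, WORD_VECTORS["anode"]))
    print(cosine_similarity(doc, WORD_VECTORS["engine"]))
```

Because the mean is symmetric in its inputs, any two texts with the same bag of words collapse to the same point, which is exactly why order-aware methods such as doc2vec or contextual embeddings were developed.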

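Section 3.6 leaves open how a graph is turned into vectors. One common family of methods, random-walk-based embeddings in the style of DeepWalk, turns a graph into "sentences" of node IDs that a word-embedding method such as word2vec can then treat like text. We do not claim that [33,72] or the patent-specific methods cited above work exactly this way. The sketch below only generates such walks over a small, hypothetical citation graph; `CITATIONS` and `random_walks` are illustrative names.

```python
import random

# Hypothetical toy citation graph: each patent ID maps to the IDs it is
# linked to. Links are listed in both directions (treated as undirected).
CITATIONS = {
    "P1": ["P2", "P3"],
    "P2": ["P1", "P3"],
    "P3": ["P1", "P2", "P4"],
    "P4": ["P3", "P5"],
    "P5": ["P4"],
}


def random_walks(graph, walks_per_node=2, walk_length=5, seed=42):
    """Generate uniform random walks; each walk is a list of node IDs.

    Every node starts `walks_per_node` walks; a walk stops early only if
    it reaches a node without neighbors.
    """
    rng = random.Random(seed)  # fixed seed for reproducible walks
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(graph):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


if __name__ == "__main__":
    for walk in random_walks(CITATIONS):
        print(" ".join(walk))
```

Patents that are close in the citation network co-occur in the same walks, just as related words co-occur in sentences. The walks could then be passed as training sentences to a word2vec implementation (e.g., the `sentences` argument of gensim's `Word2Vec`), which is omitted here to keep the sketch free of third-party dependencies.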
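To make the idea of "one more dense layer on top" of a contextual representation concrete, here is a minimal, dependency-free sketch that trains a single fully connected softmax layer with stochastic gradient descent on fixed input vectors. The vectors are hypothetical stand-ins for document representations produced by a pre-trained encoder such as BERT, the class ids are hypothetical, and `DenseSoftmaxHead` is an illustrative name. This only trains the new layer; full fine-tuning of a model such as BERT usually updates the pre-trained weights as well, which this sketch does not do.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class DenseSoftmaxHead:
    """One fully connected layer plus softmax (hypothetical helper class)."""

    def __init__(self, in_dim, n_classes, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-0.01, 0.01) for _ in range(in_dim)]
                        for _ in range(n_classes)]
        self.bias = [0.0] * n_classes

    def probabilities(self, x):
        logits = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(self.weights, self.bias)]
        return softmax(logits)

    def predict(self, x):
        probs = self.probabilities(x)
        return max(range(len(probs)), key=probs.__getitem__)

    def fit(self, xs, ys, lr=0.5, epochs=200):
        """Stochastic gradient descent on the cross-entropy loss.

        Only this layer's weights and biases are updated; the input
        vectors (the "pre-trained representations") stay fixed.
        """
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                probs = self.probabilities(x)
                for k, row in enumerate(self.weights):
                    # d(cross-entropy)/d(logit_k) = p_k - 1[k == y]
                    grad = probs[k] - (1.0 if k == y else 0.0)
                    for i, xi in enumerate(x):
                        row[i] -= lr * grad * xi
                    self.bias[k] -= lr * grad
        return self


if __name__ == "__main__":
    # Hypothetical fixed document vectors with hypothetical class ids 0, 1, 2.
    reps = [[0.9, 0.1, 0.0], [0.8, 0.0, 0.1],
            [0.1, 0.9, 0.0], [0.0, 0.8, 0.2],
            [0.1, 0.0, 0.9], [0.0, 0.2, 0.8]]
    labels = [0, 0, 1, 1, 2, 2]
    head = DenseSoftmaxHead(in_dim=3, n_classes=3).fit(reps, labels)
    print([head.predict(x) for x in reps])
```

The gradient used in `fit` is the standard one for softmax with cross-entropy, so the same update rule applies whatever encoder produced the input vectors.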

4. Deep learning architectures

Before we dive into the different architecture types, we give a brief introduction to the basic concepts of deep learning. For a much more detailed introduction, we recommend Goodfellow et al. [30] or Alom et al. [4] for a general introduction, or Young et al. [99] for an introduction specific to deep learning for NLP.

4.1. Deep learning basics

Deep learning can be seen as learning a transformation of the input data into the output data [30]. In contrast to traditional machine learning, which typically relies on hand-crafted features, this transformation, including the intermediate representations of the data, is learned directly from the data. Apart from that, deep learning shares its basic building blocks with the supervised machine learning process.

• Input data: In the patent domain, input data can take the form of texts, images, or references from patent documents, but can also come from external sources, such as court documents or citation counts.
• Expected output data: The output is task specific: one or multiple, discrete or continuous variables. For example, for a classification task, the outputs are class labels, e.g., the classes that should be assigned to a patent.
• Model: A model, such as a complex neural network or a naive Bayes classifier, whose parameters are adjusted based on the input data and expected output data, is the final result of the learning process.
• Metric: A way to measure the progress of the model is necessary, based on a task-specific loss function (e.g., cross-entropy for classification) or evaluation measure (e.g., the fraction of correct classifications).

Deep learning is a special kind of machine learning that uses multi-layer artificial neural networks. Learning means that the parameters of the network, i.e., the weights of the layers, are modified such that a loss function is minimized over a set of training samples (the training dataset). The loss function needs to be differentiable so that stochastic gradient descent can be used to find a local minimum of the function. If the training data comprises labeled ground-truth pairs of inputs and expected outputs, this is called supervised learning, in contrast to unsupervised learning, which does not require such labels. The word deep in deep learning refers to the number of stacked layers, which learn a hierarchical representation, most prominently in computer vision tasks. To summarize, the core principle of neural networks is to learn representations of the given input data in a layered transformation that maps input data to output data.

Deep learning methods evolved from more basic neural networks. While the history of artificial neural networks goes back more than 70 years, we focus on the most recent developments, starting with convolutional neural networks. Basic artificial neural networks are similar to other traditional machine learning approaches, such as support vector machines or decision trees: they require numerical (hand-crafted) features as input, show similar performance, and can be applied to similar tasks. Nevertheless, these basic neural networks laid the groundwork for modern deep learning, and several research papers use them to solve patent analysis tasks [19,44,50,53,91–93,104]. We briefly introduce the main deep neural network architectures in the following. For a more detailed introduction, we refer to the original papers for the individual architectures referenced in the corresponding subsections.

4.2. Deep neural network architectures

Although the breakthrough of deep learning, especially in computer vision and in natural language processing, happened rather recently, the ideas and basic concepts of deep learning are older [85]. Today, the terms deep learning and deep neural networks typically refer to convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The difference between these two architectures is that they are tailored to recognizing different kinds of patterns in the data.

Deep neural networks can be used either to compute representations of the data, e.g., word embeddings or contextual word embeddings (BERT, etc.) as discussed earlier, or to solve a specific task. In this section, we focus on the architectures used to solve a task, i.e., to classify, predict, cluster, etc. A variety of deep learning architectures have been developed, which are tailored to different problem settings. Table 5 lists the most popular neural network architectures used in patent analysis.

Table 5
Deep learning architectures classified by the primary layer type employed and the papers that make use of them.
DL Architecture | Used in
FC (fully connected network) | [3,23,24,61,63,75,104]
CNN (convolutional neural network) | [2,24,41,48,57–61,63,67,75,105]
RNN ((simple) recurrent neural network) | [40]
LSTM (long short-term memory network) | [3,18,24,32,47,60,70,80,86,87]
GRU (gated recurrent unit network) | [18,63,67,81,82,105]
SEQ2SEQ (sequence-to-sequence network) | [40,60,80]
GAN (generative adversarial network) | [104]
AE (autoencoder network) | [45]
TRANS (transformer-based network) | [23,54–56,79]

Fully connected network: The most basic deep neural network architecture consists of dense layers (also called fully connected layers), i.e., each neuron in one layer is connected to each neuron in the previous layer. In the context of patent analysis, networks consisting only of dense layers are rarely used. More often, these layers are used to combine the outputs of other networks [3,23,24,61,63,75].

Convolutional neural network: CNNs [52] are predominantly used for image classification because their architecture allows them to recognize local, spatial patterns, e.g., in a group of neighboring pixels. But they can also be employed for one-dimensional data, e.g., text. CNNs have fewer parameters to train than more complex architectures such as RNNs, which makes them attractive for textual data as well. In the context of patent analysis, CNNs have been deployed to solve different tasks related to image data [41,48,60] as well as text data [2,24,57–59,61,63,67,75,105].

Recurrent neural network: RNNs were developed to work with sequences of data (time series, text sequences). They can memorize earlier parts of a sequence for the classification of later parts. They are typically based on either long short-term memory (LSTM) cells or gated recurrent units (GRUs), but there is also a simple version lacking these complex extensions. Simple RNNs are used, e.g., for machine translation of patents [40].

Long short-term memory network: Neural networks based on LSTMs [37] are the most popular RNNs and are frequently used for patent analysis tasks [3,18,24,32,47,60,70,80,86,87].

Gated recurrent unit network: Compared to LSTMs, networks using gated recurrent units have fewer parameters. From an application point of view, the difference to LSTMs is often negligible, as their performance is comparable. As a result, GRUs are also frequently used for patent analysis [18,63,67,81,82,105].

Sequence-to-sequence network: SEQ2SEQ network architectures were developed in the context of machine translation [21,90]. These architectures can learn a “translation” from generic input to output data and are therefore not limited to machine translation. A sequence-to-sequence network consists of two components: an encoder and a decoder. In the context of patent analysis, SEQ2SEQ architectures are used in various contexts [40,45,80]. Some of the architectures allow choosing suitable networks for the encoder and the decoder parts. One approach, e.g., uses a CNN to encode image data and then an LSTM to decode it into output text data [60].

Generative adversarial network: GANs consist of two sub-networks: a generative network, which generates candidates, and a discriminative network, which aims to distinguish generated candidate


samples from real data [31]. Roughly speaking, the two sub-networks compete during the training process and thereby improve each other’s parameters. One major area of application for GANs is the generation of authentic-looking but artificially generated data. In the context of patent data, GANs have not been used for this task so far. One reason might be the low quality of the resulting texts. Even when the problem of the non-differentiable selection of the next token in a sentence generated with an RNN is solved [100], the output of these GAN models is far from genuine-looking text. Besides texts, GANs can also be used to generate other types of data. One approach uses GANs to generate the features of artificial samples, creating more training data for standard machine learning approaches in the patent domain [104].

Autoencoder network: The main idea behind AEs [96] is to learn a condensed, lower-dimensional representation of the input data in an unsupervised fashion. To this end, the autoencoder tries to reconstruct the encoded data in a decoding step. It learns to represent the data in a way that keeps the reconstruction error minimal. In the patent domain, AEs have not been employed very frequently [45]. Variational AEs [46] extend plain autoencoders and allow generating data, but have not yet been employed for patent data.

Transformer-based network: Transformer models [95] were developed in the context of machine translation and consist of an encoder and a decoder. As such, they form the foundation of contextual word embedding models, such as BERT [27], which is based on a transformer’s encoder, or GPT [11], which is based on a transformer’s decoder. These models learn powerful representations (and are therefore categorized as “representation” in this survey). By exchanging the last layer of BERT for a task-specific layer, these transformer-based network architectures can be trained on different tasks. This step is called fine-tuning. Since the underlying representations are so powerful, one fully connected layer on top is often sufficient to get very good results. In the patent domain, researchers have fine-tuned BERT for different tasks [23,54,56,79]. There have also been experiments using GPT to generate patent texts [55].

Besides these main concepts, there are a couple of further improvements and specializations. An important one is the attention mechanism [7], which allows neural networks to learn which part of the input is most relevant for the desired output. A side effect of the attention mechanism is that the words on which attention is placed can serve as an explanation for the network’s output, e.g., to explain a classification decision. Attention can be used on top of convolutional or recurrent layers. It is especially popular in combination with sequence-to-sequence network architectures. Architectures based only on attention, without an underlying LSTM or GRU layer, can be trained with better parallelization and can thus process more data in less time. Transformer-based networks make heavy use of attention, which makes it feasible to train huge models.

5. Patent analysis tasks

There are different patent analysis tasks that have been at least partially automated in the past. Table 6 lists the different tasks together with the publications that propose deep learning methods to automate them.

Table 6
Patent analysis tasks and the papers that addressed them.
Analysis Task | Used in
SUP (supporting tasks) | [14,18,34,39,40,47,86,102,103]
CLASS (classification) | [2,32,56–58,63,67,80–82,87,105]
RETR (retrieval) | [3,5,23,36,45,49,57,63,75,79]
QUAL (quality analysis and market valuation) | [24,59]
TECH (technology forecasting) | [70,104]
GEN (data generation) | [54,55]
LIT (litigation analysis) | [61,94]
CV (computer vision) | [41,48,60]

5.1. Supporting tasks

There are a couple of tasks that can be considered supporting tasks, in the sense that they produce results that can be used to analyze patents in a later step. Among these supporting tasks, three broader areas have been investigated for patent data using deep learning methods: extraction of information from patents, segmentation of patent documents into semantically meaningful smaller parts, and translation of patents from a foreign language.

Extraction: Named entity extraction is a very important task in various domains, e.g., for news articles or tweets. In the patent domain, automatically extracting chemical named entities [102] or biomedical named entities [86] is particularly interesting. Not only entity mentions but also the relations between pairs of entities can be extracted [18]. Besides entities, extracting general keywords from patent texts can also be very useful, e.g., for classification [39].

Segmentation: Patents and patent applications are semi-structured documents consisting of different sections. If patents are not available in electronic form, this structure might get lost. Deep learning based representations can be used to segment large OCRed texts into predefined sections [14]. But even within a large section, such as the “description” section, text can often be further segmented, primarily into the part describing the invention and the part describing experiments [34].

Translation: If patents are only available in a certain language, translating them is the first step toward further analysis. Given the huge success of deep learning methods in machine translation, it comes as no surprise that there has been research on translating patent texts [40,47].

These supporting tasks are useful as a preparation step, after which the results can be analyzed further or used in subsequent steps, e.g., for classification.

5.2. Classification

Patent classification is the most prominent patent analysis task, where one needs to assign classification codes to a patent document based on the IPC or CPC classification schemes. In practice, patent documents are analyzed manually, and the classification codes are assigned by the applicant and patent officers. These manual labeling tasks require domain expertise and are time-consuming. Besides traditional machine learning, deep learning techniques can be used for automatic patent classification. Since the classification schemes are hierarchical, different variants of the task exist, e.g., predicting only the top-level class. This is the simplest version and not very useful in practice. More interesting settings require predicting the classification code down to a certain level of the hierarchy, e.g., the subclass. This setting was also used in large shared-task competitions (e.g., CLEF-IP 2010 [8]). Regarding the evaluation, different variants are possible as well. Given that a patent can have multiple subclasses assigned to it, three measures can be deduced [29]: The most straightforward measure compares the top prediction with the main subclass assigned to the patent. Another measure checks whether the main subclass is among the top three predictions. The third measure compares the top prediction with the main subclass and the incidental subclasses assigned to the patent, counting it as correct if it matches any of them.

In the context of machine learning for patent analysis, classification is by far the most popular task. One reason for this is the availability of large quantities of training data, i.e., patent documents with assigned class labels. Another reason is the straightforward setting: given a document, predict the subclass codes. The deep learning approaches therefore differ only slightly, e.g., with respect to the data they use as input (abstract, claims, metadata, etc.) or the network architecture (GRU, LSTM, etc.). Some approaches also explicitly model the hierarchy of the classification codes [80]. Besides predicting classes based on a classification scheme, other classification tasks are possible when analyzing patent data. One approach tries to classify citations as applicant-provided or examiner-provided [63]. Another approach uses


classification to train representations to improve clustering later on [75]. In general, predefined class labels can be used to learn semantically meaningful representations, since the class labels work as a very short summary of the patent itself [57].

5.3. Retrieval

Finding patents is important for a variety of reasons and with different intentions. In our description of the retrieval subtasks, we mostly follow Shalaby and Zadrozny [88], who provide a good overview of patent retrieval tasks. In addition, we include passage retrieval and clustering as further subtasks, since a couple of papers are dedicated to them.

The most obvious subtask is finding prior art for a given patent (application). Finding patents related to a specific area or topic, often called landscaping, is another important retrieval task. Finding particular sentences or paragraphs within patent documents, rather than whole patents, is called passage retrieval. Finally, a more general, indirect retrieval task is clustering patents: for each patent, the most similar patents need to be identified.

Prior art search: Prior art (also called state of the art or background art) comprises all information that was made publicly available in any form before a given date and that might be similar to a patent’s claims. If prior art already describes the same invention as a newly filed application, a patent on that invention cannot be granted. Patents mistakenly granted after the publication of such prior art can be revoked. There are different reasons for conducting a prior art search: A related work search is done by the patent applicant to find and list related patents. Novelty detection (or patentability search) is carried out by patent applicants, patent examiners, patent attorneys, and patent agents. They search patent and patent application databases as well as other scientific literature to assess the novelty of an invention. Novelty detection takes place both before and after an inventor files a patent application. Validity detection tries to discover prior art overlooked by the patent examiner in order to invalidate a patent. A validity check is carried out by alleged infringers or by patent owners, who search patent and patent application databases, other scientific literature, technical society websites, and archives, usually after a patent has been issued. An infringement or freedom-to-operate search is a special form of prior art search whose purpose is to discover whether the claims of patent applications and patents are infringed by a process or product. It is carried out by patent attorneys and professional patent searchers (often directed by attorneys) in patent and patent application databases. This form of search is done both before and after an inventor is granted a patent. Traditionally, the whole process of prior art search was carried out through expert-generated queries or term-based search methods. These approaches consume a lot of human labour, require domain expertise, and often produce sub-optimal results. To alleviate these problems, several deep learning approaches have been proposed [5,36,49].

Landscaping: Related to prior art search is automated patent landscaping [3]. Patent landscaping helps in finding technology-related patents to avoid infringement issues and to assess trends in technology. Changes in technology may have several implications for business, the economy, and policy. Conducting a careful technology survey has been a complex and time-consuming process. Abood and Feltenberger [3] proposed an approach to patent landscaping using 5.9 million USPTO patent abstracts, citations, and CPC codes. Seed sets of patent documents are generated by human experts and further used to perform feature extraction and create embeddings. Choi et al. [23] presented benchmark datasets from KISTA trend reports. They propose the use of graph embeddings based on metadata, such as USPC, IPC, or CPC codes. Another approach uses citation information to find patents on similar technologies [63].

Passage Retrieval: The workshops of CLEF-IP 2013 and NTCIR 2005 provide datasets for passage retrieval. In this task, relevant passages (paragraphs) need to be retrieved from a patent document. For example, an instance of this task might consist of a patent application and a prior art patent. Only those passages of the prior art patent that are relevant for judging the novelty of the application’s claims are to be retrieved. Since this task is complex and difficult even for experts, few have tried to automate it. Only one very recent approach has been proposed, which uses EPO search reports to learn to match claims and paragraphs [79]. It is based on contextual word embeddings and trained with positive and negative examples of matching paragraphs and claims.

Clustering: Clustering is the unsupervised grouping of patents based on a similarity measure. In contrast to classification, where class labels exist that can be learned, clustering does not need any labels. In the context of patent analysis, clustering is often used in combination with visualization methods, grouping similar patents close together in a vector space and then visualizing a 2-dimensional projection of this space. It is often possible to use semantically rich embeddings directly to compute the similarity between words, sentences, or documents [57]. The evaluation of clustering is more difficult, since no ground-truth labels exist. Reports from patent offices, e.g., from KIPO, can be used to this end [45]. Simple algorithms, such as k-means, can be used to cluster the patents based on similarity, and the results can be visualized using dimensionality reduction methods [75], such as t-SNE [68].

5.4. Quality analysis and market valuation

Analysing the quality of patents plays a major role in determining the economic value of a patent portfolio. A quality analysis and evaluation typically relies on domain expertise, technical knowledge, and other factors, such as market and finance strategies. Such an analysis is of great interest to patent applicants, venture capitalists, policymakers, and business organisations. Although several metrics are commonly accepted for measuring the quality of an invention, establishing global metrics across the major patent offices is challenging. Major offices, such as the USPTO and the EPO, approach the problem with custom metrics. For instance, the USPTO proposed indicators with respect to product, process, and perception.20 Others [35] identified forward citations as a reliable indicator of a patent’s value. Other indicators include the quality of the claims, the family size of the patent, and the validity of the patent.

20 https://www.uspto.gov/patent/initiatives/quality-metrics-1.

Approaches using deep learning are still very rare. One approach focuses on the citation network of a patent to assess its value [59]. Another approach tries to predict the number of forward citations as an indicator of patent value, using abstract and claim text as well as hand-crafted features [24].

5.5. Technology forecasting

Technology forecasting gives both public and private enterprises an opportunity to anticipate upcoming technologies and secure their capital investments. Patent documents are a major source on which to base such predictions. Technology forecasting started with unsupervised approaches, which employed text and data mining techniques [17,20]. Several approaches [10,22] considered citation networks and Bayesian models for clustering to identify technology clusters. However, these unsupervised methods lack external domain knowledge and hence must incorporate domain experts’ interpretations in the end, which is time-consuming and costly.

One deep learning approach analyses citation networks and then tries to predict the number of future citations within a community using an LSTM [70]. Zhou et al. [104] proposed an approach to augment training


data using GANs to make up for a lack of annotated data. To this end, features are first extracted using classical methods; then, synthetic data resembling the extracted feature combinations is generated; and finally, classical machine learning methods are trained on the augmented data.

5.6. Data generation

The greatest progress from employing deep learning in the patent domain could be gained by automating the patent writing process itself. Advancements in language modeling and GANs make it at least theoretically possible to generate patents or patent claims automatically, given some seed information. Systems such as GPT-3 [11] have demonstrated their capabilities in generating text. The question remains how to improve the quality of the generated texts enough to make them useful for the patent domain.

In preliminary work, Lee [54] proposed a transformer model for generating claims. He proposes to fine-tune a GPT-2 model with personalized training data. In follow-up work [55], the proposed model was realized and patent claims were actually generated. The reported results are still very poor, leaving a lot of room for improvement.

5.7. Litigation analysis

Patent litigation is a legal process in which patents lead to a dispute between two companies, for example because a patent prevents a company from pursuing its business strategy. This process often helps companies to protect their profits and other proprietary values. The identification of patents that might induce litigation between companies is a tedious, costly, and time-consuming process, which is carried out manually. Litigation often involves various intentions, such as protecting market shares and product features, but also fighting competitors (patent wars). In the patent domain, several feature-engineering-based approaches have been proposed to automate the detection of litigation risk, e.g., using collaborative filtering [42]. Litigation risk is closely related to patent quality and is therefore sometimes considered to be one facet of it. We treat it as a separate task, reflecting the different type of input data that is necessary to assess litigation risk, namely legal documents in addition to patents.

Combining legal documents and patents can be done using deep learning methods. Liu et al. [61] made use of USPTO patents and Patexia lawsuits21 to train a model that predicts the risk of litigation, which influences the value of a patent. To this end, the authors proposed combining network embeddings learned from hand-crafted features with word embeddings, followed by a CNN, to learn a representation of patents. Tensor factorization is used to predict the probability of litigation based on this learned representation. Closely related to patent litigation is trademark litigation. One approach uses word embeddings to represent trademark case judgments [94]. Exploring the learned space can help to find relevant precedents. Clustering using k-means is further proposed to facilitate this search process.

5.8. Computer vision

Inventors use figures, flowcharts, or workflows to depict their invention. Analysing such image data is a great challenge, and specific tasks include classification, image captioning, and image-based retrieval.

Classification: A first step in analyzing images is classifying what is depicted in a figure into more specific classes, such as technical drawing, chemical structure, gene sequence, flowchart, etc. The results can be used to improve search or enable faceted search for

Captioning: Semantically understanding what is depicted in an image is a very active research field. One application is to automatically generate captions describing what can be seen in an image. If successful, these generated captions can be used for classification or retrieval, e.g., to find semantically similar images. Compared to standard image captioning, which is applied to photos, images in patents are much more difficult to describe meaningfully. One team of researchers proposed an image captioning model using a combination of a CNN and an LSTM [60]. The authors used design patents to train an encoder-decoder model. The CNN is pre-trained on ImageNet [26] and is used to encode design patent images into 300-dimensional vectors. Further, the text descriptions of the images are encoded with word embeddings of the same dimensionality, and a mapping from image features to word embeddings is learned.

Image-Based Retrieval: Image data are also used for patent retrieval based on the content of images. Especially when dealing with design patents, images are the best cue for finding relevant patents. This field is called content-based image retrieval [97]. Deep learning models can be used to improve over classical approaches. One approach [41] uses a dual VGG network [89] to learn representations of image pairs by minimizing the cosine distance between images that are similar according to their IPC class labels.

6. Literature discussion

In this section, we summarize our findings on the literature discussed above. Table 7 gives a matrix-style overview, highlighting the main ingredients of each identified related work article. The timeline on the left emphasizes the growing number of research papers using deep learning in the field of patent analysis.

Before 2016, automated patent analysis was conducted using traditional information retrieval and machine learning methods. While some approaches still use these traditional methods today, the vast majority of research in the patent domain happens in the field of deep learning. Deep learning models consistently outperform traditional approaches on perceptual tasks, i.e., tasks where semantic information, either from natural language or image data, plays an important role. Given the complexity of patents, especially compared to other textual data, such as news articles, product reviews, or tweets, this advantage becomes even larger. Deep learning methods are able to capture semantics much better and allow for a much more fine-grained analysis, e.g., in prior art search or classification. This comes at the cost of typically requiring much more annotated training data. This is especially true for the large contextual word embedding models with up to billions of parameters that need to be learned. One way to cope with this is transfer learning: Instead of being trained solely on in-domain training data, these complex models can be pre-trained on general text to capture common semantics and then only need to be fine-tuned on domain-specific training data. This concept is responsible for the large success of contextual word embeddings, such as BERT, ELMo, and ULMFiT, for domain-specific tasks.

We already discussed the different ingredients of the considered papers. Nevertheless, we want to briefly summarize the insights with regard to Table 7. While most research has used and still uses USPTO patents for experiments, patents from other countries are not ignored. The advantage of English-language patents, apart from their commercial international importance, is that pre-trained models are already available for English and it is easier to compare with other approaches when they report results on similar data. Further, the new possibilities of deep learning foster the combination of patent
particular figure types. One deep learning approach that tackles this task documents and non-patent literature, such as court documents or re
specifically for patents uses CNNs to classify patent images into different ports. We expect to see more of this especially for technology forecasting
categories [48]. and quality and market valuation.
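The trademark-precedent search described above [94] is one example of combining deep representations with court documents. The sketch below illustrates the general recipe in plain Python under stated assumptions: the three-dimensional `WORD_VECTORS` table and the abbreviated `JUDGMENTS` texts are hypothetical toy data, a judgment is represented by the mean of its word vectors, and k-means uses a deterministic farthest-point initialization so that the example is reproducible. It is not the implementation of [94].

```python
import math

# Hypothetical 3-dimensional word vectors (toy data, not from [94]); a real
# system would load embeddings trained on a large legal or patent corpus.
WORD_VECTORS = {
    "trademark": [0.9, 0.1, 0.0],
    "confusion": [0.8, 0.2, 0.1],
    "likelihood": [0.7, 0.3, 0.0],
    "dilution": [0.6, 0.0, 0.4],
    "patent": [0.1, 0.9, 0.1],
    "infringement": [0.2, 0.8, 0.2],
    "claim": [0.0, 0.9, 0.3],
    "damages": [0.1, 0.3, 0.9],
    "royalty": [0.0, 0.2, 1.0],
}
DIM = 3

# Hypothetical, heavily abbreviated judgment texts.
JUDGMENTS = [
    "likelihood of confusion trademark",
    "trademark dilution confusion",
    "patent infringement claim",
    "infringement damages royalty",
]


def embed(text):
    """Mean of the vectors of all in-vocabulary words (zero vector if none)."""
    vecs = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def kmeans(points, k, iterations=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its closest existing centroid.
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def find_precedents(query, judgments, k=2, top_n=2):
    """Return judgments from the query's cluster, most cosine-similar first."""
    vectors = [embed(j) for j in judgments]
    labels, centroids = kmeans(vectors, k)
    q = embed(query)
    cluster = min(range(k), key=lambda c: squared_distance(q, centroids[c]))
    ranked = sorted(
        ((cosine(q, v), j) for v, j, label in zip(vectors, judgments, labels) if label == cluster),
        reverse=True,
    )
    return [j for _, j in ranked[:top_n]]


if __name__ == "__main__":
    # prints ['likelihood of confusion trademark', 'trademark dilution confusion']
    print(find_precedents("trademark confusion", JUDGMENTS))
```

Restricting the cosine ranking to the query's cluster is what makes the clustering useful for search: only judgments in the same region of the embedding space are compared in detail. With real embeddings, the number of clusters `k` would have to be tuned.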

21 https://www.patexia.com/

Table 7
Survey Summary. NAT (national: Chinese, Japanese, or Russian patents), COLL (curated collections: NTCIR, CLEF-IP, or TREC-CHEM), REP (reports: Gartner, KISTA or EPO), LEGAL (post-grant documents), NUM (numeric features), CIT (citation networks), IMG (image data), WE (word embeddings), DE (document/paragraph embeddings), GE (graph/network embeddings), CTX (contextual word embeddings), FC (fully connected network), SEQ2SEQ (sequence-to-sequence network), AE (autoencoder network), TRANS (transformer-based network), SUP (supporting task: extraction, segmentation, or translation), CLASS (classification), RETR (retrieval: prior art search, landscaping, passage retrieval, or clustering), QUAL (quality and market valuation), TECH (technology forecasting), GEN (data generation), LIT (litigation analysis), CV (computer vision: captioning, classification, or image-based retrieval).

The vast majority of approaches use word embeddings as a deep learning method to represent patent documents. Some approaches combine word embeddings with other representations, such as graph embeddings or document embeddings. Only a very small number of papers deal with non-textual input, which is either citation information or image data. Numeric or discrete input data, e.g., patent metadata, is rarely used, and if so, then only in combination with embeddings. This is consistent with the promise of deep learning not requiring feature engineering and being able to extract the crucial information from the raw (textual) input data.
The employed deep learning architectures are more diverse. The classic network architectures, CNN and LSTM, are the most popular for conducting patent analysis research, but the more complex architectures are gaining momentum. ENDEC and GAN architectures are more specialized and therefore not suited for all tasks. Nevertheless, we expect to see more of those architectures, especially for difficult tasks. The large contextual word embedding models represent data extremely accurately and therefore only require simple dense neural network layers to fulfill their tasks. Often, these kinds of layers are also used to combine two architectures and different input data.
Classification is the most popular patent analysis task. In its basic form, it is also the easiest task and the task with the most annotated training data, given that all published patents have classes assigned to them. In addition, evaluating classifiers is rather simple since plentiful ground-truth datasets are available. Patent retrieval, with all its subtasks, is also very popular. The remaining identified tasks have been investigated only sparsely so far. One reason for this is the more difficult nature of these tasks. When even human experts do not agree on a solution, or the task requires a lot of common sense and background knowledge, automatic methods still have a hard time. However, we expect that more complex deep learning methods will be able to handle these difficult tasks better in the future.
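The point that large pre-trained models only need simple dense layers on top, together with the abundance of labeled classification data, corresponds to the simplest, feature-based form of transfer learning: keep the pre-trained representation frozen and train only a fully connected softmax layer. The following plain-Python sketch assumes hypothetical 2-D document embeddings (`TRAIN`) standing in for vectors from an encoder such as BERT, and uses IPC section letters as illustrative labels; in practice the encoder is often fine-tuned as well, using a deep learning framework.

```python
import math

# Hypothetical frozen document embeddings (2-D for readability) standing in for
# vectors produced by a pre-trained encoder; labels are IPC section letters.
TRAIN = [
    ([1.0, 0.1], "G"), ([0.9, -0.1], "G"),
    ([0.1, 1.0], "H"), ([-0.1, 0.9], "H"),
    ([-1.0, -0.9], "A"), ([-0.9, -1.1], "A"),
]


def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class DenseSoftmaxHead:
    """One fully connected layer plus softmax; the embeddings stay frozen."""

    def __init__(self, dim, labels):
        self.labels = list(labels)
        self.weights = [[0.0] * dim for _ in self.labels]
        self.bias = [0.0] * len(self.labels)

    def probabilities(self, x):
        scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
                  for w, b in zip(self.weights, self.bias)]
        return softmax(scores)

    def predict(self, x):
        probs = self.probabilities(x)
        return self.labels[probs.index(max(probs))]

    def fit(self, data, epochs=200, lr=0.5):
        """Full-batch gradient descent on the mean cross-entropy loss."""
        n = len(data)
        for _ in range(epochs):
            grad_w = [[0.0] * len(w) for w in self.weights]
            grad_b = [0.0] * len(self.bias)
            for x, y in data:
                probs = self.probabilities(x)
                for c, label in enumerate(self.labels):
                    # d(cross-entropy)/d(score_c) = p_c - 1[c is the gold label]
                    err = probs[c] - (1.0 if label == y else 0.0)
                    grad_b[c] += err / n
                    for i, x_i in enumerate(x):
                        grad_w[c][i] += err * x_i / n
            for c in range(len(self.labels)):
                self.bias[c] -= lr * grad_b[c]
                for i in range(len(self.weights[c])):
                    self.weights[c][i] -= lr * grad_w[c][i]
        return self


def accuracy(model, data):
    return sum(model.predict(x) == y for x, y in data) / len(data)


if __name__ == "__main__":
    head = DenseSoftmaxHead(dim=2, labels=["A", "G", "H"]).fit(TRAIN)
    print(accuracy(head, TRAIN), head.predict([0.8, 0.0]))
```

Because all published patents have classes assigned to them, a function like `accuracy` can be run directly against existing ground truth, which is one reason evaluation is comparatively easy for classification.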

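BERT, which the discussion above credits with much of the success of transfer learning, is built on the transformer's attention mechanism [95]. Its core operation, scaled dot-product attention, computes softmax(QK^T / sqrt(d_k)) V: every query row is compared with every key row, the similarities are normalized into weights, and the output is the correspondingly weighted average of the value rows. The following is a minimal single-head sketch in plain Python on hypothetical 2×2 matrices; it deliberately leaves out the learned projections, masking, and multi-head structure of a real transformer.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    out = []
    for row in m:
        top = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V; returns the outputs and attention weights."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = softmax_rows(scaled)
    return matmul(weights, v), weights


if __name__ == "__main__":
    # Hypothetical 2-token example: each query matches "its own" key best.
    Q = [[1.0, 0.0], [0.0, 1.0]]
    K = [[1.0, 0.0], [0.0, 1.0]]
    V = [[1.0, 2.0], [3.0, 4.0]]
    outputs, weights = scaled_dot_product_attention(Q, K, V)
    print(weights)
    print(outputs)
```

Dividing by sqrt(d_k) keeps the dot products from growing with the vector dimension, which would otherwise push the softmax towards nearly one-hot weights.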

7. Trends and conclusions

Currently, there is a trend towards training more complex neural network architectures with larger numbers of parameters and thus a need for larger training datasets. After Bidirectional Encoder Representations from Transformers (BERT) [27] and XLNet [98] with 340 million parameters, GPT-3 [11] pushed the limit to 175 billion parameters. Many architectures build on the underlying attention mechanism of transformer models [95]. These large models reveal their full potential for few-shot and zero-shot learning [76], e.g., with label embeddings [80]. It is a challenge to access and handle the required amounts of training data, e.g., more than 181 billion English words [11]. Thus, we see another trend: reducing the need for labeled task-specific training data with the help of transfer learning and training on auxiliary tasks. This development shows a direction from supervised to self-supervised/semi-supervised training.
For the specific research direction of patent analysis with deep learning, we envision new tasks. Patent text generation is a rather new task, and there are also almost no deep learning approaches for passage retrieval. The reason for this is the complexity of the task [79], which makes it much harder than standard document classification and requires other evaluation scenarios, e.g., a ranking of documents. Further, instead of full documents, subsections of documents need to be matched. Finally, there are no standard labels or annotations available, and extracting information from search reports that can be used as labels is challenging.
Another new task that we expect to become more relevant is litigation analysis. While the vision of an artificial intelligence that can handle the entire patent life-cycle (AI patent lawyer) is emerging on the horizon, today's machine-learned models can still be easily fooled if targeted. To the best of our knowledge, there is as yet no work on adversarial attacks in the patent domain, e.g., on image classification or text classification. We are confident that more semi-automated applications will be developed in research and will eventually find their way into industry in the near future.
With this survey, we summarized existing approaches that make use of deep learning for a variety of patent analysis tasks. While research in this area is still in its early stages, we outlined current trends of using various deep learning methods. We anticipate a shift in automated patent analysis away from classical machine learning towards more and more deep learning. Further, we gave an overview of the available datasets for supervised learning needed by these methods. We hope that our work fosters interest in deep learning for patent analysis and serves as a comprehensive survey for researchers and practitioners from academia and industry.

Author statement

Ralf Krestel: Conceptualization, Writing - Original Draft, Writing - Review & Editing, Visualization, Supervision, Project administration.
Renukswamy Chikkamath: Writing - Original Draft.
Christoph Hewel: Writing - Original Draft.
Julian Risch: Writing - Original Draft, Writing - Review & Editing.

Declaration of competing interest

None.

References

[1] A. Abbas, L. Zhang, S.U. Khan, A literature review on the state-of-the-art in patent analysis, World Patent Inf. 37 (2014) 3–13.
[2] L. Abdelgawad, P. Kluegl, E. Genc, S. Falkner, F. Hutter, Optimizing neural networks for patent classification, in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2019, pp. 688–703.
[3] A. Abood, D. Feltenberger, Automated patent landscaping, Artif. Intell. Law 26 (2018) 103–125.
[4] M.Z. Alom, T.M. Taha, C. Yakopcic, S. Westberg, P. Sidike, M.S. Nasrin, M. Hasan, B.C. Van Essen, A.A. Awwal, V.K. Asari, A state-of-the-art survey on deep learning theory and architectures, Electronics 8 (2019) 1–66.
[5] H. Aras, R. Türker, D. Geiss, M. Milbradt, H. Sack, Get your hands dirty: evaluating word2vec models for patent data, in: Proceedings of the Posters and Demos Track of the International Conference on Semantic Systems (SEMPDF), 2018, pp. 1–4.
[6] L. Aristodemou, F. Tietze, The state-of-the-art on Intellectual Property Analytics (IPA): a literature review on artificial intelligence, machine learning and deep learning methods for analysing intellectual property (IP) data, World Patent Inf. (WPI) 55 (2018) 37–51.
[7] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, in: Proceedings of the International Conference on Learning Representations (ICLR), 2015, pp. 1–11.
[8] J. Beney, Lci-insa linguistic experiment for clef-ip classification track, in: Proceedings of the Conference and Labs of the Evaluation Forum (CLEF), 2010, pp. 1–11.
[9] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword information, Trans. Assoc. Comput. Linguistics (TACL) 5 (2017) 135–146.
[10] A. Breitzman, P. Thomas, The emerging clusters model: a tool for identifying emerging technologies across multiple patent systems, Res. Pol. (RP) 44 (2015) 195–205.
[11] T.B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D.M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language Models Are Few-Shot Learners, 2020. ArXiv e-prints arXiv:2005.14165.
[12] H. Cai, V.W. Zheng, K.C.C. Chang, A comprehensive survey of graph embedding: problems, techniques, and applications, Trans. Knowl. Data Eng. 30 (2018) 1616–1637.
[13] J. Camacho-Collados, M.T. Pilehvar, From word to sense embeddings: a survey on vector representations of meaning, J. Artif. Intell. Res. 63 (2018) 743–788.
[14] D.S. de Carvalho, M.L. Nguyen, Efficient neural-based patent document segmentation with term order probabilities, in: Proceedings of European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), 2017, pp. 171–176.
[15] D. Cer, Y. Yang, S.y. Kong, N. Hua, N. Limtiaco, R.S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, et al., Universal Sentence Encoder, 2018. ArXiv e-prints arXiv:1803.11175.
[16] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, I. Androutsopoulos, LEGAL-BERT: the muppets straight out of law school, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing: Findings (EMNLP), 2020, pp. 2898–2904.
[17] C.K. Chang, A. Breitzman, Using patents prospectively to identify emerging, high-impact technological clusters, Res. Eval. 18 (2009) 357–364.
[18] L. Chen, S. Xu, L. Zhu, J. Zhang, X. Lei, G. Yang, A deep learning based method for extracting semantic information from patent documents, Scientometrics 125 (2020) 289–312.
[19] Y.S. Chen, K.C. Chang, Exploring the nonlinear effects of patent citations, patent share and relative patent position on market value in the US pharmaceutical industry, Technol. Anal. Strat. Manag. 22 (2010) 153–169.
[20] D. Chiavetta, A. Porter, Tech mining for innovation management, Technol. Anal. Strat. Manag. 25 (2013) 617–618.
[21] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder–decoder for statistical machine translation, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1724–1734.
[22] S. Choi, S. Jun, Vacant technology forecasting using new bayesian patent clustering, Technol. Anal. Strat. Manag. (TA&SM) 26 (2014) 241–251.
[23] S. Choi, H. Lee, E. Park, S. Choi, Deep Patent Landscaping Model Using the Transformer and Graph Embedding, 2019. ArXiv e-prints arXiv:1903.05823.
[24] P. Chung, S.Y. Sohn, Early detection of valuable patents using a deep learning model: case of semiconductor industry, Technol. Forecast. Soc. Change 158 (2020) 120–146.
[25] A. Conneau, D. Kiela, H. Schwenk, L. Barrault, A. Bordes, Supervised learning of universal sentence representations from natural language inference data, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017, pp. 670–680.
[26] J. Deng, W. Dong, R. Socher, L.J. Li, K. Li, L. Fei-Fei, Imagenet: a large-scale hierarchical image database, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248–255.
[27] J. Devlin, M.W. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2019, pp. 4171–4186.
[28] K. Ethayarajh, How contextual are contextualized word representations? comparing the geometry of BERT, ELMo, and GPT-2 embeddings, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing and the International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 55–65.
[29] C.J. Fall, A. Törcsvári, K. Benzineb, G. Karetka, Automated categorization in the international patent classification, in: Proceedings of the Special Interest Group on Information Retrieval (SIGIR), 2003, pp. 10–25.
[30] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT press, 2016.


[31] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2014, pp. 2672–2680.
[32] M.F. Grawe, C.A. Martins, A.G. Bonfante, Automated patent classification using word embedding, in: Proceedings of the International Conference on Machine Learning and Applications (ICMLA), 2017, pp. 408–411.
[33] A. Grover, J. Leskovec, node2vec: scalable feature learning for networks, in: Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), 2016, pp. 855–864.
[34] M. Habibi, A. Rheinlaender, W. Thielemann, R. Adams, P. Fischer, S. Krolkiewicz, D.L. Wiegandt, U. Leser, Patseg: a sequential patent segmentation approach, Big Data Res. 19–20 (2020) 100–133.
[35] D. Harhoff, F.M. Scherer, K. Vopel, Citations, family size, opposition and the value of patent rights, Res. Pol. 32 (2003) 1343–1363.
[36] L. Helmers, F. Horn, F. Biegler, T. Oppermann, K.R. Müller, Automating the search for a patent's prior art with a full text similarity search, PLoS One 14 (2019).
[37] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (1997) 1735–1780.
[38] J. Howard, S. Ruder, Universal language model fine-tuning for text classification, in: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2018, pp. 328–339.
[39] J. Hu, S. Li, Y. Yao, L. Yu, G. Yang, J. Hu, Patent keyword extraction algorithm based on distributed representation for patent classification, Entropy 20 (2018) 104–123.
[40] L. Jehl, S. Riezler, Document-level information as side constraints for improved neural patent translation, in: Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA), 2018, pp. 1–12.
[41] S. Jiang, J. Luo, G.R. Pava, J. Hu, C.L. Magee, A CNN-Based Patent Image Retrieval Method for Design Ideation, 2020. ArXiv e-prints arXiv:2003.08741.
[42] B. Jin, C. Che, K. Yu, Y. Qu, L. Guo, C. Yao, R. Yu, Q. Zhang, Minimizing legal exposure of high-tech companies through collaborative filtering methods, in: Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), 2016, pp. 127–136.
[43] H. Joho, L.A. Azzopardi, W. Vanderbauwhede, A survey of patent users: an analysis of tasks, behavior, search functionality and system requirements, in: Proceedings of the Symposium on Information Interaction in Context (IIiX), 2010, pp. 13–24.
[44] J. Kim, S. Lee, Forecasting and identifying multi-technology convergence based on patent data: the case of IT and BT industries in 2020, Scientometrics 111 (2017) 47–65.
[45] J. Kim, J. Yoon, E. Park, S. Choi, Patent document clustering with deep embeddings, Scientometrics 123 (2020) 1–15.
[46] D.P. Kingma, M. Welling, Auto-encoding variational bayes, in: Proceedings of the International Conference on Learning Representations (ICLR), 2014, pp. 1–14.
[47] S. Kinoshita, T. Oshio, T. Mitsuhashi, Comparison of smt and nmt trained with large patent corpora: Japio at wat2017, in: Proceedings of the Workshop on Asian Translation (WAT), 2017, pp. 140–145.
[48] A. Kravets, N. Lebedev, M. Legenchenko, Patents images retrieval and convolutional neural network training dataset quality improvement, in: Proceedings of the International Research Conference on Information Technologies in Science, Management, Social Sphere and Medicine (ITSMSSM), 2017, pp. 287–293.
[49] A. Krishna, Y. Jin, C. Foster, G. Gabel, B. Hanley, A. Youssef, Query Expansion for Patent Searching Using Word Embedding and Professional Crowdsourcing, 2019. ArXiv e-prints arXiv:1911.11069.
[50] M.N. Kyebambe, G. Cheng, Y. Huang, C. He, Z. Zhang, Forecasting emerging technologies: a supervised learning approach through patent analysis, Technol. Forecast. Soc. Change (TF&SC) 125 (2017) 236–244.
[51] Q. Le, T. Mikolov, Distributed representations of sentences and documents, in: Proceedings of the International Conference on Machine Learning (ICML), 2014, pp. 1188–1196.
[52] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86 (1998) 2278–2324.
[53] C. Lee, O. Kwon, M. Kim, D. Kwon, Early identification of emerging technologies: a machine learning approach using multiple patent indicators, Technol. Forecast. Soc. Change (TF&SC) 127 (2018) 291–303.
[54] J. Lee, Patent transformer: a framework for personalized patent claim generation, in: Proceedings of the JURIX Doctoral Consortium, 2019, pp. 1–13.
[55] J.S. Lee, J. Hsiang, Patent claim generation by fine-tuning openai gpt-2, World Patent Inf. (WPI) 62 (2020) 101983.
[56] J.S. Lee, J. Hsiang, Patent classification by fine-tuning bert language model, World Patent Inf. (WPI) 61 (2020) 101965.
[57] L. Lei, J. Qi, K. Zheng, Patent analytics based on feature vector space model: a case of iot, IEEE Access 7 (2019) 45705–45715.
[58] S. Li, J. Hu, Y. Cui, J. Hu, Deeppatent: patent classification with convolutional neural networks and word embedding, Scientometrics 117 (2018) 721–744.
[59] H. Lin, H. Wang, D. Du, H. Wu, B. Chang, E. Chen, Patent quality valuation with deep learning models, in: Proceedings of the International Conference on Database Systems for Advanced Applications (DASFAA), 2018, pp. 474–490.
[60] H. Liu, Q. Dai, Y. Li, C. Zhang, S. Yi, T. Yuan, The design patent images classification based on image caption model, in: Proceedings of the International Conference on Brain Inspired Cognitive Systems (BICS), 2019, pp. 353–362.
[61] Q. Liu, H. Wu, Y. Ye, H. Zhao, C. Liu, D. Du, Patent litigation prediction: a convolutional tensor factorization approach, in: Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI), 2018, pp. 5052–5059.
[62] K. Loveniers, How to interpret EPO search reports, World Patent Inf. (WPI) 54 (2018) 23–28.
[63] Y. Lu, X. Xiong, W. Zhang, J. Liu, R. Zhao, Research on classification and similarity of patent citation based on deep learning, Scientometrics (2020) 1–27.
[64] M. Lupu, A. Fujii, D.W. Oard, M. Iwayama, N. Kando, Patent-related tasks at ntcir, in: Current Challenges in Patent Information Retrieval, 2017, pp. 77–111.
[65] M. Lupu, J. Huang, J. Zhu, J. Tait, Trec-chem: large scale chemical information retrieval evaluation at trec, in: ACM SIGIR Forum, 2009, pp. 63–70.
[66] M. Lupu, F. Piroi, A. Hanbury, Aspects and analysis of patent test collections, in: Proceedings of the International Workshop on Patent Information Retrieval, 2010, pp. 17–22.
[67] L. Lyu, T. Han, A comparative study of Chinese patent literature automatic classification based on deep learning, in: Proceedings of the Joint Conference on Digital Libraries (JCDL), 2019, pp. 345–346.
[68] L.v.d. Maaten, G. Hinton, Visualizing data using t-sne, J. Mach. Learn. Res. (JMLR) 9 (2008) 2579–2605.
[69] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient Estimation of Word Representations in Vector Space, 2013. ArXiv e-prints arXiv:1301.3781.
[70] K. Nakai, H. Nonaka, A. Hentona, Y. Kanai, T. Sakumoto, S. Kataoka, E.C.A. Carreón, T. Hiraoka, Community detection and growth potential prediction using the stochastic block model and the long short-term memory from patent citation networks, in: Proceedings of the International Conference on Industrial Engineering and Engineering Management (IEEM), 2018, pp. 1884–1888.
[71] J. Pennington, R. Socher, C. Manning, Glove: global vectors for word representation, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.
[72] B. Perozzi, R. Al-Rfou, S. Skiena, Deepwalk: online learning of social representations, in: Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), 2014, pp. 701–710.
[73] M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer, Deep contextualized word representations, in: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2018, pp. 2227–2237.
[74] F. Piroi, A. Hanbury, Evaluating information retrieval systems on european patent data: the clef-ip campaign, in: Current Challenges in Patent Information Retrieval, 2017, pp. 113–142.
[75] J. Qi, L. Lei, K. Zheng, X. Wang, Patent analytic citation-based vsm: challenges and applications, IEEE Access 8 (2020) 17464–17476.
[76] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language models are unsupervised multitask learners, OpenAI Blog 1 (2019) 9.
[77] K. Rajshekhar, W. Zadrozny, S.S. Garapati, Analytics of patent case rulings: empirical evaluation of models for legal relevance, in: Proceedings of the International Conference on Artificial Intelligence and Law (ICAIL), 2017, pp. 1–9.
[78] N. Reimers, I. Gurevych, Sentence-BERT: sentence embeddings using siamese BERT-networks, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing and the International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 3982–3992.
[79] J. Risch, N. Alder, C. Hewel, R. Krestel, Patent Match: A Dataset for Matching Patent Claims with Prior Art, 2020. ArXiv e-prints arXiv:2012.13919.
[80] J. Risch, S. Garda, R. Krestel, Hierarchical document classification as a sequence generation task, in: Proceedings of the Joint Conference on Digital Libraries (JCDL), 2020, pp. 147–155.
[81] J. Risch, R. Krestel, Learning patent speak: investigating domain-specific word embeddings, in: Proceedings of the Thirteenth International Conference on Digital Information Management (ICDIM), 2018, pp. 63–68.
[82] J. Risch, R. Krestel, Domain-specific word embeddings for patent classification, Data Technol. Appl. (DTA) 53 (2019) 108–122.
[83] J.Y. Rob Srebrovic, Leveraging the BERT Algorithm for Patents with TensorFlow and Big Query, Technical Report, Google, 2020, https://services.google.com/fh/files/blogs/bert_for_patents_white_paper.pdf.
[84] B. Rozemberczki, R. Sarkar, Fast sequence-based embedding with diffusion graphs, in: Proceedings of the International Workshop on Complex Networks, 2018, pp. 99–107.
[85] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (1986) 533–536.
[86] F. Saad, Named entity recognition for biomedical patent text using bi-lstm variants, in: Proceedings of the International Conference on Information Integration and Web-Based Applications & Services (iiWAS), 2019, pp. 617–621.
[87] M. Shalaby, J. Stutzki, M. Schubert, S. Günnemann, An LSTM approach to patent classification based on fixed hierarchy vectors, in: Proceedings of the SIAM International Conference on Data Mining (SDM), 2018, pp. 495–503.
[88] W. Shalaby, W. Zadrozny, Patent retrieval: a literature review, Knowl. Inf. Syst. (KAIS) 61 (2019) 631–660.
[89] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: Proceedings of the International Conference on Learning Representations (ICLR), 2015, pp. 1–14.
[90] I. Sutskever, O. Vinyals, Q.V. Le, Sequence to sequence learning with neural networks, in: Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2014, pp. 3104–3112.
[91] A.J. Trappey, F.C. Hsu, C.V. Trappey, C.I. Lin, Development of a patent document classification and search platform using a back-propagation network, Expert Syst. Appl. 31 (2006) 755–765.


[92] A.J. Trappey, C.V. Trappey, U.H. Govindarajan, J.J. Sun, Patent value analysis using deep learning models: the case of IoT technology mining for the manufacturing industry, Trans. Eng. Manag. (2019) 1–13.
[93] A.J. Trappey, C.V. Trappey, C.Y. Wu, C.W. Lin, A patent quality analysis for innovative technology and product development, Adv. Eng. Inf. 26 (2012) 26–34.
[94] C.V. Trappey, A.J. Trappey, B.H. Liu, Identify trademark legal case precedents: using machine learning to enable semantic analysis of judgments, World Patent Inf. (WPI) 62 (2020) 101980.
[95] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2017, pp. 5998–6008.
[96] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.A. Manzagol, L. Bottou, Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res. 11 (2010).
[97] S. Vrochidis, S. Papadopoulos, A. Moumtzidou, P. Sidiropoulos, E. Pianta, I. Kompatsiaris, Towards content-based patent image retrieval: a framework perspective, World Patent Inf. (WPI) 32 (2010) 94–106.
[98] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R.R. Salakhutdinov, Q.V. Le, XLNet: generalized autoregressive pretraining for language understanding, in: Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2019, pp. 5753–5763.
[99] T. Young, D. Hazarika, S. Poria, E. Cambria, Recent trends in deep learning based natural language processing, IEEE Comput. Intell. Mag. 13 (2018) 55–75.
[100] L. Yu, W. Zhang, J. Wang, Y. Yu, Seqgan: sequence generative adversarial nets with policy gradient, in: Proceedings of the Conference on Artificial Intelligence (AAAI), 2017, pp. 2852–2858.
[101] L. Zhang, L. Li, T. Li, Patent mining: a survey, SIGKDD Explor. Newslett. 16 (2015) 1–19.
[102] Y. Zhang, J. Xu, H. Chen, J. Wang, Y. Wu, M. Prakasam, H. Xu, Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning, Database 2016 (2016).
[103] Q. Zhong, X. Qiao, Y. Zhang, Automatic indexing of patent right-claiming document based on deep learning, in: Proceedings of the International Conference on Applied Mathematics, Modelling and Statistics Application (AMMSA), 2018, pp. 135–139.
[104] Y. Zhou, F. Dong, Y. Liu, Z. Li, J. Du, L. Zhang, Forecasting emerging technologies using data augmentation and deep learning, Scientometrics 1–29 (2020).
[105] H. Zhu, C. He, Y. Fang, B. Ge, M. Xing, W. Xiao, Patent automatic classification based on symmetric hierarchical convolution neural network, Symmetry 12 (2020) 1–12.

Dr. Ralf Krestel is a senior researcher and head of the Web Science Group at Hasso Plattner Institute at University of Potsdam. His research centers around text mining, information retrieval, recommender systems, natural language processing, and machine learning. He studied computer science at University of Karlsruhe and Concordia University in Montreal. In 2012, he received his Ph.D. from the University of Hannover, Germany, for his work "On the Use of Language Models and Topic Models in the Web". Afterwards, he spent two years as a postdoctoral research fellow at University of California, Irvine. From 2019 to 2020, he held the chair of Intelligent Systems at University of Passau, Germany. He has co-authored more than 100 peer-reviewed articles and is a reviewer for various journals and conferences.


