
Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model

Manuel Carbonell∗†, Mauricio Villegas∗, Alicia Fornés† and Josep Lladós†


∗ omni:us, Berlin, Germany
{manuel,mauricio}@omnius.com
† Computer Vision Center - Computer Science Department
Universitat Autonoma de Barcelona, Spain
{afornes,josep}@cvc.uab.es

Abstract—When extracting information from handwritten documents, text transcription and named entity recognition are usually faced as separate subsequent tasks. This has the disadvantage that errors in the first module heavily affect the performance of the second module. In this work we propose to do both tasks jointly, using a single neural network with a common architecture used for plain text recognition. Experimentally, the work has been tested on a collection of historical marriage records. Results of experiments are presented to show the effect on the performance of different configurations: different ways of encoding the information, using or not transfer learning, and processing at text line or multi-line region level. The results are comparable to the state of the art reported in the ICDAR 2017 Information Extraction competition, even though the proposed technique does not use any dictionaries, language modeling or post-processing.

Keywords-Named entity recognition; handwritten text recognition; neural networks

I. INTRODUCTION

Extracting information from historical handwritten text documents in an optimal and efficient way is still an open challenge, since text in this kind of document is not as simple to read as printed characters or modern handwritten calligraphies [1], [2]. Historical manuscripts contain information that gives an interpretation of the past of societies. Systems designed to search and retrieve information from historical documents must go beyond literal transcription of sources. Indeed, it is necessary to shorten the semantic gap and extract semantic meaning from the contents; thus the extraction of the relevant information carried by named entities (e.g. names of persons, organizations, locations, dates, quantities, monetary values, etc.) is a key component of such systems. Semantic annotation of documents, and in particular automatic named entity recognition, is not a perfectly solved problem either [3].

Many existing solutions make use of Artificial Neural Networks (ANNs) to transcribe handwritten text lines and then parse the transcribed text with a Named Entity Recognition model, but the precision of those existing solutions still needs to improve [1], [2], [4]. One possible approach is to start with already segmented words, obtained by an automatic or manual process, and predict the semantic category using visual descriptors (cf. [5]). This has the benefit that, when the named entity prediction is correct, the transcription is much easier to predict correctly, since it restricts the language model to the corresponding category. The downside is that we rarely have large amounts of word-level segmented data, which is key for most ANNs to perform properly. If automatic word segmentation is needed, the whole information extraction process involves three steps, which will likely accumulate errors in each of them. Another, and the most common, option is to perform handwritten text recognition (HTR) first and then named entity recognition (NER). An advantage of this approach is that it has one step less than the previously explained approach, but it has the drawback that if the transcription is wrong, the NER part is affected.

Recent work on ANNs suggests that using models that solve tasks as general as possible might give similar or better performance than concatenating subprocesses, due to the error propagation in the different steps, as shown in [6], [7]. This is the main motivation of this work, and consequently we propose a single convolutional-sequential model to jointly perform transcription and semantic annotation. By adding a language model, the transcription can be restricted to each semantic category and therefore improved. The contribution of this work is to show the improvement obtained when joining a sequence of processes into a single one, thus avoiding the accumulation of errors and achieving a generalization that emulates human-like intelligence.

Some examples of historical handwritten text documents include birth, marriage and death records, which provide very meaningful information to reconstruct genealogical trees and track the locations of family ancestors, as well as to give interesting macro-indicators to scholars in social sciences and humanities. The interpretation of such types of documents unavoidably requires the identification of named entities. As experimental scenario we illustrate the performance of the proposed method on a collection of handwritten marriage records.

The rest of the paper is organized as follows: the next section explains the task being considered. In Section III we review the state of the art in HTR and NER. In Section IV we explain our model architecture, ground truth setup and training details. In Section V we analyze the results for the different configurations, and finally in Section VI we give the conclusions.
Figure 1. An example of a document line annotation from [4].

Table I: Semantic and person categories in the IEHHR competition

  Semantic: Name, Surname, Occupation, Location, Civil State, Other
  Person:   Wife, Husband, Wife's father, Wife's mother, Husband's father, Husband's mother, Other person, None

Table II: Marriage Records dataset distribution

             Train   Validation   Test
  Pages         90           10     25
  Records      872           96    253
  Lines       2759          311    757
  Words      28346         3155   8026
  Out-of-vocabulary words: 5.57 %

II. THE TASK: INFORMATION EXTRACTION IN MARRIAGE RECORDS

The approach presented in this paper is general enough to be applied to many information extraction tasks, but due to time constraints and our access to a particular dataset, the approach is evaluated on the task of information extraction in a system for the analysis of population records, in particular handwritten marriage records. It consists of transcribing the text and assigning to each word a semantic and a person category, i.e. knowing which kind of word has been transcribed (name, surname, location, etc.) and to which person it refers. The dataset and evaluation protocol are exactly the same as the ones proposed in the ICDAR 2017 Information Extraction from Historical Handwritten Records (IEHHR) competition [4]. The semantic and person categories to identify in the IEHHR competition are listed in Table I.

Two tracks were proposed. In the basic track the goal is to assign the semantic class to each word, whereas in the complete track it is also necessary to identify the person. An example of both tracks is shown in Figure 1.

The dataset for this competition contains 125 pages with 1221 marriage records (paragraphs), where each record contains several text lines giving information on the wife, the husband and their parents' names, occupations, locations and civil states. The text images are provided at word and line level, the latter naturally having the increased difficulty of word segmentation when choosing to work with line images. More details of the dataset can be found in Table II.

III. STATE OF THE ART

Recent work shows that neural models allow the generalization of problems that earlier were solved separately [7]. This idea can also be applied to information extraction from handwritten text documents, which consists of HTR followed by NER. On the HTR side there is still a long way to go until human-level transcription is achieved [8]. Attention models have helped to understand the inside behavior of neural networks when reading document images, but they still have lower accuracy than Recurrent Neural Network with Connectionist Temporal Classification (RNN+CTC) approaches [9].

Named entity recognition is the problem of detecting and assigning a category to each word in a text, either at part-of-speech level or in pre-defined categories such as the names of persons, organizations, locations, expressions of time, quantities, monetary values, percentages, etc. The goal is to select and parse relevant information from the text and the relationships within it. One could think that it would be sufficient to keep a list of locations, common names and organizations, but these lists are rarely complete, and one single name can refer to different kinds of entities. It is also not easy to detect the properties of a named entity and how different named entities are related to each other. The most widely used kind of models for this task are conditional random fields (CRFs), which were the state-of-the-art technique for some time [10], [11].

In the area of Natural Language Processing, Lample et al. [3] proposed a combination of Long Short-Term Memory networks (LSTMs) and CRFs, obtaining good results on the CoNLL2003 task. The problem is similar to the one we are facing, except that it starts from raw text. In this work the input to the system are images of handwritten text lines, for which it is not even known how many characters or words are present. This undoubtedly introduces a higher difficulty.

In Adak's work [12] a similar end-to-end approach from image to semantically annotated text is proposed, but in that case the key relies on identifying capital letters to detect possible named entities.
The problem is that in many cases, such as in the IEHHR competition dataset [4], named entities do not always start with capital letters, and it is also a task-specific approach that could not be used in many other settings.

Finally, another concept that can help to improve the quality of our models' predictions is curriculum learning [13]: letting the model look at the data in a meaningful and ordered way, such that the difficulty of prediction goes from easy to hard, can make the training evolve towards a much better performance.

IV. METHODOLOGY

The main goal of this work is to explore a few possibilities for a single end-to-end trainable ANN model that receives text images as input and gives as output transcripts that are already labeled with their corresponding semantic information. One possibility would be to propose an ANN with two sequence outputs, one for the transcript and the other for the semantic labels. However, keeping an alignment between these two independent outputs complicates a solution. An alternative is to have a single sequence output that combines the transcript and the semantic information, which is the approach taken here. There are several ways in which this information can be encoded such that a model learns to predict it. The next subsection describes the different ways of encoding it that were tested in this work. Then there are subsections describing the architecture chosen for the neural network, the image input and the characteristics of the learning.

A. Semantic encoding

The first variable which we explored is the way in which the ground truth transcript and semantic labels are encoded so that the model learns to predict them. To allow the model to recognize words not observed during training (out-of-vocabulary words), the symbols that the model learns are the individual characters and a space to identify the separation between words. For the semantic labels, special tags are added to the list of symbols for the recognizer. The different possibilities are explained below.

1) Open & close separate tags: In the first approach, the words are enclosed between opening and closing tags that encode the semantic information. Both the category and the person have independent tags. Thus, each word is encoded by starting with opening category and person symbols, followed by a symbol for each character, and ends with closing person and category symbols. The "other" and "none" semantics are not encoded. For example, the ground truth of the image shown in Figure 1 would be encoded as:

  h a b i t a t {space} e n {space} <location> <husband> B a r a </husband> </location> {space} a b {space} <name> <wife> E l i s a b e t h </wife> </name> ...

This kind of encoding is not expected to perform well in the IEHHR task, since tags are assigned to only one word at a time, so it is redundant to have two tags for each word. However, in other tasks it could make sense to have opening and closing tags, and this is why it has been considered in this work.

2) Single separate tags: Similar to the previous approach, in this case both category and person tags are independent symbols, but only a single tag of each is added before the word (no closing tags). Thus, the ground truth of the previous example would be encoded as:

  h a b i t a t {space} e n {space} <location/> <husband/> B a r a {space} a b {space} <name/> <wife/> E l i s a b e t h {space} J u a n a {space} <state/> <wife/> {space} d o n s e l l a ...

3) Change of person tag: In this variation of the semantic encoding, the person label is only given if there is a change of person, i.e. the person label indicates that all the upcoming words refer to that person until another person label comes, in contrast to the previous approaches where we give the person label for each word. This approach is possible due to the structured form of the sentences in the dataset. As we can see in Figure 2, the marriage records give the information of all the family members without mixing them.

  <wife/> <name/> E l i s a b e t h {space} <name/> J u a n a {space} <state/> d o n s e l l a ...

4) Single combined tags: The final possibility tested for encoding the named entity information is to combine the category and person labels into a single tag (a short sketch of this encoding is given after the next subsection). The example would then be:

  h a b i t a t {space} e n {space} <location_husband/> B a r a {space} a b {space} <name_wife/> E l i s a b e t h {space} <name_wife/> J u a n a {space} <state_wife/> d o n s e l l a ...

B. Level of input images: lines or records

The IEHHR competition dataset includes manually segmented images at word level. But to lower the ground truthing cost and avoid needing a word segmentator, we will assume that only images at line level are available. Having text line images, the obvious approach is to give the system individual line images for recognition. However, there are semantic labels that would be very difficult to predict if only a single line image is observed, due to the lack of context. For example, it might be hard to know whether the name of a person corresponds to the husband or to the father of the wife if the full record is not given. Because of this, in the experiments we have explored having as input both text line images and full marriage record images, concatenating all the lines of a record one after the other.
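To make the single combined tags encoding of subsection A concrete, the following is a minimal sketch of how a word-level annotated record could be turned into the character-plus-tag target sequence. It is only an illustration under our own assumptions: the input format (a list of dictionaries) and the helper function are hypothetical and not part of the IEHHR ground truth format.

    # Minimal sketch: build a "single combined tags" target sequence from
    # word-level annotations. The input format is an assumption for illustration.

    def encode_combined_tags(words):
        symbols = []
        for i, w in enumerate(words):
            if i > 0:
                symbols.append("{space}")
            # "other"/"none" semantics are not encoded, as described above.
            if w["category"] not in ("other", "none"):
                symbols.append("<{}_{}/>".format(w["category"], w["person"]))
            symbols.extend(list(w["text"]))  # one symbol per character
        return symbols

    example = [
        {"text": "habitat", "category": "other", "person": "none"},
        {"text": "en", "category": "other", "person": "none"},
        {"text": "Bara", "category": "location", "person": "husband"},
        {"text": "ab", "category": "other", "person": "none"},
        {"text": "Elisabeth", "category": "name", "person": "wife"},
        {"text": "Juana", "category": "name", "person": "wife"},
        {"text": "donsella", "category": "state", "person": "wife"},
    ]

    print(" ".join(encode_combined_tags(example)))
    # h a b i t a t {space} e n {space} <location_husband/> B a r a {space} ...

The same skeleton would cover the other encodings by changing only which tag symbols are emitted before (and, for the first variant, after) each word.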
Figure 2. Reading the whole record makes it easier to transcribe, as well as to identify the semantic categories based on context information.

C. Transfer learning

The next variable we examined was the effect of using transfer learning from a model previously trained for HTR. Transfer learning consists of training for the same or a similar task (HTR) using other datasets, and then fine-tuning the model for our purpose, in our case HTR+NER. To perform transfer learning from a generic HTR model, the softmax layer is removed and replaced with a softmax whose outputs are the activations for the number of possible classes in the fine-tuning step. In our case, these are all the characters in the alphabet plus the semantic labels. In the experiments for transfer learning we have tested only one HTR model, which was trained with the following datasets: IAM [14], Bentham [15], Bozen [16], and some datasets used by us internally: IntoThePast, Wiensanktulrich, Wienvotivkirche and ITS.

D. Curriculum learning

The last variation that we propose is curriculum learning, i.e. starting with easier demands on the model and then increasing the difficulty. In this case the method can be interpreted as first learning to transcribe single text lines and, once that training is finished, continuing by learning to transcribe images of a whole marriage record.

E. Model architecture and training

In this work we use a CNN+BLSTM+CTC model, which is one of the most common models for performing HTR exclusively, although other HTR models could be used as well. In particular, the architecture consists of 4 convolutional layers with max pooling followed by 3 stacked BLSTM layers. The detailed model architecture is shown in Figure 3.

To train the model we use the Laia HTR toolkit [17], which uses Baidu's parallel CTC implementation [18] and minimizes the loss or "objective" function

    O_{ML}(S, N_w) = - \sum_{(x,z) \in S} \ln p(z|x)    (1)

where S is the training set, x is the input sequence (visual features), z is the sequence labeling (transcription) for x, and

    N_w : (\mathbb{R}^m)^T \mapsto (\mathbb{R}^n)^T    (2)

is a recurrent neural network with m inputs, n outputs and weight vector w. The probabilities of a labeling of an input sequence are calculated with a dynamic programming algorithm called "forward-backward".

Some special features of our model are that the activation function of the convolutional layers is the leaky ReLU, f(x) = x if x > 0 and f(x) = 0.01x otherwise, and that we use batch normalization to reduce internal covariate shift [19].

V. RESULTS

We compare the performance of our methods¹ with the results of the participants of the IEHHR competition [4], thereby using the same metric; see Table III. The evaluation metric counts the words that were correctly transcribed and annotated with their category and person label, with respect to the total amount of words in the ground truth. For those words that were not correctly transcribed but whose category and person labels match one or more words in the ground truth, we add to the score 1 - CER (character error rate) of the best matching word. This means that the named entity recognition part is vital for a good score, since a perfect transcription will count as 0 in the score if its named entity is incorrectly detected.

We can observe in the results that our best performance is reached when receiving the whole marriage record, which is probably due to the help of contextual information. For example, it can benefit the detection of named entities composed of several words when they are written in separate consecutive lines. We also observe that the best performing encoding of the semantic labels is the combined tags setup. This can be due to the lower amount of symbols to predict, which might require storing fewer long-term dependencies in the network.

The most significant improvement was achieved when picking our best performing configuration and running it with an alternative line extraction. In the competition, the text lines were extracted by including all the bounding boxes of the words within every line. As a result, when there are large ascenders and descenders, the bounding box of the line is too wide, including sections of other text lines. In order to cope with this limitation, we used the XML containing the exact location of the segmented words within a page and, for the y-coordinates, we used a weighted (by the word widths) average of the upper and lower limits of the word bounding boxes. As expected, the performance improves greatly because the segmentation of the text lines is more accurate. However, this result is not directly comparable to the other participants' methods because the segmentation is different.

In Figure 4 we show some examples of committed errors. We can see that they consist of small typos that are understandable when looking at the text images. It is definitely difficult to transcribe certain names that have never been seen before. The proposed approach could be combined with a category-based language model [1], which could potentially improve the results.

¹ Scripts used for the experiments are available at http://doi.org/10.5281/zenodo.1174113
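As a concrete reading of the evaluation metric described at the beginning of this section, the following is a minimal sketch in Python. It is our interpretation for illustration only, not the official competition scorer: the alignment between predicted and ground-truth words is simplified to a greedy best match among words carrying the same labels, and function and variable names are our own.

    # Sketch of an IEHHR-style score: a word counts fully when transcription
    # and labels match; when only the labels match, it contributes 1 - CER
    # with respect to the best matching ground-truth word.

    def cer(pred, gt):
        """Character error rate of `pred` w.r.t. `gt` (edit distance / len(gt))."""
        d = [[i + j if i * j == 0 else 0 for j in range(len(gt) + 1)]
             for i in range(len(pred) + 1)]
        for i in range(1, len(pred) + 1):
            for j in range(1, len(gt) + 1):
                d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                              d[i - 1][j - 1] + (pred[i - 1] != gt[j - 1]))
        return d[len(pred)][len(gt)] / max(len(gt), 1)

    def score(predictions, ground_truth):
        """Both arguments are lists of (word, category, person) tuples."""
        total = 0.0
        for word, cat, per in predictions:
            candidates = [g for g, c, p in ground_truth if (c, p) == (cat, per)]
            if candidates:
                total += max(0.0, 1.0 - min(cer(word, g) for g in candidates))
        return total / max(len(ground_truth), 1)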
Figure 3. Used model architecture
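As an illustration of the kind of architecture referred to in Section IV-E and depicted in Figure 3, the following is a minimal CNN+BLSTM+CTC sketch in PyTorch. The choice of PyTorch, the layer sizes and the final-layer swap for transfer learning are our own assumptions for illustration; the paper itself uses the Laia toolkit [17], and the exact configuration is the one given by Figure 3.

    import torch
    import torch.nn as nn

    class CnnBlstmCtc(nn.Module):
        """Sketch of a CNN+BLSTM model trained with the CTC loss.

        Input: grayscale line/record images of shape (batch, 1, height, width).
        Output: per-frame log-probabilities over characters plus semantic tags.
        """

        def __init__(self, num_symbols, height=64):
            super().__init__()
            convs = []
            channels = [1, 16, 32, 48, 64]            # 4 conv blocks (illustrative sizes)
            for c_in, c_out in zip(channels[:-1], channels[1:]):
                convs += [nn.Conv2d(c_in, c_out, 3, padding=1),
                          nn.BatchNorm2d(c_out),       # batch normalization [19]
                          nn.LeakyReLU(0.01),          # leaky ReLU activation
                          nn.MaxPool2d(2)]
            self.cnn = nn.Sequential(*convs)
            feat_height = height // 16                 # four 2x2 poolings
            self.blstm = nn.LSTM(64 * feat_height, 256, num_layers=3,
                                 bidirectional=True, batch_first=True)
            self.output = nn.Linear(2 * 256, num_symbols)   # chars + tags + CTC blank

        def forward(self, images):
            f = self.cnn(images)                       # (B, C, H', W')
            f = f.permute(0, 3, 1, 2).flatten(2)       # (B, W', C*H'): one frame per column
            f, _ = self.blstm(f)
            return self.output(f).log_softmax(-1)      # (B, W', num_symbols)

    # For transfer learning (Section IV-C), the output layer of a model trained
    # only for HTR would be replaced so that it also covers the semantic tags:
    # model.output = nn.Linear(2 * 256, num_chars + num_tags + 1)

The per-frame log-probabilities can then be passed to torch.nn.CTCLoss (in time-major form) to minimize an objective of the form of Eq. (1).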

Figure 4. Some of the errors committed in the predictions
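The alternative line extraction described in Section V, where the vertical extent of a line is taken as a width-weighted average of the word-box limits instead of their union, can be sketched as follows. The bounding-box format and the function name are assumptions for illustration, not the exact procedure used on the competition XML.

    # Sketch of the alternative line extraction: the line's y-coordinates are the
    # average of its word boxes' top and bottom limits, weighted by word width.

    def line_box(word_boxes):
        """word_boxes: list of (x_min, y_min, x_max, y_max) for one text line."""
        widths = [x1 - x0 for x0, _, x1, _ in word_boxes]
        total_w = sum(widths)
        y_top = sum(w * y0 for w, (_, y0, _, _) in zip(widths, word_boxes)) / total_w
        y_bottom = sum(w * y1 for w, (_, _, _, y1) in zip(widths, word_boxes)) / total_w
        x_left = min(x0 for x0, _, _, _ in word_boxes)
        x_right = max(x1 for _, _, x1, _ in word_boxes)
        return x_left, y_top, x_right, y_bottom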

Figure 5. Train and validation (green and violet respectively) CER (%).

Our best performing model took 4 hours and 38 minutes to run 133 training epochs on an NVIDIA GTX 1080 GPU. The train and validation error rates can be seen in Figure 5. As training configuration we used an adversarial regularizer [20] with weight 0.5, an initial learning rate of 5 · 10⁻⁴ with a decay factor of 0.99 per epoch, and batch size 6.

VI. CONCLUSION

In this paper we have proposed to solve a complex task (i.e. text recognition and named entity recognition) with a single end-to-end neural model. Our first conclusion is that, also in information extraction problems, a generic model solving two subsequent tasks can perform at least similarly to two separate models. This is true even if there is less prepared data (record-level images instead of a sequence of word images) and we do not make use of task-specific tools like dictionaries or language models. By investigating different ways of encoding the image transcripts and semantic labels, we have shown that the recognition performance is highly affected, even though the encodings represent the same information. Also, curriculum learning (first text lines and then records) can make the model reach a higher final prediction accuracy. Future work includes the use of language models to improve the accuracy of the predictions, the study of the effect of automatic text line and record detection, and the evaluation of our method on other datasets.

ACKNOWLEDGMENTS

This work has been partially supported by the Spanish project TIN2015-70924-C2-2-R, the grant 2016-DI-095 from the Secretaria d'Universitats i Recerca del Departament d'Economia i Coneixement de la Generalitat de Catalunya, the Ramon y Cajal Fellowship RYC-2014-16831, the CERCA Programme / Generalitat de Catalunya, and RecerCaixa (XARXES, 2016ACUP-00008), a research program from Obra Social "La Caixa" with the collaboration of the ACUP.

Table III: Average scores of the experiments compared with the IEHHR competition participants' methods.

  Method                                                      Segm. Level   Proc. Level   Basic track   Complete track
  IEHHR competition results:
  Hitsz-ICRC-1 CNN HTR+NER                                    Word          Record*       87.56         85.72
  Hitsz-ICRC-2 ResNet HTR+NER                                 Word          Record*       94.16         91.97
  Baseline HMM+MGGI                                           Line          Record        80.24         63.08
  CITlab-ARGUS-1 LSTM+CTC+regex                               Line          Record†       89.53         89.16
  CITlab-ARGUS-2 LSTM+CTC+OOV+regex                           Line          Record†       91.93         91.56
  Results of our experiments:
  Separate-single tags                                        Line          Line          73.49         61.96
  Separate-open-close tags                                    Line          Line          73.70         64.09
  Combined-single tags                                        Line          Line          87.96         80.74
  Combined-single tags + transfer learn                       Line          Line          87.01         80.05
  Change person tag + transfer learn                          Line          Record        84.41         80.51
  Combined-single tags + transfer learn                       Line          Record        86.58         84.72
  Combined-single tags + transfer learn + curriculum learn    Line          Record        90.58         89.39
  Combined-single tags + transfer learn + curriculum learn
  + alt. line extraction                                      Word‡         Record‡       96.39‡        96.63‡

  * HTR is word based.
  † Posterior character probabilities computed at line level.
  ‡ Not directly comparable with the IEHHR results because it uses a different segmentation (alternative line extraction) than the one provided in the competition.

REFERENCES

[1] V. Romero, A. Fornés, E. Vidal, and J. A. Sánchez, "Using the MGGI methodology for category-based language modeling in handwritten marriage licenses books," in 15th International Conference on Frontiers in Handwriting Recognition, 2016.

[2] A. H. Toselli, E. Vidal, V. Romero, and V. Frinken, "HMM word graph based keyword spotting in handwritten document images," Inf. Sci., vol. 370, no. C, pp. 497–518, Nov. 2016. [Online]. Available: https://doi.org/10.1016/j.ins.2016.07.063

[3] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, "Neural architectures for named entity recognition," CoRR, vol. abs/1603.01360, 2016. [Online]. Available: http://arxiv.org/abs/1603.01360

[4] A. Fornés, V. Romero, A. Baró, J. I. Toledo, J. A. Sánchez, E. Vidal, and J. Lladós, "Competition on information extraction in historical handwritten records," in International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2017.

[5] J. I. Toledo, S. Sudholt, A. Fornés, J. Cucurull, G. A. Fink, and J. Lladós, Handwritten Word Image Categorization with Convolutional Neural Networks and Spatial Pyramid Pooling. Cham: Springer International Publishing, 2016, pp. 543–552.

[6] H. Liu, J. Feng, M. Qi, J. Jiang, and S. Yan, "End-to-end comparative attention networks for person re-identification," CoRR, vol. abs/1606.04404, 2016. [Online]. Available: http://arxiv.org/abs/1606.04404

[7] M. Bojarski, D. D. Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, X. Zhang, J. Zhao, and K. Zieba, "End to end learning for self-driving cars," CoRR, vol. abs/1604.07316, 2016. [Online]. Available: http://arxiv.org/abs/1604.07316

[8] T. Bluche, "Joint line segmentation and transcription for end-to-end handwritten paragraph recognition," in Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, Eds. Curran Associates, Inc., 2016, pp. 838–846.

[9] T. Bluche, J. Louradour, and R. O. Messina, "Scan, attend and read: End-to-end handwritten paragraph recognition with MDLSTM attention," CoRR, vol. abs/1604.03286, 2016. [Online]. Available: http://arxiv.org/abs/1604.03286

[10] J. D. Lafferty, A. McCallum, and F. C. N. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proceedings of the Eighteenth International Conference on Machine Learning, ser. ICML '01. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2001, pp. 282–289. [Online]. Available: http://dl.acm.org/citation.cfm?id=645530.655813

[11] J. R. Finkel, T. Grenager, and C. Manning, "Incorporating non-local information into information extraction systems by Gibbs sampling," in Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, ser. ACL '05. Stroudsburg, PA, USA: Association for Computational Linguistics, 2005, pp. 363–370. [Online]. Available: https://doi.org/10.3115/1219840.1219885

[12] C. Adak, B. B. Chaudhuri, and M. Blumenstein, "Named entity recognition from unstructured handwritten document images," in 2016 12th IAPR Workshop on Document Analysis Systems (DAS), April 2016, pp. 375–380.

[13] Y. Bengio, J. Louradour, R. Collobert, and J. Weston, "Curriculum learning," in Proceedings of the 26th Annual International Conference on Machine Learning, ser. ICML '09. New York, NY, USA: ACM, 2009, pp. 41–48. [Online]. Available: http://doi.acm.org/10.1145/1553374.1553380

[14] U.-V. Marti and H. Bunke, "A full English sentence database for off-line handwriting recognition," in Proc. Int. Conf. on Document Analysis and Recognition, 1999, pp. 705–708.

[15] J. A. Sánchez, V. Romero, A. H. Toselli, and E. Vidal, "ICFHR2014 competition on handwritten text recognition on transcriptorium datasets (HTRtS)," in 2014 14th International Conference on Frontiers in Handwriting Recognition, Sept 2014, pp. 785–790.

[16] J. Sánchez, V. Romero, A. Toselli, and E. Vidal, "ICFHR2016 competition on handwritten text recognition on the READ dataset," in ICFHR. IEEE, 2016, pp. 630–635.

[17] J. Puigcerver, D. Martin-Albo, and M. Villegas, "Laia: A deep learning toolkit for HTR," GitHub repository, 2016.

[18] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," in Proceedings of the 23rd International Conference on Machine Learning, ser. ICML '06. New York, NY, USA: ACM, 2006, pp. 369–376. [Online]. Available: http://doi.acm.org/10.1145/1143844.1143891

[19] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," CoRR, vol. abs/1502.03167, 2015. [Online]. Available: http://arxiv.org/abs/1502.03167

[20] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," 2014.
