RESEARCH ARTICLE

A Deep Learning Approach for the Comparison of Handwritten Documents Using Latent Feature Vectors

DOI: 10.1002/sam.11660

1 Department of Statistics, Pusan National University, Busan, South Korea
2 Department of Statistics, Iowa State University, Ames, Iowa, USA

Correspondence
Soyoung Park, Department of Statistics, Pusan National University, Busan, South Korea.
Email: [email protected]

Funding information
National Research Foundation of Korea, Grant/Award Number: 2021R1C1C100711111; Center for Statistics and Applications in Forensic Evidence, Grant/Award Numbers: 70NANB20H019, 70NANB15H176; Texas A&M University: The Hagler Institute for Advanced Studies

Abstract
Forensic questioned document examiners still largely rely on visual assessments and expert judgment to determine the provenance of a handwritten document. Here, we propose a novel approach to objectively compare two handwritten documents using a deep learning algorithm. First, we implement a bootstrapping technique to segment document data into smaller units, as a means to enhance the efficiency of the deep learning process. Next, we use a transfer learning algorithm to systematically extract document features. The unique characteristics of the document data are then represented as latent vectors. Finally, the similarity between two handwritten documents is quantified via the cosine similarity between the two latent vectors. We illustrate the use of the proposed method by implementing it on a variety of collections of handwritten documents with different attributes, and show that in most cases, we can accurately classify pairs of documents into same or different author categories.

KEYWORDS
autoencoder, bootstrapping, forensic science, handwriting verification, siamese network, Vision Transformer
F I G U R E 1 Examples of handwritten datasets separated by signature and document; (A) AND dataset [6], (B) Center for Statistics and Applications in Forensic Evidence handwritten dataset [8], (C) Center of Excellence for Document Analysis and Recognition signature dataset [30], (D) Computer Vision Lab dataset [21]; (A, C) are signatures and (B, D) are documents.
According to the European Network of Forensic Science Institutes (ENFSI [13]), handwritten data can be broadly categorized into two types. The first type is documents, which include letters, notes, and other standard written texts. Documents contain features that can help identify authors, including handwriting habits, letter shapes, connection patterns, and handwriting consistency. These features can be divided into character-, word-, or sentence-level features. Character-level features include slant, height, roundness, and others. Word-level features focus on the individual characteristics within words, such as letter shapes, the spacing between letters, and the formation of each letter in various positions within a word. Sentence-level features, on the other hand, examine the overall structure and layout of the text, including the spacing between words, the alignment of text on the line, and the overall organization of the writing on the page.

The second type of document consists mostly of signatures, which are unique marks left by individuals on legal documents, bank transactions, or other important papers. Also known as autographs, these are a set of an individual's handwriting that signify recognition and responsibility for documents with legal effect. Examples of signature and document data can be found in Figure 1. Figure 1A shows an example from the AND dataset [6], and Figure 1C shows an example from the Center of Excellence for Document Analysis and Recognition (CEDAR) signature dataset [30]. Both of these datasets include signatures. Figure 1B is an example from the Center for Statistics and Applications in Forensic Evidence (CSAFE) handwriting database [8], and Figure 1D is an example from the Computer Vision Lab (CVL) database [21]. These last two datasets include longer handwritten documents.

Here, we evaluate handwriting evidence in the form of documents using deep learning algorithms. Deep learning algorithms are well suited for analyzing images, and automatically extract features that enable classification of images into predefined classes, or quantification of the similarity between two images. While it is possible to define features that characterize handwriting by many other means, deep learning models tend to extract features in an objective and efficient manner. Models that have been pretrained on hundreds of thousands, or even millions of images, which include the Residual Network (ResNet) model [17], the Efficient Network (EFF) model [32], and the Vision Transformer (VIT) model [11], enable approaches such as transfer learning, which can be useful in problems such as ours. Our research leverages the advantages of deep learning to extract complex latent features of handwriting to quantify similarity of two or more documents.

2 PREVIOUS WORK

2.1 Guidelines

ENFSI [26] is an organization established to foster collaboration among national forensic institutes, and to share expertise and best practices. ENFSI provides international standards for forensic science, enabling forensic laboratories to compare standard methods with newly proposed approaches.

ENFSI [26] provides international guidelines for handwriting verification. As per ENFSI [13], the process of handwriting verification can be broadly divided into two steps: feature analysis and results derivation. The feature analysis step is conducted by trained investigators and can be further divided into the description of general features and detailed features.
According to Appendix 4 of ENFSI [13], general features include style and legibility, general layout, detailed layout features, detailed baseline, relative size and proportions, relative spacing, and slope. Detailed features are defined as pen path and character construction, fluency, pressure, tapering features, variation in pressure and connectivity, overall assessment of fluency, range of variation, and superimposition. All these characteristics fall under visual traits and are subjectively evaluated on a 7-point scale, including "missing feature/not comparable." The results derivation step is referenced in Appendix 5 of ENFSI [13], and is formalized as a test of hypothesis. Using the case where the questioned document is a signature as an example, the two competing hypotheses H1 and H2 are stated as follows:

H1. The questioned signature is genuine, that is, it was written by person A.

H2. The questioned signature is simulated, that is, it was written by a person other than A.

Features extracted from the signature, and from other signatures from a relevant sample, will tend to support either H1 or H2. The strength of the evidence in favor of one or the other hypothesis is typically quantified in the form of a likelihood ratio (LR). In general, the likelihood ratio is calculated as the ratio of the chances of observing the evidence (E) when H1 is true and the chances of observing the evidence when H2 is true. What is of interest to jurors is the posterior probabilities of H1 and H2, which can be calculated as shown in Equation (1), where I denotes the contribution of other evidence independent of the handwriting evidence. In Equation (1), the left-most term is the LR, the term in the middle is the prior odds in favor of H1, and the resulting quantity on the right is the posterior odds in favor of H1. This equation is known as the odds-form of Bayes' Rule and is written as in Equation (1):

\frac{\Pr(E \mid H_1, I)}{\Pr(E \mid H_2, I)} \times \frac{\Pr(H_1 \mid I)}{\Pr(H_2 \mid I)} = \frac{\Pr(H_1 \mid E, I)}{\Pr(H_2 \mid E, I)},    (1)

that is, Likelihood ratio × Prior odds = Posterior odds.

The role of examiners is limited to calculating the LR based on the handwriting evidence (E), while the calculation of the posterior odds is the responsibility of the court.

In the United States, questioned document examination continues to rely on visual inspection and the knowledge and expertise of the examiner. Examiners report the results of their evaluation in the form of a categorical conclusion using a multi-category scale similar to the one used by European examiners. Sita et al. [29] conducted an experiment to assess the difference in handwriting analysis skills between laypeople and trained investigators. They found that laypeople made errors in 19.3% of cases, while investigators erred in approximately 3.4% of cases, which suggested that questioned document examiners tend to have low error rates. A recent black-box study [18] largely confirmed these findings. Despite these apparently low error rates, Hicklin et al. [18] also reported modest reproducibility of categorical conclusions across examiners, which can be explained by the fact that the process of evaluating handwriting evidence is subjective. This can lead to different conclusions depending on the investigator.

2.2 Recent advances in the forensic analysis of handwritten evidence

In recent years, several authors have proposed the use of statistical and algorithmic approaches to evaluate handwritten documents. These include Crawford et al. [9], Johnson and Ommen [20], and Crawford et al. [10]. These three studies utilized the CSAFE handwriting dataset [8] or the CVL dataset [21]. The authors used the R package "handwriter" [3] for feature extraction, which begins by representing handwriting as a collection of graphs. Decomposing handwriting samples into graphical structures is not a new idea (see Bulacu and Schomaker [5]), but the subsequent statistical analysis of the graphs is a new idea.

Crawford et al. [9] and Crawford et al. [10] address the classification problem in a closed-set environment, which assumes that the author of the questioned document is included in a finite group of potential writers. The authors propose a dynamic clustering approach based on k-means to group graphs in a document into a fixed number K of clusters, and use the frequency of graphs assigned to each cluster as a document-level feature. This feature becomes the multinomial K-dimensional response variable in a Bayesian hierarchical model to estimate posterior probabilities of writership of the questioned document for each writer in the closed set. The performance of the method was tested using different subsets of the CSAFE and the CVL databases, and the authors found that as long as the questioned document had about 20 or 25 words, accuracy was high, over 95% or 96% in most cases.

Johnson and Ommen [20] propose a writer-matching solution in an open-set environment, where the writer of the questioned document can be anyone in a relevant population. Johnson and Ommen [20] use the same K-dimensional vector of cluster frequencies in a document, but now they calculate the differences in the frequencies observed in the questioned document and in a sample of writing obtained from the defendant. The K differences are combined into a single similarity score using a random forest [4]. Given reference distributions of the value of the similarity score that are computed from pairs of documents known to have been written by the same person or by different individuals, Johnson and Ommen [20] propose calculating a score-based likelihood ratio (SLR) to determine whether the evidence supports the same or different writer hypothesis.
F I G U R E 2 Overview of the model proposed to determine whether two documents were written by the same person by sequentially
undergoing stages of preprocessing, feature extraction, latent vector generation, and similarity calculation.
An attractive feature of the methods we described above is their interpretability. Crawford et al. [9] and Crawford et al. [10] fit a statistical model to semantically sensible features. Johnson and Ommen [20] combine features using a random forest and therefore sacrifice some interpretability, but still retain the ability to determine which features are most discriminating. While these approaches can in addition be quite accurate, their limitation is that they fail when the questioned document is short (e.g., a few words as in a threatening note, or a signature). Crawford et al. [10] showed that accuracy plummets from a high of about 97% to a low of about 75% when the questioned document goes from four to one sentence in length.

At least two authors have proposed using neural nets for the forensic evaluation of handwritten documents. Fiel and Sablatnig [14] used a convolutional neural net to extract features from handwriting and then use those features to compare documents. More recently, Marcinowski [23] proposed constructing a top-interpretable neural net which he tested on a small dataset with promising results. Neither set of authors progressed beyond showing that deep learning methods have great potential and can help overcome some of the limitations of the purely statistical approaches, albeit at the expense of interpretability and transparency.

The architecture of the model we propose to perform handwriting verification using deep learning is shown in Figure 2. In the first step, the two handwritten documents to be compared undergo a preprocessing stage, which involves splitting both documents into smaller pieces. Next, we use pretrained deep learning models to extract features from these smaller-sized images. These features are then compressed into latent vectors using either an autoencoder or a siamese network structure. Finally, we calculate a similarity score for the latent vectors extracted from the two documents. This similarity score can then be used by the examiner to reach a decision regarding writership of the documents. In the next few sections, we describe each of the steps in the model in more detail.
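Before describing each stage, a compact sketch of how the four stages chain together may help orient the reader. The sketch below is only illustrative: the split, extract, and encode callables are hypothetical stand-ins for the preprocessing, pretrained feature extraction, and latent-vector generation components described in the following subsections, not functions from a released implementation.

```python
import numpy as np

def compare_documents(doc_a, doc_b, split, extract, encode):
    """Four-stage comparison: split -> extract features -> encode -> cosine similarity.

    `split`, `extract`, and `encode` are caller-supplied functions standing in for the
    preprocessing, pretrained feature extractor, and latent-vector generator.
    """
    latents = []
    for doc in (doc_a, doc_b):
        crops = split(doc)                                          # stage 1: sub-images
        feats = [extract(c) for c in crops]                         # stage 2: feature vectors
        latents.append(np.mean([encode(f) for f in feats], axis=0)) # stage 3: document latent
    a, b = latents
    # stage 4: cosine similarity between the two document-level latent vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```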
3.1 Preprocessing step

In our work, we used scanned images of handwritten documents from two databases: the CSAFE database, where the average size of an image was 2500 × 2800, and the CVL database, where the average image size was 2500 × 1400. To prepare these images for input into the deep learning feature extractor, we considered three different methods to split images into smaller analytical units.
F I G U R E 3 Schematic diagram of simple split process: (1) Padding the document to make it a multiple of the desired split size, (2)
Creating boundary lines in a grid pattern at intervals of the desired split size, and (3) Individually saving each segmented image.
3.1.1 Simple split

The first method is a simple split of the document image into a grid of equal-sized sub-images, illustrated in Figure 3. The steps are as follows:

1. Add padding to set the original image to a multiple of the sub-image size we wish to obtain from the split.
2. Establish boundaries for an even split.
3. Cut the original image to obtain the smaller sub-images.
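A minimal sketch of this grid split, assuming the document is available as a grayscale NumPy array and using a hypothetical sub-image size of 224 pixels, could look as follows.

```python
import numpy as np

def simple_split(img: np.ndarray, size: int = 224, pad_value: int = 255):
    """Pad a document image to a multiple of `size` and cut it into a grid of sub-images."""
    h, w = img.shape
    pad_h = (-h) % size                      # padding needed to reach the next multiple of size
    pad_w = (-w) % size
    padded = np.pad(img, ((0, pad_h), (0, pad_w)), constant_values=pad_value)

    crops = []
    for top in range(0, padded.shape[0], size):       # grid boundaries at intervals of `size`
        for left in range(0, padded.shape[1], size):
            crops.append(padded[top:top + size, left:left + size])
    return crops
```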
3.1.2 Text detection

The text detection method utilizes an optical character recognition (OCR [28]) approach. An appealing attribute of this method when implemented on handwritten documents is that the resulting sub-images mostly contain full words. This is in contrast to the simple split, where each sub-image can include portions of words and of characters. While here the extracted images contain words, the text detection approach does not preserve sentence-level features. There are many different text detection models available in the literature; we used the Character Region Awareness for Text Detection (CRAFT [2]) model developed by Naver Cloud Virtual Assistant (CLOVA) AI. The steps in this text detection split method are shown in Figure 4, and the specific approach is as follows:

1. Use the text detection model on a document image to define a rectangular area containing each word.
2. Each rectangle is a new sub-image.
3. Resize the sub-images to have equal size.

When handwriting is disconnected, text detection methods may split a word into two or more rectangular areas.
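The cropping and resizing steps can be sketched as follows. The sketch assumes the word regions have already been produced by a detector such as CRAFT and are supplied as (left, top, right, bottom) tuples; the detector's own API is not reproduced here.

```python
from PIL import Image

def crop_detected_words(doc_path: str, boxes, size: int = 224):
    """Cut out each detected word region and resize it to a common `size` x `size` image."""
    page = Image.open(doc_path).convert("L")
    crops = []
    for (left, top, right, bottom) in boxes:   # boxes assumed to come from a text detector
        word = page.crop((left, top, right, bottom))
        crops.append(word.resize((size, size)))
    return crops
```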
3.1.3 Document split using bootstrapping

The simple split method and the text detection method that relies on text identification can result in a low number of sub-images when the original document is short. In the case of the simple split, we could reduce the size of each sub-image, but only at the expense of the amount of writing that is included in each. One approach to overcome this problem is to implement a resampling method such as the bootstrap [1, 12].

The bootstrap is a resampling approach that is typically used to approximate the sampling distribution of almost any statistic. Simple bootstrapping consists in repeatedly resampling with replacement from some population to draw inferences about population parameters. Here, we propose an algorithm that applies the idea of bootstrapping to handwriting, repeatedly resampling small sub-images from the handwritten document. When the number of bootstrap samples, denoted by n, is sufficiently large, the collection of sub-samples is likely to improve over the simple split, at least in terms of information content. Moreover, by varying the size of the bootstrapped images, it is possible to extract sentence-level features that the simple split and the text detection methods cannot preserve. The bootstrap method we propose is illustrated in Figure 5, and the algorithm can be found in Algorithm 1. The steps in the approach are as follows:

1. Use text detection to find the areas where text exists.
2. Repeatedly resample the original image to obtain sub-images of the desired size from the area where text exists.

We use the idea of information content in each sub-image to decide whether to keep a specific sub-image or to continue resampling. The images of handwritten documents in the databases we use are composed of black text on a white background. Typically, pixels with intensity value close to 0 are black, and those with intensity value close to 255 are white. To characterize the amount of text in a sub-image, we calculate the information content for each sub-image. To do so, we invert the pixel values, so that pixels with low intensity are classified as white and those with intensity close to 255 are classified as black. The average pixel value in a sub-image is defined as the information content of that image. The histogram on the bottom-left panel in Figure 5 shows the distribution of information content obtained from n bootstrapped images. We discard and resample those images with information content below a given threshold.
F I G U R E 4 Schematic diagram of text detection process: (1) Using a detection model to identify the regions of individual words in the
document, (2) Extracting small images from the regions of individual words, and (3) Resizing the extracted images to the desired split size.
F I G U R E 5 Schematic diagram of bootstrapping process: (1) Document cropping, (2) A bootstrapping approach for segmenting the
document into smaller components, (3) Determination of a threshold through an analysis of pixel value distributions, and (4) Resampling of
bootstrapped images when the discernible handwriting information falls below the established threshold.
Algorithm 1. Proposed document split method using bootstrapping

Require: Large image I, cropping size s, number of iterations n.
Ensure: Resampled n cropped image samples.
1: for i = 1 to n do
2:   Randomly select a region Ri of size s from image I.
3:   Crop region Ri and save as a new image sample Si.
4: end for
5: For each image sample Si, calculate the average of the inverted pixel values, where black pixels are close to 255 and white pixels are close to 0. This average represents the information content of Si.
6: Calculate the average 𝜇 and standard deviation 𝜎 of the information content from the n image samples Si.
7: Set the threshold T as 𝜇 − k × 𝜎, where k is a constant corresponding to the user-selected percentile from the standard normal distribution table.
8: for each image sample Si do
9:   if information content of Si < T then
10:    Remove Si.
11:    Resample a new image sample based on T and save as Si.
12:    Continue this resampling process until the information content of the new Si is greater than or equal to T.
13:  end if
14: end for
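A sketch of Algorithm 1 in Python is shown below, assuming the document is a grayscale NumPy array with ink near 0 and background near 255. The max_tries cap is an added practical safeguard against resampling indefinitely and is not part of the algorithm as stated.

```python
import numpy as np

def info_content(crop: np.ndarray) -> float:
    """Average inverted pixel value: ink (originally near 0) counts high, background low."""
    return float((255 - crop).mean())

def random_crop(img: np.ndarray, size: int, rng: np.random.Generator) -> np.ndarray:
    top = rng.integers(0, img.shape[0] - size + 1)
    left = rng.integers(0, img.shape[1] - size + 1)
    return img[top:top + size, left:left + size]

def bootstrap_split(img: np.ndarray, size: int = 224, n: int = 1000,
                    k: float = 1.0, seed: int = 0, max_tries: int = 100):
    """Resample sub-images until each carries enough writing, following Algorithm 1."""
    rng = np.random.default_rng(seed)
    crops = [random_crop(img, size, rng) for _ in range(n)]

    scores = np.array([info_content(c) for c in crops])
    threshold = scores.mean() - k * scores.std()        # T = mu - k * sigma

    for i, score in enumerate(scores):
        tries = 0
        while score < threshold and tries < max_tries:  # resample low-information crops
            crops[i] = random_crop(img, size, rng)
            score = info_content(crops[i])
            tries += 1
    return crops
```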
3.2 Extracting features from the documents in the comparison

Deep learning algorithms extract information from original images and transform that information into features which concisely represent the input image. In this step, the model identifies important patterns, characteristics, and structures in images, which are then summarized into feature vectors. To learn how these features are associated to some output, models are trained on vast databases of labeled images. In applications such as the forensic evaluation of questioned documents, there is a dearth of labeled samples on which to train a model, so we rely on transfer learning by using pretrained models. In this work, we extracted features using three pretrained models: ResNet [17], EFF [32], and VIT [11]. The three models are used widely, and have been shown to have good performance (see [27]). We briefly describe each model below.

3.2.1 ResNet: Residual Network

The Residual Network (ResNet) [17] model was developed to address the vanishing gradient problem and performance degradation in deep neural networks. Unlike conventional deep neural networks, which tend to become harder to train and decrease in performance as they grow deeper, ResNet can be effectively trained even in deep networks. The core architecture of ResNet is "Residual connections" [17]. To make residual connections, the network adds the input of each layer directly to its output several layers ahead, thereby allowing the network to learn only the residual between the input and output.

3.2.2 EFF: Efficient Network

The Efficient Network (EFF) [32] model was developed to maximize both efficiency and performance in deep learning. While CNN-based structures focused on individually adjusting depth, width, and resolution, the EFF is based on the concept of "Compound scaling" [32], a new method that simultaneously adjusts these three elements. The model exhibits high performance even when computational resources are limited. Despite the small model size and low computational cost, EFF can be quite accurate for image classification and other computer vision tasks.

3.2.3 VIT: Vision Transformer

The Vision Transformer (VIT) [11] model applies the Transformer [34] model, which arose in the field of natural language processing. Before VIT, the field of image processing was largely reliant on convolutional neural networks (CNNs). The VIT model for image recognition problems is different, in that it converts images into sequence data for processing. To do so, the model segments an image into multiple small patches and then transforms each patch into an input sequence for the transformer. These patches are then processed through the transformer model's attention mechanism [34], which allows it to learn about the context of the entire image and the relationships between each patch. Positional encoding [34] provides the model with spatial location information of each patch, which is important for the model's understanding of how the various parts of the image interact.
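As an illustration of this transfer learning step, the sketch below loads a pretrained ResNet-18 from torchvision and removes its classification head so that each 224 × 224 crop maps to a feature vector. Note that this minimal version keeps ResNet-18's native 512-dimensional output, whereas the extractors used in the paper are configured to produce 300-dimensional vectors.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Pretrained backbone with the ImageNet classification head removed.
backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = nn.Identity()          # backbone now outputs 512-dimensional features
backbone.eval()

preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),   # handwriting crops are grayscale
    transforms.ToTensor(),
    transforms.Resize((224, 224)),
])

def extract_features(pil_crop):
    """Map one cropped sub-image to a feature vector using the pretrained backbone."""
    x = preprocess(pil_crop).unsqueeze(0)          # shape (1, 3, 224, 224)
    with torch.no_grad():
        return backbone(x).squeeze(0)              # shape (512,)
```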
F I G U R E 6 The structure of latent vector generators; (A) Input image goes through a feature extractor and encoder layer that outputs a
latent vector which then passes through decoder, (B) A pair of input images pass through the feature extractor to output a latent vector for
each document and from which the model quantifies the similarity between the two vectors.
3.3 Generating latent vectors

Unobserved latent vectors provide a lower-dimensional representation of an image by compressing the feature vectors even further. The latent vectors are supposed to contain the essential attributes of the original image, allowing classification of images into multiple classes, generation of new images, or finely tuned comparison of images.

The first structure we consider for generating latent vectors is an autoencoder, which consists of an encoder and a decoder as in Figure 6A. The encoder compresses the input data into a lower-dimensional representation in the latent space, while the decoder reconstructs it back to its original high-dimensional form. Autoencoders are trained to minimize the differences between the input and the regenerated image. In multi-classification tasks, the loss function is often the multi-cross entropy (MCE) shown in Equation (2):

\mathrm{MCE}(y, \hat{y}) = -\sum_{k=1}^{K} y^{(k)} \log\bigl(\hat{y}^{(k)}\bigr).    (2)

In Equation (2), y^{(k)} and \hat{y}^{(k)} denote the true label and the predicted probability for class k, and K is the number of classes.
Figure 6B illustrates the process of generating a latent vector using a siamese network structure. Siamese networks are useful when the goal is to compare two inputs and determine whether they may belong to the same class. For a binary classification task, a commonly used loss function is the binary cross entropy (BCE) shown in Equation (3):

\mathrm{BCE}(y, \hat{y}) = -\bigl(y \log(\hat{y}) + (1 - y)\log(1 - \hat{y})\bigr).    (3)

In Equation (3), y and ŷ denote the true label (0 or 1) and the predicted probability of same class, respectively. The total loss is the mean of the BCE loss computed for each pair of images.

3.4 Quantifying similarity between two latent vectors

Once the important features of input images have been represented in lower dimensional space via latent vectors, the next step is to quantify the similarity between those vectors. One measure of the similarity between two vectors is the cosine similarity, which numerically compares the directionalities of two vectors by calculating the cosine value of the angle between them. This value ranges between −1 and 1, where 1 indicates that the vectors are in the same direction, 0 indicates orthogonality, and −1 indicates that the vectors point in opposite directions. An equation for calculating cosine similarity between two vectors A and B is given in Equation (4):

\text{Cosine similarity}(A, B) = \frac{A \cdot B}{\lVert A \rVert\,\lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}.    (4)

Here, the dot product A ⋅ B represents the sum of the products of the vector elements, while ||A|| and ||B|| denote the magnitudes of vectors A and B, respectively. Ai and Bi are the ith elements of vectors A and B, and n represents the dimension of the vectors.
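Equation (4) translates directly into a few lines of NumPy; the two 50-dimensional vectors below are hypothetical stand-ins for document-level latent vectors.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two latent vectors, as in Equation (4)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Example with two hypothetical 50-dimensional document-level latent vectors.
rng = np.random.default_rng(0)
a, b = rng.normal(size=50), rng.normal(size=50)
print(cosine_similarity(a, b))   # value in [-1, 1]; 1 means identical direction
```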
4 APPLICATION: QUANTIFYING SIMILARITY BETWEEN TWO HANDWRITTEN DOCUMENTS

We illustrate the proposed approach using two collections of handwritten documents: the CSAFE handwriting database [8] and the CVL database [21]. Both databases include writing samples from a large number of individuals. We briefly describe each below.

4.1 Center for statistics and forensic evidence database

The CSAFE handwriting data [8] was collected in 2020 at Iowa State University. The dataset is continually updated, and in this study, version 4 was used. Version 4 includes data from 344 individuals. Each participant provided handwriting samples on three occasions separated by at least 3 weeks. In each of those occasions, participants were asked to copy three different prompts three times each, in random order. The three prompts include LND ("The London Letter," 86 words), WOZ ("The Wonderful Wizard of Oz by L. Frank Baum," 67 words), and PHR ("Short Common Phrase," 14 words). Figure 7 shows samples of the three types of phrases from the CSAFE data. LND and WOZ are relatively long documents with over 60 words, while PHR is a short document with only 14 words. We used the samples contributed by 300 randomly selected participants for training the models. Writing samples from the remaining 44 participants were used for testing.

4.2 Computer vision lab dataset

The CVL dataset [21] was collected in 2013 at the Vienna University of Technology. Here, we use version 1.1 of the dataset. Writing samples were collected from 310 individuals, with 27 of them writing seven documents each and 283 writing five documents each. We used five documents from 309 individuals, excluding anyone with missing data. The prompts used were ROD ("A Romance of Many Dimensions by Edwin A. Abbott," 90 words), MAC ("Macbeth by William Shakespeare," 47 words), MAI ("Mailufterl from Wikipedia," 74 words), OOS ("Origin of Species by Charles Darwin," 52 words), and POD ("The Picture of Dorian Gray by Oscar Wilde," 65 words). Figure 8 shows samples of the five types of prompts from the CVL data. Unlike the CSAFE database, the CVL database does not include replicate samples and lacks short prompts. We used the CVL database for testing.
F I G U R E 7 Samples of handwriting data from the Center for Statistics and Forensic Evidence handwriting dataset [8]. Samples are
“WOZ,” “LND,” and “PHR” respectively.
F I G U R E 8 Samples of handwriting data from the Computer Vision Lab dataset [21]. Samples are “ROD,” “MAC,” “MAI,” “OOS,” and
“POD” respectively.
Pairs of documents drawn from these databases are then compared to do what is ultimately desired—classify the pairs into one of two classes: same or different author.

The autoencoder architecture, shown in Figure 9A, was implemented first. In the preprocessing stage, a document was segmented into small images. These cropped images were then passed through a feature extractor to be represented as vectors. We used samples from 300 contributors to the CSAFE database for training, so the feature extractor outputs 300-dimensional vectors.

The 300-dimensional vectors obtained after feature extraction are compressed to 200 dimensions through a linear layer that takes 300 dimensions as input. Another linear layer then takes 200-dimensional vectors as input and outputs 100-dimensional vectors. Finally, one more linear layer compresses the 100-dimensional vectors into 50-dimensional generalized latent vectors. Rectified linear unit (ReLU) functions were applied after each linear layer to capture the nonlinearity of the data. To train the model, the decoder reconstructs the input observations by passing the 50-dimensional latent vectors through linear layers that output 100-, 200-, and 300-dimensional vectors, respectively.

The testing step is shown in Figure 9B. The trained model consists of the encoder layers, including the layer that outputs the generalized latent vector in 50-dimensional space.
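A sketch of this encoder and decoder in PyTorch is given below. The layer sizes follow the 300-200-100-50 description above, but details such as the reconstruction loss, optimizer, and exact placement of the activations are illustrative choices rather than the authors' exact training configuration.

```python
import torch
import torch.nn as nn

class DocumentAutoencoder(nn.Module):
    """Encoder compresses 300-d features to a 50-d latent vector; decoder reconstructs them."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(300, 200), nn.ReLU(),
            nn.Linear(200, 100), nn.ReLU(),
            nn.Linear(100, 50),
        )
        self.decoder = nn.Sequential(
            nn.Linear(50, 100), nn.ReLU(),
            nn.Linear(100, 200), nn.ReLU(),
            nn.Linear(200, 300),
        )

    def forward(self, x):
        z = self.encoder(x)              # 50-dimensional generalized latent vector
        return self.decoder(z), z

# Illustrative training step on a batch of 300-d feature vectors.
model = DocumentAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()                 # reconstruction loss (one reasonable choice)

features = torch.randn(32, 300)          # stand-in for extracted features of 32 crops
reconstruction, latent = model(features)
loss = criterion(reconstruction, features)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```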
FIGURE 9 Architecture of the autoencoder structure used in this paper; (A) training architecture, (B) testing architecture.
F I G U R E 10 Architecture of the siamese network structure used in this paper; (A) training architecture, (B) testing architecture.
To test the model, pairs of documents to be compared are first split into cropped images during the preprocessing stage and are then represented as latent vectors using the trained model. One latent vector is obtained from each cropped image; these are then averaged, resulting in a final 50-dimensional vector representing the handwritten document. The degree of similarity between two documents is quantified via the cosine similarity of these latent vectors, and can be used to determine whether the two documents may have been written by the same person.

Figure 10A shows the architecture of the siamese network we implemented next. Here, cropped images from each of the two documents in the comparison are paired and passed through a feature extractor. The feature extractor is configured to output a 300-dimensional vector, mirroring the autoencoder structure. Each pair of cropped images is represented by a pair of 300-dimensional vectors which are compressed into 50 dimensions via a linear layer. By computing the difference between these latent vectors we obtain a single 50-dimensional vector that summarizes the similarity of the two input document images. To complete training of the model, this 50-dimensional vector undergoes a series of transformations: it passes through a linear layer that takes 50 dimensions as input, followed by ReLU functions and dropout [31], then another linear layer that takes 50 dimensions as input and ultimately outputs a scalar that takes on the value 1 if both documents are classified into the "same writer" class and the value 0 otherwise. Figure 10B shows the testing process, which equals the testing process of the autoencoder structure.
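The comparison head of the siamese structure can be sketched as follows, assuming 300-dimensional crop features as input. Layer sizes, the difference operation, dropout, and the binary cross entropy of Equation (3) follow the description above, while details such as the dropout rate are illustrative.

```python
import torch
import torch.nn as nn

class SiameseHead(nn.Module):
    """Compares two 300-d crop features and outputs the probability of 'same writer'."""
    def __init__(self, p_drop: float = 0.5):
        super().__init__()
        self.compress = nn.Linear(300, 50)            # shared layer producing 50-d latent vectors
        self.classifier = nn.Sequential(
            nn.Linear(50, 50), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(50, 1),
        )

    def forward(self, feat_a, feat_b):
        z_a, z_b = self.compress(feat_a), self.compress(feat_b)
        diff = z_a - z_b                               # 50-d vector summarizing the pair
        return torch.sigmoid(self.classifier(diff)).squeeze(-1)

# Illustrative training step with the binary cross entropy of Equation (3).
model = SiameseHead()
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

feat_a, feat_b = torch.randn(16, 300), torch.randn(16, 300)  # stand-ins for paired crop features
labels = torch.randint(0, 2, (16,)).float()                  # 1 = same writer, 0 = different
loss = criterion(model(feat_a, feat_b), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```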
We constrained image size to 224 × 224 to match the dimensions on which the pretrained feature extractors introduced in Section 3.2 were pretrained. Regarding document splitting, we used the bootstrap sampling approach we proposed earlier, and obtained 1000 cropped images from the CSAFE handwriting prompts LND and WOZ, and 200 cropped images from the shorter prompt, PHR.

TABLE 1 Performance comparison between latent vector generators.

Latent vector generator   Preprocess       Feature extractor   AUC     Accuracy
Autoencoder               Bootstrapping    VIT                 0.972   90.84
                                           EFF                 0.940   87.40
                          Text detection   VIT                 0.937   87.78
                                           EFF                 0.938   86.65
                          Simple split     VIT                 0.909   84.87
                                           EFF                 0.911   84.26
Siamese                   Bootstrapping    VIT                 0.872   79.50
                                           EFF                 0.881   81.27
                          Text detection   VIT                 0.951   87.70
                                           EFF                 0.928   85.37
                          Simple split     VIT                 0.955   88.93
                                           EFF                 0.898   81.45

Note: VIT: Vision Transformer 16patch, EFF: EfficientNet_B0.
TABLE 2 Performance comparison of Center for Statistics and Applications in Forensic Evidence data focusing on changes between preprocess and feature extractor when using autoencoder structure as latent vector generator. (Columns: Preprocess, Feature extractor, AUC, Accuracy.)

TABLE 3 Performance analysis of the Center for Statistics and Applications in Forensic Evidence data for each prompt using Strategy A.

Prompt   AUC     Accuracy
All      0.987   94.45
F I G U R E 12 Graphical representations of the Center for Statistics and Applications in Forensic Evidence test data reduced to two
dimensions, achieved by applying dimensionality reduction techniques on 50 vectors when using Strategy A; (A) t-Stochastic Neighbor
Embedding results, (B) Uniform Manifold Approximation and Projection results, and (C) Principal Component Analysis results.
Strategy A only showed a performance drop of about 3% in accuracy when classifying the PHR document with only 14 words, when compared to the LND and WOZ, both of which have more than 60 words. We conclude that Strategy A is robust to document length.

4.4.4 Data reduction and visualization

In the analysis of high-dimensional data, dimensionality reduction techniques are essential for enabling visualization and interpretation of the observations. Here, we use t-distributed Stochastic Neighbor Embedding (t-SNE [33]), Uniform Manifold Approximation and Projection (UMAP [24]), and Principal Component Analysis (PCA [16]) to reduce the dimensionality of the documents in the CSAFE database from 50 dimensions to two dimensions for visualization.

The t-SNE preserves the local neighbor structure by maintaining the similarity of high-dimensional vectors in lower dimensions. The method involves choosing a reference point and calculating the distance to all other data points, then selecting the corresponding t-distribution value, to group similar data points together. If the size of the dataset is n, the computational complexity increases by the square of n, and each run yields different visualization results. The t-SNE can only reduce data to two or three dimensions. The UMAP, based on a neighboring graph, creates a graph from data in high-dimensional space and then projects this graph onto a lower dimension. In general, UMAP is faster than t-SNE and is not limited by the size of the embedding space, making it a more generally applicable dimensionality reduction algorithm. PCA is a statistical method that reduces the size of high-dimensional data while preserving the most important information. This method finds new axes that summarize the information of the original variables while minimizing loss and maximizing the variance of the resulting data points. The process involves quantifying the explanatory power of the reduced dimensions, by how well they explain the variance of the original data. The explanatory power of each principal component is expressed as a proportion of the total variance.

Figure 12 shows how to visualize the CSAFE test documents using the three methods introduced above. Visualizing all 44 authors in the test dataset would result in insufficient colors to represent labels, so we randomly selected nine authors for illustration. Figure 12A shows the results of applying t-SNE to the input data, where we can distinguish the global structure between labels and the variability within labels. Figure 12B shows the results after applying UMAP with very dense clusters, where we see that UMAP preserves the global structure well but does not explain the internal variability of the labels as effectively as t-SNE. Finally, Figure 12C displays the PCA results. Compared to the t-SNE and the UMAP groupings, we see that clustering is not as effectively performed, which may be due to PCA's limitations in capturing nonlinear relationships within the data. We find that t-SNE was the most effective approach to visually separate clusters that correspond to labels in the CSAFE handwritten documents database.

Both the t-SNE and the UMAP clusters in Figure 12 confirm the fundamental assumption that a person's handwritten characters are discriminating and can help identify the writer of a questioned document. When embedded in two-dimensional space, the latent vectors that represent the important attributes of handwriting show that the between-writer variance is substantially larger than the within-writer variance, a result consistent with the high accuracy that was achieved when classifying pairs of documents into same or different writer classes.
TABLE 4 Performance comparison of Computer Vision Lab data focusing on changes between preprocess and feature extractor when using autoencoder structure as latent vector generator.

Preprocess                       Feature extractor   AUC     Accuracy
Bootstrapping + Text detection   VIT                 0.993   97.15
                                 EFF                 0.984   94.26
                                 ResNet              0.991   96.26
Bootstrapping                    VIT                 0.992   96.55
                                 EFF                 0.984   94.21
                                 ResNet              0.992   96.97
Text detection                   VIT                 0.992   96.96
                                 EFF                 0.958   85.11
                                 ResNet              0.913   83.79
Simple split                     VIT                 0.990   95.99
                                 EFF                 0.983   94.11
                                 ResNet              0.991   96.23

Note: VIT: Vision Transformer 16patch, EFF: EfficientNet_B0, ResNet: ResNet 18.

4.5 Results: Testing the models on CVL writers

To test how the deep learning models—that were trained on CSAFE handwritten documents—perform on a completely different set of samples, we implemented them on a subset of the CVL dataset, and attempted to classify pairs of documents into same or different writer. Some results are shown in Table 4. Unlike the CSAFE database, the CVL dataset does not include replicates for each prompt. Additionally, none of the prompts are as short as the PHR prompt in the CSAFE database. Consequently, classification accuracy of pairs of CVL samples was high overall, and even higher when compared to testing results for the CSAFE data. The model structure that showed the best performance on the CSAFE data, which we called Strategy A, also exhibited the best performance when using the CVL data for testing.

In forensic practice, examiners are beginning to move away from offering categorical opinions (e.g., same or different writer) toward the use of probabilistic statements, often in the form of a likelihood ratio (see [10, 20, 26]). In this context, the LR in its simplest form is calculated as:

\mathrm{LR} = \frac{\Pr(E \mid H_1)}{\Pr(E \mid H_2)},

where E denotes "evidence," and H1 and H2 are the competing propositions of same and different source, respectively. The LR is used as a metric to quantify the strength of the evidence in support of one or the other proposition, and is reported as the odds of observing the evidence if H1 rather than H2 is true.

For pattern comparison disciplines such as handwriting, where highly multivariate features are summarized into a single similarity metric, the likelihood ratio is approximated by what is known as a score-based likelihood ratio (SLR), which represents the odds of observing a given degree of similarity between two items if the items have a common source. To compute the probabilities in the numerator and denominator of the SLR, one needs to know the distributions of the similarity metric under each of the two propositions. The distributions can be approximated experimentally using many pairs of items for which source is known.

Figure 13 shows the distributions of similarity values calculated using the trained Strategy A model and the testing datasets constructed from the CSAFE and the CVL databases. The left panel in the figure shows the distributions calculated using the CSAFE testing subset, and the right panel shows the distributions obtained from the CVL testing dataset. In both cases, the blue distribution corresponds to similarities computed for pairs of documents with different writers and the red distribution corresponds to similarities observed when a pair of documents were written by the same person.

As expected, similarity tends to be higher when documents are written by the same person, although that is not always the case. Ideally, the two distributions would show no overlap, so that given a similarity value s computed from a questioned pair of documents in a case, Pr(s|H_i) >> Pr(s|H_j) for i ≠ j = 1, 2. The resulting SLR in this case would then clearly indicate whether the evidence supports one or the other hypothesis. In our application, there is minimal overlap. For an observed similarity s, the SLR can be approximated as the ratio of the heights of the two distributions evaluated at s.
5 DISCUSSION

Deep learning methods show real promise in forensic applications. Here, we have explored different model architectures and have evaluated their classification accuracy when applied to images of handwriting.

We separately considered the different components of the model to decide on a combination that exhibited good performance when implemented on different sets of documents, and on documents of different length. Overall, we considered four different approaches for preprocessing input images, three pretrained models to extract features from those images, and two architectures to compress information into a low-dimensional latent space and quantify similarity using latent vectors of features.
F I G U R E 13 The distribution of cosine similarity when using Strategy A. Blue indicates different pairs, and red indicates the same
pairs; (A) results for the Center for Statistics and Applications in Forensic Evidence dataset, (B) results for the Computer Vision Lab dataset.
F I G U R E 14 2D visualization applying t-Stochastic Neighbor Embedding dimensionality reduction on Center for Statistics and
Applications in Forensic Evidence data when using Strategy A; (A) visualization for nine randomly selected authors out of 44, (B) graphs
emphasizing authors numbered 321 and 336. The labels in the upper graph are divided by prompt type: WOZ is represented with circles,
LND with triangles, and PHR with squares. The labels in the lower graph are divided by season: season one is represented with circles,
season two with triangles, and season three with squares.
As is typically the case with deep learning algorithms and as discussed in Section 4.1, models involve hundreds or even thousands of hyperparameters. We did not attempt to optimize the value of the hyperparameters in the models we fitted to the data, meaning that it might be possible to improve the performance of the models via more careful tuning of model parameters.

Even though the preferred model showed high classification accuracy in both the CSAFE and the CVL datasets, there were instances of misclassification. For example, in the case of the CSAFE data, misclassifications sometimes occurred when the two documents were written by the same person but on different occasions. This was nicely visualized using t-SNE as shown in Figure 14. In Figure 14A, we see that writer #321 is split into two distinct clusters. To understand the reason, we used shapes to label instances where the documents in the comparison involved different prompts (top panel in Figure 14B) or different writing occasions (bottom panel in Figure 14B). While no interesting patterns were found when looking at prompts, we did find that one of the clusters was composed of documents written during the first data collection occasion and the other cluster grouped documents written on the second and third occasions. To investigate further, we looked at samples from writer #321, shown in Figure 15. It appears that the writing is darker in seasons two and three compared to season one. This suggests that the person used different writing tools and that writing tool can be a factor that affects model performance.
F I G U R E 15 Example divided into two groups when visualized using t-Stochastic Neighbor Embedding in Center for Statistics and
Applications in Forensic Evidence data. Seasons two and three have thicker and darker writing, compared to season one.
One possible approach to ameliorate the effect of writing tool might be to consider pixel normalization as part of the preprocessing stage. Writer #336 was also represented by two distinct (but close) clusters. In this case, we observed that word-level features representing individual characters were very similar, but there were differences between sentence-level features in documents written by #336. This may have been a result of the preprocessing step, in which we uniformly split the original document data into small 224 × 224 images, which does not preserve sentence-level information. A possible solution may be to disaggregate the original image into small images of varying size for bootstrapping, and then resizing them.

F I G U R E 16 Graph of the principal component explained variance for Center for Statistics and Applications in Forensic Evidence and Computer Vision Lab test data represented as 50 vectors when using Strategy A. The red line represents the cumulative explained variance, while the blue bars indicate the explained variance for each principal component.

Finally, we revisited the PCA-based dimensionality reduction that we applied to the latent vectors obtained from CSAFE and CVL test sets. Results are shown in Figure 16. We found that reducing 50-dimensional vectors to about 20 dimensions explains over 99.9% of the variability. This suggests that the feature extractor in the model captures about 20 inherent characteristics of a person's handwriting. Additionally, the fact that the explanatory power of the first principal component is below 20% indicates that it would be difficult to represent the complex features of handwriting with low-dimensional vectors alone.
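A curve of this kind can be reproduced with scikit-learn along the following lines, again assuming the 50-dimensional latent vectors are available in a hypothetical file.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

latents = np.load("latent_vectors.npy")        # hypothetical array of shape (n_samples, 50)

pca = PCA(n_components=50).fit(latents)
ratios = pca.explained_variance_ratio_
cumulative = np.cumsum(ratios)

plt.bar(range(1, 51), ratios, label="per component")
plt.plot(range(1, 51), cumulative, color="red", label="cumulative")
plt.xlabel("Principal component")
plt.ylabel("Explained variance ratio")
plt.legend()
plt.show()

print(np.argmax(cumulative >= 0.999) + 1)      # number of components explaining 99.9% of variance
```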
The deep learning-based image comparison analysis method proposed here has two strengths. First, we apply a bootstrapping approach to the input image, randomly dividing large-sized images into smaller fragments to minimize information loss. This enables more efficient data processing and analysis. Second, we systematically optimized the various steps in the image comparison process by carefully selecting and combining deep learning algorithms for image analysis, automatically extracting image features, converting these features into numerical vectors, and quantifying the similarity of the transformed vectors.

The approach we proposed is readily adaptable to different applications without significant constraints. For example, comparing similarity between two or more images is a problem that arises in the analysis of satellite and other remote-sensing images, the analysis of biological tissues and blood smears, the comparison of patterns across textile samples in the fashion industry, the implementation of biometric authentication systems, and investigations into copyright infringement within social media content, to name a few. In these diverse fields, images of interest may be very large, or very small, or be available in small numbers, situations in which standard deep learning methods may be less effective. The method we propose, with a modular structure and different choices for the components that make up each module, is particularly well suited for this type of nonstandard application and can be effectively applied by following the protocols we describe.
ACKNOWLEDGMENTS
Kim and Park's work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1C1C100711111). Alicia Carriquiry's work was partially funded by the Center for Statistics and Applications in Forensic Evidence (CSAFE) through cooperative Agreements 70NANB15H176 and 70NANB20H019 between NIST and Iowa State University, which includes activities carried out at Carnegie Mellon University, Duke University, University of California Irvine, University of Virginia, West Virginia University, University of Pennsylvania, Swarthmore College, and University of Nebraska, Lincoln. Carriquiry is also partially funded by a Fellowship from the Hagler Institute for Advanced Studies at Texas A&M University.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in the CSAFE Handwriting Database at https://data.csafe.iastate.edu/HandwritingDatabase/.

REFERENCES
1. S. Abney, "Bootstrapping," Proceedings of the 40th annual meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Stroudsburg, Pennsylvania, 2002, 360–367.
2. Y. Baek, B. Lee, D. Han, S. Yun, and H. Lee, "Character region awareness for text detection," Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, IEEE, Piscataway, New Jersey, 2019, 9365–9374.
3. N. Berry, J. Taylor, and F. Baez-Santiago, handwriter: Handwriting analysis in R, R package version 1.0.1, 2021. https://CRAN.R-project.org/package=handwriter.
4. L. Breiman, Random forests, Mach. Learn. 45 (2001), 5–32.
5. M. Bulacu and L. Schomaker, Text-independent writer identification and verification using textural and allographic features, IEEE Trans. Pattern Anal. Mach. Intell. 29 (2007), no. 4, 701–717.
6. M. Chauhan, M. A. Shaikh, and S. N. Srihari, Explanation based handwriting verification, arXiv preprint arXiv:1909.02548, 2019.
7. D. Chicco, Siamese neural networks: An overview, Artif. Neural Netw. (2021), 73–94.
8. A. Crawford, A. Ray, and A. Carriquiry, A database of handwriting samples for applications in forensic statistics, Data Brief 28 (2020), 105059.
9. A. M. Crawford, N. S. Berry, and A. L. Carriquiry, A clustering method for graphical handwriting components and statistical writership analysis, Stat. Anal. Data Min. 14 (2021), no. 1, 41–60.
10. A. M. Crawford, D. M. Ommen, and A. L. Carriquiry, A rotation-based feature and Bayesian hierarchical model for the forensic evaluation of handwriting evidence in a closed set, Ann. Appl. Stat. 17 (2023), no. 2, 1127–1151.
11. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, An image is worth 16 × 16 words: Transformers for image recognition at scale, arXiv preprint arXiv:2010.11929, 2020.
12. B. Efron, Nonparametric estimates of standard error: The jackknife, the bootstrap and other methods, Biometrika 68 (1981), no. 3, 589–599.
13. European Network of Forensic Science Institutes (ENFSI), Best practice manual for the forensic examination of handwriting – Edition 04, 2022. Accessed November 21, 2023. http://www.enfsi.eu/document/best-practice-manual-forensic-examination-handwriting-edition-04.
14. S. Fiel and R. Sablatnig, "Writer identification and retrieval using a convolutional neural network," Computer analysis of images and patterns: 16th international conference, CAIP 2015, Valletta, Malta, September 2–4, 2015, proceedings, part II 16, Springer, Berlin, Germany, 2015, 26–37.
15. I. Goodfellow, Y. Bengio, and A. Courville, Deep learning, MIT Press, Cambridge, Massachusetts, 2016. http://www.deeplearningbook.org.
16. T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning, 2nd ed., Springer, New York, 2009.
17. K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2016), 770–778.
18. R. A. Hicklin, L. Eisenhart, N. Richetelli, M. D. Miller, P. Belcastro, T. M. Burkes, C. L. Parks, M. A. Smith, J. Buscaglia, E. M. Peters, R. S. Perlman, J. V. Abonamah, and B. A. Eckenrode, Accuracy and reliability of forensic handwriting comparisons, Proc. Natl. Acad. Sci. 119 (2022), no. 32, e2119944119.
19. R. A. Huber and A. M. Headrick, Handwriting identification: Facts and fundamentals, CRC Press, Boca Raton, Florida, 1999.
20. M. Q. Johnson and D. M. Ommen, Handwriting identification using random forests and score-based likelihood ratios, Stat. Anal. Data Min. 15 (2022), no. 3, 357–375.
21. F. Kleber, S. Fiel, M. Diem, and R. Sablatnig, "CVL-database: An off-line database for writer retrieval, writer identification and word spotting," 2013 12th international conference on document analysis and recognition, IEEE, New York City, 2013, 560–564.
22. A. Lim and D. Ommen, Handwriting analysis, Proceedings of the 106th International Association for Identification (IAI) Annual Educational Conference, 1, 2022.
23. M. Marcinowski, Top interpretable neural network for handwriting identification, J. Forensic Sci. 67 (2022), no. 3, 1140–1148.
24. L. McInnes, J. Healy, and J. Melville, UMAP: Uniform manifold approximation and projection for dimension reduction, arXiv preprint arXiv:1802.03426, 2018.
25. National Research Council Committee on Identifying the Needs of the Forensic Sciences Community, Strengthening forensic science in the United States: A path forward, The National Academies Press, Washington, DC, 2009. https://www.nap.edu/catalog/12589/strengthening-forensic-science-in-the-united-states-a-path-forward.
26. European Network of Forensic Science Institutes, The European Network of Forensic Science Institutes (ENFSI), 2023. Accessed November 21, 2023. http://www.enfsi.eu/about-enfsi.
27. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, ImageNet large scale visual recognition challenge, Int. J. Comput. Vis. 115 (2015), 211–252.
28. F. Sabry, Optical character recognition: Fundamentals and applications, One Billion Knowledgeable, London, United Kingdom, 2023.
29. J. Sita, B. Found, and D. K. Rogers, Forensic handwriting examiners' expertise for signature comparison, J. Forensic Sci. 47 (2002), no. 5, 1117–1124.
30. H. Srinivasan, S. N. Srihari, and M. J. Beal, "Machine learning for signature verification," Computer vision, graphics and image processing: 5th Indian conference, ICVGIP 2006, Madurai, India, December 13–16, 2006, proceedings, Springer, Berlin, Germany, 2006, 761–775.
31. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. 15 (2014), no. 1, 1929–1958.
32. M. Tan and Q. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," International conference on machine learning, PMLR, New York City, 2019, 6105–6114.
33. L. van der Maaten and G. Hinton, Visualizing data using t-SNE, J. Mach. Learn. Res. 9 (2008), no. 11, 2579–2605.
34. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, Attention is all you need, Adv. Neural Inf. Proces. Syst. 30 (2017), 6000–6010.

How to cite this article: J. Kim, S. Park, and A. Carriquiry, A deep learning approach for the comparison of handwritten documents using latent feature vectors, Stat. Anal. Data Min.: ASA Data Sci. J. 17 (2024), e11660. https://doi.org/10.1002/sam.11660