Multimedia Tools and Applications (2019) 78:10889–10931

https://doi.org/10.1007/s11042-018-6577-1

Writer identification using machine learning approaches:

a comprehensive review

Arshia Rehman 1 & Saeeda Naz 1 & Muhammad Imran Razzak 2

Received: 1 February 2018 / Revised: 16 August 2018 / Accepted: 20 August 2018 /

Published online: 17 September 2018
© Springer Science+Business Media, LLC, part of Springer Nature 2018

Abstract
Handwriting is one of the most common types of questioned writing encountered and
frequently attracts attention in litigation. Unlike physiological characteristics,
handwriting is a behavioral characteristic: no two individuals with mature handwriting
write exactly alike, and no individual can reproduce another's writing exactly. Writing behavior
and individual traits are examined for similarities between the specimen and the questioned document,
which makes handwriting an efficient and effective biometric. In this paper, we present a
comprehensive review of writer identification methods and provide a taxonomy of
datasets, feature extraction methods, and classification approaches (conventional and deep learning
based) for writer identification. For the reader's convenience, we group the discussion by script into
English, Arabic, Western and other languages, and by algorithm and method according to the
sequence of implementation steps. Finally, we highlight the challenges and open research issues
in the field of writer identification and suggest future directions.

Keywords Writer identification . Multi-script . Features extraction . Deep learning

1 Introduction

Handwriting is an art that develops naturally with the individual and cannot be imitated. Thus, no two
persons can produce exactly the same handwriting, and even an individual cannot exactly

* Saeeda Naz
Arshia Rehman
Muhammad Imran Razzak

1 Government Girls Postgraduate College, no.1, Abbottabad, KPK, Pakistan
2 University of Technology Sydney, Sydney, Australia

reproduce his own handwriting. The latter is called variation: the natural deviation that occurs
within an individual's writing. It is a very strong identifying characteristic of a person and plays a
significant role for forensic document experts in establishing authenticity. Recently,
automatic analysis of handwritten documents has attracted significant attention from researchers,
especially in the field of historical document analysis. Because of the sheer number of
handwritten specimens, it takes a forensic expert a long time to manually analyze and
compare the questioned document with every specimen in the database to find an imposter. An
automated system for writer identification could therefore be very useful, reducing the
forensic expert's effort by identifying, with high confidence, the text written by a suspected
writer within a large set of documents.
History reveals that each individual has his or her own writing style that differs from that of others [166].
From early childhood, we learn to write from a standard "copybook",
as shown in Fig. 1. One's writing varies according to the situation, geographical
location, and traditional and historical background. With the passage of time, however, an
individual's handwriting starts to depart from the learned copybook style. The writing style is unique to
every person. It is impossible, or at least rare, for two individuals to have the same writing style; even one
individual cannot reproduce exactly what he or she wrote before. This variation in
handwriting patterns due to individual writing style is known as inter-class variation [4]. These
characteristics serve to discriminate one individual's writing from another's and explain the great
interest in research on writer identification.
With the emergence of information security, handwriting serves as a biometric trait
and is used to authenticate, validate, verify and identify an individual on the basis of
behavioral characteristics. It is the cheapest trait to obtain for identification and carries significant
importance for establishing the authenticity and authorship of questioned documents, identifying
forgeries, detecting alterations, verifying legal documents, signatures and cheques, and analyzing
indented writing. Different handwriting biometrics are used in a wide range of application areas,
as depicted in Fig. 2. The purpose of writer identification is to determine the genuine writer from
a list of registered candidates according to the similarity between their handwriting. Thus, people
working in the humanities can use writer identification methods to determine the writer of a
specific handwritten document. From the viewpoint of graphology,
handwriting is an insightful means of personality profiling, highlighting character traits and
tracking the feelings and emotions of a person. Handwriting is therefore also known as brain
writing, because the manipulation of the writing tool is governed by commands that the brain sends
through the nervous system to the arm, hand and fingers [15]. Thus, neurological brain patterns reflect
personality traits [174].

Fig. 1 Copybook style of the United States. Available at: http://www.handwriting.org/united-states.html


Fig. 2 Handwriting bio-metrics types

Beyond writer identification and personality analysis, handwriting plays a vital role in other
walks of life. One interesting relationship is with neuroscience: neurological disorders
in patients are reflected in their handwriting [58, 181]. An example is the detection and
diagnosis of Parkinson's disease from handwriting [181], where the effects of
the disorder are manifested in the patient's writing during the early stages. The patient's
writing tends to get smaller and smaller until the letters may even become
unreadable, a condition known as micrographia [57].
Moving to the world of forensic analysis, handwriting is used to establish the authenticity and
authorship of questioned documents, identify forgeries, detect alterations, verify legal documents,
signatures and cheques, and analyze indented writing. The traditional manual procedure of the forensic
examiner is tiresome; computerized systems for handwriting analysis could therefore
serve as valuable tools in forensic document analysis.
In summary, handwriting carries remarkable evidence about the writing
style itself, the writer, and demographic information such as gender, age, nationality
and handedness. With the advancement of information technology, computerized analysis of
handwriting has been used extensively in various applications over the last three decades [55].
The problems of writer recognition and handwriting recognition are quite similar and closely
related [137, 175].
The objective of this paper is to present an overview of the various techniques for writer
identification. Section 2 provides the detailed background of writer identification and
verification, and Section 3 describes the datasets used for writer identification.
Section 4 then presents the literature review organized by the steps of the writer identification
pipeline, as reported for handwritten and online systems. Sections 3 and 4 begin with
English, proceed to another widely used language, Arabic, and then cover
other languages such as French, Dutch and Chinese. The overview of the paper is
represented graphically in Fig. 3.

Fig. 3 Overview of the paper

2 Background

Before elaborating on writer identification, it is essential to understand writer recognition. Writer
recognition is a branch of behavioral biometrics that authenticates individuals using their
handwriting. It includes writer identification and writer verification. Writer identification is the
process of finding the genuine writer from a list of registered candidates based on the
similarity between their handwriting, as illustrated in Fig. 4.a. Writer verification is the
process of comparing a test handwriting sample with pre-stored samples from a known source for
authentication [32]. The procedure of writer verification is represented in Fig. 4.b. Writer
verification is a binary classification problem that involves an accept/reject decision,
whereas writer identification is a multi-class classification problem and is hence
considered more challenging [75]. The literature indicates that writer identification and verification
are regarded as the core pillars of studying handwriting variations for biometric
purposes [174]. Alongside these, the new term writer retrieval has also emerged. The process of

Fig. 4 Differences among writer recognition systems: identification, verification and retrieval

finding all relevant documents of a specific writer is termed writer retrieval [50]. Fig.
4.c depicts the framework of a writer retrieval system.
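The distinction between the three tasks can be made concrete with a small sketch. This is only an illustration under stated assumptions, not a method from the reviewed literature: each handwriting sample is assumed to be already reduced to a feature vector, Euclidean distance stands in for the similarity measure, and the function names and sample vectors are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(query, references):
    """Identification (one-to-many): return the enrolled writer closest to the query."""
    return min(references, key=lambda writer: distance(query, references[writer]))

def verify(query, claimed_reference, threshold):
    """Verification (one-to-one): accept the claim if the distance is within the threshold."""
    return distance(query, claimed_reference) <= threshold

def retrieve(query, documents):
    """Retrieval: rank every stored document by its distance to the query."""
    return sorted(documents, key=lambda doc_id: distance(query, documents[doc_id]))

# Hypothetical two-dimensional feature vectors, one per enrolled writer.
references = {
    "writer_a": [0.1, 0.9],
    "writer_b": [0.8, 0.2],
    "writer_c": [0.5, 0.5],
}
query = [0.75, 0.25]

best_match = identify(query, references)
accepted = verify(query, references["writer_b"], threshold=0.1)
rejected = verify(query, references["writer_a"], threshold=0.1)
ranking = retrieve(query, references)
```

Identification returns a single label out of many, verification returns an accept/reject decision for one claimed identity, and retrieval returns a ranked list, which is why the three are evaluated differently.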
On the basis of how writing samples are acquired, writer identification is generally divided
into two categories: online and offline. Online writer identification is also known as the
dynamic method. In the online method, handwriting samples are captured through tablets, PDAs,
magnetic pads, smartphones, touch screens, etc. Writing samples are stored as trajectories,
represented as time series of two-dimensional coordinates. Dynamic features are calculated and
used for identification. Parameters such as writing speed, writing direction, pen-tip
position, velocity, angle and pressure are extracted. Online handwriting contains both
sequential and spatial information, so these features yield a spatiotemporal
representation of the handwriting. Offline writer identification, on the other hand, is also known
as the static method. Images of the writing samples are scanned from paper or documents using a
scanner. This method is based on spatial attributes of units such as characters, words, lines and paragraphs
[101]. Due to the lack of sequential information in the handwriting and the large intra-class
variation, offline writer recognition is considered the harder task.
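As a rough illustration of the dynamic features mentioned above, the following sketch derives writing speed and writing direction from an online trajectory. It assumes the pen samples are given as `(x, y, t)` tuples; the function name `dynamic_features` and the sample stroke are hypothetical, and real devices typically also report pressure and pen angle.

```python
import math

def dynamic_features(trajectory):
    """Compute per-segment speed and direction from a list of (x, y, t) pen samples."""
    speeds, directions = [], []
    for (x0, y0, t0), (x1, y1, t1) in zip(trajectory, trajectory[1:]):
        dx, dy, dt = x1 - x0, y1 - y0, t1 - t0
        if dt <= 0:
            continue  # skip duplicated or out-of-order timestamps
        speeds.append(math.hypot(dx, dy) / dt)   # distance travelled per time unit
        directions.append(math.atan2(dy, dx))    # writing direction in radians
    return speeds, directions

# Hypothetical stroke: three pen samples with timestamps.
stroke = [(0.0, 0.0, 0.0), (3.0, 4.0, 1.0), (3.0, 8.0, 3.0)]
speeds, directions = dynamic_features(stroke)
```

An offline image of the same stroke would retain only the ink positions; the timestamps, and therefore the speed sequence, are lost.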
On the basis of the content of the writing, offline writer identification is categorized into two methods:
text-independent and text-dependent. Text-independent writer identification deals with images of
arbitrary text and does not depend on fixed text content. Text-dependent methods, on the other hand,
require an input image with fixed text content and compare the input
with registered templates for identification [189]. It is also known as script-based or
content-based identification. In general, text-dependent methods operate at the character or word level,
whereas text-independent methods work at the line or paragraph level.
Writer identification has become increasingly important with the rapid growth of technology,
and its applications span a range of areas, including forensic analysis,
historical files and ancient manuscripts. An authentication system can be developed by
combining writer identification and verification; such a system can monitor and
regulate access to confidential sites or data where massive amounts of documents,
forms, notes and meeting minutes are constantly being processed and managed. Such a system is
valuable because it additionally carries extensive knowledge about the identity of the writer. Furthermore,
writer identification can also be used for historical document analysis [68], handwriting recognition
system enhancement [144], and hand-held and mobile devices [147]. Summarizing its applications,
recent developments and performance have made writer identification
a strong tool alongside physiological modalities of identification such as DNA and fingerprints [166].
A writer identification framework has several phases, but it must first be decided whether the approach is
online or offline, and text-dependent or text-independent. The framework of offline
text-independent writer identification comprises the following phases: data acquisition,
pre-processing, feature extraction, and classification or identification. Figure 5 depicts the main
phases of any writer identification system.

3 Datasets

Datasets are the cornerstone of any research work. The availability of datasets is an essential
prerequisite for development and evaluation in any research domain, and handwriting and writer
recognition are no exception. In the last few years, different databases for character
recognition, word spotting and writer identification have been published in the literature. The
following sections describe a number of datasets grouped by language.

Fig. 5 Writer identification framework

3.1 English datasets

English is a standard and old language spoken by 1.5 billion people, that is,
20% of the world's population. It is reported that English is the first language of 360 million
people. A lot of significant work on handwritten text has been published for different
problems such as OCR, writer identification and recognition, handwriting analysis, and Parkinson's
disease prediction. The well-known English-script datasets for writer identification
are discussed in the following sections.

3.1.1 CEDAR

The CEDAR (Center of Excellence for Document Analysis and Recognition) [42, 166]
developed a number of datasets at the University of Buffalo.1 The first and larger database used for
writer identification and handwriting verification was CEDAR-Letter, which contains grayscale
and binary images of text by 1000 writers.

3.1.2 IAM

IAM [121] is an extensively used handwritten dataset for writer recognition developed by the
University of Bern.2 The dataset initially included 1066 forms produced by 400 different
writers and was then extended to include 1539 forms produced by 657 different writers. The
dataset contains detailed information about writer identity, the ground truth text, and the
segmentation at the line, sentence and word levels. It contains 13,353 labeled text lines of
variable content, with approximately 14 text lines per writer [79]. 60% of the text lines were
used as the reference base and the remaining 40% for testing.
Extensive research in writer identification and verification has been performed using the IAM
dataset [11, 26, 27, 32, 35, 74, 79, 92, 145, 146, 149, 150, 160–162].
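A 60/40 reference/test protocol of this kind can be sketched as follows. The sketch assumes the split is applied to each writer's lines separately and in their stored order, which the description above does not specify; the function name and the toy data are hypothetical.

```python
def split_lines_per_writer(lines_by_writer, reference_fraction=0.6):
    """Split each writer's text lines into a reference part and a test part."""
    reference, test = {}, {}
    for writer, lines in lines_by_writer.items():
        # Assumption: the first 60% of a writer's lines form the reference base.
        cut = round(len(lines) * reference_fraction)
        reference[writer] = lines[:cut]
        test[writer] = lines[cut:]
    return reference, test

# Hypothetical toy data: line identifiers grouped by writer.
lines_by_writer = {
    "w001": [f"w001-line{i}" for i in range(10)],
    "w002": [f"w002-line{i}" for i in range(5)],
}
reference, test = split_lines_per_writer(lines_by_writer)
```

Splitting per writer guarantees that every writer appears in both the reference base and the test set, which identification requires.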

1 http://www.buffalo.edu/
2 http://www.inf.unibe.ch/

3.2 Arabic datasets

Arabic is a Central Semitic language and the sixth most spoken language in the world,
with 420 million speakers. Due to the complexity of the Arabic
script, researchers have paid great attention to it in image processing,
pattern recognition and document analysis tasks. The Arabic-script datasets
used for handwriting recognition and writer recognition are described in the
following sections.

3.2.1 KHATT

KHATT [18, 116] was developed by research groups from KFUPM, Saudi Arabia,3 TU
Braunschweig, Germany, and TU Dortmund, Germany.4 It consists of Arabic handwritten documents
from 1000 writers. Each writer wrote 6 paragraphs; the dataset includes approximately 2000 random and
fixed paragraphs as well as free paragraphs.

3.2.2 AHDB

AHDB, the Arabic Handwritten Database [10], is widely used in handwritten text
recognition and writer recognition for Arabic script. The dataset comprises the most popular
written Arabic words and text from 105 writers. It contains approximately 10,000
words for Arabic cheque processing.

3.2.3 IFN/ENIT

Another dataset used for handwriting recognition as well as writer identification is the
IFN/ENIT dataset [59, 134], developed by the Institute of Communications Technology
(IFN)5 and the Ecole Nationale d'Ingénieurs de Tunis (ENIT).6 The dataset consists of
937 names of towns and cities written by 411 writers, yielding 26,459 images and
more than 210,000 characters. Each writer filled 5 pages, and each page had 12 city
names. Each name is annotated with ground truth information on style, the sequence of
character shapes and the baseline. This dataset has been used extensively by more than 100
research groups from more than 30 countries for Arabic handwritten text recognition
and in several international competitions [60, 117–120]. It has also been employed for writer
identification of Arabic text in [3, 38, 79].

3.2.4 Al Isra

Al Isra [104] was collected from 500 writers. It contains 37,000 words, 10,000 digits, 2500
signatures and 500 sentences that are among the most popular in Arabic.

3 http://www.kfupm.edu.sa/default.aspx
4 https://www.tu-braunschw
5 https://www.ifn.ing.tu-bs.de/en/ifn/
6 http://www.enit.rnu.tn/

3.2.5 MADCAT

Multilingual Automatic Document Classification Analysis and Translation (MADCAT)
[169] is a five-year program of the US Defense Advanced Research Projects Agency
(DARPA) that released the MADCAT Phase 1 Training corpus, a publicly available
Arabic dataset. This dataset comprises 9693 pages written by around 400
writers.

3.3 Western languages dataset

There are some datasets in Western languages such as French and Dutch that are used for handwriting
recognition and writer identification. A few of them are:

3.3.1 CEDAR-CDROM2

CEDAR has three main datasets; one of them is the CEDAR-CDROM2 database,7 which contains
machine-printed Japanese character images.

3.3.2 Firemaker dataset

The Firemaker dataset [154] is a Dutch-script dataset used for writer identification. The
handwriting of 252 Dutch students was acquired in 1008 scanned pages.
Each student wrote 4 pages: the first page contains 5 paragraphs in
normal handwriting, the second page consists of two paragraphs in uppercase text, the third
page contains unnatural and forged handwriting, and the fourth page contains a description
of a given cartoon in the writer's own words. Thus,
in general, the first and fourth pages are used for writer identification.

3.3.3 RIMES dataset

Another relatively different dataset in writer recognition is RIMES (Reconnaissance
et Indexation de données Manuscrites et de fac similés) [76]. It is a French-script
dataset that comprises handwritten letters in French representing the mail sent by
people to companies or administrations. The data was collected from more than 1300
writers who wrote 5 letters, making a total of 5600 letters in more than 12,000 pages
with annotations, as well as secondary databases of characters, handwritten words
(300,000 snippets) and logos.

3.4 Multilingual datasets

Multilingual means more than one language. In the domain of writer identification, several
datasets contain more than one language; they provide a reliable, effective and
consistent basis for testing the hypothesis that documents written in different scripts have the same
author. The statistics of these datasets are presented below.

7 http://www.cedar.buffalo.edu/Databases/JOCR/.

3.4.1 ICDAR2013

The ICDAR2013 dataset [82], used for the competition, was collected from 475 writers, each contributing 4
handwritten documents in both English and Arabic script, similar to the QUWI dataset. The first
and second pages contain Arabic handwritten text, while the third and fourth contain English samples.

3.4.2 CVL

CVL [105] consists of English and German handwritten text (1 text in German and 6 in English)
from 311 different writers. 7 documents from each of 27 writers are used for training, and
5 documents from each of 284 writers are used for testing.

3.4.3 QUWI

QUWI [7] consists of handwritten text gathered from 1017 writers in both Arabic and
English scripts. Each individual was asked to write 4 pages. The dataset
contains 4068 digitized pages, with approximately 60,000 Arabic words for
text-independent analysis and more than 100,000 Arabic words for text-dependent analysis,
and similar statistics for the English script. The first page contains approximately 6
handwritten lines in Arabic. The second page contains an Arabic text of 3
paragraphs averaging 11 lines. Similarly, the third page contains about 6
handwritten lines in English, and the fourth page contains approximately 14 English text
lines. The first and third pages are to be used for text-independent writer
identification tasks, whereas the second and fourth pages are to be used for text-dependent
writer identification tasks. This bilingual dataset has been used for writer recognition
research in [11, 28, 54, 55].

3.5 Other datasets

There are also datasets comprising digits, ZIP codes, musical scores, etc.
that can be used to identify the writer. A few of them are:

3.5.1 CEDAR-CDROM1 dataset

One of the CEDAR datasets is the CEDAR-CDROM1 dataset,11 which comprises alphabetic
characters, digits, ZIP codes and handwritten words.

3.5.2 MNIST

MNIST [124] is a numeric dataset that contains 60,000 digit samples for training and
10,000 for testing, written by 250 writers.

3.5.3 CVC-MUSCIMA

CVC-MUSCIMA (Computer Vision Center-MUsic SCore IMAges) [67] is another
appealing and distinctive dataset that consists of music scores and serves for the identification of
musicians. It encompasses 1000 music sheets written by 50 different musicians, 20 pages each.

The dataset has 1000 original images and 11,000 distorted images. Table 1 summarizes the
databases for handwriting analysis and writer identification tasks for different languages.
Having introduced the datasets, we now move on to the framework of
writer identification, covering pre-processing, feature extraction and classification, in the
following sections.

4 State of the art

Work on handwriting analysis, writer identification and verification has been appearing
for some time [106]. The subject has attracted the interest of researchers in the
field of pattern recognition, as evidenced by the number of theses dedicated to and published
in this domain [29, 40, 160]. This section first reports the significant related surveys
and then classifies the works according to the steps of the writer identification pipeline, as reported
for handwritten and online systems.
In 1989, a pioneering review [136] comprehensively addressed the static and dynamic techniques of
signature verification, handwriting recognition and writer identification. Very
few articles had been published at that time, and this was a very concise survey of these
problems. Plamondon et al. [137] presented a comprehensive review of handwriting recognition
based on online and offline methods in 2000. In 2007, Schomaker [152] explained the
nature of handwriting at length, along with texture and allographic features, and
summarized the results.
Another review paper was presented in 2009 by Bin-Abdl and Hashim [29]. Sreeraj
and Idicula [95] presented a review of writer identification in different languages
such as English, Arabic, Chinese and Persian, along with a description of the features. One
limitation of this paper is that well-known features such as SIFT, HOG, SURF and
CNN-based features were not discussed. In 2012, Awaida [20] presented the state of the
art of writer identification and verification for Arabic script. This paper comprehensively
explained the databases with their research groups, feature extraction techniques and different
classification approaches; however, only minimum distance classifiers and statistical classifiers
were discussed. Another survey, published by Ahmed and Sulong [6],
covered the characteristics of Arabic writing, datasets, and the local and global features used in
the writer identification literature.
The above surveys summarized the state of the art in writer identification up to 2014.
We partially update this picture following the stages of the writer identification pipeline.
In the coming sections, we thoroughly explain the feature extraction techniques, including deep
learning based features, and the classification approaches reported since 2014.

4.1 Preprocessing

Preprocessing is the data cleaning stage in which irrelevant information is removed from the
data. In this phase, binarization, normalization and noise removal are applied to the handwritten
samples using image processing techniques. Furthermore, segmentation is performed
at the letter, word, sentence or paragraph level, depending on the research problem.
Researchers have applied different preprocessing techniques to writer
identification on English-script datasets. Said et al. [141, 142] generated uniform blocks of
text in preprocessing by word de-skewing, text padding, and adjusting the distance between lines and words.
Table 1 Writer identification datasets

| Database | Language | Writers | Description | Availability |
|---|---|---|---|---|
| CEDAR-Letter [41, 166] | English | 1000 | Concise database of 156 words | Public |
| IAM [122] | English | 400 | 1066 documents, 9285 text lines, 82,227 words | Public |
| MIAM [33, 122] | English | 657 | Extended of IAM dataset. 1539 document pages, 4881 lines, 4,3751 word instances | Public |
| KHATT [117] | Arabic | 1000 | 1000 handwritten documents forms, 2000 paragraphs (random and fixed), 9327 lines, free paragraphs | Public |
| IFN/ENIT [59, 135] | Arabic | 411 | 2200 documents, 26,459 names with 212,211 characters, 115,585 connected part of Arabic words | Public |
| AHDB [10] | Arabic | 105 | 10,000 words for Arabic cheque processing | Public |
| Al Isra [105] | Arabic | 500 | 500 sentences, 37,000 words, 10,000 digits, 2500 signatures | Public |
| MADCAT Phase 1 Training corpus [172] | Arabic | 400 | 9693 handwritten pages | Public |
| AD/MADBase [61] | Arabic | 700 | 60,000 digits for training, 10,000 digits for testing | On request |
| WAHD [1] | Arabic | 302 | 353 manuscripts, 43,976 pages | Public |
| Alamri et al. [13] | Arabic | 328 | 46,800 digits, 13,439 numerical strings, 21,426 letters, 11,375 words, 1640 special symbols | Public |
| CEDAR-CDROM2 Japanese Character Image Database | Japanese | Developed from books, journal, magazine etc. | 400 binary images, 180,000 symbolic characters, 3300 characters | Paid ($1500) |
| Firemaker dataset [156] | Dutch | 252 | 1008 scanned pages | Public |
| RIMES [77] | French | 1300 | 12,000 pages, 5600 real mails/letters, 300,000 snippets and logos | Public |
| ICDAR2013 [82] | Arabic, English | 475 | 1900 documents | On request |
| CVL-Database | German, English | 311 | 311 documents, 7 handwritten texts, 1 in German and 6 in English. 1,01,069 words | Public |
| QUWI [13] | Arabic, English | 1017 | 5085 documents, 4068 digitized pages, 60,000 Arabic words and 100,000 English words for text independent analysis | On request |
| CEDAR-CDROM1 | English | Developed from city, state words and zip codes | 184,68 images, 5632 city words, 4938 state words, 9454 ZIP codes, 27,837 mixed alphabets and numeric segmented from address blocks, 21,179 digits segmented from ZIP Codes | Proprietary ($950 CD-1) |
| MNIST [125] | Numbers | 250 | 60,000 number documents for training, 10,000 number documents for testing | Public |
| CVC-MUSCIMA [67] | Musical notes | 50 | 1000 music sheets, 1000 original images, 11,000 distorted images | Public |

KHATT KFUPM Handwritten Arabic Text; CVL Computer Vision Laboratory; AHDB Arabic Handwritten Data Base; QUWI Qatar University Writer Identification dataset; IFN/ENIT
Institute of Communications Technology/Ecole Nationale d'Ingénieurs de Tunis; ICDAR International Conference on Document Analysis and Recognition; CEDAR Center of
Excellence for Document Analysis and Recognition; IAM Institut für Informatik und angewandte Mathematik


Bensefia et al. [26] used a segmentation approach. They first extracted connected
components from the images to remove irrelevant details, and then characterized the words. Siddiqi
and Vincent [161] applied global thresholding to binarize the images; handwriting samples
were divided into sub-images by horizontal and vertical window positioning.
Schlapbach and Bunke [149] performed normalization, vertical scaling and slant correction
in the pre-processing stage. Pandey and Seeja [133] used the Otsu method to convert
grayscale images to binary images and removed undesired text from the images.
They thinned the images using the Zhang-Suen thinning algorithm.
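Since Otsu binarization recurs in several of the works above, a minimal sketch of the idea may help. It is not the implementation used in any cited paper: it assumes an 8-bit grayscale image stored as nested lists, treats dark pixels as ink, and the names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form one class and the rest form the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # intensity sum of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(image, threshold):
    """Map dark ink (<= threshold) to 1 and light paper to 0."""
    return [[1 if p <= threshold else 0 for p in row] for row in image]

# Hypothetical 3 x 3 grayscale patch: dark ink on light paper.
image = [[200, 210, 30], [205, 20, 25], [215, 220, 35]]
flat = [p for row in image for p in row]
t = otsu_threshold(flat)
binary = binarize(image, t)
```

Because the threshold is derived from the image histogram, the same code adapts to lighter or darker scans without a hand-tuned constant.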
The literature reveals that many other languages, such as Farsi and Bengali, have been
preprocessed with various image processing techniques. Shahabi and Rahmati [156] extracted
the horizontal projection profile and then applied a low-pass Gaussian filter for smoothing. The
peaks of the smoothed profile gave the spaces between text lines. The vertical projection was then
computed on the binarized images, which gave the spaces between characters and words.
Padding was also applied to remove blank spaces.
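A projection-profile line segmentation of this kind can be sketched roughly as below. The sketch makes several simplifying assumptions that are not taken from the cited work: the image is binary with ink as 1, a fixed three-tap kernel stands in for the Gaussian low-pass filter, and text lines are taken as runs of rows whose smoothed ink count exceeds a hand-chosen `min_ink` value. All function names are hypothetical.

```python
def horizontal_profile(binary):
    """Count the ink pixels in every row of a binary image (ink = 1)."""
    return [sum(row) for row in binary]

def smooth(profile, kernel=(0.25, 0.5, 0.25)):
    """Low-pass filter the profile with a small kernel, clamping at the borders."""
    half = len(kernel) // 2
    last = len(profile) - 1
    smoothed = []
    for i in range(len(profile)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), last)
            acc += weight * profile[j]
        smoothed.append(acc)
    return smoothed

def text_line_bands(profile, min_ink):
    """Return (first_row, last_row) for each run of rows with more ink than min_ink."""
    bands, start = [], None
    for i, value in enumerate(profile):
        if value > min_ink and start is None:
            start = i
        elif value <= min_ink and start is not None:
            bands.append((start, i - 1))
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands

# Hypothetical 7 x 4 binary page with two text lines separated by blank rows.
page = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
profile = horizontal_profile(page)
smoothed = smooth(profile)
bands = text_line_bands(smoothed, min_ink=1.0)
```

Applying the same functions to the transposed image gives the vertical projection, whose gaps separate characters and words.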
Sheikh and Khotanlou [158] applied morphological operations such as opening and closing to
create binary images in the pre-processing phase. Adak et al. [4] applied a connected
component labeling algorithm to label the pages and removed noise and non-text
components such as dots and commas. Furthermore, they employed a 2D Gaussian filter-based
technique for line and word segmentation, and a water reservoir principle-based method
for character-level segmentation.
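Connected component labeling followed by removal of small components can be sketched as follows. This is an illustrative flood-fill version under assumptions of our own: 4-connectivity, ink stored as 1, and component size as the only criterion for discarding noise. The cited works may use different connectivity and filtering rules, and the function names are hypothetical.

```python
from collections import deque

def label_components(binary):
    """Label 4-connected ink regions; return the label grid and each component's size."""
    height, width = len(binary), len(binary[0])
    labels = [[0] * width for _ in range(height)]
    sizes = []
    for r in range(height):
        for c in range(width):
            if binary[r][c] and not labels[r][c]:
                label = len(sizes) + 1
                labels[r][c] = label
                queue, size = deque([(r, c)]), 0
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and binary[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = label
                            queue.append((ny, nx))
                sizes.append(size)
    return labels, sizes

def remove_small_components(binary, min_size):
    """Erase components with fewer than min_size pixels (for example isolated dots)."""
    labels, sizes = label_components(binary)
    return [
        [1 if lab and sizes[lab - 1] >= min_size else 0 for lab in row]
        for row in labels
    ]

# Hypothetical binary patch: two strokes and one isolated noise pixel.
patch = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
]
labels, sizes = label_components(patch)
cleaned = remove_small_components(patch, min_size=2)
```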
Many techniques have been applied to multilingual datasets. Fiel and Sablatnig [66]
performed binarization using a global threshold, text line segmentation using local
projection profiles, text line de-skewing, and sliding windows with a step size of 20 pixels
in the pre-processing phase. Christlein et al. [49] improved performance by normalization using
ZCA whitening with the KL-Kernel. In [51], script contours were extracted from
binarized images using connected component analysis, and 32 × 32 image
patches were extracted from random script contours.
Wu et al. [188] performed word segmentation using a LoG filter and line segmentation using the
Hough transform in pre-processing. Khan et al. [103] improved robustness to noise and
blur distortions using Discrete Cosine Transform (DCT) coefficients. Ahmed et al. [5]
performed binarization, detection of connected components and removal of punctuation marks
in pre-processing. Chahi et al. [43] binarized the images using the Otsu algorithm and
transformed them using the Kronecker delta function.
Xing and Qiao [189, 190] used patch scanning strategies to feed input image
patches of 113 × 113. They employed data augmentation to improve the performance of
DeepWriter. Yang et al. [193] significantly improved performance with data augmentation
techniques and DropStroke.
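Patch scanning can be pictured as sliding a fixed-size window over the image and keeping every window as a separate network input. The sketch below is a generic illustration only: the stride is an assumption, the toy example uses 2 × 2 patches rather than the 113 × 113 patches reported above, and `scan_patches` is a hypothetical name.

```python
def scan_patches(image, size, stride):
    """Collect all size x size patches whose top-left corner lies on a stride grid."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches

# Hypothetical 4 x 6 image whose pixel value encodes its position.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
patches = scan_patches(image, size=2, stride=2)
overlapping = scan_patches(image, size=2, stride=1)
```

A smaller stride produces overlapping patches, which is one simple way of enlarging the training set from a single handwriting image.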

4.2 Features extraction

Feature extraction is the process of converting an input image into a vector
of numerical values [6]. Features are also called attributes, variables, dimensions or
descriptors. A feature vector holds far fewer numerical values than the original image,
which reduces the overhead of feeding the input to machine learning
models and also reduces the training time of the model. Good features help in the recognition
and identification of the target object. In the domain of writer identification, researchers
have used different types of features to increase the performance of the model.

Depending on the online or offline approach, features fall into three categories:
statistical features, structural features, and model-based (automatic) features. The
following subsections briefly review researchers' contributions to the extraction of each
type.

4.2.1 Statistical features

Statistical features are statistical and geometric measurements that capture the
information relevant to classification and help discriminate among different classes. They
are subdivided into global features and local features. Global features describe the traits
of the entire image, such as texture features, contour representations, and shape
descriptors. Examples of global features are invariant moments (Hu, Zernike), shape
metrics (perimeter, area, compactness, etc.), and texture metrics such as local binary
patterns and Histogram of Oriented Gradients. Global features work well when there is a
single object in an image and enough contrast between foreground and background. They are
also prone to error due to occlusion and clutter. Local features describe key points in
patches of the image. They are robust, salient features computed from multiple interest
points in a neighborhood, and they represent the salient shapes, texture, and key points
in an image patch. Examples are the Scale-Invariant Feature Transform (SIFT), Speeded-Up
Robust Features (SURF), Binary Robust Invariant Scalable Keypoints (BRISK), MSER, LBP, and
FREAK. Combining local and global features increases accuracy but also increases the
computational time of the system. We now present existing work on local and global
statistical features employed for writer identification.
Global features that operate at paragraph level in offline handwritten samples are codebook
generation [27, 153, 155], Gabor, directional features [33, 38, 39], GSCM [141, 142,
173], GGD, and Contourlet GGD [85, 87]. In writer identification, a text line database is
sometimes used. In this case the text line is the input unit from which features are
extracted for text-independent analysis. Line-level features include grapheme-based
features [126], connected components, enclosed regions, lower and upper profiles, and
fractal features [93, 122]. Several features were extracted at word level from handwriting
samples using the offline approach. Word-level features employed for writer identification
include edge-based directional features [22, 39], morphological features [203], GSC, WMR,
SC, and SCON [179, 197].
Certain writer identification systems operate at character level. Such systems use
individual characters to distinguish one writer from another. These systems either use a
character dataset or segment the dataset into individual characters. In this scheme,
character features are used. Numerous character features for writer identification have
been presented in the literature, among them height, area, and slant [110], HMM features
[149], directional features [183], fuzzy directional features and fuzzy learning vector
quantization (FLVQ) [23], and GSC features [167, 195]. Several types of statistical
features were extracted from English datasets. One of the most widely used is based on
Gabor filtering. A 2D Gabor filter is defined as the product of a plane wave and a
Gaussian function:

\[
g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \sin\left(2\pi \frac{x'}{\lambda} + \psi\right) \tag{1}
\]

where λ represents the wavelength of the sinusoidal factor, θ represents the orientation
of the normal to the parallel stripes of the Gabor function, ψ is the phase offset, σ is
the standard deviation of the Gaussian envelope, and γ is the spatial aspect ratio. The
rotated coordinates x′ and y′ are:
\[ x' = x\cos\theta + y\sin\theta \tag{2} \]

\[ y' = -x\sin\theta + y\cos\theta \tag{3} \]
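To make Eqs. (1) to (3) concrete, the following is a minimal sketch that samples the Gabor function on a small square grid. The function name `gabor_kernel`, the odd grid size, and the example parameter values are assumptions chosen for illustration; they are not taken from any of the reviewed systems.

```python
import math

def gabor_kernel(size, lam, theta, psi, sigma, gamma):
    """Sample Eq. (1) on a size x size grid centred on the origin (size assumed odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotated coordinates, Eqs. (2) and (3).
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope multiplied by the sinusoidal carrier, Eq. (1).
            envelope = math.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
            carrier = math.sin(2 * math.pi * xr / lam + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

# Illustrative parameter values only.
kernel = gabor_kernel(size=7, lam=4.0, theta=0.0, psi=math.pi / 2, sigma=2.0, gamma=0.5)
```

In a multichannel setting, such a kernel would typically be generated for several orientations θ and wavelengths λ and convolved with the handwriting image; the filter responses then serve as texture features.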

In the spatial domain, a two-dimensional (2D) Gabor filter can be represented by the
following equation:

\[ h(x, y) = g(x, y)\, e^{-2\pi j (u_0 x + v_0 y)} \tag{4} \]

where g(x, y) is the Gaussian function given by:

\[ g(x, y) = e^{-\frac{1}{2}(x^2 + y^2)/\sigma^2} \tag{5} \]

Said et al. [141, 142] performed multichannel Gabor filtering and computed gray scale
co-occurrence matrices on 25 samples per writer. They concluded that Gabor filtering gave
better performance than the gray scale co-occurrence matrix. The same methodology was
applied to machine-printed documents for script [177] and font identification [201].
Siddiqi and Vincent [161] applied Gabor filters to the IAM dataset using 100 writing samples.
Shahabi and Rahmati [156] developed a text-independent method for Farsi script. Gabor-energy
and moment approaches were used for feature extraction from 40 samples. 48 features were
extracted using Gabor filtering and transformation. Four texture image blocks were used for
training and two for testing the system. Ubul et al. [180] proposed a feature selection
technique for writer identification in Uyghur and combined it with Gabor features.
Extended Gabor features are obtained by modulating a 2D circular sinusoid with a
2D Gaussian, given by:

\[
\mathrm{xg}(x, y, \theta, r_x, r_y) = \exp\left(-\frac{r_x x^2 + r_y y^2}{\sigma^2}\right) \sin\left(\theta \cdot \frac{x^2 + y^2}{r_x + r_y}\right) \tag{6}
\]

Extended Gabor features were employed by Helli and Moghaddam [90–92], who applied the
extended Gabor model to feature extraction for Persian script. They represented the strength
of an image as the sum of all its pixel values and extracted 2D extended Gabor features.
Gabor wavelet features were calculated by He et al. in [88, 89].
One of the popular statistical features is the contour direction and hinge feature proposed
by Bulacu and Schomaker [37].
The contour is extracted by the following formula:

\[ \mathrm{Contour}_i = \{\, p_j \mid j \le M_i,\; p_i = p_{M_i} \,\} \tag{7} \]

Along the contour of the writing stroke, a histogram of the angle φ is generated and
normalized into a probability distribution pf(φ). The angle φ is measured from the
horizontal direction as

\[ \varphi = \tan^{-1}\left(\frac{y_{k+\epsilon} - y_k}{x_{k+\epsilon} - x_k}\right) \tag{8} \]

Where ϵ represents the thickness of the stroke.
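The contour-direction distribution described above can be sketched as follows. This is an illustrative example under stated assumptions: the contour is given as an ordered list of (x, y) points, the number of bins (`n_bins`) and the step `eps` are hypothetical defaults, and `atan2` is used in place of the plain arctangent of Eq. (8) so that vertical steps do not cause a division by zero.

```python
import math

def direction_histogram(contour, eps=1, n_bins=12):
    """Normalised histogram of the angle in Eq. (8) along one contour."""
    counts = [0] * n_bins
    for k in range(len(contour) - eps):
        x0, y0 = contour[k]
        x1, y1 = contour[k + eps]
        # Fold the angle into [0, pi) so opposite directions share a bin.
        phi = math.atan2(y1 - y0, x1 - x0) % math.pi
        # min() guards against phi rounding up to exactly pi.
        counts[min(int(phi / math.pi * n_bins), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

horizontal = direction_histogram([(0, 0), (1, 0), (2, 0), (3, 0)])
vertical = direction_histogram([(0, 0), (0, 1), (0, 2)])
```

A purely horizontal contour puts all of its probability mass in the first bin, while a vertical one concentrates it in the middle bin; a real handwriting sample spreads the mass according to the writer's preferred stroke directions.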


Siddiqi and Vincent [162] identified the writer of a handwritten document by extracting
contours and writer-specific features using a local approach and codebook generation. They
used Gabor filtering for feature extraction. Furthermore, in [164] they extracted histograms
of the chain code and of the first- and second-order differential chain codes. Al-Maadeed
et al. [12] extracted edge-based directional probability distribution features such as
height, area, length, and three edge-direction distributions of different sizes. Adak et al.
[4] extracted handcrafted features such as micro-macro features, contour direction and hinge
features, and direction and curvature features.
Schlapbach and Bunke in [149] developed an HMM recognizer by extracting nine parameters:
three global features (black pixels in the window, the second-order moment, and the center
of gravity) and six local features (positions and orientations of the upper and lower
pixels, the black pixel fraction, and the transitions in the window). The resulting
nine-dimensional feature vectors were used for training with the Baum-Welch algorithm.
Hassaine et al. [82] characterized English writing using a set of geometrical features such
as directions, curvatures, tortuosity, chain code, and edge-based directional features to
identify the writer.
One form of statistical features in the frequency domain is gradient features. Let f(x, y)
be the grayscale level at point (x, y). The horizontal grayscale gradient dx (the vertical
gradient dy is defined analogously) and the gradient direction are derived as:

\[
dx = f(x+1, y+1) + 2f(x+1, y) + f(x+1, y-1) - f(x-1, y-1) - 2f(x-1, y) - f(x-1, y+1) \tag{9}
\]

\[ \mathrm{direction} = \tan^{-1}\left(\frac{dy}{dx}\right) \tag{10} \]
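As a small worked illustration of Eqs. (9) and (10), the sketch below evaluates the gradient at one interior pixel. The image is assumed to be a list of rows indexed as `f[y][x]`. The text prints only the horizontal gradient, so the vertical gradient `dy` used here is an assumption obtained by swapping the roles of x and y in Eq. (9); `atan2` replaces the plain arctangent to handle dx = 0.

```python
import math

def gradient_direction(f, x, y):
    """Return (dx, dy, direction) at interior pixel (x, y); f is indexed f[y][x]."""
    # Horizontal gradient, Eq. (9).
    dx = (f[y + 1][x + 1] + 2 * f[y][x + 1] + f[y - 1][x + 1]
          - f[y - 1][x - 1] - 2 * f[y][x - 1] - f[y + 1][x - 1])
    # Assumed vertical counterpart of Eq. (9); not printed in the text.
    dy = (f[y + 1][x - 1] + 2 * f[y + 1][x] + f[y + 1][x + 1]
          - f[y - 1][x - 1] - 2 * f[y - 1][x] - f[y - 1][x + 1])
    # Gradient direction, Eq. (10).
    return dx, dy, math.atan2(dy, dx)

horizontal_ramp = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
vertical_ramp = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
```

Direction features such as those of Chanda et al. would then quantize the returned angle into a fixed number of direction bins and accumulate a histogram over the image.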
Another feature approach for writer identification is the gradient features in the
frequency domain used by Ram and Moghaddam in [125]. They used direction information and
interval ranges for Persian script in order to extract the most discriminative intervals
as features. Chanda et al. [44] conducted experiments on Bengali script, quantizing
gradient directions into sixteen directions to extract gradient-based direction features.
They also quantized into four directions and computed chain-code-based direction features.
The same methodology was employed by Awaida and Mahmoud in [21] for feature extraction.
Kumar and Kaur [109] computed directional features and then applied PCA for dimensionality
reduction. They used the slant and skew of handwritten samples, pixel distributions such as
horizontal profiles, curvature, and entropy, all calculated using image processing
techniques. These features were selected using Fisher's Linear Discriminant Analysis.
Srihari et al. [166] extracted micro and macro features with a large number of parameters.
The micro features were Gradient, Structural and Concavity (GSC) features, while the macro
features comprised height, slant, gray-level entropy and threshold, number of text pixels,
number of slope components, paragraph aspect ratio and indentation, word length, and zone
ratio. They reported that micro features increased the identification rate, reaching 80%.
Arazi [16, 17] computed gray scale histograms to extract specific features such as the
first indented letter of a handwritten sample and external features such as margins.
Zimmerman and Varady [202] used run-length coding to extract features. Another approach for
extracting features in Chinese script is the edge-hinge features presented by Wen et al. in
[185]. They employed a generalized GMF of edge structure coding (ESC) for the distribution
of edge fragments on multiple scales.
The Scale Invariant Feature Transform (SIFT) [114] is used to detect and describe local
features in images. It works in four major steps. The first step is the detection of
scale-space extrema.

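The scale space of an image I(x, y) is defined as the function

\[ L(x, y, \sigma) = G(x, y, \sigma) * I(x, y) \tag{11} \]

where * denotes convolution and

\[ G(x, y, \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2 + y^2)/2\sigma^2} \tag{12} \]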

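The second step is key point localization, which interpolates using the quadratic Taylor
expansion of the Difference-of-Gaussian scale-space function, given by

\[
D(x, y, \sigma) = D + \frac{\partial D^T}{\partial \mathbf{x}} \mathbf{x} + \frac{1}{2} \mathbf{x}^T \frac{\partial^2 D}{\partial \mathbf{x}^2} \mathbf{x}, \qquad \mathbf{x} = (x, y, \sigma)^T \tag{13}
\]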

The third step is orientation assignment, in which the gradient magnitude and orientation
are computed as

\[ m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2} \tag{14} \]

\[ \theta(x, y) = \operatorname{atan2}\big(L(x, y+1) - L(x, y-1),\; L(x+1, y) - L(x-1, y)\big) \tag{15} \]

Finally the feature vector is created from each key point.
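The orientation-assignment step can be sketched in a few lines. This is an illustrative example only: `L` stands for an already smoothed image stored as a list of rows indexed `L[y][x]`, the magnitude is taken as the Euclidean norm of the two pixel differences, and the 36-bin magnitude-weighted histogram is an assumed default rather than a value stated in the text.

```python
import math

def magnitude_orientation(L, x, y):
    """Gradient magnitude and orientation from central pixel differences."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    return math.hypot(dx, dy), math.atan2(dy, dx)

def orientation_histogram(L, n_bins=36):
    """Magnitude-weighted orientation histogram over all interior pixels."""
    hist = [0.0] * n_bins
    for y in range(1, len(L) - 1):
        for x in range(1, len(L[0]) - 1):
            m, theta = magnitude_orientation(L, x, y)
            # Map the angle from (-pi, pi] onto bin indices 0 .. n_bins - 1.
            b = int((theta % (2 * math.pi)) / (2 * math.pi) * n_bins) % n_bins
            hist[b] += m
    return hist

# Toy smoothed patch whose intensity grows from left to right.
patch = [[float(x) for x in range(4)] for _ in range(4)]
hist = orientation_histogram(patch)
```

The peak of such a histogram gives the dominant orientation of a key point, which is what makes the final descriptor invariant to rotation.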

Woodard et al. [186] extracted local features using quantized SIFT for writer recognition.
Tang et al. [175] combined SIFT features and triangular features to develop a writer
identification system with improved structural features. Hu et al. [94] described the
writing style by encoding SIFT with two coding strategies: locality-constrained linear
coding and Fisher kernel coding. Fecker et al. [63] conducted experiments on historical
Arabic writer identification by computing SIFT features.
Statistical features were also extracted from multilingual datasets. Wu et al. [188]
extracted SIFT descriptors together with scale and orientation features and generated a
codebook using hierarchical Kohonen SOM clustering. Xiong et al. [190] evaluated
performance on ICFHR2012-Latin and ICDAR2013 by extracting SIFT descriptors and contour
directional features. They applied K-means clustering for codebook generation and
extracted occurrence histograms of the SIFT descriptors. Fiel and Sablatnig [64]
extracted local SIFT features from English and Dutch handwriting. Another
attempt, [65], retrieved and identified writers using SIFT and bag-of-words
features. They generated a codebook and calculated the occurrence histogram of the clusters.
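The codebook and occurrence-histogram idea used by several of these systems can be illustrated with a minimal sketch. It assumes the codebook has already been produced by some clustering step (k-means or SOM, for example) and that descriptors are plain lists of numbers; the function names and the toy two-dimensional data are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def occurrence_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword and normalise the counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(d, codebook[k]))
        counts[nearest] += 1
    total = len(descriptors)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

# Toy data: two codewords and four 2-D descriptors.
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0], [11.0, 10.0]]
histogram = occurrence_histogram(descriptors, codebook)
```

The resulting histogram has one entry per codeword and serves as a fixed-length description of a document, regardless of how many local descriptors the document produced.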
A major contribution by Christlein et al. [47] in 2014 was the encoding of features with GMM
supervectors. They extracted RootSIFT features and encoded them into supervectors. A
Universal Background Model (UBM) was created by estimating a GMM from a set of SIFT
descriptors. They normalized by taking the element-wise square root followed by l2
normalization. The same technique of GMM supervector encoding was employed in [49]. In [50],
they used VLAD encoding of SIFT descriptors to generate global descriptors. Another attempt
by the same authors, [48], increased the recognition rate by extracting RootSIFT features
from the boundary edges of the handwriting. They created the feature vector from the GMM
parameters λ = {ωk, μk, Σk | k = 1, ..., K}, where the GMM is given as:

\[ p(x \mid \lambda) = \sum_{k=1}^{K} \omega_k\, g_k(x) \tag{16} \]

where gk is the Gaussian function given by:

\[
g_k(x) = g(x; \mu_k, \Sigma_k) = \frac{1}{\sqrt{(2\pi)^D |\Sigma_k|}}\, e^{-\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)} \tag{17}
\]
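A minimal sketch of evaluating the mixture density of Eqs. (16) and (17) is given below. To keep it free of matrix inversion it assumes diagonal covariance matrices, so each component is described by a vector of variances; this restriction, the function names, and the toy parameters are assumptions made for illustration.

```python
import math

def gaussian_diag(x, mean, var):
    """Eq. (17) for a diagonal covariance matrix with variances `var`."""
    d = len(x)
    # log of 1 / sqrt((2*pi)^D * |Sigma|); |Sigma| is the product of the variances.
    log_norm = -0.5 * (d * math.log(2 * math.pi) + sum(math.log(v) for v in var))
    # (x - mu)^T Sigma^{-1} (x - mu) for a diagonal Sigma.
    maha = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return math.exp(log_norm - 0.5 * maha)

def gmm_density(x, weights, means, variances):
    """Eq. (16): weighted sum of the component densities."""
    return sum(w * gaussian_diag(x, m, v)
               for w, m, v in zip(weights, means, variances))

# A single standard normal component in two dimensions.
p = gmm_density([0.0, 0.0], [1.0], [[0.0, 0.0]], [[1.0, 1.0]])
```

In supervector encoding, the parameters of such a background model are adapted to the descriptors of one document, and the adapted parameters are stacked into the feature vector for that document.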

A feature inspired by SIFT is Oriented Basic Image Feature Columns (oBIF Columns), which
have been used extensively in character recognition [131] and texture recognition [178].
Newell and Griffin [132] computed oBIF Columns and improved a texture-based scheme for
encoding a writer's style. Another type of local statistical feature is SURF [24],
which was used for writer identification by Sharma and Dhaka in [157].
Garg et al. [69] worked on the Gurmukhi script of Punjabi. They extracted different
statistical features such as transition, zoning, horizontal and vertical peak values,
diagonals, centroid, end points and intersections, and parabola and power-based curve
fitting. They used the technique of Kumar et al. [108] to extract the features.
Khan et al. [103] extracted features using the Discrete Cosine Transform (DCT) and a sliding
window. The DCT generated thousands of features, which were randomly selected. They applied
k-means, one-dimensional SOM (Self-Organizing Map), and two-dimensional SOM to cluster the
features and generate a codebook for each sample. The final feature vector was a normalized
histogram of co-occurrences. They used bagging for re-sampling because the randomly selected
features caused misclassification problems. The DCT transforms a block of pixels b of size
N1 × N2 into a matrix of real numbers as:

\[
B(u, v) = \frac{2}{N_1 N_2}\, C(u)\, C(v) \sum_{i=0}^{N_1 - 1} \sum_{j=0}^{N_2 - 1} \cos\left[\frac{u\pi}{N_1}(i + 0.5)\right] \cos\left[\frac{v\pi}{N_2}(j + 0.5)\right] b(i, j) \tag{18}
\]
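The block transform of Eq. (18) can be written out directly, as in the sketch below. The scale factor follows the equation as printed. The text does not define C(u), so the usual convention C(0) = 1/√2 and C(u) = 1 otherwise is assumed here; the function name `dct2` is hypothetical.

```python
import math

def dct2(block):
    """Direct evaluation of Eq. (18) for a block stored as a list of rows."""
    n1, n2 = len(block), len(block[0])

    def c(k):
        # Assumed normalisation constant; not defined in the text.
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    out = [[0.0] * n2 for _ in range(n1)]
    for u in range(n1):
        for v in range(n2):
            s = sum(math.cos(u * math.pi / n1 * (i + 0.5))
                    * math.cos(v * math.pi / n2 * (j + 0.5))
                    * block[i][j]
                    for i in range(n1) for j in range(n2))
            out[u][v] = 2.0 / (n1 * n2) * c(u) * c(v) * s
    return out

# A constant block has energy only in the (0, 0) coefficient.
coefficients = dct2([[1.0] * 4 for _ in range(4)])
```

This quadruple loop is only meant to mirror the formula; a practical system would use a fast DCT routine on each sliding-window block.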

Venugopal and Sundaram [182] worked in the online domain and extracted a codebook
using a well-known encoding scheme, the Vector of Locally Aggregated Descriptors (VLAD).
Some features are specific to the domain of online writer identification, such as
pen pressure, azimuth, velocity [77], velocity barycenter [177], altitude, writing
direction, correlation between lengths [45], continuous dynamic programming [62, 100], and
pen state (pen down or pen up; values are also recorded while the pen is up) [127]. The
aforesaid features relate to the writing itself. However, online writing is also analyzed
at paragraph level. Paragraph features used by researchers include codebook generation
[98, 128], stroke-based features, and point-based features [151].
Zhang et al. [198] used random hybrid stroke features such as the movement of the pen tip
in the x and y directions and the pen status (up or down) in the online domain. Gargouri
et al. [70] extracted dynamic features such as strokes, spaces between strokes and words,
and points from the Arabic-script ADAB database. Table 2 summarizes the statistical local
and global features.

4.2.2 Structural features

Structural features represent the local structure and topology of characters or writing,
such as edges, loops, dots and diacritics, vertical and horizontal lines, start and end
points, direction of writing, thickness or thinness of strokes, and corners.
There are many structural features [110, 135, 171], such as graphemes, fragments, and
strokes, extracted from handwritten samples. A grapheme is a small segment of handwriting.
In the

Table 2 Review of writer identification systems using statistical local and global features

| Reference | Language | Features |
|---|---|---|
| Garg et al. [69] | Punjabi (Gurmukhi) | Transition, zoning, horizontal peak and vertical peak value, diagonals, centroid, end points and intersection, curve fitting of parabola and power based |
| Khan et al. [103] | English, German, Arabic | Discrete Cosine Transform (DCT) based features |
| Christlein et al. [47, 50] | English, Arabic, German, Greek | RootSIFT encoded by UBM and GMM super vectors |
| Adak et al. [4] | Bengali | Contour direction and hinge features, direction and curvature features at keypoints |
| Kumar and Kaur [108] | English | Directional features, slant, skew, pixel distribution like horizontal profiling, curvature, and entropy |
| Venugopal and Sundaram [182] | English | Vector of Locally Aggregated Descriptors (VLAD) |
| Zhang et al. [198] | English, Chinese | Movement of pen tip in x and y directions, status of pen in up and down position |
| Sharma and Dhaka [157] | English, German, French, Greek | SURF |
| Xiong et al. [190] | English, Greek | SIFT descriptor and contour directional feature |
| Kumar et al. [107] | Punjabi (Gurmukhi) | Transition, zoning, horizontal peak and vertical peak value, diagonals, centroid, end points and intersection, curve fitting of parabola and power based |
| Wu et al. [188] | English, French, German, Chinese, Greek | SIFT descriptors, scales and orientations |
| Tang et al. [178] | Chinese | SIFT features and triangular features |
| Newell and Griffin [132] | English | oBIF Columns |
| Hu et al. [94] | Chinese | SIFT, bag of features |
| Fecker et al. [63] | Arabic | SIFT |
| Awaida and Mahmoud [22] | Arabic | Gradient based direction features, chain code based direction features |
| Fiel and Sablatnig [65] | English, Greek, German, French | SIFT and bag of word features |
| Fiel and Sablatnig [64] | English, Dutch | SIFT |
| Wen et al. [188] | Chinese | Edge-hinge features |
| Hassaine et al. [83] | English | Directions, curvatures, tortuosity, chain code and edge based directional features |
| Chanda et al. [44] | Bengali | Gradient based direction features, chain code based direction features |
| He et al. [90] | Chinese | Gabor wavelet features |
| Woodard et al. [186] | Arabic | SIFT |
| Ram and Moghaddam [126] | Persian | Gradient features |
| Shahabi and Rahmati [156] | Farsi | Gabor-energy and moments features |
| Siddiqi and Vincent [165] | English | Histograms of the chain code, first and second order differential chain code |
| Helli and Moghaddam [90–92] | Persian | Extended Gabor features |
| Zhang et al. [198] | Chinese | 2D Gabor features with dimension of mesh fractal |
| Al-Maadeed et al. [12] | Arabic | Height, area, length, and three edge-direction distributions with different sizes |
| Siddiqi and Vincent [164] | English | Gabor features |
| He et al. [86, 89] | Chinese | Wavelet features with GGDM |
| Ubul et al. [180] | Uyghur | Gabor features |
| Al-Maadeed et al. [11] | Arabic | Fourier spectral features |
| Siddiqi and Vincent [163] | English | Gabor features |
| Bulacu and Schomaker [37] | English | Contour direction and hinge features |
| Schlapbach and Bunke [151] | English | Black pixels in window, the second order moment and the center of gravity, upper and lower pixels positions and orientations, black pixels fraction and the transitions in window |
| Nejad and Rahmati [130] | Farsi | Moment based Gabor energy features |
| He et al. [87] | Chinese | 2D Gabor features, auto correlation function, 2D Gabor features & auto correlation function |
| Schomaker and Bulacu [154] | Dutch | Edge hinge, edge direction features |
| Bulacu et al. [39] | English | Edge direction, edge hinge, auto correlation, entropy, run-length features |
| Shen et al. [159] | English | Gabor wavelet features |
| Zhu et al. [202] | Chinese | 2D Gabor features |
| Said et al. [141, 142] | English | Gabor filtering, gray scale co-occurrence |

DCT Discrete Cosine Transform; SIFT Scale Invariant Feature Transform; GMM Gaussian Mixture Model; UBM
Universal Background Model; VLAD Vector of Locally Aggregated Descriptors; oBIF oriented Basic Image Features;
SURF Speeded-Up Robust Features; GGDM Generalized Gaussian Density Model; GSC Gray Scale Co-occurrence

grapheme codebook creation stage, graphemes are extracted from samples of handwritten
text. This step is performed by segmenting the handwritten text into lines and then
segmenting the lines into small handwriting segments. Every segment may contain zero, one,
or more than one grapheme. Every handwritten document Dj is hence described by its set of
graphemes xi:

\[ D_j = \{\, x_i,\; i \le \mathrm{Card}(D_j) \,\} \tag{19} \]

Bensefia et al. [26] segmented handwriting samples and clustered local features such as
graphemes. A feature space was created to store query and database samples. They used a
vector space model for the local features. In another effort, the same authors fragmented
the handwriting to find writer invariants and extracted graphemes [25].
Pandey and Seeja [133] segmented handwritten documents into graphemes, which were
represented using horizontal projection profiles. They generated the codebook using k-means
clustering and produced a feature vector of size 1 × k. Abdi and Khemakhem [2] employed the
beta-elliptic model for grapheme extraction. Miller et al. [123] created graphemes from
segmented handwriting samples based on a topological and geometric class framework and
then skeletonized them. Durou et al. [56] combined OBIs and grapheme features to produce a
feature vector that was reduced by PCA and mapped using k eigenvectors.
Kumar et al. [108] extracted graphemes represented by Fourier and wavelet descriptors.
Each grapheme was encoded by sparse coding using vector quantization. Garz et al. [71]
computed the relationships between strokes, junctions, endings, and loops. They extracted
scale-independent descriptors such as local angle, orientation, and orientation-local angle
distributions, as well as multiscale descriptors.
Local Binary Patterns (LBP) are a unifying approach to the traditionally divergent
statistical and structural models. LBP maps each pixel to an integer code representing
the relationship between the center pixel and its neighbors. It encapsulates the
neighborhood geometry at every pixel by encoding the binarized differences with the
neighboring pixels as:

\[ \mathrm{LBP} = \sum_{n=0}^{N-1} s(p_n, p_c) \cdot 2^n \tag{20} \]

where pc is the pixel being encoded, pn are N symmetrically and uniformly sampled points
on the periphery of a circular region around pc, and s(pn, pc) is a binarization
function. A widely used binarization function s(pn, pc) is defined as:

\[
s(p_n, p_c) = \begin{cases} 1, & p_n \ge p_c \\ 0, & p_n < p_c \end{cases} \tag{21}
\]
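Eqs. (20) and (21) can be illustrated with a short sketch for N = 8. As a simplifying assumption, the eight neighbours are taken from the 3 × 3 square around the pixel instead of being interpolated on a circle, and the neighbour ordering (clockwise from the top-left corner) is an arbitrary choice; both are illustrative rather than prescribed by the text.

```python
# Neighbour offsets (row, column), clockwise from the top-left corner (assumed order).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, r, c):
    """Eq. (20) with N = 8 for the interior pixel at row r, column c."""
    center = img[r][c]
    code = 0
    for n, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:  # s(p_n, p_c) = 1, Eq. (21)
            code += 2 ** n
    return code

def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

The histogram of codes, computed over a handwriting fragment or a whole page, is what is normally used as the texture descriptor.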

Bertolini et al. [27] computed Local Binary Pattern (LBP) and Local Phase Quantization
(LPQ) descriptors. Hannad and Siddiqi [78] employed LBP as a texture descriptor,
exploiting the highly discriminative features of handwritten fragments to enhance the
overall performance of Arabic writer identification. In another contribution [80],
handwritten text was divided into small fragments that were treated as textures. They
evaluated the effectiveness of LBP, LPQ, and local ternary patterns (LTP).
In a multiscript environment, Chahi et al. [43] calculated Block Wise Local Binary
Count (BW-LBC) features by computing histograms from connected components and then
finding the co-occurrence distribution function. Additional structural features are
fragments, which are small parts or strokes rather than complex shape representations of
writing. Ghiasi and Safabakhsh [74] used minor connected components to extract fragments.
An alternative structural feature is allographs, calculated by Bulacu and Schomaker in
[34, 36, 153].
Jain and Doermann [97], however, represented handwriting as segments using the k-adjacent
technique. They also calculated contour gradient descriptors (CGD). Another method, which
detects the junctions of stroke fragments, was presented by He et al. in [84]. They applied
a probability distribution and calculated junctions that served as features. Another
attempt by the same authors, [83], extracted run-length features of local binary patterns
and Cloud of Line Distribution (COLD) features.
Siddiqi and Vincent [163, 165] divided words into small sub-images, each containing part
of a stroke or fragment. They used the sub-images to represent the redundant patterns that
are unique to a specific writer. Ahmed et al. [5] deployed optimum features. They detected
contours from the connected components and extracted fragment codes from them. They also
calculated ending strokes from the handwriting.
Another way to identify the writer is to combine textural and allographic features, as
presented by Bulacu and Schomaker in [38] for Arabic script. A probability distribution
function was generated from the extracted texture features, while 400 allographs were
clustered to generate the codebook. Another approach by the same authors, a simpler
formulation of this method, is given in [39]. They obtained the best performance by
employing edge-hinge features.
We summarize and compare the structural features deployed by different researchers in the
literature in Table 3.

4.2.3 Model based automatic features

Model-based or automatic features are extracted automatically by specific models directly
from the raw image data. Deep learning is based on learning data representations and

Table 3 Review of writer identification systems using structural features

| Reference | Language | Features |
|---|---|---|
| Pandey and Seeja [134] | English | Graphemes |
| Chahi et al. [43] | English, Arabic, German | Block Wise Local Binary Count (BW-LBC) features |
| Durou et al. [56] | English, Arabic | OBIs and Graphemes |
| He et al. [84] | English, Chinese, Dutch | LBP and COLD features |
| Miller et al. [124] | English | Graphemes |
| Ahmed et al. [5] | English, Arabic, Kurdish, German, Greek and French | Fragments |
| Bertolini et al. [27] | English, Arabic | LBP and LPQ descriptors |
| Hannad and Siddiqi [81] | Arabic | Fragments, LBP, LPQ and LTP |
| Garz et al. [71] | English | p(Is, Iθ), p(IBOS) |
| Hannad and Siddiqi [79] | Arabic | LBP |
| Abdi and Khemakhem [2] | Arabic | Graphemes |
| He et al. [85] | English, Chinese | Junctions of stroke fragments |
| Kumar et al. [109] | English | Graphemes |
| Ghiasi and Safabakhsh [75] | English | Contour fragment features |
| Tang et al. [179] | English | Contour pattern features, stroke fragment features |
| Jain and Doermann [98] | English, Arabic | Segments, Contour Gradient Descriptors (CGD) |
| Siddiqi and Vincent [167, 165] | English | Strokes, Fragments |
| Bulacu and Schomaker [34, 36] | English | Allographs |
| Schlapbach et al. [150] | English | Stroke based text line features |
| Bensefia et al. [25, 27] | English | Graphemes |

BW-LBC Block Wise Local Binary Count; OBIs Oriented Basic Images; COLD Cloud Of Line Distribution; LPQ
Local Phase Quantization; LTP Local Ternary Patterns; CGD Contour Gradient Descriptors; LBP Local Binary
Pattern

has the ability to learn from data without being explicitly programmed, using statistical
approaches and algorithms. This type of feature extraction needs an enormous number of
image samples for training the model. Examples are features based on the Convolutional
Neural Network (CNN), Recurrent Neural Network (RNN), Extreme Learning Machine (ELM), and
other deep machine learning models.
Compared to hand-designed and structural features, automatic features learned by a
deep model usually show higher performance because more data-adaptive information can be
exploited in the learned features. Automatic features are therefore effective in
providing a better recognition rate.

Fig. 6 CNN based feature extraction


A Convolutional Neural Network consists of multiple layers such as input, convolution, ReLU,
pooling, fully connected, and softmax layers. There are two ways to extract automatic
features from a CNN, as depicted in Fig. 6. The first is to take the features from the
convolutional layers; ConvNet features are more generic in early layers and more specific
to the original dataset in later layers. The second approach is to cut off the last layer
of the CNN, which performs the labeling of the input data. The outputs of the neurons of
the second-to-last fully connected layer are then used as the feature vector. This vector
is used to measure the distance between two document images, which describes the
similarity of the handwriting.
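The second approach, cutting off the last layer and comparing penultimate activations, can be illustrated with a toy example. The sketch below uses only fully connected layers with hand-picked weights, so it is not a convolutional network and not the architecture of any reviewed system; the layer sizes, weights, and function names are hypothetical and serve only to show where the feature vector is taken from.

```python
import math

def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

def penultimate_features(x, layers):
    """Run every layer except the last (classification) layer."""
    h = x
    for weights, bias in layers[:-1]:
        h = relu(dense(h, weights, bias))
    return h

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Hypothetical toy network: 3 inputs -> 2 hidden units -> 2 writer scores.
layers = [
    ([[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
]
feat_a = penultimate_features([1.0, 2.0, 0.0], layers)
feat_b = penultimate_features([0.0, 2.0, 1.0], layers)
distance = euclidean(feat_a, feat_b)
```

Because the classification layer is never evaluated, the same trained network can describe writers that were not part of its training set, which matters when training and test writers are disjoint.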
CNNs have been employed in the field of text recognition by Wang et al. [184]. However, to
the best of our knowledge, CNNs have rarely been used for writer identification so far. A
reason might be that the training and test sets of most datasets are typically disjoint,
making it impossible to train a CNN for classification. We present here the related work
on writer identification using deep learning approaches.
Deep learning techniques were first introduced for writer recognition by Fiel and Sablatnig
[66]. They employed an eight-layer CNN and extracted features from the penultimate fully
connected layer. These CNN-based activation features served as the feature vector. Xing and
Qiao in [189] introduced a multi-stream CNN-based approach called DeepWriter to learn
features. They learned the automatic features from the last fully connected layer, FC7.
Christlein et al. [49] calculated local descriptors using activation features from a CNN.
Another effort by the same authors, [50], used unsupervised learning. They computed CNN
activation features from image patches of 32 × 32 pixels. In [51], CNN activation features
were extracted from LeNet and ResNet CNN models. They used the CNN activation features as
local features, encoded with VLAD.
Nasuno and Arai [129] employed an AlexNet CNN to extract activation features from 90
trained Japanese words. Yang et al. [193] calculated automated features from a CNN, named
path-signature features. The comparative summary is given in Table 4.

4.3 Classification

After the features are extracted, classification is performed to assign samples to target
classes, as in other pattern recognition problems. Different approaches are used to
identify, compare, and classify writers. The objective is to match the features of the
query document image against the features stored in the knowledge base, for the sake of
authentication and identification of the writer, using a large

Table 4 Review of writer identification systems using model based or automatic features

| Reference | Language | Features |
|---|---|---|
| Christlein et al. [51] | English, Greek, German, Arabic | LeNet, ResNet |
| Xing and Qiao [189] | English | CNN based features |
| Christlein et al. [50] | German, Latin, and French | Deep Residual Network (ResNet) based learned features |
| Nasuno and Arai [129] | Japanese | AlexNet CNN based features |
| Christlein et al. [49] | English, Arabic, German, Greek | CNN activation features |
| Yang et al. [193] | Chinese | Path signature features learned from CNN model deployed in online domain called DeepWriterID |
| Fiel and Sablatnig [66] | English, Greek | CNN based features |

ResNet Residual Network; CNN Convolutional Neural Network


number of instances of each writer's handwriting images in a training set. The approaches
used in the writer identification literature are nearest neighbor, Hidden Markov Models
(HMM), cosine similarity, Gaussian mixture models (GMM), Fourier transformation approaches,
Euclidean distances, Bayesian classifiers, and neural network approaches. For ease of
understanding, we divide the classification literature into three categories:
distance-based classification, classification based on conventional machine learning
models, and classification based on deep learning models. Related work on each approach is
discussed in the following sections.

4.3.1 Distance based classification

One simple and effective way to identify the writer is distance based classification. This
approach requires neither model parameters nor training, so model complexity is minimal.
A distance measure is applied between the query document image and the reference document
images in the knowledge base. The most widely used distance measures in writer
identification and recognition are Euclidean distance, Chi-square distance, Manhattan distance,
and Hamming distance.
The query document is matched against the reference documents using different measures.
One of them is the Weighted Euclidean Distance (WED), defined as:

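$$\mathrm{WED}(k) = \sum_{n=1}^{N} \frac{\left(f_n - f_n^{(k)}\right)^2}{\left(v_n^{(k)}\right)^2} \tag{22}$$

where $f_n$ is the $n$th feature of the input document, $f_n^{(k)}$ and $v_n^{(k)}$ are the sample mean and
sample standard deviation of the $n$th feature for writer $k$, and $N$ is the number of features in the feature vector.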
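The nearest-centroid rule built on Eq. (22) can be sketched in a few lines. This is only an illustrative sketch, not the implementation of Said et al.: the function names, the toy feature vectors and the small `eps` guard against zero variance are assumptions introduced here.

```python
import math

def writer_stats(samples):
    """Per-feature sample mean and standard deviation for one writer."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((s[d] - means[d]) ** 2 for s in samples) / (n - 1))
        for d in range(dims)
    ]
    return means, stds

def wed(query, means, stds, eps=1e-12):
    """Weighted Euclidean distance of Eq. (22); eps (an assumption) avoids division by zero."""
    return sum((f - m) ** 2 / (v ** 2 + eps) for f, m, v in zip(query, means, stds))

def identify(query, references):
    """Return the writer whose WED to the query is smallest."""
    stats = {w: writer_stats(s) for w, s in references.items()}
    return min(stats, key=lambda w: wed(query, *stats[w]))

# Toy reference set: two hypothetical writers, three 2-D feature vectors each.
references = {
    "writer_a": [[1.0, 10.0], [1.2, 11.0], [0.8, 9.0]],
    "writer_b": [[3.0, 10.5], [3.1, 9.5], [2.9, 10.0]],
}
print(identify([1.1, 10.2], references))
```

Dividing by the per-writer variance means that a feature on which a writer is very consistent counts more heavily than one on which the writer varies a lot.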
Said et al. [142] employed nearest centroid classification using the Weighted Euclidean
Distance (WED). They stated that WED classified better than KNN and reported an accuracy
of 96% when classifying 40 writers with 1000 test documents. Another attempt by the same
authors is reported in [141], where they tested 150 documents from 10 writers and obtained a highest
accuracy of 96%. Al-Maadeed et al. [22] worked on Arabic script by constructing an offline
handwritten database collected from 100 writers, using only 16 words for experimentation.
They used 75% of the data for training and 25% for testing, with 32,000 Arabic text images.
Weighted Euclidean distance was used for classification, with 90% accuracy in Top-10.
Another distance measure is the chi-square distance $\chi^2$, defined as:

$$\chi^2 = \sum_{k=1}^{n} \frac{\left(f_{ki} - m_{kj}\right)^2}{f_{ki} + m_{kj}} \tag{23}$$

where $j$ denotes the writer, $i$ the input, $f_{ki}$ is the $k$th feature of the unknown input text $i$ and $m_{kj}$
is the mean value of the $k$th feature for writer $j$.
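As a rough illustration of Eq. (23), the sketch below ranks writers by chi-square distance and returns a hit list. The histogram values, the writer names and the `eps` term (which avoids a zero denominator when both bins are empty) are assumptions made for this example; features are taken to be non-negative, as is usual for histograms.

```python
def chi_square(f, m, eps=1e-12):
    """Chi-square distance of Eq. (23) between query features f and writer means m."""
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(f, m))

def rank_writers(query, writer_means):
    """Writers sorted from most to least similar (smallest distance first)."""
    return sorted(writer_means, key=lambda w: chi_square(query, writer_means[w]))

# Hypothetical normalised feature histograms (each sums to 1).
writer_means = {
    "writer_a": [0.50, 0.30, 0.20],
    "writer_b": [0.10, 0.20, 0.70],
    "writer_c": [0.40, 0.40, 0.20],
}
query = [0.45, 0.35, 0.20]
print(rank_writers(query, writer_means)[:2])  # hit list of size 2
```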
Shahabi and Rahmati [156] conducted experiments on handwriting samples from 40 persons.
They used weighted Euclidean distance and chi-square distance and reported that chi-square
distance performed better: with 80 features at frequencies of 2.8 and 4.4, the identification
rate was 97.5% for a hit list of size 5.
Ahmed et al. [5] identified writers using chi-square distance in experiments
on four datasets, namely KURD, ICFHR, IAM and GRDS. They achieved identification
rates of 94.63% on KURD, 97.12% on ICFHR, 95.59% on IAM and 100% on GRDS.
Xiong et al. [191] evaluated their method on ICFHR2012-Latin and ICDAR2013. They
computed similarity using weighted Chi-squared distance and reported Top-1 accuracy of 94.0% on
ICFHR2012-Latin, 96.2% on ICDAR2013 for Greek and 94.0% on ICDAR2013 for English.
Fiel and Sablatnig [64] used the chi-square distance for writer
retrieval and identification, with nearest neighbor for identifying the writer.
They achieved an accuracy of 93.1% on IAM, 98.9% on TrigraphSlant for writer retrieval, and 98.9%
on TrigraphSlant for writer identification.
Manhattan distance is a distance measure used to compare two handwritten images $I_1$ and
$I_2$ with feature values $u = (u_1, u_2, \ldots, u_N)$ and $v = (v_1, v_2, \ldots, v_N)$. It is calculated as:

$$D_m(u, v) = \sum_{i=1}^{N} \left| u_i - v_i \right| \tag{24}$$

Another well-known distance measure is the Euclidean distance, calculated as:

$$D_e(u, v) = \sqrt{\sum_{i=1}^{N} \left(u_i - v_i\right)^2} \tag{25}$$
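Eqs. (24) and (25) translate directly into code. The short sketch below uses hypothetical feature vectors chosen so that both results can be checked by hand.

```python
import math

def manhattan(u, v):
    """Eq. (24): sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean(u, v):
    """Eq. (25): square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

u = [1.0, 2.0, 3.0]
v = [4.0, 6.0, 3.0]
print(manhattan(u, v))   # 3 + 4 + 0
print(euclidean(u, v))   # sqrt(9 + 16 + 0)
```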

Distance based classification has been applied to multi-language datasets by many
researchers. Wu, Tang, and Bu [188] used Manhattan and Chi-square distances for
comparison to identify the writer. They conducted experiments on six datasets and
reported Top-1 accuracy of 98.5% on IAM, 92.4% on Firemaker, 95.4% on HIT-MW,
99.5% on ICDAR 2011 and 98.0% on ICFHR2012.
Durou et al. [56] deployed KNN using Euclidean, Chi-square
and Manhattan distances on the IAM and ICFHR-2012 datasets. They obtained 96.05%
using Euclidean distance, 73% using Chi-square distance and 80% using Manhattan
distance. Siddiqi and Vincent [164] used Euclidean, chi-square, Hamming, and
Bhattacharyya distances to identify the writer. They reported top-1 accuracy of 86% on the IAM
dataset and 79% on the RIMES dataset.
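The Top-1 and Top-10 figures quoted in this section are hit-list rates: a query counts as correct if its true writer is among the k closest references. A minimal sketch of that computation follows, with hypothetical distances and writer labels.

```python
def top_k_rate(distances, true_writers, k):
    """Fraction of queries whose true writer is among the k closest references.

    distances[q] maps each candidate writer to its distance from query q.
    """
    hits = 0
    for dist, truth in zip(distances, true_writers):
        ranked = sorted(dist, key=dist.get)   # closest writer first
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_writers)

# Hypothetical distances from four query documents to three reference writers.
distances = [
    {"a": 0.1, "b": 0.5, "c": 0.9},   # true writer "a" is ranked 1st
    {"a": 0.4, "b": 0.6, "c": 0.2},   # true writer "a" is ranked 2nd
    {"a": 0.7, "b": 0.3, "c": 0.8},   # true writer "c" is ranked 3rd
    {"a": 0.9, "b": 0.2, "c": 0.3},   # true writer "b" is ranked 1st
]
truth = ["a", "a", "c", "b"]
print(top_k_rate(distances, truth, 1), top_k_rate(distances, truth, 2))
```

Because the rate can only grow with k, Top-10 results are not directly comparable with Top-1 results reported by other studies.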

4.3.2 Conventional machine learning models based classification

Conventional models require sufficient training samples and can produce more effective
results than distance based classification. They include probabilistic and statistical generative
models such as the Bayesian model, Hidden Markov model (HMM) and Gaussian
mixture model (GMM), as well as neural networks (NN), decision tree learning, K nearest neighbor
(KNN), Support Vector Machine (SVM) and random forest. The following sections
describe the related work for each model.

Naive Bayes model Naive Bayes is a probabilistic model based on Bayes' theorem. It
has been used extensively since the 1950s. The Bayesian classifier can be written as a decision rule in
terms of the posterior probability $P(C_i \mid x)$, the class conditional probability $p(x \mid C_i)$, and the prior
probability $P(C_i)$:

$$i_{\mathrm{bayes}} = \arg\max_i P(C_i \mid x) = \arg\max_i \, p(x \mid C_i)\, P(C_i) \tag{26}$$

This searches for the class $i$ that maximizes the probability that a pattern $x$ belongs to class $C_i$. Here
the class conditional probability $p(x \mid C_i)$ can be assumed to follow a Gaussian distribution for class $C_i$. It
can be expressed as:

$$p(x \mid C_i) = \frac{1}{(2\pi)^{d/2} \left|\mathrm{Cov}_i\right|^{1/2}} \exp\left(-\frac{1}{2} (x - \alpha_i)^{T} \, \mathrm{Cov}_i^{-1} (x - \alpha_i)\right) \tag{27}$$

where $\alpha_i$ is the class mean, $\mathrm{Cov}_i$ is the class covariance matrix and $d$ is the dimension of $x$.
To compare an unknown image $I$ with an already known document $D$, a similarity index is calculated as:

$$\mathrm{Similarity}(I, D) = \frac{1}{\mathrm{card}(I)} \sum_{j=1}^{\mathrm{card}(I)} \max_{C_i \in D} P(C_i \mid x_j) \tag{28}$$

The writer of image $I$ is the writer of the reference document $D_i$, from the reference set $R$, that has
the maximum similarity index:

$$\mathrm{Writer}(I) = \mathrm{Writer}\left(\arg\max_{D_i \in R} \mathrm{Similarity}(I, D_i)\right) \tag{29}$$
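The decision rule of Eqs. (26) to (29) can be sketched as follows. This is a simplified illustration rather than the cited authors' implementation: it assumes a diagonal covariance per class, normalises the posterior over the classes of all reference documents, and uses made-up class parameters and query patterns.

```python
import math

def gaussian_pdf(x, mean, var):
    """Class-conditional density p(x | C_i) with a diagonal covariance (an assumption)."""
    log_p = 0.0
    for xi, mu, s2 in zip(x, mean, var):
        log_p += -0.5 * math.log(2 * math.pi * s2) - (xi - mu) ** 2 / (2 * s2)
    return math.exp(log_p)

def similarity(patterns, doc_classes, all_classes):
    """Eq. (28): mean over patterns of the best posterior among the document's classes."""
    total = 0.0
    for x in patterns:
        evidence = sum(c["prior"] * gaussian_pdf(x, c["mean"], c["var"]) for c in all_classes)
        best = max(c["prior"] * gaussian_pdf(x, c["mean"], c["var"]) for c in doc_classes)
        total += best / evidence          # posterior P(C_i | x_j) via Bayes' theorem
    return total / len(patterns)

def identify(patterns, documents):
    """Eq. (29): writer of the reference document with the highest similarity."""
    all_classes = [c for doc in documents.values() for c in doc]
    return max(documents, key=lambda w: similarity(patterns, documents[w], all_classes))

# Hypothetical reference documents, each described by two pattern classes (clusters).
documents = {
    "writer_a": [{"mean": [0.0, 0.0], "var": [1.0, 1.0], "prior": 0.25},
                 {"mean": [2.0, 0.0], "var": [1.0, 1.0], "prior": 0.25}],
    "writer_b": [{"mean": [6.0, 6.0], "var": [1.0, 1.0], "prior": 0.25},
                 {"mean": [8.0, 6.0], "var": [1.0, 1.0], "prior": 0.25}],
}
query_patterns = [[0.2, -0.1], [1.8, 0.3], [0.5, 0.4]]
print(identify(query_patterns, documents))
```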

Siddiqi and Vincent [161] employed the Naive Bayes model on the IAM dataset. They used 50
writing samples for training and grouped similar sub-images into clusters. The Bayesian classifier
gave an identification accuracy of 94%. Another contribution by the same authors is
[163], in which they conducted experiments on the IAM dataset using 100 writing samples.
Naive Bayes was used as the classifier to identify the writer, and they achieved 92% accuracy
in top-1. Kamal and Rahman [99] authenticated the writer of a document using a Bayesian
classifier, with 50 documents used for testing. They obtained
94% accuracy for writer identification. In a multilingual environment, Zois and Anastassopoulos
[203] classified a group of 50 writers using a Bayesian classifier along with weighted Euclidean
distance. On their own private dataset, they achieved an identification rate of 92.48% for the English word and 92.63% for
the Greek word. Garg et al. [69] deployed Naive Bayes and three
other classifiers on a Gurmukhi dataset of 49,000 samples written by 70 persons. An accuracy of 70.10%
was obtained using transition features classified by Naive Bayes.

Hidden Markov model (HMM) A Hidden Markov Model is a statistical model in which the
system being modeled is assumed to be a Markov process with hidden states. It can be
expressed as a dynamic Bayesian network. An HMM assumes that the
observation sequence is generated by a hidden state sequence. The joint probability of the observation
sequence $x_{1:T}$ and the hidden state sequence $z_{1:T}$ is defined as:

$$\begin{aligned} p(z_{1:T}, x_{1:T}) &= p(z_{1:T})\, p(x_{1:T} \mid z_{1:T}) \\ &= p(z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1}) \prod_{t=1}^{T} p(x_t \mid z_t) \end{aligned} \tag{30}$$

where $z_t \in \{1, 2, \ldots, N\}$ and $N$ is the number of states. The HMM is characterized by $\lambda = (\pi, A, B)$.
$\pi = \{\pi_i\}$ is the initial state distribution, $\pi_i = p(z_1 = i)$, $1 \le i \le N$. $A = \{a_{ij}\}$ is the state
transition distribution, $a_{ij} = p(z_t = j \mid z_{t-1} = i)$, $1 \le i, j \le N$. $B$ represents the parameters
of the observation model $p(x_t \mid z_t)$.
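To make Eq. (30) concrete, the sketch below evaluates the joint probability for one state path and uses the forward algorithm to sum it over all paths, which is the score a per-writer HMM would assign to an observation sequence. The two-state model, its probabilities and the discrete observation symbols are hypothetical values chosen for illustration.

```python
def joint_probability(states, observations, pi, A, B):
    """Eq. (30): p(z_1) * prod p(z_t | z_{t-1}) * prod p(x_t | z_t)."""
    p = pi[states[0]] * B[states[0]][observations[0]]
    for t in range(1, len(states)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][observations[t]]
    return p

def likelihood(observations, pi, A, B):
    """Forward algorithm: p(x_{1:T}), the joint summed over all hidden state sequences."""
    n = len(pi)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    for x in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][x] for j in range(n)]
    return sum(alpha)

# Hypothetical two-state model with two discrete observation symbols.
pi = [0.6, 0.4]                  # initial state distribution
A = [[0.7, 0.3], [0.4, 0.6]]     # A[i][j] = p(z_t = j | z_{t-1} = i)
B = [[0.9, 0.1], [0.2, 0.8]]     # B[i][x] = p(x_t = x | z_t = i)

print(joint_probability([0, 0, 1], [0, 0, 1], pi, A, B))
print(likelihood([0, 0, 1], pi, A, B))
```

With one such model trained per writer, a query sequence would be attributed to the writer whose model gives the highest likelihood.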
HMM was first used in writer identification by Schlapbach and Bunke [145, 146].
In experiments with 100 writers from the IAM dataset, they achieved a 96.56%
identification rate. For verification, an EER of 2.5% was obtained using 120 writers with
8600 text lines. The same authors also developed an HMM recognizer for each writer
in [149]. They used 100 writing samples from the IAM database and
built 9-dimensional feature vectors for training with the Baum-Welch algorithm.
Using this approach they obtained 96.5% accuracy. In a further study [147], the same authors
tested the identification rate of an HMM based writer identification system;
an accuracy of 93.13% was achieved by default.
HMM was also deployed in the online writer identification domain by Wu et al. [187] on the IAM
On-line English Handwritten Text Database (IAM-OnDB). They concluded that HMM gave
better results than traditional GMM systems in experiments at the
paragraph and line levels.
The Hidden Markov Tree (HMT) model was employed by He et al. [88] for an offline text
independent approach using 1000 samples of Chinese handwriting. Sheikh and Khotanlou
[158] trained HMMs using the Baum-Welch algorithm. They used 70 Persian handwriting
samples, 50 for training and 20 for testing, with an HMM toolbox. They reported a highest
accuracy of 99% when using a Chain Network to train the model.

K nearest neighbor (KNN) K Nearest Neighbor (KNN) is a conventional, non-parametric machine learning
model used for classification. Suppose we have pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$
taking values in $\mathbb{R}^d \times \{1, 2\}$, where $Y$ is the class label of $X$, so that $X \mid Y = r \sim P_r$
for $r = 1, 2$, with $P_r$ a probability distribution. Given some norm $\|\cdot\|$ on $\mathbb{R}^d$ and a point $x \in \mathbb{R}^d$,
let $(X_{(1)}, Y_{(1)}), \ldots, (X_{(n)}, Y_{(n)})$ be a reordering of the training data such that
$\|X_{(1)} - x\| \le \cdots \le \|X_{(n)} - x\|$. The training phase only stores the feature vectors and class
labels of the training samples. The classification phase takes a user-defined constant $k$ and a test vector, and assigns
the label that is most frequent among the $k$ nearest training samples. For continuous variables, KNN usually
uses the Euclidean distance given in Eq. (25).
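The reordering and voting steps described above can be sketched as follows, assuming Euclidean distance and a simple majority vote. The training vectors and writer labels are hypothetical.

```python
import math
from collections import Counter

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to the query."""
    ranked = sorted(train, key=lambda item: euclidean(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set: (feature vector, writer label) pairs.
train = [
    ([1.0, 1.0], "writer_a"), ([1.2, 0.9], "writer_a"), ([0.8, 1.1], "writer_a"),
    ([4.0, 4.2], "writer_b"), ([4.1, 3.9], "writer_b"), ([3.9, 4.0], "writer_b"),
]
print(knn_predict(train, [1.1, 1.0], k=3))
```

With k = 1 this reduces to the nearest neighbor rule used by several of the studies reviewed below.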
The literature shows that many researchers have identified writers using KNN on English script.
Marti et al. [122] employed KNN on 100 page samples of the IAM dataset written by 20 writers
and achieved an accuracy of 87.8%. Blankers et al. [31] classified a database of 41 writers using
KNN and achieved a 98% identification rate.
Bulacu and Schomaker [37, 38] conducted experiments on an Arabic database of
350 writers with 5 samples per writer. A probability distribution function was generated
from the extracted texture features, while 400 allographs were clustered to
generate a codebook. They applied nearest neighbor classification and
obtained 88% in top-1 and 99% in top-10.
Awaida and Mahmoud [20] performed experiments on an Arabic digit database of 70,000
samples. They applied KNN and a nearest mean classifier and reported 88.14% accuracy in
top-1. Pandey and Seeja [133] identified the writer using KNN. They obtained 88.57%
accuracy with 240 clusters in experiments on the IAM dataset.
KNN has also been deployed on multilingual corpora. Fiel and Sablatnig [66] conducted experiments
on the ICDAR 2013, ICDAR 2011 and CVL databases. They identified the writer using a nearest
neighbor approach, with Top-1 accuracy of 94.7% on ICDAR 2011, 88.5% on
ICDAR 2013 and 98.3% on CVL.
Durou et al. [56] deployed KNN with codebook sizes of 250, 500 and 1000 and
compared the results with those of Khalifa et al. [102]. They conducted experiments on the IAM and
ICFHR-2012 datasets and achieved top-1 accuracy of 87.56% with a codebook size of 250, 87.96% with 500 and 88.01%
with 1000. Chahi
et al. [43] applied KNN with Hamming distance in experiments on the IAM,
IFN/ENIT, AHTID/MW and CVL databases. They reported average accuracy of 88.99%
on IAM, 96.47% on IFN/ENIT, 99.53% on AHTID/MW and 98.38% on CVL while
classifying BW-LBC features.
He et al. [83] measured the similarity between handwritings using Chi-square distance
and used nearest neighbor for classification. They divided the CERUG dataset into English
and Chinese subsets. They reported 93.3% on CERUG-Chinese, 95.2% on CERUG-English,
98.5% on CERUG-Mixed, 86.2% on Firemaker and 89.9% on IAM.

Support vector machine (SVM) The Support Vector Machine has been a widely used classification
model since the 1990s. It has been used successfully in many real-world problems, especially in
handwritten character recognition and text categorization. Researchers have likewise shown
interest in applying SVM to writer identification and verification.
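SVM takes a training dataset of $n$ points of the form $(x_1, y_1), \ldots, (x_n, y_n)$, where $y_i$ represents
the class to which the real vector $x_i$ belongs. The soft-margin SVM is computed by minimizing:

$$\frac{1}{n} \sum_{i=1}^{n} \max\left(0,\ 1 - y_i \left(w \cdot x_i - b\right)\right) + \lambda \lVert w \rVert^2 \tag{31}$$

where $w \cdot x - b = 0$ defines the hyperplane, and $\lambda$ sets the trade-off between increasing the margin size
and ensuring that each $x_i$ lies on the correct side of the margin.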
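The soft-margin objective of Eq. (31) can be minimised in several ways. The sketch below uses plain subgradient descent on a linear, two-class problem with labels in {-1, +1}. The learning rate, epoch count, `lam` value and the toy data are assumptions for illustration; the studies reviewed here typically rely on established SVM solvers, often with kernels.

```python
def dot(a, b):
    """Inner product of two vectors."""
    return sum(p * q for p, q in zip(a, b))

def objective(w, b, X, y, lam):
    """Eq. (31): mean hinge loss plus lam * ||w||^2, with labels y_i in {-1, +1}."""
    hinge = sum(max(0.0, 1.0 - yi * (dot(w, xi) - b)) for xi, yi in zip(X, y))
    return hinge / len(X) + lam * dot(w, w)

def train(X, y, lam=0.01, lr=0.1, epochs=200):
    """Plain subgradient descent on the objective above."""
    w, b = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [2 * lam * wj for wj in w]          # gradient of the regulariser
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) - b) < 1:        # sample violates the margin
                gw = [g - yi * xj / n for g, xj in zip(gw, xi)]
                gb += yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

# Hypothetical two-writer problem with 2-D features.
X = [[2.0, 2.0], [3.0, 3.0], [2.5, 3.0], [-2.0, -2.0], [-3.0, -2.5], [-2.5, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train(X, y)
print([1 if dot(w, xi) - b > 0 else -1 for xi in X])
```

For more than two writers, a one-vs-rest scheme (one such classifier per writer) is a common extension.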
Imdad et al. [96] trained and tested Steered Hermite Features using SVM with 20 authors of the
IAM database and obtained 90% accuracy. Gargouri et al. [70] employed linear SVM and
DTW on an Arabic database of 15,158 words. They reported 45.67% accuracy in top-1, which
improved to 96.90% in top-10 using SVM. Adak et al. [4] conducted experiments on Bengali
script with 100 samples. SVM with an RBF kernel was used for classification of handcrafted
features. They conducted training and testing at various handwriting speeds and computed
F-measures; an F-measure of 43.65% was obtained for Top-1. Chanda et al. [43] worked on an
Oriya script database of 100 writers. They classified using SVM and achieved 94% accuracy.
Amaral et al. [14, 15] conducted experiments on the Brazilian Forensic Letter Database of 20
writers. They applied SVM as the classifier and reported an identification rate of 80%.
SVM has also been deployed on multilingual data corpora. Christlein et al. [49]
employed Exemplar SVM for training. They obtained 88.9% in Top-1 using the ICDAR
2013 and CVL databases. In [51], they conducted experiments on the ICDAR13, CVL and
KHATT datasets using Exemplar SVM and VLAD encoding, and reported Top-1 accuracy
of 99.6% on ICDAR13, 99.5% on CVL and 99.6% on KHATT. Kumar and Kaur
[109] classified features using SVM, with the IAM dataset used for experimentation.
They reported an accuracy of 0.95.
Bertolini et al. [27] selected 475 samples from the QUWI dataset to perform two multi-script
experiments. They trained SVM on Arabic script and tested on English, and achieved 22.1% EER
for LBP and 25.9% EER for LPQ in text independent analysis, while the same experiment for text
dependent analysis gave 1.3% EER for LBP and 2.8% EER for LPQ. Conversely, when they
trained SVM on English and tested on Arabic, they reported EERs of 38.3% for LBP
and 29.1% for LPQ in text independent analysis, and
9.1% for LBP and 5.5% for LPQ in text dependent analysis.
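The equal error rate (EER) quoted above is the operating point at which the false acceptance rate equals the false rejection rate. As a hedged sketch of how it can be approximated from verification scores, the code below sweeps a threshold over hypothetical same-writer and different-writer distances; it does not reproduce the cited evaluation protocol.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER from distances, where a lower distance means 'same writer'."""
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(d > t for d in genuine) / len(genuine)      # same writer rejected
        far = sum(d <= t for d in impostor) / len(impostor)   # different writer accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

genuine = [0.10, 0.15, 0.20, 0.25, 0.60]    # hypothetical same-writer distances
impostor = [0.30, 0.55, 0.70, 0.80, 0.90]   # hypothetical different-writer distances
print(equal_error_rate(genuine, impostor))
```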
SVM was also used in the online domain by Gargouri, Kanoun and Ogier [70], along with
DTW, for classification on the ADAB database. Venugopal and Sundaram [182] conducted
experiments in the online domain on the IAM online database and the IBM UB 1 dataset. They employed
a multiclass SVM with a Radial Basis Function kernel. They obtained the highest accuracy of
97.81% at the paragraph level on the IAM dataset with a codebook size of 45. On IBM UB 1, they achieved
94.37% at the paragraph level with a codebook size of 60.

Other models Besides the aforementioned classification models, there are some other models
and techniques for identifying the writer. One of them is the vector space model employed by Bensefia
et al. [126]. They conducted experiments on the PSI and IAM databases, using the vector space
model for local features. The hypothesis of an information retrieval model was used for identification.
They reported a 95% identification rate on the PSI database and 86% on the IAM database. Christlein
et al. [47] used GMM supervectors on the ICDAR-2013 and CVL databases. Fisher and VLAD
encoding schemes were used for comparison to identify the writer. They achieved Top-1 accuracy from 95.1 to
97.1% on ICDAR-2013 and from 97.9 to 99.2% on CVL.
Khan et al. [103] generated a predictor model using Spectral Regression-Kernel Discriminant
Analysis (SR-KDA) in experiments on the IAM, CVL, AHTID/MW and
IFN/ENIT databases. They compared their system with the existing state of the art and achieved
97.2% on IAM, 99.6% on CVL, 71.6% on AHTID/MW and 76.0% on IFN/ENIT.
Garg et al. [69] classified a Gurmukhi dataset of 49,000 samples written by 70 persons using four
classifiers: Decision Tree, Random Forest, AdaBoostM1 and Naive Bayes. They
achieved the highest accuracy of 81.75% using AdaBoostM1 on centroid features.

4.3.3 Deep learning models based classification

Deep learning models and neural network approaches are widely used owing to their rapid
rise in image processing, artificial intelligence and pattern recognition. Convolutional
Neural Networks (CNNs) are feed-forward artificial neural networks that are widely used for object
recognition [140], object detection [139], image tagging [73], ranking scoring [194], speech
recognition [143], face recognition [170], handwriting recognition [52] and digit recognition,
for example on the MNIST dataset.
We live in an era of wearable devices and environmental sensors. Activities
are more complex than actions, as semantically they are more representative of a person's real life.
Several machine learning techniques have been proposed for recognizing complex
activities from sensor data [111–113]. These techniques play a vital role in many
domains where data is collected from sensor based devices such as smartphone accelerometers
[115]. CNNs have produced better recognition rates than conventional machine learning
models. CNNs have also been trained to optimize a popular performance measure
known as positive data points ranked at the top positions (Pos@Top) [72].
Some neural network approaches have also been employed in the field of writer identification
and verification. Marti et al. [122] employed a feed-forward neural network on 100
page samples of the IAM dataset written by 20 writers. They used 20 hidden neurons and achieved
a recognition rate of 90.7%. Rafiee and Motavalli [138] classified a database of 20 writers, with 5 to 7
lines of Farsi script, using a feed-forward neural network and obtained 86.5% accuracy. Adak
et al. [4] classified auto-derived parameters using an RNN in experiments on
Bengali script with 100 samples and reported an F-measure of 43.65%.
In a multi-script environment, Zois and Anastassopoulos [203] identified a group of 50 writers
using a neural network with three layers of 20 neurons. On their own private dataset, they attained an identification rate of
97.7% for the English word and 98.6% for the Greek word.
Zhang et al. [198] deployed an RNN with bidirectional LSTM for the encoding of random hybrid
Table 5 Review of writer identification systems using different features and models

Year | Reference | Features | Models | Dataset | Accuracy (%)
2018 | Garg et al. [69] | Statistical | Naive Bayes | Gurmukhi dataset (49,000 samples) | 70.10
2018 | Pandey and Seeja [133] | Structural | KNN | IAM | 88.57
2018 | Chahi et al. [43] | Structural | KNN with Hamming distance | IAM; IFN/ENIT; AHTID/MW; CVL | 88.99; 96.47; 99.53; 98.38
2018 | Christlein et al. [51] | Model based | Exemplar SVM | ICDAR13; CVL; KHATT | 99.6; 99.5; 99.6
2017 | Khan et al. [104] | Statistical | SR-KDA | IAM; AHTID/MW; CVL; IFN/ENIT | 97.2; 71.6; 99.6; 76.0
2017 | Adak et al. [5] | Statistical | SVM | Author generated (100 Bengali samples) | 43.65
2017 | Kumar and Kaur [109] | Statistical | SVM | IAM | 92
2017 | Kumar and Kaur [109] | Statistical | Neural Network | IAM | 95
2017 | Venugopal and Sundaram [182] | Statistical | SVM | IAM online; IBM UB 1 | 97.81; 94.37
2017 | Zhang et al. [201] | Statistical | RNN-BLSTM | BIT-English; BIT-Chinese | 100; 99.46
2017 | Durou et al. [56] | Structural | KNN | IAM; ICFHR-2012 | 92; 97.0
2017 | He et al. [84] | Structural | KNN | CERUG-CN; CERUG-EN; CERUG-MIX; Firemaker; IAM | 93.3; 95.2; 98.5; 86.2; 89.9
2017 | Ahmed et al. [5] | Structural | Chi-square | KURD; IAM; ICFHR; GRDS | 94.63; 95.59; 97.12; 100
2017 | Xing and Qiao [189] | Model based | CNN | IAM; HWDB1.1 | 99.01; 93.85
2017 | Christlein et al. [48] | Statistical | SVM | ICDAR; CVL; KHATT | 89.4; 91.0; 97.2
2017 | Christlein et al. [50] | Model based | Exemplar SVM | ICDAR2017 (Historical-WI) | 84.1
2017 | Nasuno and Arai [129] | Model based | AlexNet CNN | Author generated (100 words) | 90
2017 | Wu et al. [187] | Statistical | HMM | IAM-OnDB | 94.5
2017 | Sheikh and Khotanlou [158] | Statistical | HMM | Author generated (70 Persian samples) | 58.7
2016 | Bertolini et al. [27] | Structural | SVM | QUWI |
2016 | Hannad and Siddiqi [80] | Structural | Hamming distance | IFN/ENIT; IAM | 94.89; 89.54
2016 | Garz et al. [71] | Structural | Naive Bayes | IAM | 86.9
2016 | Yang et al. [192] | Model based | CNN | NLPR-Chinese; NLPR-English | 95.72; 98.51
2016 | Zhu and Wang [201] | Structural | Manhattan | IAM; HIT-MW | 96.48; 95.44
2015 | Xiong et al. [194] | Statistical | Weighted Chi-squared distance | ICFHR2012-Latin; ICDAR2013-Greek; ICDAR2013-English | 94.0; 96.2; 94.0
2015 | Fiel and Sablatnig [66] | Model based | KNN | ICDAR 2013; ICDAR 2011; CVL | 88.5; 94.7; 98.3
2015 | Abdi and Khemakhem [2] | Structural | Chi-square | IFN/ENIT | 90.02
2015 | He et al. [85] | Structural | Nearest Neighbor | Firemaker; IAM | 91.1; 89.8
2015 | Christlein et al. [49] | Model based | CNN | ICDAR 2013; CVL | 98.9; 99.4
2015 | Yang et al. [193] | Model based | CNN | CASIA-OLHWDB1.0 | 99.52
2015 | Khalifa et al. [102] | Structural | SR-KDA | IAM | 98
2014 | Kumar et al. [108] | Statistical | SVM | Gurmukhi | 91.80
2014 | Wu et al. [188] | Statistical | Manhattan, Chi-square | IAM; Firemaker; HIT-MW; ICDAR 2011; ICFHR 2012 | 98.5; 92.4; 95.4; 99.5; 98.0
2014 | Hu et al. [94] | Statistical | KNN | CASIA Offline DB 2.1 | 96.25
2014 | Newell and Griffin [132] | Statistical | Nearest Neighbour with the Euclidean distance | IAM; ICDAR 2012 | 99; 93.1
2014 | Kumar et al. [109] | Structural | KNN | IAM | 86.75
2014 | Fecker et al. [63] | Statistical | Chi-square | Islamic Heritage Project (IHP) | 92.5

KNN K Nearest Neighbor; HD Hamming Distance; NN Neural Network; SVM Support Vector Machine; RNN
Recurrent Neural Network; NB Naïve Bayes; CSD Chi-Square Distance; MD Manhattan Distance; ED Euclidean
Distance; BLSTM Bidirectional Long Short-Term Memory; HMM Hidden Markov Model; CNN Convolutional
Neural Network; WCSD Weighted Chi-Square Distance; SR-KDA Spectral Regression-Kernel Discriminant
Analysis; IAM Institut für Informatik und angewandte Mathematik; IFN/ENIT Institute of Communications
Technology/Ecole Nationale d'Ingénieurs de Tunis; AHTID/MW Arabic Handwritten Text Images Database
written by Multiple Writers; KHATT KFUPM Handwritten Arabic Text; BIT Biometrics Ideal Test; ICFHR
International Conference on Frontiers in Handwriting Recognition; IHP Islamic Heritage Project

stroke features and for classification. They conducted experiments on English and Chinese datasets
and obtained 100% accuracy on the English dataset and 99.46% on the Chinese dataset, in
comparison with existing online systems.
Yang et al. [193] deployed a CNN for the online domain and called their system
DeepWriterID. They evaluated it on two datasets of the National Laboratory of Pattern
Recognition (NLPR) and attained an identification rate of 95.72% for Chinese and 98.51% for
English. The same authors [192] conducted experiments on the CASIA-OLHWDB1.0 dataset with
a deep convolutional network and achieved 99.52% accuracy. Xing and Qiao
[189, 190] conducted experiments on the IAM and HWDB datasets. They classified using the
CNN model DeepWriter and achieved 99.01% accuracy on 301 writers of IAM with 4
English alphabets as input.
Christlein et al. [49] performed experiments on ICDAR 2013 and CVL with 4 million image
patches of size 32 × 32. They reported 0.21 absolute mAP. Nasuno and Arai [129] experimented
on a Japanese dataset with 100 kinds of words from each of 100 writers. An AlexNet CNN was
trained on 90 words and tested on 10 words, giving approximately 90%
accuracy. Yang et al. [193] evaluated performance on the CASIA-OLHWDB1.0 Chinese dataset
and reported an accuracy of 99.52% on text lines. The performance of the different
systems is summarized in Table 5.

5 Open research issues

Despite extensive research, writer identification remains an open problem because of the variety of
challenges it presents. One main challenge is intra-class variation, that is, variation in the
handwriting of the same individual. Writing style depends on various
factors such as mood, mental caliber, age and situation. Likewise, in daily life a writer may
produce different handwriting samples because of the use of diverse writing instruments. As a result,
the stroke width and character size of the same writer change over time, which
makes authentication of the writer more challenging. When handwriting images are acquired
with a variety of equipment, training and testing samples may also differ in resolution,
so the writing of the same person appears in different styles and the
issue of intra-class variation arises again.
We have noticed that writer identification for Latin scripts has
received considerable attention from researchers and has reached a mature level. However, the Arabic
script family (i.e. Arabic, Urdu, Persian, etc.) and Chinese have not received much attention, and as a result
writer identification performance for them is far from satisfactory. This is due to the complexity of
these scripts. For example, most researchers face difficulties with Arabic script, and the reported accuracy is
low. Arabic is natively cursive, with characters joined within words.
Diacritics (usually above or below a character), overlapping, and different representations
of words may also occur in Arabic script, so segmentation of Arabic script is
difficult. Feature selection and representation, and the best practice for text dependent
and text independent identification of Arabic script, are still open and active issues. However,
symbolic languages like Bengali, Oriya and Persian have produced higher accuracy, as they
do not have the distinctive characteristics of Arabic. Similarly, passing symbols and letters to machine
learning models is easier than passing a whole line.
Text independent methods produce lower identification rates than text dependent methods. The reason is that a
text-dependent method operates at the character or word level, which yields better
accuracy. Conversely, text-independent methods work at the line or paragraph level, so
segmentation is required and lower identification rates are achieved. From this discussion of
text-dependent and text-independent methods, one can conclude that higher
identification rates are generally achievable with the former, but at the cost of requiring
the same fixed text or human intervention to extract the elements (characters or words) to be
compared. Text-independent methods are much more useful and widely applicable. These methods,
however, require a certain minimum amount of text to produce acceptable results.
The challenges for writer identification and writer retrieval include the use of different pens,
which changes a person's writing style, the physical condition of the writer, distractions like
multitasking and noise, and the fact that writing style changes with age. The change of
style with increasing age is not covered by any available dataset and cannot be examined, but
it makes identification or retrieval harder for real life data.
One of the major challenges is that CNN based writer identification systems are rare in the
literature. A reason might be that the training and test sets of most datasets are disjoint, making
it impossible to train a CNN for classification. Deep learning models require a sufficient
number of instances for good performance. There are many datasets for writer identification
with a large number of classes, but the amount of data within each class is small.
This problem strongly affects the performance of deep learning models.
Multi-script writer identification is one of the active research problems. There is a lot of interest in
validating the hypothesis that a writer's individuality is preserved across different scripts. This approach has given good results
with structural and statistical features, but deep learning approaches have not yet been
deployed in this domain.

6 Conclusion and future direction

In this paper, we provided a comprehensive overview of the state of the art in writer identification
techniques, with an emphasis on pre-processing, feature extraction and classification
(conventional machine learning and deep learning based models). Effective writer
identification systems are applicable in forensic and historical analysis, banks,
check processing, signature analysis, graphology, legal documents, ancient manuscripts, digital
rights administration, and document analysis methods. The extensive review of the literature
has led to the identification of several open problems, as mentioned in the previous section. Each language
differs from the others and has unique characteristics that pose different challenges,
so a standard approach does not fit all; for example, features used for English do not
fit Arabic script because of its cursive nature. We noticed that Latin scripts have been
widely considered for writer identification, whereas the Arabic script family (i.e. Arabic, Urdu,
Persian, etc.) and Chinese have not received as much attention and are still far from satisfactory
performance. This is due to the complexity and challenges of these scripts. Furthermore, we have
noticed a lack of benchmark evaluations and competitions: different researchers
use different datasets for evaluation. Thus, work is required to develop benchmark
datasets and evaluation mechanisms through competitions, in order to compare and evaluate work in
the field of writer identification. We have noticed that there is still ample room for
new researchers. To highlight these challenges and open research issues,
our study enumerates a few research directions for researchers in the
area of writer identification using deep learning techniques. We discussed the notable datasets
for handwritten writer identification. However, the number of text instances within each class of these datasets is small,
which results in poor performance with deep learning models. Therefore, very
limited work is available in the literature. There is a crucial need to develop an unconstrained
dataset that contains a large number of samples within each class. The next direction is that data

Fig. 7 Future directions for researchers

augmentation techniques can help increase the number of instances within each class using
existing datasets, which would make deep learning approaches practical in this domain. Furthermore,
pre-processing steps such as normalization, high frequency and contours can also
improve the performance of deep learning models.
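As one possible illustration of how augmentation multiplies the samples available per writer, the sketch below cuts random patches from a page image and adds a shifted copy. The text above does not name specific augmentation techniques, so random patch extraction and horizontal shifting are assumptions chosen for this example, as are the function names and the synthetic image.

```python
import random

def random_patches(image, patch_h, patch_w, count, rng):
    """Cut `count` random patches from a 2-D image given as a list of rows."""
    h, w = len(image), len(image[0])
    patches = []
    for _ in range(count):
        top = rng.randrange(h - patch_h + 1)
        left = rng.randrange(w - patch_w + 1)
        patches.append([row[left:left + patch_w] for row in image[top:top + patch_h]])
    return patches

def shift_right(image, pixels, fill=0):
    """Translate the image horizontally, padding the exposed columns with `fill`."""
    return [[fill] * pixels + row[:len(row) - pixels] for row in image]

rng = random.Random(0)
# Synthetic 6 x 8 binary image standing in for a scanned handwriting sample.
page = [[(r * 8 + c) % 2 for c in range(8)] for r in range(6)]
augmented = random_patches(page, 4, 4, count=5, rng=rng) + [shift_right(page, 2)]
print(len(augmented))  # every augmented sample keeps the label of the original writer
```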
Next, there is wide scope for research on online writer identification. Offline writer
recognition is considered the harder task because handwriting images lack sequential
information and exhibit large intra-class variation. In contrast, online handwriting
contains both sequential and spatial information. Writing samples are stored as trajectories,
represented as time series of two-dimensional coordinates, from which dynamic features are
calculated and used for identification. Parameters such as writing speed, writing direction,
pen-tip position, velocity, angle, and pressure can be extracted. These features serve as a
spatio-temporal parameter-space representation of the handwriting. Online writer identification
is an increasingly attractive research area given advances in information technology and the
widespread use of smartphones, tablets, and similar devices.
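As a rough illustration of how dynamic features can be derived from such a trajectory, the sketch below computes per-segment speed and writing direction from pen samples. The `(x, y, t)` sample format, the function names, and the particular summary statistics are assumptions for illustration, not a method from the reviewed literature.

```python
import math

def dynamic_features(trajectory):
    """Per-segment (speed, direction) from (x, y, t) pen samples.

    Assumes timestamps t are strictly increasing.
    """
    features = []
    for (x0, y0, t0), (x1, y1, t1) in zip(trajectory, trajectory[1:]):
        dx, dy, dt = x1 - x0, y1 - y0, t1 - t0
        speed = math.hypot(dx, dy) / dt   # distance travelled per unit time
        direction = math.atan2(dy, dx)    # writing direction in radians
        features.append((speed, direction))
    return features

def summarize(features):
    """Collapse per-segment features into a small fixed-length descriptor."""
    speeds = [s for s, _ in features]
    # Circular mean, so that angles near +pi and -pi do not cancel out.
    mean_direction = math.atan2(
        sum(math.sin(d) for _, d in features),
        sum(math.cos(d) for _, d in features),
    )
    return {
        "mean_speed": sum(speeds) / len(speeds),
        "max_speed": max(speeds),
        "mean_direction": mean_direction,
    }

# A short stroke sampled at three time steps.
stroke = [(0.0, 0.0, 0.0), (3.0, 4.0, 1.0), (6.0, 8.0, 2.0)]
descriptor = summarize(dynamic_features(stroke))
```

A real system would add further channels where the device provides them (for example pressure or pen tilt) and would feed either the per-segment sequence or the summary descriptor to a classifier.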
Finally, we suggest exploring and investigating different deep neural network architectures
for writer identification, especially for Arabic script. Deep convolutional
neural networks have proved to be among the best recognizers in optical character recognition,
handwriting recognition, and speech recognition. Few researchers have employed deep
learning models for writer identification from text images. There is a need to employ modern
deep learning techniques such as transfer learning and fine-tuning. Transfer
learning (also called inductive transfer) is a technique in which knowledge learned on one problem
is stored and applied to a different but related problem (same domain). One can take a
pre-trained network and use it as a starting point to learn a new task. Fine-tuning a network with
transfer learning is usually much faster and easier than training a network from scratch with
randomly initialized weights. Another option is to freeze the weights of the CNN layers, use them
to extract features, and then train a linear classifier such as an SVM for classification (this
performs well for problems belonging to a different domain).
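The "frozen feature extractor plus linear classifier" option can be sketched without any deep learning library. In the toy example below, `frozen_features` is only a hand-written stand-in for the frozen layers of a pre-trained CNN, and a perceptron is swapped in for the SVM to keep the code dependency-free; all names and data are hypothetical. The point it shows is structural: the feature function is never updated, and only the linear layer on top is trained.

```python
def frozen_features(image):
    """Stand-in for frozen, pre-trained CNN layers: fixed, never updated."""
    pixels = [p for row in image for p in row]
    ink_density = sum(pixels) / len(pixels)
    # Number of ink/background changes between horizontal neighbours.
    transitions = sum(a != b for row in image for a, b in zip(row, row[1:]))
    return [ink_density, transitions / len(pixels)]

def score(weights, bias, x):
    return sum(w * v for w, v in zip(weights, x)) + bias

def train_linear(features, labels, epochs=100, lr=1.0):
    """Train only the linear layer (perceptron rule); labels are +1 or -1."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(features, labels):
            if y * score(weights, bias, x) <= 0:  # misclassified sample
                weights = [w + lr * y * v for w, v in zip(weights, x)]
                bias += lr * y
                errors += 1
        if errors == 0:  # every training sample is classified correctly
            break
    return weights, bias

def predict(weights, bias, image):
    return 1 if score(weights, bias, frozen_features(image)) > 0 else -1

# Toy binary images: writer A (+1) writes densely, writer B (-1) sparsely.
train_images = [
    [[1, 1, 1, 1], [1, 1, 1, 0]],
    [[1, 1, 1, 1], [1, 1, 1, 1]],
    [[0, 0, 0, 0], [0, 1, 0, 0]],
    [[0, 0, 0, 0], [0, 0, 0, 0]],
]
train_labels = [1, 1, -1, -1]
weights, bias = train_linear([frozen_features(im) for im in train_images], train_labels)
```

In practice the stand-in would be replaced by the activations of a network pre-trained on a large image or handwriting corpus, and the perceptron by an SVM or a re-trained final layer; fine-tuning differs in that some of the pre-trained weights are also updated.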
Lastly, we summarize these recommendations graphically in Fig. 7 to give new researchers
in the field of writer identification a clearer overview.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

References

1. Abdelhaleem A, Droby A, Asi A, Kassis M, Al Asam R, El-sanaa J (2017) Wahd: a database for writer
identification of arabic historical documents. In: Arabic Script Analysis and Recognition (ASAR), 1st
International Workshop on, pp. 64–68. IEEE
2. Abdi MN, Khemakhem M (2015) A model-based approach to offline text-independent Arabic writer
identification and verification. Pattern Recognition, pp. 1890–1903
3. Abdi MN, Khemakhem M, Ben-Abdallah H (2009) A novel approach for offline Arabic writer identifi-
cation based on stroke feature combination. In: Computer and Information Sciences, 2009. ISCIS 2009.
24th International Symposium on, pp. 597–600. IEEE
4. Adak C, Chaudhuri BB, Blumenstein M (2017) Writer identification and verification from intra-variable
individual handwriting. arXiv preprint arXiv:1708.03361
5. Ahmed AA, Hasan HR, Hameed FA, Al-Sanjary OI (2017) Writer identification on multi-script hand-
written using optimum features. Kurdistan Journal of Applied Research 2(3):178–185
6. Ahmed AA, Sulong G (2014) Arabic writer identification: A review of literature. Journal of Theoretical &
Applied Information Technology
7. Al Maadeed S, Ayouby W, Hassaïne A, Aljaam JM (2012) QUWI: An Arabic and English handwriting
dataset for offline writer identification. In: Frontiers in Handwriting Recognition (ICFHR), International
Conference on, pp. 746–751. IEEE
8. Alamri H, Sadri J, Suen CY, Nobile N (2008) A novel comprehensive database for Arabic offline
handwriting recognition. In: Proceedings of 11th International Conference on Frontiers in Handwriting
Recognition, ICFHR, pp. 664–669
9. Al-Dmour A, Abu Zitar R (2007) Arabic writer identification based on hybrid spectral– statistical
measures. Journal of Experimental & Theoretical Artificial Intelligence 19(4):307–332
10. Al-Ma’adeed S, Elliman D, Higgins CA (2002) A data base for Arabic handwritten text recognition
research. In: Frontiers in Handwriting Recognition, Proceedings. Eighth International Workshop on, p.
485–489. IEEE
11. Al-Ma’adeed S, Mohammed E, Al Kassis D, Al-Muslih F (2008) Writer identification using edge-based
directional probability distribution features for Arabic words. In: Computer Systems and Applications,
AICCSA-08. IEEE/ACS International Conference on, p. 582–590. IEEE
12. Al-Maadeed S, Hassaine A, Bouridane A, Tahir MA (2016) Novel geometric features for offline writer
identification. Pattern Anal Applic 19(3):699–708
13. Amaral AMM, Freitas CO, Bortolozzi F (2012) The graphometry applied to writer identification. In:
Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern
Recognition (IPCV), p. 1. The Steering Committee of the World Congress in Computer Science,
Computer Engineering and Applied Computing (WorldComp)
14. Amaral AMM, Freitas CO, Bortolozzi F (2013) Multiple graphometric features for writer
identification as part of forensic handwriting analysis. In: Proceedings of the International
Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV), p. 1.
The Steering Committee of the World Congress in Computer Science, Computer Engineering
and Applied Computing (WorldComp)
15. Amend K (1980) Handwriting analysis: The complete basic book. New Page Books
16. Arazi B (1977) Handwriting identification by means of run-length measurements. IEEE Transactions on
Systems, Man, and Cybernetics 7(12):878–881
17. Arazi B (1983) Automatic handwriting identification based on the external properties of the samples. IEEE
Transactions on Systems, Man, and Cybernetics 13(4):635–642
18. Asi A, Abdalhaleem A, Fecker D, Märgner V, El-Sana J (2017) On writer identification for Arabic
historical manuscripts. International Journal on Document Analysis and Recognition (IJDAR) p. 173–187
19. Awaida S, Mahmoud S (2011) Writer identification of Arabic handwritten digits. Universitätsbibliothek
Dortmund
20. Awaida, S.M., Mahmoud, S.A. (2012) State of the art in offline writer identification of handwritten text
and survey of writer identification of Arabic text. Educational Research and Reviews 4–45
21. Awaida SM, Mahmoud SA (2013) Writer identification of Arabic text using statistical and structural
features. Cybernetics and Systems 57–76
22. Baghshah MS, Shouraki SB, Kasaei S (2006) A novel fuzzy classifier using fuzzy LVQ to recognize
online Persian handwriting. In: Information and Communication Technologies, 2006. ICTTA’06. 2nd, vol.
1, pp. 1878–1883. IEEE
23. Bay H, Ess A, Tuytelaars T, Van Gool L (2008) Speeded-up robust features (surf). Computer Vision and
Image Understanding 346–359
24. Bensefia A, Paquet T, Heutte L (2003) Information retrieval based writer identification. PRIS, p. 56–63
25. Bensefia A, Paquet T, Heutte L (2005) Handwritten document analysis for automatic writer recognition.
ELCVIA: Electronic Letters on Computer Vision and Image Analysis 72–86
26. Bensefia A, Paquet T, Heutte L (2005) A writer identification and verification system. Pattern Recognition
Letters 2080–2092
27. Bertolini D, Oliveira LS, Sabourin R (2016) Multi-script writer identification using dissimilarity. In:
Pattern Recognition (ICPR), 23rd International Conference on, p. 3025–3030. IEEE
28. Bertolini D, Oliveira LS, Sabourin R (2016) Multi-script writer identification using dissimilarity. In:
International Conference on Pattern Recognition (ICPR), p. 3020–3025
29. bin Abdl KM, Hashim SZM (2009) Handwriting identification: a direction review. In: Signal and Image
Processing Applications (ICSIPA), 2009 IEEE International Conference on, pp. 459–463. IEEE
30. Bisquerra AF (2009) Writer identification by combination of Graphical Features in the Framework of old
Handwritten Music Scores. Ph.D. thesis, Autonomous University of Barcelona
31. Blankers V, Niels R, Vuurpijl L (2007) Writer identification by means of explainable features: shapes of
loop and lead-in strokes. Proc. of BNAIC 17–24
32. Bradford RR, Bradford R (1992) Introduction to handwriting examination and identification. Nelson-Hall
Publishers, Chicago
33. Brink A, Bulacu M, Schomaker L (2008) How much handwritten text is needed for text- independent writer
verification and identification. In: Pattern Recognition, 2008. ICPR 2008. 19th International Conference on, pp. 1–
4. IEEE
34. Bulacu ML (2007) Statistical pattern recognition for automatic writer identification and verification. Ph.D.
thesis, University of Groningen
35. Bulacu M, Schomaker L (2003) Writer style from oriented edge fragments. In: International Conference on
Computer Analysis of Images and Patterns, pp. 460–469. Springer
36. Bulacu M, Schomaker L (2005) A comparison of clustering methods for writer identification and
verification. In: Document Analysis and Recognition, Proceedings. Eighth International Conference on,
p. 1275–1279. IEEE
37. Bulacu M, Schomaker L (2006) Combining multiple features for text-independent writer identification
and verification. In: Tenth International Workshop on Frontiers in Handwriting Recognition. Suvisoft
38. Bulacu M, Schomaker L (2007) Automatic handwriting identification on medieval documents. In: Image
Analysis and Processing, ICIAP-07. 14th International Conference on, pp. 279–284. IEEE
39. Bulacu M, Schomaker L (2007) Text-independent writer identification and verification using textural and
allographic features. IEEE Trans Pattern Anal Mach Intell 29(4):701–717
40. Bulacu M, Schomaker L, Brink A (2007) Text-independent writer identification and verification on offline
Arabic handwriting. In: Document Analysis and Recognition (ICDAR- 07), Ninth International
Conference on, p. 769–773. IEEE
41. Bulacu M, Schomaker L, Vuurpijl L (2003) Writer identification using edge-based directional features.
Writer 1:1
42. Cha SH, Srihari SN (2000) Assessing the authorship confidence of handwritten items. In: Proceedings, p.
42. IEEE
43. Chahi A, Ruichek Y, Touahni R et al (2018) Block wise local binary count for offline text-independent
writer identification. Expert Syst Appl 93:1–14
44. Chanda S, Franke K, Pal U (2012) Text independent writer identification for Oriya script. In: Document
Analysis Systems (DAS), 10th IAPR International Workshop on, pp. 369–373. IEEE
45. Chanda S, Franke K, Pal U, Wakabayashi T (2010) Text independent writer identification for Bengali
script. In: Pattern Recognition (ICPR), 20th International Conference on, p. 2005–2008. IEEE
46. Chapran J (2006) Biometric writer identification: feature analysis and classification. Int J Pattern Recognit
Artif Intell 20(04):483–503
47. Chaudhry R, Pant SK (2004) Identification of authorship using lateral palm print: a new concept. Forensic
Sci Int 141(1):49–57
48. Christlein V, Bernecker D, Honig F, Angelopoulou E (2014) Writer identification and verification using
GMM supervectors. In: Applications of Computer Vision (WACV), IEEE Winter Conference on, p. 998–
1005. IEEE
49. Christlein V, Bernecker D, Hönig F, Maier A, Angelopoulou E (2017) Writer identification using GMM
supervectors and exemplar-SVMs. Pattern Recogn 63:258–267
50. Christlein V, Bernecker D, Maier A, Angelopoulou E (2015) Offline writer identification using
convolutional neural network activation features. In: German Conference on Pattern Recognition, p.
540–552. Springer
51. Christlein V, Gropp M, Fiel S, Maier A (2017) Unsupervised Feature Learning for Writer Identification
and Writer Retrieval. arXiv preprint arXiv:1705.09369
52. Christlein V, Maier A (2018) Encoding CNN activations for writer recognition. In: 13th IAPR International
Workshop on Document Analysis Systems (DAS), pp. 169–174. IEEE
53. Ciresan D, Meier U, Schmidhuber J (2012) Multi-column deep neural networks for image classification.
arXiv preprint arXiv:1202.2745
54. Djeddi C, Al-Maadeed S, Gattal A, Siddiqi I, Ennaji A, El Abed H (2016) ICFHR 2016 competition on
multi-script writer demographics classification using “QUWI” database. In: Frontiers in Handwriting
Recognition (ICFHR), 15th International Conference on, p. 602–606. IEEE
55. Djeddi C, Al-Maadeed S, Gattal A, Siddiqi I, Souici-Meslati L, El Abed H (2015) Icdar2015 competition
on multi-script writer identification and gender classification using QUWI database. In: Document
Analysis and Recognition (ICDAR), 2015 13th International Conference on, p. 1191–1195. IEEE
56. Durou A, Aref I, Al-Maadeed S, Bouridane A, Benkhelifa E (2017) Writer identification approach based
on bag of words with obi features. Information Processing & Management
57. Duvoisin RC, Sage J (2001) Parkinson’s disease: A guide for patient and family. Lippincott Williams &
Wilkins, Philadelphia
58. Eaton HD (1938) Handwriting a neurological study. California and Western Medicine 61
59. El Abed H, Margner V (2007) The IFN/ENIT-database-a tool to develop Arabic handwriting recognition
systems. In: Signal Processing and its Applications (ISSPA 2007), 9th International Symposium on, p. 14. IEEE
60. El Abed H, Märgner V (2011) ICDAR 2009-Arabic handwriting recognition competition. International
Journal on Document Analysis and Recognition (IJDAR) 313
61. El-Sherif EA, Abdelazeem S (2007) A two-stage system for Arabic handwritten digit recognition tested on
a new large database. Artificial Intelligence and Pattern Recognition 237–242
62. Fairhurst M, Chapran J (2006) Biometric writer identification based on the interdependency between static
and dynamic features of handwriting. In: Proceedings of the 10th International Workshop on Frontiers in
Handwriting Recognition, pp. 505–510
63. Fecker D, Asit A, Märgner V, El-Sana J, Fingscheidt T (2014) Writer identification for historical arabic
documents. In: Pattern Recognition (ICPR), 2014 22nd International Conference on, pp. 3050–3055. IEEE
64. Fiel S, Sablatnig R (2010) Writer retrieval and writer identification using local features. In: Document
Analysis Systems (DAS), 10th IAPR International Workshop on, pp. 145–149. IEEE
65. Fiel S, Sablatnig R (2013) Writer identification and writer retrieval using the fisher vector on visual vocabularies.
In: Document Analysis and Recognition (ICDAR), 12th International Conference on, p. 545–549. IEEE
66. Fiel, S., Sablatnig, R. (2015) Writer identification and retrieval using a convolutional neural network. In:
International Conference on Computer Analysis of Images and Patterns, p. 26–37. Springer
67. Fornes A, Dutta A, Gordo A, Lladós J (2012) CVC-MUSCIMA: A database of handwritten music score
images for writer identification and staff removal. International Journal on Document Analysis and
Recognition manuscript p. 243–251
68. Fornes A, Llados J, Sanchez G, Bunke H (2012) Writer identification in old handwritten music scores. In:
Pattern Recognition and Signal Processing in Archaeometry: Mathematical and Computational Solutions
for Archaeology, pp. 27–63. IGI Global
69. Garg NK, Kumar M, et al (2018) Writer identification system for handwritten Gurmukhi characters: Study
of different feature-classifier combinations. In: Proceedings of International Conference on Computational
Intelligence and Data Engineering, pp. 125–131. Springer
70. Gargouri M, Kanoun S, Ogier JM (2013) Text-independent writer identification on online Arabic
handwriting. In: Document Analysis and Recognition (ICDAR), 12th International Conference on, pp.
428–432. IEEE
71. Garz A, Würsch M, Fischer A, Ingold R (2016) Simple and fast geometrical descriptors for writer
identification. Electronic Imaging 2016(17):1–12
72. Geng Y, Liang RZ, Li W, Wang J, Liang G, Xu C, Wang JY (2016) Learning convolutional neural network
to maximize pos@ top performance measure. arXiv preprint arXiv:1609.08417
73. Geng Y, Zhang G, Li W, Gu Y, Liang RZ, Liang G, Wang J, Wu Y, Patil N, Wang JY (2017) A novel
image tag completion method based on convolutional neural transformation. In: International Conference
on Artificial Neural Networks, pp. 539–77. Springer
74. Ghiasi G, Safabakhsh R (2013) Offline text-independent writer identification using codebook and efficient
code extraction methods. Image Vis Comput 31:379–391
75. Gibbons M, Yoon S, Cha SH, Tappert C (2005) Evaluation of biometric identification in open systems. In:
International Conference on Audio and Video Based Biometric Person Authentication, p. 823–831.
Springer
76. Grosicki E, Carre M, Brodin JM, Geoffrois E (2008) Rimes evaluation campaign for handwritten mail
processing. In: 11th International Conference on Frontiers in Handwriting Recognition, p. 16. Concordia
University
77. Hangai S, Yamanaka S, Hanamoto T (2000) Online signature verification based on altitude and direction
of pen movement. In: Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on,
vol. 1, pp. 489–492. IEEE
78. Hannad Y, Siddiqi I, El Kettani MEY (2015) Arabic writer identification using local binary patterns (LBP)
of handwritten fragments. In: Iberian Conference on Pattern Recognition and Image Analysis, pp. 237–
244. Springer
79. Hannad Y, Siddiqi I, El Kettani MEY (2016) Writer identification using texture descriptors of handwritten
fragments. Expert Systems with Applications 14–22
80. Hannad Y, Siddiqi I, El Kettani MEY (2016) Writer identification using texture descriptors of handwritten
fragments. Expert Syst Appl 47:14–22
81. Hassaïne A, Al-Maadeed S, Bouridane A (2012) A set of geometrical features for writer identification. In:
International Conference on Neural Information Processing, p. 584–591. Springer
82. Hassaïne A, Al Maadeed S, Aljaam J, Jaoua A (2013) ICDAR 2013 competition on gender prediction
from handwriting. In: Document Analysis and Recognition (ICDAR), 12th International Conference on,
pp. 1417–1421. IEEE
83. He Z, Fang B, Du J, Tang YY, You X (2005) A novel method for offline handwriting- based writer
identification. In: Document Analysis and Recognition, 2005. Proceedings. Eighth International
Conference on, pp. 242–246. IEEE
84. He S, Schomaker L (2017) Writer identification using curvature-free features. Pattern Recogn 63:451–464
85. He Z, Tang Y (2004) Chinese handwriting-based writer identification by texture analysis. In: Machine
Learning and Cybernetics, Proceedings of 2004 International Conference on, vol. 6, pp. 3488–3491. IEEE
86. He Z, Tang YY, You X (2005) A contourlet-based method for writer identification. In: Systems, Man and
Cybernetics, 2005 IEEE International Conference on, vol. 1, pp. 364–368. IEEE
87. He S, Wiering M, Schomaker L (2015) Junction detection in handwritten documents and its application to
writer identification. Pattern Recognition, pp. 4036–4048
88. He Z, You X, Tang YY (2008) Writer identification of Chinese handwriting documents using hidden
Markov tree model. Pattern Recogn 41(4):1295–1307
89. He Z, You X, Zhou L, Cheung Y, Du J (2010) Writer identification using fractal dimension of wavelet
subbands in Gabor domain. Integrated Computer-Aided Engineering 17(2):157–165
90. Helli B, Moghadam ME (2008) Persian writer identification using extended Gabor filter. In: International
Conference Image Analysis and Recognition, p. 579–586. Springer
91. Helli B, Moghaddam ME (2008) A text-independent Persian writer identification system using LCS based
classifier. In: Signal Processing and Information Technology, ISSPIT-08. IEEE International Symposium
on, pp. 203–206. IEEE
92. Helli B, Moghaddam ME (2009) A writer identification method based on XGabor and LCS. IEICE
Electronics Express 6(10):623–629
93. Hertel C, Bunke H (2003) A set of novel features for writer identification. In: International Conference on
Audio and Video Based Biometric Person Authentication, pp. 679–687. Springer
94. Hu Y, Yang W, Chen Y (2014) Bag of features approach for offline text-independent Chinese writer
identification. In: Image Processing (ICIP), IEEE International Conference on, pp. 2609–2613. IEEE
95. Idicula SM (2011) A survey on writer identification schemes. Writer 15
96. Imdad A, Bres S, Eglin V, Rivero-Moreno C, Emptoz H (2007) Writer identification using steered hermite
features and SVM. In: Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International
Conference on, vol. 2, pp. 839–843. IEEE
97. Jain R, Doermann D (2011) Offline writer identification using k-adjacent segments. In: Document
Analysis and Recognition (ICDAR), International Conference on, p. 769–773. IEEE
98. Jin W, Wang Y, Tan T (2005) Text-independent writer identification based on fusion of dynamic and static
features. In: Advances in Biometric Person Authentication, pp. 197–204. Springer
99. Kamal P, Rahman F, Mustafiz S (2014) A robust authentication system handwritten documents
using local features for writer identification. Journal of Computing Science and Engineering
8(1):11–16
100. Kameya H, Mori S, Oka R (2003) Figure-based writer verification by matching between an arbitrary part
of registered sequence and an input sequence extracted from online handwritten figures. In: Proceedings of
the Seventh International Conference on Document Analysis and Recognition Volume 2, p. 985. IEEE
Computer Society
101. Khalid S, Naqvi U, Siddiqi I (2015) Framework for human identification through offline handwritten
documents. In: Computer, Communications, and Control Technology (I4CT), 2015 International
Conference on, p. 54–58. IEEE
102. Khalifa E, Al-Maadeed S, Tahir MA, Bouridane A, Jamshed A (2015) Offline writer identification using
an ensemble of grapheme codebook features. Pattern Recogn Lett 59:18–25
103. Khan FA, Tahir MA, Khelifi F, Bouridane A, Almotaeryi R (2017) Robust offline text independent writer
identification using bagged discrete cosine transform features. Expert Syst Appl 71:404–415
104. Kharma N, Ahmed M, Ward R (1999) A new comprehensive database of handwritten Arabic words,
numbers, and signatures used for OCR testing. In: Electrical and Computer Engineering, IEEE Canadian
Conference on, vol. 2, p. 766–768. IEEE
105. Kleber F, Fiel S, Diem M, Sablatnig R (2013) CVL-database: An offline database for writer retrieval,
writer identification and word spotting. In: Document Analysis and Recognition (ICDAR), 2013 12th
International Conference on, p. 560–564. IEEE
106. Kozinets B, Lantsman R, Sokolov B, Yakubovich V (1967) Identification and differentiation of
handwritings with the help of electronic computers (opozanie i differentsyatsiya pocherkov pri
pomoshchi elektronnovychislitelnykh mashin). Tech. rep., Foreign Technology Div Wright-
Patterson AFB Ohio
107. Kumar R, Chanda B, Sharma J (2014) A novel sparse model based forensic writer identification. Pattern
Recogn Lett 35:105–112
108. Kumar M, Jindal M, Sharma R (2014) A novel hierarchical technique for offline handwritten Gurmukhi
character recognition. National Academy Science Letters 37(6):567–572
109. Kumar R, Kaur M (2017) A character based handwritten identification using neural network and SVM.
International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET)
110. Leedham G, Chachra S (2003) Writer identification using innovative binarised features of handwritten
numerals. In: Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference
on, pp. 413–416. IEEE
111. Liu L, Cheng L, Liu Y, Jia Y, Rosenblum DS (2016) Recognizing complex activities by a probabilistic
interval-based model. AAAI 30:1266–1272
112. Liu Y, Nie L, Han L, Zhang L, Rosenblum DS (2015) Action2activity: Recognizing complex activities
from sensor data. IJCAI 2015:1617–1623
113. Liu Y, Nie L, Liu L, Rosenblum DS (2016) From action to activity: sensor-based activity recognition.
Neurocomputing 181:108–115
114. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
115. Lu Y, Wei Y, Liu L, Zhong J, Sun L, Liu Y (2017) Towards unsupervised physical activity recognition
using smartphone accelerometers. Multimedia Tools and Applications 76(8):10,701–10,719
116. Mahmoud SA, Ahmad I, Al-Khatib WG, Alshayeb M, Parvez MT, Märgner V, Fink GA (2014) Khatt:
An open arabic offline handwritten text database. Pattern Recogn 47(3):1096–1112
117. Margner V, El Abed H (2007) ICDAR 2007 - Arabic handwriting recognition competition. In: Document
Analysis and Recognition (ICDAR), 9th International Conference on, p. 1274–1278
118. Margner V, El Abed H (2007) ICFHR 2010 - Arabic handwriting recognition competition. In: Frontiers in
Handwriting Recognition (ICFHR), 12th International Conference on, p. 1274–1278
119. Margner V, El Abed H (2011) ICDAR 2011-Arabic handwriting recognition competition. In: Document
Analysis and Recognition (ICDAR), International Conference on, p. 1444–1448. IEEE
120. Margner V, Pechwitz M, El Abed H (2005) ICDAR 2005 Arabic handwriting recognition competition. In:
Document Analysis and Recognition (ICDAR), International Conference on, pp. 70–74
121. Marti UV, Bunke H (1999) A full english sentence database for off-line handwriting recognition. In:
Document Analysis and Recognition, 1999. ICDAR’99. Proceedings of the Fifth International Conference
on, p. 705–708. IEEE
122. Marti UV, Messerli R, Bunke H (2001) Writer identification using text line based features. In: Document
Analysis and Recognition, Proceedings. Sixth International Conference on, pp. 101–105. IEEE
123. Miller JJ, Patterson RB, Gantz DT, Saunders CP, Walch MA, Buscaglia J (2017) A set of handwriting
features for use in automated writer identification. J Forensic Sci 62(3):722–734
124. MNISTWebsite (2018) The MNIST Database of Handwritten Digits. http://yann.lecun.com/exdb/mnist/
125. Moghaddam ME, et al (2009) A persian writer identification method based on gradient features and neural
networks. In: Image and Signal Processing, CISP’09. 2nd International Congress on, p. 14. IEEE
126. Moghaddam ME, et al (2009) Text-independent Persian writer identification using fuzzy clustering
approach. In: Information Management and Engineering, 2009. ICIME’09. International Conference on,
pp. 728–731. IEEE
127. Nakamura Y, Kidode M (2005) Individuality analysis of online kanji handwriting. In: Document
Analysis and Recognition, 2005. Proceedings. Eighth International Conference on, pp. 620–624.
IEEE
128. Namboodiri A, Gupta S (2006) Text independent writer identification from online handwriting. In: Tenth
International Workshop on Frontiers in Handwriting Recognition. Suvisoft
129. Nasuno R, Arai S (2017) Writer identification for offline japanese handwritten character using
convolutional neural network. In: Proceedings of the 5th IIAE(Institute of Industrial Applications
Engineers) International Conference on Intelligent Systems and Image Processing, pp. 94–97
130. Nejad F, Rahmati M (2007) A new method for writer identification and verification based on Farsi/Arabic
handwritten texts. In: Document Analysis and Recognition, Ninth International Conference on, vol. 2, pp.
829–833. IEEE
131. Newell AJ, Griffin LD (2011) Natural image character recognition using oriented basic image features. In:
Digital Image Computing Techniques and Applications (DICTA), 2011 International Conference on, pp.
191–196. IEEE
132. Newell AJ, Griffin LD (2014) Writer identification using oriented basic image features and the delta
encoding. Pattern Recogn 47(6):2255–2265
133. Pandey P, Seeja K (2018) Forensic writer identification with projection profile representation of graph-
emes. In: Proceedings of First International Conference on Smart System, Innovations and Computing, pp.
129–136. Springer
134. Pechwitz M, Maddouri SS, Märgner V, Ellouze N, Amiri H, et al (2002) IFN/ENIT-database of
handwritten Arabic words. In: Proc. of CIFED, pp. 127–136. Citeseer
135. Pervouchine V, Leedham G (2007) Extraction and analysis of forensic document examiner features used
for writer identification. Pattern Recogn 40(3):1004–1013
136. Plamondon R, Lorette G (1989) Automatic signature verification and writer identification-the state of the
art. Pattern Recogn 22(2):107–131
137. Plamondon R, Srihari SN (2000) Online and offline handwriting recognition: a comprehensive survey.
IEEE Transactions on Pattern Analysis and Machine Intelligence 63–84
138. Rafiee A, Motavalli H (2007) Offline writer recognition for farsi text. In: Artificial Intelligence Special
Session, 2007. MICAI 2007. Sixth Mexican International Conference on, pp. 193–197. IEEE
139. Ren S, He K, Girshick R, Sun J (2017) Faster r-cnn: towards real-time object detection with region
proposal networks. IEEE Transactions on Pattern Analysis & Machine Intelligence 39(6):1137–1149
140. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M
et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
141. Said H, Peake G, Tan T, Baker KD (1998) Writer identification from non-uniformly skewed handwriting
images. BMVC 110
142. Said HE, Tan TN, Baker KD (2000) Personal identification based on handwriting. Pattern Recognition
149–160
143. Sainath TN, Mohamed AR, Kingsbury B, Ramabhadran B (2013) Deep convolutional neural networks for
LVCSR. In: Acoustics, speech and signal processing (ICASSP), IEEE international conference on, pp.
8614–8618. IEEE
144. Sas J (2006) Handwriting recognition accuracy improvement by author identification. In: International
Conference on Artificial Intelligence and Soft Computing, pp. 682–691. Springer
145. Schlapbach A, Bunke H (2004) Offline handwriting identification using hmm based recognizers. In:
Pattern Recognition (ICPR-04), Proceedings of the 17th International Conference on, p. 654–658. IEEE
146. Schlapbach A, Bunke H (2004) Using hmm based recognizers for writer identification and verification. In:
Frontiers in Handwriting Recognition (IWFHR-9), Ninth International Workshop on, p. 167–172. IEEE
147. Schlapbach A, Bunke H (2005) Writer identification using an hmm-based handwriting recognition system:
to normalize the input or not. In: Proc. 12th Conf. of the Int. Graphonomics Society, pp. 138–142
148. Schlapbach A, Bunke H (2006) Offline writer identification using Gaussian mixture models. In: Pattern
Recognition (ICPR-06), 18th International Conference on, vol. 3, pp. 992–995. IEEE
149. Schlapbach A, Bunke H (2007) A writer identification and verification system using hmm based
recognizers. Pattern Anal Applic 10(1):33–43
150. Schlapbach A, Kilchherr V, Bunke H (2005) Improving writer identification by means of feature selection
and extraction. In: Document Analysis and Recognition, Proceedings. Eighth International Conference on,
p. 131–135. IEEE
151. Schlapbach A, Liwicki M, Bunke H (2008) A writer identification system for online whiteboard data.
Pattern Recogn 41(7):2381–2397
152. Schomaker L (2007) Advances in Writer Identification and Verification. In: Int. Conf. on Document
Analysis and Recognition, p. 1268–1273
153. Schomaker L, Bulacu M (2004) Automatic writer identification using connected-component contours and
edge-based features of uppercase western script. IEEE Trans Pattern Anal Mach Intell 26(6):787–798
154. Schomaker L, Vuurpijl L (2000) Forensic writer identification: A benchmark data set and a comparison of
two systems. [Internal Report for the Netherlands Forensic Institute]
155. Seropian A, Grimaldi M, Vincent N (2003) Writer identification based on the fractal construction of a
reference base. ICDAR 1163–1167
156. Shahabi F, Rahmati M (2009) A new method for writer identification of handwritten farsi documents. In:
10th International Conference on Document Analysis and Recognition, pp. 426–430. IEEE
157. Sharma MK, Dhaka VP (2015) Offline scripting-free author identification based on speeded-up robust
features. International Journal on Document Analysis and Recognition (IJDAR) p. 303–316
158. Sheikh A, Khotanlou H (2017) Writer identity recognition and confirmation using persian handwritten
texts. International Journal of Advances in Applied Sciences 6(2):98–105
159. Shen C, Ruan XG, Mao TL (2002) Writer identification using Gabor wavelet. In: Intelligent Control and
Automation, Proceedings of the 4th World Congress on, vol. 3, pp. 2061–2064. IEEE
160. Siddiqi I (2009) Classification of handwritten documents: writer recognition. Ph.D. thesis, Université Paris
Descartes, UFR Mathématiques et Informatique
161. Siddiqi I, Vincent N (2007) Writer identification in handwritten documents. In: Document analysis and
recognition, ninth international conference on, pp. 108–112. IEEE
162. Siddiqi I, Vincent N (2008) Combining global and local features for writer identification. In: Proceedings
of the 11. Int. Conference on Frontiers in Handwriting Recognition, Montreal
163. Siddiqi I, Vincent N (2009) Combining contour based orientation and curvature features for writer recognition.
In: International Conference on Computer Analysis of Images and Patterns, pp. 245–252. Springer
164. Siddiqi I, Vincent N (2009) A set of chain code based features for writer recognition. In: Document
Analysis and Recognition, ICDAR’09. 10th International Conference on, pp. 981–985. IEEE
165. Siddiqi I, Vincent N (2010) Text independent writer recognition using redundant writing patterns with
contour-based orientation and curvature features. Pattern Recogn 43(11):3853–3865
166. Srihari SN, Cha S, Arora H, Lee S (2002) Individuality of handwriting. J Forensic Sci 117
167. Srihari SN, Tomai CI, Zhang B, Lee S (2003) Individuality of numerals. ICDAR 3:1096–1100
168. Steinherz T, Rivlin E, Intrator N (1999) A survey on offline cursive word recognition. International Journal
on Document Analysis and Recognition 90–110
169. Strassel S (2009) Linguistic resources for Arabic handwriting recognition. In: Proceedings of the Second
International Conference on Arabic Language Resources and Tools, Cairo, Egypt
170. Sun Y, Liang D, Wang X, Tang X (2015) Deepid3: Face recognition with very deep neural networks. arXiv
preprint arXiv:1502.00873
171. Sutanto PJ, Leedham G, Pervouchine V (2003) Study of the consistency of some discriminatory features
used by document examiners in the analysis of handwritten letter 'a'. In: Document Analysis and
Recognition, 2003. Proceedings. Seventh International Conference on, pp. 1091–1095. IEEE
172. Tan T (1992) Texture feature extraction via visual cortical channel modelling. In: Pattern Recognition, Image,
Speech and Signal Analysis, Proceedings, 11th IAPR International Conference on, pp. 607–610. IEEE
173. Tan T (1998) Rotation invariant texture features and their use in automatic script identification. IEEE
Transactions on Pattern Analysis and Machine Intelligence 751–756
174. Tan GX, Viard-Gaudin C, Kot AC (2009) Automatic writer identification framework for online handwritten
documents using character prototypes. Pattern Recognition 3313–3323
175. Tang Y, Bu W, Wu X (2014) Text-independent writer identification using improved structural features. In:
Chinese Conference on Biometric Recognition, pp. 404–411. Springer
176. Tang Y, Wu X, Bu W (2013) Offline text-independent writer identification using stroke fragment and
contour based features. In: Biometrics (ICB), International Conference on, pp. 1–6. IEEE
177. Thumwarin P, Matsuura T (2004) Online writer recognition for Thai based on velocity of barycenter of
pen-point movement. In: Image Processing, 2004. ICIP’04. 2004 International Conference on, vol. 2, pp.
889–892. IEEE
178. Timofte R, Van Gool LJ (2012) A training-free classification framework for textures, writers, and
materials. BMVC 13:14
179. Tomai CI, Zhang B, Srihari SN (2004) Discriminatory power of handwritten words for writer recognition.
In: Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, vol. 2, pp.
638–641. IEEE
180. Ubul K, Hamdulla A, Aysa A, Raxidin A, Mahmut R (2008) Research on uyghur offline handwriting-
based writer identification. In: Signal Processing, 2008. ICSP-08. 9th International Conference on, pp.
1656–1659. IEEE
181. Ünlü A, Brause R, Krakow K (2006) Handwriting analysis for diagnosis and prognosis of parkinson's
disease. In: International Symposium on Biological and Medical Data Analysis, pp. 441–450. Springer
182. Venugopal V, Sundaram S (2017) An online writer identification system using regression based feature
normalization and codebook descriptors. Expert Syst Appl 72:196–206
183. Wang X, Ding X, Liu H (2003) Writer identification using directional element features and linear
transform. In: null, p. 9–42. IEEE
184. Wang T, Wu DJ, Coates A, Ng AY (2012) End-to-end text recognition with convolutional neural networks.
In: Pattern Recognition (ICPR), 21st International Conference on, pp. 3304–3308. IEEE
185. Wen J, Fang B, Chen J, Tang Y, Chen H (2012) Fragmented edge structure coding for chinese writer
identification. Neurocomputing 45–51
186. Woodard J, Lancaster M, Kundu A, Ruiz D, Ryan J (2010) Writer recognition of arabic text by generative
local features. In: Biometrics: Theory Applications and Systems (BTAS), 2010 Fourth IEEE International
Conference on, pp. 1–7. IEEE
187. Wu Y, Lu H, Zhang Z (2017) Text-independent online writer identification using hidden markov models.
IEICE Trans Inf Syst 100(2):332–339
188. Wu X, Tang Y, Bu W (2014) Offline text-independent writer identification based on scale invariant feature
transform. IEEE Transactions on Information Forensics and Security 526–536
189. Xing L, Qiao Y (2016) DeepWriter: A Multi-Stream Deep CNN for Text independent Writer Identification.
In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 584–589
190. Xiong YJ, Wen Y, Wang PS, Lu Y (2015) Text-independent writer identification using sift descriptor and
contour-directional feature. In: Document Analysis and Recognition (ICDAR), 13th International
Conference on, pp. 91–95. IEEE
191. Yakopcic C, Alom MZ, Taha TM (2016) Memristor crossbar deep network implementation based
on a convolutional neural network. In: 2016 International Joint Conference on Neural Networks
(IJCNN), pp. 963–970
192. Yang W, Jin L, Liu M (2015) Chinese character-level writer identification using path signature feature,
dropstroke and deep CNN. In: Document Analysis and Recognition (ICDAR), 2015 13th International
Conference on, pp. 546–550. IEEE
193. Yang W, Jin L, Liu M (2016) DeepWriterID: An End-to-end Online Text-independent Writer Identification
System. IEEE Intell. Syst. 45–53
194. Zhang J, He Z, Cheung YM, You X (2009) Writer identification using a hybrid method combining Gabor
wavelet and mesh fractal dimension. In: International Conference on Intelligent Data Engineering and
Automated Learning, pp. 535–542. Springer
195. Zhang G, Liang G, Li W, Fang J, Wang J, Geng Y, Wang JY (2017) Learning convolutional ranking-score
function by query preference regularization. In: International Conference on Intelligent Data Engineering
and Automated Learning, pp. 1–8. Springer
196. Zhang B, Srihari SN (2003) Analysis of handwriting individuality using word features. In: Proceedings of
the Seventh International Conference on Document Analysis and Recognition Volume 2, p. 1142. IEEE
Computer Society
197. Zhang B, Srihari SN, Lee S (2003) Individuality of handwritten characters. In: Proceedings of the Seventh
International Conference on Document Analysis and Recognition Volume 2, p. 1086. IEEE Computer Society
198. Zhang XY, Xie GS, Liu CL, Bengio Y (2017) End-to-end online writer identification with recurrent neural
network. IEEE Transactions on Human-Machine Systems 47(2):285–292
199. Zhu Y, Tan T, Wang Y (2000) Biometric personal identification based on handwriting. In: Pattern
Recognition, Proceedings. 15th International Conference on, vol. 2, pp. 797–800. IEEE
200. Zhu Y, Tan T, Wang Y (2001) Font recognition based on global texture analysis. IEEE Transactions on
Pattern Analysis and Machine Intelligence pp. 1192–1200
201. Zhu Y, Wang Y (2016) An offline text-independent writer identification system with sae feature extraction.
In: Progress in Informatics and Computing (PIC), 2016 International Conference on, pp. 432–436. IEEE
202. Zimmermann K, Varady M (1985) Handwriter identification from one-bit quantized pressure patterns.
Pattern Recogn 18(1):63–72
203. Zois EN, Anastassopoulos V (2000) Morphological waveform coding for writer identification. Pattern
Recogn 33(3):38

Arshia Rehman is a BSCS student at GGPGC No.1, Abbottabad, Higher Education Department of Government
of Khyber-Pakhtunkhwa, Pakistan and works as a Research Assistant under the supervision of Dr. Saeeda Naz at
GGPGC No.1, Abbottabad. Her areas of interest are Document Image Understanding, Machine Learning and
Multimedia.

Saeeda Naz has been an Assistant Professor and Head of the Computer Science Department at GGPGC No.1,
Abbottabad, Higher Education Department of Government of Khyber-Pakhtunkhwa, Pakistan, since 2008. She
received her Ph.D. in Computer Science from Hazara University, Department of Information Technology, Mansehra,
Pakistan. She has published five book chapters and more than 40 papers in peer-reviewed national and
international conferences and journals. Her areas of interest are Optical Character Recognition, Pattern
Recognition, Machine Learning, Medical Imaging, Natural Language Processing and Multimedia.
Muhammad Imran Razzak is with the Advanced Analytics Institute, University of Technology, Sydney,
Australia. Previously, he was with the College of Public Health and Health Informatics, King Saud bin Abdulaziz
University for Health Sciences, Riyadh, Saudi Arabia. His research philosophy is to apply deep
learning methods to practical problems in image analysis and data analysis, with special emphasis on the
healthcare industry. During his research career, he has developed and delivered several research projects
successfully. He is the inventor of one patent and the author of more than 70 papers in well-reputed journals and
conferences. He received the prestigious Young Researcher-2015 award from NGHA, Saudi Arabia,
based on his research contributions, and was named “Best Researcher” during his stay at CoEIA, King Saud University. He
has secured several research grants. He also serves as an associate editor for reputable international journals such
as PlosOne, IEEE Access, IJIIP, and IJIP. He has been part of dozens of conferences in various capacities such as
Chair, Co-Chair, and scientific committee member.
