Writer Identification of Hand Written Scripts Using Machine Learning
by
Shiva Datta Erroju - 204219
Gopu Sai Shiva Varaprasad - 204222
Sravan Kodumoori - 204231
Supervisor
Dr. Raghunadh M V
Associate Professor
Department of Electronics and Communication Engineering
Date: 31-10-2023
Place: Warangal
Declaration
I declare that this written submission represents my ideas in my own words and where others'
ideas or words have been included, I have adequately cited and referenced the original
sources. I also declare that I have adhered to all principles of academic honesty and integrity
and have not misrepresented or fabricated or falsified any idea / data / fact / source in my
submission. I understand that any violation of the above will be cause for disciplinary action
by the Institute and can also evoke penal action from the sources which have thus not been
properly cited or from whom proper permission has not been taken when needed.
Date: 31-10-2023
Certificate
This is to certify that the dissertation work entitled “WRITER IDENTIFICATION USING
DEEP NEURAL NETWORKS” is a bonafide record of work carried out by Shiva Datta Erroju
(204219), Gopu Sai Shiva Varaprasad (204222), and Sravan Kodumoori (204231).
Prof. Vakula D.
Head of the Department
Department of Electronics and Communication Engineering
NIT Warangal
Sri M V Raghunadh
Associate Professor
Department of Electronics and Communication Engineering
NIT Warangal
Abstract
With this project we aim to reduce document fraud, which is in essence a form of
identity theft, and to contribute to the field of digital forensics. We will also apply the
system locally to students' handwritten assignments.
Introduction
Computers and phones may be more ubiquitous than ever, but many people still prefer the
traditional feeling of writing with ink on paper. Despite the availability of digital tools,
handwriting with pen on paper has remained a dominant form of non-verbal communication for
years. Each individual's writing has its own uniqueness, so handwriting often serves as a
characteristic of its author: people who recognize the writing can easily guess who produced
it. However, handwriting is also exploited by irresponsible people in the form of handwriting
falsification, which often occurs in the workplace and even in the field of education. This is
one of the driving factors for building a reliable system that traces a handwritten document
back to its writer. Such a tool has various applications, including forensic investigations,
document authentication, and educational assessment.
Writer identification can be categorized as text dependent or text independent.
Text-independent identification, as the name suggests, identifies the writer from his or her
writing style alone and does not depend on the content of the script. This approach is
preferable because a writer will not produce the same content everywhere, and large datasets
are rarely available in forensic cases. We will further experiment with achieving
language-independent writer identification.
Most of the current related studies focus on feature extraction and, more generally, on
feature engineering using either modern deep approaches or classical descriptors. We
propose a deep neural network-based writer identification system: a text-independent
approach that is very effective at discovering author characteristics from handwriting
images. This setting is also known as offline writer identification. Deep convolutional
neural networks are useful for extracting meaningful features from the dataset
automatically. We will implement a conjugation of deep and traditional feature descriptors
(e.g., inter-letter spacing or handwriting thickness information), which is useful for
handwriting writer identification applications.
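To make the "traditional descriptor" side of this conjugation concrete, the sketch below shows one hypothetical handcrafted feature, stroke thickness, estimated by repeatedly eroding a binarized ink mask until it vanishes, and then concatenated with a stand-in deep feature vector. This is an illustrative assumption, not the descriptor actually used in this project; the function names and the 128-dimensional embedding size are invented for the example.

```python
import numpy as np

def erode(b):
    """One 4-neighbourhood binary erosion (image border treated as background)."""
    p = np.pad(b, 1)
    return (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
            & p[1:-1, :-2] & p[1:-1, 2:])

def stroke_thickness(ink):
    """Estimate stroke width as 2n - 1, where n erosions empty the ink mask."""
    b, n = ink.astype(bool), 0
    while b.any():
        b = erode(b)
        n += 1
    return 2 * n - 1

# Synthetic 5-pixel-wide horizontal stroke standing in for real ink
img = np.zeros((9, 40), dtype=bool)
img[2:7, 5:35] = True
thickness = stroke_thickness(img)  # 5

# Conjugation: append the handcrafted scalar to a (placeholder) CNN embedding
deep_feat = np.zeros(128)  # stand-in for a real deep feature vector
combined = np.concatenate([deep_feat, [thickness]])
```

In a real pipeline the placeholder embedding would come from the trained CNN, and the combined vector would feed the final classifier.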
In this report, we first discuss previous research on writer identification using
deep learning. We then explore conjugating traditional feature extraction with deep
convolutional neural networks, test different architectures so that we can select and
fine-tune one, and try extending the result into a language-independent model. Finally,
we present our results and the conclusions we have drawn.
Problem Statement
Text-independent, language-independent, offline handwriting writer identification
using a conjugation of deep learning and traditional feature descriptors.
Motivation:
Applications:
● Forensics:
This project can be used to detect fraud in, for example, students' handwritten
assignments. The algorithm detects discrepancies between the writer of an old
assignment and the writer of a newly submitted one and informs the professor. It can
be integrated with an LMS to let students upload assignments.
● Documents Authenticity:
To prevent fraud, the authenticity of legal documents such as wills, and of
signatures, must be verified.
● Forgery Detection:
Forgery detection methods should examine static and dynamic features to detect subtle
differences between presented and real samples (handwriting shake, pen float, etc.).
● Verifying Signatures:
Among biometric systems, handwritten signatures are one of the most widely used
personal characteristics as a legal means of personal authentication in government and
financial institutions. A person's signature can basically be defined as a combination
of symbols and strokes that are personal characteristics and represent the person's own
unique writing style.
● Property Security:
A collateral agreement is a legal document that gives the lender the right to claim
certain assets or properties pledged by the borrower as collateral for the loan. These
documents can be easily forged, so we need a way to identify the author to see if the
document was signed by the rightful owner.
Merits:
1. Feature Learning: Deep learning models can automatically learn relevant features
from raw handwriting images, reducing the need for handcrafted feature engineering. This
makes the process more adaptable to different handwriting styles.
2. High Accuracy: Deep learning models have demonstrated the ability to achieve
high accuracy in writer identification, often outperforming traditional machine
learning approaches. They can capture complex patterns and relationships in handwriting data.
3. Generalization: Deep learning models can generalize well to unseen data, which is
crucial for real-world applications where handwriting styles may vary widely.
Demerits:
1. Complexity: Designing and fine-tuning deep learning models can be complex and
require expertise in neural network architecture, hyperparameter tuning, and training.
2. Overfitting: Deep learning models are susceptible to overfitting, where they may
perform well on the training data but poorly on unseen data. Effective regularization
techniques are necessary to mitigate this issue.
3. Interpretability: Deep learning models are often seen as "black boxes" with
limited interpretability. Understanding the model's decision-making process can be
challenging, which may not be suitable for applications requiring transparency.
LITERATURE SURVEY
Generation of the feature vector: a well-known CNN model called CaffeNet (the
"caffenet" of the Caffe deep learning framework) is used; it is trained using a
softmax loss function.
6. Writer identification using directional ink-trace width measurements.
A. A. Brink, J. Smit, M. L. Bulacu, L. R. B. Schomaker (2011).
Method: Analyzes the width of ink traces to identify distinctive writing patterns.
Collects samples of known authors for analysis. Preprocesses handwritten samples by
digitizing and isolating ink traces. Uses statistical and machine learning techniques
for writer identification. Validates and tests the model with additional samples for
accuracy.
Results: In performance experiments, the well-known Directions feature achieved top-1
accuracy between 48% and 74%. In contrast, the Ink Width feature, describing ink-trace
width, showed results ranging from 22% to 73%. The Quill feature, combining trace
direction and width, outperformed both Directions and Ink Width, achieving top-1
accuracy of 63% to 95%. This emphasizes the value of combining directionality and
width measurements for writer identification, demonstrating Quill's superior
performance compared to the other features.
WORK FLOW
A typical workflow in deep learning involves several key steps, from data preparation
to model deployment. Here is a general overview of the deep learning workflow:
Data Acquisition:
- Gather the data needed for your deep learning project. The dataset should be
representative of the problem and large enough to train a robust model.
Data Pre-processing:
- Prepare the data for training and testing. This may involve tasks such as data
cleaning, data augmentation, data splitting (into training, validation, and test sets), and
label encoding or one-hot encoding.
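As an illustration of the splitting and encoding tasks above, the sketch below one-hot encodes made-up labels for three hypothetical writers and produces a shuffled 70/15/15 train/validation/test split (the label values and split ratios are assumptions for the example, not this project's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up labels for 20 samples from 3 hypothetical writers (classes 0-2)
labels = rng.integers(0, 3, size=20)

# One-hot encoding: class k becomes the k-th 3-dimensional indicator vector
one_hot = np.eye(3)[labels]

# Shuffled 70/15/15 split of the sample indices (14 / 3 / 3 of 20)
idx = rng.permutation(len(labels))
train_idx, val_idx, test_idx = np.split(idx, [14, 17])
```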
Model Selection:
- Choose an appropriate deep learning architecture or model type based on the
problem. This could be a convolutional neural network (CNN) for image tasks,
recurrent neural network (RNN) for sequences, or a pre-trained model like a ResNet or
BERT.
Model Design:
- Design the architecture of the deep learning model, specifying the number of layers,
types of layers, activation functions, and any special considerations (e.g., recurrent
layers for sequential data).
Model Compilation:
- Compile the model by specifying the loss function, optimizer, and evaluation
metrics. This step configures how the model is trained.
Model Training:
- Train the model on the training data. This is the process of adjusting the model's
weights and biases using backpropagation and optimization algorithms. Monitor the
model's performance on the validation set during training to avoid overfitting.
Model Evaluation:
- Assess the model's performance on the test dataset, which it has not seen during
training. Evaluate the model using appropriate metrics (e.g., accuracy, precision, recall,
F1-score, etc.).
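For a binary decision (e.g., "same writer" vs. "different writer"), these metrics reduce to simple counts over the confusion matrix; the sketch below computes them on made-up predictions (the label vectors are invented for illustration):

```python
import numpy as np

# Hypothetical ground truth and model predictions for 10 test samples
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives:  4
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives: 1
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives: 1
tn = np.sum((y_pred == 0) & (y_true == 0))  # true negatives:  4

accuracy  = (tp + tn) / len(y_true)                  # 0.8
precision = tp / (tp + fp)                           # 0.8
recall    = tp / (tp + fn)                           # 0.8
f1 = 2 * precision * recall / (precision + recall)   # ~0.8
```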
Model Fine-Tuning (Optional):
- If the model's performance is not satisfactory, fine-tune hyperparameters, adjust the
architecture, or explore different pre-processing techniques.
Model Validation:
- Validate the model's performance on new, real-world data, ensuring it works in the
intended application. This may involve deployment to a controlled environment.
Model Deployment:
- Deploy the trained model for making predictions on new, unseen data. Deployment
can be on the cloud, edge devices, or as part of an application.
Documentation:
- Document the entire deep learning process, including data sources, model
architecture, training procedures, and evaluation metrics. Proper documentation is
crucial for reproducibility and knowledge sharing.
Scaling:
- For large-scale applications, consider scaling your deep learning solution to handle
higher loads, whether it's through distributed training, model serving systems, or other
methods.
The deep learning workflow can vary depending on the specific problem and
application. However, these general steps provide a roadmap for tackling deep learning
projects effectively and systematically.
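The core of this workflow (acquire data, split, design a model, train with backpropagation, evaluate on held-out data) can be sketched end-to-end on toy data. The example below uses a plain NumPy logistic-regression "model" on synthetic two-writer clusters purely to illustrate the loop that a deep learning framework automates; none of the data or hyperparameters here come from this project.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Data acquisition (synthetic): two "writers" as Gaussian clusters ---
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
               rng.normal(2.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

# --- Pre-processing: shuffle, then 80/20 train/test split ---
idx = rng.permutation(len(X))
X, y = X[idx], y[idx]
split = int(0.8 * len(X))
X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]

# --- Model design + "compilation": logistic regression, cross-entropy
#     loss, plain gradient descent (what a framework's optimizer does) ---
w, b, lr = np.zeros(2), 0.0, 0.1

# --- Training: forward pass, gradient, weight update ---
for epoch in range(200):
    p = 1.0 / (1.0 + np.exp(-(X_tr @ w + b)))   # predicted probabilities
    w -= lr * X_tr.T @ (p - y_tr) / len(X_tr)   # gradient step on weights
    b -= lr * np.mean(p - y_tr)                 # gradient step on bias

# --- Evaluation on held-out data ---
pred = (1.0 / (1.0 + np.exp(-(X_te @ w + b))) > 0.5).astype(int)
acc = float(np.mean(pred == y_te))
```

A real writer-identification pipeline replaces the 2-D points with handwriting images and the linear model with a CNN, but the train/validate/evaluate loop is structurally the same.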
WORK DONE
Local dataset: We collected image samples from individual friends' handwritten
notebooks, assignments, records, etc.
Figure: sample handwriting from Writer-1 and Writer-2.
ICDAR17: The dataset mentioned was originally put forth for the 2017 ICDAR
Competition on Author Identification of Historical Documents, which took
place at the International Conference on Document Analysis and Recognition
[10]. Its primary purpose was to enable the evaluation of methods and
algorithms for searching written documents using the "Query by Example"
approach. The dataset was specifically designed to facilitate research and
advancements in author identification of historical documents, offering
participants a standardized benchmark for their investigations during the
competition.
II. Dataset Preprocessing:
Dataset preprocessing is a crucial step in preparing data for input into a deep
learning model. It involves a series of tasks to clean, transform, and organize the data to
make it suitable for training a neural network.
Data Cleaning: Remove or handle missing values, outliers, and errors in the
data. Inconsistent or noisy data can negatively impact model performance.
i) Median Filter: Replaces each pixel's value with the median value in its
neighborhood, which is effective in removing salt-and-pepper noise.
ii) Gaussian Filter: Used to blur and reduce high-frequency noise in images.
It's especially useful for images corrupted with Gaussian noise.
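The two filters above can be sketched with `scipy.ndimage` (any image-processing library with equivalent filters would do); the single bright pixel below stands in for salt noise on a scanned page:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, median_filter

# 5x5 "scan" with one salt pixel (impulse noise) at the centre
img = np.zeros((5, 5))
img[2, 2] = 255.0

# Median filter: each pixel becomes the median of its 3x3 neighbourhood,
# so the isolated impulse vanishes entirely
denoised = median_filter(img, size=3)

# Gaussian filter: the impulse is spread out (blurred), not removed,
# which suppresses high-frequency Gaussian noise
blurred = gaussian_filter(img, sigma=1)
```

This is why the median filter is preferred for salt-and-pepper noise, while the Gaussian filter suits additive Gaussian noise.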
Results:
1. AlexNet, epochs = 100.
2. ResNet, epochs = 100.