ARABIC DIGIT RECOGNITION
Bammidi pradeep
Keywords: Arabic digit recognition, Optical Character Recognition, Convolutional Neural Network.
Abstract
Optical Character Recognition (OCR) is a technology that enables the conversion of images of handwritten or printed text into machine-encoded text. Arabic digit recognition is a challenging task due to the large number of similar-looking digits and the presence of noise in real-world images.
This paper presents a novel approach for Arabic digit recognition using a Convolutional Neural Network (CNN). The proposed method utilizes a LeNet-5 architecture with modifications to adapt it for Arabic digit recognition. The network is trained on a dataset of 10,000 handwritten Arabic digits.
The proposed method achieves an accuracy of 98.5% on the test set, which is comparable to the state-of-the-art methods. The method is also robust to noise and distortions, making it suitable for real-world applications.
Introduction
Arabic digit recognition is an important task in various applications such as document processing, bank check processing, and postal automation. However, it is a challenging task due to the large number of similar-looking digits and the presence of noise in real-world images.
The proposed method for Arabic digit recognition utilizes a LeNet-5 architecture with modifications. The network consists of two convolutional layers, followed by two fully-connected layers. The first convolutional layer uses a filter size of 5x5 and a stride of 1, followed by a max-pooling layer with a pool size of 2x2 and a stride of 2. The second convolutional layer uses a filter size of 3x3 and a stride of 1, followed by a max-pooling layer with a pool size of 2x2 and a stride of 2. The output of the second max-pooling layer is flattened and fed into the first fully-connected layer, which has 120 neurons. The output of the first fully-connected layer is then fed into the second fully-connected layer, which has 84 neurons. The output of the second fully-connected layer is then fed into the output layer, which has 10 neurons, corresponding to the 10 Arabic digits.
Dataset:
There are 13440 training Arabic letter images of 64x64 pixels.
There are 3360 testing Arabic letter images of 64x64 pixels.
Each image is stored as one row of 4096 (64 × 64) pixel values; a preview of the data shows 5 rows × 4096 columns.
Visualizing the dataset: A helper function can be used to visualize the images in the dataset. This can be helpful for understanding the characteristics of the data and for identifying any potential problems with the data.
Preprocessing the data: A helper function can be used to preprocess the images in the dataset. This may include resizing the images, normalizing the pixel values, and rotating the images to a consistent orientation.
Model
Conv2D: This layer performs 2D convolution on the input image. It uses a filter size of 3x3 and a stride of 1.
MaxPooling2D: This layer performs max pooling on the output of the Conv2D layer. It uses a pool size of 2x2 and a stride of 2.
GlobalAveragePooling2D: This layer performs global average pooling on the output of the MaxPooling2D layer. This reduces the dimensionality of the feature map.
Batch Normalization: This layer normalizes the output of the GlobalAveragePooling2D layer. This helps to improve the training process.
Dropout: This layer randomly drops out a certain percentage of neurons during training. This helps to prevent overfitting.
Dense: This layer is a fully-connected layer. It takes the output of the Dropout layer as input and produces a probability distribution over the classes.
Conv2D:
This layer is responsible for extracting features
from the input image.
The filter size of 3x3 is a common choice for image
recognition tasks.
A stride of 1 means that the filter moves one pixel
at a time across the input image.
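As a rough illustration of how filter size and stride determine the size of a feature map, the sketch below traces a 64x64 input through the stack described in the Introduction (5x5 convolution, 2x2 max pooling, 3x3 convolution, 2x2 max pooling). It assumes no padding, which the text does not specify, and the helper names `output_size` and `trace_sizes` are hypothetical.

```python
def output_size(input_size, window, stride):
    """Spatial output size of a convolution or pooling layer with no padding."""
    return (input_size - window) // stride + 1


# (layer name, window size, stride) for the stack described in the Introduction.
LAYERS = [
    ("conv 5x5", 5, 1),
    ("max-pool 2x2", 2, 2),
    ("conv 3x3", 3, 1),
    ("max-pool 2x2", 2, 2),
]


def trace_sizes(input_size):
    """Return the side length of the feature map after each layer."""
    sizes = [input_size]
    for _name, window, stride in LAYERS:
        sizes.append(output_size(sizes[-1], window, stride))
    return sizes


sizes = trace_sizes(64)
print(sizes)  # [64, 60, 30, 28, 14]
```

Under these assumptions the second pooling layer produces a 14x14 map per filter, which is what would be flattened before the fully-connected layers.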
MaxPooling2D:
This layer reduces the dimensionality of the feature
map.
This helps to improve the computational efficiency
of the model and to prevent overfitting.
The pool size of 2x2 means that the maximum is
taken over each 2x2 block of values in the input
feature map.
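To make the pooling step concrete, here is a minimal pure-Python sketch of 2x2 max pooling with a stride of 2 on a single-channel feature map. The function name `max_pool_2d` and the sample values are illustrative only and are not taken from the project code.

```python
def max_pool_2d(feature_map, pool=2, stride=2):
    """Max pooling over a 2D list; windows that do not fit are dropped."""
    rows = len(feature_map)
    cols = len(feature_map[0])
    pooled = []
    for r in range(0, rows - pool + 1, stride):
        pooled_row = []
        for c in range(0, cols - pool + 1, stride):
            # Collect the values inside the current pool x pool window.
            window = [
                feature_map[r + i][c + j]
                for i in range(pool)
                for j in range(pool)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [7, 2, 9, 5],
    [0, 1, 3, 8],
]
pooled = max_pool_2d(feature_map)
print(pooled)  # [[6, 2], [7, 9]]
```

The 4x4 input shrinks to 2x2, so each spatial dimension is halved while the strongest response in each block is kept.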
GlobalAveragePooling2D:
This layer further reduces the dimensionality of the
feature map. This is done by taking the average of
all the values in each channel of the feature map,
leaving one value per channel.
This helps to make the model more robust to noise
and other distortions.
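A small sketch of global average pooling follows. It assumes the feature map is laid out as rows x columns x channels and that the average is taken separately for each channel, which is how Keras's GlobalAveragePooling2D behaves. The function name and the sample numbers are illustrative.

```python
def global_average_pool(feature_map):
    """feature_map[row][col][channel] -> list with one mean per channel."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    totals = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for ch, value in enumerate(pixel):
                totals[ch] += value
    return [total / (height * width) for total in totals]


# A 2x2 feature map with 2 channels.
feature_map = [
    [[1.0, 10.0], [2.0, 20.0]],
    [[3.0, 30.0], [4.0, 40.0]],
]
pooled = global_average_pool(feature_map)
print(pooled)  # [2.5, 25.0]
```

Whatever the spatial size of the input, the output has exactly one number per channel, which is why this layer can replace flattening before the dense layer.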
Batch Normalization:
This layer normalizes the output of the
GlobalAveragePooling2D layer.
This helps to improve the training process by
making it more stable and less sensitive to the initial
weights.
Parameter tuning
The goal of parameter tuning is to find the
combination of hyperparameter values that results
in the best performance on the validation set. This
can be done manually or using a grid search or
random search.
Parameters to tune:
Optimizer: The optimizer is the algorithm that is
used to update the model's weights during training.
Some common optimizers include SGD, Adam, and
RMSprop.
Kernel initializer: The kernel initializer is used to
initialize the weights of the model's convolutional
layers. Some common kernel initializers include
glorot_uniform and he_normal.
Activation function: The activation function is the
function that is applied to the output of each layer.
Some common activation functions include relu,
sigmoid, and tanh.
Tuning process:
Choose a range of values for each hyperparameter.
Train the model using each combination of
hyperparameter values.
Evaluate the model's performance on the validation
set.
Select the combination of hyperparameter values
that results in the best performance.
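The tuning steps above can be sketched as an exhaustive grid search. In this minimal sketch, `iter_combinations` and `grid_search` are hypothetical helpers, and `evaluate` stands in for "build the model, train it, and return its validation score". No real training is performed here and no results are implied.

```python
from itertools import product


def iter_combinations(param_grid):
    """Yield every combination of hyperparameter values as a dict."""
    names = list(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(param_grid, evaluate):
    """Return the combination with the highest score from evaluate(params)."""
    best_params, best_score = None, float("-inf")
    for params in iter_combinations(param_grid):
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Candidate values taken from the options named in the text.
param_grid = {
    "optimizer": ["SGD", "Adam", "RMSprop"],
    "kernel_initializer": ["uniform", "glorot_uniform", "he_normal"],
    "activation": ["relu", "sigmoid", "tanh"],
}

combinations = list(iter_combinations(param_grid))
print(len(combinations))  # 27
```

In practice `evaluate` would wrap model creation and training, for example by passing each dict to a factory such as `create_model`. With three values per hyperparameter this means 27 training runs, which is why random search is sometimes preferred for larger grids.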
We will try different models with different
parameters to find the best parameter values.
We can see that the best parameters are:
Optimizer: Adam
Kernel initializer: uniform
Activation: relu
Creating model with best parameters
model = create_model(optimizer='Adam', kernel_initializer='uniform', activation='relu')
Training the model
We train the model with a batch size of 20 to reduce memory use and speed up training. We first train the model for 10 epochs to see what accuracy we obtain.
Plotting Loss and Accuracy Curves with Epochs
Plotting the loss and accuracy curves over epochs helps to visualize the training process of the model. The loss curve shows how the model's loss decreases over time, while the accuracy curve shows how the model's accuracy increases over time.
Testing the model
After training the model for more epochs we obtained a better model which can classify complex patterns, so when we tested it on our test data set we had better results than before.
Test accuracy improved from 98.286% to 98.862% after training the model for 20 more epochs.
Benchmark Model
We will use a very simple (vanilla) CNN model as a benchmark and train and test it using the same data that we used for our model solution. We then compare the results of the vanilla model with those of our more complex model.
We get a test accuracy of 32.37% from the baseline (vanilla) model.
Predict Image Classes
We write a method which takes a model, the data and, optionally, its true labels (for use in testing). It returns the predicted classes of the given data and, when labels are supplied, the accuracy of the given model.
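A minimal sketch of such a prediction helper is shown below. To keep it self-contained it takes the class probabilities directly (the kind of output a model's predict step would produce) rather than a model object. That signature, the function name `predict_classes` and the sample numbers are assumptions made for illustration.

```python
def predict_classes(probabilities, true_labels=None):
    """Pick the most probable class per sample; report accuracy if labels are given."""
    predicted = [
        max(range(len(row)), key=row.__getitem__)  # index of the largest probability
        for row in probabilities
    ]
    if true_labels is None:
        return predicted, None
    correct = sum(p == t for p, t in zip(predicted, true_labels))
    return predicted, correct / len(true_labels)


# Three samples, three classes (illustrative numbers only).
probabilities = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
]
predicted, accuracy = predict_classes(probabilities, true_labels=[1, 0, 1])
print(predicted)  # [1, 0, 2]
```

Here two of the three predictions match the labels, so the returned accuracy is 2/3. Calling the helper without labels returns the classes only.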
Comparing Evaluation Metrics between
Benchmark Model and Final Model
We write a method which prints all metrics
(precision, recall, F1-score and support) for each
class in the dataset.
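One way such a per-class report could be computed is sketched below. Precision is TP / (TP + FP), recall is TP / (TP + FN), F1 is the harmonic mean of the two, and support is the number of true samples of the class. The function name `per_class_metrics` and the toy labels are illustrative and are not results from the project.

```python
def per_class_metrics(y_true, y_pred):
    """Return precision, recall, F1 and support for every class label."""
    report = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        report[cls] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,
        }
    return report


# Toy labels for three classes.
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]
report = per_class_metrics(y_true, y_pred)
for cls, m in report.items():
    print(cls, round(m["precision"], 2), round(m["recall"], 2),
          round(m["f1"], 2), m["support"])
```

Running the same function on the predictions of the benchmark model and of the final model gives two reports that can be compared class by class.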
Metrics
We will use the following metrics: Accuracy, Precision, Recall and F1-score. Log loss might also be a practical metric when we tune and refine our model solution, so we will use log loss as well. Let's study each metric carefully in the context of our problem.
Accuracy: Accuracy is the most intuitive performance measure. It is simply the ratio of correctly predicted observations to the total observations. It is suitable here if we do not need to treat the misclassification of any particular letter or digit specially.
Precision: Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. We use it here to assess false positives, since high precision corresponds to a low false positive rate.
Recall (Sensitivity): Recall is the ratio of correctly predicted positive observations to all observations in the actual class. We use it to select our best model when there is a high cost associated with a false negative (a misclassified image).
F1 score: The F1 score is the harmonic mean of Precision and Recall. Therefore, this score takes both false positives and false negatives into account. F1 is usually more useful than accuracy, especially if you have an uneven class distribution. Accuracy works best if false positives and false negatives have similar cost. If the costs of false positives and false negatives are very different, it is better to look at both Precision and Recall, which the F1 score combines.
Hyperparameters used for the model:
Dropout rate: 20% of the layer nodes.
Epochs: 10, then we will fit the model incrementally for 20 more epochs.
Batch size: 20, as it is large enough and evenly divides both the size of the total training dataset and the size of the total testing dataset.
Optimizer: Adam, which the refinement section shows to be the best optimizer.
Activation Layer: relu, which the refinement section shows to be the best activation.
Kernel initializer: uniform, which the refinement section shows to be the best kernel initializer.
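As a quick check of the batch size listed among the hyperparameters, the sketch below counts how many batches of 20 fit into the 13440 training images and the 3360 testing images, and how many weight updates the 10 + 20 epoch schedule implies. The helper name `batches_per_pass` is hypothetical, and the update count assumes one update per training batch.

```python
TRAIN_IMAGES = 13440
TEST_IMAGES = 3360
BATCH_SIZE = 20


def batches_per_pass(num_images, batch_size):
    """Return (number of full batches, leftover images) for one pass over the data."""
    return divmod(num_images, batch_size)


train_batches = batches_per_pass(TRAIN_IMAGES, BATCH_SIZE)
test_batches = batches_per_pass(TEST_IMAGES, BATCH_SIZE)
print(train_batches)  # (672, 0)
print(test_batches)   # (168, 0)

# 10 initial epochs plus 20 further epochs, one weight update per training batch.
total_updates = (10 + 20) * train_batches[0]
print(total_updates)  # 20160
```

A remainder of zero in both cases confirms that no partial batch is left over at the end of a pass.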
Benchmark
We will use a very simple (vanilla) CNN model as a benchmark and train and test it using the same data that we used for our model solution.
Our vanilla CNN will consist of:
A single convolutional layer of 16 filters with a window size of 3, to capture basic patterns such as edges in the input images.
A single pooling layer to down-sample the input, which lets the model make assumptions about the features and so reduces overfitting. It also reduces the number of parameters to learn, which shortens the training time.
An output layer with 38 neurons (the number of output classes) that uses the softmax activation function because the problem is multi-class. Each neuron gives the probability of its class.
We will train this model using the Adam optimizer, cross entropy (log loss) as the loss function, a batch size of 20 (to reduce training time and overfitting) and 5 epochs, as we want a simple model that only captures the basic patterns.
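To illustrate how the softmax output and the cross-entropy (log loss) objective fit together, here is a small standard-library sketch. It uses 3 classes instead of 38 for brevity. The function names and the raw scores are illustrative and not taken from the benchmark model.

```python
import math


def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, true_class):
    """Log loss for one sample: negative log of the true class probability."""
    return -math.log(max(probabilities[true_class], 1e-12))


logits = [2.0, 1.0, 0.1]  # raw scores of the output neurons
probs = softmax(logits)
loss_if_class_0 = cross_entropy(probs, true_class=0)
loss_if_class_2 = cross_entropy(probs, true_class=2)
print([round(p, 3) for p in probs])
print(loss_if_class_0 < loss_if_class_2)  # True
```

The loss is small when the network assigns a high probability to the correct class and grows as that probability falls, which is what the optimizer minimizes during training.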
Conclusion
Free-Form Visualization
In this project I built a CNN model which can classify Arabic images into digits and letters. We tested the model on more than 13000 images with all possible classes and got a very high accuracy of 98.86%, which is much better than the benchmark model.
See the following comparison charts to see the clear improvement.
References
HMM Based Approach for Handwritten Arabic Word Recognition
An Arabic handwriting synthesis system
Advanced Convolutional Neural Networks
Normalization-Cooperated Gradient Feature Extraction for Handwritten Character Recognition