Generating Arabic Letters Using Generative
Master Thesis
To obtain diploma
Master of Computer Science
Thesis subject:
Submitted by :
Abderrahmane CHEBOUAT
Generative modeling has recently become one of the most important fields in deep
learning, and Generative Adversarial Networks (GANs) are a new line of research within it.
GANs have proven able to generate high-resolution images and have achieved
remarkable success in computer vision in general. By the same reasoning, we assume that this
tool will be able to generate accurate Arabic letters.
The expected benefit of this work is to build an understanding of GANs and to explore the
potential of this powerful deep learning tool in the service of Arabic calligraphy. For
future studies, this work may provide insight into the possibility of
generating new kinds of Arabic fonts (calligraphy).
Keywords: Computer vision, Arabic calligraphy, Deep learning, Generative modelling,
Generative Adversarial Networks.
Résumé:
Contents
1 Deep Learning
1.1 Introduction
1.2 Definitions
Definition1
Definition2
Definition3
Definition4
Definition5
1.3 Why deep learning?
1.4 When to use deep learning?
1.5 History and Chronology of ideas and concepts used in deep learning
1.6 Neural Networks
1.6.1 Definition
1.6.2 Activation Function
1.6.3 Types of activation functions
Binary Step Function
Linear Function
Sigmoid Function
Tanh Function
ReLU Function
3 Implementation
3.1 Introduction
3.2 Environment of Implementation
3.2.1 Python Language
3.2.2 Anaconda Distribution
3.2.3 Google Colaboratory
3.2.4 Pytorch
3.3 Program Explanation
3.3.1 Dataset
3.3.2 Input
3.3.3 Output
3.4 Results
3.5 Discussion and Conclusion
List of Figures
General introduction
Text generation has found many applications in recent years, but there are still great
prospects to be discovered in this field. Using new deep learning techniques such as
Generative Adversarial Networks, we can push development in this area forward and make it
much smoother.
When GANs first appeared, there was little tendency to use them for text generation, and
only very few studies address this subject. As calligraphy processing
overlaps to a large extent with image processing, we would like to explore the potential of
GAN algorithms for letter generation, in order to produce a study that allows us
to judge the prospects of this topic.
This work aims to study the possibility of generating letters with Generative Adversarial
Networks. GANs belong to the family of generative models, and their architectures combine
several different neural network architectures, which may make them powerful
enough to generate many kinds of data, such as images, audio, video, and text. The
research questions to be answered through this study are enumerated below:
• Why are some GANs architectures performing better than other ones?
• How can an Arabic letters dataset be used to generate a new type of Arabic
calligraphy?
• Is it possible to reach a sufficient level of quality in the generated Arabic letters
for the model to produce new Arabic calligraphy?
• Finally, what are the best tools and libraries to use to achieve
the goal of this study?
This study is structured as follows. The first chapter gives an overview of
the deep learning field to better understand the current situation. The second chapter
is fully dedicated to understanding GANs and explains all related details; it then
explains in detail the DCGAN, a direct extension of the GAN whose algorithms
and architectures have performed better than the original GAN. The last chapter
explains the implementation of our work and the environments used, then reviews the obtained
results and concludes.
Chapter 1
Deep Learning
1.1 Introduction
In this chapter, we will try to define deep learning, and then explore the methods
and generative models used in deep learning.
1.2 Definitions
Definition1
Specialized Education: a complete way of learning something that means you fully
understand it and will not forget it.
Specialized Computing: a type of artificial intelligence that uses algorithms ( = sets of
mathematical instructions or rules) based on the way the human brain operates. [1]
Definition2
A class of machine learning techniques that exploit many layers of non-linear information
processing for supervised or unsupervised feature extraction and transformation, and for
pattern analysis and classification.[2].
Definition3
Deep Learning is a new area of Machine Learning research, which has been introduced
with the objective of moving Machine Learning closer to one of its original goals: Arti-
ficial Intelligence. Deep Learning is about learning multiple levels of representation and
abstraction that help to make sense of data such as images, sound, and text. [3]
Definition4
”Deep learning is a class of machine learning algorithm that uses multiple stacked layers
of processing units to learn high level representations from unstructured data.” [4]
Definition5
Deep learning is a subset of machine learning that uses learning algorithms on multiple
levels of distributed representations to extract useful patterns from data
automatically, with as little human intervention as possible.
As shown in figure 2, deep learning automates much of the extraction of features from
data, without the human involvement that earlier machine learning algorithms
required. We should therefore learn deep learning because it is a powerful
technique for increasing the automation of AI. [5] [6] [7]
With machine learning algorithms, a domain expert must identify all the needed features in
advance to make patterns more visible to the learning algorithm. Deep learning
algorithms, by contrast, learn high-level features and extract patterns from data
incrementally, which minimizes the need for domain expertise in feature extraction.
This is a big advantage of deep learning: it makes it possible to use the massive amounts
of data of the "era of big data", and this provides huge opportunities for new
innovations.
We should use deep learning when we are trying to find complex patterns in
complex problems such as image classification, audio recognition, and natural language
processing, and when simpler models (like logistic regression) do not achieve the needed
accuracy level. In these cases deep learning shines.
It is also suited to data with a high number of dimensions, or to sequences (in
which case the time dimension will be present in our vectors).
When we do not have sufficient support from domain experts to understand which features
must be extracted, deep learning lets us worry less about the lack of
specialist support.
When the data is large (big data), we only have to make sure that results are obtained in
a reasonable time, so we should have high-performance infrastructure.
1.6.1 Definition
A neural network is a type of machine learning algorithm inspired by the way the
human brain learns. It is built from multilayer perceptrons that are able to learn
from the training data fed to them.
$$Y = \mathrm{Activation}\left(\sum (\mathrm{weight} \times \mathrm{input}) + \mathrm{bias}\right) \tag{1.1}$$
The activation function is the non-linear transformation applied to the input. The
transformed output is then sent to the next layer of neurons as input.
[8], [9], [10], [11]
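As an illustration of Eq. (1.1), the following is a minimal sketch of a single neuron in plain Python. The function names `neuron_output` and `sigmoid`, and the example weights, are assumptions chosen for illustration and are not taken from the thesis implementation.

```python
import math


def sigmoid(x):
    """Logistic activation, used here only as an example activation."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


# Hypothetical values: the weighted sum is 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0
y = neuron_output([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid)
```

Passing the activation as a parameter keeps the weighted sum separate from the non-linearity, which mirrors the way Eq. (1.1) is written.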
The activation function can act as a threshold-based classifier, i.e. it decides whether
or not the neuron should be activated. If the value Y is above a given threshold, the
neuron is activated; otherwise it is left deactivated.
Linear Function
In the step function the gradient is zero, so the weights cannot be updated during
backpropagation. Instead of a simple step function, we can try using a linear function.
f (x) = ax (1.2)
Sigmoid Function
$$f(x) = \frac{1}{1 + e^{-x}} \tag{1.3}$$
Tanh Function
The tanh function is very similar to the sigmoid function. It is actually just a scaled
version of the sigmoid function, tanh(x) = 2 sigmoid(2x) - 1, and can be written directly as
$$\tanh(x) = \frac{2}{1 + e^{-2x}} - 1 \tag{1.4}$$
ReLU Function
The ReLU function is the rectified linear unit. It is the most widely used activation
function. It is defined as f(x) = max(0, x) and is represented graphically in
figure 6.
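The activation functions described in this section can be sketched in a few lines of plain Python. This is an illustrative sketch only; the default threshold of 0 for the step function and the default slope a = 1 for the linear function are assumptions.

```python
import math


def binary_step(x, threshold=0.0):
    """1 if x is above the threshold, else 0 (threshold value is an assumption)."""
    return 1.0 if x > threshold else 0.0


def linear(x, a=1.0):
    """f(x) = a * x, Eq. (1.2)."""
    return a * x


def sigmoid(x):
    """f(x) = 1 / (1 + e^(-x)), Eq. (1.3)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """tanh(x) = 2 / (1 + e^(-2x)) - 1, Eq. (1.4)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


def relu(x):
    """f(x) = max(0, x)."""
    return max(0.0, x)


# The scaled-sigmoid identity stated in the text: tanh(x) = 2 * sigmoid(2x) - 1
x = 0.7
identity_gap = abs(tanh(x) - (2.0 * sigmoid(2.0 * x) - 1.0))
```

The value `identity_gap` should be zero up to floating-point rounding, which confirms that Eq. (1.4) and the scaled-sigmoid form describe the same function.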
However, depending on the properties of the problem, we might be able to make a
better choice for easier and quicker convergence of the network.
• Sigmoid functions and their combinations generally work better in the case of
classifiers.
• Sigmoid and tanh functions are sometimes avoided due to the vanishing gradient
problem.
• The ReLU function is a general-purpose activation function and is used in most cases these days.
• As a rule, it is better to start with the ReLU function and then move to other
activation functions if ReLU does not give optimal results. [8], [9], [11]
1.6.5 Back-Propagation:
The classification of methods used in deep learning covers a large range of aspects,
techniques, and architectures of machine learning. Depending on how these architectures are
designed to represent linear or non-linear information on multiple layers, and on what the
techniques are intended for (classification, recognition, etc.), three major categories or
classes of methods used in deep learning networks can be obtained:
(A) Deep networks for unsupervised or generative learning, which are intended to capture
high-order correlation of the observed or visible data for
pattern analysis or synthesis purposes when no information about target class labels is
available. Unsupervised feature or representation learning in the literature refers to this
category of the deep networks. When used in the generative mode, they may also be intended
to characterize joint statistical distributions of the visible data and their associated classes
when available and being treated as part of the visible data. In the latter case, the use of
Bayes rule can turn this type of generative networks into a discriminative one for learning.
Examples: RBMs, DBNs, DBMs, RNNs. [2]
(B) Deep networks for supervised learning, which are intended to directly provide
discriminative power for pattern classification
purposes, often by characterizing the posterior distributions of classes conditioned on the
visible data. Target label data are always available in direct or indirect forms for such
supervised learning. They are also called discriminative deep networks. Examples: CNN,
DNN. [2]
(C) Hybrid deep networks, where the goal is discrimination, which is assisted, often in a
significant way, with the
outcomes of generative or unsupervised deep networks. This can be accomplished by
better optimization and/or regularization of the deep networks in category (B). The goal
can also be accomplished when discriminative criteria for supervised learning are used to
estimate the parameters in any of the deep generative or unsupervised deep networks in
category (A) above. [2]
Definition1:
Convolutional neural networks (CNNs) are a special kind of assembled and connected
multilayer perceptrons used for pattern classification. We can also say they are a special class
of neural networks used to extract (learn) higher-level features through the
linear operation of "convolution".
Definition2:
Convolutional networks are simply neural networks that use convolution in place of
general matrix multiplication in at least one of their layers. [12]
Objective of CNNs
The main objective of a CNN is to extract high-level features from data via
convolutions. CNNs are very well suited to object recognition, identifying faces, individuals,
street signs, and many other aspects of visual data.
The efficacy of CNNs in image recognition is one of the main reasons why the world
recognizes the power of deep learning. As Figure 1.6 illustrates, CNNs are good at building
position and (somewhat) rotation invariant features from raw image data. CNNs are
powering major advances in machine vision, which has obvious applications for self-driving
cars, robotics, drones, and treatments for the visually impaired. [6]
In addition to two-dimensional images, CNNs are also applied to three-dimensional datasets.
Here are some examples:
• 3D Shapes dataset,
• Graph data.
CNN Architecture:
As figure 1.7 illustrates, the CNN architecture usually has three types of layers: a convolutional
layer, a pooling layer, and a fully connected layer.
Convolution Layer:
The convolution layer is the core building block of the CNN. It carries the main portion
of the network’s computational load.
This layer performs a dot product between two matrices, where one matrix is the set of
learnable parameters otherwise known as a kernel, and the other matrix is the restricted
portion of the receptive field. The kernel is spatially smaller than an image, but is more
21
DEEP LEARNING
in-depth. This means that, if the image is composed of three (RGB) channels, the kernel
height and width will be spatially small, but the depth extends up to all three channels.
During the forward pass, the kernel slides across the height and width of the image pro-
ducing the image representation of that receptive region. This produces a two-dimensional
representation of the image known as an activation map that gives the response of the
kernel at each spatial position of the image. The sliding size of the kernel is called a stride.
If we have an input of size W x W x D and D_out kernels, each with a spatial size of F,
stride S and amount of padding P, then the size of the output volume can be determined
by the following formula:
$$W_{out} = \frac{W - F + 2P}{S} + 1$$
This yields an output volume of size W_out x W_out x D_out.
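The output size for the symbols W, F, S and P defined above can be computed as sketched below. The sketch assumes the standard relation (W - F + 2P) / S + 1 with integer division; the function name `conv_output_size` and the example sizes are hypothetical.

```python
def conv_output_size(w, f, s=1, p=0):
    """Spatial output size of a convolution, assuming the standard relation
    (W - F + 2P) / S + 1 with integer (floor) division."""
    return (w - f + 2 * p) // s + 1


# Hypothetical example: a 32 x 32 input, 5 x 5 kernel, stride 1, padding 2
same_size = conv_output_size(32, 5, s=1, p=2)

# Hypothetical example: a 64 x 64 input, 4 x 4 kernel, stride 2, padding 1
halved = conv_output_size(64, 4, s=2, p=1)
```

With D_out kernels the output volume is then `W_out x W_out x D_out`, since each kernel produces one activation map.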
Pooling Layer:
The pooling layer replaces the output of the network at certain locations by deriving
a summary statistic of the nearby outputs. This helps in reducing the spatial size of the
representation, which decreases the required amount of computation and weights. The
pooling operation is processed on every slice of the representation individually.
There are several pooling functions such as the average of the rectangular neighborhood,
L2 norm of the rectangular neighborhood, and a weighted average based on the distance
from the central pixel. However, the most popular process is max pooling, which reports
the maximum output from the neighborhood.
In all cases, pooling provides some translation invariance which means that an object
would be recognizable regardless of where it appears on the frame.[7].
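Max pooling as described above can be sketched on a small activation map. This is a minimal illustration in plain Python, assuming a 2 x 2 window with stride 2; the function name `max_pool2d` and the example values are hypothetical.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Report the maximum of each size x size neighborhood of a 2D map."""
    rows = len(feature_map)
    cols = len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4 x 4 activation map
activation_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
pooled = max_pool2d(activation_map)  # 2 x 2 summary of the 4 x 4 map
```

The 4 x 4 map is reduced to 2 x 2, which is the reduction in spatial size, and therefore in computation, that the pooling layer is meant to provide.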
Neurons in this layer have full connectivity with all neurons in the preceding and
succeeding layers, as in a regular fully connected neural network (FCNN). This is why it
can be computed as usual by a matrix multiplication followed by a bias offset.
The FC layer helps map the representation between the input and the output. [7]
• LeNet: one of the earliest successful CNN architectures, developed by Yann LeCun,
used to recognize digits, read zip codes from images, etc.
• AlexNet: ILSVRC 2012 winner and the first work that popularized convolutional
networks in computer vision, developed by Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton.
• ZFNet: ILSVRC 2013 winner, which introduced the visualization concept of the
Deconvolutional Network, developed by Matthew Zeiler and Rob Fergus.
• ResNet: the residual network was the winner of ILSVRC 2015. It features special skip
connections and a heavy use of batch normalization. [7], [13]
In this section, we describe powerful and advanced models in deep learning that
have become widely used in many subfields of machine learning. "Generative models", or
"deep generative models", are an active and rapidly advancing research area that seeks more
connections between the diverse classes of deep learning models.
In recent years, with the remarkable progress in the methodology and technology of
deep learning, scientists and specialists in this field have started to ask whether
we are now able to build machines that are in themselves creative.
Generative modelling aims to make progress toward answering this question.
Based on unsupervised learning, generative models are able to learn structure and
features from data and then generate data similar to the data the model was trained
on. Neural networks combined with progress in stochastic optimization methods
have enabled scalable modeling of complex, high-dimensional data including images,
text, and speech.
”Generative modeling is the art and science of engineering a family of probability dis-
tributions that is simultaneously rich, parsimonious, and tractable”. [14]
• Generative models of time-series data can be used for simulation and planning
(reinforcement learning applications).
Among generative models, two of the most commonly used and efficient approaches
are Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN). A VAE
aims at maximizing the lower bound of the data log-likelihood, and a GAN aims at achieving
an equilibrium between Generator and Discriminator. In the next section we
explain the principle of the VAE and then mention some other generative models; the
next chapter is fully dedicated to explaining GANs in detail.
First we describe the autoencoder, then we explain how Variational Autoencoders
differ from it.
Autoencoders
An encoder: a network that compresses high-dimensional input data into a lower-dimensional
representation vector.
A decoder: a network that decompresses a given representation vector back to the original
domain. This process is shown in Figure 1.10.
The network is trained to find weights for the encoder and decoder that minimize the
loss between the original input and the reconstruction of the input after it has passed
through the encoder and decoder.
The representation vector is a compression of the original image into a lower dimensional,
latent space. The idea is that by choosing any point in the latent space, we should be able
to generate novel images by passing this point through the decoder, since the decoder has
learned how to convert points in the latent space into viable images. [4]
Comparing the Variational Autoencoders with Autoencoders, there are two parts that
should be changed - the encoder and the loss function.
The Encoder: In an autoencoder, each image is mapped directly to one point in the
latent space. In a variational autoencoder, each image is instead mapped to a multivariate
normal distribution around a point in the latent space, as shown in Figure 1.11.
Figure 1.11: Encoders - the difference between an autoencoder and a variational autoen-
coder
To summarize, the encoder will take each input image and encode it to two vectors,
mu and log-var, that together define a multivariate normal distribution on the latent space.
To encode an image into a specific point z in the latent space, we can sample from this
distribution, using the following equation:
$$z = \mu + \sigma \cdot \epsilon \tag{1.8}$$
Where epsilon is a point sampled from the standard normal distribution. [4]
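The sampling step of Eq. (1.8) can be sketched in plain Python. The sketch assumes that the encoder outputs the logarithm of the variance for each latent dimension, so that sigma = exp(log_var / 2); the function name `sample_latent` is hypothetical.

```python
import math
import random


def sample_latent(mu, log_var, rng=random):
    """Draw z = mu + sigma * epsilon for each latent dimension.

    Assumption: log_var holds the log of the variance, so the standard
    deviation is sigma = exp(log_var / 2).
    """
    z = []
    for m, lv in zip(mu, log_var):
        sigma = math.exp(lv / 2.0)
        epsilon = rng.gauss(0.0, 1.0)  # point from the standard normal
        z.append(m + sigma * epsilon)
    return z


# Hypothetical two-dimensional latent space centred on the origin
z = sample_latent([0.0, 0.0], [0.0, 0.0])
```

Writing the sample as a deterministic function of mu, sigma and an external noise term is what lets gradients flow back to the encoder outputs during training.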
Previously, we saw how there was no requirement for the latent space to be continuous.
Now, since we are sampling at random from an area around mu, the decoder must ensure
that all points in the same neighborhood produce very similar images when decoded, so
that the reconstruction loss remains small. This is a very nice property that ensures that
even when we choose a point in the latent space that has never been seen by the decoder,
it is likely to decode to an image that is well-formed. [4], [17]
Previously, our loss function only consisted of the RMSE loss between images and their
reconstruction after being passed through the encoder and decoder. This reconstruction
loss also appears in a variational autoencoder, but we also require one extra component
- the KL divergence. KL divergence (Kullback–Leibler divergence) is a way of measuring
how much one probability distribution is different from another. In a VAE, we want to
measure how different our normal distribution with parameters mu and log-var is from the
standard normal distribution. In this special case, the KL divergence has the closed form
given below:
$$\text{kl-loss} = -0.5 \sum \left(1 + \text{log-var} - \mu^2 - \exp(\text{log-var})\right) \tag{1.9}$$
The sum is taken over all the dimensions in the latent space. kl-loss is minimised to 0
when mu = 0 and log-var = 0 for all dimensions. As these two terms start to differ from 0,
kl-loss increases. In summary, the KL divergence term penalises the network for encoding
observations to mu and log-var variables that differ significantly from the parameters of a
standard normal distribution, namely: mu = 0 and log-var = 0.
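The closed form of Eq. (1.9) is short enough to check directly. The following is a minimal sketch in plain Python; the function name `kl_loss` and the example vectors are assumptions for illustration.

```python
import math


def kl_loss(mu, log_var):
    """Eq. (1.9): -0.5 * sum(1 + log_var - mu^2 - exp(log_var)),
    summed over all dimensions of the latent space."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )


at_standard_normal = kl_loss([0.0, 0.0], [0.0, 0.0])  # mu = 0, log-var = 0
shifted_mean = kl_loss([1.0, 0.0], [0.0, 0.0])        # mu moves away from 0
wider_variance = kl_loss([0.0, 0.0], [1.0, 0.0])      # log-var moves away from 0
```

The first value is zero and the other two are positive, which matches the statement that the penalty grows as mu and log-var move away from 0.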
Why does this addition to the loss function help?:
Firstly, we now have a well-defined distribution that we can use for choosing points in
the latent space - the standard normal distribution. If we sample from this distribution,
we know that we’re very likely to get a point that lies within the limits of what the
VAE is used to seeing. Secondly, since this term tries to force all encoded distributions
towards the standard normal, there is less chance that large gaps will form between point
clusters. Instead, the encoder will try to use the space around the origin symmetrically
and efficiently. Figure 1.12 shows the classification of generative models based on the
type of density. [4], [17]
• Boltzmann Machines.
1.10 Conclusion
In this chapter, we introduced the concepts of deep learning and why and when to use it.
We explained the difference between deep learning and machine learning:
machine learning is a method of statistical learning where each instance in a
dataset is described by a set of features or attributes, whereas deep learning is a method of
statistical learning that automatically extracts the features or attributes from raw data.
We also explained the different classes of deep learning, which led us to one of the most important
current concepts, "generative models", and presented one of the most important types
of this model, "Variational Autoencoders". In the next chapter, we introduce and
explain the most commonly used and efficient model, "Generative Adversarial Networks",
which will give us the necessary foundations to perform the implementation effectively in
the last part of this study.
Chapter 2
2.1 Introduction:
John Romero says, "You might not think that programmers are artists, but
programming is an extremely creative profession. It's logic-based creativity". Recently, deep
learning has given a new boost to this creativity through generative models, especially
through Generative Adversarial Networks (GANs). This powerful deep learning tool
allowed Christie's (an auction house) to sell its first AI portrait for $432,500, beating
estimates of $10,000 (the portrait, generated by a GAN, shows the figure of what looks
like an 18th century gentleman).
Also, the 2018 Turing Award was given to a trio of
researchers who laid the foundations for the current boom in artificial intelligence: Yoshua
Bengio, Geoffrey Hinton, and Yann LeCun (this prize is known as the "Nobel Prize of
computer science"). This is a significant reward for deep learning and all fields of
artificial intelligence, because artificial intelligence had received almost no recognition of
this kind from the IT community before.
GENERATIVE ADVERSARIAL NETWORK (GANS)
Yann LeCun describes GANs as "the most interesting idea in the last 10 years in
Machine Learning".
All of the above means that the impact and potential of deep generative
models for innovation in many domains have become very large.
In this chapter, we present the theoretical understanding of GANs, in order to
explore the possibility of generating Arabic letters using DCGANs (Deep Convolutional GANs),
which is the main objective of this project.
Generative Adversarial Nets (GANs) are a kind of neural network whose
architecture is based on generative modeling (described in the 2014 paper by Ian Goodfellow).
More generally, generative modeling involves using a model to generate new examples
that plausibly come from an existing distribution of samples, such as generating new
images that are similar to, but specifically different from, a dataset of existing images.
The GAN model architecture involves two sub-models: a generator model for generating
new examples and a discriminator model for classifying whether generated examples are
real, from the domain, or fake, generated by the generator model.
• Generator. Model that is used to generate new plausible examples from the problem
domain.
• Discriminator. Model that is used to classify examples as real (from the domain) or
fake (generated).
Generative adversarial networks are based on a game theoretic scenario in which the
generator network must compete against an adversary. The generator network directly
produces samples. Its adversary, the discriminator network, attempts to distinguish be-
tween samples drawn from the training data and samples drawn from the generator.[12]
[17] [4]
The basic architecture of the GAN was described by Ian Goodfellow in 2014. A GAN consists of
two neural networks playing a game with each other. The discriminator tries to determine
whether information is real or fake. The other neural network, called a generator, tries to
create data that the discriminator thinks is real.
This concept is depicted in Figure 2.1. The neural network at the top is the discriminator,
and its task is to distinguish the training set’s real information from the generator’s
creations. In the simplest GAN structure, the generator starts with random data and
learns to transform this noise into information that matches the distribution of the real
data.
The generator never sees the genuine data; it must learn to create realistic information
by receiving feedback from the discriminator. This is called adversarial loss, and when
implemented correctly it works surprisingly well. In fact, regularization techniques such
as dropout layers are often used in GANs because the generator can overfit the training
set through this entirely indirect learning process.
The longer these two neural networks play this game, the more they sharpen each other’s
skills. The discriminator becomes very good at detecting fake data while the generator
learns to produce information that is indistinguishable from what is observed in the real
world.
When we end up with two GAN neural networks that are very good at what they do,
how could we use them? A trained discriminator can be used for detecting abnormalities,
outliers, and anything out of the ordinary. This could be very valuable in fields such as
cybersecurity, radiology, astronomy, and manufacturing.
A skilled generator is used for making creations. Once the generator learns the distri-
bution of the training data, we can sample the generator an unlimited number of times for
realistic outputs such as images, language, pharmaceuticals, numerical simulations, and
just about anything else one can imagine. [18], [4], [17]
To understand GANs, we should know how generative algorithms work, and for that,
contrasting them with discriminative algorithms is instructive. Discriminative algorithms
try to classify input data; that is, given the features of an instance of data, they predict
a label or category to which that data belongs.
For example, given all the words in an email (the data instance), a discriminative
algorithm could predict whether the message is spam or not-spam. Spam is one of the labels,
and the bag of words gathered from the email are the features that constitute the input
data. When this problem is expressed mathematically, the label is called y and the
features are called x. The formulation p(y|x) is used to mean “the probability of y given
x”, which in this case would translate to “the probability that an email is spam given the
words it contains.”
So discriminative algorithms map features to labels. They are concerned solely with that
correlation. One way to think about generative algorithms is that they do the opposite.
Instead of predicting a label given certain features, they attempt to predict features given
a certain label.
The question a generative algorithm tries to answer is: Assuming this email is spam, how
likely are these features? While discriminative models care about the relation between y
and x, generative models care about “how you get x.” They allow you to capture p(x|y),
the probability of x given y, or the probability of features given a label or category. (That
said, generative algorithms can also be used as classifiers. It just so happens that they
can do more than categorize input data.) [19]
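The difference between p(y|x) and p(x|y) can be made concrete with simple counting. The sketch below uses a tiny invented set of labelled emails, each reduced to a single word; the data and the function names are hypothetical and serve only to illustrate the two conditional probabilities.

```python
# Hypothetical toy data: (label, word) pairs, one word per email
emails = [
    ("spam", "offer"), ("spam", "offer"), ("spam", "hello"), ("spam", "hello"),
    ("ham", "offer"), ("ham", "hello"), ("ham", "hello"),
    ("ham", "hello"), ("ham", "hello"), ("ham", "hello"),
]


def p_label_given_word(label, word):
    """Discriminative view: p(y|x), the probability of a label given a feature."""
    labels_with_word = [l for l, w in emails if w == word]
    return labels_with_word.count(label) / len(labels_with_word)


def p_word_given_label(word, label):
    """Generative view: p(x|y), the probability of a feature given a label."""
    words_with_label = [w for l, w in emails if l == label]
    return words_with_label.count(word) / len(words_with_label)


spam_given_offer = p_label_given_word("spam", "offer")  # 2 of 3 "offer" emails
offer_given_spam = p_word_given_label("offer", "spam")  # 2 of 4 spam emails
```

The two quantities differ on the same data, which is the point of the contrast: one answers "is this email spam?", the other "what does a spam email look like?".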
Another way to think about it is to distinguish discriminative from generative like this:
$$D_{KL}(p \| q) = \int_x p(x) \log \frac{p(x)}{q(x)} \, dx \tag{2.1}$$
$D_{KL}$ achieves its minimum, zero, when $p(x) = q(x)$ everywhere.
$$D_{JS}(p \| q) = \frac{1}{2} D_{KL}\left(p \,\Big\|\, \frac{p+q}{2}\right) + \frac{1}{2} D_{KL}\left(q \,\Big\|\, \frac{p+q}{2}\right) \tag{2.2}$$
Some believe that one reason behind GANs’ big success is switching the loss function
from asymmetric KL divergence in traditional maximum-likelihood approach to symmet-
ric JS divergence.
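The asymmetry of the KL divergence and the symmetry of the JS divergence can be checked on small discrete distributions. This sketch replaces the integrals of Eqs. (2.1) and (2.2) with sums, which is an assumption made for illustration; the function names and the example distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Discrete analogue of Eq. (2.1): sum of p(x) * log(p(x) / q(x)).
    Terms with p(x) = 0 contribute nothing; q(x) must be positive elsewhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Discrete analogue of Eq. (2.2), using the mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical distributions over two outcomes
p = [0.9, 0.1]
q = [0.5, 0.5]

kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)  # differs from kl_pq: KL is asymmetric
js_pq = js_divergence(p, q)
js_qp = js_divergence(q, p)  # equals js_pq: JS is symmetric
```

With natural logarithms the JS divergence is also bounded above by log 2, whereas the KL divergence is unbounded.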
• A discriminator D estimates the probability of a given sample coming from the real
dataset. It works as a critic and is optimized to tell the fake samples from the real
ones.
These two models compete against each other during the training process: the generator
G is trying hard to trick the discriminator, while the critic model D is trying hard not
to be cheated. This interesting zero-sum game between two models motivates both to
Given,
On one hand, we want to make sure the discriminator D's decisions over real data are
accurate by maximizing $\mathbb{E}_{x \sim p_r(x)}[\log D(x)]$. Meanwhile, given a fake
sample $G(z)$, $z \sim p_z(z)$, the discriminator is expected to output a probability,
$D(G(z))$, close to zero by maximizing $\mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$.
On the other hand, the generator is trained to increase the chances of D producing a
high probability for a fake example, thus to minimize
$\mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$.
When combining both aspects together, D and G are playing a minimax game in which
we should optimize the following loss function:

$$\begin{aligned}
\min_G \max_D L(D, G) &= \mathbb{E}_{x \sim p_r(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \\
&= \mathbb{E}_{x \sim p_r(x)}[\log D(x)] + \mathbb{E}_{x \sim p_g(x)}[\log(1 - D(x))]
\end{aligned} \tag{2.3}$$

($\mathbb{E}_{x \sim p_r(x)}[\log D(x)]$ has no impact on G during gradient descent updates.)
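To make the two expectations in (2.3) concrete, the sketch below estimates the value
function from a handful of discriminator outputs by replacing each expectation with a
sample mean. The helper name `gan_value` and all probabilities are hypothetical and only
serve to show how the value reacts when the discriminator is confident, fooled, or
undecided.

```python
import math

def gan_value(d_real, d_fake):
    """Sample estimate of L(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples, each strictly in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated samples, each strictly in (0, 1)
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical discriminator outputs (illustrative numbers, not measured results).
confident_d = gan_value(d_real=[0.9, 0.95, 0.85], d_fake=[0.1, 0.05, 0.2])
fooled_d = gan_value(d_real=[0.9, 0.95, 0.85], d_fake=[0.8, 0.9, 0.7])
undecided_d = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

The value is highest when D separates real from fake samples, drops when G fools D, and
equals -2 log 2 when D outputs 0.5 for every sample.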
Now we have a well-defined loss function. Let's first examine what is the best value for
D.

$$L(G, D) = \int_x \Big( p_r(x) \log(D(x)) + p_g(x) \log(1 - D(x)) \Big) \, dx \tag{2.4}$$
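Since we are interested in the value of $D(x)$ that maximizes $L(G, D)$, let us label:

$$\tilde{x} = D(x), \quad A = p_r(x), \quad B = p_g(x).$$

What is inside the integral (we can safely ignore the integral because x is
sampled over all the possible values) is then:

$$\begin{aligned}
f(\tilde{x}) &= A \log \tilde{x} + B \log(1 - \tilde{x}) \\
\frac{d f(\tilde{x})}{d \tilde{x}} &= A \frac{1}{\ln 10} \frac{1}{\tilde{x}} - B \frac{1}{\ln 10} \frac{1}{1 - \tilde{x}} \\
&= \frac{1}{\ln 10} \left( \frac{A}{\tilde{x}} - \frac{B}{1 - \tilde{x}} \right) \\
&= \frac{1}{\ln 10} \frac{A - (A + B)\tilde{x}}{\tilde{x}(1 - \tilde{x})}
\end{aligned} \tag{2.5}$$

Setting this derivative to zero gives the best value of the discriminator:

$$D^*(x) = \tilde{x}^* = \frac{A}{A + B} = \frac{p_r(x)}{p_r(x) + p_g(x)} \in [0, 1].$$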
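The closed form for the optimal discriminator can be checked numerically. The sketch
below maximizes the integrand by brute force over a fine grid and compares the result
with A / (A + B). The values of A and B are hypothetical density values at a single
point, and the helper names are illustrative; the base of the logarithm does not change
the location of the maximum.

```python
import math

def f(x, a, b):
    """Integrand of the loss at one point: a * log(x) + b * log(1 - x)."""
    return a * math.log(x) + b * math.log(1.0 - x)

def argmax_on_grid(a, b, steps=10000):
    """Brute-force maximiser of f over a fine grid strictly inside (0, 1)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda x: f(x, a, b))

# Hypothetical density values at a single point x: A = p_r(x), B = p_g(x).
A, B = 0.6, 0.2
numeric = argmax_on_grid(A, B)
closed_form = A / (A + B)  # D*(x) from the derivation above

# When both densities agree, the optimal discriminator cannot do better than 1/2.
numeric_equal = argmax_on_grid(0.3, 0.3)
```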
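When the generator is also optimal, $p_g = p_r$ and therefore $D^*(x) = 1/2$, so the loss
becomes:

$$\begin{aligned}
L(G, D^*) &= \int_x \Big( p_r(x) \log(D^*(x)) + p_g(x) \log(1 - D^*(x)) \Big) \, dx \\
&= \log \frac{1}{2} \int_x p_r(x) \, dx + \log \frac{1}{2} \int_x p_g(x) \, dx \\
&= -2 \log 2
\end{aligned} \tag{2.6}$$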
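The JS divergence between $p_r$ and $p_g$ can be written as:

$$\begin{aligned}
D_{JS}(p_r \| p_g) &= \frac{1}{2} D_{KL}\left(p_r \,\Big\|\, \frac{p_r + p_g}{2}\right) + \frac{1}{2} D_{KL}\left(p_g \,\Big\|\, \frac{p_r + p_g}{2}\right) \\
&= \frac{1}{2} \left( \log 2 + \int_x p_r(x) \log \frac{p_r(x)}{p_r(x) + p_g(x)} \, dx \right)
 + \frac{1}{2} \left( \log 2 + \int_x p_g(x) \log \frac{p_g(x)}{p_r(x) + p_g(x)} \, dx \right) \\
&= \frac{1}{2} \Big( \log 4 + L(G, D^*) \Big)
\end{aligned} \tag{2.7}$$

Thus,

$$L(G, D^*) = 2 D_{JS}(p_r \| p_g) - 2 \log 2.$$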
Essentially, the loss function of GAN quantifies the similarity between the generative
data distribution $p_g$ and the real sample distribution $p_r$ by JS divergence when the
discriminator is optimal. The best $G^*$ that replicates the real data distribution leads
to the minimum $L(G^*, D^*) = -2 \log 2$, which is aligned with the equations above. [20]
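The relation between the loss at the optimal discriminator and the JS divergence can be
verified on a discrete example, where the integrals become sums. This is a sketch under
stated assumptions: the two distributions are hypothetical, every outcome has non-zero
probability, natural logarithms are used throughout, and the helper names are
illustrative.

```python
import math

def kl(p, q):
    """Discrete KL divergence D_KL(p || q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Discrete JS divergence D_JS(p || q)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def loss_at_optimal_d(p_r, p_g):
    """L(G, D*) with D*(x) = p_r(x) / (p_r(x) + p_g(x)), summed over discrete x."""
    total = 0.0
    for r, g in zip(p_r, p_g):
        d_star = r / (r + g)
        total += r * math.log(d_star) + g * math.log(1.0 - d_star)
    return total

# Hypothetical real and generated distributions over three outcomes.
p_r = [0.5, 0.3, 0.2]
p_g = [0.2, 0.3, 0.5]

lhs = loss_at_optimal_d(p_r, p_g)          # L(G, D*)
rhs = 2 * js(p_r, p_g) - 2 * math.log(2)   # 2 D_JS(p_r || p_g) - 2 log 2
best = loss_at_optimal_d(p_r, p_r)         # generator matches the data: -2 log 2
```

Both sides agree up to rounding, and the loss reaches its minimum of -2 log 2 only when
the generated distribution equals the real one.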
GANs are now a very active research area, and there are many different types of GAN
implementations. Below we mention some important ones that are widely used in
research:
• Vanilla GAN: the simplest type of GAN, where the generator and the discriminator
are simple multi-layer perceptrons (SGD is used for optimization).
• Conditional GAN (CGAN): in CGAN, some conditional parameters are added to help the
generator generate data and to help the discriminator distinguish the real data from
the fake ones.
• Deep Convolutional GAN (DCGAN): the most popular and successful implementation
of GAN (we will explain it in detail in the next section).
• Context-Conditional GAN.
• Wasserstein GAN.
Current deep-learning research still focuses on extracting reusable features from huge
datasets. In computer vision, good features extracted from unlabeled images are very
helpful in many supervised learning tasks such as image classification. One solution for
this objective is based on training GANs and then using the generator and discriminator
networks as feature extractors for supervised tasks. However, the training of GANs is
unstable, and the generator sometimes gives nonsensical results.
So, in order to make GANs more stable during training, a new extension of the GAN called
Deep Convolutional Generative Adversarial Networks (DCGANs) was introduced by Alec
Radford and his team.
A DCGAN is mainly composed of convolution layers without max pooling or fully connected
layers. It uses strided convolutions for downsampling and transposed convolutions for
upsampling. For that reason DCGANs became the most popular and successful members of
the GAN family, and today most GANs are at least loosely based on the DCGAN
architecture.
Below are the architecture guidelines that make Deep Convolutional GANs stable:
• Replace any pooling layers with strided convolutions (discriminator) and
fractional-strided convolutions (generator).
• Use ReLU activation in the generator for all layers except for the output, which uses
Tanh.
Figure 2.3 below shows the network design for the generator. [21]
• All weights should be initialized from a zero-centered Normal distribution with
standard deviation 0.02.
• For LeakyReLU, the slope of the leak was set to 0.2 in all models.
• Use the Adam optimizer with tuned hyperparameters: a learning rate of 0.0002 and the
hyperparameter beta1 set to 0.5.
Figure 2.4 below shows the network design for the discriminator. [21]
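The guidelines above can be sketched in PyTorch as follows. This is a minimal
illustration for 64x64 single-channel images, not the exact model of [21] or of this
thesis: the latent size and feature widths (NZ, NGF, NDF) are assumed values, LeakyReLU
is placed in the discriminator as in [21], the second Adam beta is left at the PyTorch
default of 0.999, and the batch normalization layers that [21] also recommends are left
out to keep the sketch short.

```python
import torch
import torch.nn as nn

NZ, NGF, NDF, NC = 100, 16, 16, 1  # latent size, feature widths, channels (assumed)

generator = nn.Sequential(
    # Fractional-strided (transposed) convolutions upsample 1x1 -> 64x64.
    nn.ConvTranspose2d(NZ, NGF * 8, 4, 1, 0, bias=False), nn.ReLU(True),       # 4x4
    nn.ConvTranspose2d(NGF * 8, NGF * 4, 4, 2, 1, bias=False), nn.ReLU(True),  # 8x8
    nn.ConvTranspose2d(NGF * 4, NGF * 2, 4, 2, 1, bias=False), nn.ReLU(True),  # 16x16
    nn.ConvTranspose2d(NGF * 2, NGF, 4, 2, 1, bias=False), nn.ReLU(True),      # 32x32
    nn.ConvTranspose2d(NGF, NC, 4, 2, 1, bias=False), nn.Tanh(),               # 64x64
)

discriminator = nn.Sequential(
    # Strided convolutions downsample 64x64 -> 1x1; there are no pooling layers.
    nn.Conv2d(NC, NDF, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, inplace=True),           # 32x32
    nn.Conv2d(NDF, NDF * 2, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, inplace=True),      # 16x16
    nn.Conv2d(NDF * 2, NDF * 4, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, inplace=True),  # 8x8
    nn.Conv2d(NDF * 4, NDF * 8, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, inplace=True),  # 4x4
    nn.Conv2d(NDF * 8, 1, 4, 1, 0, bias=False), nn.Sigmoid(),                           # 1x1
)

def init_weights(module):
    """Zero-centred Normal initialisation with standard deviation 0.02."""
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)

generator.apply(init_weights)
discriminator.apply(init_weights)

# Adam with learning rate 0.0002 and beta1 = 0.5, as listed in the guidelines.
opt_g = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

z = torch.randn(2, NZ, 1, 1)   # a batch of two latent vectors
fake = generator(z)            # shape (2, NC, 64, 64), values in [-1, 1]
score = discriminator(fake)    # shape (2, 1, 1, 1), probabilities in [0, 1]
```

Each transposed convolution with kernel 4, stride 2 and padding 1 doubles the spatial
size, and each strided convolution with the same settings halves it, which is why no
pooling or fully connected layer is needed.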
2.4 Conclusion
In this chapter, we have explained GANs and DCGANs, with their architectures and
parameters. We have concluded that increasing the complexity of the generator does
not necessarily improve the image quality, and that the simplicity of DCGAN makes it
stable and a good starting point for other projects. In the next chapter,
“Implementation”, we will try to build a DCGAN that generates Arabic letters, which
allows us to test what we have studied in this chapter.
Chapter 3
Implementation
3.1 Introduction
Arabic calligraphy (Islamic calligraphy) is the artistic practice of handwriting and
calligraphy in the various languages and lands that use Arabic letters and share a common
Islamic cultural heritage. It includes Arabic, Ottoman, and Persian calligraphy, and it
is characterized by connected writing, which makes it possible to form different geometric
shapes through extension, return, rotation, elevation, and overlap.
It is also known in Arabic as khatt Islami, meaning Islamic line, design, or
construction. It is also used in the ornamentation of manuscripts and books, especially
copies of the Holy Quran.
Popular Arabic calligraphy styles: (1) Naskh, (2) Diwani, (3) Thuluth, (4) Reqa, (5) Koufi.
The development environment that we have worked on consists of the following tools:
• Python Language
• Anaconda Distribution
• PyTorch library
• Google Colaboratory
• open source: you are allowed to view and modify the source
Anaconda is a free and open-source distribution of the Python and R programming
languages for scientific computing (data science, machine learning applications,
large-scale data processing, predictive analytics, etc.) that aims to simplify package
management and deployment. Package versions are managed by the package management
system conda.
• Anaconda Version: 3.
Deep learning developers, especially students, often run into the limitations of their
devices: their programs need high-performance computers with GPUs or TPUs, which are
not available in most cases because of the high price of such equipment. This matters
much less since the launch of the Google Colab service.
• We can connect Colab to our local machine through a local Jupyter Notebook and
work directly on Google Colab GPUs and TPUs.
• It comes with pre-installed frameworks and libraries for deep learning, with the
possibility to customize the environment by adding new libraries.
• From Colab we can mount Google Drive to save, import, and share our projects and
data.
• It provides access to TPUs.
3.2.4 PyTorch
PyTorch is a Python-based library built to provide flexibility as a deep learning
development platform. The workflow of PyTorch is as close as you can get to Python's
scientific computing library, NumPy. It is free and open-source, and it provides many
high-level features:
• Tensor computing (like NumPy) with strong acceleration via graphics processing
units (GPUs).
• Python support: as mentioned above, PyTorch smoothly integrates with the Python
data science stack. It is so similar to NumPy that you might not even notice the
difference.
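The NumPy-like tensor interface and the optional GPU acceleration can be illustrated
with a few lines. This is a small sketch with arbitrary example values; it falls back
to the CPU when no CUDA device is visible.

```python
import torch

# Pick the GPU when one is visible, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors support NumPy-style creation, reshaping, broadcasting and reductions.
a = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
b = torch.ones(3)                                        # broadcast across both rows
c = (a + b).to(device)                                   # same code on CPU or GPU

row_sums = c.sum(dim=1).cpu()  # sum of each row, moved back to the CPU
```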
The practical part of our project is a program written in Python in which we train our
DCGAN on many pictures of handwritten letters (the dataset) so that it generates new
Arabic letters, using the PyTorch library explained previously. The explanation of the
implementation starts with a description of the dataset used; then we define the inputs
and parameters needed to run the program; in the last part, we look at the results the
program produces.
3.3.1 Dataset
The dataset is composed of handwritten pictures of the Arabic alphabet (from 'alef' to
'yeh', 480 images per character) with an image size of 32x32 (for DCGAN the standard
image size is supposed to be 64x64), because it's the only dataset available for
handwritten Arabic
3.3.2 Input
• GPUs-number: on Colab the value is 1 to use the GPU or 0 to use the CPU; to run the
program on a local computer without a GPU (such as an Nvidia graphics card) we should
use 0.
• img-size: the size of the training images; for our program we use size 32.
• nc: the number of image channels; 3 for color training images, 1 for black and white
images.
• Beta1: hyperparameter for the Adam optimizers; we will use 0.5 (the DCGAN standard).
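The inputs listed above can be grouped in one small configuration object. The sketch
below is an assumption about how such parameters might be organized; the class name,
field names, and helper methods are hypothetical and are not taken from the thesis
program.

```python
from dataclasses import dataclass

@dataclass
class DCGANConfig:
    ngpu: int = 1        # GPUs-number: 1 to use a GPU (e.g. on Colab), 0 for CPU only
    img_size: int = 32   # img-size: side length of the training images
    nc: int = 1          # nc: 3 for colour images, 1 for black and white images
    beta1: float = 0.5   # Beta1: Adam hyperparameter (DCGAN standard)

    def device_name(self, cuda_available: bool) -> str:
        """Return the device string to train on."""
        return "cuda:0" if (self.ngpu > 0 and cuda_available) else "cpu"

    def image_shape(self) -> tuple:
        """Shape (channels, height, width) of one training image."""
        return (self.nc, self.img_size, self.img_size)

colab_cfg = DCGANConfig()        # defaults match the values described above
local_cfg = DCGANConfig(ngpu=0)  # local computer without a GPU
```

Keeping the values in one place makes it easier to switch between the Colab GPU run and
a local CPU run without touching the rest of the program.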
3.3.3 Output
First output:
We use the image subdirectories in the dataset folder to create the data loader on which
the program runs, then visualize some training images.
Second output:
We get the printed models of the generator and the discriminator to see how they are
structured, then the training loop runs; depending on the chosen number of epochs, this
step might take a while.
Third output:
• We show how the losses change during training by plotting D's and G's losses versus
training iterations.
• We show a batch of real data next to a batch of fake data from G.
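The training loop and the recorded losses described above can be sketched as follows.
Several simplifications are assumed here and are not taken from the thesis program:
small fully connected stand-in networks replace the convolutional DCGAN models, random
tensors replace the batches of letter images, and the generator uses the common
non-saturating loss (maximizing log D(G(z)) rather than minimizing log(1 - D(G(z)))).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
NZ, IMG = 16, 32 * 32  # latent size and flattened 32x32 image (assumed values)

# Tiny stand-in networks; a real DCGAN uses convolutional models instead.
G = nn.Sequential(nn.Linear(NZ, 64), nn.ReLU(), nn.Linear(64, IMG), nn.Tanh())
D = nn.Sequential(nn.Linear(IMG, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1), nn.Sigmoid())

criterion = nn.BCELoss()
opt_d = torch.optim.Adam(D.parameters(), lr=0.0002, betas=(0.5, 0.999))
opt_g = torch.optim.Adam(G.parameters(), lr=0.0002, betas=(0.5, 0.999))

def train_step(real):
    """One alternating update: first D on a real and a fake batch, then G."""
    n = real.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # Discriminator update: push D(x) towards 1 and D(G(z)) towards 0.
    opt_d.zero_grad()
    fake = G(torch.randn(n, NZ))
    loss_d = criterion(D(real), ones) + criterion(D(fake.detach()), zeros)
    loss_d.backward()
    opt_d.step()

    # Generator update (non-saturating form): push D(G(z)) towards 1.
    opt_g.zero_grad()
    loss_g = criterion(D(fake), ones)
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()

# Random tensors in [-1, 1] stand in for batches of real letter images.
d_losses, g_losses = [], []
for _ in range(3):
    ld, lg = train_step(torch.rand(8, IMG) * 2 - 1)
    d_losses.append(ld)
    g_losses.append(lg)
```

The two lists hold one loss value per iteration, so plotting them against the iteration
index gives the kind of D and G loss curves described in the third output.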
3.4 Results:
Figure 3.3: Real vs Fake losses Optim: Adam, size: 64, epochs:5
Figure 3.5: Real vs Fake losses Optim: SGD, size: 64, epochs:5
Figure 3.7: Real vs Fake losses Optim: Adam, size: 32, epochs:180
Figure 3.9: Real vs Fake losses Optim: Adam, size: 64, epochs:150
The implementation section has allowed us to discover many techniques and platforms
that make the implementation of deep learning projects smoother and easier, and that
allow us to get results in a reasonable time.
Google Colaboratory makes GPUs and TPUs available to anyone interested in implementing
deep learning projects. Also, the choice of the right frameworks and libraries (like
PyTorch) makes the programming of GANs in particular, and neural networks in general,
very easy once the topic is really understood. The results achieved in generating
Arabic letters can be summarized as follows:
• If we compare figures (3.2, 3.3) with (3.4, 3.5), it can be seen that the Adam
optimizer achieves substantially better results than SGD. Therefore, the SGD
optimizer is not used in the standard version of this model (DCGAN), unlike the
Adam optimizer.
• If we now compare figures (3.2, 3.3) with (3.6, 3.7) and (3.8, 3.9), where we used
the same optimizer (Adam) but with different numbers of training epochs, we see how
the results evolve when we apply more training epochs; the difference is very
noticeable.
• The last comparison is between (3.6, 3.7) and (3.8, 3.9), where there seem to be no
significant differences between the two results. This makes us assume that there may
be limits to learning from small representation sizes (this is just an assumption
that may be verified in further studies).
General Conclusion
In this work, we tried to build a perception by projecting the success of GANs in
computer vision onto the generation of Arabic letters, using a specific GAN extension
called DCGAN (which has a structural difference). The results were impressive despite
the limitations of the dataset, which forced us to modify the standard DCGAN model to
adapt it to the size of the training images.
From the architectures tested during this work, conclusions have been compiled and
described after the implementation. Some of them are not conclusive, but they are based
on the results of the experimentation.
After completing this work, we can say that Generative Adversarial Networks trained
with gradient-based optimization methods are a powerful technique that has proved its
ability to absorb many representations (in our case, Arabic handwritten letters). This
makes generating new Arabic calligraphy using DCGAN seem like a very possible horizon
in the near future.
Bibliography
[1] https://dictionary.cambridge.org/dictionary/english/deep-learning.
[2] Li Deng, Dong Yu, et al. Deep learning: methods and applications. Foundations and
Trends® in Signal Processing, 7(3–4):197–387, 2014.
[3] https://github.com/lisa-lab/deeplearningtutorials.
[5] https://medium.com/the-deep-learning-methods.
[6] Josh Patterson and Adam Gibson. Deep learning: A practitioner's approach.
O'Reilly Media, Inc., 2017.
[7] https://www.datascience.com/blog/convolutional-neural-network.
[8] https://www.analyticsvidhya.com/blog/2017/10/fundamentals-deep-learning.
[9] Christopher M Bishop. Pattern recognition and machine learning. Springer, 2006.
[10] Simon S Haykin. Neural networks and learning machines, volume 3. Pearson Education,
Upper Saddle River, 2009.
[12] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley,
Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In
Advances in neural information processing systems, pages 2672–2680, 2014.
[13] http://cs231n.github.io/convolutional-networks/.
[17] Ian Goodfellow. Nips 2016 tutorial: Generative adversarial networks. arXiv preprint
arXiv:1701.00160, 2016.
[18] https://towardsdatascience.com/adversarial-training-creating-realistic-fakes-with-machine-learning-c570881d0e81.
[19] https://skymind.ai/wiki/generative-adversarial-network-gan.
[21] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation
learning with deep convolutional generative adversarial networks. arXiv preprint
arXiv:1511.06434, 2015.
[22] https://www.python.org.
[23] https://docs.anaconda.com.
[24] https://pytorch.org/docs/stable/index.html.