
1180 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 44, NO. 3, MARCH 2022

DE-GAN: A Conditional Generative Adversarial Network for Document Enhancement

Mohamed Ali Souibgui and Yousri Kessentini

Abstract—Documents often exhibit various forms of degradation, which make them hard to read and substantially deteriorate the performance of an OCR system. In this paper, we propose an effective end-to-end framework named Document Enhancement Generative Adversarial Networks (DE-GAN) that uses conditional GANs (cGANs) to restore severely degraded document images. To the best of our knowledge, this practice has not been studied within the context of generative adversarial deep networks. We demonstrate that, in different tasks (document clean-up, binarization, deblurring and watermark removal), DE-GAN can produce an enhanced version of the degraded document with high quality. In addition, our approach provides consistent improvements compared to state-of-the-art methods over the widely used DIBCO 2013, DIBCO 2017, and H-DIBCO 2018 datasets, proving its ability to restore a degraded document image to its ideal condition. The results obtained on a wide variety of degradations reveal the flexibility of the proposed model to be exploited in other document enhancement problems.

Index Terms—Document analysis, document enhancement, degraded document binarization, watermark removal, deep learning, generative adversarial networks

1 INTRODUCTION

AUTOMATIC document processing consists in the transformation of a document into a form that is comprehensible by a computer vision system or by a human. Thanks to the development of several public databases, document processing has made great progress in recent years [1], [2]. However, this processing is not always effective when documents are degraded. A lot of damage can be done to a paper document, for example wrinkles, dust, coffee or food stains, faded sun spots and many other real-life scenarios [3]. Degradation can also be present in scanned documents because of bad digitization conditions, such as the use of smartphone cameras (shadow [4], blur [5], variation of light conditions, distortion, etc.). Moreover, some documents may contain watermarks, stamps or annotations. The recovery is even harder when some of these take the place of the text, for instance when the stain color is the same as, or darker than, the document font color (Fig. 1 shows some examples). Hence, an approach to recover a clean version of the degraded document is needed.

In this study, we focus on two document enhancement problems: degraded document recovery, i.e., producing a clean (grayscale or binary) version of the document given any type of degradation, and watermark removal. The obstacles faced are as follows: noise or watermarks can overlap with the text; dense watermarks, intense dirt or degradation can cover the entire text, making it very hard to read; and there is no prior knowledge about the degradation or the watermark that should be removed. An ideal system should perform two tasks simultaneously: removing the noise and the watermarks while retaining the text quality in the document images.

Recently, great success has been achieved by deep neural networks in natural image generation and restoration, especially deep convolutional neural networks (auto-encoders and variational auto-encoders (VAEs)) [6], [7], [8] and generative adversarial networks (GANs) [9], [10]. GANs, which were introduced in [11], are now considered the ideal solution for image generation problems [10], offering higher quality, stability and variation compared to auto-encoders. Generative models have gained attention because of their ability to capture high-dimensional probability distributions, impute missing data and deal with multimodal outputs. Despite that, the document analysis research community is not yet benefiting enough from these approaches. Their use is very limited, for instance to font translation [12], handwritten profiling [13] and staff-line removal from music score images [14], where promising results were found.

In [9], Isola et al. show that conditional generative adversarial networks (cGANs), a variation of GANs, perform well in image-to-image translation (labels to facade, day to night, edges to photo, BW to color, etc.). While GANs learn a generative model of data, cGANs learn a conditional generative model, which conditions on an input image and generates a corresponding output image. Since document enhancement follows the same process, i.e., we want to preserve the text and remove the damage in a conditioned image, cGANs should be a suitable solution, and this is what motivated this study.

• Mohamed Ali Souibgui is with the Computer Vision Center, and Computer Science Department, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain. E-mail: [email protected].
• Yousri Kessentini is with the Digital Research Center of Sfax, Sfax 3021, Tunisia, and also with the MIRACL Laboratory, University of Sfax, Sfax 3029, Tunisia. E-mail: [email protected].

Manuscript received 8 Jan. 2019; revised 15 May 2020; accepted 3 Sept. 2020. Date of publication 7 Sept. 2020; date of current version 3 Feb. 2022. (Corresponding author: Yousri Kessentini.) Recommended for acceptance by C. V. Jawahar. Digital Object Identifier no. 10.1109/TPAMI.2020.3022406

0162-8828 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: UNIVERSIDAD DE ALICANTE . Downloaded on September 20,2022 at 07:19:53 UTC from IEEE Xplore. Restrictions apply.

Fig. 1. Examples of the documents used in this study: (a): Degraded documents, (b): A document with dense watermark.

The main contributions of this paper are the following. First, to the best of our knowledge, this is the first use of GANs, conditional GANs specifically, in a framework that addresses different document enhancement problems (clean-up, binarization and watermark removal). Second, we use a simple but flexible architecture that can be exploited to tackle any document degradation problem. Third, we introduce a new document enhancement problem consisting in dense watermark and stamp removal. Finally, we experimentally prove that our approach achieves higher performance compared to the state-of-the-art methods in degraded document binarization.

The rest of the paper is organized as follows. Section 2 summarizes previous work on document enhancement, especially document clean-up, binarization and watermark removal in documents as well as in natural images; we also review some related work using GANs for image-to-image translation. Then, we present our proposed approach in Section 3. Experimental results and comparisons with traditional and recent methods are described in Section 4. Finally, a conclusion with some future research directions is presented in Section 5.

2 RELATED WORKS

In this work, we focus on the enhancement of degraded document images by addressing different kinds of degradation: document clean-up, binarization, and watermark removal. From a document analyst's viewpoint, this recovery of a clean version from the degraded document falls in the research field called document enhancement, where we can find, in addition to those three, many other ways to enhance a document, for instance unshadowing [4], [15], super-resolution [16], deblurring [5], dewarping [17], etc. In what follows, we cover some work related to our addressed problems.

2.1 Degraded Document Recovering and Binarization

Dirty document cleaning is related to document image binarization, especially for degraded documents, where the goal is to produce a binary but clean document. That is why dealing with these two problems is almost the same: the idea is to classify the pixels of the document into one of two categories, degradation or text. Afterward, assigning zeros to the text pixels and ones to the degradation pixels generates a binary clean image, while a grayscale or colored image is generated by preserving the original values of the text pixels.

Classic document binarization methods [18], [19], [20], [21], [22], [23] were based on thresholding: many algorithms were developed to find the suitable global or local threshold(s) to apply as a filter. According to the threshold(s), pixels are classified as belonging to the text (zero) or the degradation (one). Lelore et al. [24] presented an algorithm called FAIR, based on edge detection to localize the text in a degraded document image. A global threshold selection method was proposed in [25]: based on fuzzy expert systems (FESs), the image contrast is enhanced, then the range of the threshold value is adjusted by another FES and a pixel-counting algorithm, and finally the threshold value is obtained as the middle value of that range. A machine learning based approach was proposed in [26], whose goal was to determine the binarization threshold in each image region given a three-dimensional feature vector formed by the distribution of the gray-level pixel values; a support vector machine (SVM) was used to classify each region into one of four different threshold values. Another, similar SVM-based approach was introduced in [27]. The main drawback of these classic methods is that the results are highly sensitive to the document condition: with a complex image background or a non-uniform intensity, problems occur.
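To make the thresholding idea concrete, here is a minimal NumPy sketch of Otsu's method [18], the classic global-threshold selector mentioned above; the toy image and its pixel values are illustrative assumptions, not taken from any benchmark dataset:

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the global threshold that maximizes the between-class
    variance (Otsu's criterion) on an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: skip this candidate
        mu0 = (np.arange(t) * prob[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(gray, t):
    # Text pixels (darker than t) -> 0, background/degradation -> 1,
    # following the pixel convention described in Section 2.1.
    return (gray >= t).astype(np.uint8)

# Toy image: dark "text" strokes (value 30) on a bright page (value 200).
page = np.full((8, 8), 200, dtype=np.uint8)
page[2:4, 1:7] = 30
t = otsu_threshold(page)
binary = binarize(page, t)
```

As the surrounding discussion notes, such a single global threshold breaks down exactly when the background is complex or the intensity is non-uniform, which is what motivates the learned approaches below.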


segmentation is performed on the enhanced document image to classify the pixels. Despite these sophisticated image processing techniques, document binarization results are still unsatisfactory. For this reason, some deep learning frameworks were recently used to tackle this problem. The goal here is not to train a model to predict a threshold, but to directly separate the foreground text from the background noise given, of course, a considerable amount of paired data (degraded images and their binarized versions). These deep learning based models lead to better results compared to the hand-crafted methods. Several end-to-end frameworks, based on fully convolutional neural networks (in an encoder-decoder fashion), were used to binarize and enhance the document image [32], [33], [34]. Afzal et al. [35] formulated the binarization of document images as a sequence learning problem; hence, a 2D Long Short-Term Memory (LSTM) was employed to process a 2D sequence of pixels and classify each pixel as text or background. In [36], another fully convolutional network was trained with a combined pseudo-F-measure and F-measure loss function for the task of document image binarization. A method inspired by the two previous approaches, i.e., a recurrent neural network based algorithm using grid LSTM cells for image binarization, together with a pseudo-F-measure based weighted loss function, can be found in [37]. Vo et al. [38] proposed a hierarchical deep supervised network (DSN) architecture to predict the text pixels; they claimed that their architecture incorporates side layers to decrease the learning time, while requiring a lot of training data.

It is to be noted that in this paper our objective is not to binarize the document images but to clean the degraded ones and preserve them at their basic gray or color level. However, we will test our approach on this problem for the purpose of comparison with the state-of-the-art approaches.

2.2 Watermark Removal

Watermark removal is also related to classical document binarization or image matting, where the goal is to decompose a single image into background and foreground, knowing that this time the text is in the background while the watermark is in the foreground. However, this problem has not been addressed in document processing: the published works that deal with watermark removal target natural image processing. In [39], the authors used image inpainting algorithms to remove the watermark; before that, a statistical method was used to detect the watermark region. Dekel et al. [40] proposed to estimate the outline sketch and alpha matte of the same watermark from a batch of different images. Two watermarks were used in that study; the goal was to test the effectiveness of a single visible watermark for protecting a large set of images. Wu et al. [41] used generative adversarial networks [11] to remove watermarks from face images used in a biometric system. Cheng et al. [42] proposed a method based on convolutional neural networks (CNNs): first, object detection algorithms were used to detect the watermark region in natural images, which is then passed to another model to remove the watermark. In our study, we investigate for the first time the problem of watermark removal in document images; this leads us to compare our approach with some results obtained on natural images for the same purpose.

2.3 Generative Adversarial Networks for Image-to-Image Transform

As mentioned above, GANs are now achieving impressive results in image generation and translation. In this paragraph, we review the use of this mechanism in problems related to document processing and enhancement; this should give the document analysis community intuitions about exploiting GANs for these tasks. In [43], it was demonstrated that GANs lead to improvements in semantic segmentation. Ledig et al. presented SRGAN [44], a generative adversarial network for image super-resolution; through it, they achieved photo-realistic reconstructions for large upscaling factors (4x). In [9], conditional GANs were used for several image-to-image translation tasks (tasks related to document enhancement), given paired data. This work was extended in [45], where CycleGAN, a GAN that uses unpaired data, was proposed as a solution. Another model, called "pix2pix-HD", which deals with high-resolution (e.g., 2048x1024) photorealistic image-to-image translation tasks, appeared in [46]. Furthermore, an unsupervised method for image-to-image translation was proposed in [47], where the authors train two GANs, or "DualGAN" as they call it. In their architecture, the primal GAN learns to translate images from a domain U to a domain V, while the dual GAN learns to invert the task. The closed-loop architecture allows images from each domain to be translated and then reconstructed; hence, a loss function that accounts for the reconstruction error of images can be used to train the translators.

3 PROPOSED APPROACH

We consider the problems of document enhancement as an image-to-image translation task where the goal is to generate clean document images given the degraded ones. Since GANs have outperformed auto-encoders in generating high-fidelity samples, and since we are using paired data, we propose to use a cGAN. We call our model DE-GAN (Document Enhancement conditional Generative Adversarial Network). GANs were initially proposed in [11] and consist of two neural networks, a generator G and a discriminator D, characterized by the parameters φ_G and φ_D, respectively. The generator has the goal of learning a mapping from a random noise vector z to an image y, G_{φ_G}: z → y, while the discriminator has the function of distinguishing between the image generated by G and the ground truth one. Hence, given y, D should be able to tell whether it is fake or real by outputting a probability value, D_{φ_D}: y → P(real). These two networks compete against each other in a min-max game; in other words, if one wins the other loses. The generator aims to fool the discriminator by producing an image close to the ground truth, while the discriminator improves its prediction of the image being fake; this is what is called adversarial learning. cGANs follow the same process, except that they introduce an additional parameter x, which is the conditioned image. Here, the generator learns the mapping from an observed image x and a random noise vector z to y, G_{φ_G}: {x, z} → y, and the discriminator also looks at the conditioned image, which makes its process D_{φ_D}: {x, y} → P(real).

In our situation, the generator will generate a clean image, denoted by I_C, given the degraded (or watermarked) one,


Fig. 2. The generator follows the U-net architecture [48]. Each box corresponds to a feature map; the number of channels is denoted at the bottom of each box. The arrows denote the different operations.
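As a rough illustration of why the skip connections matter, the following shape bookkeeping traces feature-map sizes through a toy 3-level U-net on a 256 x 256 input; the depth and channel counts here are assumptions for the sketch, not the exact configuration of Fig. 2. Each encoder level halves the spatial size, and each decoder level upsamples and concatenates the matching encoder map back in, reinjecting detail lost to downsampling:

```python
# Shape bookkeeping for a toy 3-level U-net on a 256x256 input.
# Shapes are (height, width, channels); channel counts are assumptions.

def unet_shapes(h, w, channels=(64, 128, 256)):
    encoder = []
    for c in channels:                      # encoder: conv then 2x downsample
        encoder.append((h, w, c))           # feature map kept for the skip
        h, w = h // 2, w // 2
    bottleneck = (h, w, channels[-1] * 2)
    decoder = []
    for skip in reversed(encoder):          # decoder: 2x upsample + concat
        h, w = h * 2, w * 2
        sh, sw, sc = skip
        assert (h, w) == (sh, sw)           # skip must align with upsampled map
        decoder.append((h, w, sc + sc))     # concat doubles the channel count
    return encoder, bottleneck, decoder

enc, mid, dec = unet_shapes(256, 256)
```

The assertion makes the key constraint explicit: a skip connection is only possible because the decoder recreates exactly the spatial size at which the encoder map was saved.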

which we will denote I_W. The generator aims, of course, to produce an image that is very close to the ground truth image, denoted by I_GT. The training of cGANs for this task is done with the following adversarial loss function:

$L_{GAN}(\phi_G, \phi_D) = \mathbb{E}_{I_W, I_{GT}}\big[\log D_{\phi_D}(I_W, I_{GT})\big] + \mathbb{E}_{I_W}\big[\log\big(1 - D_{\phi_D}(I_W, G_{\phi_G}(I_W))\big)\big].$  (1)

Using this function, the generator should produce, after several epochs, an image similar to the ground truth, i.e., the watermark and the degradation will be removed, and this may fool the discriminator. But it is not guaranteed that the text will be preserved in good condition. To overcome this, we employ an additional log loss function between the generated image and the ground truth, to force the model to generate images that have the same text as the ground truth. Note also that this additional loss boosts the training speed. The added function is

$L_{log}(\phi_G) = \mathbb{E}_{I_{GT}, I_W}\big[-\big(I_{GT}\log(G_{\phi_G}(I_W)) + (1 - I_{GT})\log(1 - G_{\phi_G}(I_W))\big)\big].$  (2)

Thus, the proposed loss of our network, denoted by L_net, becomes

$L_{net}(\phi_G, \phi_D) = \min_{\phi_G}\max_{\phi_D} L_{GAN}(\phi_G, \phi_D) + \lambda L_{log}(\phi_G),$  (3)

where λ is a hyper-parameter that was set to 100 for text cleaning and to 500 for watermark removal and document binarization. The architectures of the generator and discriminator networks are described in the next sections.

3.1 Generator

The generator performs an image-to-image translation task. Usually, auto-encoder models are used for this problem [49], [50], [51]. These models consist, mostly, of a sequence of convolutional layers called the encoder, which performs downsampling until a particular layer; then the process is reversed by a sequence of up-sampling and convolutional layers called the decoder. There are two disadvantages of using an encoder-decoder model for the proposed problem. First, due to downsampling (pooling), a lot of information is lost, and the model will have difficulty recovering it later when predicting an image of the same size as the input. Second, the image information flow passes through all the layers, including the bottleneck; thus, sometimes, a huge amount of unwanted redundant features (inputs and outputs share a lot of the same pixels) is exchanged, which leads to energy and time loss. For this reason, we employ skip connections following the structure of the model called U-net [48]. Skip connections are added every two layers to recover images with less deterioration; note also that skip connections are used when training a very deep model to prevent the gradient vanishing and exploding problems. Some batch normalization layers are also added to accelerate the training. The architecture of the generator used in this study is illustrated in Fig. 2; it is similar to [48], where it was introduced for the purpose of biomedical image segmentation.

3.2 Discriminator

The discriminator is a simple fully convolutional network (FCN), composed of 6 convolutional layers, that outputs a 2D matrix containing probabilities of the generated image being real. This model is presented in Fig. 3. As shown, the discriminator receives two input images, the degraded image and its clean version (either the ground truth or the image cleaned by the generator). These images are concatenated into a tensor of shape 256 × 256 × 2. The obtained volume is then propagated through the model to end up as a 16 × 16 × 1 matrix in the last layer. This matrix contains probabilities that should be close to 1 if the clean image is the ground truth and close to 0 if it was generated by the generator; therefore, the last layer uses a sigmoid activation function. After training is complete, the discriminator is no longer used: given a degraded image, we only use the generative network to enhance it. During training, however, the discriminator forces the generator to produce better results.
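The two loss terms of the DE-GAN objective (the adversarial term and the pixel-wise log loss) can be sketched in NumPy on toy tensors; here `d_real` and `d_fake` stand in for the discriminator's 16 x 16 output probability maps, and all pixel values are hypothetical:

```python
import numpy as np

EPS = 1e-8  # numerical guard inside the logarithms

def adversarial_loss(d_real, d_fake):
    # Eq. (1): E[log D(I_W, I_GT)] + E[log(1 - D(I_W, G(I_W)))],
    # evaluated on the discriminator's probability maps.
    return np.mean(np.log(d_real + EPS)) + np.mean(np.log(1.0 - d_fake + EPS))

def log_loss(i_gt, g_out):
    # Eq. (2): pixel-wise binary cross-entropy between the generated
    # image and the ground truth, forcing the text to be preserved.
    return -np.mean(i_gt * np.log(g_out + EPS)
                    + (1.0 - i_gt) * np.log(1.0 - g_out + EPS))

# Toy 2x2 "images": a generator output close to the ground truth gives
# a small log loss; a poor output gives a larger one.
i_gt = np.array([[0.0, 1.0], [1.0, 0.0]])
good = np.array([[0.05, 0.95], [0.9, 0.1]])
bad  = np.array([[0.6, 0.4], [0.5, 0.5]])
lam = 100.0  # lambda for text cleaning, as set in Eq. (3)
weighted = lam * log_loss(i_gt, good)  # generator's extra term in Eq. (3)
```

The min-max structure of Eq. (3) is then realized by alternating updates: the discriminator ascends the adversarial term while the generator descends it plus the λ-weighted log loss.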


Fig. 3. The discriminator architecture.

TABLE 1
The Obtained Results of Document Cleaning Using the Noisy Office Database [3]

Model        SSIM    PSNR
FCN (U-net)  0.9970  36.02
DE-GAN       0.9986  38.12

3.3 Training Process

Training our DE-GAN proceeds as follows: we take 256 × 256 patches from the degraded images and feed them as input to the generator. The produced images are fed to the discriminator together with the ground truth patches and the degraded ones. Then, as presented in Equation (3), the discriminator starts forcing the generator to produce outputs that cannot be distinguished from "real" images, while doing its best at detecting the generator's "fakes". This training is illustrated in Fig. 4; it is done using Adam with a learning rate of 1e-4 as the optimizer.

Fig. 4. The proposed DE-GAN.

4 EXPERIMENTS AND RESULTS

4.1 Document Cleaning and Binarization

We begin our experiments with document cleaning. For this task, the Noisy Office Database, which contains different types of degradation and is presented in [3], is used. We defined 112 images for training and 32 for testing. From the 112 training images, a set of overlapping patches of size 256 × 256 pixels was extracted; this generated 1,356 pairs of patches that were fed to our model. This first test is intended to demonstrate the effect of adversarial training. Thus, we train another model, a simple FCN, namely the U-net presented in Fig. 2; a validation set of 15 percent of the training images was used for this model. The results obtained by both models are presented in Table 1. As can be seen, the results of the encoder-decoder network (U-net) are acceptable for denoising and cleaning tasks, but our DE-GAN further improves them, which shows the benefit of adversarial training for these types of problems. For further comparison, we participated in the Kaggle competition on denoising dirty documents,1 obtaining a root mean squared error score of 0.01952, which makes our method one of the best approaches on the leaderboard.

Fig. 5. Cleaning degraded documents by DE-GAN.

TABLE 2
Results of Image Binarization on the DIBCO 2013 Database

Model                  PSNR  F-measure  Fps   DRD
Otsu [18]              16.6  83.9       86.5  11.0
Niblack [20]           13.6  72.8       72.2  13.6
Sauvola et al. [19]    16.9  85.0       89.8  7.6
Gatos et al. [59]      17.1  83.4       87.0  9.5
Su et al. [60]         19.6  87.7       88.3  4.2
Tensmeyer et al. [36]  20.7  93.1       96.8  2.2
Xiong et al. [31]      21.3  93.5       94.4  2.7
Vo et al. [38]         21.4  94.4       96.0  1.8
Howe [61]              21.3  91.3       91.7  3.2
DE-GAN                 24.9  99.5       99.7  1.1

To give an idea of the cleaning performed by our model, some examples are given in Fig. 5, which demonstrate its ability to recover a document very close to the ground truth.

Next, we compare our approach with state-of-the-art results on the document binarization problem. We take the DIBCO 2013 dataset [52] for testing, while training

1. https://www.kaggle.com/c/denoising-dirty-documents/
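The overlapping-patch preparation described in Section 3.3 can be sketched as follows; the stride value here is an assumption for the sketch, since the paper does not state the overlap used:

```python
import numpy as np

def extract_patches(img, size=256, stride=128):
    """Slide a size x size window over the image with the given stride,
    padding the bottom/right borders so every pixel is covered."""
    h, w = img.shape[:2]
    ph = int(np.ceil(max(h - size, 0) / stride)) * stride + size
    pw = int(np.ceil(max(w - size, 0) / stride)) * stride + size
    padded = np.pad(img, ((0, ph - h), (0, pw - w)), mode="edge")
    patches = []
    for y in range(0, ph - size + 1, stride):
        for x in range(0, pw - size + 1, stride):
            patches.append(padded[y:y + size, x:x + size])
    return patches

# A degraded page and its clean ground truth would be cut identically,
# yielding aligned (input, target) patch pairs for the cGAN.
page = np.random.rand(600, 400)
patches = extract_patches(page)
```

Cutting the degraded image and its ground truth with the same grid is what keeps each (input, target) pair pixel-aligned, which the paired cGAN training requires.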


Fig. 6. Binarization of degraded documents by DE-GAN. The result is satisfactory, except in some parts that were highly dense (the red boxes in the predicted images row).

Fig. 7. Qualitative binarization results produced by different methods on a part of the sample (PR5), which is included in the DIBCO 2013 dataset.

Fig. 8. Qualitative binarization results produced by different methods on a part of the sample (HW5), which is included in the DIBCO 2013 dataset.

TABLE 3
Results of Image Binarization on the DIBCO 2017 Database: A Comparison With the DIBCO 2017 Competition Approaches

Model     PSNR   F-measure  Fps    DRD   Rank in the competition
10 [58]   18.28  91.04      92.86  3.40  1
17a [58]  17.58  89.67      91.03  4.35  2
12 [58]   17.61  89.42      91.52  3.56  3
1b [58]   17.53  86.05      90.25  4.52  4
1a [58]   17.07  83.76      90.35  4.33  5
DE-GAN    18.74  97.91      98.23  3.01  -

Fig. 9. Qualitative binarization results produced by different methods on sample 16 in the DIBCO 2017 dataset; here we compare DE-GAN with the winner's approach.

our model was done with different versions of the DIBCO databases [53], [54], [55], [56], [57], [58]. As in the previous test, a set of 6,824 training pairs (patches of size 256 × 256) was taken from their 80 total images. The obtained results are compared with several approaches in Table 2.

From the results, we can say that DE-GAN is superior to the current state-of-the-art methods according to the following metrics [52]: peak signal-to-noise ratio (PSNR), F-measure, pseudo-F-measure (Fps) and distance reciprocal distortion metric (DRD). Some examples of DIBCO 2013 image binarization by DE-GAN are presented in Fig. 6.

To reflect the results of the previous table, illustrative comparisons between the different methods can be found in Figs. 7 and 8. It is easy to see the superiority of our method over the classic methods, like those of [18], [19], [20], which fail to remove the background degradation from the document when it gets very dense, because they rely on thresholds that classify the degraded pixels as text, or classify the text pixels as damage to be removed. The recent approaches [38], [61] yield better results than the classic ones and separate the text from the background successfully. However, our method achieves higher performance in terms of closeness to the ground truth image.


TABLE 4
Results of Image Binarization on DIBCO 2017 and DIBCO 2018 Databases, a Comparison With DIBCO 2018 Competitors Approaches

Model DIBCO 2018 DIBCO 2017 Rank in the competition


PSNR F-measure Fps DRD PSNR F-measure Fps DRD
1 [62] 19.11 88.34 90.24 4.92 17.99 89.37 90.17 5.51 1
7 [62] 14.62 73.45 75.94 26.24 15.72 84.36 87.34 7.56 2
2 [62] 13.58 70.04 74.68 17.45 14.04 79.41 82.62 10.70 3
3b [62] 13.57 64.52 68.29 16.67 15.28 82.43 86.74 6.97 4
6 [62] 11.79 46.35 51.39 24.56 15.38 80.75 87.24 6.22 5
DE-GAN 16.16 77.59 85.74 7.93 18.74 97.91 98.23 3.01 -

Moreover, we tested DE-GAN on a recent DIBCO data- The comparison is done with the top 5 ranked approaches
set, which is DIBCO 2017 [58]. We train our model on 6,098 in ICDAR 2017 competition on document image Binariza-
patches from similar datasets [52], [53], [54], [55], [56], [57]. tion [58]. 18 research groups have participated in the com-
petition with 26 distinct algorithms. The results are
presented in Table 3, where you can notice the superiority
of our DE-GAN over the different methods. It is to note that
most of these approaches are based on encoder-decoder
models and the winner team was using a U-net with several
data augmentation techniques. However, GANs were not
exploited in this competition. An example to compare our
output with the winner algorithm is given in Fig. 9.
In addition, we compared our model with the most recent approaches, presented in the H-DIBCO 2018 competition [62] held at the ICFHR 2018 conference. The results are presented in Table 4. As shown, our approach achieves the best performance on the DIBCO 2017 test set and the second-best DRD, PSNR, F-measure and pseudo F-measure on the H-DIBCO 2018 test set. We note that the winning system in that competition integrates several pre-processing and post-processing steps in its algorithm, which makes it more effective on this particular H-DIBCO 2018 dataset. In contrast, we present a simple end-to-end model that performs well across several datasets and enhancement tasks without any additional processing step. Finally, for a more practical usage of the model, we also binarized some real (naturally degraded) documents, whose degradation consists of stains and show-through. The obtained results are given in Fig. 10: the model produces better versions of the real images, which will certainly improve their recognition rate.
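To binarize a whole document from per-patch predictions, the patches must be reassembled. One simple scheme (not necessarily the authors' exact procedure) averages the predictions wherever patches overlap and then thresholds the result:

```python
def stitch_patches(patches, h, w, size=256, stride=128, threshold=0.5):
    """Reassemble overlapping patch predictions (2D lists of probabilities)
    into an h x w binary image by averaging overlaps and thresholding."""
    acc = [[0.0] * w for _ in range(h)]
    cnt = [[0] * w for _ in range(h)]
    coords = [(y, x) for y in range(0, h - size + 1, stride)
                     for x in range(0, w - size + 1, stride)]
    for (y, x), patch in zip(coords, patches):
        for dy in range(size):
            for dx in range(size):
                acc[y + dy][x + dx] += patch[dy][dx]
                cnt[y + dy][x + dx] += 1
    return [[1 if c and a / c >= threshold else 0 for a, c in zip(ra, rc)]
            for ra, rc in zip(acc, cnt)]
```

With the stride equal to the patch size the patches simply tile the image; a smaller stride smooths seams between patches at the cost of more model evaluations.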

4.2 Watermark Removal


After testing our model on document cleaning and binarization, we evaluate it on the problem of watermark removal. Dense watermarks (or stamps) can cause a severe deterioration of the document foreground, making it hard to read. However, this problem has not been investigated by the document analysis community, and we decided to be the first to address it, using DE-GAN. Hence, it was not possible to find a public dataset for testing, and we created our own database, which contains 1,000 pairs (an image of a document with a dense watermark and stamps, and its clean version). The used watermarks have random texts, sizes, colors, fonts, opacities and locations (see Fig. 11). As shown, these watermarks sometimes cover the entire text, making it invisible to the unaided eye. The code used to produce this data is available on GitHub;2 for the dataset used in our study, the reader can contact the first author to obtain it.

Fig. 10. Binarization of three historical degraded documents by DE-GAN; the binarized version is presented under each original image. Some parts are not well recovered, as shown in the red boxes.

Fig. 11. Four samples from our developed dataset.

2. https://github.com/dali92002/watermarking-documents/blob/master/Watermarking.ipynb

As for document cleaning, DE-GAN was trained using overlapped patches (7,658 pairs of patches from 800 watermarked document images), while 200 documents were kept for testing. Since, to the best of our knowledge, there is no approach in the literature that addresses this problem for documents, we compare our results with approaches used for watermark removal in natural images; the comparison results are presented in Table 5. Although the watermarks used in our study are very dense, and we believe that removing them is harder than the settings targeted by the related approaches in Table 5, our approach surpasses those natural-image methods by a large margin. Fig. 13 shows some examples of watermark removal by DE-GAN: the produced images preserve the text quality while removing the foreground watermarks. In addition, since the presented watermarked documents were synthetically made, it was interesting to apply DE-GAN to remove a watermark from a naturally degraded document. Fig. 12 shows that DE-GAN successfully removes a dense watermark from a real document: the watermark is completely removed, and a reader or an OCR system can easily read the enhanced document compared to the degraded one.

TABLE 5
Results of Watermark Removal

Model              PSNR    SSIM
Dekel et al. [40]  36.02   0.924
Wu et al. [41]     23.37   0.884
Cheng et al. [42]  30.86   0.914
DE-GAN             40.98   0.998

Fig. 12. Qualitative results for dense watermark removal. Above, a section from a watermarked invoice; below, its enhanced version. Some parts of the text in the invoice were blurred due to privacy constraints. Because of the different domains, synthetic versus real, some tiny parts of the watermark were not completely removed (red boxes).

Fig. 13. Watermark removal by DE-GAN.

4.3 Comparison With Other GAN Models
Since our model is inspired by the pix2pix model [9] (we use a deeper generator and a different additional loss), it is useful to compare it with other similar GAN-based models dedicated to the same image-to-image translation problem. For this aim, the cycleGAN [45] and pix2pix-HD [46] models are considered. We evaluate these models on the H-DIBCO 2018 dataset [62] under the same conditions and with the same data used to train DE-GAN. The quantitative and qualitative results are presented in Table 6 and Fig. 14, respectively. The experimental results show the superiority of DE-GAN over cycleGAN and pix2pix-HD, with a higher PSNR, F-measure and Fps and a lower DRD. We note that the unsupervised training capability of cycleGAN is quite useful, since paired data are harder to find in document enhancement applications. For pix2pix-HD, the results are promising given that few training samples were available (the number of DIBCO samples is small once split into patches of size 512 x 1024, which is why we used image flips to augment the data). With more data, we believe that pix2pix-HD could perform much better.

TABLE 6
Results of Image Binarization for the H-DIBCO 2018 Database

Model        PSNR    F-measure   Fps     DRD
cycleGAN     11.00   56.33       58.07   30.07
pix2pix-HD   14.42   72.79       76.28   15.13
DE-GAN       16.16   77.59       85.74   7.93

4.4 Document Deblurring
The DE-GAN model presented in this paper outperforms many state-of-the-art approaches on different problems such as binarization, denoising and watermark removal. To experimentally prove the efficiency and flexibility of the proposed method, we evaluate it on a more challenging scenario, document deblurring. We use 4,000 patches from the dataset developed in [63] to train our model, and 932 patches for testing. Note that in [63] a convolutional neural network architecture is proposed to address this problem; we therefore compare our results with this CNN and with pix2pix-HD, trained on the same selected data. The obtained results are presented in Table 7. We can see that the GAN models surpass the CNN, which is even clearer in the qualitative results of some patches presented in Fig. 15. We can also see that DE-GAN gives results similar to pix2pix-HD, but is more accurate in predicting some characters. For example, in the second patch row, third line, the word "kind" is correctly predicted by DE-GAN but predicted as "bind" by pix2pix-HD. We note that the used dataset is composed of 300 x 300 px image patches, which can explain why pix2pix-HD does not give a better performance (it generally works with larger input patches, of size 512 x 1024 or 1024 x 2048).
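All the models compared here are trained adversarially; the "adversarial + log" generator objective that DE-GAN adds on top of the pix2pix setup can be sketched as below. The weight `lam` and the flattened probability lists are illustrative assumptions, not values or shapes taken from the paper.

```python
import math

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy between flat lists of probabilities and targets."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(pred, target)) / len(pred)

def generator_loss(d_on_fake, gen_pixels, gt_pixels, lam=100.0):
    """Composite generator objective: fool the discriminator (adversarial BCE
    against an all-ones target) plus a weighted pixel-wise log loss between
    the generated and ground-truth images. `lam` is an assumed weight."""
    adversarial = bce(d_on_fake, [1.0] * len(d_on_fake))
    pixel_log = bce(gen_pixels, gt_pixels)
    return adversarial + lam * pixel_log
```

The pixel-wise term pulls the generator toward the clean ground truth, while the adversarial term penalizes outputs the discriminator can tell apart from real clean documents.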


Fig. 14. Qualitative binarization results produced by different models on sample (9) from the H-DIBCO 2018 dataset.

Fig. 15. Qualitative deblurring results of some patches produced by different methods.

TABLE 7
The Obtained Results of Document Deblurring

Method           PSNR
CNN [63]         19.36
pix2pix-HD [46]  19.89
DE-GAN           20.37

4.5 OCR Evaluation
After the quantitative and qualitative evaluation of the enhanced images presented previously, we now compare the performance of OCR on degraded and enhanced documents. For this aim, we took a set of 4 images (2 degraded ones from the DIBCO datasets, and 2 images with a dense watermark from our dataset). Then, we used Tesseract OCR [64] to recognize these images and their versions enhanced by DE-GAN. We found that the proposed enhancement method boosts the baseline OCR performance by a large margin: the character error rate decreases from 0.37 for the degraded documents to 0.01 for the enhanced ones. Fig. 16 shows a small example of this process. Each row shows a line of a degraded document image with the text produced by the OCR system, followed by its enhanced version and the corresponding OCR text.

Fig. 16. Qualitative results for Tesseract recognition of some text lines.

5 CONCLUSION
In this paper we proposed a Document Enhancement Generative Adversarial Network, named DE-GAN, to restore severely degraded document images. DE-GAN is a modified version of pix2pix with a deeper generator and a different additional loss (adversarial + log) to generate an enhanced document given the degraded version. To the best of our knowledge, this is the first application of GANs to the document enhancement problem. Moreover, we present a new problem in document enhancement that

is dense watermark (or stamps) removal, hoping that it draws the attention of the document analysis community. Extensive experiments show that DE-GAN achieves strong results on different document enhancement tasks, outperforming fully convolutional networks as well as the cycleGAN and pix2pix-HD models. Furthermore, we achieve improved results compared to many recent state-of-the-art methods on benchmark datasets such as DIBCO 2013, DIBCO 2017 and H-DIBCO 2018.

We showed that the proposed enhancement method boosts the baseline OCR performance by a large margin. Hence, as immediate future work, we plan to add the OCR evaluation to the discriminator part. This would give the discriminator the ability to read the text when deciding whether it is real or fake, which will force the generator to produce more readable images. We also intend to test the performance of DE-GAN on mobile-captured documents, which present many problems such as shadow, real blur, low resolution, distortion, etc.

ACKNOWLEDGMENT
This work was supported in part by the Swedish Research Council (grant 2018-06074, DECRYPT), the Spanish Project RTI2018-095645-B-C21, and the CERCA Program/Generalitat de Catalunya.

REFERENCES
[1] K. Chellapilla, S. Puri, and P. Simard, "High performance convolutional neural networks for document processing," in Proc. Int. Workshop Front. Handwriting Recognit., 2006.
[2] S. Schreiber, S. Agne, I. Wolf, A. Dengel, and S. Ahmed, "DeepDeSRT: Deep learning for detection and structure recognition of tables in document images," in Proc. 14th IAPR Int. Conf. Document Anal. Recognit., 2017, pp. 1162-1167.
[3] F. Zamora-Martinez, S. España-Boquera, and M. J. Castro-Bleda, "Behaviour-based clustering of neural networks applied to document enhancement," in Proc. Int. Work-Conf. Artif. Neural Netw., 2007, pp. 144-151.
[4] S. Bako, S. Darabi, E. Shechtman, J. Wang, K. Sunkavalli, and P. Sen, "Removing shadows from images," in Proc. Asian Conf. Comput. Vis., 2016, pp. 173-183.
[5] X. Chen, X. He, J. Yang, and Q. Wu, "An effective document image deblurring algorithm," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2011, pp. 369-376.
[6] X.-J. Mao, C. Shen, and Y.-B. Yang, "Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections," in Proc. 30th Int. Conf. Neural Inf. Process. Syst., 2016, pp. 2810-2818.
[7] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295-307, Feb. 2016.
[8] D. P. Kingma and M. Welling, "Auto-encoding variational bayes," in Proc. 2nd Int. Conf. Learn. Representations, 2014.
[9] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 5967-5976.
[10] T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of GANs for improved quality, stability, and variation," in Proc. 6th Int. Conf. Learn. Representations, 2018.
[11] I. Goodfellow et al., "Generative adversarial nets," in Proc. Int. Conf. Neural Inf. Process. Syst., 2014, pp. 2672-2680.
[12] A. K. Bhunia et al., "Word level font-to-font image translation using convolutional recurrent generative adversarial networks," in Proc. 24th Int. Conf. Pattern Recognit., 2018, pp. 3645-3650.
[13] A. Ghosh, B. Bhattacharya, and S. B. R. Chowdhury, "Handwriting profiling using generative adversarial networks," in Proc. 31st AAAI Conf. Artif. Intell., 2017, pp. 4927-4928.
[14] A. Konwer et al., "Staff line removal using generative adversarial networks," in Proc. 24th Int. Conf. Pattern Recognit., 2018, pp. 1103-1108.
[15] N. Kligler, S. Katz, and A. Tal, "Document enhancement using visibility detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 2374-2382.
[16] R. K. Pandey and A. G. Ramakrishnan, "Language independent single document image super-resolution using CNN for improved recognition," CoRR, vol. abs/1701.08835, 2017. [Online]. Available: http://arxiv.org/abs/1701.08835
[17] C. L. Tan, L. Zhang, Z. Zhang, and T. Xia, "Restoring warped document images through 3D shape modeling," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 2, pp. 195-208, Feb. 2006.
[18] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Trans. Syst., Man, Cybern., vol. 9, no. 1, pp. 62-66, Jan. 1979.
[19] J. Sauvola and M. Pietikäinen, "Adaptive document image binarization," Pattern Recognit., vol. 33, pp. 225-236, 2000.
[20] W. Niblack, An Introduction to Digital Image Processing. Birkeroed, Denmark: Strandberg Publishing Company, 1985.
[21] N. Phansalkart, S. More, A. Sabale, and M. Joshi, "Adaptive local thresholding for detection of nuclei in diversity stained cytology images," in Proc. Int. Conf. Commun. Signal Process., 2011, pp. 218-220.
[22] G. Chutani, T. Patnaik, and V. Dwivedi, "An improved approach for automatic denoising and binarization of degraded document images based on region localization," in Proc. Int. Conf. Advances Comput. Commun. Inform., 2015, pp. 2272-2278.
[23] M. Cheriet, J. N. Said, and C. Y. Suen, "A recursive thresholding technique for image segmentation," IEEE Trans. Image Process., vol. 7, no. 6, pp. 918-921, Jun. 1998.
[24] T. Lelore and F. Bouchara, "FAIR: A fast algorithm for document image restoration," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 2039-2048, Aug. 2013.
[25] M. Annabestani and M. Saadatmand-Tarzjan, "A new threshold selection method based on fuzzy expert systems for separating text from the background of document images," Iranian J. Sci. Technol. Trans. Elect. Eng., vol. 43, pp. 219-231, 2019.
[26] C.-H. Chou, W.-H. Lin, and F. Chang, "A binarization method with learning-built rules for document images produced by cameras," Pattern Recognit., vol. 43, pp. 1518-1530, 2010.
[27] W. Xiong, J. Xu, Z. Xiong, J. Wang, and M. Liu, "Degraded historical document image binarization using local features and support vector machine (SVM)," Optik, vol. 164, pp. 218-223, 2018.
[28] R. F. Moghaddam and M. Cheriet, "A variational approach to degraded document enhancement," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 8, pp. 1347-1361, Aug. 2010.
[29] R. Hedjam, M. Cheriet, and M. Kalacska, "Constrained energy maximization and self-referencing method for invisible ink detection from multispectral historical document images," in Proc. 22nd Int. Conf. Pattern Recognit., 2014, pp. 3026-3031.
[30] S. Milyaev, O. Barinova, T. Novikova, P. Kohli, and V. Lempitsky, "Fast and accurate scene text understanding with image binarization and off-the-shelf OCR," Int. J. Document Anal. Recognit., vol. 18, pp. 169-182, 2015.
[31] W. Xiong, X. Jia, J. Xu, Z. Xiong, M. Liu, and J. Wang, "Historical document image binarization using background estimation and energy minimization," in Proc. 24th Int. Conf. Pattern Recognit., 2018, pp. 3716-3721.
[32] G. Meng, K. Yuan, Y. Wu, S. Xiang, and C. Pan, "Deep networks for degraded document image binarization through pyramid reconstruction," in Proc. 14th IAPR Int. Conf. Document Anal. Recognit., 2017, pp. 727-732.
[33] J. Calvo-Zaragoza and A.-J. Gallego, "A selectional auto-encoder approach for document image binarization," Pattern Recognit., vol. 86, pp. 37-47, 2019.
[34] K. G. Lore, A. Akintayo, and S. Sarkar, "LLNet: A deep autoencoder approach to natural low-light image enhancement," Pattern Recognit., vol. 61, pp. 650-662, 2017.
[35] M. Z. Afzal, J. Pastor-Pellicer, F. Shafait, T. M. Breuel, A. Dengel, and M. Liwicki, "Document image binarization using LSTM: A sequence learning approach," in Proc. 3rd Int. Workshop Historical Document Imag. Process., 2015, pp. 79-84.
[36] C. Tensmeyer and T. Martinez, "Document image binarization with fully convolutional neural networks," in Proc. 14th IAPR Int. Conf. Document Anal. Recognit., 2017, pp. 99-104.
[37] F. Westphal, N. Lavesson, and H. Grahn, "Document image binarization using recurrent neural networks," in Proc. 13th IAPR Int. Workshop Document Anal. Syst., 2018, pp. 263-268.
[38] Q. N. Vo, S. H. Kim, H. J. Yang, and G. Lee, "Binarization of degraded document images based on hierarchical deep supervised network," Pattern Recognit., vol. 74, pp. 568-586, 2018.
[39] C. Xu, Y. Lu, and Y. Zhou, "An automatic visible watermark removal technique using image inpainting algorithms," in Proc. 4th Int. Conf. Syst. Inform., 2017, pp. 1152-1157.
[40] T. Dekel, M. Rubinstein, C. Liu, and W. T. Freeman, "On the effectiveness of visible watermarks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 6864-6872.
[41] J. Wu, H. Shi, S. Zhang, Z. Lei, Y. Yang, and S. Z. Li, "De-Mark GAN: Removing dense watermark with generative adversarial network," in Proc. Int. Conf. Biometrics, 2018, pp. 69-74.
[42] D. Cheng et al., "Large-scale visible watermark detection and removal with deep convolutional networks," in Proc. Chin. Conf. Pattern Recognit. Comput. Vis., 2018, pp. 27-40.
[43] P. Luc, C. Couprie, S. Chintala, and J. Verbeek, "Semantic segmentation using adversarial networks," in NIPS Workshop Adversarial Training, Dec. 2016.
[44] C. Ledig et al., "Photo-realistic single image super-resolution using a generative adversarial network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 105-114.
[45] J. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 2242-2251.
[46] T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro, "High-resolution image synthesis and semantic manipulation with conditional GANs," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 8798-8807.
[47] Z. Yi, H. Zhang, P. Tan, and M. Gong, "DualGAN: Unsupervised dual learning for image-to-image translation," in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 2868-2876.
[48] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assisted Interv., 2015, pp. 234-241.
[49] E. Shelhamer, J. Long, and T. Darrell, "Fully convolutional networks for semantic segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 4, pp. 640-651, Apr. 2017.
[50] J. Xie, L. Xu, and E. Chen, "Image denoising and inpainting with deep neural networks," in Proc. Int. Conf. Neural Inf. Process. Syst., 2012, pp. 350-358.
[51] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for scene segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2481-2495, Dec. 2017.
[52] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICDAR 2013 document image binarization contest (DIBCO 2013)," in Proc. 12th Int. Conf. Document Anal. Recognit., 2013, pp. 1471-1476.
[53] B. Gatos, K. Ntirogiannis, and I. Pratikakis, "ICDAR 2009 document image binarization contest (DIBCO 2009)," in Proc. 10th Int. Conf. Document Anal. Recognit., 2009, pp. 1375-1382.
[54] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICDAR 2011 document image binarization contest (DIBCO 2011)," in Proc. Int. Conf. Document Anal. Recognit., 2011, pp. 1506-1510.
[55] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICFHR 2012 competition on handwritten document image binarization (H-DIBCO 2012)," in Proc. Int. Conf. Front. Handwriting Recognit., 2012, pp. 817-822.
[56] K. Ntirogiannis, B. Gatos, and I. Pratikakis, "ICFHR2014 competition on handwritten document image binarization (H-DIBCO 2014)," in Proc. 14th Int. Conf. Front. Handwriting Recognit., 2014, pp. 809-813.
[57] I. Pratikakis, K. Zagoris, G. Barlas, and B. Gatos, "ICFHR 2016 competition on handwritten document image binarization (H-DIBCO 2016)," in Proc. 15th Int. Conf. Front. Handwriting Recognit., 2016, pp. 619-623.
[58] I. Pratikakis, K. Zagoris, G. Barlas, and B. Gatos, "ICDAR2017 competition on document image binarization (DIBCO 2017)," in Proc. 14th IAPR Int. Conf. Document Anal. Recognit., 2017, pp. 1395-1403.
[59] B. Gatos, I. Pratikakis, and S. Perantonis, "An adaptive binarization technique for low quality historical documents," in Proc. Int. Workshop Document Anal. Syst., 2014, pp. 102-113.
[60] B. Su, S. Lu, and C. L. Tan, "Robust document image binarization technique for degraded document images," IEEE Trans. Image Process., vol. 22, no. 4, pp. 1408-1417, Apr. 2013.
[61] N. Howe, "Document binarization with automatic parameter tuning," Int. J. Document Anal. Recognit., vol. 16, pp. 247-258, 2013.
[62] I. Pratikakis, K. Zagoris, P. Kaddas, and B. Gatos, "ICFHR2018 competition on handwritten document image binarization (H-DIBCO 2018)," in Proc. 16th Int. Conf. Front. Handwriting Recognit., 2018, pp. 489-493.
[63] M. Hradiš, J. Kotera, P. Zemčík, and F. Šroubek, "Convolutional neural networks for direct text deblurring," in Proc. Brit. Mach. Vis. Conf., 2015, pp. 1-13.
[64] R. Smith, "An overview of the Tesseract OCR engine," in Proc. 9th Int. Conf. Document Anal. Recognit., 2007, pp. 629-633.

Mohamed Ali Souibgui received the bachelor's and master's degrees in computer science from the University of Monastir, Monastir, Tunisia, in 2015 and 2018, respectively. He is currently working toward the PhD degree at the Computer Vision Center, Autonomous University of Barcelona, Barcelona, Spain. His research interests include machine learning, computer vision, and document analysis.

Yousri Kessentini received the graduate degree in computer science engineering from the National Engineering School of Sfax (ENIS), Sfax, Tunisia, in 2003, and the PhD degree in pattern recognition from the University of Rouen, Rouen, France, in 2009. He was a postdoctoral researcher with the ITESOFT Company and the LITIS Laboratory from 2011 to 2013. Currently, he is an assistant professor at CRNS and the head of the DeepVision Research Team. His main research interests include deep learning, document processing and recognition, data fusion, and computer vision. He is certified as an official instructor and ambassador of the NVIDIA Deep Learning Institute. He has coordinated several research projects in partnership with industry and is the author or co-author of several papers.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/csdl.
