
Convolutional Neural Networks For Page Segmentation of Historical Document Images

This paper presents a page segmentation method for handwritten historical document images using a Convolutional Neural Network (CNN) that treats segmentation as a pixel labeling problem. The proposed method learns features directly from raw image pixels and demonstrates competitive results against traditional methods and deeper architectures on various public datasets. The study emphasizes the effectiveness of a simple CNN architecture with one convolution layer, achieving superior performance in segmenting complex layouts of historical documents.


2017 14th IAPR International Conference on Document Analysis and Recognition

Convolutional Neural Networks for Page Segmentation of Historical Document Images

Kai Chen∗ , Mathias Seuret∗ , Jean Hennebert∗† , and Rolf Ingold∗


∗ DIVA, University of Fribourg, Switzerland, Email: {firstname.lastname}@unifr.ch
† University of Applied Sciences, HES-SO//FR, Fribourg, Switzerland, Email: [email protected]

Abstract—This paper presents a page segmentation method for handwritten historical document images based on a Convolutional Neural Network (CNN). We consider page segmentation as a pixel labeling problem, i.e., each pixel is classified as one of the predefined classes. Traditional methods in this area rely on hand-crafted features carefully tuned considering prior knowledge. In contrast, we propose to learn features from raw image pixels using a CNN. While many researchers focus on developing deep CNN architectures to solve different problems, we train a simple CNN with only one convolution layer. We show that the simple architecture achieves competitive results against other deep architectures on different public datasets. Experiments also demonstrate the effectiveness and superiority of the proposed method compared to previous methods.

Keywords-convolutional neural network; page segmentation; layout analysis; historical document images; deep learning

I. INTRODUCTION

Page segmentation is an important prerequisite step of document image analysis and understanding. The goal is to split a document image into regions of interest. Compared to segmentation of machine printed document images, page segmentation of historical document images is more challenging due to many variations such as layout structure, decoration, writing style, and degradation. Our goal is to develop a generic segmentation method for handwritten historical documents. In this method, we consider the segmentation problem as a pixel-labeling problem, i.e., for a given document image, each pixel is labeled as one of the predefined classes.

Some page segmentation methods have been developed recently. These methods rely on hand-crafted features [1], [2], [3], [4] or prior knowledge [5], [6], [7], or models that combine hand-crafted features with domain knowledge [8], [9]. In contrast, in this paper, our goal is to develop a more general method which automatically learns features from the pixels of document images. Elements such as strokes of words, words in sentences, and sentences in paragraphs have a hierarchical structure from low to high levels. These patterns are repeated in different parts of the documents. Based on these properties, feature learning algorithms can be applied to learn layout information of the document images.

A Convolutional Neural Network (CNN) is a feed-forward artificial neural network which shares weights among neurons in the same layer. By enforcing a local connectivity pattern between neurons of adjacent layers, a CNN can discover spatial correlations at different granularities of local context [10]. With multiple convolutional and pooling layers, CNNs have achieved many successes in various fields, e.g., handwriting recognition [11], image classification [12], and text recognition in natural images [13].

In [14], the authors show that an autoencoder can be used to learn features automatically on the training images. An autoencoder is a feed-forward neural network trained to reconstruct its input. The outputs of the hidden layers are then used as features to feed an off-the-shelf classifier. In [15], the authors show that by using superpixels as units of labeling, the speed of the method is increased. In [16], a Conditional Random Field (CRF) [17] is applied in order to model the local and contextual information jointly to refine the segmentation results which have been achieved in [15]. Following the same idea as [16], we consider the segmentation problem as an image patch labeling problem. The image patches are generated by using a superpixel algorithm. In contrast to [14], [15], [16], in this work, we focus on developing an end-to-end method. We combine feature learning and classifier training into one step. Image patches are used as input to train a CNN for the labeling task. During training, the features used to predict labels of the image patches are learned on the convolution layers of the CNN.

While many researchers focus on developing very deep CNNs to solve various problems [12], [18], [19], we train a simple CNN with one convolution layer. Experiments on public historical document image datasets show that despite the simple structure and little tuning of hyperparameters, the proposed method achieves results comparable to other CNN architectures.

II. METHODOLOGY

In order to create a general page segmentation method without using any prior knowledge of the layout structure of the documents, we consider the page segmentation problem as a pixel labeling problem. We propose to use a CNN for the pixel labeling task. The main idea is to learn a set of feature detectors and train a nonlinear classifier on the features extracted by the feature detectors. With the set of feature detectors and the classifier, pixels on the unseen document images can be classified into different classes.
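To make the notion of a "feature detector" concrete, the following is a minimal sketch of one 3 × 3 convolution kernel followed by a ReLU, applied to a 28 × 28 grayscale patch with stride 1 and no padding. The helper names `relu` and `conv2d_valid`, the synthetic patch, and the hand-set kernel weights are assumptions for illustration only; in the paper the kernel weights are learned during training.

```python
def relu(v):
    """Rectified linear unit: max(0, v)."""
    return v if v > 0.0 else 0.0


def conv2d_valid(patch, kernel, bias=0.0):
    """Slide `kernel` over `patch` (stride 1, no padding), then apply ReLU.

    As is usual for CNNs, the kernel is not flipped (cross-correlation).
    """
    ph, pw = len(patch), len(patch[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ph - kh + 1):
        row = []
        for c in range(pw - kw + 1):
            acc = bias
            for i in range(kh):
                for j in range(kw):
                    acc += patch[r + i][c + j] * kernel[i][j]
            row.append(relu(acc))
        out.append(row)
    return out


# Hypothetical 28x28 patch: dark background with a bright vertical stroke
# in columns 12..14.
patch = [[1.0 if 12 <= c <= 14 else 0.0 for c in range(28)] for _ in range(28)]

# Hypothetical hand-set kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1.0, 0.0, 1.0],
          [-1.0, 0.0, 1.0],
          [-1.0, 0.0, 1.0]]

feature_map = conv2d_valid(patch, kernel)
print(len(feature_map), len(feature_map[0]))  # 26 26
```

The detector fires only on the left edge of the stroke, and the ReLU suppresses the negative response on the right edge. The 28 → 26 shrinkage matches the 26 × 26 feature maps of the architecture described in Section II-B.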

2379-2140/17 $31.00 © 2017 IEEE
DOI 10.1109/ICDAR.2017.161
A. Preprocessing
In order to speed up the pixel labeling process, we first apply a superpixel algorithm to a given document image. A superpixel is an image patch which contains pixels belonging to the same object. Instead of labeling all the pixels, we only label the center pixel of each superpixel, and the remaining pixels in that superpixel are assigned the same label. The superiority of the superpixel labeling approach over the pixel labeling approach for the page segmentation task has been demonstrated in [15]. The simple linear iterative clustering (SLIC) algorithm [20] is applied as a preprocessing step to generate superpixels for given document images.
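The labeling shortcut described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the superpixel map `segments` is taken as given (for example, as produced by a SLIC implementation), the center of a superpixel is approximated by its rounded centroid, and `classify_pixel` is a hypothetical stand-in for the trained network.

```python
def superpixel_centers(segments):
    """Return {segment_id: (row, col)} using the rounded centroid of each segment.

    Assumption: the centroid is used as the "center pixel". For strongly
    non-convex superpixels the centroid may fall outside the segment.
    """
    sums = {}
    for r, row in enumerate(segments):
        for c, seg in enumerate(row):
            sr, sc, n = sums.get(seg, (0, 0, 0))
            sums[seg] = (sr + r, sc + c, n + 1)
    return {seg: (round(sr / n), round(sc / n)) for seg, (sr, sc, n) in sums.items()}


def label_by_superpixel(segments, classify_pixel):
    """Classify one pixel per superpixel and copy its label to the whole segment."""
    centers = superpixel_centers(segments)
    seg_label = {seg: classify_pixel(r, c) for seg, (r, c) in centers.items()}
    return [[seg_label[seg] for seg in row] for row in segments]


# Toy 3x6 superpixel map with two segments (ids 0 and 1).
segments = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

calls = []


def classify_pixel(row, col):
    """Hypothetical classifier: records each call and labels by column."""
    calls.append((row, col))
    return "text" if col >= 3 else "page"


labels = label_by_superpixel(segments, classify_pixel)
print(len(calls), "classifier calls for", 3 * 6, "pixels")
```

The classifier is called once per superpixel rather than once per pixel, which is where the speed-up comes from.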
Figure 1: The architecture of the proposed CNN
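As a rough check of the architecture in Figure 1, the sketch below derives the layer sizes 28 × 28 × 1 − 26 × 26 × 4 − 100 − M from the stated kernel size and stride, counts trainable parameters, and applies the softmax output rule of Eqs. 1 and 2 to a made-up score vector. The function names, the choice M = 5, and the convention of one bias per kernel or neuron are assumptions for illustration; the paper does not list parameter counts.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1):
    """Side length of a feature map for a convolution without padding."""
    return (input_size - kernel_size) // stride + 1


def layer_summary(num_classes, patch=28, kernel=3, num_kernels=4, hidden=100):
    """Return (feature map side, flattened conv units, parameter counts)."""
    side = conv_output_size(patch, kernel)
    conv_units = side * side * num_kernels
    params = {
        # Assumption: one bias per kernel and per neuron.
        "conv": num_kernels * (kernel * kernel * 1 + 1),
        "fc": conv_units * hidden + hidden,
        "softmax": hidden * num_classes + num_classes,
    }
    return side, conv_units, params


def softmax(scores):
    """Eq. 1 applied to precomputed scores W_i x + b_i (shifted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Eq. 2: index of the class with the highest probability."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__), probs


side, conv_units, params = layer_summary(num_classes=5)  # M = 5 is illustrative
print(side, conv_units, params)

y_hat, probs = predict([1.0, 3.0, 0.5, 0.0, -1.0])  # made-up scores
print(y_hat)
```

Under these assumptions almost all parameters sit in the fully connected layer, since the convolution layer contributes only four small kernels.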
B. CNN Architecture
The architecture of our CNN is given in Figure 1. The structure can be summarized as 28 × 28 × 1 − 26 × 26 × 4 − 100 − M, where M is the number of classes. The input is a grayscale image patch of 28 × 28 pixels. Our CNN architecture contains only one convolution layer, which consists of 4 kernels. The size of each kernel is 3 × 3 pixels. Unlike traditional CNN architectures, ours does not use a pooling layer. One fully connected layer of 100 neurons follows the convolution layer. The last layer consists of a logistic regression with softmax which outputs an estimate of the probability of each class, such that

$$P(y = i \mid x, W_1, \dots, W_M, b_1, \dots, b_M) = \frac{e^{W_i x + b_i}}{\sum_{j=1}^{M} e^{W_j x + b_j}}, \qquad (1)$$

where $x$ is the output of the fully connected layer, $W_i$ and $b_i$ are the weights and biases of the $i$-th neuron in this layer, and $M$ is the number of classes. The predicted class $\hat{y}$ is the class which has the maximum probability, such that

$$\hat{y} = \arg\max_{i} P(y = i \mid x, W_1, \dots, W_M, b_1, \dots, b_M). \qquad (2)$$

In the convolution and fully connected layers of the CNN, Rectified Linear Units (ReLUs) [21] are used as neurons. A ReLU is given as $f(x) = \max(0, x)$, where $x$ is the input of the neuron.

C. Training

To train the CNN, for each superpixel, we generate a patch which is centred on that superpixel. The patch is considered as the input of the network. The size of each patch is 28 × 28 pixels. The label of each patch is its center pixel's label. The patches of the training images are used to train the network.

In the CNN, the stride length is 1 and the weights are initialized by using Xavier initialization [22]. The cost function is defined as the cross-entropy loss, such that

$$L(X, Y) = -\frac{1}{n} \sum_{i=1}^{n} \left( y^{(i)} \ln a(x^{(i)}) + (1 - y^{(i)}) \ln\left(1 - a(x^{(i)})\right) \right), \qquad (3)$$

where $X = \{x^{(1)}, \dots, x^{(n)}\}$ is the set of training image patches and $Y = \{y^{(1)}, \dots, y^{(n)}\}$ is the corresponding set of labels. The number of training image patches is $n$. For each $x^{(i)}$, $a(x^{(i)})$ is the output of the CNN as defined in Eq. 1. The CNN is trained with Stochastic Gradient Descent with the dropout [23] technique. The goal of dropout is to avoid overfitting by introducing random noise to training samples. During training, the outputs of the neurons are masked out with a probability of 0.5.

III. EXPERIMENT

Experiments are conducted on six public handwritten historical document image datasets.

A. Datasets

The datasets are of very different nature. The G. Washington dataset consists of pages written in English with ink on paper, and the images are in gray levels. The other two datasets, i.e., the Parzival and St. Gall datasets, consist of images of manuscripts written with ink on parchment, and the images are in color. The Parzival dataset consists of pages written by three writers in the 13th century. The St. Gall dataset contains pages from a medieval manuscript written in Latin. The details of the ground truth for both datasets are presented in [24].

Three new datasets with more complex layouts have been recently created [25]. The CB55 dataset consists of manuscripts from the 14th century which are written in Italian and Latin by one writer. The CSG18 and CSG863 datasets consist of manuscripts from the 11th century which are written in Latin. The number of writers of both datasets is not specified. The details of the three datasets are presented in [25].

In the experiments, all images are scaled down with a scaling factor of $2^{-3}$. Table I gives the details of training, test, and validation sets of the six datasets.
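For orientation, the effect of the scaling factor 2⁻³ on the image sizes listed in Table I can be worked out directly. This is a small arithmetic sketch; the names `SCALE`, `ORIGINAL_SIZES`, and `scaled_size` are illustrative. The paper does not state a rounding rule, but none is needed here because every listed dimension is divisible by 8.

```python
SCALE = 2 ** -3  # the scaling factor used in the experiments (0.125)

# Image sizes in pixels, in the order they are listed in Table I.
ORIGINAL_SIZES = {
    "G. Washington": (2200, 3400),
    "St. Gall": (1664, 2496),
    "Parzival": (2000, 3008),
    "CB55": (4872, 6496),
    "CSG18": (3328, 4992),
    "CSG863": (3328, 4992),
}


def scaled_size(a, b, scale=SCALE):
    """Scale both dimensions; all Table I sizes are exact multiples of 8."""
    return int(a * scale), int(b * scale)


scaled = {name: scaled_size(*size) for name, size in ORIGINAL_SIZES.items()}
for name, size in scaled.items():
    print(f"{name}: {size[0]} x {size[1]}")

# A 28x28 input patch on the scaled image spans this many original pixels per side.
patch_footprint = int(28 / SCALE)
print(patch_footprint)
```

After scaling, each 28 × 28 input patch corresponds to a 224 × 224 region of the original scan, which gives a sense of how much context the network sees around each labeled pixel.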

Table I: Details of training, test, and validation sets. TR, TE, and VA denote the training, test, and validation sets respectively.

                 image size (pixels)   |TR|  |TE|  |VA|
G. Washington    2200 × 3400           10    5     4
St. Gall         1664 × 2496           20    30    10
Parzival         2000 × 3008           20    13    2
CB55             4872 × 6496           20    10    10
CSG18            3328 × 4992           20    10    10
CSG863           3328 × 4992           20    10    10

B. Metrics

The most used metrics for page segmentation of historical document images are precision, recall, and pixel level accuracy. Besides these standard metrics, we also adopt metrics which are well defined and have been widely used for common semantic segmentation and scene parsing evaluations to evaluate different page segmentation methods. These metrics have been proposed in [26]. They are based on pixel accuracy and region intersection over union (IU). Consequently, the metrics used in the experiments are: pixel accuracy, mean pixel accuracy, mean IU, and frequency weighted IU (f.w. IU).

In order to obtain the metrics, we define the variables:
• $n_c$: the number of classes.
• $n_{ij}$: the number of pixels of class $i$ predicted to belong to class $j$. For class $i$:
  – $n_{ii}$: the number of correctly classified pixels (true positives).
  – $n_{ij}$, $j \neq i$: the number of pixels of class $i$ wrongly assigned to another class (false negatives).
  – $n_{ji}$, $j \neq i$: the number of pixels of other classes wrongly assigned to class $i$ (false positives).
• $t_i$: the total number of pixels in class $i$, such that

$$t_i = \sum_{j} n_{ij}. \qquad (4)$$

With the defined variables, we can compute:
• pixel accuracy:

$$acc = \frac{\sum_i n_{ii}}{\sum_i t_i}. \qquad (5)$$

• mean accuracy:

$$acc_{mean} = \frac{1}{n_c} \times \sum_i \frac{n_{ii}}{t_i}. \qquad (6)$$

• mean IU:

$$iu_{mean} = \frac{1}{n_c} \times \sum_i \frac{n_{ii}}{t_i + \sum_j n_{ji} - n_{ii}}. \qquad (7)$$

• f.w. IU:

$$iu_{weighted} = \frac{1}{\sum_k t_k} \times \sum_i \frac{t_i \times n_{ii}}{t_i + \sum_j n_{ji} - n_{ii}}. \qquad (8)$$

C. Evaluation

We compare the proposed method to the previous methods [15], [16]. Similar to the proposed method, superpixels are considered as the basic units of labeling. In [15], the features are learned on randomly selected grayscale image patches with a stacked convolutional autoencoder in an unsupervised manner. Then the features and the labels of the superpixels are used to train a classifier. With the trained classifier, superpixels are classified into different classes. In [16], a Conditional Random Field (CRF) is applied in order to model the local and contextual information jointly for the superpixel labeling task. The trained classifier in [15] is considered as the local classifier in [16]. Then the local classifier is used to train a contextual classifier which takes the output of the local classifier as input and outputs the scores of given labels. With the local and contextual classifiers, a CRF is trained to label the superpixels of a given image. In the experiments, we use a multilayer perceptron (MLP) as the local classifier in [15], [16] and an MLP as the contextual classifier in [16]. The Simple Linear Iterative Clustering (SLIC) algorithm [20] is applied to generate the superpixels. The superiority of SLIC over other superpixel algorithms is demonstrated in [15]. In the experiments, 3000 superpixels are generated for each image.

Table II reports the pixel accuracy, mean pixel accuracy, mean IU, and f.w. IU of the three methods. It is shown that the proposed CNN outperforms the previous methods. Figure 2 gives the segmentation results of the three methods. We can see that visually the CNN achieves more accurate segmentation results compared to the other methods.

D. Max Pooling

Pooling is a widely used technique in CNNs. Max pooling is the most common type of pooling; it is applied in order to reduce the spatial size of the representation and thereby the number of parameters of the network. In order to show the impact of max pooling on the segmentation task, we add a max pooling layer after the convolution layer. The pooling size is 2 × 2 pixels. Table II reports the performance of the CNN with a max pooling layer. We can see that only on the CB55 dataset are the mean pixel accuracy and mean IU slightly improved with max pooling. In general, adding a max pooling layer does not improve the performance of the segmentation task. Figure 3 reports the f.w. IU of the CNN with different max pooling sizes. We define the max pooling size as m × m, such that m ∈ {2 × n | n ∈ N, 0 ≤ n ≤ 13}. We can see that increasing the pooling size decreases the performance. The reason is that for some computer vision problems, e.g., object recognition and text extraction in natural images, the exact location of a feature is less important than its rough location relative to other features. However, for a given document image, to label a pixel in the center of a patch, it is not sufficient to know if there is text somewhere in that patch, the location of the text is

Table II: Performance (in percentage) of superpixel labeling with only local MLP, CRF, and the proposed CNN.

                     G. Washington             Parzival                  St. Gall
                     pixel mean  mean  f.w.    pixel mean  mean  f.w.    pixel mean  mean  f.w.
                     acc.  acc.  IU    IU      acc.  acc.  IU    IU      acc.  acc.  IU    IU
Local MLP [15]       87    89    75    83      91    64    58    86      95    89    84    92
CRF [16]             91    90    76    85      93    70    63    88      97    88    84    94
CNN                  91    91    77    86      94    75    68    89      98    90    87    96
CNN (max pooling)    91    90    77    86      94    75    68    89      98    90    87    96

                     CB55                      CSG18                     CSG863
                     pixel mean  mean  f.w.    pixel mean  mean  f.w.    pixel mean  mean  f.w.
                     acc.  acc.  IU    IU      acc.  acc.  IU    IU      acc.  acc.  IU    IU
Local MLP [15]       83    53    42    72      83    49    39    73      84    54    42    74
CRF [16]             84    53    42    75      86    47    37    77      86    51    42    78
CNN                  86    59    47    77      87    53    41    79      87    58    45    79
CNN (max pooling)    86    60    48    77      87    53    42    80      87    57    45    79
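The four metrics reported in Table II can be computed from a class confusion matrix. The sketch below follows the definitions of [26]: `conf[i][j]` counts pixels of class i predicted as class j, so the class total t_i is a row sum and the number of pixels predicted as class i is a column sum. The function name and the toy matrix are illustrative, and the sketch assumes every class occurs at least once in the ground truth.

```python
def segmentation_metrics(conf):
    """Pixel accuracy, mean accuracy, mean IU and f.w. IU from a confusion matrix.

    conf[i][j] = number of pixels of class i predicted to belong to class j.
    Assumes every class has at least one ground-truth pixel (t_i > 0).
    """
    n_c = len(conf)
    t = [sum(conf[i]) for i in range(n_c)]                                 # t_i
    predicted = [sum(conf[j][i] for j in range(n_c)) for i in range(n_c)]  # sum_j n_ji
    correct = [conf[i][i] for i in range(n_c)]                             # n_ii
    total = sum(t)
    iu = [correct[i] / (t[i] + predicted[i] - correct[i]) for i in range(n_c)]
    return {
        "pixel_acc": sum(correct) / total,
        "mean_acc": sum(correct[i] / t[i] for i in range(n_c)) / n_c,
        "mean_iu": sum(iu) / n_c,
        "fw_iu": sum(t[i] * iu[i] for i in range(n_c)) / total,
    }


# Toy two-class example (made-up counts, not data from the paper).
toy = [
    [8, 2],  # class 0: 8 pixels correct, 2 predicted as class 1
    [1, 9],  # class 1: 1 pixel predicted as class 0, 9 correct
]
metrics = segmentation_metrics(toy)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}")
```

Mean accuracy and mean IU weight all classes equally, which is why they fall well below pixel accuracy on datasets where rare classes such as decoration or comments are hard to label.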

Figure 2: Segmentation results on the Parzival, CB55, and CSG863 datasets from top to bottom respectively. The colors: black, white,
blue, red, and pink are used to represent: periphery, page, text, decoration, and comment respectively. The columns from left to right are:
input, ground truth, and segmentation results of the local MLP, CRF, and CNN respectively.

needed. Therefore, the exact location of a feature is helpful for the page segmentation task.

E. Number of Kernels

In order to show the impact of the number of kernels of the convolution layer on the segmentation task, we define the number of kernels as K. In the experiments, we set K ∈ {1, 2, 4, 6, 8, 10, 12, 14}. Figure 4 reports the f.w. IU of the one convolution layer CNN with different numbers of kernels. We can see that except on the CSG18 dataset, when K ≥ 4 the performance is not improved.

F. Number of Layers

In order to show the impact of the number of convolutional layers on the page segmentation task, we incrementally add convolutional layers, such that there are two more kernels on the current layer than on the previous layer. Figure 5 reports the f.w. IU of the CNN with different

number of convolution layers. It is shown that the number of layers does not affect the performance of the segmentation task. However, on the G. Washington dataset, with more layers, the performance is degraded slightly. The reason is that compared to other datasets, the G. Washington dataset has fewer training images. Furthermore, the layouts of the pages in the G. Washington dataset are more varied.

Figure 3: f.w. IU of the CNN on different max pooling sizes.

Figure 4: f.w. IU of the CNN on different numbers of kernels.

Figure 5: f.w. IU of the CNN on different numbers of conv layers.

G. Number of Training Images

In order to show the performance under different amounts of training images, for each dataset we choose N images in the training set to train the CNN. For each experiment, the number of batches is set to 5000. Figure 6 reports the f.w. IU under different values of N, such that N ∈ {1, 2, 4, 8, 10, 12, 14, 16, 18, 20}¹. We can see that in general, when N > 2, the performance is not improved. However, on the G. Washington dataset, with more training images, the performance is degraded slightly. The reason is that compared to the other datasets, on the G. Washington dataset the pages are more varied and the ground truth is less consistent.

¹ In the G. Washington dataset, there are 10 training images. Therefore, N ∈ {1, 2, 4, 8, 10}.

Figure 6: f.w. IU of the CNN on different numbers of training images.

H. Run Time

The proposed CNN is implemented with the Python library Theano [27]. The experiments are performed on a PC with an Intel Core i7-3770 3.4 GHz processor and 16 GB RAM. On average, for each image, the CNN takes about 1 second processing time. The superpixel labeling method [15] and CRF model [16] take about 2 and 5 seconds respectively.

IV. CONCLUSION

In this paper, we have proposed a convolutional neural network (CNN) for page segmentation of handwritten historical document images. In contrast to traditional page segmentation methods which rely on off-the-shelf classifiers trained with hand-crafted features, the proposed method learns features directly from image patches. Furthermore, feature learning and classifier training are combined into one step. Experiments on public datasets show the superiority of the proposed method over the previous methods. While many researchers focus on applying very deep CNN architectures for different tasks, we show that with the simple one convolution layer CNN, we achieve performance comparable to other network architectures.

ACKNOWLEDGMENT

This work is supported by the Swiss National Science Foundation project HisDoc 2.0 with the grant number: 205120 150173 and National Natural Science Foundation of China with the grant numbers: 61202257 and 61650110512.

REFERENCES

[1] C. Grana, D. Borghesani, and R. Cucchiara, “Automatic segmentation of digitalized historical manuscripts,” Multimedia Tools and Applications, vol. 55, no. 3, pp. 483–506, 2011.

[2] S. S. Bukhari, T. M. Breuel, A. Asi, and J. El-Sana, “Layout analysis for arabic historical document images using machine learning,” in Frontiers in Handwriting Recognition (ICFHR), 2012 International Conference on. IEEE, 2012, pp. 639–644.

[3] K. Chen, H. Wei, M. Liwicki, J. Hennebert, and R. Ingold, “Robust text line segmentation for historical manuscript images using color and texture,” in 2014 22nd International Conference on Pattern Recognition (ICPR). IEEE, 2014, pp. 2978–2983.

[4] K. Chen, H. Wei, J. Hennebert, R. Ingold, and M. Liwicki, “Page segmentation for historical handwritten document images using color and texture features,” in Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on. IEEE, 2014, pp. 488–493.

[5] M. Bulacu, R. van Koert, L. Schomaker, and T. van der Zant, “Layout analysis of handwritten historical documents for searching the archive of the cabinet of the dutch queen,” in Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 1. IEEE, 2007, pp. 357–361.

[6] C. Panichkriangkrai, L. Li, and K. Hachimura, “Character segmentation and retrieval for learning support system of japanese historical books,” in Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing. ACM, 2013, pp. 118–122.

[7] B. Gatos, G. Louloudis, and N. Stamatopoulos, “Segmentation of historical handwritten documents into text zones and text lines,” in Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on. IEEE, 2014, pp. 464–469.

[8] R. Cohen, A. Asi, K. Kedem, J. El-Sana, and I. Dinstein, “Robust text and drawing segmentation algorithm for historical documents,” in Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing. ACM, 2013, pp. 110–117.

[9] A. Asi, R. Cohen, K. Kedem, J. El-Sana, and I. Dinstein, “A coarse-to-fine approach for layout analysis of ancient manuscripts,” in Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on. IEEE, 2014, pp. 140–145.

[10] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[11] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural computation, vol. 1, no. 4, pp. 541–551, 1989.

[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.

[13] T. Wang, D. J. Wu, A. Coates, and A. Y. Ng, “End-to-end text recognition with convolutional neural networks,” in Pattern Recognition (ICPR), 2012 21st International Conference on. IEEE, 2012, pp. 3304–3308.

[14] K. Chen, M. Seuret, M. Liwicki, J. Hennebert, and R. Ingold, “Page segmentation of historical document images with convolutional autoencoders,” in Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. IEEE, 2015, pp. 1011–1015.

[15] K. Chen, C.-L. Liu, M. Seuret, M. Liwicki, J. Hennebert, and R. Ingold, “Page segmentation for historical document images based on superpixel classification with unsupervised feature learning,” in Document Analysis System (DAS), 2016 12th IAPR International Workshop on. IEEE, 2016, pp. 299–304.

[16] K. Chen, M. Seuret, M. Liwicki, J. Hennebert, C.-L. Liu, and R. Ingold, “Page segmentation for historical handwritten document images using conditional random fields,” in Frontiers in Handwriting Recognition (ICFHR), 2016 15th International Conference on. IEEE, 2016, pp. 90–95.

[17] J. Lafferty, A. McCallum, and F. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” in Proceedings of the eighteenth international conference on machine learning, ICML, vol. 1, 2001, pp. 282–289.

[18] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in European conference on computer vision. Springer, 2014, pp. 818–833.

[19] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[20] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, “Slic superpixels compared to state-of-the-art superpixel methods,” IEEE transactions on pattern analysis and machine intelligence, vol. 34, no. 11, pp. 2274–2282, 2012.

[21] V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in Proceedings of the 27th international conference on machine learning (ICML-10), 2010, pp. 807–814.

[22] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 249–256.

[23] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting.” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.

[24] K. Chen, M. Seuret, H. Wei, M. Liwicki, J. Hennebert, and R. Ingold, “Ground truth model, tool, and dataset for layout analysis of historical documents,” in IS&T/SPIE Electronic Imaging. International Society for Optics and Photonics, 2015, pp. 940 204–940 204.

[25] F. Simistira, M. Seuret, N. Eichenberger, A. Garz, M. Liwicki, and R. Ingold, “Diva-hisdb: A precisely annotated large dataset of challenging medieval manuscripts,” in Frontiers in Handwriting Recognition (ICFHR), 2016 15th International Conference on. IEEE, 2016, pp. 471–476.

[26] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[27] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio, “Theano: a cpu and gpu math expression compiler,” in Proceedings of the Python for Scientific Computing Conference (SciPy), 2010.
