To automatically perform the dataset creation without sacrificing data quality, one needs to overcome several challenges. Comic panel extraction is a difficult task, as the layouts and drawing styles of comics vary significantly. Evaluation of the existing segmentation methods, for example using the intersection-over-union score, is needed to find one that performs consistently on comic illustrations. Transcribing comics using OCR is also challenging, as current OCR models struggle with varying hand-writing styles and backgrounds [11]. This research attempts to improve the OCR accuracy by pre-processing the input using image binarization with adaptive thresholding [12] and up-scaling. Another difficulty is the correct ordering of the words output by the OCR. By default, the output is ordered in a line-by-line manner, but for comics the panel-by-panel and bubble-by-bubble orderings also have to be considered. Text extraction is performed at the panel level and optionally concatenated to obtain the full comic transcription. Hierarchical clustering [13] is applied to group all bounding boxes belonging to the same speech bubble and determine the right ordering.

Section II covers the related work and presents the current state-of-the-art in the domain. Section III describes the methodology behind this paper's contributions: it introduces the design of the pipeline, describes in detail the created panel extraction algorithm, and presents the solutions developed to improve the performance of the OCR. Section IV gives insight into the conducted experiments and their results, aiming to give a detailed evaluation of each of the steps of the DCP. Section V discusses the results and gives ideas for future work on the topic, and Section VI reflects on the ethical implications and reproducibility of the findings.

Web scraping makes it possible to retrieve and process the contents of an arbitrary static website, such as the comic strip websites [15]. Panel extraction has been studied widely, mostly in the context of applying it to mobile comic viewing apps [16]. Some solutions use convolutional neural networks (CNN) to train an object detection model that can find panel positions in a comic [17]. However, this approach would most likely require additional training to perform well on unseen comic series; therefore it is not easily applicable to this paper's problem. Other methods utilize image processing techniques, such as mathematical morphology and region growing, for finding the background and extracting the panels [18, 19, 20]. Those solutions are better suited for our use case, and they form a basis for the developed panel extraction algorithm.

Current text recognition solutions have a very high accuracy at identifying and extracting text but still have some limitations, including dealing with colored backgrounds, small fonts, and handwritten text [11]. Unfortunately, those are all present in comics — comic strips are characterized by high variability of fonts and styles, often along with complex backgrounds, noise, and low resolution. To overcome those difficulties, researchers propose domain-specific training of OCR models [21] and output post-processing, such as text validation and correction [22]. The reading order of a comic is not straightforward either: comics are read panel after panel, bubble after bubble, unlike most documents, where top-to-bottom, left-to-right ordering is sufficient. Overall, comics form a challenge for OCR, and current engines cannot deal with it out of the box.

III. DCP: DATASET CONSTRUCTION PIPELINE
The downloaded comics are then passed to the panel extraction stage, where each full comic is divided into panels, and the panels are saved to disk. Finally, to perform automated text transcription, the illustrations are fed into the OCR stage. First, pre-processing is applied to the images to make the text easier to detect, then the illustrations are sent to the OCR engine, and lastly, the OCR output is post-processed to reduce the error. The output of the system consists of the individual panels, along with their transcriptions, or optionally, the original, full comics with their transcriptions obtained via concatenating the individual panels' transcriptions.

B. Web scraping

The scraping stage consists of three steps:
1) For each comic website, list the URLs of all the comic images.
2) For each URL, send an HTTP GET request.
3) For each response, extract the image and save it to disk.
Performing these operations on a single thread does not scale well: for each URL, the process would have to wait for the server's response before it can send the next request. Therefore, multi-threading is utilized to improve efficiency — several threads send requests and deal with the responses in parallel, achieving a significant speedup.

There exists extensive library support for web scraping in most popular programming languages. For instance, the process can be implemented easily using Python's requests and BeautifulSoup libraries. The solution for this task can be re-used for an arbitrary comic website — the only additional work when introducing a new comic source is getting the list of all the image URLs.
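Such a scraper can be sketched in a few lines of Python. The snippet below follows the three steps listed above and the multi-threaded design described here; the archive URL and the image selector are hypothetical placeholders, as every comic site structures its pages differently.

from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import requests
from bs4 import BeautifulSoup

ARCHIVE_URL = "https://example.com/comics/archive"  # hypothetical archive page

def list_image_urls(archive_url):
    """Step 1: list the URLs of all the comic images on an archive page."""
    html = requests.get(archive_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # The CSS class is a placeholder; real sites need site-specific selectors,
    # and absolute image URLs are assumed here.
    return [img["src"] for img in soup.find_all("img", class_="comic")]

def download(url, out_dir=Path("comics")):
    """Steps 2-3: send an HTTP GET request and save the image to disk."""
    out_dir.mkdir(parents=True, exist_ok=True)
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    (out_dir / url.rsplit("/", 1)[-1]).write_bytes(response.content)

if __name__ == "__main__":
    urls = list_image_urls(ARCHIVE_URL)
    # Ten worker threads issue requests in parallel, so the scraper is not
    # blocked waiting for each server response before sending the next request.
    with ThreadPoolExecutor(max_workers=10) as pool:
        pool.map(download, urls)

Introducing a new comic source then only requires swapping out list_image_urls, mirroring the re-use property described above.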
C. Panel extraction

The panel extraction process leverages the presence of frames around comic panels to perform the segmentation. An overview of the process is presented in Fig. 3.

In the first two steps, the image is converted to grayscale and binarized using adaptive Gaussian thresholding, as visualized in Fig. 3a-b. Adaptive thresholding establishes the threshold value separately for each pixel based on its neighborhood, resulting in less noise than global thresholding and preserving the contours and features in the image. Usually, this results in an image that is easier to analyze [23]; see Fig. 4.

Fig. 4: Original image vs. global threshold vs. adaptive threshold [23]

After binarization, contours are identified in the image. Only the outermost, top-level contours are interesting for this task, as those form the candidates for the panel bounding boxes (see Fig. 3c-d). After identifying the outermost contours, they are filled in with white color, but the resulting image still has some noise; see for example the "WWW.PHDCOMICS.COM" text at the bottom of Fig. 3e. A morphological opening operation is applied to remove the noise. It is a combination of erosion and dilation operations, allowing for the removal of the small elements of the image while preserving the shape and size of the larger ones.

The resulting image, as presented in Fig. 3f, has no more noise. The contours present in that image determine the positions of the final bounding boxes; see Fig. 3g. Once the algorithm identifies the positions of all the panel frames, the original image is divided into separate illustrations by extracting the areas surrounded by the bounding boxes and saving them as new images.
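The following condensed OpenCV sketch shows how the described steps compose; the kernel size and minimum-area filter are illustrative assumptions rather than the tuned values used in the experiments.

import cv2

def extract_panels(image_path, min_area=10_000):
    """Sketch of the frame-based panel extraction: threshold, find outer
    contours, remove noise with a morphological opening, then crop."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Adaptive Gaussian thresholding: a per-pixel threshold computed from
    # the pixel's neighborhood (Fig. 3a-b).
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 11, 2)
    # Only the outermost, top-level contours are panel-frame candidates.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Fill the contours, then apply an opening (erosion followed by dilation)
    # to drop small noise while keeping the larger panel shapes (Fig. 3e-f).
    mask = cv2.drawContours(binary * 0, contours, -1, 255, cv2.FILLED)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))
    opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    # Bounding boxes of the cleaned contours give the final panels (Fig. 3g).
    panels = []
    clean, _ = cv2.findContours(opened, cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)
    for contour in clean:
        x, y, w, h = cv2.boundingRect(contour)
        if w * h >= min_area:  # skip residual specks
            panels.append(image[y:y + h, x:x + w])
    return panels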
Fig. 5: Example of bounding-box clustering: (a) all bounding boxes, (b) identified clusters. Illustration taken from https://dilbert.com.

Fig. 6: Example of text removal from an illustration: (a) initial illustration, (b) before denoising, (c) final image. Illustration taken from https://dilbert.com.
D. Text extraction

After the comic is segmented into individual illustrations, text extraction has to be performed on each illustration. That can be achieved using optical character recognition — the conversion of an image representation containing text into plain text strings.
1) Engines: There is a wide selection of OCR software on the market, so it is not feasible to test all of it; instead, the two most popular engines are selected:
• Tesseract [24] — the leading open-source OCR engine.
• Google's Vision API OCR [25] — the state-of-the-art commercial OCR API.
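For reference, both engines can be driven from Python through their common client libraries, pytesseract and google-cloud-vision. The sketch below only illustrates the interfaces; it is not the evaluation harness used in the paper.

import pytesseract
from google.cloud import vision

def ocr_tesseract(image_path):
    """Plain-text OCR with the open-source Tesseract engine [24]."""
    return pytesseract.image_to_string(image_path)

def ocr_vision_api(image_path):
    """Plain-text OCR with Google's Vision API [25]; requires cloud credentials."""
    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.text_detection(image=image)
    return response.full_text_annotation.text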
2) Pre-processing: Pre-processing techniques, such as up-scaling and binarization, can be applied to images to improve the performance of OCR [26].

a) Upscaling: Character height is considered a key factor for OCR output quality; for optimal performance, it should be between 20 and 40 pixels. Unfortunately, the character heights in the comic datasets are often much smaller, so an up-scaling step is needed. To determine the re-scaling factor, one has to find the character height in the original image. For this purpose, an initial OCR pass is performed on the unprocessed image, and the character height is defined as the median of the heights of the bounding boxes returned by the engine. Then, images are re-scaled using cubic interpolation by a factor of (desired letter height) / (initial letter height), where a common value for the desired letter height is 30 pixels.
b) Binarization: The images are then binarized using adaptive thresholding [12] and fed into the final OCR pass, potentially resulting in better accuracy than before pre-processing.
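A minimal sketch of the two pre-processing steps, assuming pytesseract for the initial pass; apart from the 30-pixel target taken from the text above, the function names and parameters are illustrative.

import cv2
import pytesseract

DESIRED_LETTER_HEIGHT = 30  # pixels, the common target named above

def upscale(panel):
    """Estimate the median character height with an initial OCR pass,
    then re-scale with cubic interpolation towards ~30 px letters."""
    data = pytesseract.image_to_data(panel, output_type=pytesseract.Output.DICT)
    heights = sorted(h for h, word in zip(data["height"], data["text"])
                     if word.strip())
    if not heights:
        return panel  # no text found; leave the panel untouched
    factor = DESIRED_LETTER_HEIGHT / heights[len(heights) // 2]
    return cv2.resize(panel, None, fx=factor, fy=factor,
                      interpolation=cv2.INTER_CUBIC)

def binarize(panel):
    """Adaptive thresholding [12] before the final OCR pass."""
    gray = cv2.cvtColor(panel, cv2.COLOR_BGR2GRAY)
    return cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)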
3) Post-processing:

a) Clustering: The OCR output consists of detected words along with their bounding boxes, see Fig. 5a. The boxes are initially ordered top-to-bottom, left-to-right, as on a standard printed text page. However, this does not work for comics; for example, for Fig. 5a, this would result in the following output:

"climate change is caused by that's gravity. right!"

The source of this issue is the lack of information about the composition of the comic bubbles, which is crucial for determining the correct order of words in comic dialogues. Comics are read bubble by bubble, rather than simply line by line. To correct the output, the bounding boxes need to be grouped into clusters corresponding to bubbles, as presented in Fig. 5b. We calculate the bubbles' centers and use them to sort the bubbles by their centers' x and y positions. We can then read the text individually for each bubble and concatenate the results to obtain the corrected transcription:

Bubble 1: "climate change is caused by gravity"
Bubble 2: "that's right!"
Concatenated: "climate change is caused by gravity. that's right!"

The bubble grouping can be performed using agglomerative hierarchical clustering [13, 27]. Initially, each bounding box starts alone, in a singleton cluster. Then clusters are merged until all the pairwise distances between clusters are higher than a certain threshold. To determine the distance between two clusters, a single-linkage approach is used — the distance between clusters is the minimal distance between a pair consisting of elements of those two clusters (one from each). The distance between two bounding boxes is defined as the sum of the minimum spacings between their edges in the x and y directions, see Fig. 7. Additionally, if there is overlap in an axis, the spacing in that axis is set to 0. The distance threshold for clusters can be determined based on the calculated letter height — the spacing between two lines of text within one bubble would rarely be larger than the height of one letter.

Fig. 7: x and y spacings between two bounding boxes.
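A compact sketch of the grouping step using SciPy's single-linkage clustering; the (x0, y0, x1, y1) box format and the helper names are assumptions, and the within-bubble word order keeps the engine's line order.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def box_distance(a, b):
    """Sum of the minimum x and y spacings between two (x0, y0, x1, y1)
    boxes; a spacing is 0 when the boxes overlap in that axis (Fig. 7)."""
    dx = max(0.0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0.0, max(a[1], b[1]) - min(a[3], b[3]))
    return dx + dy

def transcribe_by_bubble(words, boxes, letter_height):
    """Group word boxes into bubbles with single-linkage clustering, cut at
    one letter height, then read the bubbles top-to-bottom, left-to-right."""
    boxes = np.asarray(boxes, dtype=float)
    if len(boxes) < 2:
        return " ".join(words)
    labels = fcluster(linkage(pdist(boxes, metric=box_distance), method="single"),
                      t=letter_height, criterion="distance")
    bubbles = []
    for label in np.unique(labels):
        idx = np.flatnonzero(labels == label)
        centers = (boxes[idx, :2] + boxes[idx, 2:]) / 2
        bubbles.append((centers[:, 1].mean(),   # bubble center y
                        centers[:, 0].mean(),   # bubble center x
                        " ".join(words[i] for i in idx)))
    # Sort bubbles by their centers' y, then x, and concatenate the texts.
    return " ".join(text for _, _, text in sorted(bubbles))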
b) Autocorrect: Single-character mistakes are very common in the OCR output — often the majority of the characters are detected correctly, but some letters are classified as a different character than they actually are. In such cases, the output can be improved with a dictionary-based autocorrect, which replaces unrecognized words with close dictionary matches [28, 29].
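As an illustration, a minimal frequency-based corrector in the spirit of Norvig's spelling corrector [29]; the corpus file is a hypothetical stand-in, and libraries such as TextBlob [28] package the same idea.

import re
from collections import Counter

# Hypothetical word-frequency source; the paper does not prescribe one.
WORDS = Counter(re.findall(r"[a-z]+", open("corpus.txt").read().lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def one_edit_away(word):
    """All strings reachable by one deletion, substitution, or insertion."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    substitutions = {l + c + r[1:] for l, r in splits if r for c in LETTERS}
    inserts = {l + c + r for l, r in splits for c in LETTERS}
    return deletes | substitutions | inserts

def autocorrect(word):
    """Keep known words; otherwise pick the most frequent 1-edit neighbor.
    Out-of-dictionary names like "Dilbert" or "Woo" are exactly the inputs
    such a scheme mis-corrects, as the evaluation later shows."""
    if word in WORDS:
        return word
    known = [w for w in one_edit_away(word) if w in WORDS]
    return max(known, key=WORDS.get) if known else word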
IV. EXPERIMENTS

TABLE I: Web scraping and panel extraction evaluation.

(a) Web scraping: average time to scrape one panel (in seconds), and the speed-up factor achieved by the use of parallelization. Time estimates based on scraping 1000 Dilbert, 1000 PHD Comics, and 1000 Garfield images from the web.

Result       Dilbert   PHD Comics   Garfield   Average
1 thread     2.72      1.46         0.43       1.54
10 threads   0.18      0.22         0.046      0.149
speed-up     15.1      6.6          9.3        10.33

(b) Comparison of panel extraction performance between our method (DCP) and Kumiko [20]: single-panel and full-strip success rates, intersection-over-union scores, and time efficiency. Tested on 300 comic strips with a total of 1118 panels.

                   Dilbert            PHD Comics         Garfield
Metric             Kumiko   DCP       Kumiko   DCP       Kumiko   DCP
Panel succ. rate   97%      100%      92%      99%       91%      91%
Strip succ. rate   95%      100%      78%      96%       73%      72%
Average IoU        0.99     0.99      0.96     0.98      0.97     0.95
Time per comic     680 ms   2.3 ms    400 ms   1.1 ms    890 ms   1.2 ms
The web scraper is evaluated by downloading comic strips from dilbert.com, phdcomics.com, and pt.jikos.cz/garfield. Over 14000 Garfield, 12000 Dilbert, and 2100 PHD Comics strips are downloaded. Multi-threaded scraping is significantly faster than single-threaded, with speed-up factors of 15.1 for Dilbert, 6.6 for PHD Comics, and 9.3 for Garfield, see Table Ia. To give a better idea of the scale, scraping all 12000 Dilbert comics would take approximately 9 hours with a single thread, but only about 35 minutes with 10 threads.

Overall, the proposed panel extraction algorithm achieves almost perfect results on Dilbert and PHD Comics — leveraging the presence of the frames enables outperforming Kumiko. The performance on Garfield¹ is noticeably worse, as no frames are present for some of the panels, making it harder to find the panel boundaries — see Fig. 9b for an example.

¹ Panel extraction for Garfield is performed using global threshold binarization rather than adaptive binarization, as it performed better when no frames are present.
Fig. 9: Examples of panel extraction results — the detected panels are represented by the green areas. (Comic strips from Garfield [1])
(a) Correct segmentation example: all three panels are detected correctly.
(b) Incorrect segmentation example: the middle panel is not detected correctly, as there is no clear border around it.
When it comes to efficiency, our algorithm is significantly faster than Kumiko, making it better suited for processing large datasets. It also has a significant advantage over deep-learning-based panel detection techniques: no dataset-specific training is needed, so the method can be directly applied to any other comic.
D. Text extraction
Evaluation of text extraction is performed by comparing ground-truth, manual transcriptions of comic strips with the output of the automated transcription using OCR. The evaluation is conducted on the Garfield, Dilbert, and PHD Comics datasets, containing 500, 500, and 100 annotated comic strips respectively. The Garfield and Dilbert transcriptions are available online in Alfred Arnold's transcription archive [31], and the PHD Comics annotations are obtained via manual transcription.
The Levenshtein distance, also known as the edit distance, is used as the primary metric for text extraction evaluation. Given two strings, the Levenshtein distance is defined as the minimal number of single-character edits (insertions, deletions, substitutions) needed to change one of the strings into the other:

L_dist(s_t, s_d) = C_ins(s_t, s_d) + C_del(s_t, s_d) + C_sub(s_t, s_d)    (2)

where s_t is the ground-truth string, and s_d is the detected string. The distance can be normalized by dividing by the length of the ground-truth string:

L_dist_norm(s_t, s_d) = min(1, L_dist(s_t, s_d) / |s_t|)    (3)
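A direct dynamic-programming rendering of Eqs. (2) and (3), included as a sketch of how the metric can be computed; it assumes a non-empty ground-truth string.

def levenshtein(s_t, s_d):
    """Minimal number of single-character insertions, deletions, and
    substitutions turning s_t into s_d (Eq. 2), via dynamic programming."""
    prev = list(range(len(s_d) + 1))
    for i, c_t in enumerate(s_t, start=1):
        curr = [i]
        for j, c_d in enumerate(s_d, start=1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (c_t != c_d)))  # substitution
        prev = curr
    return prev[-1]

def normalized_levenshtein(s_t, s_d):
    """Eq. 3: the distance divided by the ground-truth length, capped at 1.
    Strings are lowercased first, for the case-insensitive evaluation
    described below."""
    s_t, s_d = s_t.lower(), s_d.lower()
    return min(1.0, levenshtein(s_t, s_d) / len(s_t))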
Given that comic text is almost always capitalized, the evaluation of the transcriptions is performed in a case-insensitive manner — the strings are converted to lowercase before comparison. Therefore, no distinction is made between lowercase and uppercase letters; for example, "CAT" and "cat" are treated as the same string with distance 0.

To evaluate the impact of this paper's contribution to comic dialogue transcription, a baseline scenario is established: feeding the entire comic strip into the OCR engines, without pre-processing or dividing it into panels, denoted as Exp. #1 in Table IIa. As presented in Table IIb, the results achieved using the baseline approach are extremely inaccurate, making them completely unusable. Two primary reasons for the failure are:

1) The OCR picks up a lot of text from outside the actual illustrations. That text is not part of the dialogues; it mostly contains other information, such as publication dates, the comic artist's name, or website URLs.
2) As there is no information about the panel division in this experiment, the OCR engines struggle with determining the correct order of the output — e.g. some text from the second panel can appear before some parts of the text from the first panel.

The first major improvement to this scenario is experiment #2 from Table IIa, where instead of scanning the whole image at once, the OCR is performed separately on each panel, and the results are then concatenated. We can observe a significant decrease in error rates. All the later experiments are conducted on separate panels, rather than on the full comic strip.

In the next two experiments — #3 and #4 from Table IIa — the impact of pre-processing techniques is evaluated. As presented in Table IIb (#3), adding an up-scaling step has a minor, but positive impact on the performance — especially in the case of PHD Comics, where the initial image resolution is low for some of the older strips. Experiment #4 results indicate that adding a binarization step has a slightly negative impact on the outcome, contrary to general OCR pre-processing recommendations from the literature. Based on these evaluation results, in the later experiments the binarization step is skipped, and only the up-scaling step is applied.
TABLE II: Text extraction evaluation — experiments on 500 Dilbert, 500 Garfield, and 100 PHD Comics strips with ground-truth strings obtained via manual transcription.

(a) Experiment setup: six experiments are conducted to evaluate text extraction. The experiments test OCR on full and segmented strips, using the proposed pre-processing and post-processing techniques (✓ = enabled, ✗ = disabled).

                         Pre-processing             Post-processing
Exp. no.  Segmentation   re-sizing   binarization   clustering   autocorrect
Exp. #1   ✗              ✗           ✗              ✗            ✗
Exp. #2   ✓              ✗           ✗              ✗            ✗
Exp. #3   ✓              ✓           ✗              ✗            ✗
Exp. #4   ✓              ✓           ✓              ✗            ✗
Exp. #5   ✓              ✓           ✗              ✓            ✗
Exp. #6   ✓              ✓           ✗              ✓            ✓

(b) Experiment results: normalized Levenshtein distance between detected and ground-truth transcriptions. Comparison of Vision API and Tesseract OCR on the Dilbert, Garfield, and PHD Comics datasets.

          Dilbert           PHD Comics        Garfield
Exp. no.  Tess.    V. API   Tess.    V. API   Tess.    V. API
Exp. #1   0.68     0.61     0.731    0.538    0.650    0.381
Exp. #2   0.233    0.044    0.786    0.109    0.532    0.163
Exp. #3   0.222    0.044    0.698    0.104    0.501    0.159
Exp. #4   0.242    0.048    0.727    0.112    0.534    0.150
Exp. #5   0.188    0.032    0.699    0.075    0.468    0.120
Exp. #6   0.276    0.097    0.694    0.0781   0.485    0.121
Experiment #5 aims to evaluate the impact of adding a bounding-box clustering step on the OCR performance. Table IIb shows a significant positive impact on the accuracy — clustering reduces the error rates by up to 30%. This shows that a significant fraction of the errors is caused by the wrong ordering of the output words due to a lack of information about the comic speech bubbles. Clustering fixes that issue for most data points.

Finally, experiment #6 from Table IIa evaluates the impact of auto-correcting the OCR output on the extraction error. Intuitively, one could expect some improvement from dictionary-based correction, but the results in Table IIb show an opposite effect. One explanation for this could be that comics contain a lot of names, onomatopoeias, and exclamations — such as "Dilbert", "Woo" or "Pow" — which are not present in the dictionary and get mistakenly corrected into other words.

Overall, the final performance of the OCR is satisfactory, but not perfect. Vision API performs better than Tesseract in all cases. Tesseract completely fails on PHD Comics and Garfield; in a big part of the comics it does not detect text at all. Panel separation and clustering have a significant, positive impact on the performance, but the other elements of the proposed method do not bring improvement. The best error rates of 0.03 on Dilbert, 0.07 on PHD Comics, and 0.12 on Garfield give a solid base for automatic transcription, but in the current state the transcription would most likely still have to be corrected by a human.
V. DISCUSSION, CONCLUSIONS AND FUTURE WORK

The purpose of this paper was to design, implement, and evaluate an automated dataset construction pipeline for building an illustration-transcription comics dataset. To do so, web scraping was applied to automatically download the comic strips, image processing techniques were utilized to divide the comics into individual illustrations, and OCR was used to automatically generate transcriptions.

The web scraping technique proved successful in the experiments. We were able to download thousands of Garfield, Dilbert, and PHD Comics strips, and thanks to the multithreaded implementation, we achieved an average rate of between 5 and 20 comics per second.

The proposed panel extraction method achieved almost perfect results for Dilbert and PHD Comics, with panel success rates of 100% and 99% respectively, outperforming the baseline algorithm in terms of both accuracy and efficiency. Unfortunately, it failed to correctly segment a significant fraction of the Garfield strips, achieving only a 72% full-strip success rate. The source of the errors was the lack of clear panel boundaries in some of the strips. The proposed algorithm used contour detection and thresholding to detect the panel boundary; therefore it dealt flawlessly with panels that had frames, but struggled when no frame was present.

The automatic text transcription was the most challenging stage of the process, as the existing OCR solutions performed poorly in their out-of-the-box state. Moreover, the proposed OCR pre-processing methods, such as binarization and up-scaling, brought no significant improvement to the performance. However, performing OCR on individual illustrations and correcting the order of the output by grouping the text bounding boxes into speech bubbles reduced the error rates by a factor of 7. It was thought that performing dictionary autocorrect on the OCR output would bring further improvement, but the effect was the opposite, possibly due to the presence of non-dictionary words, like onomatopoeias and exclamations, in the comic dialogues. Overall, the final OCR output is fairly close to the true transcriptions, with normalized Levenshtein distances of 0.03, 0.07, and 0.12 for Dilbert, PHD Comics, and Garfield respectively.

In general, the proposed pipeline can successfully construct a dataset of comic illustrations and transcriptions for most data points, but the output still contains an observable amount of errors. Part of the errors can be attributed to the mistakes in segmenting the comic strips where no clear panel frames are present. It could be beneficial to conduct further research to develop a solution that can deal with such cases. The majority of the errors occur at the text extraction stage; there might be a need to construct human-in-the-loop software, including a tool for manual correction of the output transcriptions. Another possibility for improvement is to experiment with dataset-specific training of the OCR models on a small, manually annotated dataset. Finally, some text-region detection algorithms, such as the EAST text detector [32], could be used to detect candidate text areas and feed those into the OCR pipeline instead of the whole comic strips, potentially resulting in better OCR accuracy.
VI. RESPONSIBLE RESEARCH

The following three paragraphs reflect on the ethical and legal implications of this research project, and the reproducibility of the results achieved in the experiments.

A. Copyright issues

The data used for the experiments was obtained via web scraping, which is a topic of debate concerning legal issues such as copyright and privacy violations [33]. There is no risk of privacy violations in the case of this research project — all of the data points are publicly available artworks, created to be shared with a wide audience. However, copyright violation is a real threat: the comic strips from PHD Comics and Dilbert are the intellectual property of their creators, and cannot be freely distributed by a third party. Therefore, to avoid any concerns regarding copyright violation, the datasets used for the experiments will not be published. Some of the PHD Comics strips are still used in the paper to illustrate the methods and experiments, but this kind of usage is explicitly listed as permitted on the PHD Comics website [34].

B. Potential software misuse

Another possible ramification of this project is the potential misuse of the published software. For instance, the proposed scraping mechanism could be used to mass-download copyrighted comics, which could then be republished illegally on a third-party website. Moreover, the text removal method implemented in the project could be used to clear the existing dialogues from a comic and add new ones, creating an alternative story. This usually goes against the comic publisher's regulations, as it involves producing derivative work from copyrighted content. In general, software misuse cannot be fully prevented, but an explicit warning regarding this topic is included in the software documentation.

C. Reproducibility

Reproducibility of results is a crucial aspect of research, but, unfortunately, it is often overlooked by scientists [35], also in the computer science field [36]. To ensure the reproducibility of the results achieved in this research, the source code will be published as open-source software on github.com², along with a usage guide. This way, the experiments mentioned in the paper can be easily repeated by interested parties and compared with new research. Moreover, the code can also be forked and modified to be used in a different context or improved with new ideas.

² Project repository: https://github.com/mstyczen/comic-dcp
REFERENCES

[1] Garfield. URL: https://garfield.com/.
[2] PHD Comics. URL: http://phdcomics.com/.
[3] Dilbert. URL: https://dilbert.com/.
[4] Faizan Ahmad, Aaima Najam, and Zeeshan Ahmed. "Image-based face detection and recognition: 'state of the art'". In: arXiv preprint arXiv:1302.6379 (2013).
[5] Ian J Goodfellow et al. "Generative adversarial networks". In: arXiv preprint arXiv:1406.2661 (2014).
[6] Han Zhang et al. "Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks". In: Proceedings of the IEEE international conference on computer vision. 2017, pp. 5907–5915.
[7] Scott Reed et al. "Generative Adversarial Text to Image Synthesis". In: Proceedings of The 33rd International Conference on Machine Learning. Ed. by Maria Florina Balcan and Kilian Q. Weinberger. Vol. 48. Proceedings of Machine Learning Research. New York, New York, USA: PMLR, 2016, pp. 1060–1069. URL: http://proceedings.mlr.press/v48/reed16.html.
[8] Rada Mihalcea, Carlo Strapparava, and Stephen Pulman. "Computational models for incongruity detection in humour". In: International Conference on Intelligent Text Processing and Computational Linguistics. Springer. 2010, pp. 364–374.
[9] Yuji Roh, Geon Heo, and Steven Euijong Whang. "A Survey on Data Collection for Machine Learning: A Big Data - AI Integration Perspective". In: IEEE Transactions on Knowledge and Data Engineering 33.4 (2021), pp. 1328–1347. DOI: 10.1109/TKDE.2019.2946162.
[10] Hillary Sanders and Joshua Saxe. "Garbage in, garbage out: How purportedly great ML models can be screwed up by bad data". In: Proceedings of Blackhat 2017 (2017).
[11] Cem Dilmegani. Current State of OCR: Is it a solved problem in 2021? 2021. URL: https://research.aimultiple.com/ocr-technology/.
[12] Jamileh Yousefi. "Image binarization using Otsu thresholding algorithm". In: Ontario, Canada: University of Guelph (2011).
[13] William HE Day and Herbert Edelsbrunner. "Efficient algorithms for agglomerative hierarchical clustering methods". In: Journal of Classification 1.1 (1984), pp. 7–24.
[14] H. Ng and S. Winkler. "A data-driven approach to cleaning large face datasets". In: 2014 IEEE International Conference on Image Processing (ICIP). 2014, pp. 343–347. DOI: 10.1109/ICIP.2014.7025068.
[15] Ryan Mitchell. Web scraping with Python: Collecting more data from the modern web. O'Reilly Media, Inc., 2018.
[16] Van Nguyen Nhu, Christophe Rigaud, and Jean-Christophe Burie. "What do We Expect from Comic Panel Extraction?" In: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW). Vol. 1. 2019, pp. 44–49. DOI: 10.1109/ICDARW.2019.00013.
[17] Toru Ogawa et al. "Object detection for comics using manga109 annotations". In: arXiv preprint arXiv:1803.08670 (2018).
[18] Anh Khoi Ngo Ho, Jean-Christophe Burie, and Jean-Marc Ogier. "Panel and speech balloon extraction from comic books". In: 2012 10th IAPR International Workshop on Document Analysis Systems. IEEE. 2012, pp. 424–428.
[19] Xufang Pang et al. "A Robust Panel Extraction Method for Manga". In: Proceedings of the 22nd ACM International Conference on Multimedia. MM '14. Orlando, Florida, USA: Association for Computing Machinery, 2014, pp. 1125–1128. ISBN: 9781450330633. DOI: 10.1145/2647868.2654990. URL: https://doi.org/10.1145/2647868.2654990.
[20] Kumiko. URL: https://github.com/njean42/kumiko/.
[21] Christophe Rigaud et al. "Toward speech text recognition for comic books". In: Proceedings of the 1st International Workshop on coMics ANalysis, Processing and Understanding. 2016, pp. 1–6.
[22] Christophe Ponsard, Ravi Ramdoyal, and Daniel Dziamski. "An OCR-Enabled Digital Comic Books Viewer". In: Computers Helping People with Special Needs. Ed. by Klaus Miesenberger et al. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 471–478. ISBN: 978-3-642-31522-0.
[23] Image Thresholding. URL: https://docs.opencv.org/master/d7/d4d/tutorial_py_thresholding.html.
[24] Tesseract. URL: https://github.com/tesseract-ocr/tesseract.
[25] Google Vision API. URL: https://cloud.google.com/vision/docs/ocr.
[26] Wojciech Bieniecki, Szymon Grabowski, and Wojciech Rozenberg. "Image Preprocessing for Improving OCR Accuracy". In: 2007 International Conference on Perspective Technologies and Methods in MEMS Design. 2007, pp. 75–80. DOI: 10.1109/MEMSTECH.2007.4283429.
[27] Agglomerative clustering - scikit-learn documentation. URL: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html.
[28] TextBlob: Simplified Text Processing. URL: https://textblob.readthedocs.io/en/dev/index.html.
[29] Peter Norvig. How to Write a Spelling Corrector? URL: http://norvig.com/spell-correct.html.
[30] Jaccard index. May 2021. URL: https://en.wikipedia.org/wiki/Jaccard_index.
[31] Alfred Arnold's FTP server with comic transcriptions. URL: http://john.ccac.rwth-aachen.de:8000/ftp/dilbert/.
[32] Xinyu Zhou et al. "East: an efficient and accurate scene text detector". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, pp. 5551–5560.
[33] Vlad Krotov and Leiser Silva. "Legality and ethics of web scraping". In: (2018).
[34] PHD Comics: About. URL: https://phdcomics.com/about.php.
[35] Monya Baker. "Reproducibility crisis". In: Nature 533.26 (2016), pp. 353–66.
[36] Matthew Hutson. Artificial intelligence faces reproducibility crisis. 2018.