2017 IEVC Yanagisawa
© Atsushi Sasaki
5. EXPERIMENT
© Hishika Minamisawa
[Figure: AP versus iteration number, with train and test curves]
Panel 1 has 2 characters and 3 balloons
Panel 2 has 1 character and 2 balloons
Fig.3 Example of panel structure recognition
[Figure: AP plot with train and test curves]
In this experiment, we define a true positive as a detected area that
overlaps the correct area by more than 50%.
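The 50% overlap criterion can be sketched in code. The text does not say how the overlap is measured, so this sketch assumes intersection-over-union (IoU) between axis-aligned boxes given as (x1, y1, x2, y2), and a greedy one-to-one matching between detected and correct areas. Both choices, and all function names, are illustrative assumptions rather than the authors' stated procedure.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed box layout


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes (assumed overlap measure)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_matches(detected: List[Box], correct: List[Box],
                  threshold: float = 0.5) -> Tuple[int, int, int]:
    """Return (TP, FN, FP) using greedy one-to-one matching (hypothetical helper)."""
    unmatched = list(correct)  # correct areas not yet claimed by a detection
    tp = 0
    for det in detected:
        # Pick the remaining correct area that overlaps this detection the most.
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) > threshold:
            unmatched.remove(best)
            tp += 1
    fn = len(unmatched)       # correct areas that were never detected
    fp = len(detected) - tp   # detections that matched no correct area
    return tp, fn, fp
```

Under these assumptions, a detection counts as a true positive only if its best remaining match exceeds the threshold. Each correct area can be matched at most once, so duplicate detections of the same panel are counted as false positives.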
© Atsushi Sasaki
(a) Examples of panel detection by [5] (b) Examples of panel detection by Faster R-CNN
Fig.5 Examples of panel detection for flat panels and connected panels
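The R and P columns reported in Table 1 and Table 2 can be recomputed from the TP, FN and FP counts. The following is a minimal sketch, assuming the standard definitions R = TP / (TP + FN) and P = TP / (TP + FP), which this section does not state explicitly. The function names are illustrative.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of correct areas that were detected (assumed definition)."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of detected areas that were correct (assumed definition)."""
    return tp / (tp + fp)


# (TP, FN, FP) counts copied from Table 1 (Faster R-CNN).
table1 = {
    "Panel": (770, 90, 40),
    "Balloon": (1161, 29, 42),
    "Character": (803, 134, 50),
}

for name, (tp, fn, fp) in table1.items():
    r = 100 * recall(tp, fn)
    p = 100 * precision(tp, fp)
    print(f"{name}: R = {r:.1f}%, P = {p:.1f}%")
```

With these definitions the printed values agree with the R and P columns of Table 1. For the Panel row, TP + FN is 860 while the listed Total is 859; the reported 89.5% is consistent with TP / (TP + FN).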
Table 1 Results of comic component extraction for 5 comic sources by Faster R-CNN
            Total   TP     FN    FP    R (%)   P (%)
Panel       859     770    90    40    89.5    95.1
Balloon     1190    1161   29    42    97.6    96.5
Character   937     803    134   50    85.7    94.1

Table 2 Results of comic component extraction for 5 comic sources by [5]
            Total   TP     FN    FP    R (%)   P (%)
Panel       859     481    378   183   56.0    72.4
Balloon     1190    790    400   650   66.4    54.9

Table 3 Results of panel structure recognition for 5 comic sources
            B (%)   C (%)   B + C (%)
Comic A     83.0    74.5    68.1
Comic B     91.4    89.8    84.9
Comic C     81.7    72.8    66.3
Comic D     94.6    69.0    65.2
Comic E     62.3    62.9    52.8

is defined as follows: “B” means the panels whose speech balloon numbers
were correctly extracted, “C” means the panels whose character face numbers
were correctly extracted, and “B + C” means the panels for which both the
speech balloon and character face numbers were correctly extracted. The
experimental results are shown in Table 3. The highest value of B + C is
84.9% for Comic B and the lowest is 52.8% for Comic E.
An example of a panel structure recognition failure is the detection
failure caused by deformed faces, as shown in Fig.6. In addition, the
recognition rate for Comic E is low because it contains fuzzy panel
layouts, as shown in Fig.7. In Fig.6 and Fig.7, the red rectangles show
the areas detected as comic components.

6. CONCLUSION & FUTURE WORK
In this paper, we evaluated panel structure recognition using Faster
R-CNN. Experimental results show that our proposed method succeeds in
recognizing 67.5% of panel structures on average.
For future work, there are possible improvements in the detection of
panels and character faces that are hard to detect with this method. As a
specific technique, Faster R-CNN detection could be combined with image
processing such as highlighting the division lines of panels. In addition,
to obtain metadata for the automatic generation of comic summaries, we
need to consider a technique for classifying main characters from the
detected character faces.

7. REFERENCES
[1] Internet Media Research Institute: “eComic Marketing Report 2012”, Impress R&D, pp.14 (2012).
[2] D. Ishii, K. Kawamura, H. Watanabe: “A Study on Frame Decomposition of Comic Images”, IEICE Transactions, Vol.J90-D, No.7, pp.1667-1670 (2007).
[3] S. Nonaka, T. Sawano, N. Haneda: “Development of “GT-Scan”, the Technology for Automatic Detection of Frames in Scanned Comic”, FUJIFILM RESEARCH & DEVELOPMENT, No.57, pp.46-49 (2012).
[4] T. Tanaka, F. Toyama, J. Miyamichi, K. Shoji: “Detection and Classification of Speech Balloons in Comic Images”, Journal of the Institute of Image Information and Television Engineers, Vol.64, No.12, pp.1933-1939 (2010).
Proceedings of the Fifth IIEEJ International Workshop
on Image Electronics and Visual Computing 2017
Da Nang, Vietnam, February 28- March 3, 2017