Automatic Processing of Historical Japanese Mathematics (Wasan) Documents
Abstract
1. Introduction
- The basic rank deals with basic calculations using the four arithmetic operations. Mathematics at this level corresponds to what is needed for daily life.
- The intermediate rank treats mathematics as a hobby. Students at this rank learn from an established textbook (one of the aforementioned “Wasan books”) and can receive diplomas from a Wasan master to certify their progress.
- The highest rank corresponds to the level of Wasan masters who were responsible for the writing of Wasan books. These masters studied academic content at a level comparable to Western mathematics of the same period.
- The description of the problem (problem statement), written in Edo-period kanji. Extracting this description amounts to (1) determining the regions of Wasan documents containing kanji and (2) recognizing each individual kanji. Completing both steps would allow us to present documents along with their textual description and to use automatic translation to make the database accessible to non-Japanese users.
- Determining the location of the particular kanji called “ima” (今). The description of every problem starts with the formulaic sentence “Now, as shown, …” (今有如). The “ima” (今, now) kanji marks the start of the problem description and is typically placed beside or underneath the problem diagram. Consequently, finding the “ima” kanji without human intervention is a convenient way to start the automatic processing of these documents.
- We have used a larger database of 100 Wasan pages to evaluate the algorithms developed. This includes manual annotations on a large number of documents and expands significantly on our previous work [6,7]. See Section 2.1 for details.
- We provide a detailed evaluation of the performance of several blob detector algorithms with two goals: (1) extracting full kanji texts from Wasan documents, and (2) locating the 今 kanji. Goal (1) is new to the current work, and the study of goal (2) has been greatly expanded.
- We present a new, extensive evaluation of the DL part of our study, focusing on which networks work better for our goals. As a major novel contribution of this study, we compare in detail the performance of five well-known DL networks.
- Additionally, we investigate the effect that the kanji databases we use have on the final results. We explore two strategies: (1) using only modern kanji databases, which are easier to find and contain more characters but do not fully match the type of characters present in historical Wasan documents, and (2) using classical kanji databases, whose characters are closer to those appearing in Wasan documents but present increased variability and fewer examples. The modern kanji database has been expanded in the current work, and the ancient kanji database is used for the first time.
- Finally, the combined performances of the kanji detection and classification steps are studied in depth for the problem of determining the position of the “ima” kanji. For this application, the influence of the kanji database used to train the DL network has also been studied for the first time.
2. Materials and Methods
2.1. Data
2.1.1. Wasan Images
2.1.2. Kanji Databases
2.2. Algorithm Overview
- Preprocessing steps to improve image quality (a minimal code sketch of these steps is given after this overview):
  - Hough line detector [31] to find (almost) vertical lines in the documents. These lines are then compared to the vertical direction. By rotating the pages so that the lines coincide with the vertical direction in the image, we can correct orientation misalignments introduced during the document scanning process. See Section 2.3 for details.
  - Noise removal [32]. This step is performed to eliminate small regions produced by dirt in the original documents or by minor scanning problems. See Section 2.4 for details.
- Blob detectors to determine candidate regions that may contain kanji. See Section 2.5 for details.
  - Preprocessed images are used as the input for the blob detector algorithms. These algorithms output the parts of each page that are likely to correspond to kanji characters and store them as separate images. Three blob detector algorithms were studied (see Section 3.1 for the comparison).
- DL-based classification. See Section 2.6 for details.
  - In the last step of our pipeline, DL networks are used to classify each possible kanji image into one of the kanji classes considered. Five different DL networks were studied for this purpose. See Section 3.2 for detailed results.
- The output of our algorithm is composed of the coordinates of each detected kanji along with its kanji category. We pay special attention to the regions classified as belonging to the “ima” kanji. See Section 3.3 for results focusing on the “ima” kanji as well as the combined performance of the blob detector and kanji classification algorithms for this particular case.
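The following is a minimal sketch of the two preprocessing steps, written with the scikit-image library [39] (the Hough-transform and noise-removal routines of [31,32] are approximated here by their scikit-image counterparts); file names, thresholds, and parameter values are illustrative and not the settings used in our experiments.

```python
# Preprocessing sketch (Sections 2.3 and 2.4): orientation correction using
# near-vertical Hough lines, then removal of small noise components.
import numpy as np
from skimage import io, feature, transform, morphology

page = io.imread("wasan_page.png", as_gray=True)   # hypothetical input file

# 1. Orientation correction: detect (almost) vertical lines and rotate the
#    page so that they become exactly vertical.
edges = feature.canny(page)
lines = transform.probabilistic_hough_line(edges, threshold=10,
                                            line_length=200, line_gap=5)
tilts = []
for (x0, y0), (x1, y1) in lines:
    angle = np.degrees(np.arctan2(y1 - y0, x1 - x0))
    if 80 < abs(angle) < 100:                      # keep near-vertical lines only
        tilts.append(angle - np.sign(angle) * 90)  # deviation from the vertical
if tilts:
    # The sign convention depends on the image coordinate system.
    page = transform.rotate(page, np.median(tilts), mode="edge")

# 2. Noise removal: binarize and discard small connected components produced
#    by dirt in the original documents or by scanning artifacts.
binary = page < 0.5                                # dark ink on light paper
clean = morphology.remove_small_objects(binary, min_size=30)
```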
2.3. Orientation Correction
2.4. Noise Correction
2.5. Individual Kanji Segmentation: Blob Detectors
2.5.1. Laplacian of Gaussian (LoG)
2.5.2. Difference of Gaussians (DoG)
2.5.3. Determinant of the Hessian (DoH)
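The three detectors listed above are all available in scikit-image [39]; as an illustration, the sketch below shows how candidate kanji regions could be obtained with each of them. Sigma ranges and thresholds are placeholders rather than the values tuned in Experiment 1.

```python
# Candidate kanji regions with the LoG, DoG and DoH blob detectors.
from skimage import io
from skimage.feature import blob_log, blob_dog, blob_doh

# Hypothetical preprocessed page; the detectors expect bright blobs on a dark
# background, so the image is inverted (kanji are dark ink on light paper).
page = io.imread("wasan_page_clean.png", as_gray=True)
inverted = 1.0 - page

blobs_log = blob_log(inverted, min_sigma=8, max_sigma=30, num_sigma=10, threshold=0.05)
blobs_dog = blob_dog(inverted, min_sigma=8, max_sigma=30, threshold=0.05)
blobs_doh = blob_doh(inverted, min_sigma=8, max_sigma=30, num_sigma=10, threshold=0.005)

# Each detector returns an (N, 3) array of (row, col, sigma); a square crop
# around each (row, col) is then saved as a candidate kanji image.
```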
2.6. Classification of Kanji Images Using DL Networks
2.6.1. Kanji Datasets Considered
- The ETL dataset [19] is made up of 3036 kanji classes handwritten in modern style. The classes in this dataset are fully balanced, and each kanji class is represented by 200 examples. As many of the kanji in the dataset are relatively infrequent, and in order to reduce the time needed to run the experiments, we chose a subset of the ETL database made up of the 2136 classes of “regular use kanji” listed by the Japanese Ministry of Education. For each of these kanji, we considered the 200 examples in the ETL database.
- The Kuzushiji-Kanji dataset is made up of 3832 kanji characters extracted from historical documents. Some characters have more than 1500 examples, while half of the classes have fewer than 10; the dataset is thus heavily imbalanced, and many of the classes correspond to very infrequent characters. In order to obtain a slightly less imbalanced dataset made up of more frequent characters, we discarded the classes with fewer than 10 examples (obtaining a dataset with 1636 kanji classes) and downsampled the classes with more than 200 examples to exactly 200 randomly chosen examples (a minimal sketch of this filtering is given after this list).
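Below is a minimal sketch of the class filtering and downsampling described above, assuming one sub-directory of example images per kanji class; directory names and the random seed are illustrative.

```python
# Drop classes with fewer than 10 examples and downsample classes with more
# than 200 examples to exactly 200 randomly chosen ones.
import os
import random
import shutil

SRC, DST = "kuzushiji_kanji", "kuzushiji_kanji_filtered"   # hypothetical paths
MIN_EXAMPLES, MAX_EXAMPLES = 10, 200
random.seed(0)

for cls in sorted(os.listdir(SRC)):
    files = sorted(os.listdir(os.path.join(SRC, cls)))
    if len(files) < MIN_EXAMPLES:
        continue                                    # discard very rare classes
    if len(files) > MAX_EXAMPLES:
        files = random.sample(files, MAX_EXAMPLES)  # downsample frequent classes
    os.makedirs(os.path.join(DST, cls), exist_ok=True)
    for f in files:
        shutil.copy(os.path.join(SRC, cls, f), os.path.join(DST, cls, f))
```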
2.6.2. DL Networks Tested
- Alexnet [46] is one of the first widely used convolutional neural networks. It is composed of eight layers (five convolutional layers, some followed by max-pooling layers, and three fully connected layers). This network started the current DL trend after outperforming the then state-of-the-art methods on the ImageNet dataset by a large margin.
- Squeezenet [47] uses so-called squeeze filters, including point-wise filters, to reduce the number of necessary parameters. Accuracy similar to that of Alexnet was claimed with far fewer parameters.
- VGG [21] represents an evolution of the Alexnet network that allowed for an increased number of layers (16 in the version considered in our work) by using smaller convolutional filters.
- Resnet [48] is one of the first DL architectures to allow a higher number of layers (and, thus, “deeper” networks) by including residual blocks in which convolution, batch normalization, and ReLU layers are combined with skip connections. In the current work, a version with 50 layers was used.
- Densenet [49] is an evolution of the Resnet network that uses a larger number of connections between layers, claiming increased parameter efficiency and better feature propagation, which allows it to work with even more layers (121 in this work). A minimal fine-tuning sketch covering these architectures is given after this list.
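All five architectures are available with ImageNet-pretrained weights; the sketch below shows how one of them could be fine-tuned on a folder of kanji images using the fastai library [45]. The directory layout, image size, batch size, and number of epochs are illustrative and do not reproduce the training setup of our experiments.

```python
# Transfer learning on a kanji image folder (one sub-folder per kanji class).
# Replace resnet50 with alexnet, squeezenet1_0, vgg16_bn or densenet121 to
# try the other architectures discussed above.
from fastai.vision.all import *

dls = ImageDataLoaders.from_folder(
    "kanji_dataset",          # hypothetical path to the training images
    valid_pct=0.2,            # hold out 20% of the examples for validation
    item_tfms=Resize(64),     # resize kanji crops to a common input size
    bs=64,
)
learn = cnn_learner(dls, resnet50, metrics=accuracy)  # ImageNet-pretrained weights
learn.fine_tune(5)                                    # illustrative number of epochs
```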
3. Results
3.1. Experiment 1: Blob Detectors for Kanji Segmentation
3.2. Experiment 2: Classification of the Modern and Ancient Kanji Using Deep Learning Networks
3.3. Experiment 3: Classification of the “ima” kanji
4. Discussion
4.1. Kanji Detection
4.2. Kanji Classification
4.3. “ima” Kanji Detection
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Matsuoka, M. Wasan, and Its Cultural Background. In Katachi and Symmetry; Ogawa, T., Miura, K., Masunari, T., Nagy, D., Eds.; Springer: Tokyo, Japan, 1996; pp. 341–345.
- Martzloff, J.C. A survey of Japanese publications on the history of Japanese traditional mathematics (Wasan) from the last 30 years. Hist. Math. 1990, 17, 366–373.
- Smith, D.E.; Mikami, Y. A History of Japanese Mathematics; Felix Meiner: Leipzig, Germany, 1914; p. 288.
- Mitsuyoshi, Y. Jinkouki; Wasan Institute: Tokyo, Japan, 2000; p. 215.
- Fukagawa, H.; Rothman, T. Sacred Mathematics: Japanese Temple Geometry; Princeton Publishing: Princeton, NJ, USA, 2008; p. 348.
- Diez, Y.; Suzuki, T.; Vila, M.; Waki, K. Computer vision and deep learning tools for the automatic processing of WASAN documents. In Proceedings of the ICPRAM 2019—Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods, Prague, Czech Republic, 19–21 February 2019; pp. 757–765.
- Suzuki, T.; Diez, Y.; Vila, M.; Waki, K. Computer Vision and Deep learning algorithms for the automatic processing of Wasan documents. In Proceedings of the 34th Annual Conference of JSAI, Online, 9–12 June 2020; The Japanese Society for Artificial Intelligence: Kumamoto, Japan, 2020; pp. 4Rin1–10.
- Pomplun, M. Hands-On Computer Vision; World Scientific Publishing: Singapore, 2022.
- Liu, C.; Dengel, A.; Lins, R.D. Editorial for special issue on “Advanced Topics in Document Analysis and Recognition”. Int. J. Doc. Anal. Recognit. 2019, 22, 189–191.
- Otter, D.W.; Medina, J.R.; Kalita, J.K. A Survey of the Usages of Deep Learning for Natural Language Processing. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 604–624.
- Dahl, C.M.; Johansen, T.S.D.; Sørensen, E.N.; Westermann, C.E.; Wittrock, S.F. Applications of Machine Learning in Document Digitisation. CoRR 2021. Available online: http://xxx.lanl.gov/abs/2102.03239 (accessed on 1 January 2021).
- Philips, J.; Tabrizi, N. Historical Document Processing: A Survey of Techniques, Tools, and Trends. In Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2020, Volume 1: KDIR, Budapest, Hungary, 2–4 November 2020; Fred, A.L.N., Filipe, J., Eds.; SCITEPRESS: Setúbal, Portugal, 2020; pp. 341–349.
- Cao, C.; Wang, B.; Zhang, W.; Zeng, X.; Yan, X.; Feng, Z.; Liu, Y.; Wu, Z. An Improved Faster R-CNN for Small Object Detection. IEEE Access 2019, 7, 106838–106846.
- Guo, L.; Wang, D.; Li, L.; Feng, J. Accurate and fast single shot multibox detector. IET Comput. Vis. 2020, 14, 391–398.
- Redmon, J.; Divvala, S.K.; Girshick, R.B.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016; IEEE Computer Society: Washington, DC, USA, 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. CoRR 2018. Available online: http://xxx.lanl.gov/abs/1804.02767 (accessed on 1 January 2021).
- Tomás Pérez, J.V. Recognition of Japanese Handwritten Characters with Machine Learning Techniques. Bachelor’s Thesis, University of Alicante, Alicante, Spain, 2020.
- Wang, Q.; Yin, F.; Liu, C. Handwritten Chinese Text Recognition by Integrating Multiple Contexts. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 1469–1481.
- ETL. ETL Character Database. 2018. Available online: http://etlcdb.db.aist.go.jp/ (accessed on 20 November 2018).
- Tsai, C. Recognizing Handwritten Japanese Characters Using Deep Convolutional Neural Networks; Technical Report; Stanford University: Stanford, CA, USA, 2016.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations ICLR 2015, San Diego, CA, USA, 7–9 May 2015. Conference Track Proceedings.
- Grębowiec, M.; Protasiewicz, J. A Neural Framework for Online Recognition of Handwritten Kanji Characters. In Proceedings of the 2018 Federated Conference on Computer Science and Information Systems (FedCSIS), Poznań, Poland, 9–12 September 2018; pp. 479–483.
- Clanuwat, T.; Bober-Irizar, M.; Kitamoto, A.; Lamb, A.; Yamamoto, K.; Ha, D. Deep Learning for Classical Japanese Literature. CoRR 2018. Available online: http://xxx.lanl.gov/abs/cs.CV/1812.01718 (accessed on 1 January 2021).
- Ueki, K.; Kojima, T. Survey on Deep Learning-Based Kuzushiji Recognition. In Pattern Recognition. ICPR International Workshops and Challenges; Del Bimbo, A., Cucchiara, R., Sclaroff, S., Farinella, G.M., Mei, T., Bertini, M., Escalante, H.J., Vezzani, R., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 97–111.
- Saini, S.; Verma, V. International Journal of Recent Technology and Engineering IJ. CoRR 2019, 8, 3510–3515.
- Ahmed Ali, A.A.; Suresha, M.; Mohsin Ahmed, H.A. Different Handwritten Character Recognition Methods: A Review. In Proceedings of the 2019 Global Conference for Advancement in Technology (GCAT), Bangalore, India, 18–20 October 2019; pp. 1–8.
- Tang, Y.; Hatano, K.; Takimoto, E. Recognition of Japanese Historical Hand-Written Characters Based on Object Detection Methods. In Proceedings of the 5th International Workshop on Historical Document Imaging and Processing, HIP@ICDAR 2019, Sydney, NSW, Australia, 20–21 September 2019; pp. 72–77.
- Ueki, K.; Kojima, T. Japanese Cursive Character Recognition for Efficient Transcription. In Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods—Volume 1: ICPRAM, INSTICC, Valletta, Malta, 22–24 February 2020; pp. 402–406.
- LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Comput. 1989, 1, 541–551.
- Yamagata University. Yamagata University Wasan Sakuma Collection (Japanese). 2018. Available online: https://www.ocrconvert.com/japanese-ocr (accessed on 20 November 2018).
- Fernandes, L.A.; Oliveira, M.M. Real-time line detection through an improved Hough transform voting scheme. Pattern Recognit. 2008, 41, 299–314.
- Agrawal, M.; Doermann, D.S. Clutter noise removal in binary document images. In Proceedings of the 2009 10th International Conference on Document Analysis and Recognition, Barcelona, Spain, 26–29 July 2009; Volume 16, pp. 351–369.
- Illingworth, J.; Kittler, J. A survey of the Hough transform. Comput. Vis. Graph. Image Process. 1988, 44, 87–116.
- Matas, J.; Galambos, C.; Kittler, J. Progressive Probabilistic Hough Transform. In Proceedings of the 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149), Fort Collins, CO, USA, 23–25 June 1999; Volume 1, pp. 554–560.
- Arnia, F.; Muchallil, S.; Munadi, K. Noise characterization in ancient document images based on DCT coefficient distribution. In Proceedings of the 13th International Conference on Document Analysis and Recognition, ICDAR 2015, Nancy, France, 23–26 August 2015; pp. 971–975.
- Barna, N.H.; Erana, T.I.; Ahmed, S.; Heickal, H. Segmentation of Heterogeneous Documents into Homogeneous Components using Morphological Operations. In Proceedings of the 17th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2018, Singapore, 6–8 June 2018; pp. 513–518.
- Goyal, B.; Dogra, A.; Agrawal, S.; Sohi, B.S. Two-dimensional gray scale image denoising via morphological operations in NSST domain & bitonic filtering. Future Gener. Comp. Syst. 2018, 82, 158–175.
- Tekleyohannes, M.K.; Weis, C.; Wehn, N.; Klein, M.; Siegrist, M. A Reconfigurable Accelerator for Morphological Operations. In Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPS Workshops 2018, Vancouver, BC, Canada, 21–25 May 2018; pp. 186–193.
- Van der Walt, S.; Schönberger, J.L.; Nunez-Iglesias, J.; Boulogne, F.; Warner, J.D.; Yager, N.; Gouillart, E.; Yu, T.; The Scikit-Image Contributors. scikit-image: Image processing in Python. PeerJ 2014, 2, e453.
- Lindeberg, T. Image Matching Using Generalized Scale-Space Interest Points. J. Math. Imaging Vis. 2015, 52, 3–36.
- Marr, D.; Hildreth, E. Theory of Edge Detection. Proc. R. Soc. Lond. Ser. B 1980, 207, 187–217.
- Diez, Y.; Kentsch, S.; Fukuda, M.; Caceres, M.L.L.; Moritake, K.; Cabezas, M. Deep Learning in Forestry Using UAV-Acquired RGB Data: A Practical Review. Remote Sens. 2021, 13, 2837.
- Wen, J.; Thibeau-Sutre, E.; Diaz-Melo, M.; Samper-González, J.; Routier, A.; Bottani, S.; Dormont, D.; Durrleman, S.; Burgos, N.; Colliot, O. Convolutional neural networks for classification of Alzheimer’s disease: Overview and reproducible evaluation. Med. Image Anal. 2020, 63, 101694.
- Cabezas, M.; Kentsch, S.; Tomhave, L.; Gross, J.; Caceres, M.L.L.; Diez, Y. Detection of Invasive Species in Wetlands: Practical DL with Heavily Imbalanced Data. Remote Sens. 2020, 12, 3431.
- Howard, J.; Thomas, R.; Gugger, S. Fastai. Available online: https://github.com/fastai/fastai (accessed on 1 January 2021).
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems—Volume 1, Lake Tahoe, NV, USA, 3–6 December 2012; Curran Associates Inc.: Red Hook, NY, USA, 2012; pp. 1097–1105.
- Iandola, F.N.; Moskewicz, M.W.; Ashraf, K.; Han, S.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50× fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Huang, G.; Liu, Z.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
- Kentsch, S.; Caceres, M.L.L.; Serrano, D.; Roure, F.; Diez, Y. Computer Vision and Deep Learning Techniques for the Analysis of Drone-Acquired Forest Images, a Transfer Learning Study. Remote Sens. 2020, 12, 1287.
- Bradski, G. The OpenCV Library. Dr. Dobb’s J. Softw. Tools 2000, 120, 122–125.
- Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic Differentiation in PyTorch; NIPS Autodiff Workshop: Long Beach, CA, USA, 2017.
| Classif. DataSet | 今 Detection % (LoG Blob Detector) | 今 Classification TPR | 今 Classification FPR |
|---|---|---|---|
| Classic (Kuzushiji) | 100 | 0.95 | 0.03 |
| Modern (ETL) | 100 | 0.10 | 0.04 |
| Mixed | 100 | 0.85 | 0.21 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).