


Unveiling Document Structures with YOLOv5 Layout Detection

A Preprint

Herman Sugiharto
Department of Informatics, Siliwangi University, Tasikmalaya, Indonesia
[email protected]

Yorisa Silviana
Department of Informatics, Siliwangi University, Tasikmalaya, Indonesia
[email protected]

Yani Siti Nurpazrin
Department of Informatics, Siliwangi University, Tasikmalaya, Indonesia
[email protected]

September 29, 2023

Abstract
The current digital environment is characterized by the widespread presence of data, particularly
unstructured data, which poses many issues in sectors including finance, healthcare, and education.
Conventional techniques for data extraction encounter difficulties in dealing with the inherent variety
and complexity of unstructured data, hence requiring the adoption of more efficient methodologies.
This research investigates the utilization of YOLOv5, a cutting-edge computer vision model, for the
purpose of rapidly identifying document layouts and extracting unstructured data.
The present study establishes a conceptual framework for delineating the notion of "objects" as they
pertain to documents, incorporating various elements such as paragraphs, tables, photos, and other
constituent parts. The main objective is to create an autonomous system that can effectively recognize
document layouts and extract unstructured data, hence improving the effectiveness of data extraction.
In the conducted examination, the YOLOv5 model exhibits notable effectiveness in the task of
document layout identification, attaining a high accuracy rate along with a precision value of 0.91, a
recall value of 0.971, an F1-score of 0.939, and an area under the receiver operating characteristic
curve (AUC-ROC) of 0.975. The remarkable performance of this system optimizes the process of
extracting textual and tabular data from document images. Its prospective applications are not limited
to document analysis but can encompass unstructured data from diverse sources, such as audio data.
This study lays the foundation for future investigations into the wider applicability of YOLOv5 in
managing various types of unstructured data, offering potential for novel applications across multiple
domains.

Keywords layout detection · unstructured data · YOLOv5

1 Introduction
In the contemporary and dynamic digital age, there has been a substantial rise in the generation and utilization of data. Unstructured data, which refers to data that does not possess a predetermined format, holds significant importance across diverse domains including banking, healthcare, and education Adnan and Akbar [2019a]. A significant portion of the data contained in documents is found in unstructured formats and exhibits variability in style and presentation, posing difficulties in the extraction of crucial information Adnan and Akbar [2019b]. When faced with these variances and complexities, conventional methods of data extraction frequently prove ineffective and inefficient Zaman et al. [2020]. To tackle this matter, technologies such as artificial intelligence and computer vision have facilitated data extraction and processing. Nevertheless, there remains room for improvement in speed, precision, and effectiveness Diwan et al. [2022].
Detecting objects is a fundamental task in computer vision with numerous applications, including layout detection. Throughout the years, the YOLO (You Only Look Once) line of models has emerged as a prominent solution for real-time object detection, renowned for its exceptional speed and accuracy Jiménez-Bravo et al. [2022]. YOLOv5, a recent edition of the YOLO family, demonstrates notable advancements in accuracy and precision compared to its predecessors. While YOLOv4 showed remarkable performance, YOLOv5 has been rigorously crafted to augment accuracy while maintaining efficient inference speed Kaur and Singh [2022], Arifando et al. [2023]. Through a combination of architectural refinements, novel data augmentation techniques, and a carefully curated training process, YOLOv5 achieves superior object detection capabilities Hussain [2023].
This study’s primary objective is to investigate and enhance the application of techniques for identifying document
layouts and extracting unstructured data using the YOLOv5 framework. This study defines "objects" as the many
components found within documents, including but not limited to paragraphs, tables, photographs, and other similar
items. The primary aim of this study is to develop and deploy a system capable of autonomously identifying document
layouts and efficiently and precisely extracting unstructured data from these documents. This study is expected to
provide a valuable contribution towards enhancing the efficacy of unstructured data extraction.

2 Related Work

Numerous studies on layout detection and applications of the YOLOv5 architecture have been conducted in the past. In a meticulously executed research project by Pfitzmann et al. [2022], the academic community was introduced to the DocLayNet dataset. This dataset marks a significant transformation in the domain of document layout research, providing an extensive collection of meticulously annotated document layouts. It consists of a total of 1,107,470 annotated objects spanning a wide range of object classes, including text, images, mathematical formulas, code snippets, page headers and footers, and intricate tabular structures. In contrast, the research undertaken by Pillai and Mangsuli [2021] followed a different path, focusing on data derived from the oil and gas industry. That study utilized advanced transformer architectures to address the challenging problem of detecting and extracting layout components embedded within intricate documents from this particular domain.
The YOLOv5 framework has been employed in a multitude of computer vision research endeavors, encompassing
several domains such as object recognition Diwan et al. [2022], Yue et al. [2022], Kitakaze et al. [2020], object tracking
Alvar and Bajic [2018], Younis A. Al-Arbo [2021], Kumari et al. [2021], and video analysis Wang et al. [2022], Gu
et al. [2022]. In the aforementioned experiments, YOLOv5 has exhibited a notable level of precision in conjunction
with its user-friendly nature.
In this exhaustive study, the research team has developed a sophisticated system that goes beyond layout detection;
it incorporates the intricate task of layout extraction guided by meticulously predefined classes. At the core of this
robust system lies YOLOv5, an advanced deep learning framework that serves as the layout detector. Its presence and
performance in the system contribute significantly to the overarching framework’s exceptional precision and efficacy.
The primary objective of this research is to revolutionize the processing of unstructured data, with particular concentration on PDF documents generated from scanned sources. Such documents pose a significant obstacle for traditional methods of extracting text from PDF files, which are typically hindered by the complexities of scanned images. The approach employed by the study team holds the potential to surpass existing constraints, providing a powerful solution to the challenging task of efficiently extracting information from these documents. As we progress further into the era of digital transformation, the advances made by this research promise substantial gains in document processing, bridging the divide between unstructured data and actionable insights.

3 Methodology

The research is a quantitative study with an experimental approach. The experimental approach is chosen because the
aim of this research is to determine the cause-and-effect relationships among existing variables such as datasets, model
architectures, and model parameters (Williams, 2007).


The novelty targeted by this proposed research lies in the utilization of YOLOv5 for detecting layouts within a
document.

Literature Review The literature survey was undertaken in order to gain a comprehensive understanding of the
concepts and theories that are relevant to the research. This includes exploring the theoretical foundations of the YOLO
architecture, examining the process of data labeling, and investigating the techniques used for layout detection. The data
was obtained from secondary sources, including online platforms, academic publications, electronic books, scholarly
papers, and other relevant materials. Furthermore, in the literature review phase, a comprehensive examination of prior
scholarly articles was conducted to assess the research that pertains to the present research subject.

Problem Definition Through an examination of prior research, several gaps or weaknesses within these studies were
uncovered, hence highlighting opportunities for prospective enhancements. After identifying gaps or weaknesses, the
researchers generated research questions to establish the aims of the next study.

Data Collection During this phase, the data underwent preparation in order to train the forthcoming layout detection model. The dataset consisted of images depicting the layouts of documents sourced from a variety of academic journals. The data was subsequently annotated using Label Studio, employing pre-established categories.

Model Training During this stage, the existing YOLOv5 architecture was trained using optimal parameters to produce
an appropriate model. The model was trained using the provided hardware and labeled data.

Model Evaluation During this phase, the trained model was subjected to several tests utilizing the pre-existing
provided data. The evaluation process additionally incorporated manual human assessment in order to augment the
validity of the evaluation data. The evaluation process involved the utilization of metrics such as accuracy, precision,
and F1 score for the purpose of calculations.

Conclusion Drawing conclusions provided an overview of the data analysis and model evaluation, encompassing the
entirety of the research.

4 Results and Discussion


4.1 Base Model

YOLO was initially proposed by Redmon et al. [2016] in 2016. This method gained recognition for its real-time
processing speed of 45 frames per second. Simultaneously, the method maintained competitive performance and even
achieved state-of-the-art results on popular datasets.
YOLOv5 is designed for fast and accurate real-time object detection. This algorithm offers several performance
enhancements compared to its previous versions Redmon and Farhadi [2016], Redmon et al. [2016], Redmon and
Farhadi [2018], including improved speed and detection capabilities. One of the key advantages of YOLOv5 is its
ability to conduct object detection swiftly on resource-constrained devices such as CPUs or mobile devices. This
enables researchers or academics to perform real-time object detection rapidly without sacrificing accuracy Jocher et al.
[2022].

Figure 1: YOLOv5 architecture Jocher et al. [2022].


The architectural design of YOLOv5, as illustrated in Figure 1, comprises three main components: the Backbone, PANet, and the Output. The Backbone, alternatively referred to as the feature extractor, is the component of the network tasked with extracting fundamental features from the input image; YOLOv5 incorporates the CSPDarknet53 architecture for this purpose. The Path Aggregation Network (PANet) is a key element of the YOLOv5 framework, designed to effectively aggregate information from multiple scales. PANet facilitates the integration of contextual information across scales, enhancing the ability to recognize objects of varying sizes. The model outputs a set of bounding boxes and corresponding class labels, representing the detected objects in the given image. According to Jin [2022], bounding boxes establish the precise coordinates and dimensions of objects within an image, while class labels identify the category to which each detected object belongs.
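As a minimal illustration of this output representation (not code from the paper), a YOLO-style prediction stores each box as a normalized center, width, and height; converting it to absolute pixel corners is a simple linear mapping. The image dimensions below are arbitrary example values.

```python
def yolo_to_corners(cx, cy, w, h, img_w, img_h):
    """Convert a normalized YOLO box (center x/y, width, height in [0, 1])
    to absolute pixel corner coordinates (x1, y1, x2, y2)."""
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return x1, y1, x2, y2

# A hypothetical detection centered on a 640x480 page image.
print(yolo_to_corners(0.5, 0.5, 0.25, 0.1, img_w=640, img_h=480))
# -> (240.0, 216.0, 400.0, 264.0)
```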

4.2 Layout Detection

The technique of Layout Detection is utilized to ascertain the configuration of elements within a document Vitagliano
et al. [2022]. In this study, the term "layout" refers to the various components that comprise the structure of a layout,
including titles, text, photos, captions, and tables, as seen in Figure 2. The data extraction process for detected
documents is determined based on the specific type of data contained inside them. The process of extracting data is
depicted in Figure 3.

Figure 2: Document Layout.

The extraction components used in this research are as follows:

Optical Character Recognition (OCR) This method is employed to transform text data present in scanned documents
into editable and searchable text Billah et al. [2015]. The OCR framework used in this research is Tesseract. Tesseract
is a framework developed by Google for optical character recognition needs, offering ease of use Smith [2007].


Table Extraction Table extraction encompasses two components: table structure recognition and OCR. Table structure recognition is used to detect the structure of tables, including rows, columns, and cells; the PubTables-1M model Smock et al. [2021] is utilized for this purpose. This model accurately analyzes tables originating from images.
The extracted data will be combined into a JSON format and sorted based on the coordinate positions of the data
components. Consequently, the obtained data will include component coordinates (x1, y1, x2, y2), component classes
(such as text, tables, etc.), and data, as depicted in Figure 3.

Figure 3: Layout Detection Flow.
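The assembly step described above can be sketched in Python. The field names (`coords`, `class`, `data`) and the top-to-bottom, left-to-right sort key are illustrative assumptions; the paper specifies only that components are combined into JSON and ordered by their coordinates.

```python
import json

def assemble(detections):
    """Sort detected components top-to-bottom, then left-to-right,
    and serialize them with their coordinates, class, and extracted data."""
    ordered = sorted(detections, key=lambda d: (d["y1"], d["x1"]))
    return json.dumps(
        [{"coords": [d["x1"], d["y1"], d["x2"], d["y2"]],
          "class": d["class"], "data": d["data"]} for d in ordered],
        ensure_ascii=False)

# Hypothetical detections from one page, in arbitrary order.
detections = [
    {"x1": 40, "y1": 300, "x2": 560, "y2": 420, "class": "table", "data": [["a", "b"]]},
    {"x1": 40, "y1": 50, "x2": 560, "y2": 90, "class": "title", "data": "Some Title"},
    {"x1": 40, "y1": 120, "x2": 560, "y2": 280, "class": "text", "data": "Body text..."},
]
print(assemble(detections))
```

After sorting, the serialized list reads in natural page order: title, then text, then table.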

4.3 Dataset

The dataset included in this study comprises 153 PDF pages that have been transformed from diverse sources, such as
books and sample journals. The data was subsequently tagged utilizing Label Studio Tkachenko et al. [2020-2022] with
the subsequent classes:

Table 1: Data Classes.


Class Description
Title Attribute referring to the book title
Text Attribute referring to the text within the book
Image Attribute indicating images on the book page
Caption Attribute for captions of images or tables
Image_caption Group box for images and captions
Table Attribute for tables in the book
Table_caption Group box for tables and captions

Each page within the used dataset has a varying number of classes due to the distinct structures of each page. The
classes for the training data are indicated as shown in Figure 4.

Figure 4: Data train class.


The training data consists of 143 layout images, while the test data comprises 10 layout images, with data classes visible in Figure 5.

Figure 5: Data test class.

4.4 Training Model

When conducting training, the parameters employed are outlined in Table 2.

Table 2: Training parameters


Parameter Value
Model variant YOLOv5 S
Epoch 500
Image Size 640
Patience 100
Cache RAM
Device GPU
Batch size 32
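The parameters in Table 2 correspond to a YOLOv5 training invocation along the following lines. This is a hedged sketch: the dataset configuration file name (`layout.yaml`) is an assumption, and the flags shown are the standard `train.py` arguments of the Ultralytics YOLOv5 repository.

```shell
# Train YOLOv5-S with the parameters from Table 2 (layout.yaml is a
# hypothetical dataset config listing the seven layout classes).
python train.py \
    --weights yolov5s.pt \
    --data layout.yaml \
    --img 640 \
    --epochs 500 \
    --patience 100 \
    --batch 32 \
    --cache ram \
    --device 0
```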

The environment utilized to execute the training is Google Colab Pro, with specifications as provided in Table 3.

Table 3: Hardware specifications

Hardware Specification
CPU 2 x Intel Xeon CPU @ 2.20GHz
GPU Tesla P100 16 GB
RAM 27 GB
Storage 129 GB available

4.5 Evaluation Metric

Evaluation metrics are tools used to measure the quality and performance of machine learning models Thambawita et al.
[2020]. Some of the metrics used include mAP50, mAP50-95, Precision, Recall, Box Loss, Class Loss, and Object
Loss.

Precision is the ratio of true positive predictions (TP) to the total number of positive predictions (TP + FP). Precision is used to measure the quality of positive predictions by the model Heyburn et al. [2018]. Precision is defined as shown in Equation (1):

P = TP / (TP + FP)    (1)


Recall is the ratio of true positive predictions (TP) to the total number of actual positives (TP + FN). Recall is used to measure the model's ability to find all positive samples Wang et al. [2022]. Recall is defined as shown in Equation (2):

R = TP / (TP + FN)    (2)
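Equations (1) and (2), together with the F1-score (the harmonic mean of precision and recall) reported in the abstract, can be computed directly; the TP/FP/FN counts below are illustrative, not the paper's.

```python
def precision(tp, fp):
    """Equation (1): quality of positive predictions."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Equation (2): coverage of actual positives."""
    return tp / (tp + fn)

def f1(p, r):
    """F1-score: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Plugging in the paper's reported precision (0.91) and recall (0.971)
# reproduces the reported F1-score of about 0.939.
print(round(f1(0.91, 0.971), 4))
```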

mAP50 The mean of the Average Precision (AP) across all classes. A detection is deemed correct if the Intersection over Union (IoU) between the predicted bounding box and the ground truth is 0.5 or higher. This metric assesses the model's effectiveness in object detection while allowing a degree of flexibility with respect to errors in object placement and bounding box dimensions Heyburn et al. [2018].
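The IoU threshold underlying mAP50 compares predicted and ground-truth boxes as follows; this is a standard formulation, not code from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two partially overlapping boxes: intersection 25, union 175.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # -> 0.142857..., below the 0.5 cutoff
```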

mAP50-95 This metric is frequently used in competitive settings such as the COCO (Common Objects in Context) challenge. It is the mean Average Precision (mAP) averaged across multiple Intersection over Union (IoU) thresholds, ranging from 0.5 to 0.95 in increments of 0.05 Thambawita et al. [2020].

Box Loss The metric referred to as box loss, or localization loss, evaluates the accuracy of a model's predictions of object bounding boxes. The calculation typically involves determining the disparity between the predicted bounding box coordinates and the corresponding ground-truth coordinates. Two commonly employed measures in this context are Mean Squared Error (MSE) and Intersection over Union (IoU) Wang et al. [2022].

Class Loss The metric of class loss evaluates the model’s ability to accurately forecast object classes. The calculation
typically involves determining the discrepancy between the anticipated probability of class membership as estimated by
the model and the true classes as determined by the ground truth. Cross-Entropy Loss is a frequently employed metric
in this context Wang et al. [2022].

Object Loss The metric of object loss evaluates the model's ability to accurately predict the existence of objects. In models such as YOLO, the presence or absence of an object is predicted at each cell of the visual grid. Object loss is calculated as the discrepancy between the predicted probability of object presence and the actual presence of the object, as indicated by the ground truth Heyburn et al. [2018].

4.6 Training Results

The training results yield metric values as shown in Table 4, indicating mAP50, mAP50-95, Precision, and Recall
scores. Figure 6 illustrates the metric graph for iterations 238 to 381.


Figure 6: Training Model Metric Graph

Table 4: Training Model Metric

Metric Value
mAP50 0.97
mAP50-95 0.801
Precision 0.911
Recall 0.971

These results show that model training achieved sufficiently high accuracy for predicting the provided document layouts. Training stopped at epoch 381: once the metrics showed no further improvement, the early-stopping mechanism (configured with a patience of 100 epochs, Table 2) terminated training.
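The patience-based early stopping described above can be illustrated with a minimal sketch; the exact stopping rule in the YOLOv5 training code may differ in detail, and the mAP curve below is a toy example.

```python
def early_stop_epoch(scores, patience):
    """Return the epoch at which training halts: the first epoch that is
    `patience` or more epochs past the best score so far, or the final
    epoch if the metric keeps improving."""
    best_epoch, best_score = 0, float("-inf")
    for epoch, score in enumerate(scores):
        if score > best_score:
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            return epoch
    return len(scores) - 1

# Toy mAP curve: improves until epoch 3, then plateaus; with patience=2
# training halts at epoch 5.
print(early_stop_epoch([0.1, 0.4, 0.6, 0.7, 0.7, 0.7, 0.7], patience=2))  # -> 5
```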
Box loss, as depicted in Figure 7, reaches 0.308 during training and 0.636 during validation. These results indicate that the model predicts object bounding boxes well, with low loss.


Figure 7: Box Loss Metric Results

The model training yields small class loss values of 0.245 during training and 0.383 during validation, as shown in
Figure 8. This demonstrates the model’s ability to predict classes from the given layouts.

Figure 8: Class Loss Metric Results

The Object Loss metric refers to the model’s ability to detect objects before predicting their classes and bounding
boxes. The training value is 0.863, and the validation value is 0.85, as shown in Figure 9.


Figure 9: Object Loss Metric Results

The results of the extraction process are exemplified in Figure 10, demonstrating accurate predictions with high
speed.

Figure 10: Object Detection Results

Extraction results using regulation page data are shown in Figure 11, aligning with the original data. The average extraction speed is 0.512 seconds per page.


Figure 11: Text Extraction Results

The outcomes of the detection and extraction process provide evidence that the model successfully meets the
criteria for functioning as an unstructured document detector and extractor.

5 Conclusions
The utilization of YOLOv5 in document layout identification tasks has demonstrated significant efficacy, yielding a notable accuracy rate with a precision of 0.91 and a recall of 0.971. This performance enables the model to identify and retrieve textual and tabular data from document images, accelerating the typically arduous task of extracting data from scanned documents. The capabilities of YOLOv5 can be expanded beyond document layout analysis, presenting opportunities for future study. This entails exploring various forms of unstructured data, encompassing not just documents and photographs but also audio data. This avenue offers significant opportunities for a broad spectrum of applications.

References
Kiran Adnan and Rehan Akbar. Limitations of information extraction methods and techniques for heterogeneous unstructured big data. International Journal of Engineering Business Management, 11:184797901989077, January 2019a. doi:10.1177/1847979019890771. URL https://doi.org/10.1177/1847979019890771.
Kiran Adnan and Rehan Akbar. An analytical study of information extraction from unstructured and multidimensional big data. Journal of Big Data, 6(1), October 2019b. doi:10.1186/s40537-019-0254-8. URL https://doi.org/10.1186/s40537-019-0254-8.
Gohar Zaman, Hairulnizam Mahdin, Khalid Hussain, and Atta Rahman. Information extraction from semi- and unstructured data sources: A systematic literature review. ICIC Express Letters, 14:593–603, June 2020. doi:10.24507/icicel.14.06.593.
Tausif Diwan, G. Anirudh, and Jitendra V. Tembhurne. Object detection using YOLO: challenges, architectural successors, datasets and applications. Multimedia Tools and Applications, 82(6):9243–9275, August 2022. doi:10.1007/s11042-022-13644-y. URL https://doi.org/10.1007/s11042-022-13644-y.
D. M. Jiménez-Bravo, L. Lozano Murciego, A. Sales Mendes, H. Sánchez San Blás, and J. Bajo. Multi-object tracking in traffic environments: A systematic literature review. Neurocomputing, 494:43–55, July 2022.
J. Kaur and W. Singh. Tools, techniques, datasets and application areas for object detection in an image: a review. Multimedia Tools and Applications, 81(27):38297–38351, April 2022.
R. Arifando, S. Eto, and C. Wada. Improved YOLOv5-based lightweight object detection algorithm for people with visual impairment to detect buses. Applied Sciences, 13(9):5802, May 2023.
M. Hussain. YOLO-v1 to YOLO-v8, the rise of YOLO and its complementary nature toward digital manufacturing and industrial defect detection. Machines, 11(7):677, June 2023.
Birgit Pfitzmann, Christoph Auer, Michele Dolfi, Ahmed S. Nassar, and Peter Staar. DocLayNet: A large human-annotated dataset for document-layout segmentation. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM, August 2022. doi:10.1145/3534678.3539043. URL https://doi.org/10.1145/3534678.3539043.
Prashanth Pillai and Purnaprajna Mangsuli. Document layout analysis using detection transformers. In Day 3 Wed, November 17, 2021. SPE, December 2021. doi:10.2118/207266-ms. URL https://doi.org/10.2118/207266-ms.
Xuebin Yue, Hengyi Li, Masao Shimizu, Sadao Kawamura, and Lin Meng. YOLO-GD: A deep learning-based object detection algorithm for empty-dish recycling robots. Machines, 10(5):294, April 2022. doi:10.3390/machines10050294. URL https://doi.org/10.3390/machines10050294.
Yu Kyō Kitakaze, Renjin Yoshihara, Souta Okabe, and Ryō Matsumura. Development of harmful bird recognition system using object detection YOLO. Journal of Industrial Application Engineering, 8(1):10–16, 2020. doi:10.12792/jjiiae.8.1.10. URL https://doi.org/10.12792/jjiiae.8.1.10.
Saeed Ranjbar Alvar and Ivan V. Bajic. MV-YOLO: Motion vector-aided tracking by semantic object detection. In 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP). IEEE, August 2018. doi:10.1109/mmsp.2018.8547125. URL https://doi.org/10.1109/mmsp.2018.8547125.
Younis A. Al-Arbo and Khalil I. Alsaif. Online multi-object tracking in videos based on features detected by YOLO. Turkish Journal of Computer and Mathematics Education (TURCOMAT), 12(6):2922–2931, April 2021. doi:10.17762/turcomat.v12i6.5801. URL https://doi.org/10.17762/turcomat.v12i6.5801.
Niharika Kumari, Verena Ruf, Sergey Mukhametov, Albrecht Schmidt, Jochen Kuhn, and Stefan Küchemann. Mobile eye-tracking data analysis using object detection via YOLO v4. Sensors, 21(22):7668, November 2021. doi:10.3390/s21227668. URL https://doi.org/10.3390/s21227668.
Chao Wang, Yunchu Zhang, Yanfei Zhou, Shaohan Sun, Hanyuan Zhang, and Yepeng Wang. Automatic detection of indoor occupancy based on improved YOLOv5 model. Neural Computing and Applications, 35(3):2575–2599, September 2022. doi:10.1007/s00521-022-07730-3. URL https://doi.org/10.1007/s00521-022-07730-3.
Yue Gu, Shucai Wang, Yu Yan, Shijie Tang, and Shida Zhao. Identification and analysis of emergency behavior of cage-reared laying ducks based on YOLOv5. Agriculture, 12(4):485, March 2022. doi:10.3390/agriculture12040485. URL https://doi.org/10.3390/agriculture12040485.
Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection, 2016.
Joseph Redmon and Ali Farhadi. YOLO9000: Better, faster, stronger, 2016.
Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement, 2018.
Glenn Jocher, Ayush Chaurasia, Alex Stoken, Jirka Borovec, NanoCode012, Yonghye Kwon, Kalen Michael, TaoXie, Jiacong Fang, Imyhxy, Lorna, Zeng Yifu, Colin Wong, Abhiram V, Diego Montes, Zhiqiang Wang, Cristi Fati, Jebastin Nadar, Laughing, UnglvKitDe, Victor Sonck, Tkianai, YxNONG, Piotr Skalski, Adam Hogan, Dhruv Nair, Max Strobel, and Mrinal Jain. ultralytics/yolov5: v7.0 - YOLOv5 SOTA realtime instance segmentation, 2022. URL https://zenodo.org/record/7347926.
Gerardo Vitagliano, Lucas Reisener, Lan Jiang, Mazhar Hameed, and Felix Naumann. Mondrian: Spreadsheet layout detection. In Proceedings of the 2022 International Conference on Management of Data. ACM, June 2022. doi:10.1145/3514221.3520152. URL https://doi.org/10.1145/3514221.3520152.
Mustain Billah, Sajjad Waheed, and Abu Hanifa. An optical character recognition system from printed text and text image using adaptive neuro fuzzy inference system. International Journal of Computer Applications, 130(16):1–5, November 2015. doi:10.5120/ijca2015907196. URL https://doi.org/10.5120/ijca2015907196.
R. Smith. An overview of the Tesseract OCR engine. In Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 2. IEEE, September 2007. doi:10.1109/icdar.2007.4376991. URL https://doi.org/10.1109/icdar.2007.4376991.
Brandon Smock, Rohith Pesala, and Robin Abraham. PubTables-1M: Towards comprehensive table extraction from unstructured documents, 2021.
Maxim Tkachenko, Mikhail Malyuk, Andrey Holmanyuk, and Nikolai Liubimov. Label Studio: Data labeling software, 2020–2022. URL https://github.com/heartexlabs/label-studio.
Vajira Thambawita, Debesh Jha, Hugo Lewi Hammer, Håvard D. Johansen, Dag Johansen, Pål Halvorsen, and Michael A. Riegler. An extensive study on cross-dataset bias and evaluation metrics interpretation for machine learning applied to gastrointestinal tract abnormality classification. ACM Transactions on Computing for Healthcare, 1(3):1–29, June 2020. doi:10.1145/3386295. URL https://doi.org/10.1145/3386295.
Rachel Heyburn, Raymond R. Bond, Michaela Black, Maurice Mulvenna, Jonathan Wallace, Deborah Rankin, and Brian Cleland. Machine learning using synthetic and real data: Similarity of evaluation metrics for different healthcare datasets and for different algorithms. In Data Science and Knowledge Engineering for Sensing Decision Support. WORLD SCIENTIFIC, July 2018. doi:10.1142/9789813273238_0160. URL https://doi.org/10.1142/9789813273238_0160.


