Search Results (134)

Search Parameters:
Keywords = Optical Character Recognition

13 pages, 5724 KiB  
Article
Comparative Approach to De-Noising TEMPEST Video Frames
by Alexandru Mădălin Vizitiu, Marius Alexandru Sandu, Lidia Dobrescu, Adrian Focșa and Cristian Constantin Molder
Sensors 2024, 24(19), 6292; https://doi.org/10.3390/s24196292 - 28 Sep 2024
Viewed by 292
Abstract
Analysis of unintended compromising emissions from Video Display Units (VDUs) is an important topic in research communities. This paper examines the feasibility of recovering the information displayed on the monitor from reconstructed video frames. The study holds particular significance for our understanding of security vulnerabilities associated with the electromagnetic radiation of digital displays. Considering the amount of noise in reconstructed TEMPEST video frames, the work in this paper focuses on two different approaches to de-noising images for efficient optical character recognition. First, an Adaptive Wiener Filter (AWF) with adaptive window size implemented in the spatial domain was tested, followed by a Convolutional Neural Network (CNN) with an encoder–decoder structure that follows both the classical auto-encoder architecture and the U-Net architecture (auto-encoder with skip connections). These two techniques improved the Structural Similarity Index Metric (SSIM) by more than a factor of two for the AWF and by up to a factor of four for the Deep Learning (DL) approach. In addition, to validate the results, the possibility of text recovery from processed noisy frames was studied using the state-of-the-art Tesseract Optical Character Recognition (OCR) engine. The present work aims to draw attention to the security importance of this topic and the non-negligible character of VDU information leakages. Full article
(This article belongs to the Section Sensing and Imaging)
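
The SSIM figures quoted in this abstract compare a de-noised frame with a reference frame. As a rough illustration of what the metric measures, the sketch below computes a single-window (global) SSIM over two flattened grayscale images. This is a simplification and not the authors' evaluation code: SSIM is normally averaged over local windows, and the function name `global_ssim`, the 8-bit dynamic range, and the constants 0.01 and 0.03 (the commonly used defaults) are assumptions made here.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM over two equal-length sequences of pixel intensities.

    Simplified sketch: standard SSIM averages this quantity over local windows.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilizing constants (assumed defaults: K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical 4-pixel "frames": a clean reference and a noisy reconstruction.
clean = [0, 0, 255, 255]
noisy = [40, 10, 200, 230]
score = global_ssim(clean, noisy)
```

A score of 1 means the two inputs are identical under this measure; noise pulls the score below 1, which is why a de-noising step can be judged by how much it raises SSIM against the reference.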

19 pages, 644 KiB  
Article
SMS Scam Detection Application Based on Optical Character Recognition for Image Data Using Unsupervised and Deep Semi-Supervised Learning
by Anjali Shinde, Essa Q. Shahra, Shadi Basurra, Faisal Saeed, Abdulrahman A. AlSewari and Waheb A. Jabbar
Sensors 2024, 24(18), 6084; https://doi.org/10.3390/s24186084 - 20 Sep 2024
Viewed by 661
Abstract
The growing problem of unsolicited text messages (smishing) and data irregularities necessitates stronger spam detection solutions. This paper explores the development of a sophisticated model designed to identify smishing messages by understanding the complex relationships among words, images, and context-specific factors, areas that remain underexplored in existing research. To address this, we merge a UCI spam dataset of regular text messages with real-world spam data, leveraging OCR technology for comprehensive analysis. The study employs a combination of traditional machine learning models, including K-means, Non-Negative Matrix Factorization, and Gaussian Mixture Models, along with feature extraction techniques such as TF-IDF and PCA. Additionally, deep learning models like RNN-Flatten, LSTM, and Bi-LSTM are utilized. The selection of these models is driven by their complementary strengths in capturing both the linear and non-linear relationships inherent in smishing messages. Machine learning models are chosen for their efficiency in handling structured text data, while deep learning models are selected for their superior ability to capture sequential dependencies and contextual nuances. The performance of these models is rigorously evaluated using metrics like accuracy, precision, recall, and F1 score, enabling a comparative analysis between the machine learning and deep learning approaches. Notably, the K-means feature extraction with vectorizer achieved 91.01% accuracy, and the KNN-Flatten model reached 94.13% accuracy, emerging as the top performer. The rationale behind highlighting these models is their potential to significantly improve smishing detection rates. For instance, the high accuracy of the KNN-Flatten model suggests its applicability in real-time spam detection systems, but its computational complexity might limit scalability in large-scale deployments. Similarly, while K-means with vectorizer excels in accuracy, it may struggle with the dynamic and evolving nature of smishing attacks, necessitating continual retraining. Full article
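
TF-IDF, named in this abstract as one of the feature extraction techniques, weights a term by how often it appears in a message and how rare it is across messages. The sketch below shows one basic variant only (term frequency as count divided by message length, inverse document frequency as the natural log of N/df, whitespace tokenization). It is not the paper's pipeline; the function name `tf_idf` and the sample messages are hypothetical, and library implementations often add smoothing or normalization.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Variant assumed here: tf = count / len(doc), idf = ln(N / df).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical OCR-extracted messages.
messages = ["win a free prize now", "free entry win cash", "see you at lunch"]
features = tf_idf(messages)
```

Terms shared by every message receive a weight of zero under this variant, so the resulting vectors emphasize the words that distinguish one message from the rest before clustering or classification.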

26 pages, 5241 KiB  
Article
Automated Identification of Cylindrical Cells for Enhanced State of Health Assessment in Lithium-Ion Battery Reuse
by Alejandro H. de la Iglesia, Fernando Lobato Alejano, Daniel H. de la Iglesia, Carlos Chinchilla Corbacho and Alfonso J. López Rivero
Batteries 2024, 10(9), 299; https://doi.org/10.3390/batteries10090299 - 24 Aug 2024
Viewed by 700
Abstract
Lithium-ion batteries are pervasive in contemporary life, providing power for a vast array of devices, including smartphones and electric vehicles. With the projected sale of millions of electric vehicles globally by 2022 and over a million electric vehicles in Europe alone in the first quarter of 2023, the necessity of securing a sustainable supply of lithium-ion batteries has reached a critical point. As the demand for electric vehicles and renewable energy storage systems (ESS) increases, so too does the necessity to address the shortage of lithium batteries and implement effective recycling and recovery practices. A considerable number of electric vehicle batteries will reach the end of their useful life in the near future, resulting in a significant increase in the number of used batteries. It is of paramount importance to accurately identify the manufacturer and model of cylindrical batteries to ascertain their State of Health (SoH) and guarantee their efficient reuse. This study focuses on the automation of the identification of cylindrical cells through optical character recognition (OCR) and the analysis of the external color of the cell and the anode morphology based on computer vision techniques. This is a novel work in the current limited literature, which aims to bridge the gap between industrialized lithium-ion cell recovery processes and an automated SoH calculation. Accurate battery identification optimizes battery reuse, reduces manufacturing costs and mitigates environmental impact. The results of the work are promising, achieving 90% accuracy in the identification of 18650 cylindrical cells. Full article

22 pages, 1588 KiB  
Article
Investigating the Challenges and Opportunities in Persian Language Information Retrieval through Standardized Data Collections and Deep Learning
by Sara Moniri, Tobias Schlosser and Danny Kowerko
Computers 2024, 13(8), 212; https://doi.org/10.3390/computers13080212 - 21 Aug 2024
Viewed by 768
Abstract
The Persian language, also known as Farsi, is distinguished by its intricate morphological richness, yet it contends with a paucity of linguistic resources. With an estimated 110 million speakers, it finds prevalence across Iran, Tajikistan, Uzbekistan, Iraq, Russia, Azerbaijan, and Afghanistan. However, despite its widespread usage, scholarly investigations into Persian document retrieval remain notably scarce. This circumstance is primarily attributed to the absence of standardized test collections, which impedes the advancement of comprehensive research endeavors within this realm. As data corpora are the foundation of natural language processing applications, this work aims at Persian language datasets to address their availability and structure. Subsequently, we motivate a learning-based framework for the processing of Persian texts and their recognition, for which current state-of-the-art approaches from deep learning, such as deep neural networks, are further discussed. Our investigations highlight the challenges of realizing such a system while emphasizing its possible benefits for an otherwise rarely covered language. Full article

18 pages, 7344 KiB  
Article
A User Location Reset Method through Object Recognition in Indoor Navigation System Using Unity and a Smartphone (INSUS)
by Evianita Dewi Fajrianti, Yohanes Yohanie Fridelin Panduman, Nobuo Funabiki, Amma Liesvarastranta Haz, Komang Candra Brata and Sritrusta Sukaridhoto
Network 2024, 4(3), 295-312; https://doi.org/10.3390/network4030014 - 22 Jul 2024
Viewed by 602
Abstract
To enhance user experiences of reaching destinations in large, complex buildings, we have developed an indoor navigation system using Unity and a smartphone called INSUS. It can reset the user location using a quick response (QR) code to reduce the loss of direction of the user during navigation. However, this approach needs a number of QR code sheets to be prepared in the field, adding extra load at implementation. In this paper, we propose another reset method that reduces this load by recognizing information on signs already installed in the field using object detection and Optical Character Recognition (OCR) technologies. Many signs exist in a building, containing texts such as room numbers, room names, and floor numbers. In the proposal, the Sign Image is taken with a smartphone, the sign is detected by YOLOv8, the text inside the sign is recognized by PaddleOCR, and it is compared with each record in the Room Database using Levenshtein distance. For evaluations, we applied the proposal in two buildings in Okayama University, Japan. The results show that YOLOv8 achieved an mAP@0.5 of 0.995 and an mAP@0.5:0.95 of 0.978, and PaddleOCR could extract text in the sign image accurately with an average CER lower than 10%. The combination of both YOLOv8 and PaddleOCR decreases the execution time by 6.71 s compared to the previous method. The results confirmed the effectiveness of the proposal. Full article
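
The matching step described in this abstract, comparing recognized sign text with each Room Database record by Levenshtein distance, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `levenshtein`, `cer`, and `best_room_match`, the lowercase normalization, and the sample room records are all hypothetical. The character error rate (CER) is taken here as edit distance divided by reference length.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate of OCR output against a known reference string."""
    return levenshtein(hypothesis, reference) / len(reference)


def best_room_match(ocr_text, rooms):
    """Return (room, distance) for the record closest to the OCR text."""
    query = ocr_text.strip().lower()
    scored = ((room, levenshtein(query, room.lower())) for room in rooms)
    return min(scored, key=lambda pair: pair[1])


# Hypothetical room records and a noisy OCR reading of a door sign.
room_database = ["Room D207", "Room D201", "Lecture Hall"]
match = best_room_match("Roam D2O7", room_database)
```

Choosing the record with the smallest distance tolerates a few misread characters, which is what lets a sign reading stand in for a QR code when resetting the user location.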

28 pages, 35864 KiB  
Article
Custom Anchorless Object Detection Model for 3D Synthetic Traffic Sign Board Dataset with Depth Estimation and Text Character Extraction
by Rahul Soans and Yohei Fukumizu
Appl. Sci. 2024, 14(14), 6352; https://doi.org/10.3390/app14146352 - 21 Jul 2024
Viewed by 860
Abstract
This paper introduces an anchorless deep learning model designed for efficient analysis and processing of large-scale 3D synthetic traffic sign board datasets. With an ever-increasing emphasis on autonomous driving systems and their reliance on precise environmental perception, the ability to accurately interpret traffic sign information is crucial. Our model seamlessly integrates object detection, depth estimation, deformable parts, and text character extraction functionalities, facilitating a comprehensive understanding of road signs in simulated environments that mimic the real world. The dataset used has a large number of artificially generated traffic signs for 183 different classes. The signs include place names in Japanese and English, expressway names in Japanese and English, distances and motorway numbers, and direction arrow marks with different lighting, occlusion, viewing angles, camera distortion, day and night cycles, and bad weather like rain, snow, and fog. This was done so that the model could be tested thoroughly in a wide range of difficult conditions. We developed a convolutional neural network with a modified lightweight hourglass backbone using depthwise spatial and pointwise convolutions, along with spatial and channel attention modules that produce resilient feature maps. We conducted experiments to benchmark our model against the baseline model, showing improved accuracy and efficiency in both depth estimation and text extraction tasks, crucial for real-time applications in autonomous navigation systems. With its model efficiency and partwise decoded predictions, along with Optical Character Recognition (OCR), our approach suggests its potential as a valuable tool for developers of Advanced Driver-Assistance Systems (ADAS), Autonomous Vehicle (AV) technologies, and transportation safety applications, ensuring reliable navigation solutions. Full article

12 pages, 868 KiB  
Article
Trademark Text Recognition Combining SwinTransformer and Feature-Query Mechanisms
by Boxiu Zhou, Xiuhui Wang, Wenchao Zhou and Longwen Li
Electronics 2024, 13(14), 2814; https://doi.org/10.3390/electronics13142814 - 17 Jul 2024
Viewed by 537
Abstract
The task of trademark text recognition is a fundamental component of scene text recognition (STR), which currently faces a number of challenges, including the presence of unordered, irregular or curved text, as well as text that is distorted or rotated. In applications such as trademark infringement detection and analysis of brand effects, the diversification of artistic fonts in trademarks and the complexity of the product surfaces where the trademarks are located pose major challenges for relevant research. To tackle these issues, this paper proposes a novel recognition framework named SwinCornerTR, which aims to enhance the accuracy and robustness of trademark text recognition. Firstly, a novel feature-extraction network based on SwinTransformer with EFPN (enhanced feature pyramid network) is proposed. By incorporating SwinTransformer as the backbone, efficient capture of global information in trademark images is achieved through the self-attention mechanism and enhanced feature pyramid module, providing more accurate and expressive feature representations for subsequent text extraction. Then, during the encoding stage, a novel feature point-retrieval algorithm based on corner detection is designed. The OTSU-based fast corner detector is presented to generate a corner map, achieving efficient and accurate corner detection. Furthermore, in the encoding phase, a feature point-retrieval mechanism based on corner detection is introduced to achieve priority selection of key-point regions, eliminating character-to-character lines and suppressing background interference. Finally, we conducted extensive experiments on two open-access benchmark datasets, SVT and CUTE80, as well as a self-constructed trademark dataset, to assess the effectiveness of the proposed method. Our results showed that the proposed method achieved accuracies of 92.9%, 92.3% and 84.8%, respectively, on these datasets. These results demonstrate the effectiveness and robustness of the proposed method in the analysis of trademark data. Full article
(This article belongs to the Section Artificial Intelligence)

32 pages, 7283 KiB  
Technical Note
Research on the Training and Application Methods of a Lightweight Agricultural Domain-Specific Large Language Model Supporting Mandarin Chinese and Uyghur
by Kun Pan, Xiaogang Zhang and Liping Chen
Appl. Sci. 2024, 14(13), 5764; https://doi.org/10.3390/app14135764 - 1 Jul 2024
Viewed by 830
Abstract
In the field of Natural Language Processing (NLP), the lack of support for minority languages, especially Uyghur, the scarcity of Uyghur language corpora in the agricultural domain, and the lightweight nature of large language models remain prominent issues. This study proposes a method for constructing a bilingual (Uyghur and Chinese) lightweight specialized large language model for the agricultural domain. By utilizing a mixed training approach of Uyghur and Chinese, we extracted Chinese corpus text from agricultural-themed books in PDF format using OCR (Optical Character Recognition) technology, converted the Chinese text corpus into a Uyghur corpus using a rapid translation API, and constructed a bilingual mixed vocabulary. We applied the parameterized Transformer model algorithm to train the model for the agricultural domain in both Chinese and Uyghur. Furthermore, we introduced a context detection and fail-safe mechanism for the generated text. The constructed model supports bilingual reasoning in Uyghur and Chinese in the agricultural domain, with higher accuracy and a smaller size that requires less hardware. Our work addresses issues such as the scarcity of Uyghur corpora in the agricultural domain, mixed word segmentation and word vector modeling in Uyghur for widespread agricultural languages, model lightweighting and deployment, and the fragmentation of non-relevant texts during knowledge extraction from small-scale corpora. The lightweight design of the model reduces hardware requirements, facilitating deployment in resource-constrained environments. This advancement promotes agricultural intelligence, aids in the development of specific applications and minority languages (such as agriculture and Uyghur), and contributes to rural revitalization. Full article

15 pages, 1403 KiB  
Article
BERTopic for Enhanced Idea Management and Topic Generation in Brainstorming Sessions
by Asma Cheddak, Tarek Ait Baha, Youssef Es-Saady, Mohamed El Hajji and Mohamed Baslam
Information 2024, 15(6), 365; https://doi.org/10.3390/info15060365 - 20 Jun 2024
Viewed by 1193
Abstract
Brainstorming is an important part of the design thinking process since it encourages creativity and innovation through bringing together diverse viewpoints. However, traditional brainstorming practices face challenges such as the management of large volumes of ideas. To address this issue, this paper introduces a decision support system that employs the BERTopic model to automate the brainstorming process, which enhances the categorization of ideas and the generation of coherent topics from textual data. The dataset for our study was assembled from a brainstorming session on “scholar dropouts”, where ideas were captured on Post-it notes, digitized through an optical character recognition (OCR) model, and enhanced using data augmentation with a language model, GPT-3.5, to ensure robustness. To assess the performance of our system, we employed both quantitative and qualitative analyses. Quantitative evaluations were conducted independently across various parameters, while qualitative assessments focused on the relevance and alignment of keywords with human-classified topics during brainstorming sessions. Our findings demonstrate that BERTopic outperforms traditional LDA models in generating semantically coherent topics. These results demonstrate the usefulness of our system in managing the complex nature of Arabic language data and improving the efficiency of brainstorming sessions. Full article
(This article belongs to the Special Issue Artificial Intelligence (AI) for Economics and Business Management)

17 pages, 2445 KiB  
Article
Image Text Extraction and Natural Language Processing of Unstructured Data from Medical Reports
by Ivan Malashin, Igor Masich, Vadim Tynchenko, Andrei Gantimurov, Vladimir Nelyub and Aleksei Borodulin
Mach. Learn. Knowl. Extr. 2024, 6(2), 1361-1377; https://doi.org/10.3390/make6020064 - 18 Jun 2024
Viewed by 1701
Abstract
This study presents an integrated approach for automatically extracting and structuring information from medical reports, captured as scanned documents or photographs, through a combination of image recognition and natural language processing (NLP) techniques like named entity recognition (NER). The primary aim was to develop an adaptive model for efficient text extraction from medical report images. This involved utilizing a genetic algorithm (GA) to fine-tune optical character recognition (OCR) hyperparameters, ensuring maximal text extraction length, followed by NER processing to categorize the extracted information into required entities, adjusting parameters if entities were not correctly extracted based on manual annotations. Despite the diverse formats of medical report images in the dataset, all in Russian, this serves as a conceptual example of information extraction (IE) that can be easily extended to other languages. Full article
(This article belongs to the Section Data)

20 pages, 6519 KiB  
Article
A Bio-Inspired Retinal Model as a Prefiltering Step Applied to Letter and Number Recognition on Chilean Vehicle License Plates
by John Kern, Claudio Urrea, Francisco Cubillos and Ricardo Navarrete
Appl. Sci. 2024, 14(12), 5011; https://doi.org/10.3390/app14125011 - 8 Jun 2024
Viewed by 818
Abstract
This paper presents a novel use of a bio-inspired retina model as a scene preprocessing stage for the recognition of letters and numbers on Chilean vehicle license plates. The goal is to improve the effectiveness and ease of pattern recognition. Inspired by the responses of mammalian retinas, this retinal model reproduces both the natural adjustment of contrast and the enhancement of object contours by parvocellular cells. Among other contributions, this paper provides an in-depth exploration of the architecture, advantages, and limitations of the model; investigates the tuning parameters of the model; and evaluates its performance when integrating a convolutional neural network and a spiking neural network into an optical character recognition (OCR) algorithm, using 40 different genuine license plate images as a case study and for testing. The results obtained demonstrate the reduction of error rates in character recognition based on convolutional neural networks (CNNs), spiking neural networks (SNNs), and OCR. It is concluded that this bio-inspired retina model offers a wide spectrum of potential applications to further explore, including motion detection, pattern recognition, and improvement of dynamic range in images, among others. Full article

25 pages, 730 KiB  
Review
Handwritten Recognition Techniques: A Comprehensive Review
by Husam Ahmad Alhamad, Mohammad Shehab, Mohd Khaled Y. Shambour, Muhannad A. Abu-Hashem, Ala Abuthawabeh, Hussain Al-Aqrabi, Mohammad Sh. Daoud and Fatima B. Shannaq
Symmetry 2024, 16(6), 681; https://doi.org/10.3390/sym16060681 - 2 Jun 2024
Cited by 1 | Viewed by 1724
Abstract
Given the prevalence of handwritten documents in human interactions, optical character recognition (OCR) for documents holds immense practical value. OCR is a field that empowers the translation of various document types and images into data that can be analyzed, edited, and searched. In handwritten recognition techniques, symmetry can be crucial to improving accuracy. It can be used as a preprocessing step to normalize the input data, making it easier for the recognition algorithm to identify and classify characters accurately. This review paper aims to summarize the research conducted on character recognition for handwritten documents and offer insights into future research directions. Within this review, the research articles focused on handwritten OCR were gathered, synthesized, and examined, along with closely related topics, published between 2019 and the first quarter of 2024. Well-established electronic databases and a predefined review protocol were utilized for article selection. The articles were identified through keyword, forward, and backward reference searches to comprehensively cover all relevant literature. Following a rigorous selection process, 116 articles were included in this systematic literature review. This review article presents cutting-edge achievements and techniques in OCR and underscores areas where further research is needed. Full article
(This article belongs to the Section Computer)

18 pages, 4288 KiB  
Article
Construction Method and Practical Application of Oil and Gas Field Surface Engineering Case Database Based on Knowledge Graph
by Taiwu Xia, Zhixiang Dai, Yihua Zhang, Feng Wang, Wei Zhang, Li Xu, Dan Zhou and Jun Zhou
Processes 2024, 12(6), 1088; https://doi.org/10.3390/pr12061088 - 25 May 2024
Viewed by 698
Abstract
To address the challenge of quickly and efficiently accessing relevant management experience for a wide range of ground engineering construction projects, supporting project management with information technology is crucial. This includes the establishment of a case database and an application platform for intelligent search and recommendations. The article leverages Optical Character Recognition (OCR) technology, knowledge graph technology, and Natural Language Processing (NLP) technology. It explores the mechanisms for classifying construction cases, methods for constructing a case database, structuring case data, intelligently retrieving and matching cases, and intelligent recommendation methods. This research forms a complete, feasible, and scalable method for deconstructing, storing, intelligently retrieving, and recommending construction cases, providing a theoretical basis for the establishment of a construction case database. It aims to meet the needs of digital project management and intelligent decision-making support in the oil and gas sector, thereby enhancing the efficiency and accuracy of project construction. This work offers a theoretical foundation for the development of an intelligent management platform for ground engineering projects in the oil and gas industry, supporting the sector’s digital transformation and intelligent development. Full article

25 pages, 13403 KiB  
Review
A Review of Document Binarization: Main Techniques, New Challenges, and Trends
by Zhengxian Yang, Shikai Zuo, Yanxi Zhou, Jinlong He and Jianwen Shi
Electronics 2024, 13(7), 1394; https://doi.org/10.3390/electronics13071394 - 7 Apr 2024
Viewed by 1725
Abstract
Document image binarization is a challenging task, especially when it comes to text segmentation in degraded document images. Binarization, as a pre-processing step of Optical Character Recognition (OCR), is one of the most fundamental and commonly used segmentation methods. It separates the foreground text from the background of the document image to facilitate subsequent image processing. In view of the different degradation degrees of document images, researchers have proposed a variety of solutions. In this paper, we have summarized some challenges and difficulties in the field of document image binarization. Approximately 60 document image binarization methods are covered, including traditional algorithms and deep learning-based algorithms. Here, we evaluated the performance of 25 image binarization techniques on the H-DIBCO2016 dataset to provide some help for future research. Full article
(This article belongs to the Special Issue Deep Learning-Based Computer Vision: Technologies and Applications)
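
To make the idea of separating foreground text from background concrete, the sketch below implements Otsu's method, a classic global thresholding algorithm. It is chosen here purely as an illustration of a traditional technique; the abstract does not say which methods the review evaluates. The function names `otsu_threshold` and `binarize`, and the sample pixel values, are hypothetical, and 8-bit grayscale input is assumed.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximizes between-class variance.

    Pixels <= t form the dark class, pixels > t the bright class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue  # no pixels in the dark class yet
        weight_bright = total - weight_dark
        if weight_bright == 0:
            break  # all pixels are in the dark class
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_bright = (sum_all - sum_dark) / weight_bright
        between_var = weight_dark * weight_bright * (mean_dark - mean_bright) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels, threshold):
    """Map dark pixels (text) to 0 and bright pixels (background) to 255."""
    return [0 if p <= threshold else 255 for p in pixels]


# Hypothetical flattened grayscale patch: dark ink on bright paper.
patch = [10, 12, 11, 200, 205, 210]
t = otsu_threshold(patch)
mask = binarize(patch, t)
```

A single global threshold works when ink and paper intensities are well separated. Degraded documents with stains or uneven lighting are where locally adaptive and learning-based methods, as surveyed in the review, become relevant.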

28 pages, 3632 KiB  
Article
Discovering Trends in the Digitalization of Shipping: An Exploratory Study into Trends Using Natural Language Processing
by Geoffrey Aerts and Guy Mathys
J. Mar. Sci. Eng. 2024, 12(4), 618; https://doi.org/10.3390/jmse12040618 - 4 Apr 2024
Cited by 1 | Viewed by 1429
Abstract
This study investigates digitalization in the shipping industry by analyzing over 500 industry presentations from an eight-year span to discern key trends and nascent signals. Employing optical character recognition, advanced natural language processing techniques, and similarity metrics, the research enhances topic interpretability. Through Theil–Sen regressions and diffusion metrics, it identifies trends and emerging signals, noting a rise in interest in smart ports and supply chain management, signaling a shift toward more intelligent technology integration. However, attention to supply chain management shows a decline. The research tracks a shift from broad technology themes to specific areas like cybersecurity and blockchain, reflecting a narrative pivot to tackle particular digital challenges and opportunities. The study detects weak signals, including terms like “subsea” and “drone”, suggesting forthcoming industry innovations and shifts, notably toward ESG considerations. An additional machine learning analysis corroborates findings on key topics like energy efficiency and crew welfare, also spotlighting virtual disaster recovery and ERP projects as emerging areas of interest. This work aids in comprehending the fluid digitalization landscape in shipping, highlighting the sector’s ongoing evolution, and underscoring the need for further inquiry into autonomous shipping and related domains. Full article
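
The Theil–Sen regression mentioned in this abstract estimates a trend as the median of all pairwise slopes, which makes it robust to occasional outliers. The sketch below is a minimal version under stated assumptions, not the study's code: the function name `theil_sen`, the intercept rule (median of residual offsets), and the yearly topic counts are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Return (slope, intercept) using the median of pairwise slopes."""
    points = list(zip(xs, ys))
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x2 != x1  # skip pairs with identical x to avoid division by zero
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in points)
    return slope, intercept


# Hypothetical topic mention counts over eight years, with one outlier year.
years = [0, 1, 2, 3, 4, 5, 6, 7]
mentions = [2, 4, 6, 8, 10, 12, 14, 100]
slope, intercept = theil_sen(years, mentions)
```

In this example the single outlier in the last year does not change the estimated slope, whereas an ordinary least-squares fit would be pulled upward by it. That robustness is the usual reason for preferring this estimator on noisy yearly counts.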
