
The final report from HCMC University of Technology and Education presents a project on Vietnamese handwriting recognition using digital image processing techniques. The project aims to develop a user-friendly system that accurately recognizes handwritten Vietnamese text by leveraging advanced architectures like Vision Transformer and VGG16, along with a RESTful API for user interaction. The report outlines the problem, objectives, theoretical background, design, implementation, and results, highlighting the challenges and solutions in achieving effective OCR for Vietnamese handwriting.


HCMC UNIVERSITY OF TECHNOLOGY AND EDUCATION

FACULTY OF INFORMATION TECHNOLOGY

FINAL REPORT

VIETNAMESE HANDWRITING RECOGNITION

COURSE NAME: Digital Image Processing (DIPR430685E)

SEMESTER 2 – YEAR 2024-2025

Group: 04

Lecturer name: Assoc. Prof. Hoàng Văn Dũng

Hồ Chí Minh, 5/2025


GROUP LIST AND SELF-ASSESSMENT FORM

Second Semester of the 2024–2025 Academic Year

1. Course: Digital Image Processing (DIPR430685E)

2. Lecturer: Hoàng Văn Dũng

3. Project Title: Vietnamese Handwriting Recognition

4. Group List and Self-Assessment Table:

No.  Student Name      Student ID   Participation Rate
1    Phạm Nam Hào      22110023     100%
2    Phan Triệu Huy    22110038     100%

- Total Participation Rate = 100%

Lecturer's Comments
.................................................................................................................................................

May 2025

Graded by Lecturer

Hoàng Văn Dũng

TABLE OF CONTENTS
1. PROJECT DESCRIPTION
1.1 Problem
1.2 Objectives
2. THEORETICAL BACKGROUND
2.1 Overview
2.2 Image Processing
2.3 Optical Character Recognition (OCR)
3. DESIGN AND IMPLEMENTATION
3.1 Tools and Technologies
3.2 Data Preprocessing
3.2.1 Dataset
3.2.2 Preprocessing
3.3 Model Architecture and Training
3.4 Application Integration
3.5 Implementation
3.5.1 Code
3.5.2 UI
4. RESULTS AND DISCUSSION
5. CONCLUSION AND FUTURE DEVELOPMENT
6. REFERENCES
PREFACE
To complete this project and report, we would like to sincerely thank our lecturer,
Hoang Van Dung, who supported us directly throughout the project. He drew on his
practical experience to guide us toward the requirements of the chosen topic, always
answered our questions, and gave timely suggestions and corrections that helped us
overcome our shortcomings and finish well and on time.
We would also like to thank the teachers of the High Quality Training Department in
general, and of the Information Technology field in particular, for wholeheartedly
imparting the knowledge that gave us the foundation for this work. We also thank our
classmates for sharing useful information and knowledge that helped us improve the
project.
We carried out the project and report in a short time, with limited knowledge, technical
constraints, and little experience in implementing a software project. Shortcomings are
therefore inevitable, and we look forward to receiving valuable comments from our
teachers so that we can improve our knowledge and do better next time.
Finally, we respectfully wish all our teachers good health and continued success in their
careers. Once again, we sincerely thank you.
Hồ Chí Minh, ... May 2025
1. PROJECT DESCRIPTION
1.1 Problem

The recognition of handwritten Vietnamese text is challenging due to diverse writing
styles, complex diacritical marks, and tone marks on Vietnamese characters. Existing
Optical Character Recognition (OCR) tools, such as EasyOCR, achieve low accuracy on
handwritten Vietnamese text, particularly when the handwriting is inconsistent.
There is a clear need for a fast, accurate, and user-friendly system for recognizing
handwritten Vietnamese text, leveraging advanced architectures like Vision Transformer
(ViT) and VGG16 to improve performance.

1.2 Objectives

● Develop a system to recognize and extract handwritten Vietnamese text from
images.
● Apply image preprocessing, scene text detection, and OCR to recognize Vietnamese
handwriting.
● Train custom models for Vietnamese handwriting recognition using two
architectures: a Vision Transformer (ViT) encoder and a VGG16 encoder, both
combined with a Language Transformer decoder.
● Deploy a RESTful API built with FastAPI, with a web interface that lets users
upload images, view detected text regions, and receive recognized text with optional
refinement.
● Enhance recognition accuracy using PhoBERT for text refinement and explore beam
search for improved inference.

2. THEORETICAL BACKGROUND
2.1 Overview

Optical Character Recognition (OCR) converts images containing text into digital text and
is widely used in document digitization and handwritten text processing. Pattern recognition
supports character classification, which is crucial for handling diverse handwriting styles.
This project combines OCR and pattern recognition using two deep learning architectures:
Vision Transformer (ViT) for global context and VGG16 for local feature extraction, to
address Vietnamese handwriting recognition challenges.

2.2 Image Processing

The image processing techniques used include:

● Grayscale Conversion: Converts color images to grayscale to reduce computational
complexity.
● Thresholding and Noise Removal: Applies thresholding to enhance contrast and
remove noise, improving OCR quality.
● Resizing and Normalization: Standardizes images to 224x224 pixels to match ViT
and VGG16 input requirements.
● Text Region Detection: Uses EasyOCR to identify text-containing areas in images.
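
The first three steps in the list above can be illustrated with a minimal pure-Python sketch. It assumes a tiny image stored as nested lists; the project itself relies on PIL and OpenCV, so the function names here are hypothetical and the cutoff of 128 is an assumed value chosen only for illustration.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luma values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def threshold(gray_image, cutoff=128):
    """Binarize: pixels at or above the cutoff become 255, the rest 0."""
    return [[255 if p >= cutoff else 0 for p in row] for row in gray_image]


def resize_nearest(image, new_h, new_w):
    """Nearest-neighbour resize of a 2-D list of pixel values."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def normalize(image):
    """Scale 0-255 pixel values into the range 0.0-1.0."""
    return [[p / 255.0 for p in row] for row in image]


# Usage: a 1x2 RGB image goes through the whole chain.
gray = to_grayscale([[(255, 255, 255), (0, 0, 0)]])
model_input = normalize(resize_nearest(threshold(gray), 224, 224))
```

Nearest-neighbour resizing is used only to keep the sketch short; an imaging library would normally apply a smoother interpolation.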

2.3 Optical Character Recognition (OCR)

The OCR process includes:

● Preprocessing: Enhances image quality for recognition.
● Text Region Detection: Identifies text areas using EasyOCR.
● Character Recognition: Predicts text from detected regions using custom models.

EasyOCR supports Vietnamese but struggles with handwritten text due to limited
training data. To address this, the project integrates VietOCR, tailored for
Vietnamese text, and trains two custom models:

● ViT-Based Model: Uses a Vision Transformer encoder with a Language
Transformer decoder for global context.
● VGG16-Based Model: Uses a VGG16 convolutional backbone with a Language
Transformer decoder for local feature extraction.

Both models are fine-tuned on a custom Vietnamese handwriting dataset, with PhoBERT
refining recognized text to improve accuracy.

Applications of OCR in This Context: The OCR system developed for Vietnamese
handwriting recognition has targeted applications in specialized domains. For instance, it
can be applied to digitize doctor-handwritten prescriptions or medical notes in Vietnamese,
enabling hospitals to convert illegible handwriting into digital records for better patient
data management and integration with electronic health systems. Similarly, the system can
process handwritten administrative forms in government offices, automating data entry and
reducing errors. In educational settings, it facilitates the transcription of handwritten
student notes or exam scripts, supporting digital grading and archiving processes. These
applications leverage the VietOCR models’ ability to handle complex Vietnamese
diacritical marks and diverse handwriting styles.
3. DESIGN AND IMPLEMENTATION
3.1 Tools and Technologies

Programming Language: Python

Development Framework: FastAPI

Libraries:

● EasyOCR: For detecting scene text in images.

● VietOCR: For recognizing Vietnamese text, with a custom-trained model.
● OpenCV, NumPy, PIL: For image preprocessing (grayscale conversion, contrast
enhancement, sharpness adjustment).
● PyTorch, Transformers: For training and deploying deep learning models,
including ViT and Language Transformer.
● PhoBERT (vinai/phobert-base): For refining recognized text to improve accuracy.
● Pyvi: For Vietnamese text normalization and accent handling.

User Interface: HTML, CSS (Bootstrap), JavaScript for a web interface integrated with
FastAPI.

3.2 Data Preprocessing

3.2.1 Dataset:

The dataset comes from the Cinnamon AI Marathon (2018) and contains about 2,000
samples for the Vietnamese handwriting recognition task. We use 80% for training and
20% for validation.

[Figure: a sample image from the dataset and its label]
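
The 80/20 split can be sketched as follows. This is an illustration only: it assumes the labels form a mapping from image filename to transcription, and `split_labels` is a hypothetical helper, not the `split_dataset` function used in the notebook.

```python
import random


def split_labels(labels, train_ratio=0.8, random_seed=42):
    """Shuffle (filename, text) pairs reproducibly, then split them."""
    items = sorted(labels.items())       # fixed starting order
    rng = random.Random(random_seed)     # local RNG, so the split is repeatable
    rng.shuffle(items)
    cut = int(len(items) * train_ratio)
    return dict(items[:cut]), dict(items[cut:])


# Usage with 2,000 placeholder samples (file names are made up).
labels = {f"{i}.png": f"text {i}" for i in range(2000)}
train, valid = split_labels(labels)
print(len(train), len(valid))
```

Fixing the seed keeps the validation set identical across runs, which makes results from different training configurations comparable.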
3.2.2 Preprocessing:

Image Preprocessing:

● Grayscale Conversion: Images are converted to grayscale to reduce computational
complexity.
● Contrast Enhancement: Applied using PIL’s ImageEnhance.Contrast with a factor
of 2.0 to improve text visibility.
● Sharpness Adjustment: Enhanced using PIL’s ImageEnhance.Sharpness with a
factor of 1.5.
● Resizing: Images are resized to 224x224 pixels to match ViT input requirements.

Data Augmentation: Although mentioned in the original report, the code does not
explicitly implement augmentation. Future work could incorporate libraries like imgaug.

3.3 Model Architecture and Training

The system employs two VietOCR architectures, both paired with a Language Transformer
decoder, trained on a custom Vietnamese handwriting dataset:

Vietnamese Character Encoding: The vocabulary includes Vietnamese characters, tone
marks, special symbols, and control tokens (<pad>, <go>, <eos>, <unk>, <mask>).
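
A character-level vocabulary of this kind can be sketched as below. The class name `CharVocab` and the index order of the control tokens are assumptions made for illustration; VietOCR's own `Vocab` class may assign indices differently.

```python
import unicodedata


class CharVocab:
    """Character-level vocabulary with reserved control tokens."""

    SPECIALS = ["<pad>", "<go>", "<eos>", "<unk>", "<mask>"]

    def __init__(self, chars):
        # NFC keeps each accented Vietnamese letter as a single code point.
        chars = unicodedata.normalize("NFC", chars)
        self.itos = list(self.SPECIALS) + list(chars)
        self.stoi = {tok: i for i, tok in enumerate(self.itos)}

    def encode(self, text):
        """Map text to ids, wrapped in <go> ... <eos>."""
        text = unicodedata.normalize("NFC", text)
        unk = self.stoi["<unk>"]
        ids = [self.stoi.get(ch, unk) for ch in text]
        return [self.stoi["<go>"]] + ids + [self.stoi["<eos>"]]

    def decode(self, ids):
        """Map ids back to text, stopping at <eos>."""
        out = []
        for i in ids:
            tok = self.itos[i]
            if tok == "<eos>":
                break
            if tok in self.SPECIALS:
                continue
            out.append(tok)
        return "".join(out)

    def __len__(self):
        return len(self.itos)


vocab = CharVocab("a\u00e0b\u0111 ")          # 'a', 'à', 'b', 'đ', space
ids = vocab.encode("\u0111\u00e0")            # "đà"
print(ids, vocab.decode(ids))
```

Normalizing to NFC matters for Vietnamese because the same letter can arrive either as one code point or as a base letter plus combining marks, and the two forms would otherwise map to different ids.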

Model Architectures:

Vision Transformer (ViT) Encoder:

● Backbone: google/vit-base-patch16-224, linear projection to 512 (or 256)
dimensions.
● Decoder: Language Transformer (6 layers, 8 attention heads, positional encoding,
label smoothing loss).
● Inference: Beam search (beam size 4, max 128 tokens).
● Training (from train-vietocr-vit-transformer.ipynb):
○ Dataset: Custom dataset, 224x224 RGB images, normalized.
○ Optimizer: AdamW, learning rate 1e-4, OneCycleLR scheduler.
○ Parameters: 100 epochs, batch size 8, 10,000 iterations, NVIDIA Tesla T4
GPU.
○ Loss: Label Smoothing Loss (smoothing 0.1).
○ Weights: Loaded from transformerocr.pth.
○ Metrics: Valid loss 1.589, full sequence accuracy 0.0000, per-character
accuracy 0.0754.
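
As a rough size check for the ViT encoder described above: google/vit-base-patch16-224 cuts a 224x224 image into 16x16 patches and uses a hidden size of 768. The sketch below works out the resulting sequence length and the size of the linear projection. Whether the [CLS] token is kept before the projection is not stated in the report, so it is exposed as a flag; both function names are hypothetical.

```python
def vit_sequence_length(image_size=224, patch_size=16, cls_token=True):
    """Number of tokens the ViT encoder produces for one square image."""
    per_side = image_size // patch_size
    return per_side * per_side + (1 if cls_token else 0)


def linear_params(in_dim, out_dim):
    """Weights plus biases of a fully connected projection layer."""
    return in_dim * out_dim + out_dim


print(vit_sequence_length())                  # 14 * 14 patches + [CLS]
print(vit_sequence_length(cls_token=False))
print(linear_params(768, 512))                # ViT hidden size 768 -> 512
```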
VGG16 Encoder:

● Backbone: VGG16 from torchvision.models.vgg16, pre-trained on ImageNet, with
fully connected layers replaced by a linear projection (512 or 256 dimensions).
● Decoder: Identical to ViT (6 layers, 8 attention heads, positional encoding, label
smoothing loss).
● Inference: Beam search (beam size 4, max 128 tokens).
● Training (from train-pretrained-viet-ocr.ipynb and logs):
○ Dataset: Same custom dataset, 224x224 RGB images, normalized.
○ Optimizer: AdamW, learning rate decaying from 2.20e-05 (iter 7200) to
4.03e-10 (iter 10,000), OneCycleLR scheduler.
○ Parameters: 10,000 iterations, batch size 8, ~90 minutes on NVIDIA Tesla
T4 GPU.
○ Loss: Label Smoothing Loss (smoothing 0.1).
○ Weights: Fine-tuned from vgg_transformer.pth.
○ Metrics:
■ Train Loss: ~1.384–1.393 (final: 1.393 at iter 10,000).
■ Valid Loss: 1.578–1.595, final 1.589 (iter 10,000).
■ Full Sequence Accuracy: 0.0000.
■ Per-Character Accuracy: 0.0728–0.0765, final 0.0754 (iter 10,000).
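
Both encoders are trained with a label smoothing loss (smoothing 0.1). The sketch below shows one common formulation in plain Python, in which the smoothing mass is spread evenly over the non-target classes. This is an assumption for illustration; the loss in the training notebooks may distribute the mass differently, for example by excluding the padding token.

```python
import math


def smoothed_targets(target, num_classes, smoothing=0.1):
    """Target distribution: 1 - smoothing on the true class, rest shared."""
    off_value = smoothing / (num_classes - 1)
    dist = [off_value] * num_classes
    dist[target] = 1.0 - smoothing
    return dist


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def label_smoothing_loss(logits, target, smoothing=0.1):
    """Cross-entropy between the smoothed targets and the prediction."""
    log_probs = log_softmax(logits)
    dist = smoothed_targets(target, len(logits), smoothing)
    return -sum(q * lp for q, lp in zip(dist, log_probs))


logits = [2.0, 0.5, -1.0, 0.1]
print(label_smoothing_loss(logits, target=0))
print(label_smoothing_loss(logits, target=0, smoothing=0.0))  # plain cross-entropy
```

With smoothing set to 0 the function reduces to ordinary cross-entropy, which is a convenient sanity check.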
Model Selection: Configurable in main.py via transformerocr.pth (ViT) or
vgg_transformer.pth (VGG16).
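
Both models decode with beam search (beam size 4, at most 128 tokens). A toy version of the technique is sketched below. `step_fn` is a hypothetical stand-in for the decoder: it receives the token prefix and returns next-token probabilities. The sketch scores hypotheses by summed log-probability without length normalization, which a real decoder may handle differently.

```python
import math


def beam_search(step_fn, go_id, eos_id, beam_size=4, max_len=128):
    """Return the highest-scoring token sequence found by beam search."""
    beams = [([go_id], 0.0)]          # (sequence, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, prob in step_fn(seq).items():
                if prob > 0.0:
                    candidates.append((seq + [tok], score + math.log(prob)))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == eos_id:
                finished.append((seq, score))   # hypothesis is complete
            else:
                beams.append((seq, score))      # keep expanding
        if not beams:
            break
    finished.extend(beams)            # hypotheses cut off at max_len
    return max(finished, key=lambda c: c[1])[0]


def toy_step(seq):
    """Tiny hand-written model in which the greedy first choice is wrong."""
    table = {1: {10: 0.6, 11: 0.4}, 10: {12: 0.5, 13: 0.5}, 11: {12: 0.9, 13: 0.1}}
    return table.get(seq[-1], {2: 1.0})   # any other token is followed by <eos>


print(beam_search(toy_step, go_id=1, eos_id=2, beam_size=1))  # greedy decoding
print(beam_search(toy_step, go_id=1, eos_id=2, beam_size=4))
```

In the toy model the greedy path starts with the most likely first token but ends with a lower overall probability than the path the wider beam recovers, which is the motivation for using beam search at inference time.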

Text Refinement: PhoBERT (vinai/phobert-base) corrects invalid characters in recognized
text.

3.4 Application Integration

The application is deployed as a RESTful API using FastAPI, with the following features:

● Upload Interface: Users can upload JPEG or PNG images via a web interface built
with HTML, CSS (Bootstrap), and JavaScript.
● Image Display: Shows the original image and processed image with detected text
regions highlighted in red bounding boxes.
● Text Detection and Recognition:
○ EasyOCR: Detects text regions with configurable parameters (e.g.,
min_size=20, text_threshold=0.4).
○ VietOCR: Recognizes text in detected regions using either the
ViT-Transformer or VGG16-Transformer model, selected based on
configuration or user input.
○ PhoBERT Refinement: Optionally refines recognized text to improve
accuracy.
● Options: Users can toggle scene text detection, text refinement, and paragraph detection.
● Results: Returns JSON output with recognized text, bounding box coordinates,
cropped text region images (in base64), and the count of detected regions.
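
The shape of such a response can be sketched as below. The field names (`count`, `regions`, `text`, `bbox`, `image_base64`) are hypothetical; the report lists what the JSON contains but not the actual keys used in main.py.

```python
import base64
import json


def build_response(regions):
    """regions: list of (text, (x1, y1, x2, y2), image_bytes) tuples."""
    return {
        "count": len(regions),
        "regions": [
            {
                "text": text,
                "bbox": list(bbox),
                # Raw image bytes are not valid JSON, so encode them as base64.
                "image_base64": base64.b64encode(image_bytes).decode("ascii"),
            }
            for text, bbox, image_bytes in regions
        ],
    }


response = build_response([("xin ch\u00e0o", (10, 20, 110, 60), b"\x89PNG")])
print(json.dumps(response, ensure_ascii=False, indent=2))
```

Passing `ensure_ascii=False` keeps Vietnamese characters readable in the serialized output instead of escaping them.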

3.5 Implementation:
3.5.1 Code

Purpose: VGG16 training (sample)
File/Notebook: train-pretrained-viet-ocr.ipynb

    import os
    import matplotlib.pyplot as plt
    from vietocr.tool.predictor import Predictor
    from vietocr.tool.config import Cfg
    from vietocr.tool.translate import build_model
    from vietocr.model.trainer import Trainer

    # Load configuration and train
    config = Cfg.load_config_from_file('/kaggle/working/custom_vgg_transformer.yml')
    trainer = Trainer(config)
    trainer.train()

    # Evaluate
    trainer.visualize_prediction()
    trainer.precision()

Purpose: ViT training (sample)
File/Notebook: train-vietocr-vit-transformer.ipynb

    import torch
    from torchvision import transforms

    # create_vit_config, split_dataset, Vocab, CustomDataset, VietOCR and Trainer
    # are defined elsewhere in the notebook and are not shown in this excerpt.

    def train():
        # Configuration
        config = create_vit_config()
        config.update({
            'device': 'cuda' if torch.cuda.is_available() else 'cpu',
            'dataset': {
                'data_root': '/kaggle/input/dataset-ocr/dataset/data',
                'train_annotation': None,  # Will be set after splitting
                'valid_annotation': None,  # Will be set after splitting
                'name': 'ocr_dataset',
                'image_height': 224,
                'image_min_width': 224,
                'image_max_width': 224
            },
            'aug': {
                'image_aug': False,
                'masked_language_model': False
            },
            'optimizer': {
                'max_lr': 1e-4,
                'pct_start': 0.1,
                'anneal_strategy': 'cos'
            },
            'dataloader': {
                'num_workers': 1,
                'pin_memory': True
            },
            'predictor': {
                'beamsearch': True
            },
            'pretrain': '/kaggle/input/vietocr/pytorch/default/1/vgg_transformer.pth',
            'quiet': False
        })

        # Split the dataset
        original_labels_file = '/kaggle/input/dataset-ocr/dataset/labels.json'
        output_dir = './dataset_splits'
        train_json, valid_json = split_dataset(
            original_labels_file, output_dir, train_ratio=0.8, random_seed=42)

        # Update config with new annotation files
        config['dataset']['train_annotation'] = train_json
        config['dataset']['valid_annotation'] = valid_json

        # Create vocabulary. The apostrophe and the backslash are escaped so that
        # both stay in the vocabulary (an unescaped '' would end the string literal).
        vocab = (
            'aAàÀảẢãÃáÁạẠăĂằẰẳẲẵẴắẮặẶâÂầẦẩẨẫẪấẤậẬb'
            'BcCdDđĐeEèÈẻẺẽẼéÉẹẸêÊềỀểỂễỄếẾệỆfFgGhHiIìÌỉ'
            'ỈĩĨíÍịỊjJkKlLmMnNoOòÒỏỎõÕóÓọỌôÔồỒổỔỗỖốỐộ'
            'ỘơƠờỜởỞỡỠớỚợỢpPqQrRsStTuUùÙủỦũŨúÚụỤưƯ'
            'ừỪửỬữỮứỨựỰvVwWxXyYỳỲỷỶỹỸýÝỵỴzZ01234'
            '56789!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ '
        )
        vocab = Vocab(vocab)
        config['transformer']['vocab_size'] = len(vocab)

        # Define image transforms (ImageNet mean and standard deviation)
        transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        ])

        # Create datasets
        train_dataset = CustomDataset(
            data_dir=config['dataset']['data_root'],
            labels_file=config['dataset']['train_annotation'],
            vocab=vocab,
            transform=transform
        )
        valid_dataset = CustomDataset(
            data_dir=config['dataset']['data_root'],
            labels_file=config['dataset']['valid_annotation'],
            vocab=vocab,
            transform=transform
        )

        # Create model
        model = VietOCR(
            vocab_size=len(vocab),
            backbone=None,
            vit_args=config,
            transformer_args=config['transformer'],
            seq_modeling='transformer'
        )

        # Create trainer
        trainer = Trainer(
            config=config,
            model=model,
            vocab=vocab,
            train_dataset=train_dataset,
            valid_dataset=valid_dataset,
            pretrained=True
        )

        # Train model
        trainer.train()

Purpose: Inference (ViT/VGG16)
File/Notebook: main.py

    from PIL import Image
    from vietocr.tool.predictor import Predictor
    from vietocr.tool.config import Cfg

    config = Cfg.load_config_from_name('vgg_transformer')  # or 'transformer'
    config['weights'] = './weights/vgg_transformer.pth'  # or 'transformerocr.pth'
    config['device'] = 'cpu'
    predictor = Predictor(config)
    img = Image.open('sample.jpg')
    text = predictor.predict(img)
    print(text)

Purpose: Refine text
File/Notebook: main.py

    import logging

    # phobert_pipeline (the PhoBERT masked-token predictor) and is_valid_vietnamese
    # are defined elsewhere in main.py. The threshold argument is unused here.
    def refine_text_with_phobert(text, threshold=0.3):
        try:
            if not text.strip():
                return text

            # First attempt: append a mask and accept the first valid completion
            masked_text = f"{text} <mask>"
            predictions = phobert_pipeline(masked_text, top_k=5)
            for pred in predictions:
                candidate = pred["sequence"].replace("<s>", "").replace("</s>", "").strip()
                if is_valid_vietnamese(candidate):
                    return candidate

            # Fallback: mask each invalid word in turn and replace it
            words = text.split()
            refined_words = []
            for i, word in enumerate(words):
                if is_valid_vietnamese(word):
                    refined_words.append(word)
                    continue
                masked_word = " ".join([w if j != i else "<mask>" for j, w in enumerate(words)])
                predictions = phobert_pipeline(masked_word, top_k=3)
                for pred in predictions:
                    candidate = pred["token_str"]
                    if is_valid_vietnamese(candidate):
                        refined_words.append(candidate)
                        break
                else:
                    # No valid candidate was found: keep the original word
                    refined_words.append(word)
            return " ".join(refined_words)
        except Exception as e:
            logging.error(f"Refinement error: {str(e)}")
            return text

Purpose: Detect scene text
File/Notebook: main.py

    import logging
    import numpy as np

    # reader is the EasyOCR reader instance created elsewhere in main.py.
    def detect_text_regions(image, detect_paragraph=True, min_confidence=0.5):
        try:
            results = reader.readtext(
                np.array(image),
                decoder='wordbeamsearch',
                min_size=20,
                text_threshold=0.4,
                link_threshold=0.4,
                width_ths=0.7,
                paragraph=detect_paragraph
            )
            boxes = []
            for detection in results:
                if len(detection) == 3:
                    bbox, text, conf = detection
                    if conf < min_confidence:
                        continue  # skip low-confidence detections
                else:
                    bbox, text = detection
                    conf = None  # fallback if confidence not returned
                # Convert the four corner points to an axis-aligned box
                x_coords = [point[0] for point in bbox]
                y_coords = [point[1] for point in bbox]
                boxes.append((
                    int(min(x_coords)), int(min(y_coords)),
                    int(max(x_coords)), int(max(y_coords))
                ))
            return boxes
        except Exception as e:
            logging.error(f"Error during text region detection: {str(e)}")
            return []

Purpose: Preprocessing
File/Notebook: main.py

    from PIL import ImageEnhance

    def preprocess_image(image):
        img = image.convert('L')  # grayscale
        enhancer = ImageEnhance.Contrast(img)
        img = enhancer.enhance(2.0)
        enhancer = ImageEnhance.Sharpness(img)
        img = enhancer.enhance(1.5)
        return img.convert('RGB')  # back to three channels for the recognizer

Purpose: Detect text endpoint
File/Notebook: main.py

    import logging
    import traceback
    from io import BytesIO

    from fastapi import File, Form, HTTPException, Query, UploadFile
    from PIL import Image

    # The route decorator and process_image are defined elsewhere in main.py
    # and are not shown in this excerpt.
    async def detect_text(
        file: UploadFile = File(...),
        use_scene_text_detection: bool = Query(True),
        refine: bool = Form(True),
        detect_paragraph: bool = Query(True)
    ):
        if file.content_type not in ["image/jpeg", "image/png", "image/jpg"]:
            raise HTTPException(400, detail="Invalid image format. Please upload a JPEG or PNG file.")
        try:
            contents = await file.read()
            image = Image.open(BytesIO(contents)).convert('RGB')
            result = process_image(image, use_scene_text_detection, refine, detect_paragraph)
            return result
        except Exception as e:
            logging.error(f"Error during /detect/: {traceback.format_exc()}")
            raise HTTPException(500, detail=f"An error occurred during processing: {str(e)}")

3.5.2 UI

● Introduction page:
● Main page:

4. RESULTS AND DISCUSSION

The system was tested on handwritten Vietnamese text images, using both ViT and VGG16
architectures:

● EasyOCR: Effectively detected text regions but struggled with Vietnamese
character recognition due to limited training on Vietnamese handwriting.
● VietOCR with ViT: Trained on a custom dataset, but performance was limited:
○ Full Sequence Accuracy: 0.0000 (no sequences perfectly predicted).
○ Per-Character Accuracy: 0.0754 (7.54% of characters correct).
○ These metrics (from train-vietocr-vit-transformer.ipynb) indicate challenges
with dataset diversity or model convergence.
● VietOCR with VGG16: Fine-tuned on the same dataset using pre-trained weights
(vgg_transformer.pth). VGG16's convolutional architecture is well suited to local
feature extraction, and its final evaluation (full sequence accuracy 0.3608,
per-character accuracy 0.8627, listed with the performance figures below) is far
ahead of ViT.
● PhoBERT Refinement: Improved text quality for both models by correcting invalid
characters and enhancing linguistic coherence.
● Web Interface: Delivered a responsive, user-friendly experience with real-time
visualization, leveraging Bootstrap for styling.
● Model Comparison:
○ ViT: Captures global context but struggled with limited data and a bug in
validation, as shown by the low metrics.
○ VGG16: Likely better for local features, suitable for consistent handwriting,
but requires metrics to confirm advantages.
● Challenges:
○ ViT’s low accuracy suggests insufficient dataset diversity or training
duration.
○ Earlier VGG16 logs showed poor performance, indicating the need for
careful fine-tuning.
● Performance:
○ VGG16: full sequence accuracy 0.3608, per-character accuracy 0.8627.
Classification report (character-level): precision 0.21, recall 0.21, F1-score 0.21.
Confusion matrix (character-level): [figure not reproduced]
○ ViT: full sequence accuracy 0.0000, per-character accuracy 0.0754.

● Comment:

Based on these metrics (precision, recall, F1-score, confusion matrix) and given the
limited dataset of handwritten Vietnamese, the VietOCR model predicts words made up
of plain Latin letters well, but its performance on words with Vietnamese diacritics
is still low. In addition, the VGG16 backbone focuses on local rather than global
features, so recognition of long sentences remains limited.
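
The two accuracy figures reported above can be computed along the following lines. This is a sketch under assumptions: per-character accuracy is taken here as the position-wise match rate averaged over samples, and the function names are illustrative. The evaluation code in the notebooks may define the metric differently, for example through edit distance.

```python
def full_sequence_accuracy(predictions, labels):
    """Fraction of samples whose prediction equals the label exactly."""
    if not labels:
        return 0.0
    correct = sum(1 for pred, label in zip(predictions, labels) if pred == label)
    return correct / len(labels)


def per_char_accuracy(predictions, labels):
    """Mean over samples of position-wise character matches / label length."""
    scores = []
    for pred, label in zip(predictions, labels):
        if not label:
            scores.append(1.0 if not pred else 0.0)
            continue
        matches = sum(1 for p, t in zip(pred, label) if p == t)
        scores.append(matches / len(label))
    return sum(scores) / len(scores) if scores else 0.0


# The second prediction drops two diacritics: "cam on" vs "cảm ơn".
preds = ["xin ch\u00e0o", "cam on"]
truth = ["xin ch\u00e0o", "c\u1ea3m \u01a1n"]
print(full_sequence_accuracy(preds, truth))
print(per_char_accuracy(preds, truth))
```

The example shows why the two metrics diverge: one missing diacritic makes the whole sequence count as wrong while most individual characters still match.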

5. CONCLUSION AND FUTURE DEVELOPMENT

The system leverages EasyOCR, VietOCR (ViT and VGG16), and PhoBERT for
Vietnamese handwriting recognition. VGG16 significantly outperformed ViT (full
sequence accuracy 0.3608 vs. 0.0000, per-character accuracy 0.8627 vs. 0.0754),
demonstrating the effectiveness of convolutional architectures for this task. The FastAPI
web interface provided a user-friendly experience.

Future Improvements:

● Enhance the dataset with more diverse handwriting samples to improve ViT
performance.
● Further fine-tune VGG16 and explore hybrid ViT-VGG16 models.
● Implement T5-small for text summarization.
● Add webcam support for real-time recognition.
● Optimize models for edge devices via quantization.

6. REFERENCES

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... &
Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition
at scale. arXiv preprint arXiv:2010.11929. https://doi.org/10.48550/arXiv.2010.11929

Gonzalez, R. C., & Woods, R. E. (2018). Digital image processing (4th ed.). Pearson.

Nguyen, D. Q., & Nguyen, A. T. (2020). PhoBERT: Pre-trained language models for
Vietnamese. Findings of the Association for Computational Linguistics: EMNLP 2020,
1037–1042. https://doi.org/10.18653/v1/2020.findings-emnlp.92

Nguyen, T. (2020). VietOCR: A simple and effective library for Vietnamese optical
character recognition. GitHub Repository. https://github.com/pbcquoc/vietocr

Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556.
https://doi.org/10.48550/arXiv.1409.1556

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... &
Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information
Processing Systems, 30, 5998–6008.
https://papers.nips.cc/paper/7181-attention-is-all-you-need

Jaided AI. (2020). EasyOCR: Ready-to-use OCR with 80+ supported languages. GitHub
Repository. https://github.com/JaidedAI/EasyOCR

PyTorch. (2023). PyTorch documentation. https://pytorch.org/docs/stable/index.html

torchvision. (2023). Torchvision documentation.
https://pytorch.org/vision/stable/index.html

FastAPI. (2023). FastAPI documentation. https://fastapi.tiangolo.com/

Hugging Face. (2023). Transformers: State-of-the-art natural language processing.
https://huggingface.co/docs/transformers/index

Hand Written Project
No ratings yet
Hand Written Project
40 pages
OCR Using Tesseract
100% (2)
OCR Using Tesseract
37 pages
Survey Questionnaire Final
90% (10)
Survey Questionnaire Final
4 pages
7388 - BD00345 - SE06202 - Assigment 2 - Ngô Nguyễn Nhật Linh
No ratings yet
7388 - BD00345 - SE06202 - Assigment 2 - Ngô Nguyễn Nhật Linh
88 pages
Bài Asigment 2
No ratings yet
Bài Asigment 2
78 pages
Do Quang Trung Asm 2 Programing
No ratings yet
Do Quang Trung Asm 2 Programing
51 pages
APT3F1706CYB CE00336 6 IPCV Incourse Assignment
0% (1)
APT3F1706CYB CE00336 6 IPCV Incourse Assignment
9 pages
Handwritten Digit Recognizer
No ratings yet
Handwritten Digit Recognizer
40 pages
skl009730 5404
No ratings yet
skl009730 5404
74 pages
Report 2 PDF
No ratings yet
Report 2 PDF
36 pages
Ashwani Kumar Singh NTCC 2021 25
No ratings yet
Ashwani Kumar Singh NTCC 2021 25
35 pages
Báo cáo DATN- Hệ thống quản lí chuỗi trung tâm
No ratings yet
Báo cáo DATN- Hệ thống quản lí chuỗi trung tâm
105 pages
BC DATN-le-vu-thanh-an-nguyen-viet-nam
No ratings yet
BC DATN-le-vu-thanh-an-nguyen-viet-nam
82 pages
PHAM VAN MANH TopCV - VN 060324.163047
No ratings yet
PHAM VAN MANH TopCV - VN 060324.163047
2 pages
Graduation Thesis
No ratings yet
Graduation Thesis
78 pages
Do An Cao Thanh Huy D19 CNPM
No ratings yet
Do An Cao Thanh Huy D19 CNPM
107 pages
BC DATN-le-vu-thanh-an-nguyen-viet-nam
No ratings yet
BC DATN-le-vu-thanh-an-nguyen-viet-nam
76 pages