
Assignment: Fine-Tuning an OCR Model for Handwriting Recognition

Role

AI Engineer

Objective

As an AI Engineer, your task is to fine-tune a state-of-the-art Optical Character Recognition (OCR) model to achieve high accuracy in recognizing handwritten text. The focus is on leveraging the latest datasets and models to improve performance on diverse handwriting styles, including noisy or irregular samples. This assignment simulates a real-world scenario in which your solution will be part of a document digitization pipeline.

Task Description

You are required to:

1. Select and fine-tune a modern OCR model for handwriting recognition (a short baseline-inference sketch follows this list).
2. Use the latest publicly available datasets to train and evaluate your model.
3. Optimize the model for accuracy and efficiency, considering real-world challenges such as varied handwriting styles, noise, and irregular layouts.
4. Provide a report detailing your methodology, results, and a brief justification of your choices.
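
Before fine-tuning, it is worth running the pre-trained checkpoint on a few samples to establish a baseline. The sketch below is a minimal example, not a required implementation: it assumes the TrOCR checkpoint recommended under Specific Requirements and the Hugging Face transformers and evaluate libraries (the CER metric additionally requires jiwer); the image path and reference transcription are placeholders.

```python
# Baseline sanity check: run the pre-trained TrOCR checkpoint on one line image
# before any fine-tuning. The image path and reference string are placeholders.
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
import evaluate

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-large-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-large-handwritten")

image = Image.open("sample_line.png").convert("RGB")  # placeholder: one handwritten line image
pixel_values = processor(images=image, return_tensors="pt").pixel_values

generated_ids = model.generate(pixel_values, max_new_tokens=64)
prediction = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

reference = "the ground-truth transcription of the line"  # placeholder reference text
cer = evaluate.load("cer").compute(predictions=[prediction], references=[reference])
print(f"Baseline prediction: {prediction!r}  CER vs. reference: {cer:.3f}")
```

A baseline CER from the un-tuned model also gives you a concrete number to improve upon in your report.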

Specific Requirements

●​ Model: Choose a transformer-based OCR model as your starting point. Recommended options include:
○​ TrOCR (Transformer-based OCR): Available via Hugging Face (microsoft/trocr-large-handwritten), known for its strong performance on handwritten text recognition. It combines a Vision Transformer (ViT) encoder with a text Transformer decoder.
○​ DocTR (Document Text Recognition): An open-source option from Mindee, combining text detection (DBNet) and recognition (CRNN and Transformer-based recognizers), supporting both printed and handwritten text.
●​ Dataset: Use the following recent and diverse datasets for fine-tuning and evaluation:
○​ IAM Handwriting Database (Updated 2023 Version): Contains 13,353 handwritten English text lines from 657 writers. Access it via the official IAM website or the Hugging Face datasets hub. Focus on the line-level annotations for this task.
○​ Imgur5K (2021): A diverse dataset with ~135K handwritten English words across 5K images, offering variability in styles and real-world scenarios. Available via Papers with Code.
○​ Synthetic Data (Optional): Generate additional synthetic handwritten data using tools like TextRecognitionDataGenerator (GitHub: Belval/TextRecognitionDataGenerator) to augment your training set with custom styles or edge cases.
●​ Evaluation Metrics:
○​ Primary: Character Error Rate (CER) – the edit distance between predicted and ground-truth text at the character level, normalized by the length of the ground truth.
○​ Secondary: Word Error Rate (WER) – the same measure computed at the word level.
○​ Target: Achieve a CER ≤ 7% and WER ≤ 15% on a held-out test set from the IAM dataset.
●​ Tools and Frameworks:
○​ Use PyTorch or TensorFlow for model implementation.
○​ Leverage Hugging Face Transformers for TrOCR, or Mindee’s DocTR library, for ease of fine-tuning.
○​ Preprocessing: Apply OpenCV or PIL for image normalization (e.g., resizing to 384x384 for TrOCR, grayscale conversion, noise reduction).
●​ Fine-Tuning Process (a minimal code sketch follows this requirements list):
○​ Preprocess the dataset (e.g., normalize images, tokenize text using the model’s tokenizer).
○​ Fine-tune the pre-trained model on the combined IAM and Imgur5K datasets for at least 10 epochs, adjusting hyperparameters such as learning rate (suggested: 5e-5) and batch size (suggested: 8, GPU-dependent).
○​ Use a validation split (10% of the data) to monitor for overfitting and apply early stopping if needed.
●​ Hardware: You will use free GPU resources available on:
○​ Kaggle: Provides a Tesla P100 (16GB VRAM) or dual NVIDIA T4s (2x 16GB VRAM) with ~30 hours of GPU time per week and a 12-hour session limit.
■​ Optimization for Fine-Tuning: Use a batch size of 4 with the P100 or 8 with dual T4s (enable multi-GPU via PyTorch’s DataParallel). Enable mixed precision training (torch.cuda.amp) to reduce memory usage and speed up training.
■​ Data Handling: Import IAM and Imgur5K directly via Kaggle’s dataset hub, or upload synthetic data as a custom dataset.
○​ Google Colab: Offers a Tesla T4 (16GB VRAM) or occasionally a K80 (12GB VRAM) with a ~12-hour session limit (subject to availability).
■​ Optimization for Fine-Tuning: Set the batch size to 4 for the T4 (or 2 for the K80) and use mixed precision training. If memory issues persist, apply gradient accumulation (e.g., accumulate gradients over 2 steps to simulate a batch size of 8).
■​ Data Handling: Mount Google Drive to load datasets or upload them manually to Colab’s runtime.
○​ Both platforms suffice for this task with proper optimization. The 16GB VRAM limit (common across free tiers) is slightly below a premium 24GB GPU, but adjustments such as smaller batch sizes or mixed precision ensure feasibility.
●​ Deliverables:
○​ Code: A Jupyter notebook or Python script with clear comments, implementing the full pipeline (data loading, preprocessing, fine-tuning, evaluation).
○​ Model: Save the fine-tuned model weights in a standard format (e.g., .pth for PyTorch, or upload them to the Hugging Face model hub).
○​ Report: A 1-2 page PDF summarizing:
■​ Dataset and model choices with justification.
■​ Preprocessing steps and fine-tuning strategy.
■​ Final CER and WER scores on the test set.
■​ Challenges faced and potential improvements.
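
To make the fine-tuning and evaluation requirements above concrete (referenced in the Fine-Tuning Process bullet), the following is a minimal sketch rather than a prescribed solution. It assumes TrOCR with the Hugging Face Seq2SeqTrainer and the evaluate library (with jiwer installed); the train_samples and eval_samples lists, file paths, and output directory are hypothetical placeholders to be replaced with data parsed from the IAM and Imgur5K annotations. For reference, CER = (S + D + I) / N, i.e., the number of character substitutions, deletions, and insertions divided by the number of characters in the ground truth; WER is the same ratio computed over words.

```python
# Minimal TrOCR fine-tuning sketch. Assumptions: Hugging Face transformers and
# evaluate are installed, and data is prepared as (image_path, transcription)
# pairs -- the placeholder lists below must be replaced with real annotations.
import torch
from PIL import Image
from torch.utils.data import Dataset
from transformers import (
    TrOCRProcessor,
    VisionEncoderDecoderModel,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    default_data_collator,
)
import evaluate

MODEL_NAME = "microsoft/trocr-large-handwritten"
processor = TrOCRProcessor.from_pretrained(MODEL_NAME)
model = VisionEncoderDecoderModel.from_pretrained(MODEL_NAME)

# TrOCR needs these decoder settings before seq2seq fine-tuning.
model.config.decoder_start_token_id = processor.tokenizer.cls_token_id
model.config.pad_token_id = processor.tokenizer.pad_token_id
model.config.vocab_size = model.config.decoder.vocab_size

# Hypothetical placeholders -- replace with pairs parsed from IAM / Imgur5K.
train_samples = [("data/train/line_0001.png", "example transcription")]
eval_samples = [("data/val/line_0001.png", "example transcription")]

class HandwritingDataset(Dataset):
    """(image_path, text) pairs; the processor handles the 384x384 resize."""
    def __init__(self, samples, processor, max_target_length=128):
        self.samples = samples
        self.processor = processor
        self.max_target_length = max_target_length

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image_path, text = self.samples[idx]
        image = Image.open(image_path).convert("RGB")
        pixel_values = self.processor(images=image, return_tensors="pt").pixel_values[0]
        labels = self.processor.tokenizer(
            text, padding="max_length", truncation=True, max_length=self.max_target_length
        ).input_ids
        # Pad tokens become -100 so the loss ignores them.
        labels = [t if t != self.processor.tokenizer.pad_token_id else -100 for t in labels]
        return {"pixel_values": pixel_values, "labels": torch.tensor(labels)}

cer_metric = evaluate.load("cer")
wer_metric = evaluate.load("wer")

def compute_metrics(pred):
    # Decode generated ids and references, then score CER / WER.
    label_ids = pred.label_ids.copy()
    label_ids[label_ids == -100] = processor.tokenizer.pad_token_id
    predictions = processor.batch_decode(pred.predictions, skip_special_tokens=True)
    references = processor.batch_decode(label_ids, skip_special_tokens=True)
    return {
        "cer": cer_metric.compute(predictions=predictions, references=references),
        "wer": wer_metric.compute(predictions=predictions, references=references),
    }

training_args = Seq2SeqTrainingArguments(
    output_dir="./trocr-handwriting",
    num_train_epochs=10,
    learning_rate=5e-5,
    per_device_train_batch_size=4,   # 4 on a single T4/P100; 8 on dual T4s
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,   # simulates an effective batch size of 8
    fp16=True,                       # mixed precision for the free-tier GPUs
    predict_with_generate=True,      # generate text during evaluation for CER/WER
    generation_max_length=64,
    save_strategy="epoch",
    logging_steps=50,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=HandwritingDataset(train_samples, processor),
    eval_dataset=HandwritingDataset(eval_samples, processor),
    data_collator=default_data_collator,
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())  # reports eval_cer / eval_wer on the held-out split
model.save_pretrained("./trocr-handwriting/final")
processor.save_pretrained("./trocr-handwriting/final")
```

With a small per-device batch size, gradient accumulation, and fp16 mixed precision, this configuration is intended to fit within the 16GB VRAM limits described above; batch size and accumulation steps can be traded off per GPU.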

Time Expectation

●​ Estimated completion time: 8-10 hours.
●​ This includes data preparation (2 hours), model setup and fine-tuning (4-5 hours), evaluation (1 hour), and report writing (1-2 hours).

Evaluation Criteria

●​ Technical Accuracy: Correct implementation of the fine-tuning pipeline and achievement of the target metrics (CER ≤ 7%, WER ≤ 15%).
●​ Code Quality: Clarity, modularity, and documentation of the codebase.
●​ Innovation: Creative approaches to preprocessing, data augmentation, or hyperparameter tuning.
●​ Report: Conciseness, clarity, and depth of reasoning behind choices.

Context

This assignment mimics a real-world task where an AI Engineer must adapt a pre-trained model
to a specific use case (handwriting recognition for digitizing historical documents). The chosen
datasets and models reflect the latest advancements as of March 2025, ensuring the task is
both challenging and relevant.
