Preprocessing Task

This document provides a Python script that uses OpenCV to preprocess an image and Tesseract OCR (through pytesseract) to extract text from it. The script loads an image, runs OCR on the unprocessed image, and then applies grayscale conversion, binarization, noise removal, a morphological operation, and deskewing before extracting text again. The output is the text extracted from the image after preprocessing.

Uploaded by

ravula.shivakumar11


NAME: RAVULA SHIVA KUMAR
GMAIL: ravula.shivakumar11@gmail.com

Preprocessing Task
Source code:
import cv2
import numpy as np
import pytesseract
from PIL import Image

# Point pytesseract at the Tesseract executable (Windows install path)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

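# Load the image (cv2.imread returns None if the file cannot be read)
image_path = "C:/Users/ravul/OneDrive/Desktop/p24.jpg"
image = cv2.imread(image_path)
if image is None:
    raise FileNotFoundError(f"Could not read image: {image_path}")
# Convert OpenCV image (BGR) to PIL format (RGB)
pil_image = Image.fromarray(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
# Perform OCR on the unprocessed image
text = pytesseract.image_to_string(pil_image)
print(text)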
# Grayscale conversion
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Grayscale", gray)
cv2.waitKey(0)
# Binarization (Otsu's method chooses the threshold automatically)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.imshow("Thresholded", thresh)
cv2.waitKey(0)
# Noise removal (non-local means denoising)
denoised = cv2.fastNlMeansDenoising(thresh, None, 30, 7, 21)
cv2.imshow("Denoised", denoised)
cv2.waitKey(0)
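# Morphological operations (closing)
# Note: a 1x1 kernel leaves the image unchanged; a larger kernel,
# for example (3, 3), is needed to actually close small gaps.
kernel = np.ones((1, 1), np.uint8)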
morph = cv2.morphologyEx(denoised, cv2.MORPH_CLOSE, kernel, iterations=1)
cv2.imshow("Morphological", morph)
cv2.waitKey(0)
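# Deskewing (correcting skewed text)
# Note: thresh > 0 selects white pixels. For dark text on a light background
# these are background pixels, so the threshold may need to be inverted
# (cv2.THRESH_BINARY_INV) for the angle estimate to follow the text.
# minAreaRect expects float32 or int32 points, hence the cast.
coords = np.column_stack(np.where(thresh > 0)).astype(np.float32)
rect = cv2.minAreaRect(coords)
angle = rect[-1]
# Fold the reported angle into the range [-45, 45]
if angle < -45:
    angle += 90
elif angle > 45:
    angle -= 90
(h, w) = image.shape[:2]
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, angle, 1.0)
deskewed = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)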
cv2.imshow("Deskewed Image", deskewed)
cv2.waitKey(0)
cv2.destroyAllWindows()
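# Extract text after preprocessing
# Note: deskewed is the rotated original image, so the grayscale, thresholded,
# denoised, and morphological results above are not what is passed to OCR here.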
processed_text = pytesseract.image_to_string(deskewed)
print(processed_text)
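
The morphological step in the script uses a 1x1 kernel, for which closing returns its input unchanged. The following is a minimal NumPy-only sketch of why kernel size matters. It is an illustration, not OpenCV's implementation: the helper names `_window_reduce` and `close_binary` and the synthetic test image are assumptions made for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _window_reduce(img, ksize, reduce_fn, pad_value):
    """Apply reduce_fn (np.max or np.min) over every ksize x ksize window."""
    pad = ksize // 2
    padded = np.pad(img, pad, mode="constant", constant_values=pad_value)
    windows = sliding_window_view(padded, (ksize, ksize))
    return reduce_fn(windows, axis=(-2, -1))


def close_binary(img, ksize):
    """Closing = dilation (window maximum) followed by erosion (window minimum).

    ksize is assumed to be odd so the output keeps the input shape.
    """
    dilated = _window_reduce(img, ksize, np.max, pad_value=0)
    return _window_reduce(dilated, ksize, np.min, pad_value=255)


# Synthetic stroke: a horizontal line with a one-pixel gap in the middle.
img = np.zeros((7, 11), dtype=np.uint8)
img[3, 3:8] = 255
img[3, 5] = 0

closed_1x1 = close_binary(img, 1)  # same kernel size as the script
closed_3x3 = close_binary(img, 3)  # large enough to bridge the gap

print("1x1 changes the image:", not np.array_equal(closed_1x1, img))
print("3x3 fills the gap:", closed_3x3[3, 5] == 255)
```

With the 1x1 kernel every window contains a single pixel, so both the maximum and the minimum return that pixel. In the script, replacing `np.ones((1, 1), np.uint8)` with a slightly larger kernel would be the corresponding change, with the right size depending on the stroke width in the image.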

Output:
Input image:
Grayscale image:
Binarization:
Noise removal:
Morphological operations:

Extracted Text After Preprocessing:
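
The script prints the OCR result twice, once for the unprocessed image (`text`) and once after preprocessing (`processed_text`). A minimal sketch of one way to quantify how much two such strings differ, using `difflib` from the Python standard library, is given below. The helper name `similarity` and the sample strings are hypothetical and are not the script's actual output.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalized strings are identical."""
    # Collapse whitespace so that line-wrapping differences do not dominate.
    a_norm = " ".join(a.split())
    b_norm = " ".join(b.split())
    return SequenceMatcher(None, a_norm, b_norm).ratio()


# Hypothetical OCR results standing in for `text` and `processed_text`.
raw_text = "Th1s is a samp1e\nline of text"
processed_text = "This is a sample line of text"

score = similarity(raw_text, processed_text)
print(f"similarity: {score:.2f}")
```

A ratio close to 1.0 means preprocessing changed little. The ratio alone does not say which version is more accurate, so a known reference transcription would be needed for that.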
