TEXT EXTRACTION FROM IMAGES USING COMPUTER VISION
PROJECT DEVELOPMENT LAB REPORT
Submitted by
M.BHUVANAESHWARI
(201061004)
&
J.SANGEERTHANA
(201061021)
of
BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY
VILLUPURAM 605108
NOVEMBER 2024
IFET COLLEGE OF ENGINEERING
(An Autonomous Institution)
BONAFIDE CERTIFICATE
SIGNATURE SIGNATURE
Dr. R. THENDRAL Mr. M. ARUNKUMAR M.E.,
HEAD OF THE DEPARTMENT SUPERVISOR
Associate Professor, Assistant Professor,
Department of IT, Department of IT,
IFET College of Engineering, IFET College of Engineering,
Villupuram - 605108 Villupuram - 605108
CERTIFICATE OF EVALUATION
College name : IFET College of Engineering, Villupuram.
Branch : B.Tech - IT
Name of the Student    Register Number    Title of the Project       Name of the Supervisor with Designation
M.BHUVANAESHWARI       201061004          TEXT EXTRACTION FROM       Mr. M. ARUNKUMAR,
& J.SANGEERTHANA       & 201061021        IMAGES USING COMPUTER      Assistant Professor
                                          VISION
The report of the Mini Project-II, submitted in fulfillment of the requirements for the award of the degree of Bachelor of Technology in Information Technology of IFET College of Engineering (Autonomous), permanently affiliated to Anna University, was evaluated and confirmed to be the work done by the above students.
ACKNOWLEDGEMENT
I would like to express my sincere gratitude to our Chairman Mr. K. V. Raja, our Secretary Mr. K. Shivram Alva and our Treasurer for providing the facilities to carry out this project, and we extend our gratitude to our Principal.
I also take this opportunity to express my sincere thanks to our Vice Principal and Dean Academics Dr. P. Kanimozhi, who has provided all the needful help in carrying out this project.
I wish to express our thanks to our Head of the Department, Dr. R. Thendral, for her persistent encouragement and support to complete this project. I express my sincere thanks to my supervisor Mr. M. Arunkumar, Assistant Professor, Department of Information Technology, for his priceless guidance and motivation, which brought forth the success of the project.
I also thank our lab technicians and all the staff members of our department for their timely help.
Last but not the least, I wholeheartedly thank my family and friends for their moral support in tough times and their constructive criticism, which helped me complete this project.
ABSTRACT
Optical Character Recognition (OCR) has been an active research area in Computer Vision for the past decade. OCR means converting handwritten, typed, or printed text into machine-readable text. This report describes how OCR systems are being used currently, with their benefits and limitations, and surveys the various applications of OCR in data entry. We build a simple text-recognition model using Pytesseract and OpenCV that can perform several image-processing operations and extract multi-language text. The features of this model are also described in this project. OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library, built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. OpenCV makes it easy for businesses to utilize and modify the code.
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
1 INTRODUCTION
  1.1 Introduction
  1.2 Domain Overview
2 LITERATURE REVIEW
  2.1 Text Extraction from Images
  2.2 Text Conversion Topic
  2.3 Text Detection from Image
  2.4 Live Text Extraction
  2.5 Text Localization and Recognition
  2.6 Existing System
    2.6.1 Disadvantages of Existing System
3 PROPOSED SYSTEM
  3.1 Proposed System
  3.2 System Architecture
  3.3 Advantages of Proposed System
  3.4 Modules
    3.4.1 Detection
    3.4.2 Enhancement
    3.4.3 Extraction
4 SYSTEM REQUIREMENTS
  4.1 Software Requirements
  4.2 Hardware Requirements
5 CONCLUSION AND FUTURE WORK
APPENDIX-I
APPENDIX-II
REFERENCES
LIST OF FIGURES
Fig. 1  System Architecture
Fig. 2  Example for text extraction
Fig. 3  Reading image from URL
Fig. 4  Extracting text and removing irrelevant symbols
Fig. 5  Greyscale image
Fig. 6  Rectangle around the text
Fig. 7  Pattern on a specific word
Fig. 8  Final output
LIST OF ABBREVIATIONS
OCR     Optical Character Recognition
OpenCV  Open Source Computer Vision Library
API     Application Programming Interface
RAM     Random Access Memory
1.INTRODUCTION
1.1 INTRODUCTION
Text data present in images contain useful information for automatic annotation,
indexing, and structuring of images. Extraction of this information involves detection,
localization, tracking, extraction, enhancement, and recognition of the text from a given
image. However, variations of text due to differences in size, style, orientation, and
alignment, as well as low image contrast and complex background make the problem
of automatic text extraction extremely challenging. While comprehensive surveys of
related problems such as face detection, document analysis, and image indexing can
be found, the problem of text information extraction is not well surveyed. A large
number of techniques have been proposed to address this problem, and the purpose is
to classify and review these algorithms, discuss benchmark data and performance
evaluation, and to point out promising directions for future research.
1.2 DOMAIN OVERVIEW
Computer vision tasks include methods for acquiring, processing, analyzing and
understanding digital images, and extracting high-dimensional data from the real
world in order to produce numerical or symbolic information, e.g. in the form of
decisions. Understanding in this context means the transformation of visual images
into editable and reusable text.
The image data can take many forms, such as video sequences, views from
multiple cameras, multi-dimensional data from a 3D scanner, or medical scanning
devices. It is very useful for users, as it saves the time and effort of typing text from images.
2.LITERATURE REVIEW
This section provides a brief overview of the existing work carried out in the
field of text recognition. Text recognition has existed for a very long time.
2.1 Text Extraction from Images
YEAR: 2020
In this work, text in images with a colorful background is considered, and a
preprocessing method is described that improves the performance of the Tesseract
Optical Character Recognition (OCR) engine. First, text segmentation is performed to
separate the text from the colorful background by dividing the original image into
sub-images. A classifier then identifies the sub-images containing text. Employing
this preprocessing gave an improvement of about 20% over the plain Tesseract OCR
performance.
2.2 Text Conversion Topic
YEAR: 2020
2.3 Text Detection from Image
YEAR: 2020
2.4 Live Text Extraction
YEAR: 2021
With iOS 15, Apple introduced 'Live Text'. This feature is similar to how Google Lens
works on Android phones, and on the Google Search and Photos apps on iOS. With
Live Text, iOS will now recognise any text in a photo, screenshot, or camera preview
thanks to optical character recognition (OCR), allowing Apple to extract text from any
image.
2.5 Text Localization and Recognition
YEAR: 2020
2.6 Existing System
In the current world there is a growing demand from users to convert printed
documents, images and videos into text. The basic text extraction system was
invented to convert the data available as text in images. The existing text extraction
system in computer vision is a plain text extraction system without any OpenCV
functionality. That is, the existing system deals with homogeneous character
recognition, or character recognition of a single language: it can only recognize,
extract and convert images of English or of one other single language. In other
words, the older text extraction system is unilingual.
2.6.1 Disadvantages of Existing System
• It works efficiently with printed text only and not with handwritten text; handwriting must first be learnt by the PC.
• The image produced requires a lot of storage space.
• The quality of the image can be lost during this process.
• It is not 100% accurate; some mistakes are likely to be made during the method, and it is not worth doing for small amounts of text.
• It extracts text from images of English or of only a single language, not multiple languages.
3.PROPOSED SYSTEM
3.1 PROPOSED SYSTEM
The text extractor will use the Tesseract OCR engine to extract the image text.
The Web application will contain a section to submit/upload an image, which will then
go through our text extraction program; after the program outputs the result, Flask will
make an API request to get the output text and display it on the Web page. Processing
of OCR information is fast, as large amounts of text can be entered quickly. Often, a
paper form is turned into an electronic form that is easy to distribute. The advanced
version can even recreate tables, columns, and flowcharts. It is cheaper than paying
someone to manually enter a large amount of text data, and the extracted text is
editable and reusable for users. A great deal of study has previously been done on
several essential elements of handwritten character recognition systems. The text
conversion approach incorporates picture improvement by realignment, rescaling, and
resizing of the provided image, as well as noise removal from the image using
sophisticated algorithms. Furthermore, Tesseract's training with a self-adapted dataset
for handwritten characters has a substantial impact on the final output. Using OpenCV
functions greatly improves the accuracy of the results. Computers only understand
information that is organized, and this is exactly where Optical Character Recognition
comes into the picture. Tesseract requires a clean image to detect the text; this is where
OpenCV plays an important role, as it performs operations on an image like converting
a colored image to a binary image, adjusting the contrast of an image, edge detection,
and many more.
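To make the flow concrete, the following is a minimal sketch of such an upload-and-extract endpoint, assuming Flask, Pillow and pytesseract are installed and the Tesseract engine is on the system path; the route name and response shape are illustrative, not the project's actual code.

from flask import Flask, request
from PIL import Image
import pytesseract

app = Flask(__name__)

@app.route('/extract', methods=['POST'])
def extract():
    # The uploaded file arrives as multipart form data under the key 'image'
    uploaded = request.files['image']
    img = Image.open(uploaded.stream)
    # Tesseract performs the OCR and returns plain text
    text = pytesseract.image_to_string(img)
    return {'text': text}

if __name__ == '__main__':
    app.run(debug=True)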
3.2 SYSTEM ARCHITECTURE
Input (Images/Video) → Text Detection → Recognition (OCR) → Output (Text)
Fig.1: System Architecture
The three basic steps involved in this process are detection, enhancement and
extraction. This diagram defines the structure of the system.
3.3 ADVANTAGES OF PROPOSED SYSTEM:
• Processing of OCR information is fast; large quantities of text can be input quickly.
• This process is much faster than manually typing the information into the system; the advanced version can even recreate tables, columns and flowcharts.
• OCR output can be read with a high degree of accuracy; flatbed scanners are very accurate and can produce reasonably high quality images.
• It is cheaper than paying someone to manually enter a great deal of text data; moreover, it takes less time to convert to the electronic form.
3.4 MODULES
• Detection
• Enhancement
• Extraction
3.4.1 DETECTION
The process of detecting and extracting text from an image is known as Optical
Character Recognition (OCR). The Analyze Image step makes use of OCR to allow
users to extract and post-process textual data detected within images. The paper
document is generally scanned by an optical scanner and converted into the form
of a picture. At this stage we have the data in the form of an image, and this image can
be further analysed so that the important information can be retrieved. The image
resulting from the scanning process may contain a certain amount of noise. Depending
on the resolution of the scanner and the success of the applied thresholding technique,
the characters may be smeared or broken. Some of these defects, which may later
cause poor recognition rates, can be eliminated by using a pre-processor to smooth the
digitized characters. Smoothing implies both filling and thinning: filling eliminates
small breaks, gaps and holes in the digitized characters, while thinning reduces the
width of the lines. The most common smoothing techniques move a window across the
binary image of the character, applying certain rules to the contents of the window.
So, to improve the quality of the input image, a few operations such as noise removal,
normalization and binarization are performed to enhance the image, as sketched below.
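A minimal sketch of these enhancement operations, assuming OpenCV is installed; the function name, filter size and threshold choices are illustrative.

import cv2

def preprocess(path):
    img = cv2.imread(path)
    # Normalization: reduce the input to a single-channel grayscale image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Noise removal: a median filter smooths specks and small breaks
    denoised = cv2.medianBlur(gray, 3)
    # Binarization: Otsu's method picks the threshold automatically
    _, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary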
3.4.2 ENHANCEMENT
This is used for preprinted colored forms where a different color is used for the
fixed text. This allows just the respondent data to be recognized, without the form
instructions, item names, boxes and other shapes.
You can select a predefined color (red, green or blue), or a colored area in the
image: use the Select Area tool to draw a rectangle including the page background
color and the color to be dropped. The selected color then becomes invisible in the
image, as in the sketch below.
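A minimal sketch of this color drop-out, assuming OpenCV and NumPy; the hue ranges below are an illustrative choice for dropping red on a preprinted form.

import cv2
import numpy as np

def drop_red(image_bgr):
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    # Red wraps around the hue axis, so two ranges are combined
    lower = cv2.inRange(hsv, np.array([0, 70, 50]), np.array([10, 255, 255]))
    upper = cv2.inRange(hsv, np.array([170, 70, 50]), np.array([180, 255, 255]))
    mask = cv2.bitwise_or(lower, upper)
    result = image_bgr.copy()
    # Push the selected color to the white page background, making it invisible
    result[mask > 0] = (255, 255, 255)
    return result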
3.4.3 EXTRACTION
In this final stage, the enhanced image is passed to the Tesseract OCR engine, which
recognizes the characters and returns the machine-readable text, as in the sketch below.
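A minimal sketch of this step, assuming pytesseract and Pillow are installed; 'sample.png' stands in for the enhanced image produced by the previous stage.

import pytesseract
from PIL import Image

enhanced = Image.open('sample.png')   # illustrative file name
text = pytesseract.image_to_string(enhanced, lang='eng')
print(text)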
4.SYSTEM REQUIREMENTS
4.1 SOFTWARE REQUIREMENTS
• Jupyter Notebook (Anaconda3)
• Tesseract
• OpenCV
4.2 HARDWARE REQUIREMENTS
• RAM : 512 MB and above
• Hard disk : 80GB and above
• Monitor : CRT or LCD monitor
• Keyboard : Normal or Multimedia
• Mouse : Compatible mouse
5.CONCLUSION AND FUTURE WORK
5.1 CONCLUSION
The proposed OpenCV-based extraction of text can be extended further to the
recognition of other Indian scripts. It can be modified to improve the accuracy of
segmentation, and new features can be added to improve the accuracy of recognition.
These algorithms can be tried on a large database of handwritten text; there is a need
to develop a standard database for text recognition. The proposed work can be
extended to work on degraded text or broken characters. Recognition of digits, half
characters and compound characters in the text can be added to improve the word
recognition rate. The extracted text can further be converted to audio, so as to let
visually challenged (blind) people easily understand the text that has been converted
from the image; a sketch of this extension follows.
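A minimal sketch of this audio extension, assuming the pyttsx3 offline text-to-speech library; any TTS engine would serve, and the sample string stands in for the OCR output.

import pyttsx3

extracted_text = "Text recognised from the image"  # placeholder for the OCR output
engine = pyttsx3.init()
engine.say(extracted_text)
engine.runAndWait()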
APPENDIX-I
(SOURCE CODE)
# In[1]:
import requests
# In[2]:
r = requests.get("https://raw.githubusercontent.com/tesseract-ocr/tessdata/4.00/ind.traineddata", stream=True)
# Save the downloaded traineddata file to disk in chunks
with open("ind.traineddata", "wb") as file:
    for block in r.iter_content(4096):
        if block:
            file.write(block)
# In[3]:
# Installing libraries required for optical character recognition
# (assumption: the original install commands were not preserved; the
# packages below are the ones imported later in the notebook)
from IPython.display import clear_output
get_ipython().system('pip install pytesseract opencv-python')
clear_output()
# In[4]:
get_ipython().system('apt-get install -y tesseract-ocr')  # assumption: Linux notebook host
clear_output()
# In[5]:
# Import libraries
import pytesseract
import cv2
import numpy as np
import re
from PIL import Image            # used below to open and display images
from pytesseract import Output   # used for image_to_data output type
# In[6]:
image = Image.open(requests.get('https://i.stack.imgur.com/pbIdS.png', stream=True).raw)
image = image.resize((300, 150))
image.save('sample.png')
image
# In[7]:
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'
# Assumption: the config used by the original notebook was not preserved;
# '--oem 3 --psm 6' is a common default choice
custom_config = r'--oem 3 --psm 6'
text = pytesseract.image_to_string(image, config=custom_config)
print(text)
# In[8]:
# Extracting text from image and removing irrelevant symbols from characters
try:
    text = pytesseract.image_to_string(image, lang="eng")
    characters_to_remove = "!()@—*“>+-/,'|£#%$&^_~"
    new_string = text
    for character in characters_to_remove:
        new_string = new_string.replace(character, "")
    print(new_string)
except IOError as e:
    print("Error (%s)." % e)
# In[9]:
# Now we will perform opencv operations to get text from complex images
image = cv2.imread('sample.png')
# In[10]:
def get_grayscale(image):
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

gray = get_grayscale(image)
Image.fromarray(gray)
# In[11]:
# noise removal
def remove_noise(image):
    return cv2.medianBlur(image, 5)

noise = remove_noise(gray)
Image.fromarray(noise)
# In[12]:
#thresholding
def thresholding(image):
    return cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

thresh = thresholding(gray)
Image.fromarray(thresh)
# In[13]:
#erosion
def erode(image):
    kernel = np.ones((5, 5), np.uint8)
    return cv2.erode(image, kernel, iterations=1)

eroded = erode(gray)
Image.fromarray(eroded)
# In[14]:
#Morphology (opening = erosion followed by dilation)
def opening(image):
    kernel = np.ones((5, 5), np.uint8)
    return cv2.morphologyEx(image, cv2.MORPH_OPEN, kernel)

opened = opening(gray)
Image.fromarray(opened)
# In[15]:
def canny(image):
    return cv2.Canny(image, 100, 200)

edges = canny(gray)
Image.fromarray(edges)
# In[16]:
#skew correction
def deskew(image):
    # coordinates of all non-zero (text) pixels
    coords = np.column_stack(np.where(image > 0))
    angle = cv2.minAreaRect(coords)[-1]
    if angle < -45:
        angle = -(90 + angle)
    else:
        angle = -angle
    (h, w) = image.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC,
                             borderMode=cv2.BORDER_REPLICATE)
    return rotated

rotated = deskew(gray)
Image.fromarray(rotated)
# In[17]:
#template matching
def match_template(image, template):
    return cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
# In[18]:
img = cv2.imread('sample.png')
h, w, c = img.shape
boxes = pytesseract.image_to_boxes(img)
for b in boxes.splitlines():
    b = b.split(' ')
    # image_to_boxes rows are "char x1 y1 x2 y2 page" with the origin at the bottom-left
    img = cv2.rectangle(img, (int(b[1]), h - int(b[2])), (int(b[3]), h - int(b[4])), (0, 255, 0), 2)
Image.fromarray(img)
# In[19]:
img = cv2.imread('sample.png')
d = pytesseract.image_to_data(img, output_type=Output.DICT)
keys = list(d.keys())
date_pattern = 'artificially'
n_boxes = len(d['text'])
# Draw a box around every recognized word that matches the pattern
for i in range(n_boxes):
    if re.match(date_pattern, d['text'][i]):
        (x, y, bw, bh) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
        img = cv2.rectangle(img, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
Image.fromarray(img)
APPENDIX-II
(SNAPSHOTS)
Fig.2: Example for text extraction
Fig.3: Reading image from URL
Fig.4: Extracting text and removing irrelevant symbols
Fig.5: Greyscale image
Fig.6: Rectangle around the text
Fig.7: Pattern on a specific word
Fig.8: Final output
REFERENCES
[1] Rajesh Shreedhar Bhat, “Text Extraction from Product Images Using AI”, 2020.
[2] T. Som and Sumit Saha, “Handwritten Character Recognition Using Fuzzy Membership Function”, International Journal of Emerging Technologies in Sciences and Engineering, Vol. 5, December 2011.
[3] L. A. Zadeh, “Fuzzy Sets”, Information and Control 8 (1965) 338–353; Eran Gur and Zeev Zelavsky, “Retrieval of Rashi Semi-Cursive Handwriting via Fuzzy Logic”, IEEE International Conference on Frontiers in Handwriting Recognition (ICFHR), 2012.
[4] Thomas Natschläger, “Optical Character Recognition”, A Tutorial for the Course Computational Intelligence.
[5] Andrei Polzounov, Artsiom Ablavatski, Sergio Escalera, Shijian Lu and Jianfei Cai, “WordFence: Text Detection in Natural Images with Border Awareness”.
[6] Ø. D. Trier, A. K. Jain and T. Taxt, “Feature Extraction Methods for Character Recognition - A Survey”, Pattern Recognition.
[7] C.-A. Boiangiu, R. Ioanitescu and R.-C. Dragomir, “Voting-Based OCR System”, Journal of Information Systems & Operations Management, 2016.
[8] Teena Varma, Stephen S. Madari, Lenita L. Montheiro and Rachna S. Poojary, “Extraction from Image and Detection”, 2021.
[9] Sahana K. Adanthaya, “Text Extraction from Images”, 2020.
[10] S. Shams, “An Investigation on Feature and Text Extraction from Images Using Image Recognition in Android”, Machine Learning Medium, 15 January 2021.
[11] H. Li, P. Wang and C. Shen, “Towards End-to-End Car License Plates Detection and Recognition with Deep Neural Networks”, arXiv:1709.08828 [cs.CV], p. 9, 2017.
[12] F. Yin, Y.-C. Wu, X.-Y. Zhang and C.-L. Liu, “Scene Text Recognition with Sliding Convolutional Character Models”, arXiv:1709.01727v1, p. 10, 2017.
[13] K. Chaudhary and R. Bali, “EASTER: Efficient and Scalable Text Recognizer”, arXiv:2008.07839v2 [cs.CV], p. 9, 2020.