Optical Character Recognition Based Speech Synthesis: Project Report
COURSE
VIRTUAL INSTRUMENTATION
(EEE4035)
GUIDED BY:
PROF. ABHISHEK G
APRIL 2019
OBJECTIVE
Speech is often a more effective means of communication than text, because blind and
visually impaired persons can also respond to sound. The ability to extract knowledge
simply by listening is a distinctive property of speech. An OCR based speech synthesis
system can significantly improve the degree to which visually impaired people interact
with their environment, bringing it closer to that of a sighted person. This project aims
to develop a cost-effective and user-friendly optical character recognition (OCR) based
speech synthesis system. The system has been developed using the Laboratory Virtual
Instrument Engineering Workbench (LabVIEW).
INTRODUCTION
VI COMPONENTS
1. Read sound input
2. Spectral Measurements
3. Write sound output
4. Waveform graph
5. Array to Cluster block
6. Path
METHODOLOGY
Image Acquisition: The image was captured using a digital HP scanner. The flap of the scanner
was kept open during acquisition in order to obtain a uniform black background. The image was
acquired by a program developed in LabVIEW and configured with the IMAQ Create subVI of
LabVIEW. Configuring the image means selecting its image type and border size as required. In
this work, an 8-bit image with a border size of 3 has been used.
Image Pre-processing (Binarization): Binarization is the process of converting a grayscale image (pixel
values 0 to 255) into a binary image (pixel values 0 or 1) by using a threshold value. Pixels lighter than the
threshold are turned white and the remainder black. In this work, global thresholding with a threshold
value of 175 has been used: pixels with gray values from 175 to 255 are converted to 1, while pixels with
gray values below 175 are converted to 0.
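The thresholding rule described above can be sketched in a few lines of Python. This is only an illustration of global thresholding with the stated value of 175, not the LabVIEW block diagram used in the project; the function name `binarize` and the list-of-rows image representation are assumptions made for the example.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit gray values (0..255); assumed layout


def binarize(gray: Image, threshold: int = 175) -> Image:
    """Global thresholding: pixels >= threshold become 1, all others 0."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


# Tiny hypothetical image: the top row is darker than the threshold.
sample = [
    [0, 100, 174],
    [175, 200, 255],
]
binary = binarize(sample)
print(binary)
```

A pixel with value exactly 175 maps to 1, which matches the "from 175 to 255" range given in the text.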
Image Segmentation: The segmentation process consists of line segmentation, word segmentation
and finally character segmentation.
1. Line segmentation is the first step of the segmentation process. It takes the image array as input
and scans the image horizontally to find the first ON pixel, remembering that coordinate as y1.
2. In the word segmentation process, the line-segmented images are scanned vertically to find the first ON
pixel. When this happens, the system remembers the coordinate of this point as x1. This is the starting
coordinate of the word.
3. Character segmentation is performed by scanning the word-segmented image vertically. This
process differs from word segmentation in two ways: i) the number of horizontal OFF pixels
between characters is smaller than the number of OFF pixels between words; ii) the total
number of characters and their order in the word are determined, so that the word can be reproduced
correctly during speech synthesis.
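The three segmentation steps above can be sketched as scans over a binary image. The following Python is a minimal illustration, not the project's LabVIEW implementation. The helper names (`find_runs`, `segment_lines`, `segment_columns`) and the gap sizes (3 OFF columns between words, 1 between characters) are assumptions chosen for the example. The report states only that character gaps are smaller than word gaps.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ON pixel, 0 = OFF pixel


def find_runs(profile: List[bool], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) pairs of ON runs; end is exclusive.

    Runs separated by fewer than `min_gap` OFF positions are merged.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    gap = 0
    for i, on in enumerate(profile):
        if on:
            if start is None:
                start = i  # first ON position of a new run
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        runs.append((start, len(profile) - gap))
    return runs


def segment_lines(img: Image) -> List[Tuple[int, int]]:
    """Scan row by row and return (y1, y2) for each text line."""
    return find_runs([any(row) for row in img])


def segment_columns(img: Image, y1: int, y2: int, x1: int, x2: int,
                    min_gap: int) -> List[Tuple[int, int]]:
    """Scan columns x1..x2 of rows y1..y2 and return (x_start, x_end) runs."""
    cols = [any(img[y][x] for y in range(y1, y2)) for x in range(x1, x2)]
    return [(x1 + a, x1 + b) for a, b in find_runs(cols, min_gap)]


# Hypothetical image: one line, two "words", each character one column wide.
img = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
lines = segment_lines(img)
y1, y2 = lines[0]
words = segment_columns(img, y1, y2, 0, len(img[0]), min_gap=3)
chars = [segment_columns(img, y1, y2, wx1, wx2, min_gap=1) for wx1, wx2 in words]
print(lines, words, chars)
```

Because characters are returned left to right for each word, both the character count and the character order are available for the later speech synthesis stage.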
Matching and Recognition: In this process, the correlation between the stored templates and each segmented
character is obtained using the correlation VI. The template with the highest correlation value identifies
the character. In this way, every segmented character is compared with the predefined data stored in the
system. Since the same font size has been used for recognition, a unique match for each character has been
obtained. Figure 6 shows the LabVIEW program for correlation between two images.
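The matching step can be illustrated with a small template matcher. This sketch assumes a normalized (Pearson-style) correlation coefficient and equally sized character and template images. The report does not state how the LabVIEW correlation VI computes its value, so the formula, the function names and the 3x3 templates below are all hypothetical.

```python
from math import sqrt
from typing import Dict, List

Image = List[List[int]]  # binary character image, all templates the same size


def correlation(a: Image, b: Image) -> float:
    """Normalized correlation coefficient between two equally sized images."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same size")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0  # flat images carry no shape information


def recognize(char: Image, templates: Dict[str, Image]) -> str:
    """Return the label of the template with the highest correlation."""
    return max(templates, key=lambda label: correlation(char, templates[label]))


# Hypothetical 3x3 templates and a slightly noisy segmented character.
templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
noisy_i = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
print(recognize(noisy_i, templates))
```

Requiring equal image sizes mirrors the report's observation that a single font size gives a unique match for each character.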
In the text to speech module, the text recognised by the OCR system is the input of the speech
synthesis system. It is converted into speech in .wav file format: the system creates a wave file
named output.wav, which can be listened to using a wave file player. Two steps are involved in
text to speech synthesis:
i) Text to speech conversion:
LabVIEW PROGRAM
1. BLOCK DIAGRAM
2. FRONT PANEL
Experiments have been performed to test the proposed system, developed using
LabVIEW version 7.1. The developed OCR based speech synthesis system
has two parts:
a. Optical Character Recognition
b. Speech Synthesis
Step 1. The scanner scans the printed text, and the system reads the image using
the IMAQ ReadFile function and displays it using the IMAQ WindDraw function
of LabVIEW.
Step 2. The image is binarized with a threshold of 175, giving the resulting
binary image.
Step 3. Line segmentation of the thresholded image is performed.
Step 4. Words are segmented from the line.
Step 5. Character segmentation is performed, and all the characters in the word
image window are segmented, for example the first three characters of the
word "Optical".
Step 6. Finally, the output of the OCR system is in text format and is stored
in the computer system. The recognized text can also be shown on the
front panel.
A wave file, output.wav, is created containing the text converted into speech, which
can be played using a wave file player. The waveform varies according to the
text from the OCR output in the text box, and the speech can be heard on the
speaker.
APPLICATION
The uses of OCR vary across different fields. One widely known OCR
application is in banking, where OCR is used to process checks without
human involvement. A check can be inserted into a machine, the writing
on it is scanned instantly, and the correct amount of money is transferred.
This technology has nearly been perfected for printed checks, and is fairly
accurate for handwritten checks as well, though it occasionally requires
manual confirmation. Overall, this reduces wait times in many banks.
The blind source separation technique can be used to separate a source
from all other types of noise. It can also be used to extract audio when
nothing is known about the sources. For example, if three people are
talking at the same time, the voice of each person can be extracted as a
separate audio signal.
CONCLUSION
In this report, an OCR based speech synthesis system (which can be used as a good
mode of communication between people) has been discussed. The system has been
implemented on the LabVIEW 7.1 platform and consists of two sections: OCR and
speech synthesis. In the OCR section, printed or written documents are scanned
and the image is acquired using IMAQ Vision for LabVIEW. The characters are
recognized using segmentation and correlation based methods developed in LabVIEW.
In the second section, the recognized text is converted into speech using the
Microsoft Speech Object Library (Version 5.1). The developed OCR based speech
synthesis system is user friendly, cost effective and gives results in real time.
Moreover, the program is flexible enough to be modified easily if required.