Project of Matlab
CHARACTERS IN TAMIL
A PROJECT REPORT
SUBMITTED BY
JAIGANESH S (112718104010)
YUVARAJ B (112718104030)
BACHELOR OF ENGINEERING
in
APRIL 2022
1
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE
SIGNATURE                                        SIGNATURE

Submitted for viva-voce held on 06.08.2021 at St. Peter's College of
Engineering and Technology.

INTERNAL EXAMINER                                EXTERNAL EXAMINER
ACKNOWLEDGEMENT
ABSTRACT
Image segmentation is the process of dividing an image into regions that are
similar or different in characteristics such as colour intensity and texture; it
is used in almost all modern computers and mobile phones for image recognition.
Here, Tamil handwritten documents are converted into grayscale images and then
segmented into characters. In this algorithm, segmentation is done in both the
vertical and horizontal directions. First, the algorithm checks for touching
characters in the horizontal zone and then in the vertical zone. If the touching
characters are in the vertical zone, the cut is made horizontally; if they are in
the horizontal zone, the cut is made vertically.
ABSTRACT (TAMIL)
படப் பிரிவு என்பது ஒரு படத்தை வெவ்வேறு பகுதிகளாகப்
மற்றும் அமைப்புப் படப் பிரிவு போன்ற சில பண்புகளில் ஒரே
முன்மொழிகிறது.
TABLE OF CONTENTS

LIST OF FIGURES
1 INTRODUCTION
  1.1 Problem Definition
  1.2 Segmentation Methods
  1.3 Identification of Touching Characters
2 LITERATURE SURVEY
3 SYSTEM SPECIFICATION
4 SYSTEM ANALYSIS
5 SYSTEM DESIGN
6 TESTING
  SAMPLE CODE
7 CONCLUSION
8 REFERENCES
INTRODUCTION
The Tamil language has one of the world's oldest scripts, which dates back to the
6th century BC, and it is thought that these characters originated in Keezhadi,
Tamil Nadu, India. Tamil characters have evolved from the Grantha script; the
vowel set has 12 characters. Characters have had certain strokes as their basis,
and their forms were changed, using different strokes and shapes, to distinguish
the short (Kuril) and long (Nedil) vowels during the 17th century AD. The forms
of the characters that are still in use today were settled in the 19th century
AD. Because palm leaves were readily available in the region prior to the
development of paper, the characters took rounded shapes suited to writing on
them.
1.1 PROBLEM DEFINITION:
The difference in the forms of the scripts is caused by the way the pen is held.
This has a significant impact on how the scripts are depicted, and segmentation
is a critical step in providing the best answer in the recognition phase. Single
and multiple touching characters are the two forms of touching characters. Single
touching characters are divided into two categories: horizontal touching and
vertical touching. Horizontal touching occurs with characters of the same line,
while vertical touching occurs when a character touches those of the preceding or
subsequent line, which complicates the segmentation process. An Adaptive Partial
Projection (APP) method can be used to identify the line numbers and the space
between the text lines by a piecewise projection method.
The second method, known as A* Path Planning (A*PP), is used to identify the
touching and partially overlapping characters in text lines heuristically; it
works by means of various cost functions and has been applied to Thai and Khmer
language manuscripts.
With the Dynamic Labelling Algorithm (DLA), text lines can be segmented with a
recognition accuracy (RA) of 96 percent, even when the characters strongly
contact and overlap the preceding or subsequent line characters. Character
segmentation is the second key phase in OCR of Tamil manuscripts. Image filtering
techniques remove undesired particles other than the characters present in the
lines; they also make the character strokes visible. The flow diagram depicts the
suggested character segmentation process.
1.3 Identification of Touching Characters
A threshold value has been set to assess the weight; when the lower value falls
below the threshold, the characters are treated as single characters and are
immediately split without any complications. The bigger the value, the more
likely the characters are touching.
The touching character has a greater aspect ratio (ar) than single characters,
which are automatically divided. The factor (far) indicated above is used to
detect touching characters, which need greater effort to split into single
characters. The touching characters are determined by the formula
far = e^a / (1 + e^a), where a = w/h and w and h are the character's width and
height parameters. After the touching character has been identified, segmentation
of the touching characters is achievable. The cutting edge is defined as the
point at which the characters are split.
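As an illustrative sketch (not the report's code, and with hypothetical names),
the factor above can be computed directly from a character's bounding box:

```python
import math

def touching_factor(w, h):
    """far = e^a / (1 + e^a), with aspect ratio a = w / h.

    Wider boxes (large w relative to h) give a larger factor,
    flagging a likely pair of horizontally touching characters.
    """
    a = w / h
    return math.exp(a) / (1.0 + math.exp(a))

# A wide bounding box scores higher than a roughly square one:
print(touching_factor(80, 40) > touching_factor(40, 40))  # → True
```

Boxes whose factor exceeds the chosen threshold are passed on to the cutting
step; the rest are treated as already-isolated characters.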
The figures demonstrate the many ways two characters can touch horizontally. For
the above categories, the algorithm computes the weight of each column; the
column with the lightest weight is designated as the character's likely contact
point, and a cutting edge is placed there to segment the character. Because the
weight is derived from the character's columns, the cutting edge for this
category of horizontal touching characters is vertical. The height of the
character can be used to identify vertical touching characters. For vertical
touching characters, types 7 to 10 indicate the various ways of touching. Rows
determine the cutting edge between the characters at the least weight. Vertical
touching characters have a horizontal cutting edge, while horizontally touching
characters have a vertical cutting edge. A character may touch the same-line
characters horizontally and the subsequent line characters vertically. The
horizontal and vertical cutting-edge method separates touching characters so that
they can be used for recognition.
LITERATURE SURVEY
In this paper, Anupama B. and Seenivasa Reddy propose a method for extracting
text lines, words, and characters. The text image is horizontally projected, and
the peaks in the horizontal projection are used to identify line segments. To
divide the text image into segments, a threshold is used; another threshold is
used to eliminate false lines. For the line segments, vertical histogram
projections are used, which are then decomposed into words and further into
characters using thresholds. Several document images are used to test the
algorithm. According to the experimental results, the proposed method is fast and
reliable for handwritten documents with non-overlapping lines. Its drawback is
that it causes segmentation errors for touching characters.
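The projection-profile idea used by these methods can be sketched as follows; an
illustrative sketch with hypothetical names, not any paper's actual code:

```python
def horizontal_projection(binary):
    """Row-wise ink counts for a binary image given as lists of 0/1 pixels."""
    return [sum(row) for row in binary]

def line_runs(profile, threshold=0):
    """Maximal runs of rows whose projection exceeds the threshold;
    each run corresponds to one text line."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs

# Two "lines" of ink separated by a blank row:
page = [[0, 0, 0],
        [1, 1, 0],
        [1, 0, 1],
        [0, 0, 0],
        [0, 1, 1]]
print(line_runs(horizontal_projection(page)))  # → [(1, 2), (4, 4)]
```

Applying the same computation column-wise to a single line's image splits it
into words and characters, which is the decomposition both papers describe.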
Shalini M. and Indira Reddy B. are the authors of this study. One of the most
important aspects of document image analysis is text line segmentation;
separating the text lines is critical for recognising all text regions in the
document image. They offer an algorithm based on multiple histogram projections.
The text image is horizontally projected, and line segments are recognised from
the peaks in the horizontal projection. A threshold is used to divide the text
image into pieces, and false lines are eliminated using a different threshold.
For the line segments, vertical histogram projections are used, which are then
fragmented into words and further into characters using thresholds. Segmentation
errors occur for touching characters. For good-quality archives, the line
division precision (DR) is 99 percent.
Another study proposes a strategy for character segmentation in Arabic. The fact
that Arabic is primarily written cursively is used to establish where each
character begins and ends, which is a crucial stage in character recognition. To
separate lines, words, and characters, the suggested method addresses the problem
of separating connected characters. The horizontal and vertical axis profiles
were used to discover character separations, using the profile's amplitude filter
and a simple edge tool. When tested on several printed papers using various
Arabic typefaces, this method shows promise. The word segmentation algorithm has
a 99 percent accuracy rating, and the character segmentation algorithm also
reached a satisfactory accuracy ratio of 98 percent.
Partha Bhowmick and Gaurav Harit present a novel character segmentation approach
for Bangla handwritten text documents in this study. The approach depends on
characterising the vertices of the outer isothetic covers corresponding to the
words in a text. Using the vertex characterisation, each cover is divided into
several sub-polygons that represent the characters making up a word, without
using any heuristics. Using this character segmentation, they obtain an accuracy
of 96.04 percent.
In this publication, Vijay and Madan Kharat present their findings: in
handwritten Hindi text documents, a global threshold performs better for
character segmentation, while Otsu's threshold technique segments words and
lines. Overall accuracy is determined not just by the lines and words, but also
by the precision of character segmentation. The method must be evaluated on
documents with slope and congested lines, where accuracy has been degraded. The
word segmentation method in this study also achieves a good accuracy rate.
The precise character-level segmentation of printed or handwritten text is a
critical pre-processing step for optical character recognition (OCR), according
to Soumen Bag and Ankit Krishna in this paper. It has been observed that
languages with cursive writing make the segmentation problem significantly more
difficult. This research presents a segmentation method for handwritten Hindi
words, carried out based on structural patterns seen in this language's writing
style. The proposed approach can handle a wide range of writing styles as well as
skewed header lines as input. The approach has been tested on the authors' own
database of both printed and handwritten words; the average success rate is 96.93
percent. Compared to other methods, it produces reasonably good results for this
database.
Several steps are used in this method. The image is initially converted to a
grayscale format, then processed using Otsu's thresholding method to boost the
text's intensity and make it stand out from the background. The data is then
analysed to detect and rectify skew. The image is then converted to a histogram,
with each low point representing the line space between lines, and is segmented
line by line. A vertical histogram is used for character segmentation.
Kathirvalavakumar and Karthigaiselvi's major goal in this study is to use
vertical and horizontal projections to segment horizontally overlapping lines and
touching characters found in all zones of machine-printed Tamil script. Documents
of various categories are gathered and tested. The suggested method successfully
segments all of the images used in the experiment. Even when the lines and
characters are touching, all of the lines and words are appropriately segmented,
and characters are segmented more precisely. The suggested algorithms have the
benefit of being able to segment more than two touching letters in a word, and
because they are based on projection values, the procedures required for line,
word, and character segmentation are simple.
SYSTEM SPECIFICATION

A hardware compatibility list (HCL) is important, especially in the case of
operating systems. An HCL lists tested, compatible, and sometimes incompatible
hardware devices for a particular operating system or application, along with its
requirements. Some components are not part of the software installation package
and need to be installed separately before the software is installed.
OPERATING SYSTEM: Windows 7 or higher, Mac, Linux
IDE: MATLAB
SYSTEM ANALYSIS

4.1.1 EXISTING SYSTEM:
There are many image segmentation techniques proposed to segment images in order
to retrieve essential knowledge and information from them. The techniques vary in
the method used for segmenting the images. Some of the popular techniques are:
➢ Thresholding
➢ Clustering techniques
➢ Watershed segmentation
4.1.2 PROPOSED SYSTEM:
Image segmentation is the process of dividing an image into regions that are
similar or different in characteristics such as colour intensity and texture; it
is used in almost all modern computers and mobile phones for image recognition.
Here, Tamil handwritten documents are converted into grayscale images and then
segmented into characters. In this algorithm, segmentation is done in both the
vertical and horizontal directions. First, the algorithm checks for touching
characters in the horizontal zone and then in the vertical zone. If the touching
characters are in the vertical zone, the cut is made horizontally; if they are in
the horizontal zone, the cut is made vertically.
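The cut-direction rule above can be sketched as follows; this is an illustrative
sketch with hypothetical names, not the report's implementation. For a
horizontally touching pair, the lightest-weight (least-ink) column becomes the
vertical cut:

```python
def vertical_cut_column(binary):
    """Pick the interior column with the least ink as the vertical cut
    point for a horizontally touching pair of characters.

    binary: list of rows, each a list of 0/1 ink pixels.
    """
    ncols = len(binary[0])
    weights = [sum(row[c] for row in binary) for c in range(ncols)]
    # Search only the middle half so the cut is not placed on a margin.
    lo, hi = ncols // 4, 3 * ncols // 4
    return min(range(lo, hi), key=lambda c: weights[c])

# Two 3-column blobs joined by a single bridge pixel in column 3:
pair = [[1, 1, 1, 0, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 0, 1, 1, 1]]
print(vertical_cut_column(pair))  # → 3
```

A horizontal cut for vertically touching characters is the same computation
applied to rows instead of columns.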
4.2 SYSTEM ARCHITECTURE
4.3 SOFTWARE DESCRIPTION
4.3.1 What is Matlab
MATLAB is a programming and numeric computing platform used by engineers and
scientists to analyse and design systems and products that transform our world.
Using MATLAB, you can:
• Analyse data
• Develop algorithms
• Create models and applications
MATLAB lets you take your ideas from research to production by deploying to
enterprise applications and embedded devices, as well as integrating with
Simulink® and Model-Based Design.
4.4 MODULE DESCRIPTION
4.4.1 What are Modules?
systems, and by interoperability, which allows them to function with the
components of other systems.
1.uigetfile
file = uigetfile opens a modal dialog box that lists files in the current folder
and enables a user to select or enter the name of a file. If the file exists and
is valid, uigetfile returns the file name when the user clicks Open. If the user
clicks Cancel or closes the dialog window, uigetfile returns 0.
2. imread
• A = imread(filename) reads the image from the file specified by filename,
  inferring the format of the file from its contents. If filename is a
  multi-image file, then imread reads the first image in the file.
• A = imread(filename,fmt) additionally specifies the format of the file with the
  standard file extension indicated by fmt. If imread cannot find a file with the
  name specified by filename, it looks for a file named filename.fmt.
• A = imread(___,idx) reads the specified image or images from a multi-image
  file. This syntax applies only to GIF, PGM, PBM, PPM, CUR, ICO, TIF, SVS, and
  HDF4 files. You must specify a filename input, and you can optionally specify
  fmt.
• A = imread(___,Name,Value) specifies format-specific options using one or more
  name-value pair arguments, in addition to any of the input arguments in the
  previous syntaxes.
• [A,map] = imread(___) reads the indexed image in filename into A and reads its
  associated colormap into map. Colormap values in the image file are
  automatically rescaled into the range [0,1].
• [A,map,transparency] = imread(___) additionally returns the image transparency.
  This syntax applies only to PNG, CUR, and ICO files. For PNG files,
  transparency is the alpha channel, if one is present. For CUR and ICO files, it
  is the AND (opacity) mask.
3. imshow
• imshow(I) displays the grayscale image I in a figure. imshow uses the default
  display range for the image data type and optimizes figure, axes, and image
  object properties for image display.
• imshow(I,[low high]) displays the grayscale image I, specifying the display
  range as a two-element vector, [low high]. For more information, see the
  DisplayRange argument.
• imshow(I,[]) displays the grayscale image I, scaling the display based on the
  range of pixel values in I. imshow uses [min(I(:)) max(I(:))] as the display
  range, displaying the minimum value in I as black and the maximum value as
  white. For more information, see the DisplayRange argument.
• imshow(BW) displays the binary image BW in a figure. For binary images, imshow
  displays pixels with the value 0 (zero) as black and 1 as white.
• imshow(filename) displays the image stored in the graphics file specified by
  filename.
• imshow(I,RI) displays the image I with associated 2-D spatial referencing
  object RI.
4. rgb2gray
• I = rgb2gray(RGB) converts the truecolor image RGB to the grayscale image I.
  The rgb2gray function converts RGB images to grayscale by eliminating the hue
  and saturation information while retaining the luminance.
5. graythresh
• level = graythresh(I) computes a global threshold from grayscale image I using
  Otsu's method [1]. Otsu's method chooses a threshold that minimizes the
  intraclass variance of the thresholded black and white pixels. The global
  threshold can be used with im2bw to convert a grayscale image to a binary
  image.
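Otsu's method itself can be sketched outside MATLAB. The following illustrative
sketch (with hypothetical names) chooses the threshold maximizing between-class
variance, which is equivalent to minimizing the intraclass variance that
graythresh minimizes:

```python
def otsu_threshold(hist):
    """Return the 0-255 threshold maximizing between-class variance
    for a 256-bin grayscale histogram."""
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = sum0 = 0
    best_t, best_between = 0, -1.0
    for t in range(256):
        w0 += hist[t]            # pixels at or below t (class 0)
        sum0 += t * hist[t]
        w1 = total - w0          # pixels above t (class 1)
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t

# A bimodal histogram with modes at 50 and 200 splits between them:
hist = [0] * 256
hist[50], hist[200] = 100, 100
print(50 <= otsu_threshold(hist) < 200)  # → True
```

graythresh returns this threshold normalized to [0, 1], ready to be passed to
im2bw, as the sample code later in this report does.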
6. im2bw
• BW = im2bw(I,level) converts the grayscale image I to binary image BW by
  replacing all pixels in the input image with luminance greater than level with
  the value 1 (white) and replacing all other pixels with the value 0 (black).
• This range is relative to the signal levels possible for the image's class.
  Therefore, a level value of 0.5 corresponds to an intensity value halfway
  between the minimum and maximum values of the class.
• BW = im2bw(X,cmap,level) converts the indexed image X with colormap cmap to
a binary image.
7. bwareaopen
• BW2 = bwareaopen(BW,P) removes all connected components (objects) that have
  fewer than P pixels from the binary image BW, producing another binary image,
  BW2.
8. regionprops
• stats = regionprops(BW,properties) returns measurements for the set of
  properties for each 8-connected component (object) in the binary image BW. You
  can use regionprops on contiguous and discontiguous regions.
• stats = regionprops(L,I,properties) returns measurements for the set of
  properties specified by properties for each labeled region in the image I. The
  first input (BW or L) identifies the regions to be measured.
SYSTEM DESIGN
5.2 CLASS DIAGRAM
5.3 ACTIVITY DIAGRAM
TESTING
6.1 SAMPLE CODE:
[filename,pathname]=uigetfile('*','Load an Image');
%% Read Image
imagen=imread(fullfile(pathname,filename));
%% Show image
figure(1)
imshow(imagen);
%% Complement
% Include this line of code when segmenting an image with white text on a black background.
%imagen=imcomplement(imagen);
%% Convert to grayscale (only if the image is RGB)
if size(imagen,3)==3
    imagen=rgb2gray(imagen);
end
threshold = graythresh(imagen);
imagen =~im2bw(imagen,threshold);
imagen = bwareaopen(imagen,30);
pause(1)
figure(2)
imshow(~imagen);
[L,Ne]=bwlabel(imagen); % label connected components
propied=regionprops(L,'BoundingBox');
hold on
for n=1:size(propied,1)
rectangle('Position',propied(n).BoundingBox,'EdgeColor','g','LineWidth',2)
end
hold off
pause(1)
%% Objects extraction
figure(3)
for n=1:Ne
[r,c] = find(L==n);
n1=imagen(min(r):max(r),min(c):max(c));
imshow(~n1);
pause(0.5)
end
CONCLUSION
This work segments touching characters using the categories of 'horizontal
touching' and 'vertical touching', applying the methods on line-segmented
pictures. The Dynamic Labelling is combined with two more recent line
segmentation approaches, the APP and A*PP algorithms; the APP and A*PP
algorithms provide 87% and 89% recognition accuracy (RA) respectively, while the
Dynamic Labelling Algorithm achieves 96% RA.
REFERENCES
[1] Anupama B. & Seenivasa Reddy, "Character Segmentation for Telugu Handwritten
Documents".
[2] Shalini M. & Indira Reddy B., "Character Segmentation for Telugu Image
Document".
[4] Partha Bhowmick & Gaurav Harit, "Character Segmentation of Bengali Handwritten
Text".
[5] Vijay & Madan Kharat, "Segmentation of Devanagari Handwritten Text Using a
Thresholding Approach".
[6] Soumen Bag & Ankit Krishna, "Character Segmentation of Hindi Unconstrained
Handwritten Words".
[7] Kiruba & Nivethitha, "Segmentation of Handwritten Tamil Character from Palm
Leaves".