
Project of Matlab

College project computer science research


SEGMENTATION OF TOUCHING CHARACTERS IN TAMIL

A PROJECT REPORT

SUBMITTED BY

JAIGANESH S (112718104010)
MARIYA ANTONY SARATH S (112718104018)
MOHAMMAD ALI S (112718104020)
YUVARAJ B (112718104030)

IN PARTIAL FULFILLMENT FOR THE AWARD OF THE DEGREE OF

BACHELOR OF ENGINEERING

in

COMPUTER SCIENCE AND ENGINEERING

St. PETER’S COLLEGE OF ENGINEERING AND TECHNOLOGY

ANNA UNIVERSITY: CHENNAI 600 025

APRIL 2022
ANNA UNIVERSITY: CHENNAI 600 025

BONAFIDE CERTIFICATE

Certified that this project report “SEGMENTATION OF HANDWRITTEN CHARACTER IN TAMIL” is the bonafide work of Jaiganesh. R (112718104010), Mariya Antony Sarath. S (112718104018), Mohammad Ali. S (112718104020) and Yuvaraj. B (112718104030), who carried out the project work under my supervision.

SIGNATURE
Ms. P. PREETHY REBECCA
HEAD OF THE DEPARTMENT
Department of Computer Science & Engineering
St. Peter’s College of Engineering and Technology, Avadi, Chennai-600054.

SIGNATURE
Ms. J. WISELY JOE
SUPERVISOR
Department of Computer Science & Engineering
St. Peter’s College of Engineering and Technology, Avadi, Chennai-600054.

Submitted for viva-voce held on 06.…08….20…21… at St. Peter’s College of Engineering and Technology.

INTERNAL EXAMINER          EXTERNAL EXAMINER
ACKNOWLEDGEMENT

We thank God Almighty for giving us the blessings and the opportunity to advance our careers through St. Peter's College of Engineering and Technology.

We thank the Chairperson, Dr. (Mrs.) T. Banumathi, and the Trustees, Dr. T. Lasya, M.B.B.S., M.D., FRCR (London), and Dr. T. Namratha, M.B.B.S., M.B.A., for providing the infrastructure that has helped us to learn and to progress in our academic careers.

We would like to express our sincere thanks to Dr. C.V. Jayakumar, M.E., Ph.D., Principal, who has been an exceptional source of inspiration throughout our time in this institution.

We would like to express our gratitude to the Head of the Computer Science and Engineering Department, who has been a source of inspiration throughout our stint in this institution.

We would like to thank our Project Supervisor, Ms. Mary Selvan, M.E., Associate Professor, for supervising the project development and for her valuable guidance, encouragement and support throughout the project work.

We extend our sincere thanks to the Project Coordinator, Ms. Mary Selvan, M.E., Associate Professor, for her constant support and generous encouragement throughout the project work.

We extend our sincere thanks to all teaching and non-teaching staff of the Department of Computer Science and Engineering for their help and support. We thank our parents and friends, who have constantly encouraged us throughout this course by extending their moral support.
ABSTRACT

Image segmentation is the partitioning of an image into regions whose pixels are similar in some characteristic, such as colour, intensity or texture, and differ from those of neighbouring regions. It is used in almost all modern computers and mobile devices for image recognition. In this work, handwritten Tamil documents are converted into grayscale images and then segmented into characters. The algorithm segments in both the vertical and the horizontal direction. It first checks for touching characters in the horizontal zone and then in the vertical zone. If the characters touch in the vertical zone, the cut is made horizontally; if they touch in the horizontal zone, the cut is made vertically. The method aims to improve segmentation accuracy for touching lines in Tamil text.
ABSTRACT (TAMIL)
Image segmentation refers to dividing an image into different regions that are similar or different in certain properties such as colour, intensity and texture; it is used in almost all modern computers and mobile phones for image recognition. Here, handwritten Tamil documents are converted into a grayscale image and then divided into characters, in both the vertical and the horizontal direction. First, the algorithm checks for touching characters in the horizontal zone and then in the vertical zone. If the touching character is in the vertical zone, the cut should be made horizontally; if the touching character is in the horizontal zone, the cut should be made vertically. This method proposes higher accuracy for touching lines in the Tamil language.
TABLE OF CONTENTS

ABSTRACT
LIST OF FIGURES
1 INTRODUCTION
  1.1 Problem Definition
  1.2 Segmentation Methods
  1.3 Identification of Touching Characters
  1.4 Segmentation of Single Touching Character
2 LITERATURE SURVEY
3 SYSTEM SPECIFICATION
  3.1.1 HARDWARE REQUIREMENTS
  3.1.2 SOFTWARE REQUIREMENTS
4 SYSTEM ANALYSIS
  4.1.1 EXISTING SYSTEM
  4.1.2 PROPOSED SYSTEM
  4.2 SYSTEM ARCHITECTURE
  4.3 SOFTWARE DESCRIPTION
  4.4 MODULE DESCRIPTION
5 SYSTEM DESIGN
  DATA FLOW DIAGRAM
  USE CASE DIAGRAM
  CLASS DIAGRAM
  ACTIVITY DIAGRAM
6 TESTING
  SAMPLE CODE
  SAMPLE SCREEN SHOT
7 CONCLUSION
8 REFERENCES
INTRODUCTION

The Tamil language has one of the world's oldest scripts, dating back to the 6th century BC, and its characters are thought to have originated in Keezhadi, Tamil Nadu, India. The shapes of Tamil characters evolved from Brahmi characters, and the script was known as "Tamili" before being renamed "Tamil." It is written from left to right and is divided into vowels (Uyirezhuthu) and consonants (Meiyezhuthu): there are 12 vowels and 18 consonants. The characters have been formed from particular strokes throughout the evolution of the language. In the 18th century AD, the Christian missionary Constantine Joseph Beschi, later known as 'Veeramamunivar,' altered some of the forms, using different strokes and shapes to distinguish the short (Kuril) and long (Nedil) vowels. In the 20th century AD, the Tamil Nadu reformer E.V. Ramasamy (Periyar) made further adjustments to the character shapes that are still in use today. Because palm leaves were readily available in their region before the development of paper, earlier Tamil civilizations used them to record medical advice, architectural information, astrological views and literature.
1.1 PROBLEM DEFINITION:

The way a writer holds the pen causes differences in the forms of handwritten characters, and it strongly affects how often characters end up touching one another. OCR recognises only isolated characters, so touching characters go unrecognised. Character segmentation is therefore a critical step for good results in the recognition phase. Touching characters are of two kinds: single and multiple. Single touching characters fall into two categories, horizontal touching and vertical touching. Horizontal touching occurs when a character touches another character on the same line, while vertical touching occurs when it touches a character on an adjacent line. Multiple touching characters are those that touch in both directions.

1.2 SEGMENTATION METHODS

OCR of manuscripts starts with line segmentation. The Adaptive Partial Projection (APP) method identifies the number of text lines and the spacing between them using piecewise projection.

The second method, A* Path Planning (A*PP), uses a heuristic search to identify touching and partially overlapping characters in text lines. It has been applied with various cost functions to Thai and Khmer manuscripts.

In Tamil manuscripts, both of these methods are ineffective, and they alter the structure of the characters.

With the Dynamic Labelling Algorithm (DLA), text lines can be segmented with a recognition accuracy (RA) of 96 percent. It segments the lines of Tamil manuscripts effectively even when characters strongly touch and overlap characters on the preceding or following line.

After line segmentation with DLA, character segmentation is the second key phase in OCR of Tamil manuscripts. In the pre-processing stage, image filtering, image sharpening and morphological operations remove unwanted particles from the lines and make the character strokes more visible. The flow diagram depicts the proposed character segmentation method step by step.
1.3 Identification of Touching Characters

The weight of each character is computed from its connected component. A threshold is set on this weight: components whose weight falls below the threshold are treated as single characters and are separated immediately. Components with larger weights are assessed as possible touching characters using the following factor.

A touching character has a greater aspect ratio (ar) than the single characters that are separated automatically. The factor $f_{ar}$ is used to detect touching characters, which need further processing to be split into single characters. It is given by

$$f_{ar} = \frac{e^{a}}{1 + e^{a}}, \qquad a = \frac{w}{h},$$

where w and h are the character's width and height. Once a touching character has been identified, it is assigned to one of three categories: horizontal touching, vertical touching, or multiple touching.
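As a rough illustration of the factor described above, the following MATLAB sketch evaluates it for a few bounding boxes. The box sizes and the cut-off `farThreshold` are made-up values for illustration only; the report does not state the threshold it used.

```matlab
% Hypothetical bounding boxes as [width height] in pixels (illustrative values).
boxes = [20 30;    % a typical single character
         55 30;    % a wide component, possibly two characters touching
         30 30];   % a square component

a   = boxes(:,1) ./ boxes(:,2);     % aspect ratio a = w/h
far = exp(a) ./ (1 + exp(a));       % far = e^a / (1 + e^a), always between 0.5 and 1

farThreshold = 0.75;                % assumed cut-off, not taken from the report
isTouching   = far > farThreshold;  % wide components become touching candidates

for k = 1:size(boxes, 1)
    fprintf('w=%d h=%d a=%.3f far=%.3f touching=%d\n', ...
            boxes(k,1), boxes(k,2), a(k), far(k), isTouching(k));
end
```

Because the factor increases monotonically with a, thresholding it is equivalent to thresholding the aspect ratio itself; the logistic form simply maps the ratio into a bounded range.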

1.4 Segmentation of Single Touching Character

Touching characters can be segmented once the cutting edge between them is fixed. The cutting edge is the segmentation point at which two connected characters are separated. Types 1 to 6 show the ways in which two characters can touch horizontally. For these characters, the cutting edge is found by calculating the weight of each column: the column with the lowest weight is taken as the likely contact point and is used as the cutting edge. Because the weight is computed per column, the cutting edge for horizontal touching characters is vertical. Vertical touching characters can be identified from the height of the component. Types 7 to 10 show the ways in which characters can touch vertically. Here the cutting edge is the row with the lowest weight between the characters. Vertical touching characters thus have a horizontal cutting edge, while horizontal touching characters have a vertical one. Multiple touching characters touch characters on the same line horizontally and characters on the following line vertically. Applying both horizontal and vertical cutting edges reduces them to simpler touching characters, which can then be segmented into single characters.
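The column-weight idea for horizontal touching characters can be sketched in MATLAB as follows. The toy image and the `margin` that keeps the cut away from the outer edges are assumptions made for this example and are not taken from the report.

```matlab
% Synthetic binary image (1 = ink): two 5x4 blobs joined by a one-pixel bridge.
bw = false(5, 9);
bw(:, 1:4) = true;      % left character
bw(:, 6:9) = true;      % right character
bw(3, 5)   = true;      % thin stroke where the characters touch

colWeight = sum(bw, 1);                 % ink pixels in each column

% Search only interior columns so the cut never lands on the outer edge.
% The margin of 2 columns is an assumption for this toy image.
margin   = 2;
interior = (1 + margin):(size(bw, 2) - margin);
[~, k]   = min(colWeight(interior));    % first lightest interior column
cutCol   = interior(k);

leftChar  = bw(:, 1:cutCol - 1);        % piece to the left of the cut
rightChar = bw(:, cutCol + 1:end);      % piece to the right of the cut
fprintf('cut at column %d\n', cutCol);
```

For vertical touching characters the same search would run over `sum(bw, 2)`, giving a row, and therefore a horizontal cut.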

LITERATURE SURVEY

Anupama. B and Seenivasa Reddy propose a method for extracting image features based on multiple histogram projections and morphological operators. The text image is projected horizontally, and the peaks of the horizontal projection identify the line segments. One threshold is used to divide the text image into segments, and a second threshold eliminates false lines. Vertical histogram projections of the line segments are then used to decompose them into words and, with a further threshold, into characters. The algorithm was tested on several document images. According to the experimental results, the method is fast and reliable for handwritten documents with non-overlapping lines. Its drawback is that it produces segmentation errors for touching characters. For good-quality documents, the line segmentation accuracy is 99 percent in DR and 98 percent in RA.
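The horizontal-projection step described above can be sketched in MATLAB as follows. This is a minimal illustration on a synthetic image, not the cited authors' implementation; the threshold `minInk` is an assumed value.

```matlab
% Synthetic binary page (1 = ink): two text lines separated by blank rows.
bw = false(12, 20);
bw(2:4, 3:18)  = true;     % first text line occupies rows 2 to 4
bw(8:11, 2:15) = true;     % second text line occupies rows 8 to 11

rowProfile = sum(bw, 2);               % horizontal projection: ink per row

% Rows whose ink count exceeds the threshold are treated as text.
% minInk = 0 is an assumption; noisy scans would need a larger value.
minInk = 0;
isText = rowProfile > minInk;

% Line boundaries are the rising and falling edges of the text mask.
edges     = diff([0; double(isText); 0]);
lineStart = find(edges == 1);
lineEnd   = find(edges == -1) - 1;

for k = 1:numel(lineStart)
    fprintf('line %d: rows %d to %d\n', k, lineStart(k), lineEnd(k));
end
```

Applying the same steps to `sum(bw, 1)` within one line gives the vertical projection used for word and character boundaries. Touching characters leave no zero-valued gap in that profile, which is why projection alone fails on them.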

Shalini M. and Indira Reddy B. observe that text line segmentation is one of the most important steps in document image analysis, because it is needed to recognise all the text regions in a document image. They present an algorithm that uses multiple histogram projections and morphological operators to extract features of the image. The text image is projected horizontally, and line segments are identified from the peaks of the horizontal projection. A threshold is used to divide the text image into segments, and false lines are eliminated with a second threshold. Vertical histogram projections of the line segments are then used to decompose them into words, using a threshold, and further into characters. The drawback of this approach is that it produces segmentation errors for touching characters. For good-quality documents, the line segmentation accuracy is 99 percent in DR and 98 percent in RA.

A. Mahmoud and A. Mousa present a Hough-based and a histogram-based approach to character segmentation. The fact that Arabic is written mainly in cursive poses a significant obstacle, so a segmentation procedure is needed to establish where each character begins and ends. This is a crucial stage in character recognition. The method uses projection-based concepts to separate lines, words and characters. The character segmentation stage separates the connected characters: the horizontal and vertical profiles are used to find the character boundaries, with a profile amplitude filter and a simple edge tool. The method shows promise when tested on several printed documents in various Arabic typefaces. The word segmentation algorithm reaches an accuracy of 99 percent and the character segmentation algorithm 98 percent.

Partha Bhowmick and Gaurav Harit present a novel character segmentation approach for handwritten Bangla text documents. The approach relies on characterising the vertices of the outer isothetic covers corresponding to the words in a text. Using this vertex characterisation, each cover is divided into several sub-polygons that represent the characters making up a word. The method is distinctive in that it segments characters without deskewing skewed documents and applies to handwritten Bangla without using any heuristics. It achieves a character segmentation accuracy of 96.04 percent.

Vijay and Madan Kharat report that, in handwritten Hindi text documents, a global threshold performs better for character segmentation, whereas Otsu's threshold technique is used to segment words and lines. Recognition accuracy depends not only on the segmentation of lines and words but also on the precision of character segmentation. The method still needs to be evaluated on documents with skewed and crowded lines, where accuracy degrades. The word segmentation method obtained an accuracy of 85 percent and the character segmentation algorithm 92 percent.

Soumen Bag and Ankit Krishna note that precise character-level segmentation of printed or handwritten text is a critical pre-processing step for optical character recognition (OCR). Languages with cursive writing make the segmentation problem significantly more difficult, and the fundamental problem in handwritten character segmentation is the inherent variety in people's writing styles. They describe an effective character segmentation method for handwritten Hindi words, in which segmentation is based on structural patterns observed in the writing style of the language. The approach can handle a wide range of writing styles as well as skewed header lines in the input. It was tested on the authors' own database of printed and handwritten words, with an average success rate of 96.93 percent. Compared with other methods, it produces reasonably good results on this database.

Kiruba and Nivethitha present a histogram approach to character segmentation. The image is first converted to grayscale and then processed with Otsu's thresholding method to make the text stand out from the background. Skew is then detected and corrected. A histogram of the image is computed, in which each low point represents the space between two lines, and the image is segmented line by line. A vertical histogram is used for character segmentation. Line segmentation is 97 percent accurate and character segmentation 87 percent.

The main goal of Kathirvalavakumar and Karthigaiselvi is to use vertical and horizontal projections, guided by character structure, to separate the horizontally overlapping lines and touching characters found in all zones of machine-printed Tamil script. Documents of various categories were collected and tested, and the method successfully segmented all the images used in the experiment. Even when lines and characters touch, all lines and words are segmented correctly, and characters are segmented more precisely. The algorithms have the advantage of being able to segment more than two touching characters in a word. Because they rely on projection values, the line, word and character segmentation procedures are fairly simple to implement. The projection profile-based approach has an accuracy of 92 percent and the connected component labelling approach 91 percent.

3.1 SYSTEM SPECIFICATION


3.1.1 Hardware Requirements:

The most common set of requirements defined by any operating system or software application is the physical computer resources, also known as hardware. A hardware requirements list is often accompanied by a hardware compatibility list (HCL), especially in the case of operating systems. An HCL lists tested, compatible, and sometimes incompatible hardware devices for a particular operating system or application. The hardware requirements for this project are listed in Table 3.1.1.

RAM          8 GB and above
PROCESSOR    Any Intel or AMD x86-64 processor with four logical cores
SPEED        2.6 GHz and above
STORAGE      20 - 30 GB of SSD
MONITOR      Any monitor with 1280x720 pixels
INPUT        Basic keyboard & mouse
GRAPHICS     Hardware accelerated graphics card supporting OpenGL 3.3 with 1GB GPU memory

Table 3.1.1 Hardware Requirements

3.1.2 Software Requirements:

Software Requirements deal with defining software resource requirements and

prerequisites that need to be installed on a computer to provide optimal functioning of

an application. These requirements or prerequisites are generally not included in the

software installation package and need to be installed separately before the software is

installed.

OPERATING SYSTEM Windows 7 or Higher, Mac, Linux.

CODING LANGUAGE Matlab

IDE Matlab

Table 3.1.2 Software Requirements

4.1 SYSTEM ANALYSIS


4.1.1 EXISTING SYSTEM:

There are many image segmentation techniques proposed to segment the images

to retrieve essential knowledge and information out of it. All of the techniques vary in

their method used for segmenting the images. Some of the popular techniques used for

image segmentation are as follows:

➢ Thresholding

➢ Edge detection segmentation

➢ Region based segmentation techniques

➢ Clustering techniques

➢ Watershed segmentation

4.1.2 PROPOSED SYSTEM:

Image segmentation is the partitioning of an image into regions whose pixels are similar in some characteristic, such as colour, intensity or texture, and differ from those of neighbouring regions. It is used in almost all modern computers and mobile devices for image recognition. In the proposed system, handwritten Tamil documents are converted into grayscale images and then segmented into characters. The algorithm segments in both the vertical and the horizontal direction. It first checks for touching characters in the horizontal zone and then in the vertical zone. If the characters touch in the vertical zone, the cut is made horizontally; if they touch in the horizontal zone, the cut is made vertically. The method aims to improve segmentation accuracy for touching lines in Tamil text.
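The order of checks in the proposed system can be sketched as follows. The rule used here to detect touching (a component much wider or much taller than the median component) and the tolerance `ratio` are assumptions made for illustration; the report does not give its exact criteria.

```matlab
% Hypothetical component sizes in pixels, one row per component: [width height].
sizes = [20 30; 22 31; 45 30; 21 64; 19 29];

medW  = median(sizes(:,1));        % typical character width
medH  = median(sizes(:,2));        % typical character height
ratio = 1.5;                       % assumed tolerance, not from the report

cutDirection = cell(size(sizes, 1), 1);
for k = 1:size(sizes, 1)
    w = sizes(k, 1);
    h = sizes(k, 2);
    if w > ratio * medW                   % horizontal zone is checked first
        cutDirection{k} = 'vertical';     % horizontal touching: cut along a column
    elseif h > ratio * medH               % then the vertical zone
        cutDirection{k} = 'horizontal';   % vertical touching: cut along a row
    else
        cutDirection{k} = 'none';         % treated as a single character
    end
end
disp(cutDirection');
```

A component that is both too wide and too tall would fall into the first branch here; handling such multiple touching characters would need a second pass after the first cut.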

4.2 SYSTEM ARCHITECTURE

4.3 SOFTWARE DESCRIPTION
4.3.1 What is MATLAB?

MATLAB is a programming platform designed specifically for engineers and scientists to analyze and design systems and products that transform our world. The heart of MATLAB is the MATLAB language, a matrix-based language allowing the most natural expression of computational mathematics.

4.3.2 What Can I Do With MATLAB?

• Analyse data

• Develop algorithms

• Create models and applications

MATLAB lets you take your ideas from research to production by deploying to enterprise applications and embedded devices, as well as integrating with Simulink® and Model-Based Design.
4.4 MODULE DESCRIPTION
4.4.1 What are Modules?

A module is a separate unit of software or hardware. Typical characteristics of

modular components include portability, which allows them to be used in a variety of

systems, and interoperability, which allows them to function with the components of

other systems. The term was first used in architecture.

4.4.2 List of Modules

1. uigetfile
file = uigetfile opens a modal dialog box that lists files in the current folder. It

enables a user to select or enter the name of a file. If the file exists and is valid, uigetfile

returns the file name when the user clicks Open. If the user clicks Cancel or the window

close button (X), uigetfile returns 0.

2. imread

• A = imread(filename) reads the image from the file specified by filename, inferring

the format of the file from its contents. If filename is a multi-image file, then imread

reads the first image in the file.

• A = imread(filename,fmt) additionally specifies the format of the file with the

standard file extension indicated by fmt. If imread cannot find a file with the name

specified by filename, it looks for a file named filename.fmt.

• A = imread(___,idx) reads the specified image or images from a multi-image file.

This syntax applies only to GIF, PGM, PBM, PPM, CUR, ICO, TIF, SVS, and HDF4

files. You must specify a filename input, and you can optionally specify fmt.

• A = imread(___,Name,Value) specifies format-specific options using one or more

name-value pair arguments, in addition to any of the input arguments in the previous

syntaxes.

• [A,map] = imread(___) reads the indexed image in filename into A and reads its

associated colormap into map. Colormap values in the image file are automatically

rescaled into the range [0,1].

• [A,map,transparency] = imread(___) additionally returns the image transparency.

This syntax applies only to PNG, CUR, and ICO files. For PNG files, transparency

is the alpha channel, if one is present. For CUR and ICO files, it is the AND (opacity)

mask.

3. imshow

• imshow(I) displays the grayscale image I in a figure. imshow uses the default display

range for the image data type and optimizes figure, axes, and image object properties

for image display.

• imshow(I,[low high]) displays the grayscale image I, specifying the display range as

a two-element vector, [low high]. For more information, see the DisplayRange

argument.

• imshow(I,[]) displays the grayscale image I, scaling the display based on the range

of pixel values in I. imshow uses [min(I(:)) max(I(:))] as the display range. imshow

displays the minimum value in I as black and the maximum value as white. For more

information, see the DisplayRange argument.

• imshow(RGB) displays the truecolor image RGB in a figure.

• imshow(BW) displays the binary image BW in a figure. For binary images, imshow

displays pixels with the value 0 (zero) as black and 1 as white.

• imshow(X,map) displays the indexed image X with the colormap map.

• imshow(filename) displays the image stored in the graphics file specified by

filename.

• imshow(___,Name,Value) displays an image, using name-value pairs to control

aspects of the operation.

• himage = imshow(___) returns the image object created by imshow.

• imshow(I,RI) displays the image I with associated 2-D spatial referencing object RI.

• imshow(X,RX,map) displays the indexed image X with associated 2-D spatial

referencing object RX and colormap map.

4. rgb2gray

• I = rgb2gray(RGB) converts the truecolor image RGB to the grayscale image I. The

rgb2gray function converts RGB images to grayscale by eliminating the hue and

saturation information while retaining the luminance. If you have Parallel

Computing Toolbox™ installed, rgb2gray can perform this conversion on a GPU.

• newmap = rgb2gray(map) returns a grayscale colormap equivalent to map.
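To make the conversion concrete, the sketch below applies the weighted sum that the MATLAB documentation gives for rgb2gray (0.2989 R + 0.5870 G + 0.1140 B) by hand to a tiny synthetic image. The pixel values are made up, and the result is expected to match rgb2gray only up to rounding.

```matlab
% Small synthetic truecolor image: red, green, blue and white pixels.
RGB = uint8(cat(3, [255 0; 0 255], ...   % red channel
                   [0 255; 0 255], ...   % green channel
                   [0 0; 255 255]));     % blue channel

R = double(RGB(:, :, 1));
G = double(RGB(:, :, 2));
B = double(RGB(:, :, 3));

% Weighted sum of the three channels; green contributes most to luminance.
gray = uint8(round(0.2989 * R + 0.5870 * G + 0.1140 * B));
disp(gray);
```

Pure blue maps to a much darker gray than pure green, which matters for documents written in blue ink: after conversion they have lower contrast than the colour image suggests.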

5. graythresh

• T = graythresh(I) computes a global threshold T from grayscale image I, using Otsu's method. Otsu's method chooses a threshold that minimizes the intraclass variance of the thresholded black and white pixels. The global threshold T can be used with imbinarize to convert a grayscale image to a binary image.

• [T,EM] = graythresh(I) also returns the effectiveness metric, EM.
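The idea behind Otsu's method can be sketched with base MATLAB functions only. This is an illustrative re-implementation on a made-up image, not the toolbox source; it maximises the between-class variance, which is equivalent to minimising the intraclass variance.

```matlab
% Synthetic bimodal grayscale image: dark ink values and bright paper values.
I = uint8([ 20  25  30 200;
           210 205  22 198;
            28 215 220  24]);

counts = accumarray(double(I(:)) + 1, 1, [256 1]);  % 256-bin histogram
p      = counts / sum(counts);                       % probability of each level

omega = cumsum(p);                 % probability of the dark class up to each bin
mu    = cumsum(p .* (1:256)');     % cumulative first moment (bins numbered 1..256)
muT   = mu(end);                   % global mean

% Between-class variance for every candidate threshold.
sigmaB = (muT * omega - mu).^2 ./ (omega .* (1 - omega));
sigmaB(~isfinite(sigmaB)) = 0;     % thresholds that leave one class empty

best = find(sigmaB == max(sigmaB));   % may be a plateau of equal maxima
T    = (mean(best) - 1) / 255;        % normalised threshold in [0, 1]
fprintf('Otsu threshold T = %.4f\n', T);
```

In this toy image every threshold between the darkest paper value and the brightest ink value separates the two groups equally well, so the maxima form a plateau and the sketch takes its midpoint.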

6. im2bw

• BW = im2bw(I,level) converts the grayscale image I to binary image BW, by

replacing all pixels in the input image with luminance greater than level with the

value 1 (white) and replacing all other pixels with the value 0 (black).

• level is specified in the range [0, 1] and is relative to the signal levels possible for the image's class. Therefore, a level value of 0.5 corresponds to an intensity value halfway between the minimum and maximum value of the class.

• BW = im2bw(X,cmap,level) converts the indexed image X with colormap cmap to

a binary image.

• BW = im2bw(RGB,level) converts the truecolor image RGB to a binary image.
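The effect of the normalised level on a uint8 image can be shown by hand. The sketch below mimics the comparison with plain operators on made-up pixel values; it is an illustration of the rule, not the toolbox code.

```matlab
I     = uint8([10 100 128; 127 200 255]);   % small synthetic grayscale image
level = 0.5;                                 % normalised level in [0, 1]

% For integer images the level is scaled by the largest value of the class,
% which is 255 for uint8.
classMax = double(intmax(class(I)));
BW = double(I) > level * classMax;           % 0.5 * 255 = 127.5
disp(BW);
```

Pixels with value 128 and above become white and 127 and below become black. In recent releases MathWorks recommends imbinarize in place of im2bw, so new code would normally use that function with the same threshold.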

7. bwareaopen

• BW2 = bwareaopen(BW,P) removes all connected components (objects) that have

fewer than P pixels from the binary image BW, producing another binary image,

BW2. This operation is known as an area opening.

• BW2 = bwareaopen(BW,P,conn) removes all connected components, where conn

specifies the desired connectivity.
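To show what an area opening does, the sketch below removes small 8-connected objects from a toy binary image using a plain flood fill. It is only an illustration written with base MATLAB; in practice bwareaopen does this in one call. The image and the value of P are made up.

```matlab
% Synthetic binary image: one 3x3 blob (9 pixels) and two small specks.
BW = false(6, 8);
BW(2:4, 2:4) = true;      % object to keep
BW(1, 7)     = true;      % 1-pixel speck
BW(5:6, 7)   = true;      % 2-pixel speck
P = 3;                    % remove objects with fewer than P pixels

BW2     = false(size(BW));
visited = false(size(BW));
[rows, cols] = size(BW);
[dr, dc] = ndgrid(-1:1, -1:1);            % 8-connected neighbour offsets
offsets  = [dr(:) dc(:)];

for start = find(BW)'                     % every foreground pixel (linear index)
    if visited(start), continue; end
    stack = start;  visited(start) = true;  members = [];
    while ~isempty(stack)                 % flood fill one connected component
        idx = stack(end);  stack(end) = [];
        members(end + 1) = idx;           %#ok<SAGROW>
        [r, c] = ind2sub([rows cols], idx);
        for k = 1:size(offsets, 1)
            rr = r + offsets(k, 1);  cc = c + offsets(k, 2);
            if rr >= 1 && rr <= rows && cc >= 1 && cc <= cols ...
                    && BW(rr, cc) && ~visited(rr, cc)
                visited(rr, cc) = true;
                stack(end + 1) = sub2ind([rows cols], rr, cc); %#ok<SAGROW>
            end
        end
    end
    if numel(members) >= P                % keep only large enough components
        BW2(members) = true;
    end
end
disp(BW2);
```

The same component search is what bwlabel performs when it numbers the objects. Choosing P too large would also delete genuine small marks such as dots, so the value has to be tuned to the scan resolution.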

8. regionprops
• stats = regionprops(BW,properties) returns measurements for the set of properties for each 8-connected component (object) in the binary image, BW. You can use regionprops on contiguous regions and discontiguous regions.

• stats = regionprops(CC,properties) measures a set of properties for each connected

component (object) in CC, which is a structure returned by bwconncomp.

• stats = regionprops(L,properties) measures a set of properties for each labeled

region in label image L.

• stats = regionprops(___,I,properties) returns measurements for the set of properties

specified by properties for each labeled region in the image I. The first input to

regionprops (BW, CC, or L) identifies the regions in I.
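The BoundingBox measurement used later in the sample code can be reproduced by hand from a label image, which makes its [x y width height] layout explicit. The label matrix below is made up for illustration.

```matlab
% Small label image: 0 = background, 1 and 2 = two labelled regions.
L = [0 1 1 0 0;
     0 1 1 0 2;
     0 0 0 0 2;
     0 0 0 0 2];

numRegions = max(L(:));
bbox = zeros(numRegions, 4);           % one row per region: [x y width height]
for n = 1:numRegions
    [r, c] = find(L == n);
    % The upper-left corner is reported at the pixel edge, hence the -0.5.
    bbox(n, :) = [min(c) - 0.5, min(r) - 0.5, ...
                  max(c) - min(c) + 1, max(r) - min(r) + 1];
end
disp(bbox);
```

The width and height columns are the w and h that the aspect-ratio test for touching characters needs, so bounding boxes from regionprops can feed that test directly.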

SYSTEM DESIGN

5.1 DATA FLOW DIAGRAM

5.2 USE CASE DIAGRAM

5.3 CLASS DIAGRAM

5.4 ACTIVITY DIAGRAM
TESTING

6.1 SAMPLE CODE:

%% Image segmentation and extraction

%% Get file from folder
[filename, pathname] = uigetfile('*', 'Load an Image');
if isequal(filename, 0)
    return   % user cancelled the dialog, nothing to segment
end

%% Read image
imagen = imread(fullfile(pathname, filename));

%% Show image
figure(1)
imshow(imagen);
title('INPUT IMAGE WITH NOISE')

%% Complement
% Include this line of code when segmenting an image with white text on a
% black background.
% imagen = imcomplement(imagen);

%% Convert to gray scale
if size(imagen, 3) == 3   % RGB image
    imagen = rgb2gray(imagen);
end

%% Convert to binary image (text pixels become 1)
threshold = graythresh(imagen);
imagen = ~im2bw(imagen, threshold);

%% Remove all objects containing fewer than 30 pixels
imagen = bwareaopen(imagen, 30);
pause(1)

%% Show binary image
figure(2)
imshow(~imagen);
title('INPUT IMAGE WITHOUT NOISE')

%% Label connected components
[L, Ne] = bwlabel(imagen);

%% Measure properties of image regions
propied = regionprops(L, 'BoundingBox');
hold on

%% Plot bounding boxes
for n = 1:numel(propied)
    rectangle('Position', propied(n).BoundingBox, 'EdgeColor', 'g', 'LineWidth', 2)
end
hold off
pause(1)

%% Objects extraction
figure(3)
for n = 1:Ne
    [r, c] = find(L == n);
    n1 = imagen(min(r):max(r), min(c):max(c));
    imshow(~n1);
    pause(0.5)
end

6.2 SAMPLE SCREEN SHOT

CONCLUSION

This report discusses the segmentation of touching Tamil characters using the categories 'horizontal touching' and 'vertical touching.' An approach called Dynamic Labelling is applied to line-segmented images and compared with two other current segmentation approaches, the APP and A*PP algorithms. The APP and A*PP algorithms provide 87% RA and 89% accuracy, while the Dynamic Labelling algorithm provides 91% accuracy in the segmentation of touching characters. A comparison table of the different methods is also discussed.
REFERENCES

[1] Anupama. B & Seenivasa Reddy, "Character Segmentation for Telugu Handwritten Documents"

[2] Shalini. M & Indira Reddy. B, "Character Segmentation for Telugu Image Document"

[3] A. Mahmoud and A. Mousa, "Arabic Character Segmentation Using Projection Based Approach with Profile Amplitude Filter"

[4] Partha Bhowmick & Gaurav Harit, "Character Segmentation of Bengali Handwritten Text by Vertex Characterization"

[5] Vijay and Madan Kharat, "Segmentation of Devanagari Handwritten Text using Thresholding Approach"

[6] Soumen Bag and Ankit Krishna, "Character Segmentation of Hindi Unconstrained Handwritten Words"

[7] Kiruba and Nivethitha, "Segmentation of Handwritten Tamil Character from Palm Script using Histogram Approach"

[8] Kathirvalavakumar and Karthigaiselvi, "Efficient Segmentation of Printed Tamil Scripts into Characters using Projection and Structure"
