Lecture 08
Yaregal Assabie
2018/19—Sem I
Modes of Language Representation
  Written and Spoken Languages
  Conversion to Machine-Editable Text
Speech Recognition
Optical Character Recognition
♦ Spoken Language
Department of Computer Science, Addis Ababa University Lecture 08: Related Fields 2/42
• Most of the NLP applications and tasks discussed so far assume that the language is
represented as machine-editable text.
• Optical Character Recognition (OCR) systems convert non-editable text (images of text)
into equivalent machine-editable text.
• Speech Recognition (SR) systems convert spoken language into equivalent
machine-editable text.
• Both OCR and SR systems merge interdisciplinary technologies from Signal Processing,
Pattern Recognition, Natural Language Processing, and Linguistics into a unified framework.
Modes of Language Representation
Speech Recognition
  Parameters of SR Systems
  General Architecture
  Components of SR Systems
Optical Character Recognition
Parameters of SR Systems
General Architecture
Acoustic Signal → Feature Extraction → Decoding → Text
(Decoding draws on the Acoustic Model, the Lexical Model, and the Language Model.)
አበበ በሶ በላ (Amharic: “Abebe ate beso”)
• Feature extraction is the process of transforming the input acoustic signal into a
set of features.
• Mel-Frequency Cepstral Coefficients (MFCCs) are commonly used as features in speech
recognition systems.
♦ A total of 39 features are typically extracted per frame: 12 MFCCs plus energy,
together with their first-order (delta) and second-order (delta-delta) derivatives.
Wavefile → Feature Extraction → Feature Vectors
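The feature extraction step can be sketched compactly in NumPy. This is a minimal sketch, not the lecture's implementation: it assumes common textbook settings (16 kHz audio, pre-emphasis factor 0.97, 25 ms Hamming windows every 10 ms, 512-point FFT, 26 mel filters), and all function names are illustrative. As a simplification, the 0th cepstral coefficient stands in for the separate energy term, so 13 static coefficients plus their deltas and delta-deltas give the 39 features.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_ii(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    return x @ np.cos(np.pi * k * (2 * m + 1) / (2 * n)).T

def deltas(feat, N=2):
    """Regression-based time derivatives over +/-N frames (edge frames repeated)."""
    T = feat.shape[0]
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    num = sum(n * (padded[N + n:N + n + T] - padded[N - n:N - n + T]) for n in range(1, N + 1))
    return num / (2 * sum(n * n for n in range(1, N + 1)))

def mfcc_39(signal, sr=16000, frame_ms=25, hop_ms=10, n_fft=512, n_filters=26, n_ceps=13):
    """13 static cepstral coefficients + 13 deltas + 13 delta-deltas per frame."""
    signal = np.asarray(signal, dtype=float)
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    flen, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    n_frames = 1 + (len(signal) - flen) // hop  # assumes len(signal) >= flen
    idx = np.arange(flen)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)  # overlapping windowed frames
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft  # power spectrum per frame
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    static = dct_ii(np.log(mel_energy + 1e-10), n_ceps)  # c0 stands in for log energy
    d1 = deltas(static)
    d2 = deltas(d1)
    return np.hstack([static, d1, d2])
```

With these settings, one second of 16 kHz audio yields 98 frames, each a 39-dimensional feature vector.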
• Further reading:
Language Modeling in Statistical Machine Translation [Lecture 08].
• The goal of the probabilistic noisy channel architecture for speech recognition can be
summarized as follows:
“What is the most likely sentence out of all sentences in the language $L$ given some acoustic input $O$?”
• We can treat the acoustic input O as a sequence of individual “symbols” or
“observations” (for example by slicing up the input every 10 milliseconds, and
representing each slice by floating-point values of the energy or frequencies of that
slice).
$O = o_1, o_2, o_3, \ldots, o_t$
• Similarly, we treat a sentence as if it were composed of a string of words:
$W = w_1, w_2, w_3, \ldots, w_n$
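Written out, the noisy-channel question above leads to the standard decomposition via Bayes' rule. Here $P(O \mid W)$ is the likelihood supplied by the acoustic model (with the lexical model), $P(W)$ is the prior supplied by the language model, and $P(O)$ can be dropped because it is the same for every candidate sentence $W$.

```latex
\hat{W} = \underset{W \in L}{\arg\max}\; P(W \mid O)
        = \underset{W \in L}{\arg\max}\; \frac{P(O \mid W)\, P(W)}{P(O)}
        = \underset{W \in L}{\arg\max}\; P(O \mid W)\, P(W)
```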
• Given the acoustic model, the lexical model, and the language model, a decoder
searches for the most likely sequence of words for the input speech.
• The Viterbi algorithm is widely used as a decoder in speech recognition systems.
• The Hidden Markov Model Toolkit (HTK) has been one of the most widely used toolkits
for implementing HMM-based speech recognition systems.
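To make the decoding step concrete, here is a minimal log-domain Viterbi sketch over a discrete HMM. It is a simplification: in a real recognizer the states would come from HMM phone models chained through the lexicon and language model, and the observations would be MFCC vectors scored by continuous densities (e.g. Gaussian mixtures). The two-state, three-symbol model in the usage example is a hypothetical toy, not a speech model.

```python
import numpy as np

def viterbi(log_start, log_trans, log_emit, observations):
    """Most likely HMM state sequence for a list of discrete observation indices.

    log_start[i]    : log P(first state = i)
    log_trans[i, j] : log P(next state = j | state = i)
    log_emit[i, k]  : log P(observation k | state i)
    """
    n_states, T = len(log_start), len(observations)
    score = np.empty((T, n_states))            # best log score of any path ending in each state
    back = np.zeros((T, n_states), dtype=int)  # best predecessor, for backtracking
    score[0] = log_start + log_emit[:, observations[0]]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_trans   # cand[i, j]: come from i, move to j
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], np.arange(n_states)] + log_emit[:, observations[t]]
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):                  # follow back-pointers from the end
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(np.max(score[-1]))

if __name__ == "__main__":
    # Hypothetical 2-state, 3-symbol toy model (not a speech model).
    log_start = np.log([0.6, 0.4])
    log_trans = np.log([[0.7, 0.3], [0.4, 0.6]])
    log_emit = np.log([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
    path, logp = viterbi(log_start, log_trans, log_emit, [0, 1, 2])
    print(path, np.exp(logp))  # [0, 0, 1] and about 0.01512
```

Working in log probabilities turns products into sums and avoids numerical underflow on long utterances, which is why decoders usually operate in the log domain.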
Modes of Language Representation
Speech Recognition
Optical Character Recognition
  Types of OCR Systems
  General Architecture
  Processes of OCR Systems
  Approaches to Recognition
• Optical Character Recognition (OCR) is the process of reading text from paper in the
form of an image and converting the image into a standard encoding scheme representing
the text, e.g. ASCII or Unicode.
• The idea of OCR came into existence when G. Tauschek obtained a patent for a ‘Reading
Machine’ in Germany in 1929.
♦ However, the modern history of OCR started with the advent of computers.
• In the early years of Latin OCR, standard fonts were developed to make recognition
easier.
♦ Among the standard OCR fonts are OCR-A and OCR-B, which are widely used
in passports, bank checks, serial tracking labels, credit card imprints, cash
registers, license plates, and postal mail.
[Figure: OCR-A and OCR-B font samples]
Type of input text    Recognition Method   Technology   Complexity
Machine printed       Offline              OCR          Easy
Offline handwritten   Offline              ICR          Difficult
Online handwritten    Online               ICR          Easy
General Architecture
Text Image → Preprocessing → Segmentation → Feature Extraction → Classification → Post-Processing → Editable Text
(Optional component: Language Model, used alongside Classification)
• The preprocessing stage aims to produce data from which recognition systems can more
easily obtain accurate results.
• It includes image enhancement, noise removal, skew and slant correction, size
normalization, and thinning.
• Image Enhancement
♦ Used to improve the quality of degraded documents, a problem typically
observed in ancient documents.
♦ Enhancement can be done by filling in missing parts of the data or by
adjusting image intensity.
• Noise Removal
♦ Noise is commonly present in ancient documents, low quality papers, or poor
printing and writing conditions.
♦ Noisy documents are improved by using smoothing operations which replace
each pixel with some function of the pixel’s neighborhood.
♦ Morphological operations such as dilation and erosion can be used for noise
removal.
♦ Gaussian smoothing is one of the most commonly used methods for noise removal
because it smooths isotropically, i.e. equally in all directions.
♦ A 2-dimensional (2D) Gaussian function with standard deviation $\sigma$ is defined as:
$$g(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$$
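Both noise-removal operations in the list above can be sketched in plain NumPy. This is a minimal sketch under stated assumptions: grayscale images are float arrays for Gaussian smoothing, binary images use True for ink in the morphological operations, and the 3σ truncation radius, reflect padding, and 3×3 square structuring element are illustrative choices. All function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Sampled 2D Gaussian g(x, y), truncated at 3*sigma and renormalized."""
    radius = int(np.ceil(3 * sigma))
    ax = np.arange(-radius, radius + 1)
    x, y = np.meshgrid(ax, ax)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return g / g.sum()  # truncation loses a little mass; renormalize to keep brightness

def smooth(img, sigma=1.0):
    """Replace each pixel by a Gaussian-weighted average of its neighborhood."""
    arr = np.asarray(img, dtype=float)
    k = gaussian_kernel(sigma)
    r = k.shape[0] // 2
    padded = np.pad(arr, r, mode="reflect")
    h, w = arr.shape
    out = np.zeros((h, w))
    for dy in range(k.shape[0]):      # kernel is symmetric, so correlation == convolution
        for dx in range(k.shape[1]):
            out += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def _shifted(binary, size):
    """Yield every shifted view of the image covered by a size x size square."""
    r = size // 2
    padded = np.pad(binary, r, mode="constant", constant_values=False)
    h, w = binary.shape
    for dy in range(size):
        for dx in range(size):
            yield padded[dy:dy + h, dx:dx + w]

def erode(binary, size=3):
    """Keep a pixel only if the whole square neighborhood is ink."""
    binary = np.asarray(binary, dtype=bool)
    out = np.ones_like(binary)
    for s in _shifted(binary, size):
        out &= s
    return out

def dilate(binary, size=3):
    """Set a pixel if any pixel in the square neighborhood is ink."""
    binary = np.asarray(binary, dtype=bool)
    out = np.zeros_like(binary)
    for s in _shifted(binary, size):
        out |= s
    return out

def opening(binary, size=3):
    """Erosion followed by dilation: removes specks smaller than the square."""
    return dilate(erode(binary, size), size)
```

Opening removes isolated specks of noise while restoring strokes at least as wide as the structuring element; the reverse order (closing, dilation then erosion) fills small holes and gaps instead.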
• Segmentation refers to all procedures in which observed patterns in the image are
segregated into units of sub-patterns such as graphical objects, tables, text lines,
words, and characters.
• Handwriting recognition systems usually have difficulty segmenting unconstrained text
into individual characters.
♦ In this regard, recognition systems follow one of two paradigms:
segmentation-based and segmentation-free.
♦ Segmentation-based approaches assume that individual characters can be
extracted before further processing such as feature description or recognition.
This assumption is feasible for machine-printed documents, but hard to
satisfy for handwritten text.
♦ Thus, most handwriting recognition systems are designed based on a
segmentation-free paradigm.
Here, whole words are taken as inputs to the system; for this reason, the
segmentation-free technique is also known as the holistic approach.
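One simple technique for the line level of segmentation (not necessarily the one used in the systems discussed here) uses the horizontal projection profile of a binarized page: rows containing ink are grouped into runs, and each run is taken as one text line. It assumes straight, unskewed lines; skewed handwritten pages need skew correction or more robust methods first. The function name and the `min_ink` threshold are hypothetical.

```python
import numpy as np

def segment_lines(binary, min_ink=1):
    """Return (start_row, end_row) pairs, end exclusive, for runs of rows containing ink.

    binary: 2D array, True (or 1) where a pixel is ink.
    """
    profile = np.asarray(binary, dtype=bool).sum(axis=1)  # ink pixels per row
    has_ink = profile >= min_ink
    lines, start = [], None
    for row, ink in enumerate(has_ink):
        if ink and start is None:            # a new text line begins
            start = row
        elif not ink and start is not None:  # a blank row closes the current line
            lines.append((start, row))
            start = None
    if start is not None:                    # a line touching the bottom edge
        lines.append((start, len(has_ink)))
    return lines
```

The same idea applied to columns within one line (a vertical projection profile) is a common first step toward word or character segmentation in machine-printed text.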
Text Line Detection in Skewed Handwritten Amharic Document Images [From EthioReader]
Direction field image of the Ethiopic character “ም” scanned from a noisy document
• The gradient field is a low-level feature describing the change in gray level with direction.
♦ It is calculated by taking differences in value between neighboring pixels, producing a
vector for each pixel.
It can be computed by convolving the image with Gaussian and
derivative-of-Gaussian operators.
♦ The gradient direction of each pixel is expressed in the range [0, 360) degrees;
pixels with direction zero are shown in red.
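A minimal NumPy sketch of this computation, assuming a grayscale image array and σ = 1: the image is convolved with a 1D derivative of Gaussian along one axis and a 1D Gaussian along the other. This separable implementation and the helper names are assumptions, not details from the slide. Angles are measured with the y axis pointing down the rows, as is usual for images, and the red color coding of the figure is not reproduced.

```python
import numpy as np

def gaussian_and_derivative(sigma):
    """1D Gaussian g and its derivative dg, sampled on [-3*sigma, 3*sigma]."""
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-t**2 / (2 * sigma**2))
    g /= g.sum()
    dg = -t / sigma**2 * g  # d/dt of the (normalized) Gaussian
    return g, dg

def _conv(img, kernel, axis):
    """True 1D convolution along one axis (zero padding at the borders)."""
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), axis, img)

def gradient_field(img, sigma=1.0):
    """Per-pixel gradient magnitude and direction in degrees (0 to 360)."""
    img = np.asarray(img, dtype=float)
    g, dg = gaussian_and_derivative(sigma)
    gx = _conv(_conv(img, dg, axis=1), g, axis=0)  # derivative along x, smoothed along y
    gy = _conv(_conv(img, dg, axis=0), g, axis=1)  # derivative along y, smoothed along x
    magnitude = np.hypot(gx, gy)
    direction = np.degrees(np.arctan2(gy, gx)) % 360.0  # y grows downward in images
    return magnitude, direction
```

For an image whose gray level increases steadily down the rows, interior pixels get a direction of 90 degrees; reversing the ramp gives 270 degrees, which is exactly the sign ambiguity the direction field below removes.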
Direction field image of the Ethiopic character “ም” scanned from a noisy document
• The direction field represents the ideal local direction of pixels, characterized by the fact
that the gray value remains constant in one direction (along the direction of lines) and
changes only in the orthogonal direction.
♦ It can be computed by convolving the image with Gaussian and derivative-of-Gaussian
operators, followed by pixel-wise complex squaring.
♦ The direction of pixels is represented in double-angle form and expressed in the
range [0, 180) degrees; pixels with direction zero are shown in red.
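The sketch below squares the complex gradient $g_x + i g_y$ pixel by pixel, which doubles its angle, so gradients pointing in opposite directions (the two sides of a stroke) map to the same value. It assumes the same separable derivative-of-Gaussian filtering as a gradient computation and repeats those helpers so it is self-contained; names are hypothetical. The returned angle is that of the gradient (normal to the stroke) modulo 180 degrees, and the stroke direction itself is perpendicular to it. Many implementations also average the squared values over a neighborhood, which is omitted here.

```python
import numpy as np

def gaussian_and_derivative(sigma):
    """1D Gaussian g and its derivative dg, sampled on [-3*sigma, 3*sigma]."""
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-t**2 / (2 * sigma**2))
    g /= g.sum()
    return g, -t / sigma**2 * g

def _conv(img, kernel, axis):
    """True 1D convolution along one axis (zero padding at the borders)."""
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), axis, img)

def direction_field(img, sigma=1.0):
    """Return (double_angle, direction) in degrees.

    double_angle is the argument of the squared complex gradient (0 to 360);
    direction = double_angle / 2 lies in [0, 180], so opposite gradients coincide.
    """
    img = np.asarray(img, dtype=float)
    g, dg = gaussian_and_derivative(sigma)
    gx = _conv(_conv(img, dg, axis=1), g, axis=0)
    gy = _conv(_conv(img, dg, axis=0), g, axis=1)
    z = (gx + 1j * gy) ** 2                    # |z| = |grad|^2, arg(z) = 2 * gradient angle
    double_angle = np.degrees(np.angle(z)) % 360.0
    return double_angle, double_angle / 2.0
```

Both a gray-level ramp increasing down the rows and its negative (whose gradients point in opposite directions, 90 and 270 degrees) yield a direction of 90 degrees in the interior, which is the property that makes this representation useful for strokes.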
Time Parameterized Direction Field (Double Angle Representation) for Online Handwritten “ጬ”
Time Parameterized Direction Field (Normal Angle Representation) for Online Handwritten “ጬ”
• The primary goal of any recognition system is to classify unknown data into one of a set of
known categories.
♦ The basic idea is to take the extracted features and determine, with minimal
error, which label (class) the input should receive.
♦ The classes in text recognition systems can be characters in a script or words in
a lexicon.
• Classification is the final stage in recognition systems in which a decision is made on
the recognition of a given input.
• The decision made by the system can result in:
♦ Correct Classification
A given input is assigned by the system to its correct class.
♦ Misclassification
A given input is assigned by the system to a wrong class.
♦ Rejection
Occurs when the system cannot match the input data with one of the
known classes.
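The three possible outcomes can be illustrated with a hypothetical nearest-prototype classifier that rejects an input when even the closest prototype is farther away than a chosen threshold. The 2D feature vectors, the labels "A" and "B", and the threshold are invented for illustration; a real recognizer would use the features extracted earlier and a trained classifier.

```python
import numpy as np

def classify(x, prototypes, labels, reject_distance):
    """Nearest-prototype decision with a rejection option (returns None on rejection)."""
    distances = np.linalg.norm(np.asarray(prototypes, float) - np.asarray(x, float), axis=1)
    best = int(np.argmin(distances))
    if distances[best] > reject_distance:
        return None  # no known class is close enough: rejection
    return labels[best]

def outcome(predicted, true_label):
    """Name the kind of decision relative to the ground-truth label."""
    if predicted is None:
        return "rejection"
    return "correct classification" if predicted == true_label else "misclassification"

if __name__ == "__main__":
    prototypes = [[0.0, 0.0], [10.0, 10.0]]  # hypothetical class centers
    labels = ["A", "B"]
    samples = [([1.0, 1.0], "A"), ([9.0, 9.0], "A"), ([5.0, -20.0], "B")]
    for x, truth in samples:
        print(x, truth, outcome(classify(x, prototypes, labels, 5.0), truth))
```

The rejection threshold trades misclassifications for rejections: lowering it makes the system refuse more doubtful inputs instead of guessing wrongly.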
Approaches to Recognition
• The field of pattern recognition, of which character recognition is a sub-field, has seen
much progress since its beginnings.
• A large number of different approaches have been proposed to solve pattern
recognition problems.
• However, most of them can be grouped into one of the following four main
recognition techniques:
♦ Template matching
♦ Structural and syntactic
♦ Statistical
♦ Artificial neural network
• Although each approach has strengths for particular problems, no single approach has
been found to be optimal for all pattern recognition problems.
• Each of these recognition techniques has its own advantages and limitations, and
hybrid systems draw upon the synergy of two or more techniques.
• Hybrid methods aim at combining the advantages of different paradigms within a single
system.
• Syntactic and structural techniques utilize structural features and syntactic rules to
recognize patterns (characters).
♦ They are used for recognition of complex patterns which are represented in
terms of the interrelationships between simple sub-patterns called primitives.
♦ A large number of complex patterns can be described by a small number of
primitives and their spatial relationships.
♦ This provides a description of how a given character is constructed from the
given set of primitives.
• Recognition is performed by parsing the sub-patterns according to predefined rules and a
grammar; the recognition accuracy depends on the successful extraction of
primitives and their relationships.
• The choice of primitives is application dependent and relies on a general
understanding of the language and the script, as well as on technical and mathematical
model building.
• The relationships of primitive structural features are represented by means of symbolic
data structures such as strings, trees, and graphs.
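A toy illustration of the syntactic idea, using strings as the symbolic data structure and regular expressions as a simple (regular) grammar. Everything here is hypothetical: the primitive alphabet, the fixed scanning order, and the rules for the Latin letters are invented for illustration. Real systems describe spatial relationships with richer structures such as trees and graphs, which a linear string of primitives cannot fully capture.

```python
import re

# Hypothetical primitive alphabet: v = vertical stroke, h = horizontal stroke,
# o = closed loop. A character is described by the string of primitives met
# in a fixed scanning order (top-to-bottom, left-to-right).
GRAMMAR = {
    "T": r"hv",      # a horizontal bar above a vertical stem
    "L": r"vh",      # a vertical stem ending in a horizontal foot
    "E": r"vh{3}",   # a stem with three horizontal arms
    "P": r"vo",      # a stem with a loop
    "I": r"v+",      # one or more vertical pieces (e.g. a broken stem)
}

def recognize(primitives):
    """Return the first class whose rule parses the whole primitive string, else None."""
    for label, rule in GRAMMAR.items():
        if re.fullmatch(rule, primitives):
            return label
    return None  # no rule parses the input: rejection

if __name__ == "__main__":
    for s in ["hv", "vhhh", "vvv", "oo"]:
        print(s, recognize(s))
```

Because each rule must parse the entire primitive string, a failed primitive extraction (for example a missed arm in "vhh") leads to a different class or a rejection, mirroring the dependence of accuracy on primitive extraction noted above.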
• Artificial neural networks (ANNs) are pattern recognition techniques inspired by
the operation of neurons in biological systems.
♦ Although established in the 1940s, ANNs have been widely applied in
the field of pattern recognition only since the 1980s.
• ANNs consist of a large number of highly interconnected processing elements called neurons,
which are organized into three types of layers:
♦ Input Layer
Receives the data of the unknown pattern.
♦ Hidden Layer
Contains many of the neurons in various interconnected structures
hidden from the outside view.
♦ Output Layer
Provides an interface for generating the recognition result.
• ANNs have been found to be particularly effective for handwritten character recognition.
• Samples, pixels, or features can be used as inputs to the neural network system.
• Like statistical classification methods, neural networks require training on samples,
from which they learn how to classify new samples.
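A minimal NumPy sketch of the three-layer structure and training described above: an input layer receiving a feature vector, one hidden layer of tanh neurons, and a single sigmoid output neuron, trained by gradient descent with backpropagation on labelled samples. The layer sizes, learning rate, epoch count, and the synthetic two-class data are illustrative assumptions; a character recognizer would use real features or pixels and one output neuron per class.

```python
import numpy as np

class TinyMLP:
    """Input layer -> one hidden layer (tanh) -> one output neuron (sigmoid)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 1 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 1 / np.sqrt(n_hidden), (n_hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, X):
        self.h = np.tanh(X @ self.W1 + self.b1)                   # hidden-layer activations
        return 1 / (1 + np.exp(-(self.h @ self.W2 + self.b2)))    # output probability

    def train(self, X, y, lr=0.5, epochs=500):
        y = y.reshape(-1, 1)
        for _ in range(epochs):
            p = self.forward(X)
            d_out = (p - y) / len(X)                          # cross-entropy grad w.r.t. output logit
            d_hidden = (d_out @ self.W2.T) * (1 - self.h**2)  # backpropagate through tanh
            self.W2 -= lr * self.h.T @ d_out
            self.b2 -= lr * d_out.sum(axis=0)
            self.W1 -= lr * X.T @ d_hidden
            self.b1 -= lr * d_hidden.sum(axis=0)

    def predict(self, X):
        return (self.forward(X)[:, 0] >= 0.5).astype(int)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic two-class data standing in for extracted feature vectors.
    X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    net = TinyMLP(n_in=2, n_hidden=8)
    net.train(X, y)
    print("training accuracy:", (net.predict(X) == y).mean())
```

Here the training samples play the role described in the last bullet: the weights are adjusted only from labelled examples, and the trained network is then applied to new inputs with `predict`.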