unit1
The term automata, derived from the Greek word “αὐτόματα” meaning “self-acting”,
is the plural of automaton, which may be defined as an abstract self-propelled
computing device that follows a predetermined sequence of operations
automatically.
A Deterministic Finite Automaton (DFA) may be defined as the type of finite
automaton in which, for every state and input symbol, we can determine exactly one
state to which the machine will move. It is called deterministic because the next
state is uniquely determined, and finite because it has a finite number of states.
Example of DFA —
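As an illustration, here is a minimal Python sketch of a DFA. The machine itself (states q0 and q1, the alphabet {0, 1}, and the language "strings that end in 1") is an assumption chosen for this sketch and is not taken from the original example.

```python
# Hypothetical DFA over the alphabet {0, 1} that accepts strings ending in 1.
# The state names and the transition table are illustrative assumptions.
DFA_TRANSITIONS = {
    ("q0", "0"): "q0",
    ("q0", "1"): "q1",
    ("q1", "0"): "q0",
    ("q1", "1"): "q1",
}
DFA_START = "q0"
DFA_FINAL = {"q1"}


def dfa_accepts(text):
    """Return True if the DFA is in a final state after reading all of text."""
    state = DFA_START
    for symbol in text:
        key = (state, symbol)
        if key not in DFA_TRANSITIONS:
            return False  # symbol outside the alphabet: reject
        # Deterministic: exactly one next state for each (state, symbol) pair.
        state = DFA_TRANSITIONS[key]
    return state in DFA_FINAL


print(dfa_accepts("0101"))  # True
print(dfa_accepts("10"))    # False
```

Because each (state, symbol) pair maps to a single state, the simulation only ever has to track one current state.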
A Non-deterministic Finite Automaton (NDFA) may be defined as the type of finite
automaton in which, for a given state and input symbol, we cannot determine a
unique state to which the machine will move, i.e. the machine can move to any one
of a set of possible states. It is called non-deterministic for this reason, and
finite because it has a finite number of states.
Example of NDFA —
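As an illustration, here is a minimal Python sketch of an NDFA, simulated by tracking the set of all states the machine could be in. The machine (states q0, q1, q2 over {0, 1}, accepting strings that end in 01) is an assumption chosen for this sketch and is not taken from the original example.

```python
# Hypothetical NDFA over {0, 1} that accepts strings ending in "01".
# From q0 on input "0" the machine may stay in q0 or move to q1, so the
# next state is not uniquely determined.
NFA_TRANSITIONS = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
NFA_START = "q0"
NFA_FINAL = {"q2"}


def nfa_accepts(text):
    """Return True if at least one path through the NDFA ends in a final state."""
    current = {NFA_START}
    for symbol in text:
        next_states = set()
        for state in current:
            # A missing entry means this path dies on this symbol.
            next_states |= NFA_TRANSITIONS.get((state, symbol), set())
        current = next_states
    return bool(current & NFA_FINAL)


print(nfa_accepts("1101"))  # True
print(nfa_accepts("010"))   # False
```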
Finite Automata in NLP
Language Recognizer
Many tasks need a language-recognizing mechanism, for example spelling
checking, morphological analysis and language identification. Finite state
machines are quite useful as language recognizers. For a given word, an NFA that
recognizes the word can be designed easily. For example, an NFA for the words ‘boy’
and ‘bat’ is shown in the figure below. Similarly, an NFA can be designed for every
word, and the different NFAs can be combined to form a spelling checker or to
compile a dictionary for a language.
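A word-recognizing NFA of this kind can be sketched in Python as follows. This is only one possible construction, assumed for illustration: every word gets its own chain of states leaving a shared start state (state 0), and the function names build_word_nfa and recognizes are hypothetical.

```python
from collections import defaultdict


def build_word_nfa(words):
    """Build an NFA with one chain of states per word, all leaving state 0."""
    transitions = defaultdict(set)
    finals = set()
    next_id = 1  # state 0 is the shared start state
    for word in words:
        state = 0
        for ch in word:
            transitions[(state, ch)].add(next_id)
            state = next_id
            next_id += 1
        finals.add(state)  # the end of each chain is a final state
    return transitions, finals


def recognizes(nfa, word):
    """Return True if some path through the NFA spells out the whole word."""
    transitions, finals = nfa
    current = {0}
    for ch in word:
        current = {t for s in current for t in transitions.get((s, ch), ())}
    return bool(current & finals)


word_nfa = build_word_nfa(["boy", "bat"])
print(recognizes(word_nfa, "boy"))  # True
print(recognizes(word_nfa, "bay"))  # False
```

The machine is non-deterministic because the start state has two outgoing transitions labelled b, one for each word.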
For each category of words, we can form a separate NFA and then combine them
using transitions. For example, nouns and their plurals can be recognized through
one NFA, verbs and their different forms can be recognized through another
NFA, and finally the two NFAs can be combined. The figure below shows the NFA
for some words and their morphological variations.
NFA for some words and their morphological variations
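The idea of one NFA per word category can be sketched as below. Everything specific here is an assumption made for illustration: the stems and suffixes, the helper names add_path, add_category and accepts, and the choice to combine the two categories by letting them share a single start state.

```python
def add_path(transitions, source, text, target, tag):
    """Add a chain of transitions spelling text from source to target."""
    state = source
    for i, ch in enumerate(text):
        # The last character lands on target; earlier ones get fresh states.
        nxt = target if i == len(text) - 1 else (tag, text, i)
        transitions.setdefault((state, ch), set()).add(nxt)
        state = nxt


def add_category(transitions, finals, name, stems, suffixes):
    """Add a sub-NFA accepting any stem followed by any allowed suffix."""
    stem_end = (name, "stem_end")
    final = (name, "final")
    for stem in stems:
        add_path(transitions, "start", stem, stem_end, (name, "stem"))
    for suffix in suffixes:
        if suffix:
            add_path(transitions, stem_end, suffix, final, (name, "suffix"))
            finals.add(final)
        else:
            finals.add(stem_end)  # the bare stem is itself a word


def accepts(transitions, finals, word):
    """Simulate the combined NFA by tracking the set of reachable states."""
    current = {"start"}
    for ch in word:
        current = {t for s in current for t in transitions.get((s, ch), ())}
    return bool(current & finals)


transitions, finals = {}, set()
# Hypothetical lexicon: nouns take an optional plural -s, verbs take -s, -ed, -ing.
add_category(transitions, finals, "noun", ["boy", "bat"], ["", "s"])
add_category(transitions, finals, "verb", ["walk", "play"], ["", "s", "ed", "ing"])

print(accepts(transitions, finals, "boys"))     # True
print(accepts(transitions, finals, "playing"))  # True
print(accepts(transitions, finals, "boyed"))    # False
```

Keeping the noun and verb states separate is what stops a verb suffix from attaching to a noun stem.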
A finite state generator starts in some start state and then tries to reach a final
state by making transitions from one state to another. Every time it makes such a
transition, it emits (or writes or generates) a symbol.
So, what does the generator in the pictures say? It laughs. It generates sequences of
symbols of the form ha! or haha! or hahaha! or hahahaha! and so on. Why does it
behave like that? Well, it first has to make a transition emitting h. The state that it
reaches through this transition is not a final state, so it has to keep going,
emitting an a. Here it has two possibilities: it can either follow the ! arrow,
emitting ! and then stopping in the final state (but remember, it can’t look ahead to
see that it would reach a final state with the ! transition), or it can follow the h
arrow, emitting an h and going back to the state it just came from.
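The behaviour just described can be sketched in Python. The state numbers 1 to 4 are an assumption (the text does not name the states), and rather than choosing transitions at random, this sketch enumerates every output the generator could emit up to a given length.

```python
# Laughing machine as a transition table: state -> list of (symbol, next state).
# State numbering is assumed: 1 is the start state and 4 is the final state.
LAUGH_TRANSITIONS = {
    1: [("h", 2)],
    2: [("a", 3)],
    3: [("!", 4), ("h", 2)],  # either stop with "!" or go back for another "ha"
    4: [],
}
LAUGH_START = 1
LAUGH_FINAL = {4}


def generate(max_length):
    """Return every string of at most max_length symbols the machine can emit."""
    results = []
    stack = [(LAUGH_START, "")]
    while stack:
        state, emitted = stack.pop()
        if state in LAUGH_FINAL:
            results.append(emitted)
        if len(emitted) < max_length:
            for symbol, target in LAUGH_TRANSITIONS[state]:
                stack.append((target, emitted + symbol))
    return sorted(results, key=len)


print(generate(7))  # ['ha!', 'haha!', 'hahaha!']
```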
Finite state generators can be thought of as directed graphs, and in fact they are
usually drawn that way. Here is our laughing machine, drawn the way we will draw
finite state generators from now on:
The approach to spelling rules that is described here involves the use of finite state
transducers (FSTs). Rather than jumping straight into this, we will briefly consider
the simpler finite state automata and how they can be used in a simple recogniser.
Suppose we want to recognise dates (just day and month pairs) written in the
format day/month. The day and the month may be expressed as one or two digits
(e.g. 11/2, 1/12 etc). This format corresponds to the following simple FSA, where
each character corresponds to one transition:
This is a non-deterministic FSA: for instance, an input starting with the digit 3 will
move the FSA to both state 2 and state 3. This corresponds to a local ambiguity:
i.e., one that will be resolved by subsequent context. By convention, there must be
no ‘left over’ characters when the system is in the final state.
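The date recogniser can be sketched in Python as follows. Only the behaviour on states 1, 2 and 3 is stated in the text; the remaining states (4, 5 and 6, with 6 as the final state) and the exact digit sets on each transition are assumptions made to complete the sketch. Like the FSA, it checks the format only, not whether the day and month are valid.

```python
DIGITS = "0123456789"


def build_date_nfa():
    """Build the transition table: (state, character) -> set of next states."""
    table = {}

    def add(source, symbols, target):
        for symbol in symbols:
            table.setdefault((source, symbol), set()).add(target)

    add(1, "0123", 2)   # first digit of a two-digit day
    add(1, DIGITS, 3)   # a one-digit day
    add(2, DIGITS, 3)   # second digit of a two-digit day
    add(3, "/", 4)      # separator
    add(4, "01", 5)     # first digit of a two-digit month (assumed layout)
    add(4, DIGITS, 6)   # a one-digit month (assumed layout)
    add(5, "012", 6)    # second digit of a two-digit month (assumed layout)
    return table


DATE_NFA = build_date_nfa()
DATE_FINAL = {6}


def trace_date(text):
    """Return the set of possible states after each character of text."""
    current = {1}
    trace = [current]
    for ch in text:
        current = {t for s in current for t in DATE_NFA.get((s, ch), ())}
        trace.append(current)
    return trace


def is_date(text):
    """Accept only if a final state is reached with no characters left over."""
    return bool(trace_date(text)[-1] & DATE_FINAL)


print(trace_date("3")[1] == {2, 3})  # True: the local ambiguity after "3"
print(is_date("11/2"))               # True
print(is_date("11/2x"))              # False
```

Tracking a set of states is how the simulation carries the local ambiguity forward until later characters resolve it.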