
Computational Linguistics: Dr. Dina Khattab

The document provides an example of a finite state automaton with states Q={1,2,3}, initial state I={1}, final states F={3}, alphabet T={a,b}, and transitions E between states. It explains that

Uploaded by

Dalia Ahmed

Computational Linguistics

Lecture 2

Dr. Dina Khattab


Representations for languages
• We will discuss the two principal methods for defining languages: the generator and the recognizer.
• We will focus on one particular class of generators (grammars) and of recognizers (automata).
• Regular languages are the simplest formal languages:
  • Their generators are the regular expressions
  • Their recognizers are the finite state automata
Concepts and Notations
Set: An unordered collection of unique elements
• S1 = { a, b, c }, S2 = { 0, 1, …, 19 }
• empty set: ∅
• membership: x ∈ S
• union: S1 ∪ S2 = { a, b, c, 0, 1, …, 19 }
• universe of discourse: U
• subset: S1 ⊆ U
• complement: if U = { a, b, …, z }, then S1' = { d, e, …, z } = U - S1

Alphabet: A finite set of symbols
• Examples: Σ1 = { a, b }, Σ2 = { Spring, Summer, Autumn, Winter }

String/word: A sequence of zero or more symbols from an alphabet
• The empty string: ε
Concepts and Notations
Language: A set of strings over an alphabet
• Also known as a formal language; may not bear any resemblance to a
natural language, but could model a subset of one.
• The language comprising all strings over an alphabet Σ is written as: Σ*
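The set of all strings over an alphabet is infinite, but its shortest members can be listed. The sketch below is only an illustration: the helper name `strings_up_to` and the length bound are assumptions, not part of the lecture. It enumerates the strings over { a, b } up to length 2 and then picks out one language, the strings ending in b, as a subset.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    symbols = sorted(alphabet)
    for n in range(max_len + 1):
        for combo in product(symbols, repeat=n):
            yield "".join(combo)

sigma = {"a", "b"}

# A finite slice of the set of all strings over sigma; "" is the empty string.
short_strings = list(strings_up_to(sigma, 2))

# A language over sigma is any set of such strings, e.g. those ending in "b".
ends_in_b = [w for w in short_strings if w.endswith("b")]

print(short_strings)
print(ends_in_b)
```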
Graph: A set of nodes (or vertices), some or all of which may be connected by edges.
• [Figure: an example graph on nodes 1, 2, 3, and an example directed graph on nodes a, b, c]
Finite State Automata (FSA)
Finite State Automata
• Language Recognition Problem: does a given word belong to a language?
• i.e. given a language description and a string, is there an algorithm which will answer yes or no correctly?
Finite State Automata
• A finite state automaton is an abstract model of a simple machine (or computer), i.e. a computational device to solve the language recognition problem.
• The machine can be in one of a finite number of states. It receives symbols as input, and receiving a particular input in a particular state moves the machine to a specified new state.
• Certain states are finishing states, and if the machine is in one of those states when the input ends, it has ended successfully (or has accepted the input).
FSA: Formal Definition
A Finite State Automaton (FSA) is a 5-tuple (Q, I, F, T, E) where:
Q = states: a finite set;
I = initial states: a nonempty subset of Q;
F = final states: a subset of Q;
T = an alphabet;
E = edges: a subset of Q × T × Q.

An FSA can be represented by a labelled, directed graph
= set of nodes (some final/initial) +
directed arcs (arrows) between nodes +
each arc has a label from the alphabet.

Example: formal definition of A1
Q = {1, 2, 3}
I = {1}
F = {3}
T = {a, b}
E = { (1,a,2), (1,b,3), (2,b,3), (3,a,2), (3,b,3) }
[Figure: state diagram of A1]
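The 5-tuple maps directly onto a small data structure. The following is a minimal sketch, assuming Python sets for Q, I, F, T and a set of triples for E. The names `FSA` and `is_well_formed` are illustrative and do not come from the lecture. The check mirrors the conditions in the definition: I is a nonempty subset of Q, F is a subset of Q, and every edge lies in Q × T × Q.

```python
from typing import NamedTuple

class FSA(NamedTuple):
    Q: frozenset  # states
    I: frozenset  # initial states
    F: frozenset  # final states
    T: frozenset  # alphabet
    E: frozenset  # edges: (source state, symbol, target state)

def is_well_formed(m):
    """Check the side conditions stated in the formal definition."""
    return (
        len(m.I) > 0
        and m.I <= m.Q
        and m.F <= m.Q
        and all(p in m.Q and t in m.T and q in m.Q for (p, t, q) in m.E)
    )

# The automaton A1 from the example.
A1 = FSA(
    Q=frozenset({1, 2, 3}),
    I=frozenset({1}),
    F=frozenset({3}),
    T=frozenset({"a", "b"}),
    E=frozenset({(1, "a", 2), (1, "b", 3), (2, "b", 3), (3, "a", 2), (3, "b", 3)}),
)

print(is_well_formed(A1))
```

Storing E as a set of triples, rather than as a function from (state, symbol) to state, keeps the sketch faithful to the definition, which allows several edges with the same source and label.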
What does it mean to accept a string/language?
If the FSA is in a final (or accepting) state after all input symbols have been consumed, then the string is accepted (or recognized); otherwise it is rejected.

e.g. String: abb
In A1, abb takes the machine through states 1, 2, 3, 3; state 3 is final, so abb is accepted.
Give other examples!
[Figure: state diagram of A1]
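The acceptance condition can be tried out on A1 with a few lines of code. This is a sketch under stated assumptions, not the lecture's own algorithm: the function name `accepts` is made up here, and it tracks a set of current states because the definition allows several initial states and several edges per state and symbol.

```python
def accepts(edges, initial, final, word):
    """Return True if some path labelled `word` leads from an initial to a final state."""
    current = set(initial)
    for symbol in word:
        # Follow every edge that leaves a current state and carries this symbol.
        current = {q for (p, t, q) in edges if p in current and t == symbol}
    # Accepted if at least one reachable state is final once the input is consumed.
    return bool(current & set(final))

# Edges, initial states and final states of A1.
E = {(1, "a", 2), (1, "b", 3), (2, "b", 3), (3, "a", 2), (3, "b", 3)}
I = {1}
F = {3}

for word in ["abb", "b", "ba", "aab", ""]:
    print(repr(word), accepts(E, I, F, word))
```

If no edge matches a symbol, the set of current states becomes empty and stays empty, so the string is rejected without any special case.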
The language accepted by A1 is the set of strings of a's and b's which end in b, and in which no two a's are adjacent.
[Figure: state diagram of A1]
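This description of A1's language can be checked mechanically for short strings. The sketch below compares the automaton against the description for every string over { a, b } up to an arbitrary length bound of 8. It is an exhaustive test up to that bound, not a proof, and the helper names are assumptions made for illustration.

```python
from itertools import product

# Edges of A1; state 1 is initial and state 3 is final.
E = {(1, "a", 2), (1, "b", 3), (2, "b", 3), (3, "a", 2), (3, "b", 3)}

def accepted_by_a1(word):
    """Simulate A1 on `word`, tracking the set of reachable states."""
    current = {1}
    for symbol in word:
        current = {q for (p, t, q) in E if p in current and t == symbol}
    return 3 in current

def matches_description(word):
    """Ends in b, and no two a's are adjacent."""
    return word.endswith("b") and "aa" not in word

# Collect every string (length 0..8) on which the two tests disagree.
mismatches = [
    "".join(chars)
    for n in range(9)
    for chars in product("ab", repeat=n)
    if accepted_by_a1("".join(chars)) != matches_description("".join(chars))
]

print(mismatches)
```

An empty list means the automaton and the description agree on every string tested.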
Finite-state Automata
An FSA defines a regular language over an alphabet Σ:
• ∅ is a regular language:
  [Diagram: a single state q0]
• Any symbol from Σ is a regular language:
  Σ = { a, b, c }    q0 --b--> q1
• The concatenation of two regular languages is a regular language:
  q0 --b--> q1  followed by  q0 --c--> q1
  Σ = { a, b, c }    q0 --b--> q1 --c--> q2
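The concatenation case can be illustrated on the two one-symbol languages { b } and { c }. The sketch below assumes that q0 is the initial state and q2 the only final state of the combined machine, which the extracted diagram does not show explicitly. The function names are illustrative.

```python
def concatenate(l1, l2):
    """Concatenation of two finite languages: every x from l1 followed by every y from l2."""
    return {x + y for x in l1 for y in l2}

# Combined machine q0 --b--> q1 --c--> q2 (q2 assumed to be the only final state).
TRANSITIONS = {("q0", "b"): "q1", ("q1", "c"): "q2"}

def accepts_bc(word):
    state = "q0"
    for symbol in word:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no arc for this symbol: reject
            return False
    return state == "q2"

print(concatenate({"b"}, {"c"}))
print([w for w in ["", "b", "c", "bc", "cb", "bcc"] if accepts_bc(w)])
```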
FSA Example
Consider the following FSA
T: {0, 1}
Q: {s1, s2}
I: s1
F: s2
E:
        0     1
  s1    s1    s2
  s2    s2    s1
FSA Example
[Figure: state diagram of the FSA, with states s1 and s2 and arcs labelled 0 and 1]
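Because this automaton has exactly one next state for every state and symbol, its transition table can be run directly. The following is a minimal sketch, assuming the table is stored as a dictionary of rows. The names `TABLE` and `accepts` are illustrative, and the sample strings are arbitrary.

```python
# Transition table of the example: TABLE[state][symbol] gives the next state.
TABLE = {
    "s1": {"0": "s1", "1": "s2"},
    "s2": {"0": "s2", "1": "s1"},
}

def accepts(word, start="s1", final=frozenset({"s2"})):
    """Run the table-driven automaton on `word` and report whether it ends in a final state."""
    state = start
    for symbol in word:
        state = TABLE[state][symbol]  # raises KeyError for symbols outside {0, 1}
    return state in final

for word in ["", "1", "10", "11"]:
    print(repr(word), accepts(word))
```

Keeping one dictionary per state mirrors the rows of the table, so the code can be checked against the slide line by line.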
FSA Example
Determine which strings are accepted and which are rejected:
• 01101
• 011011
• 00000
• 11111
• 10101010
Assignment (due 14th Oct.)
Consider the following FSA
T: {a, b, c}
Q: {s1, s2, s3}
I: s1
F: s2, s3
E:
        a     b     c
  s1    s1    s2    s2
  s2    s1    s2    s3
  s3    s3    s1    s2
Determine which strings are accepted and which are rejected:
• abb
• abba
• bcbccc
• caaabbc
