Background
Terminology to know
1. Finite State Automata
2. Regular Expressions
3. Context-Free Grammar / Phrase Structure Grammar
4. Dependency Grammar
5. Corpus
6. Annotated corpus
7. Other Lexical resources
Formal language theory
• An alphabet is a finite, non-empty set.
• The elements of the set are called symbols.
• A finite sequence of symbols a1a2...an from an alphabet is called a string.
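The definitions above can be sketched in a few lines of Python. This is only an illustration: the names `SIGMA` and `is_string_over` are hypothetical, and Python strings stand in for finite sequences of symbols.

```python
def is_string_over(alphabet, s):
    """Return True if s is a string over the given alphabet.

    An alphabet must be a finite, non-empty set of symbols; a string is a
    finite sequence of symbols drawn from it (the empty sequence counts).
    """
    if not alphabet:
        raise ValueError("an alphabet must be non-empty")
    return all(symbol in alphabet for symbol in s)


# Hypothetical example alphabet with two symbols.
SIGMA = frozenset({"a", "b"})

print(is_string_over(SIGMA, "abba"))  # True: every symbol is in SIGMA
print(is_string_over(SIGMA, "abc"))   # False: 'c' is not in SIGMA
```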
Finite State Automata
FSAs recognize the strings described by regular expressions, for example strings of the form /baa+!/:
• baa!
• baaa!
• baaaa!
[Figure: FSA with states q0, q1, q2, q3, q4; transitions labeled b, a, a, ! in sequence, plus a self-loop labeled a]
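A minimal sketch of such an automaton as a transition table, assuming the states run q0 to q4 in order, q4 is the only accepting state, and the self-loop on a sits at q3 (the usual textbook drawing; the figure residue does not settle this). The names `TRANSITIONS`, `ACCEPTING` and `recognize` are hypothetical.

```python
# Transition table: (current state, input symbol) -> next state.
# Assumption: the self-loop on 'a' is at state 3.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # loop: any number of additional a's
    (3, "!"): 4,
}
ACCEPTING = {4}


def recognize(s):
    """Return True if the automaton accepts s, starting from state 0."""
    state = 0
    for symbol in s:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING


for candidate in ["baa!", "baaa!", "baaaa!", "ba!", "baa"]:
    print(candidate, recognize(candidate))
```

The first three strings are accepted; `ba!` is rejected because state 2 has no transition on `!`, and `baa` is rejected because the input ends in a non-accepting state.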
Regular Expressions
Regular Expression: A way of describing the structure of the strings in a
language (a formula in algebraic notation)
• Language (over alphabet Σ={a, b})
• L={x|x starts and ends with ‘a’}.
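• Regular expression a·(a|b)∗·a is a pattern that captures this
structure and matches every string in L of length two or more; the
one-symbol string a is also in L, so the full pattern is a | a·(a|b)∗·a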
• String: Any sequence of characters
• Letters, numbers, spaces, tabs, punctuation marks
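The language L over Σ={a, b} can be checked against a regular expression with Python's `re` module. This is a sketch under assumptions: `in_language` is a hypothetical direct test of the set definition, and `STRICT` and `WITH_SINGLE` are hypothetical names for the pattern a·(a|b)∗·a and a variant that also allows the one-symbol string a.

```python
import re

# a·(a|b)*·a written in Python regex syntax.
STRICT = re.compile(r"a(a|b)*a")
# Variant with an extra alternative for the single-symbol string "a".
WITH_SINGLE = re.compile(r"a(a|b)*a|a")


def in_language(x):
    """Direct test of L = {x | x starts and ends with 'a'} over {a, b}."""
    return set(x) <= {"a", "b"} and x.startswith("a") and x.endswith("a")


for s in ["aa", "aba", "abba", "a", "ab", "ba", ""]:
    print(
        repr(s),
        in_language(s),
        bool(STRICT.fullmatch(s)),
        bool(WITH_SINGLE.fullmatch(s)),
    )
```

`fullmatch` is used so that the whole string must match the pattern, not just a prefix. The two patterns differ only on the string `a`, which starts and ends with a but is too short for a·(a|b)∗·a.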
Automata in Language
Automata are computational devices for solving language recognition
problems