Lexical Analysis I: Prof. Bodik CS 164 Lecture 2 1
[Figure: the lexer produces a stream of tokens, which is consumed by the parser (PA3) and then the checker (PA4).]
• Lexer input:
\tif(i==j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
• Lexer output:
a sequence of token-lexeme pairs:
(Whitespace, “\t”),
(Keyword, “if”),
(OpenPar, “(”),
(Identifier, “i”),
(Relation, “==”),
(Identifier, “j”),
…
• Lexeme, according to Webster:
– “an item in the vocabulary of a language”
• Lexemes in cs164:
– are the strings into which the input string is partitioned;
– serve as attributes of tokens:
(Whitespace, “\t”),
(Keyword, “if”), (Keyword, “class”)
(OpenPar, “(”),
(Identifier, “i”), (Identifier, “Foo”)
(Relation, “==”), (Relation, “>”)
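The partition property can be checked on the pairs shown earlier for the start of the lexer input. This is only a small sketch in Python; the variable names are illustrative and not part of the lecture.

```python
source = "\tif(i==j)\n\t\tz = 0;\n\telse\n\t\tz = 1;"

# The token-lexeme pairs listed for the start of the input.
pairs = [
    ("Whitespace", "\t"),
    ("Keyword", "if"),
    ("OpenPar", "("),
    ("Identifier", "i"),
    ("Relation", "=="),
    ("Identifier", "j"),
]

# Concatenating the lexemes in order gives back a prefix of the input,
# because the lexemes partition the input with no gaps and no overlaps.
prefix = "".join(lexeme for _token, lexeme in pairs)
print(repr(prefix))
print(source.startswith(prefix))
```

The token is the category; the lexeme is the piece of input that was matched, which is why several lexemes (“if”, “class”) can share one token (Keyword).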
• given:
– D: description of the lexical part of the input language L
– in our case (L=Decaf)
• deliver:
– the lexer for the language L
– such that you produce the lexer directly from D
[Figure: the lexer (PA2), produced from the description, maps the input
\tif(i==j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
to the token-lexeme pairs (Whitespace, “\t”), (Keyword, “if”), (OpenPar, “(”), (Identifier, “i”), (Relation, “==”), (Identifier, “j”), …]
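A minimal sketch of "produce the lexer directly from D", assuming D is an ordered list of (token name, regular expression) pairs. The names `DESCRIPTION` and `make_lexer`, all of the patterns, and the token names Number, Assign, ClosePar and Semicolon are assumptions made for illustration; this is not the Decaf specification. The sketch relies on Python's `re` module, which picks the first alternative that matches rather than the longest one, so rule order matters here.

```python
import re

# Hypothetical description D. Only Whitespace, Keyword, OpenPar, Identifier
# and Relation are token names from the lecture; the rest are assumptions.
DESCRIPTION = [
    ("Whitespace", r"[ \t\n]+"),
    ("Keyword", r"(?:if|else|class)\b"),
    ("Identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("Number", r"[0-9]+"),
    ("Relation", r"==|>"),
    ("Assign", r"="),
    ("OpenPar", r"\("),
    ("ClosePar", r"\)"),
    ("Semicolon", r";"),
]

def make_lexer(description):
    """Build a lexer function from the description; earlier rules win."""
    master = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in description))

    def lex(text):
        pairs, pos = [], 0
        while pos < len(text):
            m = master.match(text, pos)
            if m is None:
                raise ValueError(f"no token matches at position {pos}")
            # lastgroup is the name of the rule that matched.
            pairs.append((m.lastgroup, m.group()))
            pos = m.end()
        return pairs

    return lex

lex = make_lexer(DESCRIPTION)
source = "\tif(i==j)\n\t\tz = 0;\n\telse\n\t\tz = 1;"
for pair in lex(source)[:6]:
    print(pair)
```

Changing the language then means editing `DESCRIPTION` only; `make_lexer` stays the same, which is the point of generating the lexer from D.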
Outline (continued)
• lexeme: a left parenthesis; token: OpenPar
• Regular expressions
– Simple and useful theory
– Easy to understand
– Efficient implementations
• Union
L(A | B) = { s | s ∈ L(A) or s ∈ L(B) }
• Examples:
‘if’ | ‘then’ | ‘else’ = { “if”, “then”, “else” }
‘0’ | ‘1’ | … | ‘9’ = { “0”, “1”, …, “9” }
(note the … are just an abbreviation in this slide)
• Another example:
(‘0’ | ‘1’) (‘0’ | ‘1’) = { “00”, “01”, “10”, “11” }
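The examples above can be reproduced by treating each language as a Python set of strings. This is a sketch for finite languages only; the function names `union` and `concat` are illustrative, and the last example is read here as the concatenation of two expressions.

```python
def union(a, b):
    """L(A | B): the strings that are in L(A) or in L(B)."""
    return a | b

def concat(a, b):
    """L(AB): every string of L(A) followed by every string of L(B)."""
    return {s + t for s in a for t in b}

keywords = union(union({"if"}, {"then"}), {"else"})
bit = union({"0"}, {"1"})

print(sorted(keywords))
print(sorted(concat(bit, bit)))
```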
More Compound Regular Expressions
digit = ‘0’ | ‘1’ | ‘2’ | ‘3’ | ‘4’ | ‘5’ | ‘6’ | ‘7’ | ‘8’ | ‘9’
number = digit digit*
Abbreviation: A+ = A A*
(‘ ’ | ‘\t’ | ‘\n’)+
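The same definitions can be tried out with Python's `re` module, as a rough sketch. The constant names are illustrative; `[0-9]` is Python shorthand for the ten-way union that defines digit.

```python
import re

DIGIT = "[0-9]"                  # '0' | '1' | ... | '9'
NUMBER = f"{DIGIT}{DIGIT}*"      # digit digit*
NUMBER_PLUS = f"{DIGIT}+"        # the same language, written with A+ = A A*
BLANKS = r"(?: |\t|\n)+"         # (' ' | '\t' | '\n')+

for text in ["164", "0", "", "16a"]:
    with_star = re.fullmatch(NUMBER, text) is not None
    with_plus = re.fullmatch(NUMBER_PLUS, text) is not None
    print(repr(text), with_star, with_plus)

print(re.fullmatch(BLANKS, " \t\n") is not None)
```

Both forms of number accept and reject the same strings, which is what the abbreviation A+ = A A* promises.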
• Consider an e-mail address
Σ = letters ∪ { ., @ }
name = letter+
address = name ‘@’ name (‘.’ name)*
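A sketch of the address definition in Python's `re` syntax. The choice of `[A-Za-z]` for letter and the sample addresses are assumptions made for illustration.

```python
import re

LETTER = "[A-Za-z]"                           # assumed meaning of "letter"
NAME = f"{LETTER}+"                           # name = letter+
ADDRESS = rf"{NAME}@{NAME}(?:\.{NAME})*"      # name '@' name ('.' name)*

def is_address(text):
    """True if the whole text is in the language of ADDRESS."""
    return re.fullmatch(ADDRESS, text) is not None

# Hypothetical sample inputs.
for text in ["user@example.org", "user@example", "user@", "u.ser@example.org"]:
    print(text, is_address(text))
```

Note that the definition allows no ‘.’ before the ‘@’ and permits zero ‘.’ name parts after it, so it is a teaching example rather than a validator for real addresses.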
• Finite automata
– Deterministic Finite Automata (DFAs)
– Non-deterministic Finite Automata (NFAs)
• Transition
s1 --a--> s2
• Is read
In state s1 on input “a” go to state s2
• If end of input
– If in accepting state => accept
– Otherwise => reject
• A state
• An accepting state
• A transition, labeled with an input symbol such as a
• A finite automaton accepts a string if we can follow transitions labeled with the characters in the string from the start to some accepting state
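The acceptance rule can be sketched as a short loop. The automaton below is a hypothetical two-state example over {0, 1}, not one of the automata drawn in the lecture, and the function name `accepts` is illustrative.

```python
def accepts(transitions, start, accepting, text):
    """Follow one transition per input character, then check the final state."""
    state = start
    for ch in text:
        if (state, ch) not in transitions:
            return False          # no transition possible: reject
        state = transitions[(state, ch)]
    return state in accepting     # end of input: accept only in an accepting state

# Hypothetical automaton: in state s1 on input "1" go to state s2, and so on.
TRANSITIONS = {
    ("s1", "0"): "s1",
    ("s1", "1"): "s2",
    ("s2", "0"): "s1",
    ("s2", "1"): "s2",
}

for text in ["101", "10", "", "12"]:
    print(repr(text), accepts(TRANSITIONS, "s1", {"s2"}, text))
```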
• Alphabet {0,1}
• What language does this recognize?
[Figure: state diagram of a finite automaton over {0, 1}]
• Alphabet still { 0, 1 }
[Figure: state diagram of a finite automaton over {0, 1}]
• Input: 1 0 1
[Figure: state diagram of a DFA over {0, 1}]
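• For ε: a single ε-transition from the start state to the accepting state
• For input a: a single transition labeled a
• For AB: the NFA for A followed by the NFA for B, joined by an ε-transition
• For A | B: a new start state with ε-transitions into the NFAs for A and B, whose accepting states lead by ε-transitions to a new accepting state
• For A*: the NFA for A, with ε-transitions that allow skipping it and repeating it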
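A compact sketch of this regular-expression-to-NFA construction, in one common variant (often called Thompson's construction); the exact placement of ε-edges may differ from the lecture's diagrams. Regular expressions are written as nested tuples, and the names `build`, `closure`, `matches` and the sample expression `R` are all assumptions made for illustration.

```python
from itertools import count

def build(regex, edges, fresh):
    """Return (start, accept) of an NFA fragment; a label of None is an ε-move."""
    kind = regex[0]
    if kind == "eps":
        s, f = next(fresh), next(fresh)
        edges.append((s, None, f))
    elif kind == "sym":
        s, f = next(fresh), next(fresh)
        edges.append((s, regex[1], f))
    elif kind == "cat":                       # AB
        s, mid_a = build(regex[1], edges, fresh)
        mid_b, f = build(regex[2], edges, fresh)
        edges.append((mid_a, None, mid_b))
    elif kind == "alt":                       # A | B
        s, f = next(fresh), next(fresh)
        for sub in regex[1:]:
            sub_s, sub_f = build(sub, edges, fresh)
            edges.append((s, None, sub_s))
            edges.append((sub_f, None, f))
    elif kind == "star":                      # A*
        s, f = next(fresh), next(fresh)
        sub_s, sub_f = build(regex[1], edges, fresh)
        edges.extend([(s, None, sub_s), (sub_f, None, f),
                      (s, None, f), (sub_f, None, sub_s)])
    else:
        raise ValueError(f"unknown node {kind!r}")
    return s, f

def closure(states, edges):
    """All states reachable from `states` using ε-moves only."""
    todo, seen = list(states), set(states)
    while todo:
        q = todo.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                todo.append(dst)
    return seen

def matches(regex, text):
    """Simulate the NFA by tracking the set of states it could be in."""
    edges = []
    start, accept = build(regex, edges, count())
    current = closure({start}, edges)
    for ch in text:
        moved = {dst for src, label, dst in edges if src in current and label == ch}
        current = closure(moved, edges)
    return accept in current

# Hypothetical example: (1 | 0)* 1
R = ("cat", ("star", ("alt", ("sym", "1"), ("sym", "0"))), ("sym", "1"))
for text in ["1", "101", "0", ""]:
    print(repr(text), matches(R, text))
```

Each case adds at most two states and a handful of edges, so the NFA grows linearly with the size of the expression.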
[Figure: an NFA with states A through J and transitions labeled 0 and 1]
Next
Lexical Specification → Regular expressions → NFA → DFA → Table-driven Implementation of DFA
[Figure: the NFA with states A through J is converted to a DFA with states ABCDHI, FGABCDHI, and EJGABCDHI, with transitions on 0 and 1]
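A sketch of the NFA-to-DFA conversion (the subset construction). The NFA below is an assumption: its state names match the figure, but its edges, the ε-moves in particular, are a reconstruction chosen so that the resulting DFA states agree with ABCDHI, FGABCDHI and EJGABCDHI. The names `EPS`, `STEP`, `closure` and `subset_construction` are illustrative.

```python
# Assumed NFA (a reconstruction, not taken verbatim from the lecture).
EPS = {"A": ["B", "H"], "B": ["C", "D"], "E": ["G"], "F": ["G"],
       "G": ["A", "H"], "H": ["I"]}
STEP = {("C", "1"): ["E"], ("D", "0"): ["F"], ("I", "1"): ["J"]}

def closure(states):
    """All NFA states reachable from `states` using ε-moves only."""
    todo, seen = list(states), set(states)
    while todo:
        for nxt in EPS.get(todo.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return frozenset(seen)

def subset_construction(start, alphabet):
    """Each DFA state is the set of NFA states reachable on the same input."""
    start_set = closure({start})
    table, todo = {}, [start_set]
    while todo:
        current = todo.pop()
        if current in table:
            continue
        table[current] = {}
        for ch in alphabet:
            moved = {dst for q in current for dst in STEP.get((q, ch), [])}
            target = closure(moved)
            table[current][ch] = target
            todo.append(target)
    return start_set, table

start_set, table = subset_construction("A", "01")
for state, row in table.items():
    name = "".join(sorted(state))
    print(name, {ch: "".join(sorted(target)) for ch, target in row.items()})
```

The printed names list the NFA states in alphabetical order, so FGABCDHI appears as ABCDFGHI; it is the same set.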
NFA to DFA. Remark
[Figure: DFA with states S, T, and U over {0, 1}]
      0   1
  S   T   U
  T   T   U
  U   T   U
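A table-driven implementation can be sketched as one lookup per input character. The transition table is the one above; the start state S and the accepting set {U} are assumptions, since the table itself does not list them.

```python
# Transition table: rows are states, columns are input symbols.
TABLE = {
    "S": {"0": "T", "1": "U"},
    "T": {"0": "T", "1": "U"},
    "U": {"0": "T", "1": "U"},
}
START = "S"          # assumption: S is the start state
ACCEPTING = {"U"}    # assumption: accepting states are not listed in the table

def run(text):
    """Run the DFA over text; characters outside {0, 1} raise KeyError."""
    state = START
    for ch in text:
        state = TABLE[state][ch]   # one table lookup per input character
    return state in ACCEPTING

for text in ["101", "10", ""]:
    print(repr(text), run(text))
```

The loop never changes when the language changes; only `TABLE`, `START` and `ACCEPTING` do.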
Implementation (Cont.)