Lexical Analysis 3
[Figure: Source program -> Lexical Analyzer -> (token) -> Parser -> to semantic analysis. The Parser requests each token with getNextToken. A Symbol table sits alongside both phases.]
Why separate lexical analysis and parsing?
1. Simplicity of design
2. Improving compiler efficiency
3. Enhancing compiler portability
Tokens, Patterns and Lexemes
• E = M * C ** 2
• <id, pointer to symbol table entry for E>
• <assign-op>
• <id, pointer to symbol table entry for M>
• <mult-op>
• <id, pointer to symbol table entry for C>
• <exp-op>
• <number, integer value 2>
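
The token stream above can be reproduced with a small hand-written scanner. This is a minimal sketch, not the course's implementation: the function name `tokenize`, the list-based symbol table (where a list index stands in for the "pointer"), and the rule ordering are all assumptions made for illustration.

```python
import re

# Token names follow the slide; the patterns themselves are assumptions.
# exp-op is listed before mult-op so that "**" is not read as two "*".
TOKEN_SPEC = [
    ("number", re.compile(r"[0-9]+")),
    ("id", re.compile(r"[A-Za-z_][A-Za-z_0-9]*")),
    ("exp-op", re.compile(r"\*\*")),
    ("mult-op", re.compile(r"\*")),
    ("assign-op", re.compile(r"=")),
]


def tokenize(source):
    """Return (tokens, symbol_table) for a source string."""
    symbol_table = []  # identifier lexemes; the index acts as the pointer
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():  # whitespace separates lexemes, yields no token
            pos += 1
            continue
        for name, pattern in TOKEN_SPEC:
            m = pattern.match(source, pos)
            if m:
                lexeme = m.group()
                if name == "id":
                    if lexeme not in symbol_table:
                        symbol_table.append(lexeme)
                    tokens.append((name, symbol_table.index(lexeme)))
                elif name == "number":
                    tokens.append((name, int(lexeme)))
                else:
                    tokens.append((name, None))  # operators carry no attribute
                pos = m.end()
                break
        else:
            # No pattern matched at this position: a lexical error.
            raise ValueError(f"lexical error at position {pos}: {source[pos]!r}")
    return tokens, symbol_table


tokens, table = tokenize("E = M * C ** 2")
for tok in tokens:
    print(tok)
```

Each printed pair corresponds to one bullet above: the token name, then its attribute value (a symbol-table index for identifiers, the integer value for numbers).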
Lexical errors
[Figure: input buffer holding E = M * C * * 2, terminated by eof]
Sentinels
d1 -> r1
d2 -> r2
…
dn -> rn
• Example:
letter_ -> A | B | … | Z | a | b | … | z | _
digit -> 0 | 1 | … | 9
id -> letter_ (letter_ | digit)*
Extensions
• Example:
• letter_ -> [A-Za-z_]
• digit -> [0-9]
• id -> letter_ (letter_ | digit)*
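
The regular definitions for identifiers translate almost directly into a regular expression built from named parts. The sketch below uses Python's `re` module as a stand-in notation; the helper name `is_identifier` is hypothetical.

```python
import re

# Each regular definition becomes a named string, reused in later definitions.
letter_ = r"[A-Za-z_]"
digit = r"[0-9]"
id_pattern = re.compile(f"{letter_}({letter_}|{digit})*")


def is_identifier(s):
    """True when the whole string s matches the id definition."""
    return id_pattern.fullmatch(s) is not None


print(is_identifier("_count1"))  # starts with letter_, so it is an id
print(is_identifier("2fast"))    # starts with a digit, so it is not
```

Using `fullmatch` rather than `match` matters here: the definition describes the entire lexeme, not just a prefix of it.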
Recognition of tokens
[Figure: lex.yy.c -> C compiler -> a.out]
declarations
%%
translation rules (each rule has the form: Pattern {Action})
%%
auxiliary functions
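
The three-part layout can be mimicked in ordinary code to show how the parts cooperate. The following is a rough Python analogue, not Lex itself: the rule set (including the `if` keyword), the helper names, and the `scan` driver are assumptions for illustration. It follows Lex's disambiguation convention of preferring the longest match and, on a tie, the rule listed first.

```python
import re

# --- declarations: named sub-patterns ---
DIGIT = r"[0-9]"
LETTER_ = r"[A-Za-z_]"


# --- auxiliary functions: helpers called from the actions ---
def make_id(lexeme):
    return ("id", lexeme)


def make_number(lexeme):
    return ("number", int(lexeme))


# --- translation rules: (Pattern, Action) pairs, in priority order ---
RULES = [
    (re.compile(r"\s+"), lambda lexeme: None),          # skip whitespace
    (re.compile(r"if"), lambda lexeme: ("if", None)),   # hypothetical keyword
    (re.compile(f"{LETTER_}({LETTER_}|{DIGIT})*"), make_id),
    (re.compile(f"{DIGIT}+"), make_number),
]


def scan(text):
    """Apply the rules repeatedly, choosing the longest match each time."""
    pos, out = 0, []
    while pos < len(text):
        best_len, best_action = 0, None
        for pattern, action in RULES:
            m = pattern.match(text, pos)
            # Strictly greater, so an earlier rule wins a tie in length.
            if m and len(m.group()) > best_len:
                best_len, best_action = len(m.group()), action
        if best_action is None:
            raise ValueError(f"no rule matches at position {pos}")
        token = best_action(text[pos:pos + best_len])
        if token is not None:  # actions returning None produce no token
            out.append(token)
        pos += best_len
    return out


print(scan("if iffy 42"))
```

On the input `if iffy 42`, the keyword rule wins for `if` because it is listed before the identifier rule, while `iffy` goes to the identifier rule because that match is longer.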
Compiling & executing lex programs