Lecture 3 (30-1-23)
• token (also called word) -> a set of strings defining an atomic element
with a defined meaning
• pattern -> a rule describing a set of strings (specified using a regular
expression)
• lexeme -> a sequence of characters that matches some pattern
• symbol -> the recognized token
• At the first occurrence of the symbol, an entry is made in the symbol
table
• Additional information (attributes) about the symbol may be
added by the parser
Examples
token: integer    pattern: [0-9]+    lexeme: 42
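The relationship between token, pattern, lexeme and symbol table can be sketched in a few lines of Python. This is only an illustration, not the method of any particular compiler: the token names, the patterns, the `tokenize` function and the contents of each symbol-table entry are all assumptions made for the example.

```python
import re

# Hypothetical token names and patterns, chosen only for illustration.
TOKEN_PATTERNS = [
    ("integer", r"[0-9]+"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("operator", r"[+\-*/=]"),
    ("skip", r"\s+"),  # whitespace separates lexemes and yields no token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))


def tokenize(text):
    """Return (tokens, symbol_table); each token is a (token name, lexeme) pair."""
    tokens = []
    symbol_table = {}
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue
        # An entry is made only at the first occurrence of an identifier.
        if kind == "identifier" and lexeme not in symbol_table:
            symbol_table[lexeme] = {"first_pos": m.start()}
        tokens.append((kind, lexeme))
    return tokens, symbol_table


tokens, table = tokenize("x = 42 + x")
print(tokens)
print(table)
```

Here the lexeme 42 matches the pattern for the token integer, and the identifier x gets a single symbol-table entry even though it occurs twice.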
• Typical tokens: keywords, operators, identifiers (names), constants, literal strings, and punctuation
symbols such as parentheses, brackets, commas, semicolons, and colons
• A unique integer representing the token is passed by the lexical analyzer (LA) to the parser
• Attributes for tokens (apart from the integer representing the token)
• identifier: the lexeme of the token, or a pointer into the symbol table where the lexeme is
stored by the LA
• intnum: the value of the integer (similarly for floatnum, etc.)
• string: the string itself
• The exact set of attributes depends on the compiler designer
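As a rough sketch of the points above, a token can be modelled as an integer code plus one attribute. The names `TokenCode`, `Token`, `make_identifier`, `make_intnum` and `make_string`, and the particular integer values, are hypothetical; a real compiler is free to choose a different layout.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Union


class TokenCode(IntEnum):
    # Hypothetical integer codes; the LA passes these to the parser.
    IDENTIFIER = 1
    INTNUM = 2
    FLOATNUM = 3
    STRING = 4


@dataclass(frozen=True)
class Token:
    code: TokenCode
    attribute: Union[int, float, str]


symbol_table = []  # lexemes of identifiers, stored by the LA


def make_identifier(lexeme):
    """Attribute is an index (pointer) into the symbol table."""
    if lexeme not in symbol_table:
        symbol_table.append(lexeme)
    return Token(TokenCode.IDENTIFIER, symbol_table.index(lexeme))


def make_intnum(lexeme):
    """Attribute is the value of the integer."""
    return Token(TokenCode.INTNUM, int(lexeme))


def make_string(lexeme):
    """Attribute is the string itself, without the surrounding quotes."""
    return Token(TokenCode.STRING, lexeme[1:-1])


print(make_identifier("count"))
print(make_intnum("42"))
print(make_string('"hello"'))
```

Storing an index for identifiers instead of the lexeme keeps every occurrence of the same name pointing at one symbol-table entry, where the parser can later add more attributes.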
Challenges in lexical analysis
• Certain languages do not have any reserved words: e.g., while, do, if, and else are reserved in C, but not in
PL/1
Example: the DO loop in FORTRAN
• In FORTRAN, some keywords are context-dependent
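A well-known illustration: in fixed-form FORTRAN blanks are not significant, so `DO 10 I = 1,25` begins a DO loop, whereas `DO 10 I = 1.25` assigns 1.25 to a variable named DO10I. The lexical analyzer cannot tell whether DO is a keyword until it has looked ahead for the comma. The Python sketch below mimics only that lookahead; `classify_do_statement` is a hypothetical helper, and the check is deliberately simplified (it ignores string constants and other statement forms).

```python
def classify_do_statement(stmt):
    """Classify a statement that may start with DO (simplified sketch)."""
    # Blanks are insignificant in fixed-form FORTRAN, so drop them first.
    compact = stmt.replace(" ", "").upper()
    if not compact.startswith("DO") or "=" not in compact:
        return "other"
    rhs = compact.split("=", 1)[1]
    depth = 0
    for ch in rhs:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # A comma outside parentheses after '=' means loop bounds follow.
            return "do-loop"
    return "assignment"


print(classify_do_statement("DO 10 I = 1,25"))
print(classify_do_statement("DO 10 I = 1.25"))
```

The two inputs differ in a single character, yet the first contains the keyword DO and the second contains only an identifier, which is why the keyword is called context-dependent.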