CC 02 - Scanning Notes
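source program:

main() {
  int i;
  i = 0;
  while (i < 10) {
    i = i + 1;
  }
  i = 11;
}

the same program as a sequence of characters (_ is a space, | is a line break):

m a i n ( ) _ { |
_ _ i n t _ i ; |
_ _ i _ = _ 0 ; |
_ _ w h i l e _ ( i _ < _ 1 0 ) _ { |
_ _ _ _ i _ = _ i _ + _ 1 ; |
_ _ } |
_ _ i _ = _ 1 1 ; |
}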
the sequence of tokens for the program, one line per line of the program:

24-"main" 8 9 27
43 24-"i" 17
24-"i" 21 16-'0' 17
33 8 24-"i" 18 16-'10' 9 27
24-"i" 21 24-"i" 11 16-'1' 17
28
24-"i" 21 16-'11' 17
28

the structure of the program as a tree (_ marks indentation):

main
__int
____i
__=
____i
____0
__while
____<
______i
______10
____=
______i
______+
________i
________1
__=
____i
____11

a scanner recognizes symbols (sequences of characters) and
returns tokens (integers) that represent symbols uniquely
the scanner maintains a global variable currentCharacter which is
initialized to the first character of the input program by invoking
a library procedure:
readCharacter();
which reads the next character from the input program. the
scanner is invoked by the parser through a procedure:
getSymbol();
which delivers the token that represents the next symbol in
another global variable. for each invocation of getSymbol() the
scanner checks if currentCharacter already constitutes a valid
symbol. if yes, the scanner invokes readCharacter() (to prepare for
the next invocation of getSymbol()) and returns the appropriate
token. if not, the scanner keeps invoking readCharacter() until it
recognizes a valid symbol or returns an error.
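
the protocol above can be sketched in C as follows. this is a minimal sketch under stated assumptions, not the scanner of any particular compiler: the input program is held in a string instead of being read from a file, the names initScanner, symbol, numberValue and END_OF_INPUT are made up for illustration, whitespace is skipped silently, and only three kinds of symbols are handled. the token numbers 16, 17 and 21 are read off the token trace in these notes by lining it up with the source program; SYM_EOF and SYM_ERROR are hypothetical.

```c
/* Token numbers 16, 17 and 21 are inferred from the token trace in the
   notes; SYM_EOF and SYM_ERROR are hypothetical values for this sketch. */
enum { SYM_NUMBER = 16, SYM_SEMICOLON = 17, SYM_ASSIGN = 21,
       SYM_EOF = -1, SYM_ERROR = -2 };

#define END_OF_INPUT (-1)    /* hypothetical end-of-input marker */

static const char *input;    /* assumption: program text held in memory */
int currentCharacter;        /* one character of lookahead */
int symbol;                  /* token of the most recently scanned symbol */
int numberValue;             /* value of the symbol if it is a number */

/* Stand-in for the library procedure: reads the next input character. */
void readCharacter(void) {
  if (*input == '\0') {
    currentCharacter = END_OF_INPUT;
  } else {
    currentCharacter = (unsigned char) *input;
    input = input + 1;
  }
}

/* Initializes currentCharacter to the first character of the program. */
void initScanner(const char *program) {
  input = program;
  readCharacter();
}

static int isDigit(int c) { return c >= '0' && c <= '9'; }

static int isWhitespace(int c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Invoked by the parser; leaves the next token in the global symbol. */
void getSymbol(void) {
  while (isWhitespace(currentCharacter))
    readCharacter();

  if (currentCharacter == END_OF_INPUT) {
    symbol = SYM_EOF;
  } else if (currentCharacter == ';') {
    /* the current character alone is a valid symbol:
       read ahead for the next invocation, then report the token */
    readCharacter();
    symbol = SYM_SEMICOLON;
  } else if (currentCharacter == '=') {
    readCharacter();
    symbol = SYM_ASSIGN;
  } else if (isDigit(currentCharacter)) {
    /* a number may need more characters: keep reading while digits follow */
    numberValue = 0;
    while (isDigit(currentCharacter)) {
      numberValue = numberValue * 10 + (currentCharacter - '0');
      readCharacter();
    }
    symbol = SYM_NUMBER;
  } else {
    /* unknown character: consume it so that scanning can continue */
    readCharacter();
    symbol = SYM_ERROR;
  }
}
```

after initScanner("7 = 42;") five invocations of getSymbol() leave SYM_NUMBER (with numberValue 7), SYM_ASSIGN, SYM_NUMBER (with numberValue 42), SYM_SEMICOLON and SYM_EOF in symbol. in each case currentCharacter already holds the first character after the symbol when getSymbol() returns, which is what lets the next invocation start without re-reading anything.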
1. define the set of valid symbols
(identifiers are sequences of letters and digits that
start with a letter; numbers are sequences of digits;
strings are sequences of printable characters in quotes)
2. define the set of keywords
3. define what a comment is
4. define symbol-to-token mapping
5. implement in your language
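
steps 1 to 4 can be sketched in C as follows. this is an illustration under assumptions rather than a definitive design: the keyword set is limited to int and while, a comment is assumed to run from // to the end of the line, strings are assumed to use double quotes, and the names scanSymbol, keywordOrIdentifier, singleCharacterToken and MAX_TEXT are hypothetical. the token numbers from 8 to 43 are inferred from the token trace in these notes by lining it up with the source program; SYM_STRING, SYM_EOF and SYM_ERROR are made up.

```c
#include <string.h>

/* step 4: symbol-to-token mapping. 8..43 are inferred from the token
   trace in the notes; SYM_STRING, SYM_EOF, SYM_ERROR are hypothetical. */
enum {
  SYM_LPAREN = 8, SYM_RPAREN = 9, SYM_PLUS = 11, SYM_NUMBER = 16,
  SYM_SEMICOLON = 17, SYM_LT = 18, SYM_ASSIGN = 21, SYM_IDENTIFIER = 24,
  SYM_LBRACE = 27, SYM_RBRACE = 28, SYM_WHILE = 33, SYM_INT = 43,
  SYM_STRING = 100, SYM_EOF = -1, SYM_ERROR = -2
};

#define MAX_TEXT 64  /* hypothetical limit on the stored symbol text */

static int isLetter(int c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
static int isDigit(int c) { return c >= '0' && c <= '9'; }

/* step 2: a word is a keyword if it is in this set, else an identifier */
static int keywordOrIdentifier(const char *text) {
  if (strcmp(text, "int") == 0) return SYM_INT;
  if (strcmp(text, "while") == 0) return SYM_WHILE;
  return SYM_IDENTIFIER;
}

static int singleCharacterToken(int c) {
  switch (c) {
    case '(': return SYM_LPAREN;
    case ')': return SYM_RPAREN;
    case '+': return SYM_PLUS;
    case ';': return SYM_SEMICOLON;
    case '<': return SYM_LT;
    case '=': return SYM_ASSIGN;
    case '{': return SYM_LBRACE;
    case '}': return SYM_RBRACE;
    default:  return SYM_ERROR;
  }
}

/* Scans one symbol at *cursor, advances *cursor past it, copies the
   symbol text (at most MAX_TEXT - 1 characters) and returns its token. */
int scanSymbol(const char **cursor, char *text) {
  const char *p = *cursor;
  int n = 0;
  int token;

  /* step 3: skip whitespace and // comments */
  for (;;) {
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') p++;
    if (p[0] == '/' && p[1] == '/')
      while (*p != '\0' && *p != '\n') p++;
    else
      break;
  }

  /* step 1: decide which kind of symbol starts here */
  if (*p == '\0') {
    token = SYM_EOF;
  } else if (isLetter(*p)) {            /* letter, then letters or digits */
    while (isLetter(*p) || isDigit(*p)) {
      if (n < MAX_TEXT - 1) text[n++] = *p;
      p++;
    }
    text[n] = '\0';
    token = keywordOrIdentifier(text);
  } else if (isDigit(*p)) {             /* sequence of digits */
    while (isDigit(*p)) {
      if (n < MAX_TEXT - 1) text[n++] = *p;
      p++;
    }
    token = SYM_NUMBER;
  } else if (*p == '"') {               /* printable characters in quotes */
    p++;
    while (*p >= ' ' && *p <= '~' && *p != '"') {
      if (n < MAX_TEXT - 1) text[n++] = *p;
      p++;
    }
    if (*p == '"') { p++; token = SYM_STRING; }
    else token = SYM_ERROR;             /* unterminated string */
  } else {
    text[n++] = *p;
    token = singleCharacterToken(*p);
    p++;
  }
  text[n] = '\0';
  *cursor = p;
  return token;
}
```

calling scanSymbol repeatedly on the example program from the beginning of these notes yields the token numbers of the trace in the same order, followed by SYM_EOF. keeping the keyword check separate from the identifier rule means that adding a keyword only touches keywordOrIdentifier.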