Lexical Analysis: Tokens
The first phase of a compiler works as a text scanner. This phase reads the source code as a stream
of characters and groups them into meaningful lexemes. The lexical analyzer represents each lexeme
as a token of the form:
<token-name, attribute-value>
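As a rough illustration, the pair above could be modelled as a small record type. This is only a sketch: the class name Token, its field names, and the token names "keyword", "identifier" and "punctuation" are assumptions made for the example, not part of any particular compiler.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    """A <token-name, attribute-value> pair (hypothetical representation)."""
    name: str                        # token name, e.g. "keyword" or "identifier"
    attribute: Optional[str] = None  # attribute value; here simply the lexeme text


# Tokens one might produce for the statement `int intvalue;`
tokens = [
    Token("keyword", "int"),
    Token("identifier", "intvalue"),
    Token("punctuation", ";"),
]
```

In a real compiler the attribute value is often a pointer into a symbol table rather than the raw text; the lexeme is stored directly here to keep the sketch short.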
Tokens
A lexeme is a sequence of characters in the source code that makes up a token. There are
predefined rules that every lexeme must satisfy to be identified as a valid token. These rules are
given by the grammar of the language, by means of patterns. A pattern describes which lexemes can
form a given token, and these patterns are defined by means of regular expressions.
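The idea of a pattern can be sketched with Python's re module. The two patterns below (C-style identifiers and unsigned integers) and the helper name matches are assumptions chosen for illustration; an actual language defines its own set of patterns.

```python
import re

# Hypothetical patterns: one regular expression per token name.
PATTERNS = {
    "identifier": re.compile(r"[A-Za-z_][A-Za-z0-9_]*"),
    "integer": re.compile(r"[0-9]+"),
}


def matches(token_name: str, lexeme: str) -> bool:
    """Return True if the whole lexeme fits the pattern for token_name."""
    return PATTERNS[token_name].fullmatch(lexeme) is not None
```

With these patterns, matches("identifier", "intvalue") holds, while matches("identifier", "9lives") does not, because an identifier may not start with a digit.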
Language
A language is considered to be a set of strings over some finite alphabet.
For example:
int intvalue;
While scanning either lexeme up to int, the lexical analyzer cannot yet determine whether it has
read the keyword int or the first characters of the identifier intvalue. It has to read further
characters to decide.
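One common way to resolve this ambiguity is to keep reading for as long as the characters can still belong to the same lexeme, and only then decide between keyword and identifier. The sketch below assumes this longest-match approach; the keyword set, the function name scan, and the token names are hypothetical and cover only what the example statement needs.

```python
import re

KEYWORDS = {"int"}                          # assumed keyword set for the example
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")  # assumed identifier/keyword pattern


def scan(source: str) -> list:
    """Split source into (token-name, lexeme) pairs using longest match."""
    tokens = []
    pos = 0
    while pos < len(source):
        ch = source[pos]
        if ch.isspace():
            pos += 1                        # whitespace only separates lexemes
            continue
        m = WORD.match(source, pos)         # greedy: consumes the longest word
        if m:
            lexeme = m.group()
            # Decide only after the whole word has been read.
            kind = "keyword" if lexeme in KEYWORDS else "identifier"
            tokens.append((kind, lexeme))
            pos = m.end()
        else:
            tokens.append(("punctuation", ch))  # any other single character
            pos += 1
    return tokens
```

Calling scan("int intvalue;") yields a keyword token for int, an identifier token for intvalue, and a punctuation token for the semicolon. The second word is not split into int followed by value because the scanner classifies a word only after it has consumed all of its characters.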