
Lexical Analysis

The first phase of the compiler works as a text scanner. This phase scans the source code as a stream
of characters and converts it into meaningful lexemes. The lexical analyzer represents these lexemes
in the form of tokens as:

<token-name, attribute-value>
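
For example, the declaration int intvalue; might be represented as the following stream of pairs (the token names here are illustrative; real compilers choose their own names, and the attribute of an identifier is often a symbol-table reference rather than the lexeme itself):

<keyword, int>
<identifier, intvalue>
<separator, ;>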

Tokens
A lexeme is a sequence of (alphanumeric) characters that makes up a token. There are predefined
rules for every lexeme to be identified as a valid token. These rules are specified by the
grammar, by means of a pattern: a pattern describes what can be a token, and these patterns
are defined by means of regular expressions.
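
As a minimal sketch, such patterns can be written as regular expressions; the token names and patterns below are illustrative assumptions, not those of any particular compiler:

import re

# A minimal sketch: each token class is named and given a pattern.
# These names and regular expressions are illustrative assumptions.
TOKEN_PATTERNS = [
    ("keyword",    re.compile(r"int|float|if|else|return")),
    ("identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("number",     re.compile(r"[0-9]+")),
    ("operator",   re.compile(r"[=+\-*/]")),
]

def classify(lexeme):
    """Return the <token-name, attribute-value> pair for one lexeme."""
    for name, pattern in TOKEN_PATTERNS:
        if pattern.fullmatch(lexeme):
            return (name, lexeme)
    return ("unknown", lexeme)

print(classify("int"))       # ('keyword', 'int')
print(classify("intvalue"))  # ('identifier', 'intvalue')
print(classify("42"))        # ('number', '42')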

Language

A language is a set of strings over some finite alphabet; the set of strings itself may be finite or infinite. For example, the set of all valid C identifiers is an infinite language over the alphabet of letters, digits, and the underscore.

Longest Match Rule

When the lexical analyzer reads the source code, it scans the code letter by letter; when it
encounters a whitespace, an operator symbol, or a special symbol, it decides that a word is
completed.

For example:

int intvalue;

While scanning up to int, the lexical analyzer cannot determine whether it is the keyword int
or the initial letters of the identifier intvalue. The longest match rule resolves this: the
lexeme is determined by the longest match among all the available token patterns, so intvalue
is recognized as a single identifier.
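
The rule can be sketched in a few lines of Python; the two competing patterns below are illustrative assumptions:

import re

# A minimal sketch of the longest match rule, assuming just two
# competing patterns: the keyword 'int' and a general identifier.
PATTERNS = [
    ("keyword",    re.compile(r"int")),
    ("identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
]

def next_token(source, pos):
    """Try every pattern at pos and keep the longest match.
    Ties go to the pattern listed first, so a bare 'int' is a
    keyword, while 'intvalue' wins as the longer identifier."""
    best = None
    for name, pattern in PATTERNS:
        m = pattern.match(source, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(next_token("int intvalue;", 0))  # ('keyword', 'int')
print(next_token("int intvalue;", 4))  # ('identifier', 'intvalue')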

A form of recursive-descent parsing that does not require any backtracking is known as predictive parsing.
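
A hedged sketch of the idea, using a deliberately tiny grammar invented for illustration: one token of lookahead selects every production, so the parser never backtracks.

# A minimal sketch of predictive parsing for the assumed grammar:
#   expr -> NUMBER rest
#   rest -> '+' NUMBER rest | (empty)
# One token of lookahead decides each step, so no backtracking occurs.

def parse_expr(tokens, pos=0):
    pos = expect_number(tokens, pos)
    return parse_rest(tokens, pos)

def parse_rest(tokens, pos):
    # The single lookahead token selects the production to apply.
    if pos < len(tokens) and tokens[pos] == "+":
        pos = expect_number(tokens, pos + 1)
        return parse_rest(tokens, pos)
    return pos  # the empty production

def expect_number(tokens, pos):
    if pos < len(tokens) and tokens[pos].isdigit():
        return pos + 1
    raise SyntaxError("expected a number at position %d" % pos)

# '1 + 2 + 3' as a pre-tokenized list; all five tokens are consumed.
print(parse_expr(["1", "+", "2", "+", "3"]))  # 5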
