
Token, Patterns, and Lexemes

In computer science, it is important for the programmer to understand the basic
elements that compose programming languages. These include tokens, patterns, and
lexemes, which are essential in parsing and interpreting code.
A compiler is system software that translates a source program written in a high-
level language into a low-level language. The compilation process is divided into
several phases in order to ease development and design. The phases work in sequence,
with the output of each phase serving as the input to the next. The phases are as
follows:
 Lexical Analysis
 Syntax Analysis
 Semantic Analysis
 Intermediate Code Generation
 Code Optimization
 Storage Allocation
 Code Generation
Lexical Analysis Phase
In this phase, the input is the source program, which is read from left to right, and
the output is a sequence of tokens to be analyzed by the next phase, syntax analysis.
While scanning the source code, white space, comments, carriage returns, line feeds,
tabs, preprocessor directives, macros, etc. are removed. The lexical analyzer, or
scanner, also helps in error detection: for example, invalid constants or incorrectly
spelled keywords are caught during this phase. Regular expressions are used as a
standard notation for specifying the tokens of a programming language.
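To make the regular-expression idea concrete, here is a minimal sketch in C using the POSIX `<regex.h>` API; the patterns and the helper `matches` are illustrative, not part of any standard lexer.

#include <regex.h>
#include <stdio.h>

/* Returns 1 if text matches pattern, 0 otherwise (or on compile error). */
int matches(const char *pattern, const char *text) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) return 0;
    int ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}

int main(void) {
    /* hypothetical patterns for two token classes */
    const char *id_pat  = "^[A-Za-z_][A-Za-z0-9_]*$";  /* identifier */
    const char *num_pat = "^[0-9]+$";                  /* integer constant */
    printf("%d\n", matches(id_pat, "main"));    /* 1: valid identifier */
    printf("%d\n", matches(num_pat, "10"));     /* 1: valid constant */
    printf("%d\n", matches(id_pat, "9lives"));  /* 0: cannot start with a digit */
    return 0;
}

A real scanner would compile such patterns once and try them against the input at each position, emitting the token class of the longest match.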
What is a Token?
In programming, a token is the smallest unit of meaningful data; it may be an
identifier, keyword, operator, or symbol. A token represents a sequence of
characters that cannot be decomposed further. In a language such as C, some examples
of tokens would include:
 Keywords: Reserved words in C like `int`, `char`, `const`, `goto`, etc.
 Identifiers: Names of variables and user-defined functions.
 Operators: `+`, `-`, `*`, `/`, etc.
 Delimiters/Punctuators: Symbols such as commas `,`, semicolons `;`, and braces `{}`.
By and large, tokens may be divided into three categories:
 Terminal Symbols (TRM): Keywords and operators.
 Literals (LIT): Values like numbers and strings.
 Identifiers (IDN): Names defined by the user.
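As a minimal sketch, a scanner might represent these categories with an enum and pair each category with its lexeme; the names `TokenClass` and `Token` here are illustrative, not prescribed by the text.

/* One possible in-memory representation of a scanned token. */
enum TokenClass { TRM, LIT, IDN };  /* terminal, literal, identifier */

struct Token {
    enum TokenClass cls;   /* category the lexeme belongs to */
    const char *lexeme;    /* the matched character sequence */
};

/* e.g. { IDN, "a" }, { TRM, "=" }, { LIT, "10" } */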
Let’s understand now how to calculate tokens in a source code (C language):
Example 1:
int a = 10; //Input Source code

Tokens
int (keyword), a (identifier), = (operator), 10 (constant) and ; (punctuation, semicolon)
Answer – Total number of tokens = 5
Example 2:
int main() {
    // printf() sends the string inside the quotation marks to
    // the standard output (the display)
    printf("Welcome to GeeksforGeeks!");
    return 0;
}
Tokens
'int', 'main', '(', ')', '{', 'printf', '(', '"Welcome to GeeksforGeeks!"',
')', ';', 'return', '0', ';', '}'
Answer – Total number of tokens = 14
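The counting rule used in these examples can be approximated in code. Below is a deliberately simplified, hypothetical counter that handles only identifiers/keywords, integer constants, string literals, and single-character operators or punctuators; a full C lexer would also handle comments, multi-character operators like `==`, floating-point constants, and more.

#include <ctype.h>
#include <stdio.h>

/* Count tokens in s under the simplified classes described above. */
int count_tokens(const char *s) {
    int n = 0;
    while (*s) {
        if (isspace((unsigned char)*s)) { s++; continue; }  /* skip whitespace */
        n++;
        if (isalpha((unsigned char)*s) || *s == '_') {      /* identifier/keyword */
            while (isalnum((unsigned char)*s) || *s == '_') s++;
        } else if (isdigit((unsigned char)*s)) {            /* integer constant */
            while (isdigit((unsigned char)*s)) s++;
        } else if (*s == '"') {                             /* string literal */
            s++;
            while (*s && *s != '"') s++;
            if (*s) s++;                                    /* consume closing quote */
        } else {
            s++;                                            /* single-char token */
        }
    }
    return n;
}

int main(void) {
    printf("%d\n", count_tokens("int a = 10;"));  /* prints 5, as in Example 1 */
    return 0;
}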
What is a Lexeme?
A lexeme is a sequence of source code that matches one of the predefined patterns and
thereby forms a valid token. For example, in the expression `x + 5`, both `x` and `5`
are lexemes that correspond to certain tokens. These lexemes follow the rules of the
language in order for them to be recognized as valid tokens.
Example:
`main` is a lexeme of type identifier (token).
`(`, `)`, `{`, `}` are lexemes of type punctuation (token).
What is a Pattern?
A pattern is a rule or piece of syntax that specifies how tokens are recognized in a
programming language. It defines the sequences of characters or symbols that make up
valid tokens and gives the scanner guidelines for identifying them correctly.
Example of Programming Language (C, C++)
For a keyword to be identified as a valid token, the pattern is the sequence of
characters that makes up the keyword.
For an identifier to be identified as a valid token, the pattern is the predefined
rule that it must start with a letter, followed by letters or digits.
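That identifier rule can be checked directly in code. Here is a small sketch of it; note that real C identifiers may also contain underscores, which the rule as stated omits.

#include <ctype.h>

/* Returns 1 if s matches the pattern stated above:
   a letter first, then letters or digits only. */
int is_identifier(const char *s) {
    if (!isalpha((unsigned char)*s)) return 0;   /* must start with a letter */
    for (s++; *s; s++)
        if (!isalnum((unsigned char)*s)) return 0;
    return 1;
}

/* is_identifier("main") -> 1, is_identifier("9lives") -> 0 */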

Difference between Token, Lexeme, and Pattern


| Criteria | Token | Lexeme | Pattern |
|---|---|---|---|
| Definition | A sequence of characters that is treated as a unit because it cannot be broken down further. | A sequence of characters in the source code that is matched by the given predefined language rules and thereby specified as a valid token. | A set of rules that the scanner follows to create a token. |
| Interpretation of type Keyword | All the reserved keywords of the language (main, printf, etc.) | int, goto | The sequence of characters that makes up the keyword. |
| Interpretation of type Identifier | The name of a variable, function, etc. | main, a | Must start with a letter, followed by letters or digits. |
| Interpretation of type Operator | All the operators are considered tokens. | +, = | +, = |
| Interpretation of type Punctuation | Each kind of punctuation is considered a token (e.g. semicolon, bracket, comma). | (, ), {, } | (, ), {, } |
| Interpretation of type Literal | A grammar rule or boolean literal. | "Welcome to GeeksforGeeks!" | Any string of characters (except ") between " and ". |

Output of Lexical Analysis Phase
The output of the lexical analyzer serves as input to the syntax analyzer as a
sequence of tokens, not a series of lexemes, because during the syntax analysis phase
the individual lexeme is not important; what matters is the category or class to
which it belongs.
Example:
z = x + y;
For the syntax analyzer, this statement has the form:
<id> = <id> + <id>; //<id>- identifier (token)
The lexical analyzer not only produces the sequence of tokens but also builds a
symbol table containing all the tokens present in the source code, except whitespace
and comments.
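As a rough sketch, one entry in such a symbol table might look like the following; the field names are illustrative, not prescribed by the text.

/* One possible shape for a symbol-table entry. */
struct SymbolEntry {
    char name[64];           /* the lexeme, e.g. "x" */
    const char *tokenClass;  /* its class, e.g. "identifier" */
    int firstLine;           /* line where the lexeme first appeared */
};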
Conclusion
Tokens, patterns, and lexemes are basic elements of any programming language that
help in breaking down and making sense of code. Tokens are the smallest meaningful
units; patterns define how such units are identified; lexemes are the actual
character sequences that match those patterns. Understanding these concepts is
indispensable for programming and analyzing code efficiently.
Token, Patterns, and Lexemes – FAQs
What is the function of a token in programming?
Tokens are the smallest units of meaningful code, such as keywords or operators,
which are detected during compilation or interpretation.
How do patterns feature in the tokenization process?
Patterns provide the rules for identifying tokens by matching sequences of
characters against predefined formats.
What is the difference between token and lexeme?
A token is the type of unit, such as identifier or keyword, while a lexeme is the
actual sequence of characters that matches the pattern for that token.
Why are patterns important in programming languages?
Patterns are important in programming languages because they define in advance which
character sequences form valid tokens, ensuring that code is parsed and interpreted
correctly.
