
Lecture 5-6: Introduction to Lexical Analysis

Role of Lexical Analyzer (Scanner)


A lexical analyzer, often referred to as a scanner, is the first phase of a compiler. Its primary role
is to break down the input source code into meaningful units called tokens. These tokens are the
building blocks for subsequent phases of the compiler.
Key Tasks of a Lexical Analyzer:
1. Read the input character stream: The scanner reads the source code character by
character.
2. Identify tokens: It recognizes and classifies tokens based on predefined patterns.
3. Discard white spaces and comments: It removes unnecessary characters like spaces,
tabs, and comments.
4. Create a token stream: It generates a sequence of tokens, each with a token type and an
associated value (the lexeme). A minimal sketch of such a token record follows this list.
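As a hypothetical illustration (the type and field names below are assumptions, not from the lecture), a token can be modeled in C as a small record pairing a token class with its lexeme:

    #include <stddef.h>

    /* Hypothetical token categories for a small C-like language */
    typedef enum { TOK_KEYWORD, TOK_IDENT, TOK_INT, TOK_OP, TOK_SEMI, TOK_EOF } TokenType;

    typedef struct {
        TokenType type;      /* token class, e.g. TOK_INT          */
        const char *lexeme;  /* start of the lexeme in the source  */
        size_t length;       /* length of the lexeme in characters */
    } Token;

The scanner hands a stream of such records to the parser, which never needs to look at raw characters again.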
Regular Expressions and Finite Automata
Regular expressions and finite automata are two fundamental tools used in lexical analysis:
Regular Expressions:
• A concise way to describe patterns of characters.
• Used to define the syntax of tokens.
• Examples of regular expressions for tokens (a runnable sketch using one of these patterns follows this list):
o \d+: Matches one or more digits (an integer literal)
o [A-Za-z_]\w*: Matches an identifier (plain \w+ is sometimes shown instead, but it would also accept lexemes that start with a digit)
o [+\-*/]: Matches one of the arithmetic operators (the - must be escaped or placed first or last; the unescaped form [+-/*] is read as a character range from + to /, which also matches , and .)
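The sketch below shows one of these patterns in action using the POSIX regex library (an assumption: it requires a POSIX system, and POSIX extended syntax has no \d shorthand, so [0-9]+ stands in for \d+):

    #include <regex.h>
    #include <stdio.h>

    int main(void) {
        regex_t re;
        regmatch_t m;
        const char *p = "x = 10 + 20";

        /* [0-9]+ is the POSIX ERE spelling of \d+ */
        if (regcomp(&re, "[0-9]+", REG_EXTENDED) != 0) return 1;
        while (regexec(&re, p, 1, &m, 0) == 0) {
            printf("integer literal: %.*s\n", (int)(m.rm_eo - m.rm_so), p + m.rm_so);
            p += m.rm_eo;  /* resume scanning after the match */
        }
        regfree(&re);
        return 0;
    }

Run on the string above, this prints the two integer literals 10 and 20.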
Finite Automata:
• A mathematical model for recognizing patterns in text.
• Can be represented as a directed graph whose nodes are states and whose edges are labeled transitions.
• A finite automaton can be constructed from a regular expression to recognize the tokens defined by the expression; a hand-coded sketch of one appears after this list.
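For instance, \d+ corresponds to an automaton with a start state and one accepting state entered on the first digit, with a self-loop for further digits. A minimal hand-coded version in C (the state names are assumptions for illustration):

    #include <ctype.h>
    #include <stdio.h>

    enum { START, IN_INT };  /* the two states of the automaton for [0-9]+ */

    /* Returns the length of the integer literal at the front of s, or 0. */
    static int match_int(const char *s) {
        int state = START;
        int i = 0;
        for (;; i++) {
            char c = s[i];
            if (state == START) {
                if (!isdigit((unsigned char)c)) return 0;  /* no digit: reject */
                state = IN_INT;                            /* first digit seen */
            } else {
                if (!isdigit((unsigned char)c)) return i;  /* longest match ends here */
            }
        }
    }

    int main(void) {
        printf("%d\n", match_int("123+4"));  /* prints 3 */
        return 0;
    }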
Tokenization
Tokenization is the process of breaking down the input source code into tokens. The lexical
analyzer uses the defined regular expressions and finite automata to identify and classify tokens.
Example of Tokenization:
Consider the following C code:
int x = 10 + 20;
The lexical analyzer would break it down into the following tokens:
• int: Keyword
• x: Identifier
• =: Assignment operator
• 10: Integer literal
• +: Addition operator
• 20: Integer literal
• ;: Semicolon
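A minimal hand-written tokenizer that produces this stream might look like the following sketch (hypothetical code: it assumes only the token kinds above and does no error handling):

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        const char *p = "int x = 10 + 20;";

        while (*p) {
            if (isspace((unsigned char)*p)) { p++; continue; }  /* discard whitespace */
            const char *start = p;
            if (isalpha((unsigned char)*p) || *p == '_') {      /* identifier or keyword */
                while (isalnum((unsigned char)*p) || *p == '_') p++;
                int len = (int)(p - start);
                if (len == 3 && strncmp(start, "int", 3) == 0)
                    printf("KEYWORD      %.*s\n", len, start);
                else
                    printf("IDENTIFIER   %.*s\n", len, start);
            } else if (isdigit((unsigned char)*p)) {            /* integer literal */
                while (isdigit((unsigned char)*p)) p++;
                printf("INT_LITERAL  %.*s\n", (int)(p - start), start);
            } else if (*p == ';') {
                printf("SEMICOLON    ;\n");
                p++;
            } else {                                            /* =, +, and so on */
                printf("OPERATOR     %c\n", *p);
                p++;
            }
        }
        return 0;
    }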
Implementation Techniques:
• Table-driven approach: Uses a finite state machine whose transitions are stored in a table indexed by the current state and the next input character; a small sketch follows this list.
• Lexical analyzer generator tools: Tools like Lex and Flex can automatically generate lexical analyzers from regular-expression specifications.
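The table-driven idea can be sketched as follows for the pattern [0-9]+ (the state names and character classes are assumptions for illustration; a real scanner would have many more of each):

    #include <stdio.h>

    enum { S_START, S_INT, S_DEAD, NUM_STATES };   /* automaton states */
    enum { C_DIGIT, C_OTHER, NUM_CLASSES };        /* character classes */

    /* transition[state][class] -> next state; this table *is* the automaton */
    static const int transition[NUM_STATES][NUM_CLASSES] = {
        /* S_START */ { S_INT,  S_DEAD },
        /* S_INT   */ { S_INT,  S_DEAD },
        /* S_DEAD  */ { S_DEAD, S_DEAD },
    };

    static int classify(char c) { return (c >= '0' && c <= '9') ? C_DIGIT : C_OTHER; }

    /* Length of the longest [0-9]+ prefix of s (0 if none). */
    static int scan_int(const char *s) {
        int state = S_START, last_accept = 0;
        for (int i = 0; s[i] != '\0' && state != S_DEAD; i++) {
            state = transition[state][classify(s[i])];
            if (state == S_INT) last_accept = i + 1;  /* S_INT is accepting */
        }
        return last_accept;
    }

    int main(void) {
        printf("%d\n", scan_int("42;"));  /* prints 2 */
        return 0;
    }

Changing the recognized language then means editing the table rather than the control flow, which is exactly the step that generators like Lex and Flex automate.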
