CT - Lecture 2
LECTURE 2
Phases of Compiler:
Lexical Analysis
• The first phase of the compiler works as a text scanner. This phase scans the
source code as a stream of characters and converts it into meaningful lexemes,
each of which is passed on as a token of the form
<token-name, attribute-value>
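For example, a statement such as position = initial + rate * 60 (a textbook-style
illustration; exact token names and attributes are compiler-specific) would be
scanned into a token stream along the lines of:

<position, identifier> <=, operator> <initial, identifier> <+, operator> <rate, identifier> <*, operator> <60, constant>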
Phases of Compiler:
Syntax Analysis
• The next phase is called the syntax analysis or parsing.
• It takes the tokens produced by lexical analysis as input and generates a parse
tree (or syntax tree).
• In this phase, token arrangements are checked against the source code grammar,
i.e. the parser checks whether the expression made by the tokens is syntactically
correct.
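As a sketch, the token stream for position = initial + rate * 60 from the previous
slide could be parsed into a syntax tree such as:

        =
       / \
position   +
          / \
   initial   *
            / \
        rate   60

where the interior nodes are operators and the leaves are identifiers or constants.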
Phases of Compiler:
Semantic Analysis
• Semantic analysis checks whether the parse tree constructed follows the rules
of the language.
• For example, it checks that values are assigned between compatible data types
and flags errors such as adding a string to an integer. The semantic analyzer also
keeps track of identifiers, their types and expressions, and whether identifiers are
declared before use. It produces an annotated syntax tree as output.
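Continuing the sketch above, suppose position, initial and rate are declared as
float. The integer constant 60 then needs an implicit conversion before the float
multiplication, so the semantic analyzer annotates the tree rather than rejecting
the program (inttofloat is an illustrative name for such a conversion node):

position = initial + rate * inttofloat(60)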
Phases of Compiler:
Intermediate Code Generation
• After semantic analysis, the compiler generates an intermediate code of the
source program for the target machine. It represents a program for some abstract
machine and sits in between the high-level language and the machine language.
This intermediate code should be generated in such a way that it is easy to
translate into the target machine code.
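One widely used intermediate form is three-address code, in which each
instruction has at most one operator on its right-hand side. For the annotated
example above, a sketch of the generated code might be:

t1 = inttofloat(60)
t2 = rate * t1
t3 = initial + t2
position = t3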
Phases of Compiler:
Code Optimization
• The next phase is code optimization of the intermediate code. Optimization can
be thought of as removing unnecessary lines of code and arranging the sequence
of statements so that the program executes faster without wasting resources
(CPU, memory).
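On the three-address code sketched above, an optimizer could, for example,
convert the constant 60 to 60.0 at compile time and eliminate the copy through
the temporary t3, leaving the shorter sequence:

t1 = rate * 60.0
position = initial + t1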
Phases of Compiler:
Code Generation
• In this phase, the code generator takes the optimized representation of the
intermediate code and maps it to the target machine language. The code
generator translates the intermediate code into a sequence of (generally)
relocatable machine code. This sequence of machine instructions performs the
same task as the intermediate code would.
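For the optimized code above, a generator targeting a simple register machine
might emit something like the following (an illustrative pseudo-assembly sketch;
R1 and R2 are registers and # marks an immediate constant):

LDF  R2, rate
MULF R2, R2, #60.0
LDF  R1, initial
ADDF R1, R1, R2
STF  position, R1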
Phases of Compiler:
Symbol Table
• It is a data structure maintained throughout all the phases of a compiler. All
identifier names along with their types are stored here. The symbol table makes
it easier for the compiler to quickly search for an identifier record and retrieve
it. The symbol table is also used for scope management.
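A minimal sketch of such a structure in C, assuming for illustration that a
fixed-size array with linear search is enough (production compilers typically use
hash tables and a stack of scopes):

#include <stdio.h>
#include <string.h>

#define MAX_SYMBOLS 256

/* One record per identifier: its name, type and scope depth. */
struct symbol {
    char name[64];
    char type[16];
    int  scope;      /* 0 = global, higher values = nested blocks */
};

static struct symbol table[MAX_SYMBOLS];
static int n_symbols = 0;

/* Insert an identifier; returns 0 on success, -1 if the table is full. */
int insert_symbol(const char *name, const char *type, int scope) {
    if (n_symbols >= MAX_SYMBOLS) return -1;
    strncpy(table[n_symbols].name, name, sizeof table[n_symbols].name - 1);
    strncpy(table[n_symbols].type, type, sizeof table[n_symbols].type - 1);
    table[n_symbols].scope = scope;
    n_symbols++;
    return 0;
}

/* Look an identifier up, preferring the most recently declared entry. */
struct symbol *lookup_symbol(const char *name) {
    for (int i = n_symbols - 1; i >= 0; i--)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;   /* not found: used before declaration */
}

int main(void) {
    insert_symbol("value", "int", 0);
    struct symbol *s = lookup_symbol("value");
    if (s) printf("%s : %s (scope %d)\n", s->name, s->type, s->scope);
    return 0;
}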
Lexical Analysis
• Lexical analysis is a compiler’s first phase, also called linear analysis or scanning.
It takes the modified source code from language preprocessors, written in the
form of sentences. The lexical analyzer reads the source program character by
character and groups the characters into meaningful sequences called lexemes.
For each lexeme, the lexical analyzer produces as output a token of the form
<token-name, attribute-value>
• If the lexical analyzer finds a token invalid, it generates an error. The lexical analyzer
works closely with the syntax analyzer. It reads character streams from the source
code, checks for legal tokens, and passes the data to the syntax analyzer when the
syntax analyzer demands it.
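A minimal sketch of this grouping process in C, assuming a tiny language with
identifiers, integer constants, and single-character operators; keyword recognition
and error handling are reduced to the bare minimum:

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Print one token in the <lexeme, category> form used on these slides. */
static void emit(const char *lexeme, const char *category) {
    printf("<%s, %s> ", lexeme, category);
}

/* Scan one line of source text character by character, grouping the
   characters into lexemes and classifying each one. */
static void scan(const char *src) {
    size_t i = 0;
    while (src[i] != '\0') {
        if (isspace((unsigned char)src[i])) {            /* skip whitespace */
            i++;
        } else if (isalpha((unsigned char)src[i]) || src[i] == '_') {
            char buf[64]; size_t n = 0;                  /* identifier or keyword */
            while (isalnum((unsigned char)src[i]) || src[i] == '_') {
                if (n < sizeof buf - 1) buf[n++] = src[i];
                i++;
            }
            buf[n] = '\0';
            emit(buf, strcmp(buf, "int") == 0 ? "keyword" : "identifier");
        } else if (isdigit((unsigned char)src[i])) {
            char buf[64]; size_t n = 0;                  /* integer constant */
            while (isdigit((unsigned char)src[i])) {
                if (n < sizeof buf - 1) buf[n++] = src[i];
                i++;
            }
            buf[n] = '\0';
            emit(buf, "constant");
        } else {                                         /* operator or symbol */
            char buf[2] = { src[i], '\0' };
            i++;
            emit(buf, strchr("=+-*/", buf[0]) ? "operator" : "symbol");
        }
    }
    printf("\n");
}

int main(void) {
    scan("int value = 100;");
    /* expected output:
       <int, keyword> <value, identifier> <=, operator> <100, constant> <;, symbol> */
    return 0;
}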
Lexical Analysis
• The lexical analyzer also helps correlate error messages from the compiler with the source program.
Lexical Analysis
• Token: A token is a group of characters having a collective
meaning: typically a word or punctuation mark, separated out by the
lexical analyzer and passed to the parser.
• Lexeme: An actual character sequence forming a specific instance
of a token, such as num.
• Pattern: A rule describing the set of strings associated with a token,
expressed as a regular expression that explains how a particular
token can be formed.
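For instance, the patterns for a few common token classes, written as regular
expressions, might look like the following sketch (exact definitions differ
between languages):

identifier : [a-zA-Z_][a-zA-Z0-9_]*
integer    : [0-9]+
operator   : [-+*/=]
whitespace : [ \t\n]+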
Lexical Analysis
• For example, in the C language, the variable declaration line
int value = 100;
is converted into the token stream
<int, keyword> <value, identifier> <=, operator> <100, constant> <;, symbol>
Lexical Analysis
• What are Tokens?
• A token is the smallest individual element of a program that
is meaningful to the compiler. It cannot be further divided.
Identifiers, strings, keywords, etc., are examples of tokens. In the
lexical analysis phase of the compiler, the program is converted
into a stream of tokens.
Lexical Analysis
• Different Types of Tokens