CD Unit 1
Example:
Position:= initial+ rate * 60
id1:= id2 + id3 * 60 (tokens)
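The step from the source statement to its token form can be sketched in Python. This is only an illustration: the function name to_token_form and the regular expression are assumptions made for this example, not part of any particular compiler.

```python
import re

# Hypothetical pattern: identifier, number, ':=' or a one-character operator.
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_]\w*)|(\d+(?:\.\d+)?)|(:=|[+\-*/()]))")

def to_token_form(source):
    """Replace each identifier by id1, id2, ... in order of first appearance."""
    names = {}                      # identifier -> running index
    out = []
    pos = 0
    source = source.rstrip()
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        ident, number, op = m.groups()
        if ident is not None:
            index = names.setdefault(ident, len(names) + 1)
            out.append(f"id{index}")
        elif number is not None:
            out.append(number)
        else:
            out.append(op)
        pos = m.end()
    return " ".join(out), names

form, names = to_token_form("Position:= initial+ rate * 60")
print(form)
```

The dictionary names plays the role of a very small symbol table: it remembers which identifier stands behind each idN.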
Syntax Analysis
• Also called hierarchical analysis or parsing.
• The tokens are grouped hierarchically into nested
collections that have a collective meaning.
• Rules:
• Any identifier is an expression.
• Any number is an expression.
• If E1 and E2 are expressions, then E1 * E2 and E1 + E2
are expressions.
• If id is an identifier and E is an expression, then
id := E is a statement.
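The rules above can be turned into a small recursive-descent parser. The sketch below is a minimal illustration in Python; the function names and the tuple shape of the tree are assumptions for this example, and giving * higher precedence than + is the usual convention rather than something the rules state.

```python
import re

def tokenize(text):
    # Identifiers, integers, ':=', '+' and '*'; anything else is silently skipped.
    return re.findall(r"[A-Za-z]\w*|\d+|:=|[+*]", text)

def parse_assignment(tokens):
    """statement -> id := expr"""
    if len(tokens) < 3 or not tokens[0][0].isalpha() or tokens[1] != ":=":
        raise SyntaxError("expected 'id := expression'")
    tree, remaining = parse_expr(tokens[2:])
    if remaining:
        raise SyntaxError(f"unexpected token {remaining[0]!r}")
    return (":=", tokens[0], tree)

def parse_expr(tokens):
    """expr -> term ('+' term)*"""
    left, tokens = parse_term(tokens)
    while tokens and tokens[0] == "+":
        right, tokens = parse_term(tokens[1:])
        left = ("+", left, right)
    return left, tokens

def parse_term(tokens):
    """term -> factor ('*' factor)*"""
    left, tokens = parse_factor(tokens)
    while tokens and tokens[0] == "*":
        right, tokens = parse_factor(tokens[1:])
        left = ("*", left, right)
    return left, tokens

def parse_factor(tokens):
    """factor -> identifier | number"""
    if not tokens or tokens[0] in (":=", "+", "*"):
        raise SyntaxError("expected identifier or number")
    kind = "num" if tokens[0].isdigit() else "id"
    return (kind, tokens[0]), tokens[1:]

tree = parse_assignment(tokenize("position := initial + rate * 60"))
print(tree)
```

The nested tuples mirror the nested groups of the parse tree: the product rate * 60 sits inside the sum, which sits inside the assignment.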
Semantic Analysis
• This phase checks whether the components of the
source program are meaningful.
• It validates the syntax tree against the semantic
rules of the source language.
• It performs type checking, scope resolution, checks
that variables are declared, etc.
• It decorates the syntax tree with data types,
values, etc.
Example:-
Position:= initial+ rate * 60
id1:= id2 + id3 * 60 (tokens)
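Type checking on this example can be sketched as a walk over the expression tree. The code below is a minimal Python illustration under two assumptions that the notes do not state: all identifiers are declared as real, and an integer operand mixed with a real one is wrapped in a conversion node named inttoreal (a textbook-style name used here for illustration). Only the right-hand side expression is checked.

```python
def check(node, symbol_types):
    """Return (decorated_node, type) for an expression tree of nested tuples."""
    kind = node[0]
    if kind == "id":
        name = node[1]
        if name not in symbol_types:
            raise NameError(f"undeclared identifier {name!r}")
        return node, symbol_types[name]
    if kind == "num":
        return node, ("int" if node[1].isdigit() else "real")
    # Binary operator: check both operands, then reconcile their types.
    left, ltype = check(node[1], symbol_types)
    right, rtype = check(node[2], symbol_types)
    if ltype != rtype:
        # Only 'int' and 'real' exist here, so widen the int side to real.
        if ltype == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return (kind, left, right), "real"
    return (kind, left, right), ltype

# Assumed declarations: both variables are real.
types = {"initial": "real", "rate": "real"}
expr = ("+", ("id", "initial"), ("*", ("id", "rate"), ("num", "60")))
decorated, result_type = check(expr, types)
print(decorated, result_type)
```

The decorated tree now carries the conversion of 60 explicitly, which is the kind of information later phases rely on.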
Intermediate Code Generation
• After syntax and semantic analysis of the source
program, many compilers generate an explicit
low-level or machine-like intermediate representation.
• This intermediate representation should have two
important properties: it should be easy to produce,
and it should be easy to translate into the target
program.
• Intermediate representations take a variety of forms;
one such form is three-address code.
• Each three-address instruction has at most 3 operands.
• Consider the three-address code for the statement:
Position := initial + rate * 60
id1 := id2 + id3 * 60 (tokens)
Three-address code:
Temp1 := inttoreal(60)
Temp2 := id3 * Temp1
Temp3 := id2 + Temp2
id1 := Temp3
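Three-address code for the statement above can be produced by a post-order walk of the decorated expression tree, creating one temporary per interior node. The following Python sketch assumes the tree is given as nested tuples and that the conversion of 60 to a real number appears as an inttoreal node; the function name and the Temp naming scheme are illustrative choices.

```python
import itertools

def gen_three_address(target, tree):
    """Emit three-address instructions for `target := tree`."""
    code = []
    counter = itertools.count(1)

    def walk(node):
        kind = node[0]
        if kind in ("id", "num"):
            return node[1]                      # leaves need no instruction
        if kind == "inttoreal":
            arg = walk(node[1])
            temp = f"Temp{next(counter)}"
            code.append(f"{temp} := inttoreal({arg})")
        else:                                   # binary operator '+' or '*'
            left = walk(node[1])
            right = walk(node[2])
            temp = f"Temp{next(counter)}"
            code.append(f"{temp} := {left} {kind} {right}")
        return temp

    result = walk(tree)
    code.append(f"{target} := {result}")
    return code

tree = ("+", ("id", "id2"), ("*", ("id", "id3"), ("inttoreal", ("num", "60"))))
for line in gen_three_address("id1", tree):
    print(line)
```

Every emitted instruction names at most three operands: one result and up to two arguments.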
Code Optimization
• Example
Position := initial + rate * 60
id1 := id2 + id3 * 60 (tokens)
Optimized code:
Temp1 := id3 * 60.0
id1 := id2 + Temp1
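Two simple improvements are enough to reach the optimized form: converting the constant 60 to 60.0 at compile time, and removing the final copy into id1. The Python sketch below is an illustration only. It assumes the unoptimized input is the usual four-instruction form for this statement (the notes show only the optimized result), represents each instruction as a tuple (dest, op, arg1, arg2), and does not renumber temporaries, so its Temp2 corresponds to Temp1 above.

```python
def optimize(code):
    constants = {}          # temporary -> real constant known at compile time
    folded = []
    for dest, op, a, b in code:
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "inttoreal" and a.isdigit():
            constants[dest] = str(float(a))     # "60" becomes "60.0"
            continue                            # no run-time instruction needed
        folded.append((dest, op, a, b))
    # Drop a final copy `x := t` when t was computed by the previous instruction.
    # Assumes t is not used anywhere else, which holds for generated temporaries.
    if len(folded) >= 2 and folded[-1][1] == ":=" and folded[-1][2] == folded[-2][0]:
        target = folded[-1][0]
        _, op, a, b = folded[-2]
        folded[-2:] = [(target, op, a, b)]
    return folded

unoptimized = [
    ("Temp1", "inttoreal", "60", None),
    ("Temp2", "*", "id3", "Temp1"),
    ("Temp3", "+", "id2", "Temp2"),
    ("id1", ":=", "Temp3", None),
]
for instr in optimize(unoptimized):
    print(instr)
```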
Code Generation
• The target program consists of relocatable machine
code or assembly code.
• Example
Position:= initial+ rate * 60
id1:= id2 + id3 * 60 (tokens)
• Target Code-
MOVF id3,R2
MULF #60.0,R2
MOVF id2,R1
ADDF R2,R1
MOVF R1,id1
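The target code above can be reproduced by a very naive generator that gives each instruction its own register. This Python sketch is only an illustration: it assumes exactly two registers with no spilling, the tuple format (dest, op, arg1, arg2) for the optimized code, and an allocation order chosen so that the register numbers match the example.

```python
OPCODES = {"+": "ADDF", "*": "MULF"}

def generate(code):
    free = ["R1", "R2"]     # pop() hands out R2 first, then R1
    location = {}           # temporary -> register that holds its value
    asm = []

    def operand(x):
        if x in location:
            return location[x]          # value already sits in a register
        if x[0].isdigit():
            return f"#{x}"              # immediate constant
        return x                        # memory operand (identifier)

    for dest, op, a, b in code:
        reg = free.pop()                # fails if more than two are needed
        asm.append(f"MOVF {operand(a)},{reg}")
        asm.append(f"{OPCODES[op]} {operand(b)},{reg}")
        if dest.startswith("Temp"):
            location[dest] = reg        # keep the temporary in its register
        else:
            asm.append(f"MOVF {reg},{dest}")    # store into the variable
    return asm

optimized = [("Temp1", "*", "id3", "60.0"), ("id1", "+", "id2", "Temp1")]
for line in generate(optimized):
    print(line)
```

A real code generator would also free and reuse registers and choose instructions from the operand types recorded in the symbol table.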
Symbol Table
• It stores the identifiers recognized during lexical
analysis.
• A symbol table is a data structure containing a
record for each identifier, with fields for the
attributes of the identifier.
• The data structure allows us to find the record for
each identifier quickly and to store or retrieve
data from that record quickly.
• Type and scope information is added during
syntax and semantic analysis.
• This information is used during code generation to
decide which instructions to use.
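A hash-based dictionary is one common way to get the fast lookup described above. The class below is a minimal Python sketch; the class name, method names, and the attribute fields type and scope are illustrative assumptions, not a fixed layout.

```python
class SymbolTable:
    def __init__(self):
        self._records = {}      # name -> attribute record (hash lookup)

    def insert(self, name):
        """Enter an identifier (done by the lexical analyzer); return its record."""
        return self._records.setdefault(
            name, {"name": name, "type": None, "scope": None}
        )

    def lookup(self, name):
        """Return the record for `name`, or None if it was never entered."""
        return self._records.get(name)

    def set_attributes(self, name, **attrs):
        """Later phases fill in attributes such as type and scope."""
        self._records[name].update(attrs)

table = SymbolTable()
for identifier in ("position", "initial", "rate"):
    table.insert(identifier)
table.set_attributes("rate", type="real", scope="global")
print(table.lookup("rate"))
```

Inserting the same identifier twice returns the existing record, so each identifier has exactly one entry.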
Error Handler
• Errors can be reported in the form of messages.
Functions of the Lexical Analyzer
1. Removal of comments
e.g. int main()
{
// a=10;
int a=10;
}
2. Removal of white spaces
e.g. int main()
{
// a=10;
int a= 10;
}
The extra white space is discarded, so the declaration
is read as: int a=10;
3. Correlating error messages with the source
program.
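The first two tasks can be sketched with a few lines of Python. This is a simplified illustration: the function names are hypothetical, only // line comments are handled, and string literals that happen to contain // are not protected.

```python
import re

def strip_comments(source):
    """Remove // line comments up to (not including) the end of the line."""
    return re.sub(r"//[^\n]*", "", source)

def squeeze_whitespace(source):
    """Collapse every run of blanks, tabs and newlines into a single space."""
    return " ".join(source.split())

program = "int main()\n{\n// a=10;\nint a= 10;\n}"
cleaned = squeeze_whitespace(strip_comments(program))
print(cleaned)
```

For the third task a scanner typically counts newline characters before discarding them, so that each error message can quote the line number where the problem was found.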
Token, Pattern, Lexeme
• Token: A sequence of characters that has a
collective meaning. Typical tokens are:
1) Identifiers
2) Keywords
3) Operators
4) Special symbols
5) Constants
Examples of tokens:
• Keywords: for, while, if, etc.
• Identifiers: variable names, function names, etc.
• Operators: '+', '++', '-', etc.
Examples of non-tokens:
• Comments, blanks, tabs, newlines, etc.
• Pattern: A set of rules used to identify the
various tokens present in the source program is
called a pattern.
• Lexeme: A lexeme is a sequence of characters
in the source program that is matched by the
pattern for a token.
OR
Lexemes are said to be a sequence of characters
(alphanumeric) in a token.
Example of Tokens-
Consider this expression in the programming language
C:
sum = 3 + 2;
Lexeme    Token category
sum       Identifier
=         Assignment operator
3         Integer literal
+         Addition operator
2         Integer literal
;         End of statement
Token     Lexeme      Pattern
ID        x, y, n0    letter followed by letters and digits
IF        if          if
LPAREN    (           (
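The relation between token, pattern and lexeme can be made concrete with regular expressions: each token category has a pattern, and the substring that a pattern matches is the lexeme. The Python sketch below is a minimal illustration for the statement sum = 3 + 2; and the category names are taken from the table above; the patterns themselves are assumptions chosen for this example.

```python
import re

# (token category, pattern) pairs; order matters when patterns overlap.
TOKEN_SPEC = [
    ("Identifier",          r"[A-Za-z_][A-Za-z0-9_]*"),
    ("Integer literal",     r"\d+"),
    ("Assignment operator", r"="),
    ("Addition operator",   r"\+"),
    ("End of statement",    r";"),
    ("SKIP",                r"\s+"),    # white space is a non-token
]
MASTER = re.compile(
    "|".join(f"(?P<T{i}>{p})" for i, (_, p) in enumerate(TOKEN_SPEC))
)

def lex(text):
    """Return a list of (lexeme, token category) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"no pattern matches at position {pos}")
        category = TOKEN_SPEC[int(m.lastgroup[1:])][0]
        if category != "SKIP":
            tokens.append((m.group(), category))
        pos = m.end()
    return tokens

for lexeme, category in lex("sum = 3 + 2;"):
    print(f"{lexeme:<6}{category}")
```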
• Symbol
• String
• Length of String
• Prefix and Suffix of a string
• Concatenation of two strings
• Reverse of a string
• Operations on strings
• Alphabet
• Language
Recognition of Tokens
– Recognition of identifier.
– Recognition of delimiter.
– Recognition of keywords.
– Recognition of operators.
– Recognition of numbers.
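Once a lexeme has been isolated, recognizing its kind amounts to a sequence of checks. The Python sketch below is a simplified illustration: the keyword, operator and delimiter sets are small assumed subsets, and numbers are limited to unsigned integers.

```python
KEYWORDS = {"if", "else", "while", "for", "int", "return"}   # illustrative subset
OPERATORS = {"+", "-", "*", "/", "=", "++", "--", "=="}
DELIMITERS = {"(", ")", "{", "}", ";", ","}

def classify(lexeme):
    """Return the kind of a single lexeme."""
    if lexeme in KEYWORDS:              # keywords are checked before identifiers
        return "keyword"
    if lexeme in OPERATORS:
        return "operator"
    if lexeme in DELIMITERS:
        return "delimiter"
    if lexeme.isdigit():
        return "number"
    starts_ok = lexeme[:1].isalpha() or lexeme[:1] == "_"
    if starts_ok and all(c.isalnum() or c == "_" for c in lexeme):
        return "identifier"
    return "unknown"

for lexeme in ("while", "count1", "42", "++", ";", "1abc"):
    print(lexeme, classify(lexeme))
```

Checking the keyword set first matters, because every keyword also fits the pattern of an identifier.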
Input Buffer
• Input buffering increases the efficiency of the
compiler: the lexical analyzer reads the source
program in blocks instead of one character at a time.
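The idea can be sketched as a generator that fetches the input one block at a time and then hands out single characters from that block. This Python example is a minimal illustration; the function name and the default block size are assumptions, and it shows a single buffer only.

```python
import io

def characters(stream, block_size=4096):
    """Yield characters one at a time while reading the stream in blocks."""
    while True:
        block = stream.read(block_size)     # one read call per block
        if not block:
            return                          # end of input
        yield from block

source = io.StringIO("int a=10;")
print("".join(characters(source, block_size=4)))
```

The scanner still sees one character at a time, but the expensive read operation happens only once per block.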