
QP Code : 16727

S5 104 CS3501 Reg. No.:


9635 – STELLA MARY’S COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
INTERNAL EXAM – I ( NOV-DEC 2023)
III Year / V Semester
CS3501-COMPILER DESIGN ANSWER KEY
Date & Time: 13/09/2023, 1:20 PM to 3:20 PM                                Max. Marks: 60

Part – A
(Each Question carries 2 Marks)
1. What is a compiler, and why is it essential in program execution?
A compiler is a software tool that translates high-level source code written in a programming language
into machine code, intermediate code, or another low-level format understandable by a computer. This
translation typically occurs in multiple stages, such as lexical analysis, syntax analysis, semantic analysis,
optimization, and code generation. It is essential because processors can execute only machine code: without
compilation a high-level program cannot run, and the compiler additionally detects errors and optimizes the
program before execution.
2. Differentiate between compilation and interpretation with examples
Aspect          | Compilation                                                        | Interpretation
Definition      | Translates the entire source code into machine code before execution. | Executes the source code line-by-line or statement-by-statement.
Output          | Produces an independent executable file (machine code).           | Does not produce a separate executable file.
Execution Speed | Faster since the translation is done beforehand.                  | Slower because each instruction is translated and executed on the fly.
Error Handling  | Errors are detected at compile time before execution.             | Errors are detected during runtime.
Examples        | C, C++, Java (with JVM for bytecode execution).                   | Python, Ruby, JavaScript.
3. Explain ambiguous grammar with a suitable example.
An ambiguous grammar is a grammar in which a single string (or sentence) can have more than one valid parse tree
or derivation. In other words, the grammar allows multiple interpretations of the same input, making it difficult to determine the
intended structure of the string.
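A standard example (supplied here, since the answer above does not show one): consider the grammar
E → E + E | E * E | id
The string id + id * id has two different parse trees:
1. E ⇒ E + E ⇒ id + (E * E) ⇒ id + (id * id), grouping the multiplication first.
2. E ⇒ E * E ⇒ (E + E) * id ⇒ (id + id) * id, grouping the addition first.
Because one string has two distinct parse trees, the grammar is ambiguous; such grammars are usually rewritten, or disambiguated with precedence rules, before parsing.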
4. Enumerate the types of errors that can occur during compilation and describe various recovery modes
Types of Errors

1. Lexical Errors: Errors due to invalid tokens, such as illegal characters or identifiers.
2. Syntax Errors: Errors in the grammar or structure of the program (e.g., missing semicolon or brackets).
3. Semantic Errors: Errors due to incorrect meaning or usage of constructs (e.g., type mismatches).
4. Runtime Errors: Errors that occur during program execution (e.g., division by zero, null pointer access).
5. Logical Errors: Errors in the logic of the program that lead to incorrect outputs.
Recovery Modes
1. Panic Mode Recovery: The parser skips to a predefined set of synchronizing tokens to resume parsing (a sketch in C follows this list).
2. Phrase-Level Recovery: The parser corrects the error by inserting, deleting, or modifying tokens locally.
3. Error Productions: Specific error-handling rules are added to the grammar to detect and recover from common
errors.
4. Global Correction: Analyzes and modifies the entire source code to correct errors with minimal changes.
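As an illustration of panic-mode recovery, the most common strategy, the following self-contained C sketch skips input after an error until a synchronizing token is found. The token stream and the choice of ';' as the synchronizer are invented for this example:

#include <stdio.h>

/* Illustrative sketch of panic-mode recovery: the token stream and the
   choice of ';' as the synchronizing token are invented for this example. */
const char *tokens = "x@@@;y";            /* '@' stands for an erroneous token */
int pos = 0;

int next_token(void) { return tokens[pos] ? tokens[pos++] : EOF; }

int main(void) {
    int t;
    while ((t = next_token()) != EOF) {
        if (t == '@') {                   /* the parser has detected an error */
            printf("syntax error; skipping to ';'\n");
            while (t != ';' && t != EOF)  /* discard tokens until a synchronizer */
                t = next_token();
            printf("resynchronized, parsing resumes\n");
        } else {
            printf("parsed token '%c'\n", t);
        }
    }
    return 0;
}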

5. Compare synthesized attribute and inherited attribute



Aspect            | Synthesized Attribute                                            | Inherited Attribute
Definition        | An attribute whose value is computed from the attributes of child nodes. | An attribute whose value is determined using the attributes of parent or sibling nodes.
Direction of Flow | Information flows bottom-up in the parse tree.                   | Information flows top-down or laterally in the parse tree.
Usage             | Often used for evaluating expressions or computing results.     | Often used for passing contextual information, such as variable scope.
Example           | In E → E1 + T, E.val = E1.val + T.val.                           | In S → id = E, the type of E is inherited from S.

Part – B
(Each question carries 10 Marks)
1) a) Illustrate the structure of LEX with a detailed explanation and a sample program.

Lexical Analysis
It is the first step of compiler design: it takes a stream of characters as input and produces tokens as output, a
process also known as tokenization. Tokens can be classified into identifiers, separators, keywords, operators,
constants, and special characters.
It performs three tasks:
 Tokenization: Converts the stream of characters into tokens.
 Error Messages: Reports errors related to lexical analysis, such as an identifier exceeding the maximum length or an unterminated string.
 Elimination: Removes comments and whitespace such as blank spaces, new lines, and indentation.
What is Lex in Compiler Design?
Lex is a tool (a computer program) that generates lexical analyzers, which convert a stream of characters into tokens.
The Lex tool is itself a compiler: it takes a Lex source file containing pattern–action rules and transforms it into a
C program that recognizes those patterns. It is commonly used with YACC (Yet Another Compiler Compiler). It was
written by Mike Lesk and Eric Schmidt.

Function of Lex
1. In the first step, the source program written in the Lex language, with file name File.l, is given as input
to the Lex compiler (commonly known as Lex), which produces lex.yy.c as output.
2. The output lex.yy.c is then given as input to the C compiler, which produces an executable file a.out;
finally, a.out takes a stream of characters as input and generates tokens as output.

lex.yy.c: It is a C program.
File.l: It is a Lex source program
a.out: It is a Lexical analyzer
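
A sample Lex program, as the question asks (the token classes and message formats here are illustrative), that prints each token it recognizes:

%{
#include <stdio.h>
%}
%%
[0-9]+                  { printf("NUMBER: %s\n", yytext); }
[a-zA-Z_][a-zA-Z0-9_]*  { printf("IDENTIFIER: %s\n", yytext); }
"+"|"-"|"*"|"/"|"="     { printf("OPERATOR: %s\n", yytext); }
[ \t\n]+                { /* skip whitespace */ }
.                       { printf("UNRECOGNIZED: %s\n", yytext); }
%%
int yywrap(void) { return 1; }
int main(void) {
    yylex();
    return 0;
}

Running lex File.l on this specification produces lex.yy.c, which is compiled with cc lex.yy.c to obtain the a.out lexical analyzer described above.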

(b) Describe the structure and working of a compiler with a detailed diagram.

Phases of a Compiler
A compiler broadly has two parts, the analysis phase and the synthesis phase. The analysis phase creates an
intermediate representation from the given source code; the synthesis phase creates an equivalent target program
from that intermediate representation.
A compiler is a software program that converts the high-level source code written in a programming language into low-
level machine code that can be executed by the computer hardware. The process of converting the source code into
machine code involves several phases or stages, which are collectively known as the phases of a compiler. The typical
phases of a compiler are:

1. Lexical Analysis: The first phase of a compiler is lexical analysis, also known as scanning. This phase reads the
source code and breaks it into a stream of tokens, which are the basic units of the programming language. The tokens
are then passed on to the next phase for further processing.
2. Syntax Analysis: The second phase of a compiler is syntax analysis, also known as parsing. This phase takes the
stream of tokens generated by the lexical analysis phase and checks whether they conform to the grammar of the
programming language. The output of this phase is usually an Abstract Syntax Tree (AST).
3. Semantic Analysis: The third phase of a compiler is semantic analysis. This phase checks whether the code is
semantically correct, i.e., whether it conforms to the language’s type
system and other semantic rules. In this stage, the compiler checks the
meaning of the source code to ensure that it makes sense. The compiler
performs type checking, which ensures that variables are used correctly
and that operations are performed on compatible data types. The
compiler also checks for other semantic errors, such as undeclared
variables and incorrect function calls.
4. Intermediate Code Generation: The fourth phase of a compiler is
intermediate code generation. This phase generates an intermediate
representation of the source code that can be easily translated into
machine code.
5. Optimization: The fifth phase of a compiler is optimization. This phase
applies various optimization techniques to the intermediate code to
improve the performance of the generated machine code.
6. Code Generation: The final phase of a compiler is code generation. This phase takes the optimized
intermediate code and generates the actual machine code that can be executed by the target hardware.
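
To make the pipeline concrete, the following sketch (my own illustration; exact representations vary between compilers) traces the assignment a = b + c * 5, the same statement used in answer 10(a), through the six phases:

Source code:          a = b + c * 5;
1. Lexical analysis:  id(a) = id(b) + id(c) * num(5) ;
2. Syntax analysis:   parse tree for a = (b + (c * 5)), respecting operator precedence
3. Semantic analysis: a, b, c, and 5 are checked for type compatibility
4. Intermediate code: t1 = c * 5
                      t2 = b + t1
                      a  = t2
5. Optimization:      t1 = c * 5
                      a  = b + t1        (the temporary t2 is eliminated)
6. Code generation:   target machine instructions (load, multiply, add, store)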
7) a) Construct and minimize the DFA for the given regular expression (a+b).(b+c)
Step 1: Breakdown of the Regular Expression
The given regular expression is:
 (a + b): Matches either “a” or “b”.
 (b + c): Matches either “b” or “c”.
 Concatenation: The sequence matches a string where the first component “a + b” is followed by the second
component “b + c”.

Step 2: NFA Construction

NFA for (a + b):
State 0 → (via a) → State 1
State 0 → (via b) → State 1

NFA for (b + c):
State 2 → (via b) → State 3
State 2 → (via c) → State 3

Concatenation of (a + b).(b + c):
1. Start from State 0 for (a + b).
2. Connect the accepting state of (a + b) (State 1) to the start state of (b + c) (State 2) with an
ε-transition (equivalently, States 1 and 2 may be merged into a single state).
3. The final accepting state is State 3.

NFA Transitions:
State | a | b | c | ε
0     | 1 | 1 | - | -
1     | - | - | - | 2
2     | - | 3 | 3 | -
3     | - | - | - | -

Step 3: DFA Construction

Using the subset construction method (taking ε-closures of the NFA state sets), we derive the DFA:

DFA States and Transitions:

DFA State | NFA States | Input a | Input b | Input c
A (start) | {0}        | B       | B       | -
B         | {1, 2}     | -       | C       | C
C (final) | {3}        | -       | -       | -

Step 4: Minimization of DFA

Check for equivalent states in the DFA:
1. C is accepting while A and B are not, so C is distinct from both.
2. A and B are distinguishable: on input b, A moves to the non-accepting state B, while B moves to the
accepting state C.
3. No states can be merged, so the minimized DFA is the same as the derived DFA.

Minimized DFA Representation

State | Input a | Input b | Input c | Accepting
A     | B       | B       | -       | No
B     | -       | C       | C       | No
C     | -       | -       | -       | Yes

Diagram of Minimized DFA

1. State A: Start state.
o a → State B
o b → State B
2. State B:
o b → State C
o c → State C
3. State C: Accepting state (final).
The minimized DFA accepts exactly the two-symbol strings ab, ac, bb, and bc, as required by (a + b).(b + c).
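
Not part of the original answer key, but as a check, a small C program (state names and the trap-state handling are my own) can simulate this minimized DFA on sample strings:

#include <stdio.h>

/* States of the minimized DFA for (a+b).(b+c); DEAD is a trap state. */
enum { A, B, C, DEAD };

int step(int state, char ch) {
    switch (state) {
    case A:  return (ch == 'a' || ch == 'b') ? B : DEAD;
    case B:  return (ch == 'b' || ch == 'c') ? C : DEAD;
    default: return DEAD;                 /* C and DEAD have no outgoing moves */
    }
}

int accepts(const char *s) {
    int state = A;
    for (; *s; s++)
        state = step(state, *s);
    return state == C;                    /* C is the only accepting state */
}

int main(void) {
    const char *tests[] = { "ab", "ac", "bb", "bc", "aa", "abc", "b" };
    for (int i = 0; i < 7; i++)
        printf("%-3s : %s\n", tests[i], accepts(tests[i]) ? "accept" : "reject");
    return 0;
}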
b) Analyze the process of token recognition and specification with a detailed example.
Recognition and Specification of Tokens
Token recognition is the process of identifying and categorizing the smallest units of a programming language, known
as tokens. Token specification involves defining the rules for recognizing these tokens within the source code.
Recognition of Tokens
Token recognition involves scanning the source code and identifying individual tokens based on predefined rules. For
example, in the statement int num = 10;, the tokens are int, num, =, 10, and ;. The recognition process identifies
these tokens based on the language's syntax and rules.
Specification of Tokens
Token specification defines the rules for recognizing different types of tokens. These rules are typically defined using
regular expressions or finite automata. For example, in a programming language, the token specification for integers
may be defined as a sequence of digits, while the specification for identifiers may involve a combination of letters,
digits, and underscores with certain restrictions.
Example
Consider the following token specifications for a simple programming language:
 Integer: A sequence of one or more digits
 Identifier: A letter followed by zero or more letters, digits, or underscores
Given the input int num = 10;, the token recognition process would identify the following tokens based on the
specifications:
1. Token: int (Keyword)
2. Token: num (Identifier)

3. Token: = (Assignment Operator)


4. Token: 10 (Integer)
5. Token: ; (Semicolon)
In this example, the recognition process applies the token specifications to identify and categorize the individual tokens
within the source code.
By defining and applying token specifications, programming languages can effectively recognize and process the
various elements of source code.
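
In Lex notation, these specifications become pattern–action rules (the token names here are illustrative); the keyword rule is listed before the identifier rule so that int is matched as a keyword rather than an identifier:

"int"                   { return KEYWORD_INT; }   /* keywords before the identifier rule */
[0-9]+                  { return INTEGER; }
[a-zA-Z][a-zA-Z0-9_]*   { return IDENTIFIER; }
"="                     { return ASSIGN; }
";"                     { return SEMICOLON; }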
8) a) Construct a Predictive Parser for the following grammar and check the input string "id+id*id":
E → E + T | T
T → T * F | F
F → (E) | id
To construct a Predictive Parser for the given grammar and check the input string "id+id*id", follow these steps:

Step 1: Grammar Analysis

The given grammar is:
1. E → E + T | T
2. T → T * F | F
3. F → (E) | id
This grammar is left-recursive, which must be eliminated before creating the predictive parser.

Step 2: Eliminating Left Recursion


Eliminate left recursion for the grammar.
1. For E → E + T | T:
Rewrite as:
E → T E'
E' → + T E' | ε
2. For T → T * F | F:
Rewrite as:
T → F T'
T' → * F T' | ε
3. F does not have left recursion, so it remains:
F → (E) | id
The new grammar is:
1. E → T E'
2. E' → + T E' | ε
3. T → F T'
4. T' → * F T' | ε
5. F → (E) | id

Step 3: Compute First and Follow Sets


First Sets
Non-terminal | First Set
E            | { id, ( }
E'           | { +, ε }
T            | { id, ( }
T'           | { *, ε }
F            | { id, ( }
Follow Sets

Non-terminal | Follow Set
E            | { ), $ }
E'           | { ), $ }
T            | { +, ), $ }
T'           | { +, ), $ }
F            | { *, +, ), $ }

Step 4: Construct Predictive Parsing Table


Based on the First and Follow sets, the predictive parsing table is:
Non-terminal | id       | (        | +          | *           | )       | $
E            | E → TE'  | E → TE'  |            |             |         |
E'           |          |          | E' → +TE'  |             | E' → ε  | E' → ε
T            | T → FT'  | T → FT'  |            |             |         |
T'           |          |          | T' → ε     | T' → *FT'   | T' → ε  | T' → ε
F            | F → id   | F → (E)  |            |             |         |

Step 5: Parse the Input String "id+id*id"


Input String: "id+id*id$"
Stack Initialization: E$
Step | Stack        | Input      | Action
1    | E $          | id+id*id$  | Apply E → T E'
2    | T E' $       | id+id*id$  | Apply T → F T'
3    | F T' E' $    | id+id*id$  | Apply F → id, match id
4    | T' E' $      | +id*id$    | Apply T' → ε
5    | E' $         | +id*id$    | Apply E' → + T E'
6    | + T E' $     | +id*id$    | Match +
7    | T E' $       | id*id$     | Apply T → F T'
8    | F T' E' $    | id*id$     | Apply F → id, match id
9    | T' E' $      | *id$       | Apply T' → * F T'
10   | * F T' E' $  | *id$       | Match *
11   | F T' E' $    | id$        | Apply F → id, match id
12   | T' E' $      | $          | Apply T' → ε
13   | E' $         | $          | Apply E' → ε
14   | $            | $          | Accept
Result:
The input string "id+id*id" is successfully parsed using the predictive parser.
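
One way to realize this table-driven parser in code is sketched below (not part of the original answer; the single-character encoding of E', T', and id, and the if-else encoding of the table, are my own choices):

#include <stdio.h>
#include <string.h>

/* Non-terminals encoded as single characters: E, T, F as themselves,
   E' as 'e', T' as 't'; the token id is abbreviated to 'i'. */
char stk[100];
int top = 0;

void push(const char *body) {            /* push a production body, rightmost symbol first */
    for (int i = (int)strlen(body) - 1; i >= 0; i--)
        stk[top++] = body[i];
}

int main(void) {
    const char *input = "i+i*i$";        /* id+id*id followed by the end marker */
    push("E$");                          /* start symbol on top of the $ marker */
    while (top > 0) {
        char X = stk[--top], a = *input;
        if (X == a) { input++; continue; }                         /* match a terminal */
        if      (X == 'E' && (a == 'i' || a == '(')) push("Te");   /* E  -> T E'   */
        else if (X == 'e' && a == '+')               push("+Te");  /* E' -> + T E' */
        else if (X == 'e' && (a == ')' || a == '$')) ;             /* E' -> ε      */
        else if (X == 'T' && (a == 'i' || a == '(')) push("Ft");   /* T  -> F T'   */
        else if (X == 't' && a == '*')               push("*Ft");  /* T' -> * F T' */
        else if (X == 't' && (a == '+' || a == ')' || a == '$')) ; /* T' -> ε      */
        else if (X == 'F' && a == 'i')               push("i");    /* F  -> id     */
        else if (X == 'F' && a == '(')               push("(E)");  /* F  -> ( E )  */
        else { printf("Reject\n"); return 1; }
    }
    printf(*input == '\0' ? "Accept\n" : "Reject\n");
    return 0;
}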
(b) Explain the YACC tool with an example program.
YACC Tool Explanation with Example Program
Introduction to YACC
YACC (Yet Another Compiler Compiler) is a tool used for creating parsers for context-free grammars. It works
with a lexical analyzer (like Lex) to build language-processing applications like compilers and interpreters.
YACC takes grammar rules as input and generates C code to parse them. Developers can attach actions to these
rules to handle parsed input programmatically.
Structure of a YACC Program
A YACC program has three sections separated by %%:
1. Declarations Section: Contains token declarations, C headers, and global variables.
2. Rules Section: Contains grammar rules and actions executed when rules are recognized.
3. Auxiliary Code Section: Contains helper functions, the main() function, and error-handling routines.
Example YACC Program
Below is a simple calculator that evaluates expressions with addition (+) and multiplication (*).
YACC Code:
%{
#include <stdio.h>
#include <stdlib.h>

int yylex(void);              /* supplied by a separate lexer (see the Lex file sketched below) */
int yyerror(char *s);
%}
%token NUMBER
%%
expr: expr '+' term { printf("Sum: %d\n", $1 + $3); $$ = $1 + $3; }
| term { $$ = $1; }
;
term: term '*' factor { printf("Product: %d\n", $1 * $3); $$ = $1 * $3; }
| factor { $$ = $1; }
;
factor: '(' expr ')' { $$ = $2; }
| NUMBER { $$ = $1; }
;
%%
int main() {
printf("Enter an expression: ");
yyparse();
return 0;
}
int yyerror(char *s) {
fprintf(stderr, "Error: %s\n", s);
return 0;
}
Code Explanation
1. Declarations Section:
o %{ and %} enclose C code to include in the output file.
o #include statements add required libraries.
o %token NUMBER declares tokens used in the grammar.
2. Rules Section:
o Contains grammar rules (e.g., expr: expr '+' term).
o Actions in {} specify what happens when a rule is matched (e.g., $1 and $3 refer to rule components, and $$
stores results).
3. Auxiliary Code Section:
o main() runs the parser with yyparse().
o yyerror() handles syntax errors.
Input and Output Example
Input:
3+4*5
Output:
Product: 20
Sum: 23
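The YACC file above declares NUMBER but does not include a lexer, so yyparse() needs a yylex() from somewhere. A matching Lex specification could look like the following sketch (the file names and the choice to end input at a newline are my own assumptions):

%{
#include "y.tab.h"    /* token codes such as NUMBER, generated by yacc -d */
#include <stdlib.h>
%}
%%
[0-9]+      { yylval = atoi(yytext); return NUMBER; }
[ \t]+      { /* skip blanks */ }
\n          { return 0;  /* treat end of line as end of input */ }
.           { return yytext[0];  /* pass +, *, ( and ) through unchanged */ }
%%
int yywrap(void) { return 1; }

Assuming the files are named calc.y and calc.l, a typical build would be: yacc -d calc.y && lex calc.l && cc y.tab.c lex.yy.c -o calc.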
9) a) YACC Tool Explanation

YACC (Yet Another Compiler-Compiler) is a tool used to generate parsers. It is commonly used in compiler design
to handle syntax analysis. YACC takes a formal grammar as input, typically in Backus-Naur Form (BNF), and
generates a parser in the C programming language. The parser evaluates whether an input string conforms to the
specified grammar and performs specified actions.
How It Works
1. Lexer: The yylex() function processes the input to generate tokens (e.g., NUMBER and operator characters such as '+').
2. Parser: The parser generated by YACC uses the grammar to parse the input.
3. Action Execution: When rules are reduced, the corresponding actions (e.g., arithmetic operations) are executed.
Simpler YACC Program: Adding Two Numbers
%{
#include <stdio.h>
#include <stdlib.h>

int yylex();
void yyerror(const char *s);
%}

%token NUMBER

%%

sum:
NUMBER '+' NUMBER {
printf("Sum: %d\n", $1 + $3);
}
;

%%

int main() {
printf("Enter two numbers to add (e.g., 3 + 5): ");
yyparse();
return 0;
}

void yyerror(const char *s) {


fprintf(stderr, "Error: %s\n", s);
}

int yylex() {
    int c = getchar();
    while (c == ' ' || c == '\t')   /* skip blanks so input like "3 + 5" works */
        c = getchar();
    if (c >= '0' && c <= '9') {
        ungetc(c, stdin);
        scanf("%d", &yylval);
        return NUMBER;
    }
    if (c == '\n' || c == EOF)
        return 0;                   /* report end of input to the parser */
    return c;
}
Example Input/Output
Input:
4+5
Output:
Sum: 9
b) Illustrate a recursive descent parser / shift-reduce parser for the following grammar and check the
input string "id+id-id":
E → E + E | E - E | id
Shift-Reduce Parser
The Shift-Reduce Parser uses a stack to shift input symbols and reduce them using grammar
rules.
Parsing Table (Stack + Input + Action):
Stack   | Input          | Action
(empty) | id + id - id $ | Shift id
id      | + id - id $    | Reduce by E → id
E       | + id - id $    | Shift +
E +     | id - id $      | Shift id
E + id  | - id $         | Reduce by E → id
E + E   | - id $         | Reduce by E → E + E
E       | - id $         | Shift -
E -     | id $           | Shift id
E - id  | $              | Reduce by E → id
E - E   | $              | Reduce by E → E - E
E       | $              | Accept (parsing complete)
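
The question also mentions a recursive descent parser. The grammar E → E + E | E - E | id is ambiguous and left-recursive, so it cannot be used directly; after rewriting it as E → id E' and E' → + id E' | - id E' | ε (a standard transformation applied here), a recursive-descent sketch in C is:

#include <stdio.h>
#include <stdlib.h>

/* Grammar after removing left recursion and ambiguity:
   E  -> id E'
   E' -> + id E' | - id E' | ε
   The input id+id-id is abbreviated to "i+i-i". */

const char *in = "i+i-i";
char look;                               /* one-symbol lookahead */

void next(void) { look = *in ? *in++ : '$'; }
void reject(void) { printf("Reject\n"); exit(1); }
void match_id(void) { if (look == 'i') next(); else reject(); }

void Eprime(void) {
    if (look == '+' || look == '-') {    /* E' -> + id E' | - id E' */
        next();
        match_id();
        Eprime();
    }                                    /* otherwise E' -> ε */
}

void E(void) {                           /* E -> id E' */
    match_id();
    Eprime();
}

int main(void) {
    next();
    E();
    printf(look == '$' ? "Accept\n" : "Reject\n");
    return 0;
}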

10) a) Explain in detail about intermediate languages with an example.


Intermediate Languages in Compiler Design

Introduction
An intermediate language (IL) is a representation of a program between the source code and machine code
during the compilation process. It is used to improve portability, simplify optimization, and ease the generation of
machine-specific code.
Compilers typically translate source code into an intermediate language before generating machine code. This
approach divides the compilation process into manageable stages, enhancing the modularity and reusability of the
compiler components.

Characteristics of Intermediate Languages


1. Platform Independence: IL is often designed to be independent of the target machine architecture, making
it portable.
2. Simplified Syntax: It has fewer constructs than high-level languages, making it easier to optimize.
3. Close to Assembly: IL is closer to machine code than high-level languages but retains some abstractions
for easier analysis and transformation.
4. Ease of Optimization: IL enables optimization techniques, such as loop unrolling or constant folding,
before generating the final code.

Types of Intermediate Languages


1. Three-Address Code (TAC): Instructions have at most three operands, often in the form of x = y op z.
2. Stack-Based Code: Operations are performed using a stack, avoiding explicit operands (e.g., PUSH, POP,
ADD).
3. Control Flow Graph (CFG): Represents the flow of control in the program, useful for advanced
optimizations.

Example of Intermediate Language


Consider the following high-level code:
int a, b, c;
a = b + c * 5;
The translation into an intermediate language (TAC) might look like this:
t1 = 5
t2 = c * t1
a = b + t2
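For comparison (my own illustration, using a generic stack-machine notation rather than any specific instruction set), the same statement in stack-based code would be:
PUSH b      ; push the value of b onto the stack
PUSH c
PUSH 5
MUL         ; pops c and 5, pushes c * 5
ADD         ; pops b and (c * 5), pushes the sum
STORE a     ; pops the result into a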
Benefits of Using Intermediate Languages


1. Portability: The compiler backend can generate machine code for different architectures from the same IL.
2. Simplified Design: Breaking down compilation into stages reduces complexity.
3. Optimization: IL provides a simpler structure for applying optimization algorithms.
4. Reusability: The same intermediate representation can be reused for multiple source languages or target
platforms.

Applications
1. Java Bytecode: Java programs are compiled into bytecode, an intermediate language executed by the Java
Virtual Machine (JVM).
2. .NET Common Intermediate Language (CIL): C#, VB.NET, and other languages are compiled into CIL
for execution by the .NET runtime.
3. LLVM Intermediate Representation (LLVM IR): A low-level, strongly-typed IL used for program
analysis and optimization in the LLVM framework.
b) Analyze Syntax-Directed Definitions and the annotated parse tree for a simple desk calculator.

Syntax-Directed Definitions (SDD) and Annotated Parse Trees


Introduction to Syntax-Directed Definitions
A Syntax-Directed Definition (SDD) is a formal specification in a context-free grammar that associates semantic
rules with grammar productions. These semantic rules define how attributes are computed for the symbols in the
grammar. Attributes can be:
1. Synthesized Attributes: Values computed from the attributes of children in the parse tree.
2. Inherited Attributes: Values passed from parent nodes or siblings in the parse tree.
SDDs are used to specify the semantics of programming languages, such as type checking, intermediate code
generation, or expression evaluation.

Example: Desk Calculator


A simple desk calculator evaluates arithmetic expressions involving addition (+), multiplication (*), and integers.
The grammar and corresponding semantic rules are defined as follows:
Grammar:
E → E '+' T { E.val = E1.val + T.val }
E→T { E.val = T.val }
T → T '*' F { T.val = T1.val * F.val }
T→F { T.val = F.val }
F → '(' E ')' { F.val = E.val }
F → NUM { F.val = NUM.val }
Semantic Rules Explanation:
 E.val, T.val, and F.val are synthesized attributes representing the value of expressions.
 NUM.val is the numerical value of a number.
 The rules define how these values are computed based on the operators and operands.

Annotated Parse Tree for Input


Consider the input:
3+4*5
Parse Tree, annotated with attribute values (* binds tighter than +, so 4 * 5 groups first):

E (val = 23)
├─ E (val = 3)
│  └─ T (val = 3)
│     └─ F (val = 3)
│        └─ NUM (val = 3)
├─ '+'
└─ T (val = 20)
   ├─ T (val = 4)
   │  └─ F (val = 4)
   │     └─ NUM (val = 4)
   ├─ '*'
   └─ F (val = 5)
      └─ NUM (val = 5)

Evaluation Steps:
1. NUM(3).val = 3, NUM(4).val = 4, NUM(5).val = 5.
2. F.val = NUM.val: F(3).val = 3, F(4).val = 4, F(5).val = 5.
3. T(4 * 5).val = T(4).val * F(5).val = 4 * 5 = 20.
4. E(3 + 4 * 5).val = E(3).val + T(4 * 5).val = 3 + 20 = 23.
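
Because all attributes here are synthesized, the SDD maps directly onto bottom-up parser actions. As a sketch in YACC notation (mirroring the calculator of question 8(b), with yylval supplying NUM.val), the rules section would be:

expr : expr '+' term   { $$ = $1 + $3; }   /* E.val = E1.val + T.val */
     | term            { $$ = $1; }        /* E.val = T.val          */
     ;
term : term '*' factor { $$ = $1 * $3; }   /* T.val = T1.val * F.val */
     | factor          { $$ = $1; }        /* T.val = F.val          */
     ;
factor : '(' expr ')'  { $$ = $2; }        /* F.val = E.val          */
       | NUMBER        { $$ = $1; }        /* F.val = NUM.val        */
       ;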
