Midsem Exam CD
Unit-1
1. How many phases are there in a compiler? Explain each phase in detail. [L2, 10M]
A compiler typically consists of several phases, which can be broadly categorized into two main sections: front end
and back end. The major phases include:
1. Lexical Analysis: This is the first phase, where the source code is scanned to convert it into a sequence of tokens.
The lexical analyzer removes whitespace and comments, identifying valid sequences of characters as tokens.
2. Syntax Analysis: Also known as parsing, this phase checks the tokens against the grammar rules of the language. It
constructs a parse tree or syntax tree, which represents the grammatical structure of the source code.
3. Semantic Analysis: In this phase, the compiler checks for semantic errors. It verifies that the parse tree follows the
semantic rules of the language, such as type checking and scope resolution.
4. Intermediate Code Generation: The compiler generates an intermediate representation of the source code. This
representation is easier to manipulate than the source code and closer to the target machine code.
5. Code Optimization: This phase improves the intermediate code to make it more efficient. Various optimization
techniques are applied to enhance performance without altering the program's output.
6. Code Generation: The final phase generates the target machine code from the intermediate representation. The
generated code is usually in a form that can be executed directly by the target machine.
7. Machine-Dependent Code Optimization (Back End): Additional, machine-dependent optimizations may be applied to the generated code to further improve it before the final output is produced.
2. a) Explain the role of the Lexical Analyzer.
Lexical analysis is the first phase of compiler design, and its primary role is to convert the input source code into a stream of tokens. This process involves several key functions:
- Token Generation: It identifies keywords, identifiers, literals, and operators, generating tokens that represent these
elements.
- Removing Unwanted Characters: The lexical analyzer ignores comments and whitespace, which are not needed for
further processing.
- Error Detection: It performs basic error checking for invalid tokens and reports errors early in the compilation
process.
- Facilitating Parsing: The stream of tokens generated is passed to the syntax analyzer, making it easier to analyze the
grammatical structure of the program.
2. b) Explain Input Buffering with simple examples. [L2, 5M]
Input buffering is a technique used in lexical analysis to read input characters from the source file efficiently. It maintains two buffer halves that are filled alternately, together with two pointers (lexemeBegin and forward) that delimit the current lexeme, so that the number of I/O operations is minimized; a sentinel character at the end of each half avoids a separate end-of-buffer test on every character read.
Example:
- Consider a source code line: `position := initial + rate * 60;`
- The input buffering process may involve two buffers:
- Buffer 1: Contains the first half of the input (e.g., `position := initial +`).
- Buffer 2: Contains the second half (e.g., `rate * 60;`).
- The lexical analyzer reads from the active buffer. If it reaches the end of Buffer 1, it quickly switches to Buffer 2
without needing to access the disk again, improving efficiency.
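As a minimal sketch (the class and method names are illustrative, not part of the answer), the two-buffer idea can be written in Python: a buffer half is refilled only when the forward pointer crosses its end, so most characters are delivered without any I/O call.
import io

BUF_SIZE = 16                                # unrealistically small, for illustration

class TwoBufferReader:
    def __init__(self, source):
        self.source = source                 # any file-like object with .read()
        self.buffers = ["", ""]
        self.active = 0                      # which half 'forward' is currently in
        self.forward = 0
        self._fill(0)

    def _fill(self, which):
        self.buffers[which] = self.source.read(BUF_SIZE)

    def next_char(self):
        buf = self.buffers[self.active]
        if self.forward == len(buf):         # reached the end of this half
            self.active = 1 - self.active    # switch to the other half
            self._fill(self.active)          # refill it from the source
            self.forward = 0
            buf = self.buffers[self.active]
            if not buf:
                return ""                    # end of input
        ch = buf[self.forward]
        self.forward += 1
        return ch

reader = TwoBufferReader(io.StringIO("position := initial + rate * 60;"))
print("".join(iter(reader.next_char, "")))   # reads the whole statement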
3. Explain about Language Processor in compiler design. [L2, 10M]
A language processor is a system that translates high-level programming languages into machine-readable formats. It
encompasses various components, including compilers, interpreters, assemblers, and loaders.
- Compilers: Convert high-level code into machine code in one go, producing an object file. This process involves
several phases, including lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code
optimization, and code generation.
- Interpreters: Translate high-level code into machine code line-by-line or statement-by-statement during execution,
without producing an object file. They are often used for scripting languages.
- Assemblers: Translate assembly language (low-level language) into machine code. They perform similar functions to
compilers but work with a lower-level representation.
- Loaders: Load the generated machine code into memory for execution. They prepare the executable file to be run
by the operating system.
In summary, language processors play a crucial role in translating and executing programs written in high-level
languages, making them understandable and executable by computers.
a) Specification of Tokens
The specification of tokens defines the rules and patterns for recognizing valid tokens in a programming language. It usually involves:
- Token Types: Common token types include keywords, identifiers, literals, operators, and punctuation. Each type has
specific rules for its formation.
- Regular Expressions: These are used to describe the patterns for each token type. For example, an identifier in many
programming languages can be specified using the regular expression `[a-zA-Z_][a-zA-Z0-9_]*`, indicating that it must
start with a letter or underscore, followed by letters, digits, or underscores.
The specifications ensure that the lexical analyzer can accurately identify and classify tokens during the compilation
process.
b) Recognition of Tokens
Recognition of tokens involves the process of scanning the input source code and matching substrings against the
defined patterns in the token specifications. This can be achieved using finite automata, where:
- The lexical analyzer reads the input characters sequentially.
- It transitions through states based on the current character and the defined token specifications.
- When a complete token is recognized, it is added to the token stream, and the process continues until all characters
in the source code are processed.
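A minimal sketch of this in Python (the function is an assumption, not part of the answer): a hand-coded finite automaton for the identifier pattern, with state 0 as the start state and state 1 as the accepting state.
def recognize_identifier(text, pos):
    state = 0
    start = pos
    while pos < len(text):
        ch = text[pos]
        if state == 0 and (ch.isalpha() or ch == "_"):
            state = 1                        # first character accepted
        elif state == 1 and (ch.isalnum() or ch == "_"):
            state = 1                        # stay in the accepting state
        else:
            break                            # no transition: stop scanning
        pos += 1
    if state == 1:
        return text[start:pos], pos          # the lexeme and the new position
    return None, start                       # no identifier starts here

print(recognize_identifier("rate * 60", 0))  # ('rate', 4)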
To design a compiler for the source program `position := initial + rate * 60`, follow these steps:
1. Lexical Analysis:
- Define tokens: `IDENTIFIER (position, initial, rate)`, `ASSIGNMENT (:=)`, `ARITHMETIC OPERATORS (+, *)`, `NUMBER
(60)`.
- Use LEX to generate tokens from the input.
2. Syntax Analysis:
- Construct a syntax tree according to the grammar rules for assignment statements and arithmetic expressions (with `*` binding tighter than `+`).
- The syntax tree represents the assignment operation, breaking down the expression on the right-hand side.
3. Semantic Analysis:
- Verify that `initial` and `rate` are defined and of compatible types for addition and multiplication.
- Ensure the `position` variable is assigned correctly.
4. Intermediate Code Generation:
- Produce three-address code for the statement:
t1 = rate * 60
position = initial + t1
5. Code Optimization:
- Optimize the intermediate code if possible; for this statement, techniques such as constant folding or strength reduction could simplify the multiplication by the constant 60 if more context were known.
6. Code Generation:
- Generate target machine code that performs the operations represented in the intermediate code.
Applications of compiler construction techniques include:
- Programming Language Implementation: Compilers are essential for translating high-level programming languages into machine code for execution.
- Syntax and Semantic Checkers: Compilers help in creating tools that check code for syntax and semantic errors.
- Code Optimization: Compiler techniques can be used to optimize programs for better performance, which is
essential in resource-constrained environments.
- Programming Tools: Integrated Development Environments (IDEs) utilize compiler technology to provide features
like syntax highlighting, code completion, and error checking.
- Cross-Compilers: These allow developers to build applications for different platforms using the same source code,
expanding the portability of applications.
b) Specification of Tokens
The specification of tokens involves defining the rules for recognizing various token types in a programming language.
This includes:
- Regular Expressions: Used to describe the patterns that define each token type, such as identifiers, keywords,
literals, and operators.
- Token Categories: Each token type (e.g., identifier, keyword, operator) has a unique specification that the lexical
analyzer uses to classify input sequences.
- Example Specifications:
- Identifiers: `[a-zA-Z_][a-zA-Z0-9_]*`
- Numbers: `[0-9]+`
- Keywords: `if|else|while|return`
The specifications are essential for the lexical analysis phase of the compiler.
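As an illustration (the token names and the small driver below are assumptions, not part of the answer), these specifications can be turned into a working scanner with Python regular expressions; listing the keyword pattern before the identifier pattern makes reserved words win:
import re

TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:if|else|while|return)\b"),
    ("IDENTIFIER", r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("NUMBER",     r"[0-9]+"),
    ("OPERATOR",   r"[+\-*/=<>]"),
    ("SKIP",       r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(code):
    for match in MASTER.finditer(code):
        if match.lastgroup != "SKIP":        # whitespace is discarded
            yield (match.lastgroup, match.group())

print(list(tokenize("while x1 < 10 return x1")))
# [('KEYWORD', 'while'), ('IDENTIFIER', 'x1'), ('OPERATOR', '<'),
#  ('NUMBER', '10'), ('KEYWORD', 'return'), ('IDENTIFIER', 'x1')]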
The following terms are fundamental to lexical analysis:
- Tokens: Tokens are the basic building blocks of the source code, representing a category of lexemes. For example, the token type could be `IDENTIFIER` or `NUMBER`.
- Patterns: Patterns are the rules or regular expressions that describe how tokens can be recognized. For example, the
pattern for an `IDENTIFIER` token might be `[a-zA-Z_][a-zA-Z0-9_]*`.
- Lexeme: A lexeme is a specific sequence of characters in the source code that matches a token type according to its
pattern. For example, in the statement `x = 10;`, `x` is a lexeme corresponding to the `IDENTIFIER` token.
- Regular Expressions: A regular expression is a sequence of characters that defines a search pattern. It is used for
pattern matching within strings, commonly utilized in lexical analysis to specify token patterns.
- Regular Grammar: A regular grammar is a type of formal grammar that describes regular languages. It consists of
production rules with a specific structure that allows for the generation of strings in a regular language, typically used
in defining the syntax of tokens in a programming language.
10. e) List the various error recovery strategies for lexical analysis. [L1, 2M]
1. Panic Mode Recovery: The analyzer discards input until a designated token (usually a delimiter) is found, allowing it
to recover and continue processing.
2. Phrase Level Recovery: Attempts to correct errors by replacing or inserting tokens to form valid phrases based on
the expected token types.
3. Error Productions: Including specific rules in the grammar that define how to handle errors, allowing the lexer to
recognize and recover from specific types of lexical errors.
4. Error Reporting: Providing detailed feedback about the nature and location of the errors to aid debugging.
These strategies ensure that the lexical analyzer can continue processing the input even when errors are
encountered, improving robustness and user experience.
Unit-2
1. a) Construct the recursive descent parser for the following grammar? [L4, 5M]
Grammar:
E -> E + T | T
T -> T * F | F
F -> (E) | id
Since the grammar is left recursive, the parser handles `E -> E + T` and `T -> T * F` with while-loops, which is equivalent to eliminating the left recursion:
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.current_token = 0

    def parse(self):
        return self.E()

    def E(self):
        node = self.T()
        while self.current_token < len(self.tokens) and self.tokens[self.current_token] == '+':
            self.current_token += 1              # Consume '+'
            node = ('E', node, self.T())         # Create an E node
        return node

    def T(self):
        node = self.F()
        while self.current_token < len(self.tokens) and self.tokens[self.current_token] == '*':
            self.current_token += 1              # Consume '*'
            node = ('T', node, self.F())         # Create a T node
        return node

    def F(self):
        if self.tokens[self.current_token] == '(':
            self.current_token += 1              # Consume '('
            node = self.E()
            if self.tokens[self.current_token] != ')':
                raise SyntaxError("Expected ')'")
            self.current_token += 1              # Consume ')'
            return ('F', node)
        elif self.tokens[self.current_token] == 'id':
            node = ('F', 'id')                   # Create an F node for id
            self.current_token += 1              # Consume 'id'
            return node
        else:
            raise SyntaxError("Unexpected token")

# Example usage
tokens = ['id', '+', 'id', '*', 'id']            # Example token list
parser = Parser(tokens)
syntax_tree = parser.parse()
print(syntax_tree)
1. b) Explain about Left factoring and Left Recursion with examples? [L2, 5M]
Left Factoring:
Left factoring is a grammar transformation that removes a common prefix shared by two or more alternatives of a non-terminal, so that a top-down parser can choose a production without backtracking.
Example:
- Original Grammar:
A -> aB | aC
- After Left Factoring:
A -> aA'
A' -> B | C
Left Recursion:
Left recursion occurs when a non-terminal appears as the leftmost symbol of one of its own productions. It causes infinite recursion in a top-down parser.
Example:
- Original Grammar:
A -> Aα | β
- After Eliminating Left Recursion:
A -> βA'
A' -> αA' | ε
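For instance, applying the same transformation to the expression grammar used in the later questions, `E -> E + T | T` and `T -> T * F | F` become:
E -> T E'
E' -> + T E' | ε
T -> F T'
T' -> * F T' | ε
This non-left-recursive form is the one a predictive parser uses (see Question 4 below).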
2. Define augmented grammar? Construct the LR(0) items for the following Grammar? [L1, 10M]
Augmented Grammar:
An augmented grammar is a modified version of a grammar that includes a new start symbol and an additional
production that represents the original start symbol. This helps in parsing by providing a clear point of entry.
Given Grammar:
S -> L = R
S -> R
L -> * R
L -> id
R -> L
LR(0) Items:
1. An LR(0) item is a production with a dot (•) marking how much of the production has been seen.
2. Augment the grammar with S' -> S; the initial item is:
S' -> •S
3. The initial state I0 is the closure of this item:
S' -> •S
S -> •L = R
S -> •R
L -> •* R
L -> •id
R -> •L
4. The remaining item sets are obtained by computing goto(I, X) for every grammar symbol X (goto(I0, S), goto(I0, L), goto(I0, R), goto(I0, *), goto(I0, id), and so on), taking the closure of each result, and repeating until no new item sets appear.
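A minimal Python sketch of the closure computation (the data structures and names here are assumptions, used only to illustrate how I0 is built):
GRAMMAR = {
    "S'": [["S"]],
    "S": [["L", "=", "R"], ["R"]],
    "L": [["*", "R"], ["id"]],
    "R": [["L"]],
}
NONTERMINALS = set(GRAMMAR)

def closure(items):
    # items: set of LR(0) items, each item is (lhs, rhs-as-tuple, dot position)
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                # the dot is before a non-terminal: add all of its productions
                for prod in GRAMMAR[rhs[dot]]:
                    new_item = (rhs[dot], tuple(prod), 0)
                    if new_item not in items:
                        items.add(new_item)
                        changed = True
    return items

I0 = closure({("S'", ("S",), 0)})
for lhs, rhs, dot in sorted(I0):
    print(lhs, "->", " ".join(rhs[:dot]), "•", " ".join(rhs[dot:]))
Printing shows the six items of I0; the goto sets I1, I2, ... are obtained by moving the dot over each symbol and closing again.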
3. Calculate FIRST and FOLLOW for the following grammar? [L3, 5M]
Grammar:
E -> E + T | T
T -> T * F | F
F -> (E) | id
FIRST Sets:
- FIRST(E) = { `(`, `id` }
- FIRST(T) = { `(`, `id` }
- FIRST(F) = { `(`, `id` }
FOLLOW Sets:
- FOLLOW(E) = { `+`, `)`, `$` } (E is followed by `+` in `E -> E + T`, by `)` in `F -> (E)`, and by `$` as the start symbol)
- FOLLOW(T) = { `+`, `*`, `)`, `$` } (everything in FOLLOW(E), plus `*` from `T -> T * F`)
- FOLLOW(F) = { `+`, `*`, `)`, `$` } (everything in FOLLOW(T))
Grammar:
S -> xABC
A -> a | bbD
B -> a | ε
C -> b | ε
D -> c | ε
FIRST Sets:
- FIRST(S) = { `x` }
- FIRST(A) = { `a`, `b` }
- FIRST(B) = { `a`, ε }
- FIRST(C) = { `b`, ε }
- FIRST(D) = { `c`, ε }
FOLLOW Sets:
- FOLLOW(S) = { `$` }
- FOLLOW(A) = { `a`, `b`, `$` } (FIRST(B); since B and C can derive ε, also FIRST(C) and FOLLOW(S))
- FOLLOW(B) = { `b`, `$` } (FIRST(C); since C can derive ε, also FOLLOW(S))
- FOLLOW(C) = { `$` } (C is the last symbol of `S -> xABC`)
- FOLLOW(D) = { `a`, `b`, `$` } (D is the last symbol of `A -> bbD`, so FOLLOW(D) = FOLLOW(A))
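The same sets can be checked mechanically. Below is a minimal Python sketch (the names PRODUCTIONS, first_of, and so on are assumptions, not part of the answer) that iterates to a fixed point and computes FIRST and FOLLOW for this grammar:
EPS = "ε"
PRODUCTIONS = {
    "S": [["x", "A", "B", "C"]],
    "A": [["a"], ["b", "b", "D"]],
    "B": [["a"], [EPS]],
    "C": [["b"], [EPS]],
    "D": [["c"], [EPS]],
}
NONTERMS = set(PRODUCTIONS)
START = "S"

def first_of(symbols, first):
    # FIRST of a sequence of grammar symbols
    result = set()
    for sym in symbols:
        f = first[sym] if sym in NONTERMS else {sym}
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)              # every symbol can derive ε (or the sequence is empty)
    return result

first = {nt: set() for nt in NONTERMS}
follow = {nt: set() for nt in NONTERMS}
follow[START].add("$")

changed = True
while changed:                   # repeat until no set grows any further
    changed = False
    for lhs, alternatives in PRODUCTIONS.items():
        for rhs in alternatives:
            rhs = [s for s in rhs if s != EPS]
            f = first_of(rhs, first)
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
            for i, sym in enumerate(rhs):
                if sym not in NONTERMS:
                    continue
                trailer = first_of(rhs[i + 1:], first)
                new = trailer - {EPS}
                if EPS in trailer:
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True

print(first)    # FIRST(A) = {'a', 'b'}, FIRST(B) = {'a', 'ε'}, ...
print(follow)   # FOLLOW(A) = {'a', 'b', '$'}, FOLLOW(D) = {'a', 'b', '$'}, ...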
4. Construct Predictive Parse Table for the grammar E->E+T/T, T->T*F/F, F->(E)|id and parse the string id+id*id. [L3,
5M]
Grammar:
E -> E + T | T
T -> T * F | F
F -> (E) | id
Since the grammar is left recursive, the left recursion must first be eliminated before an LL(1) predictive table can be built:
E -> T E'
E' -> + T E' | ε
T -> F T'
T' -> * F T' | ε
F -> (E) | id
Predictive Parse Table:
| Non-Terminal | id       | +          | *           | (        | )       | $       |
|--------------|----------|------------|-------------|----------|---------|---------|
| E            | E -> TE' |            |             | E -> TE' |         |         |
| E'           |          | E' -> +TE' |             |          | E' -> ε | E' -> ε |
| T            | T -> FT' |            |             | T -> FT' |         |         |
| T'           |          | T' -> ε    | T' -> *FT'  |          | T' -> ε | T' -> ε |
| F            | F -> id  |            |             | F -> (E) |         |         |
Parsing `id + id * id` (top of the stack is at the right; `$` marks the bottom of the stack and the end of input):
| Stack   | Input      | Action     |
|---------|------------|------------|
| $E      | id+id*id$  | E -> TE'   |
| $E'T    | id+id*id$  | T -> FT'   |
| $E'T'F  | id+id*id$  | F -> id    |
| $E'T'id | id+id*id$  | match id   |
| $E'T'   | +id*id$    | T' -> ε    |
| $E'     | +id*id$    | E' -> +TE' |
| $E'T+   | +id*id$    | match +    |
| $E'T    | id*id$     | T -> FT'   |
| $E'T'F  | id*id$     | F -> id    |
| $E'T'id | id*id$     | match id   |
| $E'T'   | *id$       | T' -> *FT' |
| $E'T'F* | *id$       | match *    |
| $E'T'F  | id$        | F -> id    |
| $E'T'id | id$        | match id   |
| $E'T'   | $          | T' -> ε    |
| $E'     | $          | E' -> ε    |
| $       | $          | accept     |
i) Grammar:
S -> (L) | a
L -> L, S | S
Parsing Steps (input string assumed to be `(a,(a,a))`, since it is not shown; top of the stack is at the right):
| Stack     | Input       | Action           |
|-----------|-------------|------------------|
| $         | (a,(a,a))$  | shift (          |
| $(        | a,(a,a))$   | shift a          |
| $(a       | ,(a,a))$    | reduce S -> a    |
| $(S       | ,(a,a))$    | reduce L -> S    |
| $(L       | ,(a,a))$    | shift ,          |
| $(L,      | (a,a))$     | shift (          |
| $(L,(     | a,a))$      | shift a          |
| $(L,(a    | ,a))$       | reduce S -> a    |
| $(L,(S    | ,a))$       | reduce L -> S    |
| $(L,(L    | ,a))$       | shift ,          |
| $(L,(L,   | a))$        | shift a          |
| $(L,(L,a  | ))$         | reduce S -> a    |
| $(L,(L,S  | ))$         | reduce L -> L, S |
| $(L,(L    | ))$         | shift )          |
| $(L,(L)   | )$          | reduce S -> (L)  |
| $(L,S     | )$          | reduce L -> L, S |
| $(L       | )$          | shift )          |
| $(L)      | $           | reduce S -> (L)  |
| $S        | $           | accept           |
ii) Grammar:
E -> E + E | E * E | (E) | id
Parsing Steps (input string assumed to be `(id*id+id)`, since it is not shown):
| Stack   | Input        | Action            |
|---------|--------------|-------------------|
| $       | (id*id+id)$  | shift (           |
| $(      | id*id+id)$   | shift id          |
| $(id    | *id+id)$     | reduce E -> id    |
| $(E     | *id+id)$     | shift *           |
| $(E*    | id+id)$      | shift id          |
| $(E*id  | +id)$        | reduce E -> id    |
| $(E*E   | +id)$        | reduce E -> E * E |
| $(E     | +id)$        | shift +           |
| $(E+    | id)$         | shift id          |
| $(E+id  | )$           | reduce E -> id    |
| $(E+E   | )$           | reduce E -> E + E |
| $(E     | )$           | shift )           |
| $(E)    | $            | reduce E -> (E)   |
| $E      | $            | accept            |
The grammar is ambiguous; the trace resolves the shift/reduce choices by giving `*` higher precedence than `+`.
6. Construct CLR Parsing table for the given grammar [L3, 10M]
Grammar:
S -> CC
C -> aC | d
1. Canonical LR(1) Item Sets (with the augmented production S' -> S):
- I0: S' -> •S, $ ; S -> •CC, $ ; C -> •aC, a/d ; C -> •d, a/d
- I1 = goto(I0, S): S' -> S•, $
- I2 = goto(I0, C): S -> C•C, $ ; C -> •aC, $ ; C -> •d, $
- I3 = goto(I0, a): C -> a•C, a/d ; C -> •aC, a/d ; C -> •d, a/d
- I4 = goto(I0, d): C -> d•, a/d
- I5 = goto(I2, C): S -> CC•, $
- I6 = goto(I2, a): C -> a•C, $ ; C -> •aC, $ ; C -> •d, $
- I7 = goto(I2, d): C -> d•, $
- I8 = goto(I3, C): C -> aC•, a/d
- I9 = goto(I6, C): C -> aC•, $
(goto(I3, a) = I3, goto(I3, d) = I4, goto(I6, a) = I6, goto(I6, d) = I7)
2. CLR Parsing Table (sN = shift and go to state N, r(X) = reduce by production X; entries under S and C are goto states):
| State | a          | d          | $           | S | C |
|-------|------------|------------|-------------|---|---|
| 0     | s3         | s4         |             | 1 | 2 |
| 1     |            |            | accept      |   |   |
| 2     | s6         | s7         |             |   | 5 |
| 3     | s3         | s4         |             |   | 8 |
| 4     | r(C -> d)  | r(C -> d)  |             |   |   |
| 5     |            |            | r(S -> CC)  |   |   |
| 6     | s6         | s7         |             |   | 9 |
| 7     |            |            | r(C -> d)   |   |   |
| 8     | r(C -> aC) | r(C -> aC) |             |   |   |
| 9     |            |            | r(C -> aC)  |   |   |
7. Construct the predictive parsing table. Show that the given grammar is LL(1) or not [L3, 10M]
Grammar:
S -> AB | ABad
A -> d
E -> b
D -> b | ε
B -> c
LL(1) Check:
In the productions `S -> AB | ABad`, both alternatives begin with A and FIRST(A) = { `d` }, so both `S -> AB` and `S -> ABad` fall into the table cell M[S, d]. Because this cell holds more than one production, the grammar as written is not LL(1). Left factoring the common prefix (S -> ABS', S' -> ad | ε) removes the conflict, after which no cell contains more than one production and the grammar becomes LL(1).
8. Construct predictive parsing table for the given grammar. [L3, 10M]
Grammar:
S -> xABC
A -> a | bbD
B -> a | ε
C -> b | ε
D -> c | ε
Using the FIRST and FOLLOW sets computed in Question 3 above (FOLLOW(A) = { `a`, `b`, `$` }, FOLLOW(B) = { `b`, `$` }, FOLLOW(C) = { `$` }, FOLLOW(D) = { `a`, `b`, `$` }), the predictive parsing table is:
| Non-Terminal | x         | a      | b        | c      | $      |
|--------------|-----------|--------|----------|--------|--------|
| S            | S -> xABC |        |          |        |        |
| A            |           | A -> a | A -> bbD |        |        |
| B            |           | B -> a | B -> ε   |        | B -> ε |
| C            |           |        | C -> b   |        | C -> ε |
| D            |           | D -> ε | D -> ε   | D -> c | D -> ε |
Since no table cell contains more than one production, the grammar is LL(1).
9. Perform Shift Reduce Parsing for the input string using the grammar. [L4, 5+5M]
Grammar:
S -> (L) | a
L -> L, S | S
Unit-3
1. Explain syntax directed definition with simple examples? [L2, 10M]
A syntax-directed definition (SDD) is a context-free grammar in which every grammar symbol has a set of attributes and every production has a set of semantic rules for computing the values of those attributes over a parse tree of the input.
Example:
Consider a simple arithmetic expression grammar:
E -> E + T
E -> T
T -> T * F
T -> F
F -> (E)
F -> id
Semantic Rules (E1 and T1 denote the occurrence of the non-terminal on the right-hand side):
- E -> E1 + T { E.val = E1.val + T.val }
- E -> T { E.val = T.val }
- T -> T1 * F { T.val = T1.val * F.val }
- T -> F { T.val = F.val }
- F -> (E) { F.val = E.val }
- F -> id { F.val = id.lexval }
Example: For the expression grammar above, evaluate `3 + 4 * 5` using syntax-directed translation.
1. Parse Tree Construction: parse the expression according to the grammar, so that `*` binds tighter than `+`.
2. Annotated Parse Tree:
            E (val = 23)
           /   |   \
          E    +    T (val = 20)
          |       /  |  \
          T      T   *   F (val = 5)
          |      |       |
          F      F       5
          |      |
          3      4
3. Evaluation Order:
- Start with the leaves:
- `F -> id`: `F.val = 3`
- `F -> id`: `F.val = 4`
- `F -> id`: `F.val = 5`
- Next, calculate `T.val`:
- For `T -> F` (the left operand): `T.val = 3`
- For `T -> T * F`: `T.val = 4 * 5 = 20`
- Finally, calculate `E.val`:
- For `E -> E + T`: `E.val = 3 + 20 = 23`
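A small Python sketch (the tuple-based node encoding is an assumption, not part of the answer) showing how the semantic rules above compute the `val` attribute bottom-up:
def evaluate(node):
    kind = node[0]
    if kind == "num":                                  # F -> id : F.val = id.lexval
        return node[1]
    if kind == "add":                                  # E -> E + T : E.val = E.val + T.val
        return evaluate(node[1]) + evaluate(node[2])
    if kind == "mul":                                  # T -> T * F : T.val = T.val * F.val
        return evaluate(node[1]) * evaluate(node[2])
    raise ValueError("unknown node kind")

# Parse tree for 3 + 4 * 5 built by hand
tree = ("add", ("num", 3), ("mul", ("num", 4), ("num", 5)))
print(evaluate(tree))                                  # prints 23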
3. Explain the Type Checking with suitable examples? [L2, 10M]
Type Checking:
Type checking is the process of verifying and enforcing the constraints of types in a programming language. It ensures
that operations are performed on compatible types, thus preventing type errors.
Example:
Consider a simple language with basic types: `int`, `float`, and `string`.
- Valid: `int a = 2 + 3;` (both operands of `+` are `int`, so the result is `int` and the assignment is type-correct; mixing `int` and `float` may also be allowed if the language performs implicit coercion to `float`).
- Invalid: `int b = "hello" + 5;` (applying `+` to a `string` and an `int` has no defined result type, so the type checker reports an error).
In both examples, type checking ensures that operations are performed only between compatible types, thus preventing potential errors.
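A minimal sketch of such a check in Python (the function name and the rules are assumptions, covering only the `+` operator of the toy language above):
def check_add(left_type, right_type):
    numeric = {"int", "float"}
    if left_type in numeric and right_type in numeric:
        # widen to float if either operand is float
        return "float" if "float" in (left_type, right_type) else "int"
    if left_type == "string" and right_type == "string":
        return "string"                     # string concatenation
    raise TypeError(f"cannot add {left_type} and {right_type}")

print(check_add("int", "float"))            # float
print(check_add("string", "string"))        # string
# check_add("int", "string") would raise a TypeError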
Example:
For the arithmetic expression grammar, a syntax-directed translation scheme embeds the semantic actions directly inside the productions, for example:
E -> E1 + T { E.val = E1.val + T.val }
T -> T1 * F { T.val = T1.val * F.val }
F -> ( E ) { F.val = E.val }
F -> id { F.val = id.lexval }
Here, the translation scheme combines parsing (derivation of the grammar) with semantic evaluation, allowing expressions to be evaluated directly while parsing.
5. Describe the representation of 3-address code with examples. [L5, 10M]
Three-Address Code:
Three-address code (TAC) is an intermediate representation used in compilers, where each instruction consists of at
most three addresses or operands. It allows for a simplified and easy-to-translate form of operations.
Format:
`result = operand1 operator operand2`
Example:
Consider the expression `a = b + c * d`. The TAC representation might be:
1. `t1 = c * d` // Intermediate result for `c * d`
2. `t2 = b + t1` // Intermediate result for `b + (result of c * d)`
3. `a = t2` // Assign final result to `a`
The resulting three-address code can be represented as:
t1 = c * d
t2 = b + t1
a = t2
Backpatching Technique:
Backpatching is a method used in intermediate code generation for handling forward references in control flow
statements (like jumps). It allows for deferring the actual addresses or labels until they are known, enabling easier
management of labels and jumps.
Example:
Consider the following pseudocode:
if (condition) then
goto L1
// some statements
L1: // target for the jump
1. During the first pass, when the `goto L1` is encountered, the compiler notes down that a jump is required but does
not know the address of `L1` yet. Instead, it saves the address of the instruction that requires backpatching.
2. After identifying the position of `L1`, the compiler updates the earlier entry with the correct address.
This method allows the compiler to manage control flows effectively without requiring prior knowledge of all labels.
For example, the statement `if (condition) then statement1 else statement2` can be translated into three-address code as follows:
t1 = evaluate_condition(condition)
if t1 == false goto L1
// Code for statement1
goto L2
L1:
// Code for statement2
L2:
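A minimal Python sketch of backpatching itself (the `emit` and `backpatch` helpers are assumed names): jump instructions are emitted with an unknown target `_` and patched once the index of the target instruction is known.
code = []                                   # the list of emitted TAC instructions

def emit(instr):
    code.append(instr)
    return len(code) - 1                    # index of the emitted instruction

def backpatch(indices, target):
    for i in indices:
        code[i] = code[i].replace("_", f"L{target}")

# Translate: if (a < b) then x = 1 else x = 0
truelist  = [emit("if a < b goto _")]       # target unknown when emitted
falselist = [emit("goto _")]
backpatch(truelist, len(code))              # the true branch starts here
emit("x = 1")
endlist = [emit("goto _")]
backpatch(falselist, len(code))             # the false branch starts here
emit("x = 0")
backpatch(endlist, len(code))               # first instruction after the if-else

for i, instr in enumerate(code):
    print(f"L{i}: {instr}")
# L0: if a < b goto L2
# L1: goto L4
# L2: x = 1
# L3: goto L5
# L4: x = 0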
9. Explain different types of intermediate code representations? [L2, 10M]
1. Three-Address Code:
A representation where each instruction consists of at most three addresses. Example: `t1 = a + b`.
2. Quadruples:
A four-tuple representation for each operation, consisting of operator, operand1, operand2, and result. Example:
`(+, a, b, t1)`.
3. Triples:
Similar to quadruples, but instead of using a separate result field, the result is represented as an index in a list of
instructions. Example: `(+, a, b)` refers to a previous result.
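As an illustration (reusing the assignment `a = b + c * d` from Question 5 above), the three representations side by side:
Three-Address Code:
t1 = c * d
t2 = b + t1
a = t2
Quadruples:
| # | op | arg1 | arg2 | result |
|---|----|------|------|--------|
| 0 | *  | c    | d    | t1     |
| 1 | +  | b    | t1   | t2     |
| 2 | =  | t2   |      | a      |
Triples:
| # | op | arg1 | arg2 |
|---|----|------|------|
| 0 | *  | c    | d    |
| 1 | +  | b    | (0)  |
| 2 | =  | a    | (1)  |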
Syntax-Directed Translation:
Syntax-directed translation is a method of defining the semantics of a programming language using a context-free
grammar augmented with semantic
rules. Each grammar production is associated with actions that compute values or perform operations based on the
attributes of its non-terminals.
Functions of Backpatching:
1. Deferred Address Resolution: Allows for delaying the resolution of jump addresses until they can be accurately
determined.
2. Label Management: Facilitates the management of labels in control flow statements, ensuring that jumps are
correctly placed.
3. Error Reduction: Minimizes the likelihood of errors during code generation by postponing the handling of forward
references.
A switch (case) statement has the general source form:
case expression of
value1: statement1;
value2: statement2;
...
default: statement_default;
endcase;
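A common way to translate this construct into three-address code (a sketch of the standard scheme; `t`, the labels `L1 ... Ld`, `test`, and `next` are assumed names) evaluates the expression once and branches through a table of tests:
        t = value of expression
        goto test
L1:     code for statement1
        goto next
L2:     code for statement2
        goto next
        ...
Ld:     code for statement_default
        goto next
test:   if t = value1 goto L1
        if t = value2 goto L2
        ...
        goto Ld
next: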