
Compiler Design Assignment

#1. Discuss the types of compilers.


Compilers can be categorized based on several criteria, such as their structure, the
type of source and target languages, the level of optimization, and the approach they
use for translation.
Here are some common types of compilers:
1. Based on Translation Method
 Single-Pass Compiler: Translates the source code to machine code in a single
pass. It is efficient but may have limitations in handling complex language
features.
 Multi-Pass Compiler: Processes the source code in multiple passes, allowing for
more complex analysis and optimizations at each stage.
2. Based on Language Type
 Native Compiler: Compiles high-level language code into machine code for a
specific architecture (e.g., GCC for C/C++).
 Cross Compiler: Generates executable code for a platform different from the one
on which the compiler is running (e.g., compiling code for an embedded system
on a desktop).
3. Based on Optimization Level
 Optimizing Compiler: Focuses on improving the performance of the generated
code through various optimization techniques (e.g., reducing execution time or
memory usage).
 Non-Optimizing Compiler: Generates code without any significant
optimizations, prioritizing speed of compilation over performance of the
generated code.
4. Based on Intermediate Representation
 Intermediate Code Compiler: Translates source code into an intermediate
representation (IR) that can be further optimized or translated into machine code
in later stages (e.g., LLVM).
 Source-to-Source Compiler: Converts source code from one high-level
language to another without going to machine code (e.g., TypeScript to
JavaScript).
5. Based on Functionality
 Translators: These are not traditional compilers but can convert source code
from one language to another (e.g., assemblers convert assembly language to
machine code).
 Preprocessor: A tool that processes source code before it is compiled, often used
for macro substitution and file inclusion (e.g., C preprocessor).
#2. What is the difference between top-down and bottom-up parsing?
Top-down and bottom-up parsing are two distinct approaches used in syntax
analysis (parsing) in compilers and interpreters to process programming languages.
1. Top-Down Parsing
In top-down parsing, the parsing process starts from the root of the parse tree (which
represents the start symbol) and works down to the leaves (the terminal symbols).

Example: Recursive Descent Parsing, LL Parsing.

2. Bottom-Up Parsing

In bottom-up parsing, the process starts from the leaves (the input symbols or
terminals) and works its way up to the root (the start symbol of the grammar).

Example: Shift-Reduce Parsing, LR Parsing.


Top-Down Parsing | Bottom-Up Parsing
Starts from the start symbol and expands downward. | Starts from the input and reduces upward.
Attempts to construct the parse tree from root to leaves. | Attempts to reduce the input string into the start symbol.
Builds from the root to the leaves. | Builds from the leaves to the root.
Recursive Descent, LL Parsing. | Shift-Reduce, LR Parsing.
Can be inefficient for complex grammars. | More efficient for many grammars.
Struggles with left recursion. | Can handle left recursion (e.g., LR parsers).
Easier to implement. | More complex and requires lookahead.
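To make the top-down approach concrete, here is a minimal recursive-descent sketch in C++. It is an illustration only, assuming the toy grammar E -> T { '+' T }, T -> digit (not a grammar from the assignment):
#include <cctype>
#include <stdexcept>
#include <string>

// Recursive-descent parser for the toy grammar:
//   E -> T { '+' T }
//   T -> digit
struct Parser {
    std::string input;
    size_t pos = 0;

    char peek() const { return pos < input.size() ? input[pos] : '\0'; }

    void parseT() {                                // T -> digit
        if (!std::isdigit(static_cast<unsigned char>(peek())))
            throw std::runtime_error("digit expected");
        ++pos;                                     // consume the digit
    }

    void parseE() {                                // E -> T { '+' T }
        parseT();
        while (peek() == '+') { ++pos; parseT(); } // expand downward, left to right
    }

    bool accepts() { parseE(); return pos == input.size(); }
};
// Usage: Parser{"1+2+3"}.accepts() returns true; Parser{"1+"}.accepts() throws.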
#3. Write the phases of compiler design using sample C++ code.
1. Lexical Analysis
• The compiler reads the source code and converts it into tokens. Tokens are the basic
building blocks of the language, such as keywords, identifiers, operators, and
punctuation.
Sample C++ Code:
int main() {
    int x = 10;
    return 0;
}
Tokens generated might include: int, main, (, ), {, int, x, =, 10, ;, return, 0, ;, }.
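A toy lexer sketch in C++ that produces exactly this kind of token stream (a simplification for illustration; a real lexer also classifies tokens and handles multi-character operators):
#include <cctype>
#include <iostream>
#include <string>
#include <vector>

// Very small lexer: recognizes identifiers/keywords, integer literals,
// and single-character symbols.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalpha(c)) {                       // identifier or keyword
            size_t j = i;
            while (j < src.size() && std::isalnum(static_cast<unsigned char>(src[j]))) ++j;
            tokens.push_back(src.substr(i, j - i)); i = j;
        } else if (std::isdigit(c)) {                // integer literal
            size_t j = i;
            while (j < src.size() && std::isdigit(static_cast<unsigned char>(src[j]))) ++j;
            tokens.push_back(src.substr(i, j - i)); i = j;
        } else {                                     // punctuation or operator
            tokens.push_back(std::string(1, src[i])); ++i;
        }
    }
    return tokens;
}

int main() {
    for (const auto& t : tokenize("int x = 10;")) std::cout << t << '\n';
}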
2. Syntax Analysis (Parsing)
In the syntax analysis phase, the compiler checks if the sequence of tokens follows the
grammatical rules of the programming language.
Sample C++ Code:
#include <iostream>
int main() {
    int a = 5;                                  // Variable declaration and initialization
    int b = 10;
    int sum = a + b;                            // Expression evaluation
    std::cout << "Sum: " << sum << std::endl;   // Output the result
    return 0;                                   // Return statement
}
3. Semantic Analysis
In this phase, the compiler checks for semantic errors and ensures that the parse tree
is meaningful. This involves type checking, scope resolution, and other checks.
Sample C++ Code:
#include <iostream>
int add(int x, int y) {
    return x + y;                               // Valid return type (int)
}
int main() {
    int a = 5;                                  // Declaration before use
    int b = 10;                                 // Declaration before use
    // Valid function call with correct argument types
    int sum = add(a, b);
    // Using the variable 'sum' correctly
    std::cout << "Sum: " << sum << std::endl;
    return 0;                                   // Valid return type (int)
}
4. Intermediate Code Generation
The compiler converts the high-level abstract syntax tree into an intermediate
representation (IR), which is easier to manipulate than the original source code and
closer to machine language.
Sample C++ Code:
#include <iostream>
int main() {
    int a = 5;                                  // Variable declaration
    float b = 10.5;                             // Another variable
    float sum = a + b;                          // Calculate sum
    std::cout << "Sum: " << sum << std::endl;   // Output the result
    return 0;                                   // End of program
}
5. Code Optimization
During this phase, the compiler optimizes the intermediate code to improve the
performance and efficiency of the generated code.
Sample C++ Code:
#include <iostream>
int main() {
    int result = 2 + 3 * 4;                     // Constant folding: evaluated to 14 at compile time
    std::cout << "Result: " << result << std::endl;
    return 0;
}
6. Code Generation
In this phase, the optimized intermediate code is translated into the target machine
code.
Sample generated code (assembly-like):
MOV R0, 5 ; Load the value 5 into register R0
MOV a, R0 ; Store the value from R0 to variable a
MOV R0, 0 ; Load return value 0 into register R0
RET ; Return from main
#4. Write the regular expression for the following languages.
a. Language that contains "ab"
b. L = {a, ab, abb, abba, …}
c. Language that starts with "ant" followed by two digits
a. Language that contains "ab":
The requirement is for strings that contain the substring "ab". We need a regular
expression that ensures the string has at least one occurrence of "ab", with any
characters before or after it.
Regular Expression:
.*ab.*
 .* matches any number of characters (including zero) before and after the substring "ab".
 ab is the substring that we want to appear somewhere in the string.
 This regex ensures that "ab" appears at least once somewhere in the string.
b. L = {a, ab, abb, abba, …}

The language consists of the string "a" followed by zero or more instances of the
letter "b". A regular expression to represent this language is:
Regular Expression:
ab*
 a matches the single letter "a",
 b* matches zero or more occurrences of "b".
(The form a(b*)* is equivalent, but the outer star is redundant.)
This means it can generate "a", "ab", "abb", "abbb", etc.
c. Language that starts with "ant" followed by two digits
A regular expression for a language that starts with the string "ant" followed by
exactly two digits can be expressed as:
ant[0-9]{2}
 ant matches the exact substring "ant",
 [0-9]{2} matches exactly two digits, where [0-9] specifies the range of characters
(0 through 9) and {2} indicates that exactly two of these digits should follow
"ant".
#5. Convert the following NFA to a DFA.
#6. What are the types of LR parsers in compiler design?
There are three main types of LR parsers, explained below along with
each component.
A. LR parsers:
 LR parsers are a type of bottom-up parser used in compiler design, capable of
parsing a wide class of context-free grammars. The "LR" stands for "Left-to-
right" reading of the input and "Rightmost derivation" in reverse.
1. LR Parser
An LR parser is a powerful bottom-up parser for context-free grammars. Its main
characteristics are:
 Left to Right Scanning: It reads the input string from left to right.
 Rightmost Derivation in Reverse: It constructs the rightmost derivation of the
string in reverse by using a stack (often implemented as an array).
 State-Based: LR parsers maintain a state machine that manages the parsing states
based on the input symbols and the actions to take (shift, reduce, accept, or error).
The types of LR parsers and their components, specifically:
a. LR(0),
b. SLR(1),
c. LALR(1), and
d. CLR(1) parsers, along with the definitions of LR(0) and LR(1) items.
a. LR(0) Parser
The LR(0) parser is the simplest form of LR parser:
 No Lookahead: The "(0)" indicates that no lookahead tokens are used; it bases
decisions entirely on the current state of the parser and the input read so far.
 LR(0) Items: These items consist of productions of the grammar with a dot (•)
representing how much of the production has been seen.
b. SLR(1) Parser
The SLR (Simple LR) parser is a refinement of LR(0):
 Single Lookahead Token: The "(1)" indicates that it makes decisions based on
one lookahead token in addition to the current state of the parser.
 Reductions: It reduces when the dot reaches the end of a production and the
lookahead token is in the follow set of the non-terminal being reduced.
c. LALR(1) Parser
The LALR (Look-Ahead LR) parser is another refinement that merges states of the
canonical LR(1) parser:
 Single Lookahead Token: Like SLR(1), it uses one lookahead token.
 Merge of States: LALR(1) combines similar states in the LR(1) parser that have
identical core items but may have different lookahead symbols.
d. CLR(1) Parser
The CLR (Canonical LR) parser is the most powerful of the LR parsers:
 Single Lookahead Token: Like LALR(1) and SLR(1), it uses one lookahead
token.
 Explicit State Representation: CLR(1) retains more information in its states by
considering specific lookahead tokens rather than merging states like LALR(1).
Each state may have different items for each possible lookahead token.
2. LR(0) Items and LR(1) Items
 LR(0) Items: These are derived from grammar productions by placing a dot in
various positions within the productions.
 The set of LR(0) items allows the parser to know where it is in the production
process without any information about the next symbols.
Example:
S → • A B
The parser expects to see A and B after the start.
 LR(1) Items: These extend LR(0) items by pairing each item with a lookahead
symbol.
 The lookahead token helps the parser make correct decisions about shifts and
reductions.
Example:
S → • A B, $
The parser expects to see A and B with a lookahead of $ (end of input).
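As a sketch of how a parser generator might store these items (the field names here are assumptions for illustration, not a standard API):
#include <string>
#include <vector>

// One grammar production, e.g. S -> A B.
struct Production { std::string lhs; std::vector<std::string> rhs; };

// An LR item: a production plus a dot position; LR(1) items add a lookahead.
struct LR1Item {
    int production;         // index into the grammar's production list
    int dot;                // number of RHS symbols already seen
    std::string lookahead;  // e.g. "$"; left empty for an LR(0) item
};
// [S -> . A B, $] would be {0, 0, "$"} if S -> A B is production 0.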
#7. Explain in detail what Code Generation and Optimization are in
compiler design, with the following components:
7.1 Simple Code Generation
7.2 Register Allocation
7.3 DAG Representation
7.4 Peephole Optimization Techniques
 Code Generation: is the phase of a compiler that converts the intermediate
representation (IR) of a program into machine code specific to the target
architecture. This phase involves translating high-level constructs into low-level
instructions that can be executed by the CPU.
 Optimization: refers to the process of improving the efficiency of the generated
code. This can include reducing the execution time, minimizing memory usage,
or decreasing power consumption. Optimization can occur at various stages in the
compilation process, including during code generation.
7.1 Simple Code Generation
Simple Code Generation refers to the straightforward translation of an intermediate
representation into machine code without significant optimization.
Key Aspects of Simple Code Generation:
• Translation of Constructs: The compiler translates high-level constructs (like
loops, conditionals, and function calls) directly into corresponding machine
instructions.
• Memory Management: The compiler generates instructions for memory allocation
and deallocation, including stack management for local variables and function calls.
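A sketch of simple code generation (the MOV/ADD instruction names and register R0 are assumptions, kept in the style of the assembly example in question 3):
#include <iostream>
#include <string>

// Naively translate the three-address statement "dst = src1 + src2":
// load, add, store, with no attempt to keep values in registers across statements.
void emitAdd(const std::string& dst, const std::string& src1, const std::string& src2) {
    std::cout << "MOV R0, " << src1 << '\n';   // load the first operand
    std::cout << "ADD R0, " << src2 << '\n';   // add the second operand
    std::cout << "MOV " << dst << ", R0\n";    // store the result
}

int main() { emitAdd("sum", "a", "b"); }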
7.2 Register Allocation
Register Allocation is a crucial optimization technique used during code generation to
efficiently manage the limited number of CPU registers available for executing
programs.
Key Aspects of Register Allocation:
• Graph Coloring Algorithm: A common method for register allocation involves
constructing an interference graph where nodes represent variables and edges indicate
that two variables cannot be stored in the same register simultaneously.
• Spilling: When there are more live variables than available registers, some variables
must be "spilled" to memory.
• Register Assignment: After determining which variables should reside in registers,
the compiler assigns actual registers to them based on availability and usage patterns.
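A greatly simplified greedy-coloring sketch (real allocators order nodes carefully and rebuild the graph after spilling; the interface is an assumption):
#include <map>
#include <set>
#include <string>

// Color an interference graph: variables that interfere get different registers.
// Returns variable -> register number, or -1 when the variable must be spilled.
std::map<std::string, int> allocate(
        const std::map<std::string, std::set<std::string>>& interferes,
        int numRegisters) {
    std::map<std::string, int> reg;
    for (const auto& [var, neighbors] : interferes) {
        std::set<int> used;                       // registers taken by neighbors
        for (const auto& n : neighbors) {
            auto it = reg.find(n);
            if (it != reg.end()) used.insert(it->second);
        }
        int r = 0;
        while (used.count(r)) ++r;                // lowest register not in use
        reg[var] = (r < numRegisters) ? r : -1;   // -1 = spill to memory
    }
    return reg;
}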
7.3 DAG Representation
DAG (Directed Acyclic Graph) Representation is a data structure used to represent
expressions in an optimized way during code generation.
Key Aspects:
• Nodes and Edges: In a DAG, leaf nodes represent operands (variables or constants),
interior nodes represent operations (like addition or multiplication), and edges
connect each operation to its operands.
• Common Subexpression Elimination: DAGs facilitate the identification of
common subexpressions.
• Topological Sorting: The DAG can be topologically sorted to determine the order
in which operations should be performed, ensuring that all dependencies are resolved
before an operation is executed.
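A sketch of DAG construction via value numbering, one common way to implement this (the interface is an assumption for illustration):
#include <map>
#include <string>
#include <tuple>

// Build DAG nodes so that identical (op, left, right) triples share one node.
// This sharing is what exposes common subexpressions such as a repeated (a + b).
struct Dag {
    int next = 0;
    std::map<std::string, int> leaves;                          // operand nodes
    std::map<std::tuple<std::string, int, int>, int> interior;  // operation nodes

    int leaf(const std::string& name) {                 // variable or constant
        auto [it, inserted] = leaves.try_emplace(name, next);
        if (inserted) ++next;
        return it->second;
    }
    int op(const std::string& o, int left, int right) { // operation node
        auto [it, inserted] = interior.try_emplace({o, left, right}, next);
        if (inserted) ++next;
        return it->second;                              // reused if seen before
    }
};
// For x = a + b; y = a + b; both op("+", leaf("a"), leaf("b")) calls return
// the same node id, so the sum is computed only once.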
7.4 Peephole Optimization Techniques
Peephole Optimization Techniques are a set of local optimization strategies that
examine a small window (or "peephole") of instructions at a time.
Key Aspects:
• Local Transformations: Peephole optimizations typically involve simple patterns
or sequences of instructions that can be replaced with more efficient alternatives. For
example, replacing a sequence of instructions with a single instruction that achieves
the same result.
• Redundant Instruction Elimination: Removing unnecessary instructions that do
not affect the program's outcome.
• Strength Reduction: Replacing expensive operations with cheaper ones (e.g.,
replacing multiplication by a power of two with a bit shift).
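A peephole sketch over a two-instruction window (instructions are simplified to register-to-register moves; the representation is an assumption):
#include <vector>

struct Mov { int dst, src; };  // simplified "MOV dst, src" between registers

// Slide a 2-instruction window: "MOV a, b" immediately followed by "MOV b, a"
// leaves both registers unchanged after the first move, so the second is deleted.
std::vector<Mov> peephole(const std::vector<Mov>& code) {
    std::vector<Mov> out;
    for (size_t i = 0; i < code.size(); ++i) {
        out.push_back(code[i]);
        if (i + 1 < code.size() &&
            code[i + 1].dst == code[i].src &&
            code[i + 1].src == code[i].dst) {
            ++i;  // skip the redundant reverse move
        }
    }
    return out;
}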
#8. Explain what Runtime Environments are.
8.1 Symbol Table
8.2 Hash Table
8.3 Representing Scope Information
 Runtime Environments refer to the data structures and mechanisms that a
programming language runtime uses to manage the execution of a program. This
includes handling variable storage, function calls, scope resolution, and other
aspects that are crucial for the correct execution of code. The runtime
environment is responsible for maintaining information about the current state of
the program, including local and global variables, function parameters, and
control flow.
8.1 Symbol Table
A symbol table is a data structure used by a compiler or interpreter to maintain
information about variables, functions, objects, and their attributes in a program.
Key points about symbol tables:
- Storage: It typically stores name-value pairs where the name is the variable or
function name, and the value includes data type, scope level, and sometimes memory
address.
- Lookup: The table allows efficient lookup for resolving variable and function
names during compilation or execution.
- Scope Management: Symbol tables can maintain information about different
scopes, including local and global variables.
8.2 Hash Table
A hash table is often used to implement a symbol table. It uses a hash function to
compute an index into an array of buckets or slots, from which the desired value can
be found.
Key points about hash tables:
- Efficiency: Provides average-case constant time complexity for search, insert, and
delete operations.
- Collisions: Measures must be taken to handle collisions when two keys hash to the
same index, often using methods like chaining or open addressing.
- Dynamic: Depending on the implementation, hash tables can resize dynamically as
the number of entries increases.
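A minimal sketch of a hash-table-backed symbol table using C++'s std::unordered_map (the attribute fields are assumptions; real tables store much more):
#include <iostream>
#include <string>
#include <unordered_map>

// Attributes stored for each name.
struct SymbolInfo { std::string type; int scopeLevel; };

int main() {
    std::unordered_map<std::string, SymbolInfo> symbols;  // hashed buckets
    symbols.emplace("x", SymbolInfo{"int", 0});           // insert: O(1) average
    auto it = symbols.find("x");                          // lookup: O(1) average
    if (it != symbols.end())
        std::cout << "x has type " << it->second.type << '\n';
    symbols.erase("x");                                   // delete: O(1) average
}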
8.3 Representing Scope Information
Scope information in runtime environments indicates the context in which variables
and functions are declared and accessed. Proper management of scope is crucial for
ensuring correct program behavior, especially in the presence of nested functions or
blocks.
Key points regarding scope representation:
- Nested Environments: It may represent scopes hierarchically, with each scope
maintaining its own symbol table and potentially referring to parent scopes (also
known as lexical scoping).
- Access Control: When a variable is referenced, the compiler checks the symbol
tables from the innermost scope to the outermost until it finds the variable, ensuring
that the correct variable is accessed.
- Lifetime Management: It manages the lifetime of variables, ensuring that memory
can be allocated and freed appropriately as scopes are entered and exited.
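A sketch of lexical scoping as a stack of per-scope hash tables, with innermost-to-outermost lookup as described above (the API is an assumption):
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

struct ScopedTable {
    // One hash table per open scope; the back of the vector is the innermost.
    std::vector<std::unordered_map<std::string, std::string>> scopes;

    void enterScope() { scopes.emplace_back(); }
    void exitScope()  { scopes.pop_back(); }    // names die with their scope
    void declare(const std::string& name, const std::string& type) {
        scopes.back()[name] = type;
    }
    // Walk from innermost to outermost scope, so inner names shadow outer ones.
    std::optional<std::string> lookup(const std::string& name) const {
        for (auto s = scopes.rbegin(); s != scopes.rend(); ++s) {
            auto it = s->find(name);
            if (it != s->end()) return it->second;
        }
        return std::nullopt;                    // undeclared identifier
    }
};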
#9. Type Checking in the Compiler
9.1 Rules of Type Checking
9.2 Type Conversions
 Type checking is a crucial aspect of compiler design that ensures the correctness
of types in a program. It validates the types of variables, expressions, and
function return values to avoid type errors during the compilation process, which,
if left unchecked, could lead to runtime errors.
Key Aspects of Type Checking
1. Type System:
A collection of rules that defines how types are used in a programming language.
Types can include primitive types (such as integers, booleans, floats) and composite
types (such as arrays, records, or user-defined types).
2. Static vs. Dynamic Type Checking:
 Static Type Checking: This occurs at compile time. The compiler checks the
types of all expressions and ensures they are valid according to the language's
type rules (e.g., Java, C).
 Dynamic Type Checking: This occurs at runtime. The types are checked when
the program is executing, which can provide more flexibility but may introduce
performance overhead (e.g., Python, Ruby).
3. Type Safety:
A language is considered type-safe if it guarantees that a program cannot perform
operations on values whose types are inconsistent.
9.1 Rules of Type Checking
Type checking encompasses several rules and principles, including:
 Operand Compatibility:
Operators require operands of compatible types. For instance, an addition operation
typically requires both operands to be numeric types.
• Example: int + int is valid; int + string is invalid
 Return Type Checking:
The return type of functions must match the expected type in the calling context. If a
function is declared to return an int, it must not return a string or another incompatible
type.
 Type Inference:
Some languages support type inference, where the compiler deduces the type of an
expression based on the context. For example, in languages like Scala and Haskell,
the compiler can infer types without explicit annotations.
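These rules can be seen directly in C++ (a small illustration; the commented-out line is exactly the kind of statement a type checker rejects):
#include <string>

int add(int x, int y) { return x + y; }  // declared return type: int

int main() {
    int a = 1, b = 2;
    int sum = a + b;           // int + int: operands are compatible
    std::string s = "hi";
    // int bad = a + s;        // int + std::string: no viable operator+, rejected
    auto inferred = a + b;     // type inference: 'inferred' is deduced as int
    (void)add(sum, inferred);  // argument and return types match the declaration
    return 0;                  // matches main's declared int return type
}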
9.2 Type Conversions
In programming, type conversion (also called type casting) refers to the process of
converting one data type into another. This is important because different data types
serve different purposes, and converting between them allows for better manipulation
and usage of data in a program. There are two main types of type conversions:
1. Implicit Type Conversion (Automatic)
This type of conversion is done by the compiler or interpreter automatically, without
requiring the programmer's intervention. It happens when a value is assigned to a
variable of a larger type (or when types are compatible). The smaller data type is
automatically promoted to the larger data type.
For example, in many languages, assigning an integer to a floating-point variable will
automatically convert the integer to a float.
2. Explicit Type Conversion (Manual Casting)
Explicit type conversion requires the programmer to specify the conversion explicitly.
This is done using built-in functions or casting operators. The programmer forces the
conversion of one data type to another.
 This is particularly useful when you want to ensure that a specific conversion
takes place, especially when dealing with types that might not automatically
convert.
 It is used when you need to convert a variable from one type to another manually.
Example: Converting a float to an integer may require explicit casting, such as
int(myFloat) in Python or (int)myFloat in C/C++.
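Both kinds of conversion shown in C++ (a short illustration):
#include <iostream>

int main() {
    int i = 42;
    double d = i;                  // implicit: int is promoted to double (42.0)
    double f = 3.9;
    int c1 = (int)f;               // explicit C-style cast: truncates to 3
    int c2 = static_cast<int>(f);  // idiomatic C++ cast, also 3
    std::cout << d << ' ' << c1 << ' ' << c2 << '\n';  // prints: 42 3 3
}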