Fall 2024 - CS606 - 1 (BSCS) .
Total Marks: 20
Assignment No. 01
Due Date: Nov 25, 2024
Semester: Fall 2024
Please read the following instructions carefully before solving and submitting the assignment:
Uploading Instructions:
o You are expected to consult the recommended book(s) to clarify your concepts, as the handouts are not sufficient.
o The assignment file must be an MS Word file. No other software or tool is allowed.
o The required file format is .doc or .docx. Any other format, such as scanned images, txt, pdf, png or jpeg, will not be accepted.
o Place all solutions in a single MS Word file with your own Student ID at the top.
o Submit the MS Word file on VULMS within the due date.
Note:
o No assignment will be accepted via email after the due date in any case (including load shedding or internet malfunction). Hence, refrain from uploading the assignment in the last hour before the deadline.
o It is recommended to upload the solution file at least one day before the closing date.
o Do not post any query about this assignment on MDB; if you have a query, email it to
[email protected]
CS606 – Compiler Construction
In a compiler, the lexical analyzer (also called the lexer or scanner) is the first phase of
the compilation process. Its primary function is to convert the raw source code into a
sequence of tokens that the later phases can process more easily. A token is a categorized
unit of the source code, such as a keyword, identifier, operator, literal, or punctuation
symbol.
Role of a Lexical Analyzer:
The lexical analyzer performs the following tasks:
1. Reading the Source Code: It reads the input source code character by character.
2. Tokenization: It groups characters into meaningful sequences and classifies them into categories
(e.g., keywords, operators, identifiers, numeric literals).
3. Skipping Whitespace and Comments: It ignores irrelevant whitespace, newline characters, and
comments.
4. Error Handling: It detects invalid sequences of characters and reports lexical errors (e.g.,
unrecognized symbols).
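Task 3 above (skipping whitespace and comments) can be sketched in a few lines of C. This is only an illustrative sketch, not a prescribed implementation: the function name skip_blank is hypothetical, and it assumes the source is a NUL-terminated string that uses C-style line and block comments.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: advance past whitespace, line comments and
   block comments; return a pointer to the next significant
   character, or to the terminating '\0' if nothing is left. */
const char *skip_blank(const char *p)
{
    assert(p != NULL);
    for (;;) {
        if (isspace((unsigned char)*p)) {
            p++;                                   /* blank, tab, newline */
        } else if (p[0] == '/' && p[1] == '/') {
            while (*p != '\0' && *p != '\n')       /* line comment */
                p++;
        } else if (p[0] == '/' && p[1] == '*') {
            p += 2;                                /* block comment */
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p != '\0')
                p += 2;                            /* step over the closing delimiter */
        } else {
            return p;                              /* significant character */
        }
    }
}
```

A lexer would typically call such a helper before recognizing each token. An unterminated block comment simply consumes the rest of the input here; a real lexer would report it as a lexical error.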
Phases Involved in Lexical Analysis:
The process of lexical analysis typically involves several phases:
1. Input Reading: The lexical analyzer reads the source code character by character.
2. Pattern Recognition: The analyzer uses regular expressions or other pattern matching
techniques to identify token types (e.g., keywords, identifiers, literals).
3. Tokenization: Once a match is found, the sequence of characters is grouped into a token and
passed to the next phase of the compiler.
4. Error Detection: If an illegal sequence of characters is encountered (such as a malformed
identifier), the lexical analyzer generates an error message.
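Pattern recognition and error detection (phases 2 and 4 above) can be illustrated with a hand-written classifier for a single, already isolated lexeme. This is a minimal sketch under stated assumptions: the names TokenKind and classify are hypothetical, the keyword table covers only the keywords used in this document, the numeric patterns are simplified to digits with an optional fractional part, and TOK_ERROR is an added category for lexemes that match no pattern.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Category names follow the token list in this document; TOK_ERROR is
   an added, hypothetical category for lexemes that match no pattern. */
typedef enum {
    TOK_KEYWORD, TOK_IDENTIFIER, TOK_INTEGER_LITERAL,
    TOK_FLOAT_LITERAL, TOK_OPERATOR, TOK_SYMBOL, TOK_ERROR
} TokenKind;

/* Classify one complete lexeme (hypothetical helper). */
TokenKind classify(const char *lex)
{
    static const char *const keywords[] = { "int", "float", "if", "return" };
    size_t i, n;

    assert(lex != NULL);
    n = strlen(lex);
    if (n == 0)
        return TOK_ERROR;

    /* Identifier pattern: [A-Za-z_][A-Za-z0-9_]*, with keywords taking priority. */
    if (isalpha((unsigned char)lex[0]) || lex[0] == '_') {
        for (i = 1; i < n; i++)
            if (!isalnum((unsigned char)lex[i]) && lex[i] != '_')
                return TOK_ERROR;
        for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
            if (strcmp(lex, keywords[i]) == 0)
                return TOK_KEYWORD;
        return TOK_IDENTIFIER;
    }

    /* Simplified numeric patterns: [0-9]+ or [0-9]+ "." [0-9]+ */
    if (isdigit((unsigned char)lex[0])) {
        i = 0;
        while (i < n && isdigit((unsigned char)lex[i]))
            i++;
        if (i == n)
            return TOK_INTEGER_LITERAL;
        if (lex[i] != '.')
            return TOK_ERROR;              /* e.g. "9abc" */
        i++;
        if (i == n)
            return TOK_ERROR;              /* this sketch requires digits after '.' */
        while (i < n && isdigit((unsigned char)lex[i]))
            i++;
        return i == n ? TOK_FLOAT_LITERAL : TOK_ERROR;
    }

    /* Single-character operators and symbols used in this document. */
    if (n == 1 && strchr("=<>+", lex[0]) != NULL)
        return TOK_OPERATOR;
    if (n == 1 && strchr("(){};", lex[0]) != NULL)
        return TOK_SYMBOL;
    return TOK_ERROR;
}
```

With these assumptions, classify("float") yields TOK_KEYWORD, classify("20.5") yields TOK_FLOAT_LITERAL, and a malformed lexeme such as "9abc" yields TOK_ERROR. Production lexers usually generate this logic from regular expressions (for example with a scanner generator) rather than writing it by hand.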
Example Code in C:
Consider the following simple C code snippet:
int main() {
    int x = 10;
    float y = 20.5;
    if (x < y) {
        x = x + 1;
    }
    return 0;
}
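Tokens generated for this snippet:
Token 1: Keyword("int")
Token 2: Identifier("main")
Token 3: Symbol("(")
Token 4: Symbol(")")
Token 5: Symbol("{")
Token 6: Keyword("int")
Token 7: Identifier("x")
Token 8: Operator("=")
Token 9: IntegerLiteral("10")
Token 10: Symbol(";")
Token 11: Keyword("float")
Token 12: Identifier("y")
Token 13: Operator("=")
Token 14: FloatingPointLiteral("20.5")
Token 15: Symbol(";")
Token 16: Keyword("if")
Token 17: Symbol("(")
Token 18: Identifier("x")
Token 19: Operator("<")
Token 20: Identifier("y")
Token 21: Symbol(")")
Token 22: Symbol("{")
Token 23: Identifier("x")
Token 24: Operator("=")
Token 25: Identifier("x")
Token 26: Operator("+")
Token 27: IntegerLiteral("1")
Token 28: Symbol(";")
Token 29: Symbol("}")
Token 30: Keyword("return")
Token 31: IntegerLiteral("0")
Token 32: Symbol(";")
Token 33: Symbol("}")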
Each of these tokens represents a meaningful element of the source code. After this tokenization process, the
lexical analyzer passes the tokens to the parser phase, where the syntactic structure of the code is analyzed.
Let's break down the code snippet line by line and identify the lexemes (the individual
components or substrings that represent a meaningful unit of the source code) and the
corresponding tokens (the category or type that the lexeme belongs to).
Here is the code:
int x = 20;
if (x > 10) {
    x = x + 5;
}
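The lexeme/token pairing for this snippet can be reproduced with a small scanner. The sketch below is illustrative only: the Token struct, is_keyword and scan are hypothetical names, each lexeme is stored as a slice (start pointer plus length) into the source text, and the scanner handles just what this snippet uses (keywords, identifiers, integer literals, and single-character operators and symbols).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A token pairs a category with its lexeme, stored here as a slice
   into the source text. All names are hypothetical. */
typedef struct {
    const char *kind;    /* token category, e.g. "Keyword" */
    const char *lexeme;  /* first character of the lexeme */
    size_t      len;     /* lexeme length in characters */
} Token;

static int is_keyword(const char *s, size_t n)
{
    static const char *const kw[] = { "int", "if" };
    size_t i;
    for (i = 0; i < sizeof kw / sizeof kw[0]; i++)
        if (strlen(kw[i]) == n && strncmp(s, kw[i], n) == 0)
            return 1;
    return 0;
}

/* Split src into at most max tokens and return the number found. */
size_t scan(const char *src, Token *out, size_t max)
{
    size_t count = 0;
    const char *p = src;

    assert(src != NULL && out != NULL);
    while (*p != '\0' && count < max) {
        const char *start = p;
        const char *kind;

        if (isspace((unsigned char)*p)) {          /* whitespace is not a token */
            p++;
            continue;
        }
        if (isalpha((unsigned char)*p) || *p == '_') {
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            kind = is_keyword(start, (size_t)(p - start)) ? "Keyword" : "Identifier";
        } else if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p))
                p++;
            kind = "IntegerLiteral";
        } else if (strchr("=<>+", *p) != NULL) {
            p++;
            kind = "Operator";
        } else if (strchr("(){};", *p) != NULL) {
            p++;
            kind = "Symbol";
        } else {
            p++;
            kind = "Error";                        /* unrecognized character */
        }
        out[count].kind = kind;
        out[count].lexeme = start;
        out[count].len = (size_t)(p - start);
        count++;
    }
    return count;
}
```

Applied to the snippet above, scan finds 19 tokens, beginning with Keyword("int"), Identifier("x"), Operator("="), IntegerLiteral("20") and Symbol(";"). Each entry can be printed in that form with printf("%s(\"%.*s\")\n", t.kind, (int)t.len, t.lexeme).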