C2p2 CD Report


Activity based

Project Report on

Compiler Design
Project Module - II
Submitted to Vishwakarma University, Pune
Under the Initiative of
Contemporary Curriculum, Pedagogy, and Practice (C2P2)

By
Pranav Srikanth Dagay
SRN No : 202101229
Roll No : 1
Div : D
Third Year Engineering

Faculty Incharge:- Prof Richa Agnihotri


Date Of Project 1:- 14/02/2024

Department of Computer Engineering


Faculty of Science and Technology

Academic Year
2023-2024 Term-II
Compiler Design

Compiler Design : Project I


Project Name : Compiler for Protocol Specification Language
Phase 1: Lexical Analysis

Problem Statement: Develop a syntax and semantic analyser for a Protocol Specification
Language (PSL) to improve the interpretation and processing of communication protocols in a
structured format.

Introduction to Syntax and Semantic Analysis:


Syntax Analysis:
Syntax analysis, also known as parsing, is a fundamental phase in the compilation of programming
languages. It analyses the structure of the input code to determine whether it conforms to the grammar
rules of the language, that is, whether the tokens appear in a sequence that the grammar allows. The
input is first broken down into a sequence of tokens by lexical analysis (tokenization); the parser
then analyses these tokens according to the grammar rules to build a parse tree or syntax tree. Any
deviation from the grammar rules is reported as a syntax error, indicating that the code cannot be
interpreted correctly by the compiler or interpreter. Syntax analysis lays the groundwork for
subsequent phases of compilation, such as semantic analysis and code generation.
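
The tokenization step described above can be illustrated with a minimal sketch. The token kinds and the tokenize function are assumptions made for this illustration and are not part of the project code; the sketch only shows how one line of input might be split into tokens before parsing.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token kinds for one line of PSL-like input (illustration only)
enum class TokenKind { String, Integer, Colon, Word, Invalid };

struct Token {
    TokenKind kind;
    std::string text;
};

// Split one line into tokens: quoted text becomes a String token, a run of
// digits an Integer, ':' a Colon, and a run of letters a Word.
std::vector<Token> tokenize(const std::string& line) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < line.size()) {
        const unsigned char c = static_cast<unsigned char>(line[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (c == '"') {
            const std::size_t end = line.find('"', i + 1);
            if (end == std::string::npos) {
                // Unterminated string: report the rest of the line as invalid
                tokens.push_back({TokenKind::Invalid, line.substr(i)});
                break;
            }
            tokens.push_back({TokenKind::String, line.substr(i + 1, end - i - 1)});
            i = end + 1;
        } else if (std::isdigit(c)) {
            const std::size_t start = i;
            while (i < line.size() && std::isdigit(static_cast<unsigned char>(line[i]))) ++i;
            tokens.push_back({TokenKind::Integer, line.substr(start, i - start)});
        } else if (c == ':') {
            tokens.push_back({TokenKind::Colon, ":"});
            ++i;
        } else if (std::isalpha(c)) {
            const std::size_t start = i;
            while (i < line.size() && std::isalpha(static_cast<unsigned char>(line[i]))) ++i;
            tokens.push_back({TokenKind::Word, line.substr(start, i - start)});
        } else {
            // Any other character is not part of this toy token set
            tokens.push_back({TokenKind::Invalid, std::string(1, line[i])});
            ++i;
        }
    }
    return tokens;
}
```

For the line "Message ID": 1 this sketch yields a String token, a Colon token and an Integer token, which a parser could then check against the grammar.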

Semantic Analysis:
Semantic analysis is the stage of compilation in which the meaning of the program is examined beyond
its syntactic structure. It uses the context of each construct to ensure that the code adheres to the
rules and constraints of the programming language. Unlike syntax analysis, which focuses on the
arrangement of and relationships between symbols, semantic analysis checks for type inconsistencies,
undeclared or misused names, and other violations of language rules that cannot be detected from
syntax alone. It typically involves type checking and scope resolution, and its results feed later
phases such as optimization and code generation. In this way, semantic analysis plays a crucial role
in ensuring the correctness, safety, and reliability of software systems.
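
As a small illustration of a semantic check, the sketch below validates two fields of a message after it has been parsed. The function name checkSemantics is an assumption for this example; the four allowed message types are the ones listed in the grammar of this report, and the integer check mirrors the rule that a Message ID is an integer.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical semantic check (illustration only): returns a list of error
// descriptions; an empty list means both checks passed.
std::vector<std::string> checkSemantics(const std::string& type,
                                        const std::string& messageId) {
    // The four message types allowed by the <Type> rule of the grammar
    static const std::unordered_set<std::string> allowedTypes = {
        "Request", "Response", "Event", "Acknowledgement"};

    std::vector<std::string> errors;
    if (allowedTypes.count(type) == 0) {
        errors.push_back("unknown message type: " + type);
    }
    // A Message ID must be a non-empty run of decimal digits
    const bool allDigits =
        !messageId.empty() &&
        std::all_of(messageId.begin(), messageId.end(),
                    [](unsigned char c) { return std::isdigit(c) != 0; });
    if (!allDigits) {
        errors.push_back("Message ID is not an integer: " + messageId);
    }
    return errors;
}
```

Both inputs are syntactically just strings; only the semantic check can tell that "Ping" is not a valid type or that "abc" is not a valid Message ID.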

Parsing:
Parsing, a fundamental process in computer science and linguistics, involves analysing a sequence of
tokens to determine its grammatical structure with respect to a given formal grammar. In the context
of programming languages, parsing is the process of analysing source code to build a parse tree,
which represents the syntactic structure of the program according to the grammar rules of the
language. The input code is first broken down into tokens by lexical analysis, and the parser then
organizes these tokens hierarchically according to the grammar rules to construct the parse tree.

Parsing ensures that the input code adheres to the syntax specified by the language grammar, enabling
subsequent phases of the compilation process such as semantic analysis and code generation. If the
input code contains syntax errors, the parser detects and reports them, indicating where and why the
code does not conform to the language syntax. This feedback helps developers identify and correct
errors in their code, facilitating the creation of correct and well-structured programs. Overall, parsing
plays a critical role in translating human-readable source code into a format that computers can
understand and execute.

Parse tree:
A parse tree, also known as a derivation tree or concrete syntax tree, is a hierarchical
representation of the syntactic structure of a string according to a formal grammar. It records the
outcome of the parsing process, showing how the input string is broken down into its constituent
parts based on the grammar rules.

In the context of programming languages, a parse tree depicts the structure of source code, showing
how individual tokens or lexemes are combined to form higher-level constructs such as statements,
expressions, and declarations. Each node in the parse tree corresponds to a specific syntactic element,
with parent-child relationships indicating how these elements are related to each other.

Parse trees provide valuable insights into the structure and organization of code, making it easier to
understand its syntactic composition and detect any parsing errors. They serve as an intermediate
representation during the compilation process, facilitating subsequent phases such as semantic
analysis and code generation.
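
A parse tree can be represented in code as a node with a label and a list of children. The sketch below is only an illustration; the names ParseNode and render are assumptions and do not appear in the project code.

```cpp
#include <string>
#include <vector>

// Hypothetical parse tree node: a label (non-terminal or token) and its children
struct ParseNode {
    std::string label;
    std::vector<ParseNode> children;
};

// Render the tree with one node per line, indenting each level by two spaces
std::string render(const ParseNode& node, int depth = 0) {
    std::string out(static_cast<std::size_t>(depth) * 2, ' ');
    out += node.label + "\n";
    for (const ParseNode& child : node.children) {
        out += render(child, depth + 1);
    }
    return out;
}
```

Interior nodes correspond to non-terminals such as Message or Type, and leaves correspond to the tokens read from the input, so the indentation of the rendered text reflects the parent-child relationships described above.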

Parsing & its types:


1. Recursive Descent Parser:
• A recursive descent parser is a top-down parser that starts at the root of the parse tree
and works its way down, typically using a set of mutually recursive procedures, each
corresponding to a grammar rule.
• This parser is often hand-written and can be relatively simple to implement for
languages with simple grammars.
2. LL Parser (Left-to-Right, Leftmost Derivation):
• LL parsers are top-down parsers that read the input from left to right and produce a
leftmost derivation of the input string.
• LL parsers are used for languages described by LL grammars, which are a subset of
context-free grammars.
3. LR Parser (Left-to-Right, Rightmost Derivation in Reverse):
• LR parsers are bottom-up parsers that read the input from left to right and construct
a rightmost derivation of the input string in reverse.
• LR parsers are more powerful than LL parsers: every LL(k) grammar is also an LR(k)
grammar, and LR parsers can additionally handle left-recursive grammars.
4. LALR Parser (Look-Ahead LR Parser):
• LALR parsers are a simplified variant of LR(1) parsers that merge states differing only
in their lookahead sets, which gives much smaller parsing tables at the cost of
rejecting some LR(1) grammars.
• LALR(1) parsers are the default in parser generators such as Yacc and Bison.
5. Parsers Generated by Parser Generators:
• Parser generators such as Yacc/Bison, ANTLR, and JavaCC generate parsers
automatically from a formal grammar specification.
• These tools cover different grammar classes: Yacc and Bison target LALR(1) grammars,
while ANTLR and JavaCC generate LL-style top-down parsers.
6. DOM Parser (Document Object Model):
• DOM parsers parse the entire XML/HTML document into a tree structure representing
the document's elements, attributes, and text content.
• DOM parsers typically load the entire document into memory, which can be memory-
intensive for large documents.
7. SAX Parser (Simple API for XML):
• SAX parsers parse XML/HTML documents sequentially and generate events
(callbacks) as they encounter elements, attributes, and text content.
• SAX parsers are event-driven and are generally more memory-efficient than DOM
parsers, making them suitable for large documents.
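
To make the first entry of the list concrete, the following is a minimal sketch of a recursive descent recognizer with one procedure per grammar rule. The two-rule grammar in the comment and the class name PairParser are assumptions chosen for illustration; they are not the grammar or the code of this project.

```cpp
#include <cctype>
#include <string>

// Recursive descent recognizer for the illustrative grammar
//   pair  ::= string ':' value
//   value ::= string | integer
// Each rule is implemented by one member function (illustration only).
class PairParser {
public:
    explicit PairParser(const std::string& text) : text_(text) {}

    // Returns true when the whole input is exactly one pair
    bool parse() {
        pos_ = 0;
        if (!parsePair()) return false;
        skipSpaces();
        return pos_ == text_.size();
    }

private:
    bool parsePair() {
        if (!parseString()) return false;
        skipSpaces();
        if (pos_ >= text_.size() || text_[pos_] != ':') return false;
        ++pos_;
        return parseValue();
    }

    bool parseValue() {
        skipSpaces();
        // One character of lookahead selects the alternative
        if (pos_ < text_.size() && text_[pos_] == '"') return parseString();
        return parseInteger();
    }

    bool parseString() {
        skipSpaces();
        if (pos_ >= text_.size() || text_[pos_] != '"') return false;
        const std::size_t end = text_.find('"', pos_ + 1);
        if (end == std::string::npos) return false;  // unterminated string
        pos_ = end + 1;
        return true;
    }

    bool parseInteger() {
        const std::size_t start = pos_;
        while (pos_ < text_.size() &&
               std::isdigit(static_cast<unsigned char>(text_[pos_]))) {
            ++pos_;
        }
        return pos_ > start;  // at least one digit is required
    }

    void skipSpaces() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_]))) {
            ++pos_;
        }
    }

    std::string text_;
    std::size_t pos_ = 0;
};
```

For example, PairParser("\"Source\": \"Client-1\"").parse() accepts the line, while a line with an unterminated string or a missing value is rejected.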

Parser Used:
The LL(1) parser (Left-to-right scan, Leftmost derivation, one lookahead symbol) is a predictive
parsing algorithm commonly used in syntax analysis of programming languages. It operates on a
context-free grammar (CFG) and a parsing table in which each entry gives the production rule to apply
for a given non-terminal symbol and lookahead token. In the context of our syntax and semantic
analysis for the protocol specification language, the LL(1) parser predicts the next production rule
to apply from the non-terminal on top of the parse stack and the lookahead token, so it never needs
to backtrack. This parsing strategy allows for a systematic and deterministic approach to parsing,
enabling the parser to identify each syntactic construct according to the grammar rules in a single
pass over the input. By employing LL(1) parsing, our analysis tool checks that input written in the
protocol specification language adheres to its defined syntax, and it provides feedback on the
correctness and validity of the input code.
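
The table-driven mechanism described above can be sketched as follows. The three-rule grammar, the single-character token codes and the function name ll1Accepts are assumptions made for this illustration only; they are not the grammar or the parser of this project.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Table-driven LL(1) recognizer for the illustrative grammar
//   L -> P L | (empty)      a list of pairs
//   P -> s : V              a pair: string, colon, value
//   V -> s | i              a value: string or integer
// Terminals are single characters: 's' string, 'i' integer, ':' colon.
// Upper-case characters are non-terminals. Illustration only.
bool ll1Accepts(const std::string& tokens) {
    // Parsing table: (non-terminal, lookahead) -> right-hand side to expand
    static const std::map<std::pair<char, char>, std::string> table = {
        {{'L', 's'}, "PL"},
        {{'L', '$'}, ""},
        {{'P', 's'}, "s:V"},
        {{'V', 's'}, "s"},
        {{'V', 'i'}, "i"},
    };

    const std::string input = tokens + "$";  // '$' marks the end of input
    std::vector<char> stack = {'$', 'L'};    // start symbol on top of '$'
    std::size_t pos = 0;

    while (!stack.empty()) {
        const char top = stack.back();
        const char lookahead = input[pos];
        stack.pop_back();
        if (std::isupper(static_cast<unsigned char>(top))) {
            // Non-terminal: look up the production predicted by the lookahead
            const auto rule = table.find({top, lookahead});
            if (rule == table.end()) return false;  // empty entry: syntax error
            // Push the right-hand side in reverse so its first symbol is on top
            for (auto it = rule->second.rbegin(); it != rule->second.rend(); ++it) {
                stack.push_back(*it);
            }
        } else {
            // Terminal: it must match the lookahead, which is then consumed
            if (top != lookahead) return false;
            ++pos;
        }
    }
    return pos == input.size();
}
```

With this table the token sequence s:i (a string key followed by an integer value) is accepted, while s: is rejected because the table has no entry for V with the end marker as lookahead.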

Tokens for the Problem Statement:

Grammar:
<Message> ::= "- type:" <Type>
"format:" <Format>
"data:"
"\"Message ID\":" <MessageID>
"\"Timestamp\":" <Timestamp>
"\"Source\":" <Source>
"\"Destination\":" <Destination>
"\"Type\":" <Type>
"\"Payload\":"
"\"Operation\":" <Operation>
"\"Resource\":" <Resource>

<Type> ::= "Request" | "Response" | "Event" | "Acknowledgement"

<Format> ::= <String>

<MessageID> ::= <Integer>

<Timestamp> ::= <String>

<Source> ::= <String>

<Destination> ::= <String>

<Operation> ::= <String>

<Resource> ::= <String>

Production Rules:
<Message> ::= "- type:" <Type> "format:" <Format> "data:" "\"Message ID\":" <MessageID>
"\"Timestamp\":" <Timestamp> "\"Source\":" <Source> "\"Destination\":" <Destination>
"\"Type\":" <Type> "\"Payload\":" "\"Operation\":" <Operation>
"\"Resource\":" <Resource>

<Type> ::= "Request" | "Response" | "Event" | "Acknowledgement"

<Format> ::= <String>

<MessageID> ::= <Integer>

<Timestamp> ::= <String>

<Source> ::= <String>

<Destination> ::= <String>

<Operation> ::= <String>

<Resource> ::= <String>
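
The <Message> production fixes the order in which the fields must appear. As a minimal sketch of how that order could be checked, the helper below compares the keys found in a message with the order given by the production. The function names expectedFields and firstMismatch are assumptions for this illustration and are not part of the project code.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Field names in the order in which they appear in the <Message> production
const std::vector<std::string>& expectedFields() {
    static const std::vector<std::string> fields = {
        "type", "format", "data", "Message ID", "Timestamp", "Source",
        "Destination", "Type", "Payload", "Operation", "Resource"};
    return fields;
}

// Hypothetical helper: returns the index of the first key that deviates from
// the expected order, or -1 when the keys match the production exactly.
int firstMismatch(const std::vector<std::string>& keys) {
    const std::vector<std::string>& expected = expectedFields();
    const std::size_t common = std::min(keys.size(), expected.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (keys[i] != expected[i]) return static_cast<int>(i);
    }
    // One sequence is a prefix of the other: fields are missing or left over
    if (keys.size() != expected.size()) return static_cast<int>(common);
    return -1;
}
```

A message that omits "Destination" is reported at index 6, the position where "Destination" was expected.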

Code:
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <unordered_map>
#include <regex>

using namespace std;

// A parsed message: the two header fields plus all remaining key-value pairs
struct Message {
    string type;
    string format;
    unordered_map<string, string> data;
};

// Parse the lines of one message into a Message struct.
// Every line that cannot be parsed is reported on stderr and counted in errorCount.
Message parseMessage(const vector<string>& lines, int& errorCount) {
    // Compile the patterns once instead of once per line
    static const regex pairPattern("\"([^\"]+)\": \"([^\"]+)\"");
    // Structural lines carry no value: braces, commas, or a key that opens a block
    static const regex structuralPattern("\\s*(\"[^\"]+\"\\s*:)?\\s*[{}\\[\\],]*\\s*");

    Message message;
    for (const auto& line : lines) {
        smatch matches;
        if (regex_search(line, matches, pairPattern)) {
            // Extract key and value
            string key = matches[1].str();
            string value = matches[2].str();
            if (key == "type") {
                message.type = value;
            } else if (key == "format") {
                message.format = value;
            } else {
                message.data[key] = value;
            }
        } else if (!regex_match(line, structuralPattern)) {
            cerr << "Error parsing line: " << line << endl;
            ++errorCount;
        }
    }
    return message;
}

int main() {
    // Open the file
    ifstream inputFile("data3.txt");
    if (!inputFile.is_open()) {
        cerr << "Failed to open the file." << endl;
        return 1;
    }

    // Read lines from the file
    vector<string> input;
    string line;
    while (getline(inputFile, line)) {
        input.push_back(line);
    }

    // Close the file
    inputFile.close();

    // Parse the input lines into messages; a blank line ends the current message
    vector<Message> messages;
    vector<string> currentMessage;
    int errorCount = 0;
    for (const auto& line : input) {
        if (line.empty()) {
            if (!currentMessage.empty()) {
                messages.push_back(parseMessage(currentMessage, errorCount));
                currentMessage.clear();
            }
        } else {
            currentMessage.push_back(line);
        }
    }
    // Parse the last message
    if (!currentMessage.empty()) {
        messages.push_back(parseMessage(currentMessage, errorCount));
    }

    // Report success only when every line was parsed without error
    if (errorCount > 0) {
        cerr << errorCount << " line(s) could not be parsed" << endl;
        return 1;
    }
    cout << "Given data has appropriate syntax and semantics" << endl;

    return 0;
}

Parse Tree:

Test Case with Outputs:


Test Case Input 1:
- type: "Request"
format: "Request Format"
data:
"Message ID": 1
"Timestamp": "2024-02-15T13:45:30Z"
"Source": "Client-1"
"Destinatiokbjfg szdklbf::;
- type: "Response"
format: "Response Format"
data:
"Message ID": 2
"Timestamp": "2024-02-15T13:45:35Z"
"Source": "Server-1"
"Destination":tur adipiscing elit."
- type: "Event"

Test Case 1 Output:

Test Case Input 2:


{
"type": "Request",
"format": "Request Format",
"data": {
"Message ID": 1,
"Timestamp": "2024-02-15T13:45:30Z",
"Source": "Client-1",
"Destination": "Server-1",
"Type": "Request",
"Payload": {
"Operation": "GET",
"Resource": "data.txt"
}
}
},
{
"type": "Response",
"format": "Response Format",
"data": {
"Message ID": 2,
"Timestamp": "2024-02-15T13:45:35Z",
"Source": "Server-1",
"Destination": "Client-1",
"Type": "Response",
"Payload": {
"Status": "Success",
"Data": "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
}
}
},
{
"type": "Event",
"format": "Event Format",
"data": {
"Message ID": 3,
"Timestamp": "2024-02-15T13:45:40Z",
"Source": "Node-2",
"Type": "Event",

Test Case Output 2:

Test Case Input 3:


"type": "Request",
"format": "Request Format",
"data": {
"Message ID": "1",
"Timestamp": "2024-02-15T13:45:30Z",
"Source": "Client-1",
"Destination": "Server-1",
"Type": "Request",
"Payload": {
"Operation": "GET",
"Resource": "data.txt"
}
},
{
"type": "Response",
"format": "Response Format",
"data": {
"Message ID": "2",
"Timestamp": "2024-02-15T13:45:35Z",
"Source": "Server-1",
"Destination": "Client-1",
"Type": "Response",
"Payload": {
"Status": "Success",
"Data": "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
}
},
{
"type": "Event",
"format": "Event Format",
"data": {
"Message ID": "3",
"Timestamp": "2024-02-15T13:45:40Z",
"Source": "Node-2",
"Type": "Event",
"Payload": {

"Event": "Node offline",


"Node ID": "Node-2"
}
},
{
"type": "Acknowledgement",
"format": "Acknowledgement Format",
"data": {
"Message ID": "4",
"Timestamp": "2024-02-15T13:45:45Z",
"Source": "Server-1",
"Destination": "Client-1",
"Type": "Acknowledgement",
"Payload": {
"Acknowledged Message ID": "1",
"Status": "Received"
}
}

Justification of the Parser Used:


The custom parser implemented in the provided code follows a top-down approach. In top-down parsing,
processing starts from the highest level of abstraction (e.g., the entire message) and breaks the
input down into smaller components until it reaches the lowest level of detail (e.g., individual
key-value pairs). In this case, the program reads the input file line by line, groups the lines into
messages separated by blank lines, and then extracts a key-value pair from each line with a regular
expression to construct the message structure. This top-down approach aligns well with the
straightforward and hierarchical nature of the dataset structure, making it suitable for parsing the
input data efficiently.

Conclusion:
In conclusion, syntax analysis ensures that the input adheres to the syntactic rules of the language or
format, while semantic analysis focuses on understanding the meaning and context of the input,
enforcing language-specific rules, and detecting semantic errors. Both analysis phases are critical for
compiling programs and documents accurately and efficiently, ultimately contributing to the
correctness and reliability of software systems.
