CC Lab 1-4

The document outlines practical experiments for a Compiler Construction course, focusing on lexical analysis, token counting, left recursion removal, and LL(1) grammar checking using C programming. Each experiment includes an aim, theoretical background, and code implementation to demonstrate the concepts. The experiments facilitate understanding of compiler design principles and the importance of tokenization and grammar analysis.
COMPILER CONSTRUCTION

CSE304
Practical file

AMITY SCHOOL OF ENGINEERING AND TECHNOLOGY


AMITY UNIVERSITY, UTTAR PRADESH

Submitted by:
Krish Dogra
A2305222573
6CSE-9X

Submitted to:
Dr Roshan Lal
Experiment 1
Aim: Write a program to identify keywords, constants, special characters and
identifiers from a given input string.
Language Used: C
Theory: Lexical analysis is a fundamental part of compiler design, where a
given input string is processed to identify meaningful components called tokens.
Tokens include keywords, constants, special characters, and identifiers, each
playing a crucial role in programming languages.
Keywords are predefined reserved words such as int, return, and if, which have
specific meanings in the C programming language. Constants refer to fixed
numeric values like 100 or 3.14, which do not change during execution.
Identifiers are user-defined names for variables, functions, and arrays, following
the language's naming conventions. Special characters include symbols like +,
-, *, {, }, which define operations and control structures in the code.
The program reads an input string and scans it one character at a time.
Consecutive identifier characters are grouped into a word, and the word is then
classified as a keyword if it matches the reserved-word list, as a constant if it
consists only of digits, and as an identifier otherwise. Every other character is
checked against the list of special symbols. Note that the decimal point is
treated as a special symbol, so an input such as 3.14 is reported as the
constant 3, the special character '.', and the constant 14.
Lexical analysis is the first phase of compilation. It catches lexical errors,
such as illegal characters, early, and hands a clean token stream to the later
phases: syntax analysis, semantic analysis, and optimization.
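The theory lists 3.14 as a constant, so a check that accepts a decimal point
may be useful. The following is a minimal sketch, not part of the submitted
program: the helper name isNumericConstant is hypothetical, and it assumes
unsigned decimal constants only (no sign, exponent, or suffix).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if s is an unsigned decimal constant
   such as "100", "3.14", "3." or ".5", and 0 otherwise. */
int isNumericConstant(const char *s) {
    assert(s != NULL);
    int digits = 0;
    /* Integer part. */
    while (isdigit((unsigned char)*s)) { s++; digits++; }
    /* Optional fractional part. */
    if (*s == '.') {
        s++;
        while (isdigit((unsigned char)*s)) { s++; digits++; }
    }
    /* At least one digit, and nothing left over. */
    return digits > 0 && *s == '\0';
}
```

To use it, the scanner would also have to keep '.' inside a word when the word
so far consists of digits; otherwise the word is split before this check runs.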

Code:
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdbool.h>
const char *keywords[] = {"int", "float", "char", "double", "return", "if",
"else", "for", "while", "do", "switch", "case", "break", "continue", "void",
"static", "struct", "typedef", "const", "sizeof", "volatile", "enum", "union",
"default", "extern", "goto", "register", "short", "signed", "unsigned", "long",
"auto", "inline", "restrict", "_Alignas", "_Alignof", "_Atomic", "_Bool",
"_Complex", "_Generic", "_Imaginary", "_Noreturn", "_Static_assert",
"_Thread_local"};
int keyword_count = sizeof(keywords) / sizeof(keywords[0]);

/* Returns true if word matches one of the reserved words above. */
bool isKeyword(const char *word) {
    for (int i = 0; i < keyword_count; i++) {
        if (strcmp(word, keywords[i]) == 0) {
            return true;
        }
    }
    return false;
}
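/* Returns true if word consists only of decimal digits. */
bool isNumber(const char *word) {
    for (int i = 0; word[i] != '\0'; i++) {
        /* The cast avoids undefined behaviour for negative char values. */
        if (!isdigit((unsigned char)word[i])) {
            return false;
        }
    }
    return true;
}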
/* Returns true if ch is one of the recognised special symbols. */
bool isSpecialSymbol(char ch) {
    char special_symbols[] = "!@#$%^&*()-+=|<>?/{}[]:;.,'\\";
    for (int i = 0; special_symbols[i] != '\0'; i++) {
        if (ch == special_symbols[i]) {
            return true;
        }
    }
    return false;
}
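/* Splits str into tokens and prints the class of each one. */
void identifyTokens(const char *str) {
    char token[50];
    int index = 0;
    for (int i = 0; ; i++) {
        char ch = str[i];
        if (isalnum((unsigned char)ch) || ch == '_') {
            /* Extend the current word; drop characters that do not fit. */
            if (index < (int)sizeof(token) - 1) {
                token[index++] = ch;
            }
            continue;
        }
        /* Any other character, including the terminator, ends the word. */
        if (index > 0) {
            token[index] = '\0';
            if (isKeyword(token)) { printf("Keyword: %s\n", token); }
            else if (isNumber(token)) { printf("Constant: %s\n", token); }
            else { printf("Identifier: %s\n", token); }
            index = 0;
        }
        if (ch == '\0') {
            break;
        }
        if (isSpecialSymbol(ch)) {
            printf("Special Character: %c\n", ch);
        }
    }
}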
int main(void) {
    char input[100];
    printf("Enter a C statement: ");
    if (fgets(input, sizeof(input), stdin) == NULL) {
        return 1;   /* no input */
    }
    identifyTokens(input);
    return 0;
}
Output:
Experiment 2
Aim: Write a program to count the total number of tokens in the source code.
Language Used: C
Theory: A token is the smallest meaningful unit in a programming language,
including keywords, identifiers, operators, constants, and special symbols.
Tokenization is the process of breaking source code into tokens, which is a
fundamental step in lexical analysis for compilers. This program reads one line
of C source code from standard input and scans it character by character. Each
run of letters, digits, and underscores counts as one token, and each operator
or punctuation character counts as one token, so a two-character operator such
as == is counted as two tokens. Whitespace only separates tokens. The library
function strtok() offers a simpler split on delimiters, but it discards the
delimiters, so operators and punctuation would not be counted. Tokenization is
widely used in compiler design, syntax highlighting, and static code analysis.
By implementing this process, we can better understand how source code is
structured and interpreted.
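For comparison, a strtok()-based count can be sketched as follows. This is an
illustration only: the helper name countTokensStrtok and the delimiter list are
assumptions, and the function overwrites the buffer it is given because
strtok() writes terminators into it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: counts the pieces strtok() yields when the text is
   split on whitespace and common punctuation. Modifies source in place. */
int countTokensStrtok(char *source) {
    assert(source != NULL);
    const char *delims = " \t\n(){}[];,=+-*/<>!&|\"'";
    int count = 0;
    for (char *tok = strtok(source, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        count++;   /* one word between delimiters */
    }
    return count;
}
```

For the line "int a = b + 10;" this counts 4 words (int, a, b, 10), whereas a
character-by-character scan that also counts =, + and ; reports 7 tokens.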
Code:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAX_LENGTH 100

/* Returns 1 if ch is an operator or punctuation character, 0 otherwise. */
int isSpecialCharacter(char ch) {
    char specialChars[] = "(){}[];,=+-*/<>!&|\"'";
    for (int i = 0; specialChars[i] != '\0'; i++) {
        if (ch == specialChars[i]) {
            return 1;
        }
    }
    return 0;
}
int main(void) {
    char sourceCode[MAX_LENGTH];
    printf("Enter the source code: \n");
    if (fgets(sourceCode, MAX_LENGTH, stdin) == NULL) {
        return 1;   /* no input */
    }
    int tokenCount = 0;
    int inWord = 0;   /* 1 while scanning an identifier, keyword or number */

    for (int i = 0; sourceCode[i] != '\0'; i++) {
        char ch = sourceCode[i];
        if (isalnum((unsigned char)ch) || ch == '_') {
            inWord = 1;
        } else {
            if (inWord) {              /* a word has just ended */
                tokenCount++;
                inWord = 0;
            }
            if (isSpecialCharacter(ch)) {
                tokenCount++;          /* each special character is one token */
            }
        }
    }
    if (inWord) {                      /* word running up to the end of input */
        tokenCount++;
    }
    printf("Total number of tokens: %d\n", tokenCount);
    return 0;
}
Output:
Experiment 3
Aim: Write a program to remove left recursion from given grammar.
Language Used: C
Theory: Left recursion occurs when a non-terminal in a grammar can
eventually derive itself as the first symbol on the right-hand side of its own
production. This causes problems for top-down parsers, like recursive descent
parsers, as they can enter an infinite loop when attempting to parse such
productions.
To remove immediate left recursion, we apply a transformation to the grammar.
For a production of the form:
A → Aα | β
where A is a non-terminal, α and β are sequences of terminals and/or
non-terminals, and β does not begin with A, the left recursion is eliminated by
introducing a new non-terminal A'. The transformation is as follows:
A → βA'
A' → αA' | ε
Here, A' represents the new non-terminal, and ε is the empty string.
Both grammars generate the same strings (β followed by zero or more α), but the
new one recurses on the right. This removes the cause of infinite recursion in
top-down parsers. It is a necessary step towards an LL(1) grammar, although the
result may still need left factoring before it is LL(1).
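As an illustration, take the hypothetical grammar E → E+T | T with T → digit.
Applying the rule with α = +T and β = T gives E → TE' and E' → +TE' | ε. A
recursive descent parser for E → E+T would call itself before reading any
input; the transformed grammar can be parsed directly, as this minimal sketch
shows (all function names are assumptions made for the example).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *cursor;   /* next unread input character */

/* T -> digit */
static int parseT(void) {
    if (isdigit((unsigned char)*cursor)) { cursor++; return 1; }
    return 0;
}

/* E' -> + T E' | epsilon */
static int parseEPrime(void) {
    if (*cursor == '+') {
        cursor++;                      /* consume '+' before recursing */
        return parseT() && parseEPrime();
    }
    return 1;                          /* epsilon: consume nothing */
}

/* E -> T E' */
static int parseE(void) {
    return parseT() && parseEPrime();
}

/* Returns 1 if the whole input is a sum of single digits, 0 otherwise. */
int acceptsSum(const char *input) {
    assert(input != NULL);
    cursor = input;
    return parseE() && *cursor == '\0';
}
```

Every recursive call in parseEPrime happens only after a '+' has been consumed,
so the recursion depth is bounded by the input length.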
Code:
#include <stdio.h>
#include <string.h>
#define MAX 10

/* Prints the productions of nonTerminal with immediate left recursion removed.
   Alternatives that start with nonTerminal are the A-alpha forms; all other
   alternatives are the betas. */
void removeLeftRecursion(char nonTerminal, char productions[MAX][MAX],
                         int prodCount) {
    char alpha[MAX][MAX], beta[MAX][MAX];
    int alphaCount = 0, betaCount = 0;
    for (int i = 0; i < prodCount; i++) {
        if (productions[i][0] == nonTerminal) {
            strcpy(alpha[alphaCount++], productions[i] + 1);   /* drop leading A */
        } else {
            strcpy(beta[betaCount++], productions[i]);
        }
    }
    if (alphaCount == 0) {   /* no left recursion: print the grammar unchanged */
        printf("%c -> ", nonTerminal);
        for (int i = 0; i < prodCount; i++) {
            printf("%s", productions[i]);
            if (i < prodCount - 1) printf(" | ");
        }
        printf("\n");
        return;
    }
    /* A -> beta A' for every beta */
    printf("%c -> ", nonTerminal);
    for (int i = 0; i < betaCount; i++) {
        printf("%s%c'", beta[i], nonTerminal);
        if (i < betaCount - 1) printf(" | ");
    }
    printf("\n");
    /* A' -> alpha A' for every alpha, plus the empty alternative */
    printf("%c' -> ", nonTerminal);
    for (int i = 0; i < alphaCount; i++) {
        printf("%s%c'", alpha[i], nonTerminal);
        if (i < alphaCount - 1) printf(" | ");
    }
    printf(" | ε\n");
}

int main(void) {
    char nonTerminal;
    int prodCount;
    char productions[MAX][MAX];
    printf("Enter the non-terminal: ");
    if (scanf(" %c", &nonTerminal) != 1) return 1;
    printf("Enter the number of productions: ");
    if (scanf("%d", &prodCount) != 1 || prodCount < 1 || prodCount > MAX) {
        printf("Expected between 1 and %d productions.\n", MAX);
        return 1;
    }
    printf("Enter the productions (e.g., A->Aa|b, enter only 'Aa' and 'b'):\n");
    for (int i = 0; i < prodCount; i++) {
        /* Width 9 = MAX - 1 keeps each production inside its buffer. */
        if (scanf("%9s", productions[i]) != 1) return 1;
    }
    printf("\nGrammar after removing left recursion:\n");
    removeLeftRecursion(nonTerminal, productions, prodCount);
    return 0;
}
Output:
Experiment 4
Aim: Write a program to check whether a given grammar is LL(1).
Language Used: C
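Theory:
LL(1) parsing is a top-down method that uses a parsing table and one symbol of
lookahead to decide which production to apply. An LL(1) grammar must satisfy
the following conditions:
1. No Left Recursion – Avoids infinite loops.
2. No Ambiguity – An ambiguous grammar is never LL(1).
3. Left Factored – No two alternatives of a non-terminal share a common prefix.
These conditions are necessary but not sufficient, so the definitive check is
to build the parsing table:
1. Compute First and Follow sets.
2. Build the Parsing Table using these sets: the production A → α goes in cell
[A, a] for every terminal a in First(α) and, if α can derive ε, for every
symbol in Follow(A).
3. If any cell has multiple entries, the grammar is not LL(1).
This ensures efficient predictive parsing without backtracking.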
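As a worked example, consider the hypothetical grammar S → aSb | ε. The first
alternative is selected on First(aSb) = {a}. The second derives ε, so it is
selected on Follow(S) = {b, $}. The two sets are disjoint, so no table cell
gets two entries and the grammar is LL(1). By contrast, S → ab | ac selects
both alternatives on a and is not LL(1). The disjointness test itself can be
sketched as below; the helper name and the representation of each set as a
string of lookahead symbols are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: predict[i] lists the lookahead symbols that select
   alternative i of one non-terminal. Returns 1 if no symbol selects two
   different alternatives, 0 otherwise. */
int alternativesAreDisjoint(const char *predict[], int count) {
    assert(predict != NULL && count >= 0);
    for (int i = 0; i < count; i++) {
        for (int j = i + 1; j < count; j++) {
            for (const char *p = predict[i]; *p != '\0'; p++) {
                if (strchr(predict[j], *p) != NULL) {
                    return 0;   /* same lookahead, two productions */
                }
            }
        }
    }
    return 1;
}
```

Calling it with the sets {"a", "b$"} returns 1, and with {"a", "a"} returns 0.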
Code:
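#include <stdio.h>
#include <string.h>
#define MAX 10    /* max non-terminals and productions per non-terminal */
#define LEN 16    /* max production length, including the terminator */
#define SET 64    /* capacity of a First or Follow set */
#define EPS '#'   /* assumption: '#' is typed for the empty string */

typedef struct {
    char nonTerminal;
    char productions[MAX][LEN];
    int prodCount;
    char first[SET];
    char follow[SET];
} Grammar;
int n;
Grammar grammar[MAX];   /* zero-initialised, so all sets start empty */

int isNonTerminal(char ch) {
    return (ch >= 'A' && ch <= 'Z');
}
/* Returns the index of non-terminal nt in grammar[], or -1 if undefined. */
int indexOf(char nt) {
    for (int i = 0; i < n; i++) {
        if (grammar[i].nonTerminal == nt) return i;
    }
    return -1;
}
/* Adds ch to set if absent. Returns 1 if the set changed, 0 otherwise. */
int addSym(char *set, char ch) {
    size_t len = strlen(set);
    if (strchr(set, ch) != NULL || len + 1 >= SET) return 0;
    set[len] = ch;
    set[len + 1] = '\0';
    return 1;
}
/* Writes First(s) into out; EPS is included if s can derive the empty string. */
void firstOfString(const char *s, char *out) {
    out[0] = '\0';
    for (; *s != '\0'; s++) {
        if (*s == EPS) continue;
        if (!isNonTerminal(*s)) { addSym(out, *s); return; }
        int k = indexOf(*s);
        if (k < 0) return;   /* undefined non-terminal derives nothing */
        int nullable = 0;
        for (const char *p = grammar[k].first; *p != '\0'; p++) {
            if (*p == EPS) nullable = 1; else addSym(out, *p);
        }
        if (!nullable) return;
    }
    addSym(out, EPS);   /* every symbol of s can vanish */
}
/* Computes all First sets by iterating until nothing changes. */
void calculateFirst(void) {
    for (int changed = 1; changed; ) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < grammar[i].prodCount; j++) {
                char tmp[SET];
                firstOfString(grammar[i].productions[j], tmp);
                for (char *p = tmp; *p != '\0'; p++)
                    changed |= addSym(grammar[i].first, *p);
            }
        }
    }
}
/* Computes all Follow sets; the first non-terminal is the start symbol. */
void calculateFollow(void) {
    addSym(grammar[0].follow, '$');
    for (int changed = 1; changed; ) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < grammar[i].prodCount; j++) {
                const char *prod = grammar[i].productions[j];
                for (int k = 0; prod[k] != '\0'; k++) {
                    int b = isNonTerminal(prod[k]) ? indexOf(prod[k]) : -1;
                    if (b < 0) continue;
                    char tmp[SET];
                    firstOfString(prod + k + 1, tmp);   /* what can follow prod[k] */
                    for (char *p = tmp; *p != '\0'; p++) {
                        if (*p != EPS) {
                            changed |= addSym(grammar[b].follow, *p);
                        } else {   /* rest can vanish: inherit Follow of the head */
                            for (char *q = grammar[i].follow; *q != '\0'; q++)
                                changed |= addSym(grammar[b].follow, *q);
                        }
                    }
                }
            }
        }
    }
}
/* Returns 1 if no parsing-table cell would receive two productions. */
int checkLL1(void) {
    for (int i = 0; i < n; i++) {
        char used[SET] = "";   /* lookaheads already claimed in row i */
        for (int j = 0; j < grammar[i].prodCount; j++) {
            char tmp[SET], predict[SET] = "";
            firstOfString(grammar[i].productions[j], tmp);
            for (char *p = tmp; *p != '\0'; p++) {
                if (*p != EPS) {
                    addSym(predict, *p);
                } else {
                    for (char *q = grammar[i].follow; *q != '\0'; q++)
                        addSym(predict, *q);
                }
            }
            for (char *p = predict; *p != '\0'; p++) {
                if (!addSym(used, *p)) return 0;   /* cell already filled */
            }
        }
    }
    return 1;
}
int main(void) {
    printf("Enter the number of non-terminals: ");
    if (scanf("%d", &n) != 1 || n < 1 || n > MAX) {
        printf("Expected between 1 and %d non-terminals.\n", MAX);
        return 1;
    }
    for (int i = 0; i < n; i++) {
        printf("Enter non-terminal %d: ", i + 1);
        if (scanf(" %c", &grammar[i].nonTerminal) != 1) return 1;
        printf("Enter the number of productions for %c: ",
               grammar[i].nonTerminal);
        if (scanf("%d", &grammar[i].prodCount) != 1 ||
            grammar[i].prodCount < 1 || grammar[i].prodCount > MAX) return 1;
        printf("Enter productions for %c (one per line, %c for epsilon):\n",
               grammar[i].nonTerminal, EPS);
        for (int j = 0; j < grammar[i].prodCount; j++) {
            /* Width 15 = LEN - 1 keeps each production inside its buffer. */
            if (scanf("%15s", grammar[i].productions[j]) != 1) return 1;
        }
    }
    calculateFirst();
    calculateFollow();

    if (checkLL1()) {
        printf("The given grammar is LL(1).\n");
    } else {
        printf("The given grammar is NOT LL(1).\n");
    }
    return 0;
}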
Output:
