Ccfile
Compiler Construction
[CSE304]
1
6CSE-DS-1Y
Practical 1
Theory:
A regular expression is a sequence of characters that defines a search pattern. It is mainly
used for pattern matching against strings and gives a generalized way to describe sets of
character sequences.
The POSIX regular expression library in C lets developers work with regular expressions
through the <regex.h> header. The key functions are regcomp() (compile a pattern),
regexec() (match a compiled pattern against a string), regerror() (turn an error code into
a message), and regfree() (release a compiled pattern).
Procedure:
1. Include the <regex.h> header.
2. Create an array of regex patterns, including valid and invalid ones.
3. Use the regcomp() function to compile each pattern.
4. Check the return value of regcomp() to determine if the regex is valid or not.
5. Print the result for each pattern.
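The procedure above can be sketched as follows. This is a minimal sketch rather than the practical's own listing: the helper names is_valid_regex and count_valid_regexes, the message texts, and the choice of the REG_EXTENDED and REG_NOSUB flags are assumptions made for illustration.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if `pattern` compiles as a POSIX
 * extended regular expression, 0 otherwise. */
int is_valid_regex(const char *pattern)
{
    regex_t compiled;
    int status = regcomp(&compiled, pattern, REG_EXTENDED | REG_NOSUB);

    if (status != 0) {
        /* regerror() turns the error code into a readable message. */
        char message[128];
        regerror(status, &compiled, message, sizeof message);
        printf("Invalid regex \"%s\": %s\n", pattern, message);
        return 0;
    }

    printf("Valid regex \"%s\"\n", pattern);
    regfree(&compiled);   /* release memory held by the compiled pattern */
    return 1;
}

/* Hypothetical helper: checks every pattern in an array and returns
 * how many of them are valid. */
int count_valid_regexes(const char *patterns[], int count)
{
    int valid = 0;
    for (int i = 0; i < count; i++)
        valid += is_valid_regex(patterns[i]);
    return valid;
}
```

Calling count_valid_regexes() from main() with an array that mixes well-formed patterns and patterns with an unclosed bracket or parenthesis prints one line per pattern, which corresponds to steps 2 to 5.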
Code:
// C program for illustration of regcomp()
#include <regex.h>
#include <stdio.h>

int main()
{
    // Variable to hold the compiled regex
    regex_t reegex;

    // Variable to store the return
    // value after creation of regex
    int value;

    // Function call to create regex
    value = regcomp(&reegex, "[:word:]", 0);

    // If compilation is successful
    if (value == 0) {
        printf("RegEx compiled successfully.");
        // Free the memory held by the compiled regex
        regfree(&reegex);
    }
    return 0;
}
RESULT-
Practical 2
Theory:
In C programming, tokens are the smallest individual units of a program. These include
keywords, identifiers, constants, operators, and special symbols. Tokenizing a program
means scanning its contents and separating them into these individual components, which
is the basis of lexical analysis.
Lexical analysis, performed by the scanner, is the first phase of a compiler. It converts
the input program into a sequence of tokens. For example:
1) Keywords:
Examples- for, while, if etc.
2) Identifiers:
Examples- variable names, function names etc.
3) Operators:
Examples- '+', '++', '-' etc.
4) Separators:
Examples- ',', ';' etc.
Procedure:
1. Prepare Input File: Write or use an existing C program file to serve as input for
token counting.
2. Implement Tokenizer: Use standard file I/O in C to read the file, and process the
content character by character to identify tokens based on delimiters like spaces,
tabs, and special symbols.
4. Display Results: Output the total number of tokens detected in the program.
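One way to sketch the token-counting step is shown below. It is a minimal sketch under stated assumptions, not the practical's own program: the function names, the short list of two-character operators, and the rule that a run of letters, digits, and underscores forms one token are all choices made for illustration. String literals, character constants, and comments are not treated specially.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the two characters form one of a
 * small, non-exhaustive set of two-character operators. */
static int is_two_char_operator(int first, int second)
{
    static const char *ops[] = { "++", "--", "==", "!=", "<=", ">=", "&&", "||" };
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++)
        if (ops[i][0] == first && ops[i][1] == second)
            return 1;
    return 0;
}

/* Counts tokens in an open stream, reading it character by character. */
long count_tokens(FILE *fp)
{
    long count = 0;
    int ch = fgetc(fp);

    while (ch != EOF) {
        if (isspace(ch)) {
            ch = fgetc(fp);             /* whitespace only separates tokens */
        } else if (isalnum(ch) || ch == '_') {
            /* A run of letters, digits, and underscores is one token. */
            while (ch != EOF && (isalnum(ch) || ch == '_'))
                ch = fgetc(fp);
            count++;
        } else {
            /* Operator or separator: one token, two characters if paired. */
            int next = fgetc(fp);
            if (next != EOF && is_two_char_operator(ch, next))
                next = fgetc(fp);
            count++;
            ch = next;
        }
    }
    return count;
}
```

A caller would open the input file with fopen(), pass the stream to count_tokens(), print the returned total, and close the file with fclose().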
Code:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
// Function to check if a character is a delimiter
int isDelimiter(char ch) {
    return (ch == ' ' || ch == '+' || ch == '-' || ch == '*' || ch == '/' ||
            ch == ',' || ch == ';' || ch == '>' || ch == '<' || ch == '=' ||
            ch == '(' || ch == ')' || ch == '[' || ch == ']' || ch == '{' || ch == '}');
}
RESULT-
Practical 3
Aim: To evaluate a postfix expression using stack push and pop operations in C.
Theory:
Postfix expression: An expression of the form "a b operator" (for example, ab+), in which
an operator follows its pair of operands.
Postfix notation (Reverse Polish Notation) is a mathematical notation in which operators
follow their operands. It eliminates the need for parentheses. For example:
1. Infix: (3 + 4) * 5
2. Postfix: 3 4 + 5 *
Evaluation Steps:
1) Scan the postfix expression from left to right.
2) Use a stack to store operands.
3) If an operand is encountered, push it onto the stack.
4) If an operator is encountered, pop two elements from the stack (the first one popped is
the right operand), perform the operation, and push the result back.
5) At the end of the expression, the stack will contain the result.
Stack Operations:
1. Push: Insert an element into the stack.
2. Pop: Remove and return the top element from the stack.
Procedure:
1) Define a stack structure and functions for push and pop.
2) Traverse the postfix expression:
1. If it is an operand, push it onto the stack.
2. If it is an operator, pop two elements from the stack, perform the operation, and
push the result back.
3) Continue until the end of the expression.
4) The result will be at the top of the stack.
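The steps above can also be sketched with an explicit stack structure and space-separated, multi-digit operands, so that the theory example "3 4 + 5 *" can be evaluated as written. This is a minimal sketch: the Stack type, the function names, and the convention of returning 0 on any error are assumptions made for illustration.

```c
#include <ctype.h>
#include <stdlib.h>

#define STACK_MAX 100

/* A fixed-size integer stack. */
typedef struct {
    int items[STACK_MAX];
    int top;                /* index of the top element, -1 when empty */
} Stack;

static int push(Stack *s, int value)
{
    if (s->top >= STACK_MAX - 1)
        return 0;                       /* overflow */
    s->items[++s->top] = value;
    return 1;
}

static int pop(Stack *s, int *value)
{
    if (s->top < 0)
        return 0;                       /* underflow */
    *value = s->items[s->top--];
    return 1;
}

/* Evaluates a space-separated postfix expression such as "3 4 + 5 *".
 * Returns 1 and stores the value in *result on success, 0 on error. */
int evaluate_postfix(const char *expr, int *result)
{
    Stack s = { .top = -1 };
    const char *p = expr;

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;
        } else if (isdigit((unsigned char)*p)) {
            char *end;
            long operand = strtol(p, &end, 10);   /* multi-digit operand */
            if (!push(&s, (int)operand))
                return 0;
            p = end;
        } else {
            int right, left, value;
            /* The right operand is popped first, then the left one. */
            if (!pop(&s, &right) || !pop(&s, &left))
                return 0;
            switch (*p) {
            case '+': value = left + right; break;
            case '-': value = left - right; break;
            case '*': value = left * right; break;
            case '/':
                if (right == 0)
                    return 0;           /* division by zero */
                value = left / right;
                break;
            default:
                return 0;               /* unknown operator */
            }
            if (!push(&s, value))
                return 0;
            p++;
        }
    }
    /* A well-formed expression leaves exactly one value on the stack. */
    return pop(&s, result) && s.top == -1;
}
```

Returning a status instead of calling exit() lets the caller decide how to report a malformed expression such as "3 +".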
Code:
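#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
// Define stack and its functions
#define MAX 100
int stack[MAX];
int top = -1;
void push(int value) {
    if (top >= MAX - 1) {
        printf("Stack overflow\n");
        exit(1);
    }
    stack[++top] = value;
}
int pop() {
    if (top == -1) {
        printf("Stack underflow\n");
        exit(1);
    }
    return stack[top--];
}
int main() {
    // Postfix form of the theory example (3 + 4) * 5, with single-digit operands
    char expr[] = "34+5*";
    int i = 0;
    while (expr[i] != '\0') {
        if (isdigit((unsigned char)expr[i])) {
            // Operand: push its numeric value
            push(expr[i] - '0');
        } else {
            // Operator: the right operand is popped first
            int operand2 = pop();
            int operand1 = pop();
            int result;
            switch (expr[i]) {
            case '+':
                result = operand1 + operand2; break;
            case '-':
                result = operand1 - operand2; break;
            case '*':
                result = operand1 * operand2; break;
            case '/':
                result = operand1 / operand2; break;
            default:
                printf("Invalid operator encountered\n");
                exit(1);
            }
            // Push the result back onto the stack
            push(result);
        }
        i++;
    }
    printf("Result: %d\n", pop());
    return 0;
}
RESULT-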
Practical 4
Theory:
1. A Nondeterministic Finite Automaton (NFA) is an automaton in which, for a given state
and input symbol, the machine may move to several states or to no state at all. It may also
include epsilon transitions (transitions that consume no input symbol).
2. A Deterministic Finite Automaton (DFA) is an automaton in which each state has exactly
one next state for each input symbol, and no epsilon transitions are allowed.
Converting an NFA to a DFA uses the subset construction, in which each DFA state
represents a set of NFA states.
Procedure:
1. Input: The states, input symbols, transition function, start state, and accepting
states of the NFA.
2. Construct DFA states:
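1) Compute the epsilon-closure of the start state (the set of states reachable from it using
only epsilon transitions); this set is the DFA start state.
2) For each DFA state and input symbol, collect the NFA states reachable from any member
of the set; each distinct set obtained this way becomes a DFA state.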
3. Transition function: For each new DFA state, calculate the possible transitions for
each input symbol.
4. Final DFA construction: Once all possible transitions are calculated, print the
DFA state transition table.
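The epsilon-closure in step 2 can be sketched as follows. This is a minimal sketch under an assumed representation that does not come from the practical's listing: a set of NFA states is stored as a bitmask, and epsilon[s] holds the states reachable from state s by one epsilon transition. The function name is also hypothetical.

```c
/* Returns the epsilon-closure of the state set `states`.
 * Assumed representation: bit s of a mask stands for NFA state s, and
 * epsilon[s] is the mask of states reachable from s by one epsilon move. */
unsigned epsilon_closure(unsigned states, const unsigned epsilon[], int state_count)
{
    unsigned closure = states;      /* every state belongs to its own closure */
    unsigned pending = states;      /* states whose epsilon moves are unexplored */

    while (pending != 0) {
        for (int s = 0; s < state_count; s++) {
            if (!(pending & (1u << s)))
                continue;
            pending &= ~(1u << s);
            /* Add states not yet in the closure and schedule them. */
            unsigned fresh = epsilon[s] & ~closure;
            closure |= fresh;
            pending |= fresh;
        }
    }
    return closure;
}
```

For an NFA with epsilon moves 0 → 1 and 1 → 2, the closure of {0} is {0, 1, 2}. The bitmask form limits the sketch to as many states as an unsigned value has bits.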
Code:
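#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_STATES 10
#define MAX_SYMBOLS 2
#define MAX_TRANSITIONS 5
int nfaStates, nfaAlphabetSize,
    nfaTransitions[MAX_STATES][MAX_SYMBOLS][MAX_TRANSITIONS];
int nfaStartState, nfaFinalStates[MAX_STATES];
int dfaStates[MAX_STATES][MAX_SYMBOLS], dfaStateCount = 0;
// Assumed representation: dfaState[k] is a bitmask of the NFA states
// that make up DFA state k
int dfaState[MAX_STATES];
void convertNFAtoDFA() {
    int currentState, nextState, i, j, found;
    dfaState[dfaStateCount++] = 1 << nfaStartState;
    for (currentState = 0; currentState < dfaStateCount; currentState++) {
        for (i = 0; i < nfaAlphabetSize; i++) {
            // Union of the targets of every NFA state in the current set
            int targetSet = 0;
            for (int s = 0; s < nfaStates; s++)
                for (j = 0; j < MAX_TRANSITIONS; j++)
                    if ((dfaState[currentState] & (1 << s)) && nfaTransitions[s][i][j] != -1)
                        targetSet |= 1 << nfaTransitions[s][i][j];
            // Reuse the DFA state for this set if it already exists
            found = -1;
            for (nextState = 0; nextState < dfaStateCount; nextState++)
                if (dfaState[nextState] == targetSet)
                    found = nextState;
            // Otherwise create a new DFA state while there is room
            if (found == -1 && dfaStateCount < MAX_STATES) {
                found = dfaStateCount;
                dfaState[dfaStateCount++] = targetSet;
            }
            dfaStates[currentState][i] = found;
        }
    }
}
void printDFA() {
    printf("DFA States and Transitions:\n");
    for (int i = 0; i < dfaStateCount; i++) {
        printf("DFA State %d\n", i);
        for (int j = 0; j < nfaAlphabetSize; j++)
            printf("  on symbol %d -> DFA State %d\n", j, dfaStates[i][j]);
    }
}
int main() {
    nfaStates = 3; nfaAlphabetSize = 2; nfaStartState = 0;
    // Define NFA transitions: -1 means no transition
    memset(nfaTransitions, -1, sizeof(nfaTransitions));
    nfaTransitions[0][0][0] = 0; nfaTransitions[0][1][1] = 1;
    nfaTransitions[1][0][1] = 1; nfaTransitions[1][1][2] = 2;
    nfaTransitions[2][0][2] = 2; nfaTransitions[2][1][2] = 2;
    convertNFAtoDFA();
    printDFA();
    return 0;
}
RESULT-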
Practical 5
Aim:
To write a C program that checks whether a given line is a comment in C programming.
Theory:
1. Comments in C:
o Comments are used to add explanations or notes within the code without
affecting program execution.
o C supports two types of comments:
▪ Single-line comments: Begin with // and extend to the end of the
line.
▪ Multi-line comments: Begin with /* and end with */, spanning
multiple lines if needed.
2. Approach to Identifying Comments:
o Read the given line of text.
o Check if it starts with // (indicating a single-line comment).
o Check if it starts with /* and ends with */ (indicating a multi-line comment).
o If neither condition is met, the line is not a comment.
Procedure:
1. Take input from the user as a string (line of code).
2. Check the first two characters of the string:
o If "//", it is a single-line comment.
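A variant of the check that returns a classification instead of printing can be sketched as follows, which makes it easy to reuse and test. This is a minimal sketch: the CommentKind type and the function name are hypothetical, and skipping leading and trailing whitespace is an assumption that goes beyond the procedure, which looks only at the first characters of the line.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
typedef enum { NOT_COMMENT, SINGLE_LINE_COMMENT, MULTI_LINE_COMMENT } CommentKind;

CommentKind classify_comment(const char *line)
{
    /* Assumption: indentation before the comment is allowed. */
    while (isspace((unsigned char)*line))
        line++;

    /* Ignore trailing whitespace such as the newline left by fgets(). */
    size_t len = strlen(line);
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        len--;

    if (len >= 2 && line[0] == '/' && line[1] == '/')
        return SINGLE_LINE_COMMENT;

    /* At least four characters are needed so that the opening and the
     * closing delimiter do not share the same asterisk. */
    if (len >= 4 && line[0] == '/' && line[1] == '*' &&
        line[len - 2] == '*' && line[len - 1] == '/')
        return MULTI_LINE_COMMENT;

    return NOT_COMMENT;
}
```

The length test matters for a three-character input made of a slash, an asterisk, and a slash: it starts with the opening delimiter and ends with the closing one, but it is not a complete comment.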
Code:
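#include <stdio.h>
#include <string.h>
void checkComment(char *line) {
    int len = strlen(line);
    if (line[0] == '/' && line[1] == '/') {
        printf("The given line is a single-line comment.\n");
    }
    else if (len >= 4 && line[0] == '/' && line[1] == '*' && line[len - 2] == '*' && line[len - 1] == '/')
    { printf("The given line is a multi-line comment.\n");
    }
    else {
        printf("The given line is NOT a comment.\n");
    }
}
int main() {
    char line[200];
    printf("Enter a line: ");
    fgets(line, sizeof(line), stdin);
    line[strcspn(line, "\n")] = '\0'; // Remove the trailing newline
    checkComment(line);
    return 0;
}
Output:
Time Complexity:
• Best Case (O(1)): If the input starts with characters that immediately determine
the result.
• Worst Case (O(n)): When scanning a long string to verify a multi-line comment.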
Practical 6
Aim:
To write a C program that simulates an automaton that accepts strings belonging to the
regular expression a(a+b)*a, where the symbol set is {a, b}.
Theory:
1. Regular Expression (RE) Explanation:
o a(a+b)*a represents strings that:
▪ Start with a.
▪ Contain any number of a or b in between (including none).
▪ End with a.
o Examples of accepted strings:
▪ aa, aba, aaa, abba, ababaaa
o Examples of rejected strings:
▪ bba, ab, ba, bbba
2. Finite Automaton for RE:
o States: {q0, q1, qf}
o Transitions:
▪ q0 → q1 on a (first a)
▪ q1 → q1 on a or b (middle part: (a+b)*)
▪ q1 → qf on a (last a)
o Start State: q0
o Final State: qf
o Reject: The input is rejected if it does not follow the above rules.
Procedure:
1. Read the input string.
2. Start from state q0.
3. Transition through states based on the input symbol:
o If the first character is a, move to q1.
o Read the middle part (accept a or b in q1).
o If the last character is a, move to the final state qf.
4. If the string reaches qf, accept it; otherwise, reject it.
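The automaton can also be simulated one symbol at a time with a transition table. The sketch below is a deterministic version, so its states are an assumption and do not map one-to-one onto q0, q1, and qf: MIDDLE means the string starts with a but does not yet end the match, ACCEPT means it starts with a, has at least two symbols, and currently ends with a, and DEAD absorbs strings that start with b. The state and function names are hypothetical.

```c
#include <stddef.h>

/* States of a deterministic automaton for a(a+b)*a (names assumed). */
enum { START, MIDDLE, ACCEPT, DEAD, STATE_COUNT };

/* transition[state][symbol], where symbol 0 is 'a' and symbol 1 is 'b'. */
static const int transition[STATE_COUNT][2] = {
    /* START  */ { MIDDLE, DEAD   },
    /* MIDDLE */ { ACCEPT, MIDDLE },
    /* ACCEPT */ { ACCEPT, MIDDLE },
    /* DEAD   */ { DEAD,   DEAD   },
};

/* Returns 1 if `input` belongs to a(a+b)*a, otherwise 0. */
int is_accepted(const char *input)
{
    int state = START;
    for (size_t i = 0; input[i] != '\0'; i++) {
        if (input[i] == 'a')
            state = transition[state][0];
        else if (input[i] == 'b')
            state = transition[state][1];
        else
            return 0;               /* symbol outside {a, b} */
    }
    return state == ACCEPT;
}
```

Because the table consumes each symbol exactly once, the input does not need to be measured or indexed from the end before it is checked.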
Code:
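#include <stdio.h>
#include <string.h>
// Returns 1 if the string starts with a, ends with a, and contains only a or b
int isAccepted(const char *str) {
    int len = strlen(str);
    if (len < 2 || str[0] != 'a' || str[len - 1] != 'a')
        return 0;
    for (int i = 1; i < len - 1; i++)
        if (str[i] != 'a' && str[i] != 'b')
            return 0;
    return 1;
}
int main() {
    char input[100];
    printf("Enter a string: ");
    scanf("%99s", input);
    if (isAccepted(input)) {
        printf("The string is ACCEPTED.\n");
    } else {
        printf("The string is REJECTED.\n");
    }
    return 0;
}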
Output:
Time Complexity:
• O(n) (where n is the length of the input string) since it scans the string once.
Practical 7
Aim:
To write a C program to check whether a given identifier is valid according to the rules
of C programming.
Theory:
1. Definition of an Identifier:
o An identifier is the name of a variable, function, array, etc. in a programming language.
2. Rules for a Valid Identifier in C:
o Must start with a letter (A-Z or a-z) or an underscore (_).
o Can contain letters, digits (0-9), and underscores (_).
o Cannot be a reserved keyword in C.
o Cannot contain special characters like @, #, $, %, &, etc.
o Cannot start with a digit.
3. Examples:
✅ Valid Identifiers:
o name, var_1, _count, MyVar
❌ Invalid Identifiers:
o 1var (starts with a digit)
o my-var (contains -)
o int (reserved keyword)
Procedure:
1. Read the input string.
2. Check if the first character is a letter or an underscore.
3. Check the remaining characters to ensure they contain only letters, digits, or
underscores.
4. Verify that the string is not a C reserved keyword.
5. Display the result accordingly.
Code:
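#include <stdio.h>
#include <ctype.h>
#include <string.h>
// C reserved keywords (the 32 keywords of C89)
const char *keywords[] = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if",
    "int", "long", "register", "return", "short", "signed", "sizeof", "static",
    "struct", "switch", "typedef", "union", "unsigned", "void", "volatile", "while"
};
#define KEYWORDS_COUNT (sizeof(keywords) / sizeof(keywords[0]))
int isValidIdentifier(const char *str) {
    // The first character must be a letter or an underscore
    if (!isalpha((unsigned char)str[0]) && str[0] != '_')
        return 0;
    // The remaining characters must be letters, digits, or underscores
    for (int i = 1; str[i] != '\0'; i++)
        if (!isalnum((unsigned char)str[i]) && str[i] != '_')
            return 0;
    // The identifier must not be a reserved keyword
    for (size_t i = 0; i < KEYWORDS_COUNT; i++)
        if (strcmp(str, keywords[i]) == 0)
            return 0;
    return 1;
}
int main() {
    char identifier[100];
    scanf("%99s", identifier);
    if (isValidIdentifier(identifier)) {
        printf("The identifier '%s' is VALID.\n", identifier);
    } else {
        printf("The identifier '%s' is INVALID.\n", identifier);
    }
    return 0;
}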
Output:
Time Complexity:
• Best Case (O(1)): If the first character is invalid.
• Worst Case (O(n)): If the identifier is long, it scans each character and checks for
keywords.
Practical 8
Theory:
1. Lexical Analysis in Compiler Design:
o Lexical analysis is the first phase of a compiler.
o It scans the input program and recognizes tokens such as keywords,
identifiers, operators, and symbols.
2. Operators to be Validated:
o Relational Operators: <, >, <=, >=, ==, !=
o Logical Operators: !, ||, &&
o Bitwise Operators: |, &
o Assignment Operator: =
3. Lexical Analyzer Working:
o Read the input string.
o Check if the string matches any of the valid operators.
o If valid, print "Valid Operator."
o If invalid, print "Invalid Operator."
Procedure:
1. Read the input string.
2. Compare it with valid operators using strcmp().
3. If the input matches any valid operator, print "Valid Operator."
4. Otherwise, print "Invalid Operator."
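The same strcmp() comparison can also report which category from the theory section an operator belongs to. The following is a minimal sketch: the OperatorKind type, the table layout, and the function name are hypothetical, while the operators and their grouping are taken from the list above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical categories, following the grouping in the theory section. */
typedef enum {
    OP_INVALID, OP_RELATIONAL, OP_LOGICAL, OP_BITWISE, OP_ASSIGNMENT
} OperatorKind;

typedef struct {
    const char *text;
    OperatorKind kind;
} OperatorEntry;

static const OperatorEntry operator_table[] = {
    { "<",  OP_RELATIONAL }, { ">",  OP_RELATIONAL },
    { "<=", OP_RELATIONAL }, { ">=", OP_RELATIONAL },
    { "==", OP_RELATIONAL }, { "!=", OP_RELATIONAL },
    { "!",  OP_LOGICAL },    { "||", OP_LOGICAL },
    { "&&", OP_LOGICAL },    { "|",  OP_BITWISE },
    { "&",  OP_BITWISE },    { "=",  OP_ASSIGNMENT },
};

/* Returns the category of `input`, or OP_INVALID if it is not in the table. */
OperatorKind classify_operator(const char *input)
{
    size_t count = sizeof operator_table / sizeof operator_table[0];
    for (size_t i = 0; i < count; i++)
        if (strcmp(input, operator_table[i].text) == 0)
            return operator_table[i].kind;
    return OP_INVALID;
}
```

A result other than OP_INVALID corresponds to printing "Valid Operator." in the procedure.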
Code:
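#include <stdio.h>
#include <string.h>
// List of valid operators
const char *operators[] = { "<", ">", "<=", ">=", "!", "!=", "|",
                            "||", "&", "&&", "=", "==" };
#define OPERATORS_COUNT (sizeof(operators) / sizeof(operators[0]))
// Compare the input with each valid operator using strcmp()
int isValidOperator(const char *str) {
    for (size_t i = 0; i < OPERATORS_COUNT; i++)
        if (strcmp(str, operators[i]) == 0)
            return 1;
    return 0;
}
int main() {
    // Longer than any operator in the list, so over-long input is
    // rejected instead of being truncated to a valid operator
    char input[10];
    printf("Enter an operator: ");
    scanf("%9s", input);
    if (isValidOperator(input)) {
        printf("'%s' is a VALID operator.\n", input);
    } else {
        printf("'%s' is an INVALID operator.\n", input);
    }
    return 0;
}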
Output:
Time Complexity:
• Best Case (O(1)): If the operator is found at the beginning of the list.
• Worst Case (O(n)): If it has to check all operators.