CD Lab Manual PDF

Uploaded by

Shivalika Chaudhary

MANIPAL INSTITUTE OF TECHNOLOGY, MANIPAL
(A constituent unit of MAHE, Manipal)
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CERTIFICATE

This is to certify that Ms./Mr. .................... Reg. No. .................... Section: .......... has satisfactorily completed the lab exercises prescribed for Compiler Design Lab [CSE 3211] of Third Year B. Tech. (Computer Science and Engineering) Degree at MIT, Manipal, in the academic year 2019-2020.

Date:                                   Signature
                                        Faculty in Charge

LAB NO. | TITLE                                                                       | REMARKS
        | COURSE OBJECTIVES AND OUTCOMES                                              |
        | EVALUATION PLAN                                                             |
        | INSTRUCTIONS TO THE STUDENTS                                                |
1       | PRELIMINARY SCANNING APPLICATIONS                                           |
2       | LEXICAL ANALYZER                                                            |
3       | IMPLEMENTATION OF SYMBOL TABLE                                              |
4       | RD PARSER FOR DECLARATIONS, ARRAY DECLARATIONS AND EXPRESSION STATEMENTS    |
5       | RD PARSER FOR DECISION MAKING AND LOOPING STATEMENTS                        |
6       | IMPLEMENTATION OF ERROR RECOVERY TECHNIQUES                                 |
7       | INTERMEDIATE CODE GENERATION                                                |
8       | INTRODUCTION TO FLEX                                                        |
9       | INTRODUCTION TO BISON                                                       |
10      | CODE GENERATION                                                             |
11      | CODE GENERATION                                                             |
        | REFERENCES                                                                  |
        | APPENDIX                                                                    |

Course Objectives
* Apply theory to the various challenges involved in the design and development of a compiler.
* Understand theory and practical aspects of modern compiler design.
* Understand implementation detail for a small subset of a language, applying different techniques studied in this course.
* Study both implementation and automated tools for different phases of a compiler.

Course Outcomes
At the end of this course, students will gain the:
* Ability to create preliminary scanning applications and to classify different lexemes.
* Ability to create a lexical analyzer without using any lexical generation tools.
* Ability to implement a suitable data structure for a symbol table.
* Ability to implement a recursive descent parser for a given grammar with/without using any parser generation tools.
* Ability to implement a recursive descent parser for a given grammar of the C programming language with/without using any parser generation tools.
* Ability to design a code generator.

Evaluation plan
* Internal Assessment Marks: 60 Marks
  - Scanning and Tokenizing: 5 Marks
  - Lexical analyzer: 15 Marks
  - Parser: 20 Marks
  - Flex and Bison: 10 Marks
  - Code generation: 10 Marks
* End semester assessment: 40 Marks
  - Duration: 2 hours
  - Total marks: Write up: 15 Marks, Execution: 25 Marks

INSTRUCTIONS TO THE STUDENTS

Pre-Lab Session Instructions
1. Students should carry the class notes, lab manual and the required stationery to every lab session.
2. Be in time and follow the instructions from the lab instructors.
3. Sign in the log register provided.
4. Make sure to occupy the allotted seat and answer the attendance.
5. Adhere to the rules and maintain the decorum.

In-Lab Session Instructions
* Follow the instructions on the allotted exercises given in the lab manual.
* Show the program and results to the instructors on completion of experiments.
* On receiving approval from the instructor, copy the program and results into the lab record.
* Prescribed textbooks and class notes can be kept ready for reference if required.

General Instructions for the exercises in Lab
* The programs should meet the following criteria:
  - Programs should be interactive, with appropriate prompt messages, error messages if any, and descriptive messages for outputs.
  - Use meaningful names for variables and procedures.
* Copying from others is strictly prohibited and would invite severe penalty during evaluation.
* The exercises for each week are divided under three sets:
  - Solved exercise
  - Lab exercises: to be completed during lab hours
  - Additional exercises: to be completed outside the lab or in the lab to enhance the skill
* In case a student misses a lab class, he/she must ensure that the experiment is completed at the student's end or in a repetition class (if available) with the permission of the faculty concerned, but credit will be given only to one day's experiment(s).
* Questions for lab tests and examination are not necessarily limited to the questions in the manual, but may involve some variations and/or combinations of the questions.

THE STUDENTS SHOULD NOT...
1. Bring mobile phones or any other electronic gadgets to the lab.
2. Go out of the lab without permission.

LAB NO.: 1                                                  Date:
PRELIMINARY SCANNING APPLICATIONS

Objectives:
* To understand the basics of the scanning process.
* Ability to preprocess the input file so that it becomes suitable for compilation.

Prerequisites:
* Knowledge of file pointers and string handling functions.
* Knowledge of the C programming language.

I. INTRODUCTION:

FILE OPERATIONS:
In any programming language it is vital to learn file handling techniques. Many applications will at some point involve accessing folders and files on the hard drive. In C, a stream is associated with a file. A file represents a sequence of bytes on a storage device, such as a disk, where a group of related data is stored. A file is created for permanent storage of data. It is a ready-made structure created by the operating system. In C, we use a structure pointer of file type to declare a file:

    FILE *fp;

Table 1.1 shows some of the built-in functions for file handling.

Table 1.1: File handling functions

Function    Description
fopen()     Creates a new file or opens an existing file
fclose()    Closes a file
getc()      Reads a character from a file
putc()      Writes a character to a file
fscanf()    Reads a set of data from a file
fprintf()   Writes a set of data to a file
getw()      Reads an integer from a file
putw()      Writes an integer to a file
fseek()     Sets the position to a desired point
ftell()     Gives the current position in the file
rewind()    Sets the position to the beginning point

1. fopen(): This function accepts two arguments as strings.
The first argument denotes the name of the file to be opened and the second signifies the mode in which the file is to be opened. The second argument can be any of the modes listed in Table 1.2.

Syntax:
    FILE *fopen(const char *filename, const char *mode);

Table 1.2: Various modes present in file handling

File Mode   Description
r           Opens a text file for reading.
w           Creates a text file for writing; if it exists, it is overwritten.
a           Opens a text file and appends text to the end of the file.
rb          Opens a binary file for reading.
wb          Creates a binary file for writing; if it exists, it is overwritten.
ab          Opens a binary file and appends data to the end of the file.

2. fclose(): This function is used for closing opened files. The only argument it accepts is the file pointer. When a program terminates normally, all open files are closed automatically, but it is a good programming habit to close any file once it is no longer needed. This helps in better utilization of system resources, and is very useful when you are working on numerous files simultaneously. Some operating systems place a limit on the number of files that can be open at any given point in time.

Syntax:
    int fclose(FILE *fp);

3. fscanf() and fprintf(): The functions fprintf() and fscanf() are similar to printf() and scanf() except that these functions operate on files and require one additional first argument, which is a file pointer.

Syntax:
    fprintf(filepointer, "format specifier", v1, v2, ...);
    fscanf(filepointer, "format specifier", &v1, &v2, ...);

4. getc() and putc(): The functions getc() and putc() are equivalent to the getchar() and putchar() functions except that these functions require an argument which is the file pointer. Function getc() reads a single character from the file which has previously been opened using a function like fopen(). Function putc() does the opposite: it writes a character to the file identified by its second argument.
Syntax:
    getc(in_file);
    putc(c, out_file);

Note: The second argument in the putc() function must be a file opened in either write or append mode.

5. fseek(): This function positions the next I/O operation on an open stream at a new position, given as an offset relative to a chosen origin.

Syntax:
    int fseek(FILE *fp, long int offset, int origin);

Here fp is the file pointer of the stream on which I/O operations are carried out, and offset is the number of bytes to skip over. The offset can be either positive or negative, denoting forward or backward movement in the file. origin is the position in the stream to which the offset is applied; it can be one of the following constants:

    SEEK_SET: offset is relative to the beginning of the file
    SEEK_CUR: offset is relative to the current position in the file
    SEEK_END: offset is relative to the end of the file

Binary stream input and output: The functions fread() and fwrite() are somewhat complex file handling functions used for reading or writing chunks of data. The function prototypes of fread() and fwrite() are as below:

    size_t fread(void *ptr, size_t sz, size_t n, FILE *fp);
    size_t fwrite(const void *ptr, size_t sz, size_t n, FILE *fp);

You may notice that the return type of fread() is size_t, which is the number of items read. You will understand this once you understand how fread() works. It reads n items, each of size sz, from a file pointed to by the pointer fp into a buffer pointed to by ptr, a void pointer, which is nothing but a generic pointer. Function fread() reads the data as a stream of bytes and advances the file position by the number of bytes read. If it encounters an error or end-of-file, it returns fewer than n items (possibly zero); you have to use feof() or ferror() to distinguish between these two cases. Function fwrite() works similarly: it writes n objects, each sz bytes long, from a location pointed to by ptr to a file pointed to by fp, and returns the number of items written to fp.
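As a rough illustration of the binary stream functions described above, the sketch below writes an array of integers with fwrite(), returns to the start of the file with fseek(), and reads the data back with fread(); a second helper uses fseek() and ftell() to report a file's size. The helper names roundtrip_ints and file_size are assumptions made for this example and are not part of the manual.

```c
#include <stdio.h>

/* Hypothetical helper: writes n ints from src to the file at path, seeks
   back to the beginning, and reads them into dst.
   Returns the number of items read back, or -1 on any error. */
long roundtrip_ints(const char *path, const int *src, int *dst, size_t n)
{
    FILE *fp = fopen(path, "wb+");      /* binary file, write then read */
    if (fp == NULL)
        return -1;

    if (fwrite(src, sizeof src[0], n, fp) != n) {
        fclose(fp);
        return -1;
    }

    /* A reposition is required between writing and reading a stream. */
    if (fseek(fp, 0L, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }

    size_t got = fread(dst, sizeof dst[0], n, fp);
    if (got < n && ferror(fp)) {        /* short count: error, not EOF */
        fclose(fp);
        return -1;
    }

    fclose(fp);
    return (long)got;
}

/* Hypothetical helper: returns the size of a file in bytes, or -1 on error. */
long file_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    if (fseek(fp, 0L, SEEK_END) != 0) { /* move to the end of the file */
        fclose(fp);
        return -1;
    }
    long size = ftell(fp);              /* position at end = size in bytes */
    fclose(fp);
    return size;
}
```

A caller might pass a four-element array and a scratch file name of its choice to roundtrip_ints(), then confirm with file_size() that the file holds 4 * sizeof(int) bytes.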
PRELIMINARY SCANNING:
In this process, we mainly deal with preprocessing the input file so that it becomes suitable for the scanning process. Preprocessing aims at the removal of blank spaces, tabs, preprocessor directives and comments from the given input file. Scanning itself is built into the lexical analyzer.

II. SOLVED EXERCISE:
Write a program in C that removes single-line and multiline comments from a given input C file.

Algorithm: Removal of single-line and multiline comments
Step 1: Open the input C file in read mode.
Step 2: Check if the file exists. Display an error message if the file doesn't exist.
Step 3: Open the output file for writing.
Step 4: Read each character from the input file.
Step 5: If the character read is '/':
        a. If the next character is '/', then
           i. Continue reading until a newline character is encountered.
        b. If the next character is '*', then
           i. Continue reading until the next '*' is encountered.
           ii. Check if the next character is '/'.
Step 6: Otherwise, write the characters into the output file.
Step 7: Repeat steps 4, 5 and 6 until EOF is encountered.
Step 8: Stop.

Program:

    /* Program to remove single-line and multiline comments from a given C file */
    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        FILE *fa, *fb;
        int ca, cb;

        fa = fopen("q4in.c", "r");
        if (fa == NULL) {
            printf("Cannot open file \n");
            exit(0);
        }
        fb = fopen("q4out.c", "w");
        ca = getc(fa);
        while (ca != EOF) {
            if (ca == '/') {
                cb = getc(fa);
                if (cb == '/') {
                    /* single-line comment: skip to the end of the line (or EOF) */
                    while (ca != '\n' && ca != EOF)
                        ca = getc(fa);
                }
                else if (cb == '*') {
                    /* multiline comment: skip until the closing star-slash (or EOF) */
                    do {
                        while (ca != '*' && ca != EOF)
                            ca = getc(fa);
                        ca = getc(fa);
                    } while (ca != '/' && ca != EOF);
                }
                else {
                    /* not a comment: copy both characters */
                    putc(ca, fb);
                    putc(cb, fb);
                }
            }
            else
                putc(ca, fb);
            ca = getc(fa);
        }
        fclose(fa);
        fclose(fb);
        return 0;
    }

Sample Input and Output:
    // This is a single line comment
    /* *****This is a
    ******Multiline Comment
    **** */
    #include ...
    ...

LAB NO.: 2
LEXICAL ANALYZER

    ...
    <), 1, 6, RB>
    <{, 2, 1, LC>
    <..., 3, 12, SS>
    <=, 4, 3, ASSIGNMENT OPERATOR>
    <1, 4, 5, NUMBER>
    <..., 6, SS>
    <..., 5, 6, SS>
    <=, 6, 4, ASSIGNMENT OPERATOR>
    <..., 5, IDENTIFIER>
    <..., 6, 6, ARITHMETIC OPERATOR>
    <..., 6, 8, SS>

Assumptions to be made:
* Scan for identifiers from the beginning, recording each as global. Once an entry named FUNC is created, all variables (except the argument section) will be assumed to be local to that function.
* However, the scope ends when a second function declaration is made.

II. SOLVED EXERCISE:
Write a program in C to identify the relational operators from the given input C file.

Algorithm: Identification of relational operators from the given input file.
Step 1: Open the input C file.
Step 2: Check if the file exists. Display an error message if the file doesn't exist.
Step 3: Read each character from the input file.
Step 4: If the character read is '/' and the next character read is '/' or '*', it is considered a comment; skip all the characters until the end of the comment.
Step 5: If the character read is '"', skip all the characters until another '"' is encountered.
Step 6: Check if the character is '<', '>' or '!'.
        a. Add it to the buffer.
        b. If the next character is '=', display it as Less Than Equal (LTE), Greater Than Equal (GTE) or NotEqualsTo (NE).
        c. Otherwise, display it as Less Than (LT) or Greater Than (GT).
Step 7: Repeat steps 3, 4, 5 and 6 until EOF is encountered.
Step 8: Stop.

Program:

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    int main()
    {
        int c;                      /* int, so that EOF can be detected */
        char buf[10];
        FILE *fp = fopen("digit.c", "r");

        if (fp == NULL) {
            printf("Cannot open file \n");
            exit(0);
        }
        c = fgetc(fp);
        while (c != EOF) {
            int i = 0;
            buf[0] = '\0';
            if (c == '=') {
                buf[i++] = c;
                c = fgetc(fp);
                if (c == '=') {
                    buf[i++] = c;
                    buf[i] = '\0';
                    printf("\n Relational operator : %s", buf);
                }
                else {
                    buf[i] = '\0';
                    printf("\n Assignment operator: %s", buf);
                }
            }
            else if (c == '<' || c == '>' || c == '!') {
                buf[i++] = c;
                c = fgetc(fp);
                if (c == '=')
                    buf[i++] = c;
                buf[i] = '\0';
                printf("\n Relational operator : %s", buf);
            }
            c = fgetc(fp);
        }
        fclose(fp);
        return 0;
    }

III. LAB EXERCISES:
1. Design a lexical analyzer which contains getNextToken() for a simple C program. Each call should create and return a token structure which includes the row number, column number, token type and lexeme. The following tokens should be identified: arithmetic operators, relational operators, logical operators, special symbols, keywords, numerical constants, string literals and identifiers. Also, getNextToken() should ignore all tokens encountered inside a single-line or multiline comment or inside a string literal. Preprocessor directives should also be ignored.

IV. ADDITIONAL EXERCISES:
1. Write a getNextToken() to generate tokens for the Perl script given below.

    #! /usr/bin/perl
    #get total number of arguments passed.
    $n = scalar (@_);
    $sum = 0;
    foreach $item(@_) {
        $sum += $item;
    }
    $average = $sum + $n;

   * #! represents the path, which has to be ignored by getNextToken().
   * # followed by any character other than ! represents a comment.
   * $ followed by any identifier should be treated as a single token.
   * scalar and foreach are considered keywords.
   * @_ and += are treated as single tokens.
   * Remaining symbols are tokenized accordingly.

LAB NO.: 3                                                  Date:
IMPLEMENTATION OF SYMBOL TABLE

Objectives:
* To implement a symbol table for the compiler.
* To store the tokens in the symbol table.

Prerequisites:
* Knowledge of the C programming language and FLEX.
* Knowledge of file pointers.
* Knowledge of data structures.

I. INTRODUCTION:
The symbol table is an important data structure created and maintained by compilers in order to store information about the occurrence of various entities such as variable names, function names, objects, classes, interfaces, etc. The symbol table is used by both the analysis and the synthesis parts of a compiler. A symbol table may serve the following purposes, depending upon the language in hand:
* To store the names of all entities in a structured form at one place.
* To verify if a variable has been declared.
* To implement type checking, by verifying that assignments and expressions in the source code are semantically correct.
* To determine the scope of a name (scope resolution).

A symbol table is simply a table, which can be either a linear table or a hash table.

Symbol Table Management
A symbol table is a data structure containing all the identifiers (i.e. names of variables, procedures etc.) of a source program together with all the attributes of each identifier. Typical attributes include:
* Variable type
* Size of memory it occupies
* Its scope
* Offset
* Arguments
* Number of arguments
* Return type

An entry is made in the symbol table during the lexical analysis phase and is updated during the syntax and semantic analysis phases. A hash table is used for the implementation of the symbol table because of its fast lookup capability.

Structure of symbol table:
There are two types of symbol table:
* Global symbol table: contains an entry for each function.
* Local symbol table: created for each function. It stores details of the identifiers used inside the function.

    int fact5;
    int factorial(int n) {
        int val;
        if (...) {
            val = n * factorial(n - 1);
            return (val);
        }
        else {
            return (1);
        }
    }
    int main() {
        printf("factorial program\n");
        fact5 = factorial(5);
        printf("fact5=%d", fact5);
    }

Input Program 3.1

The structure of the symbol table for Program 3.1 is as shown in Table 3.1.

Table 3.1

No. | Name      | Type | Size | Scope | No. of arguments | Arguments | Return type
1   | fact5     | int  | 4    | G     |                  |           |
2   | factorial | FUNC |      | G     | 1                | <id, 3>   | int
3   | n         | int  | 4    |       |                  |           |
4   | val       | int  | 4    | L     |                  |           |
5   | main      | FUNC |      | G     | 0                | NULL      | int

Sample output file for the factorial function in Program 3.1:

    <int> ... <;> ... <(> ... <)> <{> ... <id, 4> <=> <id, 4> <*> ... <id, 3> <-> ... <)> ... <(> <id, 4> <)> ... <return> <(> ...

Assumptions to be made:
* Assume at most a single argument is present for a function.
* Scan for identifiers from the beginning, recording each as global.
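  Once an entry named FUNC is created, all variables (except the argument section) will be assumed to be local to that function.
* However, the scope ends when a second function declaration is made.

II. SOLVED EXERCISE:

    /* This is a program that implements the symbol table using linked lists.
       It uses open hashing. The entire implementation is done with the
       functions Search, Insert and Hash. The Display function displays the
       whole symbol table. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #define TableLength 30

    enum tokenType { EOFILE = -1, LESS_THAN, LESS_THAN_OR_EQUAL, GREATER_THAN,
                     GREATER_THAN_OR_EQUAL, EQUAL, NOT_EQUAL };

    struct token {
        char *lexeme;
        int index;
        unsigned int rowno, colno;   // row number, column number.
        enum tokenType type;
    };

    struct ListElement {
        struct token tok;
        struct ListElement *next;
    };

    struct ListElement *TABLE[TableLength];

    void Initialize()
    {
        for (int i = 0; i < TableLength; i++)
            TABLE[i] = NULL;
    }

    void Display()
    {
        // Walk every bucket and print each stored lexeme.
        for (int i = 0; i < TableLength; i++)
            for (struct ListElement *e = TABLE[i]; e != NULL; e = e->next)
                printf("%d\t%s\n", i, e->tok.lexeme);
    }

    int Hash(char *str)
    {
        // One simple choice of hash function: the sum of the character
        // codes modulo TableLength. Any open-hashing function will do.
        unsigned int sum = 0;
        while (*str)
            sum += (unsigned char)*str++;
        return sum % TableLength;
    }

    int Search(char *str)
    {
        // Returns 1 if the lexeme is already in the table, else 0.
        for (struct ListElement *e = TABLE[Hash(str)]; e != NULL; e = e->next)
            if (strcmp(e->tok.lexeme, str) == 0)
                return 1;
        return 0;
    }

    void Insert(struct token tk)
    {
        if (Search(tk.lexeme) == 1)
            return;                  // Already present, nothing to insert.
        int val = Hash(tk.lexeme);
        struct ListElement *cur = (struct ListElement *)malloc(sizeof(struct ListElement));
        cur->tok = tk;
        cur->next = NULL;
        if (TABLE[val] == NULL) {
            TABLE[val] = cur;        // No collision.
        }
        else {
            struct ListElement *ele = TABLE[val];
            while (ele->next != NULL) {
                ele = ele->next;     // Add the element at the end in the case of a collision.
            }
            ele->next = cur;
        }
    }

III. LAB EXERCISES:
1. Implement a symbol table to store all the identifiers and user-defined function names of a C program.

IV. ADDITIONAL EXERCISES:
1. For the given code snippet, store all the identifiers identified into a symbol table.

    #! /usr/bin/perl
    #get total number of arguments passed
    $n = scalar (@_);
    $sum = 0;
    foreach $item(@_) {
        $sum += $item;
    }
    $average = $sum + $n;

LAB NO.: 4                                                  Date:
RD PARSER FOR DECLARATION AND EXPRESSION STATEMENTS

Objectives:
* To design an RD parser for simple variable declarations, array declarations and expression statements of a program.

Prerequisites:
* Acquaintance with top-down parsing.
* Knowledge of removal of left recursion from the grammar and performing left factoring on the grammar.
* Knowledge of computation of first and follow.

I. RECURSIVE DESCENT PARSER FOR C GRAMMAR:
A simple C language grammar is given.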
Students should write or update the RD parser for a subset of the grammar each week and integrate it with the lexical analyzer. Before parsing the input file, remove ambiguity and left recursion, if present, and also perform left factoring on the given subset of the grammar. Include the functions first(X) and follow(X), which were already implemented in the previous week. The lexical analyzer code should be included as a header file in the parser code. The parser program should call the getNextToken() function of the lexical analyzer, which generates a token, and parse it according to the given grammar. The parser should report syntax errors, if any (e.g. a misspelt identifier or keyword, an undeclared or multiply declared identifier, arithmetic or relational expressions with unbalanced parentheses, and expression syntax errors), with the appropriate line number.

Sample C grammar:
    Data types          : int, char
    Arrays              : 1-dimensional
    Expressions         : Arithmetic and Relational
    Looping statements  : for, while
    Decision statements : if, if-else

Note: The following grammar for the C language is to be adopted with necessary corrections.

    Program → main () { declarations statement-list }
    declarations → data-type identifier-list ; declarations | ε
    data-type → int | char
    statement-list → statement statement-list | ε
    identifier-list → id | id , identifier-list | id[number] , identifier-list | id[number]
    statement → assign-stat ; | decision-stat | looping-stat
    assign-stat → id = expn
    expn → simple-expn eprime
    eprime → relop simple-expn | ε
    simple-expn → term seprime
    seprime → addop term seprime | ε
    term → factor tprime
    tprime → mulop factor tprime | ε
    factor → id | num
    decision-stat → if ( expn ) { statement-list } dprime
    dprime → else { statement-list } | ε
    looping-stat → while ( expn ) { statement-list } | for ( assign-stat ; expn ; assign-stat ) { statement-list }
    relop → == | != | <= | >= | > | <
    addop → + | -
    mulop → * | / | %

Grammar 4.1

II. LAB EXERCISE:
1. For the given subset of Grammar 4.1, design an RD parser.

    Program → main () { declarations statement-list }
    identifier-list → id | id , identifier-list | id[number] , identifier-list | id[number]
    statement-list → statement statement-list | ε
    statement → assign-stat ;
    assign-stat → id = expn
    expn → simple-expn eprime
    eprime → relop simple-expn | ε
    simple-expn → term seprime
    seprime → addop term seprime | ε
    term → factor tprime
    tprime → mulop factor tprime | ε
    factor → id | num
    relop → == | != | <= | >= | > | <
    addop → + | -
    mulop → * | / | %

III. ADDITIONAL EXERCISES:
1. Write a program to parse pointer declarations.
2. Modify the RD parser to handle compound expressions present in a C program.
3. Modify the RD parser to handle ternary statements present in a C program.

LAB NO.: 5                                                  Date:
RD PARSER FOR DECISION MAKING AND LOOPING STATEMENTS

Objectives:
* To design an RD parser for decision and looping statements of a C program.

Prerequisites:
* Acquaintance with top-down parsing.
* Knowledge of removal of left recursion from the grammar and performing left factoring on the grammar.
* Knowledge of computation of first and follow.

I. LAB EXERCISE:
For the given subset of Grammar 4.1, design an RD parser.

    statement → assign-stat ; | decision-stat | looping-stat
    decision-stat → if ( expn ) { statement-list } dprime
    dprime → else { statement-list } | ε
    looping-stat → while ( expn ) { statement-list } | for ( assign-stat ; expn ; assign-stat ) { statement-list }

II. ADDITIONAL EXERCISES:
1. Modify the RD parser to parse a C file consisting of do-while constructs.

LAB NO.: 6                                                  Date:
IMPLEMENTATION OF ERROR RECOVERY TECHNIQUES

Objectives:
* To understand and implement a few error recovery techniques.

Prerequisites:
* Knowledge of parsing.

Parsing is the process of checking the correctness of source code. An input program may contain errors, so it is essential for the parser to report them. The parser must report any syntax errors in an intelligible fashion and recover from commonly occurring errors to continue processing the remainder of the program.

Common programming errors can occur at many different levels. There are four types of errors.
i.   Lexical errors include misspellings of identifiers, keywords, or operators (e.g., the use of an identifier elipsesize instead of ellipsesize) and missing quotes around text intended as a string.
ii.  Syntactic errors include misplaced semicolons or extra or missing braces; that is, "{" or "}". As another example, in C or Java, the appearance of a case statement without an enclosing switch is a syntactic error (however, this situation is usually allowed by the parser and caught later in the processing, as the compiler attempts to generate code).
iii. Semantic errors include type mismatches between operators and operands. An example is a return statement in a Java method with result type void.
iv.  Logical errors can be anything from incorrect reasoning on the part of the programmer to the use in a C program of the assignment operator = instead of the comparison operator ==. The program containing = may be well formed; however, it may not reflect the programmer's intent.
The precision of parsing methods allows syntactic errors to be detected very efficiently. Several parsing methods, such as the LL and LR methods, detect an error as soon as possible; that is, when the stream of tokens from the lexical analyzer cannot be parsed further according to the grammar for the language. More precisely, they have the viable-prefix property, meaning that they detect that an error has occurred as soon as they see a prefix of the input that cannot be completed to form a string in the language.

Many errors appear syntactic, whatever their cause, and are exposed when parsing cannot continue. A few semantic errors, such as type mismatches, can also be detected efficiently; however, accurate detection of semantic and logical errors at compile time is in general a difficult task.

The error handler in a parser has goals that are simple to state but challenging to realize. The goals are:
i.   Report the presence of errors clearly and accurately.
ii.  Recover from each error quickly enough to detect subsequent errors.
iii. Add minimal overhead to the processing of correct programs.

6.1 Error-Recovery Strategies
There are four types of error recovery strategies, namely panic-mode recovery, phrase-level correction, error productions and global correction. The following sections discuss two of them: panic-mode recovery and phrase-level correction.

6.1.1 Panic-Mode Recovery
With this method, on discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found. The synchronizing tokens are usually delimiters, such as semicolon or }, whose role in the source program is clear and unambiguous. The compiler designer must select the synchronizing tokens appropriate for the source language. While
panic-mode correction often skips a considerable amount of input without checking it for additional errors, it has the advantage of simplicity, and, unlike some methods to be considered later, is guaranteed not to go into an infinite loop.

6.1.2 Phrase-Level Recovery
On discovering an error, a parser may perform local correction on the remaining input; that is, it may replace a prefix of the remaining input by some string that allows the parser to continue. A typical local correction is to replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon. The choice of the local correction is left to the compiler designer. Of course, we must be careful to choose replacements that do not lead to infinite loops, as would be the case, for example, if we always inserted something on the input ahead of the current input symbol.

Phrase-level replacement has been used in several error-repairing compilers, as it can correct any input string. Its major drawback is the difficulty it has in coping with situations in which the actual error has occurred before the point of detection.

LAB EXERCISES:
1. Design and continue development of the parser using the panic-mode recovery error correction strategy. Demonstrate the working of the error recovery for erroneous input.
2. Design and continue development of the parser using the phrase-level recovery error correction strategy. Demonstrate the working of the error recovery for erroneous input.
3. Compare the two types of error recovery strategies in terms of time and space complexity and programming effort.

LAB NO.: 7                                                  Date:
INTERMEDIATE CODE GENERATION

Objectives:
* To understand the intermediate code generation phase of compilation.
* To generate the intermediate representation, i.e. three-address code, from simple C code.

Prerequisites:
* Knowledge of three-address code statements.
INTRODUCTION:
This phase of compiler construction takes the postfix notation generated from the syntax tree in the previous phase as input and produces an intermediate representation. There is a wide range of intermediate representations; in this lab we mainly consider an intermediate form called three-address code, which consists of a sequence of assembly-like instructions with three operands per instruction. There are several points worth noting about three-address instructions. First, each three-address assignment instruction has at most one operator on the right side. Thus, these instructions fix the order in which operations are to be done. Second, the compiler must generate a temporary name to hold the value computed by a three-address instruction. Third, some three-address instructions have fewer than three operands.

II. SOLVED EXERCISE:
Write a C program to generate the intermediate code for the given postfix expression. Store it in int-code-gen.c.

    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>
    #define MAX_STACK_SIZE 40
    #define MAX_IDENTIFIER_SIZE 64

    /* Assume variables and operators are separated by a space */
    char *str = "a b c d * + =";

    /* Expected output
       temp1=c*d
       temp2=b+temp1
       a=temp2
    */

    // implementation using a stack of operand names
    char stack[MAX_STACK_SIZE][MAX_IDENTIFIER_SIZE];
    int top = -1;
    int tcounter = 1;

    int push(const char *s)
    {
        if (!((top + 1) < MAX_STACK_SIZE))
            return 0;
        strncpy(stack[top + 1], s, MAX_IDENTIFIER_SIZE - 1);
        stack[top + 1][MAX_IDENTIFIER_SIZE - 1] = '\0';
        top = top + 1;
        return 1;
    }

    char *pop()
    {
        if (top < 0)
            return NULL;
        top = top - 1;
        return stack[top + 1];
    }

    int main()
    {
        char op[MAX_IDENTIFIER_SIZE];
        char pop1[MAX_IDENTIFIER_SIZE], pop2[MAX_IDENTIFIER_SIZE];
        char tempStr[MAX_IDENTIFIER_SIZE];
        int kop = 0;

        for (int i = 0; str[i] != '\0'; i++) {
            // is it part of an identifier or a number?
            if (isalnum((unsigned char)str[i])) {
                if (kop < MAX_IDENTIFIER_SIZE - 1)
                    op[kop++] = str[i];
            }
            // is it a space?
            else if (str[i] == ' ') {
                op[kop] = '\0';
                kop = 0;
                if (strcmp(op, "") != 0)
                    push(op);
            }
            // has to be an operator, namely +, -, *, / or =
            else {
                // check if the previous identifier is still to be stored in the stack
                if (kop > 0) {
                    op[kop] = '\0';
                    kop = 0;
                    push(op);
                }
                char *p2 = pop();
                char *p1 = pop();
                if (p1 == NULL || p2 == NULL) {
                    printf("Malformed postfix expression\n");
                    return 1;
                }
                // copy the names before the stack slots are reused
                strcpy(pop2, p2);
                strcpy(pop1, p1);
                // check for the = operator
                if (str[i] == '=') {
                    printf("%s=%s\n", pop1, pop2);
                    push(pop1);
                }
                else {  // could be any of + - * /
                    // build the next temporary name from tcounter
                    snprintf(tempStr, sizeof tempStr, "temp%d", tcounter++);
                    printf("%s=%s%c%s\n", tempStr, pop1, str[i], pop2);
                    push(tempStr);
                }
            }
        }
        return 0;
    }

    ...

    #include <stdio.h>
    int main()
    {
        int x;
        x = 3 + 2;
        printf("%d", x);
        return 0;
    }

LAB NO.: 8                                                  Date:
ASSEMBLY LANGUAGE PROGRAMMING

Objectives:
* To understand the code generation phase of compilation.
* To generate machine code from the intermediate representation, i.e. a postfix expression.

I. INTRODUCTION:
Code generation is the next phase of the compilation process. It takes any of the intermediate representation formats as input and produces equivalent assembly code as output. Here we consider postfix expressions and three-address code as the intermediate representations from which basic assembly code is generated.

The intermediate code generator receives input from its predecessor phase, the semantic analyzer, in the form of an annotated syntax tree. That syntax tree can then be converted into a linear representation, e.g. postfix notation. Intermediate code tends to be machine-independent code. Therefore, the code generator assumes an unlimited number of storage locations (registers) when generating code.

For example:

    a = b + c * d;

The intermediate code generator will try to divide this expression into sub-expressions and then generate the corresponding code:

    r1 = c * d;
    r2 = b + r1;
    a = r2;

with r1 and r2 being used as registers in the target program. A three-address code has at most three address locations to calculate the expression.

The assembly code has three different kinds of elements:
* Directives begin with a dot and indicate structural information useful to the assembler, linker, or debugger, but are not in and of themselves assembly instructions. For example, .file simply records the name of the original source file. .data indicates the start of the data section of the program, while .text indicates the start of the actual program code. .string indicates a string constant within the data section, and .globl main indicates that the label main is a global symbol that can be accessed by other code modules. (You can ignore most of the other directives.)
* Labels end with a colon and indicate by their position the association between names and locations. For example, the label .LC0: indicates that the immediately following string should be called .LC0. The label main: indicates that the instruction pushq %rbp is the first instruction of the main function. By convention, labels beginning with a dot are temporary local labels generated by the compiler, while other symbols are user-visible functions and global variables.
* Instructions are the actual assembly code (pushq %rbp), typically indented to visually distinguish them from directives and labels.

Registers
X86-64 has sixteen (almost) general purpose 64-bit integer registers:

    %rax  %rbx  %rcx  %rdx  %rsi  %rdi  %rbp  %rsp
    %r8   %r9   %r10  %r11  %r12  %r13  %r14  %r15

We say almost general purpose because earlier versions of the processors intended for each register to be used for a specific purpose, and not all instructions could be applied to every register. A few remaining instructions, particularly related to string processing, require the use of %rsi and %rdi. In addition, two registers are reserved for use as the stack pointer (%rsp) and the base pointer (%rbp). The final eight registers are numbered and have no specific restrictions. The architecture has expanded from 8 to 16 to 32 to 64 bits over the years, and so each register has some internal structure that you should know about:
Fig. 10.1 Registers

The lowest 8 bits of the %rax register are an 8-bit register %al, and the next 8 bits are known as %ah. The low 16 bits are collectively known as %ax, the low 32 bits as %eax, and the whole 64 bits as %rax.

Addressing modes
As the design developed, new instructions and addressing modes were added to make the various registers almost equal.

The first instruction that you should know about is the MOV instruction, which moves data between registers and to and from memory. X86-64 is a complex instruction set (CISC), so the MOV instruction has many different variants that move different types of data between different cells. MOV, like most instructions, has a single letter suffix that determines the amount of data to be moved.

Table 10.1: Data values of various sizes

    Suffix   Name       Size
    B        BYTE       1 byte  (8 bits)
    W        WORD       2 bytes (16 bits)
    L        LONG       4 bytes (32 bits)
    Q        QUADWORD   8 bytes (64 bits)

The arguments to MOV can have one of several addressing modes.
* A global value is simply referred to by an unadorned name such as x or printf.
* An immediate value is a constant value indicated by a dollar sign, such as $56.
* A register value is the name of a register, such as %rbx.
* An indirect value refers to a value by the address contained in a register. For example, (%rsp) refers to the value pointed to by %rsp.
* A base-relative value is given by adding a constant to the name of a register. For example, -16(%rcx) refers to the value at the memory location sixteen bytes below the address indicated by %rcx. This mode is important for manipulating stacks, local values, and function parameters.
* There are a variety of complex variations on base-relative. For example, -16(%rbx,%rcx,8) refers to the value at the address -16+%rbx+%rcx*8. This mode is useful for accessing elements of unusual sizes arranged in arrays.

Table 10.2: Addressing modes

    Mode                          Example
    Global Symbol                 MOVQ x, %rax
    Immediate                     MOVQ $56, %rax
    Register                      MOVQ %rbx, %rax
    Indirect                      MOVQ (%rsp), %rax
    Base-Relative                 MOVQ -8(%rbp), %rax
    Offset-Scaled-Base-Relative   MOVQ -16(%rbx,%rcx,8), %rax

Basic Arithmetic
You will need four basic arithmetic instructions for your compiler: ADD, SUB, IMUL, and IDIV. ADD and SUB have two operands: a source and a destructive target. For example, this instruction:

    ADDQ %rbx, %rax

adds %rbx to %rax, and places the result in %rax, overwriting what might have been there before. This requires that you be a little careful in how you make use of registers. For example, suppose that you wish to translate c = b*(b+a), where a and b are global integers. To do so, you must be careful not to clobber the value of b when performing the addition. Here is one possible translation:

    MOVQ  a, %rax
    MOVQ  b, %rbx
    ADDQ  %rbx, %rax
    IMULQ %rbx
    MOVQ  %rax, c

The IMUL instruction is a little unusual: it takes its argument, multiplies it by the contents of %rax, and then places the low 64 bits of the result in %rax and the high 64 bits in %rdx. (Multiplying two 64-bit numbers yields a 128-bit number, after all.)

The IDIV instruction does the same thing, except backwards: it starts with a 128-bit integer value whose low 64 bits are in %rax and high 64 bits in %rdx, and divides it by the value given in the instruction. (The CQO instruction serves the very specific purpose of sign-extending %rax into %rdx, to handle negative values correctly.) The quotient is placed in %rax and the remainder in %rdx. IDIV does not accept an immediate operand, so the divisor is first loaded into a register. For example, to divide a by five:

    MOVQ  a, %rax     # set the low 64 bits of the dividend
    CQO               # sign-extend %rax into %rdx
    MOVQ  $5, %rbx    # load the divisor into a register
    IDIVQ %rbx        # divide %rdx:%rax by 5, leaving result in %rax

(Note that the modulus operator found in most languages simply exploits the remainder left in %rdx.)
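The register discipline described above (load into %rax and %rbx, operate destructively on %rax, store %rax back) can be sketched as a small C helper of the kind the later lab exercises ask for. This is only an illustrative sketch: the function name emit_binary, the fixed choice of %rax and %rbx, and the assumption that all operands are global symbols are ours, not part of the manual. It uses CQO, Intel's name for the sign-extension step performed before IDIV.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the instruction sequence for
 * `dst = lhs op rhs` into buf, assuming dst, lhs and rhs are global
 * symbols. Returns the number of characters written, or -1 if the
 * operator is unknown or the buffer is too small. */
int emit_binary(char *buf, size_t size, const char *dst,
                const char *lhs, char op, const char *rhs)
{
    const char *body;

    assert(buf != NULL && dst != NULL && lhs != NULL && rhs != NULL);

    switch (op) {
    case '+': body = "    ADDQ %rbx, %rax\n"; break;
    case '-': body = "    SUBQ %rbx, %rax\n"; break;
    case '*': body = "    IMULQ %rbx\n"; break;           /* result in %rdx:%rax */
    case '/': body = "    CQO\n    IDIVQ %rbx\n"; break;  /* quotient in %rax */
    default:  return -1;
    }

    /* body is passed through %s, so its '%' characters are copied as-is. */
    int n = snprintf(buf, size,
                     "    MOVQ %s, %%rax\n"
                     "    MOVQ %s, %%rbx\n"
                     "%s"
                     "    MOVQ %%rax, %s\n",
                     lhs, rhs, body, dst);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Calling emit_binary(buf, sizeof buf, "c", "a", '+', "b") fills buf with the four-instruction sequence for c = a + b. A real code generator would also track which registers are free instead of always using %rax and %rbx.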
The instructions INC and DEC increment and decrement a register destructively. For example, the statement a = ++b could be translated as:

    MOVQ b, %rax
    INCQ %rax
    MOVQ %rax, a

Boolean operations work in a very similar manner: AND, OR, and XOR perform destructive boolean operations on two operands, while NOT performs a destructive boolean-not on one operand.

Like the MOV instruction, the various arithmetic instructions can work on a variety of addressing modes. However, for your compiler project, you will likely find it most convenient to use MOV to load values in and out of registers, and then use only registers to perform arithmetic.

Comparisons and Jumps
Using the JMP instruction, we may create a simple infinite loop that counts up from zero using the %rax register:

          MOVQ $0, %rax
    loop: INCQ %rax
          JMP  loop

To define more useful structures such as terminating loops and if-then statements, we must have a mechanism for evaluating values and changing program flow. In most assembly languages, these are handled by two different kinds of instructions: compares and jumps.

All comparisons are done with the CMP instruction. CMP compares two different registers and then sets a few bits in the internal EFLAGS register, recording whether the values are the same, greater, or lesser. You don't need to look at the EFLAGS register directly. Instead, a selection of conditional jumps examine the EFLAGS register and jump appropriately:

Table 10.3: Jumps

    Instruction   Meaning
    JE            Jump If Equal
    JNE           Jump If Not Equal
    JL            Jump If Less Than
    JLE           Jump If Less or Equal
    JG            Jump If Greater Than
    JGE           Jump If Greater or Equal

For example, here is a loop to count %rax from zero to five:

          MOVQ $0, %rax
    loop: INCQ %rax
          CMPQ $5, %rax
          JLE  loop

And here is a conditional assignment: if global variable x is greater than zero, then global variable y gets ten, else twenty:

            MOVQ x, %rax
            CMPQ $0, %rax
            JLE  twenty
    ten:    MOVQ $10, %rbx
            JMP  done
    twenty: MOVQ $20, %rbx
            JMP  done
    done:   MOVQ %rbx, y

Note that jumps require the compiler to define target labels. These labels must be unique and private within one assembly file, but cannot be seen outside the file unless a .globl directive is given. In C parlance, a plain assembly label is static, while a .globl label is extern.

Write a program to generate Assembly Language code for an arithmetic expression involving single addition operator.

Instructions:
* Save the following as z.s.
* The assembly code is run using the commands gcc z.s -o z and ./z.

Program:

    char *three_address_input = "a+b";
    char *code_prefix
            .file   \"input.c\"
            .section .rodata
    .LC0:
            .string \"output=%d\n\"
            .text
            .globl  main
            .type   main, @function
    main:
    .LFB0:
            .cfi_startproc
            pushq   %rbp
            .cfi_def_cfa_offset 16
            .cfi_offset 6, -16
            movq    %rsp, %rbp
            .cfi_def_cfa_register 6
            subq    $16, %rsp
            movl    $3, -12(%rbp)
            movl    $2, -8(%rbp)
            movl    -8(%rbp), %eax
            movl    -12(%rbp), %edx
            subl    %eax, %edx
    char *code,
            movl    %eax, -4(%rbp)
            movl    -4(%rbp), %eax
            movl    %eax, %esi
            movl    $.LC0, %edi
            movl    $0, %eax
            call    printf
            movl    $0, %eax
            leave
            .cfi_def_cfa 7, 8
            ret
            .cfi_endproc
    .LFE0:
            .size   main, .-main
            .ident  \"GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4\"
            .section .note.GNU-stack,\"\",@progbits

The assembly code corresponding to a C program can also be obtained using the -S option of the gcc command.

III. LAB EXERCISES:
1. Write a C program that takes a file containing TAC for expression statements as input and generates the assembly level code for the same.
2. Write a C program that takes a file containing TAC for decision making statements as input and generates the assembly level code for the same.

IV. ADDITIONAL EXERCISES
1.
Write a C program that takes a file containing TAC for expressions involving arrays as input and generates the assembly level code for the same.
2. Write a C program that takes a file containing TAC for a switch statement as input and generates the assembly level code for the same.

LAB No.: 09                                       Date:

CODE GENERATION

Objectives:
* To understand the code generation phase of compilation.
* To generate machine code from the intermediate representation.

Consider a code snippet:

Program z.c

    #include <stdio.h>
    int main()
    {
        int x;
        x=3+2;
        printf("%d",x);
        return 0;
    }

Steps to obtain the assembly output from the program z.c:
Step 1: Generate the equivalent TAC for the program and store it in a file.
Step 2: The TAC obtained in Step 1 is taken as input.
Step 3: Run the command gcc -S z.c. This will automatically generate a file z.s with the corresponding assembly code.
Step 4: The assembly code is run using the commands gcc z.s -o z and ./z.

The assembly code generated is as shown below:

            .file   "z.c"
            .section .rodata
    .LC0:
            .string "%d"
            .text
            .globl  main
            .type   main, @function
    main:
    .LFB0:
            .cfi_startproc
            pushq   %rbp
            .cfi_def_cfa_offset 16
            .cfi_offset 6, -16
            movq    %rsp, %rbp
            .cfi_def_cfa_register 6
            subq    $16, %rsp
            movl    $5, -4(%rbp)
            movl    -4(%rbp), %eax
            movl    %eax, %esi
            movl    $.LC0, %edi
            movl    $0, %eax
            call    printf
            movl    $0, %eax
            leave
            .cfi_def_cfa 7, 8
            ret
            .cfi_endproc
    .LFE0:
            .size   main, .-main
            .ident  "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4"
            .section .note.GNU-stack,"",@progbits

I. LAB EXERCISES:
1. Write an assembly program for performing a subtraction operation involving 2 operands.
2. Write an assembly program for performing a multiplication operation involving 2 operands.
3. Write an assembly program for performing a division operation involving 2 operands.

II. ADDITIONAL EXERCISES
1. Write an assembly program for performing an addition operation involving 2 long int operands.
2. Write an assembly program for performing an operation that takes the output of one operation as input to another expression.
Ex: a=b+c; d=a+e

LAB No.: 10                                       Date:

INTRODUCTION TO FLEX

Objectives:
* To implement programs using a Lexical Analyzer tool called FLEX.
* To apply regular expressions in pattern matching under FLEX.

Prerequisites:
* Knowledge of the C programming language.
* Knowledge of basic level regular expressions.

I. INTRODUCTION
FLEX (Fast LEXical analyzer generator) is a tool for generating scanners, i.e. programs that recognize tokens. Instead of writing a lexical analyzer from scratch, you only need to identify the vocabulary of a certain language, write a specification of patterns using regular expressions (e.g. DIGIT [0-9]), and FLEX will construct a lexical analyzer for you. FLEX is generally used in the manner depicted in Fig. 8.1.

Firstly, FLEX reads a specification of a scanner either from an input file *.l, or from standard input, and it generates as output a C source file lex.yy.c. Then, lex.yy.c is compiled and linked with the flex library (using -lfl) to produce an executable a.out. Finally, a.out analyzes its input stream and transforms it into a sequence of tokens.
* *.l is in the form of pairs of regular expressions and C code.
* lex.yy.c defines a routine yylex() that uses the specification to recognize tokens.
* a.out is actually the scanner.

    Specification of a scanner (*.l or stdin)  -->  flex        -->  lex.yy.c
    lex.yy.c                                   -->  C compiler  -->  a.out
    Input stream                               -->  a.out       -->  Sequence of tokens

Fig. 8.1 Steps involved in generating Lexical Analyzer using Flex

Regular Expressions and Scanning
Scanners generally work by looking for patterns of characters in the input. For example, in a C program, an integer constant is a string of one or more digits, a variable name is a letter or an underscore followed by zero or more letters, underscores or digits, and the various operators are single characters or pairs of characters. A straightforward way to describe these patterns is regular expressions, often shortened to regex or regexp.
A flex program basically consists of a list of regexps with instructions about what to do when the input matches any of them, known as actions. A flex-generated scanner reads through its input, matching the input against all of the regexps and doing the appropriate action on each match. Flex translates all of the regexps into an efficient internal form that lets it match the input against all the patterns simultaneously.

The general format of a Flex source program has three sections, as follows:

    %{
    definitions
    %}
    %%
    rules
    %%
    subroutines

Definition section:
Declaration of variables and constants can be done in this section. This section introduces any initial C program code we want to get copied into the final program. This is especially important if, for example, we have header files that must be included for code later in the file to work. We surround the C code with the special delimiters "%{" and "%}". Lex copies the material between "%{" and "%}" directly to the generated C file, so we may write any valid C code here. The %% marks the end of this section.

Rule section:
Each rule is made up of two parts: a pattern and an action, separated by whitespace. The lexer that lex generates will execute the action when it recognizes the pattern. These patterns are UNIX style regular expressions. Each pattern is at the beginning of a line (since flex considers any line that starts with whitespace to be code to be copied into the generated C program), followed by the C code to execute when the pattern matches. The C code can be one statement or possibly a multiline block in braces, { }. If more than one rule matches the input, the longer match is taken. If two matches are the same length, the earlier one in the list is taken.

User Subroutines section:
This is the final section, which consists of any legal C code. This section consists of the two functions namely main( ) and yywrap( ).
* The function yylex( ) is defined in the lex.yy.c file and is called from main( ). Unless the actions contain explicit return statements, yylex( ) won't return until it has processed the entire input.
* The function yywrap( ) is called when EOF is encountered. If this function returns 1, the parsing stops. If the function returns 0, then the scanner continues scanning.

Sample Flex program:

    %{
    int chars = 0;
    int words = 0;
    int lines = 0;
    %}
    %%
    [a-zA-Z]+  { words++; chars += strlen(yytext); }
    \n         { chars++; lines++; }
    .          { chars++; }
    %%
    int main(int argc, char **argv)
    {
        yylex();
        printf("%8d%8d%8d\n", lines, words, chars);
        return 0;
    }

In this program, the definition section contains the declarations for the character, word and line counts. The rule section consists of only three patterns. The first one, [a-zA-Z]+, matches a word. The characters in brackets, known as a character class, match any single upper- or lowercase letter, and the + sign means to match one or more of the preceding thing, which here means a string of letters or a word. The action code updates the number of words and characters seen. In any flex action, the variable yytext is set to point to the input text that the pattern just matched. The second pattern, \n, just matches a new line. The action updates the number of lines and characters. The final pattern is a dot, which is the regex that matches any character. The action updates the number of characters. The end of the rules section is delimited by another %%.

Handling ambiguous patterns
Most flex programs are quite ambiguous, with multiple patterns that can match the same input. Flex resolves the ambiguity with two simple rules:
* Match the longest possible string every time the scanner matches input.
* In the case of a tie, use the pattern that appears first in the program.

These turn out to do the right thing in the vast majority of cases.
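The two disambiguation rules above can be illustrated with a deliberately naive C sketch: every rule reports how many characters it matches at the current position, the longest match wins, and a tie goes to the rule listed first. The names matcher, pick_rule and the match_* functions are hypothetical and exist only for this illustration. Flex itself does not try rules one by one; it compiles all patterns into a single automaton, but the observable choice is the same.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A "rule" here is a function returning the length of its match at s,
 * or 0 if it does not match. */
typedef size_t (*matcher)(const char *s);

static size_t match_literal(const char *s, const char *lit)
{
    size_t n = strlen(lit);
    return strncmp(s, lit, n) == 0 ? n : 0;
}

size_t match_plus(const char *s)   { return match_literal(s, "+"); }
size_t match_pluseq(const char *s) { return match_literal(s, "+="); }
size_t match_if(const char *s)     { return match_literal(s, "if"); }

/* Equivalent of the pattern [a-zA-Z_][a-zA-Z0-9_]* */
size_t match_ident(const char *s)
{
    size_t n = 0;
    if (isalpha((unsigned char)s[0]) || s[0] == '_') {
        n = 1;
        while (isalnum((unsigned char)s[n]) || s[n] == '_')
            n++;
    }
    return n;
}

/* Returns the index of the chosen rule, or -1 if no rule matches.
 * The strict '>' keeps the earliest rule when two matches tie. */
int pick_rule(const matcher rules[], size_t nrules,
              const char *input, size_t *len)
{
    int best = -1;
    size_t best_len = 0;

    for (size_t i = 0; i < nrules; i++) {
        size_t n = rules[i](input);
        if (n > best_len) {
            best = (int)i;
            best_len = n;
        }
    }
    if (len != NULL)
        *len = best_len;
    return best;
}
```

With the rules listed in the order match_plus, match_pluseq, match_if, match_ident, the input "if(" ties at two characters and the keyword rule wins because it comes first, while "iffy" is taken by the identifier rule because that match is longer.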
Consider this snippet from a scanner for C source code:

    "+"                     { return ADD; }
    "="                     { return ASSIGN; }
    "+="                    { return ASSIGNADD; }
    "if"                    { return KEYWORDIF; }
    "else"                  { return KEYWORDELSE; }
    [a-zA-Z_][a-zA-Z0-9_]*  { return IDENTIFIER; }

For the first three patterns, the string += is matched as one token, since += is longer than +. For the last three patterns, as long as the patterns for keywords precede the pattern that matches an identifier, the scanner will match keywords correctly.

Table 8.1 Variables and functions available by default in Flex

    yytext     When the lexical analyzer matches or recognizes a token from the input, the
               lexeme is stored in a null-terminated string called yytext. It is a pointer
               to char where lex places the current token's lexeme.
    yyleng     Stores the length (number of characters) of the matched string. It is an
               integer that holds strlen(yytext).
    yylval     This variable holds the value associated with the token.
    yyin       Points to the input file.
    yyout      Points to the output file.
    yylex()    The function that starts the analysis process. It is automatically
               generated by Lex.
    yywrap()   This function is called when EOF is encountered. If this function returns 1,
               the parsing stops. If the function returns 0, then the scanner continues
               scanning.

Table 8.2 Regular Definitions in FLEX

    Reg Expression   Description
    [xyz]            Any character amongst x, y or z. You may use a dash for character
                     intervals: [a-z] denotes any letter from a through z. You may use a
                     leading hat to negate the class: [^0-9] stands for any character which
                     is not a decimal digit, including new-line.
    <<EOF>>          Matches the end-of-file.
    [a-f]            Matches either a, b, c, d, e, or f in the range a to f.
    X+               Matches one or more occurrences of X.
    [0-9]+           Matches any integer.
    |                Alternation (or).
    X?               X is optional (zero or one occurrence).
    [a-zA-Z]         Matches any alphabetical character.
    .                Matches any character except newline.
    \.               Matches the . character.
    \n               Matches the newline character.
    \t               Matches the tab character.
    [^a-d]           Matches any character other than a, b, c and d.

The basic operators to make more complex regular expressions are, with r and s being two regular expressions:
* (r) : Match an r; parentheses are used to override precedence.
* rs : Match the regular expression r followed by the regular expression s. This is called concatenation.
* r|s : Match either an r or an s. This is called alternation.
* {abbreviation} : Match the expansion of the abbreviation definition. For example, instead of writing the regular expression directly:

      [a-zA-Z_][a-zA-Z0-9_]*   return IDENTIFIER;

  we may write:

      id   [a-zA-Z_][a-zA-Z0-9_]*
      %%
      {id}   return IDENTIFIER;

* r/s : Match an r, but only if it is followed by an s. The text matched by s is included when determining whether this rule is the longest match, but is then returned to the input before the action is executed. So the action only sees the text matched by r. This type of pattern is called trailing context.
* ^r : Match an r, but only at the beginning of a line (i.e., when just starting to scan, or right after a newline has been scanned).
* r$ : Match an r, but only at the end of a line (i.e., just before a newline).

Tokens and Values
When a flex scanner returns a stream of tokens, each token actually has two parts, the token and the token's value. The token is a small integer. The token numbers are arbitrary, except that token zero always means end-of-file. At the time of parsing, the token numbers are assigned automatically starting at 258. But for now, we'll just define a few tokens by hand:

    NUMBER = 258, ADD = 259, SUB = 260, MUL = 261, DIV = 262, ABS = 263, EOL = 264

A token's value identifies which of a group of similar tokens this one is. In our scanner, all numbers are NUMBER tokens, with the value saying what number it is.
When parsing more complex input with names, floating-point numbers, string literals, and the like, the value says which name, number, literal, or whatever, this token is.

II. SOLVED EXERCISE:
Write a Flex program to recognize and print tokens for relational operators.

Program:

    %{
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #define YY_DECL struct token *yylex(void)

    enum tokenType { EOFILE=-1, LESS_THAN, LESS_THAN_OR_EQUAL, GREATER_THAN,
                     GREATER_THAN_OR_EQUAL, EQUAL, NOT_EQUAL };

    struct token {
        char *lexeme;
        int index;
        unsigned int rowno, colno;   // row number, column number
        enum tokenType type;
    };

    int lineno=1, colno=1;
    struct token *tk;

    struct token *allocToken()
    {
        struct token *tk;
        tk=(struct token *)malloc(sizeof(struct token));
        tk->lexeme=(char *)malloc(sizeof(char)*3);   // maximum two characters in this case
        tk->index=-1;
        tk->type=EOFILE;
        return tk;
    }

    void setTokenArgs(struct token *tk, char *lexeme, int index, int rowno, int colno,
                      enum tokenType type)
    {
        strcpy(tk->lexeme,lexeme);
        tk->index=index;
        tk->rowno=rowno;
        tk->colno=colno;
        tk->type=type;
    }
    %}
    %%
    "/*".*"*/"   { int i=0;
                   while(yytext[i]!='\0') {
                       if(yytext[i]=='\n') { lineno++; colno=1; }
                       else colno++;
                       i++;
                   }
                 }
    "//".*"\n"   { lineno++; colno=1; }
    (\"(.)*\")   { colno+=strlen(yytext); }
    (\'(.)\')    { colno+=strlen(yytext); }
    \n           { lineno++; colno=1; }
    "<"          { tk=allocToken();
                   setTokenArgs(tk,yytext,-1,lineno,colno,LESS_THAN);
                   colno++;
                   return tk; }
    "<="         { tk=allocToken();
                   setTokenArgs(tk,yytext,-1,lineno,colno,LESS_THAN_OR_EQUAL);
                   colno+=2;
                   return tk; }
    ">"          { tk=allocToken();
                   setTokenArgs(tk,yytext,-1,lineno,colno,GREATER_THAN);
                   colno++;
                   return tk; }
    ">="         { tk=allocToken();
                   setTokenArgs(tk,yytext,-1,lineno,colno,GREATER_THAN_OR_EQUAL);
                   colno+=2;
                   return tk; }
    "=="         { tk=allocToken();
                   setTokenArgs(tk,yytext,-1,lineno,colno,EQUAL);
                   colno+=2;
                   return tk; }
    "!="         { tk=allocToken();
                   setTokenArgs(tk,yytext,-1,lineno,colno,NOT_EQUAL);
                   colno+=2;
                   return tk; }
    "\t"         { colno+=8; }
    .            { colno++; }
    %%
    int main(int argc, char **argv)
    {
        if(argc!=2) {
            printf("This program requires name of one C file");
            exit(0);
        }
        yyin=fopen(argv[1],"r");
        int cnt=0;
        while((tk=yylex())) {
            printf("%d %d %d %s\n",cnt,tk->rowno,tk->colno,tk->lexeme);
            cnt++;
        }
        return 0;
    }

    int yywrap()
    {
        return 1;
    }

We define the token numbers in a C enum. Then we make yylval, the variable that stores the token value, an integer, which is adequate for the first version of our calculator. For each of the tokens, the scanner returns the appropriate code for the token; for numbers, it turns the string of digits into an integer and stores it in yylval before returning. The pattern that matches whitespace doesn't return, so the scanner just continues to look for what comes next.

Installing FLEX:
Steps to download, compile, and install are as follows. Note: Replace 2.5.33 with your version number.

Downloading Flex (the fast lexical analyzer):
Run the command below:

    wget http://prdownloads.sourceforge.net/flex/flex-2.5.33.tar.gz?download

Extracting files from the downloaded package:

    tar -xvzf flex-2.5.33.tar.gz

Now, enter the directory where the package is extracted:

    cd flex-2.5.33

Configuring flex before installation:
If you haven't installed m4 yet, then please do so. Run the command below to include m4 in your PATH variable:

    PATH=$PATH:/usr/local/m4/bin/

NOTE: Replace /usr/local/m4/bin with the location of the m4 binary. Now, configure the source code before installation:

    ./configure --prefix=/usr/local/flex

Replace /usr/local/flex above with the directory path where you want to copy the files and folders. Note: check for any error message.

Compiling flex:

    make

Note: check for any error message.

Installing flex:
As root (for privileges on the destination directory), run the following.
With sudo:

    sudo make install

Without sudo:

    make install

Note: check for any error messages. Flex has now been installed.

Steps to execute:
a. Type the Flex program and save it using the .l extension.
b. Compile the flex code using
       $ flex filename.l
c. Compile the generated C file using
       $ gcc lex.yy.c -o output.out
d. This gives an executable output.out.
e. Run the executable using
       $ ./output.out

III. LAB EXERCISES
Write a FLEX program to:
1. Find the number of vowels and consonants in the given input.
2. Count the number of words, characters, blanks and lines in a given text.
3. Find the number of positive integers, negative integers, positive floating point numbers and negative floating point numbers.
4. Replace scanf with READ and printf with WRITE; also find the number of scanf and printf statements in the given input file.
5. Find whether the given sentence is simple or compound. For example, if the input statement is "My name is John and I study in MIT" then the program should display it as a compound statement.
6. Generate tokens for a simple C program. (Tokens to be considered are: keywords, identifiers, special symbols, arithmetic operators and logical operators.)

IV. ADDITIONAL EXERCISES
1. Write a FLEX program that changes a number from decimal to hexadecimal notation.
2. Write a LEX program to convert uppercase characters to lowercase characters in a C file, excluding the characters present in comments.

LAB No.: 11                                       Date:

INTRODUCTION TO BISON

Objectives:
* To understand the bison tool.
* To implement the parser using bison.

Prerequisites:
* Knowledge of the C programming language.
* Knowledge of basic level context free and EBNF grammars.

I. INTRODUCTION
Parsing is the process of matching grammar symbols to elements in the input data, according to the rules of the grammar. The parser obtains a sequence of tokens from the lexical analyzer, and recognizes its structure in the form of a parse tree.
The parse tree expresses the hierarchical structure of the input data, and is a mapping of grammar symbols to data elements. Tree nodes represent symbols of the grammar (non-terminals or terminals), and tree edges represent derivation steps.

There are two basic parsing approaches: top-down and bottom-up. Intuitively, a top-down parser begins with the start symbol. By looking at the input string, it traces a leftmost derivation of the string. By the time it is done, a parse tree is generated top-down. A bottom-up parser, in contrast, generates the parse tree bottom-up. Given the string to be parsed and the set of productions, it traces a rightmost derivation in reverse by starting with the input string and working backwards to the start symbol.

Bison is a tool for building programs that handle structured input. The parser's job is to figure out the relationship among the input tokens. A common way to display such relationships is a parse tree. Bison is a general-purpose parser generator that converts a grammar description (Bison grammar file) for an LALR(1) context-free grammar into a C program to parse that grammar. The Bison parser is a bottom-up parser. It tries, by shifts and reductions, to reduce the entire input down to a single grouping whose symbol is the grammar's start symbol.

    Generated C parser  -->  C compiler  -->  a.out
    Input stream        -->  a.out       -->  Parse tree

Fig. 9.1 Working of Bison

How a Bison Parser Matches its Input
A grammar is a series of rules that the parser uses to recognize syntactically valid input.

    statement: NAME '=' expression
    expression: NUMBER '+' NUMBER
              | NUMBER '-' NUMBER

The vertical bar, |, means there are two possibilities for the same symbol; that is, an expression can be either an addition or a subtraction. The symbol to the left of the : is known as the left-hand side of the rule, often abbreviated LHS, and the symbols to the right are the right-hand side, usually abbreviated RHS.
Several rules may have the same left-hand side; the vertical bar is just shorthand for this. Symbols that actually appear in the input and are returned by the lexer are terminal symbols or tokens, while those that appear on the left-hand side of each rule are nonterminal symbols or non-terminals. Terminal and nonterminal symbols must be different; it is an error to write a rule with a token on the left side.

A bison specification has the same three-part structure as a flex specification. (Flex copied its structure from the earlier lex, which copied its structure from yacc, the predecessor of bison.) The first section, the definition section, handles control information for the parser and generally sets up the execution environment in which the parser will operate. The second section contains the rules for the parser, and the third section is C code copied verbatim into the generated C program.

    ... definition section ...
    %%
    ... rules section ...
    %%
    ... user subroutines section ...

The declarations here include C code to be copied to the beginning of the generated C parser, again enclosed in %{ and %}. Following that are %token token declarations, telling bison the names of the symbols in the parser that are tokens. By convention, tokens have uppercase names, although bison doesn't require it. Any symbols not declared as tokens have to appear on the left side of at least one rule in the program. The second section contains the rules in simplified BNF. Bison uses a single colon rather than ::=, and since line boundaries are not significant, a semicolon marks the end of a rule. Again, like flex, the C action code goes in braces at the end of each rule.

Bison creates the C program by plugging pieces into a standard skeleton file. The rules are compiled into arrays that represent the state machine that matches the input tokens.
The actions have the $N and @N values translated into C and then are put into a switch statement within yyparse() that runs the appropriate action each time there's a reduction.

Abstract Syntax Tree
One of the most powerful data structures used in compilers is an abstract syntax tree (AST). An AST is basically a parse tree that omits the nodes for the uninteresting rules. A bison parser doesn't automatically create this tree as a data structure. Every grammar includes a start symbol, the one that has to be at the root of the parse tree.

II. SOLVED EXERCISE:
Write a Bison program to check the syntax of a simple expression involving operators +, -, * and /.

    %{
    #include <stdio.h>
    #include <stdlib.h>
    int yylex(void);
    int yyerror(char *msg);
    %}
    %token NUMBER ID NL
    %left '+'
    %left '*'
    %%
    stmt   : exp NL  { printf("Valid Expression"); exit(0); }
           ;
    exp    : exp '+' term
           | term
           ;
    term   : term '*' factor
           | factor
           ;
    factor : ID
           | NUMBER
           ;
    %%
    int yyerror(char *msg)
    {
        printf("Invalid Expression\n");
        exit(0);
    }

    int main(void)
    {
        printf("Enter the expression\n");
        yyparse();
        return 0;
    }

Flex Part

    %{
    #include "filename.tab.h"
    %}
    %%
    [0-9]+                  { return NUMBER; }
    \n                      { return NL; }
    [a-zA-Z][a-zA-Z0-9_]*   { return ID; }
    .                       { return yytext[0]; }
    %%

Steps to execute:
a. Type the Flex program and save it using the .l extension.
b. Type the bison program and save it using the .y extension.
c. Compile the bison code using
       $ bison -d filename.y
   The option -d generates the header file filename.tab.h with the #define statements that associate the yacc user-assigned "token codes" with the user-declared "token names". This association allows source files other than filename.tab.c to access the token codes.
d. This command generates two files, filename.tab.h and filename.tab.c.
e. Compile the flex code using
       $ flex filename.l
f. Compile the generated C files using
       $ gcc lex.yy.c filename.tab.c -o output.out
g. This gives an executable output.out.
h. Run the executable using
       $ ./output.out

III. LAB EXERCISES:
Write a bison program:
1. To check a valid declaration statement.
2. To check a valid decision making statement and display the nesting level.
3.
To evaluate an arithmetic expression involving operations +, -, * and /.
4. To validate a simple calculator using postfix notation. The grammar rules are as follows:

       input → input line | ε
       line  → '\n' | exp '\n'
       exp   → num | exp exp '+' | exp exp '-' | exp exp '*' | exp exp '/' | exp exp '^' | exp 'n'

IV. ADDITIONAL EXERCISES:
1. Write a grammar to recognize strings 'aabb' and 'ab' (a^n b^n, n>=0). Write a Bison program to validate the strings for the derived grammar.
2. Write a grammar to recognize (a^n b, n>=10). Write a Bison program to validate the strings for the derived grammar.

REFERENCES:
1. Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, "Compilers: Principles, Techniques and Tools", Pearson Education, 2nd Edition, 2010.
2. D M Dhamdhere, "Systems Programming and Operating Systems", Tata McGraw Hill, 2nd Revised Edition, 2001.
3. Kenneth C. Louden, "Compiler Construction - Principles and Practice", Thomson, India Edition, 2007.
4. "Keywords and Identifiers", https://www.programiz.com/c-programming/c-keywords-identifier.
5. Behrouz A. Forouzan, Richard F. Gilberg, "A Structured Programming Approach Using C", 3rd Edition, Cengage Learning India Private Limited, India, 2007.
6. Debasis Samanta, "Classic Data Structures", 2nd Edition, PHI Learning Private Limited, India, 2010.
7. File handling, http://iti.ac.in/people/~tanimad/FileHandlinginCLanguage.pdf
8. https://www3.nd.edu/~dthain/courses/cse40243/fall2015/intel-intro.html

Appendix

    program         → main () { declarations statement-list }
    declarations    → data-type identifier-list; declarations | ε
    data-type       → int | char
    identifier-list → id | id, identifier-list | id[number], identifier-list | id[number]
    statement_list  → statement statement_list | ε
    statement       → assign-stat; | decision-stat | looping-stat
    assign_stat     → id = expn
    expn            → simple-expn eprime
    eprime          → relop simple-expn | ε
    simple-expn     → term seprime
    seprime         → addop term seprime | ε
    term            → factor tprime
    tprime          → mulop factor tprime | ε
    factor          → id | num
    decision-stat   → if ( expn ) { statement_list } dprime
    dprime          → else { statement_list } | ε
    looping-stat    → while ( expn ) { statement_list }
                    | for ( assign_stat ; expn ; assign_stat ) { statement_list }
    relop           → == | != | <= | >= | > | <
    addop           → + | -
    mulop           → * | / | %
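As an illustration of how the expression rules of the appendix grammar map onto a recursive descent parser, here is a minimal recognizer sketch in C. It covers only expn, simple-expn, term and factor. The operator sets are assumptions (relop as ==, !=, <=, >=, <, >; addop as + and -; mulop as *, / and %), the tail-recursive rules eprime, seprime and tprime are written as loops or optional checks, and all function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef struct { const char *p; } cursor;   /* current position in the input */

static void skip_ws(cursor *c) { while (*c->p == ' ') c->p++; }

/* factor -> id | num */
static int factor(cursor *c)
{
    skip_ws(c);
    if (isalpha((unsigned char)*c->p) || *c->p == '_') {
        while (isalnum((unsigned char)*c->p) || *c->p == '_') c->p++;
        return 1;
    }
    if (isdigit((unsigned char)*c->p)) {
        while (isdigit((unsigned char)*c->p)) c->p++;
        return 1;
    }
    return 0;
}

/* Parses `operand (op operand)*`, i.e. X -> operand Xprime with
 * Xprime -> op operand Xprime | epsilon, for single-character ops. */
static int chain(cursor *c, int (*operand)(cursor *), const char *ops)
{
    if (!operand(c)) return 0;
    for (;;) {
        skip_ws(c);
        if (*c->p == '\0' || strchr(ops, *c->p) == NULL) return 1;
        c->p++;
        if (!operand(c)) return 0;
    }
}

/* term -> factor tprime ; simple-expn -> term seprime */
static int term(cursor *c)        { return chain(c, factor, "*/%"); }
static int simple_expn(cursor *c) { return chain(c, term, "+-"); }

/* relop -> == | != | <= | >= | < | >   (assumed operator set) */
static int relop(cursor *c)
{
    skip_ws(c);
    if ((c->p[0] == '=' || c->p[0] == '!' || c->p[0] == '<' || c->p[0] == '>')
        && c->p[1] == '=') { c->p += 2; return 1; }
    if (c->p[0] == '<' || c->p[0] == '>') { c->p++; return 1; }
    return 0;
}

/* expn -> simple-expn eprime ; eprime -> relop simple-expn | epsilon */
static int expn(cursor *c)
{
    if (!simple_expn(c)) return 0;
    return relop(c) ? simple_expn(c) : 1;
}

/* Returns 1 if the whole string s is derivable from expn, else 0. */
int accepts_expn(const char *s)
{
    cursor c = { s };
    if (!expn(&c)) return 0;
    skip_ws(&c);
    return *c.p == '\0';
}
```

For instance, accepts_expn("x1 <= y % 3") returns 1, while accepts_expn("a < b < c") returns 0 because eprime allows at most one relational operator. The statement-level rules of the grammar can be added in the same style, one function per non-terminal.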
