KCA-015 Compiler Design Unit - 4-5

Uploaded by

Akarsh Dubey
Copyright
© All Rights Reserved
SYMBOL TABLES

1. What is Symbol Table?
A symbol table is an important data structure used in a compiler. It stores information about the occurrence of various entities such as objects, classes, variable names, interfaces and function names. It is used by both the analysis and the synthesis phases.

2. What is the purpose of symbol table?
The symbol table is used for the following purposes:
(1) It is used to store the names of all entities in a structured form at one place.
(2) It is used to verify whether a variable has been declared.
(3) It is used to determine the scope of a name.
(4) It is used to implement type checking by verifying that assignments and expressions in the source code are semantically correct.

3. Explain about Symbol Table format.
A symbol table can be either a linear list or a hash table. It maintains an entry for each name, and the attribute clause of the entry holds the information related to the name.
Implementation: The symbol table can be implemented as an unordered list if the compiler has to handle only a small amount of data. A symbol table can be implemented using one of the following techniques:
(1) Linear (sorted or unsorted) list
(2) Hash table
(3) Binary search tree
Symbol tables are mostly implemented as hash tables.

4. Explain about Operations of Symbol table.
The symbol table provides the following operations:
insert(): The insert() operation is used most frequently in the analysis phase, when the tokens are identified and names are stored in the table. It is used to insert information into the symbol table, such as each unique name occurring in the source code. The attribute for a symbol is the information associated with that symbol: its state, value, type and scope. The insert() function takes the symbol and its value as arguments.
For example:
    int x;
should be processed by the compiler as:
    insert(x, int)
lookup(): In the symbol table, the lookup() operation is used to search for a name.
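The insert() and lookup() operations can be sketched as follows. This is only a minimal illustration, assuming a single hash table (a Python dict) with no nested scopes; the class name SymbolTable and the attribute layout are hypothetical and are not taken from any particular compiler.

```python
class SymbolTable:
    """A single-scope symbol table backed by a hash table (Python dict)."""

    def __init__(self):
        self.entries = {}  # name -> attributes of that name

    def insert(self, name, type_):
        # Reject a second declaration of the same name.
        if name in self.entries:
            raise KeyError(f"{name!r} is already declared")
        self.entries[name] = {"type": type_}

    def lookup(self, name):
        # Return the attributes of the name, or None if it is undeclared.
        return self.entries.get(name)


table = SymbolTable()
table.insert("x", "int")       # int x;  ->  insert(x, int)
print(table.lookup("x"))       # attributes stored for x
print(table.lookup("y"))       # y was never declared
```

Here a failed lookup returns None, so the caller can report an undeclared-variable error; a real compiler would store more attributes than the type.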
The lookup() operation is used to determine:
(1) The existence of the symbol in the table.
(2) The declaration of the symbol before it is used.
(3) Whether the name is used in the scope.
(4) Initialization of the symbol.
(5) Whether the name is declared multiple times.
The basic format of the lookup() function is as follows:
    lookup(symbol)
This format varies according to the programming language.

5. Define runtime environment.
The source language has abstractions such as names, operators, parameters, procedures and flow-of-control constructs. These abstractions must be implemented by the compiler. To implement them on the target machine, the compiler needs to cooperate with the operating system and other system software. For the successful execution of the program, the compiler has to create and manage a runtime environment. When a program is compiled, the runtime environment is controlled indirectly, through the code that the compiler generates.

6. Explain about Data structure for symbol table in briefly.
A compiler contains two types of symbol table: the global symbol table and scope symbol tables.
The global symbol table can be accessed by all the procedures and scope symbol tables.
The scope of a name and the symbol tables are arranged in a hierarchy, as shown below:

    int value = 10;
    void sum_num()
    {
        int num_1;
        int num_2;
        {
            int num_3;
            int num_4;
        }
        int num_5;
        {
            int num_6;
            int num_7;
        }
    }
    void sum_id()
    {
        int id_1;
        int id_2;
        {
            int id_3;
            int id_4;
        }
        int num_5;
    }

The above program can be represented by a hierarchical data structure of symbol tables:
(Figure: Data structure hierarchy of symbol tables, with the global symbol table at the root)
A name is searched first in the current symbol table. If the name is found then the search is complete; otherwise the name is searched in the symbol table of the parent, until either the name is found or the global symbol table has been searched.

Representing Scope Information: In a program, every name possesses a region of validity, called the scope of that name.

7. Explain about the rules in a block-structured language.
The rules in a block-structured language are as follows:
If a name is declared within block B then it is valid only within B.
If block B1 is nested within B2 then a name that is valid for block B2 is also valid for B1, unless the name's identifier is re-declared in B1.
These scope rules need a more complicated organization of the symbol table than a list of associations between names and attributes.
Tables are organized into a stack, and each table contains the list of names and their associated attributes.
Whenever a new block is entered, a new table is pushed onto the stack. The new table holds the names that are declared as local to this block.
When a declaration is compiled, the table is searched for the name. If the name is not found in the table then the new name is inserted.
When a reference to a name is translated, each table is searched, starting from the top of the stack.
For example:

    int x;
    void f(int m) {
        float x, y;
        {
            int i, j;
            int u, v;
        }
    }
    int g(int n)
    {
        bool t;
    }

(Figure: Symbol Table Organization that Complies with Static Scope Information Rules)

8. What is storage Organization? Explain about Storage Organization.
When the target program executes, it runs in its own logical address space, in which each program value has a location.
The management and organization of the logical address space is shared among the compiler, the operating system and the target machine. The operating system maps the logical addresses to physical addresses.
Runtime storage comes in blocks of bytes, where a byte is the smallest unit of addressable memory. Four bytes form a machine word. A multibyte object is stored in consecutive bytes and is given the address of its first byte.
Run-time storage can be subdivided to hold the different components of an executing program:
(1) Generated executable code
(2) Static data objects
(3) Dynamic data objects (heap)
(4) Automatic data objects (stack)

9. What is Activation Record? Explain it.
The control stack is a run-time stack which is used to keep track of the live procedure activations, i.e.
it is used to find out the procedures whose execution has not been completed.
When a procedure is called (activation begins) its name is pushed onto the stack, and when it returns (activation ends) it is popped.
An activation record is used to manage the information needed by a single execution of a procedure.
An activation record is pushed onto the stack when a procedure is called, and it is popped when control returns to the caller.
The fields of an activation record are as follows:
(Figure: Activation record with the fields Return Value, Actual Parameters, Control Link, Access Link, Saved Machine Status, Local Data and Temporaries)
Return Value: It is used by the called procedure to return a value to the calling procedure.
Actual Parameter: It is used by the calling procedure to supply parameters to the called procedure.
Control Link: It points to the activation record of the caller.
Access Link: It is used to refer to non-local data held in other activation records.
Saved Machine Status: It holds the information about the status of the machine before the procedure is called.
Local Data: It holds the data that is local to the execution of the procedure.
Temporaries: It stores the values that arise in the evaluation of an expression.

10. What the different ways to allocate memory? Explain in briefly.
The different ways to allocate memory are:
(1) Static storage allocation
(2) Stack storage allocation
(3) Heap storage allocation

Static storage allocation:
In static allocation, names are bound to storage locations at compile time.
Memory that is laid out at compile time is created in the static area, and only once.
Static allocation does not support dynamic data structures, because memory is laid out only at compile time and is released only after program completion.
One drawback of static storage allocation is that the size and position of data objects must be known at compile time.
Another drawback is that recursive procedures are restricted.

Stack Storage Allocation:
(1) In stack storage allocation, storage is organized as a stack.
(2) An activation record is pushed onto the stack when an activation begins and it is popped when the activation ends.
(3) The activation record contains the locals, so they are bound to fresh storage in each activation. The values of the locals are deleted when the activation ends.
(4) It works on a last-in-first-out (LIFO) basis, and this allocation supports recursion.

Heap Storage Allocation:
(1) Heap allocation is the most flexible allocation scheme.
(2) Allocation and deallocation of memory can be done at any time and at any place, depending upon the user's requirement.
(3) Heap allocation is used to allocate memory to variables dynamically and to claim it back when the variables are no longer used.
(4) Heap storage allocation supports recursion.
Example:

    fact (int n)
    {
        if (n <= 1)
            return 1;
        else
            return (n * fact(n-1));
    }
    fact (6)

The dynamic allocation is as follows:
(Figure)

11. Explain about Types of errors.
Lexical Error:
This type of error is detected during the lexical analysis phase. A lexical error is a sequence of characters that does not match the pattern of any token. It is found while the compiler scans the source program, not while the compiled program runs.
A lexical phase error can be:
(1) Spelling error.
(2) Exceeding length of identifier or numeric constants.
(3) Appearance of illegal characters.
(4) Removal of a character that should be present.
(5) Replacement of a character with an incorrect character.
(6) Transposition of two characters.
Example:

    void main()
    {
        int x = 10, y = 20;
        char *a;
        a = &x;
        x = 1xab;
    }

In this code, 1xab is neither a number nor an identifier, so this code shows a lexical error.

Syntax Error:
This type of error appears during the syntax analysis phase. Like a lexical error, it is found at compile time.
Some syntax errors can be:
(1) Error in structure
(2) Missing operators
(3) Unbalanced parenthesis
A syntax error can also occur when an invalid calculation is entered into a calculator. This can be caused by entering several decimal points in one number or by opening brackets without closing them.
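The unbalanced-parenthesis error mentioned above can be detected with a stack. The following is only a sketch under simplifying assumptions: the function name check_brackets is hypothetical, and brackets inside string literals or comments are not treated specially, which a real parser would have to do.

```python
def check_brackets(source):
    """Return None if all brackets balance, else (position, message)."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []  # (bracket, position) of every bracket still open
    for pos, ch in enumerate(source):
        if ch in "([{":
            stack.append((ch, pos))
        elif ch in pairs:
            # A closer must match the most recently opened bracket.
            if not stack or stack[-1][0] != pairs[ch]:
                return pos, f"unexpected '{ch}'"
            stack.pop()
    if stack:
        ch, pos = stack[-1]
        return pos, f"'{ch}' is never closed"
    return None


print(check_brackets("x = (3 + 5);"))   # balanced
print(check_brackets("x = (3 + 5;"))    # the '(' at position 4 stays open
```

The position that is returned lets an error handler report where the problem was noticed, which is the reporting step of error handling.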
Example 1: Using "=" when "==" is needed.

    16  if (number=200)
    17      cout << "number is equal to 200";
    18  else
    19      cout << "number is not equal to 200";

The following warning message will be displayed by many compilers:
    Syntax Warning: Assignment operator used in if expression line 16 of program firstprog.cpp
In this code, the if expression uses the equal sign, which is actually the assignment operator, not the relational operator which tests for equality.
Due to the assignment operator, number is set to 200, and the expression number=200 is always true because the expression's value is actually 200. For this example the correct code would be:

    16  if (number==200)

Example 2: Missing semicolon:
    int a = 5      // semicolon is missing
Compiler message:
    ab.java:20: ';' expected
    int a = 5

Example 3: Errors in expressions:
    x = (3 + 5;    // missing closing parenthesis ')'
    y = 3 + * 5;   // missing argument between '+' and '*'

12. Explain about Semantic Error.
This type of error appears during the semantic analysis phase. These errors are detected at compile time.
Most compile time errors are scope and declaration errors, for example undeclared or multiply declared identifiers. Type mismatch is another compile time error.
A semantic error can arise from using the wrong variable, using the wrong operator, or doing operations in the wrong order.
Some semantic errors can be:
(1) Incompatible types of operands
(2) Undeclared variable
(3) Actual argument not matching the formal argument

Example 1: Use of an undeclared variable:

    int i;
    void f (int m)
    {
        m = t;
    }

In this code, t is undeclared, which is why it shows a semantic error.

Example 2: Type incompatibility:
    int a = "hello";   // the types String and int are not compatible

Example 3: Errors in expressions:
    String s = "...";
    int a = 5 - s;     // the - operator does not support arguments of type String

13. Explain about Error detection and Recovery in Compiler.
In this phase of compilation, all possible errors made by the user are detected and reported to the user in the form of error messages.
This process of locating errors and reporting them to the user is called the error handling process.
Functions of the error handler:
(1) Detection
(2) Reporting
(3) Recovery
Classification of Errors:
(Figure: classification of errors; the compile time errors include the syntactic phase errors)
Compile time errors are of three types:
(1) Lexical Phase Errors: These errors are detected during the lexical analysis phase. Typical lexical errors are
    (a) Exceeding length of identifier or numeric constants.
    (b) Appearance of illegal characters.
    (c) Unmatched string.
    Error recovery by Panic Mode Recovery: In this method, successive characters from the input are removed one at a time until a designated set of synchronizing tokens is found. Synchronizing tokens are delimiters such as ; or }
    (a) Advantage is that it is easy to implement and guarantees not to go into an infinite loop.
    (b) Disadvantage is that a considerable amount of input is skipped without checking it for additional errors.
(2) Syntactic Phase Errors: These errors are detected during the syntax analysis phase. Typical syntax errors are
    (a) Errors in structure
    (b) Missing operator
    (c) Misspelled keywords
    (d) Unbalanced parenthesis
    For example, if the keyword switch is incorrectly written as swicth, an "Unidentified keyword/identifier" error occurs.

14. Explain about Error recovery.
There are the following error recovery methods:
(1) Panic Mode Recovery:
    (a) In this method, successive characters from the input are removed one at a time until a designated set of synchronizing tokens is found. Synchronizing tokens are delimiters such as ; or }
    (b) Advantage is that it is easy to implement and guarantees not to go into an infinite loop.
    (c) Disadvantage is that a considerable amount of input is skipped without checking it for additional errors.
(2) Statement Mode Recovery:
    (a) In this method, when a parser encounters an error, it performs the necessary correction on the remaining input so that the rest of the input statement allows the parser to parse ahead.
    (b) The correction can be deletion of extra semicolons, replacing a comma by a semicolon, or inserting a missing semicolon.
    (c) While performing the correction, utmost care should be taken not to go into an infinite loop.
    (d) Disadvantage is that it finds it difficult to handle situations where the actual error occurred before the point of detection.
(3) Error Production:
    (a) If the user has knowledge of the common errors that can be encountered, these errors can be incorporated by augmenting the grammar with error productions that generate the erroneous constructs.
    (b) If this is used then, during parsing, appropriate error messages can be generated and parsing can be continued.
    (c) Disadvantage is that it is difficult to maintain.
(4) Global Correction:
    (a) The parser examines the whole program and tries to find the closest match for it which is error free.
    (b) The closest match program needs the smallest number of insertions, deletions and changes of tokens to recover from the erroneous input.
    (c) Due to its high time and space complexity, this method is not implemented practically.

15. Explain about representing the scope information in the symbol table.
There are several methods of organizing the symbol table. These methods are discussed below.

The Linear List: A linear list of records is the easiest way to implement a symbol table. New names are added to the table in the order in which they arrive. Whenever a new name is to be added to the table, the table is first searched linearly (sequentially) to check whether or not the name is already present. If the name is not present, then a record for the new name is created and added to the list at the position specified by the available pointer, as shown in the figure.
(Figure: A New Record is Added to the Linear List of Records)
To retrieve the information about a name, the table is searched sequentially, starting from the first record in the table. The average number of comparisons p required for a search is p = (n+1)/2 for a successful search and p = n for an unsuccessful search, where n is the number of records in the symbol table. The advantage of this organization is that it takes less space, and additions to the table are simple. This method's disadvantage is that it has a higher accessing time.

Search Trees: A search tree is a more efficient approach to symbol table organization. We add two links, left and right, to each record, and these links point to other records in the search tree.
Whenever a name is to be added, the name is first searched for in the tree. If it does not exist, then a record for the new name is created and added at the proper position in the search tree. This organization has the property of alphabetical accessibility; that is, all the names accessible from name i by following a left link precede name i in alphabetical order. Similarly, all the names accessible from name i by following the right link follow name i in alphabetical order. The expected time needed to enter n names and to make m queries is proportional to (m+n) log2 n; so for greater numbers of records (higher n) this method has advantages over the linear list organization.
(Figure: The Search Tree Organization Approach to a Symbol Table)

Hash Tables: A hash table is a table of k pointers, numbered from zero to k-1, that point to records within the symbol table. To enter a name into the symbol table, we find the hash value of the name by applying a suitable hash function. The hash function maps the name into an integer between zero and k-1, and this value is used as an index into the hash table. If the name is not already present in the list built on that hash value, we create a record for the name and insert it at the head of the list. When retrieving the information associated with a name, the hash value of the name is first obtained, and then the list that was built on this hash value is searched for the information about the name.
(Figure: Hash Table Method of Symbol Table Organization)

CODE GENERATION

1. What is Code Generator?
A code generator is used to produce the target code for three-address statements. It uses registers to store the operands of the three-address statement.
Example: Consider the three-address statement x := y + z. Using instructions of the form op source, destination, it can have the following sequence of codes:

    MOV y, R0
    ADD z, R0
    MOV R0, x

Register and Address Descriptors:
A register descriptor keeps track of what is currently in each register. The register descriptors show that all the registers are initially empty.
An address descriptor is used to store the location where the current value of a name can be found at run time.

2. Explain about code-generation algorithm.
The algorithm takes a sequence of three-address statements as input. For each three-address statement of the form x := y op z it performs the following actions:
(1) Invoke a function getreg to find the location L where the result of the computation y op z should be stored.
(2) Consult the address descriptor for y to determine y', the current location of y. If the value of y is currently both in memory and in a register, prefer the register for y'. If the value of y is not already in L then generate the instruction MOV y', L to place a copy of y in L.
(3) Generate the instruction OP z', L where z' is the current location of z. If z is in both a register and a memory location, prefer the register. Update the address descriptor of x to indicate that x is in location L. If L is a register then update its descriptor to indicate that it contains x, and remove x from all other descriptors.
(4) If the current values of y or z have no next uses, are not live on exit from the block, and are in registers, then alter the register descriptor to indicate that after execution of x := y op z those registers will no longer contain y or z.

3. Explain about Generation of Code for Assignment Statements.
The assignment statement d := (a - b) + (a - c) + (a - c) can be translated into the following sequence of three-address code:

    t := a - b
    u := a - c
    v := t + u
    d := v + u

The code sequence for the example is as follows:
(Table: for each statement, the code generated, the register descriptor and the address descriptor; the registers are initially empty)

In the code generation phase, various issues arise:
(1) Input to the code generator
(2) Target program
(3) Memory management
(4) Instruction selection
(5) Register allocation
(6) Evaluation order

5. Explain about Input to the code generator.
The input to the code generator contains the intermediate representation of the source program and the information in the symbol table. The intermediate representation is produced by the front end.
There are several choices for the intermediate representation:
(1) Postfix notation
(2) Syntax tree
(3) Three address code
We assume the front end produces a low-level intermediate representation, i.e. the values of names in it can be directly manipulated by the machine instructions.
The code generation phase requires complete, error-free intermediate code as its input.

6. Explain about Target program.
The target program is the output of the code generator. The output can be:
(1) Assembly Language: It makes the process of code generation easier.
(2) Relocatable Machine Language: It allows subprograms to be compiled separately.
(3) Absolute Machine Language: It can be placed in a fixed location in memory and can be executed immediately.

7. Explain about Memory management.
During the code generation process, the symbol table entries have to be mapped to actual addresses, and labels have to be mapped to instruction addresses.
Mapping a name in the source program to the address of its data is done co-operatively by the front end and the code generator.
Local variables are stack-allocated in the activation record, while global variables are in the static area.

8. What is Instruction selection? Explain it.
The instruction set of the target machine should be complete and uniform.
When you consider the efficiency of the target machine, instruction speeds and machine idioms are important factors.
The quality of the generated code is determined by its speed and size.
Example: The three-address code is:

    a := b + c
    d := a + e

Inefficient assembly code is:

    MOV b, R0      /* R0 = b      */
    ADD c, R0      /* R0 = c + R0 */
    MOV R0, a      /* a  = R0     */
    MOV a, R0      /* R0 = a      */
    ADD e, R0      /* R0 = e + R0 */
    MOV R0, d      /* d  = R0     */

The fourth instruction is redundant, because R0 already holds the value of a.

9. What is Register allocation? Explain it.
Registers can be accessed faster than memory. Instructions involving register operands are shorter and faster than those involving memory operands.
The following sub-problems arise when we use registers:
Register Allocation: In register allocation, we select the set of variables that will reside in registers.
Register Assignment: In register assignment, we pick the specific register in which a variable will reside.
Certain machines require even-odd pairs of registers for some operands and results.
For example, consider a division instruction of the form:

    D x, y

where x is the dividend, held in the even register of an even/odd register pair, and y is the divisor.
The even register is used to hold the remainder.
The odd register is used to hold the quotient.

10. What is Evaluation order? Explain it.
The efficiency of the target code can be affected by the order in which the computations are performed. Some computation orders need fewer registers to hold intermediate results than others.

Target Machine:
The target computer is a byte-addressable machine with 4 bytes to a word.
The target machine has n general purpose registers, R0, R1, ..., Rn-1. It also has two-address instructions of the form:

    op source, destination

where op is an op-code, and source and destination are data fields.

11. Explain about Opcode.
The machine has the following op-codes:
(1) ADD (add source to destination)
(2) SUB (subtract source from destination)
(3) MOV (move source to destination)
The source and destination of an instruction are specified by combining registers and memory locations with address modes:

    Mode               Form    Address                      Added cost
    absolute           M       M                            1
    register           R       R                            0
    indexed            c(R)    c + contents(R)              1
    indirect register  *R      contents(R)                  0
    indirect indexed   *c(R)   contents(c + contents(R))    1
    literal            #c      c                            1

Here, an added cost of 1 means that the operand occupies one extra word of memory.
Each instruction has a cost of 1 plus the added costs for the source and destination:

    Instruction cost = 1 + added cost of source mode + added cost of destination mode

Example:
(1) Move register to memory, R0 -> M:
    MOV R0, M
    cost = 1 + 0 + 1 = 2 (since the address of memory location M is in the word following the instruction)
(2) Indirect indexed mode:
    MOV *4(R0), M
    cost = 1 + 1 + 1 = 3 (one word for the address of memory location M, one word for the constant 4 in *4(R0) and one for the instruction)
(3) Literal mode:
    MOV #1, R0
    cost = 1 + 1 + 0 = 2 (one word for the constant 1 and one for the instruction)

12. Explain about run-time storage management.
The information which is required during an execution of a procedure is kept in a block of storage called an activation record. The activation record includes storage for the names local to the procedure.
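How activation records come and go on the control stack can be sketched as follows. This is only an illustration: the helper names call and ret, the record fields and the factorial procedure are assumptions chosen for the example, and a real compiler emits machine instructions rather than building dictionaries.

```python
control_stack = []   # run-time stack of activation records
trace = []           # (event, procedure, stack depth after the event)

def call(name, actual_params):
    # Activation begins: push a fresh record for this execution.
    record = {"procedure": name, "actual_params": actual_params,
              "local_data": {}, "return_value": None}
    control_stack.append(record)
    trace.append(("push", name, len(control_stack)))

def ret(value):
    # Activation ends: pop the record and hand the value to the caller.
    record = control_stack.pop()
    record["return_value"] = value
    trace.append(("pop", record["procedure"], len(control_stack)))
    return value

def fact(n):
    call("fact", {"n": n})
    if n <= 1:
        return ret(1)
    return ret(n * fact(n - 1))

result = fact(3)
print(result, trace)
```

Each recursive activation gets its own record, which is why this scheme can support recursion; the stack is empty again once the outermost activation returns.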
We can describe addresses in the target code in the following ways:
(1) Static allocation
(2) Stack allocation
In static allocation, the position of an activation record is fixed in memory at compile time.
In stack allocation, a new activation record is pushed onto the stack for each execution of a procedure. When the activation ends, the record is popped.
For the run-time allocation and deallocation of activation records the following three-address statements are associated:
(1) Call
(2) Return
(3) Halt
(4) Action, a placeholder for other statements
We assume that the run-time memory is divided into areas for:
(1) Code
(2) Static data
(3) Stack

13. Explain about Static allocation.
(1) Implementation of Call Statement:
The following code is needed to implement static allocation:

    MOV #here + 20, callee.static_area    /* It saves the return address */

    GOTO callee.code_area    /* It transfers control to the target code for the called procedure */

where
callee.static_area shows the address of the activation record,
callee.code_area shows the address of the first instruction of the called procedure,
the literal #here + 20 is the return address, i.e. the address of the instruction following the GOTO.
(2) Implementation of Return Statement:
The following code is needed to implement a return from procedure callee:

    GOTO *callee.static_area

It transfers control to the address that is saved at the beginning of the activation record.
(3) Implementation of Action Statement:
The ACTION instruction is used to implement an action statement.
(4) Implementation of Halt Statement:
The HALT statement is the final instruction; it returns control to the operating system.

14. Explain about Stack allocation.
By using relative addresses for storage in activation records, static allocation can become stack allocation.
In stack allocation, a register is used to store the position of the activation record, so words in activation records can be accessed as offsets from the value in this register.
The following code is needed to implement stack allocation:
(1) Initialization of Stack:

    MOV #stackstart, SP    /* initializes the stack */
    HALT                   /* terminates execution */

(2) Implementation of Call Statement:

    ADD #caller.recordsize, SP    /* increment stack pointer */
    MOV #here + 16, *SP           /* save return address */
    GOTO callee.code_area

where
caller.recordsize is the size of the activation record,
#here + 16 is the address of the instruction following the GOTO.
(3) Implementation of Return Statement:

    GOTO *0(SP)                   /* return to the caller */
    SUB #caller.recordsize, SP    /* decrement SP and restore it to its previous value */

Basic Block:
A basic block contains a sequence of statements. The flow of control enters at the beginning of the block and leaves at the end without any halt (except maybe at the last instruction of the block).
The following sequence of three-address statements forms a basic block:

    t1 := x * x
    t2 := x * y
    t3 := 2 * t2
    t4 := t1 + t3
    t5 := y * y
    t6 := t4 + t5

Basic block construction:
Input: A sequence of three-address statements.
Output: A list of basic blocks, with each three-address statement in exactly one block.
Method: First identify the leaders in the code. The rules for finding leaders are as follows:
(1) The first statement is a leader.
(2) Statement L is a leader if there is a conditional or unconditional goto statement such as: if ... goto L or goto L
(3) Instruction L is a leader if it immediately follows a goto or conditional goto statement such as: if ... goto B or goto B
For each leader, its basic block consists of the leader and all statements up to, but not including, the next leader or the end of the program.

Consider the following source code for the dot product of two vectors a and b of length 10:

    begin
        prod := 0;
        i := 1;
        do begin
            prod := prod + a[i] * b[i];
            i := i + 1;
        end
        while i <= 10
    end

The three-address code for the above source program is given below:

    B1:
    (1)  prod := 0
    (2)  i := 1
    B2:
    (3)  t1 := 4 * i
    (4)  t2 := a[t1]
    (5)  t3 := 4 * i
    (6)  t4 := b[t3]
    (7)  t5 := t2 * t4
    (8)  t6 := prod + t5
    (9)  prod := t6
    (10) t7 := i + 1
    (11) i := t7
    (12) if i <= 10 goto (3)

Basic block B1 contains the statements (1) to (2).
Basic block B2 contains the statements (3) to (12).

Flow Graph:
A flow graph is a directed graph. It contains the flow-of-control information for the set of basic blocks.
A control flow graph is used to show how program control is passed among the blocks. It is useful in loop optimization.
The flow graph for the vector dot product is as follows:

    B1:  prod := 0
         i := 1
    B2:  t1 := 4 * i
         t2 := a[t1]
         t3 := 4 * i
         t4 := b[t3]
         t5 := t2 * t4
         t6 := prod + t5
         prod := t6
         t7 := i + 1
         i := t7
         if i <= 10 goto B2

(Figure)
Block B1 is the initial node. Block B2 immediately follows B1, so there is an edge from B1 to B2.
The target of the jump from the last statement of B2 is the first statement of B2, so there is an edge from B2 to B2.
B2 is a successor of B1 and B1 is the predecessor of B2.

16. Explain about Optimization of Basic Blocks and types of Basic block.
The optimization process can be applied to a basic block.
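Before a block can be optimized it has to be found. The leader rules given earlier for partitioning code into basic blocks can be sketched as follows; this is a minimal illustration in which each statement is a hypothetical (text, jump_target) pair, with jump_target being the 0-based index of the statement jumped to, or None.

```python
def find_leaders(code):
    """Return the sorted indices of the leader statements."""
    leaders = {0}                       # rule 1: the first statement
    for i, (_, target) in enumerate(code):
        if target is not None:
            leaders.add(target)         # rule 2: the target of a goto
            if i + 1 < len(code):
                leaders.add(i + 1)      # rule 3: the statement after a goto
    return sorted(leaders)

def basic_blocks(code):
    """Each block runs from a leader up to, not including, the next leader."""
    leaders = find_leaders(code)
    bounds = leaders + [len(code)]
    return [list(range(bounds[k], bounds[k + 1]))
            for k in range(len(leaders))]

# Three-address code for the dot product; statement (3) has index 2.
code = [
    ("prod := 0", None), ("i := 1", None),
    ("t1 := 4 * i", None), ("t2 := a[t1]", None),
    ("t3 := 4 * i", None), ("t4 := b[t3]", None),
    ("t5 := t2 * t4", None), ("t6 := prod + t5", None),
    ("prod := t6", None), ("t7 := i + 1", None),
    ("i := t7", None), ("if i <= 10 goto (3)", 2),
]
blocks = basic_blocks(code)
print(blocks)
```

For the dot product code this gives two blocks, matching B1 (the first two statements) and B2 (the remaining ten).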
During optimization, the set of expressions computed by the block must not change.
There are two types of basic block optimization. They are as follows:
(1) Structure-Preserving Transformations
(2) Algebraic Transformations

(1) Structure-Preserving Transformations: The primary structure-preserving transformations on basic blocks are as follows:
(a) Common sub-expression elimination
(b) Dead code elimination
(c) Renaming of temporary variables
(d) Interchange of two independent statements

(a) Common Sub-expression Elimination: A common sub-expression does not need to be computed over and over again. Instead, you can compute it once and keep it in store, from where it is referenced when it is encountered again.

    a := b + c
    b := a - d
    c := b + c
    d := a - d

In the above block, the second and fourth statements compute the same expression. So the block can be transformed as follows:

    a := b + c
    b := a - d
    c := b + c
    d := b

(b) Dead-code Elimination: It is possible that a program contains a large amount of dead code. This can happen when variables are declared and defined once and never removed, even though they serve no purpose. Suppose the statement x := y + z appears in a block and x is a dead symbol, which means that it will never be used subsequently. Then you can safely remove this statement without changing the value of the basic block.

(c) Renaming Temporary Variables: A statement t := b + c can be changed to u := b + c, where t is a temporary variable and u is a new temporary variable. All the instances of t can be replaced with u without changing the value of the basic block.

(d) Interchange of Statements: Suppose a block has the following two adjacent statements:

    t1 := b + c
    t2 := x + y

These two statements can be interchanged without affecting the value of the block when the value of t1 does not affect the value of t2, that is, when neither x nor y is t1 and neither b nor c is t2.
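Common sub-expression elimination inside one basic block can be sketched as follows. This is only one possible approach: statements are hypothetical (dest, left, op, right) tuples, an expression counts as available only while its operands and the variable holding it stay unchanged, and commutativity (b + c versus c + b) is ignored.

```python
def eliminate_common_subexpressions(block):
    """Replace a recomputed expression by a copy of the variable holding it."""
    available = {}   # (op, left, right) -> variable that holds the value
    result = []
    for dest, left, op, right in block:
        key = (op, left, right)
        holder = available.get(key)
        if holder is not None:
            result.append(f"{dest} := {holder}")
        else:
            result.append(f"{dest} := {left} {op} {right}")
        # Assigning dest kills every expression that uses dest or is held in it.
        available = {k: v for k, v in available.items()
                     if dest not in (k[1], k[2]) and v != dest}
        # The new value stays available unless dest was one of its operands.
        if holder is None and dest not in (left, right):
            available[key] = dest
    return result


block = [("a", "b", "+", "c"),
         ("b", "a", "-", "d"),
         ("c", "b", "+", "c"),
         ("d", "a", "-", "d")]
print(eliminate_common_subexpressions(block))
```

Note that the third statement is not replaced: b is reassigned by the second statement, so b + c is no longer the value held in a. Only the fourth statement becomes a copy, d := b.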
(2) Algebraic Transformations: In an algebraic transformation, we can change the set of expressions into an algebraically equivalent set. Thus a statement such as x := x + 0 or x := x * 1 can be eliminated from a basic block without changing the set of expressions.
Constant folding is a class of related optimization. Here, at compile time, we evaluate constant expressions and replace them by their values. Thus the expression 5 * 2.7 would be replaced by 13.5.
Sometimes unexpected common sub-expressions are generated by relational operators like <=, >=, <, >, !=, = etc.
Sometimes associative laws are applied to expose common sub-expressions without changing the basic block value. If the source code has the assignments

    a := b + c
    e := c + d + b

the following intermediate code may be generated:

    a := b + c
    t := c + d
    e := t + b

If t is not needed outside the block, associativity and commutativity allow e to be computed as a + d, which reuses the value of b + c.

17. Explain about Machine-Independent Optimization.
Machine-independent optimization attempts to improve the intermediate code to get a better target code. The part of the code which is transformed here does not involve any absolute memory location or any CPU registers.
The process of intermediate code generation introduces many inefficiencies, such as using a variable instead of a constant, extra copies of variables, and repeated evaluation of an expression. Through code optimization, you can remove such inefficiencies and improve the code.
It can sometimes change the structure of the program beyond recognition: it unrolls loops, inlines functions, and eliminates some variables that are programmer defined.
Code optimization can be performed in the following different ways:
(1) Compile Time Evaluation:
    (a) z = 5*(45.0/5.0)*r
        Perform 5*(45.0/5.0) at compile time; only the multiplication by r remains for run time.
    (b) x = 5.7
        y = x/3.6
        Evaluate x/3.6 as 5.7/3.6 at compile time.
(2) Variable Propagation:
    Before optimization the code is:

        c = a * b
        x = a
        till
        d = x * b + 4

    After optimization the code is:

        c = a * b
        x = a
        till
        d = a * b + 4

    Here, after variable propagation, a*b and x*b are identified as a common sub-expression.
(3) Dead Code Elimination:
    Before elimination the code is:

        c = a * b
        x = b
        till
        d = a * b + 4

    After elimination the code is:

        c = a * b
        till
        d = a * b + 4

    Here, x = b is a dead statement because x is never subsequently used in the program. So we can eliminate this statement.
(4) Code Motion:
    It reduces the evaluation frequency of an expression.
    It brings loop-invariant statements out of the loop.

        do
        {
            item = 10;
            value = value + item;
        } while (value < 100);

    This code can be further optimized as

        item = 10;
        do
        {
            value = value + item;
        } while (value < 100);

(5) Induction Variable and Strength Reduction:
    Strength reduction is used to replace a high-strength operator by a low-strength one.
    An induction variable is a variable used in a loop with an assignment of the kind i = i + constant.
    Before reduction the code is:

        i = 1;
        while (i < 10)
        {
            y = i * 4;
            i = i + 1;
        }

    After reduction the code is:

        t = 4;
        while (t < 40)
        {
            y = t;
            t = t + 4;
        }

18. Explain about Loop optimization.
Loop optimization is the most valuable machine-independent optimization, because a program's inner loops take the bulk of its running time.
If we decrease the number of instructions in an inner loop then the running time of a program may be improved, even if we increase the amount of code outside that loop.
For loop optimization the following three techniques are important:
(1) Code motion
(2) Induction-variable elimination
(3) Strength reduction

(1) Code Motion: Code motion is used to decrease the amount of code in a loop. This transformation takes a statement or expression which can be moved outside the loop body without affecting the semantics of the program.
For example: In the while statement below, limit-2 is a loop-invariant expression.

    while (i <= limit-2)    /* statement does not change limit */

After code motion the result is as follows:

    a = limit-2;
    while (i <= a)          /* statement does not change limit or a */
(2) Induction-Variable Elimination : Induction-variable elimination is used to remove an induction variable from an inner loop. It can reduce the number of additions in a loop. It improves both code space and run-time performance.

[Figure: flow graph before induction-variable elimination]

In this figure, we can replace the assignment t4 := 4*j by t4 := t4-4 (this is valid when j decreases by 1 on each pass through the block). The only problem is that t4 does not have a value when we enter block B2 for the first time. So we place the assignment t4 := 4*j on entry to block B2.

(3) Reduction in Strength : Strength reduction is used to replace an expensive operation by a cheaper one on the target machine.

Addition of a constant is cheaper than multiplication. So we can replace a multiplication with an addition within the loop.

Multiplication is cheaper than exponentiation. So we can replace an exponentiation with a multiplication within the loop.

Example :

    while (i < 10)
    {
        j = 3*i + 1;
        a[j] = a[j] - 2;
        i = i + 2;
    }

After strength reduction the code will be :

    s = 3*i + 1;
    while (i < 10)
    {
        j = s;
        a[j] = a[j] - 2;
        i = i + 2;
        s = s + 6;
    }

In the above code, it is cheaper to compute s = s + 6 than j = 3*i + 1.

19. Explain about DAG representation for basic blocks in brief.

A DAG for a basic block is a directed acyclic graph with the following labels on nodes :
(1) The leaves of the graph are labeled by unique identifiers, which can be variable names or constants.
(2) Interior nodes of the graph are labeled by an operator symbol.
(3) Nodes are also given a sequence of identifiers as labels, to store the computed value.
(4) A DAG is a data structure used to implement transformations on basic blocks.
(5) A DAG provides a good way to determine common sub-expressions.
(6) It gives a pictorial representation of how the value computed by a statement is used in subsequent statements.

Algorithm for Construction of DAG :

Input : A basic block.

Output : A DAG containing the following information :
Each node contains a label. For leaves, the label is an identifier.
Each node contains a list of attached identifiers to hold the computed values.

The three-address statements have one of the following forms :
    Case (i)   x := y OP z
    Case (ii)  x := OP y
    Case (iii) x := y

Method :

Step 1 : If node(y) is undefined, create node(y). For case (i), if node(z) is undefined, create node(z).

Step 2 : For case (i), find a node(OP) whose left child is node(y) and whose right child is node(z); if no such node exists, create it. For case (ii), check whether there is a node(OP) with the single child node(y); if not, create it. In both cases let n be this node. For case (iii), node n will be node(y).

Step 3 : Delete x from the list of attached identifiers of node(x). Append x to the list of attached identifiers of the node n found in step 2. Finally set node(x) to n.

Example : Consider the following three-address statements :

    (1)  S1 := 4 * i
    (2)  S2 := a[S1]
    (3)  S3 := 4 * i
    (4)  S4 := b[S3]
    (5)  S5 := S2 * S4
    (6)  S6 := prod + S5
    (7)  prod := S6
    (8)  S7 := i + 1
    (9)  i := S7
    (10) if i <= 20 goto (1)

Stages in DAG Construction :

[Figures (a) to (h): successive stages of the DAG as each statement is processed]

For statement (3), the node for 4 * i already exists, hence identifier S3 is attached to the existing node. For statement (7), identifier prod is attached to the node created for statement (6).

20. Explain about Global data flow analysis.

To optimize the code efficiently, the compiler collects information about the whole program and distributes this information to each block of the flow graph. This process is known as data-flow analysis.

Certain optimizations can only be achieved by examining the entire program. They cannot be achieved by examining just a portion of the program.

For this kind of optimization, use-definition chaining is one particular problem. Here, at each use of a variable, we try to find out which definitions of that variable can reach the statement.

Based on local information alone, a compiler can perform some optimizations. For example, consider the following code :

    x = a + b;
    x = 6 * 3;

In this code, the first assignment to x is useless. The value computed for x is never used in the program.

At compile time the expression 6 * 3 will be computed, simplifying the second assignment statement to x = 18;
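The "useless first assignment" in the local example above can be found with a single backward scan over the basic block. The sketch below is a minimal illustration under stated assumptions: statements use a hypothetical (dest, op, arg1, arg2) form, the block is straight-line code, and dropped statements have no side effects. The function name drop_overwritten is made up for this example.

```python
def drop_overwritten(block):
    """Remove assignments whose target is assigned again before being read.

    block is a list of (dest, op, arg1, arg2) statements; operands are
    variable names (str) or constants. Only information local to the block
    is used, so the last assignment to every name is always kept.
    """
    kept = []
    # Names that are written later in the block with no read in between.
    overwritten_later = set()
    for stmt in reversed(block):
        dest, _op, a, b = stmt
        if dest in overwritten_later:
            continue                      # value is never read: drop it
        kept.append(stmt)
        overwritten_later.add(dest)       # earlier writes to dest are now dead...
        for operand in (a, b):            # ...unless this statement reads them
            if isinstance(operand, str):
                overwritten_later.discard(operand)
    kept.reverse()
    return kept


block = [
    ("x", "+", "a", "b"),   # x = a + b   (overwritten before any use)
    ("x", "*", 6, 3),       # x = 6 * 3
]
print(drop_overwritten(block))
```

Applied to the two statements of the example, only x = 6 * 3 survives; constant folding would then reduce it to x = 18. A statement such as x = x + 1 reads x before writing it, so an earlier assignment to x stays live in that case.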
Some optimizations need more global information. For example, consider the following code :

    a = 1;
    b = 2;
    c = 3;
    if (....) x = a + 5;
    else x = b + 4;
    c = x + 1;

In this code, the initial assignment at line 3 is useless, and the expression x + 1 can be simplified to 7.

But it is less obvious how a compiler can discover these facts by looking at only one or two consecutive statements. A more global analysis is required, so that at each point in the program the compiler knows things such as which variables are guaranteed to hold constant values and which variables will be used before being redefined.

Data flow analysis is used to discover this kind of property. The data flow analysis can be performed on the program's control flow graph (CFG).

The control flow graph of a program is used to determine those parts of the program to which a particular value assigned to a variable might propagate.

21. Explain about Code generation technique.

Code generation can be considered the final phase of compilation. Optimization can still be applied to the code after code generation, but that can be seen as part of the code generation phase itself. The code generated by the compiler is object code in some lower-level programming language, for example, assembly language. We have seen that source code written in a higher-level language is transformed into a lower-level language, resulting in lower-level object code, which should have the following minimum properties :
(1) It should carry the exact meaning of the source code.
(2) It should be efficient in terms of CPU usage and memory management.

We will now see how the intermediate code is transformed into target object code (assembly code, in this case).

Directed Acyclic Graph : A Directed Acyclic Graph (DAG) is a tool that depicts the structure of basic blocks, helps to see the flow of values among the basic blocks, and offers optimization too. A DAG provides easy transformation on basic blocks. A DAG can be understood as follows :
(1) Leaf nodes represent identifiers, names or constants.
(2) Interior nodes represent operators.
(3) Interior nodes also represent the results of expressions, or the identifiers/names where the values are to be stored or assigned.

Example :

    t0 = a + b
    t1 = t0 + c
    d = t0 + t1

[Figures: the DAG after t0 = a + b, after t1 = t0 + c, and after d = t0 + t1]

Peephole Optimization : This optimization technique works locally on the source code to transform it into optimized code. By locally, we mean a small portion of the code block at hand. These methods can be applied to intermediate code as well as to target code. A group of statements is analyzed and checked for the following possible optimizations :

Redundant instruction elimination : At source code level, the user can simplify a function step by step until only the essential computation remains, for example :

    int add_ten(int x)
    {
        return x + 10;
    }

At compilation level, the compiler searches for instructions that are redundant in nature. Multiple load and store instructions may carry the same meaning even if some of them are removed. For example :

    (1) MOV x, R0
    (2) MOV R0, R1

We can delete the first instruction and rewrite the sequence as :

    MOV x, R1

Unreachable Code : Unreachable code is a part of the program code that is never accessed because of programming constructs. Programmers may have accidentally written a piece of code that can never be reached.

Example :

    int add_ten(int x)
    {
        return x + 10;
        printf("value of x is %d", x);
    }

In this code segment, the printf statement will never be executed, as program control returns before it can execute; hence printf can be removed.

Flow of Control Optimization : There are instances in code where program control jumps back and forth without performing any significant task. These jumps can be removed. Consider the following chunk of code :

    ...
    MOV R1, R2
    GOTO L1
    ...
    L1: GOTO L2
    L2: INC R1

In this code, label L1 can be removed as it passes the control to L2.
So instead of jumping to L1 and then to L2, control can directly reach L2, as shown below :

    ...
    MOV R1, R2
    GOTO L2
    ...
    L2: INC R1

Algebraic Expression Simplification : There are occasions where algebraic expressions can be made simpler. For example, the statement a = a + 0 leaves a unchanged and can be dropped, and the statement a = a + 1 can simply be replaced by INC a.

Strength Reduction : There are operations that consume more time and space. Their 'strength' can be reduced by replacing them with other operations that consume less time and space but produce the same result.

For example, x * 2 can be replaced by x << 1, which involves only one left shift. Similarly, though a * a and a² produce the same result, a * a is much more efficient to implement.

Accessing Machine Instructions : The target machine can deploy more sophisticated instructions, which have the capability to perform specific operations.
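Several of the peephole rules described above (flow-of-control optimization, algebraic simplification and strength reduction) can be sketched as one pass over an instruction list. This is a rough illustration only: the tuple encoding, the LABEL pseudo-instruction and the ADD, MUL and SHL mnemonics are assumptions made for this example and do not describe a real instruction set.

```python
def peephole(code):
    """Apply a few peephole rewrites to a list of instruction tuples.

    Hypothetical instruction forms:
      ("LABEL", name), ("GOTO", name), ("ADD", var, k), ("MUL", var, k),
      plus any other tuples, which are passed through unchanged.
    """
    # A label immediately followed by GOTO just forwards control elsewhere.
    forwards = {}
    for cur, nxt in zip(code, code[1:]):
        if cur[0] == "LABEL" and nxt[0] == "GOTO":
            forwards[cur[1]] = nxt[1]

    out = []
    for ins in code:
        op = ins[0]
        if op == "GOTO":
            # Flow of control: follow chains of forwarding labels.
            target, seen = ins[1], set()
            while target in forwards and target not in seen:
                seen.add(target)          # guard against jump cycles
                target = forwards[target]
            out.append(("GOTO", target))
        elif op == "ADD" and ins[2] == 0:
            continue                      # a = a + 0 has no effect
        elif op == "ADD" and ins[2] == 1:
            out.append(("INC", ins[1]))   # a = a + 1  ->  INC a
        elif op == "MUL" and ins[2] == 2:
            out.append(("SHL", ins[1], 1))  # x * 2  ->  x << 1
        else:
            out.append(ins)
    return out


program = [
    ("MOV", "R1", "R2"),
    ("GOTO", "L1"),
    ("LABEL", "L1"),
    ("GOTO", "L2"),
    ("LABEL", "L2"),
    ("INC", "R1"),
    ("ADD", "a", 0),
    ("ADD", "a", 1),
    ("MUL", "x", 2),
]
for instruction in peephole(program):
    print(instruction)
```

The sketch only retargets jumps; deleting a label such as L1 once nothing refers to it any more would be a separate clean-up step, because the pass would first have to confirm that no other jump uses that label.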
