
System Software Unit I

1) System software includes operating systems, compilers, linkers and other programs that support the operation of a computer system. Application software focuses on solving problems. 2) The Simplified Instructional Computer (SIC) is designed to illustrate common hardware concepts while avoiding machine peculiarities. It has 5 registers, instructions in 24-bit format, and supports direct and indexed addressing modes. 3) The SIC/XE version expands memory to 1 megabyte and adds registers. It introduces 48-bit floating point data and new instruction formats.

Uploaded by

baby

SYSTEM SOFTWARE

UNIT I
Introduction
Software is a set of instructions or programs written to carry out certain tasks on
digital computers. It is classified into system software and application software. System software
consists of a variety of programs that support the operation of a computer. Application software
focuses on an application or problem to be solved.
Examples of system software are the operating system, compiler, assembler, macro
processor, loader or linker, debugger, text editor, database management systems (some of them)
and software engineering tools. These programs make it possible for the user to focus on an
application or other problem to be solved, without needing to know the details of how the
machine works internally.

SYSTEM SOFTWARE AND MACHINE ARCHITECTURE

One characteristic in which most system software differs from application software is
machine dependency.

System software supports the operation and use of a computer, whereas application software
provides the solution to a problem. An assembler translates mnemonic instructions into machine code,
so the instruction formats, addressing modes, etc., are of direct concern in assembler design.

Similarly, compilers must generate machine language code, taking into account such hardware
characteristics as the number and type of registers and the machine instructions available.
Operating systems are directly concerned with the management of nearly all of the resources of a
computing system.

Some aspects of system software do not directly depend upon the type of computing system:
the general design and logic of an assembler, the general design and logic of a compiler, and
code optimization techniques are independent of the target machine. Likewise, the process of
linking together independently assembled subprograms does not usually depend on the computer
being used.

THE SIMPLIFIED INSTRUCTIONAL COMPUTER (SIC)

 This machine has been designed to illustrate the most commonly encountered hardware
features and concepts, while avoiding most of the peculiarities that are often found in real
machines.
 SIC comes in two versions:
1. The standard model
2. An XE version (XE stands for "extra equipment", or "extra expensive")

NASC- Department of Computer Applications[System Software] Page 1


 The two versions have been designed to be upward compatible – that is, an object
program for the standard SIC machine will also execute properly on a SIC/XE system.

SIC MACHINE ARCHITECTURE

MEMORY
 Memory consists of 8-bit bytes; any 3 consecutive bytes form a word (24 bits).
 All addresses on SIC are byte addresses, words are addressed by the location of their
lowest numbered byte.
 There are a total of 32,768 bytes in the computer memory.

REGISTERS
 There are 5 registers, all of which have special uses.
 Each register is 24 bits in length.

Mnemonic  Number  Special use
A         0       Accumulator; used for arithmetic operations
X         1       Index register; used for addressing
L         2       Linkage register; the Jump to Subroutine (JSUB) instruction
                  stores the return address in this register
PC        8       Program counter; contains the address of the next instruction
                  to be fetched for execution
SW        9       Status word; contains a variety of information, including a
                  Condition Code (CC)

DATA FORMATS
 Integers are stored as 24-bit binary numbers, 2’s complement representation is used for
negative values.
 Characters are stored using their 8-bit ASCII codes.
 There is no floating-point hardware on the standard version of SIC.
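The memory and data-format rules above (byte addressing, 3-byte words addressed by their lowest-numbered byte, 24-bit two's complement) can be sketched in a few lines. This is a minimal illustration, not part of the SIC definition itself; it assumes the high-order byte of a word is stored at the lowest address, as in the object-code listings, and the helper names are hypothetical.

```python
# Minimal sketch of standard SIC memory (assumption: a word's high-order byte
# is stored at its lowest-numbered address, as in the object-code listings).
MEMORY_SIZE = 32768  # 2**15 bytes on standard SIC


def to_word(value: int) -> int:
    """Encode a signed int as a 24-bit two's-complement word."""
    if not -(1 << 23) <= value < (1 << 23):
        raise OverflowError(f"{value} does not fit in 24 bits")
    return value & 0xFFFFFF


def from_word(word: int) -> int:
    """Decode a 24-bit two's-complement word to a signed int."""
    return word - (1 << 24) if word & 0x800000 else word


def store_word(memory: bytearray, addr: int, value: int) -> None:
    """Store a word at byte address addr (the word's lowest-numbered byte)."""
    memory[addr:addr + 3] = to_word(value).to_bytes(3, "big")


def load_word(memory: bytearray, addr: int) -> int:
    """Load the 3-byte word whose lowest-numbered byte is at addr."""
    return from_word(int.from_bytes(memory[addr:addr + 3], "big"))


memory = bytearray(MEMORY_SIZE)
store_word(memory, 0x1000, -5)
print(memory[0x1000:0x1003].hex())  # fffffb
print(load_word(memory, 0x1000))    # -5
```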

INSTRUCTION FORMATS
 All machine instructions on the standard version of SIC have the following 24-bit format:
    8        1    15
    Opcode | x | Address
 The flag bit x is used to indicate indexed-addressing mode.

ADDRESSING MODES

 There are 2 addressing modes, selected by the value of the x bit.

Mode      Indication  Target address calculation
Direct    x = 0       TA = address
Indexed   x = 1       TA = address + (X)

 (X) denotes the contents of register X.

1. Direct addressing mode

LDA TEN

0000 0000 | 0 | 001 0000 0000 0000
Opcode      x   TEN
Object code (hex): 001000

Effective address (EA) = 1000
The word at address 1000 is loaded into the accumulator (register A).

2. Indexed addressing mode

STCH BUFFER, X

0101 0100 | 1 | 001 0000 0000 0000
Opcode      x   BUFFER
Object code (hex): 549000 (the x bit and the high-order 3 bits of the address form the hex digit 9)

Effective address (EA) = 1000 + (X)
                       = 1000 + contents of the index register X.
 STCH stores the character held in the rightmost byte of the accumulator at the effective address.
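The two encodings above can be reproduced by packing the opcode, x bit and 15-bit address into one 24-bit word. This is an illustrative sketch: the function names are hypothetical, and only the opcodes 00 (LDA) and 54 (STCH) from the examples are used.

```python
# Sketch of the standard SIC instruction format: 8-bit opcode, 1-bit x flag,
# 15-bit address. Opcode values 00 (LDA) and 54 (STCH) are taken from the examples.
LDA, STCH = 0x00, 0x54


def encode(opcode: int, x: int, address: int) -> int:
    """Pack the three fields into one 24-bit instruction word."""
    if not (0 <= opcode <= 0xFF and x in (0, 1) and 0 <= address < (1 << 15)):
        raise ValueError("field out of range")
    return (opcode << 16) | (x << 15) | address


def decode(word: int) -> tuple[int, int, int]:
    """Split a 24-bit instruction word into (opcode, x, address)."""
    return (word >> 16) & 0xFF, (word >> 15) & 1, word & 0x7FFF


def target_address(word: int, reg_x: int) -> int:
    """Direct mode: TA = address; indexed mode: TA = address + (X)."""
    _, x, address = decode(word)
    return address + reg_x if x else address


print(f"{encode(LDA, 0, 0x1000):06X}")    # 001000  (LDA TEN)
print(f"{encode(STCH, 1, 0x1000):06X}")   # 549000  (STCH BUFFER,X)
print(hex(target_address(0x549000, 5)))   # 0x1005
```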

INSTRUCTION SET
 This includes instructions that load and store registers:
LDA – load accumulator
LDX – load index register
STA – store accumulator
STX – store index register
 It also includes the integer arithmetic instructions ADD, SUB, MUL and DIV.
 All arithmetic operations involve register A and a word in memory, with the result being
left in register A.
 It also includes the instruction COMP, which compares the value in register A with a word in
memory and sets the condition code (CC) to indicate the result.
 Conditional jump instructions test the condition code:
JLT – jump if less than
JEQ – jump if equal
JGT – jump if greater than
 Two instructions are provided for subroutine linkage:
1. JSUB – jumps to the subroutine, placing the return address in register L.
2. RSUB – returns from the subroutine by jumping to the address contained in register L.

INPUT AND OUTPUT

 Input and output are performed by transferring 1 byte at a time to or from the rightmost 8
bits of register A.
 Each device is assigned a unique 8-bit code.
 There are 3 I/O instructions, each of which specifies the device code as an operand.
1. Test Device (TD):
 This instruction tests whether the addressed device is ready to send or receive
a byte of data.
 The condition code is set to indicate the result of this test: a setting of < means
the device is ready to send or receive, and = means the device is not ready.
2. Read Data (RD)
3. Write Data (WD)
 A program that needs to transfer data must wait until the device is ready, and then execute
a Read Data or Write Data instruction.
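The wait-then-transfer protocol can be simulated as follows. This is a hypothetical sketch: `FakeDevice` stands in for a real device and simply becomes ready on every `delay`-th test. The TD result uses the condition-code convention above ('<' ready, '=' not ready), and RD replaces only the rightmost byte of register A.

```python
# Hypothetical simulation of SIC programmed I/O: TD sets the condition code
# ('<' = ready, '=' = not ready); the program loops until ready, then RD
# transfers one byte into the rightmost 8 bits of register A.
class FakeDevice:
    """Stand-in for an input device; ready on every `delay`-th test."""

    def __init__(self, data: bytes, delay: int = 3):
        self.data, self.pos, self.delay, self.tests = data, 0, delay, 0

    def ready(self) -> bool:
        self.tests += 1
        return self.tests % self.delay == 0

    def read_byte(self) -> int:
        byte = self.data[self.pos]
        self.pos += 1
        return byte


def td(device: FakeDevice) -> str:
    """Test Device: return the resulting condition code."""
    return "<" if device.ready() else "="


def read_char(device: FakeDevice, reg_a: int) -> int:
    while td(device) == "=":        # TD / JEQ loop until the device is ready
        pass
    # RD: replace the rightmost byte of A with the byte read
    return (reg_a & 0xFFFF00) | device.read_byte()


dev = FakeDevice(b"AB")
a = read_char(dev, 0)
print(chr(a & 0xFF), dev.tests)   # A 3
```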

SIC/XE MACHINE ARCHITECTURE

MEMORY
 The maximum memory available is 1 megabyte.
 This increase leads to a change in instruction formats and addressing modes.

REGISTERS
 In addition to the registers of standard SIC, SIC/XE provides the following registers:

Mnemonic  Number  Special use
B         3       Base register; used for addressing
S         4       General working register
T         5       General working register
F         6       Floating-point accumulator (48 bits)

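DATA FORMATS
 The data formats are the same as in the standard SIC version.
 In addition, there is a 48-bit floating-point data type with the following format:
    1     11         36
    s | exponent | fraction
 The fraction is interpreted as a value between 0 and 1 (the assumed binary point is
immediately before the high-order bit).
 The exponent e is an unsigned binary number between 0 and 2047.
 The sign bit s indicates the sign of the number (0 = positive, 1 = negative), and the
magnitude of the number is $f \times 2^{(e-1024)}$, where f is the fraction.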
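The formula above can be checked by decoding a 48-bit pattern field by field. This is a sketch only: it uses Python's exact `Fraction` type so that the result of $f \times 2^{(e-1024)}$ is not blurred by host floating-point rounding, and the function names are hypothetical.

```python
from fractions import Fraction


def encode_sicxe_float(s: int, e: int, f: int) -> int:
    """Pack sign (1 bit), exponent (11 bits) and fraction (36 bits)."""
    return (s << 47) | (e << 36) | f


def decode_sicxe_float(bits: int) -> Fraction:
    """Value = (-1)**s * f * 2**(e - 1024), with f read as a binary fraction."""
    s = (bits >> 47) & 0x1
    e = (bits >> 36) & 0x7FF
    f = bits & ((1 << 36) - 1)
    fraction = Fraction(f, 1 << 36)        # binary point before the high-order bit
    magnitude = fraction * Fraction(2) ** (e - 1024)
    return -magnitude if s else magnitude


half = 1 << 35                             # fraction bits 1000...0 = 0.5
print(decode_sicxe_float(encode_sicxe_float(0, 1024, half)))   # 1/2
print(decode_sicxe_float(encode_sicxe_float(1, 1026, half)))   # -2
```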

Instruction Formats:
 SIC/XE introduces the following set of instruction formats.
 Format 1 (1 byte): contains only the operation code.
    8
    OP

 Format 2 (2 bytes): the first eight bits hold the operation code, the next four bits register 1,
and the following four bits register 2.
    8    4    4
    OP | R1 | R2

 Format 3 (3 bytes): the first 6 bits contain the operation code, the next 6 bits contain flags,
and the last 12 bits contain a displacement used to compute the address of the operand. Because the
operation code uses only 6 bits, the second hex digit of the instruction is affected by the values of
the first two flags (n and i). The flags, in order, are n, i, x, b, p and e. The last flag, e,
indicates the instruction format (e = 0 for format 3, e = 1 for format 4).
    6    1   1   1   1   1   1   12
    OP | n | i | x | b | p | e | disp

 Format 4 (4 bytes): same as format 3, but with an extra 2 hex digits (8 bits), giving a 20-bit
address field for addresses that require more than 12 bits to be represented.
    6    1   1   1   1   1   1   20
    OP | n | i | x | b | p | e | address
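The following sketch extracts the format 3/4 fields with bit masks. It also shows why the second hex digit changes: STL has opcode 14, but with n = i = 1 its first byte becomes 17. The two sample instructions come from the relocation listing later in this unit, and the function name is hypothetical.

```python
def decode_format34(code: bytes) -> dict:
    """Split a SIC/XE format 3 or format 4 instruction into its fields."""
    b0, b1 = code[0], code[1]
    fields = {
        "opcode": b0 & 0xFC,        # 6-bit opcode; low 2 bits of byte 0 are n and i
        "n": (b0 >> 1) & 1,
        "i": b0 & 1,
        "x": (b1 >> 7) & 1,
        "b": (b1 >> 6) & 1,
        "p": (b1 >> 5) & 1,
        "e": (b1 >> 4) & 1,
    }
    if fields["e"]:                 # format 4: 20-bit address
        fields["field"] = ((b1 & 0x0F) << 16) | (code[2] << 8) | code[3]
        fields["length"] = 4
    else:                           # format 3: 12-bit displacement
        fields["field"] = ((b1 & 0x0F) << 8) | code[2]
        fields["length"] = 3
    return fields


stl = decode_format34(bytes.fromhex("17202D"))      # STL RETADR
print(hex(stl["opcode"]), stl["n"], stl["i"], stl["p"], hex(stl["field"]))
# 0x14 1 1 1 0x2d
jsub = decode_format34(bytes.fromhex("4B101036"))   # +JSUB RDREC
print(hex(jsub["opcode"]), jsub["e"], hex(jsub["field"]))
# 0x48 1 0x1036
```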

Addressing mode
 Two new relative addressing modes are available for use with instructions assembled
using format 3.

Mode                      Indication   Target address calculation
Base relative             b=1, p=0     TA = (B) + disp     (0 ≤ disp ≤ 4095)
Program-counter relative  b=0, p=1     TA = (PC) + disp    (-2048 ≤ disp ≤ 2047)
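In PC-relative mode the 12-bit displacement is a signed (two's-complement) value, and PC already holds the address of the next instruction. In base-relative mode the displacement is unsigned. This sketch assumes those rules, and its function names are hypothetical. It applies them to STL RETADR (object code 17202D at address 0000), which gives TA = 0030.

```python
def sign_extend_12(disp: int) -> int:
    """Interpret a 12-bit field as a two's-complement displacement."""
    return disp - 0x1000 if disp & 0x800 else disp


def target_address(b: int, p: int, disp: int, reg_b: int, pc: int) -> int:
    """TA for the two format 3 relative modes."""
    if b == 1 and p == 0:            # base relative: disp is unsigned (0..4095)
        return reg_b + disp
    if b == 0 and p == 1:            # PC relative: disp is signed (-2048..2047)
        return pc + sign_extend_12(disp)
    raise ValueError("not a base- or PC-relative format 3 instruction")


# STL RETADR at 0000 (17202D): PC already points at the next instruction, 0003
print(hex(target_address(0, 1, 0x02D, reg_b=0, pc=0x0003)))   # 0x30
```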

Instruction Set
 SIC/XE provides all of the instructions that are available on the standard version.
 In addition, it provides:
1. Instructions to load and store the new registers: LDB, STB, etc.
2. Floating-point arithmetic operations: ADDF, SUBF, MULF, DIVF.
3. A register move instruction: RMO.
4. Register-to-register arithmetic operations: ADDR, SUBR, MULR, DIVR.
5. A supervisor call instruction: SVC, which generates an interrupt that can be used for
communication with the operating system.

Input and Output:


 There are I/O channels that can be used to perform input and output while the CPU is
executing other instructions.
 This allows computing and I/O to overlap, resulting in more efficient system operation.
 The instructions SIO, TIO and HIO are used to start, test and halt the operation of I/O
channels.

LOADERS AND LINKERS

 Loading brings the object program into memory for execution.
 Relocation modifies the object program so that it can be loaded at an address different
from the location originally specified.
 Linking combines two or more separate object programs and supplies the information needed
to allow references between them.
 A loader is a system program that performs the loading function.
 Many loaders also support relocation and linking.
 Some systems have a linker or linkage editor to perform the linking operation and a
separate loader to handle relocation and loading.
 Here, we often use the term loader in place of loader and/or linker.
BASIC LOADER FUNCTIONS

 The most fundamental function of a loader is bringing an object program into memory
and starting its execution.
Design of an absolute loader
 This loader does not need to perform functions such as linking and program relocation.
 All operations are done in a single pass.
 The Header record is checked to verify that the correct program has been presented for
loading.
 As each Text record is read, the object code from the Text record is moved to the
indicated address in memory.
 When the End record is encountered, the loader jumps to the specified address to begin
execution of the loaded program.
 The above figure shows a representation of the program from figure (a) after loading.
 The contents of memory locations for which there is no Text record are shown as xxx.

Absolute loader algorithm


begin
    read Header record
    verify program name and length
    read first Text record
    while record type <> 'E' do
        begin
            {if object code is in character form, convert into internal representation}
            move object code to specified location in memory
            read next object program record
        end
    jump to address specified in End record
end
 Although the above process is extremely simple, we have to consider the following
points.
1. Our object program is stored in hexadecimal format, i.e., each byte of assembled code
is given using its hexadecimal representation in character form.
Ex: The STL opcode is represented by the pair of characters "1" and "4". When the loader
reads these characters, they occupy two bytes of memory. During loading, they are stored as a
single byte with hexadecimal value 14.
2. This method of representing an object program is inefficient in terms of both
space and execution time.
3. Therefore, most machines store object programs in a binary form.
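The absolute loader algorithm, including the conversion of character pairs such as "14" into single bytes, can be sketched as below. The record layout is an assumption made for readability: fields are separated by `^`, and the Text record length is a byte count in hex. A real object program uses fixed columns instead. The function name and sample program are hypothetical.

```python
def absolute_load(records: list[str], memory: bytearray, expected_name: str) -> int:
    """Load an absolute object program and return the address to jump to."""
    kind, name, start, length = records[0].split("^")
    if kind != "H" or name.strip() != expected_name:
        raise ValueError("wrong program presented for loading")
    if int(start, 16) + int(length, 16) > len(memory):
        raise ValueError("program does not fit in memory")
    for record in records[1:]:
        kind, *rest = record.split("^")
        if kind == "E":
            return int(rest[0], 16)                  # transfer address
        if kind == "T":
            addr, count = int(rest[0], 16), int(rest[1], 16)
            code = bytes.fromhex(rest[2])            # "14" (2 chars) -> byte 0x14
            if len(code) != count:
                raise ValueError("Text record length mismatch")
            memory[addr:addr + count] = code         # move object code into memory
    raise ValueError("no End record found")


mem = bytearray(0x2000)
program = ["H^COPY^001000^000006", "T^001000^06^141033482039", "E^001000"]
entry = absolute_load(program, mem, "COPY")
print(hex(entry), mem[0x1000:0x1006].hex())   # 0x1000 141033482039
```

The single pass works because an absolute program needs no address adjustment: every byte goes exactly where the Text record says.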

A simple bootstrap loader

 When a computer is first turned on or restarted, a special type of absolute loader, called
a bootstrap loader, is executed.
 This bootstrap loads the first program to be run by the computer, usually an operating
system.
 The bootstrap itself begins at address 0.
 It loads the OS starting at address 80 (hexadecimal).
 There is no header record or control information; the object code is loaded into
consecutive bytes of memory.



BOOT    START   0       BOOTSTRAP LOADER FOR SIC/XE
.
.
        CLEAR   A       CLEAR REGISTER A TO ZERO
        LDX     #128    INITIALIZE REGISTER X TO HEX 80
LOOP    JSUB    GETC    READ HEX DIGIT FROM PROGRAM BEING LOADED
        RMO     A,S     SAVE IN REGISTER S
        SHIFTL  S,4     MOVE TO HIGH-ORDER 4 BITS OF BYTE
        JSUB    GETC    GET NEXT HEX DIGIT
        ADDR    S,A     COMBINE DIGITS TO FORM ONE BYTE
        STCH    0,X     STORE AT ADDRESS IN REGISTER X
        TIXR    X,X     ADD 1 TO MEMORY ADDRESS BEING LOADED
        J       LOOP    LOOP UNTIL END OF INPUT IS REACHED
GETC    TD      INPUT   TEST INPUT DEVICE
        JEQ     GETC    LOOP UNTIL READY
        RD      INPUT   READ CHARACTER
        COMP    #4      IF CHARACTER IS HEX 04 (END OF FILE),
        JEQ     80      JUMP TO START OF PROGRAM JUST LOADED
        COMP    #48     COMPARE TO HEX 30 (CHARACTER '0')
        JLT     GETC    SKIP CHARACTERS LESS THAN '0'
        SUB     #48     SUBTRACT HEX 30 FROM ASCII CODE
        COMP    #10     IF RESULT IS LESS THAN 10, CONVERSION IS
        JLT     RETURN  COMPLETE. OTHERWISE, SUBTRACT 7 MORE
        SUB     #7      (FOR HEX DIGITS 'A' THROUGH 'F')
RETURN  RSUB            RETURN TO CALLER
INPUT   BYTE    X'F1'   CODE FOR INPUT DEVICE
        END     LOOP
 The above source code is divided into 3 sections:
1. Header section
2. Loop
3. Subroutine GETC
1. Header section:
 The bootstrap itself begins at address 0 in the memory of the machine.
 It loads the operating system starting at address 80 (hexadecimal).
2. Loop section:
 The bootstrap reads object code from device F1 and enters it into memory starting at
address 80.
 After all the object code from device F1 has been loaded, the bootstrap executes a
jump to address 80 to begin execution of the program that was loaded.
 Register X contains the address of the next memory location to be loaded.
 Each byte is formed from two hex digits: the first digit is shifted into the high-order 4 bits
and combined with the second digit before being stored.
3. Subroutine GETC:
 GETC reads one character from device F1 and converts it from the
ASCII character code to the value of the hexadecimal digit.
 Ex: The ASCII code for the character "0" (hexadecimal 30) is converted to the
numeric value 0.
 Likewise, the ASCII codes for "1" through "9" (hexadecimal 31 through 39) are
converted to the numeric values 1 through 9, and the codes for "A" through "F"
(hexadecimal 41 through 46) are converted to the values 10 through 15.
 The subroutine GETC jumps to address 80 when an end-of-file (hexadecimal 04)
is read from device F1.
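The behaviour of the bootstrap can be mirrored in a short simulation, with device F1 replaced by a byte string. This sketch assumes the GETC rules described above (skip codes below '0', subtract hex 30, subtract 7 more for 'A' to 'F', stop at hex 04). Like the assembly version, it does not handle lowercase digits, and the function names are hypothetical.

```python
from typing import Optional


def getc(stream: list[int]) -> Optional[int]:
    """Mimic GETC: next hex-digit value, or None at end of file (hex 04)."""
    while stream:
        ch = stream.pop(0)
        if ch == 0x04:
            return None              # real bootstrap: JEQ 80
        if ch < 0x30:
            continue                 # skip characters less than '0'
        value = ch - 0x30            # SUB #48
        if value >= 10:
            value -= 7               # 'A'..'F' (41..46) -> 10..15
        return value
    return None


def bootstrap(device_f1: bytes, memory: bytearray, load_addr: int = 0x80) -> int:
    """Fill memory from load_addr with bytes built from pairs of hex digits."""
    stream = list(device_f1)
    x = load_addr                    # LDX #128
    while True:
        high = getc(stream)
        if high is None:
            return load_addr         # jump to start of program just loaded
        low = getc(stream)
        if low is None:
            return load_addr
        # SHIFTL S,4 / ADDR S,A; STCH 0,X stores only the rightmost byte
        memory[x] = ((high << 4) + low) & 0xFF
        x += 1                       # TIXR X,X


mem = bytearray(0x100)
entry = bootstrap(b"14 1033\n4820\x04", mem)
print(hex(entry), mem[0x80:0x85].hex())   # 0x80 1410334820
```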
MACHINE DEPENDENT LOADER FEATURES

 In this section we consider the design and implementation of a more complex loader that
is used on SIC/XE.
 This loader provides for program relocation and linking, as well as the simple loading
function.
RELOCATION

 The need for program relocation is an indirect consequence of the change to larger and more
powerful computers.
 The way relocation is implemented in a loader is also dependent upon machine
characteristics.
 Loaders that allow for program relocation are called relocating loaders or relative loaders.
 The two methods for specifying relocation are:
1. Relocation by modification records
2. Relocation by bit mask
Relocation by modification records

 A modification record is used to describe each part of the object program that must be changed
when the program is relocated.
Modification record
col 1: M
col 2-7: Starting address of the field to be modified, relative to the beginning of the
control section (hexadecimal)
col 8-9: Length of the field to be modified, in half-bytes (hexadecimal)
col 10: Modification flag (+/-)
col 11-16: External symbol whose value is to be added to or subtracted from the indicated
field
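A modification record can be applied by reading its fixed columns, adjusting only the low-order half-bytes it names, and leaving the rest of the bytes intact. This is a minimal sketch. It assumes that a field with an odd number of half-bytes begins in the middle of its first byte. The record `M00000705+COPY` is derived from line 15 of the listing that follows, the load address 5000 is hypothetical, and so is the function name.

```python
def apply_modification(record: str, memory: bytearray, base: int,
                       symbols: dict[str, int]) -> None:
    """Apply one Modification record to a program loaded at `base`."""
    addr = int(record[1:7], 16)          # cols 2-7: start of field (relative)
    half_bytes = int(record[7:9], 16)    # cols 8-9: length in half-bytes
    sign = record[9]                     # col 10: '+' or '-'
    symbol = record[10:16].strip()       # cols 11-16: symbol whose value is used
    delta = symbols[symbol] if sign == "+" else -symbols[symbol]
    nbytes = (half_bytes + 1) // 2       # an odd length starts mid-byte
    loc = base + addr
    word = int.from_bytes(memory[loc:loc + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1   # only the field's half-bytes change
    word = (word & ~mask) | ((word + delta) & mask)
    memory[loc:loc + nbytes] = word.to_bytes(nbytes, "big")


mem = bytearray(0x6000)
mem[0x5006:0x500A] = bytes.fromhex("4B101036")       # +JSUB RDREC at 0006
apply_modification("M00000705+COPY", mem, 0x5000, {"COPY": 0x5000})
print(mem[0x5006:0x500A].hex().upper())              # 4B106036
```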
 The following SIC/XE program is used to illustrate relocation.
Line  Loc   Source statement              Object code
5     0000  COPY    START   0
10    0000  FIRST   STL     RETADR        17202D
. . .
15    0006  CLOOP   +JSUB   RDREC         4B101036
. . .
35    0013          +JSUB   WRREC         4B10105D
. . .
65    0026          +JSUB   WRREC         4B10105D
. . .
115         .  SUBROUTINE TO READ RECORD INTO BUFFER
. . .
125   1036  RDREC   CLEAR   X             B410
. . .
200         .  SUBROUTINE TO WRITE RECORD FROM BUFFER
. . .
210   105D  WRREC   CLEAR   X             B410

 Most of the instructions in the above program use relative or immediate addressing.
 The instructions on lines 15, 35 and 65 contain actual addresses (they use the extended
format), so their values are affected by relocation.
 The following is an object program corresponding to the above source program.
 There is one modification record for each instruction that must be changed during
relocation (3 modification records, for the instructions on lines 15, 35 and 65).
 Each modification record specifies the starting address and length of the field whose
value is to be altered.
 For example, the record for line 15 is M00000705+COPY: the field starts at 000007 (just after
the opcode byte at 0006) and is 05 half-bytes long (the 20-bit address). When a field occupies an
odd number of half-bytes, it is assumed to begin in the middle of the first byte.
 In the above example, all modifications add the value of the symbol COPY, which
represents the starting address of the program.



Drawbacks:

 This method is not well suited to a machine that does not use relative addressing, such as
the standard SIC, which uses mainly direct addressing.
 On such a machine, nearly every instruction must be modified when the program is relocated.
 This requires many modification records, so the object program becomes much larger.
Relocation by bitmask

 Relocation by bitmask is mainly used on a machine that primarily uses direct
addressing and has a fixed instruction format.
 The standard SIC program is used to illustrate this method.
 The below figure shows the object program with relocation by bitmask.
 Here, there are no modification records.
 Each Text record contains a relocation bit associated with each word of object code.
 All SIC instructions occupy one word.
 So there is one relocation bit for each possible instruction.
 The relocation bits are gathered together into a bit mask.
 In the above figure, the mask is represented in character form as three hexadecimal digits.
 These characters are underlined.
 A bit value of 0 indicates that no modification is necessary.
 A bit value of 1 indicates that the program's starting address is to be added to this word
when the program is relocated.
 In the above example, the bit mask FFC in the first Text record specifies that all 10
words (instructions) of object code are to be modified during relocation.
 The mask E00 in the second Text record specifies that the first three words are to be
modified.
FFC = 1111 1111 1100: the first 10 bits are 1, so 10 words are modified
E00 = 1110 0000 0000: the first 3 bits are 1, so 3 words are modified
 The other Text records follow the same pattern.
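Bit-mask relocation walks the words of a Text record and tests one mask bit per word, starting from the high-order bit. This is a sketch that assumes the record holds only whole 3-byte words (at most 12 of them, matching a 3-hex-digit mask). The sample words and the load address 2000 are hypothetical, and so is the function name.

```python
def relocate_text_record(code: bytes, mask_hex: str, load_addr: int) -> bytes:
    """Relocate the words of one SIC Text record using a 3-hex-digit bit mask."""
    mask = int(mask_hex, 16)              # 12 relocation bits, first word = high bit
    if len(code) % 3 or len(code) // 3 > 12:
        raise ValueError("sketch expects at most 12 whole 3-byte words")
    out = bytearray()
    for index in range(len(code) // 3):
        word = int.from_bytes(code[3 * index:3 * index + 3], "big")
        if (mask >> (11 - index)) & 1:    # bit = 1: add the program's start address
            word = (word + load_addr) & 0xFFFFFF
        out += word.to_bytes(3, "big")
    return bytes(out)


first_words = bytes.fromhex("141033482039001036")
print(relocate_text_record(first_words, "C00", 0x2000).hex())   # 143033484039001036
```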

PROGRAM LINKING

 In this section we look at more complex examples of external references between
programs and examine the relationship between relocation and linking.
 Consider the following 3 separate programs, each consisting of a single control section.
Loc   Source statement                Object code



0000  PROGA   START   0
              EXTDEF  LISTA, ENDA
              EXTREF  LISTB, ENDB, LISTC, ENDC
.
.
0020  REF1    LDA     LISTA           03201D
0023  REF2    +LDT    LISTB+4         77100004
0027  REF3    LDX     #ENDA-LISTA     050014
.
.
0040  LISTA   EQU     *
.
.
0054  ENDA    EQU     *
              END     REF1

Loc Source statement Object code


0000  PROGB   START   0
              EXTDEF  LISTB, ENDB
              EXTREF  LISTA, ENDA, LISTC, ENDC
.
.
0036  REF1    +LDA    LISTA           03100000
003A  REF2    LDT     LISTB+4         772027
003D  REF3    +LDX    #ENDA-LISTA     05100000
.
.
0060  LISTB   EQU     *
.
.
0070  ENDB    EQU     *
              END     REF2

Loc Source statement Object code


0000  PROGC   START   0
              EXTDEF  LISTC, ENDC
              EXTREF  LISTA, ENDA, LISTB, ENDB
.
.
0018  REF1    +LDA    LISTA           03100000
001C  REF2    +LDT    LISTB+4         77100004
0020  REF3    +LDX    #ENDA-LISTA     05100000
.
.
0030  LISTC   EQU     *
.
.
0042  ENDC    EQU     *
              END     REF3
 Each program contains a list of items: LISTA, LISTB and LISTC.
 The ends of the lists are marked by the labels ENDA, ENDB and ENDC.
 REF1, REF2 and REF3 are references to these symbols; some are local and some are external.
REFERENCES REF1, REF2 AND REF3:

1. REF1:
 In the first program (PROGA), REF1 is simply a (PC-relative) reference to a label within the
program.
 No modification for relocation or linking is necessary.
 In PROGB, REF1 refers to the external symbol LISTA.
 Here the extended format instruction is used (+ sign), with the address field set to 0.
 A modification record is necessary so that the value of LISTA is added at load time.
 In PROGC, REF1 is handled in the same way as in PROGB.
2. REF2:
 In PROGA and PROGC, REF2 refers to the external symbol LISTB; the assembler places the
constant 4 in the address field, and a modification record adds the value of LISTB.
 In PROGB, REF2 is a local (PC-relative) reference, so no modification or linking is
necessary.
3. REF3:
 REF3 is an immediate operand whose value is to be the difference between ENDA and LISTA.
 In PROGA, both symbols are local, so the assembler computes the value (0014) and no
modification or linking is necessary.
 In PROGB and PROGC, both symbols are external, so the address field is set to 0 and two
modification records are needed: one adds the value of ENDA and the other subtracts the
value of LISTA.
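Resolving these references amounts to adding or subtracting symbol values in the 20-bit address field of each extended-format instruction. The sketch below uses the load addresses LISTA = 4040, ENDA = 4054 and LISTB = 40C3 from the ESTAB example in the next subsection. The helper `resolve` is hypothetical and only models the arithmetic, not the record handling.

```python
ESTAB = {"LISTA": 0x4040, "ENDA": 0x4054, "LISTB": 0x40C3}   # from the ESTAB example


def resolve(instruction: int, mods: list[tuple[str, str]], estab: dict[str, int]) -> int:
    """Apply (+/-, symbol) adjustments to the 20-bit address field of a format 4 word."""
    field = instruction & 0xFFFFF
    for sign, symbol in mods:
        field = field + estab[symbol] if sign == "+" else field - estab[symbol]
    return (instruction & ~0xFFFFF) | (field & 0xFFFFF)


# REF3 in PROGB: +LDX #ENDA-LISTA, assembled as 05100000
print(f"{resolve(0x05100000, [('+', 'ENDA'), ('-', 'LISTA')], ESTAB):08X}")  # 05100014
# REF2 in PROGA: +LDT LISTB+4, assembled as 77100004
print(f"{resolve(0x77100004, [('+', 'LISTB')], ESTAB):08X}")                 # 771040C7
```

After linking, PROGB's REF3 holds the same immediate value (0014) that the assembler computed directly in PROGA.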

Algorithm and data structure for a linking loader

 Linking loader usually makes two passes over its input.


Pass 1: Assign addresses to all external symbols
Pass 2: Perform the actual loading, relocation, and linking
Data structures
 ESTAB (External Symbol Table)
 Two variables:
PROGADDR (program load address)
CSADDR (control section address)
ESTAB:
1. Used to store the name and address of each external symbol in the set of control sections
being loaded.
2. The table also indicates in which control section each symbol is defined.
Control section  Symbol name  Address  Length
PROGA                         4000     0063
                 LISTA        4040
                 ENDA         4054
PROGB                         4063     007F
                 LISTB        40C3
                 ENDB         40D3
PROGC                         40E2     0051
                 LISTC        4112
                 ENDC         4124
PROGADDR:
 It is the beginning address in memory where the linked program is to be loaded.
 Its value is supplied to the loader by the operating system.

CSADDR:
 It contains the starting address assigned to the control section currently being scanned by the
loader.
 This address is added to all relative addresses within the control section to convert them
to actual addresses.
PASS 1 ALGORITHM:

begin
    get PROGADDR from operating system
    set CSADDR to PROGADDR {for first control section}
    while not end of input do
        begin
            read next input record {Header record for control section}
            set CSLTH to control section length
            search ESTAB for control section name
            if found then
                set error flag {duplicate external symbol}
            else
                enter control section name into ESTAB with value CSADDR
            while record type != 'E' do
                begin
                    read next input record
                    if record type = 'D' then
                        for each symbol in the record do
                            begin
                                search ESTAB for symbol name
                                if found then
                                    set error flag {duplicate external symbol}
                                else
                                    enter symbol into ESTAB with value
                                        (CSADDR + indicated address)
                            end {for}
                end {while != 'E'}
            add CSLTH to CSADDR {starting address for next control section}
        end {while not EOF}
end {Pass 1}
 During Pass 1 the loader uses only the Header and Define record types in the control sections.
 The beginning address (PROGADDR) becomes the starting address (CSADDR) for the
first control section in the input sequence.
 The control section name from the Header record and all external symbols from the Define
records are entered into ESTAB.
 When the End record is read, the control section length CSLTH is added to CSADDR.
 This calculation gives the starting address for the next control section.
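Pass 1 can be sketched directly from the algorithm above. The record representation is an assumption for illustration: each record is a tuple (`("H", name, length)`, `("D", [(symbol, relative address), ...])`, `("E",)`), and record types that Pass 1 ignores are simply skipped. With PROGADDR = 4000, the lengths and Define records of PROGA, PROGB and PROGC reproduce the ESTAB table above.

```python
def pass1(records: list[tuple], progaddr: int) -> dict[str, int]:
    """Build ESTAB: control-section names and D-record symbols -> addresses."""
    estab: dict[str, int] = {}
    csaddr = progaddr                  # first control section starts at PROGADDR
    cslth = 0
    for rec in records:
        kind = rec[0]
        if kind == "H":
            _, name, cslth = rec
            if name in estab:
                raise ValueError(f"duplicate external symbol {name}")
            estab[name] = csaddr
        elif kind == "D":
            for symbol, rel_addr in rec[1]:
                if symbol in estab:
                    raise ValueError(f"duplicate external symbol {symbol}")
                estab[symbol] = csaddr + rel_addr
        elif kind == "E":
            csaddr += cslth            # starting address of the next control section
    return estab


records = [
    ("H", "PROGA", 0x63), ("D", [("LISTA", 0x40), ("ENDA", 0x54)]), ("E",),
    ("H", "PROGB", 0x7F), ("D", [("LISTB", 0x60), ("ENDB", 0x70)]), ("E",),
    ("H", "PROGC", 0x51), ("D", [("LISTC", 0x30), ("ENDC", 0x42)]), ("E",),
]
estab = pass1(records, 0x4000)
print({name: f"{addr:04X}" for name, addr in estab.items()})
# {'PROGA': '4000', 'LISTA': '4040', 'ENDA': '4054', 'PROGB': '4063', 'LISTB': '40C3',
#  'ENDB': '40D3', 'PROGC': '40E2', 'LISTC': '4112', 'ENDC': '4124'}
```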

PASS 2 ALGORITHM:

begin
    set CSADDR to PROGADDR
    set EXECADDR to PROGADDR
    while not end of input do
        begin
            read next input record {Header record}
            set CSLTH to control section length
            while record type != 'E' do
                begin
                    read next input record
                    if record type = 'T' then
                        begin
                            {if object code is in character form, convert
                             into internal representation}
                            move object code from record to location
                                (CSADDR + specified address)
                        end {if 'T'}
                    else if record type = 'M' then
                        begin
                            search ESTAB for modifying symbol name
                            if found then
                                add or subtract symbol value at location
                                    (CSADDR + specified address)
                            else
                                set error flag {undefined external symbol}
                        end {if 'M'}
                end {while != 'E'}
            if an address is specified {in End record} then
                set EXECADDR to (CSADDR + specified address)
            add CSLTH to CSADDR
        end {while not EOF}
    jump to location given by EXECADDR {to start execution of loaded program}
end {Pass 2}
 Pass 2 of the loader performs the actual loading, relocation and linking of the program.
 As each Text record is read, the object code is moved to the specified address (plus CSADDR).
 When a Modification record is encountered, the symbol whose value is to be used for
modification is looked up in ESTAB.
 This value is then added to or subtracted from the indicated location in memory.
 The last step performed by the loader is to transfer control to the loaded program
to begin execution.
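A matching sketch of Pass 2 follows, under the same assumed tuple representation: `("T", relative address, bytes)`, `("M", relative address, half-bytes, sign, symbol)`, and `("E", relative execution address or None)`. The sample input holds only REF2 of PROGA and REF3 of PROGB, with their modification records. The ESTAB values are those from the table above.

```python
def pass2(records: list[tuple], estab: dict[str, int], progaddr: int,
          memory: bytearray) -> int:
    """Load Text records, apply Modification records, and return EXECADDR."""
    csaddr = execaddr = progaddr
    cslth = 0
    for rec in records:
        kind = rec[0]
        if kind == "H":
            cslth = rec[2]
        elif kind == "T":
            _, rel, code = rec
            loc = csaddr + rel
            memory[loc:loc + len(code)] = code
        elif kind == "M":
            _, rel, half_bytes, sign, symbol = rec
            if symbol not in estab:
                raise ValueError(f"undefined external symbol {symbol}")
            delta = estab[symbol] if sign == "+" else -estab[symbol]
            loc, nbytes = csaddr + rel, (half_bytes + 1) // 2
            mask = (1 << (4 * half_bytes)) - 1
            word = int.from_bytes(memory[loc:loc + nbytes], "big")
            word = (word & ~mask) | ((word + delta) & mask)
            memory[loc:loc + nbytes] = word.to_bytes(nbytes, "big")
        elif kind == "E":
            if rec[1] is not None:          # address given in End record
                execaddr = csaddr + rec[1]
            csaddr += cslth
    return execaddr


records = [
    ("H", "PROGA", 0x63), ("T", 0x23, bytes.fromhex("77100004")),
    ("M", 0x24, 5, "+", "LISTB"), ("E", 0x20),
    ("H", "PROGB", 0x7F), ("T", 0x3D, bytes.fromhex("05100000")),
    ("M", 0x3E, 5, "+", "ENDA"), ("M", 0x3E, 5, "-", "LISTA"), ("E", None),
]
estab = {"PROGA": 0x4000, "LISTA": 0x4040, "ENDA": 0x4054,
         "PROGB": 0x4063, "LISTB": 0x40C3, "ENDB": 0x40D3}
mem = bytearray(0x5000)
entry = pass2(records, estab, 0x4000, mem)
print(hex(entry), mem[0x4023:0x4027].hex().upper(), mem[0x40A0:0x40A4].hex().upper())
# 0x4020 771040C7 05100014
```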

MACHINE INDEPENDENT LOADER FEATURES


AUTOMATIC LIBRARY SEARCH
 An automatic library search process is used for handling external references.
 This feature allows a programmer to use standard subroutines without explicitly
including them in the program to be loaded.
 The subroutines are automatically retrieved from a library as they are needed during
linking.
Automatic library call
 The subroutines called by the loaded program are automatically taken from the library,
linked with the main program and loaded.
 The programmer does not need to take any action other than mentioning the subroutine
names as external references in the source program.
 This feature is referred to as automatic library call.
Handling external references
 Linking loaders that support automatic library search must keep track of external
symbols that are referred to but not defined.
 Loaders can handle such external references in the following way.
1. Enter the symbols from each Refer record into the symbol table (ESTAB) unless these
symbols are already present.
2. Mark these symbols as undefined; when a definition is encountered, fill in the address.
3. At the end of Pass 1, the symbols in ESTAB that remain undefined indicate
unresolved external references.
4. The loader searches the libraries that contain the definitions of these unresolved
symbols and processes the subroutines found by this search.
5. The subroutines taken from a library in this way may themselves contain external
references, so it is necessary to repeat the library search process until all references
are resolved.
6. After the library search, any remaining unresolved external references are treated as
errors.
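Steps 3 to 6 form a loop that keeps pulling routines from the library until no new references appear. In this hypothetical sketch, a library is a mapping from routine name to a pair: the symbols the routine defines and the symbols it refers to. The routine names STDDEV, SQRT, MEAN and PLOT are invented for illustration.

```python
def library_search(defined: set[str], undefined: set[str],
                   library: dict[str, tuple[set[str], set[str]]]) -> tuple[list[str], set[str]]:
    """Repeat the library search until no pending reference can be satisfied."""
    defined = set(defined)
    pending = set(undefined) - defined
    loaded: list[str] = []
    while pending:
        # find a library routine that defines at least one pending symbol
        match = next((name for name, (defs, _) in library.items()
                      if name not in loaded and defs & pending), None)
        if match is None:
            break                              # what remains is reported as errors
        defs, refs = library[match]
        loaded.append(match)
        defined |= defs
        pending = (pending | refs) - defined   # the routine may add new references
    return loaded, pending


LIB = {
    "STDDEV": ({"STDDEV"}, {"SQRT", "MEAN"}),
    "SQRT": ({"SQRT"}, set()),
    "MEAN": ({"MEAN"}, set()),
}
loaded, unresolved = library_search({"MAIN"}, {"STDDEV", "PLOT"}, LIB)
print(loaded, unresolved)   # ['STDDEV', 'SQRT', 'MEAN'] {'PLOT'}
```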
Search of libraries using file structure
 The libraries to be searched by the loader mainly contain assembled or compiled versions
of the subroutines (i.e., object programs).
1. A special file structure is used for the library search.
2. This structure contains a directory.
3. The directory contains the name of each subroutine and a pointer to its address within
the file.
4. Searching a library with this file structure involves a search of the directory, followed by
reading the object program (subroutine) that was found.
LOADER OPTIONS
 Many loaders have a special command language that is used to specify options.
 The following are some of the loader options that can be selected at the time of loading and
linking.
SPECIFYING ALTERNATIVE SOURCES OF INPUT
 This loader option allows the selection of alternative sources of input.
Ex: INCLUDE program-name (library-name)
 The above command makes the loader read the given object program from a library
and treat it as part of the primary loader input.
CHANGING OR DELETING EXTERNAL REFERENCES
 Using this option it is also possible to change external references within the programs
being loaded or linked.
Ex: CHANGE name1, name2
 The above command changes the external symbol name1 to name2.
 Some options allow the user to delete external symbols or entire control sections.
Ex: DELETE CS-name
 The above command deletes the control section CS-name from the loaded program.
Ex: INCLUDE READ (UTLIB)
INCLUDE WRITE (UTLIB)
DELETE RDREC, WRREC
CHANGE RDREC, READ
CHANGE WRREC, WRITE
1. The above commands make the loader include the control sections READ and WRITE
from the library UTLIB.
2. They delete the control sections RDREC and WRREC from the load.
3. The first CHANGE command causes all external references to the symbol RDREC
to be changed to refer to the symbol READ.
4. Similarly, references to WRREC are changed to WRITE.
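The effect of these commands can be modelled on a simplified view of the load. This is a hypothetical sketch, not a real loader's command processor: a load is just a mapping from control-section name to its list of external references, and a library maps routine names to their references. The example replays the five commands above on a program COPY that refers to RDREC and WRREC.

```python
import re


def apply_commands(load: dict[str, list[str]],
                   libraries: dict[str, dict[str, list[str]]],
                   commands: list[str]) -> dict[str, list[str]]:
    """Apply INCLUDE / DELETE / CHANGE to a simplified model of the load."""
    load = {name: list(refs) for name, refs in load.items()}
    for command in commands:
        verb, _, args = command.strip().partition(" ")
        if verb == "INCLUDE":                       # INCLUDE name (library)
            m = re.fullmatch(r"\s*(\w+)\s*\((\w+)\)\s*", args)
            if m is None:
                raise ValueError(f"bad INCLUDE: {command}")
            name, lib = m.group(1), m.group(2)
            load[name] = list(libraries[lib][name])
        elif verb == "DELETE":                      # DELETE cs1, cs2, ...
            for name in (a.strip() for a in args.split(",")):
                load.pop(name, None)
        elif verb == "CHANGE":                      # CHANGE old, new
            old, new = (a.strip() for a in args.split(","))
            for refs in load.values():
                refs[:] = [new if r == old else r for r in refs]
        else:
            raise ValueError(f"unknown loader command {verb}")
    return load


load = {"COPY": ["RDREC", "WRREC"], "RDREC": [], "WRREC": []}
libraries = {"UTLIB": {"READ": [], "WRITE": []}}
cmds = ["INCLUDE READ (UTLIB)", "INCLUDE WRITE (UTLIB)", "DELETE RDREC, WRREC",
        "CHANGE RDREC, READ", "CHANGE WRREC, WRITE"]
result = apply_commands(load, libraries, cmds)
print(result)   # {'COPY': ['READ', 'WRITE'], 'READ': [], 'WRITE': []}
```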
CONTROLLING AUTOMATIC PROCESSING OF EXTERNAL REFERENCES
 A common loader option involves the automatic inclusion of library subroutines to
satisfy external references.
 Most loaders allow the user to specify alternative libraries to be searched.
Ex: LIBRARY MYLIB
 Such user-specified libraries are normally searched before the standard
system libraries.
 The user can also specify external references that are not to be resolved by library search.
Ex: NOCALL STDDEV, CORREL
 The above command instructs the loader that these external references are to remain
unresolved.
LOADER DESIGN OPTIONS
 Two alternative design options for linking loaders are:
1. Linkage editors
2. Dynamic linking
LINKAGE EDITORS
 It is found on many computing systems instead of, or in addition to, the linking loader.
 It performs linking prior to load time.
 A linkage editor produces a linked version of the program, which is written to a file or
library instead of being immediately loaded into memory.
 A linked program is also called a load module or an executable image.
 When the user is ready to run the linked program, a simple relocating loader can be used
to load the program into memory.
DIFFERENCE BETWEEN A LINKAGE EDITOR AND A LINKING LOADER

Linkage editor:
1. Performs linking prior to load time.
2. The linked program is written to a file or library instead of being immediately loaded
into memory.
3. Loading can be accomplished in one pass, with no external symbol table required.
4. Resolving external references and library searching are performed only once, which is an
advantage when a program is executed many times without being reassembled.

Linking loader:
1. Performs all linking and relocation at load time.
2. Loads the linked program directly into memory for execution.
3. An external symbol table is required.
4. Searches libraries and resolves external references every time the program is executed.

LINKED PROGRAM
 The linked program produced by the linkage editor is processed by a relocating loader.
 All external references are resolved, and relocation is indicated by some mechanism such as
modification records or a bit mask.
 Even though all linking has been performed, information about external references is
often retained in the linked program.
 This allows relinking of the program to replace control sections, modify external
references, etc.
FUNCTIONS OF LINKAGE EDITORS
 A linkage editor can perform many useful functions under the control of editor commands.
For example:
 Assume that a program, PLANNER, uses many subroutines.
 One of the subroutines, PROJECT, must be changed to a new version.
 After the new version of PROJECT is assembled or compiled, the linkage editor is used
to replace this subroutine in the program (PLANNER).
 The following linkage editor commands are used to perform the above work.
INCLUDE PLANNER (PROGLIB)
DELETE PROJECT {Delete from existing planner}
INCLUDE PROJECT (NEWLIB) {Include new version}
REPLACE PLANNER (PROGLIB)
 A linkage editor can also be used to build packages of subroutines or other control
sections that are generally used together.
 It combines the related subroutines into a package using editor commands.
Ex:
INCLUDE BLOCK (FTNLIB)
INCLUDE DEBLOCK (FTNLIB)
INCLUDE ENCODE (FTNLIB)
INCLUDE DECODE (FTNLIB)
.
.
SAVE FTNIO (SUBLIB)
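 In the above command sequence, all the subroutines are linked into a single module
named FTNIO.
 This module is saved in the library SUBLIB.
 A search of SUBLIB before FTNLIB will retrieve FTNIO instead of the separate routines.
 Because FTNIO already has the cross-references between its subroutines resolved, these
linkages are not reprocessed each time a user's program is linked, which saves time.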
DYNAMIC LINKING
 With dynamic linking, the loading and linking of a subroutine to the program occur when
the subroutine is first called.
 Here, the linking function is postponed until execution time.
 This scheme is called dynamic linking, dynamic loading or load on call.
 Loading & linking of a subroutine using dynamic linking
 The following figure shows a method in which subroutines that are dynamically loaded
must be called through an operating system service request.

 In the above figure, the user program makes a load-and-call service request to the
operating system.
 The OS then checks its internal tables to determine whether the subroutine is already
loaded.
 If the subroutine is not loaded, it is loaded from the system libraries, as shown in the
figure below.

 Control is then passed from the dynamic loader to the subroutine being called.



 When the called subroutine completes its processing, it returns to its caller.
 The OS then returns control to the user program that made the request.

 After the subroutine is completed, the memory that was allocated for it is retained for
later use, as long as the storage space is not needed for other processing.
 If the subroutine is still in memory, a second call to it does not require another load
operation.



 Here control is simply passed from the dynamic loader to the called routine.
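The load-and-call sequence described above can be simulated in C. Everything here is hypothetical: `load_and_call` stands in for the OS service request, the `library` table plays the role of the OS internal tables, and "loading" a routine is modelled only by setting a flag, whereas a real dynamic loader would read object code from the system library into memory.

```c
#include <string.h>

/* Hypothetical routines available in the "system library". */
static int square(int x) { return x * x; }
static int negate(int x) { return -x; }

struct lib_entry {
    const char *name;
    int (*code)(int);   /* stands in for the routine's object code */
    int loaded;         /* OS internal table: already in memory? */
};

static struct lib_entry library[] = {
    { "SQUARE", square, 0 },
    { "NEGATE", negate, 0 },
};

static int load_count;  /* number of simulated load operations */

/* Simulated load-and-call service request: load the routine if needed,
   pass control to it, then return to the caller. Returns 0 and stores the
   routine's result in *result, or -1 if the routine is unknown. */
int load_and_call(const char *name, int arg, int *result)
{
    for (size_t i = 0; i < sizeof library / sizeof library[0]; i++) {
        if (strcmp(library[i].name, name) != 0)
            continue;
        if (!library[i].loaded) {      /* not in memory: load it first */
            library[i].loaded = 1;
            load_count++;
        }
        *result = library[i].code(arg);  /* control passes to the routine */
        return 0;                        /* OS returns to the user program */
    }
    return -1;
}

int loads_performed(void) { return load_count; }
```

A second request for SQUARE finds the routine already marked as loaded, so no further load is counted, matching the behaviour described for a routine that is still in memory.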

USES OF DYNAMIC LINKING


1. Dynamic linking is used to allow several executing programs to share one copy of a
subroutine or library.
2. In an object-oriented system, dynamic linking is often used for references to software
objects.
3. Dynamic linking is used to load the subroutines only when they are needed.
 This avoids loading subroutines that are not actually called during a particular
execution.
 So, dynamic linking can save both time and memory space.
BOOTSTRAP LOADER

 How is the loader itself loaded into memory? When the computer is started, there is no
program in memory. A program stored in ROM at a fixed (absolute) address can then be
executed; it may be the OS itself or a bootstrap loader, which in turn loads the OS and
prepares it for execution.
 The first record (or records) of such a program is generally referred to as a bootstrap
loader; it causes the OS to be loaded.
 Such a loader is added to the beginning of all object programs that are to be loaded into
an empty and idle system.
 On some computers, an absolute program is permanently resident in a read-only memory (ROM).
 When some hardware signal occurs, the machine begins to execute this ROM program.
 On some computers, the program is executed directly in ROM; on others, the program is
copied from ROM to main memory and executed there.
 Some machines do not have such read-only storage.



 If the loading process requires more instructions than can be read in a single record, this
first record causes the reading of others, and these in turn can cause the reading of still
more records; hence the term bootstrap.
 The first record is generally referred to as a bootstrap loader.
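The bootstrap idea can be illustrated with a small simulation. It is a sketch under stated assumptions: the object program arrives as a string of hexadecimal characters (two per byte), memory is a 4096-byte array, and loading starts at a fixed address assumed here to be 0x80. A real bootstrap would jump to the loaded code when it finishes; this sketch only reports how many bytes it stored.

```c
#include <stddef.h>

#define MEM_SIZE   4096
#define LOAD_START 0x80              /* assumed fixed load address */

static unsigned char memory[MEM_SIZE];   /* simulated main memory */

/* Value of one hexadecimal character, or -1 if it is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Simulated bootstrap: pack each pair of hex digits into one byte and store
   the bytes consecutively from LOAD_START. Returns the number of bytes
   loaded, or -1 on malformed input or memory overflow. */
int bootstrap(const char *input)
{
    size_t addr = LOAD_START;
    while (input[0] != '\0') {
        int hi = hexval(input[0]);
        if (hi < 0) return -1;
        int lo = hexval(input[1]);   /* input[1] is at worst the '\0' terminator */
        if (lo < 0 || addr >= MEM_SIZE) return -1;
        memory[addr++] = (unsigned char)(hi * 16 + lo);
        input += 2;
    }
    return (int)(addr - LOAD_START);
}

/* Inspect simulated memory. */
unsigned char peek(size_t addr) { return addr < MEM_SIZE ? memory[addr] : 0; }
```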
UNIT II
INTRODUCTION TO COMPILERS
TRANSLATOR
 A translator is a program that takes as input a program in one programming language &
produces as output a program in another language.
Need for Translator
 Machine language is difficult to learn & use, so a translator becomes vital.
COMPILER
 A compiler is a translator that takes the whole high-level language program & converts it
to machine language.
Steps involved in executing a program written in a high level programming language
1. The source program is compiled, i.e. translated into an object program.
2. The resulting object program is loaded into memory & executed.
INTERPRETER
 It is a translator program that translates & executes the high-level language program
line by line.
 Execution of an interpreted program is much slower than execution of a compiled program.
DIFFERENT PHASES OF A COMPILER

 The compilation process is a sequence of various phases.


 Each phase takes input from its previous stage, has its own representation of source
program, and feeds its output to the next phase of the compiler.
LEXICAL ANALYSIS
 In lexical analysis, the lexical analyzer or scanner reads the source program and
separates it into tokens.
 It is the first phase of the compiler.
 The usual tokens are:



1. Keywords: such as DO or IF.
2. Identifiers: such as x or num.
3. Operator symbols: such as <, = or +.
4. Punctuation symbols: such as parentheses or commas.
 The output of lexical analysis is a stream of tokens, which is passed to the next
phase: the syntax analyzer or parser.
 The parser asks the lexical analyzer for the next token whenever it needs.
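A scanner for the token classes listed above might look like the following C sketch. The keyword set (DO, IF), the operator and punctuation characters, the extra number class and the name `next_token` are assumptions for illustration. `next_token` returns one token per call, which fits the way the parser asks for the next token whenever it needs one.

```c
#include <ctype.h>
#include <string.h>

enum token_kind { TK_END, TK_KEYWORD, TK_IDENT, TK_NUMBER, TK_OPERATOR, TK_PUNCT, TK_ERROR };

static const char *keywords[] = { "DO", "IF" };   /* assumed keyword set */

/* Scan the next token at *src, copy its text into buf (capacity cap > 0),
   advance *src past it and return its kind. */
enum token_kind next_token(const char **src, char *buf, size_t cap)
{
    const char *p = *src;
    while (isspace((unsigned char)*p)) p++;        /* skip white space */
    const char *start = p;
    enum token_kind kind;

    if (*p == '\0') {
        kind = TK_END;
    } else if (isalpha((unsigned char)*p)) {       /* identifier or keyword */
        while (isalnum((unsigned char)*p)) p++;
        kind = TK_IDENT;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p)) p++;
        kind = TK_NUMBER;
    } else if (p[0] == ':' && p[1] == '=') {       /* two-character operator := */
        p += 2;
        kind = TK_OPERATOR;
    } else if (strchr("<=+-*/", *p)) {
        p++;
        kind = TK_OPERATOR;
    } else if (strchr("(),;", *p)) {
        p++;
        kind = TK_PUNCT;
    } else {
        p++;
        kind = TK_ERROR;                           /* character not in the language */
    }

    size_t len = (size_t)(p - start);
    if (len >= cap) len = cap - 1;                 /* truncate overly long tokens */
    memcpy(buf, start, len);
    buf[len] = '\0';
    if (kind == TK_IDENT)                          /* keywords look like identifiers */
        for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
            if (strcmp(buf, keywords[i]) == 0) kind = TK_KEYWORD;
    *src = p;
    return kind;
}
```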
Syntax analysis
 In syntax analysis, the syntax analyzer or parser groups tokens into syntactic structures.
 For example, the three tokens A, + and B can be grouped into the expression A+B.
 Expressions might further be combined to form statements.
 If a token is an identifier, the type of the identifier is entered into the symbol table by
the syntax analyzer.
 It checks whether the tokens occur in patterns that are permitted by the specification of
the source language.
 On seeing invalid syntax, the parser detects an error situation.
 For Eg: if the program has an expression
A+/B
 On seeing the “/”, the syntax analyzer will detect an error situation.
The output of the syntax analyzer is a parse tree.
 For Ex: The statement
A:=B+C
 can be represented using the following parse tree:
assignment statement
    identifier
        A
    :=
    expression
        expression
            identifier
                B
        +
        expression
            identifier
                C
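Syntax checking of this kind is often done by recursive descent. The C sketch below is only a recognizer for a tiny assumed grammar (single-letter identifiers, binary operators + - * /, one assignment form); it reports whether the input is valid but does not build the parse tree. It rejects A+/B at the "/", because an identifier is required there.

```c
#include <ctype.h>

/* Assumed grammar, for illustration only:
     statement  -> identifier ":=" expression
     expression -> identifier { operator identifier }
     operator   -> "+" | "-" | "*" | "/"
   Identifiers are single letters; blanks are ignored. */

static const char *pos;                  /* current position in the input */

static void skip_spaces(void) { while (*pos == ' ') pos++; }

static int identifier(void)
{
    skip_spaces();
    if (isalpha((unsigned char)*pos)) { pos++; return 1; }
    return 0;                            /* syntax error: identifier expected */
}

static int is_operator(char c) { return c == '+' || c == '-' || c == '*' || c == '/'; }

static int expression(void)
{
    if (!identifier()) return 0;
    skip_spaces();
    while (is_operator(*pos)) {
        pos++;
        if (!identifier()) return 0;     /* e.g. "A+/B": '/' found, identifier expected */
        skip_spaces();
    }
    return 1;
}

/* 1 if s is a valid expression, 0 otherwise. */
int parse_expression(const char *s)
{
    pos = s;
    if (!expression()) return 0;
    skip_spaces();
    return *pos == '\0';
}

/* 1 if s is a valid assignment statement, 0 otherwise. */
int parse_assignment(const char *s)
{
    pos = s;
    if (!identifier()) return 0;
    skip_spaces();
    if (pos[0] != ':' || pos[1] != '=') return 0;
    pos += 2;
    if (!expression()) return 0;
    skip_spaces();
    return *pos == '\0';
}
```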

Intermediate code generation


 The intermediate code generator transforms the parse tree into an intermediate language
representation of the source program.
 The preceding parse tree can be converted into the following three-address code:
T1=B+C
A=T1
Where T1 is a temporary variable.
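Generating three-address code from the parse tree can be sketched as a post-order walk that creates a new temporary for each interior node. The node layout, the temporary naming scheme T1, T2, ... and the function `gen_assignment` are assumptions for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Expression tree node: a leaf holds a variable name (op == 0);
   an interior node holds an operator and two children. */
struct node {
    char op;
    const char *name;
    const struct node *left, *right;
};

static char out[256];       /* generated three-address code */
static int temp_count;      /* last temporary number used */

/* Generate code for n and write the name that holds its value into result. */
static void gen(const struct node *n, char *result, size_t cap)
{
    if (n->op == 0) {                               /* leaf: the variable itself */
        snprintf(result, cap, "%s", n->name);
        return;
    }
    char l[16], r[16];
    gen(n->left, l, sizeof l);                      /* operands first (post-order) */
    gen(n->right, r, sizeof r);
    snprintf(result, cap, "T%d", ++temp_count);     /* new temporary */
    size_t used = strlen(out);
    snprintf(out + used, sizeof out - used, "%s=%s%c%s\n", result, l, n->op, r);
}

/* Three-address code for "target := expr". */
const char *gen_assignment(const char *target, const struct node *expr)
{
    char r[16];
    out[0] = '\0';
    temp_count = 0;
    gen(expr, r, sizeof r);
    size_t used = strlen(out);
    snprintf(out + used, sizeof out - used, "%s=%s\n", target, r);
    return out;
}
```

For the tree of A:=B+C this produces the two lines T1=B+C and A=T1.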
Code optimization
 It is designed to improve the intermediate code, so that the final object program runs
faster and takes less space.
 The output of the code optimizer is another intermediate code that does the same job as
the previous intermediate code, but more efficiently.
 For example, the code optimizer could reduce the preceding three-address code to A=B+C.
1. Optimizing compiler
 An object program that is executed frequently should be fast & small.
 An optimizing compiler attempts to produce a better target program than would
be produced with no optimization.
2. Local optimization
 Local transformations can be applied to small sections of a program.
 Ex: The statements
If A>B goto L2
Goto L3
L2:
can be replaced by
If A<=B goto L3
3. Elimination of common sub-expression
 Common sub-expression may be eliminated from the program
 For Ex: Consider the following sequence of statements:
A=B+C+D
E=B+C+F
Which can be evaluated as
T1=B+C
A=T1+D
E=T1+F
4. Loop optimization
 Loops deserve special attention when speeding up a program: computations that do
not vary from one iteration to the next can be moved out of the loop to increase the
speed of execution.
 Eg:
for(i=1; i<10; i++)
{
m=1;
.
.
}
can be replaced by
m=1;
for(i=1; i<10; i++)
{
.
.
}
Code generation
 Code generator generates the object code.
Functions of the code generator
1. Code selection



2. Register selection
Main responsibility of the code generator
 The code generation phase converts the intermediate code into a sequence of target
machine instructions.
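A very simple code generator for the three-address code above can be sketched for a SIC-style accumulator machine, using the instructions LDA, ADD, SUB, MUL, DIV and STA. Code selection here is only a lookup on the operator, and register selection is trivial because every value passes through register A. The `struct tac` layout and the name `gen_sic` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* One three-address instruction: result = arg1 op arg2 (op == 0 means copy). */
struct tac { char op; const char *arg1, *arg2, *result; };

/* Translate n three-address instructions into SIC-style accumulator code,
   one assembler instruction per line, written into out (capacity cap). */
void gen_sic(const struct tac *code, int n, char *out, size_t cap)
{
    size_t used = 0;
    out[0] = '\0';
    for (int i = 0; i < n && used < cap; i++) {
        const struct tac *t = &code[i];
        const char *mnemonic = NULL;            /* code selection by operator */
        switch (t->op) {
        case '+': mnemonic = "ADD"; break;
        case '-': mnemonic = "SUB"; break;
        case '*': mnemonic = "MUL"; break;
        case '/': mnemonic = "DIV"; break;
        default:  break;                        /* plain copy */
        }
        int w = mnemonic
            ? snprintf(out + used, cap - used, "LDA %s\n%s %s\nSTA %s\n",
                       t->arg1, mnemonic, t->arg2, t->result)
            : snprintf(out + used, cap - used, "LDA %s\nSTA %s\n",
                       t->arg1, t->result);
        if (w < 0) break;                       /* encoding error */
        used += (size_t)w;                      /* loop stops if output was truncated */
    }
}
```

For T1=B+C followed by A=T1 it emits LDA B, ADD C, STA T1, LDA T1, STA A. The redundant STA T1 / LDA T1 pair is exactly what a smarter register selector, or the optimizer's A=B+C, would remove.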
Semantic analysis
 Semantic analysis can be done during the syntax analysis, intermediate code generation or
final code generation phase.
 It analyses whether the statements are meaningful.
Function
 It is used to determine whether the type of each intermediate result is legal.
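A type check on an intermediate result can be sketched as follows. It assumes a language with only INTEGER and REAL arithmetic types and the common rule that mixing them converts the INTEGER operand to REAL; the enum and the function `check_arith` are hypothetical.

```c
/* Assumed type system: two arithmetic types plus an error marker. */
enum type { T_INTEGER, T_REAL, T_ERROR };

/* Semantic check for "left op right": returns the type of the intermediate
   result and sets *convert to 1 when an INTEGER operand must first be
   converted to REAL. */
enum type check_arith(enum type left, enum type right, int *convert)
{
    *convert = 0;
    if (left == T_ERROR || right == T_ERROR)
        return T_ERROR;          /* an operand is already illegal */
    if (left == right)
        return left;             /* same type: no conversion needed */
    *convert = 1;                /* mixed INTEGER and REAL */
    return T_REAL;
}
```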

COMPILER DESIGN OPTIONS


1. Division into passes
2. Interpreters
3. P-code compilers
4. Compiler-compilers
Division into passes
 A language may be translated by a one-pass compiler or by a multi-pass compiler.
 Ex: In some languages, the declaration of an identifier (variable) may appear after it has
been used in the program.
 This creates forward references, and a forward reference to a data item causes a
particularly serious problem.
Ex: x:=y+z
 In the above example, if the variables are a mixture of REAL & INTEGER types, one or
more conversion operations will be needed.
 The compiler cannot decide what machine instructions to generate for this statement
unless information about the identifiers is available.
 Thus, a language that allows forward references to data items cannot be compiled in one
pass; it needs a multi-pass compiler.
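The forward-reference problem can be made concrete with a small two-pass sketch. Its hypothetical statement records are either a declaration of one name or the assignment target := a + b. Pass 1 records every declaration, wherever it appears; only then can pass 2 decide, for each assignment, whether a type conversion is needed. A one-pass compiler would reach x:=y+z before any of the types were known.

```c
#include <string.h>

/* Hypothetical statement records. */
enum kind { DECL, ASSIGN };
struct stmt {
    enum kind kind;
    const char *names[3];    /* DECL: names[0]; ASSIGN: target, a, b */
    int is_real;             /* DECL only: 1 = REAL, 0 = INTEGER */
};

#define MAX_SYMS 16
static const char *sym_name[MAX_SYMS];
static int sym_real[MAX_SYMS], nsyms;

static int lookup(const char *name)
{
    for (int i = 0; i < nsyms; i++)
        if (strcmp(sym_name[i], name) == 0) return i;
    return -1;
}

/* Returns the number of assignments that mix REAL and INTEGER operands,
   or -1 if some identifier is never declared. */
int two_pass(const struct stmt *s, int n)
{
    nsyms = 0;
    for (int i = 0; i < n; i++)                      /* pass 1: declarations */
        if (s[i].kind == DECL && nsyms < MAX_SYMS) {
            sym_name[nsyms] = s[i].names[0];
            sym_real[nsyms++] = s[i].is_real;
        }
    int conversions = 0;
    for (int i = 0; i < n; i++) {                    /* pass 2: assignments */
        if (s[i].kind != ASSIGN) continue;
        int reals = 0;
        for (int k = 0; k < 3; k++) {
            int j = lookup(s[i].names[k]);
            if (j < 0) return -1;                    /* undeclared identifier */
            reals += sym_real[j];
        }
        if (reals != 0 && reals != 3) conversions++; /* mixture of types */
    }
    return conversions;
}
```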
Factors used in deciding between one pass & Multi-pass compiler design
1. If the speed of compilation is important, a one-pass design may be preferred.
Ex: In some environments, programs are compiled often but executed only once or twice,
so a large share of the system's time is spent on compilation.
In such an environment, improving the speed of compilation can improve overall system
performance.
2. If programs are executed many times for each compilation, or if they process large
amounts of data, then speed of execution becomes more important than speed of
compilation. In such a case, we might prefer a multi-pass compiler design that can devote
more effort to code optimization.
3. Multi-pass compilers are also used when the amount of memory or other system
resources is severely limited.
 The requirements of each pass can be kept smaller if the work of compilation is divided
into several passes.
 If a compiler is divided into several passes, each pass becomes simpler and therefore
easier to understand, write and test.
 Different passes can be assigned to different programmers and can be written and tested
in parallel, which shortens the overall time required for compiler construction.
Interpreters



 It is a translator program that translates & executes the high-level language program
line by line.
 An interpreter processes a source program written in a high-level language, just as a
compiler does.
 The main difference is that an interpreter executes a version of the source program
directly, instead of translating it into machine code.
Working of interpreters
1. An interpreter usually performs lexical and syntactic analysis functions like those of a
compiler, and then translates the source program into an internal form.
2. After translating the source program into an internal form, the interpreter executes the
operations specified by the program.
3. During this pass, an interpreter can be viewed as a set of subroutines.
4. The execution of these subroutines is driven by the internal form of the program.
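The working of an interpreter described above can be sketched in C. The internal form (one record per statement "dest := a op b", variables A..Z identified by index) and the dispatch table are assumptions for illustration; the point is that execution is a loop that calls one subroutine per operation, driven by the internal form.

```c
/* Hypothetical internal form produced by the analysis phase. */
enum opcode { OP_ADD, OP_SUB, OP_MUL, OP_COPY };
struct instr { enum opcode op; int dest, a, b; };   /* variable indices 0..25 */

static int vars[26];                                /* variables A..Z */

/* One subroutine per operation. */
static void do_add(const struct instr *i)  { vars[i->dest] = vars[i->a] + vars[i->b]; }
static void do_sub(const struct instr *i)  { vars[i->dest] = vars[i->a] - vars[i->b]; }
static void do_mul(const struct instr *i)  { vars[i->dest] = vars[i->a] * vars[i->b]; }
static void do_copy(const struct instr *i) { vars[i->dest] = vars[i->a]; }

/* Indexed by enum opcode. */
static void (*const handler[])(const struct instr *) = { do_add, do_sub, do_mul, do_copy };

/* Execution pass: the internal form drives which subroutine runs next. */
void interpret(const struct instr *prog, int n)
{
    for (int pc = 0; pc < n; pc++)
        handler[prog[pc].op](&prog[pc]);
}

void set_var(char name, int value) { vars[name - 'A'] = value; }
int  get_var(char name)            { return vars[name - 'A']; }
```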

Factors to decide between interpreter and compiler


1. The process of translating a source program into some internal form is simpler & faster
than compiling it into machine code.
2. However, execution of the translated program by an interpreter is much slower than
execution of the machine code produced by a compiler.
3. Thus, an interpreter would not be used if speed of execution is important.
4. If speed of translation is of primary concern and execution of the translated program will
be short, then an interpreter may be preferred.
Advantages of interpreter
1. It provides good debugging facilities.
2. Some languages are particularly well suited to the use of an interpreter.
Ex:
 In languages such as SNOBOL & APL, a large part of the compiled program
would consist of calls to library routines.
 In such cases, an interpreter might be preferred because of its speed of translation.
3. It would be very difficult to compile some languages that use dynamic scoping instead of
usual static scoping. However, dynamic scoping can be easily handled by an interpreter.
p-code compilers
 P-code compilers (also called bytecode compilers) are very similar in concept to
interpreters.
 With a p-code compiler, the program is analyzed & converted into an intermediate form,
which is then executed interpretively.
 However, with a p-code compiler this intermediate form is the machine language for a
hypothetical computer, often called a pseudo-machine or p-machine.
Translation & execution using p-code compiler
1. The source program is compiled, with the resulting object program being in p-code.
2. This p-code program is then read & executed under the control of a p-code interpreter.
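A p-code interpreter is essentially a loop over the instructions of the pseudo-machine. The stack-based instruction set below (PUSH, LOAD, STORE, ADD, MUL, HALT) is hypothetical and has no error or bounds checking; it only shows how a p-code object program can run on any machine that has such an interpreter.

```c
/* Hypothetical p-machine instruction set (stack machine). */
enum { P_PUSH, P_LOAD, P_STORE, P_ADD, P_MUL, P_HALT };

/* Execute a p-code program. code[] holds opcodes, each followed by its
   operand where one is needed; mem[] is the p-machine's data memory. */
void run_pcode(const int *code, int *mem)
{
    int stack[64], sp = 0, pc = 0;
    for (;;) {
        switch (code[pc++]) {
        case P_PUSH:  stack[sp++] = code[pc++];          break;  /* constant */
        case P_LOAD:  stack[sp++] = mem[code[pc++]];     break;  /* variable */
        case P_STORE: mem[code[pc++]] = stack[--sp];     break;
        case P_ADD:   sp--; stack[sp - 1] += stack[sp];  break;
        case P_MUL:   sp--; stack[sp - 1] *= stack[sp];  break;
        case P_HALT:  return;
        default:      return;     /* unknown opcode: stop */
        }
    }
}
```

For example, A := (B + C) * 4 with B, C and A at data addresses 0, 1 and 2 becomes the p-code LOAD 0, LOAD 1, ADD, PUSH 4, MUL, STORE 2, HALT.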



Advantages
1. The main advantage of this approach is portability of software. It is not necessary for the
compiler to generate different code for different computers, because the p-code object
programs can be executed on any machine that has a p-code interpreter.
2. A p-code compiler can be used without modification on a wide variety of systems if a p-
code interpreter is written for each different machine.
3. The p-code object program is often much smaller than a corresponding machine code
program would be. This is particularly useful on machines with limited memory size.
Problem
 The execution of a p-code program may be much slower than the execution of the
equivalent machine code.
 Depending upon the environment however, this may not be a problem.

Solution
1. Many p-code compilers are designed for a single user running on a microcomputer
system. In that case, speed of execution may be relatively insignificant.
2. If execution speed is important some p-code compilers support the use of machine
language subroutines.
 By rewriting a small number of commonly used routines in machine language, it is often
possible to achieve some improvements in performance.

Compiler-compilers
 A compiler-compiler is a software tool that can be used to help in the task of compiler
construction.
 Such tools are often called compiler generators or translator writing systems.

Automated compiler construction using a compiler-compiler


1. The user (i.e., the compiler writer) provides a description of the language to be translated.
2. This description may consist of a set of lexical rules for defining tokens & a grammar
for the source language.



3. Some compiler-compilers use this information to generate a scanner & a parser directly.
4. In addition to the description of the source language, the user provides a set of semantic
or code-generation routines.
5. A semantic routine is called by the parser each time it recognizes the language construct
described by the associated rule.
6. However, some compiler-compilers can parse a larger section of the program before calling
a semantic routine.
7. In that case, an internal form of the statements that have been analyzed such as a portion
of the parse tree may be passed to the semantic routine.
8. This latter approach is often used when code optimization is to be performed.
 Compiler-compilers frequently provide special languages, notations, data structure and
other similar facilities that can be used in the writing of semantic routines.
Advantage
1. The main advantage of using a compiler-compiler is ease of compiler construction
& testing.
2. The object code generated by the compiler may actually be better when a compiler-
compiler is used.
 Because of the automatic construction of scanners and parsers and the special tools
provided for writing semantic routines, the compiler writer is freed from many of the
mechanical details of compiler construction.
 The writer can therefore focus more attention on good code generation & optimization.
MACHINE INDEPENDENT COMPILER FEATURES
The four machine-independent compiler features are,
1. Structured variables
2. Storage allocation
3. Block-structured languages
4. Machine independent code optimization
Structured variables
 Programs may use structured variables such as arrays, records, strings &
sets.
 We are primarily concerned with the allocation of storage for such variables & with the
generation of code to reference them.
Storage allocation for variables
Single dimensional array declaration
Ex: A: ARRAY[1..10] of INTEGER // Pascal array declaration
 If each INTEGER variable occupies one word of memory, then we must clearly allocate
ten words to store the above array.
 If an array is declared as,
ARRAY [l..u] of INTEGER
 Then we must allocate u-l+1 words of storage for the array
Multi dimensional array declaration
 Allocation for a multi-dimensional array is not much more difficult
Ex: B: ARRAY [0..3,1..6] OF INTEGER // 4 rows, 6 columns
 Here the first subscript can take four different values (0-3) and the second subscript can
take six different values.
 We need to allocate a total of 4*6 = 24 words to store the array.
 If the declaration is,
ARRAY[l1..u1, l2..u2] of INTEGER
 Then the number of words to be allocated is given by,
(u1-l1+1)*(u2-l2+1)
 For an array with n dimensions, the number of words required is product of n such terms.
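The storage formula generalizes directly to n dimensions: multiply the number of values each subscript can take. A minimal C sketch, assuming one word per element as in the text; `array_words` is a hypothetical name.

```c
/* Words needed for ARRAY[l1..u1, ..., ln..un] with one word per element:
   the product of (u_i - l_i + 1) over all n dimensions. */
long array_words(int n, const int *lower, const int *upper)
{
    long words = 1;
    for (int i = 0; i < n; i++)
        words *= (long)(upper[i] - lower[i] + 1);
    return words;
}
```

For B: ARRAY[0..3, 1..6] this gives 4*6 = 24 words, and for A: ARRAY[1..10] it gives 10.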
Methods for storing arrays
 Two methods for storing arrays are,
1. Row-major order
All array elements that have the same value of the 1st subscript are stored in
contiguous locations, this is called row-major order.
Storage of B: ARRAY[0..3, 1..6] in row-major order
0,1 0,2 0,3 0,4 0,5 0,6 1,1 1,2 1,3 1,4 1,5 1,6 2,1 2,2 2,3 2,4 2,5 2,6 3,1 3,2 3,3 3,4 3,5 3,6

2. Column major order


All elements that have the same value of the second subscript are stored together; this
is called column-major order.
Storage of B: ARRAY[0..3, 1..6] in column-major order
0,1 1,1 2,1 3,1 0,2 1,2 2,2 3,2 0,3 1,3 2,3 3,3 0,4 1,4 2,4 3,4 0,5 1,5 2,5 3,5 0,6 1,6 2,6 3,6

In row-major order, the rightmost subscript varies most rapidly; in column-major
order, the leftmost subscript varies most rapidly.
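The two storage orders correspond to two offset formulas. For ARRAY[l1..u1, l2..u2], the sketch below computes the position of element (i, j) counted from the start of the array, in elements rather than bytes; the function names are illustrative.

```c
/* Position of element (i, j) of ARRAY[l1..u1, l2..u2] in row-major order:
   each complete row before row i holds (u2 - l2 + 1) elements. */
int row_major_offset(int i, int j, int l1, int l2, int u2)
{
    return (i - l1) * (u2 - l2 + 1) + (j - l2);
}

/* Position of element (i, j) in column-major order:
   each complete column before column j holds (u1 - l1 + 1) elements. */
int col_major_offset(int i, int j, int l1, int u1, int l2)
{
    return (j - l2) * (u1 - l1 + 1) + (i - l1);
}
```

For B: ARRAY[0..3, 1..6], element (1,1) is at position 6 in row-major order, and element (0,2) is at position 4 in column-major order; element (3,6) is last (position 23) in both.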
Referring array element
 To refer to an array element, we must calculate the address of the referenced element
relative to the base address of the array.
Ex: One-dimensional array
A: ARRAY [1..10] OF INTEGER
1. Suppose a statement refers to array element A[6]
2. There are five array elements preceding A[6]
3. On a SIC machine, each such element would occupy 3 bytes
4. Thus the address of A[6] relative to the starting address of the array is given by 5*3=15
Code generation for array references
1. If an array reference involves only constant subscripts (Ex: A[6]), the relative address
calculation can be performed during compilation.
2. If the subscripts involve variables (Ex: A[J]), however, the compiler must generate object
code to perform this calculation during execution.
Ex: A: ARRAY [l..u] OF INTEGER //array declaration
1. Suppose each array element occupies w bytes of storage.
2. If the value of the subscript is S, then the relative address of the referenced array element
A[S] is given by,
w*(S-l)
3. The generation of code to perform such a calculation is illustrated in the following figure.
Code generation for array references
A: ARRAY [1..10] OF INTEGER
.
.
A[J]:=5

1) -    J    #1   i1
2) *    i1   #3   i2
3) :=   #5        A[i2]



4. The notation A[i2] in quadruple 3 specifies that the generated machine code should refer
to A using indexed addressing, after having placed the value of i2 in the index register
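The relative-address formula and the three quadruples can be mirrored in C. The sketch assumes SIC-style 3-byte words stored most significant byte first in a byte array; `relative_address` and `store_element` are hypothetical names. For A[6] the formula gives 3*(6-1) = 15, matching the earlier hand calculation.

```c
/* Relative address of A[S] for ARRAY[l..u] with w-byte elements: w*(S-l). */
long relative_address(long s, long l, long w)
{
    return w * (s - l);
}

/* Mirrors the quadruples for A[J] := value with A: ARRAY[1..10] of 3-byte words.
   memory is a simulated byte-addressed memory; base is A's starting address. */
void store_element(unsigned char *memory, long base, long j, long value)
{
    long i1 = j - 1;            /* quadruple 1:  i1 = J - 1 */
    long i2 = i1 * 3;           /* quadruple 2:  i2 = i1 * 3 */
    long addr = base + i2;      /* quadruple 3:  A[i2], i.e. base + index register */
    memory[addr]     = (unsigned char)((value >> 16) & 0xFF);   /* 24-bit word, */
    memory[addr + 1] = (unsigned char)((value >> 8) & 0xFF);    /* high byte first */
    memory[addr + 2] = (unsigned char)(value & 0xFF);
}
```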
Storage allocation
 There are two types of storage allocation
1. Static allocation
2. Dynamic allocation
Static allocation
 Static allocation of memory is carried out during compile time.
 It is often used for languages that do not allow the recursive use of procedures (or)
subroutines and do not provide for the dynamic allocation of storage during execution.
Problem
 If procedures may be called recursively, static allocation cannot be used.
Ex:
1. In the following figure, the program MAIN has been called by the OS (or) the loader
(invocation 1).

2. The first action taken by MAIN is to store the return address from register L at a fixed
location RETADR within MAIN.

1. In the above figure, MAIN has called the procedure SUB(invocation 2)


2. The return address for this call has been stored at a fixed location within SUB.



Recursive invocation of a procedure using static storage allocation
1. In the above figure, SUB calls itself recursively
2. Here a problem occurs
3. SUB stores the return address for invocation 3 into RETADR from register L.
4. This destroys the return address for invocation 2.
5. As a result, there is no possibility of a correct return to MAIN.
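The failure can be reproduced with a deliberately naive simulation in which SUB has a single, statically allocated RETADR. The names are hypothetical, and "return addresses" are just strings naming the caller.

```c
/* SUB's one and only RETADR, fixed at compile time (static allocation). */
static const char *sub_retadr;

/* Simulated SUB: saves its return address in the fixed location, optionally
   calls itself, and finally reports where it would return to. */
static const char *sub(const char *caller, int depth)
{
    sub_retadr = caller;              /* store return address at RETADR */
    if (depth > 0)
        sub("SUB", depth - 1);        /* recursive call overwrites RETADR */
    return sub_retadr;                /* where this invocation "returns" */
}

/* Return target of the invocation of SUB that MAIN made. */
const char *outer_return_target(int recursive_calls)
{
    return sub("MAIN", recursive_calls);
}
```

Without recursion the outer call correctly returns to MAIN; after one recursive call it would "return" to SUB, because invocation 3 overwrote the return address saved by invocation 2.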
Solution for problem
 For recursive calls, use a dynamic storage allocation technique (i.e., automatic storage
allocation).
Dynamic storage allocation
 When a recursive call is made, it is necessary to preserve the previous values of any
variables used by SUB, including parameters, temporaries, return addresses, register save
areas etc.,
 This is accomplished with a dynamic storage allocation technique.
1. Automatic storage allocation
 This is one type of dynamic storage allocation that automatically allocates storage
for variables; it is not controlled by the programmer. It is used when procedures may be
called recursively.
1. In this method, each procedure call creates an activation record.
2. An activation record contains storage for all the variables used by the procedure.
3. If the procedure is called recursively, another activation record is created.
4. An activation record is not deleted until a return has been made from the
corresponding invocation.
5. The starting address for the current activation record is usually contained in a
base register (Ex: B), which is used by the procedure to address its variables.
6. Activation records are typically allocated on a stack, with the current record at
the top of the stack.
Ex: Invocation of a procedure using automatic storage allocation



1. In the above figure, procedure MAIN has been called, & its activation record appears on
the stack.
2. The base register B has been set to indicate the starting address of this current activation
record.
3. The first word in an activation record contains a pointer PREV that points to the previous
record on the stack.
4. Here this record is the first, so the pointer value is null.
5. The second word of the activation record contains a pointer NEXT, which will be the starting
address for the next activation record created.
6. The third word contains the return address for this invocation of the procedure, and the
remaining words contain the values of the variables used by the procedure.
Invocation of a procedure using automatic storage allocation

1. In the above figure, MAIN has called the procedure SUB.


2. A new activation record has been created on the top of the stack, with register B set to
indicate the new current record.

Recursive invocation of a procedure using automatic storage allocation



1. In the above figure, SUB has called itself recursively.
2. Another activation record has been created for this current invocation of SUB.
3. Note that the return addresses and variable values for the two invocations of SUB are kept
separate by this process.
What happens when procedure returns to its caller?
1. When a procedure returns to its caller, the current activation record ( which corresponds
to the most recent invocation) is deleted.
2. The pointer PREV in the deleted record is used to reestablish the previous activation
record as the current one and execution continues.
Ex: SUB returns from a recursive call

1. The above figure shows the stack as it would appear after SUB returns from the recursive
call.
2. Register B has been reset to point to the activation record for the previous invocation of
SUB.
Rules for automatic storage allocation
1. When automatic allocation is used, the compiler must generate code for references to
variables using some sort of relative addressing.



2. The compiler must also generate additional code to manage the activation records
themselves.
 At the beginning of each procedure there must be code to create a new activation record
linking it to the previous one and setting the appropriate pointer. This code is often called
a prologue for the procedure.
 At the end of the procedure, there must be code to delete the current activation record,
resetting pointers as needed. This code is often called an epilogue.
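The activation-record layout, prologue and epilogue described above can be sketched over a simulated stack of words. The layout (PREV, NEXT, return address, then locals), the use of -1 for a null PREV pointer and the function names are assumptions for this illustration; `local_var` shows relative addressing through the base register B.

```c
#define STACK_WORDS 256
#define PREV   0      /* word 0: pointer to the previous activation record */
#define NEXT   1      /* word 1: start address of the next record */
#define RETADR 2      /* word 2: return address for this invocation */
#define LOCALS 3      /* words 3.. : the procedure's variables */

static int stack_mem[STACK_WORDS];
static int B = -1;    /* base register: current record, -1 = none */

/* Prologue: create a record on top of the stack, link it to the previous
   one and make it current. Returns its base, or -1 on stack overflow. */
int prologue(int retadr, int nlocals)
{
    int base = (B < 0) ? 0 : stack_mem[B + NEXT];
    if (base + LOCALS + nlocals > STACK_WORDS) return -1;
    stack_mem[base + PREV]   = B;
    stack_mem[base + NEXT]   = base + LOCALS + nlocals;
    stack_mem[base + RETADR] = retadr;
    B = base;
    return base;
}

/* Epilogue: delete the current record, make the previous one current via
   PREV, and return the saved return address (-1 if no record is active). */
int epilogue(void)
{
    if (B < 0) return -1;
    int retadr = stack_mem[B + RETADR];
    B = stack_mem[B + PREV];
    return retadr;
}

/* Address of local variable k of the current invocation (relative to B). */
int *local_var(int k) { return &stack_mem[B + LOCALS + k]; }
```

For MAIN (one local), SUB (two locals) and a recursive call of SUB, the records start at words 0, 4 and 9, and after the inner SUB returns, the outer SUB's variables are intact.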
Other types of dynamic storage allocation
1. In FORTRAN 90, the statement
ALLOCATE (MATRIX (ROWS, COLUMNS))
Allocates storage for a dynamic array, MATRIX with the specified dimensions. The
statement,
DEALLOCATE (MATRIX)
Releases the storage assigned to MATRIX by a previous ALLOCATE.
2. In PASCAL, the statement
NEW (P)
Allocates storage for a variable and sets the pointer P to indicate the variables just
created. The statement
DISPOSE (P)
Releases the storage that was previously assigned to the variable pointed to by P.
3. In C, the function call
malloc(size)
allocates a storage block of the specified size and returns a pointer to it. The function call
free(p)
frees the storage indicated by the pointer p, which was returned by a previous call to malloc.
Block-structured languages
 In some languages, a program can be divided into units called blocks.
 A block is a portion of a program that has the ability to declare its own identifiers.

1. The above figure shows the outline of a block-structured program in a Pascal-like
language.
2. Each procedure forms a block.
3. In block structured program, blocks may be nested within other blocks. In the above
example, procedures B & D are nested within procedure A, & procedure C is nested
within procedure B.
4. Each block may contain a declaration of variables.
5. An inner block may also refer to variables that are defined in any outer block, provided
the same names are not redefined in the inner block.
Compiling & execution of block-structured programs
1. In compiling a program written in a block-structured language, it is convenient to number
the blocks as shown in the above figure.
2. The compiler constructs a table that describes the block structure, as shown below.

3. The table contains the details of block name, block number, block level and surrounding
block.
4. The block-level entry gives the nesting depth for each block.
5. The outermost block has a level number of 1, and each other block has a level number
that is one greater than that of the surrounding block.
Searching of identifiers in symbol table
 The same name can be declared more than once in a program, in different blocks.
 So there can be several symbol-table entries for the same name.
 The entries that represent declarations of the same name by different blocks can be
linked together in the symbol table with a chain of pointers.
 When a reference to an identifier appears in the source program, the compiler must first
check the symbol table for a definition of that identifier by the current block.
 If no such definition is found, the compiler looks for a definition by the block that
surrounds the current block, then by the block that surrounds that, and so on.
 If the outermost block is reached without finding a definition of the identifier, then the
reference is an error.
 The search process just described can easily be implemented within a symbol table that
uses hashed addressing.
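The search rule can be sketched with a block table that records, for each block, the block surrounding it. For simplicity the sketch scans a plain array instead of following hash chains. The block numbers (A=1, B=2, C=3, D=4, with B and D inside A and C inside B) are assumed to match the figure, and `resolve` is a hypothetical name.

```c
#include <string.h>

/* surrounding[b] = block that encloses block b (0 = none); index = block number.
   Assumed structure: A=1 encloses B=2 and D=4, and B encloses C=3. */
static const int surrounding[] = { 0, 0, 1, 2, 1 };

struct entry { const char *name; int block; };   /* one declaration */

/* Find the declaration of name visible from block cur: try the current block,
   then each surrounding block in turn. Returns the entry index, or -1 if the
   outermost block is passed without a definition (an error). */
int resolve(const struct entry *tab, int n, const char *name, int cur)
{
    for (int b = cur; b != 0; b = surrounding[b])
        for (int i = 0; i < n; i++)
            if (tab[i].block == b && strcmp(tab[i].name, name) == 0)
                return i;
    return -1;
}
```

With X declared in both A and B, a reference to X inside C finds B's declaration, while a reference inside D finds A's.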
Access to variables in surrounding block
 One common method for providing access to variables in surrounding block uses a data
structure called a display.
 The display contains pointers to the most recent activation records for the current block
and for all blocks that surround the current one in the source program.
 When a block refers to a variable that is declared in some surrounding block, the
generated object code uses the display to find the activation record that contains this
variable.
 Ex: The use of a display is illustrated in the following figure. Here the display is
used for the Pascal procedures discussed previously.
1. Assume that procedure A has been invoked by the system, A has then called
procedure B, and B has called procedure C. The resulting situation is shown in
following figure.



 The stack contains activation records for the invocations of A, B, C.
 The display contains pointers to the activation records for C & for the
surrounding blocks (A & B)
2. Let us assume procedure C calls itself recursively.

 Another activation record for C is created on the stack as a result of this call.
 The display pointer for C is changed accordingly.
 Variables that correspond to the previous invocation of C are not accessible.
3. Suppose now that procedure C calls D. The resulting stack & display are shown
below

 An activation record for D has been created the usual way & added to the
stack.
 Note, however, that the display now contains only two pointers : one each to
the activation records for D & A.
 This is because procedure D cannot refer to variables in B (or) C.
 Procedure D can refer only to the variables that are declared by D (or) by
some block that contains D in the source program (in this case, procedure A)



4. In the above figure, procedure D now calls B.
 Procedure B is allowed to refer only to variables declared by either B (or) A.
 The compiler for a block-structured language must include code at the beginning of a
block to initialize the display for that block. At the end of the block, it must include code
to restore the previous display contents.
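A display can be sketched as an array indexed by nesting level. In this sketch, entering a block at level k saves and replaces only entry k, and every entry above k becomes invisible; at block exit the saved entry is restored. This saving scheme is one common way to implement the initialize/restore code described above; the record addresses and function names are illustrative.

```c
#define MAX_LEVEL 8

/* display[k]: base of the activation record visible for nesting level k
   (1 = outermost block). No bounds checking on levels. */
static int display[MAX_LEVEL + 1];
static int depth;                 /* nesting level of the running block */

/* What block-exit code needs to restore the previous display. */
struct saved_display { int level, old_entry, old_depth; };

/* Block entry code: level of the block and base of its new record. */
struct saved_display enter_block(int level, int base)
{
    struct saved_display s = { level, display[level], depth };
    display[level] = base;
    depth = level;                /* deeper levels are no longer visible */
    return s;
}

/* Block exit code: restore the previous display contents. */
void exit_block(struct saved_display s)
{
    display[s.level] = s.old_entry;
    depth = s.old_depth;
}

/* Record holding a variable declared at 'level', or -1 if not visible. */
int record_for_level(int level)
{
    return (level >= 1 && level <= depth) ? display[level] : -1;
}
```

Replaying the walk-through (A calls B, B calls C, C calls D): inside D only levels 1 and 2 are visible (A and D), and when D returns, the display again shows A, B and C.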

Machine independent code optimization


 Code optimization involves complex analysis & various transformations of the
intermediate code without changing the logic of the program.
 Machine independent optimization is performed independent of the target machine. The
code optimization techniques are:
1. Loop optimization
2. Elimination of induction variables
3. Reduction in strength
4. Elimination of local common sub expression
5. Loop unrolling
6. Loop jamming
Loop optimization
 It is an important machine independent optimization.
 It involves elimination of loop invariant computation
Elimination of loop invariant computation
 A loop invariant computation computes the same value every time a loop gets executed.
 Therefore, moving such a computation outside the loop leads to a reduction in
execution time.
Eg: for(i=1;i<=10;i++)
{
m=1;
.
.
}
can be replaced by
m=1;
for(i=1;i<=10;i++)
{
.



.
}
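The effect of moving a loop-invariant computation can be checked with a small C example. Here the invariant is the product k * k, a hypothetical example rather than one from the text. The transformation is valid because k is not modified inside the loop, and computing the product once even when the loop body never runs is harmless because it has no side effects.

```c
/* Before: the loop-invariant product k * k is recomputed on every iteration. */
long sum_before(const int *a, int n, int k)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i] * ((long)k * k);
    return sum;
}

/* After: the invariant computation has been moved out of the loop. */
long sum_after(const int *a, int n, int k)
{
    long sum = 0;
    long kk = (long)k * k;          /* hoisted loop-invariant computation */
    for (int i = 0; i < n; i++)
        sum += a[i] * kk;
    return sum;
}
```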
Elimination of induction variables
Eg: Intermediate Code Representation
1 PROD:=0
2 I:=1
3 T1:=4*I
.
.
10 I:=I+1
11 if I<=20 goto (3)
 The purpose of I is to count from 1 to 20.
 Since T1:=4*I and I takes the values 1 to 20, T1 takes the values 4, 8, ..., 80 (when
I=20, T1=4*20=80); T1 progresses as I progresses.
 T1 and I form arithmetic progressions; such identifiers are called induction variables.
 Induction variables should be eliminated as far as possible.
 So I may be represented in terms of T1.
 Eg:
T1:=4*I can be written as
T1:=T1+4
And “if I<=20 goto 3” can be represented as
“if T1<=76 goto 3”
Reduction in strength
• Replacing an expensive operation by a cheaper, equivalent one is called reduction in strength.
• Eg: the multiplication T1:=4*I is replaced by the addition T1:=T1+4.
• This speeds up the object code because, on most machines, addition takes less time than multiplication.
Elimination of local common sub expression
• The common subexpressions in a block of code can be detected automatically by constructing a DAG (directed acyclic graph).
• If an interior node of the DAG has more than one label attached to it, that node represents a common subexpression, which can be detected and eliminated.
• Ex: Let us now see how to construct a DAG.
• Consider the following intermediate code representation:
1. S1:=4*I
2. S2:=I/J
3. S3:=S1+S2
4. S4:=4*I
5. S5:=S4+B
• The DAG for the above code is:
node S1, S4 = * (4, I)
node S2 = / (I, J)
node S3 = + (node S1/S4, node S2)
node S5 = + (node S1/S4, B)
• S1 and S4 both compute 4*I, so they label the same * node; this common subexpression can be eliminated as shown below:
1. S1:=4*I
2. S2:=I/J
3. S3:=S1+S2
4. S5:=S1+B
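After elimination, the * node carries the single label S1:
node S1 = * (4, I)
node S2 = / (I, J)
node S3 = + (S1, S2)
node S5 = + (S1, B)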
Loop unrolling
• This reduces the number of loop tests carried out when the number of iterations is constant.
i=1;
while (i<=100)
{
x[i]=0;
i++;
}
The test "i<=100" is performed 101 times (100 times true, then once false).
• This sequence can be replaced by the following set of statements:
i=1;
while (i<=100)
{
x[i]=0;
i++;
x[i]=0;
i++;
}
• Replicating the body reduces the number of tests by about 50% (from 101 to 51). This works here because the iteration count, 100, is a multiple of 2.
Loop jamming
• This is a technique of merging the bodies of two loops that have the same number of iterations over the same index range, provided the merged bodies do not depend on each other's results.
for(i=0;i<=10;i++)
x[i]=0;
for(i=0;i<=10;i++)
y[i]=1;
• The bodies of the two for loops, whose variable i runs over the same range, can be concatenated.
The result will be
for(i=0;i<=10;i++)
{
x[i]=0;
y[i]=1;
}
Advantages gained by code optimization
1. Code can be made to run faster.
2. Code can be made to take less space.
3. The execution efficiency of the object code is improved.

MACHINE-DEPENDENT COMPILER FEATURES


Intermediate form of the program
• Intermediate code is a stream of simple instructions.
• It is similar to assembly language instructions except that the registers need not be specified.
• Examples of intermediate code are three-address code, quadruples, triples, etc.
Quadruples
• A quadruple has the form:
operation, op1, op2, result
• Here operation is some function to be performed by the object code.
• op1 and op2 are the operands for this operation, and result designates where the resulting value is to be placed.
• Example 1: The source program statement
SUM:=SUM+VALUE
could be represented with the quadruples
+, SUM, VALUE, i1
:=, i1, , SUM
• Example 2: The statement
VARIANCE:=SUMS DIV 100 - MEAN * MEAN
could be represented with the quadruples (#100 denotes the constant 100)
DIV, SUMS, #100, i1
*, MEAN, MEAN, i2
-, i1, i2, i3
:=, i3, , VARIANCE
Code optimization on quadruples
• Many types of analysis and manipulation can be performed on the quadruples for code-optimization purposes.
• The quadruples can be rearranged to eliminate redundant load and store operations.
• The intermediate results ij can be assigned to registers or to temporary variables to make their use as efficient as possible.
• After optimization has been performed, the modified quadruples are translated into machine code.
Advantage
• The quadruples appear in the order in which the corresponding object code instructions are to be executed.
• This greatly simplifies the task of analyzing the code for purposes of optimization.
• It also means that the translation into machine instructions will be relatively easy.
Machine dependent code optimization
• There are several possibilities for performing machine-dependent code optimization:
1. Machine instructions that use registers as operands are usually faster than the corresponding instructions that refer to locations in memory.
Therefore, we would prefer to keep in registers all variables and intermediate results that will be used later in the program.
Each time a value is fetched from memory or calculated as an intermediate result, it can be assigned to some register.
The value will then be available for later use without requiring a memory reference.
This approach also avoids unnecessary movement of values between memory and registers, which takes time.
2. When a register is needed for some other purpose, the value it currently holds can be replaced.
Such register assignments can also be used to eliminate the need for temporary variables.
3. In making and using register assignments, a compiler must also consider the control flow of the program.
The existence of jump instructions makes it difficult to keep track of register contents.
Solution
1. One way to deal with this problem is to divide the program into basic blocks.
A basic block is a sequence of quadruples with one entry point, which is at the
beginning of the block, one exit point, which is at the end of the block, and no jumps
within the block.
When control passes from one basic block to another, all values currently held in registers are saved in temporary variables.

The blocks can be drawn as a diagram in which an arrow from block x to block y indicates that control can pass directly from the last quadruple of x to the first quadruple of y. This kind of representation is called a flow graph.
2. Another possibility involves rearranging quadruples before machine code is
generated.

For the quadruples of Example 2, the value of the intermediate result i1 is calculated first and stored in temporary variable t1.
Then the value of i2 is calculated.
The third quadruple in this series calls for subtracting the value of i2 from i1.
Since i2 has just been computed, its value is available in register A.
However, it is necessary to store the value of i2 in another temporary variable t2, and then load the value of i1 from t1 into register A before performing the subtraction.

With a little analysis, an optimizing compiler could recognize this situation and rearrange the quadruples so that the second operand of the subtraction is computed first.
The first two quadruples in the sequence are interchanged.
The resulting machine code requires two fewer instructions and uses only one temporary variable instead of two.
3. Other possibilities involve taking advantage of specific characteristics and
instructions of the target machine.
For example, there may be special addressing modes that can be used to create more
efficient object code.

On some computers there are high-level machine instructions that can perform
complicated functions such as calling procedures and manipulating data structures in
a single operation.
