System Programming
SYSTEM PROGRAMMING
6TH Sem BCA
Chapter-1
Introduction
Software
Software is a set of instructions or programs written to carry out certain tasks on digital
computers.
Types of software
1. System software
2. Application software
System software
System software consists of a variety of programs that support the operation of a
computer
Ex: OS, compiler, loader, linker, assembler, macro processor
System software acts as an intermediary between the users and the hardware. It creates a
virtual environment for the user that hides the actual computer architecture.
Virtual machine is a set of services and resources created by the system software and
seen by the user.
Application software
An application is a program, or group of programs, that is designed for the end user.
Special purpose programs are also known as packages.
Ex. database programs, word processors, Web browsers and spreadsheets, library
management system.
Macros:
Macro processor is a program that substitutes and specializes macro definitions for macro
calls.
Compilers:
The programs that translate the high level language program (source code) into machine
language program (object code)
Operating system
It consists of programs that manage the allocation of resources and
services such as memory, processor, devices and information.
Machine structure
Memory
Memory is the device where information is stored and retrieved.
Information is stored in the form of 1's and 0's.
Each 1/0 is a separate binary digit called bit.
Bits are grouped into words, characters or bytes.
o Nibble 4bits
o Byte 8bits
o Half word 16bits
o Word 32bits
o Double word 64bits
Basic unit of memory is a byte.
Processor
Processor is a device that performs a sequence of operations specified by instructions in
memory.
There are two types of processors,
1. Central processing units
2. Input and output processors
Central processing units
It is the brain of the computer
It controls all internal and external devices, performs arithmetic and logic operations
It also carries out the instructions of a computer program
It is concerned with manipulations of data stored in memory
Input and output processors
Input and output processors transfer data between memory and peripheral devices such
as disks, drums, printers etc.
An I/O processor executes its own instructions, which are activated by a command from the
CPU.
Programming the I/O processor is called I/O programming.
Loaders
The purpose of a loader is to ensure that object programs are placed in memory in an
executable form. The assembler itself could place the object program directly in memory and
transfer control to it, so that the machine language program is then executed. But this has
two disadvantages:
1. Wastage of memory: the assembler itself occupies space in memory during
execution.
2. Wasted translation time: the program must be retranslated on each
execution.
To avoid these problems, a separate piece of system software called the loader is introduced.
If the program size is very large then subdivide the program into smaller routines called
sub-routines
There are two types of subroutines:
1. Closed subroutines
2. Open subroutines
The task of adjusting programs, so that they may be placed in arbitrary memory locations
is called relocation.
Formal systems
A formal system is an uninterpreted calculus. It consists of
o An alphabet
o A set of words called axioms and
o A finite set of relations called rules of inference
Examples of formal systems are set theory, Boolean algebra and Post systems
Uses of formal systems
o Used in the design, implementation and study of programming languages
o Used to specify the syntax and the semantics of programming languages
o Used in syntax directed compilation, compiler verification and complexity studies of
languages
Operating system
OS is a program that controls the execution of an application program and acts as an
interface between user and computer hardware.
Chapter-2
Machine structure, machine language and assembler language
Instruction interpreter
It is a group of electrical circuits that performs the purpose of the instructions fetched
from the memory.
It is like a decoder that decodes the type of the instruction.
Location counters (LC)
It is also known as the program counter or instruction counter. It holds the location of the
current instruction being executed.
Instruction registers (IR)
It contains a copy of the current instruction that is executed.
Working registers (WR)
WRs are memory devices that serve as "scratch pads" (a plurality of multibit
storage locations) for the instruction interpreter.
WR are general purpose registers.
General purpose register
GPRs are used by the programmer as storage locations and for special functions.
Memory address registers (MAR)
It contains the address of memory location that is to be read from or stored into.
Memory buffer register (MBR)
It contains a copy of the designated memory location specified by the MAR, after read or
write.
Memory controller
It is hardware that transfers data between the MBR and the core memory location
whose address is in the MAR.
I/O channels
They may be thought of as separate computers which interpret special instructions for
inputting and outputting information from the memory.
Registers
The types of registers are
Name of the register        Number   Size
General purpose registers   16       32 bits each
Floating point registers    4        64 bits each
Program status word         1        64 bits
901 - offset
2 - Index register
15 - Base register
The total number of bits for an add instruction is then 32.
If a base register were not used, 40 bits would be required for each instruction, which adds
bits and so wastes memory.
Data formats
The different types of data formats present in IBM system 360 are
1. Short form fixed point numbers
2. Long form fixed point numbers
3. Packed decimal numbers
4. Unpacked decimal numbers.
5. Short form floating point number
Example: The byte address starting from 1016 it occupies four locations
4 bytes =32bits
0 000 0000 0000 0000 0000 0001 0000 1011
S(+) 1016 1017 1018 1019
Byte address
3. Packed decimal numbers
The last 1byte (4+4 bits) is reserved for sign and data.
In between the BCD of each numbers a 4 bit zone code/padding bit/internibble bit is
introduced which contains 0/1.
The hex digits C, A, F and E indicate a positive number,
while D and B indicate a negative number.
Example: -118.625
4 bytes (32 bits)
1 bit   7 bits     24 bits
1       1000010    0111 0110 1010 0000 0000 0000
S       Exponent   Fraction
1 bit is the sign bit (1 = negative).
7 bits represent the exponent, stored in excess-64 form as a power of 16. Here 118.625 =
76.A in hexadecimal = 0.76A x 16^2, so the exponent is 2 and the stored value is
64+2=66, which is 1000010 in binary.
24 bits represent the fraction. The hexadecimal digits 7, 6 and A give 0111 0110 1010, and the
remaining bits are zero. The fractional part 0.625 is converted by repeated multiplication
by the base (0.625 x 16 = 10 = A).
The long form floating point number is the same as the short form, except that 64 bits
(8 bytes) are allocated for the number, which contains the
exponent and fraction parts.
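The short form layout can be checked with a small Python sketch. It assumes the System/360 convention of a sign bit, a 7-bit excess-64 exponent (power of 16) and a 24-bit fraction; the function name to_short_float is hypothetical, and the sketch truncates extra fraction bits and does not check the exponent range.

```python
def to_short_float(value: float) -> int:
    """Encode value as a 32-bit short form floating point word (sketch)."""
    if value == 0:
        return 0
    sign = 1 if value < 0 else 0
    mag = abs(value)
    exp = 0                          # power of 16
    # Normalize so that 1/16 <= mag < 1
    while mag >= 1:
        mag /= 16
        exp += 1
    while mag < 1 / 16:
        mag *= 16
        exp -= 1
    fraction = int(mag * (1 << 24))  # keep 24 fraction bits, truncating the rest
    characteristic = exp + 64        # excess-64 exponent
    return (sign << 31) | (characteristic << 24) | fraction

word = to_short_float(-118.625)
print(f"{word:032b}")
```

For -118.625 the printed word has sign 1, exponent field 1000010 and fraction 0111 0110 1010 followed by zeros.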
Register operands
Register operands refer to data stored in one of the 16 general purpose registers, which
are addressed by 4 bit field in the instruction.
Storage operands
Storage operand refers to data stored in core memory. The length of the operand depends
upon the specific data type.
Immediate operand
An immediate operand is a single byte of data stored as part of the instruction.
RR instruction:
RR instruction denotes a register to register operation, i.e., both operands are registers.
The length of RR instruction is 2 bytes (16 bits)
The general format of RR
RX instruction:
RX instruction denotes a register and indexed storage operation
The length of RX instruction is 4 bytes (32 bits)
The general format of RX instruction is
An indexed storage operand refers to data stored in core memory. The address of the
storage operand is calculated as follows:
Address = value of the offset (displacement) + contents of the base register + contents of the
index register
        = C(B2) + C(X2) + D2
Example: ADD 3,16(0,5)
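The address calculation can be sketched in Python as follows. The function name effective_address is hypothetical, the contents of register 5 (1000) are an assumed value, and register number 0 is treated as "no register", which is how a 0 in the index or base field is used in these examples.

```python
def effective_address(regs, d2, x2=0, b2=0):
    """Address = C(B2) + C(X2) + D2; register number 0 means no register."""
    base = regs[b2] if b2 != 0 else 0
    index = regs[x2] if x2 != 0 else 0
    return base + index + d2

regs = [0] * 16
regs[5] = 1000          # assumed contents of base register 5

# Operand 16(0,5): offset 16, no index register, base register 5
print(effective_address(regs, d2=16, x2=0, b2=5))
```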
RS instructions
RS instruction denotes register and storage operation
The length of RS type instruction is 4 bytes (32 bits).
The general format of RS instruction is
4 bytes
OP          R1     R3     B2     D2
1001 1000   0001   0011   0101   0000 0001 0000
Load        1      3      5      16
multiple
Address = C(B2) + D2
        = C(5) + 16
        = 1000 + 16
        = 1016
The load multiple instruction loads registers 1 through 3 with the consecutive full words starting at location 1016
SI instruction
SI instruction denotes storage and immediate operand operation
An immediate operand is a single byte of data stored as part of the instruction
Department of BCA Page 19
SS instruction
SS instruction denotes a storage to storage operation
The length of SS instruction is 6 bytes(48 bits)
The general format is
In SS format, the length field is always one less than the number of bytes moved, i.e., if L=0, 1 byte is moved.
Here L=79, therefore 80 bytes are moved from locations 1032 through 1111 (1032+79) to locations
1300 through 1379 (1300+79).
Instruction set
The various categories of instructions are
1. load-store registers instructions
2. Fixed point arithmetic
3. Logical instructions
4. Transfer instructions
5. Miscellaneous instructions
Example program:
Write a program that will add the number 49 to the contents of 10 adjacent full words
(32bits or 4 bytes) in memory with the following assumptions:
1. The 10 numbers are contiguous full words beginning at absolute location 952.
2. The program is in core memory starting at absolute location 48.
3. The number 49 is a full word at absolute location 948.
4. Register 1 contains a 48.
L 2,904(0, 1)
Load the first number into register 2
Address of the storage operand =904+contents of base register1
=904+48
=952
Contents of register 2=contents of memory location 952=Data1
A 2,900(0, 1)
Add 49 with data1
Address of the storage operand =900+contents of base register1
=900+48
=948
Contents of register 2 =contents of register2+contents of memory location 948
=Data1+49
ST 2,904(0, 1)
Store the result back into memory
Address of the storage operand =904+contents of base register 1
=904+48
=952
Contents of memory location 952 =contents of register 2
=Data1+49
L, A and ST are RX type instructions, each 4 bytes long. Therefore the absolute and relative
address of each successive instruction is incremented by 4.
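The three instructions can be traced with a small Python simulation. This is only a sketch: memory is modelled as a dictionary of full words keyed by byte address, the data values 100 to 109 are made up for illustration, and the helper names addr, L, A and ST are hypothetical.

```python
# Memory as address -> full word value (illustrative model, not real storage)
memory = {948: 49}                       # assumption 3: the number 49 at 948
for i in range(10):
    memory[952 + 4 * i] = 100 + i        # hypothetical data values at 952..988

regs = [0] * 16
regs[1] = 48                             # assumption 4: register 1 contains 48

def addr(d, x, b):
    """Offset + contents of index register + contents of base register."""
    return d + (regs[x] if x else 0) + (regs[b] if b else 0)

def L(r, d, x, b):                       # load register r from storage
    regs[r] = memory[addr(d, x, b)]

def A(r, d, x, b):                       # add storage operand to register r
    regs[r] += memory[addr(d, x, b)]

def ST(r, d, x, b):                      # store register r into storage
    memory[addr(d, x, b)] = regs[r]

L(2, 904, 0, 1)     # register 2 = contents of location 952
A(2, 900, 0, 1)     # register 2 = Data1 + 49
ST(2, 904, 0, 1)    # location 952 = Data1 + 49
print(memory[952])
```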
Advantages
Implementation is easy.
Disadvantages
Instructions are repeated for all the data items
It is impossible to access both the first data item and the last data item using register 1 as
the base
Wastage of memory
Need of relocation.
Instruction would overlap data in the core.
Here instruction is treated as data. Therefore adding 4 to an instruction will update its
offset.
For example, if location 48 contains the instruction L 2,904(0, 1)
The instruction is stored as follows from byte number 48
Now when we add 4 to this instruction, the offset present in the 4th byte is treated as data
and is incremented by 4.
L+4=904+4
=908
Advantages
Saves memory
Address is modified easily using the instruction.
Disadvantages
Treating instructions as data is not a good programming practice
Separate instructions are used for increasing the displacement(offset) of load and store
SR 4, 4
Clear register 4 by subtracting the contents of register 4 from register 4.
The contents of register 4=0
L 2,904(4,1)
Load data element of array
Address of the storage operand=904+contents of index register 4+contents of base
register 1
=904+0+48
=952
Content of register 2 =contents of memory location 952
=data1.
A 2,900(0, 1)
Add 49
Address of the storage operand =900+contents of base register 1
=900+48
=948
Contents of register 2 =contents of register 2+contents of memory location 948
=Data1+49
ST 2,904(4, 1)
Replace data element
Advantages
Easy to understand
Saves memory
4. Looping
The additional assumptions made for this method are
Assumption 6: relative location 892 contains a 10
Assumption 7: relative location 888 contains a 1.
L 3,892(0, 1)
Load data into register 3
Address of the storage operand =892+C(B1)
=892+48
=940.
Content of register 3 =contents of memory location 940
=10
S 3,888(0, 1)
Subtract 1
Address of the storage operand =888+contents of register 1
=888+48
=936
C(R3) =C(R3)-contents of memory location936
=10-1
=9
ST 3,892(0, 1)
Store temp
Address of the storage operand =892+C(R1)
=892+48
=940
Content of memory location 940 =contents of register 3
=9
BC 2, 2(0,1)
Branch if result is positive.
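The idea of looping over the ten words can be sketched in Python. This is an illustration rather than a transcription of the instruction sequence above: it assumes register 4 is used as the index register and register 3 as a counter that is decremented once per pass, and the data values 100 to 109 are made up.

```python
memory = {948: 49}                       # the constant 49
for i in range(10):
    memory[952 + 4 * i] = 100 + i        # hypothetical data values

regs = [0] * 16
regs[1] = 48     # base register
regs[4] = 0      # index register, cleared as by SR 4,4
regs[3] = 10     # loop counter

while True:
    location = 904 + regs[4] + regs[1]   # offset + index + base
    regs[2] = memory[location]           # load the data element
    regs[2] += memory[900 + regs[1]]     # add 49
    memory[location] = regs[2]           # replace the data element
    regs[4] += 4                         # step the index to the next full word
    regs[3] -= 1                         # decrement the counter
    if regs[3] == 0:                     # leave the loop when the count reaches 0
        break

print([memory[952 + 4 * i] for i in range(10)])
```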
Assembly language
Definition
Assembly language is a low level programming language that allows users to write programs
using mnemonics (symbols)
Advantages
1. It is mnemonic
2. Reading is easier
3. Addresses are symbolic
4. Introduction of data to program is easier
5. It is easier to modify than machine language programs.
Disadvantages
1. An assembler is required to translate the source program into an object program
2. It is machine dependent.
3. Lack of portability of programs between computers of different makes.
Pseudo opcodes
Pseudo opcode is an assembly language instruction that specifies an operation of the
assembler.
USING
USING is a pseudo opcode that tells the assembler which general purpose register
to use as a base register and what value it will contain at execution time
Syntax
USING <content of base register>,<GPR to be used as base register>
Ex:
USING *,5
START:
Start is a pseudo opcode that tells the assembler where the beginning of the program is
and allows the user to give a name to the program
Ex:
START sum
Or
sum START
END:
End is a pseudo opcode that tells the assembler that the last statement of the program has
been reached
Ex:
END
EQU:
EQU is the pseudo opcode which allows the programmer to define symbols (variables)
Ex:
BASE EQU 15
<Label> DS <size>
Ex:
FOUR A DS 1F
DROP:
DROP is a pseudo opcode which indicates that a base register and its contents are no longer available
Syntax:
DROP <base register number>
Ex:
DROP 15
LTORG:
LTORG is a pseudo opcode which tells the assembler to place the encountered literals at
an earlier location
Machine opcodes
BALR:
BALR is the branch and link instruction. It instructs the computer to load a
register with the address of the next instruction and branch to the address specified in the second field.
BALR is used to load the base register. It is an executable statement and an RR type instruction
whose length is 2 bytes.
Ex: BALR 15,0
BR:
BR is a machine opcode indicating a branch to the location whose address is in a general
register
Ex: BR 14
BCT:
BCT indicates branch on count. It is an RX type instruction whose size is 4 bytes.
Ex: BCT 3, loop
Decrements register 3 by 1; if the result is not 0, branches back to loop
Chapter-3
Assembler
Assembler is system software which is used to translate assembly level language into machine
level language program code
Functions of assembler
The design of an assembler follows these steps: first analyze the problem statement, next identify
the databases the design needs, then define the format of each database, and finally write the
algorithm. An overview of these steps follows.
The first pass defines the symbols and literals; at this stage the offset values cannot yet be determined.
The second pass generates the instructions and addresses, that is, it fills in the offset values.
Consider the following source program which has to be converted into machine language(object
program)
JOHN    START   0
        USING   *,15
        L       1,FIVE
        A       1,FOUR
        ST      1,TEMP
FOUR    DC      F'4'
FIVE    DC      F'5'
TEMP    DS      1F
Pass 1                  Pass 2
0   L   1,-(0,15)       0   L   1,16(0,15)
4   A   1,-(0,15)       4   A   1,12(0,15)
8   ST  1,-(0,15)       8   ST  1,20(0,15)
12  4                   12  4
16  5                   16  5
20  -                   20  -
Note: the load, add and store instructions are in RX format (4 bytes each), so they occupy
relative addresses 0, 4 and 8
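The two passes over this program can be sketched in Python. This is a simplified illustration, not a full assembler: the statement lengths are fixed in a small table (the name LENGTH is hypothetical), only the L, A and ST instructions are handled, and base register 15 with contents 0 is passed in directly instead of being taken from the USING statement.

```python
SOURCE = [
    ("JOHN", "START", "0"),
    (None,   "USING", "*,15"),
    (None,   "L",     "1,FIVE"),
    (None,   "A",     "1,FOUR"),
    (None,   "ST",    "1,TEMP"),
    ("FOUR", "DC",    "F'4'"),
    ("FIVE", "DC",    "F'5'"),
    ("TEMP", "DS",    "1F"),
    (None,   "END",   ""),
]

# Bytes occupied by each statement; pseudo-ops not listed occupy 0 bytes
LENGTH = {"L": 4, "A": 4, "ST": 4, "DC": 4, "DS": 4}

def pass1(source):
    """Assign a location to every label (build the symbol table)."""
    symtab, lc = {}, 0
    for label, op, _ in source:
        if label:
            symtab[label] = lc
        lc += LENGTH.get(op, 0)
    return symtab

def pass2(source, symtab, base_reg=15, base_value=0):
    """Fill in the offset of each symbolic operand relative to the base."""
    out, lc = [], 0
    for _, op, operand in source:
        if op in ("L", "A", "ST"):
            reg, sym = operand.split(",")
            offset = symtab[sym] - base_value
            out.append((lc, f"{op} {reg},{offset}(0,{base_reg})"))
        lc += LENGTH.get(op, 0)
    return out

symtab = pass1(SOURCE)
for location, text in pass2(SOURCE, symtab):
    print(location, text)
```

Pass 1 alone cannot fill in the offsets because FOUR, FIVE and TEMP are defined after they are used; pass 2 can, because the symbol table is complete by then.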
Pass 1: database
4. POT (pseudo operation table): stores all the pseudo opcodes used in the source program
and their corresponding actions
5. ST (symbol table): stores each symbol/label used in the program and its
corresponding value
6. LT (literal table): stores each literal/constant used in the program and its
assigned location
Pass 2: databases
3. MOT (machine operation table): stores each machine mnemonic with its length, binary opcode and
instruction format
4. POT: stores each pseudo opcode and the action to take for it in pass 2
5. ST (symbol table): prepared by pass 1; it holds each label and its value
6. BT (base table): records which registers are currently specified as base registers by the
USING pseudo opcode and the contents of those registers
7. INST: a work space used to hold each instruction as its various parts are assembled.
9. Punch card: a work space used for converting the assembled instructions into the format
needed by the loader.
Format of database
After the second step, the third step is to define the format of the databases mentioned above.
MOT table
Codes
Instruction formats
000=RR(2 bytes)
001=RX(4 bytes)
010=RS(4 bytes)
011=SI(4 bytes)
100 =SS(6 bytes)
POT table
Format
Symbol table
It is a variable table
Same table is used in both pass1 and pass2
The size of the table is 14 bytes per entry
The length field indicates the length in bytes of the instruction to which symbol is
attached
Absolute means the value of the symbol doesn't change if the program is moved in core
Format
Literal table
Format
Base table
It is a variable table
It is used to specify the base register
It is used only in pass2
The size of this table is 4 bytes per entry
After the third step comes the fourth step of the design: the algorithms and flowcharts of
an assembler.
The main purpose of pass1 of the assembler is to assign a location to each instruction and
data-defining pseudo-op, and to define values for the symbols appearing in the label field of
the source program.
ALGORITHM:
1. First initialize the location counter to zero, because the relative location of the first
instruction is zero; after that it is increased according to the length of each instruction format.
For pseudo-ops, pass1 handles their effect on the location counter and on the definition of symbols.
6. If it is not a pseudo opcode, it is a machine opcode: search the MOT for an entry matching the
opcode in the source statement.
The purpose of pass2 is to process each card, look up the values of its symbols and compute the offset values
ALGORITHM
4. If it is a USING or DROP pseudo opcode, it requires additional processing
in pass2
7. If it is the END pseudo opcode, terminate pass2. Before terminating, generate the code for any
remaining literals.
9. If it is a machine opcode, search the MOT for its entry to find the length, binary
opcode and instruction format.
11. That is, determine which instruction format it is: RR, RX, RS, SI or SS
Chapter-4
Macro Language and Macro processor
Sometimes we need to repeat some blocks of code several times in a program. To avoid
this repetition, system software provides a special facility,
the MACRO.
A macro instruction is a notational convenience for the programmer; it allows the
programmer to write a shorthand version of a program
Definition:
A macro instruction is a single-line abbreviation for a group of instructions
OR
The design of macro processor is generally machine independent.
Ex: C uses a macro processor (the preprocessor) to support symbolic constants and file inclusion
through #define and #include
Macro Definition:
Macro definition attaches a name to a sequence of instruction
Structure of macro-definition
MACRO starting of macro definition
[ ] give a macro_name
…………….
……………. sequence of instruction
…………….
MEND ending of macro definition
The macro definition starts with the MACRO pseudo opcode, which indicates the beginning of the
definition. The macro definition is terminated by the MEND pseudo opcode.
When a sequence of instructions is repeated, say 3 times, MACRO permits us to attach a name to the
sequence and to use the name in its place. The macro definition will then be
MACRO
Add
A 1, data
A 2, data
A 3, data
MEND
Where
MACRO=> is a pseudo op indicates beginning of definition
Add => is the name of the macro
MEND => is a pseudo op indicate end of the macro definition
Between the name of the Macro Add and MEND we have the sequence of instruction.
Once the macro has been defined, the macro name can be used in place of the sequence.
Definition:
The use of the macro name at a point where the sequence of instructions is to be substituted is
referred to as a macro call.
Ex: In the example above the sequence is repeated three times, so each occurrence of the
sequence is replaced by the macro name:
MACRO
Add
A 1, data
A 2, data
A 3, data
MEND
…….
Add
……
Add
……
Data DC F'5'
3 Macro Expansions:
Whenever the program needs the instructions in place of the macro name, macro expansion is
required.
Definition:
The macro processor replaces each macro call with the defined set of instructions. This process
of replacement is called macro expansion.
Source          Expanded source
......
Add             A 1, data
                A 2, data
                A 3, data
......
Data DC F'5'    Data DC F'5'
To overcome this problem we use macro instruction arguments. The actual arguments
appear in the macro call, and the corresponding dummy arguments appear in the macro
definition.
Ex:
A 1, data1
A 2, data2
……..
A 1, data1
A 2, data2
……..
In the above example operations are the same with different parameter value
The first sequence performs an operation using data1 as operand.
The second sequence performs an operation using data2 as operand.
Keyword argument:
The arguments that appear in the macro definition are known as keyword arguments or
dummy arguments. These arguments must be preceded by the & symbol
MACRO
Add &arg1,&arg2, &arg3
A 1,&arg1
A 2,&arg2
A 3,&arg3
MEND
Positional argument:
The arguments that appear with the macro call, outside the definition, are referred to as
positional arguments or actual arguments.
Positional and keyword arguments must match in number.
Ex: Add a, b, c
Where a replaces the first keyword argument
b replaces the second keyword argument
c replaces the third keyword argument
Add A 1, b
…… A 2, b
Data DC F'5'
MACRO
Add &arg1, &arg2, &arg3        (keyword arguments)
A 1, &arg1
A 2, &arg2
A 3, &arg3
MEND

Source          Expanded source
......
Add a, b, c     A 1, a
                A 2, b
                A 3, c
......
Add x, y, z     A 1, x
                A 2, y
                A 3, z
......
Data DC F'5'
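Expansion with argument substitution can be sketched in Python. This is a minimal single-pass illustration under simplifying assumptions: definitions appear before their calls, there are no nested definitions or conditional pseudo-ops, and the function name expand is hypothetical.

```python
def expand(lines):
    """Replace each macro call with the macro body, substituting arguments."""
    macros, out, i = {}, [], 0
    while i < len(lines):
        tokens = lines[i].split()
        if tokens == ["MACRO"]:
            # The next line is the prototype: name plus dummy arguments
            proto = lines[i + 1].split(None, 1)
            name = proto[0]
            params = [p.strip() for p in proto[1].split(",")] if len(proto) > 1 else []
            body = []
            i += 2
            while lines[i].strip() != "MEND":
                body.append(lines[i])
                i += 1
            macros[name] = (params, body)
        elif tokens and tokens[0] in macros:
            # Macro call: pair dummy arguments with actual arguments by position
            params, body = macros[tokens[0]]
            rest = lines[i].split(None, 1)
            args = [a.strip() for a in rest[1].split(",")] if len(rest) > 1 else []
            for stmt in body:
                for dummy, actual in zip(params, args):
                    stmt = stmt.replace(dummy, actual)
                out.append(stmt)
        else:
            out.append(lines[i])
        i += 1
    return out

program = [
    "MACRO",
    "Add &arg1, &arg2, &arg3",
    "A 1, &arg1",
    "A 2, &arg2",
    "A 3, &arg3",
    "MEND",
    "Add a, b, c",
    "Add x, y, z",
    "Data DC F'5'",
]
print("\n".join(expand(program)))
```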
The sequence of macro expansion can be reordered or changed based on some conditions.
There are 2 important macro processor pseudo ops for this. They are
i. AIF
ii. AGO
Loop2 A 1,data3
A 2, data2
Loop3 A 1, data1
L 1,&arg
A 1,=F‟1‟
ST 1,&arg
MEND
MACRO
ADDS &arg1,&arg2,&arg3
ADD1 &arg1
ADD1 &arg2
ADD1 &arg3
MEND
It is possible to have a macro definition within the body of another macro. The inner macro is
not defined until the outer macro has been called.
Ex:
MACRO
DEFINE &fun
MACRO
&fun &arg
…………..
…………..
MEND
………….
MEND
The above example defines a macro "DEFINE" with an argument &fun.
Inside this macro definition there is another macro definition, named by &fun.
Using this feature we can dynamically generate the definitions for new macros.
[Figure: An assembly language program with macro definitions and macro calls is input to the macro processor, which produces an ALP without definitions and calls; the assembler translates this into the target program.]
The macro processor takes as input an ALP that contains macro definitions and macro calls. It
transforms this into expanded source that contains neither macro definitions nor macro calls, which
is then passed through the translator (assembler) and converted into object code.
1 Recognize macro definitions: The macro processor must recognize macro definitions, identified by the
MACRO and MEND pseudo ops.
2 Save the definitions: The macro processor must store the definitions in memory, since they are
required for expanding macro calls.
3 Recognize macro calls: The macro processor must recognize macro names that appear as operation
mnemonics.
4 Expand calls and substitute arguments: The macro processor substitutes the dummy (macro definition)
arguments with the corresponding positional arguments of a macro call.
Database Specification
Pass1 [Processing macro-definition and calls]
The input macro source code
The output macro code copy to pass2
MDT[macro definition table] which is used to store the body of the macro definition
MNT[macro name table], which is used to store names of the defined macro
MDTC[Macro Definition table Counter] which is used to indicate the next available
entry in MDT
MNTC[Macro Name Table Counter] which is used to indicate the next available entry in
MNT
ALA[Argument List array] to substitute index marker for the dummy argument before
storing a macro definition
ALA in pass1
When a macro definition is stored, the arguments in the definition are replaced by index
markers.
# is the index marker symbol; it is followed by the position number of the dummy argument.
Source definition                    Stored definition
MACRO                                MACRO
Loop Add &arg1, &arg2, &arg3         #0 Add &arg1, &arg2, &arg3
     A 1, &arg1                           A 1, #1
     A 2, &arg2                           A 2, #2
     A 3, &arg3                           A 3, #3
MEND                                 MEND
……..
……..
Add data1, data2, data3
ALA in pass2
Here the arguments in the macro call are substituted for the index markers stored in the macro definition.
In the above example the macro call is the last line shown below:
MACRO
Loop Add &arg1, &arg2, &arg3
A 1, &arg1
A 2, &arg2
A 3, &arg3
MEND
……..
……..
Add data1, data2, data3
2 Addbbbbb 10
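The two uses of the ALA can be sketched in Python. This is a simplified illustration: the function names store_definition and expand_call are hypothetical, the stored definition is a plain list standing in for the MDT, and the label marker #0 is left out.

```python
def store_definition(prototype, body):
    """Pass 1: build the ALA from the prototype and store the body with #n markers."""
    name, arglist = prototype.split(None, 1)
    ala = [a.strip() for a in arglist.split(",")]       # dummy arguments
    mdt = [prototype]
    for stmt in body:
        for n, dummy in enumerate(ala, start=1):
            stmt = stmt.replace(dummy, f"#{n}")
        mdt.append(stmt)
    mdt.append("MEND")
    return name, mdt

def expand_call(call, mdt):
    """Pass 2: fill the ALA from the call and substitute for the #n markers."""
    _, arglist = call.split(None, 1)
    ala = [a.strip() for a in arglist.split(",")]       # actual arguments
    out = []
    for stmt in mdt[1:-1]:                              # skip prototype and MEND
        for n, actual in enumerate(ala, start=1):
            stmt = stmt.replace(f"#{n}", actual)
        out.append(stmt)
    return out

name, mdt = store_definition(
    "Add &arg1, &arg2, &arg3",
    ["A 1, &arg1", "A 2, &arg2", "A 3, &arg3"],
)
print(mdt)
print(expand_call("Add data1, data2, data3", mdt))
```

Storing markers instead of names means pass 2 never has to look up a dummy argument by name; the marker number indexes directly into the ALA built from the call.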
Pass1 Flowchart
If it is a MEND pseudo op then, read the next statement from the source
otherwise write the expanded source code
b) If it is not a macro name directly write into expanded source code
Pass2 Flowchart
Functions of loader:
1. Allocation: Space for the program is allocated in main memory by calculating the
size of the program.
2. Linking: Combines two or more separate object programs and supplies the
information needed for references between them.
3. Relocation: Adjusts all address-dependent locations in the object program, that is, modifies
the object program so that it can be loaded at an address different from the location originally
specified.
4. Loading: Physically places the machine instructions and data into memory.
1 Compile and go loader: It is a loading scheme in which the assembler itself
places the assembled instructions directly into the designated memory locations.
After completion of the assembly process, it assigns the starting address of the program to the
location counter, so there is no stop between the compilation, link editing, loading and
execution of the program.
[Figure: The source program is input to the compile and go loader (assembler), which places the program in memory.]
Advantages:
They are simple and easier to implement
Disadvantages:
This loader can handle only one object program at a time
A portion of memory is wasted because the assembler occupies memory alongside
each object program, and the assembler must retranslate the user program every
time it is run
It is very difficult to handle multiple segments or subprograms
[Figure: Source program, translator, object program, loader.]
Advantages:
In this scheme there is no need to retranslate the program for each execution, because
the loader, rather than the assembler, is kept in memory
3 Absolute loader: In this scheme the assembler outputs the machine language translation of the
source program.
The data is punched on cards instead of being placed directly in memory.
The loader in turn simply accepts the machine language text and places it into core at the location
prescribed by the assembler.
Ex:
The main program is assigned to locations 100 to 247 and the subroutine is assigned to locations
400 to 477. If changes were made to the main program that increased its length to more than 300
bytes, relocation would be necessary.
[Figure: The absolute loader places MAIN at locations 100 to 247 and SQRT at locations 400 to 477.]
2 Transfer cards:These cards must convey the entry point of the program, which is where the
loader is to transfer the control when all instructions are loaded.
Disadvantages:
1. With this loader the programmer must adjust all intersegment addresses, so the programmer
must know the memory management and the addresses of the programs.
2. If any modification is made in one segment, the starting addresses of other segments may also change.
3. If there are multiple segments the programmer must remember the addresses
of all subroutines.
Sub-routine linkages:
A main program may transfer control to a subprogram, and that subprogram may in turn transfer to
another program.
The assembler does not know about such symbolic references to other programs, so it would declare
an error. For this situation the assembler provides two pseudo-op codes. They are
1 EXTRN
2 ENTRY
The assembler will inform the loader that these symbols may be referred by other programs
1 EXTRN:
The EXTRN pseudo op code is used to maintain the reference between 2 or more subroutines.
OR
The assembler pseudo-op code EXTRN followed by a list of symbols indicates that these
symbols are defined in other programs but referenced in the present program.
2 ENTRY: The assembler pseudo-op code ENTRY followed by a list of symbols indicates that
these symbols are defined in the present program and referenced in other programs.
The ENTRY pseudo-op code is optional; it is used to define the entry locations of subroutines.
Ex:
A   START                    B   START
    EXTRN B                      USING *,15
    …………                         …………
    L 15, =A(B)                  …………
    BALR 14, 15                  BR 14
    ……………
    END                          END
Relocating loaders
To avoid the reassembly required with an absolute loader, another type of loader
called the relocating loader was introduced.
The assembler assembles each procedure segment independently and then passes to the loader the
text together with information about relocation and intersegment references.
The output of a relocating assembler using the BSS scheme is the object program plus information
about all other programs it references.
For each source program the assembler outputs text prefixed by a transfer vector, which consists of
addresses containing the names of the subroutines referenced by the source program.
The assembler also provides the loader with additional information: the length of the entire
program and the length of the transfer vector.
OP R1 X D
Because the address portion of instructions must be relocated, the assembler associates a bit
with each instruction or address field, called a relocation bit. If the relocation bit is 1, the
corresponding address field must be relocated. If it is 0, the field is not relocated.
The relocation bits are used to solve the problem of relocation, the transfer vector is used to solve
the problem of linking and the program length information is used to solve the problem of
allocation.
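The use of relocation bits can be sketched in Python. The word values, bit settings and load address below are made up for illustration, and the function name relocate is hypothetical.

```python
def relocate(words, relocation_bits, load_address):
    """Add the load address to every word whose relocation bit is 1."""
    return [
        word + load_address if bit == 1 else word
        for word, bit in zip(words, relocation_bits)
    ]

# Hypothetical object program: two address fields and one constant
words = [20, 5, 12]
relocation_bits = [1, 0, 1]       # 1 = address field, 0 = absolute value

print(relocate(words, relocation_bits, load_address=1000))
```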
There are 4 types of cards available in the direct linking loader. They are
1. ESD-External symbol dictionary
2. TXT-card
3. RLD-Relocation and linking dictionary
4. END-card
1 ESD card:
It contains information about all symbols that are defined in this program but may be referenced
elsewhere, and all symbols that are referenced in this program but defined elsewhere
It contains:
Reference number
Symbol name
Type Id
Relative location
Length
ESD cards are further classified into 3 types, identified by mnemonics. They are:
1. SD [Segment Definition]: refers to the segment definition [01]
2. LD [Local Definition]: refers to a local definition, named by the ENTRY pseudo op code [02]
3. ER [External Reference]: refers to an external reference, named by the EXTRN pseudo op code [03]
2 TXT Card: It contains the actual text, that is, the already translated object code.
3 RLD Card: This card contains information about the locations in the program whose contents
depend on the address at which the program is placed.
Each entry carries a '+' or '-' flag: '+' means the value of the referenced symbol is added to the
contents of the address constant, and '-' means it is subtracted.
The relative addresses and source code of the two programs are used in what follows
ESD Cards:
The ESD cards contain the information necessary to build the external symbol dictionary or
symbol table
In the source code the symbols are PG1, PG1ENT1, PG1ENT2, PG2 and PG2ENT1
Format of ESD Card for PG1:
Source card   Name      Type   ID   Relative   Length
reference                            address
1             PG1       SD     01   0          60
2             PG1ENT1   LD     -    20         -
2             PG1ENT2   LD     -    30         -
3             PG2       ER     -    -          -
3             PG2ENT1   ER     -    -          -
Here PG1 is the segment definition, that is, the header of program 1.
PG1ENT1 and PG1ENT2 are local definitions of program 1, so they have
type LD.
PG2 and PG2ENT1 are named by the EXTRN pseudo op code, so they have type
ER
Text card for PG1:
The format of card will be
Source card   Relative   Content   Comments
reference     address
6             40-43      20
7             44-47      45        =30+15
8             48-51      7         =30-20-3
9             52-55      0         Unknown to PG1
10            56-59      -16       =-20+4
6= A(PG1ENT1)=20
7=A (PG1ENT2+15)=30+15=45
8=A (PG1ENT2-PG1ENT1-3)=30-20-3=7
9=A (PG2)=0
10=A (PG2ENT1+PG2-PG1ENT1+4)=0+0-20+4= -16
RLD card for PG1:
Source card   ESD ID   Length   Flag     Relative
reference                       + or -   address
6             02       4        +        40
7             02       4        +        44
9             03       4        +        52
10            02       4        +        56
10            03       4        +        56
10            02       4        -        56
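For PG2, where PG1ENT1 and PG1ENT2 are external symbols whose values are unknown (taken as 0):
16 = A(PG1ENT1) = 0
17 = A(PG1ENT2+15) = 0+15 = 15
18 = A(PG1ENT2-PG1ENT1-3) = 0-0-3 = -3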
3. Program load address counter [PLA]: It is used to keep track of each segment's
assigned location.
4. Global external symbol table [GEST]: It is used to store each external symbol and its
corresponding assigned core address.
5. A copy of the input, to be used later by pass 2.
6. A printed listing (load map) that specifies each external symbol and its assigned value.
2 Pass2 database:
1. A copy of the object program, which is the input to pass 2
2. The initial program load address [IPLA]
3. The program load address counter [PLA]
4. The global external symbol table [GEST]
5. The local external symbol array [LESA], which is used to establish a correspondence
between the ESD ID numbers used on ESD and RLD cards and the absolute address values
of the corresponding external symbols
Object deck:
The object deck contains 4 types of cards
2 TEXT Card:
Source card   Relative   Content
reference     address
3 RLD Card:
Source card   ESD ID   Length   Flag     Relative
reference                       + or -   address
The GEST is used to store each external symbol and its corresponding core address, for example
(b denotes a blank):
“PG1bbbbb” 104
“PG1ENT1b” 124
The purpose of pass 1 is to assign a location to each segment and to find the values of all
external symbols.
1. The program load address [PLA] is set to the initial program load address [IPLA].
2. Read an object card.
3. Write a copy of the card for pass 2.
4. The card can be any one of the following types:
i) Text card or RLD card: no processing is required during pass 1, so the next
card is read.
ii) ESD card: it is processed based on the type of the external symbol.
a) SD: the length field LENGTH from the card is temporarily saved in
the variable SLENGTH. The value VALUE to be assigned to this symbol is set
to the current value of the PLA.
The symbol and its assigned value are then stored in the GEST. If the
symbol already exists in the GEST, this is an error. Otherwise the symbol
and its value are printed as part of the load map.
b) LD: the value to be assigned is set to the current PLA plus the relative
address [ADDR] indicated on the ESD card. The symbol is then stored in the
GEST and printed in the same way as for SD.
c) ER: no processing is required during pass 1.
iii) END card: the program load address is incremented
by the length of the segment saved in SLENGTH.
iv) EOF card: pass 1 is completed and control transfers to pass 2.
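The pass 1 steps above can be sketched in Python as follows. This is only an illustrative sketch: the Card class, its field names and the pass1 function are assumptions made for the example, not the actual card layout, and the deck contains only the ESD cards of PG1 followed by END and EOF.

```python
from dataclasses import dataclass


@dataclass
class Card:
    kind: str            # "ESD", "TXT", "RLD", "END" or "EOF"
    esd_type: str = ""   # "SD", "LD" or "ER"; used on ESD cards only
    symbol: str = ""
    addr: int = 0        # relative address (LD entries)
    length: int = 0      # segment length (SD entries)


def pass1(cards, ipla):
    """Return (GEST, copy of the deck saved for pass 2)."""
    pla = ipla                      # program load address counter
    slength = 0                     # length of the current segment
    gest = {}                       # symbol -> assigned core address
    copy_for_pass2 = []
    for card in cards:
        copy_for_pass2.append(card)
        if card.kind == "ESD":
            if card.esd_type == "SD":
                slength = card.length
                value = pla
            elif card.esd_type == "LD":
                value = pla + card.addr
            else:                   # ER: nothing to do in pass 1
                continue
            if card.symbol in gest:
                raise ValueError(f"duplicate external symbol: {card.symbol}")
            gest[card.symbol] = value
        elif card.kind == "END":
            pla += slength          # the next segment starts after this one
        elif card.kind == "EOF":
            break
        # TXT and RLD cards need no processing in pass 1
    return gest, copy_for_pass2


pg1_deck = [
    Card("ESD", "SD", "PG1", 0, 60),
    Card("ESD", "LD", "PG1ENT1", 20),
    Card("ESD", "LD", "PG1ENT2", 30),
    Card("ESD", "ER", "PG2"),
    Card("ESD", "ER", "PG2ENT1"),
    Card("END"),
    Card("EOF"),
]
gest, saved = pass1(pg1_deck, ipla=104)
for symbol, value in gest.items():  # a simple "load map"
    print(symbol, value)
```

With an IPLA of 104, PG1 is assigned 104 and PG1ENT1 is assigned 124, which agrees with the GEST entries shown earlier.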
1. If an address is specified on the END card, that address is used as the execution start
address; otherwise, execution begins at the first segment.
2. In pass 2 the five types of cards are read one by one and processed as described below.
At the beginning of pass 2 the program load address is initialized as in pass 1 and the
execution start address [EXADDR] is set to IPLA.
i. ESD card
a) SD type => the length of the segment is temporarily saved in the variable
SLENGTH. The LESA [ID] is set to the current value of the PLA.
b) LD type => does not require any processing during pass 2.
c) ER type => the GEST is searched for a match with the ER symbol. If it is not
found, there is an error. If it is found in the GEST, its value is extracted and the
corresponding LESA entry is set.
ii. Text card: The text is copied from the card to the appropriate relocated
memory location [PLA+ADDR].
iii. RLD card: The value to be used for relocation and linking is extracted from
LESA [ID], as specified by the ID field.
Depending upon the flag, the value is either added to or subtracted from the address
constant.
iv. END card: If an execution start address is specified on the END card, it is saved
in the variable EXADDR. The PLA is incremented by the length of the segment
saved in SLENGTH, becoming the PLA for the next segment
[PLA=PLA+SLENGTH].
v. EOF card: The loader transfers control to the loaded program at the address
specified by the current contents of the execution address variable [EXADDR].
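The pass 2 processing above can be sketched in the same spirit. Everything here is an assumption made for illustration: cards are plain dictionaries, each address constant is held as one integer, the ESD IDs 1, 2 and 3 are assigned by hand, the values for PG2 and PG2ENT1 in the GEST are hypothetical, and a start address on the END card is treated as relative to the segment.

```python
def pass2(cards, ipla, gest):
    """Load text, then relocate and link address constants using the RLD cards."""
    pla = ipla            # program load address counter
    exaddr = ipla         # execution start address defaults to IPLA
    slength = 0
    lesa = {}             # ESD ID -> absolute address of the symbol
    memory = {}           # absolute address -> value of the address constant
    for card in cards:
        kind = card["kind"]
        if kind == "ESD":
            if card["type"] == "SD":
                slength = card["length"]
                lesa[card["id"]] = pla
            elif card["type"] == "ER":
                if card["symbol"] not in gest:
                    raise KeyError(f"undefined external symbol: {card['symbol']}")
                lesa[card["id"]] = gest[card["symbol"]]
            # LD entries need no processing in pass 2
        elif kind == "TXT":
            memory[pla + card["addr"]] = card["content"]
        elif kind == "RLD":
            value = lesa[card["id"]]
            target = pla + card["addr"]
            if card["flag"] == "+":
                memory[target] += value
            else:
                memory[target] -= value
        elif kind == "END":
            if card.get("start") is not None:
                exaddr = pla + card["start"]   # assumption: start is segment-relative
            pla += slength
        elif kind == "EOF":
            break
    return memory, exaddr


# GEST as pass 1 would leave it; the PG2 and PG2ENT1 values are hypothetical.
gest = {"PG1": 104, "PG1ENT1": 124, "PG1ENT2": 134, "PG2": 164, "PG2ENT1": 180}
deck = [
    {"kind": "ESD", "type": "SD", "symbol": "PG1", "id": 1, "length": 60},
    {"kind": "ESD", "type": "LD", "symbol": "PG1ENT1"},
    {"kind": "ESD", "type": "ER", "symbol": "PG2", "id": 2},
    {"kind": "ESD", "type": "ER", "symbol": "PG2ENT1", "id": 3},
    {"kind": "TXT", "addr": 40, "content": 20},    # A(PG1ENT1)
    {"kind": "TXT", "addr": 52, "content": 0},     # A(PG2)
    {"kind": "TXT", "addr": 56, "content": -16},   # A(PG2ENT1+PG2-PG1ENT1+4)
    {"kind": "RLD", "id": 1, "flag": "+", "addr": 40},
    {"kind": "RLD", "id": 2, "flag": "+", "addr": 52},
    {"kind": "RLD", "id": 3, "flag": "+", "addr": 56},
    {"kind": "RLD", "id": 2, "flag": "+", "addr": 56},
    {"kind": "RLD", "id": 1, "flag": "-", "addr": 56},
    {"kind": "END"},
    {"kind": "EOF"},
]
memory, exaddr = pass2(deck, 104, gest)
print(memory, exaddr)
```

In this sketch the constant at relative address 56 starts as -16 and ends as -16 + 180 + 164 - 104 = 224, which is the same as evaluating PG2ENT1 + PG2 - PG1ENT1 + 4 with the absolute addresses.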
1 Core image builder: A specific memory allocation of the program is performed at the time that
the subroutines are bound together. The result is called a core image module and the corresponding
binder is called a core image builder.
Advantages:
Simple to implement
Fast to execute
Disadvantages:
Difficult to allocate and load the program
2 Linkage editors: The linkage editor keeps track of relocation information so that the
resulting load module can be further relocated. Therefore the module loader must perform
additional allocation and relocation as well as loading, but it does not have to worry about the
problem of linking.
Advantages:
More flexible allocation and loading scheme
Disadvantages:
Implementation is more complex
Dynamic loading:
In all the loader schemes so far we have assumed that all of the subroutines needed are loaded into
memory at the same time.
If the total amount of memory required by all these subroutines exceeds the amount available,
as can happen with large programs, they cannot all be loaded into memory at the
same time. There are several hardware techniques to solve this problem, such as paging and
segmentation.
Usually the subroutines of a program are needed at different times.
Ex:
Pass 1 and pass 2 of an assembler are mutually exclusive. By recognizing which
subroutines call which other subroutines, it is possible to produce an overlay structure that
identifies the mutually exclusive subroutines.
A 25k
B 15k
C 20k
D 30k
E 20k
Note that procedures 'B' and 'D' are never in use at the same time.
Dynamic linking:
The main disadvantage of all the previous loading schemes is that if a subroutine is
referenced but never executed, the loader still incurs the overhead of linking that
subroutine.
In this mechanism the loading and linking of external references are postponed until execution time.
The loader loads only the main program. If the main program executes a transfer instruction to an
external address, or references an external variable, the loader is called.
Advantages:
No overhead is incurred unless the procedure to be called or referenced is actually
used
System can be dynamically reconfigured
Saves memory
Disadvantages:
More complex, because most of the binding process is postponed until program execution
time.
Chapter-6
Compiler
A compiler is a system software component that accepts a program written in a high level language
and produces an object program.
Step 1:
The basic elements, or tokens, are recognized and entered into a table.
The table consists of 2 fields
1. Uniform symbols
2. Pointer
The uniform symbols are of fixed size, and each points to the table entry of the associated basic element.
Here, the uniform symbols are IDN for identifiers, TRM for terminals and LIT for literals.
Step 2:
Interpreting the meaning of the construction:
After performing the above step the resultant form is the “syntactic form”.
Step 3:
Intermediate form:
“The process of generating the object code for each construction after determining the syntactic
construction is known as intermediate form”
1 Arithmetic Statements: One intermediate form of an arithmetic statement is a parse tree.
The rules for converting an arithmetic statement into a parse tree are:
a) Any variable is a terminal node of the tree.
b) Every operator has 2 branches in the binary tree: the left branch is the tree for
operand 1 and the right branch is the tree for operand 2.
Ex:
Another intermediate form is a linear representation of the parse tree called a matrix.
Matrix number   Operator   Operand1   Operand2
1               -          Start      Finish
2               *          Rate       M1
3               *          2          Rate
4               -          Start      Finish
5               -          M4         100
6               *          M3         M5
7               +          M2         M6
8               =          Cost       M7
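The way a parse tree is linearized into such a matrix can be sketched with a post-order walk. The statement used here, COST = RATE*(START-FINISH) + 2*RATE*(START-FINISH-100), is reconstructed from the matrix entries and is therefore an assumption, as are the tuple representation of the tree and the function name to_matrix.

```python
def to_matrix(tree):
    """Linearize a binary parse tree into matrix entries (operator, operand1, operand2)."""
    matrix = []

    def walk(node):
        if isinstance(node, str):        # a variable or constant is a terminal node
            return node
        operator, left, right = node
        operand1 = walk(left)            # left branch is the tree for operand 1
        operand2 = walk(right)           # right branch is the tree for operand 2
        matrix.append((operator, operand1, operand2))
        return f"M{len(matrix)}"         # later entries refer to this result as Mi

    walk(tree)
    return matrix


difference = ("-", "Start", "Finish")
tree = ("=", "Cost",
        ("+",
         ("*", "Rate", difference),
         ("*", ("*", "2", "Rate"), ("-", difference, "100"))))

matrix = to_matrix(tree)
for number, entry in enumerate(matrix, start=1):
    print(number, *entry)
```

Each operator is emitted only after both of its operands, so every Mi that appears as an operand has already been defined by an earlier entry.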
2 Non-arithmetic statements: DO, IF and GOTO are examples of non-arithmetic statements.
These statements can all be replaced by a sequential ordering of individual matrix entries.
Ex: Return (cost)
End
Matrix
Operator   Operand1   Operand2
Return     Cost
End
Ex:
Declare (Cost, Rate, Start, Finish) Fixed Binary (31) Static;
The table consists of four fields
1. Variables -> cost, rate, start, finish
2. Data type -> fixed binary
3. Precision -> 31 bits
4. Storage class -> static
3) Storage allocation:
The proper amount of memory required by the program is reserved.
Ex:
Declare (Cost, Rate, Start, Finish) Fixed Binary (31) Static;
Each variable is 32 bits in size: 31 bits of precision plus the first bit, which is reserved as the
sign bit. Because the storage class is static, the storage is allocated at load time.
Each variable is assigned a relative address. These relative addresses are used by the later phases
of the compiler to access the variables. Similarly, storage is also assigned for the temporary
locations that will contain the intermediate results of the matrix.
Ex: [M1, M2, M3, ..., M7]
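The assignment of relative addresses can be illustrated with a small sketch. It assumes that every variable and temporary occupies 4 bytes (32 bits) and that addresses are handed out consecutively starting from 0; the function name allocate and this layout are choices made for the example, not something stated in the notes.

```python
def allocate(names, size=4, start=0):
    """Assign consecutive relative addresses; return (table, total bytes reserved)."""
    table = {}
    address = start
    for name in names:
        table[name] = address
        address += size
    return table, address - start


variables = ["Cost", "Rate", "Start", "Finish"]
temporaries = [f"M{i}" for i in range(1, 8)]     # M1 .. M7
table, total = allocate(variables + temporaries)
for name, address in table.items():
    print(name, address)
print("bytes reserved:", total)
```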
4 Code generation: The code generation phase takes the matrix as input and generates
the object code for each entry in it.
The object code associated with each matrix operator is defined by a table called the
code production table.
Ex:
Start - Finish
The operator (-) in the matrix is treated as a macro call.
The operands Start and Finish are treated as macro arguments.
Operator operand1 operand2
L 1,&operand1
S 1,&operand2
ST 1,&N
The following code is generated for the above statement using the code definition of the
minus operator:
L 1,start
S 1,finish
ST 1,M1
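Treating each matrix operator as a macro call can be sketched as simple template substitution. The template for minus is the one shown above; the entry for plus is an assumption added by analogy (A is the System/360 add instruction), and the names CODE_PRODUCTIONS and generate are hypothetical.

```python
# Code production table: one list of instruction templates per matrix operator.
CODE_PRODUCTIONS = {
    "-": ["L 1,&operand1", "S 1,&operand2", "ST 1,&N"],
    "+": ["L 1,&operand1", "A 1,&operand2", "ST 1,&N"],   # assumed by analogy
}


def generate(matrix):
    """Expand every matrix entry into assembly lines, like a macro call."""
    code = []
    for n, (operator, operand1, operand2) in enumerate(matrix, start=1):
        for template in CODE_PRODUCTIONS[operator]:
            line = (template.replace("&operand1", operand1)
                            .replace("&operand2", operand2)
                            .replace("&N", f"M{n}"))     # result goes to temporary Mn
            code.append(line)
    return code


code = generate([("-", "start", "finish")])
print("\n".join(code))
```

Multiplication is left out of the sketch on purpose, since its code definition is not given in these notes.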
Matrix before optimization      Matrix after optimization
M3  *  2      Rate              M3  *  2      Rate
M4  -  Start  Finish            M4  (deleted, duplicate of M1)
M5  -  M4     100               M5  -  M1     100
M6  *  M3     M5                M6  *  M3     M5
M7  +  M2     M6                M7  +  M2     M6
M8  =  Cost   M7                M8  =  Cost   M7
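Removing duplicate matrix entries can be sketched as follows. The sketch assumes that none of the operands is assigned a new value between the two identical entries, which is what makes deleting the second one safe; the function name and the dictionary result are choices made for the example.

```python
def eliminate_duplicates(matrix):
    """Drop entries that repeat an earlier entry and redirect references to them."""
    first_seen = {}    # (operator, operand1, operand2) -> name of the first entry
    alias = {}         # name of a deleted entry -> name of the entry that replaces it
    kept = {}          # surviving entries, keyed by their original Mi name
    for i, (operator, operand1, operand2) in enumerate(matrix, start=1):
        operand1 = alias.get(operand1, operand1)
        operand2 = alias.get(operand2, operand2)
        key = (operator, operand1, operand2)
        name = f"M{i}"
        if operator != "=" and key in first_seen:
            alias[name] = first_seen[key]      # duplicate: delete and remember alias
        else:
            first_seen[key] = name
            kept[name] = key
    return kept


matrix = [
    ("-", "Start", "Finish"),
    ("*", "Rate", "M1"),
    ("*", "2", "Rate"),
    ("-", "Start", "Finish"),
    ("-", "M4", "100"),
    ("*", "M3", "M5"),
    ("+", "M2", "M6"),
    ("=", "Cost", "M7"),
]
optimized = eliminate_duplicates(matrix)
for name, entry in optimized.items():
    print(name, *entry)
```

Entry M4 disappears and M5 now refers to M1, as in the optimized matrix above.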
Assembly phase:
The code generation phase produces assembly language. The process of generating the
actual machine code from it is known as the assembly phase.
1 Lexical analysis: Recognition of basic elements or tokens and creation of the uniform symbol table
3 Interpretation phase: Definition of the exact meaning, and creation of the matrix and tables
by the respective routines [action routines]
4 Machine independent optimization: Creation of a more optimal matrix [removes the duplicate
entries in the matrix table]
5 Storage assignment: It makes entries in the matrix that allow code generation to create code
that allocates dynamic storage, and that allow the assembly phase to reserve the proper amount of
STATIC storage
6 Code generation: A macro processor is used to produce more optimal assembly code
7 Assembly and output: Resolving symbolic addresses and generating the machine language
Phases 1 to 4 are machine independent and language dependent, because these phases
determine the syntax and meaning of each statement in the source program. Hence they
depend on the language and are independent of the machine.
Phases 5 to 7 are machine dependent and language independent, because these phases allocate
memory for literals and generate the assembly code, which depends on the machine and is
independent of the language.
2 Uniform symbol table: It consists of the tokens or basic elements as they appear in the
program. It is created by the lexical analysis phase and given as input to the syntax analysis and
interpretation phases.
3 Terminal table: This is a permanent table that lists the terminal symbols of the language,
such as keywords and special symbols.
4 Identifier table: It contains all variables in the program and temporary storage [Ex M1, M2,
M3 … M7], together with the information needed to reference or allocate storage for them. This
table is created by lexical analysis.
6 Reductions: A permanent table of decision rules in the form of patterns for matching against
the uniform symbol table to discover syntactic structure.
7 Matrix: The matrix is the intermediate form of the program, created by the
action routines. It is optimized and then used for code generation.
8 Code productions: A permanent table of definitions. There is one entry defining the code for
each matrix operator.
9 Assembly code: The assembly language version of the program, which is created by the code
generation phase and is input to the assembly phase.
10 Relocatable object code: The final output of the assembly phase, ready to be used as input to
the loader.
Phases of compiler
1 Lexical phase:
The lexical phase performs the following three tasks:
1. Recognize the basic elements or tokens present in the source code
2. Build a literal table and an identifier table
3. Build a uniform symbol table
Database:
Lexical phase involves the manipulation of 5 databases
1. Source program
2. Terminal table
3. Literal table
4. Identifier table
5. Uniform symbol table
1 Source program: The original form of the program created by the user.
2 Terminal table: It is a permanent database. It consists of 3 fields.
3 Literal table:
It describes all literal constants used in the source program.
It consists of 6 fields:
Literal   Base   Scale   Precision   Other information   Address
4 Identifier table:
It describes all identifiers used in the source program. It consists of three fields
1. Name
2. Data attribute
3. Address
Algorithm:
Step1: The first task of the lexical analysis algorithm is to parse the input character string into
tokens.
Step2: The second step is to make the appropriate entries in the tables.
Implementation:
1 The input string is separated into tokens by break characters. Break characters are denoted by
the contents of a special field in the terminal table.
2 Lexical analysis recognizes 3 types of tokens:
1. Terminal symbols [TRM]
2. Identifiers [IDN]
3. Literals [LIT]
3 Each token is classified and entered into the uniform symbol table:
If the symbol is found in the TERMINAL table then
Create a uniform symbol table entry of type TRM
Else if the symbol is an identifier then
Create a uniform symbol table entry of type IDN (adding it to the IDENTIFIER table if it is new)
Else
Create a uniform symbol table entry of type LIT (adding it to the LITERAL table)
End if
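The classification above can be sketched in Python. This is a simplified sketch under several assumptions: the terminal table holds only a few operators and punctuation marks, identifiers are recognized by their shape (a leading letter), literals are unsigned integers, and the names TERMINALS and lex are hypothetical.

```python
import re

# Hypothetical terminal table: a permanent list of operators and punctuation.
TERMINALS = ["=", "+", "-", "*", "(", ")", ";", ","]


def lex(source):
    """Return (uniform symbol table, identifier table, literal table)."""
    identifiers, literals, uniform = [], [], []
    # A token is an identifier, an unsigned integer, or a single break character.
    for token in re.findall(r"[A-Za-z_]\w*|\d+|\S", source):
        if token in TERMINALS:
            uniform.append(("TRM", TERMINALS.index(token)))
        elif token[0].isalpha() or token[0] == "_":
            if token not in identifiers:
                identifiers.append(token)
            uniform.append(("IDN", identifiers.index(token)))
        elif token.isdigit():
            if token not in literals:
                literals.append(token)
            uniform.append(("LIT", literals.index(token)))
        else:
            raise ValueError(f"unrecognized token: {token}")
    return uniform, identifiers, literals


uniform, identifiers, literals = lex("COST = RATE * (START - FINISH) + 2;")
for table_class, index in uniform:
    print(table_class, index)
```

Each uniform symbol is a fixed-size pair of table class and index, so the later phases can work with it without looking at the original character string.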
2 Syntax Phase:
The functions of the syntax phase are
1. To recognize the major constructs of the language
2. To call the appropriate action routines that will generate the intermediate form or matrix
for the constructs
Databases:
1 Uniform symbol table: The table created by the lexical phase.
The uniform symbols are the source of input to the stack, which is used by the syntax and
interpretation phases. Each entry has two fields:
Table class   Index
2 Stack: The stack is the collection of uniform symbols currently being worked on. The stack is
organized on the LIFO principle.
3 Reduction table: The syntax rules of the source language are contained in the reduction table.
The general form of a reduction (rule) is:
Label: old top of stack / action routines / new top of stack / next reduction
Algorithm:
Step1: Reductions are tested consecutively for a match between the old top of stack field and the
actual top of stack until a match is found.
Step2: When a match is found, the action routines specified in the action field are executed in
order from left to right.
Step3: When control returns to the syntax analyzer, it modifies the top of the stack to agree with
the new top of stack field.
Step4: Step 1 is repeated, starting with the reduction specified in the next reduction field.
3. Interpretation Phase:
1. Uniform symbol table
2. Stack
3. Identifier table
4. Matrix
The above-mentioned databases are described in the text book on page 210.
5. Optimization Phase:
Optimizations performed by a compiler are of 2 types: machine-independent optimization and
machine-dependent optimization.
6. Code generation:
The purpose of code generation is to produce the appropriate code. In this phase the matrix is
the input database.
Databases:
Matrix
Identifier table
Literal table
Code productions
Ex: Code generation with machine-dependent optimization.
A=B+C+D
Refer to text book page 224.
A cross compiler is necessary to compile for multiple platforms from one machine. A platform
could be infeasible for a compiler to run on, such as for the microcontroller of an embedded
system because those systems contain no operating system.
Cross compilers are not to be confused with source-to-source compilers. A cross compiler is for
cross-platform software development of binary code, while a source-to-source "compiler" just
translates from one programming language to another in text code.
Uses of cross compilers
Embedded computers
Compiling for multiple machines
Use of virtual machines
The linker is the software program which binds many object modules together to make a
single object program.
The two types of linking are static linking and dynamic linking.
The cross reference table is easy to use and requires no coding. It is comprised of a set of data
elements (or values) that are organized using a model of rows and columns.
A cross reference table can be used to accept one input value and produce one output value. The
example below shows a cross reference table lookup within a function. In addition to setting up
the function, you need to map all input elements from the source profile to the input values in the
function. You also need to map all output values from the function to the output elements in the
destination profile.
For example, when referring to the U.S. states:
System A uses the State Name value
System B uses the FIPS Alpha Code value
When mapping from System A to System B, you need to translate the State Name value to the
FIPS Alpha Code value. The SQL Select statement for the Output Element would be: SELECT
FIPS_Alpha_Code FROM State_Cross-Reference_Example WHERE State Name =
Input_Element. If the State Name=Alabama in System A then the FIPS Alpha Code=AL for
System B. "AL" is the value that will be returned in the output.
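The one-input, one-output lookup described above can be sketched with a Python dictionary standing in for the cross reference table. The table contents shown are just a few sample rows, and the names STATE_CROSS_REFERENCE and translate are hypothetical.

```python
# A few sample rows of the cross reference table: State Name -> FIPS Alpha Code.
STATE_CROSS_REFERENCE = {
    "Alabama": "AL",
    "Alaska": "AK",
    "Arizona": "AZ",
}


def translate(state_name):
    """Map a System A value (State Name) to the System B value (FIPS Alpha Code)."""
    try:
        return STATE_CROSS_REFERENCE[state_name]
    except KeyError:
        raise KeyError(f"no cross reference entry for {state_name!r}") from None


output_element = translate("Alabama")
print(output_element)
```

The dictionary plays the role of the rows and columns of the table, and the lookup plays the role of the SELECT statement with its WHERE clause.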