MCA System Programming MC0073
1.) Describe the following with respect to Language Specification:
A) Fundamentals of Language Processing
The definition of a language processor motivates a generic model of language processing activities. We refer to the
collection of language processor components engaged in analysing a source program as the
analysis phase of the language processor. Components engaged in synthesizing a target program
constitute the synthesis phase.
A specification of the source language forms the basis of source program analysis. The
specification consists of three components:
1. Lexical rules which govern the formation of valid lexical units in the source language.
2. Syntax rules which govern the formation of valid statements in the source language.
3. Semantic rules which associate meaning with valid statements of the language.
Thus, analysis of a source statement consists of lexical, syntax and semantic analysis.
Lexical analysis identifies the lexical units in a source statement. It then classifies the units into
different lexical classes, e.g. id’s, constants, reserved id’s, etc. and enters them into different
tables. Lexical analysis builds a descriptor, called a token, for each lexical unit.
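As a rough illustration, a token descriptor might be represented by a small C structure (the class names and fields here are assumptions, not from the text):

/* A minimal sketch of a token descriptor, assuming lexical classes
   and per-class tables as described above.  Names are illustrative. */
enum lexical_class { LC_ID, LC_CONSTANT, LC_RESERVED_ID, LC_OPERATOR };

struct token {
    enum lexical_class class;   /* which lexical class the unit belongs to */
    int table_index;            /* entry number in that class's table */
};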
Syntax analysis processes the string of tokens built by lexical analysis to determine the statement
class, e.g. assignment statement, if statement, etc. It then builds an IC which represents the
structure of the statement. The IC is passed to semantic analysis to determine the meaning of the
statement.
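For example, a hedged C sketch of such an IC node (node kinds and field names are assumptions, not from the text):

/* A hedged sketch of an intermediate-code (IC) node for a statement
   tree; node kinds and fields are illustrative only. */
enum node_kind { N_ASSIGN, N_IF, N_OPERATOR, N_LEAF };

struct ic_node {
    enum node_kind kind;
    int token_index;                /* token of a leaf operand */
    struct ic_node *left, *right;   /* subtrees, e.g. LHS and RHS */
};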
Semantic analysis
Semantic analysis of declaration statements differs from the semantic analysis of imperative
statements. The former results in addition of information to the symbol table, e.g. type, length
and dimensionality of variables. The latter identifies the sequence of actions necessary to
implement the meaning of a source statement. In both cases the structure of a source statement
guides the application of the semantic rules.
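A minimal C sketch of such a symbol table entry (field names are illustrative):

/* A sketch of a symbol table entry: declaration statements fill in
   type, length and dimensionality, as described above. */
struct symtab_entry {
    char name[32];
    int  type;            /* e.g. integer or real */
    int  length;          /* size in storage units */
    int  dimensionality;  /* 0 for scalars, 1 for vectors, ... */
};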
Example 1.2 : Consider the statement
The synthesis phase is concerned with the construction of target language statement(s) which
have the same meaning as a source statement. Typically, this consists of two main activities:
· Creation of data structures in the target program, i.e. memory allocation.
· Generation of target code, i.e. code generation.
From the preceding discussion it is clear that a language processor consists of two distinct
phases: the analysis phase and the synthesis phase. The overall process is too complex to be
performed in a single step, both from a logical and from an implementation point of view. For
this reason, it is customary to partition the compilation process into a series of subprocesses
called phases.
Phase:
A phase is a logically cohesive operation that takes as input one representation of the source
program and produces as output another representation.
Pass: The portions of one or more phases are combined into a module called a pass. A pass reads
the source program or output of another pass, makes the transformations specified by its phases
and writes the output to an intermediate file, which may then be read by a subsequent pass.
The language processor performs certain processing more than once. In pass I, it analyses the
source program to note the type information. In pass II, it once again analyses the source program
to generate target code using the type information noted in pass I. This can be avoided using an
intermediate representation of the source program.
The first pass performs analysis of the source program, and reflects its results in the intermediate
representation. The second pass reads and analyses the IR, instead of the source program, to
perform synthesis of the target program. This avoids repeated processing of the source program.
The first pass is concerned exclusively with source language issues. Hence it is called the front
end of the language processor. The second pass is concerned with program synthesis for a
specific target language. Hence it is called the back end of the language processor.
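A hedged C skeleton of this two-pass organization (all names are assumptions):

/* Illustrative two-pass skeleton: the front end builds an IR, the
   back end consumes it.  All names are assumptions. */
struct ir { int dummy; };                /* stand-in for a real IR */

static struct ir front_end(const char *src) {  /* pass I: analysis */
    struct ir rep = { 0 };
    (void)src;   /* a real front end would analyse the source here */
    return rep;
}

static void back_end(struct ir rep) {          /* pass II: synthesis */
    (void)rep;   /* a real back end would emit target code here */
}

void translate(const char *src) {
    struct ir rep = front_end(src);   /* source is processed only once */
    back_end(rep);                    /* target synthesized from the IR */
}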
The IR should also offer processing efficiency: efficient algorithms must exist for constructing and analysing the IR.
There are two LPDTs widely used in practice: the lexical analyzer generator LEX and the
parser generator YACC. The input to these tools is a specification of the lexical and
syntactic constructs of L, and the semantic actions to be performed on recognizing the
constructs.
A compiler or interpreter for a programming language is often decomposed into two parts:
1. Read the source program and discover its structure.
2. Process this structure, e.g. to generate the target program.
Lex and Yacc can generate program fragments that solve the first task. The task of
discovering the source structure is again decomposed into subtasks:
1. Split the source file into tokens (Lex).
2. Find the hierarchical structure of the program (Yacc).
Lex helps write programs whose control flow is directed by instances of regular expressions in
the input stream. It is well suited for editor-script type transformations and for segmenting
input in preparation for a parsing routine.
Lex source is a table of regular expressions and corresponding program fragments. The table is
translated to a program which reads an input stream, copying it to an output stream and
partitioning the input into strings which match the given expressions. As each such string is
recognized the corresponding program fragment is executed. The recognition of the
expressions is performed by a deterministic finite automaton generated by Lex. The program
fragments written by the user are executed in the order in which the corresponding regular
expressions occur in the input stream.
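A minimal Lex specification in the style just described might look as follows (the rules and actions are illustrative, not from the text):

%{
#include <stdio.h>
/* Illustrative Lex source: copies its input to its output,
   tagging integer constants.  The rules are assumptions made
   for this sketch. */
%}
%%
[0-9]+   { printf("NUM(%s)", yytext);  /* tag integer constants */ }
.|\n     { ECHO;                       /* copy everything else  */ }
%%
int yywrap(void) { return 1; }   /* no further input files */
int main(void) { yylex(); return 0; }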
2.) Define the following:
A.) Addressing modes for CISC (Motorola and Intel)
· Register to Register
· Register to Memory
· Memory to Memory
· Absolute address – the address (in either the "short" 16-bit form or "long" 32-bit form) of
the operand immediately follows the instruction
· Program Counter relative with index and displacement – The instruction contains both the
identity of an "index register" and a trailing displacement value. The contents of the index
register, the displacement value, and the program counter are added together to get the final
address.
· Address register indirect – An address register contains the address of the operand.
· Address register relative with index and displacement — The instruction contains both the
identity of an "index register" and a trailing displacement value. The contents of the index
register, the displacement value, and the specified address register are added together to get
the final address.
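As a rough illustration, the effective address computation for the indexed modes above can be sketched in C (names are assumptions):

/* Hedged sketch of the indexed modes above; all names are
   illustrative. */
unsigned effective_address(unsigned base_reg, unsigned index_reg,
                           int displacement) {
    /* For "address register relative with index and displacement",
       base_reg holds the address register's contents; for the
       PC-relative variant it holds the program counter. */
    return base_reg + index_reg + displacement;
}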
B.) RISC characteristics
- Hardwired control
LC processing and construction of the symbol table proceed as in two-pass translation. The
problem of forward references is tackled using a process called backpatching. The operand
field of an instruction containing a forward reference is left blank initially. The address of
the forward referenced symbol is put into this field when its definition is encountered.
        START 101
        READ   N              101) + 09 0 113
        MOVER  BREG, ONE      102) + 04 2 115
        MOVEM  BREG, TERM     103) + 05 2 116
AGAIN   MULT   BREG, TERM     104) + 03 2 116
        MOVER  CREG, TERM     105) + 04 3 116
        ADD    CREG, ONE      106) + 01 3 115
        MOVEM  CREG, TERM     107) + 05 3 116
        COMP   CREG, N        108) + 06 3 113
        BC     LE, AGAIN      109) + 07 2 104
        MOVEM  BREG, RESULT   110) + 05 2 114
        PRINT  RESULT         111) + 10 0 114
        STOP                  112) + 00 0 000
N       DS     1              113)
RESULT  DS     1              114)
ONE     DC     '1'            115) + 00 0 001
TERM    DS     1              116)
        END
Fig. 1.7
In the above program (Fig. 1.7), the instruction corresponding to the statement
MOVER BREG, ONE
can be only partially synthesized, since ONE is a forward reference. Hence only the
instruction opcode and the address of BREG will be assembled, to reside in location 102.
The need for inserting the second operand's address at a later stage can be indicated by
adding an entry to the Table of Incomplete Instructions (TII). This entry is a pair
(<instruction address>, <symbol>), e.g. (102, ONE) in this case.
By the time the END statement is processed, the symbol table would contain the addresses
of all symbols defined in the source program and TII would contain information describing
all forward references. The assembler can now process each entry in TII to complete the
concerned instruction. For example, the entry (102, ONE) would be processed by obtaining
the address of ONE from the symbol table and inserting it in the operand address field of the
instruction with assembled address 102. Alternatively, entries in TII can be processed in an
incremental manner. Thus, when definition of some symbol symb is encountered, all
forward references to symb can be processed.
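A hedged C sketch of this TII-based backpatching (the data structure and helper names are assumptions; symtab_lookup and patch_operand stand for the assembler's symbol table and code-patching facilities):

#include <string.h>

#define MAX_TII 100

struct tii_entry {
    int  instr_addr;   /* assembled address of the incomplete instruction */
    char symbol[9];    /* forward-referenced symbol, e.g. "ONE" */
};

static struct tii_entry tii[MAX_TII];
static int tii_count;

int  symtab_lookup(const char *name);      /* assumed helper */
void patch_operand(int instr, int addr);   /* assumed helper */

/* During the pass over the source: record each forward reference. */
void note_forward_ref(int instr_addr, const char *sym) {
    tii[tii_count].instr_addr = instr_addr;
    strncpy(tii[tii_count].symbol, sym, sizeof tii[tii_count].symbol - 1);
    tii_count++;
}

/* While processing END: complete every pending instruction. */
void process_tii(void) {
    for (int i = 0; i < tii_count; i++)
        patch_operand(tii[i].instr_addr, symtab_lookup(tii[i].symbol));
}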
Pass I:
1. Separate the symbol, mnemonic opcode and operand fields.
2. Build the symbol table.
3. Perform LC processing.
4. Construct intermediate representation.
Pass I performs analysis of the source program and synthesis of the intermediate
representation, while Pass II processes the intermediate representation to synthesize the target
program. The design details of assembler passes are discussed after introducing advanced
assembler directives and their influence on LC processing.
3.) Explain the following with respect to Macros and Macro Processors:
A) Macro Definition and Expansion
Definition: macro
A macro name is an abbreviation which stands for some related lines of code. Macros are
useful for the following purposes:
· Simplifying and reducing the amount of repetitive coding.
· Reducing errors caused by repetitive coding.
· Making a program more readable.
A macro consists of a name, a set of formal parameters and a body of code. A use of the
macro name with a set of actual parameters is replaced by the code generated from its body.
This is called macro expansion.
Macros allow a programmer to define pseudo operations, typically operations that are
generally desirable, that are not implemented as part of the processor's instruction set, and
that can be implemented as a sequence of instructions. Each use of a macro generates new
program instructions; the macro thus has the effect of automating the writing of the program.
Macros can be defined and used in many programming languages, like C and C++. Macros
are commonly used in C to define small snippets of code; an example of a macro in C
programming follows.
If the macro has parameters, they are substituted into the macro body during expansion;
thus, a C macro can mimic a C function. The usual reason for doing this is to avoid the
overhead of a function call in simple cases, where the code is lightweight enough that
function call overhead has a significant impact on performance.
For instance,
#define max(a, b) a>b ? a : b
defines the macro max, taking two arguments a and b. This macro may be called like any C
function, using identical syntax. Therefore, after preprocessing,
z = max(x, y);
becomes
z = x>y ? x : y;
C macros are capable of mimicking functions, creating new syntax within some limitations,
as well as expanding into arbitrary text (although the C compiler will require that text to be
valid C source code, or else comments), but they have some limitations as a programming
construct. Macros which mimic functions, for instance, can be called like real functions, but
a macro cannot be passed to another function using a function pointer, since the macro
itself has no address.
Macro Expansion.
A macro call leads to macro expansion. During macro expansion, the macro statement is
replaced by a sequence of assembly statements.
Example
In the above program a macro call, INITZ, is shown in the middle of the figure. Every macro
begins with the MACRO keyword and ends with ENDM (end macro). Whenever a macro is
called, the entire body of the macro is substituted into the program at the point of call; the
result of the macro expansion is shown on the rightmost side of the figure.
Macro calling in high-level programming languages (C programming):
#define max(a, b) a>b ? a : b
int main() {
    int x, y, z;
    x = 4; y = 6;
    z = max(x, y); }
The above program is written using C programming statements. It defines the macro max,
taking two arguments a and b. The macro may be called like any C function, using identical
syntax. Therefore, after preprocessing,
z = max(x, y);
becomes
z = x>y ? x : y;
After macro expansion, the whole code would appear like this.
int main() {
    int x, y, z;
    x = 4; y = 6;
    z = x>y ? x : y; }
B) Conditional Macro Expansion
Example 2:
Conditional assembly means that some sections of the program may be optional, either
included or not in the final program, depending upon specified conditions. A reasonable use
of conditional assembly
would be to combine two versions of a program, one that prints debugging information
during test executions for the developer, another version for production operation that
displays only results of interest for the average user. A program fragment that assembles the
instructions to print the Ax register only if Debug is true is given below. Note that true is
any non-zero value.
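The assembly fragment itself is not reproduced here; a C-preprocessor sketch of the same idea (assuming a Debug macro and a variable ax standing in for the register) would be:

#include <stdio.h>

#define Debug 1   /* set to 0 to drop the diagnostic code */

int main(void) {
    unsigned ax = 0x1234;          /* stand-in for the AX register */
#if Debug
    printf("AX = %04X\n", ax);     /* compiled in only when Debug != 0 */
#endif
    return 0;
}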
Here is a conditional statement in C programming; the following tests the expression
`BUFSIZE == 1020', where `BUFSIZE' must be a macro:
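A representative fragment (the printf body is illustrative):

#if BUFSIZE == 1020
    printf("Large buffers!\n");   /* kept only when BUFSIZE is 1020 */
#endif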
C) Macro Parameters
Macros may have any number of parameters, as long as they fit on one line. Parameter
names are local symbols, which are known within the macro only. Outside the macro they
have no meaning!
Syntax:
<name> MACRO <formal parameter 1>, <formal parameter 2>, ...
<macro body>
ENDM
The macro is called by writing its name, followed by the actual arguments.
During macro expansion, these actual arguments replace the symbols of the corresponding
formal parameters, wherever they are recognized in the macro body. The first argument
replaces the symbol of the first parameter, the second argument replaces the symbol of the
second parameter, and so forth. This is called substitution.
Example 3
MY_SECOND MACRO CONSTANT, REGISTER
MOV A,#CONSTANT
ADD A,REGISTER
ENDM
If the macro is called as
MY_SECOND 42, R5
the lines
MOV A,#42
ADD A,R5
are inserted into the program and assembled. The parameter names CONSTANT and
REGISTER have been replaced by the macro arguments "42" and "R5". The number of
arguments passed to a macro can be less (but not greater) than the number of its formal
parameters. If an argument is omitted, the corresponding formal parameter is replaced by an
empty string. If arguments other than the last ones are to be omitted, their positions can be
marked by commas.
Macro parameters support code reuse, allowing one macro definition to implement multiple
algorithms. In the following, the .DIV macro has a single parameter N. When the macro is
used in the program, the actual parameter used is substituted for the formal parameter
defined in the macro prototype during the macro expansion. Now the same macro, when
expanded, can produce code to divide by any unsigned integer.
Fig. 3.0
Example 4
OPTIONAL MACRO P1, P2, P3, P4, P5, P6, P7, P8
<macro body>
ENDM
If it is called as follows,
OPTIONAL 1,2,,,5,6
the formal parameters P1, P2, P5 and P6 are replaced by the arguments 1, 2, 5 and 6 during
substitution. The parameters P3, P4, P7 and P8 are replaced by a zero length string.
Bootstrapping
Bootstrap loading
The discussions of loading up to this point have all presumed that there’s already an
operating system or at least a program loader resident in the computer to load the program
of interest. The chain of programs being loaded by other programs has to start somewhere,
so the obvious question is how is the first program loaded into the computer?
In modern computers, the first program the computer runs after a hardware reset invariably
is stored in a ROM known as the bootstrap ROM, as in "pulling one's self up by the
bootstraps." When the CPU is powered on or reset, it sets its registers to a known state. On
x86 systems, for example, the reset sequence jumps to the address 16 bytes below the top of
the system’s address space. The bootstrap ROM occupies the top 64K of the address space
and ROM code then starts up the computer. On IBM-compatible x86 systems, the boot
ROM code reads the first block of the floppy disk into memory, or if that fails the first
block of the first hard disk, into memory location zero and jumps to location zero. The
program in block zero in turn loads a slightly larger operating system boot program from a
known place on the disk into memory, and jumps to that program which in turn loads in the
operating system and starts it. (There can be even more steps, e.g., a boot manager that
decides from which disk partition to read the operating system boot program, but the
sequence of increasingly capable loaders remains.)
Why not just load the operating system directly? Because you can’t fit an operating system
loader into 512 bytes. The first level loader typically is only able to load a single-segment
program from a file with a fixed name in the top-level directory of the boot disk. The
operating system loader contains more sophisticated code that can read and interpret a
configuration file, uncompress a compressed operating system executable, address large
amounts of memory (on an x86 the loader usually runs in real mode which means that it’s
tricky to address more than 1MB of memory.) The full operating system can turn on the
virtual memory system, load the drivers it needs, and then proceed to run user-level
programs.
Many Unix systems use a similar bootstrap process to get user-mode programs running.
The kernel creates a process, then stuffs a tiny little program, only a few dozen bytes long,
into that process. The tiny program executes a system call that runs /etc/init, the user mode
initialization program that in turn runs configuration files and starts the daemons and login
programs that a running system needs.
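A hedged C sketch of that tiny first program (the code the kernel actually injects varies; names here are illustrative):

#include <unistd.h>

int main(void) {
    char *argv[] = { "init", (char *)0 };
    char *envp[] = { (char *)0 };
    /* Hand control to the user-mode initialization program. */
    execve("/etc/init", argv, envp);
    return 1;   /* reached only if the exec fails */
}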
None of this matters much to the application level programmer, but it becomes more
interesting if you want to write programs that run on the bare hardware of the machine,
since then you need to arrange to intercept the bootstrap sequence somewhere and run your
program rather than the usual operating system. Some systems make this quite easy (just
stick the name of your program in AUTOEXEC.BAT and reboot Windows 95, for
example), others make it nearly impossible. It also presents opportunities for customized
systems. For example, a single-application system could be built over a Unix kernel by
naming the application /etc/init.
Design of a linker
The relocation requirements of a program are influenced by the addressing structure of the
computer system on which it is to execute. Use of a segmented addressing structure
reduces the relocation requirements of a program.
Example 7.7: Consider the program written in the assembly language of the Intel 8088. The
ASSUME statement declares the segment registers CS and DS to be available for memory
addressing. Hence all memory addressing is performed by using suitable displacements
from their contents. The translation time address of A is 0196. In statement 16, a reference to A
is assembled as a displacement of 196 from the contents of the CS register. This avoids the
use of an absolute address, hence the instruction is not address sensitive. Now no relocation
is needed if segment SAMPLE is to be loaded with address 2000 by a calling program (or
by the OS). The effective operand address would be calculated as <CS>+0196, which is the
correct address 2196. A similar situation exists with the reference to B in statement 17. The
reference to B is assembled as a displacement of 0002 from the contents of the DS register.
Since the DS register would be loaded with the execution time address of DATA_HERE,
the reference to B would be automatically relocated to the correct address.
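The address arithmetic of this example can be checked with a short C sketch (values are hexadecimal, taken from the text):

#include <stdio.h>

int main(void) {
    unsigned cs   = 0x2000;   /* execution-time contents of CS */
    unsigned disp = 0x0196;   /* assembled displacement of A   */
    printf("effective address = %04X\n", cs + disp);   /* prints 2196 */
    return 0;
}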
Though the use of segment registers reduces the relocation requirements, it does not completely
eliminate the need for relocation. Consider statement 14,
MOV AX, DATA_HERE
which loads the segment base of DATA_HERE into the AX register preparatory to its
transfer into the DS register. Since the assembler knows DATA_HERE to be a segment, it
makes provision to load the higher order 16 bits of the address of DATA_HERE into the
AX register. However it does not know the link time address of DATA_HERE, hence it
assembles the MOV instruction in the immediate operand format and puts zeroes in the
operand field. It also makes an entry for this instruction in RELOCTAB so that the linker
would put the appropriate address in the operand field. Inter-segment calls and jumps are
handled in a similar way.
Relocation is somewhat more involved in the case of intra-segment jumps assembled in the
FAR format. For example, consider the following program:
Here the displacement and the segment base of FAR_LAB are to be put in the JMP
instruction itself. The assembler puts the displacement of FAR_LAB in the first two
operand bytes of the instruction , and makes a RELOCTAB entry for the third and fourth
operand bytes which are to hold the segment base address. A declaration like
ADDR_A DW OFFSET A
(which is an 'address constant') does not need any relocation, since the assembler can itself
put the required offset in the bytes. In summary, the only RELOCTAB entries that must
exist for a program using segmented memory addressing are for the bytes that contain a
segment base address.
For linking, however, both the segment base address and the offset of the external symbol must
be computed by the linker. Hence there is no reduction in the linking requirements.
Relocation Algorithm
Algorithm (Program Relocation):
1. program_linked_origin := <link origin> specified in the linker command;
2. For each object module to be linked:
(a) t_origin := translated origin of the object module; OM_size := size of the object module;
(b) relocation_factor := program_linked_origin - t_origin;
(c) Read the machine language program of the module into work_area;
(d) Read the RELOCTAB of the object module;
(e) For each entry in RELOCTAB:
(i) translated_address := address found in the RELOCTAB entry;
(ii) address_in_work_area := address of work_area + translated_address - t_origin;
(iii) Add relocation_factor to the operand address in the word with the address
address_in_work_area.
(f) program_linked_origin := program_linked_origin + OM_size;
The computations performed in the algorithm are along the lines described earlier; the only
new action is the computation of the work area address of the word requiring relocation
(step 2(e)). Step 2(f) increments program_linked_origin so that the next object module would
be granted the next available load address.
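A hedged C sketch of this relocation loop (types and field names are assumptions, not the book's code):

/* Illustrative relocation sketch following the algorithm above.
   All names are assumptions made for this example. */
struct reloctab_entry { unsigned translated_address; };

struct object_module {
    unsigned t_origin;                 /* translated origin */
    unsigned size;                     /* OM_size */
    unsigned *work_area;               /* machine language program */
    struct reloctab_entry *reloctab;   /* relocation table */
    int n_entries;
};

/* Relocate one object module; returns the linked origin for the
   next module (cf. steps 2(e) and 2(f) of the algorithm). */
unsigned relocate(struct object_module *om, unsigned program_linked_origin) {
    unsigned relocation_factor = program_linked_origin - om->t_origin;
    for (int i = 0; i < om->n_entries; i++) {
        unsigned offset = om->reloctab[i].translated_address - om->t_origin;
        om->work_area[offset] += relocation_factor;  /* step 2(e)(iii) */
    }
    return program_linked_origin + om->size;         /* step 2(f) */
}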