MC0073-System Programming-Fall-10
Ans: Language processing activities arise from the need to bridge the gap between the software designer's ideas and
their actual execution on the computer system.
A language processor performs analysis of the source program followed by synthesis of the target program. We refer to
the collection of language processor components engaged in analysing the source program as the analysis phase of the
language processor. Components engaged in synthesizing the target program constitute the synthesis phase.
The specification of the source language forms the basis of source program analysis. The specification consists of three
components:
1. Lexical rules, which govern the formation of valid lexical units in the source language.
2. Syntax rules, which govern the formation of valid statements in the source language.
3. Semantic rules, which associate meaning with valid statements of the language.
Thus analysis of a source statement consists of lexical, syntax and semantic analysis.
Lexical Analysis (Scanning)
Lexical analysis identifies the lexical units in a source statement. It then classifies the units into lexical classes,
e.g. identifiers, constants etc., and enters them into the corresponding tables. Lexical analysis builds a descriptor,
called a token, for each lexical unit.
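The classification step described above can be sketched in Python. This is a minimal illustration, not a method taken from the course text: the three lexical classes, their regular expressions, and the names TOKEN_SPEC and tokenize are assumptions made for the example. Each token is a (class, index) pair, where the index points into the table for that class.

```python
import re

# Hypothetical lexical classes for this sketch: identifiers, integer constants, operators.
TOKEN_SPEC = [
    ("id", r"[A-Za-z_]\w*"),
    ("const", r"\d+"),
    ("op", r"[-+*/=()]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(statement):
    """Return (tokens, tables); each token is a (class, index into class table) pair."""
    tables = {"id": [], "const": [], "op": []}
    tokens = []
    pos = 0
    while pos < len(statement):
        match = MASTER.match(statement, pos)
        if match is None:
            raise ValueError(f"invalid character {statement[pos]!r} at position {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind == "skip":          # white space separates lexical units, nothing more
            continue
        table = tables[kind]
        lexeme = match.group()
        if lexeme not in table:     # enter each distinct lexical unit once
            table.append(lexeme)
        tokens.append((kind, table.index(lexeme)))
    return tokens, tables
```

For the statement a = b + a * 25, both occurrences of a map to the same entry of the identifier table, so the token descriptor is all that later phases need to carry.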
Syntax Analysis (Parsing)
Syntax analysis processes the string of tokens built by lexical analysis to determine the statement class, e.g.
assignment statement, if statement etc. It then builds an intermediate code (IC) which represents the structure of the
statement. The IC is then passed to semantic analysis to determine the meaning of the statement.
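As a rough illustration of how syntax analysis turns tokens into a structured IC, the sketch below parses an assignment statement by recursive descent and returns a nested tuple. The tuple form of the IC, the function name parse_assignment and the small expression grammar are assumptions for the example; the inline tokenizer simply ignores characters it does not recognise.

```python
import re


def parse_assignment(statement):
    """Parse 'id = expr' and return a nested-tuple tree used here as the IC."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[-+*/=()]", statement)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():                       # factor: id | constant | ( expr )
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not re.fullmatch(r"\w+", tok):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    def term():                         # term: factor { (*|/) factor }
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():                         # expr: term { (+|-) term }
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    target = advance()
    if target is None or not re.fullmatch(r"[A-Za-z_]\w*", target) or advance() != "=":
        raise SyntaxError("not an assignment statement")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

The shape of the returned tree records that multiplication binds tighter than addition, which is the structural information semantic analysis relies on.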
Semantic Analysis
Semantic analysis of declaration statements differs from the semantic analysis of imperative statements. The former
results in the addition of information to the symbol table. The latter identifies the sequence of actions necessary to
implement the meaning of the source statement. In both cases the structure of the source statement guides the
application of semantic rules.
The synthesis phase is concerned with the construction of target language statements which have the same meaning as
the source statement. Typically, this consists of two main activities:
The task of discovering the source program again is decomposed into subtasks:
Lex helps write programs whose control flow is directed by instances of regular expressions in the input stream. It is
well suited for editor-script type transformations and for segmenting input in preparation for a parsing routine.
Lex is a program generator designed for lexical processing of character input streams. It accepts a high-level, problem
oriented specification for character string matching, and produces a program in a general purpose language which
recognizes regular expressions. The regular expressions are specified by the user in the source specifications given to
Lex. The Lex written code recognizes these expressions in an input stream and partitions the input stream into strings
matching the expressions. At the boundaries between strings program sections provided by the user are executed. The
Lex source file associates the regular expressions and the program fragments. As each expression appears in the input to
the program written by Lex, the corresponding fragment is executed.
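The pairing of regular expressions with program fragments can be imitated in a few lines of Python. This is only a sketch of the idea, not how Lex itself is implemented: Lex compiles the rules into a table-driven automaton, whereas the hypothetical make_scanner below tries each rule in turn. It does follow Lex in preferring the longest match, breaking ties in favour of the earlier rule, and copying unmatched characters through unchanged.

```python
import re


def make_scanner(rules):
    """rules: list of (regex, action) pairs. Returns a function that scans a string."""
    compiled = [(re.compile(pattern), action) for pattern, action in rules]

    def scan(text):
        results = []
        pos = 0
        while pos < len(text):
            # Prefer the longest match; on a tie the earlier rule wins.
            best, best_action = None, None
            for regex, action in compiled:
                m = regex.match(text, pos)
                if m and m.end() > pos and (best is None or m.end() > best.end()):
                    best, best_action = m, action
            if best is None:
                results.append(text[pos])   # default rule: copy the character through
                pos += 1
                continue
            out = best_action(best.group())  # run the user's fragment for this rule
            if out is not None:
                results.append(out)
            pos = best.end()
        return results

    return scan


# Example rule set (hypothetical): numbers, lower-case words, and discarded white space.
scan_words = make_scanner([
    (r"[0-9]+", lambda s: ("NUMBER", int(s))),
    (r"[a-z]+", lambda s: ("WORD", s)),
    (r"[ \t]+", lambda s: None),
])
```

Calling scan_words("abc 42") runs the WORD fragment on abc and the NUMBER fragment on 42, mirroring the way each Lex rule fires as its expression appears in the input.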
The user supplies the additional code beyond expression matching needed to complete his tasks, possibly including code
written by other generators. The program that recognizes the expressions is generated in the general purpose
programming language employed for the user's program fragments. Thus, a high level expression language is provided
to write the string expressions to be matched while the user's freedom to write actions is unimpaired. This avoids
forcing the user who wishes to use a string manipulation language for input analysis to write processing programs in the
same and often inappropriate string handling language.
Yacc provides a general tool for imposing structure on the input to a computer program. The Yacc user prepares a
specification of the input process; this includes rules describing the input structure, code to be invoked when these rules
are recognized, and a low-level routine to do the basic input. Yacc then generates a function to control the input
process. This function, called a parser, calls the user-supplied low-level input routine (the "lexical analyzer") to pick up
the basic items (called tokens) from the input stream. These tokens are organized according to the input structure rules,
called "grammar rules"; when one of these rules has been recognized, the user code supplied for this rule, an action,
is invoked; actions have the ability to return values and make use of the values of other actions.
The input being read may not conform to the specifications. These input errors are detected as early as is theoretically
possible with a left-to-right scan; thus, not only is the chance of reading and computing with bad input data substantially
reduced, but the bad data can usually be quickly found. Error handling, provided as part of the input specifications,
permits the reentry of bad data, or the continuation of the input process after skipping over the bad data.
In some cases, Yacc fails to produce a parser when given a set of specifications. For example, the specifications may be
self contradictory, or they may require a more powerful recognition mechanism than that available to Yacc. The former
cases represent design errors; the latter cases can often be corrected by making the lexical analyzer more powerful, or
by rewriting some of the grammar rules. While Yacc cannot handle all possible specifications, its power compares
favorably with similar systems; moreover, the constructions which are difficult for Yacc to handle are also frequently
difficult for human beings to handle. Some users have reported that the discipline of formulating valid Yacc
specifications for their input revealed errors of conception or design early in the program development.
2. Define the following:
Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The
various addressing modes that are defined in a given instruction set architecture define how machine language
instructions in that architecture identify the operand (or operands) of each instruction. An addressing mode specifies
how to calculate the effective memory address of an operand by using information held in registers and/or constants
contained within a machine instruction or elsewhere.
A CISC (Complex Instruction Set Computer) is an instruction set architecture (ISA) for a microprocessor in which a set
of low-level operations, such as loading from memory, performing an arithmetic or logical operation, and storing the
result in memory, can be carried out by a single instruction.
CISC uses complex addressing modes so that operations involving data structures, memory allocation, memory
manipulation, array accesses etc. can be combined into a single instruction, in the manner of a high-level language,
resulting in compact code and fewer accesses to main memory.
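ADDRESS REGISTER DIRECT
Assembler Syntax: An
ADDRESS REGISTER INDIRECT
This addressing mode specifies an operand in memory whose address is held in one of the address registers.
EA = (An)
ADDRESS REGISTER INDIRECT WITH POSTINCREMENT
An = An + SIZE (applied after the operand has been accessed)
ADDRESS REGISTER INDIRECT WITH DISPLACEMENT
EA = (An) + d16
Assembler Syntax: (d16, An)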
ADDRESS REGISTER INDIRECT WITH INDEX (8-BIT)
This addressing mode specifies an operand in memory whose address is the sum of the contents of one of the address
registers, the value in an index register, and the sign-extended 8-bit displacement specified as part of the instruction.
EA = (An) + (Xn) + d8
Assembler Syntax: (d8, An, Xn.SIZE)
IMMEDIATE DATA
This addressing mode specifies the operand itself rather than its address: the data follows the opcode in the instruction
and is stored high-order byte first. The immediate data size is Byte, Word or Long.
PROGRAM COUNTER WITH DISPLACEMENT
EA = (PC) + d16
Assembler Syntax: (d16, PC)
PROGRAM COUNTER WITH INDEX
This addressing mode extends the program counter relative mode to include an index and offset value. The effective
address of the operand is the sum of the program counter (the address of the extension word), a sign-extended 8-bit
displacement, and the contents of an index register. This mode is useful for accessing lists or tables.
EA = (PC) + (Xn) + d8
Assembler Syntax: (d8, PC, Xn)
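The EA formulas in this section can be checked with a small calculator. The sketch below is an illustration under stated assumptions: the function names are invented for the example, addresses are treated as 32-bit values, the 16-bit displacement is sign-extended in the same way as the 8-bit one, and the index register value is passed in already sized.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def ea_indirect(an):
    """EA = (An)"""
    return an & MASK32


def ea_displacement(base, d16):
    """EA = (An) + d16, or (PC) + d16 when base is the program counter."""
    return (base + sign_extend(d16, 16)) & MASK32


def ea_index(base, xn, d8):
    """EA = (An) + (Xn) + d8, or (PC) + (Xn) + d8 when base is the program counter."""
    return (base + xn + sign_extend(d8, 8)) & MASK32
```

For instance, with An = 0x2000, Xn = 0x10 and d8 = 0xFE (that is, -2), ea_index gives 0x200E, which shows why the displacement must be sign-extended before the addition.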
IMPLICIT REFERENCE
Most 8086 instructions can operate on the 8086's general purpose register set. By specifying the name of the register as
an operand to the instruction, you may access the contents of that register. Consider the 8086 mov (move) instruction:
mov destination, source
This instruction copies the data from the source operand to the destination operand. The 8-bit and 16-bit registers are
certainly valid operands for this instruction. The only restriction is that both operands must be the same size. Now let's
look at some actual 8086 mov instructions:
The most common addressing mode, and the one that's easiest to understand, is the displacement-only (or direct)
addressing mode. The displacement-only addressing mode consists of a 16 bit constant that specifies the address of the
target location. The instruction mov al,ds:[8088h] loads the al register with a copy of the byte at memory location
8088h. Likewise, the instruction mov ds:[1234h],dl stores the value in the dl register to memory location 1234h:
The displacement-only addressing mode is perfect for accessing simple variables. Of course, you'd probably prefer using
names like "I" or "J" rather than "DS:[1234h]" or "DS:[8088h]". Well, fear not, you'll soon see it's possible to do just that.
As with the x86 [bx] addressing mode, these four addressing modes reference the byte at the offset found in the bx, bp,
si, or di register, respectively. The [bx], [si], and [di] modes use the ds segment by default. The [bp] addressing mode
uses the stack segment (ss) by default.
If bx contains 1000h, then the instruction mov cl,20h[bx] will load cl from memory location ds:1020h. Likewise, if bp
contains 2020h, mov dh,1000h[bp] will load dh from location ss:3020h.
The offsets generated by these addressing modes are the sum of the constant and the specified register. The addressing
modes involving bx, si, and di all use the data segment; the disp[bp] addressing mode uses the stack segment by default.
As with the register indirect addressing modes, you can use the segment override prefixes to specify a different
segment:
The based indexed addressing modes are simply combinations of the register indirect addressing modes. These
addressing modes form the offset by adding together a base register (bx or bp) and an index register (si or di). The
allowable forms for these addressing modes are
mov al, [bx][si]
mov al, [bx][di]
mov al, [bp][si]
mov al, [bp][di]
Suppose that bx contains 1000h and si contains 880h. Then the instruction
mov al,[bx][si]
would load al from location DS:1880h. Likewise, if bp contains 1598h and di contains 1004h, mov ax,[bp+di] will load
the 16 bits in ax from locations SS:259Ch and SS:259Dh.
These addressing modes are a slight modification of the base/indexed addressing modes with the addition of an eight bit
or sixteen bit constant. The following are some examples of these addressing modes:
mov al, disp[bx][si]
mov al, disp[bx+di]
mov al, [bp+si+disp]
mov al, [bp][di][disp]
Substituting di for si in the forms above produces the [bx+di+disp] and [bp+di+disp] addressing modes.
Suppose bp contains 1000h, bx contains 2000h, si contains 120h, and di contains 5. Then mov al,10h[bx+si] loads al from
address DS:2130h; mov ch,125h[bp+di] loads ch from location SS:112Ah; and mov bx,cs:2[bx][di] loads bx from location
CS:2007h.
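The worked offsets in this section all follow one rule: add the base register, the index register and the displacement, and keep the low 16 bits. A minimal Python sketch of that arithmetic, with the hypothetical helper names effective_offset and default_segment, lets the figures quoted above be rechecked.

```python
def effective_offset(base=0, index=0, disp=0):
    """16-bit offset: (base + index + displacement) modulo 64K."""
    return (base + index + disp) & 0xFFFF


def default_segment(base_register=None):
    """Modes based on bp default to ss; all other modes default to ds."""
    return "ss" if base_register == "bp" else "ds"


# The examples from the text, as (description, computed offset) pairs.
EXAMPLES = [
    ("mov cl,20h[bx], bx=1000h", effective_offset(base=0x1000, disp=0x20)),
    ("mov dh,1000h[bp], bp=2020h", effective_offset(base=0x2020, disp=0x1000)),
    ("mov al,[bx][si], bx=1000h si=880h", effective_offset(base=0x1000, index=0x880)),
    ("mov ax,[bp+di], bp=1598h di=1004h", effective_offset(base=0x1598, index=0x1004)),
    ("mov al,10h[bx+si], bx=2000h si=120h", effective_offset(0x2000, 0x120, 0x10)),
    ("mov ch,125h[bp+di], bp=1000h di=5", effective_offset(0x1000, 0x5, 0x125)),
    ("mov bx,cs:2[bx][di], bx=2000h di=5", effective_offset(0x2000, 0x5, 0x2)),
]

for description, offset in EXAMPLES:
    print(f"{description}: offset {offset:04X}h")
```

A segment override such as cs: changes only which segment register is used, not the offset computed here.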
Ans:
        START   101
        READ    N               101
        MOVER   BREG, ONE       102
        MOVEM   BREG, TERM      103
        END
In the above program, the statement ‘MOVER BREG, ONE’ contains a forward reference to ONE. Hence only the
instruction opcode and the address of BREG can be assembled into location 102. The need to insert the second
operand's address at a later stage is recorded by adding an entry to the Table of Incomplete Instructions (TII).
By the time the END statement is processed, the symbol table contains the addresses of all symbols defined in the
source program and the TII contains information describing all forward references. The assembler can now process
each entry in the TII to complete the instruction concerned.
For example, the entry for ‘MOVER BREG, ONE’ is processed by obtaining the address of ONE from the symbol table
and inserting it in the operand address field of the instruction assembled at address 102. Alternatively, entries in the TII
can be processed incrementally: when the definition of some symbol symb is encountered, all forward references to
symb can be processed.
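The TII mechanism can be sketched as a one-pass assembler that leaves the operand field empty for a forward reference and fills it in when END is reached. The code is an illustration under assumptions: the statement format (label, opcode, operands), the function name assemble, the rule that every statement occupies one word, and the three storage statements defining N, ONE and TERM are all added for the example, since the listing above does not show where those symbols are defined.

```python
def assemble(lines):
    """One-pass assembly of (label, opcode, operands) triples with backpatching."""
    symtab = {}   # symbol -> address
    tii = []      # (instruction address, symbol) pairs still to be completed
    code = {}     # address -> [opcode, register, operand address or None]
    lc = 0        # location counter
    for label, opcode, operands in lines:
        if opcode == "START":
            lc = int(operands[0])
            continue
        if opcode == "END":
            break
        if label:
            symtab[label] = lc
        if opcode not in ("DS", "DC"):      # an instruction, not a storage statement
            register = operands[0] if len(operands) == 2 else None
            symbol = operands[-1]
            address = symtab.get(symbol)
            if address is None:
                tii.append((lc, symbol))    # forward reference: complete it later
            code[lc] = [opcode, register, address]
        lc += 1                             # assumption: every statement takes one word
    for address, symbol in tii:             # END reached: process each TII entry
        code[address][2] = symtab[symbol]   # raises KeyError if symbol was never defined
    return code, symtab, tii


# The program from the text, plus hypothetical definitions of N, ONE and TERM.
PROGRAM = [
    (None, "START", ["101"]),
    (None, "READ", ["N"]),
    (None, "MOVER", ["BREG", "ONE"]),
    (None, "MOVEM", ["BREG", "TERM"]),
    ("N", "DS", ["1"]),
    ("ONE", "DC", ["1"]),
    ("TERM", "DS", ["1"]),
    (None, "END", []),
]
```

With these assumed definitions, ONE lands at address 105, so the TII entry (102, ONE) causes 105 to be written into the operand field of the instruction at 102.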
Pass 1 performs analysis of the source program and synthesis of the intermediate representation while pass 2 processes
the intermediate representation to synthesize the target program.
A macro is a unit of specification for program generation through expansion. A macro represents a commonly used group
of statements in the source programming language. The macro processor replaces each macro instruction with the
corresponding group of source language statements. This is called macro expansion.
A macro name is an abbreviation which stands for some related lines of code. Macros are useful for the following
purposes:
To simplify and reduce the amount of repetitive coding
To reduce errors caused by repetitive coding
To make an assembly program more readable
A macro consists of a name, a set of formal parameters and a body of code. The use of macro name with set of actual
parameters is replaced by some code generated by its body. This is called macro expansion.
Macros allow a programmer to define pseudo operations: typically operations that are generally desirable, are not
implemented as part of the processor's instruction set, and can be implemented as a sequence of instructions. Each use
of a macro generates new program instructions.
If the macro has parameters, they are substituted into the macro body. The usual reason for using a macro rather than a
function is to avoid the overhead of a function call in simple cases where the code is lightweight enough that the call
overhead has a significant impact on performance.
Macros are similar to functions in that they can take arguments and in that each is a call to a lengthier set of
instructions. Unlike functions, macros are replaced by the actual instructions they represent when the program is
prepared for execution, once for every use. Function instructions are copied into the program only once.
Macro Expansion
A macro call leads to macro expansion. During macro expansion, the macro call statement is replaced by a sequence of
assembly statements.
[Figure: macro definition (left), macro call (middle), expanded code (right)]
Example:
INITZ MACRO
MOV DS,AX
MOV ES,AX
ENDM
In the above example the macro call, i.e. INITZ, is shown in the middle of the figure. The call is expanded when the
program is assembled, not during program execution. Every macro definition begins with a line containing the MACRO
keyword and ends with ENDM. Wherever a macro is called, the entire body is substituted into the program at that point.
The resulting expanded code is shown on the rightmost side of the figure.
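The substitution just described can be sketched as a small Python function that collects MACRO ... ENDM definitions and replaces every later call by the stored body. It is a simplified illustration, not a model of any particular assembler: the name expand_macros is invented, parameters are ignored, and a call is recognised only when the macro name is the first word on the line.

```python
def expand_macros(source_lines):
    """Collect MACRO ... ENDM definitions, then replace each call by the macro body."""
    macros = {}     # macro name -> list of body lines
    output = []
    body = None     # body list of the definition currently being read, if any
    for line in source_lines:
        words = line.split()
        if body is not None:                        # inside a definition
            if words and words[0].upper() == "ENDM":
                body = None
            else:
                body.append(line)
        elif len(words) >= 2 and words[1].upper() == "MACRO":
            body = []
            macros[words[0]] = body                 # start a new definition
        elif words and words[0] in macros:          # a macro call
            output.extend(macros[words[0]])
        else:
            output.append(line)                     # ordinary statement
    return output


SOURCE = [
    "INITZ MACRO",
    "MOV DS,AX",
    "MOV ES,AX",
    "ENDM",
    "MOV AX,1",
    "INITZ",
    "ADD AX,BX",
    "INITZ",
]
```

Expanding SOURCE yields two copies of the body, one per call, which is the difference from a function whose instructions appear in the program only once.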
Conditional macro expansion means that some sections of the program are optional: they are either included in the final
program or left out, depending on specified conditions. A reasonable use of conditional macros is to combine two
versions of a program: one that prints debugging information during test executions for the developer, and a
production version that displays only the results of interest to the average user. For example, a program fragment can
assemble the instructions that print the AX register only if DEBUG is true.
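The effect of conditional assembly can be imitated with a short Python filter. This is a sketch under assumptions: IF, ELSE and ENDIF are used as the conditional directives, the condition is a single symbol name looked up in a dictionary, and PRINT_AX is a hypothetical routine standing in for the code that prints the AX register.

```python
def expand_conditionals(lines, symbols):
    """Keep lines between IF <name> and ENDIF only when symbols[name] is true."""
    output = []
    stack = []      # one include/skip flag per currently open IF
    for line in lines:
        words = line.split()
        keyword = words[0].upper() if words else ""
        if keyword == "IF":
            stack.append(bool(symbols.get(words[1], False)))
        elif keyword == "ELSE":
            stack[-1] = not stack[-1]
        elif keyword == "ENDIF":
            stack.pop()
        elif all(stack):            # emit only if every enclosing condition holds
            output.append(line)
    return output


FRAGMENT = [
    "MOV AX,BX",
    "IF DEBUG",
    "CALL PRINT_AX",    # hypothetical debug routine
    "ENDIF",
    "RET",
]
```

With DEBUG true the CALL line is kept; with DEBUG false the same source produces the production version without it.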
C) Macro Parameters
Macros can have any number of parameters, as long as they fit on one line. Parameter names are local symbols, which
are known within the macro only. Outside the macro they have no meaning.
Syntax
<macro name> MACRO <parameter 1>, ..., <parameter n>
<body line>
ENDM
Valid macro arguments are:
Arbitrary sequences of printable characters, not containing blanks, tabs, commas or semicolons
Quoted strings
Single printable characters, preceded by ‘!’ as escape character
Character sequences enclosed in literal brackets < ... >, which may be arbitrary sequences of valid macro arguments,
blanks, commas and semicolons
Arbitrary sequences of valid macro arguments
Expressions preceded by a ‘%’ character
During macro expansion these actual arguments replace the symbols of the corresponding formal parameters, wherever
they are recognized in the macro body. The first argument replaces the symbol of the first parameter, the second
argument replaces the symbol of the second parameter, and so on.
The number of arguments passed to a macro can be less than the number of formal parameters. If an argument is
omitted, the corresponding formal parameter is replaced by an empty string. If arguments other than the last ones are
omitted, they can be represented by commas.
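Positional substitution, including the empty-string rule for omitted arguments, can be sketched as follows. The function name expand_call and the whole-word matching rule are assumptions for the illustration; a real macro assembler has its own rules for recognising parameter symbols inside a body line.

```python
import re


def expand_call(formals, body, actuals):
    """Substitute actual arguments for formal parameters in the macro body."""
    if len(actuals) > len(formals):
        raise ValueError("more arguments than formal parameters")
    if not formals:
        return list(body)
    # Omitted trailing arguments are padded with empty strings; an omitted
    # argument in the middle is passed explicitly as "" (the bare comma case).
    padded = list(actuals) + [""] * (len(formals) - len(actuals))
    binding = dict(zip(formals, padded))
    # Match formal parameter names as whole words only.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, formals)) + r")\b")
    return [pattern.sub(lambda m: binding[m.group(1)], line) for line in body]
```

For example, expand_call(["DST", "SRC"], ["MOV DST,SRC"], ["AX", "BX"]) produces MOV AX,BX, and the same definition called with different arguments produces a different instruction.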
Macro parameters support code reuse, allowing one macro definition to generate several variants of the same code. In
the following example, the .DIV macro has a single parameter N. When the macro is used in the program, the actual
parameter is substituted for the formal parameter defined in the macro prototype during macro expansion. The same
macro, when expanded, can therefore produce code to divide by any unsigned integer.
Macro Parameters
Div Bx
ENDM