MC0073-System Programming-Fall-10

The document discusses language processing fundamentals and tools. It describes the key stages of language processing as lexical analysis, syntax analysis, and semantic analysis. It also discusses two important language processing development tools - LEX for lexical analysis and YACC for syntax analysis. These tools take language specifications as input and generate programs to perform lexical and syntax analysis.

Uploaded by

Sayantan Indu
Copyright
© Attribution Non-Commercial (BY-NC)

1.

Describe the following with respect to Language Specification:

A) Fundamentals of Language Processing

B) Language Processor development tools

Ans: Language processing activities arise from the need to bridge the gap between the ideas of the software designer
and their actual execution on the computer system.

A) Fundamentals of Language Processing :


Language processing is the analysis of a source program and the synthesis of a target program. We refer to the
collection of language processor components engaged in analysis of the source program as the analysis phase of the
language processor. Components engaged in synthesizing a target program constitute the synthesis phase.

The specification of the source language forms the basis of source program analysis. The specification consists of
three components:

1. Lexical rules, which govern the formation of valid lexical units in the source language.
2. Syntax rules, which govern the formation of valid statements in the source language.
3. Semantic rules, which associate meaning with valid statements of the language.

Thus analysis of source statement consists of lexical, syntax and semantic analysis.

Lexical analysis (Scanning)

Lexical analysis identifies the lexical units in a source statement. It then classifies the units into different lexical classes
e.g. ids, constants etc. and enters them into different tables. Lexical analysis builds a descriptor called token for each
lexical unit.
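
The scanning step described above can be sketched in Python. This is only an illustrative sketch: the lexical classes (ID, CONST, OP), the table layout and the function name tokenize are assumptions made for the example, not part of any particular language processor.

```python
import re

# Hypothetical lexical classes for a tiny language. Order matters because
# the first alternative that matches wins.
TOKEN_SPEC = [
    ("CONST", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(statement):
    """Return (tokens, symbol_table, constant_table) for one source statement."""
    symbols, constants, tokens = [], [], []
    pos = 0
    while pos < len(statement):
        match = MASTER.match(statement, pos)
        if match is None:
            raise ValueError(f"illegal character {statement[pos]!r} at {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # blanks separate lexical units but are not tokens
        if kind == "ID":
            if text not in symbols:
                symbols.append(text)
            tokens.append((kind, symbols.index(text)))  # token = (class, table index)
        elif kind == "CONST":
            if text not in constants:
                constants.append(text)
            tokens.append((kind, constants.index(text)))
        else:
            tokens.append((kind, text))
    return tokens, symbols, constants
```

Each token here is a (class, entry) descriptor: identifiers and constants carry the index of their entry in the corresponding table, while operators carry the operator itself.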

Syntax Analysis (Parsing)

Syntax analysis processes the string of tokens built by lexical analysis to determine the statement class e.g. assignment
statement, if statement etc. It then builds an intermediate code (IC) which represents the structure of the statement.
The IC is then passed to semantic analysis to determine the meaning of the statement.

Semantic Analysis

Semantic analysis of declaration statements differs from the semantic analysis of imperative statements. The former
results in addition of information to the symbol table. The latter identifies the sequence of actions necessary to
implement the meaning of the source statement. In both cases the structure of a source statement guides the
application of semantic rules.

The synthesis phase is concerned with the construction of target language statements which have the same meaning as
the source statement. Typically, this consists of two main activities:

• Creation of data structures in the target program
• Generation of target code

We refer to these activities as memory allocation and code generation respectively.

B) Language Processor development tools


Ans: Two language processor development tools (LPDTs) are widely used in practice: the lexical analyzer generator LEX
and the parser generator YACC. The input to these tools is the specification of the lexical and syntactic constructs of
the language L, and the semantic actions to perform on recognizing the constructs.

The compiler or interpreter of a programming language is often decomposed into two parts:

1. Read the source program and discover its structure.
2. Process this structure.

The task of discovering the structure of the source program is in turn decomposed into subtasks:

1. Split the source file into tokens (Lex)
2. Find the hierarchical structure of the program (Yacc)

Lex - Lexical analyzer generator

Lex helps write programs whose control flow is directed by instances of regular expressions in the input stream. It is
well suited for editor-script type transformations and for segmenting input in preparation for a parsing routine.

Lex is a program generator designed for lexical processing of character input streams. It accepts a high-level, problem
oriented specification for character string matching, and produces a program in a general purpose language which
recognizes regular expressions. The regular expressions are specified by the user in the source specifications given to
Lex. The Lex written code recognizes these expressions in an input stream and partitions the input stream into strings
matching the expressions. At the boundaries between strings program sections provided by the user are executed. The
Lex source file associates the regular expressions and the program fragments. As each expression appears in the input to
the program written by Lex, the corresponding fragment is executed.

The user supplies the additional code beyond expression matching needed to complete his tasks, possibly including code
written by other generators. The program that recognizes the expressions is generated in the general purpose
programming language employed for the user's program fragments. Thus, a high level expression language is provided
to write the string expressions to be matched while the user's freedom to write actions is unimpaired. This avoids
forcing the user who wishes to use a string manipulation language for input analysis to write processing programs in the
same and often inappropriate string handling language.

Yacc (Yet Another Compiler-Compiler)

Yacc provides a general tool for imposing structure on the input to a computer program. The Yacc user prepares a
specification of the input process; this includes rules describing the input structure, code to be invoked when these rules
are recognized, and a low-level routine to do the basic input. Yacc then generates a function to control the input
process. This function, called a parser , calls the user-supplied low-level input routine (the "lexical analyzer" ) to pick up
the basic items (called tokens ) from the input stream. These tokens are organized according to the input structure rules,
called "grammar rules" ; when one of these rules has been recognized, then user code supplied for this rule, an action ,
is invoked; actions have the ability to return values and make use of the values of other actions.
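
The interplay of parser, lexical analyzer, grammar rules and actions can be illustrated with a small Python sketch. Note the substitution: Yacc generates a table-driven bottom-up parser, whereas this sketch is a hand-written recursive-descent parser, used here only because it is short. The grammar, the token names and the functions lexer and parse are assumptions made for the example.

```python
import re


def lexer(text):
    """Low-level input routine: yields (kind, value) tokens, then an end marker."""
    for number, op in re.findall(r"\s*(?:(\d+)|(.))", text):
        if number:
            yield ("NUM", int(number))
        elif not op.isspace():
            yield (op, op)
    yield ("EOF", None)


class Parser:
    # Hypothetical grammar rules:
    #   expr   : term   { '+' term }
    #   term   : factor { '*' factor }
    #   factor : NUM | '(' expr ')'
    def __init__(self, text):
        self.tokens = lexer(text)
        self.current = next(self.tokens)

    def advance(self):
        # Ask the lexical analyzer for the next token.
        self.current = next(self.tokens, ("EOF", None))

    def expect(self, kind):
        # Input errors are reported at the first token that cannot be accepted.
        if self.current[0] != kind:
            raise SyntaxError(f"expected {kind}, found {self.current[0]}")
        value = self.current[1]
        self.advance()
        return value

    def expr(self):
        value = self.term()
        while self.current[0] == "+":
            self.advance()
            value = value + self.term()  # action: combine values of sub-rules
        return value

    def term(self):
        value = self.factor()
        while self.current[0] == "*":
            self.advance()
            value = value * self.factor()  # action for term '*' factor
        return value

    def factor(self):
        if self.current[0] == "(":
            self.advance()
            value = self.expr()
            self.expect(")")
            return value
        return self.expect("NUM")  # action: return the token's value


def parse(text):
    parser = Parser(text)
    result = parser.expr()
    parser.expect("EOF")
    return result
```

Here each rule's action returns a value and uses the values returned by the actions of the rules it contains, which is the behaviour the paragraph above attributes to Yacc actions.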

The input being read may not conform to the specifications. These input errors are detected as early as is theoretically
possible with a left-to-right scan; thus, not only is the chance of reading and computing with bad input data substantially
reduced, but the bad data can usually be quickly found. Error handling, provided as part of the input specifications,
permits the reentry of bad data, or the continuation of the input process after skipping over the bad data.

In some cases, Yacc fails to produce a parser when given a set of specifications. For example, the specifications may be
self contradictory, or they may require a more powerful recognition mechanism than that available to Yacc. The former
cases represent design errors; the latter cases can often be corrected by making the lexical analyzer more powerful, or
by rewriting some of the grammar rules. While Yacc cannot handle all possible specifications, its power compares
favorably with similar systems; moreover, the constructions which are difficult for Yacc to handle are also frequently
difficult for human beings to handle. Some users have reported that the discipline of formulating valid Yacc
specifications for their input revealed errors of conception or design early in the program development.

2. Define the following:

A) Addressing modes for CISC (Motorola and Intel)

B) Addressing modes for RISC Machines.

Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The
various addressing modes that are defined in a given instruction set architecture define how machine language
instructions in that architecture identify the operand (or operands) of each instruction. An addressing mode specifies
how to calculate the effective memory address of an operand by using information held in registers and/or constants
contained within a machine instruction or elsewhere.

A) Addressing modes for CISC (Motorola and Intel)

A CISC (Complex Instruction Set Computer) is an instruction set architecture (ISA) for a microprocessor in which a
sequence of low-level operations, such as loading from memory, an arithmetic or logical operation, and storing the result
in memory, can be handled by a single instruction.

A CISC uses complex addressing modes so that operations involving data structures, memory manipulation, array
accesses etc. can be combined into a single instruction, in a style close to that of a high level programming language.
This results in compact code and fewer instruction fetches from main memory.

The 68000 allows a wide variety of addressing modes:

Address/Data Register Direct

These addressing modes specify the operand as one of sixteen general purpose registers or one of six control registers
(SR, VBR, SFC, DFC, CACR, CAAR).

• DATA REGISTER DIRECT

The operand is found in the data register specified by the instruction.

EA = Dn (effective address is found in a data register)
Assembler Syntax: Dn

• ADDRESS REGISTER DIRECT

The operand is found in the address register specified by the instruction.

EA = An (effective address is found in an address register)
Assembler Syntax: An

Address Register Indirect

This addressing mode specifies the operand in memory, the address of which is specified by one of the address registers.

The operand is found in the address specified by an address register.

EA = (An)
Assembler Syntax: (An)

ADDRESS REGISTER INDIRECT WITH POSTINCREMENT

This addressing mode specifies the operand in memory, the address of which is specified by one of the address registers.
After the operand is used, the value in the address register is incremented according to the size of the operand.

EA = (An)
An = An + SIZE
Assembler Syntax: (An)+

ADDRESS REGISTER INDIRECT WITH PREDECREMENT

This addressing mode specifies the operand in memory, the address of which is specified by one of the address registers.
Before the operand is used, the value in the address register is decremented according to the size of the operand.

The operand is found in the address specified by the address register after it has been decremented.

An = An - SIZE
EA = (An)
Assembler Syntax: -(An)
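
The difference between the postincrement and predecrement modes can be shown with a short Python sketch. It is a simplified model: the register file is a plain dictionary, the function names are made up for the example, and the special case of byte accesses through the stack pointer A7 (which keeps the stack word-aligned) is ignored.

```python
SIZE = {"B": 1, "W": 2, "L": 4}  # operand sizes in bytes


def postincrement(regs, an, size):
    """(An)+ : use An as the effective address, then add the operand size."""
    ea = regs[an]
    regs[an] += SIZE[size]
    return ea


def predecrement(regs, an, size):
    """-(An) : subtract the operand size from An, then use it as the address."""
    regs[an] -= SIZE[size]
    return regs[an]
```

With A0 = 1000h, a word access through (A0)+ uses address 1000h and leaves A0 = 1002h; a following word access through -(A0) brings A0 back to 1000h and uses that address. This symmetry is what makes the two modes convenient for stacks.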

ADDRESS REGISTER INDIRECT WITH DISPLACEMENT

This addressing mode specifies the operand in memory, the address of which is specified by one of the address registers
plus the sign extended 16 bit displacement specified as part of the instruction.

EA = (An) + d16
Assembler Syntax: (d16, An)

ADDRESS REGISTER INDIRECT WITH INDEX (8-BIT)

This addressing mode specifies the operand in memory, the address of which is specified by one of the address registers
plus the value in an index register, plus the sign extended 8 bit displacement specified as part of the instruction.

EA = (An) + (Xn) + d8
Assembler Syntax: (d8, An, Xn.SIZE)
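
The two effective address formulas above, including the sign extension of the displacement, can be checked with a small Python sketch. The function names are chosen for the example, and the index register value is assumed to be already extended to 32 bits.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def ea_displacement(an, d16):
    """EA = (An) + d16, with d16 sign extended, truncated to 32 bits."""
    return (an + sign_extend(d16, 16)) & 0xFFFFFFFF


def ea_index(an, xn, d8):
    """EA = (An) + (Xn) + d8, with d8 sign extended, truncated to 32 bits."""
    return (an + xn + sign_extend(d8, 8)) & 0xFFFFFFFF
```

For example, with An = 2000h a displacement word of FFFEh is treated as -2, so the effective address is 1FFEh rather than 11FFEh.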

ABSOLUTE ADDRESSING MODES

The address of the operand is specified in the extension word(s) as part of the instruction.

• ABSOLUTE SHORT ADDRESS

This addressing mode specifies the address of the operand in memory by one extension word which follows the opcode.
The 16 bit address is sign extended to 32 bits before being used.

• ABSOLUTE LONG ADDRESS

This addressing mode specifies the address of the operand in memory by two extension words which follow the opcode.
The address is specified high order byte first.

IMMEDIATE DATA

This addressing mode specifies the operand itself rather than its address: the data follows the opcode in the extension
word(s), high order byte first. The immediate data size is either Byte, Word or Long.

PROGRAM COUNTER WITH DISPLACEMENT

This addressing mode permits memory to be accessed relative to the current value of the Program Counter. The major
use is for jumps in position independent code, and reading constants in code segments.

EA = (PC) + d16
Assembler Syntax: (d16, PC)

PROGRAM COUNTER WITH INDEX

This addressing mode extends the program counter relative mode to include an index and offset value. The effective
address of the operand is the sum of the address of the extension word, a sign extended 8-bit displacement integer, and
the contents of an index register. This effectively handles lists or tables.

EA = (PC) + (Xn) + d8
Assembler Syntax: (d8, PC, Xn)

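IMPLICIT REFERENCE

Some instructions refer implicitly to a particular register, so no operand field is needed to name it. Two such
registers are:

SR = Status Register (16 bit)
CCR = Condition Code Register (8 bit)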

B) Addressing modes for RISC Machines.

Register Addressing Modes

Most 8086 instructions can operate on the 8086's general purpose register set. By specifying the name of the register as
an operand to the instruction, you may access the contents of that register. Consider the 8086 mov (move) instruction:

mov destination, source

This instruction copies the data from the source operand to the destination operand. The eight and 16 bit registers are
certainly valid operands for this instruction. The only restriction is that both operands must be the same size. Now let's
look at some actual 8086 mov instructions:

mov dl, al ;Copies the value from AL into DL
mov si, dx ;Copies the value from DX into SI

The Displacement Only Addressing Mode

The most common addressing mode, and the one that's easiest to understand, is the displacement-only (or direct)
addressing mode. The displacement-only addressing mode consists of a 16 bit constant that specifies the address of the
target location. The instruction mov al,ds:[8088h] loads the al register with a copy of the byte at memory location
8088h. Likewise, the instruction mov ds:[1234h],dl stores the value in the dl register to memory location 1234h.

The displacement-only addressing mode is perfect for accessing simple variables. Of course, you'd probably prefer using
names like "I" or "J" rather than "DS:[1234h]" or "DS:[8088h]". Well, fear not, you'll soon see it's possible to do just that.

The Register Indirect Addressing Modes


The 80x86 CPUs let you access memory indirectly through a register using the register indirect addressing modes. There
are four forms of this addressing mode on the 8086, best demonstrated by the following instructions:

mov al, [bx]
mov al, [bp]
mov al, [si]
mov al, [di]

As with the x86 [bx] addressing mode, these four addressing modes reference the byte at the offset found in the bx, bp,
si, or di register, respectively. The [bx], [si], and [di] modes use the ds segment by default. The [bp] addressing mode
uses the stack segment (ss) by default.

Indexed Addressing Modes

The indexed addressing modes use the following syntax:

mov al, disp[bx]
mov al, disp[bp]
mov al, disp[si]
mov al, disp[di]

If bx contains 1000h, then the instruction mov cl,20h[bx] will load cl from memory location ds:1020h. Likewise, if bp
contains 2020h, mov dh,1000h[bp] will load dh from location ss:3020h.

The offsets generated by these addressing modes are the sum of the constant and the specified register. The addressing
modes involving bx, si, and di all use the data segment, the disp[bp] addressing mode uses the stack segment by default.
As with the register indirect addressing modes, you can use the segment override prefixes to specify a different
segment:

mov al, ss:disp[bx]
mov al, es:disp[bp]

Based Indexed Addressing Modes

The based indexed addressing modes are simply combinations of the register indirect addressing modes. These
addressing modes form the offset by adding together a base register (bx or bp) and an index register (si or di). The
allowable forms for these addressing modes are

mov al, [bx][si]
mov al, [bx][di]
mov al, [bp][si]
mov al, [bp][di]

Suppose that bx contains 1000h and si contains 880h. Then the instruction

mov al, [bx][si]

would load al from location DS:1880h. Likewise, if bp contains 1598h and di contains 1004h, mov ax,[bp+di] will load the
16 bits in ax from locations SS:259Ch and SS:259Dh.

Based Indexed Plus Displacement Addressing Mode

These addressing modes are a slight modification of the base/indexed addressing modes with the addition of an eight bit
or sixteen bit constant. The following are some examples of these addressing modes:

mov al, disp[bx][si]
mov al, disp[bx+di]
mov al, [bp+si+disp]
mov al, [bp][di][disp]

Substituting di for si in the [bx+si+disp] and [bp+si+disp] forms produces the [bx+di+disp] and [bp+di+disp]
addressing modes.

Suppose bp contains 1000h, bx contains 2000h, si contains 120h, and di contains 5. Then mov al,10h[bx+si] loads al from
address DS:2130h; mov ch,125h[bp+di] loads ch from location SS:112Ah; and mov bx,cs:2[bx][di] loads bx from location
CS:2007h.
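
The offset arithmetic and the default segment rules used in the 8086 examples above can be reproduced with a short Python sketch. The function name and its parameters are chosen for the example; registers are held in a plain dictionary.

```python
def effective_address(regs, base=None, index=None, disp=0, override=None):
    """Return (segment, offset) for an 8086 memory operand.

    base is "bx" or "bp", index is "si" or "di", disp is the constant.
    The offset wraps around at 16 bits.
    """
    offset = disp
    if base:
        offset += regs[base]
    if index:
        offset += regs[index]
    offset &= 0xFFFF
    # bp-based modes default to the stack segment, all others to the data
    # segment, unless a segment override prefix is given.
    segment = override or ("ss" if base == "bp" else "ds")
    return segment, offset
```

With bp = 1000h, bx = 2000h, si = 120h and di = 5, the call effective_address(regs, "bx", "si", 0x10) gives ("ds", 0x2130), matching the first example in the text.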

3. Explain the design of single pass and multi pass assemblers.

Ans:

Single Pass Assemblers:


A single pass assembler scans the program only once and creates the equivalent binary program. The assembler
substitutes all of the symbolic instructions with machine code in one pass.
Location counter (LC) processing and construction of the symbol table proceed as in two pass translation. The problem of
forward references is tackled using a process called backpatching. The operand field of an instruction containing a
forward reference is left blank initially. The address of the forward referenced symbol is put into this field when its
definition is encountered.

        START   101
        READ    N               101
        MOVER   BREG, ONE       102
        MOVEM   BREG, TERM      103
        END

In the above program, the instruction corresponding to the statement ‘MOVER BREG, ONE’ contains a forward reference to
ONE, so it can only be partially synthesized. The instruction opcode and the address of BREG are assembled to reside in
location 102. The need to insert the second operand's address at a later stage is indicated by adding an entry to the
Table of Incomplete Instructions (TII).

By the time the END statement is processed, the symbol table would contain the addresses of all symbols defined in the
source program and TII would contain information describing all forward references. The assembler can now process
each entry in TII to complete the concerned instruction.

For example, the entry for ‘MOVER BREG, ONE’ would be processed by obtaining the address of ‘ONE’ from the symbol table
and inserting it in the operand address field of the instruction with assembled address 102. Alternatively, entries in
TII can be processed in an incremental manner. Thus, when the definition of some symbol symb is encountered, all forward
references to symb can be processed.
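
Backpatching with a Table of Incomplete Instructions can be sketched in Python as follows. This is a simplified model and not the full assembler: register operands are left out, every statement occupies one word, and the DC statements that define N, ONE and TERM in the usage example are assumed, since the listing above does not show them.

```python
def assemble_single_pass(lines, start):
    """lines: (label, mnemonic, operand) triples. Returns (code, symtab)."""
    symtab = {}  # symbol -> address
    tii = []     # Table of Incomplete Instructions: (address, symbol)
    code = {}    # address -> [mnemonic, operand address or constant]
    lc = start   # location counter
    for label, mnemonic, operand in lines:
        if label:
            symtab[label] = lc
        if mnemonic == "DC":
            code[lc] = ["DC", int(operand)]  # one word holding a constant
        elif operand in symtab:
            code[lc] = [mnemonic, symtab[operand]]  # backward reference
        else:
            code[lc] = [mnemonic, None]  # operand field left blank for now
            tii.append((lc, operand))
        lc += 1
    # At END: complete every instruction recorded in TII.
    for address, symbol in tii:
        if symbol not in symtab:
            raise KeyError(f"undefined symbol {symbol}")
        code[address][1] = symtab[symbol]
    return code, symtab
```

For a program starting at 101 with READ N, MOVER ONE and MOVEM TERM followed by the definitions of N, ONE and TERM, the MOVER instruction at 102 is first stored with a blank operand and is completed with the address 105 once ONE has been entered in the symbol table.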

Design of a TWO PASS ASSEMBLER:

Tasks performed by the passes of a two pass assembler are as follows:

Pass 1:
• Separate the symbol, mnemonic opcode and operand fields
• Build the symbol table
• Perform LC processing
• Construct intermediate representation
Pass 2: Synthesize the target program

Pass 1 performs analysis of the source program and synthesis of the intermediate representation while pass 2 processes
the intermediate representation to synthesize the target program.
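
The division of work between the two passes can be sketched in Python. As before this is a simplified model under stated assumptions: statements are written as "LABEL MNEMONIC OPERAND" or "MNEMONIC OPERAND", every statement occupies one word, and the function names are chosen for the example.

```python
def split_fields(line):
    """Separate the symbol (label), mnemonic opcode and operand fields."""
    parts = line.split()
    if len(parts) == 3:
        return parts[0], parts[1], parts[2]
    return None, parts[0], parts[1]


def pass1(source, start):
    """Build the symbol table and an intermediate representation."""
    symtab, intermediate, lc = {}, [], start
    for line in source:
        label, mnemonic, operand = split_fields(line)
        if label:
            symtab[label] = lc
        intermediate.append((lc, mnemonic, operand))
        lc += 1  # LC processing: one word per statement in this sketch
    return symtab, intermediate


def pass2(symtab, intermediate):
    """Synthesize the target program from the intermediate representation."""
    target = {}
    for address, mnemonic, operand in intermediate:
        if mnemonic == "DC":
            target[address] = ("DC", int(operand))
        else:
            target[address] = (mnemonic, symtab[operand])
    return target
```

Because the symbol table is complete before pass 2 begins, forward references need no special treatment here, in contrast to the single pass design.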

4. Explain the following with respect to Macros and Macro Processors:

A) Macro Definition and Expansion

B) Conditional Macro Expansion

C) Macro Parameters

Macro Definition and Expansion

A macro is a unit of specification for program generation through expansion. A macro represents a commonly used group
of statements in the source programming language. The macro processor replaces each macro instruction with the
corresponding group of source language statements. This is called expanding the macros.

A macro name is an abbreviation which stands for some related lines of code. Macros are useful for the following
purposes:
• To simplify and reduce the amount of repetitive coding
• To reduce errors caused by repetitive coding
• To make an assembly program more readable

A macro consists of a name, a set of formal parameters and a body of code. The use of macro name with set of actual
parameters is replaced by some code generated by its body. This is called macro expansion.

Macros allow a programmer to define pseudo operations, typically operations that are generally desirable, are not
implemented as a part of the processor instructions and can be implemented as a sequence of instructions. Each use of a
macro generates new program instructions.

Macro-name MACRO <formal parameters>
<Macro body>
ENDM

If the macro has parameters, they are substituted into the macro body. The usual reason for using a macro is to avoid
the overhead of a function call in simple cases where the code is lightweight enough that the call overhead would have a
significant impact on performance.

Macros are similar to functions in that they can take arguments and in that they stand for a lengthier set of
instructions. Unlike functions, macros are replaced by the actual instructions they represent when the program is
prepared for execution, once for every use. Function instructions are copied into the program only once.

Macro Expansion

A macro call leads to macro expansion. During macro expansion, the macro statement is replaced by a sequence of
assembly statements.

Fig: Macro expansion (the macro definitions and the source code with macro calls are input to the macro processor, which
passes the expanded source code to the assembler)

Example:

Macro Definition          User program      User program (after macro expansion)

INITZ MACRO
MOV AX,@data              INITZ             MOV AX,@data
MOV DS,AX                                   MOV DS,AX
MOV ES,AX                                   MOV ES,AX
ENDM

Prototype (macro name)    Macro call

In the above program the macro call INITZ is shown in the middle of the figure. The call is expanded when the program
is assembled, not during program execution. Every macro definition begins with the MACRO keyword and ends with ENDM.
Wherever a macro is called, the entire body is substituted into the program at that point. The resulting expanded code is
shown on the right most side of the figure.
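
The expansion of a parameterless macro such as INITZ can be sketched in Python. This is a minimal model of a macro processor, with the function name chosen for the example: it handles only definitions of the form "name MACRO ... ENDM" without parameters or nested definitions.

```python
def expand_macros(source):
    """Return the source lines with macro definitions removed and calls expanded."""
    macros = {}     # macro name -> list of body lines
    output = []
    current = None  # name of the macro being defined, if any
    for line in source:
        words = line.split()
        if len(words) >= 2 and words[1].upper() == "MACRO":
            current = words[0]
            macros[current] = []
        elif words and words[0].upper() == "ENDM":
            current = None
        elif current is not None:
            macros[current].append(line.strip())  # collect the macro body
        elif words and words[0] in macros:
            output.extend(macros[words[0]])  # macro call: substitute the body
        else:
            output.append(line.strip())  # ordinary statement: copy unchanged
    return output
```

Applied to the INITZ definition followed by a line containing only INITZ, the function returns the three MOV statements of the body in place of the call.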

Conditional Macro Expansion

Conditional macro expansion means that some sections of the program are optional, either included or not in the final
program, depending upon specified conditions. A reasonable use of conditional expansion would be to combine two
versions of a program: one that prints debugging information during test executions for the developer, and another for
production operation that displays only the results of interest to the average user. A program fragment that assembles
the instructions to print the Ax register only if Debug is true is given below:

Display Ax when Debug is non-zero

Debug equ 1 ; debug is true
.
.
Mul Bx
If Debug ; assemble only when debug is true
Push Bx
.
.
Endif
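
The effect of the conditional directives on the assembled program can be sketched in Python. The sketch is a simplified model with a made-up function name: it understands only "name EQU number", "IF name" and "ENDIF", and treats everything else as an ordinary statement.

```python
def conditional_assemble(source):
    """Return the statements that remain after conditional assembly."""
    symbols = {}  # names defined with EQU (stored in upper case)
    output = []
    stack = []    # one truth value per currently open IF
    for line in source:
        text = line.split(";")[0].strip()  # drop comments
        words = text.split()
        if not words:
            continue
        key = words[0].upper()
        if key == "IF":
            stack.append(symbols.get(words[1].upper(), 0) != 0)
        elif key == "ENDIF":
            stack.pop()
        elif all(stack):  # keep the line only if every open IF is true
            if len(words) == 3 and words[1].upper() == "EQU":
                symbols[key] = int(words[2])
            else:
                output.append(text)
    return output
```

With Debug set to 1 the Push Bx statement between If and Endif is kept; with Debug set to 0 it is dropped and only the unconditional statements remain.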

C) Macro Parameters

Macros can have any number of parameters, as long as they fit on one line. Parameter names are local symbols, which are
known within the macro only. Outside the macro they have no meaning.

Syntax

<macro name> MACRO <parameter 1>, ..., <parameter n>
<body line>
ENDM

Valid macro arguments are

• Arbitrary sequences of printable characters, not containing blanks, tabs, commas or semicolons
• Quoted strings
• Single printable characters, preceded by ‘!’ as escape character
• Character sequences enclosed in literal brackets <...>, which may be arbitrary sequences of valid macro arguments,
blanks, commas and semicolons
• Arbitrary sequences of valid macro arguments
• Expressions preceded by a ‘%’ character

During macro expansion these actual arguments replace the symbols of corresponding formal parameters, wherever
they are recognized in the macro body. The first argument replaces the symbol of the first parameter, second argument
replaces the symbol of second parameter and so on.

The number of arguments passed to a macro can be less than the number of formal parameters. If an argument is
omitted, the corresponding formal parameter is replaced by an empty string. If arguments other than the last one are
omitted, they can be represented by commas.
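
The positional substitution rules, including omitted arguments, can be sketched in Python. The function name, the parameter names P1 to P3 and the comma-separated call format are assumptions made for the example; quoting, the ‘!’ escape, literal brackets and ‘%’ expressions are not handled.

```python
import re


def substitute(params, body, call_args):
    """Replace formal parameters in the body lines by the actual arguments."""
    if not params:
        return list(body)
    args = [a.strip() for a in call_args.split(",")] if call_args.strip() else []
    # Omitted arguments are replaced by empty strings.
    args += [""] * (len(params) - len(args))
    binding = dict(zip(params, args))
    # Match parameter names only as whole words.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, params)) + r")\b")
    return [pattern.sub(lambda m: binding[m.group(1)], line) for line in body]
```

For a macro with parameters P1, P2 and P3, the argument list "AX,,5" binds P1 to AX, P2 to the empty string and P3 to 5, while the argument list "AX" leaves both P2 and P3 empty.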

Macro parameters support code reuse, allowing one macro definition to serve many different uses. In the following
example, the .DIV macro has a single parameter N. When the macro is used in the program, the actual parameter is
substituted for the formal parameter defined in the macro prototype during macro expansion. Thus the same macro, when
expanded, can produce code to divide by any unsigned integer.

Macro Parameters

Instruction needed    Real instructions    Macro definition    Use in program

Div N                 Mov Dx,0             .Div Macro N        Call GetDec$
                      Mov Bx,N             Mov Dx,0            .Div 34
                      Div Bx               Mov Bx,&N           Call PutDec$
                                           Div Bx
                                           ENDM
