Design of Assembler For Any 15 Instructions of 8086 Using C

Assembler is a software which translates an assembly language code into its equivalent machine language code. It transforms basic commands of the computer into binary code, which the computer’s processor may then use to perform basic operations. Language in which these instructions are written is known as assembly language. In this paper, we have made an attempt to design an assembler for instructions of an 8086-microprocessor using C/C++.

Design of assembler for any 15 instructions of 8086 using C/C++

Rutul Gandhi (19BEC033)
Dept. of Electronics and Communication
Institute of Technology, Nirma University

Hitarth Patel (19BEC039)
Dept. of Electronics and Communication
Institute of Technology, Nirma University

Abstract: An assembler is software that translates assembly language code into its equivalent machine language code. It transforms the basic commands of the computer into binary code, which the computer’s processor can then use to perform basic operations. The language in which these instructions are written is known as assembly language. In this paper, we attempt to design an assembler for instructions of an 8086 microprocessor using C/C++.

Index Terms: Assembler, binary code, microprocessor, instructions.

I. INTRODUCTION
Instructions are the words of the language used to command a computer, and the instruction set is the vocabulary of that language. A computer can represent information only through the level of an electric signal, which may be high or low. Given this limitation to two alternatives, instructions in the computer are represented using binary digits, i.e. 1 or 0. Such a representation of instructions as a sequence of bits is known as machine language. To make it understandable by humans, there is an equivalent symbolic notation known as assembly language. There are 8 types of instructions supported by the 8086 microprocessor, some of which are data transfer, arithmetic, bit manipulation, branch and loop instructions. Here, we have implemented some of these instructions and generated their equivalent machine instructions.

II. ASSEMBLY LANGUAGE
An assembly language gives instructions to the processor for performing various tasks. It is specific to each processor. Assembly language is close to machine language but uses readable symbols in place of bit patterns. Since machine language consists of 0s and 1s, it is difficult to write a program using it. Assembly language code can also be generated by a compiler. It makes use of an opcode for each instruction. The opcode primarily provides information about the specific instruction. Opcodes are represented in terms of symbols. This symbolic representation of an opcode is known as a mnemonic, which helps the programmer remember the operation.

A. Function of Assembler
An assembler converts an assembly language program into its corresponding object code. Code written in assembly language is given as input to the assembler, and it gives object code as output. Since mnemonics map directly onto machine instructions, the design of an assembler depends on the machine architecture.

Fig. 1. Assembler

The functions of a basic assembler are:
• Translation of mnemonic language code to its corresponding object code.
• Assignment of machine addresses to corresponding symbolic labels.
The processes performed inside the assembler are:
• Scanning (also known as tokenizing).
• Parsing, which is the process of validating the instructions.
• Creating the symbol table.
• Resolving the forward references.
• Converting into the machine language.
In other words, the design of an assembler involves:
• Converting mnemonic opcodes to their equivalent machine language.
• Converting symbolic operands to their corresponding machine addresses.
• Converting data constants to the corresponding internal machine representation.
• Writing the object program and assembly listing.

B. Types of Assembler
Assemblers are classified on the basis of the number of stages they use to convert assembly level language to machine level language:
• One-Pass Assembler: A one-pass assembler accomplishes the conversion of assembly level code to machine level code in a single step.
• Multi-Pass/Two-Pass Assembler: A multi-pass (two-pass) assembler first processes the assembly level code and stores the values it collects in the opcode table and symbol table. In the next step, machine level code is generated using the opcode table and symbol table.
  – Pass 1
    ∗ Defines the symbol table and opcode table.
    ∗ Keeps track of the location counter.
    ∗ Processes pseudo instructions.
    ∗ Allocates an address to each statement.
    ∗ Saves the addresses allocated to all labels, which are to be used in Pass 2.
  – Pass 2
    ∗ Converts each symbolic opcode into its corresponding numeric opcode.
    ∗ Generates machine code according to the values of symbols and literals.
    ∗ Processes the assembler directives not handled during Pass 1.
    ∗ Writes the object program and assembly listing.

C. Assembler Design
Generating the symbol table and resolving forward references must be taken care of while designing the assembler.
• Symbol Table:
  – Created during Pass 1.
  – All the labels of the instructions are symbols.
• Forward reference:
  – A reference to a symbol that is defined in a later part of the program is called a forward reference.
  – In Pass 1, there is no address value for such a symbol in the symbol table at the point where it is first used.

III. IMPLEMENTATION
The objective is to design an assembler for an 8086 microprocessor using the C language. Various features of C such as file handling, hashing, data structures, pointers, linked lists, arrays and string functions are used. The opcodes being read are stored in a hash table, which maps each key to its corresponding value; the hash index determines the location at which a value is stored. The table is implemented as an array of linked lists. The structure named ‘Opcode’ is the node used for hashing with chaining, and each node contains the instruction, its code and its format. The hash table is built using the functions ‘getHashIndex’, ‘insertAtIndex’ and ‘insertIntoHashMap’. The symbol table is made using a linked list to save space. A function named ‘conBin’ converts a decimal number to binary. Operand codes are generated using the functions ‘getAddressCode’ (which looks up a label in the symbol table), ‘getRegisterCode’ and ‘getConstantCode’. A specific 5-bit code is assigned to each register and address name. The first pass generates the symbol table and the second pass generates the binary codes.

Here, we have created two text files, one containing the input instructions and the other containing the input opcodes. Using file handling, the instructions are read from these files and an output file containing the machine code is created. A file containing the symbol table is also created on running the program. A symbol table is the data structure that a compiler creates and maintains to store information about the occurrence of various entities such as variables, function names, objects and classes; in this assembler it stores the labels and their addresses.

Fig. 2. Input Instructions
Fig. 3. Symbol Table

IV. RESULT AND ANALYSIS
The output files named ‘output_machine_code’ and ‘symbol_table’ are generated on running the program. The first file contains the instruction and its corresponding machine code along with its format. The format of an instruction depends on the type of its operands. The second file contains the address assigned to each label in the given input instructions. A screenshot of the output is attached here for reference. The software used to run the C program was Visual Studio Code. The hash map is printed as output in VS Code to verify all the instructions and their formats.

V. CONCLUSION
In this paper, we have designed an assembler that can detect syntax errors, if any, in the input instructions; in such cases the generated machine code file will be empty. The assembler generates the machine code for the given set of instructions, and changing the instructions changes the machine code accordingly. Thus, this program can generate machine code for any set of instructions provided by the user, as long as their opcodes are listed in the opcode input file.

Fig. 4. Machine Code
ACKNOWLEDGEMENT
We would like to express our gratitude to Prof. Dhaval Shah and Prof. Sachin Gajjar, who gave us the opportunity to carry out research on a topic of our interest and present it in the form of a paper. We would also like to thank them for their guidance and support whenever it was needed. Finally, we would like to thank the authors of the research papers that we referred to and from which we gained relevant information.
REFERENCES
[1] Liu, Yu-Cheng, and Glenn A. Gibson. Microcomputer systems: The 8086/8088 family: Architecture, programming, and design. Prentice-Hall, Inc., 2000.
[2] Carthy, Joe. An introduction to assembly language programming and computer architecture. International Thomson Computer Press, 1995.
[3] 8086 Logical Instructions with Assembly Programming Examples (microcontrollerslab.com).
[4] C-Language and Subroutines (8086) (unb.ca).
[5] Instruction Set of 8086 - javatpoint.
[6] http://eceweb.ucsd.edu/~gert/ece30/CN2.pdf
[7] What is an Assembler? Assembly Language, Types, Differences (toppr.com).
[8] C++ — asm declaration - GeeksforGeeks.
[9] Using Inline Assembly in C/C++ - CodeProject.
APPENDIX
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

struct Opcode{ //This node is used for Hashing using Chaining


char name[10];
char code[35];
char format[5];
struct Opcode *next;
};
struct Symbol{ //Symbol Table is made using Linked List to save space
char name[50];
int add;
struct Symbol *next;
};
typedef struct Opcode Opcode;
typedef struct Symbol Symbol;
Symbol *head=NULL;
Opcode *hash_table[13] = {NULL};
void reverseArray(int arr[], int start, int end){
int temp;
while (start < end){
temp = arr[start];
arr[start] = arr[end];
arr[end] = temp;
start++;
end--;
}
}
int* conBin(int num){ //function to convert in binary
int t;
int i, j;
int *bin;
bin=(int*)malloc(10*sizeof(int));
for(i=0; i<10; i++){
bin[i]=0;
}
i=9;
t = num;
while(t!=0){
bin[i--]= t % 2;
t = t / 2;
}
return bin;
}
char* convertTo5BitBinaryString(int decimal){ //This decimal is between 0 and 31
char *str = (char *)malloc(6*sizeof(char)); //5 digits plus the terminating '\0'
int d[5]={0};
int i=0;
int s=0;
if(str == NULL)
return NULL;
while(decimal>0 && i<5){ //collect the bits, least significant first
d[i]=decimal%2;
i++;
decimal=decimal/2;
}
reverseArray(d,0,4); //most significant bit first
for(s=0;s<5;s++){
str[s] = d[s] + '0';
}
str[5] = '\0'; //terminate so that the result is a valid C string
return str;
}
/*******************************************************************
HASH TABLE IS USED TO STORE THE OPCODES BEING READ
*******************************************************************/
int getHashIndex(char name[]){
int sum=0,i=0;
while(name[i]!='\0')
sum+=name[i++];
return sum%13;
}
void insertAtIndex(Opcode *Node,int index){
if(hash_table[index] == NULL){ //boundary condition
hash_table[index] = Node;
Node->next = NULL;
}
else{
Opcode* temp = hash_table[index];
while(temp->next != NULL){
temp = temp->next;
}
temp->next = Node;
Node->next=NULL;
}
}
void insertIntoHashMap(Opcode *Node){
int index = getHashIndex(Node->name);
insertAtIndex(Node,index);
}
int *getAddressCode(char* temp){
Symbol * t = head;
int * val;
int num = 0; //address 0 is used if the label is not in the symbol table
while(t != NULL){
if(!strcmp(temp,t->name)){
num = t->add;
break;
}
t = t->next;
}
val = conBin(num);
return val;
}
char *getRegisterCode(char *temp){
char *s = NULL; //stays NULL if temp is not a known register, address or port name
if (strcmp(temp,"R0") == 0)
s = "00000";
else if (strcmp(temp,"R1") == 0)
s = "00001";
else if (strcmp(temp,"R2") == 0)
s = "00010";
else if (strcmp(temp,"R3") == 0)
s = "00011";
else if (strcmp(temp,"R4") == 0)
s = "00100";
else if (strcmp(temp,"R5") == 0)
s = "00101";
else if (strcmp(temp,"R6") == 0)
s = "00110";
else if (strcmp(temp,"R7") == 0)
s = "00111";
else if (strcmp(temp,"R8") == 0)
s = "01000";
else if (strcmp(temp,"R9") == 0)
s = "01001";
else if (strcmp(temp,"R10") == 0)
s = "01010";
else if (strcmp(temp,"R11") == 0)
s = "01011";
else if (strcmp(temp,"R12") == 0)
s = "01100";
else if (strcmp(temp,"R13") == 0)
s = "01101";
else if (strcmp(temp,"R14") == 0)
s = "01110";
else if (strcmp(temp,"R15") == 0)
s = "01111";
else if (strcmp(temp,"A1") == 0)
s = "10000";
else if (strcmp(temp,"A2") == 0)
s = "10001";
else if (strcmp(temp,"A3") == 0)
s = "10010";
else if (strcmp(temp,"A4") == 0)
s = "10011";
else if (strcmp(temp,"port0") == 0)
s = "10100";
else if (strcmp(temp,"port1") == 0)
s = "10101";
return s;
}
char *getConstantCode(int temp){
return convertTo5BitBinaryString(temp);
}
struct Opcode* getOpcodeNode(char *op){
Opcode* temp = NULL;
int index = getHashIndex(op); //get hash-index for the opcode in the hash table
if(hash_table[index] == NULL){
printf("Wrong Opcode");
return NULL;
}
temp = hash_table[index];
//walk the chain until the opcode is found or the end of the chain is reached;
//temp is checked for NULL before it is dereferenced
while(temp!=NULL && strcmp(temp->name,op)!=0){
temp = temp->next;
}
if(temp == NULL){
printf("Opcode not found!");
return NULL;
}
return temp;
}
char *getOpcodeFormat(Opcode *temp){
return temp->format;
}
int main(){
FILE *input_opcode;
FILE *output_machine_code;
FILE *input_instructions;
int ilc=0; //Instruction Location Counter
int base = 0;
char c,c2,c3,temp;
char opcode[100];
char machine_code[100];
char format[5];
input_opcode = fopen("input_opcode.txt","r+"); //input_opcode contains a list of opcodes followed by their mac.code and format
if (input_opcode == NULL){
printf("FILE OPENING PROBLEM");
return 1; //nothing can be assembled without the opcode list
}
char test1[50];
//Each line holds an opcode, its machine code and its format; the field widths
//match the sizes of the name, code and format arrays in struct Opcode
while(fscanf(input_opcode,"%9s %34s %4s",opcode,machine_code,format)==3){
//Create node of each string
struct Opcode* Node = (struct Opcode *) malloc(sizeof(Opcode));
if(Node == NULL)
break;
strcpy(Node->name,opcode); //Name of the opcode is fed
strcpy(Node->code,machine_code); //Machine code of the opcode is fed
strcpy(Node->format,format); //Format of the opcode is fed
insertIntoHashMap(Node);
if(fgets(test1, sizeof test1, input_opcode)==NULL) //skip the rest of the line
break;
}
//At this point we have a hash-map of Opcodes
printf("Hash-map Created Successfully!\n");
/*TEST:: PRINTING HASHTABLE with hashcode*/
int i=0;
for(i=0;i<13;i++){
if(hash_table[i]!=NULL){
Opcode* temp = hash_table[i];
while(temp!=NULL){
printf("Instruction: %s\tCODE: %s\tformat: %s \n",temp->name,temp->code,temp->format);
temp = temp->next;
}
}
}
printf("Now reading Opcodes and Converting them to machine codes\n");
input_instructions = fopen("input_instructions.txt","r+");
output_machine_code = fopen("output_machine_code.txt","w+");
char k;
char op[100];
/*****************************************************************************
******************/
/*First pass for generation of symbol table*/
while ( fgets ( op, sizeof op, input_instructions ) != NULL ){ /* read a line
*/
int l=0;
while(op[l+1]!='\0'){
if(op[l]==':'){ //Its a label
Symbol *t;
struct Symbol *temp = (struct Symbol*) malloc(sizeof(Symbol)); //dynamic memory allocation for a node of symbol
int i=0;
for(;i<l;i++)
temp->name[i] = op[i];
temp->name[i] = '\0';
temp->add = ilc + 1 + base;
temp->next = NULL;
if(head == NULL) //boundary condition for implementing symbol table using linked list
head = temp;
else{ //adding new symbol node to existing table of nodes
t = head;
while(t->next!=NULL)
t= t->next;
t->next = temp;
}
//handle label
}
l++;
}
ilc++;
}
fclose(input_instructions);
/*****************************************************************************
******************/
/*Second pass for generation of binary codes*/
input_instructions = fopen("input_instructions.txt","r+");
int * binary;
char test[100];
int count;
do{
k=fscanf(input_instructions,"%99s",op);
if(k!=1) //nothing more to read (end of file or a trailing blank line)
break;
printf("WORD SCANNED IS %s \n",op);
/*check if opcode or label*/
int l=0;
while(op[l+1]!='\0'){
l++;
}
if(op[l]==':'){ //Its a label
printf("Label Found!\n");
fprintf(output_machine_code,"\n");
//handle label
}
else{
char temp[100];
char temp2[100];
char temp3[100];
int temp4;
//handle opcode and print corresponding machine code
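Opcode* current_node = getOpcodeNode(op);
if(current_node == NULL){ //unknown opcode (syntax error): leave the output file empty and stop
fclose(output_machine_code);
output_machine_code = fopen("output_machine_code.txt","w"); //reopening in "w" mode truncates the file
if(output_machine_code != NULL)
fclose(output_machine_code);
return 1;
}
fprintf(output_machine_code,"%s",current_node->code);//print machine code of the opcode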
if (strcmp("z",getOpcodeFormat(current_node))==0){ //ZERO OPERAND INSTRUCTION
fprintf(output_machine_code,"\n");//Do nothing
}
else if(strcmp("r",getOpcodeFormat(current_node))==0){ //ONE OPERAND REGISTER OPERAND INSTRUCTION
k = fscanf(input_instructions,"%s",temp); //read corresponding register code
fprintf(output_machine_code,"%s",getRegisterCode(temp)); //write corresponding register code in binary
fprintf(output_machine_code,"\n");
}
else if(strcmp("a",getOpcodeFormat(current_node))==0){ //ONE OPERAND ADDRESS OPERAND INSTRUCTION
k = fscanf(input_instructions,"%s",temp);
binary = getAddressCode(temp); //write corresponding address in binary
for(count=0;count<10;count++){
fprintf(output_machine_code,"%d",binary[count]);
}
fprintf(output_machine_code,"\n");
}
else if(strcmp("rr",getOpcodeFormat(current_node))==0){ //TWO OPERAND REGISTER REGISTER OPERAND INSTRUCTION
//printf("inside two");
k = fscanf(input_instructions,"%s",temp);
k = fscanf(input_instructions,"%s",temp2);
fprintf(output_machine_code,"%s",getRegisterCode(temp));
fprintf(output_machine_code,"%s",getRegisterCode(temp2));
fprintf(output_machine_code,"\n");
}
else if(strcmp("ri",getOpcodeFormat(current_node))==0){ //TWO OPERAND REGISTER CONSTANT INSTRUCTION
k = fscanf(input_instructions,"%s",temp);
k = fscanf(input_instructions,"%d",&temp4);
fprintf(output_machine_code,"%s",getRegisterCode(temp));
// fprintf(output_machine_code,"%s",getConstantCode(temp4));
binary = conBin(temp4);
for(count=0;count<10;count++){
fprintf(output_machine_code,"%d",binary[count]);
}
fprintf(output_machine_code,"\n");
}
else if(strcmp("rrr",getOpcodeFormat(current_node))==0){ //THREE OPERAND REGISTER-REGISTER-REGISTER INSTRUCTION
k = fscanf(input_instructions,"%s",temp);
k = fscanf(input_instructions,"%s",temp2);
k = fscanf(input_instructions,"%s",temp3);
fprintf(output_machine_code,"%s",getRegisterCode(temp));
fprintf(output_machine_code,"%s",getRegisterCode(temp2));
fprintf(output_machine_code,"%s",getRegisterCode(temp3));
fprintf(output_machine_code,"\n");
}
else if(strcmp("rri",getOpcodeFormat(current_node))==0){ //THREE OPERAND REGISTER-REGISTER-IMMEDIATE INSTRUCTION
k = fscanf(input_instructions,"%s",temp);
k = fscanf(input_instructions,"%s",temp2);
k = fscanf(input_instructions,"%d",&temp4);
fprintf(output_machine_code,"%s",getRegisterCode(temp));
fprintf(output_machine_code,"%s",getRegisterCode(temp2));
binary = conBin(temp4);
for(count=0;count<10;count++){
fprintf(output_machine_code,"%d",binary[count]);
}
fprintf(output_machine_code,"\n");
}
}
}while(fgets(test, sizeof test, input_instructions)!=NULL);
printf("\n\nSymbol Table\n\n");
fclose(input_instructions);
fclose(output_machine_code);
fclose(input_opcode);
/*PRINT SYMBOL TABLE*/
Symbol *p;
p=head;
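FILE *f = fopen("symbol_table.txt","w+");
if(f == NULL){
printf("FILE OPENING PROBLEM");
return 1;
}
while(p!=NULL){
printf("%s :: ",p->name);
fprintf(f,"%s :: ",p->name);
printf("%d\n",p->add);
fprintf(f,"%d\n",p->add);
p = p->next;
}
fclose(f); //flush and close the symbol table file
return 0;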
}
Input Files
Output Files
