Computer Organization Assembler and Simulator CSE112 Project
TEST CASES:
We will release some test cases with the assignment so that you can test your implementations.
During the evaluations, you will be given a superset of these test cases, and you will be graded
on that superset.
DEADLINES:
You will have two deadlines for this assignment:
1. The mid evaluation:
a. By this deadline you must have the assembler ready.
b. You will be tested mostly on the test cases already provided to you with the
assignment.
c. However, we might add some other test cases as well.
d. You will only be evaluated on the assembler. (20%)
2. The final evaluation:
a. By this deadline you must have both the assembler and the simulator
ready (70%).
b. You should also have completed Q3 (10%).
c. You will be evaluated on a much larger set of test cases this time.
d. You will also be evaluated on the bonus question at this stage.
❖ The mid evaluation will be worth 20% of your final assignment grade. The final
evaluation will be worth the remaining 80% of your final assignment grade. The bonus will
be worth 10%, making the total 110%.
GRADING:
● For Q1 and Q2: Grading will be based on the number of test cases that your
program passes.
a. Assembler: The test cases are divided into 3 sets:
i. ErrorGen: These tests are supposed to generate errors
ii. simpleBin: These are simple test cases which are supposed to
generate a binary.
iii. hardGen: These are hard test cases which are supposed to generate
a binary.
b. Simulator: The test cases are divided into 2 sets:
i. simpleBin: These are simple test cases which are supposed to
generate a trace.
ii. hardGen: These are hard test cases which are supposed to generate
a trace.
➔ The TA will grade the errorGen cases manually.
● For Q3:
a. Assembler: The test cases are divided into the following sets:
■ ErrorGen: These tests are supposed to generate errors
■ simpleBin: These are simple test cases which are supposed to generate a
binary.
b. Simulator: The test cases are in the following set:
■ simpleBin: These are simple test cases which are supposed to generate a
trace.
➔ The TA will grade the errorGen cases manually.
For Q4:
● For the bonus question, you need to generate some graphs which you must show to the
TA during the final evaluation.
● The test cases for the bonus question will be provided to you separately well before the
date of the deadline.
● Till then you can make some new test cases to try this on your own.
For Q5:
● For the bonus question, you need a basic command line interface which you must show to
the TA during the final evaluation.
● The prompts or the format of the output (how it is printed) does not matter as long as the
outputs are correct.
● Test cases for the bonus question will not be provided to you; the testing will be done by
your TAs during your demo as you run the command line interface.
● In the meantime, you can make some new test cases to try this on your own.
EVALUATION PROCEDURE:
1. The date for the demo of the mid and final evaluation will be announced in due time.
2. On the day of your demo, a compressed archive of all tests will be shared with you on
Google Classroom. This archive will also include test cases that were not
provided to you beforehand.
3. On the day of evaluation, you must
a. Prove that you are not running code written after the deadline by running “git log
HEAD” which prints the date and time of the commit pointed to by the HEAD. You
must also run “git status” to show that you don’t have any uncommitted changes.
b. Prove the integrity of the tests archive by computing the SHA-256 checksum of the
archive. To compute the checksum, you can run “sha256sum
<path/to/the/archive>”. The TA will then match the checksum to verify the
integrity.
4. Then you can extract the archive and replace the “automatedTesting/tests” directory.
5. Then you need to execute the automated testing infrastructure, which will run all the
tests and finally print your score.
6. The TA will verify the correctness for the test cases which are supposed to generate
errors. You do not need to run these tests by yourself. The testing infrastructure will do
this automatically for you.
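If the sha256sum tool is not available on your machine, the same checksum can be computed with Python's standard hashlib module. The following is only a sketch; the function name sha256_of_file is a hypothetical helper and is not part of the testing infrastructure.

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string should match the value printed by sha256sum for the same file.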
PLAGIARISM
1. Any copying of code from your peers or from the internet will invoke the institute's plagiarism
policy.
2. Provide proper references if you're taking your code from some other resource. Needless
to say, if the said code is the main part of the assignment, you will be awarded 0 marks.
3. If you are found indulging in any bad practice to circumvent the above-mentioned
evaluation procedure, you will be awarded 0 marks and the institute's plagiarism policy will be
applied.
Assignment Description
ISA description:
Consider a 16 bit ISA with the following instructions and opcodes, along with the syntax of an
assembly language which supports this ISA.
The ISA has 6 encoding types of instructions. The description of the types is given later.
In the instruction syntax, reg(x) denotes a register, mem_addr is a memory address (must be an
8-bit binary number), and Imm denotes a constant value (must be an 8-bit binary number). The ISA
has 7 general purpose registers and 1 flag register. The ISA supports an address size of 8 bits
and is double byte addressable, so each address fetches two bytes of data. With 256 addresses of
2 bytes each, this results in a total address space of 512 bytes. This ISA only supports whole
number arithmetic. If a subtraction results in a negative number, for example “3 - 4”, the
register value will be set to 0 and the overflow bit will be set. All number representations are
hence unsigned.
The registers in assembly are named as R0, R1, R2, ... , R6 and FLAGS. Each register is 16
bits.
Note: “mov reg $Imm”: This instruction copies the Imm (8-bit) value into the register’s lower
8 bits. The upper 8 bits are zeroed out.
Example:
Suppose R0 has 1110_1010_1000_1110 stored, and mov R0 $13 is
executed. The final value of R0 will be 0000_0000_0000_1101.
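The two rules above (mov with an immediate, and subtraction that would go negative) can be sketched in Python as follows. The function names mov_imm and sub are hypothetical, and registers are modelled as plain integers; this is an illustration of the stated semantics, not a reference implementation.

```python
def mov_imm(imm):
    """New register value after `mov reg $Imm`: lower 8 bits = Imm, upper 8 bits = 0."""
    if not 0 <= imm <= 255:
        raise ValueError("Imm must fit in 8 bits")
    # The old register contents are discarded entirely, so they are not an input here.
    return imm & 0xFF

def sub(a, b):
    """Return (result, overflow) for unsigned subtraction; a negative result becomes 0."""
    if b > a:
        return 0, True   # e.g. 3 - 4: register set to 0, overflow bit set
    return a - b, False
```

For the example above, format(mov_imm(13), "016b") gives 0000000000001101 regardless of what R0 held before.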
FLAGS semantics
The semantics of the flags register are:
● Overflow (V): This flag is set by add, sub and mul, when the result of the operation
overflows. This shows the overflow status for the last executed instruction.
● Less than (L): This flag is set by the “cmp reg1 reg2” instruction if reg1 < reg2
● Greater than (G): This flag is set by the “cmp reg1 reg2” instruction if the value of
reg1 > reg2
● Equal (E): This flag is set by the “cmp reg1 reg2” instruction if reg1 = reg2
The default state of the FLAGS register is all zeros. If an instruction does not affect the FLAGS
register, then the state of the FLAGS register is reset to 0 upon the execution.
The only operation allowed on the FLAGS register is “mov reg1 FLAGS”, where reg1 can
be any of the registers from R0 to R6. This instruction reads the FLAGS register and writes the
data into reg1. All other operations on the FLAGS register are prohibited.
The cmp instruction can implicitly write to the FLAGS register. Similarly, conditional
jump instructions can implicitly read the FLAGS register.
Example:
R0 has 5, R1 has 10
Implicit write: cmp R0 R1 will set the L (less than) flag in the FLAGS register.
Implicit read: jlt 10001001 will read the FLAGS register and figure out that the L flag was set,
and then jump to address 10001001.
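A minimal Python sketch of the implicit write and implicit read described above. The flags are kept as named booleans because this section does not state which bit of FLAGS each flag occupies. The function names are hypothetical, and the fall-through to PC + 1 when the condition is false is an assumption.

```python
def cmp_flags(reg1, reg2):
    """Return the L, G and E flags that `cmp reg1 reg2` would set."""
    return {"L": reg1 < reg2, "G": reg1 > reg2, "E": reg1 == reg2}

def jlt_next_pc(flags, pc, target):
    """Next PC after `jlt target`: the target if L is set, otherwise pc + 1 (assumed)."""
    return target if flags["L"] else pc + 1
```

With R0 = 5 and R1 = 10, cmp_flags(5, 10) sets only L, so jlt_next_pc returns the jump target.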
Binary Encoding
The ISA has 6 types of instructions with distinct encoding styles. However, each instruction is of
16 bits, regardless of the type.
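● Type C: 2 registers type
opcode (5 bits) | unused (5 bits) | reg1 (3 bits) | reg2 (3 bits)
bits 15-11 | bits 10-6 | bits 5-3 | bits 2-0
● Type F: halt
opcode (5 bits) | unused (11 bits)
bits 15-11 | bits 10-0
[Bit-field diagrams for the remaining instruction types are not reproduced here.]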
R0 000
R1 001
R2 010
R3 011
R4 100
R5 101
R6 110
FLAGS 111
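The Type C and Type F layouts and the register codes above can be combined into a small encoder. This is a sketch under assumptions: the function names are hypothetical, opcodes are passed in as 5-character bit strings, and unused bits are filled with zeros (which matches the hlt line in the worked example under Q1).

```python
REGISTERS = {"R0": "000", "R1": "001", "R2": "010", "R3": "011",
             "R4": "100", "R5": "101", "R6": "110", "FLAGS": "111"}

def encode_type_c(opcode, reg1, reg2):
    """opcode (5 bits) + unused (5 bits) + reg1 (3 bits) + reg2 (3 bits)."""
    return opcode + "00000" + REGISTERS[reg1] + REGISTERS[reg2]

def encode_type_f(opcode):
    """opcode (5 bits) + unused (11 bits)."""
    return opcode + "0" * 11
```

For instance, encode_type_f("01010") yields the 16-bit string used for hlt in the Q1 example.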
Executable binary syntax
The machine exposed by the ISA starts executing the code provided to it in the following
format, until it reaches the hlt instruction. There can only be one hlt instruction in the whole
program, and it must be the last instruction. The execution starts from the 0th address. The ISA
follows the von Neumann architecture with a unified code and data memory.
The variables must be allocated in the binary in the program order.
code
variables
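Since variables follow the code in program order, their addresses can be assigned once the instruction count is known. The sketch below assumes the first variable sits at the address immediately after the last instruction, which agrees with the worked example in Q1 (5 instructions, X at address 00000101). The function name is hypothetical.

```python
def assign_variable_addresses(num_instructions, variables):
    """Map each variable name to an 8-bit address placed right after the code."""
    if num_instructions + len(variables) > 256:
        raise ValueError("program does not fit in 256 memory locations")
    return {name: format(num_instructions + offset, "08b")
            for offset, name in enumerate(variables)}
```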
Questions:
Q1: Assembler:
Program an assembler for the aforementioned ISA and assembly. The input to the assembler is
a text file containing the assembly instructions. Each line of the text file may be of one of 4
types:
● Empty line: Ignore these lines
● A label
● An instruction
● A variable definition
Each of these entities has the following grammar:
● The syntax of all the supported instructions is given above. The fields of an instruction are
whitespace separated. The instruction itself might also have whitespace before it. An
instruction can be one of the following:
● The opcode must be one of the supported mnemonics.
● A register can be one of R0, R1, … R6, and FLAGS.
● A mem_addr in jump instructions must be a label.
● An Imm must be a whole number <= 255 and >= 0.
● A mem_addr in load and store must be a variable.
● A label marks a location in the code and must be followed by a colon (:). No spaces are
allowed between the label name and the colon (:).
● A variable definition is of the following format:
var xyz
which declares a 16 bit variable called xyz. This variable name can be used in place of
mem_addr fields in load and store instructions. All variables must be defined at the
very beginning of the assembly program.
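The grammar above suggests a first pass that classifies each line before any encoding is attempted. The following Python sketch is one possible way to do this; the function name and the returned tuples are hypothetical. It also assumes that an instruction may follow a label on the same line, which the text above does not state explicitly.

```python
def classify_line(line):
    """Classify one assembly line as 'empty', 'var', 'label' or 'instruction'."""
    tokens = line.split()          # fields are whitespace separated
    if not tokens:
        return "empty", None
    if tokens[0] == "var":
        return "var", tokens[1:]
    if tokens[0].endswith(":"):    # no space allowed between the name and ':'
        # Return the label name and any tokens that follow it (assumption).
        return "label", (tokens[0][:-1], tokens[1:])
    return "instruction", tokens
```

A line such as "loop :" would be classified as an instruction with an unknown mnemonic, which the later error checks can then reject.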
The assembler should be capable of:
1. Handling all supported instructions
2. Handling labels
3. Handling variables
4. Making sure that any illegal instruction (any instruction (or instruction usage) which is not
supported) results in a syntax error. In particular you must handle:
a. Typos in instruction name or register name
b. Use of undefined variables
c. Use of undefined labels
d. Illegal use of FLAGS register
e. Illegal Immediate values (more than 8 bits)
f. Misuse of labels as variables or vice-versa
g. Variables not declared at the beginning
h. Missing hlt instruction
i. hlt not being used as the last instruction
You need to generate distinct readable errors for all these conditions. If you find any
other illegal usage, you are required to generate a “General Syntax Error”. The
assembler must print out all these errors.
5. If the code is error free, then the corresponding binary is generated. The binary file is a
text file in which each line is a 16-bit binary number written using 0s and 1s in ASCII. The
assembler can write at most 256 lines.
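Two of the checks listed above (conditions e, h and i) can be sketched in Python as follows. The function names, the error message wording, and the (line_number, tokens) input format are all hypothetical; the assignment only requires that the errors be distinct and readable.

```python
def check_hlt(instructions):
    """Return an error message for a missing or misplaced hlt, or None if fine.

    `instructions` is a list of (line_number, tokens) pairs with var lines
    and labels already removed.
    """
    hlt_lines = [n for n, tokens in instructions if tokens[:1] == ["hlt"]]
    if not hlt_lines:
        return "Error: missing hlt instruction"
    if instructions[-1][1][:1] != ["hlt"] or len(hlt_lines) > 1:
        return f"Error (line {hlt_lines[0]}): hlt must be the last instruction"
    return None

def check_immediate(token, line_number):
    """Return an error message unless `token` is `$` followed by a whole number 0..255."""
    value = token[1:]
    is_number = value.isascii() and value.isdigit()
    if not token.startswith("$") or not is_number or int(value) > 255:
        return f"Error (line {line_number}): illegal immediate value {token}"
    return None
```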
Input/Output format:
● The assembler must read the assembly program as an input text file (stdin).
● The assembler must generate the binary (if there are no errors) as an output text file
(stdout).
● The assembler must generate the error notifications along with line number on which the
error was encountered (if there are errors) as an output text file (stdout). In case of
multiple errors, the assembler may print any one of the errors.
Example of an assembly program
var X
mov R1 $10
mov R2 $100
mul R3 R1 R2
st R3 X
hlt
The above program will be converted into the following machine code
1001000100001010
1001001001100100
1011000011001010
1010101100000101
0101000000000000
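The translation above can be reproduced with a short Python sketch. It covers only the four instruction forms that appear in this example. The opcodes and field layouts are read off the machine code shown above rather than taken from the full opcode table, and the function name is hypothetical.

```python
REG = {f"R{i}": format(i, "03b") for i in range(7)}

def assemble_example(lines):
    """Assemble the instruction forms used in the example (mov $Imm, mul, st, hlt)."""
    program = [line.split() for line in lines if line.strip()]
    variables = [t[1] for t in program if t[0] == "var"]
    code = [t for t in program if t[0] != "var"]
    # Variables live right after the code, in declaration order.
    addr = {v: format(len(code) + i, "08b") for i, v in enumerate(variables)}
    out = []
    for op, *args in code:
        if op == "mov":      # register + 8-bit immediate
            out.append("10010" + REG[args[0]] + format(int(args[1][1:]), "08b"))
        elif op == "mul":    # 2 unused bits + three registers
            out.append("10110" + "00" + "".join(REG[a] for a in args))
        elif op == "st":     # register + 8-bit variable address
            out.append("10101" + REG[args[0]] + addr[args[1]])
        elif op == "hlt":    # opcode followed by 11 unused bits
            out.append("01010" + "0" * 11)
        else:
            raise ValueError(f"unsupported in this sketch: {op}")
    return out
```

Calling assemble_example on the six lines of the example program returns the five binary lines listed above. Note that X resolves to address 00000101 because the program has five instructions.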
Q2: Simulator:
You need to write a simulator for the given ISA. The input to the simulator is a binary file (the
format is the same as the format of the binary file generated by the assembler in Q1). The
simulator should load the binary into the system memory at the beginning, and then start
executing the code at address 0. The code is executed until hlt is reached. After execution of
each instruction, the simulator should output one line containing an 8-bit number denoting the
program counter. This should be followed by 8 space-separated 16-bit binary numbers denoting
the values of the registers (R0, R1, … R6 and FLAGS).
<PC (8 bits)><space><R0 (16 bits)><space>...<R6 (16 bits)><space><FLAGS (16 bits)>.
The output must be written to stdout. Similarly, the input must be read from stdin. After the
program is halted, print the memory dump of the whole memory. This should be 256 lines, each
having a 16-bit value.
while(not halted)
{
    Instruction = MEM.getData(PC);             // Get current instruction
    halted, new_PC = EE.execute(Instruction);  // Update RF, compute new_PC
    PC.dump();                                 // Print PC
    RF.dump();                                 // Print RF state
    PC.update(new_PC);                         // Update PC
}
MEM.dump();                                    // Print memory state
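The two output formats described above (one trace line per instruction, then a 256-line memory dump) can be sketched in Python as follows. The function names are hypothetical, and padding unused memory with zeros is an assumption.

```python
def format_trace_line(pc, registers):
    """PC as 8 bits, then R0..R6 and FLAGS as 16 bits each, space separated."""
    if len(registers) != 8:
        raise ValueError("expected 8 register values (R0..R6 and FLAGS)")
    fields = [format(pc, "08b")] + [format(r, "016b") for r in registers]
    return " ".join(fields)

def format_memory_dump(memory):
    """Return 256 lines of 16-bit values; unused words are padded with zeros (assumed)."""
    words = list(memory) + [0] * (256 - len(memory))
    return "\n".join(format(w, "016b") for w in words)
```

In terms of the loop above, PC.dump() and RF.dump() together correspond to one call to format_trace_line, and MEM.dump() to format_memory_dump.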
Q3: Floating-Point Arithmetic:
Note:
● To move 1.5 into reg1, the instruction (in assembly language) should be:
movf reg1 $1.5
● Operations must be applied only to floating-point numbers that can be
represented in the given system (8 bits); any other value should be reported as an error.
Q4: (Bonus) Memory Access Trace:
In Q2, generate a scatter plot with the cycle number on the x-axis and the memory address on
the y-axis. You need to plot which memory address is accessed at what time.
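One way to produce such a plot is to record a (cycle, address) pair for every memory access during simulation and plot the pairs afterwards. The sketch below assumes matplotlib is installed. The function names and the output file name are hypothetical, and what counts as a cycle (for example, one instruction per cycle) is left to your simulator.

```python
def log_access(trace, cycle, address):
    """Record that `address` was accessed during `cycle`."""
    trace.append((cycle, address))

def plot_memory_trace(trace, path="memory_trace.png"):
    """Save a scatter plot of the recorded accesses; requires matplotlib."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt
    plt.scatter([c for c, _ in trace], [a for _, a in trace])
    plt.xlabel("Cycle number")
    plt.ylabel("Memory address")
    plt.savefig(path)
    plt.close()
```

A simulator would call log_access once for each instruction fetch and once more for each load or store, then call plot_memory_trace after the program halts.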
Note: Byte-addressable memory is the industry standard (if not mentioned, use this as the
default).
Important note: The CPU is always word addressable (i.e., the number of bits of the CPU
indicates its word size), so we need an interface to connect the CPU with memory.
Your aim is to create a command line program to solve questions related to memory organization.
First, input:
1. The space in memory (e.g., 16 MB). Make sure you recognize inputs in this exact format: Mb
should be read as megabits and MB as megabytes.
2. Then input how the memory is addressed, as mentioned above (one of the four options).
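The Mb versus MB distinction in step 1 can be handled with a small parser. This is a sketch under assumptions: the function name is hypothetical, only the prefixes K, M and G are recognised, and prefixes are taken as powers of two (1 M = 2^20), which the text above does not specify.

```python
UNIT_BITS = {"b": 1, "B": 8}                        # lowercase b = bits, uppercase B = bytes
PREFIX = {"": 1, "K": 2**10, "M": 2**20, "G": 2**30}

def parse_memory_size(text):
    """Convert a string such as '16 MB' or '16 Mb' to a number of bits."""
    number, unit = text.split()
    prefix, base = unit[:-1], unit[-1]
    if prefix not in PREFIX or base not in UNIT_BITS:
        raise ValueError(f"unrecognised unit: {unit}")
    return int(number) * PREFIX[prefix] * UNIT_BITS[base]
```

Under these assumptions, parse_memory_size("16 MB") is eight times parse_memory_size("16 Mb").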
TYPE 1:
Example -
Current input:
Output:
-2 (i.e., 2 pins saved)
TYPE 2:
Input the following
Current input:
34 address pins
Output:
Example 2 -
Current input:
34 address pins
Output: