Arch Lab: Optimizing the Performance of a Pipelined Processor

1 Introduction
In this lab, you will learn about the design and implementation of a pipelined Y86-64
processor, optimizing both it and a benchmark program to maximize performance. You
may make any semantics-preserving transformation to the benchmark program, make
enhancements to the pipelined processor, or both. When you have completed the
lab, you will have a deep understanding of the interactions between code and hardware that
affect the performance of your programs.
The lab is organized into three parts, each with its own handin. In Part A you will write
some simple Y86-64 programs and become familiar with the Y86-64 tools. In Part B, you
will first extend the SEQ simulator with new instructions, then trace the evolution of CPU
pipelining. These two parts will prepare you for Part C, the heart of the lab, where you will
optimize the Y86-64 benchmark program and its architecture design.

2 Logistics
You will work on this lab alone. Any clarifications and revisions to the assignment will
be posted on the course Web page.

3 Handout Instructions
1. You can do the lab on a Linux system.

2. Start by copying the file archlab-handout.tar to a (protected) directory in which you
plan to do your work.

3. Then give the command: tar xvf archlab-handout.tar. This command extracts the
files into the directory that contains archlab-handout.tar. Check the README to find
out what these files are for.

4. Go to the archlab-project directory and build the Y86-64 simulator tools, following
the instructions in archlab-project/README.md.

4 Environment Installation
This project is written in Rust, so you need a Rust toolchain installed. If you
do not have one yet, execute the following command to install rustup. The installation
requires network access, so make sure to connect to the gateway via lcpu connect before
installing.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
You can verify the installation by executing the command rustup.
Now install the Rust toolchain by executing the following commands (at the time of
writing, the latest stable version is 1.81):
rustup install 1.81
rustup default 1.81

5 Build the Project

Every time you change the Rust source code (*.rs), you should
rebuild the project so that the binaries are up to date:
(cd archlab-project; cargo build)
After running this command, a folder archlab-project/target will be created to store
the output binaries and other intermediate files. The output executables are:
• Y86-64 Assembler: target/debug/yas
• Y86-64 Debugger: target/debug/ydb
• Y86-64 ISA Simulator: target/debug/yis
• Y86-64 Pipeline Simulator: target/debug/ysim
• Local Grader for Part A, Part B, and Part C: target/debug/grader

You can learn more about their usage in archlab-project/README.md.

6 Part A
You will be working in directory archlab-project/misc in this part. Your task is to
write and simulate the following three Y86-64 programs. The required behavior of these
programs is defined by the example C functions in examples.c. Be sure to put your name
and ID (e.g. 2300054321) in a comment at the beginning of each program. You can test your
programs by first assembling them with the program yas and then running them with the
instruction set simulator yis.
In all of your Y86-64 functions, you should follow the x86-64 conventions for passing
function arguments, using registers, and using the stack. This includes saving and restoring
any callee-save registers that you use.

sum.ys: Iteratively sum linked list elements
Write a Y86-64 program sum.ys that iteratively sums the elements of a linked list. Your
program should consist of some code that sets up the stack structure, invokes a function,
and then halts. In this case, the function should be Y86-64 code for a function (sum_list)
that is functionally equivalent to the C sum_list function. Test your program using the
following three-element list:

# Sample linked list

.align 8
ele1:
.quad 0x00d
.quad ele2
ele2:
.quad 0x0e0
.quad ele3
ele3:
.quad 0xf00
.quad 0
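
Before writing the assembly, it can help to pin down the expected behavior in a high-level model. The following Rust sketch is not the contents of examples.c; it is an illustrative model in which slice indices stand in for the addresses ele1, ele2, and ele3, and `None` stands in for the null link. The names `Elem` and `sample_list` are hypothetical.

```rust
/// One list element: a 64-bit value plus the index of the next element.
/// `None` plays the role of the null pointer (`.quad 0`).
struct Elem {
    val: i64,
    next: Option<usize>,
}

/// Hypothetical helper: the three-element sample list from the handout,
/// with indices 0, 1, 2 standing in for ele1, ele2, ele3.
fn sample_list() -> Vec<Elem> {
    vec![
        Elem { val: 0x00d, next: Some(1) },
        Elem { val: 0x0e0, next: Some(2) },
        Elem { val: 0xf00, next: None },
    ]
}

/// Iterative sum: follow `next` links until the null link is reached.
fn sum_list(list: &[Elem], head: Option<usize>) -> i64 {
    let mut sum = 0;
    let mut cur = head;
    while let Some(i) = cur {
        sum += list[i].val;
        cur = list[i].next;
    }
    sum
}
```

Calling `sum_list(&sample_list(), Some(0))` yields 0xfed, which is the value your Y86-64 function should leave in %rax for this list.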

rsum.ys: Recursively sum linked list elements

Write a Y86-64 program rsum.ys that recursively sums the elements of a linked list. This
code should be similar to the code in sum.ys, except that it should use a function rsum_list
that recursively sums a list of numbers, as shown with the C function rsum_list. Test your
program using the same three-element list you used for testing sum.ys.
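
The recursive variant can be modeled the same way. This is a sketch under the same assumptions as a slice-based model of the list (indices for addresses, `None` for null); `Elem` is a hypothetical type, not something provided by the handout.

```rust
/// One list element; `next` is the index of the following element,
/// and `None` plays the role of the null pointer.
struct Elem {
    val: i64,
    next: Option<usize>,
}

/// Recursive sum: an empty list sums to 0, otherwise the result is the
/// current value plus the sum of the rest of the list.
fn rsum_list(list: &[Elem], cur: Option<usize>) -> i64 {
    match cur {
        None => 0,
        Some(i) => list[i].val + rsum_list(list, list[i].next),
    }
}
```

Note that the current value is still needed after the recursive call returns. In Y86-64 this means it has to survive the call, for example in a callee-saved register that you save and restore, or on the stack.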

bubble.ys: Sort a block of ints (8 bytes each) in ascending order using bubble sort

Write a program (bubble.ys) that sorts a block of ints (8 bytes each) in place, with the
result in ascending order. That is, you may copy the numbers from one part of memory to
another, but the answer should be in the same piece of memory.
Your program should consist of code that sets up a stack frame, invokes a function
bubble_sort, and then halts. The function should be functionally equivalent to the C
function bubble_sort. Test your program using the following six-element array:

.align 8
Array:
.quad 0xbca
.quad 0xcba
.quad 0xacb
.quad 0xcab
.quad 0xabc
.quad 0xbac
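
As a reference for the expected result, here is a minimal Rust sketch of an in-place bubble sort. It is an assumption about the general shape of the algorithm, not a copy of the bubble_sort function in examples.c, whose loop structure may differ.

```rust
/// In-place bubble sort in ascending order.
/// After each outer pass, the largest remaining value has moved to `last`.
fn bubble_sort(data: &mut [i64]) {
    let n = data.len();
    for last in (1..n).rev() {
        for i in 0..last {
            if data[i] > data[i + 1] {
                data.swap(i, i + 1);
            }
        }
    }
}
```

Applied to the sample array, the six values should end up in the order 0xabc, 0xacb, 0xbac, 0xbca, 0xcab, 0xcba, stored in the same 48 bytes starting at Array.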

7 Part B
You will be working in directory archlab-project/sim/src/architectures/extra.

Extend the SEQ architecture with iopq instruction

Your first task in Part B is to extend the SEQ processor to support the following
instruction:

• iopq V, rB: compute rB op V and store the result in register rB.

To add this instruction, you will modify the file seq_full.rs, which implements the
version of SEQ described in the CS:APP3e textbook. The format of iopq looks like a
combination of irmovq and opq. We have provided the instruction code IOPQ for you. The
function code of iopq is the same as that of opq.
After implementing this extension, you can write iaddq, isubq, and ixorq in your Y86-64
assembly code.
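
The intended semantics can be sketched as a small Rust model. This is not HCL-rs and not the simulator's code; `Op`, `CondCodes`, and `iopq` are hypothetical names. The function codes follow the opq numbering from CS:APP3e (add 0, sub 1, and 2, xor 3), and the sketch assumes that iopq sets the condition codes the same way opq does.

```rust
/// Function codes shared by opq and iopq (CS:APP3e numbering).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    Add = 0,
    Sub = 1,
    And = 2,
    Xor = 3,
}

/// Condition codes: zero, sign, and (signed) overflow flags.
#[derive(Debug, PartialEq)]
struct CondCodes {
    zf: bool,
    sf: bool,
    of: bool,
}

/// Model of `iopq V, rB`: returns the new value of rB (rB op V)
/// together with the condition codes (assumed to match opq).
fn iopq(op: Op, v: i64, rb: i64) -> (i64, CondCodes) {
    let (result, of) = match op {
        Op::Add => rb.overflowing_add(v),
        Op::Sub => rb.overflowing_sub(v), // rB - V, as with subq rA, rB
        Op::And => (rb & v, false),
        Op::Xor => (rb ^ v, false),
    };
    (result, CondCodes { zf: result == 0, sf: result < 0, of })
}
```

The operand order matters for subtraction: `isubq $3, %rbx` with %rbx holding 10 leaves 7 in %rbx, mirroring how `subq rA, rB` computes rB - rA.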

Road to a pipelined architecture

The textbook gives few details about how the SEQ (or SEQ+) architecture evolves
into the PIPE architecture. Therefore, in this part, you will be given 8 architectures in order:

• pipe_s2: An example of a 2-stage pipeline, containing an instruction fetch stage, with
all the remaining logic forming the decode stage.

• pipe_s3a, pipe_s3b, pipe_s3c, pipe_s3d: 3-stage pipelines, containing an instruction
fetch stage and a decode stage, with all the remaining logic forming the execute stage.

• pipe_s4a, pipe_s4b, pipe_s4c: 4-stage pipelines, containing an instruction fetch stage,
a decode stage, and an execute stage, with all the remaining logic forming the memory stage.

Starting from pipe_s2, which is modified from the SEQ+ architecture, each architecture
makes some minor improvements over the previous one, which together show a
possible evolution of CPU pipelining.
In all of the above architectures except pipe_s2, some expressions in the HCL
descriptions are masked by placeholders. Your task is to read the HCL description of each
architecture and replace each placeholder with a correct expression. There are 3 types of
placeholders: BOOL_PLACEHOLDER, U8_PLACEHOLDER, and U64_PLACEHOLDER. The prefix of
a placeholder indicates the value type of its expression.
After completing all architectures, you should have a deeper understanding of pipelining,
which can help you overcome some of the challenges in Part C.
Suggestion: Read the comments around each placeholder carefully.

HCL (textbook & previous labs)     | HCL-rs (current lab)
-----------------------------------+----------------------------------
D_icode, E_icode, F_predPC, ...    | D.icode, E.icode, F.pred_pc, ...
INOP, IHALT, IRRMOVQ, IIRMOVQ      | NOP, HALT, CMOVQ, IRMOVQ
SADR, SINS, SBUB                   | Adr, Ins, Bub
word f_icode = [...];              | u8 f_icode = [...];
word f_predPC = [...];             | u64 f_pred_pc = [...];

Table 1: Differences between legacy HCL and HCL-rs

Notice
In order to provide a better lab experience with greater flexibility and modern
development practices, we reimplemented the HCL parser and simulator in Rust. With the help
of the VSCode extension rust-analyzer, you can now enjoy syntax highlighting, auto-completion,
type checking, and error reporting when writing HCL source code.
As a result, the hardware control language used in this lab (HCL-rs) is a bit different from
the previous one. Its specification can be found at archlab-project/assets/hcl-rs.pdf.
As a brief summary, Table 1 shows the major differences.
You only need to pay attention to the HCL description in the sim_macros::hcl block and
the pipeline register definitions in the crate::define_stages block.

8 Part C
You will be working in directory archlab-project/misc and
archlab-project/sim/src/architectures/extra in this part.
The ncopy function copies a len-element integer array src to a non-overlapping dst,
returning a count of the number of positive integers contained in src. The C description of
ncopy is in misc/ncopy.c.
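
As a reference for what any optimized version must preserve, the behavior described above can be modeled in a few lines of Rust. This is a sketch of the stated semantics, not a transcription of misc/ncopy.c; in the sketch, the slice length plays the role of len.

```rust
/// Copy every element of `src` into `dst` and return how many of the
/// copied values are strictly positive.
fn ncopy(src: &[i64], dst: &mut [i64]) -> i64 {
    // The destination must have room for every source element.
    assert!(dst.len() >= src.len());
    let mut count = 0;
    for (d, &s) in dst.iter_mut().zip(src) {
        *d = s;
        if s > 0 {
            count += 1;
        }
    }
    count
}
```

Zero is not counted as positive, and a zero-length array must return 0 without touching dst.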
Your task in Part C is to modify archlab-project/misc/ncopy.ys and
archlab-project/sim/src/architectures/extra/ncopy.rs with the goal of making
ncopy run as fast as possible.

• archlab-project/misc/ncopy.ys: the assembly file of the ncopy function.

• archlab-project/sim/src/architectures/extra/ncopy.rs: the description of the CPU
architecture that the ncopy function runs on.

You will be handing in these two files. Each file should begin with a header comment
with the following information:

• Your name and ID.

• A high-level description of your code. In each case, describe how and why you modified
your code.

Coding Rules
You are free to make any modifications you wish, with the following constraints:

• Your ncopy function must work for arbitrary array sizes. You might be tempted to
hardwire your solution for 64-element arrays by simply coding 64 copy instructions,
but this would be a bad idea because we will be grading your solution based on its
performance on arbitrary arrays.

• Your ncopy function must run correctly with yis. By correctly, we mean that it
must correctly copy the src block and return (in %rax) the correct number of positive
integers.

• The size of the assembled version of ncopy plus the stack size is limited to 4 KB. (In
fact a little less than 4 KB; check the grader code for the exact value.) We will set the
stack register and argument registers well before calling your ncopy.

• Your ncopy.rs implementation must pass the correctness tests for general Y86-64 code.

Other than that, you are free to implement other instructions if you think that will help.
You may make any semantics-preserving transformations to the ncopy function, such as
reordering instructions, replacing groups of instructions with single instructions, deleting
some instructions, and adding other instructions. You may find it useful to read about loop
unrolling in Section 5.8 of CS:APP3e.
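
To illustrate the idea of loop unrolling at a high level, the following Rust sketch processes four elements per iteration and then handles the leftover elements one at a time. The unrolling factor of 4 and the name `ncopy_unrolled` are arbitrary choices for illustration; the best factor for your Y86-64 code depends on your architecture and on the code size limit.

```rust
/// ncopy with the main loop unrolled by a factor of 4.
/// Returns the number of strictly positive values copied.
fn ncopy_unrolled(src: &[i64], dst: &mut [i64]) -> i64 {
    assert!(dst.len() >= src.len());
    let len = src.len();
    let mut count = 0;
    let mut i = 0;
    // Main loop: one loop test and one index update per four copies.
    while i + 4 <= len {
        let (a, b, c, d) = (src[i], src[i + 1], src[i + 2], src[i + 3]);
        dst[i] = a;
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
        count += (a > 0) as i64 + (b > 0) as i64 + (c > 0) as i64 + (d > 0) as i64;
        i += 4;
    }
    // Remainder loop: up to three leftover elements.
    while i < len {
        dst[i] = src[i];
        if src[i] > 0 {
            count += 1;
        }
        i += 1;
    }
    count
}
```

The remainder loop is what keeps the function correct for arbitrary array sizes, which the coding rules require.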

9 Grade your solution

After finishing some parts of the lab, you should first rebuild the project and then execute
the grader:

cd archlab-project
cargo build

# If you want to grade part A:

./target/debug/grader part-a

# If you want to grade part B:

./target/debug/grader part-b

# If you want to grade part C:

./target/debug/grader part-c

You may execute ./target/debug/grader -h for grader usage.

10 Evaluation
The lab is worth 190 points: 30 points for Part A, 60 points for Part B, and 100 points
for Part C. You can run the following command to grade your implementation locally:

(cd archlab-project; cargo run --bin grader)

The remote machine uses the same grader, and all the checks (including the validation
test, the length test, and so on) are included. Note that the score here does not include
the credits for your descriptions.

Part A
Part A is worth 30 points, 10 points for each Y86-64 solution program. Each solution pro-
gram will be evaluated for correctness, including proper handling of the stack and registers,
as well as functional equivalence with the example C functions in examples.c.
The programs sum.ys and rsum.ys will be considered correct if the graders do not spot
any errors in them, and their respective sum_list and rsum_list functions return the sum
0xfed in register %rax.
The program bubble.ys will be considered correct if the graders do not spot any errors
in it, and the bubble_sort function sorts the 6 integers correctly in ascending order,
with the results in the same 48 bytes beginning at address Array, and does not corrupt
other memory locations.

Part B
This part is worth 40 points: 5 points for the implementation of seq_full.rs, and 5
points for each of pipe_s3a, pipe_s3b, pipe_s3c, pipe_s3d, pipe_s4a, pipe_s4b, and pipe_s4c.
For seq_full.rs, you need to pass the ISA checks extended with the iopq instruction.
Each of pipe_s3a, pipe_s3b, pipe_s3c, pipe_s3d, pipe_s4a, pipe_s4b, and pipe_s4c is
required to pass the ISA checks. Moreover, for each of them, the runtime status at each
CPU cycle is compared with that of its corresponding ground-truth architecture to verify
the correctness of each placeholder expression. Since the ground truths are not provided in
the student handout, this check is performed on the Autolab server.

Part C
This part of the lab is worth 100 points. You will not receive any credit if either your
code for ncopy.ys or your modified architecture ncopy.rs fails.

• 20 points each for your descriptions in the headers of archlab-project/misc/ncopy.ys
and archlab-project/sim/src/architectures/extra/ncopy.rs and the quality of
these implementations.

• 60 points for performance. To receive credit here, your solution must pass the
ISA checks.

We will evaluate the performance based on cpe (cycles per element) and ac (architecture
cost) of your implementation.

• cpe: if the simulated code requires C cycles to copy a block of N elements, then the
CPE is C/N . Since some cycles are used to set up the call to ncopy and to set up
the loop within ncopy, you will find that you will get different values of the CPE for
different block lengths (generally the CPE will drop as N increases). We will therefore
evaluate the performance of your function by computing the average of the CPEs for
blocks ranging from 1 to 64 elements.
Simply run the command

(cd archlab-project; cargo run --bin grader -- part-c)

to see what happens. For example, the baseline version of the ncopy function and
architecture has CPE values ranging between 23.00 and 12.04, with an average of
12.82.

• ac: the length of the critical path of the ncopy architecture. Formally, the critical path
of a CPU architecture is the longest path of combinational logic between clocked
elements (like flip-flops). The length of the critical path bounds the CPU's clock
frequency (a longer path means a lower frequency), which in turn can be used to
estimate the architecture's performance.
In this lab, the length of the critical path is simplified as: 1 plus the maximum number
of hardware devices (units) that line up in a path of the architecture. For example,
seq_std has a critical path of length 8, and pipe_std has a critical path of length 4.
You can execute

(cd archlab-project; cargo run --bin ysim -- -A [arch_name] -I)

to inspect the length of the critical path and the execution order of the devices in an
architecture. This command will also generate an HTML file that visualizes the dependency
graph of the architecture.
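
The averaging of CPE values described above can be sketched as follows. This is an illustrative model, not the grader's code; `cpe` and `average_cpe` are hypothetical names, and the cycle counts passed in are whatever your simulator run reports for each block length.

```rust
/// Cycles per element for one run: C cycles to copy N elements.
fn cpe(cycles: u64, n: u64) -> f64 {
    cycles as f64 / n as f64
}

/// Average CPE over block lengths 1, 2, ..., cycle_counts.len().
/// `cycle_counts[i]` is the cycle count for a block of i + 1 elements.
fn average_cpe(cycle_counts: &[u64]) -> f64 {
    assert!(!cycle_counts.is_empty());
    let total: f64 = cycle_counts
        .iter()
        .enumerate()
        .map(|(i, &c)| cpe(c, i as u64 + 1))
        .sum();
    total / cycle_counts.len() as f64
}
```

Because every block length from 1 to 64 contributes equally to the average, fixed setup overhead weighs heavily on the short blocks. Reducing per-call overhead can therefore matter as much as speeding up the inner loop.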

Let c = cpe + 2 × ac. Your score S for Part C will be:

$$
S = \begin{cases}
0, & c > 19.0 \\
19 \cdot (19.0 - c), & 16.0 < c \le 19.0 \\
57, & 15.0 < c \le 16.0 \\
60, & c \le 15.0
\end{cases}
$$
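
The scoring rule can be written as a small Rust function to experiment with trade-offs between CPE and critical path length. This is a sketch of the formula as stated, not the grader's implementation; `part_c_score` is a hypothetical name.

```rust
/// Part C performance score for a given average CPE and architecture
/// cost (critical path length), following c = cpe + 2 * ac.
fn part_c_score(cpe: f64, ac: u32) -> f64 {
    let c = cpe + 2.0 * f64::from(ac);
    if c > 19.0 {
        0.0
    } else if c > 16.0 {
        19.0 * (19.0 - c)
    } else if c > 15.0 {
        57.0
    } else {
        60.0
    }
}
```

For example, with a critical path of length 4 (the value quoted for pipe_std), c = cpe + 8, so an average CPE of 9.0 gives c = 17.0 and a score of 38, while an average CPE of 7.0 or lower gives the full 60 points. Each extra unit of critical path length costs as much as two extra cycles per element.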


11 Handin Instructions
• You will be handing in three sets of files:

– Part A:
archlab-project/misc/bubble.ys
archlab-project/misc/sum.ys
archlab-project/misc/rsum.ys
– Part B:
archlab-project/sim/src/architectures/extra/seq_full.rs
archlab-project/sim/src/architectures/extra/pipe_s3a.rs
archlab-project/sim/src/architectures/extra/pipe_s3b.rs
archlab-project/sim/src/architectures/extra/pipe_s3c.rs
archlab-project/sim/src/architectures/extra/pipe_s3d.rs
archlab-project/sim/src/architectures/extra/pipe_s4a.rs
archlab-project/sim/src/architectures/extra/pipe_s4b.rs
archlab-project/sim/src/architectures/extra/pipe_s4c.rs
– Part C:
archlab-project/misc/ncopy.ys
archlab-project/sim/src/architectures/extra/ncopy.rs

• Make sure you have included your name and ID in a comment at the top of each of
your handin files.

• To create your handin files for the lab, go to your archlab-handout directory and run
make handin to create archlab-handin.tar. Upload this tar file to Autolab for grading.

12 Hints
• ysim -A [arch] -I can generate an HTML file that visualizes the architecture's
computational dependency graph. -v can be used to print detailed information about each
cycle. --max-cpu-cycle can be used to limit the number of CPU cycles.

• All Rust source files under archlab-project/sim/src/architectures/extra, except
mod.rs, are considered CPU architectures. If you want to create a new architecture,
just create a new file there. ydb can debug your custom architecture via the --arch option.

• yis simulates your program w.r.t. the Y86 ISA specification. Its output can be seen
as ground truth.

• In Part B, you can rely on editor features to display the differences between two files.
Vim users can use vimdiff. VSCode users can first open one of the
source files, then go to “Help > Show All Commands”, type “Compare Active File
With...”, and select another file for comparison (at this point you are able to edit both
files concurrently).

• In Part C, the default HCL description ncopy.rs is simply a copy of seq_std.rs.
You may want to replace it with another pipelined architecture, and then apply some
modifications to it.
