
Gem5-X + On-Chip-Wireless

Full System Manual

* Embedded Systems Laboratory,
Swiss Federal Institute of Technology, Lausanne (EPFL)

‡ School of Engineering and Management Vaud (HEIG-VD),
University of Applied Sciences Western Switzerland (HES-SO)

** Department of Computer Architecture,
Complutense University of Madrid

v2.2-On-Chip-Wireless by Rafael Medina*, Joshua Klein*, Marina Zapater*‡,
and Giovanni Ansaloni*

Based on v2.0 of the gem5-X technical manual by Yasir Qureshi*, William Simon*, Marina
Zapater*‡, Katzalin Olcoz**, and David Atienza*

December 2022
Contents

1 Executive Summary
  1.1 Abstract
  1.2 Release Information
  1.3 Collaboration and Contact Information

2 Running gem5-X Full System (FS) Mode with ARMv8 and Linux
  2.1 Necessary Files
    2.1.1 Full System Files
    2.1.2 Device Tree
  2.2 Quick-Start Guide
    2.2.1 Prerequisites
    2.2.2 Building the gem5 Binary
    2.2.3 Running Your FS Simulation
  2.3 Hot-fixes for running gem5-X on Ubuntu 20.04 using Docker

3 Support Enhancements of Gem5-X
  3.1 Enhanced Checkpointing
  3.2 Gperf Profiler
  3.3 9P over Virtio
  3.4 Modifying disk image using QEMU

4 ARMv8 ISA Extension
  4.1 Adding a new custom instruction to the ARMv8 ISA
  4.2 Handling the new ADD1 instruction
  4.3 Testing that the ISA extension works

5 High Bandwidth Memory v2 (HBM2)

6 Core Clustering

7 Heterogeneous Cores

8 Scratchpad Memory (SPM)

9 On-Chip Wireless Networking Integration and Modeling
  9.1 Running gem5-X Full System Mode with ARMv8, Linux and wireless extensions
    9.1.1 Defining wireless-capable systems from the command line
    9.1.2 Defining wireless-capable systems with configuration files
  9.2 On-Chip-Wireless module implementation files
  9.3 Example Application

1 Executive Summary
1.1 Abstract
The gem5 architectural simulator is well established and widely used in both industry and
academia. Based on gem5, we present gem5-X ("a gem5-based full-system simulator with
architectural eXtensions"), a simulation framework that enables fast profiling, architectural
exploration, and optimization for system-level architectural innovations. Gem5-X provides
out-of-the-box simulation of ARM-based systems with a full Linux stack, along with several
architectural extensions such as ISA extensions, clustering, heterogeneous many-core simulation,
and an HBM2 memory model. Several enhanced features have also been added, such as advanced
checkpointing, workload automation (WA), and gperf profiler support.
This version of the gem5-X repository, named gem5-X-On-Chip-Wireless, further provides
support for emulating systems featuring in-package wireless communication enabled by
nano-antennae and on-chip transceivers. This gem5-X fork was developed in the context of the
Wiplash Horizon 2020 project (https://www.wiplash.eu).
In this technical manual, we first provide guidelines on how to use the various architectural
features and support enhancements of gem5-X. More information on downloading gem5-X and its
source code can be found at https://esl.epfl.ch/gem5-x.
Then, in Section 9, we describe the features specific to gem5-X-On-Chip-Wireless.

1.2 Release Information

Version | Authors | Date | Changes
v2.2-On-Chip-Wireless | Rafael Medina, Joshua Klein, Giovanni Ansaloni, and Marina Zapater | December 2022 | Forked gem5-X manual for in-package wireless extension.
v2.2 | Joshua Klein, Rafael Medina, Alireza Amirshahi, Marina Zapater, and Giovanni Ansaloni | October 2022 | Reformatting and setting up for new extensions.
v2.1 | Joshua Klein and Darong Huang | February 2022 | Updated version information for gem5-X dependencies on Ubuntu 20.04 LTS, expanded contact info for current maintainers.
v2.0 | Yasir Qureshi, William Simon, Marina Zapater, Katzalin Olcoz, and David Atienza | August 2021 | Core clustering, heterogeneous cores and SPM support added in Gem5-X.

1.3 Collaboration and Contact Information


The maintainers of this project can be contacted via email at {joshua.klein, rafael medina,
giovanni.ansaloni, david.atienza}@epfl.ch, [email protected].

Because the scope of this project is very large, we are always interested in potential
collaboration efforts to develop new features and keep gem5-X updated to gem5 master. For
inquiries, source code, and additional information, please contact the maintainers at one of
the addresses above.

2 Running gem5-X Full System (FS) Mode with ARMv8 and Linux
In this chapter we describe how to configure and run our ARMv8 64-bit FS simulation in gem5-X.

2.1 Necessary Files


Because our model runs in FS mode with a full Linux environment, we need several major
system components. These include:

• A bootloader

• A kernel binary, e.g., vmlinux

• A disk image

• A device tree binary

All of the aforementioned components must be compatible with ARMv8.

2.1.1 Full System Files

Once you register for gem5-X at https://esl.epfl.ch/gem5-x, you will receive an email with a link
to all the system files, except for the device tree. The downloaded file is named
full_system_images.tar.gz. It contains the disk image, bootloader, and kernel binary. Extract it
as follows:

    tar -zxvf full_system_images.tar.gz

The files are as follows:

• Bootloader is under [path_to_full_system_images]/binaries/

• Kernels (vmlinux and vmlinux_wa) are at [path_to_full_system_images]/binaries/

• Disk image (gem5_ubuntu16.img) can be found at [path_to_full_system_images]/disks/

We now need to set up the path to full_system_images so that the files under it can be found
and used by gem5-X during FS simulation.

    cd <path_to_gem5-X>
    ./apply-patch.sh <PATH_TO_FULL_SYSTEM_IMAGES>

Alternatively, you can set the M5_PATH environment variable:

    export M5_PATH=<PATH_TO_FULL_SYSTEM_IMAGES>

The full system files are now set up and ready to be used in FS mode.
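
Before launching a simulation, it can help to confirm that the extracted directory has the layout described above. The following is a minimal sketch, not part of gem5-X: the helper name missing_fs_files is hypothetical, and it only checks the two file names this manual mentions (vmlinux and gem5_ubuntu16.img), since the bootloader file name is not given here.

```python
import os
from pathlib import Path


def missing_fs_files(m5_path):
    """Return the expected full-system files that are absent under m5_path.

    Hypothetical helper: it assumes the binaries/ and disks/ layout
    described in this section and checks only the file names named there.
    """
    root = Path(m5_path)
    expected = [
        root / "binaries" / "vmlinux",
        root / "disks" / "gem5_ubuntu16.img",
    ]
    return [str(p) for p in expected if not p.is_file()]


if __name__ == "__main__":
    m5_path = os.environ.get("M5_PATH")
    if m5_path is None:
        print("M5_PATH is not set")
    else:
        missing = missing_fs_files(m5_path)
        if missing:
            print("missing: " + ", ".join(missing))
        else:
            print("all expected files found")
```

Run as a script, it reads M5_PATH from the environment and reports any of the two files that cannot be found.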

2.1.2 Device Tree

The device tree files are under

    <path_to_gem5-X>/system/arm/dt

If running on an Ubuntu-based host system, the following prerequisites need to be installed
before generating the device tree binaries.

    sudo apt-get install gcc-arm-linux-gnueabihf gcc-aarch64-linux-gnu
    sudo apt-get install device-tree-compiler

To generate the device tree binary files:

    cd <path_to_gem5-X>
    make -C system/arm/dt

2.2 Quick-Start Guide


This brief start-up guide walks you through the basic steps of running your first full
system (FS) simulation with gem5-X. It assumes you have already set up the bootloader, device
tree, kernel file, and disk image as described in the previous sections.

2.2.1 Prerequisites

You will need to set up the gem5-X environment in order to compile and run the gem5-X binary
using the SCons (SConstruct) builder. If running on an Ubuntu-based host system, you can use the
following command to get all the required libraries. However, there are known dependency
problems on Ubuntu 20.04. If you are running this host system, we recommend that you follow
Section 2.3 to build a Docker image and run gem5-X inside it.

    sudo apt install build-essential git m4 scons zlib1g \
        zlib1g-dev libprotobuf-dev protobuf-compiler libprotoc-dev \
        libgoogle-perftools-dev python-dev python-six python \
        libboost-all-dev swig

2.2.2 Building the gem5 Binary

Once the above is done, you will need to build an ARM gem5 binary. You can create multiple
builds, including .fast, .opt, and .debug. If you only need to run experiments, we recommend
building only gem5.fast. However, if you need to debug anything or want to generate traces, you
will need to build gem5.opt or gem5.debug. Do this with the following:

    cd <path_to_gem5-X>/
    scons build/ARM/gem5.{fast,opt,debug}

Additionally, to speed up compilation, you can add the option "-jN" to the scons command line,
where N is the number of threads you want to assign for compilation.

2.2.3 Running Your FS Simulation

Once the build process is complete, you can launch your simulation in the following way:

    cd <path_to_gem5-X>/

    ./build/ARM/gem5.{fast,opt,debug} \
        --remote-gdb-port=0 \
        -d /path/to/your/output/directory \
        configs/example/fs.py \
        --cpu-clock=1GHz \
        --kernel=vmlinux \
        --machine-type=VExpress_GEM5_V1 \
        --dtb-file=<full_path_to_gem5-X>/system/arm/dt/armv8_gem5_v1_1cpu.dtb \
        -n 1 \
        --disk-image=gem5_ubuntu16.img \
        --caches \
        --l2cache \
        --l1i_size=32kB \
        --l1d_size=32kB \
        --l2_size=1MB \
        --l2_assoc=2 \
        --mem-type=DDR4_2400_4x16 \
        --mem-ranks=4 \
        --mem-size=4GB \
        --sys-clock=1600MHz

At this point you should be able to connect to your running gem5 instance from another terminal
with:

    telnet localhost 3456

Alternatively, you can build and use the terminal program provided with gem5-X:

    cd <path_to_gem5-X>/util/term/
    make
    m5term 127.0.0.1 3456

Upon connecting to your gem5 instance, you should see the kernel dmesg output, followed
by a login prompt and a terminal in the gem5-X FS mode.
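
Because the launch command is long and is reused with small variations throughout this manual, it can be convenient to assemble it programmatically. The sketch below is illustrative only: build_fs_command is a hypothetical helper, not something shipped with gem5-X. It uses --num-cpus, the long form of -n in fs.py, and passes every other option through as written above.

```python
def build_fs_command(gem5_binary, outdir, options):
    """Assemble the argument list for an fs.py run (hypothetical helper).

    options maps flag names (without the leading "--") to values.
    A value of True marks a bare switch such as --caches.
    """
    cmd = [gem5_binary, "--remote-gdb-port=0", "-d", outdir,
           "configs/example/fs.py"]
    for flag, value in options.items():
        if value is True:
            cmd.append("--" + flag)
        else:
            cmd.append("--{}={}".format(flag, value))
    return cmd


# Example values taken from the command in this section.
example = build_fs_command(
    "./build/ARM/gem5.fast",
    "/path/to/your/output/directory",
    {
        "cpu-clock": "1GHz",
        "kernel": "vmlinux",
        "machine-type": "VExpress_GEM5_V1",
        "num-cpus": 1,
        "disk-image": "gem5_ubuntu16.img",
        "caches": True,
        "l2cache": True,
        "mem-type": "DDR4_2400_4x16",
        "mem-size": "4GB",
    },
)
print(" \\\n    ".join(example))
```

The resulting list can be passed to subprocess.run from the gem5-X root directory. The --dtb-file and cache-size options are omitted here for brevity and would be added in the same way.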

2.3 Hot-fixes for running gem5-X on Ubuntu 20.04 using Docker


Because of updates to gem5-X dependencies in Ubuntu 20.04, specific dependency versions
must be installed. In particular, Python 2.7.5 and SCons 3.0.0 (build version py27h8a56064_0)
must be used. These packages can be easily configured via a virtual environment. Here we
provide a Dockerfile to create a Docker image containing all of the dependencies necessary to
run gem5-X.

    FROM ubuntu:20.04
    SHELL ["/bin/bash", "-c"]
    ENV DEBIAN_FRONTEND=noninteractive
    RUN echo "deb http://dk.archive.ubuntu.com/ubuntu/ xenial main" \
        >> /etc/apt/sources.list \
        && echo \
        "deb http://dk.archive.ubuntu.com/ubuntu/ xenial universe" \
        >> /etc/apt/sources.list \
        && apt -y update \
        && apt install -y wget \
        && wget \
        https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
        && bash Miniconda3-latest-Linux-x86_64.sh -b -p /opt/miniconda3
    RUN source /opt/miniconda3/bin/activate \
        && conda init \
        && conda create --name py275 -c free python=2.7.5 -y \
        && conda activate py275 \
        && conda install scons=3.0.0=py27h8a56064_0 -y

    RUN apt-get -y install build-essential gcc-arm-linux-gnueabihf \
        gcc-aarch64-linux-gnu \
        device-tree-compiler make git m4 zlib1g \
        zlib1g-dev libprotobuf-dev protobuf-compiler libprotoc-dev \
        libgoogle-perftools-dev python-dev \
        libboost-all-dev swig=3.0.8-0ubuntu3 \
        && apt-get -y install diod \
        && apt-get -y install qemu qemu-user qemu-system \
        qemu-user-static
The above Dockerfile should be placed inside a newly created folder, e.g. gem5x-docker/Dockerfile.
You can then build the Docker image from a terminal in that folder:

    cd <path_to_gem5x-docker>
    docker build -t gem5x .

Wait until the image is successfully created, then run it with the following command:

    (sudo) docker run -it gem5x

Before going to Section 2.2.2 to build gem5, you first need to enable the conda environment:

    source /opt/miniconda3/bin/activate
    conda activate py275

In addition, two hotfixes are required in gem5-X's SConstruct file (gem5-X/SConstruct).
First, due to deprecated features in gcc 9.3.0+, lines 365 - 370 should be commented out. Second,
the GNU assembler version in the SConstruct file needs to be updated, so change [-1] to [3] in
line 435.
Now you can go back to Section 2.2.2 to build the gem5 binary. Please contact the maintainers
if issues continue to arise.

3 Support Enhancements of Gem5-X
In this chapter we will look into the following support enhancements we have added in gem5-X:

• Enhanced checkpointing

• gperf profiler

• File sharing between gem5-X and host system using 9P over Virtio

• Modifying disk image using QEMU

3.1 Enhanced Checkpointing


The boot process during FS simulation in gem5-X automatically takes a checkpoint when
boot and login are complete and the terminal is available. Since the boot uses the atomic
simple CPU model (AtomicSimpleCPU), the simulation carries no timing information up to that
point. We can now switch to an accurate in-order or out-of-order (OoO) CPU model with full
timing information.
If your simulation is still running after the boot, you can exit it using the following command in
the connected terminal:

    m5 exit

Now we can use the checkpoint that was automatically taken after boot and login, and switch
to an accurate CPU model.

    ./build/ARM/gem5.{fast,opt,debug} \
        --remote-gdb-port=0 \
        -d /path/to/your/output/directory \
        configs/example/fs.py \
        --cpu-clock=1GHz \
        --kernel=vmlinux \
        --machine-type=VExpress_GEM5_V1 \
        --dtb-file=<full_path_to_gem5-X>/system/arm/dt/armv8_gem5_v1_1cpu.dtb \
        -n 1 \
        --disk-image=gem5_ubuntu16.img \
        --caches \
        --l2cache \
        --l1i_size=32kB \
        --l1d_size=32kB \
        --l2_size=1MB \
        --l2_assoc=2 \
        --mem-type=DDR4_2400_4x16 \
        --mem-ranks=4 \
        --mem-size=4GB \
        --sys-clock=1600MHz \
        -r 1 \
        --cpu-type={MinorCPU,DerivO3CPU}

The number after -r is the checkpoint number. In this case we are resuming from the first
checkpoint. The CPU type can be MinorCPU for an in-order core or DerivO3CPU for an OoO core.
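
The only difference between the boot command and the restore command is the last two options. A minimal sketch of how they could be generated and validated is shown below. The helper restore_args is hypothetical, and restricting the CPU type to the two models named in this section is an assumption made for illustration.

```python
# CPU models this section names for timing-accurate simulation.
ACCURATE_CPUS = ("MinorCPU", "DerivO3CPU")


def restore_args(checkpoint_number, cpu_type):
    """Return the extra fs.py arguments for resuming from a checkpoint.

    Hypothetical helper: checkpoint numbers start at 1, and cpu_type must
    be one of the two accurate models mentioned in this section.
    """
    if checkpoint_number < 1:
        raise ValueError("checkpoint numbers start at 1")
    if cpu_type not in ACCURATE_CPUS:
        raise ValueError("unsupported CPU type: " + cpu_type)
    return ["-r", str(checkpoint_number), "--cpu-type=" + cpu_type]


# Resume from the first checkpoint with the in-order core.
print(" ".join(restore_args(1, "MinorCPU")))
```

These arguments would be appended to the same option list used for the initial boot.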

It is often useful to take a checkpoint with the atomic simple CPU model just before your
region-of-interest (ROI) and then switch to an accurate in-order or OoO CPU. You can do this
from either the command prompt or a script using the following command:

    m5 checkpoint

From within a C/C++ program, you can use the following call:

    system("m5 checkpoint");

3.2 Gperf Profiler


Gem5-X provides profiling capabilities within FS mode through the gperf profiler installed on
the disk image. The gperf statistical profiler developed by Google provides profiling
capabilities on gem5-X itself with minimal overhead, enabling the identification of application
bottlenecks and the exploration of the effectiveness of architectural modifications and
extensions.
To enable profiling using gperf when running a program, launch it as follows:

    LD_PRELOAD=/usr/lib/libprofiler.so.0 CPUPROFILE=<FILE_TO_SAVE_PROFILING> \
        CPUPROFILE_FREQUENCY=1000 <program>

This will launch the program with profiling enabled; the profiling data is saved to the file
named in the CPUPROFILE parameter.
To view the data, we first convert it to a .pdf file and then write it to the host machine as
follows:

    google-pprof --pdf <FILE_WITH_PROFILING_DATA> > <FILENAME>.pdf
    m5 writefile <FILENAME>.pdf

The file <FILENAME>.pdf will now be available on the host system under the path passed
to the -d parameter when launching the gem5-X simulation:

    -d /path/to/your/output/directory

3.3 9P over Virtio


We use the 9P protocol, developed by Bell Labs, over a virtio device driver to allow fast
modification of files without modifying the root file system in gem5-X. While this feature is
available in vanilla gem5, it is not enabled by default and has no kernel support. Both are
provided in gem5-X. Once Linux is booted, a folder on the host machine can be mounted within
gem5 to access files on the host system. Without 9P mounting, every time a program is modified,
we need to reload the disk image required for FS simulation and reboot Linux. In gem5, this
process can take 20 to 30 minutes, a bottleneck that gem5-X eliminates. To use 9P over Virtio,
follow the instructions below:

• First, install DIOD

    sudo apt-get install diod

• After installation, check where DIOD is installed by typing "which diod". Update this path
in the file src/dev/virtio/VirtIO9P.py at line 62, then re-compile gem5-X using the scons
command as usual.

    cd <path_to_gem5-X>/
    scons build/ARM/gem5.{fast,opt,debug}

• Use the kernel "vmlinux_wa" during the gem5 simulation. This file is provided with gem5-X under
full_system_images/binaries

• Use the following additional parameter when launching the simulation

    --workload-automation-vio=<FULL_PATH_TO_SHARED_FOLDER_ON_HOST_SYSTEM>

• Once the system is booted, run the following in the gem5 terminal

    mount.sh <FULL_PATH_TO_SHARED_FOLDER_ON_HOST_SYSTEM>

• Now any file under the shared folder on the host system appears in the /mnt directory in the
gem5 simulation.

3.4 Modifying disk image using QEMU


To run applications and benchmarks in gem5-X, they need to be on the disk image. To do this,
we need to modify the disk image so that it contains the applications.
QEMU is used to modify the disk image. If running on an Ubuntu-based host system, the
following prerequisites need to be installed.

    sudo apt-get install qemu qemu-user qemu-system qemu-user-static

To mount the image:

    cd <PATH_TO_FULL_SYSTEM_IMAGES>/disks/
    mkdir local_mnt
    sudo mount -o loop,offset=$((2048*512)) gem5_ubuntu16.img local_mnt
    sudo mount -o bind /proc local_mnt/proc
    sudo mount -o bind /dev local_mnt/dev
    sudo mount -o bind /dev/pts local_mnt/dev/pts
    sudo mount -o bind /sys local_mnt/sys

Now we chroot into the image, which is emulated using QEMU:

    cd local_mnt/
    sudo chroot ./

At this point we are in the ARMv8 disk image and can compile or download applications
within the image. Since it is an Ubuntu 16.04 image, run the following before installing
any new packages on it:

    apt-get update

When the disk image has been updated with the applications or benchmarks, we can exit it and
unmount the image.

    exit
    cd ..
    sudo umount local_mnt/proc
    sudo umount local_mnt/dev/pts
    sudo umount local_mnt/dev
    sudo umount local_mnt/sys
    sudo umount local_mnt

The modified image is now ready to be used for gem5-X simulation with the new applications or
benchmarks.
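
The offset passed to the loop mount above is the byte position of the partition inside the image file, i.e. start sector times sector size. The short sketch below reproduces that arithmetic. The values 2048 and 512 are the ones used in the mount command; the function name partition_offset is only illustrative.

```python
def partition_offset(start_sector, sector_size=512):
    """Byte offset of a partition that begins at start_sector."""
    return start_sector * sector_size


offset = partition_offset(2048)
print(offset)                    # 1048576 bytes
print(offset // (1024 * 1024))   # 1 (MiB)
```

For a different image, the start sector would need to be read from its partition table (for example with fdisk -l) rather than assumed.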

4 ARMv8 ISA Extension
This guide describes how to extend the ARMv8 ISA, using as an example the creation of a
custom "ADD1" instruction, which does exactly the same as the ARM ADD instruction but uses
one of the unallocated opcodes.
Extending the ISA of any architecture in gem5 requires some unallocated opcodes to exist.
There are several opportunities to extend the ARMv8 ISA in gem5-X, as it has many
unallocated opcodes. The complete ARMv8 ISA can be found at:
https://static.docs.arm.com/ddi0487/ca/DDI0487C_a_armv8_arm.pdf

4.1 Adding a new custom instruction to the ARMv8 ISA


The first step towards extending the ISA is to find an unallocated opcode. In section
"C4.1 A64 instruction set encoding" of the ARM ISA manual, you will find an unallocated
opcode field in Table C4-1. We will use this unallocated field to add our custom instruction. To
add it in gem5-X:

1. Go to the ARM ISA folder

    cd src/arch/arm/isa/

2. Go to the formats folder

    cd formats

3. Open the file aarch64.isa, which contains the top-level decoder functionality

    vim aarch64.isa

4. Go to the end of the file and look for the function "def format Aarch64()". This is where the
top-level decoding is done according to Table C4-1. You will see that when bit[27]=0 and
bit[28]=0, there is a call to the "Unknown64()" function, as there is no instruction allocated
for this opcode. This is where we will add our instruction. Remove the line that returns
Unknown64() and add the following, so that our instruction is decoded by the
"decodeCusDataProcImm" function.

    // bit 28:27=00
    return decodeCusDataProcImm(machInst);

5. Now we need to add this new function. Add it at the beginning of the file, under

    output header {{
    namespace Aarch64
    {

6. To add the ADD functionality to this new function, we implement it in a similar way to the
"AddXImm" function in the file. We name our new instruction Add1XImm.
Please refer to the modified aarch64.isa file that can be found here under the modified files
folder.

7. The steps above add the instruction to the decoding path. To implement the actual ADD
functionality, we need to modify the following (full path is gem5/src/arch/arm/isa/insts):

    cd ../insts/
    vim data64.isa

8. Search for the line containing the original ADD instruction:

    buildDataInst("add", "Dest64 = resTemp = Op164 + secOp;", "add")

9. Below it, add our custom ADD instruction:

    buildDataInst("add1", "Dest64 = resTemp = Op164 + secOp;", "add1",
                  overrideOpClass="CusAluOp")

10. Note that we have assigned this instruction an OpClass of CusAluOp. To enable this, we had to
add the overrideOpClass parameter to several functions in the file. Please see the attached
data64.isa file here under the modified files folder to see how we did it.
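
The decode condition in step 4 depends only on two bits of the 32-bit instruction word. The sketch below models that check in Python to make the bit positions concrete. It is not gem5 code: the helper names bits and is_custom_slot are hypothetical, and the real decoder is written in the gem5 ISA description language.

```python
def bits(word, hi, lo):
    """Extract the inclusive bit field word[hi:lo] as an integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def is_custom_slot(mach_inst):
    """True when bits 28:27 are 00, the case routed to the custom decoder."""
    return bits(mach_inst, 28, 27) == 0


# ADD X0, X0, #0 encodes as 0x91000000, where bits 28:27 are 0b10,
# so the regular ADD (immediate) does not fall into the custom slot.
print(is_custom_slot(0x91000000))  # False
print(is_custom_slot(0x00000000))  # True
```

A custom encoding placed in this slot must therefore keep both bit 28 and bit 27 cleared.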

4.2 Handling the new ADD1 instruction


The new ADD1 instruction is now added to the ARM ISA, but there is no functional unit in
the CPU to handle it. The OpClass parameter tells which functional unit will execute a given
instruction. The original ADD instruction is executed by IntAlu. For our new ADD1 instruction,
we stated above that the CusAlu functional unit will execute it. To make this work, we need
to add the functional unit to the CPU models in gem5-X.

1. Go to the CPU folder (full path is gem5/src/cpu) and open FuncUnit.py

    cd ../../../../cpu/
    vim FuncUnit.py

2. Add the CusAlu unit under the following (please refer to the attached FuncUnit.py file here
under the modified files folder)

    class OpClass(Enum):
        vals = [

3. Open the file op_class.hh and add the following to the end

    static const OpClass CusAluOp = Enums::CusAlu;

4. Add the functional unit to the Minor CPU model.

    cd minor/
    vim MinorCPU.py

See the attached MinorCPU.py file for details.

5. Add the functional unit to the O3 CPU model

    cd ../o3

Edit the following files. (See the attached files here under the modified files folder for details.)

• FUPool.py
• FuncUnitConfig.py

6. Add the stats information in the simple CPU

    cd ../simple/

Edit the following files. (Attached here under the modified files folder.)

• base.cc
• exec_context.hh

7. Go to the proto folder (full path is gem5/src/proto)

    cd ../../proto/

8. Edit the file inst.proto and add the following to the enum InstType

    CusAlu = 34;

9. You can now compile gem5 using

    scons build/ARM/gem5.fast

4.3 Testing that the ISA extension works


To test that the ARM64 ISA extension works, we have also written a test program in C++ that
uses inline assembly to call the new instruction. The test program is attached here under
the add1 test folder. (The program adds the constant 3 to the value passed to it.)

1. Run gem5 in SE (System Emulation) mode using the test program. The output should look
like this:

    C[0] = 13
    C[1] = 23
    C[2] = 33
    C[3] = 43
    C[4] = 53
    C[5] = 63
    C[6] = 73
    C[7] = 83
    C[8] = 93
    C[9] = 103

The program should also work in full-system mode, as we use inline assembly to call the
newly added instruction.
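
As a reference for checking the simulator output, the expected values can be reproduced with a small functional model. This sketch assumes the test inputs are 10, 20, ..., 100, which is inferred from the output listed above rather than taken from the test program itself. The function add1_model is a stand-in for the custom instruction, not the real test.

```python
def add1_model(value, imm=3):
    """Functional model of ADD1: same result as ADD with an immediate."""
    return value + imm


# Assumed inputs (inferred from the expected output): 10, 20, ..., 100.
inputs = [10 * (i + 1) for i in range(10)]
lines = ["C[{}] = {}".format(i, add1_model(v)) for i, v in enumerate(inputs)]
print("\n".join(lines))
```

If the simulated output differs from these lines, the decode path or the functional unit assignment is the first place to look.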

5 High Bandwidth Memory v2 (HBM2)
High Bandwidth Memory (HBM) is based on 3D-stacked DRAM banks, made possible by
Through-Silicon Vias (TSVs), and achieves a high bandwidth of up to 307.2 GB/s. To implement
the functional behavior of the HBM2 memory model in gem5-X, we extend the DRAM controller model
of gem5 according to the architectural details of HBM2. To provide 8 channels with memory
interleaving, we instantiate 8 DRAM controllers, each 128 bits wide. We connect all 8 DRAM
controllers to a 1024-bit wide system bus that connects to the cache hierarchy.
To use 8-channel HBM2 in gem5-X full system simulation, with appropriate bus widths throughout
the system all the way to the caches, use the following command:

    cd <path_to_gem5-X>/

    ./build/ARM/gem5.{fast,opt,debug} \
        --remote-gdb-port=0 \
        -d /path/to/your/output/directory \
        configs/example/fs.py \
        --cpu-clock=1GHz \
        --kernel=vmlinux \
        --machine-type=VExpress_GEM5_V1 \
        --dtb-file=<full_path_to_gem5-X>/system/arm/dt/armv8_gem5_v1_1cpu.dtb \
        -n 1 \
        --disk-image=gem5_ubuntu16.img \
        --caches \
        --l2cache \
        --l1i_size=32kB \
        --l1d_size=32kB \
        --l2_size=1MB \
        --l2_assoc=2 \
        --l2bus-width=128 \
        --membus-width=128 \
        --mem-type=HBM2_2000_4H_1x128 \
        --mem-ranks=1 \
        --mem-channels=8 \
        --mem-size=4GB \
        --sys-clock=1600MHz

No separate software support is required to use HBM2 in FS mode, and hence we are able to
boot the Ubuntu Linux distribution using HBM2.

The HBM2 memory model can be found in the following file:

    <full_path_to_gem5-X>/src/mem/DRAMCtrl.py
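
The 307.2 GB/s peak figure quoted above follows from the total interface width and the per-pin data rate. The sketch below works through that arithmetic. The 8 channels of 128 bits come from this section, while the per-pin rate of 2400 Mb/s is an assumption chosen because it reproduces the quoted figure; the manual does not state it.

```python
def peak_bandwidth_gb_s(channels, bits_per_channel, mbit_per_s_per_pin):
    """Peak bandwidth in GB/s for a memory interface.

    total pins (bits) * per-pin rate in Mb/s, divided by 8 bits per
    byte and by 1000 to convert MB/s to GB/s.
    """
    total_bits = channels * bits_per_channel
    return total_bits * mbit_per_s_per_pin / 8 / 1000


# 8 channels x 128 bits = 1024 bits; 2400 Mb/s per pin is an assumption.
print(peak_bandwidth_gb_s(8, 128, 2400))  # 307.2
```

With a per-pin rate of 2000 Mb/s, the same formula gives 256 GB/s.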

6 Core Clustering
Core clustering enables a group of compute cores to have their own shared cache, which can be
the last-level cache (LLC), separate from other cores in the system. This reduces the resources
shared between different compute clusters in the system to just the crossbar interconnect and
memory. In addition, clustering is also used when different types of cores are used in a
system. Cores of the same type are clustered together with their own LLCs, which makes a
heterogeneous system possible.
Clustering is now supported in gem5-X. To have different core clusters in gem5-X, use the
following command:

    ./build/ARM/gem5.{fast,opt,debug} \
        --remote-gdb-port=0 \
        -d /path/to/your/output/directory \
        configs/example/fs.py \
        --cpu-clock=1GHz \
        --kernel=vmlinux \
        --machine-type=VExpress_GEM5_V1 \
        --dtb-file=<path_to_gem5-X>/system/arm/dt/armv8_gem5_v1_<NUM_CORES>cpu.dtb \
        -n <NUM_OF_CORES> \
        --disk-image=gem5_ubuntu16.img \
        --caches \
        --l2cache \
        --l1i_size=32kB \
        --l1d_size=32kB \
        --l2_size=1MB \
        --l2_assoc=2 \
        --l2_cluster_size=<NUM_OF_CORE_PER_CLUSTER> \
        --mem-type=DDR4_2400_4x16 \
        --mem-channels=4 \
        --mem-ranks=4 \
        --mem-size=4GB \
        --sys-clock=1600MHz

This command will simulate a system with core clusters. Each cluster will have the number of
cores defined by the --l2_cluster_size parameter. The number of cores defined by the -n parameter
must be divisible by --l2_cluster_size. Dividing n by l2_cluster_size gives the number of
clusters in the system. Each cluster will have its own L2 (LLC) cache.
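
The divisibility rule above can be checked before launching a long simulation. The following is a minimal sketch of that check. The helper num_clusters is hypothetical, and how gem5-X itself reacts to an invalid combination is not described in this manual.

```python
def num_clusters(num_cores, l2_cluster_size):
    """Number of clusters for a given -n and --l2_cluster_size.

    Raises ValueError when the core count is not a multiple of the
    cluster size, which this section states as a requirement.
    """
    if l2_cluster_size <= 0:
        raise ValueError("cluster size must be positive")
    if num_cores % l2_cluster_size != 0:
        raise ValueError("number of cores must be divisible by cluster size")
    return num_cores // l2_cluster_size


print(num_clusters(16, 4))  # 4 clusters, each with its own L2
```

For example, 16 cores with a cluster size of 4 yield 4 clusters, whereas 6 cores with a cluster size of 4 are rejected.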

7 Heterogeneous Cores
Heterogeneity enables different workloads with vartying performance and energy constraints to
be allocated to different core types in the system. Gem5-X supports both in-order and OoO cores
in the same system. Different core types are distributed into different clusters.
To use heterogeneity in gem5-X, first the system is launched to boot up the linux to reach the
region-of-interest (ROI), with the following command:
./build/ARM/gem5.{fast,opt,debug} \
    --remote-gdb-port=0 \
    -d /path/to/your/output/directory \
    configs/example/fs.py \
    --cpu-clock=1GHz \
    --kernel=vmlinux \
    --machine-type=VExpress_GEM5_V1 \
    --dtb-file=<path_to_gem5-X>/system/arm/dt/armv8_gem5_v1_<NUM_CORES>cpu.dtb \
    -n <NUM_OF_CORES> \
    --disk-image=gem5_ubuntu16.img \
    --caches \
    --l2cache \
    --l1i_size=32kB \
    --l1d_size=32kB \
    --l2_size=1MB \
    --l2_assoc=2 \
    --l2_cluster_size=<NUM_OF_CORE_PER_CLUSTER> \
    --cluster_size_1=4 \
    --mem-type=DDR4_2400_4x16 \
    --mem-channels=4 \
    --mem-ranks=4 \
    --mem-size=4GB \
    --sys-clock=1600MHz
This command simulates a system with core clusters. The parameter --cluster_size_1 defines
the size of the first cluster, whose cores are of type 1. It should be the same as --l2_cluster_size.
All the cores in the remaining clusters are of type 2. For instance, if the number of cores is 16
and both --l2_cluster_size and --cluster_size_1 are set to 4, the system has 4 clusters
of 4 cores each. The first cluster has cores of type 1 and the remaining three clusters
have cores of type 2.
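
The type assignment described above can be sketched as follows. This is an illustrative helper only: the function name core_types is hypothetical, and it assumes that the first cluster holds the lowest-numbered cores, which the manual does not state explicitly.

```python
from typing import List


def core_types(num_cores: int, l2_cluster_size: int, cluster_size_1: int) -> List[int]:
    """Return the core type (1 or 2) of every core, indexed by core ID.

    Assumption: core IDs are assigned to clusters consecutively, so the
    first cluster contains cores 0 .. cluster_size_1 - 1.
    """
    if num_cores % l2_cluster_size != 0:
        raise ValueError("-n must be divisible by --l2_cluster_size")
    if cluster_size_1 != l2_cluster_size:
        # The manual says the two values should match.
        raise ValueError("--cluster_size_1 should equal --l2_cluster_size")
    return [1 if core < cluster_size_1 else 2 for core in range(num_cores)]


if __name__ == "__main__":
    types = core_types(num_cores=16, l2_cluster_size=4, cluster_size_1=4)
    for cluster in range(16 // 4):
        print("cluster", cluster, "->", types[cluster * 4:(cluster + 1) * 4])
```

For the 16-core example in the text, the first cluster is reported as type 1 and the other three as type 2.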
Once the ROI is reached, take a checkpoint using the "m5 checkpoint" command. The simulation
can then be resumed from the checkpoint with the desired core type for each cluster. In the
following command, type 1 cores are set to be in-order and type 2 cores to be OoO:
./build/ARM/gem5.{fast,opt,debug} \
    --remote-gdb-port=0 \
    -d /path/to/your/output/directory \
    configs/example/fs.py \
    --cpu-clock=1GHz \
    --kernel=vmlinux \
    --machine-type=VExpress_GEM5_V1 \
    --dtb-file=<path_to_gem5-X>/system/arm/dt/armv8_gem5_v1_<NUM_CORES>cpu.dtb \
    -n <NUM_OF_CORES> \
    --disk-image=gem5_ubuntu16.img \
    --caches \
    --l2cache \
    --l1i_size=32kB \
    --l1d_size=32kB \
    --l2_size=1MB \
    --l2_assoc=2 \
    --l2_cluster_size=<NUM_OF_CORE_PER_CLUSTER> \
    --cluster_size_1=4 \
    --mem-type=DDR4_2400_4x16 \
    --mem-channels=4 \
    --mem-ranks=4 \
    --mem-size=4GB \
    --sys-clock=1600MHz \
    -r 1 \
    --cpu-type=MinorCPU \
    --cpu-type2=DerivO3CPU

8 Scratchpad Memory (SPM)
Scratchpad Memories (SPMs) are software-programmable memories at the same level as the L1
cache, but controlled by the user. gem5-X supports SPMs that are both local and shared
between two consecutive cores.
To use SPMs in gem5-X, the following command can be used:
./build/ARM/gem5.{fast,opt,debug} \
    --remote-gdb-port=0 \
    -d /path/to/your/output/directory \
    configs/example/fs.py \
    --cpu-clock=1GHz \
    --kernel=vmlinux \
    --machine-type=VExpress_GEM5_V1 \
    --dtb-file=<path_to_gem5-X>/system/arm/dt/armv8_gem5_v1_<NUM_CORES>cpu.dtb \
    -n <NUM_OF_CORES> \
    --disk-image=gem5_ubuntu16.img \
    --caches \
    --l2cache \
    --l1i_size=32kB \
    --l1d_size=32kB \
    --l2_size=1MB \
    --l2_assoc=2 \
    --mem-type=DDR4_2400_4x16 \
    --mem-ranks=4 \
    --mem-size=4GB \
    --sys-clock=1600MHz \
    --spm \
    --d_spm_size=128kB
The --spm option enables SPMs in gem5-X and --d_spm_size defines the SPM size, which is
set to 128kB in the above example. Each SPM can be accessed by two consecutive cores. For
instance, SPM0 is accessible by core0 and core1, SPM1 by core1 and core2, SPM2 by core2 and
core3, and so on.
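
The sharing pattern above can be written down as a small lookup. This is a sketch for illustration only: the helper names are hypothetical, and the number of SPMs in the system (num_spms) is an assumed input, since the manual does not state how many SPMs are instantiated.

```python
from typing import List, Tuple


def spm_sharers(spm_index: int) -> Tuple[int, int]:
    """Cores that can access SPM number spm_index (SPM i -> core i, core i+1)."""
    return (spm_index, spm_index + 1)


def spms_visible_to(core: int, num_spms: int) -> List[int]:
    """SPMs reachable from a given core, for an assumed total of num_spms SPMs."""
    return [i for i in range(num_spms) if core in spm_sharers(i)]


if __name__ == "__main__":
    for spm in range(3):
        print("SPM%d is shared by cores %s" % (spm, spm_sharers(spm)))
    print("core1 can reach SPMs", spms_visible_to(1, num_spms=3))
```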
Since gem5-X runs in FS mode, SPMs must be mapped with mmap before they can be used, as in
the following code:
#include <fcntl.h>      /* open, O_RDWR, O_SYNC */
#include <stdint.h>     /* uint64_t */
#include <stdio.h>      /* perror, printf */
#include <sys/mman.h>   /* mmap, PROT_*, MAP_* */
#include <unistd.h>     /* sysconf, close */

/* Map mem_size bytes of SPM located at physical address mem_address.
 * Returns a virtual pointer to it, or NULL on failure. */
void *spm_mem_alloc(uint64_t mem_size, uint64_t mem_address)
{
    uint64_t alloc_mem_size, page_mask, page_size;
    void *mem_pointer;
    void *virt_addr;

    page_size = sysconf(_SC_PAGESIZE);
    /* Round the mapping length up to a whole number of pages. */
    alloc_mem_size = (((mem_size / page_size) + 1) * page_size);
    page_mask = (page_size - 1);

    int mem_dev = open("/dev/mem", O_RDWR | O_SYNC);
    if (mem_dev == -1)
    {
        perror("Cannot open /dev/mem");
        return NULL;
    }

    /* mmap needs a page-aligned offset, so map from the start of the page. */
    mem_pointer = mmap(NULL,
                       alloc_mem_size,
                       PROT_READ | PROT_WRITE,
                       MAP_SHARED,
                       mem_dev,
                       (mem_address & ~page_mask));

    /* The mapping (if any) stays valid after the descriptor is closed. */
    close(mem_dev);

    if (mem_pointer == MAP_FAILED)
    {
        perror("Cannot MAP");
        return NULL;
    }

    printf("Memory Mapped\n");
    /* Add back the offset of mem_address within its page. */
    virt_addr = (char *)mem_pointer + (mem_address & page_mask);

    return virt_addr;
}
The code snippet above returns a virtual pointer into the SPM in FS mode. The parameter
mem_size (uint64_t) defines the amount of memory allocated within the SPM. The parameter
mem_address (uint64_t) gives the address of the SPM in the physical memory space. For SPM0,
this address lies at an offset after the main memory and the I/O devices in gem5-X. For instance,
if the main memory size is 4GB, the offset for SPM0 is 4GB + 2GB (I/O devices memory space),
i.e. 6GB = 6442450944. SPM1 is at an offset of main-memory size + I/O devices +
SPM0 size.
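
The address arithmetic above can be sketched as a small helper for computing the mem_address argument. The function name is hypothetical. The 2GB I/O region is taken from the example in the text, and extending the SPM1 rule to SPM k (adding k times the SPM size) is an assumption that all SPMs have the same size and are laid out back to back.

```python
GiB = 1024 ** 3
KiB = 1024


def spm_base_address(spm_index: int, mem_size: int, spm_size: int,
                     io_size: int = 2 * GiB) -> int:
    """Physical base address of SPM number spm_index, in bytes.

    Assumptions: the I/O region occupies io_size bytes right after main
    memory, and equally sized SPMs follow one another contiguously.
    """
    if spm_index < 0:
        raise ValueError("spm_index must be non-negative")
    return mem_size + io_size + spm_index * spm_size


if __name__ == "__main__":
    for i in range(2):
        addr = spm_base_address(i, mem_size=4 * GiB, spm_size=128 * KiB)
        print("SPM%d base address: %d (0x%x)" % (i, addr, addr))
```

For a 4GB main memory, the SPM0 result matches the 6442450944 value quoted in the text.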

9 On-Chip Wireless Networking Integration and Modeling
On-chip wireless communication is enabled by interfacing a transceiver and a nano-antenna
to the system components. gem5-X-On-Chip-Wireless supports the modelling of this interconnect
strategy. The behaviour of a wireless link with different latencies, bandwidths, and Medium Access
Control (MAC) protocols can be explored. This extension was the basis for the conference paper
“System-Level Exploration of In-Package Wireless Communication for Multi-Chiplet Platforms”¹.

9.1 Running gem5-X Full System Mode with ARMv8, Linux and wireless extensions
Table 1 reports the compatibility of gem5-X-On-Chip-Wireless with the other gem5 extensions
in gem5-X. No guarantees of compatibility with any present or future gem5-X version should be
assumed beyond the ones provided in this table.

Table 1: gem5-X-On-Chip-Wireless Compatibility Chart

Extension              Section   Compatible with on-chip wireless?   Notes
Support Enhancements   3         Yes
ARMv8 ISA Extension    4         Yes
HBM2                   5         Yes
Core Clustering        6         Yes
Heterogeneous Cores    7         Yes
SPM                    8         Yes
ALPINE                 -         Yes
TiC-SAT                -         Untested

gem5-X-On-Chip-Wireless can be cloned from the associated repository via the following command:
git clone https://github.com/gem5-X/On-Chip-Wireless.git
After the environment is set up, wireless-capable systems can be specified either from a termi-
nal command line or from a configuration file.

9.1.1 Defining wireless-capable systems from the command line

./build/ARM/gem5.{fast,opt,debug} \
    --remote-gdb-port=0 \
    -d /path/to/your/output/directory \
    configs/example/fs.py \
    --cpu-clock=1GHz \
    --kernel=vmlinux \
    --machine-type=VExpress_GEM5_V1 \
    --dtb-file=<path_to_gem5-X>/system/arm/dt/armv8_gem5_v1_<NUM_CORES>cpu.dtb \
    -n <NUM_OF_CORES> \
    --disk-image=gem5_ubuntu16.img \
    --caches \
    --l2cache \
    --l1i_size=32kB \
    --l1d_size=32kB \
    --l2_size=1MB \
    --l2_assoc=2 \
    --mem-type=DDR4_2400_4x16 \
    --mem-ranks=4 \
    --mem-size=4GB \
    --sys-clock=1600MHz \
    --membus-wireless \
    --wireless-bandwidth=12.5GB/s \
    --mac-protocol=exp_backoff

¹ R. Medina et al., ASP-DAC 2023.

The command above generates a system with <NUM_OF_CORES> cores, L1 and L2
caches of the defined sizes, and a CPU clock of 1GHz. The system mounts a disk image containing
Ubuntu Linux and boots from it. The options specific to gem5-X-On-Chip-Wireless are in the
last three lines of the command. They instantiate a wireless memory bus connecting
main memory with the L2 cache. The wireless link has a bandwidth of 12.5GB/s and
employs an exponential backoff protocol to arbitrate bus collisions.
Command line options related to in-package wireless links are:

• --l2bus-wireless: instantiates a wireless link connecting the L1 and L2 caches.

• --membus-wireless: instantiates a wireless link connecting the L2 caches and the main memories.

• --wireless-bandwidth=<BANDWIDTH>: sets the bandwidth of the wireless link.

• --mac-protocol=<exp_backoff / token_pass>: selects the MAC protocol, either exponential
backoff or token passing, as described in D5.3.

• --retry-slot-size=<SIZE>: sets the size of the retry slot when using the exponential backoff
protocol, specified as a multiple of the time required to transmit one byte at the
available bandwidth.

• --backoff-ceil=<MAX_EXPONENT>: sets the upper limit on the size of the retransmission
window in the exponential backoff protocol.
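
To give a feeling for how the bandwidth, retry slot size and backoff ceiling interact, the sketch below models a textbook binary exponential backoff. It is not the gem5-X implementation: the function names are hypothetical, and the rule that the window holds 2^min(collisions, ceiling) slots with a uniformly random pick is an assumption borrowed from the classic scheme.

```python
import random


def byte_time_s(bandwidth_bytes_per_s: float) -> float:
    """Time needed to transmit one byte over the link, in seconds."""
    return 1.0 / bandwidth_bytes_per_s


def backoff_window(collisions: int, backoff_ceil: int) -> int:
    """Number of retry slots to choose from; the exponent is capped at backoff_ceil."""
    return 2 ** min(collisions, backoff_ceil)


def backoff_delay_s(collisions: int, bandwidth_bytes_per_s: float,
                    retry_slot_size: int, backoff_ceil: int,
                    rng: random.Random) -> float:
    """Random wait before retransmitting (assumed uniform over the window)."""
    slot_s = retry_slot_size * byte_time_s(bandwidth_bytes_per_s)
    slots = rng.randrange(backoff_window(collisions, backoff_ceil))
    return slots * slot_s


if __name__ == "__main__":
    rng = random.Random(0)
    bw = 12.5e9  # 12.5GB/s, as in the command above
    print("64-byte transfer time: %.3g s" % (64 * byte_time_s(bw)))
    for k in range(1, 5):
        d = backoff_delay_s(k, bw, retry_slot_size=64, backoff_ceil=3, rng=rng)
        print("after %d collision(s): wait %.3g s" % (k, d))
```

In this model a larger retry slot or a higher ceiling spreads retransmissions over a longer interval, trading latency for fewer repeated collisions.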

Defining systems via the command line offers a fast avenue towards exploring the performance
of in-package wireless. Nonetheless, it also limits flexibility in the system generation. Indeed,
only systems with a two-level cache hierarchy are supported, and only L1/L2 and
L2/main-memory wireless links can be instantiated.

9.1.2 Defining wireless-capable systems with configuration files

In a system configuration file, a wireless link can be instantiated similarly to a standard gem5
crossbar, adding the parameters specific to wireless transmission (bandwidth, employed MAC
protocol, etc.). An example of a link using an exponential backoff protocol is reported below.
system.wireless_link = WirelessXBar(
    clk_domain = system.clk_domain,
    bandwidth = options.wireless_bandwidth,
    mac_protocol = options.mac_protocol,
    retry_slot_size = options.retry_slot_size,
    backoff_ceil = options.backoff_ceil)
Alternatively, we provide components that define a wireless bus adapted to work as an inter-
connect between cache and memories: WirelessL2XBar and WirelessSystemBar.

To add one element to the wireless interconnect, its port can be attached to the wireless cross-
bar in the following way:
system.l1cache.master = system.wireless_link.slave
system.l2cache.slave = system.wireless_link.master
An example definition of a wireless bus between the L1 caches and the L2 cache can be
found in configs/common/CacheConfig_wirelessExample.py. It is run by executing
configs/example/fs_wirelessExample.py.

9.2 On-Chip-Wireless module implementation files


The main files containing the wireless module descriptions are in the directory src/mem/. A
brief description of each follows:
• WirelessXBar.py contains a description of the crossbar and of its parameters.
• wireless_xbar.hh is a header file defining the variables and function prototypes of the wireless
module.
• wireless_xbar.cc describes the functionality of the gem5-X wireless module. It is built on
the standard crossbar implementation in gem5, but supports collision detection and
retransmission based on token passing and exponential backoff protocols, as well as a parametric
link bandwidth.

9.3 Example Application


Included in the gem5-X-On-Chip-Wireless repository is the STREAM benchmark suite, available
in the benchmarks/Stream/Stream.c file. The C code of the kernels is provided without
adaptation, as the on-chip wireless components are transparent to software. Only the OpenMP
pragmas are updated (in the versions enabled by the TUNED macro) to distribute the execution
across different cores, mapping a parallel thread to each processor. As an example, the code
snippet below illustrates the Copy() benchmark:
#ifdef TUNED
/* stubs for "tuned" versions of the kernels */
void tuned_STREAM_Copy()
{
    ssize_t j;
    /* proc_bind(close) places the threads on cores close to the parent thread */
#pragma omp parallel for proc_bind(close)
    for (j = 0; j < STREAM_ARRAY_SIZE; j++)
        c[j] = a[j];
}
#endif
The modified benchmark can be compiled with the script arm_compile.sh.
