Design and Evaluation of Open-Source Soft-Core Processors
Abstract
1. Introduction
1.1. ASICs
1.2. NAND Gates
1.3. FPGAs
- LUT: Lookup tables (LUTs) implement all logic in FPGAs and are categorized by the number of address lines they require. A LUT4 stores 16 one-bit words and needs four address lines; a LUT6 stores 64 bits and needs six. A larger LUT can always do the job of a smaller one, either by tying the unused address lines to zero or one or by duplicating the stored bits so that the output does not depend on those lines. Some FPGAs (for example, the XC4000) include special “mux” blocks that combine smaller LUTs into a larger one, which improves efficiency.
- Registers: The LUTs are purely combinational; an optional flip-flop at the output of each LUT allows sequential circuits to be implemented. Normally, one register is associated with each LUT, but the I/O pads tend to include some extra registers.
- DSP: Digital signal processing blocks are hardware multipliers. Multiplication is useful well beyond digital signal processing, and implementing it without these blocks would require a very large number of LUTs.
- Distributed memory: Each LUT is actually a very small random access memory (RAM) whose contents are generally left unaltered after the initial FPGA configuration. An additional circuit allows all or a fraction of the LUTs to be used as read/write memories.
- Block memory: The area needed to store a bit in a register or even in a LUT is very large compared to a dedicated RAM circuit. Since the 1990s, FPGAs have included a number of memory blocks that can efficiently handle a medium to large number of bits.
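The claim that a larger LUT can stand in for a smaller one can be checked with a small model. The sketch below treats a LUT as a list of bits indexed by its address; the helper names `lut_read` and `widen_lut` are illustrative and do not come from any FPGA toolchain.

```python
def lut_read(table_bits, address):
    """Return the single bit stored at `address` in a LUT modeled as a list."""
    return table_bits[address]


def widen_lut(table_bits, extra_lines):
    """Duplicate the contents so the output ignores `extra_lines` new address lines."""
    return table_bits * (2 ** extra_lines)


# A LUT4 holds 16 one-bit words; here it is configured as a 4-input AND.
and4 = [0] * 15 + [1]

# The same function stored in a LUT6 (64 bits): the two extra address
# lines select between identical copies, so they do not affect the output.
and4_in_lut6 = widen_lut(and4, 2)

for address in range(64):
    assert lut_read(and4_in_lut6, address) == lut_read(and4, address & 0b1111)
```

Tying the two unused address lines to a constant instead would leave 48 of the 64 bits unreachable, so their contents would not matter.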
FPGA Families
1.4. RISC-V Soft Cores
1.5. Other Soft Cores
2. Objectives
3. Materials and Methods
3.1. Processor Specifications
3.1.1. State Registers
- PH/PL: 16-bit (8H/8L) program counter in normal execution mode;
- MH/ML: 16-bit (8H/8L) pointer for indirect “zero page” operands;
- IH/IL: 16-bit (8H/8L) program counter in interrupt mode;
- LH/LL: 16-bit (8H/8L) address saved in the last call instruction;
- ZH/ZL: 16-bit (8H/8L) address of the “zero page” operand;
- TH/TL: 16-bit (8H/8L) timer to define the number of cycles to pause before the next instruction;
- K: 8-bit single register for “cascades”, values passed between pairs of instructions;
- W, X, and Y: 8-bit single registers accessible to the programmer for reading and writing.
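The register list above can be pictured as a plain record of 8-bit fields. The following is a minimal sketch, not the Baby8 implementation: the class name `Baby8State`, the helpers `pair` and `set_pair`, and the zero reset values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Baby8State:
    """State registers of Section 3.1.1; zero reset values are an assumption."""
    PH: int = 0  # program counter, normal execution mode
    PL: int = 0
    MH: int = 0  # pointer for indirect "zero page" operands
    ML: int = 0
    IH: int = 0  # program counter, interrupt mode
    IL: int = 0
    LH: int = 0  # address saved by the last call
    LL: int = 0
    ZH: int = 0  # address of the "zero page" operand
    ZL: int = 0
    TH: int = 0  # timer
    TL: int = 0
    K: int = 0   # cascade register
    W: int = 0   # programmer-visible registers
    X: int = 0
    Y: int = 0

    def pair(self, name):
        """Join the 8-bit halves of a pair such as 'P' into a 16-bit value."""
        return (getattr(self, name + "H") << 8) | getattr(self, name + "L")

    def set_pair(self, name, value):
        """Split a 16-bit value into the high and low halves of a pair."""
        value &= 0xFFFF
        setattr(self, name + "H", value >> 8)
        setattr(self, name + "L", value & 0xFF)


state = Baby8State()
state.set_pair("P", 0x12FF)
state.set_pair("P", state.pair("P") + 1)  # carry propagates from PL into PH
```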
3.1.2. Basic Syntax
3.1.3. K-Cascades
3.1.4. Source and Destination
- The basic operations opcodes;
- The immediate instructions opcodes;
- The control flow instructions opcodes;
- The conditional tests.
3.1.5. Shifts and Rotations
Listing 1. Shifts and rotations syntax example in Baby8.
Listing 2. Shifts and rotations syntax example in Baby8.
3.1.6. Interrupt
3.1.7. Timer
3.2. Custom Processor Design
3.2.1. Datapath
3.2.2. Control Unit
3.2.3. ALU
4. Results
4.1. Performance
4.2. Layouts
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
ALU | Arithmetic logic unit |
ASIC | Application-specific integrated circuit |
CISC | Complex instruction set computer |
CPU | Central processing unit |
FPGA | Field-programmable gate array |
I/O | Input/output |
RAM | Random access memory |
RISC | Reduced instruction set computer |
Source and Destination | |||||
---|---|---|---|---|---|
Syntax | Cycle 1,2 | Range | Cycle 3,4,5 | Cycle 6… | Access |
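zpa | ZL = *(PH,PL)++ | 0 ≤ ZL ≤ 11 | | | register |
zpa | ZL = *(PH,PL)++ | 12 ≤ ZL ≤ 15 | | | I/O port |
zpa | ZL = *(PH,PL)++ | 16 ≤ ZL ≤ 127 | | | *(ZH,ZL) |
*zpa | ZL = *(PH,PL)++ | 128 ≤ ZL ≤ 254 | M = *(ZH,ZL) | | *(MH,ML) |
*zpa++ | ZL = *(PH,PL)++ | 129 ≤ ZL ≤ 255 | M = *(ZH,ZL) | *(Z) = M + 1 | *(MH,ML) |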
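Reading the Range column as lower and upper bounds on the operand byte ZL, the access kind can be selected with a simple range check. This is a sketch under that reading; the function name `classify_zpa` and the returned labels are hypothetical.

```python
def classify_zpa(zl):
    """Map the 8-bit operand byte ZL to the access kind listed in the table."""
    if not 0 <= zl <= 255:
        raise ValueError("ZL must fit in 8 bits")
    if zl <= 11:
        return "register"
    if zl <= 15:
        return "I/O port"
    if zl <= 127:
        return "direct memory *(ZH,ZL)"
    # 128..255: the pointer M is first loaded from *(ZH,ZL). The split between
    # *zpa and *zpa++ is not modeled because the table's two ranges overlap.
    return "indirect memory *(MH,ML)"
```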
Basic Instructions | ||||
---|---|---|---|---|
Opcode | Binary | Syntax | K-Binary | K-Syntax |
add | 0000 ssdd | D += S | 0001 ssdd | K = D + S |
subtract | 0010 ssdd | D −= S | 0011 ssdd | K = D − S |
move | 0100 ssdd | D = S | 0101 ssdd | K = D = S |
test | 0110 cccc | cond ? | 0111 cccc | K = cond |
and | 1000 ssdd | D &= S | 1001 ssdd | K = D & S |
or | 1010 ssdd | D |= S | 1011 ssdd | K = D | S |
exclusive or | 1100 ssdd | D ^= S | 1101 ssdd | K = D ^ S |
See next table | 1111 ffdd |
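The Binary and K-Binary columns suggest a fixed field layout: three bits for the operation, one bit that selects the K form, and then either `ssdd` or `cccc`. The decoder below is a sketch under that reading; `decode_basic` is a hypothetical name, and the `ss`/`dd` fields are returned as raw numbers because their register mapping is not given in this table.

```python
BASIC_OPS = {
    0b000: "add",
    0b001: "subtract",
    0b010: "move",
    0b011: "test",
    0b100: "and",
    0b101: "or",
    0b110: "exclusive or",
}


def decode_basic(opcode):
    """Split an 8-bit opcode into the fields of the basic instruction table."""
    group = (opcode >> 5) & 0b111
    if group == 0b111:
        return None  # immediate and control flow opcodes live in other tables
    fields = {"op": BASIC_OPS[group], "k": bool(opcode & 0b0001_0000)}
    if fields["op"] == "test":
        fields["cond"] = opcode & 0b1111          # cccc
    else:
        fields["ss"] = (opcode >> 2) & 0b11       # source selector
        fields["dd"] = opcode & 0b11              # destination selector
    return fields
```

For example, `decode_basic(0b0001_0110)` reports the K form of add with `ss = 1` and `dd = 2`.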
Immediate Instructions | ||
---|---|---|
Opcode | Binary | Syntax |
add | 1110 00dd | D += # |
subtract | 1110 01dd | D −= # |
move | 1110 10dd | D = # |
See next table | 1110 11ff | |
and | 1111 00dd | D &= # |
or | 1111 01dd | D |= # |
exclusive or | 1111 10dd | D ^= # |
See next table | 1111 11ff |
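The immediate opcodes occupy the range `1110 0000` to `1111 1111`, with the top six bits naming the operation and the low two bits naming the destination. A minimal decoder for this table, assuming that reading, could look as follows; `decode_immediate` is a hypothetical name and `dd` is returned as a raw number.

```python
IMMEDIATE_OPS = {
    0b1110_00: "add",
    0b1110_01: "subtract",
    0b1110_10: "move",
    0b1111_00: "and",
    0b1111_01: "or",
    0b1111_10: "exclusive or",
}


def decode_immediate(opcode):
    """Return (operation, dd) for an immediate opcode, or None otherwise."""
    if (opcode >> 5) & 0b111 != 0b111:
        return None  # basic instruction, not an immediate
    selector = (opcode >> 2) & 0b111111
    if selector & 0b11 == 0b11:
        return None  # 1110 11ff and 1111 11ff are control flow opcodes
    return IMMEDIATE_OPS[selector], opcode & 0b11
```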
Control Flow Instructions | |||
---|---|---|---|
Opcode | Binary | Syntax | Internal Operation |
jump | 1110 1100 | >>>> expr | PH,PL := ## |
call | 1110 1101 | >>>>$ expr | LH,LL := PH,PL;PH,PL := ## |
nop | 1110 1110 | ~ | |
return | 1110 1111 | <<<< | PH,PL := LH,LL |
jump | 1111 1100 | >>>> *zpa | PH,PL := *zpa |
call | 1111 1101 | >>>>$ *zpa | LH,LL := PH,PL;PH,PL := *zpa |
branch | 1111 1110 | >>>> #expr | PH,PL += # |
table | 1111 1111 | <<<< zpa | PH,PL := [LH,LL + zpa] |
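The Internal Operation column for jump, call, and return can be traced with 16-bit values standing in for the PH,PL and LH,LL pairs. This sketch assumes that `pc` already points past the instruction's operand bytes when the operation takes effect; the function names are hypothetical, and branch and table are left out because the operand encoding is not given here.

```python
def jump(pc, link, target):
    """PH,PL := ##  (the link register is untouched)."""
    return target & 0xFFFF, link


def call(pc, link, target):
    """LH,LL := PH,PL; PH,PL := ##."""
    return target & 0xFFFF, pc & 0xFFFF


def ret(pc, link):
    """PH,PL := LH,LL."""
    return link, link


pc, link = call(0x0103, 0x0000, 0x2000)   # outer call saves 0x0103
pc, link = call(0x2003, link, 0x3000)     # nested call overwrites it with 0x2003
pc, link = ret(pc, link)                  # returns into the outer routine
```

Under this model there is a single saved address, so a nested call replaces the one stored by the outer call. How Baby8 programs preserve the outer return address is not stated in this table.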
Conditional Tests | |||||||
---|---|---|---|---|---|---|---|
Conditional | Name | Code | Syntax | Alternative Syntax | Negated Code | Negated Syntax | Negated Alternative Syntax |
Z | Equal | 011k 0000 | == | Z | 011k 0001 | != | !Z |
C | Greater equal | 011k 0010 | >= | C | 011k 0011 | < | !C |
N | Negative | 011k 0100 | <0 | N | 011k 0101 | >=0 | !N |
V | Overflow | 011k 0110 | V | | 011k 0111 | !V | |
C & !Z | Greater | 011k 1000 | > | | 011k 1001 | <= | |
N==V | Signed greater equal | 011k 1010 | $>= | | 011k 1011 | $< | |
!Z & N==V | Signed greater | 011k 1100 | $> | | 011k 1101 | $<= | |
1 | TRUE | 011k 1110 | true | | 011k 1111 | false | |
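In the table, each even condition code tests the expression in the Conditional column and the following odd code tests its negation. The sketch below evaluates the 4-bit `cccc` field against the four flags on that basis; `evaluate_condition` is a hypothetical name, and the flags are passed as plain booleans.

```python
def evaluate_condition(cccc, z, c, n, v):
    """Evaluate the 4-bit condition field against flags Z, C, N, and V."""
    base = {
        0: z,                      # equal
        1: c,                      # greater equal
        2: n,                      # negative
        3: v,                      # overflow
        4: c and not z,            # greater
        5: n == v,                 # signed greater equal
        6: (not z) and n == v,     # signed greater
        7: True,                   # always true
    }[(cccc >> 1) & 0b111]
    # Odd codes are the negated form of the even code before them.
    return (not base) if cccc & 1 else bool(base)
```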
Address (8-Bits) | Control (32-Bits) | Address (8-Bits) | Control (32-Bits) |
---|---|---|---|
0x00 | 0xE3308801 | 0x40 | 0x30380803 |
0x01 | 0xF3308801 | 0x41 | #N/A |
0x02 | 0x43308801 | 0x42 | 0x303A6812 |
0x03 | 0x53308800 | 0x43 | 0xFF30180A |
0x04 | 0x0 | 0x44 | 0xFF30180A |
0x05 | 0x0 | 0x45 | 0x303A6811 |
0x06 | 0x0 | 0x46 | #N/A |
0x07 | 0x0 | 0x47 | 0x3CD00200 |
0x08 | 0x44311800 | 0x48 | 0x3300800 |
0x09 | 0x55311801 | 0x49 | 0x3300000 |
0x0A | 0x44310800 | 0x4A | 0x0 |
0x0B | 0x0 | 0x4B | 0x8E300801 |
0x0C | 0xEEF01C05 | 0x4C | 0x9F300809 |
0x0D | 0xFF30190F | 0x4D | 0xEC300801 |
0x0E | 0x3330010F | 0x4E | 0xFD300800 |
0x0F | 0x0 | 0x4F | 0x0 |
0x10 | 0xEEF01C05 | 0x50 | 0xC0880805 |
0x11 | 0xFF30180E | 0x51 | 0xD9301802 |
0x12 | 0x3330000E | 0x52 | 0xD9300801 |
0x13 | 0xEE301805 | 0x53 | 0xCCD01C05 |
0x14 | 0xFF301801 | 0x54 | 0xE3340802 |
0x15 | 0xEE301800 | 0x55 | 0xE3340802 |
0x16 | 0xFF301800 | 0x56 | 0x3CD00402 |
0x17 | 0x0 | 0x57 | 0xDD301802 |
0x18 | 0x3330B800 | 0x58 | 0xF3340800 |
0x19 | 0x0 | 0x59 | 0x3CD00401 |
0x1A | 0x21026810 | 0x5A | 0xF3340800 |
0x1B | 0x0 | 0x5B | 0x0 |
0x1C | 0x8E300801 | 0x5C | 0x201A6810 |
0x1D | 0x9F300801 | 0x5D | 0x0 |
0x1E | 0xEEF01C05 | 0x5E | #N/A |
0x1F | 0xC3340802 | 0x5F | 0x3CD00200 |
0x20 | 0xC3340802 | 0x60 | 0x3300800 |
0x21 | 0xFF301801 | 0x61 | 0x3300000 |
0x22 | 0x3EF00401 | 0x62 | #N/A |
0x23 | 0xF3340801 | 0x63 | 0x0 |
0x24 | 0xEC300800 | 0x64 | 0x30380809 |
0x25 | 0x0 | 0x65 | #N/A |
0x26 | 0xE8300801 | 0x66 | 0x3CD00200 |
0x27 | 0xF9300800 | 0x67 | 0x3300800 |
0x28 | 0x0 | 0x68 | 0x3300000 |
0x29 | 0xEEF01C05 | 0x69 | 0x0 |
0x2A | 0x13166812 | 0x6A | 0x0 |
0x2B | 0x13166810 | 0x6B | 0x0 |
0x2C | 0xFF301800 | 0x6C | 0x0 |
0x2D | 0x0 | 0x6D | 0x0 |
0x2E | 0x0 | 0x6E | 0x0 |
0x2F | 0x0 | 0x6F | 0x0 |
0x30 | 0x0 | 0x70 | 0xEEF01C05 |
0x31 | 0x0 | 0x71 | 0x63340802 |
0x32 | 0x0 | 0x72 | #N/A |
0x33 | 0x0 | 0x73 | #N/A |
0x34 | 0x0 | 0x74 | 0x36700408 |
0x35 | 0x0 | 0x75 | 0x367004C1 |
0x36 | 0x0 | 0x76 | 0xC3340801 |
0x37 | 0x0 | 0x77 | 0x367004E1 |
0x38 | 0x0 | 0x78 | #N/A |
0x39 | 0x0 | 0x79 | 0x3C301005 |
0x3A | 0x0 | 0x7A | 0x367002C2 |
0x3B | 0x0 | 0x7B | 0x367002C3 |
0x3C | 0x0 | 0x7C | 0x3D301001 |
0x3D | 0x0 | 0x7D | 0x367002E1 |
0x3E | #N/A | 0x7E | 0x3CD00408 |
0x3F | #N/A | 0x7F | 0x0 |
Device | Measured Item | Baby8 | Dark RISC-V | Vex RISC-V | Glacial | Pico RV32 | SERV |
---|---|---|---|---|---|---|---|
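ASIC (130 nm) | Max. clock (MHz) | 57.69 | 41.39 | 61.33 | 102.33 | 59.88 | 116.29 |
ASIC (130 nm) | Power (mW) | 1.99 | 5.46 | 30.04 | 1.52 | 14.25 | 2.38 |
ASIC (130 nm) | Efficiency (MHz/mW) | 28.99 | 7.58 | 2.04 | 76.32 | 4.20 | 48.86 |
ASIC (130 nm) | Die area (µm²) | 26,106 | 147,331 | 375,121 | 18,815 | 259,337 | 28,436 |
ASIC (130 nm) | Core area (µm²) | 20,888 | 134,792 | 354,847 | 14,423 | 242,546 | 22,778 |
ASIC/FPGA | NAND gates | 3020 | 18,076 | 49,206 | 2063 | 34,508 | 3245 |
FPGA: Xilinx 7 | LUTs | 31 | 1018 | 1233 | 142 | 1072 | 212 |
FPGA: Xilinx 7 | Registers | 8 | 184 | 914 | 84 | 573 | 182 |
FPGA: Xilinx 7 | Distributed memory | 4 | 12 | - | - | 12 | - |
FPGA: Xilinx 7 | Block memory | - | - | 3 | - | - | - |
FPGA: Xilinx 7 | DSPs | - | - | - | - | - | - |
FPGA: Cyclone V | LUTs | 29 | 920 | 1184 | 146 | 907 | 197 |
FPGA: Cyclone V | Registers | 8 | 196 | 944 | 84 | 609 | 182 |
FPGA: Cyclone V | Distributed memory | 16 | 64 | 28 | - | - | - |
FPGA: Cyclone V | Block memory | - | - | 3 | - | 2 | - |
FPGA: Cyclone V | DSPs | - | - | - | - | - | - |
FPGA: ICE40 | LUTs | 285 | 1414 | 1697 | 232 | 1648 | 259 |
FPGA: ICE40 | Registers | 136 | 210 | 1112 | 84 | 597 | 182 |
FPGA: ICE40 | Distributed memory | - | - | - | - | - | - |
FPGA: ICE40 | Block memory | - | 4 | 8 | - | 4 | - |
FPGA: ICE40 | DSPs | - | - | - | - | - | - |
FPGA: GoWin | LUTs | 48 | 1750 | 2010 | 280 | 1299 | 343 |
FPGA: GoWin | Registers | 8 | 184 | 1112 | 84 | 574 | 182 |
FPGA: GoWin | Distributed memory | 4 | 16 | - | - | 32 | - |
FPGA: GoWin | Block memory | - | - | 4 | - | - | - |
FPGA: GoWin | DSPs | - | - | - | - | - | - |
FPGA: ECP5 | LUTs | 77 | 1378 | 1774 | 267 | 1233 | 287 |
FPGA: ECP5 | Registers | 8 | 184 | 1112 | 84 | 574 | 182 |
FPGA: ECP5 | Distributed memory | 4 | 16 | - | - | 32 | - |
FPGA: ECP5 | Block memory | - | - | 4 | - | - | - |
FPGA: ECP5 | DSPs | - | - | - | - | - | - |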
Device | Measured Item | Baby8 | 6502 | Femto 16 | J0 | MCPU | UKP | ZPU |
---|---|---|---|---|---|---|---|---|
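ASIC (130 nm) | Max. clock (MHz) | 57.69 | 58.87 | 65.10 | 55.68 | 102.91 | 135.05 | 63.94 |
ASIC (130 nm) | Power (mW) | 1.99 | 1.47 | 1.23 | 5.06 | 0.31 | 4.42 | 14.55 |
ASIC (130 nm) | Efficiency (MHz/mW) | 28.99 | 40.04 | 52.92 | 11.00 | 331.96 | 30.55 | 4.39 |
ASIC (130 nm) | Die area (µm²) | 26,106 | 43,788 | 61,532 | 166,680 | 4799 | 34,026 | 70,809 |
ASIC (130 nm) | Core area (µm²) | 20,888 | 36,697 | 53,495 | 153,327 | 2733 | 27,858 | 61,171 |
ASIC/FPGA | NAND gates | 3020 | 4890 | 6976 | 21,003 | 424 | 3909 | 8025 |
FPGA: Xilinx 7 | LUTs | 31 | 352 | 630 | 382 | 36 | 252 | 646 |
FPGA: Xilinx 7 | Registers | 8 | 114 | 194 | 43 | 24 | 163 | 239 |
FPGA: Xilinx 7 | Distributed memory | 4 | - | - | 6 | - | - | - |
FPGA: Xilinx 7 | Block memory | - | 1 | - | - | - | - | 1 |
FPGA: Xilinx 7 | DSPs | - | - | - | 1 | - | - | - |
FPGA: Cyclone V | LUTs | 29 | 307 | 635 | 338 | 26 | 154 | 512 |
FPGA: Cyclone V | Registers | 8 | 96 | 194 | 67 | 24 | 151 | 240 |
FPGA: Cyclone V | Distributed memory | 16 | - | - | - | - | - | - |
FPGA: Cyclone V | Block memory | - | 3 | - | 2 | - | 1 | 2 |
FPGA: Cyclone V | DSPs | - | - | - | 1 | - | - | - |
FPGA: ICE40 | LUTs | 285 | 544 | 1100 | 821 | 35 | 273 | 851 |
FPGA: ICE40 | Registers | 136 | 96 | 194 | 67 | 24 | 151 | 240 |
FPGA: ICE40 | Distributed memory | - | - | - | - | - | - | - |
FPGA: ICE40 | Block memory | - | 7 | - | 2 | - | 1 | 5 |
FPGA: ICE40 | DSPs | - | - | - | - | - | - | - |
FPGA: GoWin | LUTs | 48 | 502 | 1099 | 1085 | 29 | 285 | 796 |
FPGA: GoWin | Registers | 8 | 95 | 194 | 43 | 24 | 151 | 239 |
FPGA: GoWin | Distributed memory | 4 | - | - | 16 | - | - | - |
FPGA: GoWin | Block memory | - | 2 | - | - | - | 1 | 1 |
FPGA: GoWin | DSPs | - | - | - | - | - | - | - |
FPGA: ECP5 | LUTs | 77 | 478 | 1159 | 784 | 30 | 299 | 901 |
FPGA: ECP5 | Registers | 8 | 95 | 194 | 43 | 24 | 151 | 239 |
FPGA: ECP5 | Distributed memory | 4 | - | - | 16 | - | - | - |
FPGA: ECP5 | Block memory | - | 2 | - | - | - | 1 | 1 |
FPGA: ECP5 | DSPs | - | - | - | 1 | - | - | - |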
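The efficiency figure in the ASIC rows carries the unit MHz/mW, which suggests it is the maximum clock divided by the power. The short check below recomputes it for two cores from the tabulated values; the function name `efficiency` is illustrative, and treating the metric as this simple ratio is an assumption based on the unit.

```python
def efficiency(max_clock_mhz, power_mw):
    """Efficiency in MHz/mW, taken as the ratio of maximum clock to power."""
    return max_clock_mhz / power_mw


# Values copied from the ASIC (130 nm) rows.
baby8 = round(efficiency(57.69, 1.99), 2)
serv = round(efficiency(116.29, 2.38), 2)
```

Both results agree with the printed entries (28.99 for Baby8 and 48.86 for SERV). Not every printed entry reproduces exactly from the rounded clock and power values.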
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gazziro, M.; Junior, J.M.d.A.; Junior, O.H.A.; Cavallari, M.R.; Carmo, J.P. Design and Evaluation of Open-Source Soft-Core Processors. Electronics 2024, 13, 781. https://doi.org/10.3390/electronics13040781