Micro2022 Xiangshan Slides

The document discusses the development of the XiangShan open-source high-performance RISC-V processor and the MinJie platform that supports agile development flows and tools. It highlights the challenges in processor verification and presents solutions such as Diff-Rules based Agile Verification (DRAV) and Lightweight Simulation SnapShot (LightSSS). The work aims to lower barriers in chip development through open-source collaboration and efficient methodologies.

At the 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), October 1–5, 2022

Yinan Xu∗†, Zihao Yu∗, Dan Tang∗‡, Guokai Chen∗†, Lu Chen∗†, Lingrui Gou∗†, Yue Jin∗†, Qianruo Li∗†, Xin Li∗†, Zuojun Li∗†,
Jiawei Lin∗†, Tong Liu∗, Zhigang Liu∗, Jiazhan Tan∗, Huaqiang Wang∗†, Huizhe Wang∗†, Kaifan Wang∗†, Chuanqi Zhang∗†, Fawang Zhang∥,
Linjuan Zhang∗†, Zifei Zhang∗†, Yangyang Zhao∗, Yaoyang Zhou∗†, Yike Zhou∗, Jiangrui Zou∥, Ye Cai∥, Dandan Huan¶, Zusong Li¶, Jiye Zhao¶,
Zihao Chen§, Wei He§, Qiyuan Quan§, Xingwu Liu∗∗, Sa Wang∗†, Kan Shi∗, Ninghui Sun∗† and Yungang Bao∗†

∗State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences
†University of Chinese Academy of Sciences
‡Beijing Institute of Open Source Chip
§Peng Cheng Laboratory
¶Beijing VCore Technology
∥Shenzhen University
∗∗Dalian University of Technology
The Era of Agile and Open-Source Hardware
Customized code: < 10% LOC
Open-Source Chip Ecosystem: > 90% LOC
[Figure: ecosystem layers: ISAs/IPs/SoCs, Languages/EDA tools, Verification/Simulation, OS/Compiler, Platform]

To lower the barrier of chip development by saving time-to-market and the cost of IPs, EDA tools, facilities, engineers, etc.
Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS)
Agile and Open-Source Chip Ecosystem
Customized code: < 10% LOC
Open-source: > 90%
[Figure: Open-Source Chip Ecosystem: ISAs/IPs/SoCs, Languages/EDA tools, Verification/Simulation, OS/Compiler, Platform]
Agile Approaches Adopted in the Industry
• When we talked to some of the leading companies …

[Figure: "We" and "Big Ones" compared across ISAs/IPs/SoCs, Languages/EDA tools, Verification/Simulation, and OS/Compiler]
Major Concerns Regarding the Agile Methodology
Concern #1: is it ready for complicated processors?
Concern #2: how do you verify the processors?

[Figure: Complexity vs. Time, showing Design Complexity, Verification Complexity, and the Verification Gap]
Source: Serge Leef, Reimagining Digital Simulation, DAC 2021.
This Work: Let’s Do It and See What’s Happening

[Figure: "We" and "Big Ones" compared across ISAs/IPs/SoCs, Languages/EDA tools, Verification/Simulation, and OS/Compiler]

Concern #1: it’s not ready for complicated processors.
Concern #2: the verification process is still less agile.

We: XiangShan: High Performance RISC-V Processors
XiangShan: Open-Source High Performance Processors
• 1st generation: YQH
• RV64GC, single-core, superscalar OoO
• 28nm tape-out, 1.3GHz, July 2021
• SPEC CPU2006 7.01@1GHz, DDR4-1600
• 2nd generation: NH
• RV64GCBK, dual-core, superscalar OoO
• Scheduled 2GHz@14nm tape-out, Q4 2022
• Estimated** SPEC CPU2006 19.45@2GHz
• 3rd generation: KMH
• RV64GCBKHV, quad-core, superscalar OoO
• Close collaboration with industrial partners

[Figure: SPECint 2006/GHz* (proportional to IPC)]

* Source: XT910@ISCA’20, SiFive, AnandTech
** Updated September 14, 2022
How We Built XiangShan: MinJie
XiangShan: Open-Source High Performance RISC-V Processor

MinJie: Open-Source Platform with Agile Development Flows and Tools

MinJie: Platform with Agile Development Flows and Tools

[Figure: development loop with the stages Feature Request, Agile HDL, Verilog, Simulation, Verification, Debugging, and Silicon-Proven IP]
MinJie: Platform with Agile Development Flows and Tools

Broken Workflow
[Figure: development loop with the stages Feature Request, Agile HDL, Verilog, Simulation, Verification, Debugging, and Silicon-Proven IP]
MinJie: Platform with Agile Development Flows and Tools

To Build a Closed and Agile Loop


[Figure: development loop with the stages Feature Request, Agile HDL, Verilog, Simulation, Verification, Debugging, and Silicon-Proven IP]
This Work
• XiangShan: High Performance RISC-V Processor

• MinJie: Platform with Agile Development Flows and Tools


• DRAV and DiffTest: Functional Verification
• LightSSS: Debugging

[Figure: the loop of Agile HDL, Verilog, Simulation, Verification, and Debugging]
Processor Functional Verification

An ideal verification world:
[Figure: RISC-V Processor and RISC-V Specifications]
Processor Functional Verification: Reality

[Figure: RISC-V Processor =?= RISC-V Specifications (non-executable); reference models shown: NEMU, Dromajo]
SOTA: Let’s co-simulate them and compare the results!
Processor Co-simulation to Find Bugs
[Figure: co-simulation flow. The RISC-V model and the processor under test start from equal initial states, each executes an instruction, and their next states are compared.]
Match: No Bug (negative result)
Mismatch: Bug! (positive result)
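The lockstep comparison on this slide can be sketched in a few lines. This is a minimal illustration only, not DiffTest or any real reference model: `ArchState`, `ref_step`, `buggy_step`, `cosimulate`, and `PROGRAM` are hypothetical names, the only instruction is a toy `addi`, and the bug (a writable `x0`) is injected on purpose.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

MASK64 = (1 << 64) - 1
Instr = Tuple[int, int, int]  # (rd, rs1, imm), read as: addi rd, rs1, imm


@dataclass
class ArchState:
    """Toy architectural state: program counter plus 32 integer registers."""
    pc: int = 0
    regs: List[int] = field(default_factory=lambda: [0] * 32)


def ref_step(state: ArchState, instr: Instr) -> None:
    """Reference model: x0 is hard-wired to zero, so writes to it are dropped."""
    rd, rs1, imm = instr
    if rd != 0:
        state.regs[rd] = (state.regs[rs1] + imm) & MASK64
    state.pc += 4


def buggy_step(state: ArchState, instr: Instr) -> None:
    """Stand-in for the processor under test, with an injected bug: x0 is writable."""
    rd, rs1, imm = instr
    state.regs[rd] = (state.regs[rs1] + imm) & MASK64
    state.pc += 4


Step = Callable[[ArchState, Instr], None]


def cosimulate(program: List[Instr], dut_step: Step, model_step: Step) -> Optional[int]:
    """Run both sides in lockstep and return the index of the first
    mismatching instruction, or None if every next state matches."""
    dut, model = ArchState(), ArchState()  # equal initial states
    for index, instr in enumerate(program):
        dut_step(dut, instr)
        model_step(model, instr)
        if dut != model:  # compare next states
            return index
    return None


# addi x1, x0, 5 ; addi x0, x0, 7 ; addi x2, x0, 1
PROGRAM: List[Instr] = [(1, 0, 5), (0, 0, 7), (2, 0, 1)]
```

Comparing the reference against itself reports no mismatch, while the buggy stand-in is flagged at the second instruction (index 1), where it writes `x0`.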

Challenge: False Positives
• Example: Linux allocates valid PTEs and lazily executes memory-barrier instructions
• To avoid frequent TLB flushes for better performance
[Figure: the same load (LD) on most RISC-V models and on the CPU under test (store queue, cache hierarchy)]
Most RISC-V models: PTE allocated by OS (SD) → LD instruction, page access → LD finishes normally.
CPU under test: PTE allocated by OS (SD) → LD instruction, page access → TLB miss! → LD: page fault exception (mismatch) → TLB flushed by OS (sfence.vma) → LD instruction, page re-access → TLB hit! → LD finishes normally.
However, both are legal! It’s actually a false positive!
Observation: False Positives and Non-Determinism
• Designs are deterministic, as RTL/C++ code is fixed.

Next step: $R_{P_i}: S_{P_i} \times E \to S_{P_i}$ for processor $P_i$

• Verification should be non-deterministic, as specifications allow diverse designs.

$\mathcal{R}: S_{\mathcal{P}} \times E \to 2^{S_{\mathcal{P}}}$ for processors $\mathcal{P} = \{P_i, i \in \mathbb{N}\}$


This Work: Diff-Rule based Agile Verification (DRAV)

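Probes: bridge hardware design and verification

Design: $R_{P_i}: S_{P_i} \times E \to S_{P_i}$ for processor $P_i$

$f$: maps design states to reference states

Reference: $\mathcal{R}: S_{\mathcal{P}} \times E \to 2^{S_{\mathcal{P}}}$ for processors $\mathcal{P} = \{P_i, i \in \mathbb{N}\}$

Diff-Rules: abstract legal behaviors defined in specifications

DRAV: to check whether $f(R_{P_i}(s_{P_i}, e)) \in \mathcal{R}(f(s_{P_i}), e)$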
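The membership check can be made concrete with a toy model. The sketch below is an assumption-laden illustration, not the DiffTest implementation: `DutState`, `RefState`, `probe`, `reference`, `dut_step`, `drav_check`, the event names, and `TRAP_VECTOR` are all hypothetical. It encodes a single diff-rule inspired by the page-fault example: a load to a page whose PTE was written without an `sfence.vma` may either complete or raise a load page fault.

```python
from typing import FrozenSet, NamedTuple, Optional

TRAP_VECTOR = 0x100  # hypothetical trap handler address


class RefState(NamedTuple):
    """Architectural state visible to the reference (an element of S_P)."""
    pc: int
    trap: Optional[str]


class DutState(NamedTuple):
    """Design state (an element of S_{P_i}); tlb_valid is micro-architectural."""
    pc: int
    trap: Optional[str]
    tlb_valid: bool


def probe(s: DutState) -> RefState:
    """f: project the design state onto the reference state space."""
    return RefState(pc=s.pc, trap=s.trap)


def reference(s: RefState, event: str) -> FrozenSet[RefState]:
    """R: the set of legal next states for an event."""
    if event == "nop":
        return frozenset({RefState(s.pc + 4, None)})
    if event == "load_fresh_pte":
        # Diff-rule: both completing the load and faulting are legal.
        return frozenset({
            RefState(s.pc + 4, None),
            RefState(TRAP_VECTOR, "load_page_fault"),
        })
    raise ValueError(f"unknown event: {event}")


def dut_step(s: DutState, event: str) -> DutState:
    """R_{P_i}: one deterministic design; it faults when its TLB has no entry."""
    if event == "load_fresh_pte" and not s.tlb_valid:
        return DutState(TRAP_VECTOR, "load_page_fault", s.tlb_valid)
    return DutState(s.pc + 4, None, s.tlb_valid)


def drav_check(s: DutState, event: str) -> bool:
    """Check f(R_{P_i}(s, e)) ∈ R(f(s), e)."""
    return probe(dut_step(s, event)) in reference(probe(s), event)
```

Both the faulting design (`tlb_valid=False`) and the non-faulting one (`tlb_valid=True`) pass `drav_check`, whereas a strict equality check against a single expected next state would reject one of them.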

Towards 1-to-N Correspondence Between REF and Designs

Previously: one REF for one processor design
$R_{P_i}: S_{P_i} \times E \to S_{P_i}$ for processor $P_i$

Now: one REF for N processor designs
$\mathcal{R}: S_{\mathcal{P}} \times E \to 2^{S_{\mathcal{P}}}$ for processors $\mathcal{P} = \{P_i, i \in \mathbb{N}\}$

This Work: DiffTest for RISC-V Processors
• Idea: acknowledge the non-deterministic nature of ISA – the ultimate golden model
• DiffTest: the state-of-the-art co-simulation framework for RISC-V processors
• This Work: Identify and Specify Sources of Behavioral Non-Determinism in ISA using Diff-Rules
• Dromajo[1]: to avoid non-deterministic sources such as the Debug Transport DTM (MMIO)
• Imperas[2]: to extract asynchronous information from micro-architecture RTL pipeline
Categories | Sub-Categories            | Examples                          | Addressed Before
---------- | ------------------------- | --------------------------------- | ----------------
Static     | Impl. Dependent Registers | CSR, MMIO Devices                 | [1][2]
Dynamic    | Impl. Dependent Registers | Timer, Counters                   | [2]
Dynamic    | Asynchronous Events       | External/Timer Interrupts         | [1][2]
Dynamic    | Speculative Execution     | Instr./Load/Store Page Fault      | This Work Only
Dynamic    | Memory Model              | Memory Accesses in Multi-core     | This Work Only
Dynamic    | Hardware Timing           | LR/SC, Instruction Fusion, Caches | This Work Only

[1] Kabylkas et al., Effective Processor Verification with Logic Fuzzer Enhanced Co-simulation. MICRO-54, 2021.
[2] Kevin McDermott. Brief Introduction to the 5 Levels of RISC-V Processor Verification. RISC-V Summit, 2021.
Accelerating Simulation Debugging with Snapshots
• Idea: to re-construct the last cycles of simulation after abort

[Figure: the simulation advances at high speed and takes snapshots along the way; after an abort, a snapshot is woken up and re-run at slow simulation speed with waveform enabled]


This Work: Lightweight Simulation SnapShot (LightSSS)
• Challenge: Minimize the Performance Overhead of Snapshots

• Key Insight: using fork from Linux to take snapshots of the simulation process
• Differential snapshot: Copy On Write mechanism provided by the OS
• Portability: transparent to the circuits, RTL simulators, external C/C++ models
• Efficiency: in-memory snapshots without disk I/O
[Figure: Simulation Time (s) vs. Snapshot Interval (s), panels: Single-Core CoreMark, Dual-Core Linux Boot]
Minor performance overhead: as snapshot interval size increases, simulation speed remains stable.
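The fork-based idea can be sketched with Python's `os.fork` on a POSIX system. This is a minimal illustration under stated assumptions, not the LightSSS code: `step`, `discard`, and `run_with_snapshots` are hypothetical names, the "simulation" is a toy counter, the number of retained snapshots (`keep=2`) and the choice to wake the oldest retained snapshot are policies picked for the example, and waveform dumping is only marked by a comment.

```python
import os
import signal


def step(state: int, cycle: int) -> int:
    """Toy deterministic stand-in for one simulated cycle."""
    return (state * 31 + cycle) % 1_000_003


def discard(pid: int, wake_fd: int) -> None:
    """Drop a snapshot: kill the sleeping child and reap it."""
    os.close(wake_fd)
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)


def run_with_snapshots(fail_cycle: int, interval: int, keep: int = 2):
    """Simulate until fail_cycle, then replay from a snapshot.
    Returns (replay_start, replay_end, replay_state_matches)."""
    report_r, report_w = os.pipe()
    snapshots = []  # (pid, wake_fd), oldest first
    state, cycle = 0, 0
    replaying, replay_start = False, None
    while True:
        if not replaying and cycle % interval == 0:
            wake_r, wake_w = os.pipe()
            pid = os.fork()  # copy-on-write snapshot of the whole process
            if pid == 0:
                os.close(wake_w)
                if os.read(wake_r, 1) != b"R":  # sleep until woken
                    os._exit(0)
                # Woken after an abort: a real flow would enable waveforms here.
                replaying, replay_start = True, cycle
            else:
                os.close(wake_r)
                snapshots.append((pid, wake_w))
                if len(snapshots) > keep:
                    discard(*snapshots.pop(0))
        state = step(state, cycle)
        if cycle == fail_cycle:  # stands in for an assertion or mismatch
            break
        cycle += 1
    if replaying:  # snapshot child: report what was replayed, then exit
        os.write(report_w, f"{replay_start},{cycle},{state}".encode())
        os._exit(0)
    wake_pid, wake_fd = snapshots.pop(0)  # oldest retained snapshot
    for snap in snapshots:
        discard(*snap)
    os.write(wake_fd, b"R")
    start, end, replay_state = map(int, os.read(report_r, 64).decode().split(","))
    os.waitpid(wake_pid, 0)
    for fd in (wake_fd, report_r, report_w):
        os.close(fd)
    return start, end, replay_state == state
```

With `fail_cycle=25` and `interval=10`, snapshots are taken at cycles 0, 10, and 20; the cycle-0 snapshot is discarded when the third one is taken, so the replay starts at cycle 10 and reaches the same failing state at cycle 25. No state is written to disk: the snapshot is just a sleeping child process sharing pages with its parent.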
Practice of Agile Development on XiangShan
① Design Optimization ② Paper Reproduction

One third-year PhD student, 200 minutes on XiangShan

• 3 graduate students
• 11 days for a functionally correct prototype
• 37 bugs in 5 days
• 38 days to boot Linux with BPU
• 51 days for the overall frontend architecture

More details in our paper


Conclusion
• XiangShan: High Performance RISC-V Processor

• MinJie: Platform with Agile Development Flows and Tools


• DRAV and DiffTest: Functional Verification
• LightSSS: Debugging

• Both XiangShan and MinJie are Open-Source


• Follow us at github/OpenXiangShan with 3K stars ❤ from the lovely community
• Artifacts Available, Functional, and Reproduced

Thanks!
Open-Source Chip Working Group

Together for a Shared Future


To Lower the Barrier of Chip Development

XiangShan
Team

Open-Source Collaborators
