Whitepaper - Whitebox Approach For Verifying PCIe Link Training and Status State Machine
Colm McSweeney
Joe McCann
Pinal Patel
Gaurang Chitroda
Synopsys Inc.,
Dublin, Ireland
www.synopsys.com
eInfochips,
Sunnyvale, USA
www.einfochips.com
ABSTRACT
Serial communication protocols such as PCI Express and USB have evolved to enable high
operating speeds. As a result, their PHY layer protocols have grown in complexity, especially the
logic of the Link Training and Status State Machines (LTSSMs). The top-level DUT behaviour
does not reveal whether the LTSSM functioned correctly or whether it made the expected state
transitions.
A traditional testbench often concentrates on higher-level LTSSM functionality such as
achieving a working link and updating link speed and width. This paper presents a new approach
that reduces verification time and verifies the micro-level details of LTSSM functionality. Using
this approach, we found more than 60 LTSSM RTL bugs in the DUT. Also, the number of tests
required was reduced to around 50, compared with 500 tests in a legacy testbench. This approach
is not limited to the LTSSM; it can be repurposed to verify any other complex state machine.
Table of Contents
1. Introduction
2.
3.
4.
5.
6. Results
7. Summary
8. References
Table of Figures
Figure 1 Basic Block Diagram for LTSSM WB Testbench
Figure 2 base_state class and its important members used in state emulation
SNUG 2014
1. Introduction
Serial protocols such as PCI Express and USB have evolved over the years to provide very
high operating speeds and throughput. This evolution has made their physical layer protocols
very complex. One of the most essential physical-layer processes is link initialization and
training. In PCI Express devices, this process accomplishes many important tasks such as link
width negotiation, link data rate negotiation, bit lock per lane, and symbol lock/block alignment
per lane. All these functions are performed by the Link Training and Status State Machine
(LTSSM), which observes the stimulus from the remote link partner as well as the current state
of the link, and responds accordingly.
The PCI Express link training state machine has many states, which are further divided into
multiple sub-states. Each LTSSM sub-state performs a set of well-defined operations and
transitions to a next state when the required conditions are met. Even after the link is up, the
device may need to change the link's operating parameters at runtime. Sometimes the device
may need to re-establish the link, or it can be directed into a low-power state.
However, it is not always straightforward to drive the LTSSM through the required state
transitions by controlling its inputs, or to predict its behaviour. We use a PCI Express LTSSM
whitebox reference model, which is part of a larger UVM-based testbench environment. The
LTSSM reference model observes the same physical layer traffic as the DUT, behaves as per the
PCI Express Base Specification, and predicts the possible state transitions.
Unlike a black-box testbench, which has no visibility into the state of the DUT's internal
blocks, this model is aware of the DUT's LTSSM state and the values of useful LTSSM
parameters. It obtains this information by monitoring internal signals of the DUT's LTSSM
design block. Because the model probes inside the DUT and always knows the state of the DUT
LTSSM, we call it an LTSSM Whitebox (WB) Model.
The PCI Express Base Specification defines the state behaviour and state transitions such that
multiple transition conditions can lead to the same next state, and for some sub-states, multiple
transition paths lead to different next states. To trigger all the required state transitions and
transition conditions, we use a mixture of directed and constrained random stimulus generation.
Because every statement in the PCIe Base Specification's description of the LTSSM requires
attention, we create detailed coverage for all sub-states, including all state transition paths,
transition reasons/conditions, transmit rules, stimulus of interest, etc. As our DUT is a
configurable IP, we use a configurable HVP and Verification Planner as the final sign-off for all
checks and coverage.
The PIPE interface signal widths scale with the number of lanes. We use one PIPE monitor
instance per lane, which observes the physical layer packets, or Ordered Sets (OSs), and the
Command/Status information for that lane. These packets are sent to the LTSSM WB emulation
model, which in turn responds to the observed stimulus. The following types of analysis ports
are used:
uvm_analysis_port#(os_item) for both transmitted and received ordered sets
uvm_analysis_port#(pipe_status_if) for received PIPE status updates from the PHY
uvm_analysis_port#(pipe_cmd_if) for PIPE commands sent from the core to the PHY
uvm_analysis_port#(pipe_eq_event) for equalization PIPE events to/from the core
monitored Ordered Sets within pipe_rx_agent_queue. Then the callback is used to inject errors in
these stored OSs. The pipe_rx_agent_driver drives them back on the DUT PIPE interface.
Figure 2 base_state class and its important members used in state emulation. Members:
stat_ltssm_vars[*], TX_OS[NL][$], RX_OS[NL][$], timeout_queue[$], valid_next_states[$],
valid_reasons[$], tx_pattern_matched, rx_pattern_matched, dut_state_transitioned,
wb_state_transitioned, previous_state, timer. Methods: pipe_rx_status_update(lane, t),
pipe_tx_cmd_update(lane, t), txos(lane, os), rxos(lane, os), timer_update(), timer(),
timer_expired(), dut_state_transition(), wb_state_transition(), start().
The base_state class contains an instance of the static class stat_ltssm_vars, which stores the
values of the LTSSM variables used across all states. The WB model updates, resets and reads
these variables from the static class as required in an emulated state. In addition, base_state
implements multiple checks on WB and DUT state transitions. Every emulated state inherits the
following checks from the base_state class:
(i) If the DUT transition occurs after the WB transition, the WB method dut_state_transition()
first checks whether the new DUT state is a valid transition for the currently emulated state.
If not, the emulated state reports an error.
(ii) If the DUT state transition occurs before the WB transition, the new DUT state is stored in
a static queue. There are then two possibilities:
1. The WB state transition occurs after some time, in which case the DUT state previously
stored in the queue is popped out and we check whether the WB transition matches the
DUT transition.
2. Another DUT transition occurs after some time, in which case the new DUT state is also
stored in the queue. This way, we allow a maximum of two DUT state transitions before
the WB model makes the required transition. If the DUT makes a third state transition
before the WB transition, we report an error.
(iii) If the WB model takes longer to transition after the DUT has already moved, the DUT may
already be transmitting/receiving ordered sets as required by its new state. We allow a
threshold of three Tx/Rx ordered sets after a DUT state transition. If the WB model sees four
or more Tx/Rx OSs after the DUT state transition, it reports an error. The threshold value can
be changed with a plusarg.
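The queue-and-threshold rules in (ii) and (iii) can be sketched as a small standalone SystemVerilog class. This is a minimal sketch under stated assumptions, not the authors' base_state code: the class name wb_transition_checker, the reduced ltssm_state_e enumeration and the plusarg name WB_OS_THRESHOLD are all hypothetical, plain $display stands in for `uvm_error, and check (i) (validating a DUT transition against the WB model's valid next states) is left out.

```systemverilog
// Hypothetical, reduced set of LTSSM sub-states used only for this sketch
typedef enum {S_DETECT_QUIET, S_DETECT_ACT, S_POLL_ACTIVE, S_POLL_CONFIG} ltssm_state_e;

// Sketch of checks (ii) and (iii); not the paper's base_state implementation
class wb_transition_checker;
  ltssm_state_e dut_state_q[$];   // DUT states seen before the WB model moved
  int unsigned  os_after_dut = 0; // Tx/Rx OSs seen while a DUT transition is pending
  int unsigned  os_threshold = 3; // OSs tolerated after a DUT transition
  int unsigned  max_pending  = 2; // DUT may be at most two transitions ahead
  int unsigned  errors       = 0;

  function new();
    // Hypothetical plusarg, e.g. +WB_OS_THRESHOLD=5, to override the threshold
    void'($value$plusargs("WB_OS_THRESHOLD=%d", os_threshold));
  endfunction

  // Call when the DUT changes state before the WB model has transitioned
  function void dut_transition(ltssm_state_e s);
    dut_state_q.push_back(s);
    if (dut_state_q.size() > max_pending) begin
      errors++;
      $display("ERROR: DUT is %0d transitions ahead of the WB model", dut_state_q.size());
    end
  endfunction

  // Call for every transmitted or received ordered set
  function void os_seen();
    if (dut_state_q.size() == 0) return; // no pending DUT transition, nothing to count
    os_after_dut++;
    if (os_after_dut > os_threshold) begin
      errors++;
      $display("ERROR: %0d OSs seen after DUT transition, WB model has not moved", os_after_dut);
    end
  endfunction

  // Call when the WB model transitions; compare with the oldest pending DUT state
  function void wb_transition(ltssm_state_e wb_next);
    ltssm_state_e dut_next;
    if (dut_state_q.size() == 0) return; // WB moved first: that case is check (i)
    dut_next = dut_state_q.pop_front();
    os_after_dut = 0;
    if (dut_next != wb_next) begin
      errors++;
      $display("ERROR: WB moved to %s but DUT moved to %s", wb_next.name(), dut_next.name());
    end
  endfunction
endclass

module wb_checker_demo;
  initial begin
    wb_transition_checker chk;
    chk = new();
    chk.dut_transition(S_POLL_ACTIVE); // DUT moves first
    repeat (3) chk.os_seen();          // within the default threshold of 3
    chk.wb_transition(S_POLL_ACTIVE);  // WB catches up with a matching state
    $display("errors = %0d", chk.errors); // prints errors = 0 with the default threshold
  end
endmodule
```

Keeping the pending DUT states in a FIFO lets the WB transition always be compared against the oldest unmatched DUT move, which is what rule (ii).1 requires.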
The following flowchart explains how the current emulated state implements its checks based
on a new ordered set being sent/received, a DUT state transition, or a WB state transition:
Flowchart: checks performed by the current emulated state. Decisions: Seen TX/RX OS?;
dut_state_transitioned==1?; NO_OF_OS_SENT > TX_THRESHOLD or NO_OF_OS_RCVD >
RX_THRESHOLD?; Valid new DUT state?; WB timeout occurred?; wb_state_transitioned==1?.
Actions: set dut_state_transitioned=1; set wb_state_transitioned=1; pop the DUT state from the
front of the DUT state queue; ERROR.
The model uses SystemVerilog polymorphism to model the state machine. Whenever both the DUT
and the WB model make a valid and matching state transition, ltssm_wb_comp replaces the object
referenced by the base_state handle current_wb_state with a new object of the class for the next
state to be emulated. The following pseudocode explains this:
task start();
  forever begin
    // Wait until both the WB model and the DUT have made a transition
    wait (current_wb_state.wb_state_transitioned == 1 &&
          current_wb_state.dut_state_transitioned == 1);
    next_state = current_wb_state.get_exp_state();
    prev_state = current_wb_state.get_state();
    `uvm_info(m_id, $sformatf("Changing from %0s to %0s",
              prev_state, next_state), UVM_MEDIUM)
    // Create the object for the next emulated state through the UVM factory
    l_string_name = get_state_class_name(next_state);
    if (!$cast(current_wb_state,
               l_factory.create_object_by_name(l_string_name)))
      `uvm_fatal(m_id, $sformatf("Failed to cast %0s", l_string_name))
    // Configure the new state, record the state it came from, and start it
    current_wb_state.configure_state(device_index);
    current_wb_state.set_previous_state(prev_state);
    current_wb_state.start();
  end // forever
endtask : start
Figure: ltssm_wb_comp connectivity. For each lane n = 0 to NL-1, tx_pipe_if_mon[n],
rx_pipe_if_mon[n], pipe_cmd_if_mon[n] and pipe_status_if_mon[n] feed tx_os_subscriber[n],
rx_os_subscriber[n], tx_cmd_subscriber[n] and rx_status_subscriber[n], which call txos(lane,os),
rxos(lane,os), pipe_tx_cmd_update(lane,t) and pipe_rx_status_update(lane,t) on
current_wb_state; dut_state_transition() is also invoked on current_wb_state.
Subscriber write method:
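virtual function void write(os_item t);
  ltssm_wb_comp local_wb_comp;
  // Lane this subscriber serves (held in its no_lanes member)
  int lane = no_lanes;
  // Get a handle to the parent WB component
  if (!$cast(local_wb_comp, get_parent()))
    `uvm_fatal(get_type_name(), "Parent is not an ltssm_wb_comp")
  // Nothing to forward until the first emulated state exists
  if (local_wb_comp.current_wb_state == null)
    return;
  // tx_or_rx selects whether this subscriber carries transmitted or received OSs
  if (tx_or_rx)
    local_wb_comp.current_wb_state.txos(lane, t);
  else
    local_wb_comp.current_wb_state.rxos(lane, t);
endfunction : write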
Figure: PIPE Rx Agent (pipe_rx_agent_monitor, pipe_rx_agent_queue, pipe_rx_agent_driver)
connected to the MAC through phy_mac_if.
The PIPE Rx Agent consists of a monitor, a storage queue and a driver. In normal operation
without error injection, the pipe_rx_agent_monitor monitors the traffic on phy_mac_if and
creates OS packets of type os_item, which are pushed into the FIFO (pipe_rx_agent_queue) and
driven back onto the phy_mac_if connected to the DUT. When an error-injection callback is
implemented and registered, along with its supporting APIs, the pipe_rx_agent injects errors
into the ordered sets taken out of the FIFO before driving them back onto phy_mac_if.
There are many error injection scenarios of interest when verifying the LTSSM sub-state
functionality. Some examples are:
Corrupting the link and lane numbers of the TS OSs received in the Configuration sub-states
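The callback-based corruption of link and lane numbers can be sketched in plain SystemVerilog. This is a minimal sketch, assuming a simplified ordered-set class: the paper does not show os_item's fields or the real callback API of pipe_rx_agent, so ts_os, rx_err_cb, link_lane_corrupt_cb and rx_agent_model are hypothetical stand-ins for the queue-plus-driver path described above, and "driving" is modelled by collecting the OSs in a queue.

```systemverilog
// Hypothetical, simplified TS ordered set (the real os_item fields are not given)
class ts_os;
  bit [7:0] link_num;
  bit [7:0] lane_num;
  function new(bit [7:0] link, bit [7:0] lane);
    link_num = link;
    lane_num = lane;
  endfunction
endclass

// Hypothetical hook called on every OS taken out of the queue, before it is driven
virtual class rx_err_cb;
  pure virtual function void pre_drive(ts_os os);
endclass

// Corrupts the link and lane numbers of the next n_corrupt ordered sets
class link_lane_corrupt_cb extends rx_err_cb;
  int unsigned n_corrupt;
  function new(int unsigned n);
    n_corrupt = n;
  endfunction
  virtual function void pre_drive(ts_os os);
    if (n_corrupt == 0) return;
    n_corrupt--;
    os.link_num ^= 8'h01; // flip the LSB so the numbers no longer match
    os.lane_num ^= 8'h01;
  endfunction
endclass

// Stand-in for pipe_rx_agent: monitored OSs are queued, passed through the
// registered callbacks, then "driven" (collected in driven[] here)
class rx_agent_model;
  ts_os     q[$];
  rx_err_cb cbs[$];
  ts_os     driven[$];
  function void drive_all();
    while (q.size() > 0) begin
      ts_os os = q.pop_front();
      foreach (cbs[i]) cbs[i].pre_drive(os);
      driven.push_back(os);
    end
  endfunction
endclass

module rx_agent_demo;
  initial begin
    rx_agent_model       agent;
    link_lane_corrupt_cb cb;
    ts_os                t;
    agent = new();
    cb    = new(1);            // corrupt only the first OS
    agent.cbs.push_back(cb);
    repeat (2) begin
      t = new(8'd5, 8'd0);     // link 5, lane 0
      agent.q.push_back(t);
    end
    agent.drive_all();
    // prints "OS 0: link=4 lane=1" then "OS 1: link=5 lane=0"
    foreach (agent.driven[i])
      $display("OS %0d: link=%0d lane=%0d", i, agent.driven[i].link_num, agent.driven[i].lane_num);
  end
endmodule
```

Keeping the corruption in a separate callback object means the normal pass-through path stays unchanged, and each test registers only the corruption it needs.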
The coverage plan for state transitions and state transition conditions is extracted from the
emulated WB states using a script. The script reads the queues valid_next_states[$] and
valid_reasons[$] defined within each emulated WB state and auto-generates the coverage plan.
However, the corresponding coverage bins are hand-written. The following is a screenshot of the
final annotated coverage report, which includes all the items mentioned above for the states:
Because Synopsys treats each configuration of the IP separately, the verification plan must be
as configurable as the RTL itself. Such a configurable verification plan can exclude cover
properties that apply only when a particular feature is supported in the DUT. When the final
annotated coverage report is generated, coverage bins/properties that are not applicable to the
given IP configuration are simply ignored. For example,
We cover all state transitions using transition coverage bins. Because all sub-states are
modelled as objects of type device_state, sampling is done whenever the emulated WB state
makes a state transition. For this, we use the "with function sample" construct to write the
cover bins of interest. Apart from state transitions, we also cover the different link speeds,
supported link widths, different Ordered Sets, etc., where we use a custom function together
with the "with" construct. This is a custom VCS feature supported under LCA. The following
are a few examples:
1. State Transitions:
// Sampled each time the emulated WB state makes a transition
covergroup cg_ltssm_state_transitions (string grp_name, string comment,
                                       ref device_params params)
  with function sample (device_state tr);
  cp_ltssm_state_transitions : coverpoint tr.current_ltssm_state {
    bins detect_quiet_to_detect_active = (S_DETECT_QUIET => S_DETECT_ACT);
    bins detect_active_to_polling_active = (S_DETECT_ACT => S_POLL_ACTIVE);
    bins polling_active_to_polling_compliance = (S_POLL_ACTIVE => S_POLL_COMPLIANCE);
    bins polling_active_to_polling_configuration = (S_POLL_ACTIVE => S_POLL_CONFIG);
    // ... bins for the remaining sub-state transitions
  }
endgroup : cg_ltssm_state_transitions
2. Link Width:
covergroup cg_ltssm (string grp_name, string comment, ref device_params params)
  with function sample (device_state tr);
  type_option.comment = "Link state/mode coverage";
  option.name         = grp_name;
  option.comment      = comment;
  // One bin per link width supported by this IP configuration
  cp_link_width : coverpoint tr.current_link_width {
    bins cb_sup_widths[] = cp_link_width with (
      is_width_supported(e_link_width'(item), params));
  }
endgroup : cg_ltssm
Transaction Logging:
Any testbench usually has many debug displays that help debug failures and narrow an issue
down to either a testbench problem or an RTL bug. However, for a complex testbench like the
one for the LTSSM, debugging with displays can be difficult, as many ordered sets are sent and
received over the PIPE interface. Transaction logging creates a separate log that records only
the required transactions and displays. Most importantly, this log can be viewed in the waves
alongside the design signals, so the design and model behaviour can be debugged side by side.
The following screenshot shows an example of the LTSSM transaction logger in the waves:
6. Results
The following are examples of some of the corner-case RTL bugs discovered using this
testbench:
(1) When link number F7 (non-PAD) was mixed with link number F7 (PAD), the DUT did not
reset the counter it uses to detect the required transition condition.
State transition condition:
The next state is Configuration.Idle immediately after all Lanes that are transmitting TS2
Ordered Sets receive eight consecutive TS2 Ordered Sets with matching Lane and Link
numbers (non-PAD) and identical data rate identifiers (including identical Link Upconfigure
Capability (Symbol 4 bit 6)), and 16 TS2 Ordered Sets are sent after receiving one TS2
Ordered Set.
Description of bug:
The value F7 is a valid non-PAD link number when its corresponding K-symbol signal is 0.
However, when K=1, the same F7 value is treated as PAD. We discovered a case in the
Configuration.Complete LTSSM sub-state where 6 TS2 Ordered Sets were received with the
non-PAD link number F7, followed by 2 additional TS2 Ordered Sets with link number PAD
(F7 with K-symbol=1). This means the condition of 8 consecutive TS2s with a non-PAD link
number was not satisfied. However, the DUT's check was incorrect and it made the state
transition anyway. The WB model was robust enough to catch this RTL bug.
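The counting rule that the DUT got wrong can be sketched per lane. This is a minimal sketch that models only the "eight consecutive TS2s with matching, non-PAD link numbers" part of the Configuration.Complete exit condition; lane numbers, data rate identifiers and the 16-TS2 transmit requirement are omitted, and the class and method names are hypothetical. The key point is that PAD is identified by the value F7 together with K=1, so an F7 received with K=0 extends the run while an F7 with K=1 breaks it.

```systemverilog
// Sketch: per-lane counter for 8 consecutive TS2s with matching non-PAD link numbers
class ts2_link_counter;
  int unsigned consec = 0;  // length of the current run of matching TS2s
  bit [7:0]    last_link;   // link number of the previous TS2 in the run

  // Returns 1 once eight consecutive matching non-PAD TS2s have been received
  function bit rx_ts2(bit [7:0] link_num, bit link_is_k);
    // F7 with K=1 is PAD; F7 with K=0 is an ordinary (non-PAD) link number
    bit is_pad = link_is_k && (link_num == 8'hF7);
    if (is_pad) begin
      consec = 0;           // a PAD link number breaks the run
      return 0;
    end
    if (consec > 0 && link_num == last_link) consec++;
    else consec = 1;        // first TS2 of a new run
    last_link = link_num;
    return (consec >= 8);
  endfunction
endclass

module ts2_counter_demo;
  initial begin
    ts2_link_counter cnt;
    bit done;
    cnt = new();
    repeat (6) done = cnt.rx_ts2(8'hF7, 1'b0); // six non-PAD F7 link numbers
    repeat (2) done = cnt.rx_ts2(8'hF7, 1'b1); // two PAD link numbers
    $display("after 6 non-PAD + 2 PAD: done=%0d", done); // prints done=0
  end
endmodule
```

A DUT that compares only the eight-bit value and ignores K would see eight F7 link numbers in this sequence and transition, which is the failure described above.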
(2) The LTSSM was not detecting EQTS2s as valid TS2 Ordered Sets in the Configuration sub-states.
State transition condition:
The next state is Configuration.Lanenum.Accept if any Lane receives two consecutive TS2
Ordered Sets.
Description of bug:
The Equalization TS2 (EQTS2) Ordered Sets are used in the Recovery equalization states.
However, they are still TS2 Ordered Sets. The Configuration.Lanenum.Wait sub-state has a state
transition that requires receiving 2 consecutive TS2 Ordered Sets. Our error injection converted
standard TS2 OSs into EQTS2 OSs, which still satisfy this transition condition. Hence the WB
model made the correct state transition, whereas the DUT timed out in the sub-state because it
did not treat EQTS2s as valid TS2 Ordered Sets inside the Configuration sub-states.
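The classification the WB model applies, and the DUT missed, can be sketched at the level of ordered-set kinds rather than symbol encodings. This is a minimal sketch under stated assumptions: the os_kind_e enumeration, counts_as_ts2() and lanenum_wait_lane are hypothetical, and only the "two consecutive TS2s on a lane" exit from Configuration.Lanenum.Wait is modelled.

```systemverilog
// Hypothetical ordered-set classification used only for this sketch
typedef enum {OS_TS1, OS_TS2, OS_EQ_TS2, OS_OTHER} os_kind_e;

// An EQ TS2 is a TS2 variant, so Configuration sub-states must count it as a TS2
function automatic bit counts_as_ts2(os_kind_e kind);
  return (kind == OS_TS2) || (kind == OS_EQ_TS2);
endfunction

// Simplified Configuration.Lanenum.Wait check for one lane: exit once two
// consecutive TS2s are received (other exit conditions are ignored)
class lanenum_wait_lane;
  int unsigned consec_ts2 = 0;
  function bit rx_os(os_kind_e kind);
    if (counts_as_ts2(kind)) consec_ts2++;
    else                     consec_ts2 = 0;
    return (consec_ts2 >= 2);
  endfunction
endclass

module lanenum_wait_demo;
  initial begin
    lanenum_wait_lane lane0;
    bit exit_now;
    lane0 = new();
    exit_now = lane0.rx_os(OS_TS2);
    exit_now = lane0.rx_os(OS_EQ_TS2); // an EQ TS2 completes the pair
    $display("exit to Configuration.Lanenum.Accept: %0d", exit_now); // prints 1
  end
endmodule
```

Centralising the "is this a TS2?" decision in one function keeps every Configuration sub-state consistent, which is the property the DUT lacked in this bug.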
This UVM-based LTSSM WB model and the Rx PIPE Agent have proven very effective in verifying
the tricky LTSSM conditions. Compared with a traditional approach, this testbench hits many
corner scenarios very quickly and therefore finds RTL bugs quickly. We created around 25
directed random and around 20 constrained random tests to verify all Configuration and
Recovery states. In a fully directed environment, verifying the same functionality would have
required around 500 directed testcases. Using this approach, we discovered many RTL bugs that
the existing directed testbench had never found, and at a much faster pace than with the legacy
testbench. In total, we have discovered more than 60 RTL bugs in an already verified IP using
this testbench. While the directed tests contribute their share of verification coverage and
stimulus, most of these bugs were discovered through the constrained random tests.
7. Summary
Initial debug efforts are required to make the LTSSM WB reference model stable. However,
once the WB model is robust, the generated random stimulus uncovers a lot of mismatches
between WB and DUT behaviour, which in turn helps discover many LTSSM RTL issues. While
this whole approach has proved very effective and fast in uncovering RTL bugs of complex
LTSSM, it is not just limited to it. We can use this approach to verify any other complex state
machines as well.
Lessons Learned:
As mentioned earlier, the initial approach of using a VIP to inject stimulus errors depended on
another project. The VIP's known issues and limitations affected the project and initially slowed
the progress of LTSSM verification. This inspired the Rx PIPE Agent concept, which can be
used regardless of whether the remote link partner is the VIP or the DUT.
Next Steps/Ideas for Improvement:
1. Currently, whenever the WB state transition condition is hit first, the model always waits
until the DUT also makes a state transition. This can be enhanced so that the model reports an
error if more Tx/Rx OSs than the threshold are seen after the WB state transition condition is
hit.
2. Another limitation of the model is that we only match the next state of the WB model against
that of the DUT; we do not check whether the state transition reasons/conditions are the same.
This is trickier to implement, but would make the model even more robust.
8. References
[1] PCI Express Base Specification, Revision 3.0, Version 1.0, November 10, 2010.
[2] Universal Verification Methodology (UVM) 1.1 Class Reference.