Data Storage: The Solid State Drive



In this column, we review the state-of-the-art in solid state drives (SSDs) and
propose a methodology for the verification and validation of the SSD controller.

By Lauro Rizzatti, Verification Consultant

An SSD stores data on a semiconductor fabric. Currently, the most popular technology for SSDs is NAND Flash, a non-volatile memory (NVM) component. Based on a single transistor per bit of data, NAND comes in a few variations:

1. The single-level cell (SLC)
2. The multi-level cell (MLC)
3. The triple-level cell (TLC)

NAND Flash can include more than three levels, but with additional levels data access slows and reliability is compromised. Today, NAND Flash dominates the SSD landscape, with growth forecasted to go from 100 exabytes (EB) in 2016 to 750 EB in 2020. The SSD business includes five companies manufacturing NAND Flash memories, among them Samsung, SK Hynix, and Micron; 19 companies developing SSD application-specific integrated circuit (ASIC) controllers; and more than 160 companies integrating the controllers into SSD devices.
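The practical difference among the three cell types listed above is the number of voltage states a single floating gate must hold: each extra bit per cell doubles the states the sense circuitry must distinguish, which is why access slows and reliability drops as levels increase. A minimal illustration in Python (the bit counts follow the list above; everything else is just arithmetic):

```python
# One floating-gate transistor stores N bits by holding 2**N distinct
# charge levels. More levels means finer voltage margins to resolve,
# hence slower, less reliable reads as cell types climb from SLC to TLC.
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3}

for name, bits_per_cell in CELL_TYPES.items():
    states = 2 ** bits_per_cell  # distinct charge levels the cell must hold
    print(f"{name}: {bits_per_cell} bit(s)/cell -> {states} voltage states")
```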

Unlike the HDD business, the barrier to entry for a new SSD business consists of assembling all the parts and selling a complete product. Third-party integrators buy off-the-shelf components; i.e., they purchase a controller from one of the 19 companies developing them and combine it with Flash from one of the big five. They then develop software or firmware on top, encapsulate the whole into a package, and sell it as a finished storage device.

NOTE: This is the third in a series of articles on digital data storage. In Part 1, Digital Data Storage is Undergoing Mind-Boggling Growth, we considered the evolution of digital data. In Part 2, Data Storage: The Hard Disk Drive, Ben Whitehead, Mentor Graphics' storage product specialist, covered the hard disk drive (HDD), discussing its technology and the verification challenges that pave the way for testing its design. Now, in Part 3, I review the state-of-the-art in solid state drives (SSDs) and propose a methodology for the verification and validation of the SSD controller, the most critical part of an SSD design.


SSD TECHNOLOGY

An SSD has no moving mechanical parts, a fundamental difference from the traditional electromechanical storage paradigm. Instead, an array of NAND Flash memory cells makes up the storage media. Lacking any spinning mechanical parts, the SSD operates much faster than traditional HDD devices, and is virtually free of the data access latencies present in electromechanical storage devices. Additionally, solid-state storage consumes less power, produces less heat, and runs quietly with no vibrations during operation. SSDs are more resistant to physical shock, and data is not erased by proximity to magnetic sources. All combined, these characteristics contribute to their significantly higher reliability than HDDs. The typical mean-time-between-failures (MTBF) of an HDD hovers at around one million hours, compared to well over two million hours for an SSD.

There are, however, a few downsides to SSD technology. Although there are no moving parts inside an SSD, each memory cell has a finite life expectancy: a limit on the number of times it can be written to and read from before it stops working. This is due to the physics of the writing/erasing mechanism of the semiconductor cells.[1] Logic and firmware built into the drives dynamically manage the SSD operations to minimize problems and extend their life.

[1] The programming process for a Flash cell requires the use of high voltage to charge/discharge the floating-gate transistor sandwiched between the control gate and an oxide layer. Charging the floating gate is referred to as programming the cell and equates to a logic "0." Discharging it is referred to as erasing the cell and equates to a logic "1." Over time, erase/program cycling via high voltage stresses the oxide layer and shortens the cell life. This leads to a finite number of erase/write cycles for the NAND Flash.

Further, SSD devices have higher per-gigabyte prices than electromechanical storage devices and generally support smaller capacities.

Data stored on a NAND Flash fabric is organized in a hierarchical structure. From the bottom up, the memory cells are organized in strings, pages, and blocks. Strings typically comprise 32 or 64 NAND cells and provide the minimum readable units. Multiple strings make up a page, which typically includes 64K or 128K cells, referred to as 2 kilobytes (KB), 4 KB, 8 KB, and so on. Pages are the minimum programmable units, and multiple pages form a block, which is the minimum erasable unit. Currently, the maximum pages per block are approaching 512, and block sizes are reaching 8 megabytes.

SSD devices possess a few characteristics that set them apart from HDD devices. Among the most important are write amplification, wear leveling, garbage collection, and performance degradation over time. Other issues such as overprovisioning and Self-Monitoring, Analysis, and Reporting Technology (SMART) add complexity to the drive.

Write Amplification: In an SSD, NAND Flash cells must be erased before they can be rewritten. The net result is that data is rewritten multiple times, thereby increasing the number of program/erase cycles over the life of the device. This leads to a write amplification issue that decreases the lifespan of the device and impacts performance.

Wear Leveling: Unlike in an HDD, where data can be written over and over in the same location on the disk surface without causing any problem, new data is written in an SSD to different NAND cells for the purpose of wear leveling. The physics underlying the programming mechanism of a NAND cell imposes erasing the cell prior to writing new data. Since the smallest erasing unit is a block, typically made up of 32 to 64 pages, erasing a block is time consuming. Due to the limited number of erase/write cycles inherent to the technology, called endurance, erasing/writing a block repeatedly without involving other blocks would wear out that block before all other blocks, prematurely ending the life of the SSD. To avoid this catastrophic event, SSD controllers use a technique called wear leveling that distributes writes as evenly as possible across all blocks in the SSD. The technique is based on complex algorithms implemented in firmware that require exhaustive testing in the development stage.

Garbage Collection: Another difference between an HDD and an SSD concerns the deleting of a file on the host computer. In an HDD, deleting a file in the OS running on the host leaves the bits on the hard drive. Since writing a file on empty blocks in an SSD is faster than on a written block, a TRIM feature automatically deletes the entire file's data as soon as the file is deleted in the OS running on the host. As the SSD fills up, fewer and fewer empty blocks become available. In their place are partially filled blocks. The SSD cannot write the new data onto these partially filled blocks since it would erase the existing data. Instead, the SSD reads the data of the block into its cache, modifies the old data with the new, and then writes it back.

Garbage collection is the process of reclaiming previously written blocks of data so they can be rewritten with new data. It is implemented in algorithms mapped in firmware in the SSD controller and is critical to the SSD's operations.

Performance Degradation: In a brand new SSD device, all NAND Flash cells have never been written, and their floating-gate transistors have never been charged. In other words, they all contain logic "1s." When the drive is deployed, the erase/write process begins. At this stage, the performance measured in input/output operations per second (IOPS) exhibits its maximum value.

As the workload increases, the SSD controller is forced into an erase/write cycle for every pending write operation. Due to this process, the SSD performance gradually drops until it finally settles into a steady state. Figure 1 shows the degradation curves for eight types of SSD devices. Steady-state IOPS is typically less than 50%, and as low as 5%, of the maximum value measured when the device is new.

Figure 1. The degradation curves for eight types of SSD devices are obvious from the chart. (Source: Tom's IT Pro)
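The read-modify-write behavior described above lends itself to a quick back-of-the-envelope model. The Python sketch below computes a write amplification factor (WAF): the ratio of physical NAND page writes to logical host writes. The page and block geometry here are illustrative assumptions, not figures from the article:

```python
# Rewriting one page of a partially filled block forces the SSD to
# relocate every still-valid page in that block, multiplying the
# physical writes behind a single logical write.
PAGE_SIZE_BYTES = 4 * 1024   # assumed page size
PAGES_PER_BLOCK = 64         # assumed block geometry

def write_amplification(host_pages_written, valid_pages_relocated):
    """Physical NAND page writes divided by logical host page writes."""
    physical = host_pages_written + valid_pages_relocated
    return physical / host_pages_written

# Updating 1 page in a block whose other 63 pages still hold valid data:
waf = write_amplification(host_pages_written=1, valid_pages_relocated=63)
nand_bytes = int(waf) * PAGE_SIZE_BYTES   # 64 x 4 KB = 256 KB of NAND traffic
print(f"WAF = {waf:.0f}x ({nand_bytes // 1024} KB written for one 4 KB update)")
```

The same ratio is what erodes both endurance (more program/erase cycles per logical write) and steady-state IOPS.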

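The wear-leveling and garbage-collection policies described in this section can be sketched as firmware-style logic. The Python model below is a deliberately simplified illustration under assumed data structures (a `Block` with an erase counter and a set of valid pages); production controllers use far more elaborate algorithms:

```python
# Toy model of two firmware policies: wear leveling picks the
# least-worn free block for new writes; garbage collection reclaims
# the block with the fewest live pages, relocating them first.

class Block:
    def __init__(self, pages_per_block=64):
        self.erase_count = 0        # endurance consumed so far
        self.valid_pages = set()    # page indices still holding live data
        self.pages_per_block = pages_per_block

    def erase(self):
        self.valid_pages.clear()
        self.erase_count += 1       # each erase wears the oxide layer

def pick_block_for_write(free_blocks):
    """Wear leveling: always write into the least-erased free block."""
    return min(free_blocks, key=lambda b: b.erase_count)

def garbage_collect(blocks, relocation_cache):
    """Reclaim the block with the fewest valid pages: copy its live
    pages into the cache (to be rewritten elsewhere), then erase it."""
    victim = min(blocks, key=lambda b: len(b.valid_pages))
    relocation_cache.extend(victim.valid_pages)
    victim.erase()
    return victim
```

Spreading erases this way is what keeps one block from exhausting its finite erase/write budget while its neighbors sit idle.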
Figure 2. The SSD controller includes the host controller interface, controller SoC, NAND array, DDR RAM, system firmware, and NOR Flash. (Source: Mentor Graphics)

SSD CONTROLLER

The conceptual simplicity of the NAND Flash fabric belies the complexity of the SSD operations. The burden of managing the SSD's uniqueness and peculiarities falls on the SSD controller.

Figure 2 represents the block diagram of an SSD controller. It encompasses six sections: the host controller interface; the controller System-on-Chip (SoC); the NAND array; DDR RAM for caching both user data and internal SSD metadata; the most critical component, the system firmware stored in large SRAMs; and a NOR Flash.

Host interface: The host interface of an SSD controller is typically based on one of the industry-standard interface specifications, the most popular being SATA/SAS and PCIe. Since the host interface is the performance bottleneck, the preferred standard is PCIe. An emerging standard called Non-Volatile Memory Express (NVMe), using PCIe as the fabric, provides the fastest performance.

Controller SoC: The controller SoC is built around a CPU/RISC processor complex that may include multiple processors. All the processors communicate with each other but perform different tasks, such as managing the PCIe traffic, read and write caching, encryption, error correction, wear leveling, and garbage collection, to name a few.

Firmware: The firmware is the most complex part of the controller. The embedded microcode manages all the operations that set the SSD apart from the traditional HDD. It implements the algorithms that perform garbage collection, wear leveling, and several other tasks. If not properly designed, the algorithms may affect the efficiency, reliability, and performance of the SSD. A major cause of data loss in SSDs is firmware bugs.[2]

[2] The firmware is a challenge faced by third-party integrators. Attracted by the growing business opportunities, they purchase a few off-the-shelf components, assemble them on a PCB, add firmware, and assume the drive is ready to be sold, only to discover that the controller does not work as expected, or not at all.

Vis-à-vis the complexity of the firmware, the SSD hardware is not particularly large when compared with designs in other market segments like networking or processors/graphics. An enterprise-level controller design with loads of functionality implemented in hardware may reach a capacity of 60-100 million gates. Meanwhile, a client-level controller in which most of the functionality is realized in firmware may have a gate count of 20 million.
SSD CONTROLLER DESIGN TRENDS

SSD technology is on the move as the firmware continues to grow and implement more and more functionality. So too is the NAND array growing, both in size and in number of instances.

The speed of the interface between the NAND array and the controller SoC is faster than the host interface and continues to increase, enlarging the performance gap between the two.

The overall complexity and size of the controller SoC also expand to cope with the increasing complexity of the NAND management.

As with all modern SoC designs, reducing power consumption is a critical requirement, and this trend will continue.

DESIGN AND VERIFICATION OF AN SSD CONTROLLER

The uniqueness of the SSD controller design is amplified by its verification/validation process. Unlike most chips in mobile designs, which may be built on commercial intellectual property (IP) or run on a standard Linux kernel, the SSD controller must be designed from scratch. The hardware and embedded firmware must be highly customized and closely coupled. The hardware may use a few standard IP blocks such as CPUs, system buses, and peripherals like UARTs, but the firmware determines the major features of the SSD controller. To achieve the best performance and the lowest power consumption, the firmware must be fine-tuned on optimized hardware.

To accomplish this, the industry is adopting a software-driven chip design flow, as opposed to the traditional hardware/software design flow.

In a traditional hardware/software design flow (Figure 3), hardware and firmware designs are serialized, or pipelined. Firmware engineers may participate in the system definition specs, but firmware implementation details are deferred to post tape-out, when the hardware is completed. This flow leads to poor firmware functionality and limited performance. Also, since firmware development starts late, the design cycle increases considerably, oftentimes leading to missed schedule deadlines.

Figure 3. Hardware and firmware design is serialized, or pipelined, in a traditional hardware/software design flow. (Source: Lauro Rizzatti)
By comparison, in a software-driven design flow (Figure 4), firmware development starts at the same time as the hardware design. Firmware engineers participate in the spec definitions of the hardware and gain intimate knowledge of the hardware details during the hardware design phase. Since the development of hardware and software proceeds in parallel, they can influence each other. Testing early firmware on the hardware in development unearths bugs that otherwise would be found after tape-out. Likewise, performance optimization can occur early on. By the time the design is ready for tape-out, hardware and firmware have been optimized and are virtually ready for mass production. When engineering samples are returned from the foundry, the lab bring-up of hardware and firmware may require only one or two weeks, thereby dramatically accelerating time to market compared to the traditional design flow.

Figure 4. In a software-driven design flow, firmware development and hardware design proceed in parallel. (Source: Lauro Rizzatti)

The design verification/validation in a software-driven design flow necessitates a high-performance system as close to the real chip environment as possible, with powerful debug capabilities and easy bring-up.

A traditional field programmable gate array (FPGA)-based prototyping board, as depicted in Figure 5, meets some of the requirements, such as speed of execution, but it also brings along a few drawbacks. This setup involves the deployment of physical peripherals: an array of NAND Flash devices connects to the NAND controller mapped inside the FPGA, and a set of DDR memories connects to the DDR controller in the FPGA. Peripheral interfaces, like UART and SPI, and a physical PCIe or NVMe interface to the host complete the setup. While an FPGA board is close to the real hardware and supports the high-speed execution necessary for firmware validation, several disadvantages affect its deployment.

Figure 5. An FPGA-based prototyping board offers speed of execution. (Source: Lauro Rizzatti)

For example, the compiler supports only a synthesizable subset of the register transfer level (RTL) language, and models often need to be rewritten to fit the structure of the FPGAs. The bring-up of the real peripheral devices imposes substantial effort to cope with electrical, mechanical, power dissipation, frequency modulation (FM) interference, and noise issues. Design debug in an FPGA is hard and time-consuming due to the poor internal design visibility after compilation. Adding visibility forces recompilation of the whole design, thereby slowing down the entire verification process.

Fortunately, hardware emulation solves all of these difficulties. A modern emulator possesses several advantages over an FPGA-based prototype. Mapping the design RTL code onto the emulator requires limited code modification. The emulator offers full design visibility similar to a hardware description language (HDL) simulator, but at four to five orders of magnitude faster speed, closer to that of the FPGA prototype board, which is mandatory for firmware development. Such an emulator becomes a shared resource, accessible remotely if deployed in virtual mode, which is not possible with FPGA-based prototypes.

Emulators can be deployed in in-circuit-emulation (ICE) or virtual mode. The ICE mode follows the same path as the FPGA-based prototyping approach, and suffers from the same problems introduced by the physical peripherals. The virtual mode resolves all of these issues while preserving the benefits of an emulator.

In a virtual setup, all peripherals are modeled in software, giving engineers full control and visibility of the emulation environment: not just the design under test (DUT), but also the peripherals and interfaces (Figure 6).

Figure 6. In a virtual setup, engineers control the emulation environment. (Source: Lauro Rizzatti)

Soft models implement the NAND Flash and DDR memory, running on the emulator or host server. A serial peripheral interface (SPI) NOR Flash model can be synthesized and executed on the emulator. The PC server, connected to the DUT inside the emulator via a virtual PCIe based on a DPI interface, hosts a QEMU (Quick EMUlator) virtualizer. A UART model connects to an xterm window running on the host server. All virtual models are controllable and visible on a single host server connected to the emulator.

The virtual solution affords several advantages. First, it permits full-signal visibility of the DUT and the external virtual devices and their interfaces. PCIe traffic can be monitored. The host OS runs a QEMU virtualizer providing complete control of the behavior of the BIOS. The contents of the DDR memory, NAND Flash, and SPI NOR Flash can be read and written. Their types and sizes can take many different configurations, an impossible feat in a physical setup. The virtual domain offers a convenient debugging environment because everything can be seen and modified.

In the case of the PCIe interface, for example, VirtuaLAB PCIe from Mentor supports thorough debugging of the interface and monitoring of the traffic via a built-in virtual PCIe analyzer.

Figure 7. The Mentor VirtuaLAB PCIe supports thorough debugging of the PCIe interface and monitors traffic through a built-in virtual analyzer. (Source: Mentor Graphics)

The virtual setup adds three capabilities not viable with a physical setup:

1. It can be accessed remotely 24/7 from anywhere in the world.
2. It can be shared by a multitude of concurrent users.
3. It supports the same clock frequencies as the actual design, since the peripheral clock frequencies do not have to be slowed down via speed adapters to match the slow-running clock of the emulator. This avoids discrepancies from different speed ratios and enables realistic performance evaluations.

The virtual emulation environment is identical to the real-chip environment after the chip is released by the foundry. It can test the firmware, check design performance, and find bugs that cannot be found via an HDL simulator, and it facilitates architectural experimentation to accommodate different media storage types and interfaces.
CONCLUSIONS

To take advantage of the escalating market growth of SSD devices, engineers must accelerate the development cycle and avoid the risk of delivering under-tested and low-performing chips.

Only a state-of-the-art emulation platform can accomplish this demanding mission. Via its fast execution speed, comparable to an FPGA prototyping board, and a fast and straightforward setup, an emulator deployed in virtual mode provides a suite of debugging capabilities that becomes the centerpiece of a software-driven design flow.

The SSD industry has proven it is possible to shave two to four months off the development cycle with emulation. When combined with deployment as a shared and remotely accessible resource, the return on investment (ROI) of an emulation system can certainly be justified.

Dr. Lauro Rizzatti is a verification consultant and industry expert on hardware emulation (www.rizzatti.com). Previously, Dr. Rizzatti held positions in management, product marketing, technical marketing, and engineering. He can be reached at [email protected].

RELATED POSTS:
• What's Up With All Those New Use Models On An Emulator?
• Bridging the Gap between Pre-Silicon Verification and Post-Silicon Validation in Networking SoC Designs
• Digital Data Storage is Undergoing Mind-Boggling Growth
• Data Storage: The Hard Disk Drive
• Verification Flow: Panel Gauges Future Flows
• Predicting Semiconductor Industry Growth: Drop the Crystal Ball and Use the Gompertz Curve
• Five Questions Regarding Hardware Emulation's Rising Status


All contents are Copyright © 2017 by AspenCore, Inc. All Rights Reserved.
