Memory Technology Evolution
Technology brief, 5th edition
Abstract
Introduction
Basic DRAM operation
  DRAM storage density and power consumption
  Memory access time
  Chipsets and system bus timing
  Memory bus speed
  Burst mode access
SDRAM technology
  Bank interleaving
  Increased bandwidth
  Registered SDRAM modules
DIMM Configurations
  Single-sided and double-sided DIMMs
  Single-rank and dual-rank DIMMs
Advanced SDRAM technologies
  Double Data Rate SDRAM
    Prefetching
    Double transition clocking
    SSTL_2 low-voltage signaling technology
    Strobe-based data bus
    DDR SDRAM DIMMs
    Backward compatibility
  DDR-2 SDRAM
  DDR-3 SDRAM
  Fully-Buffered DIMMs
  Rambus DRAM
Importance of using HP-certified memory modules in ProLiant servers
Conclusion
For more information
Abstract
The widening performance gap between processors and memory, along with the growth of memory-
intensive business applications, is driving the need for better memory technologies for servers and
workstations. Consequently, there are several memory technologies on the market at any given time.
HP evaluates developing memory technologies in terms of price, performance, and backward
compatibility and implements the most promising technologies in ProLiant servers. HP is committed to
providing customers with the most reliable memory at the lowest possible cost.
This paper summarizes the evolution of memory technology and provides an overview of some of the
newest memory technologies that HP is evaluating for servers and workstations. The purpose is to
allay some of the confusion about the performance and benefits of the dynamic random access
memory (DRAM) technologies on the market.
Introduction
Processors use system memory to temporarily store the operating system, mission-critical applications,
and the data they use and manipulate. Therefore, the performance of the applications and reliability
of the data are intrinsically tied to the speed and bandwidth of the system memory. Over the years,
these factors have driven the evolution of system memory from asynchronous DRAM technologies,
such as Fast Page Mode (FPM) memory and Extended Data Out (EDO) memory, to high-bandwidth
synchronous DRAM (SDRAM) technologies. Yet, system memory bandwidth has not kept pace with
improvements in processor performance, thus creating a “performance gap.” Processor performance,
which is often equated to the number of transistors in a chip, doubles every couple of years. On the
other hand, memory bandwidth doubles roughly every three years. Therefore, if processor and
memory performance continue to increase at these rates, the performance gap between them will
widen.
Why is the processor-memory performance gap important? Because the processor is forced to idle
while it waits for data from system memory, the performance gap prevents many applications from
effectively using the full computing power of modern processors. In an attempt to narrow the
performance gap, the industry vigorously pursues the development of new memory technologies. HP
works with Joint Electron Device Engineering Council (JEDEC) memory vendors and chipset
developers during memory technology development to ensure that new memory products fulfill
customer needs with regard to reliability, cost, and backward compatibility.
This paper describes the benefits and drawbacks regarding price, performance, and compatibility of
DRAM technologies. Some descriptions are very technical. For readers who are not familiar with
memory technology, the paper begins with a description of basic DRAM operation and terminology.
Basic DRAM operation
A DRAM cell stores each bit of data as an electrical charge in a tiny capacitor. Because this charge
leaks away over time, each cell must be recharged, or refreshed, thousands of times per second to
maintain the validity of the data. These refresh mechanisms are described later in this section.
The memory subsystem operates at the memory bus speed. Typically, a DRAM cell is accessed when
the memory controller sends electronic address signals that specify the row address and column
address of the target cell. The memory controller sends these signals to the DRAM chip by way of the
memory bus. The memory bus consists of two sub-buses: the address/command bus and the data bus.
The data bus is a set of lines (traces) that carry the data to and from DRAM. Each trace carries one
data bit at a time. The throughput (bandwidth) of the data bus depends on its width (in bits) and its
frequency. The data width of a memory bus is usually 64 bits, which means that the bus has 64
traces, each of which transports one bit at a time. Each 64-bit unit of data is called a data word.
The address portion of the address/command bus is a set of traces that carry signals identifying the
location of data in memory. The command portion of the address/command bus conveys instructions
such as read, write, or refresh.
When FPM or EDO memory writes data to a particular cell, the location where the data will be
written is selected by the memory controller. The memory controller first selects the page by strobing
the Row Address onto the address/command bus. It then selects the exact location by strobing the
Column Address onto the address/command bus (see Figure 2). These actions are called Row
Address Strobe (RAS) and Column Address Strobe (CAS). The Write Enable (WE) signal is activated
at the same time as the CAS to specify that a write operation is to be performed. The memory
controller then drives the data onto the memory bus. The DRAM devices latch the data and store it
into the respective cells.
During a DRAM read operation, RAS followed by CAS are driven onto the memory bus. The WE
signal is held inactive, indicating a read operation. After a delay called CAS Latency, the DRAM
devices drive the data onto the memory bus.
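The sequence of RAS, CAS, and WE signaling described above can be summarized in a short sketch. The following Python model is purely illustrative: the DramBus class, its method names, and the CAS-latency value are assumptions used to show the order of operations, not an actual memory-controller interface.

```python
# Illustrative model of the FPM/EDO access sequence described above.
# The class, method names, and CAS latency value are assumptions for
# demonstration only, not a real memory-controller implementation.

CAS_LATENCY_CYCLES = 2  # assumed delay before read data becomes valid


class DramBus:
    """Toy model of the address/command bus and data bus."""

    def __init__(self):
        self.cells = {}  # (row, column) -> 64-bit data word

    def strobe(self, signal, address, write_enable=False):
        we = "active" if write_enable else "inactive"
        print(f"{signal}: address {address:#x}, WE {we}")

    def write(self, row, column, data_word):
        self.strobe("RAS", row)                         # select the page (row)
        self.strobe("CAS", column, write_enable=True)   # select the column, assert WE
        self.cells[(row, column)] = data_word           # DRAM latches data from the data bus

    def read(self, row, column):
        self.strobe("RAS", row)
        self.strobe("CAS", column, write_enable=False)  # WE inactive indicates a read
        print(f"... waiting {CAS_LATENCY_CYCLES} cycles (CAS Latency)")
        return self.cells.get((row, column))            # DRAM drives data onto the bus


bus = DramBus()
bus.write(row=0x1A, column=0x3F, data_word=0xDEADBEEFCAFEF00D)
assert bus.read(row=0x1A, column=0x3F) == 0xDEADBEEFCAFEF00D
```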
While DRAM is being refreshed, it cannot be accessed. If the processor makes a data request while
the DRAM is being refreshed, the data will not be available until after the refresh is complete. There
are many mechanisms to refresh DRAM, including RAS only refresh, CAS before RAS (CBR) refresh,
and Hidden refresh. CBR, which involves driving CAS active before driving RAS active, is used most
often.
Figure 2. Representation of a write operation for FPM or EDO RAM
Figure 3. Representation of a bus clock signal
Chipsets and system bus timing
Over the years, some computer components have gained in speed more than others have. For this
reason, the components in a typical server are controlled by different clocks that run at different, but
related, speeds. These clocks are created by using various clock multiplier and divider circuits to
generate multiple signals based on the main system bus clock. For example, if the main system bus
operates at 100 MHz, a divider circuit can generate a PCI bus frequency of 33 MHz (system clock ÷
3) and a multiplier circuit can generate a processor frequency of 400 MHz (system clock x 4).
Computer components that operate in whole multiples of the system clock are termed synchronous
because they are “in sync” with the system clock.
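The clock relationships in this example reduce to simple arithmetic. The short Python sketch below reproduces the figures quoted above; the variable names are illustrative only.

```python
# Derived clocks from a 100-MHz system bus clock, per the example above.
system_bus_mhz = 100

pci_bus_mhz = system_bus_mhz / 3     # divider circuit: ~33-MHz PCI bus
processor_mhz = system_bus_mhz * 4   # multiplier circuit: 400-MHz processor

print(f"PCI bus:   {pci_bus_mhz:.1f} MHz")
print(f"Processor: {processor_mhz:.0f} MHz")
```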
Synchronous components operate more efficiently than components that are not synchronized
(asynchronous) with the system bus clock. With asynchronous components, either the rest of the
system or the component itself must wait one or more additional clock cycles for data or instructions
due to clock resynchronization. In contrast, synchronized components know on which clock cycle data
will be available, thus eliminating these timing delays.
With burst mode access, the memory controller sends the address for the first data word only; three
additional data words are accessed on successive clock cycles after the first access (6-1-1-1) before the
memory controller has to send another CAS.

[Timing diagram: memory bus clock with an Active command, a Read command, and intervening NOP cycles.]
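The 6-1-1-1 notation can be read as a per-access cycle count. The sketch below contrasts burst mode with a hypothetical non-burst pattern in which every access pays the full initial latency; the 6-6-6-6 figure is an assumption used only for comparison.

```python
# Cycle counts implied by the 6-1-1-1 burst notation: the first access
# takes 6 bus cycles, and each of the next three takes 1 cycle.
burst_pattern = [6, 1, 1, 1]        # burst mode access
non_burst_pattern = [6, 6, 6, 6]    # assumed: every access pays full latency

print("burst mode:", sum(burst_pattern), "cycles for 4 data words")      # 9
print("non-burst: ", sum(non_burst_pattern), "cycles for 4 data words")  # 24
```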
SDRAM technology
FPM and EDO DRAMs are controlled asynchronously, that is, without a memory bus clock. The
memory controller determines when to assert signals and when to expect data based on absolute
timing intervals. The inefficiency of transferring data between a synchronous system bus and an
asynchronous memory bus results in longer latency.
Consequently, JEDEC—the electronics industry standards agency for memory devices and modules—
developed the synchronous DRAM standard to reduce the number of system clock cycles required to
read or write data. SDRAM uses a memory bus clock to synchronize the input and output signals on
the memory chip. This simplified the memory controller and reduced the latency from CPU to memory.
In addition to synchronous operation and burst mode access, SDRAM has other features that
accelerate data retrieval and increase memory capacity—multiple memory banks, greater bandwidth,
and register logic chips.
Bank interleaving
SDRAM divides memory into two to four banks for simultaneous access to more data. This division
and simultaneous access is known as interleaving. Using a notebook analogy, two-way interleaving is
like dividing each page in a notebook into two parts and having two assistants each retrieve a
different part of the page. Even though each assistant must take a break (be refreshed), breaks are
staggered so that at least one assistant is working at all times. Therefore, they retrieve the data much
faster than a single assistant could get the same data from one whole page, especially since no data
can be accessed when a single assistant takes a break. In other words, while one memory bank is
being accessed, the other bank remains ready to be accessed. This allows the processor to initiate a
new memory access before the previous access has been completed, resulting in continuous data
flow.
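The notebook analogy corresponds to a simple address-to-bank mapping. The sketch below assumes a modulo mapping of addresses to two banks; real chipsets select banks using dedicated address bits, so this is illustrative only.

```python
# Two-way bank interleaving, assuming a simple modulo address-to-bank
# mapping (illustrative; real chipsets use dedicated bank-address bits).
NUM_BANKS = 2

def bank_for_address(address):
    return address % NUM_BANKS

# Consecutive accesses alternate between banks, so one bank can be
# refreshed or prepared while the other is being accessed.
for address in range(6):
    print(f"address {address} -> bank {bank_for_address(address)}")
```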
Increased bandwidth
The bandwidth capacity of the memory bus increases with its width (in bits) and its frequency (in
MHz). By transferring 8 bytes (64 bits) at a time and running at 100 MHz, SDRAM increases memory
bandwidth to 800 MB/s, 50 percent more than EDO DRAMs (533 MB/s at 66 MHz).
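These bandwidth figures follow directly from bus width and frequency. A minimal calculation, using the values quoted above:

```python
# Peak bandwidth = bus width in bytes x bus frequency (transfers per second).
def peak_bandwidth_mb_s(bus_width_bits, bus_mhz):
    return (bus_width_bits / 8) * bus_mhz

print(peak_bandwidth_mb_s(64, 100))   # SDRAM at 100 MHz -> 800.0 MB/s
print(peak_bandwidth_mb_s(64, 66.6))  # EDO at ~66 MHz   -> ~533 MB/s
```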
DIMM Configurations
Single-sided and double-sided DIMMs
Each DRAM chip on a DIMM provides either 4 bits or 8 bits of a 64-bit data word. Chips that
provide 4 bits are called x4 (by 4), and chips that provide 8 bits are called x8 (by 8). It takes eight
x8 chips or sixteen x4 chips to make a 64-bit word, so at least eight chips are located on one or both
sides of a DIMM. However, a standard DIMM has enough room to hold a ninth chip on each side.
The ninth chip is used to store 4 bits or 8 bits of Error Correction Code, or ECC (see “Parity and ECC
DIMMs” sidebar on next page).
An ECC DIMM with all nine DRAM chips on one side is called single-sided, and an ECC DIMM with
nine DRAM chips on each side is called double-sided (Figure 7). A single-sided x8 ECC DIMM and a
double-sided x4 ECC DIMM each create a single block of 72 bits (64 bits plus 8 ECC bits). In both
cases, a single chip-select signal from the chipset is used to activate all the chips on the DIMM. In
contrast, a double-sided x8 DIMM (bottom illustration) requires two chip-select signals to access two
72-bit blocks on two sets of DRAM chips.
Figure 7. Single-sided and double-sided DDR SDRAM DIMMs and corresponding DIMM rank
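The chip counts described above follow from the 72-bit ECC word width. A quick check, using the x4 and x8 organizations as inputs:

```python
# Chips needed for one 72-bit rank (64 data bits + 8 ECC bits).
def chips_per_72_bit_rank(bits_per_chip):
    return 72 // bits_per_chip

print("x8 rank:", chips_per_72_bit_rank(8), "chips")   # 9 chips (one side)
print("x4 rank:", chips_per_72_bit_rank(4), "chips")   # 18 chips (9 per side)
```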
Parity and ECC DIMMs
The ninth DRAM chip on one side of a DIMM is used to store parity or ECC
bits. With parity, the memory controller is capable of detecting single-bit
errors, but it is unable to correct any errors. Also, it cannot detect multiple-
bit errors. With ECC, the memory controller is capable of detecting and
correcting single-bit errors and multiple-bit errors that are contiguous.
Multiple-bit contiguous errors occur when an entire x4 or x8 chip fails. The
chipset (memory controller) is also capable of detecting double-bit errors
that are not contiguous. The chipset halts the system and logs an error
when uncorrectable errors are detected. Servers use ECC DIMMs to
improve availability and reliability.
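The difference between parity and ECC can be seen with a simple parity calculation. The sketch below shows why a single parity bit flags a one-bit error but misses a two-bit error; a full ECC (SECDED) code is not implemented here and is only summarized in the comments.

```python
# Even parity over an 8-bit value: detects single-bit errors, misses
# double-bit errors. ECC adds enough check bits to also correct
# single-bit errors (not shown in this sketch).
def even_parity(word):
    return bin(word).count("1") % 2

data = 0b1011_0110
stored_parity = even_parity(data)

one_bit_error = data ^ 0b0000_0100   # flip one bit
two_bit_error = data ^ 0b0011_0000   # flip two bits

print(even_parity(one_bit_error) != stored_parity)  # True  -> error detected
print(even_parity(two_bit_error) != stored_parity)  # False -> error missed
```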
Single-rank and dual-rank DIMMs
Memory ranks are not new, but their role has become more critical with the advent of new chipset
and memory technologies and growing server memory capacities. Dual-rank DIMMs improve memory
density by placing the components of two single-rank DIMMs in the space of one module. The chipset
considers each rank as an electrical load on the memory bus. At slower bus speeds, the number of
loads does not adversely affect bus signal integrity. However, for faster memory technologies such as
DDR333 and DDR2-400, there is a maximum number of these loads (typically 8) that the chipset can
drive. For example, if a server has six DIMM sockets, a typical chipset will only support four dual-rank
DIMMs (8 loads) in the first four sockets (2 loads per socket). Therefore, the last two sockets must not
be populated. If the total number of ranks in the populated DIMM sockets exceeds the maximum
number of loads the chipset can support, the server may not boot properly or it may not operate
reliably. Some systems check the memory configuration while booting to detect invalid memory bus
loading. When an invalid memory configuration is detected, the system stops the boot process, thus
avoiding unreliable operation.
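The rank-counting rule above amounts to summing electrical loads across the populated sockets and comparing the total with the chipset limit. A minimal sketch, assuming the typical limit of 8 loads mentioned in the text:

```python
# Each rank presents one electrical load; the chipset supports a maximum
# number of loads (typically 8 for DDR333/DDR2-400, per the text).
MAX_LOADS = 8

def configuration_is_valid(ranks_per_dimm):
    """ranks_per_dimm: one entry per populated socket (1 or 2 ranks)."""
    return sum(ranks_per_dimm) <= MAX_LOADS

print(configuration_is_valid([2, 2, 2, 2]))     # four dual-rank DIMMs -> True
print(configuration_is_valid([2, 2, 2, 2, 2]))  # five dual-rank DIMMs -> False
```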
To prevent this and other memory-related problems, customers should only use HP-certified DIMMs
available in the memory option kits for each ProLiant server (see the “Importance of using HP-certified
memory modules in ProLiant servers” section).
Another important difference between single-rank and dual-rank DIMMs is cost. Typically, memory
costs increase with DRAM density. This means that an x4 DRAM chip, which has twice as many memory
addresses as an x8 DRAM chip, is typically more expensive. As a result, a single-rank x4 DIMM
usually costs more than a dual-rank x8 DIMM.
Advanced SDRAM technologies
Despite the performance improvement in the overall system due to the use of SDRAM, the growing
performance gap between memory and processor must be bridged by more advanced memory
technologies. These technologies, which are described on the following pages, boost the overall
performance of systems using the latest high-speed processors (Figure 8).
Figure 8. Peak bandwidth comparison (MB/s) of SDRAM, RDRAM, DDR SDRAM, and DDR-2 SDRAM at various bus speeds, ranging from 528 MB/s for 66-MHz SDRAM to 6,400 MB/s for 400-MHz DDR-2 SDRAM
Figure 9. Data transfer rate comparison between SDRAM (with burst mode access) and DDR SDRAM
Figure 10. The 184-pin DDR SDRAM Registered DIMM. The DDR SDRAM DIMM has one notch instead of the two notches found
on SDRAM DIMMs.
Backward compatibility
Because of their different data strobes, voltage levels, and signaling technologies, it is not possible to
mix SDRAM and DDR SDRAM DIMMs within the same memory subsystem.
DDR-2 SDRAM
DDR-2 SDRAM is the second generation of DDR SDRAM. It offers data rates of up to 6.4 GB/s, lower
power consumption, and improvements in packaging. At 400 MHz, DDR-2 increases memory
bandwidth to 3.2 GB/s, four times that of the original 100-MHz SDRAM (800 MB/s). DDR-2 SDRAM
achieves this higher level of performance and lower power consumption through faster clocks, 1.8-V
operation and signaling, and a simplified command set. The 240-pin connector on DDR-2 DIMMs is
needed to accommodate differential strobe signals.
Table 1 summarizes the various types of DDR and DDR-2 SDRAM and their associated naming
conventions.
Originally, the module naming convention for DDR SDRAM was based on the effective clock rate of
the data transfer: PC200 for DDR SDRAM that operates at 100 MHz, PC266 for 133 MHz, and so
forth. But after confusion arose over the Rambus naming convention, the industry based the DDR
SDRAM naming convention on the peak data transfer rate in MB/s. For example, PC266 is
equivalent to PC2100 (64 bits × 2 transfers per clock × 133 MHz ÷ 8 bits per byte ≈ 2,100 MB/s, or 2.1 GB/s).
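The PCxxxx figure is therefore just the peak transfer rate. A short calculation, using the bus clocks quoted above:

```python
# DDR module naming: peak transfer rate in MB/s =
# (64-bit bus / 8 bits per byte) x 2 transfers per clock x bus clock (MHz).
def ddr_peak_mb_s(bus_clock_mhz):
    return (64 // 8) * 2 * bus_clock_mhz

print(ddr_peak_mb_s(133))  # PC266 -> ~2100 MB/s (PC2100)
print(ddr_peak_mb_s(100))  # PC200 -> 1600 MB/s (PC1600)
```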
DDR-3 SDRAM
JEDEC is currently developing the third-generation DDR SDRAM technology, DDR-3, which will
continue to make improvements in bandwidth and power consumption. DDR-3 is an evolution of DDR-
2 technology. For example, DDR-3 will use 1.5-V signaling compared to 1.8 V for DDR-2 and 2.5 V
for DDR SDRAM.
Fully-Buffered DIMMs
As memory speed continues to increase with DDR-2 and DDR-3 SDRAM, the number of DIMMs
supported per channel decreases. This decrease is related to the parallel stub-bus topology. In the
stub-bus topology, electrical signals travel along 72 data lines (64 for data bits and 8 for error
checking bits) from the memory controller to every DIMM on the bus. Signal degradation at bus-pin
connections and latency resulting from complex routing of data lines cause the error rate to increase
as the bus speed increases. To achieve higher bus speeds, designers had to sacrifice capacity. For
example, Figure 11 shows the number of loads supported per channel on the parallel stub-bus at data
rates ranging from PC100 to DDR-3 1600. Note that the number of supported loads dropped from
eight to two as data rates increased. Likewise, the capacity per channel decreased, even as the
density per DIMM increased. As a result, the benefits of higher data rates are mitigated by reduced
capacity per channel.
Figure 11. Maximum number of loads per channel based on DRAM data rate
The parallel interface for each DDR-2 channel requires 240 pins. Increasing the number of channels
to compensate for the drop in capacity per channel was not a viable option due to increased cost and
board complexity. Therefore, a new serial memory interface with a lower pin count and point-to-point
connections was needed to reach the higher memory bandwidth required by future generations of
servers.
JEDEC is currently developing the Fully-Buffered DIMM (FB-DIMM) specification, a new bi-directional
serial interface that eliminates the parallel stub-bus topology and allows higher memory bandwidth
while maintaining or increasing memory capacity. The FB-DIMM can achieve a maximum speed of
4.8 Gb/s. It features 69 pins (less than one-third the pin count of DDR-2 SDRAM). The reduced pin
count greatly simplifies board design due to fewer signal traces and the ability to use traces of
unequal length. FB-DIMMs also feature an advanced memory buffer (AMB) chip that transmits signals
between the memory controller and memory modules using a point-to-point architecture (Figure 12).
The bi-directional interface allows simultaneous reads and writes, thus eliminating delays between
data transfers. HP supports the FB-DIMM standard because it increases reliability, speed, and density
while using cost optimized, industry-standard DDR-2 SDRAM components.
Rambus DRAM
Rambus DRAM (RDRAM) allows data transfer through a bus operating in a higher frequency range
than DDR SDRAM. In essence, Rambus moves small amounts of data very fast, whereas DDR SDRAM
moves large amounts of data more slowly. The Rambus design consists of three key elements:
RDRAMs, Rambus application-specific integrated circuits, and an interconnect called the Rambus
Channel. The Rambus design provides higher performance than traditional SDRAM because RDRAM
transfers data on both edges of a synchronous, high-speed clock pulse. RDRAM uses a separate row
and column command bus that allows multiple commands to be issued at the same time, thereby
increasing the bandwidth efficiency of the memory bus. This dual command bus is a unique feature of
RDRAM.
With only an 8-bit-wide command bus and an 18-bit data bus, RDRAM (Figure 13) has the lowest
signal count of all of the memory technologies. RDRAM incorporates a packet protocol and is capable
of operating at an effective data rate of 1.2 GHz, providing a peak bandwidth of 2.4 GB/s. One
packet of information is transferred in 8 ticks of the clock, allowing 128 bits of data to be sent within
one cycle of a 150-MHz clock. Since it requires 8 ticks of the clock to transfer a packet, the internal
memory controller only needs to run at 150 MHz to keep up with the 1.2-GHz packet transfer rate.
This allows for plenty of timing margin in the design of the memory controller.
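The packet timing described above reduces to a few ratios. The sketch below reproduces the 150-MHz and 2.4-GB/s figures from the 1.2-GHz transfer rate; the variable names are illustrative only.

```python
# RDRAM packet timing: one 128-bit packet per 8 clock ticks at an
# effective 1.2-GHz transfer rate.
data_rate_mhz = 1200
ticks_per_packet = 8
bits_per_packet = 128

controller_mhz = data_rate_mhz / ticks_per_packet       # 150 MHz internal clock
bits_per_tick = bits_per_packet / ticks_per_packet      # 16 bits per tick
peak_gb_s = (bits_per_tick / 8) * data_rate_mhz / 1000  # 2.4 GB/s peak bandwidth

print(controller_mhz, bits_per_tick, peak_gb_s)
```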
Figure 13. Rambus DRAM
RDRAM is capable of supporting up to 32 RDRAM devices on one memory channel while maintaining
a 1.2-GHz data rate. Through the use of a repeater chip, even more devices can be placed on one
RDRAM channel. The repeater will interface to two different RDRAM channels and pass the data and
command signals between them. One channel will communicate with the memory controller, and the
other channel will communicate with the RDRAM devices. Thus, the memory controller essentially will
be communicating only with the repeater chips. Up to eight repeater chips can be placed on the
memory controller, and 32 RDRAM devices can be placed on each repeater channel. This allows one
memory controller channel to support a maximum of 256 devices. However, using the repeater chips will add 1 to 1.5 clocks of
additional delay.
To account for differences in distance of the devices on the channel, more latency in increments of the
clock can be added. This allows the memory controller to receive data from all devices in the same
amount of time, thus preventing data collision on the bus when consecutive reads are performed to
different devices.
Another feature of RDRAM that helps increase bus efficiency is an internal 128-bit write buffer. All
write data is placed into this buffer before being sent to the DRAM core. The write buffer reduces the
delay needed to turn around the internal data bus by allowing the sense amps to remain in the read
direction until data needs to be retired from the buffer to the core. Essentially, a read can immediately
follow a write with little bandwidth lost on the data bus.
While the RDRAM bus efficiency is high, the packet protocol increases the latency. The packet
translation between the internal memory controller bus and the fast external bus requires one to two
clocks of additional delay. This delay cannot be avoided when using a very fast packet protocol.
With the high data rate of Rambus, signal integrity is troublesome. System boards must be designed
to accommodate the extremely stringent timing of Rambus, and this increases product time to market.
Additionally, each Rambus channel is limited to 32 devices, imposing an upper limit on memory
capacity supported by a single bus. Use of repeater chips enables use of additional devices and
increases potential memory capacity, but repeater chips have been very challenging to design.
Finally, the larger dies and more limited production of RDRAM compared to those of other memory
technologies have increased the cost of RDRAM. RDRAM still costs up to twice as much as SDRAM.
RDRAM technology offers performance advantages and lower pin count than SDRAM and DDR
SDRAM. However, SDRAM and DDR SDRAM offer more memory capacity and lower cost than
RDRAM.
Importance of using HP-certified memory modules in
ProLiant servers
There are several reasons why customers should use only HP memory option kits when replacing or
adding memory in ProLiant servers. This section describes three of the most important reasons.
First, not all DIMMs are created equal; they can vary greatly in quality and reliability. In the highly
competitive memory market, some third-party memory resellers forego the level of qualification and
testing needed for servers because it adds to the price of DIMMs. HP uses proprietary diagnostic tools
and specialized server memory diagnostic tests that exceed industry standards to ensure the highest
level of performance and availability for ProLiant servers. The costs of system downtime, data loss,
and reduced productivity caused by lower quality memory are far greater than the price difference
between HP-certified memory and third-party DIMMs.
Second, HP offers three levels of Advanced Memory Protection (AMP) that go beyond error correction
to increase the fault tolerance of HP ProLiant servers. These AMP technologies—Online Spare
Memory, Hot Plug Mirrored Memory, and Hot Plug RAID Memory—are optimized for each server
series. For ProLiant servers with AMP, the DIMM configuration requirements are determined by the
AMP mode selected. The HP memory option kits available for these servers prevent the violation of the
configuration requirements for each AMP mode.
Third, use of HP memory option kits prevents improper mixing of single-rank and dual-rank DIMMs.
Although single-rank and dual-rank DIMMs may have the same capacity, they differ in the way in
which they are accessed by the chipset (see the “DIMM configurations” section). Therefore, to ensure
that the server boots properly and operates reliably, single-rank and dual-rank DIMMs should not be
used in the same bank. On the other hand, some ProLiant server platforms have configuration
guidelines that allow the mixing of single-rank and dual-rank DIMMs. HP memory option kits precisely
match the capabilities and requirements of the ProLiant server for which they are designated.
Therefore, they prevent improper mixing of single-rank and dual-rank DIMMs.
HP memory option kits are listed in each server’s user guide and in the product QuickSpecs available
at www.hp.com.
Conclusion
The increasing performance gap between processors and memory has generated development of
several memory technologies. While some memory manufacturers prefer a revolutionary approach to
memory technology development, others favor an open, evolutionary approach. Memory
manufacturers must balance the cost of performance enhancements against the laws of physics and
compatibility with existing technologies. HP will continue to evaluate relevant memory technologies in
order to offer customers products with the most reliable, best performing memory at the lowest
possible cost.
For more information
For additional information, refer to the resources listed below.
TC050802TB, 08/2005