Memory Chapter 2

The document discusses different memory architectures and RAM technologies. It describes how static RAM uses six transistors per cell while dynamic RAM uses one transistor and one capacitor per cell. It also covers architectures with memory controllers integrated into the CPUs and connected directly to memory, versus architectures that use an external Northbridge chip.


and must wait to access memory, despite the use of CPU caches. If multiple hyper-threads, cores, or processors access memory at the same time, the wait times for memory access are even longer. This is also true for DMA operations.

There is more to accessing memory than concurrency, however. Access patterns themselves also greatly influence the performance of the memory subsystem, especially with multiple memory channels. In section 2.2 we will cover more details of RAM access patterns.
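The influence of access patterns can be observed directly, even before those details are covered. The following is a minimal sketch, added here for illustration and not part of the original text; it assumes a POSIX system with clock_gettime, and the buffer size, stride, and helper names are arbitrary example values. It times one pass over a buffer using consecutive addresses against one pass over the same buffer using a large stride; on most machines the strided pass is noticeably slower even though both perform the same number of accesses.

    /* Illustrative sketch: compare sequential vs. strided traversal of a
     * large buffer.  Sizes and the stride are arbitrary example values;
     * absolute timings depend entirely on the machine.  Compile with
     * e.g. "cc -O2 access.c" (older glibc may additionally need -lrt). */
    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define SIZE   (64 * 1024 * 1024)  /* 64 MiB working set                */
    #define STRIDE 4096                /* far apart, defeats spatial locality */

    static double seconds(void)
    {
            struct timespec ts;
            clock_gettime(CLOCK_MONOTONIC, &ts);
            return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
            unsigned char *buf = malloc(SIZE);
            if (buf == NULL)
                    return 1;
            memset(buf, 1, SIZE);      /* touch every page up front */

            volatile unsigned long sum = 0;
            double t0, t1;

            /* Sequential pass: consecutive addresses. */
            t0 = seconds();
            for (size_t i = 0; i < SIZE; ++i)
                    sum += buf[i];
            t1 = seconds();
            printf("sequential: %.3f s\n", t1 - t0);

            /* Strided pass: the same number of accesses, spread far apart. */
            t0 = seconds();
            for (size_t s = 0; s < STRIDE; ++s)
                    for (size_t i = s; i < SIZE; i += STRIDE)
                            sum += buf[i];
            t1 = seconds();
            printf("strided:    %.3f s\n", t1 - t0);

            free(buf);
            return 0;
    }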
On some more expensive systems, the Northbridge does not actually contain the memory controller. Instead the Northbridge can be connected to a number of external memory controllers (in the following example, four of them).

[Figure 2.2: Northbridge with External Controllers]

The advantage of this architecture is that more than one memory bus exists and therefore the total available bandwidth increases. This design also supports more memory. Concurrent memory access patterns reduce delays by simultaneously accessing different memory banks. This is especially true when multiple processors are directly connected to the Northbridge, as in Figure 2.2. For such a design, the primary limitation is the internal bandwidth of the Northbridge, which is phenomenal for this architecture (from Intel).[4]

[4] For completeness it should be mentioned that such a memory controller arrangement can be used for other purposes such as "memory RAID", which is useful in combination with hotplug memory.
Using multiple external memory controllers is not the only way to increase memory bandwidth. One other increasingly popular way is to integrate memory controllers into the CPUs and attach memory to each CPU. This architecture is made popular by SMP systems based on AMD's Opteron processor. Figure 2.3 shows such a system. Intel will have support for the Common System Interface (CSI) starting with the Nehalem processors; this is basically the same approach: an integrated memory controller with the possibility of local memory for each processor.

[Figure 2.3: Integrated Memory Controller]

With an architecture like this there are as many memory banks available as there are processors. On a quad-CPU machine the memory bandwidth is quadrupled without the need for a complicated Northbridge with enormous bandwidth. Having a memory controller integrated into the CPU has some additional advantages; we will not dig deeper into this technology here.
There are disadvantages to this architecture, too. First of all, because the machine still has to make all the memory of the system accessible to all processors, the memory is not uniform anymore (hence the name NUMA, Non-Uniform Memory Architecture, for such an architecture). Local memory (memory attached to a processor) can be accessed at the usual speed. The situation is different when memory attached to another processor is accessed: in this case the interconnects between the processors have to be used. Accessing memory attached to CPU2 from CPU1 requires communication across one interconnect. When the same CPU accesses memory attached to CPU4, two interconnects have to be crossed.

Each such communication has an associated cost. We talk about "NUMA factors" when we describe the extra time needed to access remote memory. The example architecture in Figure 2.3 has two levels for each CPU: immediately adjacent CPUs and one CPU which is two interconnects away. With more complicated machines the number of levels can grow significantly.
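On Linux, the NUMA factors of the machine a program runs on are summarized in the node distance table the kernel exports under /sys/devices/system/node (the same data numactl --hardware prints). The sketch below is an added illustration, not part of the original text; it simply dumps each node's distance row and assumes node numbers are contiguous starting at 0, which is the common case. By convention a value of 10 means local access, and larger values indicate higher NUMA factors.

    /* Illustrative sketch: print the kernel's NUMA distance table from sysfs.
     * Each nodeN/distance file lists the relative access cost from node N to
     * every node.  Assumes contiguous node numbering starting at 0. */
    #include <stdio.h>

    int main(void)
    {
            char path[64];
            char row[256];

            for (int node = 0; ; ++node) {
                    snprintf(path, sizeof(path),
                             "/sys/devices/system/node/node%d/distance", node);
                    FILE *f = fopen(path, "r");
                    if (f == NULL)
                            break;          /* no more nodes */
                    if (fgets(row, sizeof(row), f) != NULL)
                            printf("node %d: %s", node, row);
                    fclose(f);
            }
            return 0;
    }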
There are also machine architectures (for instance IBM's x445 and SGI's Altix series) where there is more than one type of connection. CPUs are organized into nodes; within a node the time to access the memory might be uniform or have only small NUMA factors. The connection between nodes can be very expensive, though, and the NUMA factor can be quite high.

Commodity NUMA machines exist today and will likely play an even greater role in the future. It is expected that, from late 2008 on, every SMP machine will use NUMA. The costs associated with NUMA make it important to recognize when a program is running on a NUMA machine. In section 5 we will discuss more machine architectures and some technologies the Linux kernel provides for these programs.
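As a brief, hedged illustration of what recognizing and reacting to a NUMA machine can look like in practice (the real discussion belongs to section 5; this sketch is not part of the original text and assumes libnuma is installed, linking with -lnuma), a program can query the library and request memory on a particular node:

    /* Illustrative sketch using libnuma (link with -lnuma); assumes the
     * numactl/libnuma development package is installed. */
    #include <stdio.h>
    #include <numa.h>

    int main(void)
    {
            if (numa_available() < 0) {
                    /* No usable NUMA support; treat the machine as UMA. */
                    printf("no NUMA support\n");
                    return 0;
            }

            printf("highest node number: %d\n", numa_max_node());

            /* Allocate 1 MiB backed by memory on node 0; on a real NUMA
             * machine this is "local" only for CPUs attached to node 0. */
            size_t len = 1024 * 1024;
            void *p = numa_alloc_onnode(len, 0);
            if (p != NULL) {
                    /* ... use the memory ... */
                    numa_free(p, len);
            }
            return 0;
    }

If the library reports no NUMA support, the program can simply fall back to ordinary allocations.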
Beyond the technical details described in the remainder of this section, there are several additional factors which influence the performance of RAM. They are not controllable by software, which is why they are not covered in this section. The interested reader can learn about some of these factors in section 2.1. They are really only needed to get a more complete picture of RAM technology and possibly to make better decisions when purchasing computers.

The following two sections discuss hardware details at the gate level and the access protocol between the memory controller and the DRAM chips. Programmers will likely find this information enlightening since these details explain why RAM access works the way it does. It is optional knowledge, though, and the reader anxious to get to topics with more immediate relevance for everyday life can jump ahead to section 2.2.5.

2.1 RAM Types
There have been many types of RAM over the years and each type varies, sometimes significantly, from the others. The older types are today really only interesting to historians. We will not explore the details of those. Instead we will concentrate on modern RAM types; we will only scrape the surface, exploring some details which are visible to the kernel or application developer through their performance characteristics.

The first interesting details are centered around the question why there are different types of RAM in the same machine. More specifically, why are there both static RAM (SRAM[5]) and dynamic RAM (DRAM)? The former is much faster and provides the same functionality. Why is not all RAM in a machine SRAM? The answer is, as one might expect, cost. SRAM is much more expensive to produce and to use than DRAM. Both these cost factors are important, the second one increasing in importance more and more. To understand these differences we look at the implementation of a bit of storage for both SRAM and DRAM.

[5] In other contexts SRAM might mean "synchronous RAM".

In the remainder of this section we will discuss some low-level details of the implementation of RAM. We will keep the level of detail as low as possible. To that end, we will discuss the signals at a "logic level" and not at a level a hardware designer would have to use. That level of detail is unnecessary for our purpose here.

2.1.1 Static RAM

[Figure 2.4: 6-T Static RAM]
Figure 2.4 shows the structure of a 6-transistor SRAM cell. The core of this cell is formed by the four transistors M1 to M4 which form two cross-coupled inverters. They have two stable states, representing 0 and 1 respectively. The state is stable as long as power on Vdd is available.

If access to the state of the cell is needed, the word access line WL is raised. This makes the state of the cell immediately available for reading on BL and BL̄. If the cell state must be overwritten, the BL and BL̄ lines are first set to the desired values and then WL is raised. Since the outside drivers are stronger than the four transistors (M1 through M4), this allows the old state to be overwritten. See [20] for a more detailed description of the way the cell works.

For the following discussion it is important to note that

• one cell requires six transistors. There are variants with four transistors but they have disadvantages.
• maintaining the state of the cell requires constant power.
• the cell state is available for reading almost immediately once the word access line WL is raised. The signal is as rectangular (changing quickly between the two binary states) as other transistor-controlled signals.
• the cell state is stable; no refresh cycles are needed.

There are other, slower and less power-hungry SRAM forms available, but those are not of interest here since we are looking at fast RAM. These slow variants are mainly interesting because they can be more easily used in a system than dynamic RAM because of their simpler interface.
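To make the read and write sequences described above concrete at the purely logical level used in this text, here is a toy model, added for illustration and not part of the original: a single value stands in for the cross-coupled inverter pair's output, WL gates all access, and a write succeeds because the external drivers on BL and BL̄ are assumed to overpower the cell.

    /* Toy logic-level model of a 6-T SRAM cell (illustration only; a real
     * cell is an analog circuit).  The field q stands in for the output of
     * the cross-coupled inverter pair; its complement is implied. */
    #include <stdio.h>
    #include <stdbool.h>

    struct sram_cell {
            bool q;         /* state held by the inverter pair          */
            bool powered;   /* Vdd present: state is retained           */
    };

    /* Raising WL while the (stronger) external drivers force BL and BL_bar
     * to complementary values overwrites the stored state. */
    static void sram_write(struct sram_cell *c, bool wl, bool bl, bool bl_bar)
    {
            if (c->powered && wl && bl != bl_bar)
                    c->q = bl;
    }

    /* Raising WL makes the state available on BL and BL_bar immediately;
     * no refresh is ever needed while power is on. */
    static bool sram_read(const struct sram_cell *c, bool wl,
                          bool *bl, bool *bl_bar)
    {
            if (!c->powered || !wl)
                    return false;   /* lines are not driven by the cell */
            *bl = c->q;
            *bl_bar = !c->q;
            return true;
    }

    int main(void)
    {
            struct sram_cell cell = { .q = false, .powered = true };
            bool bl, bl_bar;

            sram_write(&cell, true, true, false);   /* store a 1 */
            if (sram_read(&cell, true, &bl, &bl_bar))
                    printf("BL=%d BL_bar=%d\n", bl, bl_bar);
            return 0;
    }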
2.1.2 Dynamic RAM

Dynamic RAM is, in its structure, much simpler than static RAM. Figure 2.5 shows the structure of a usual DRAM cell design. All it consists of is one transistor and one capacitor. This huge difference in complexity of course means that it functions very differently than static RAM.

[Figure 2.5: 1-T Dynamic RAM]
A dynamic RAM cell keeps its state in the capacitor C. The transistor M is used to guard the access to the state. To read the state of the cell the access line AL is raised; this either causes a current to flow on the data line DL or not, depending on the charge in the capacitor. To write to the cell, the data line DL is appropriately set and then AL is raised for a time long enough to charge or drain the capacitor.
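The same behaviour can be written down as a toy model at the logic level. This sketch is an added illustration, not part of the original text; the charge levels, the sensing threshold, and the loss per read are arbitrary stand-ins for analog behaviour.

    /* Toy model of a 1-T DRAM cell (illustration only; a real cell is a tiny
     * analog capacitor read out by a sense amplifier).  "charge" stands in
     * for the capacitor C; AL gates access through the transistor M. */
    #include <stdio.h>
    #include <stdbool.h>

    struct dram_cell {
            double charge;          /* 0.0 = empty, 1.0 = fully charged */
    };

    /* Writing: with AL raised long enough, DL charges or drains C. */
    static void dram_write(struct dram_cell *c, bool al, bool dl)
    {
            if (al)
                    c->charge = dl ? 1.0 : 0.0;
    }

    /* Reading: with AL raised, current flows on DL (or not) depending on
     * the charge.  A real read disturbs the charge, and the charge also
     * leaks away over time, which is why refresh is needed; the toy model
     * mimics the disturbance with a crude loss per read. */
    static bool dram_read(struct dram_cell *c, bool al)
    {
            if (!al)
                    return false;
            bool value = c->charge > 0.5;   /* sensing threshold */
            c->charge *= 0.5;               /* stand-in for the disturbance */
            return value;
    }

    int main(void)
    {
            struct dram_cell cell = { .charge = 0.0 };

            dram_write(&cell, true, true);                   /* store a 1 */
            printf("read 1: %d\n", dram_read(&cell, true));
            printf("read 2: %d\n", dram_read(&cell, true));  /* degraded  */
            return 0;
    }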
There are a number of complications with the design of dynamic RAM. The use of a capacitor means that reading the cell discharges the capacitor.
