Hard Drive: What If You Were To Design A Solid State Hard Disk Out of Normal Memory Modules?
For years there have been attempts to speed up the hard disk memory subsystem because, with the exception of secondary storage such as CD-ROM, DVD-ROM, diskettes and other removable media, the hard drive is still the slowest device in a modern PC. It is not for nothing that the industry keeps trying to push the spindle speed of hard disk drives ever higher.
Those dissatisfied with the performance of mechanical storage solutions can tap
solid-state storage devices that substitute silicon for spinning platters. Such devices shed
the mechanical shackles that limit hard drive performance, but they've hardly been
affordable options for most users.
Solid-state memory modules are used that are protected from data loss for a limited period of time by battery backup; without power, the CMOS transistors in today's memory chips cannot retain their state. The result, though, is very convincing. Solid state drives are much more robust when it comes to vibration and, as expected, are many times faster than magnetic storage media - and we're talking about access times here.
What if you were to design a solid state hard disk out of normal memory modules?
The result was the i-RAM, a medium-sized board with four DIMM sockets, a buffer battery and a Serial ATA interface. Power is supplied through a PCI slot, so you need at least one free slot, and the card occupies the space of two expansion cards due to the height of the memory modules on one side.
2. Inspirations of IRAM
The division of the semiconductor industry into microprocessor and memory
camps provides many advantages.
Quantitative evidence of the success of the industry is its size: in 1995 DRAMs
were a $37B industry and microprocessors were a $20B industry. In addition to financial
success, the technologies of these industries have improved at unparalleled rates. DRAM
capacity has quadrupled on average every 3 years since 1976, while microprocessor
speed has done the same since 1986.
The split into two camps has its disadvantages as well. Figure 2.1 shows that while microprocessor performance has been improving at a rate of 60% per year, the access time to DRAM has been improving at less than 10% per year. Hence computer designers are faced with an increasing "Processor-Memory Performance Gap," which is now the primary obstacle to improved computer system performance.
Figure 2.1: Processor-Memory Gap
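A quick calculation makes the compounding effect concrete. The sketch below simply compounds the two growth rates quoted above (the ten-year horizon is an arbitrary illustration):

```python
# Sketch: how the Processor-Memory Performance Gap compounds over time.
# Growth rates are those quoted in the text: 60%/yr for processors,
# (under) 10%/yr for DRAM access time.

CPU_GROWTH = 1.60    # processor performance multiplier per year
DRAM_GROWTH = 1.10   # DRAM improvement per year (optimistic 10%)

gap = 1.0
for year in range(1, 11):
    gap *= CPU_GROWTH / DRAM_GROWTH
    print(f"year {year:2d}: gap = {gap:5.1f}x")

# Each year the gap grows by 1.60/1.10 - 1, i.e. about 45%, which matches
# the roughly 50%-per-year gap growth cited below.
```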
To bridge the performance gap, designers add cache memory, which requires additional investment; the extra transistors also increase power consumption. Even so, caches are an inefficient remedy, because the performance gap is very large and is growing at a rate of about 50% per year. The extraordinary delays in the memory hierarchy occur despite tremendous resources being spent trying to bridge the processor-memory performance gap. We call the percentage of die area and transistors dedicated to caches and other memory latency-hiding hardware the "Memory Gap Penalty". Table 2.1 quantifies the penalty; it has grown to 60% of the area and almost 90% of the transistors in several microprocessors. Moreover, as the degree of integration rises and transistor feature sizes shrink, the cost of production grows, so the penalty can be expected to increase with new generations of microprocessors.
Memory Revenue
DRAM revenue is decreasing rapidly. Even though the demand for DRAM chips is increasing, DRAM manufacturers are not reaping the benefit because of the high cost of production. This forces memory manufacturers to look for an alternative that reduces the cost of production, in order to maintain their revenue as well as their business.
Figure 2.2 shows that DRAM revenue fell continuously from the first quarter of 1999 after reaching a maximum of 16 billion US dollars. It rose slightly in the first quarter of 2000, reaching 7 billion, after which it slid downward for three consecutive years, a decline that has not ceased.
Figure 2.2: Memory Revenue
I/O Bus Performance Lag
The parallel I/O bus is not efficient because it lags behind the processor and memory in bandwidth. If we scale the bus by increasing its clock speed and width, packaging costs rise and the pin count grows. The performance lag of the parallel I/O bus therefore points to the need for a more efficient implementation, one that remedies the bandwidth scarcity without the increases in production cost and pin count that come with scaling a parallel bus.
Database Demand for Processing Power and Memory
Databases and other software demand ever more processing power and memory, and both fall short of the actual demand. The demand is growing rapidly because more and more high-end applications, such as multimedia applications, need high processing power and plenty of RAM, and their successive versions squeeze the final drop of performance from the computing system.
Figure 2.3
Figure 2.3 shows that the database demand for processing power and memory doubles every 9 months, according to Greg's Law, while microprocessor speed doubles every 18 months, in line with Moore's Law, and DRAM speed only every 120 months. Both the microprocessor and the memory therefore fall short of what databases and other software applications actually require, creating a Database-Processor performance gap and a Database-Memory performance gap. These gaps grow continuously and rapidly, since new software applications are ever hungrier for processing power and memory. The database demand for processing power and memory thus led computer experts to look for a technology that narrows the performance gap between that demand and both the processor and the memory.
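Converting these doubling periods into annual growth rates shows how quickly the gaps widen; here is a minimal sketch using only the doubling times quoted above:

```python
# Sketch: annualized growth implied by the doubling times in the text.
# Demand doubles every 9 months (Greg's Law), processor speed every
# 18 months (Moore's Law), DRAM speed every 120 months.

def annual_growth(doubling_months: float) -> float:
    """Return the per-year growth factor for a given doubling period."""
    return 2.0 ** (12.0 / doubling_months)

for name, months in [("database demand", 9), ("processor", 18), ("DRAM", 120)]:
    print(f"{name:16s}: x{annual_growth(months):.2f} per year")

# Output: demand ~2.52x/yr, processor ~1.59x/yr, DRAM ~1.07x/yr.
# Demand outruns both, so the Database-Processor and Database-Memory
# gaps grow every year.
```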
Fewer DRAMs/System over Time
While the Processor-Memory Performance Gap has widened to the point where it dominates performance for many applications, the cumulative effect of two decades of 60%-per-year improvement in DRAM capacity has resulted in huge individual DRAM chips. This has put the DRAM industry in something of a bind: the memory capacity per DRAM is growing at a rate of 60% per year, but the minimum memory required per system is growing at only 25%-30% per year.
Figure 2.4
Figure 2.4 shows that over time the number of DRAM chips required for a reasonably configured PC has been shrinking. The required minimum memory size, reflecting application and operating system memory usage, has been growing at only about half to three-quarters the rate of DRAM chip capacity. For example, consider a word processor that requires 8MB; if its memory needs had increased at the rate of DRAM chip capacity growth, that word processor would have had to fit in 80KB in 1986 and 800 bytes in 1976. The result of this prolonged rapid improvement in DRAM capacity is that fewer DRAM chips are needed per PC, to the point where soon many PC customers may require only a single DRAM chip. Unused memory bits also increase effective cost. So customers may no longer automatically switch to the larger-capacity DRAM as soon as the next generation matches the same cost per bit in the same organization, because 1) the minimum memory increment may be much larger than needed, 2) the larger-capacity DRAM will need to be in a wider configuration that is more expensive per bit than the narrow version of the smaller DRAM, or 3) the wider configuration does not match the width needed for error checking and hence results in even higher costs.
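The word-processor example can be verified directly. The sketch below walks the 8MB requirement backwards at the DRAM growth rate (quadrupling every 3 years, roughly 100x per decade), assuming the 8MB figure refers to the mid-1990s:

```python
# Sketch: scaling an 8MB (mid-1990s) word processor backwards at the
# rate of DRAM capacity growth (4x every 3 years, ~100x per decade).

need_now = 8 * 1024 * 1024           # 8 MB in bytes
per_decade = 4.0 ** (10.0 / 3.0)     # ~101.6x capacity growth per decade

print(f"one decade back : {need_now / per_decade / 1024:6.1f} KB")   # ~80 KB
print(f"two decades back: {need_now / per_decade**2:6.0f} bytes")    # ~800 bytes

# Real application growth (25-30%/yr) is only about half the DRAM
# growth rate (~60%/yr), which is why ever fewer chips fill a PC.
```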
4. IRAM - Architecture
Key Technologies
The key technologies behind IRAM are: 1) Vector Processing, 2) Embedded DRAM, and 3) Serial I/O.
Vector Processing
High-speed microprocessors rely on instruction-level parallelism (ILP) in programs, meaning the hardware looks for short instruction sequences that can execute in parallel. As mentioned above, these high-speed microprocessors also rely on cache hits to supply instructions and operands at a rate sufficient to keep the processor busy.
An alternative model for exploiting parallelism that does not rely on caches is vector processing. It is a well-established architecture and compiler model that was popularized by supercomputers, and it is considerably older than superscalar execution. Vector processors provide high-level operations that work on linear arrays of numbers.
Advantages of vector computers and the vectorized programs run on them include:
1. Each result is independent of previous results, which enables deep pipelines and high clock rates.
2. A single vector instruction does a great deal of work, which means fewer instruction fetches in
general and fewer branch instructions and so fewer mispredicted branches.
3. Vector instructions often access memory a block at a time, which allows memory latency to be
amortized over, say, 64 elements.
4. Vector instructions often access memory with regular (constant stride) patterns, which allows
multiple memory banks to simultaneously supply operands.
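The difference in instruction count is easy to sketch. The snippet below contrasts an element-at-a-time scalar loop with strip-mined vector execution, emulated in plain Python with a hypothetical vector length of 64:

```python
# Sketch: scalar vs. vector execution of C = A + B (emulated in Python).

VLEN = 64  # hypothetical hardware vector length

a = list(range(1024))
b = list(range(1024))

# Scalar model: one add per loop iteration -> 1024 instruction issues,
# and 1024 chances to mispredict the loop branch.
c_scalar = []
for i in range(len(a)):
    c_scalar.append(a[i] + b[i])

# Vector model: one vector-add per 64-element strip -> only 16 issues;
# each instruction touches a block of memory, amortizing latency over
# 64 elements.
c_vector = []
for base in range(0, len(a), VLEN):
    strip = [a[i] + b[i] for i in range(base, base + VLEN)]  # one "vadd"
    c_vector.extend(strip)

assert c_scalar == c_vector
print(f"scalar issues: {len(a)}, vector issues: {len(a) // VLEN}")
```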
Figure 4.1
Figure 4.1 shows the vector processing model, in which the difference between scalar and vector instructions is represented schematically. In scalar processing the operations are carried out sequentially, while in vector processing a number of operations are carried out in parallel, the number depending on the vector length of the processor. Vector processing is therefore much faster than scalar processing.
Figure 4.2
The 32 flag registers vf0, vf1, ..., vf31 support the execution of floating-point instructions. There are also 32 control registers vcr0, vcr1, ..., vcr31 that control the instruction execution carried out by the processor.
Embedded DRAM
The embedded DRAM in IRAM is realized through embedded technology, whereby a chip is embedded into a device to control and carry out the operations of that device. Chip embedding is usually done in devices handled by ordinary people, who never need to interact with the chip directly; instead they operate and control the device through the embedded chip.
Figure 4.3
Figure 4.3 shows how embedded technology is used in the manufacturing of IRAM. During fabrication the memory is embedded alongside the microprocessor, so IRAM becomes a single chip into which both memory and processor are integrated, and their coexistence yields high performance.
4. The memory flexibility of IRAM is due to the embedded technology used in its manufacturing process. Designers can specify the exact length and width of the DRAM, since it is not restricted to powers of 2, so embedded DRAM offers system memory size benefits; a small illustration follows.
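A small illustration of that sizing benefit (the 1.5 Mbit requirement below is a made-up example, not a figure from the text):

```python
# Sketch: memory wasted by power-of-2 rounding vs. exact embedded sizing.
# The 1.5 Mbit requirement is a hypothetical example.

required_bits = 1_500_000  # e.g. a buffer needing ~1.5 Mbit

conventional = 1
while conventional < required_bits:   # round up to the next power of 2
    conventional *= 2

embedded = required_bits              # embedded DRAM can match exactly

waste = conventional - required_bits
print(f"conventional: {conventional} bits ({100 * waste / conventional:.0f}% unused)")
print(f"embedded    : {embedded} bits (0% unused)")
```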
Serial I/O
Because of the poor performance of parallel I/O in both bandwidth and scaling, the I/O system of IRAM uses a much more efficient and cost-effective technology, the serial I/O system. It enhances the performance of IRAM without hindering the memory and processor by offering a smooth and faster path for data transfer.
Figure 4.4
Figure 4.4 shows a schematic representation of the interaction between IRAM and the I/O devices, which takes place through the serial I/O lines shown in the figure. Owing to the high bandwidth offered by the serial I/O lines, data transfer in IRAM is much faster and more efficient, which enhances its performance.
3. Serial I/O bandwidth can be scaled incrementally to increase bandwidth and efficiency. Scaling does not increase the pin count, which makes it very cost-effective, whereas scaling parallel I/O increases both the number of pins and the cost of production (a rough comparison follows this list).
4. The power consumption of a serial I/O system is very low compared to that of a parallel I/O system, so serial I/O enhances the performance of IRAM while also reducing power consumption, making it suitable for low-power devices.
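A rough comparison with illustrative numbers (the bus widths, clock rate and lane speed below are assumptions for the sake of the sketch, not figures from the text):

```python
# Sketch: scaling bandwidth on a parallel bus vs. serial lanes.
# All widths, clock rates and lane speeds below are illustrative.

def parallel_pins(width_bits: int) -> int:
    return width_bits            # one pin per data bit (control pins ignored)

def parallel_bw(width_bits: int, clock_mhz: float) -> float:
    return width_bits * clock_mhz * 1e6 / 8 / 1e9   # GB/s

def serial_bw(lanes: int, gbit_per_lane: float) -> float:
    return lanes * gbit_per_lane / 8                # GB/s, ~2 pins per lane

for width in (32, 64, 128):
    print(f"parallel {width:3d}-bit @ 100 MHz : "
          f"{parallel_bw(width, 100):5.2f} GB/s, {parallel_pins(width)} pins")

for lanes in (1, 2, 4):
    print(f"serial   {lanes} lane(s) @ 2.5 Gb/s: "
          f"{serial_bw(lanes, 2.5):5.2f} GB/s, {2 * lanes} pins")

# Doubling a parallel bus's width doubles its pins; a serial link scales
# by adding a lane (2 pins) or raising the symbol rate (0 extra pins).
```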
Figure 4.5
Figure 4.6
The 2-way superscalar processor handles scalar processing; it is called 2-way because control logic issues FPU (Floating Point Unit) operations and LSC (Load/Store/Coprocessor) operations separately. The vector registers hold the operands for vector instruction execution. Vector instructions issued by the 2-way superscalar processor are queued in the vector instruction queue unit. The arithmetic and logic units execute the arithmetic and logic instructions issued by both the superscalar processor and the vector unit, while the load/store unit carries out the various memory load and store operations. The serial I/O lines connect IRAM to the input and output devices. The memory crossbar switch acts as the link between memory, processor and input-output devices for their mutual interactions. The integrated architecture makes the memory and processor coexist and perform as a single unit. Because of the high level of interaction between memory, processor and I/O devices, the performance of IRAM is very high compared to a separate-chip processor-memory combination. This unified architecture from a single fabrication line is in fact the secret behind IRAM's excellent performance in both processing- and memory-intensive operations.
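As a rough mental model of that flow (component names follow the figure, but the behavior below is a deliberate simplification, not the actual VIRAM pipeline):

```python
# Sketch: a toy model of the instruction flow described above. The scalar
# core enqueues vector instructions; the vector unit drains the queue and
# reaches the on-chip DRAM through the memory crossbar switch.

from collections import deque

class MemoryCrossbar:
    def __init__(self, dram_words):
        self.dram = [0] * dram_words                 # on-chip DRAM
    def load(self, base, n):
        return self.dram[base:base + n]
    def store(self, base, vals):
        self.dram[base:base + len(vals)] = vals

class VectorUnit:
    def __init__(self, xbar):
        self.xbar, self.vreg = xbar, {}
    def execute(self, instr):
        op, args = instr[0], instr[1:]
        if op == "vload":                            # memory -> vector register
            dst, base, n = args
            self.vreg[dst] = self.xbar.load(base, n)
        elif op == "vadd":                           # elementwise add
            dst, s1, s2 = args
            self.vreg[dst] = [x + y for x, y in zip(self.vreg[s1], self.vreg[s2])]
        elif op == "vstore":                         # vector register -> memory
            src, base = args
            self.xbar.store(base, self.vreg[src])

xbar = MemoryCrossbar(256)
vunit = VectorUnit(xbar)
xbar.store(0, list(range(8)))
xbar.store(8, [10] * 8)

# The scalar core issues vector instructions into the instruction queue.
queue = deque([("vload", "v1", 0, 8), ("vload", "v2", 8, 8),
               ("vadd", "v3", "v1", "v2"), ("vstore", "v3", 16)])
while queue:                                         # vector unit drains it
    vunit.execute(queue.popleft())
print(xbar.load(16, 8))                              # [10, 11, ..., 17]
```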
5. IRAM - Benchmarking
Benchmarking is a process, or group of processes, by which one can reach a sound decision about the performance and efficiency of a product or technology in comparison with other products or technologies. Benchmarking helps us find the product or technology that is appropriate for our requirements, and so provides solid evidence of performance and efficiency relative to the alternatives.
Benchmarking Environment
The benchmarking environment is the environment where the processes are carried out to arrive at a sound decision. It may include various products based on the same or different technologies, depending on the type and requirements of the benchmark. In this case we benchmark IRAM against other processor-memory combinations to show IRAM's real potential. The environment therefore consists of various processors from different manufacturers combined with memory modules from different sources, whereas IRAM is itself a single unit manufactured in a single fabrication process. Table 5.1 lists the processors selected and the memory modules used for the benchmarking.
Table 5.1
The processors used for the benchmarking are Sun's SPARC, the MIPS R10K (as used in SGI's Origin systems), Intel's Pentium III and Pentium 4, the Alpha EV6, and IRAM from Berkeley. The clock frequencies of the processors, their L1 and L2 caches and the memory modules used with them are all listed in Table 5.1. Note that the processor with the smallest L1 cache, no L2 cache at all, and the smallest memory capacity is IRAM; all the other processors have larger L1 and L2 caches and more memory for their operations.
Benchmarking Processes
The various benchmarking processes carried out are:
1. Transitive Closure: The first benchmark problem is to compute the transitive closure of a directed graph in a dense representation (a sketch follows this list). The code, taken from the DIS reference implementation, used non-unit stride but was easily changed to unit stride. This benchmark performs only 2 arithmetic operations (an addition and a subtraction) at each step, while it executes 2 loads and 1 store.
4. Histogram: Computing a histogram of a set of integers can be used for sorting and in some image-processing problems. Two important considerations govern the algorithmic choice: the number of buckets, b, and the likelihood of duplicates. For image processing the number of buckets is large and collisions are common, because an image typically contains many occurrences of certain colors (e.g., white). Histogram is nearly identical to GUPS in its memory behavior, but differs in the possibility of collisions, which limit parallelism and are particularly challenging in a data-parallel model.
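Here is a minimal dense transitive-closure sketch in the Floyd-Warshall style, with the unit-stride inner loop that item 1 describes; the DIS reference code's exact arithmetic (an add and a subtract per step) may differ from this boolean formulation:

```python
# Sketch: dense transitive closure with a unit-stride inner loop
# (Floyd-Warshall pattern). reach[i][j] = 1 if j is reachable from i.

def transitive_closure(adj):
    n = len(adj)
    reach = [row[:] for row in adj]          # dense representation
    for k in range(n):
        rk = reach[k]
        for i in range(n):
            ri = reach[i]
            if ri[k]:
                for j in range(n):           # unit stride: vector-friendly
                    if rk[j]:
                        ri[j] = 1
    return reach

adj = [[0, 1, 0],
       [0, 0, 1],
       [0, 0, 0]]
print(transitive_closure(adj))   # node 0 now reaches node 2 as well
```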
Figure 5.1
GUPS
For GUPS, those conducting the benchmark tidied up the compiler-generated assembly instructions for the inner loops, which produced a 20-60% speedup.
Figure 5.2
In addition to the MOP rate, it is interesting to observe the memory bandwidth consumed by this problem. GUPS achieves 1.77, 2.36, 3.54, and 4.87 GB/s of memory bandwidth on IRAM at 8-, 16-, 32-, and 64-bit data widths, respectively; the last figure is about 76% of the peak memory bandwidth of 6.4 GB/s.
Sparse Matrix-Vector Multiplication (SPMV)
The compressed row storage (CRS) format keeps an array of column indices and non-zero values for each row; SPMV is then performed as a series of sparse dot products. The performance on IRAM is better than on some cache-based machines, but it suffers from a lack of parallelism: the dot product is performed by recursive halving, so vectors start with an average of 18 elements and shrink from there. Both the P4 and the EV6 exceed IRAM's performance for this reason. CRS-banded uses the same format and algorithm as CRS but reflects a different nonzero structure, one that would likely result from bandwidth-reduction orderings such as reverse Cuthill-McKee (RCM). This has little effect on IRAM, but improves the cache hit rate on some of the other machines.
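A compact CRS sparse matrix-vector multiply, in a standard formulation (the three-array layout below is the usual one and is assumed rather than taken from the benchmark code):

```python
# Sketch: sparse matrix-vector multiply (y = A*x) in CRS format.
# CRS stores, per row, a slice of column indices and nonzero values.

def spmv_crs(row_ptr, col_idx, values, x):
    y = []
    for r in range(len(row_ptr) - 1):
        lo, hi = row_ptr[r], row_ptr[r + 1]
        # One sparse dot product per row; its length is the row's nonzero
        # count, so short rows mean short vectors on a vector machine.
        y.append(sum(values[k] * x[col_idx[k]] for k in range(lo, hi)))
    return y

# 3x3 example:  [[2, 0, 1],
#                [0, 3, 0],
#                [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values  = [2.0, 1.0, 3.0, 4.0, 5.0]
print(spmv_crs(row_ptr, col_idx, values, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```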
Figure 5.3
The Ellpack (or Itpack) format forces all rows to have the same length by padding them with zeros. It still has indexed memory operations, but it increases the available data parallelism through vectorization across rows. The raw Ellpack performance is excellent, and this format should be used on IRAM and the PIII for matrices whose longest row length is close to the average. If we instead measure effective performance (eff), which discounts operations performed on padded zeros, the efficiency can be arbitrarily poor; indeed, for the randomly generated DIS matrix the padding causes an enormous increase in matrix size and operation count, making the format impractical. The Segmented-sum algorithm was first proposed for the Cray PVP. The data structure is an augmented form of the CRS format, and the computational structure is similar to Ellpack, although there is additional control complexity. The underlying Ellpack algorithm was modified to convert roughly 2/3 of the memory accesses from large stride to unit stride; the remaining 1/3 are still indexed references. This was important on IRAM because we are using 32-bit data and have only 4 address generators, as discussed above.
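The Ellpack layout itself is easy to sketch (again a standard formulation, not the benchmark's exact code); note how the inner loop of the multiply runs across rows, which is precisely the extra data parallelism described above:

```python
# Sketch: Ellpack/Itpack SpMV. Rows are padded to equal length so the
# inner loop below touches one element of every row at once; that
# column-wise slice is what vectorizes across rows.

def to_ellpack(rows):
    """rows: list of (col, val) lists -> padded index/value tables."""
    width = max(len(r) for r in rows)              # longest row wins
    idx = [[c for c, _ in r] + [0] * (width - len(r)) for r in rows]
    val = [[v for _, v in r] + [0.0] * (width - len(r)) for r in rows]
    return idx, val, width

def spmv_ellpack(idx, val, width, x):
    y = [0.0] * len(idx)
    for j in range(width):                         # over padded columns
        for i in range(len(idx)):                  # vectorizable across rows
            y[i] += val[i][j] * x[idx[i][j]]       # padded zeros add nothing
    return y

rows = [[(0, 2.0), (2, 1.0)], [(1, 3.0)], [(0, 4.0), (2, 5.0)]]
idx, val, w = to_ellpack(rows)
print(spmv_ellpack(idx, val, w, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```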
Histogram
This benchmark builds a histogram of the pixels in a 500x500 image from the DIS specification. The number of buckets depends on the number of bits in each pixel, so we use the base-2 logarithm (i.e., the pixel depth) as the parameter in our study. Performance results for pixel depths of 7, 11, and 15 are shown in Figure 5.4. The first five sets are for IRAM; all but the second (Retry 0%) use this image data set. The first set (Retry) uses the compiler's default vectorization algorithm, which vectorizes while ignoring duplicates and corrects the duplicates in a serial phase at the end. This works well when there are few duplicates, but performs poorly in our case. The second set (Retry 0%) shows the performance of the same algorithm on data containing no duplicates. The third set (Priv) makes several private copies of the buckets and merges the copies at the end; it performs poorly due to the large number of buckets and gets worse as this number grows with the pixel depth.
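A sketch of the privatized approach (the lane count below is a made-up stand-in for the machine's vector lanes): each lane updates its own private copy of the buckets, so duplicates never collide, but the final merge costs work proportional to copies times buckets, which is why Priv degrades as the pixel depth grows.

```python
# Sketch: privatized histogram. Each lane owns a private bucket array,
# so duplicate values never collide; the price is merging all the copies.

LANES = 8  # stand-in for the machine's vector lanes (illustrative)

def histogram_privatized(pixels, n_buckets):
    private = [[0] * n_buckets for _ in range(LANES)]
    for i, p in enumerate(pixels):
        private[i % LANES][p] += 1        # conflict-free: one copy per lane
    # Merge phase: LANES * n_buckets work -- the term that blows up
    # as pixel depth (and therefore n_buckets) increases.
    return [sum(copy[b] for copy in private) for b in range(n_buckets)]

pixels = [1, 1, 2, 7, 1, 0, 7, 7]
print(histogram_privatized(pixels, 8))   # [1, 3, 1, 0, 0, 0, 0, 3]
```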
Figure 5.4
The fourth and fifth algorithms use a more sophisticated sort-diff-find-diff algorithm that performs in-register sorting. Bitonic sort was used because its communication requirements are regular, and it proved to be a good match for IRAM's "butterfly" permutation instructions, designed primarily for reductions and FFTs. The compiler automatically generates in-register permutation code for reductions, but the sorting algorithm used here was hand-coded. The two sort algorithms differ in the data width they allow: one works when the width is less than 16 bits and the other when it is up to 32 bits. The narrower version takes advantage of IRAM's higher arithmetic performance on narrow data. The results show that on IRAM the sort-based and privatized optimization methods consistently give the best performance over the range of bit depths, and they demonstrate the improvements obtainable when the algorithm is tailored to shorter bit depths. Overall, IRAM does not do as well here as on the other benchmarks, because the presence of duplicates hurts vectorization, whereas duplicates can actually improve cache hits on cache-based machines. We therefore see excellent timings for the histogram computation on those machines without any special optimizations. A memory-system advantage starts to appear at 15-bit pixels, where the histograms no longer fit in cache; at this point IRAM's performance is comparable to the faster microprocessors.
Mesh Adaptation
This benchmark performs a single level of refinement starting with a mesh of 4802 triangular elements, 2500 vertices, and 7301 edges. In this application we use a different algorithm organization for the different machines: the original code was designed for conventional processors and is used on those machines, while the vector algorithm uses more memory bandwidth but contains smaller loop bodies, which helps the compiler perform vectorization. The vectorized code also pre-sorts the mesh points to avoid branches in the inner loop, as in Histogram. Although the branches hurt superscalar performance, pre-sorting is too expensive on those machines. Mesh adaptation also requires indexed memory operations, so address generation again limits IRAM.
Figure 5.5
Figure 5.5 shows the performance of the processors on Mesh Adaptation. It indicates that IRAM performed well compared to the other processors; the only real competitor was the Intel Pentium 4. IRAM thus emerged as the clear winner in this processing-intensive benchmark.
Figure 5.6 shows the memory bandwidth consumed by IRAM and the MOPS rate achieved on each of the benchmarks using the best algorithm on the most challenging input. GUPS uses the 64-bit version of the problem, SPMV uses the segmented-sum algorithm, and Histogram uses the 16-bit sort. While all of these problems have low operation counts per memory operation, the memory and operation rates are quite different in practice. Of these benchmarks, GUPS is the most memory-intensive, whereas Mesh Adaptation is the least. Histogram, SPMV and Transitive Closure have roughly the same balance between computation and memory, although their absolute performance varies dramatically due to differences in parallelism. In particular, although GUPS and Histogram are nearly identical in their memory characteristics, the difference in parallelism results in very different absolute performance as well as in the ratio of bandwidth to operation rate.
Figure 5.7 summarizes the performance of each benchmark across machines. The y-axis is a log scale, and IRAM is significantly faster than the other machines on all applications except SPMV and Histogram.
An even more dramatic picture emerges from the MOPS/Watt ratio, shown in Figure 5.8. Most of the cache-based machines exploit only a small amount of parallelism but spend a great deal of power on a high clock rate; indeed, a graph of Flops per machine cycle is very similar. Only the Pentium III, designed for portable machines, has comparable power consumption, at 4 Watts versus IRAM's 2 Watts. The Pentium III cannot compete on performance, however, due to its lack of parallelism.
Figure 5.6
Figure 5.7
Figure 5.8
6. Advantages of IRAM
Low Latency
To reduce latency, the wire length should be kept as short as possible. This suggests
the fewer bits per block the better. In addition, the DRAM cells furthest away from the processor
will be slower than the closest ones. Rather than restricting the access timing to accommodate
the worst case, the processor could be designed to be aware when it is accessing “slow” or “fast”
memory. Some additional reduction in latency can be obtained simply by not multiplexing the
address as there is no reason to do so on an IRAM. Also, being on the same chip with the
DRAM, the processor avoids driving the off chip wires, potentially turning around the data bus,
and accessing an external memory controller. In summary, the access latency of an IRAM
processor does not need to be limited by the same constraints as a standard DRAM part. Much
lower latency may be obtained by intelligent floor planning, utilizing faster circuit topologies,
and redesigning the address/data bussing schemes.
A memory latency of less than 30 ns for random addresses is possible with a latency-oriented DRAM design on the same chip as the processor; this is as fast as second-level caches. Recall that the memory latency of the AlphaServer 8400 is 253 ns.
IRAM offers performance opportunities for applications with unpredictable memory accesses and very large memory "footprints", such as databases, which may take advantage of the potential 5X to 10X decrease in latency. The lower latency of IRAM comes from having no external DRAM parts or memory controller, no long bus to turn around, and far fewer pins to drive. The latency of an IRAM chip is 10-15 ns for a 64-128 MB memory capacity, which is very low compared to other memory chips; the arithmetic below checks these figures against the claimed range.
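A quick check using only the latency figures quoted above:

```python
# Sketch: checking the quoted latency advantage against the text's numbers.
alpha_8400_ns = 253          # AlphaServer 8400 memory latency (from text)
iram_ns = (10, 15, 30)       # IRAM latency figures quoted in the text

for t in iram_ns:
    print(f"{alpha_8400_ns} ns / {t} ns = {alpha_8400_ns / t:.1f}x")

# The conservative 30 ns design point gives ~8.4x, inside the 5X-10X
# range claimed above; the 10-15 ns chip figures give ~17-25x.
```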
High Bandwidth
A DRAM naturally has extraordinary internal bandwidth, essentially fetching the square root of its capacity each DRAM clock cycle; an on-chip processor can tap that bandwidth. The potential bandwidth of a gigabit DRAM is even greater than its logical organization indicates. Since it is important to keep the storage cell small, the normal solution is to limit the length of the bit lines, typically to 256 to 512 bits per sense amp. This quadruples the number of sense amplifiers. To save die area, each block has a small number of I/O lines, which reduces the internal bandwidth by a factor of about 5 to 10 but still meets the external demand. One IRAM goal is to capture a larger fraction of the potential on-chip bandwidth. For example, two prototype 1-gigabit DRAMs were presented at ISSCC in 1996. As mentioned above, to cope with the long wires inherent in the 600 mm2 dies of gigabit DRAMs, vendors are using more metal layers: 3 for Mitsubishi and 4 for Samsung. The two designs contain 512 2-Mbit memory modules and 1024 1-Mbit modules on chip, respectively. Thus a gigabit IRAM might have 1024 memory modules, each 1K bits wide. Not only would there be tremendous bandwidth at the sense amps of each block; the extra metal layers enable more cross-chip bandwidth. Assuming a 1-Kbit metal bus needs just 1 mm, a 600 mm2 IRAM might have 16 busses running at 50 to 100 MHz. Thus the internal IRAM bandwidth should be as high as 200-300 GBytes/sec. For comparison, the sustained memory bandwidth of the AlphaServer 8400, which includes a 75 MHz, 256-bit memory bus, is 1.2 GBytes/sec. The crossbar switch in the IRAM architecture delivers only 1/3 to 2/3 of the theoretical bandwidth, so the actual bandwidth will be 100-200 GB/sec. Applications with predictable memory accesses, such as matrix manipulations, may take advantage of the potential 50X to 100X increase in IRAM bandwidth.
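The bus arithmetic behind these estimates is straightforward; the sketch below reproduces it from the stated assumptions (16 busses, 1 Kbit each, 50-100 MHz):

```python
# Sketch: internal bandwidth implied by the on-chip bus assumptions above.
BUSSES = 16
BUS_BITS = 1024                      # each metal bus is 1 Kbit wide
for clock_mhz in (50, 100):
    gbytes_per_s = BUSSES * BUS_BITS * clock_mhz * 1e6 / 8 / 1e9
    print(f"{clock_mhz} MHz: {gbytes_per_s:6.1f} GB/s")

# ~102 GB/s at 50 MHz and ~205 GB/s at 100 MHz of raw bus bandwidth.
# Even after the crossbar's 1/3-to-2/3 factor, the usable bandwidth is
# far above the AlphaServer 8400's 1.2 GB/s sustained bandwidth,
# consistent with the 50X-100X claim.
```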
Energy Efficiency
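A simple three-term model of the energy per memory access, of the kind the following paragraph assumes (the symbols are an illustrative reconstruction: the E terms are per-access energies and the m terms are miss rates), is:

```latex
% Illustrative reconstruction: E = per-access energies, m = miss rates.
E_{\mathrm{access}} \;=\; E_{L1} \;+\; m_{L1}\,E_{L2} \;+\; m_{L1}\,m_{L2}\,E_{\mathrm{offchip}}
```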
The main contributing term in the above equation is the off-chip access energy, which for IRAM vanishes from the equation along with the second (L2) term, since IRAM has no L2 cache and its memory is on chip. The energy per memory access is therefore essentially just the L1 access energy, which is itself lower than the L1 access energy of other microprocessor chips. The energy consumption of IRAM chips is thus very low, which makes them well suited to low-power and portable devices.
Memory Flexibility
Another advantage of IRAM over conventional designs is the ability to adjust both
the size and width of the on-chip DRAM. Rather than being limited by powers of 2 in length or
width, as is conventional DRAM, IRAM designers can specify exactly the number of words and
their width. This flexibility can improve the cost of IRAM solutions versus memories made from
conventional DRAMs.
7. Disadvantages of IRAM
8. Conclusion
Merging a microprocessor and DRAM on the same chip presents opportunities in
performance, energy efficiency, and cost: a factor of 5 to 10 reduction in latency, a factor of 50
to 100 increase in bandwidth, a factor of 2 to 4 advantage in energy efficiency, and an
unquantified cost savings by removing superfluous memory and by reducing board area. The
surprise is that these claims are not based on some exotic, unproven technology; they are based instead on tapping the potential of a technology in use for the last 10 years. The popularity of
IRAM is only limited by the amount of memory on-chip, which should expand by about 60% per
year. A best case scenario would be for IRAM to expand its beachhead in graphics, which
requires about 10 Mbits, to the game, embedded, and personal digital assistant markets, which
require about 32 Mbits of storage. Such high volume applications could in turn justify creation of
a process that is more friendly to IRAM, with DRAM cells that are a little bigger than in a
DRAM fabrication but much more amenable to logic and SRAM. As IRAM grows to 128 to 256
Mbits of storage, an IRAM might be adopted by the network computer or portable PC markets.
Such a success could in turn entice either microprocessor manufacturers to include substantial
DRAM on chip, or DRAM manufacturers to include processors on chip. Hence IRAM presents
an opportunity to change the nature of the semiconductor industry. From the current division into
logic and memory camps, a more homogeneous industry might emerge with historical
microprocessor manufacturers shipping substantial amounts of DRAM - just as they ship
substantial amounts of SRAM today - or historical DRAM manufacturers shipping substantial
numbers of microprocessors. Both scenarios might even occur, with one set of manufacturers
oriented towards high performance and the other towards low cost. With its potential, IRAM can also usher in a new generation of computers with increased portability and reduced size and power consumption, without compromising performance or efficiency.