Chapter # 1 COAL
FIGURE 1.1 The number manufactured per year of tablets and smart phones, which reflect the PostPC
era, versus personal computers and traditional cell phones. Smart phones represent the recent growth in
the cell phone industry, and they passed PCs in 2011. Tablets are the fastest growing category, nearly
doubling between 2011 and 2012. Recent PCs and traditional cell phone categories are relatively flat or
declining.
Cloud Computing
o Warehouse Scale Computers (WSC): Companies like Amazon and Google build
these WSCs containing 100,000 servers and then let companies rent portions of
them so that they can provide software services to PMDs without having to build
WSCs of their own.
We now introduce eight great ideas that computer architects have invented in
the last 60 years of computer design.
These ideas are so powerful they have lasted long after the first computer that used
them, with newer architects demonstrating their admiration by imitating their
predecessors.
7. Hierarchy of Memories
o Architects have found that they can address these conflicting demands with a
hierarchy of memories with the fastest, smallest, and most expensive memory
per bit at the top of the hierarchy and the slowest, largest, and cheapest per bit at
the bottom.
FIGURE 1.3 A simplified view of hardware and software as hierarchical layers, shown as concentric
circles with hardware in the center and applications software outermost. In complex applications, there
are often multiple layers of application software as well. For example, a database system may run on
top of the systems software hosting an application, which in turn runs on top of the database.
FIGURE 1.4 C program compiled into assembly language and then assembled into binary machine
language. Although the translation from high-level language to binary machine language is shown in
two steps, some compilers cut out the middleman and produce binary machine language directly.
These languages and this program are examined in more detail in Chapter 2.
The five classic components of a computer are input, output, memory, datapath,
and control, with the last two (datapath and control) sometimes combined and called
the processor.
o Input device: A mechanism through which the computer is fed information, such
as a keyboard.
o Output device: A mechanism that conveys the result of a computation to a user,
such as a display, or to another computer, such as a network adapter.
o Memory: The storage area in which programs are kept when they are running and
that contains the data needed by the running programs. Main memory is typically
DRAM; hard disks, CD/DVDs, and flash memory provide secondary storage.
o Central processor unit (CPU): Also called the processor. The active part of the
computer, which contains the datapath and control and which adds numbers, tests
numbers, signals I/O devices to activate, and so on.
o Datapath: The component of the processor that performs arithmetic operations.
o Control: The component of the processor that commands the datapath, memory,
and I/O devices according to the instructions of the program.
FIGURE 1.5 The organization of a computer, showing the five classic components. The processor
gets instructions and data from memory. Input writes data to memory, and output reads data from
memory. Control sends the signals that determine the operations of the datapath, memory, input, and
output.
FIGURE 1.6 Each coordinate in the frame buffer on the left determines the shade of the
corresponding coordinate for the raster scan CRT display on the right. Pixel (X0, Y0) contains the bit
pattern 0011, which is a lighter shade on the screen than the bit pattern 1101 in pixel (X1, Y1).
Touchscreen
While PCs also use LCD displays, the tablets and smartphones of the PostPC era
have replaced the keyboard and mouse with touch-sensitive displays, which have the
wonderful user interface advantage of users pointing directly at what they are
interested in rather than indirectly with a mouse.
FIGURE 1.7 Components of the Apple iPad 2 A1395. The metal back of the iPad (with the reversed
Apple logo in the middle) is in the center. At the top is the capacitive multitouch screen and LCD
display. To the far right is the 3.8 V, 25 watt-hour, polymer battery, which consists of three Li-ion cell
cases and offers 10 hours of battery life. To the far left is the metal frame that attaches the LCD to the
back of the iPad. The small components surrounding the metal back in the center are what we think of
as the computer; they are often L-shaped to fit compactly inside the case next to the battery. Figure 1.8
shows a close-up of the L-shaped board to the lower left of the metal case, which is the logic printed
circuit board that contains the processor and the memory. The tiny rectangle below the logic board
contains a chip that provides wireless communication: Wi-Fi, Bluetooth, and FM tuner. It fits into a
small slot in the lower left corner of the logic board. Near the upper left corner of the case is another L-
shaped component, which is a front-facing camera assembly that includes the camera, headphone jack,
and microphone. Near the right upper corner of the case is the board containing the volume control and
silent/screen rotation lock button along with a gyroscope and accelerometer. These last two chips
combine to allow the iPad to recognize 6-axis motion. The tiny rectangle next to it is the rear-facing
camera. Near the bottom right of the case is the L-shaped speaker assembly. The cable at the bottom is
the connector between the logic board and the camera/volume control board. The board between the
cable and the speaker assembly is the controller for the capacitive touchscreen. (Courtesy iFixit,
www.ifixit.com)
FIGURE 1.9 The processor integrated circuit inside the A5 package. The size of the chip is 12.1 by 10.1
mm, and it was manufactured originally in a 45-nm process (see Section 1.5). It has two identical
ARM processors or cores in the middle left of the chip and a PowerVR graphical processor unit (GPU)
with four datapaths in the upper left quadrant. To the left and bottom side of the ARM cores are
interfaces to main memory (DRAM). (Courtesy Chipworks, www.chipworks.com)
FIGURE 1.11 Growth of capacity per DRAM chip over time. The y-axis is measured in kibibits (2^10
bits). The DRAM industry quadrupled capacity almost every three years, a 60% increase per year, for
20 years. In recent years, the rate has slowed down and is somewhat closer to doubling every two to
three years.
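The growth rates quoted in this caption can be checked with a little arithmetic. The sketch below is illustrative only; the helper name `annual_growth` is an assumption introduced here, not something from the figure.

```python
def annual_growth(factor: float, years: float) -> float:
    """Annual growth rate implied by multiplying capacity by `factor` every `years` years."""
    return factor ** (1.0 / years) - 1.0


# Quadrupling every three years: roughly the caption's "60% increase per year".
quadruple_every_3 = annual_growth(4, 3)

# The slower recent pace: doubling every two years or every three years.
double_every_2 = annual_growth(2, 2)
double_every_3 = annual_growth(2, 3)

print(f"4x per 3 years: {quadruple_every_3:.1%} per year")
print(f"2x per 2 years: {double_every_2:.1%} per year")
print(f"2x per 3 years: {double_every_3:.1%} per year")
```

Quadrupling every three years works out to a little under 60% per year, while doubling every two to three years corresponds to roughly 26% to 41% per year.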
FIGURE 1.12 The chip manufacturing process. After being sliced from the silicon ingot, blank
wafers are put through 20 to 40 steps to create patterned wafers (see Figure 1.13). These patterned
wafers are then tested with a wafer tester, and a map of the good parts is made. Then, the wafers are
diced into dies (see Figure 1.9). In this figure, one wafer produced 20 dies, of which 17 passed testing.
(X means the die is bad.) The yield of good dies in this case was 17/20, or 85%. These good dies are
then bonded into packages and tested one more time before shipping the packaged parts to customers.
One bad packaged part was found in this final test.
FIGURE 1.13 A 12-inch (300 mm) wafer of Intel Core i7 (Courtesy Intel). The number of dies on
this 300 mm (12 inch) wafer at 100% yield is 280, each 20.7 by 10.5 mm. The several dozen partially
rounded chips at the boundaries of the wafer are useless; they are included because it’s easier to create
the masks used to pattern the silicon. This die uses a 32-nanometer technology, which means that the
smallest features are approximately 32 nm in size, although they are typically somewhat smaller than
the actual feature size, which refers to the size of the transistors as “drawn” versus the final
manufactured size.
Clock cycle: Also called tick, clock tick, clock period, clock, or cycle. The time for
one clock period, usually of the processor clock, which runs at a constant rate.
Performance improved by
o Reducing number of clock cycles
o Increasing clock rate
o Hardware designer must often trade off clock rate against cycle count
CPU Time Example: Our favorite program runs in 10 seconds on computer A, which
has a 2 GHz clock. We are trying to help a computer designer build a computer, B,
which will run this program in 6 seconds. The designer has determined that a
substantial increase in the clock rate is possible, but this increase will affect the rest of
the CPU design, causing computer B to require 1.2 times as many clock cycles as
computer A for this program. What clock rate should we tell the designer to target?
o Computer A: 2 GHz clock, 10sec CPU time
o Designing Computer B
Aim for 6sec CPU time
Can do faster clock, but causes 1.2 × clock cycles
o How fast must Computer B clock be?
o Answer: 4GHz
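The answer follows from CPU time = clock cycles / clock rate. The sketch below walks through the arithmetic with the numbers from the example; the function names are assumptions made for illustration.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """Clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles: float, target_time_s: float) -> float:
    """Clock rate = clock cycles / CPU time."""
    return cycles / target_time_s


# Computer A: 10 s at 2 GHz gives 20 x 10^9 clock cycles.
cycles_a = clock_cycles(10.0, 2e9)

# Computer B needs 1.2 times as many cycles and must finish in 6 s.
cycles_b = 1.2 * cycles_a
rate_b = required_clock_rate(cycles_b, 6.0)

print(f"Computer B clock rate: {rate_b / 1e9:.1f} GHz")
```

Computer B needs 24 x 10^9 cycles in 6 seconds, so its clock must run at 4 GHz, twice the rate of computer A.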
Clock cycles per instruction (CPI): Average number of clock cycles per instruction
for a program or program fragment.
o Instruction Count for a program
Determined by program, ISA and compiler
o Average cycles per instruction
Determined by CPU hardware
If different instructions have different CPI
Average CPI affected by instruction mix
CPI Example: Suppose we have two implementations of the same instruction set
architecture. Computer A has a clock cycle time of 250ps and a CPI of 2.0 for some
program, and computer B has a clock cycle time of 500ps and a CPI of 1.2 for the
same program. Which computer is faster for this program and by how much?
o Computer A: Cycle Time = 250ps, CPI = 2.0
o Computer B: Cycle Time = 500ps, CPI = 1.2
o Same instruction set architecture (ISA)
o Which is faster, and by how much?
\[
\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2
\]
o Answer: Computer A is faster, by a factor of 1.2
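Because both computers run the same number of instructions I, it is enough to compare the average time per instruction, CPI x clock cycle time. A minimal sketch of that comparison, with a helper name chosen here for illustration:

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * cycle_time_ps


# Values from the example; the instruction count I cancels in the ratio.
time_a = time_per_instruction_ps(2.0, 250.0)  # computer A: 500 ps per instruction
time_b = time_per_instruction_ps(1.2, 500.0)  # computer B: 600 ps per instruction

ratio = time_b / time_a
faster = "A" if time_a < time_b else "B"
print(f"Computer {faster} is faster by a factor of {ratio:.1f}")
```

Computer A spends 500 ps per instruction against 600 ps for computer B, so A is 1.2 times faster for this program.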
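\[
\text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{Instruction Count}_i \right)
\]
\[
\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)
\]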
CPI Example: A compiler designer is trying to decide between two code sequences
for a particular computer. The hardware designers have supplied the following facts:
Class A B C
For a particular high-level language statement, the compiler writer is considering two
code sequences that required the following instruction counts:
Class               A   B   C
IC in sequence 1    2   1   2
IC in sequence 2    4   1   1
Which code sequence executes the most instructions? Which will be faster? What is
the CPI for each sequence?
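The per-class CPI values supplied by the hardware designers are missing from this copy of the example. The sketch below therefore assumes CPIs of 1, 2, and 3 for classes A, B, and C purely for illustration; only the instruction counts come from the example, and different CPI values could change which sequence is faster.

```python
# ASSUMPTION: the per-class CPI values are not given in this copy of the
# example; 1, 2, and 3 are placeholder values chosen for illustration.
ASSUMED_CPI = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class, taken from the example.
SEQUENCE_1 = {"A": 2, "B": 1, "C": 2}
SEQUENCE_2 = {"A": 4, "B": 1, "C": 1}


def summarize(counts, cpi):
    """Return (instruction count, clock cycles, average CPI) for one code sequence."""
    instructions = sum(counts.values())
    cycles = sum(cpi[cls] * n for cls, n in counts.items())
    return instructions, cycles, cycles / instructions


for name, counts in (("sequence 1", SEQUENCE_1), ("sequence 2", SEQUENCE_2)):
    instructions, cycles, avg_cpi = summarize(counts, ASSUMED_CPI)
    print(f"{name}: {instructions} instructions, {cycles} cycles, CPI = {avg_cpi}")
```

Sequence 2 executes more instructions (6 versus 5). Under the assumed CPIs it nevertheless needs fewer clock cycles (9 versus 10), which illustrates why instruction count alone does not determine performance.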
Performance depends on
o Algorithm: affects IC, possibly CPI
o Programming language: affects IC, CPI
o Compiler: affects IC, CPI
o Instruction set architecture: affects IC, CPI, Clock rate
FIGURE 1.16 Clock rate and Power for Intel x86 microprocessors over eight generations and 25
years. The Pentium 4 made a dramatic jump in clock rate and power but less so in performance. The
Prescott thermal problems led to the abandonment of the Pentium 4 line. The Core 2 line reverts to a
simpler pipeline with lower clock rates and multiple processors per chip. The Core i5 pipelines follow
in its footsteps.
Reducing Power Example: Suppose we developed a new, simpler processor that has
85% of the capacitive load of the more complex older processor. Further, assume that
it has adjustable voltage so that it can reduce voltage 15% compared to the older
processor, which results in a 15% shrink in frequency. What is the impact on dynamic power?
o Answer: the new processor uses about half (0.52) the power of the old processor.
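The 0.52 figure can be reproduced with the usual CMOS relation, in which dynamic power is proportional to capacitive load x voltage squared x switching frequency. That relation is an assumption here, since the example itself does not restate it. A minimal sketch:

```python
def dynamic_power_ratio(load_ratio: float, voltage_ratio: float, frequency_ratio: float) -> float:
    """New/old dynamic power, assuming power is proportional to C x V^2 x f."""
    return load_ratio * voltage_ratio ** 2 * frequency_ratio


# 85% of the capacitive load, 15% lower voltage, 15% lower frequency.
ratio = dynamic_power_ratio(0.85, 0.85, 0.85)
print(f"new power / old power = {ratio:.2f}")
```

Every factor is 0.85, and voltage counts twice, so the ratio is 0.85 to the fourth power, or about 0.52.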
FIGURE 1.17 Growth in processor performance since the mid-1980s. This chart plots performance
relative to the VAX 11/780 as measured by the SPECint benchmarks (see Section 1.10). Prior to the
mid-1980s, processor performance growth was largely technology-driven and averaged about 25% per
year. The increase in growth to about 52% since then is attributable to more advanced architectural and
organizational ideas. The higher annual performance improvement of 52% since the mid-1980s meant
performance was about a factor of seven higher in 2002 than it would have been had it stayed at 25%.
Since 2002, the limits of power, available instruction-level parallelism, and long memory latency have
slowed uniprocessor performance growth to about 22% per year.
FIGURE 1.18 SPECINTC2006 benchmarks running on a 2.66 GHz Intel Core i7 920. As the
equation on page 35 explains, execution time is the product of the three factors in this table: instruction
count in billions, clocks per instruction (CPI), and clock cycle time in nanoseconds. SPECratio is
simply the reference time, which is supplied by SPEC, divided by the measured execution time. The
single number quoted as SPECINTC2006 is the geometric mean of the SPECratios.
\[
\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}
\]
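The geometric mean of the SPECratios can be sketched as follows. The reference and measured times below are hypothetical values chosen for illustration, not entries from Figure 1.18, and the function names are assumptions.

```python
import math


def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """SPECratio = reference time / measured execution time."""
    return reference_time_s / measured_time_s


def geometric_mean(ratios):
    """n-th root of the product of n ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# HYPOTHETICAL (reference time, measured time) pairs in seconds.
hypothetical_runs = [(8000.0, 500.0), (9000.0, 1000.0), (6000.0, 250.0)]

ratios = [spec_ratio(ref, measured) for ref, measured in hypothetical_runs]
print(f"SPECratios: {ratios}")
print(f"Geometric mean: {geometric_mean(ratios):.2f}")
```

Summing logarithms rather than multiplying the ratios directly gives the same result and avoids overflow when many ratios are combined.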