Trends in Computer Architecture

ECE 521 is a course on computer design and technology at NC State University. Moore's law states that the number of transistors on integrated circuits doubles approximately every two years. This results in computers becoming smaller, faster, and less expensive over time. However, improvements in bandwidth have exceeded improvements in latency, creating gaps such as the memory wall and power wall. Architects must address these challenges through techniques like multi-core and many-core processors.

Uploaded by

Dileep Karpur

ECE 521: Computer Design and Technology

ECE463: Advanced Microprocessor Systems Design


http://courses.ncsu.edu/ece521/lec/001/

Technology Trends

Huiyang Zhou
Why technology trends?
• Technology push
– Look back at the path leading to where we are
– Understand the constraints
– Predict what is ahead

Moore’s Law
The complexity for minimum component costs has increased at a rate
of roughly a factor of two per year ... Certainly over the short term
this rate can be expected to continue, if not to increase. Over the
longer term, the rate of increase is a bit more uncertain, although
there is no reason to believe it will not remain nearly constant for at
least 10 years. That means by 1975, the number of components per
integrated circuit for minimum cost will be 65,000. I believe that
such a large circuit can be built on a single wafer.
- Gordon Moore
4/19/1965

In short: smaller and faster transistors each generation (the number of transistors doubles every N months, 12 < N < 24), from 10 µm in 1971 to 32 nm in 2010.
Moore’s Law – Performance Aspect
• Computing performance doubles every 18 months, for
equal cost
– Buy first computer today, then buy a second computer at the
same price 18 months from now: Second computer is twice
as fast
– Moore’s Law speedup equation
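• If using months as unit of time…

$$\mathrm{perf}(m \text{ months from now}) = 2^{m/18} \times \mathrm{perf}(\text{now})$$

$$\mathrm{speedup} = 2^{m/18}$$

• If using years as unit of time…

$$\mathrm{perf}(y \text{ years from now}) = 2^{y/1.5} \times \mathrm{perf}(\text{now})$$

$$\mathrm{speedup} = 2^{y/1.5}$$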
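The speedup equation above can be evaluated directly. The following is a minimal Python sketch; the function names `moores_law_speedup` and `speedup_after_years` are illustrative choices, not part of the course material, and the 18-month doubling period is the one assumed on this slide.

```python
def moores_law_speedup(months: float, doubling_period_months: float = 18.0) -> float:
    """Speedup after `months`, assuming performance doubles every doubling period."""
    return 2.0 ** (months / doubling_period_months)


def speedup_after_years(years: float, doubling_period_years: float = 1.5) -> float:
    """Same relation with years as the unit of time."""
    return 2.0 ** (years / doubling_period_years)


print(moores_law_speedup(18))             # 2.0: the second computer is twice as fast
print(moores_law_speedup(36))             # 4.0: two doubling periods
print(round(speedup_after_years(10), 1))  # 101.6: roughly 100X over a decade
```

Both functions express the same relation; only the unit of time differs.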
All Hail the Architects!

Source: "The Future of Microprocessors," Communications of the ACM, 2011

Moore’s Law on clock frequency
Processor Pins
Power Consumption

Consequence on power density
Consequence on performance

There is an exponentially growing gap whenever two curves have different exponential rates, e.g., memory wall, power wall.
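To see why the gap itself grows exponentially: if one quantity doubles every T_fast years and another every T_slow years, their ratio is 2^(t/T_fast) / 2^(t/T_slow) = 2^(t(1/T_fast - 1/T_slow)), which is again an exponential in t. The short Python sketch below illustrates this. The function names and the doubling periods in the example (1.5 and 6 years) are assumptions chosen for illustration, not figures from the slide.

```python
def gap(years: float, fast_doubling_years: float, slow_doubling_years: float) -> float:
    """Ratio between a fast-growing and a slow-growing exponential after `years`."""
    fast = 2.0 ** (years / fast_doubling_years)
    slow = 2.0 ** (years / slow_doubling_years)
    return fast / slow


def gap_doubling_time(fast_doubling_years: float, slow_doubling_years: float) -> float:
    """Time for the gap to double: 1 / (1/T_fast - 1/T_slow)."""
    return 1.0 / (1.0 / fast_doubling_years - 1.0 / slow_doubling_years)


# Hypothetical rates: one curve doubles every 1.5 years, the other every 6 years.
for t in (2, 4, 6, 8, 10):
    print(t, round(gap(t, 1.5, 6.0), 2))
print(round(gap_doubling_time(1.5, 6.0), 2))
```

With these assumed rates the gap doubles about every two years, even though both curves are improving.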
Moore’s Law
Multi-core and many-core processors

Challenge: how to code for a large number of cores


(Note that we are in year 2011!)
The future of uP
• A quick summary of the reading assignment

Disks: Archaic (Nostalgic) v. Modern (Newfangled)

• CDC Wren I, 1983
– 3600 RPM
– 0.03 GBytes capacity
– Tracks/Inch: 800
– Bits/Inch: 9550
– Three 5.25” platters
– Bandwidth: 0.6 MBytes/sec
– Latency: 48.3 ms
– Cache: none
• Seagate 373453, 2003
– 15000 RPM (4X)
– 73.4 GBytes capacity (2500X)
– Tracks/Inch: 64000 (80X)
– Bits/Inch: 533,000 (60X)
– Four 2.5” platters (in 3.5” form factor)
– Bandwidth: 86 MBytes/sec (140X)
– Latency: 5.7 ms (8X)
– Cache: 8 MBytes
Latency Lags Bandwidth (for last ~20 years)
[Figure: log-log plot of relative bandwidth improvement (y-axis, 1 to 10000) against relative latency improvement (x-axis, 1 to 100), showing the Disk curve and the reference line where latency improvement = bandwidth improvement. Latency = simple operation w/o contention; BW = best-case.]
• Performance Milestones (latency improvement, bandwidth improvement)
• Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
Memory: Archaic (Nostalgic) v. Modern (Newfangled)

• 1980 DRAM (asynchronous)
– 0.06 Mbits/chip
– 64,000 xtors, 35 mm²
– 16-bit data bus per module, 16 pins/chip
– 13 Mbytes/sec
– Latency: 225 ns
– (no block transfer)
• 2000 Double Data Rate Synchronous (clocked) DRAM
– 256.00 Mbits/chip (4000X)
– 256,000,000 xtors, 204 mm²
– 64-bit data bus per DIMM, 66 pins/chip (4X)
– 1600 Mbytes/sec (120X)
– Latency: 52 ns (4X)
– Block transfers (page mode)
Latency Lags Bandwidth (last ~20 years)
[Figure: log-log plot of relative bandwidth improvement (y-axis, 1 to 10000) against relative latency improvement (x-axis, 1 to 100), showing the Memory and Disk curves and the reference line where latency improvement = bandwidth improvement. Latency = simple operation w/o contention; BW = best-case.]
• Performance Milestones (latency improvement, bandwidth improvement)
• Memory Module: 16bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
• Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)


LANs: Archaic (Nostalgic) v. Modern (Newfangled)

• Ethernet 802.3
– Year of Standard: 1978
– 10 Mbits/s link speed
– Latency: 3000 µsec
– Shared media
– Coaxial cable: copper core, insulator, braided outer conductor, plastic covering
• Ethernet 802.3ae
– Year of Standard: 2003
– 10,000 Mbits/s link speed (1000X)
– Latency: 190 µsec (15X)
– Switched media
– Category 5 copper wire: "Cat 5" is 4 twisted pairs in bundle; each twisted pair is copper, 1mm thick, twisted to avoid antenna effect
Latency Lags Bandwidth (last ~20 years)
[Figure: log-log plot of relative bandwidth improvement (y-axis, 1 to 10000) against relative latency improvement (x-axis, 1 to 100), showing the Network, Memory, and Disk curves and the reference line where latency improvement = bandwidth improvement. Latency = simple operation w/o contention; BW = best-case.]
• Performance Milestones (latency improvement, bandwidth improvement)
• Ethernet: 10Mb, 100Mb, 1000Mb, 10000 Mb/s (16x, 1000x)
• Memory Module: 16bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
• Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
CPUs: Archaic (Nostalgic) v. Modern (Newfangled)

• 1982 Intel 80286
– 12.5 MHz
– 2 MIPS (peak)
– Latency 320 ns
– 134,000 xtors, 47 mm²
– 16-bit data bus, 68 pins
– Microcode interpreter, separate FPU chip
– (no caches)
• 2001 Intel Pentium 4
– 1500 MHz (120X)
– 4500 MIPS (peak) (2250X)
– Latency 15 ns (20X)
– 42,000,000 xtors, 217 mm²
– 64-bit data bus, 423 pins
– 3-way superscalar, dynamic translate to RISC, superpipelined (22 stage), out-of-order execution
– On-chip 8KB data cache, 96KB instr. trace cache, 256KB L2 cache
Latency Lags Bandwidth (last ~20 years)
[Figure: log-log plot of relative bandwidth improvement (y-axis, 1 to 10000) against relative latency improvement (x-axis, 1 to 100), showing the Processor, Network, Memory, and Disk curves and the reference line where latency improvement = bandwidth improvement. Annotation: CPU high, Memory low (“Memory Wall”).]
• Performance Milestones (latency improvement, bandwidth improvement)
• Processor: ‘286, ‘386, ‘486, Pentium, Pentium Pro, Pentium 4 (21x, 2250x)
• Ethernet: 10Mb, 100Mb, 1000Mb, 10000 Mb/s (16x, 1000x)
• Memory Module: 16bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
• Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
Rule of Thumb for Latency Lagging BW

• In the time that bandwidth doubles, latency improves by no more than a factor of 1.2 to 1.4 (and capacity improves faster than bandwidth)

• Stated alternatively: bandwidth improves by more than the square of the improvement in latency
Computer Technology - Dramatic Change!

• Processor
– 2X in speed every 1.5 years (since ‘85);
– 100X performance in last decade.
• Memory
– DRAM capacity: 2X / 2 years (since ‘96);
– 64X size improvement in last decade
– Only 3X in speed in the last decade
• Disk
– Capacity: 2X / 1 year (since ‘97)
– 250X size in last decade
– Only 3X in speed in the last decade
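The per-decade figures above can be converted into implied doubling times with T = 10 / log2(improvement). The Python sketch below does this conversion; the function name `implied_doubling_time` is an illustrative choice. Note that the "since ‘85/‘96/‘97" rates and the "last decade" totals on the slide cover different periods, so the two need not agree exactly.

```python
import math


def implied_doubling_time(improvement_factor: float, years: float = 10.0) -> float:
    """Doubling time (in years) implied by an overall improvement over `years`."""
    return years / math.log2(improvement_factor)


for label, factor in [
    ("Processor performance, 100X", 100),
    ("DRAM capacity, 64X", 64),
    ("Disk capacity, 250X", 250),
    ("Memory or disk speed, 3X", 3),
]:
    print(f"{label}: doubles every {implied_doubling_time(factor):.2f} years")
```

A 100X improvement per decade corresponds to doubling roughly every 1.5 years, whereas a 3X improvement per decade corresponds to doubling only about every 6.3 years.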
What Computer Architecture Brings to the Table
• Other fields often borrow ideas from architecture
• Quantitative Principles of Design
1. Take Advantage of Parallelism
2. Principle of Locality
3. Focus on the Common Case
4. Amdahl’s Law
5. The Processor Performance Equation
• Careful, quantitative comparisons
– Define, quantify, and summarize relative performance
– Define and quantify relative cost
– Define and quantify dependability
– Define and quantify power
• Culture of anticipating and exploiting advances in
technology
• Culture of well-defined interfaces that are carefully
implemented and thoroughly checked
• A research area with high impact
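The slide names Amdahl's Law without stating it. In its usual form, the overall speedup is 1 / ((1 - f) + f / s), where f is the fraction of execution time that benefits from an enhancement and s is the speedup of that fraction. The Python sketch below is a minimal illustration of that standard formula; the function and parameter names, and the 90% example, are assumptions made here rather than material from the lecture.

```python
def amdahl_speedup(enhanced_fraction: float, enhancement_speedup: float) -> float:
    """Overall speedup when `enhanced_fraction` of the time is sped up by `enhancement_speedup`."""
    if not 0.0 <= enhanced_fraction <= 1.0:
        raise ValueError("enhanced_fraction must be between 0 and 1")
    if enhancement_speedup <= 0.0:
        raise ValueError("enhancement_speedup must be positive")
    return 1.0 / ((1.0 - enhanced_fraction) + enhanced_fraction / enhancement_speedup)


# Hypothetical workload: 90% of the time can be spread across cores.
for cores in (10, 100, 1000):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
```

Under this assumed 90% figure the overall speedup stays below 10X no matter how many cores are used, which is one reason "focus on the common case" and Amdahl's Law appear together in the list.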
Computer Science/Engineering at a Crossroads
• Old CW (conventional wisdom): Uniprocessor performance 2X / 1.5 yrs
• New CW: Power Wall + ILP Wall + Memory Wall = Brick Wall
– Uniprocessor performance now 2X / 5(?) yrs
=> Sea change in chip design: multiple “cores” (2X processors per chip / ~2 years)
• A larger number of simpler processors is more power efficient
• The Free (performance) Lunch is over: A Fundamental Turn Toward Concurrency in Software
– “The biggest sea change in software development since the OO revolution is knocking at the door, and its name is Concurrency” (Herb Sutter)
Problems with Sea Change (D. Patterson)

• Algorithms, Programming Languages, Compilers, Operating Systems, Architectures, Libraries, … are not ready to supply Thread Level Parallelism or Data Level Parallelism for 1000 CPUs / chip
• Architectures are not ready for 1000 CPUs / chip
• Unlike Instruction Level Parallelism, this cannot be solved by computer architects and compiler writers alone, but it also cannot be solved without the participation of computer architects
• Modern GPUs run hundreds or thousands of threads / chip
• Shift from Instruction Level Parallelism to Thread Level Parallelism / Data Level Parallelism
Cost of IC die

$$\text{Cost of IC} = \frac{\text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging}}{\text{Final test yield}}$$

(Die tests take 5 to 90 sec on average.)

$$\text{Cost of die} = \frac{\text{Cost of wafer}}{\text{Dice per wafer} \times \text{Die yield}}$$

$$\text{Dice per wafer} = \frac{\pi \times (\text{wafer diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{wafer diameter}}{\sqrt{2 \times \text{Die area}}}$$

(The second term compensates for the “square peg in round hole” problem.)

$$\text{Die yield} = \text{wafer yield} \times \left(1 + \frac{\text{Defects per unit area} \times \text{Die area}}{\alpha}\right)^{-\alpha}$$

• α: number of masking levels (i.e., complexity); 3 to 5 (today), 2 (simple MOS process)
• Defects per unit area: measure of randomness, typically 0.6 to 1.2

Example: wafer yield = 90%, 1 defect/cm², die area = 1 cm², α = 3

$$\text{Die yield} = 0.90 \times \left(1 + \frac{1 \times 1}{3}\right)^{-3.0} \approx 38\%$$

• 138 potential dice yield only 52
• 59 potential dice yield only 22

Bottom line:

$$\text{Die yield} \propto (\text{Die size})^{-1}$$

$$\text{Cost of die} \propto (\text{Die area})^{4} \quad (\text{for } \alpha = 3)$$
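The die cost formulas on this slide translate directly into code. The Python sketch below is a minimal illustration under stated assumptions: the function names and parameter names are illustrative choices, lengths are taken in cm and areas in cm², and no wafer cost or wafer diameter is given on the slide, so those are left as caller-supplied inputs.

```python
import math


def dice_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Wafer area over die area, minus the edge-loss ("square peg in round hole") term."""
    gross = math.pi * (wafer_diameter_cm / 2.0) ** 2 / die_area_cm2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2.0 * die_area_cm2)
    return gross - edge_loss


def die_yield(wafer_yield: float, defects_per_cm2: float,
              die_area_cm2: float, alpha: float) -> float:
    """wafer yield * (1 + defects * area / alpha) ** (-alpha)."""
    return wafer_yield * (1.0 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def cost_of_die(wafer_cost: float, wafer_diameter_cm: float, die_area_cm2: float,
                wafer_yield: float, defects_per_cm2: float, alpha: float) -> float:
    """Cost of wafer divided by the expected number of good dice."""
    good_dice = (dice_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(wafer_yield, defects_per_cm2, die_area_cm2, alpha))
    return wafer_cost / good_dice


# The slide's example: wafer yield 90%, 1 defect/cm^2, 1 cm^2 die, alpha = 3.
example_yield = die_yield(0.90, 1.0, 1.0, 3.0)
print(f"die yield = {example_yield:.1%}")   # about 38%
print(int(138 * example_yield))             # 52 good dice out of 138
print(int(59 * example_yield))              # 22 good dice out of 59
```

Because both the dice per wafer and the die yield fall as die area grows, `cost_of_die` rises much faster than linearly with area, which is the point of the slide's bottom line.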
Defects Per Die
Why technology trends?
• Technology push
– Look back at the path leading to where we are
• Moore’s law
– Smaller, faster, and more power-efficient transistors
– Higher performance (2x per 18-24 months)
– Understand the constraints
• Smaller transistors
• Reaching the end of voltage scaling => Energy/power
consumption becomes THE limiting factor
• Memory wall remains but does not widen much
– Predict what is ahead (near future)
• Multi-core + customization (why?)
• Even more emphasis on data locality
