
Computer Evolution & Performance





• An overview of the evolution of computer technology.
• The von Neumann (IAS) machine.
• Understand the key performance issues relating to computer design.
• Present an overview of the evolution of the x86 architecture.
• The issues in computer performance assessment.

Evolution of Computer Technology
• The 1st generation
• The 2nd generation
• The 3rd generation
• Evolution of Intel processors


The First Generation

Vacuum tube

• The ENIAC (Electronic Numerical Integrator And Computer), designed and constructed at the University of Pennsylvania, was the world's first general-purpose electronic digital computer.

• It weighed 30 tons, occupied 1,500 square feet of floor space, and contained more than 18,000 vacuum tubes. When operating, it consumed 140 kilowatts of power. It was capable of 5,000 additions per second.

• The major drawback of the ENIAC was that it had to be programmed manually by setting switches and plugging and unplugging cables.

The First Generation


• In 1946, von Neumann and his colleagues began the design of a new stored-program computer, referred to as the IAS computer, at the Princeton Institute for Advanced Study.

• Although it was not completed until 1952, the IAS computer is the prototype of all subsequent general-purpose computers.

The First Generation

The structure of the IAS computer:

The IAS registers

• The control unit operates the IAS by fetching instructions from memory and executing them one at a time.

• Both the control unit and the ALU contain storage locations, called registers.

The IAS registers

• Memory buffer register (MBR): Contains a word to be stored in memory or sent to the I/O unit, or is used to receive a word from memory or from the I/O unit.
• Memory address register (MAR): Specifies the address in memory of the word to be written from or read into the MBR.
• Instruction register (IR): Contains the 8-bit opcode of the instruction being executed.
• Instruction buffer register (IBR): Employed to hold temporarily the right-hand instruction from a word in memory.
• Program counter (PC): Contains the address of the next instruction pair to be fetched from memory.
• Accumulator (AC) and multiplier quotient (MQ): Employed to hold temporarily the operands and results of ALU operations.
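The register descriptions above can be tied together in a small simulation of the IAS instruction fetch. This is a simplified sketch, not the IAS control logic itself: it assumes 40-bit words holding two 20-bit instructions (an 8-bit opcode followed by a 12-bit address), always starts at the left-hand instruction of the word addressed by PC, and ignores jumps into a right-hand instruction. The class `IASFetchUnit`, the helper `pack_word`, and the opcode values in the usage lines are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

INSTR_BITS = 20      # each 40-bit word holds two 20-bit instructions
OPCODE_MASK = 0xFF   # 8-bit opcode field
ADDR_MASK = 0xFFF    # 12-bit address field


def pack_word(op_left: int, addr_left: int, op_right: int, addr_right: int) -> int:
    """Hypothetical helper: build a 40-bit word from two (opcode, address) pairs."""
    left = (op_left << 12) | addr_left
    right = (op_right << 12) | addr_right
    return (left << INSTR_BITS) | right


@dataclass
class IASFetchUnit:
    memory: list
    pc: int = 0                 # address of the next instruction pair
    mar: int = 0                # memory address register
    mbr: int = 0                # memory buffer register
    ir: int = 0                 # opcode of the current instruction
    ibr: Optional[int] = None   # buffered right-hand instruction, if any

    def fetch(self) -> tuple:
        """Load the next opcode into IR and its address field into MAR."""
        if self.ibr is not None:
            # Right-hand instruction is already buffered: no memory read needed.
            instr, self.ibr = self.ibr, None
            self.pc += 1  # both halves of the word are now consumed
        else:
            self.mar = self.pc
            self.mbr = self.memory[self.mar]                # read the whole word
            instr = self.mbr >> INSTR_BITS                  # left-hand instruction
            self.ibr = self.mbr & ((1 << INSTR_BITS) - 1)   # save the right half
        self.ir = (instr >> 12) & OPCODE_MASK
        self.mar = instr & ADDR_MASK
        return self.ir, self.mar


# Usage with arbitrary opcodes: two words, three instructions fetched.
cpu = IASFetchUnit(memory=[pack_word(1, 100, 5, 101), pack_word(33, 102, 0, 0)])
print(cpu.fetch())  # left instruction of word 0, read from memory
print(cpu.fetch())  # right instruction of word 0, taken from IBR
print(cpu.fetch())  # left instruction of word 1
```

The point of the IBR is visible in the second call: the right-hand instruction is served without another memory access.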

The IAS instruction set example


The First Generation

• The UNIVAC I (Universal Automatic Computer, 1950) was the first successful commercial computer. It was intended for both scientific and commercial applications.
• The UNIVAC II, which had greater memory capacity and higher performance than the UNIVAC I, was delivered in the late 1950s.
• The IBM 701 (1953) was the first electronic stored-program computer from IBM, then the major manufacturer of punched-card processing equipment. It was intended primarily for scientific applications.
• The IBM 702 (1955) had a number of hardware features that suited it to business applications.

The Second Generation


The second generation saw the introduction of:

• more complex arithmetic and logic units and control units,
• the use of high-level programming languages, and
• the provision of system software with the computer.

The Second Generation

• PDP-1 (1957), delivered by DEC

• IBM 7094

The Third Generation



• IBM System/360: the industry's first planned family of computers, covering a wide range of performance and cost.

The Third Generation


DEC PDP-8:

• Low cost: $16,000, compared with $100,000 for the IBM System/360 series.

• Small size: small enough that another manufacturer could purchase a PDP-8 and integrate it into a total system for resale.

Later Generations

Processor Fabrication Process


The growth of transistor count


Moore's Law

• The number of transistors that could be put on the same chip area was doubling every 18-24 months.
• Consequences:
  • Processor speed increases.
  • The energy needed to operate the processor decreases.
  • Memory capacity increases.


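1. The cost of a chip has remained virtually unchanged during this rapid growth in density → the cost of computer logic and memory circuitry has fallen dramatically.

2. Logic and memory elements are placed closer together on more densely packed chips → the electrical path length is shortened, increasing operating speed.

3. The computer becomes smaller → it is more convenient to place in a variety of environments.

4. The interconnections on the integrated circuit are much more reliable than solder connections → with more circuitry on each chip, there are fewer interchip connections and reliability increases.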
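To get a feel for what "doubling every 18-24 months" means, the sketch below projects a transistor count forward under a fixed doubling period. The starting point (29,000 transistors in the 1978 8086) comes from this section; the function name is hypothetical and the model assumes a perfectly constant doubling rate.

```python
def projected_transistors(start_count: float, years: float,
                          doubling_period_years: float) -> float:
    """Project a transistor count assuming one doubling per fixed period."""
    return start_count * 2 ** (years / doubling_period_years)


# Start from the 1978 8086 (29,000 transistors) and project to 2008,
# using the 18-month and 24-month ends of the quoted range.
for period in (1.5, 2.0):
    estimate = projected_transistors(29_000, 2008 - 1978, period)
    print(f"doubling every {period} years -> {estimate:,.0f} transistors in 2008")
```

With a 24-month period the projection is about 950 million, the same order of magnitude as the 820 million transistors quoted for the 2008 quad-core Core 2; the 18-month rate overshoots by more than an order of magnitude (about 30 billion).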


Memory Wall

According to Moore's law:

• Instructions/second: 2x every 18-24 months
• Memory capacity: 2x every 18-24 months
• Memory performance: 1.1x every 18-24 months
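The "memory wall" follows from compounding these two rates. A minimal sketch, assuming processor throughput grows exactly 2x and memory performance exactly 1.1x per period, shows how fast the gap widens:

```python
PROCESSOR_GROWTH = 2.0    # instructions/second per 18-24 month period
MEMORY_PERF_GROWTH = 1.1  # memory performance per period


def performance_gap(periods: int) -> float:
    """Processor-to-memory performance ratio after `periods` periods,
    relative to a ratio of 1.0 at the start."""
    return (PROCESSOR_GROWTH / MEMORY_PERF_GROWTH) ** periods


for n in range(6):
    print(n, round(performance_gap(n), 1))
```

After five periods (roughly 7.5 to 10 years) the gap has grown about twentyfold, which is why caches and wider memory paths, discussed under performance balance, matter so much.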

The Microprocessors
• In 1971, Intel developed the 4004, the first chip to contain all of the components of a CPU on a single chip. The 4004 can add two 4-bit numbers and can multiply only by repeated addition.

• In 1972, Intel developed the 8008. This was the first 8-bit microprocessor and was almost twice as complex as the 4004.

• In 1974, Intel developed the 8080 (8-bit), which was designed to be the CPU of a general-purpose microcomputer.

• By the end of the 1970s, general-purpose 16-bit microprocessors appeared. One of these was the 8086.
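The remark that the 4004 could multiply only by repeated addition can be illustrated in software. This sketch is not 4004 assembly: it models a 4-bit adder with carry (the hypothetical helper `add4`) and builds an 8-bit product, held as two 4-bit nibbles, using nothing but 4-bit additions.

```python
def add4(x: int, y: int, carry_in: int = 0) -> tuple:
    """Model a 4-bit adder: return (4-bit sum, carry out)."""
    total = x + y + carry_in
    return total & 0xF, total >> 4


def multiply_repeated_addition(a: int, b: int) -> int:
    """Multiply two 4-bit values using only 4-bit additions.

    The 8-bit product is kept as a low and a high nibble; `a` is added
    `b` times, and each carry out of the low nibble is added to the high one.
    """
    assert 0 <= a <= 0xF and 0 <= b <= 0xF, "operands must be 4-bit"
    low, high = 0, 0
    for _ in range(b):
        low, carry = add4(low, a)
        high, _ = add4(high, 0, carry)
    return (high << 4) | low


print(multiply_repeated_addition(13, 11))
```

The loop runs `b` times, so the cost of a multiply grows with the value of the multiplier, one reason later processors added hardware multipliers.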

Evolution of Intel Processors


Recent processors

Designing for Performance
• Microprocessor speed
• Performance balance
• Multicore, MICs, and GPGPUs
• Evolution of Intel x86 Architecture


Microprocessor Speed Techniques

• Pipelining
• Branch prediction
• Speculative execution
• Data flow analysis

Pipelining: the processor works on multiple instructions simultaneously.
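A quick cycle count shows why working on several instructions at once pays off. This sketch assumes a hypothetical 5-stage pipeline in which every stage takes one clock cycle and no stalls or hazards occur; real pipelines lose some of this gain to branches and data dependencies.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    # Each instruction passes through every stage before the next one starts.
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    # The first instruction needs n_stages cycles to fill the pipeline;
    # after that, one instruction completes every cycle.
    return n_stages + (n_instructions - 1)


n, k = 100, 5
print(cycles_unpipelined(n, k), cycles_pipelined(n, k))
print(round(cycles_unpipelined(n, k) / cycles_pipelined(n, k), 2))
```

For 100 instructions the speedup is about 4.81, approaching the number of stages (5) as the instruction count grows.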

Other techniques
• Branch prediction: The processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next.
• Speculative execution: Using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations.
• Data flow analysis: The processor analyzes which instructions are dependent on each other's results, or data, to create an optimized schedule of instructions.
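The slides do not say which prediction scheme a processor uses. One common textbook scheme is a 2-bit saturating counter per branch, sketched below under that assumption (the class and function names are hypothetical). It shows how a predictor can "look ahead" using only the branch's recent history.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, initial_state: int = 2) -> None:
        self.state = initial_state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def accuracy(outcomes: list) -> float:
    """Fraction of branch outcomes predicted correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch: taken 9 times, then not taken once at loop exit, repeated 10 times.
loop_branch = ([True] * 9 + [False]) * 10
print(accuracy(loop_branch))
```

Only the loop exit is mispredicted, giving 90% accuracy; the two-bit hysteresis keeps a single not-taken outcome from flipping the next prediction.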


Performance balance
Processor → Memory: (p.39)
 Increase the number of bits that are retrieved at one time by
making DRAMs “wider” → wider data bus.
 Reduce the frequency of memory access by incorporating
increasingly complex and efficient cache structures between the
processor and main memory, including the incorporation of one
or more caches on the processor chip as well as on an off-chip
cache close to the processor chip.
 Increase the interconnect bandwidth between processors and
memory by using higher-speed buses.
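The effect of the cache bullet can be quantified with the usual average-memory-access-time relation (hit time plus miss rate times miss penalty). The sketch below uses hypothetical numbers: a 1 ns cache hit, a 100 ns trip to main memory, and a 5% miss rate.

```python
def average_access_time(hit_time_ns: float, miss_rate: float,
                        miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


no_cache_ns = 100.0  # hypothetical: every access goes to main memory
with_cache_ns = average_access_time(hit_time_ns=1.0, miss_rate=0.05,
                                    miss_penalty_ns=100.0)
print(no_cache_ns, with_cache_ns)
```

Under these assumptions the average access drops from 100 ns to about 6 ns, because only 1 access in 20 pays the full memory latency.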

Performance balance
Handling of I/O devices (data rates in bps)

Performance balance
Improvements in Chip Organization and Architecture (p. 41):
• Increase the hardware speed of the processor:
  Shrinking the size of the logic gates on the processor chip (the fabrication process) → the propagation time for signals is significantly reduced → the processor speeds up.
  An increase in clock rate → individual operations are executed more rapidly.

Performance balance
• Increase the size and speed of caches between the processor and main memory (in particular, by dedicating a portion of the processor chip itself to the cache) → cache access times drop significantly.

• Make changes to the processor organization and architecture that increase the effective speed of instruction execution. Typically, this involves using parallelism in one form or another.

Performance balance
Processor trends

Multicore, MICs, and GPGPUs

Improvements in Chip Organization and Architecture:
• Increasing the hardware speed (clock speed) of the processor runs into obstacles:
  • Heat dissipation (W/cm²) increases
  • RC delay
  • Memory latency

• The use of multiple processors on the same chip, also referred to as multiple cores, or multicore, provides the potential to increase performance without increasing the clock rate.
• Chip manufacturers are now in the process of making a huge leap forward in the number of cores per chip (more than 50).
• The leap in performance, as well as the challenges in developing software to exploit such a large number of cores, have led to the introduction of a new term: many integrated core (MIC).

Evolution of Intel x86 Architecture

Two processor families:

• Intel x86: incorporates the sophisticated design principles once found on mainframes, supercomputers, and servers (CISC: Complex Instruction Set Computer).

• ARM: used in a wide variety of embedded systems; it is one of the most powerful and best-designed RISC-based systems on the market (RISC: Reduced Instruction Set Computer).

A topic for students to research and present.

Evolution of Intel x86 Architecture

• 8080 (8-bit): The world's first general-purpose microprocessor. The 8080 was used in the first personal computer, the Altair.

• 8086 (16-bit): Sported an instruction cache, or queue, that prefetches a few instructions before they are executed. A variant of this processor, the 8088, was used in IBM's first personal computer. The 8086 marks the first appearance of the x86 architecture.

• 80286: This extension of the 8086 enabled addressing a 16-MB memory instead of just 1 MB.

• 80386: Intel's first 32-bit machine. With a 32-bit architecture, the 80386 rivaled the complexity and power of minicomputers and mainframes introduced just a few years earlier. It was the first Intel processor to support multitasking.
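The 1 MB and 16 MB figures follow directly from the width of the physical address. This sketch uses the physical address widths of the 8086 (20 bits), 80286 (24 bits), and 80386 (32 bits); the helper name is hypothetical.

```python
MB = 2 ** 20


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with a given number of physical address bits."""
    return 2 ** address_bits


for name, bits in (("8086", 20), ("80286", 24), ("80386", 32)):
    print(name, addressable_bytes(bits) // MB, "MB")
```

Each extra address bit doubles the reachable memory, so the four bits added by the 80286 multiply the space by 16, and the 80386's 32 bits reach 4,096 MB (4 GB).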

Evolution of Intel x86 Architecture

 80486: The 80486 introduced the use of much more sophisticated
and powerful cache technology and sophisticated instruction
pipelining. The 80486 also offered a built-in math coprocessor,
offloading complex math operations from the main CPU.
 Pentium: With the Pentium, Intel introduced the use of superscalar
techniques, which allow multiple instructions to execute in parallel.
 Pentium Pro: The Pentium Pro continued the move into superscalar
organization begun with the Pentium, with aggressive use of register
renaming, branch prediction, data flow analysis, and speculative
 Pentium II: The Pentium II incorporated Intel MMX technology, which
is designed specifically to process video, audio, and graphics data

Evolution of Intel x86 Architecture

• Pentium III: The Pentium III incorporates additional floating-point instructions to support 3D graphics software.

• Pentium 4: The Pentium 4 includes additional floating-point and other enhancements for multimedia.

• Core: This is the first Intel x86 microprocessor with a dual core, referring to the implementation of two processors on a single chip.

• Core 2: The Core 2 extends the architecture to 64 bits. The Core 2 Quad provides four processors on a single chip. More recent Core offerings have up to 10 processors per chip.

Evolution of Intel x86 Architecture

• The x86 provides an excellent illustration of the advances in computer hardware over the past 30 years. The 1978 8086 was introduced with a clock speed of 5 MHz and had 29,000 transistors.

• A quad-core Intel Core 2 introduced in 2008 operates at 3 GHz, a speedup of a factor of 600, and has 820 million transistors, about 28,000 times as many as the 8086.

• The Core 2 is in only a slightly larger package than the 8086 and has a comparable cost.
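The two ratios quoted above can be checked directly from the figures in this section:

```python
clock_8086_hz = 5e6            # 1978 8086: 5 MHz
clock_core2_hz = 3e9           # 2008 quad-core Core 2: 3 GHz
transistors_8086 = 29_000
transistors_core2 = 820_000_000

clock_ratio = clock_core2_hz / clock_8086_hz
transistor_ratio = transistors_core2 / transistors_8086
print(f"clock speedup: {clock_ratio:.0f}x")
print(f"transistor ratio: {transistor_ratio:,.0f}x")
```

The transistor ratio comes out at about 28,276, which the slide rounds to 28,000. Note that the clock ratio alone understates the real performance gain, since CPI and the number of cores also changed.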

Embedded Systems and ARM
• Embedded Systems
• ARM evolution

Embedded Systems
• A combination of computer hardware and software, and perhaps additional mechanical or other parts, designed to perform a dedicated function. In many cases, embedded systems are part of a larger system or product, as in the case of an antilock braking system in a car.


ARM Evolution
• A family of RISC-based microprocessors and microcontrollers designed by ARM Inc., Cambridge, England.

• The company doesn't make processors but instead designs microprocessor and multicore architectures and licenses them to manufacturers.

• ARM chips are high-speed processors that are known for their small die size and low power requirements.

• It is the most widely used embedded processor architecture and indeed the most widely used processor architecture of any kind in the world.

ARM Evolution
• ARM originated with the British-based company Acorn Computers.
• ARM1 (1985) was used for internal research and development as well as a coprocessor in the BBC machine.
• Also in 1985, Acorn released the ARM2, which had greater functionality and speed within the same physical space.
ARM processors are designed to meet the needs of three system categories:
• Embedded real-time systems: Systems for storage, automotive body and power-train, industrial, and networking applications.
• Application platforms: Devices running open operating systems including Linux, Palm OS, Symbian OS, and Windows CE in wireless, consumer entertainment, and digital imaging applications.
• Secure applications: Smart cards, SIM cards, and payment terminals.


In evaluating processor hardware and setting requirements for new systems, the parameters to consider are:

• Performance
• Cost
• Size
• Security
• Reliability
• Power consumption

Processor Power, Speed & Cost

1. Power consumption
Active power

$P \approx C \cdot V^2 \cdot f \cdot \alpha$

• C: Capacitance (≈ chip area)
• V: Voltage
• f: Frequency
• α: Activity factor

Static power

Active Power example

Active power: A processor can work at different voltages and frequencies:

V: 0.9 to 1.5 V (0.1 V steps), f: 1.8 GHz to 3 GHz (0.2 GHz steps)

f (GHz):  1.8   2.0   2.2   …   3.0
V (V):    0.9   1.0   1.1   …   1.5

Suppose the measured power consumption is 30 W at 1.0 V, 2 GHz.

What is P for the most power-efficient setting?

What is P for the highest-performance setting?
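One way to check the answers: scale the 30 W measurement using $P \approx C \cdot V^2 \cdot f \cdot \alpha$, assuming C and α stay fixed and that each frequency is paired with the voltage in the same column of the table. "Most power-efficient" is read here as the lowest-voltage pair, which also minimizes energy per cycle since that scales with V².

```python
# Measured baseline from the example: 30 W at 1.0 V and 2.0 GHz.
P_REF, V_REF, F_REF = 30.0, 1.0, 2.0

# (f in GHz, V in volts) pairs from the table: 1.8 GHz/0.9 V up to 3.0 GHz/1.5 V.
settings = [(round(1.8 + 0.2 * i, 1), round(0.9 + 0.1 * i, 1)) for i in range(7)]


def active_power(f_ghz: float, v: float) -> float:
    """Scale the baseline with P ~ C * V^2 * f * alpha (C and alpha unchanged)."""
    return P_REF * (v / V_REF) ** 2 * (f_ghz / F_REF)


lowest = min(settings, key=lambda s: active_power(*s))
fastest = max(settings, key=lambda s: s[0])
for label, (f, v) in (("most power-efficient", lowest), ("highest-performance", fastest)):
    print(f"{label}: {f} GHz, {v} V -> {active_power(f, v):.2f} W")
```

Under these assumptions the answers are about 21.87 W at 1.8 GHz/0.9 V and 101.25 W at 3.0 GHz/1.5 V: a 1.67x increase in frequency costs more than 4.6x the power because the voltage rises with it.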


Fabrication cost
1. Fabrication:
Silicon ingot → blank wafer → patterned wafer → tested wafer → tested dies → bond die to package → test packages

2. Fabrication yield:
$\text{Yield} = \dfrac{\text{working chips}}{\text{chips on wafer}}$

3. Example:
small vs. large vs. huge chips per wafer
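The small/large/huge comparison can be sketched with a deliberately simple defect model: assume each defect lands on a different chip and kills it. The wafer cost ($5,000), defect count (10), and chips per wafer below are hypothetical numbers chosen only to show the trend.

```python
def yield_and_cost(chips_per_wafer: int, defects_per_wafer: int,
                   wafer_cost: float) -> tuple:
    """Return (fabrication yield, cost per working chip).

    Simplifying assumption: every defect lands on a different chip and kills it.
    """
    working = max(chips_per_wafer - defects_per_wafer, 0)
    fab_yield = working / chips_per_wafer
    cost_per_chip = wafer_cost / working if working else float("inf")
    return fab_yield, cost_per_chip


# Hypothetical wafer: $5,000 to fabricate, 10 defects, three chip sizes.
for size, chips in (("small", 400), ("large", 100), ("huge", 20)):
    y, cost = yield_and_cost(chips, 10, 5000.0)
    print(f"{size}: yield {y:.1%}, ${cost:,.2f} per working chip")
```

Larger chips mean fewer chips per wafer and a larger share of them lost to each defect, so the cost per working chip rises much faster than the chip area.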

Benchmark suites
• A benchmark suite consists of a set of programs that represent the characteristics of programs that run on a particular system. After running a benchmark suite, devices are given scores based on the time taken to execute its programs.

• The best-known benchmark suite is the SPEC suite, produced by the Standard Performance Evaluation Corporation.
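SPEC CPU turns execution times into a score by computing, for each program, the ratio of a reference machine's time to the measured time, then combining the ratios with a geometric mean. A minimal sketch of that calculation, with hypothetical times for a two-program suite:

```python
import math


def spec_style_score(reference_times: list, measured_times: list) -> float:
    """Geometric mean of per-benchmark ratios (reference time / measured time)."""
    ratios = [ref / t for ref, t in zip(reference_times, measured_times)]
    return math.prod(ratios) ** (1 / len(ratios))


# Hypothetical times in seconds: ratios are 2 and 8.
print(spec_style_score([100.0, 400.0], [50.0, 50.0]))
```

The geometric mean (4.0 here) has the useful property that the ratio of two systems' scores does not depend on which machine was chosen as the reference.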

Clock Speed and Instructions per Second

• A system clock governs the operations performed by a processor, such as fetching an instruction, decoding the instruction, performing an arithmetic operation, and so on.

• The execution of an instruction involves fetching the instruction from memory, decoding the various portions of the instruction, loading/storing data, and performing arithmetic and logical operations. Most instructions on most processors require multiple clock cycles to complete.

Iron Law of Performance

• CPU time (execution time) = (number of instructions executed) × (cycles per instruction, CPI) × (clock cycle time)

Iron Law Quiz 1

• CPU time (execution time) = (number of instructions executed) × (cycles per instruction, CPI) × (clock cycle time)

• A program executes 3 billion instructions on a processor that spends 2 cycles on each instruction and runs at 3 GHz. What is the execution time of this program?
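A worked check of Quiz 1 with the Iron Law; dividing by the clock rate is the same as multiplying by the clock cycle time:

```python
def cpu_time(instructions: float, cpi: float, clock_hz: float) -> float:
    """Iron Law: instructions x CPI x clock cycle time (= 1 / clock rate)."""
    return instructions * cpi / clock_hz


print(cpu_time(3e9, 2, 3e9), "seconds")
```

Six billion cycles at three billion cycles per second gives 2 seconds.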

Iron Law Quiz 2

• CPU time (execution time) = (number of instructions executed) × (cycles per instruction, CPI) × (clock cycle time)
• A program contains 50 billion instructions with the following composition:
  • 10 billion branch instructions, CPI = 4
  • 15 billion load instructions, CPI = 2
  • 5 billion store instructions, CPI = 3
  • 20 billion integer-type instructions, CPI = 1
• Evaluate the execution time of this program if the system is clocked at 4 GHz.
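For an instruction mix, the Iron Law is applied per class: sum count × CPI over all classes to get total cycles, then divide by the clock rate. A worked check of Quiz 2:

```python
# (instruction count, CPI) for each class in Quiz 2.
mix = {
    "branch": (10e9, 4),
    "load": (15e9, 2),
    "store": (5e9, 3),
    "integer": (20e9, 1),
}
clock_hz = 4e9

total_cycles = sum(count * cpi for count, cpi in mix.values())
total_instructions = sum(count for count, _ in mix.values())
print("average CPI:", total_cycles / total_instructions)
print("execution time:", total_cycles / clock_hz, "seconds")
```

The program needs 105 billion cycles (average CPI 2.1), so at 4 GHz it runs for 26.25 seconds.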

Performance Factors and System Attributes


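• The processor time T needed to execute a given program:

$T = I_c \times [p + (m \times k)] \times \tau$

• $I_c$: the instruction count (number of machine instructions executed),
• p: the number of processor cycles needed to decode and execute the instruction,
• m: the number of memory references needed,
• k: the ratio between memory cycle time and processor cycle time,
• $\tau$: the processor cycle time.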
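The formula can be evaluated directly. The values below are hypothetical: 1 million instructions, 2 processor cycles and 1 memory reference per instruction, a memory cycle 4 times the processor cycle, and a 1 ns processor cycle.

```python
def processor_time(ic: float, p: float, m: float, k: float, tau_s: float) -> float:
    """T = Ic * (p + m * k) * tau, with tau in seconds."""
    return ic * (p + m * k) * tau_s


print(processor_time(ic=1e6, p=2, m=1, k=4, tau_s=1e-9), "seconds")
```

Here $p + m \times k = 6$ plays the role of the effective CPI in the Iron Law, so T is about 6 ms; memory references contribute two thirds of the time.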

Performance Factors and System Attributes

The five performance factors ($I_c$, p, m, k, $\tau$) are influenced by four system attributes:
• The design of the instruction set (instruction set architecture),
• The compiler technology
• The Processor implementation and
• The Cache and memory hierarchy.

MIPS rate and MFLOPS rate

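• A common measure of performance for a processor is the rate at which instructions are executed, expressed as millions of instructions per second (MIPS) and referred to as the MIPS rate: $\text{MIPS rate} = \dfrac{I_c}{T \times 10^6} = \dfrac{f}{CPI \times 10^6}$, where f is the clock rate.

• Another common performance measure deals only with floating-point instructions and is expressed as millions of floating-point operations per second (MFLOPS): $\text{MFLOPS rate} = \dfrac{\text{number of executed floating-point operations}}{\text{execution time} \times 10^6}$.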

Example 1


• Consider the execution of a program that consists of 2 million instructions on a 400-MHz processor. The program consists of four major types of instructions. The instruction mix and the CPI for each instruction type are given below, based on the result of a program trace experiment:

Example 2
• Assuming the following data, which code sequence will be faster?

Instruction type   CPI
A                  1
B                  2
C                  3

Code sequence   Instruction count for each type
                A    B    C
1               2    1    2
2               4    1    1
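A worked check of Example 2, assuming both sequences run on the same processor at the same clock rate, so the sequence with fewer total cycles is faster:

```python
CPI = {"A": 1, "B": 2, "C": 3}
SEQUENCES = {1: {"A": 2, "B": 1, "C": 2}, 2: {"A": 4, "B": 1, "C": 1}}


def total_cycles(counts: dict) -> int:
    """Clock cycles for a code sequence: sum of count x CPI over instruction types."""
    return sum(n * CPI[t] for t, n in counts.items())


for seq, counts in SEQUENCES.items():
    cycles = total_cycles(counts)
    instructions = sum(counts.values())
    print(f"sequence {seq}: {instructions} instructions, {cycles} cycles, "
          f"CPI = {cycles / instructions:.2f}")
```

Sequence 2 executes more instructions (6 vs. 5) but needs fewer cycles (9 vs. 10), so it is faster; instruction count alone is a misleading measure.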

Amdahl’s law
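• Amdahl's law (Gene Amdahl) deals with the potential speedup of a program using multiple processors compared to a single processor.

• Let T be the total execution time of the program using a single processor, and let f be the fraction of that time spent in code that can be parallelized. Then the speedup using a parallel processor with N processors that fully exploits the parallel portion of the program is:

$\text{Speedup} = \dfrac{T}{T(1-f) + \dfrac{Tf}{N}} = \dfrac{1}{(1-f) + \dfrac{f}{N}}$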

Amdahl’s law

Amdahl’s law implications

• Consider the following two enhancements:

1. A speedup of 20 on 10% of the time, vs.

2. A speedup of 1.6 on 80% of the time
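Generalizing Amdahl's law from N processors to any enhancement that speeds up a fraction of the execution time by some factor gives the comparison directly:

```python
def amdahl_speedup(fraction: float, speedup: float) -> float:
    """Overall speedup when `fraction` of the original time is sped up by `speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup)


print("enhancement 1:", round(amdahl_speedup(0.10, 20), 3))
print("enhancement 2:", round(amdahl_speedup(0.80, 1.6), 3))
```

Enhancement 1 yields only about 1.105x because 90% of the time is untouched, while the modest enhancement 2 yields about 1.429x: improving the common case matters more than a dramatic improvement to a rare one.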


Amdahl’s law Quiz

• Consider the following processor, which is clocked at 2 GHz:

Instr type     % of time   CPI
Add integer    40%         1
Branch         20%         4
Load           30%         2
Store          10%         3

Possible improvements:
❑ Branch CPI change: 4 → 3
❑ Increase clock frequency: 2 → 2.3 GHz
❑ Store CPI change: 3 → 2

Which is best?
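One way to check the quiz, reading the percentages as fractions of execution time (as the table labels them) and assuming a faster clock leaves every CPI unchanged. A CPI change from c_old to c_new speeds up that fraction of time by c_old / c_new.

```python
def amdahl_speedup(fraction: float, speedup: float) -> float:
    """Overall speedup when `fraction` of the original time is sped up by `speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup)


options = {
    "branch CPI 4 -> 3": amdahl_speedup(0.20, 4 / 3),
    "clock 2 -> 2.3 GHz": amdahl_speedup(1.00, 2.3 / 2.0),  # affects all of the time
    "store CPI 3 -> 2": amdahl_speedup(0.10, 3 / 2),
}
for name, s in options.items():
    print(f"{name}: {s:.3f}")
print("best:", max(options, key=options.get))
```

Under these assumptions the clock increase wins (about 1.15x, versus about 1.053x for the branch change and 1.034x for the store change), because it applies to 100% of the execution time.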

Amdahl’s law
