Module 1 - Overview and Computer System
Overview: Introduction to Computer Architecture and Organization
Architecture & Organization 1
Architecture is those attributes visible to the programmer
Instruction set, number of bits used to represent data types, I/O mechanisms, addressing techniques
e.g., Is there a multiply instruction?
Organization is how the architectural features are implemented
Control signals, interfaces, memory technology
e.g., Is multiplication done by a hardware multiply unit or by repeated addition?
Architecture & Organization 2
All Intel x86 family share the same basic
architecture
The IBM System/370 family share the
same basic architecture
This gives code compatibility, at least backwards compatibility
Organization differs between different
versions
Structure & Function
Structure is the way in which
components relate to each other
Function is the operation of individual
components as part of the structure
Function
All basic computer functions are:
Data processing – processing data in a wide variety of forms
Data storage – short- and long-term storage of data and instructions
Data movement – moving data between the computer and the outside world
Control – controlling the three functions above
Functional View
(Figure: the four possible operations: (a) data movement (device operation), (b) data storage (device operation, read and write), (c) processing from/to storage, (d) processing from storage to I/O)
Structure - Top Level - 1
(Figure: top-level structure: the Computer, containing the Central Processing Unit, Main Memory, I/O, and the Systems Interconnection, linked to Peripherals and Communication lines)
Structure - Top Level - 2
Central Processing Unit (CPU) – controls
operation of the computer and performs data
processing functions
Main memory – stores data and instructions
I/O – moves data between computer and
external environment
System interconnection – provides
communication between CPU, main memory
and I/O
Structure - The CPU - 1
(Figure: CPU structure: Registers, Arithmetic and Logic Unit, Control Unit, and the Internal CPU Interconnection; the System Bus links the CPU to the computer's memory and I/O)
Structure - The CPU - 2
Control unit (CU) – controls the operation of the CPU
Arithmetic and logic unit (ALU) – performs the data processing functions
Registers – provide internal storage to the CPU
CPU interconnection – provides communication between the control unit (CU), ALU and registers
Structure - The Control Unit
(Figure: control unit structure: Sequencing Logic, Control Unit Registers and Decoders, and Control Memory, connected to the rest of the CPU (ALU, registers) over the Internal Bus)
IAS computer (Princeton Institute for Advanced Studies) – completed 1952
Structure of von Neumann
machine
Von Neumann Architecture
Data and instruction are stored in a
single read-write memory
The contents of this memory is
addressable by location
Execution occurs in a sequential fashion
from one instruction to the next
IAS – details-1
Memory of 1000 x 40-bit words; each word holds either:
a binary number (number word), or
two 20-bit instructions (instruction word)
IAS – details-2
Set of registers (storage in CPU)
Memory Buffer Register (MBR) – contains a word to be stored in memory, or is used to receive a word from memory
Program Counter (PC) – contains the address of the next instruction to be fetched from memory
IBM
Punched-card processing equipment
1953 - the 701
IBM’s first stored program computer
Scientific calculations
1955 - the 702
Business applications
Led to the 700/7000 series
Second Generation:
Transistors
Replaced vacuum tubes
Smaller
Cheaper
Less heat dissipation
Solid State device
Made from Silicon (Sand)
Invented 1947 at Bell Labs
William Shockley et al.
Transistor Based Computers
Second generation machines
More complex arithmetic and logic unit (ALU) and control unit (CU)
Use of high level programming languages
NCR & RCA produced small transistor
machines
IBM 7000
DEC - 1957
Produced PDP-1
The Third Generation: Integrated Circuits
Microelectronics
Literally - “small electronics”
A computer is made up of gates, memory cells and
interconnections
Data storage-provided by memory cells
Data processing-provided by gates
Data movement – the paths between components that are used to move data
Control – the paths between components that carry control signals
These can be manufactured on a semiconductor
e.g. silicon wafer
Integrated Circuits
Early integrated circuits – known as small scale integration (SSI)
Moore’s Law
Increased density of components on chip (see chart)
Gordon Moore – co-founder of Intel
Number of transistors on a chip will double every year (see chart)
Since the 1970s development has slowed a little: the number of transistors now doubles every 18 months
Increasing speed
Increased cost
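The doubling rule above lends itself to a quick back-of-the-envelope calculation. A minimal sketch (the function name and starting count are illustrative, not from the slides):

```python
# Projected transistor count under the 18-month doubling rule of thumb.
def transistors(initial_count, years, doubling_period_years=1.5):
    """Return the projected count after `years` of exponential doubling."""
    return initial_count * 2 ** (years / doubling_period_years)

# Starting from 2,300 transistors (roughly the Intel 4004), 15 years of
# 18-month doublings gives 10 doublings:
print(transistors(2300, years=15))  # 2300 * 2**10 = 2,355,200
```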
1971 - 4004: Intel's first microprocessor, a 4-bit design
1974 - 8080
Intel's first general purpose microprocessor (8-bit)
Pentium
Superscalar technique-multiple instructions executed in
parallel
Pentium Pro
Increased superscalar organization
branch prediction
speculative execution
Pentium Evolution - 3
Pentium II
MMX technology
Pentium III
Additional floating point instructions for 3D graphics
Pentium 4
Note Arabic rather than Roman numerals
Itanium
64 bit
see chapter 15
Itanium 2
Hardware enhancements to increase speed
Performance Balance: Memory
Cache on the processor chip
Hierarchy of buses
Performance Balance: I/O
Devices
Peripherals with intensive I/O demands-refer chart
Large data throughput demands-refer chart
Processors can handle this I/O process, but the problem
is moving data between processor and devices
Solutions:
Caching
Buffering
Multiple-processor configurations
The key is balance: designers must balance the system while taking into account two factors:
The rate at which performance is changing in the
various technology areas (processor, busses,
memory, peripherals) differs greatly from one type
of element to another
New applications and new peripheral devices
constantly change the nature of demand on the
system in term of typical instruction profile and
the data access patterns
Improvements in Chip
Organization and Architecture
Increase hardware speed of processor
Fundamentally due to shrinking logic gate size
Parallelism
Problems with Clock Speed
and Logic Density
Power
Power density increases with density of logic and clock speed
Dissipating heat
RC delay
Speed at which electrons can flow is limited by the resistance and capacitance of the wires connecting them; delay increases as the RC product increases
Memory latency
Memory speeds lag processor speeds
Solution:
More emphasis on organizational and architectural approaches
Intel Microprocessor
Performance
Approach 1: Increased Cache
Capacity
Typically two or three levels of cache between
processor and main memory (L1, L2, L3)
Chip density increased
More cache memory on chip
Faster cache access
Pentium chip devoted about 10% of chip area
to cache
Pentium 4 devotes about 50%
Approach 2: More Complex
Execution Logic
Enable parallel execution of instructions
Two approaches introduced:
Pipelining
Superscalar
Diminishing returns: further gains from these techniques are likely to be relatively modest
Benefits from cache are reaching a limit
Increasing clock rate runs into the power dissipation problem
Some fundamental physical limits are being reached
New Approach – Multiple
Cores
Multiple processors on single chip
With large shared cache
Within a processor, increase in performance proportional
to square root of increase in complexity
If software can use multiple processors, doubling number
of processors almost doubles performance
So, use two simpler processors on the chip rather than
one more complex processor
With two processors, larger caches are justified
Power consumption of memory logic less than processing logic
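The trade-off above (single-core performance growing only with the square root of complexity, while a second core can nearly double performance for parallel software) can be sketched numerically. A toy model, assuming the square-root rule exactly as stated on the slide:

```python
import math

# Slide's rule of thumb: single-core performance ~ sqrt(complexity).
def single_core_perf(complexity):
    return math.sqrt(complexity)

# Option 1: spend a 2x complexity budget on one bigger core.
one_big_core = single_core_perf(2.0)          # ~1.41x baseline

# Option 2: two baseline cores; assuming the software parallelizes well,
# performance nearly doubles.
two_small_cores = 2 * single_core_perf(1.0)   # ~2.0x baseline

print(one_big_core, two_small_cores)  # two simpler cores win under this model
```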
Example: IBM POWER4
Two cores based on PowerPC
POWER4 Chip Organization
Module 1
Computer System: Designing and
Understanding Performance
(Book: Computer Organization and Design, 3rd ed., David A. Patterson and John L. Hennessy, Morgan Kaufmann Publishers)
Introduction
Hardware performance is often key to the effectiveness of an entire
system of hardware and software
For different types of applications, different performance metrics may be appropriate, and different aspects of a computer system may be the most significant in determining overall performance
Understanding how best to measure performance and limitations of
performance is important when selecting a computer system
To understand the issues of assessing performance:
Why does a piece of software perform as it does?
Why can one instruction set be implemented to perform better than another?
How does some hardware feature affect performance?
Defining Performance - 1
Performance is defined as the reciprocal of execution time: Performance = 1 / Execution time
To avoid the potential confusion between the terms increasing and decreasing, we usually say "improve performance" or "improve execution time"
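Under the reciprocal definition, relative performance reduces to a ratio of execution times. A small illustration (the times are made-up values):

```python
# Performance is the reciprocal of execution time.
def performance(execution_time_s):
    return 1.0 / execution_time_s

# "X is n times faster than Y" means performance(X) / performance(Y) = n,
# which is the same as execution_time(Y) / execution_time(X).
time_x, time_y = 10.0, 15.0   # seconds (illustrative)
n = performance(time_x) / performance(time_y)
print(n)  # 1.5 -> X is 1.5 times faster than Y
```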
Measuring Performance
Time – measure of computer performance
Definition of time - Wall-clock time, response time, or elapsed time
CPU execution time (or CPU time)
The time the CPU spends computing for this task and does not include
time spent waiting for I/O or running other programs
User CPU time vs. system CPU time
Clock cycle time (e.g., 0.25 ns) vs. clock rate (e.g., 4 GHz)
Different applications are sensitive to different aspects of the performance of
a computer system
Many applications, especially those running on servers, depend as much on I/O performance as on CPU performance; for these, total elapsed time measured by a wall clock is the measure of interest
In some application environments, the user may care about throughput,
response time, or a complex combination of the two (e.g., maximum
throughput with a worst-case response time)
CPU Execution Time
A simple formula relates the most basic metrics (clock cycles and clock cycle time) to CPU time:
CPU execution time for a program = CPU clock cycles for a program x Clock cycle time = CPU clock cycles for a program / Clock rate
Improving Performance
Our favorite program runs in 10 seconds on computer A, which has a 4 GHz clock. We want computer B to run this program in 6 seconds, but computer B requires 1.2 times as many clock cycles as computer A for this program. What clock rate must computer B have?
CPU clock cycles(A) = 10 s x 4 x 10^9 cycles/s = 40 x 10^9 cycles
CPU time(B) = 1.2 x CPU clock cycles(A) / Clock rate(B)
6 s = 1.2 x 40 x 10^9 cycles / Clock rate(B)
Clock rate(B) = 48 x 10^9 cycles / 6 s
Clock rate(B) = 8 x 10^9 cycles/s = 8 GHz
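The arithmetic above can be checked directly; a short script reproducing the slide's numbers:

```python
# Computer A: 10 s at 4 GHz. Computer B must run the program in 6 s and
# needs 1.2x as many clock cycles as A. Find B's clock rate.
clock_rate_a = 4e9           # cycles per second (4 GHz)
cpu_time_a = 10.0            # seconds
cpu_time_b = 6.0             # seconds
cycle_ratio = 1.2            # cycles(B) / cycles(A)

cycles_a = cpu_time_a * clock_rate_a      # 40e9 cycles
cycles_b = cycle_ratio * cycles_a         # 48e9 cycles
clock_rate_b = cycles_b / cpu_time_b      # cycles per second
print(clock_rate_b / 1e9, "GHz")          # 8.0 GHz
```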
Clock Cycles Per Instruction (CPI)
The execution time must depend on the number of instructions in a program and the average time per instruction:
CPU clock cycles = Instructions for a program x Average clock cycles per instruction (CPI)
CPI provides one way of comparing two different implementations of the same instruction set architecture, since the instruction count for a program is the same
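Combining CPI with the earlier formula gives the classic performance equation, CPU time = instruction count x CPI / clock rate. A sketch comparing two hypothetical implementations of the same ISA (all numbers are illustrative, not measured values):

```python
# Classic performance equation: CPU time = instructions * CPI / clock rate.
def cpu_time(instruction_count, cpi, clock_rate_hz):
    return instruction_count * cpi / clock_rate_hz

# Same program (hence same instruction count) on two implementations
# of one instruction set architecture.
instructions = 1e9
time_1 = cpu_time(instructions, cpi=2.0, clock_rate_hz=4e9)  # 0.5 s
time_2 = cpu_time(instructions, cpi=1.2, clock_rate_hz=2e9)  # 0.6 s
print(time_1, time_2)  # the higher-CPI machine wins via its faster clock
```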