Parallel Computer Models: CSE7002: Advanced Computer Architecture

This document describes a chapter on parallel computer models from the textbook "Advanced Computer Architecture - Parallelism, Scalability, Programmability". The chapter discusses the state of computing and milestones in computer development from the abacus to modern parallel and scalable architectures. It covers parallel computer models including multiprocessors, multicomputers, vector/SIMD machines, and architectural development tracks. Performance attributes discussed include instruction count, clock rate, cycles per instruction, memory access time, and how they impact MIPS rate and throughput.


CSE7002: Advanced Computer Architecture

Chapter 1
Parallel Computer Models
Book: “Advanced Computer Architecture – Parallelism, Scalability, Programmability”, Hwang & Jotwani

Amit Chaurasia
Assistant Professor, CEA
GLA University
[email protected]

Amit Chaurasia, Assistant Professor, CEA, GLA University 1


What is this course all about
and
Why study this course?



In this chapter…

• THE STATE OF COMPUTING


• MULTIPROCESSORS AND MULTICOMPUTERS
• MULTIVECTOR AND SIMD COMPUTERS
• PRAM AND VLSI MODELS
• ARCHITECTURAL DEVELOPMENT TRACKS



THE STATE OF COMPUTING
Computer Development Milestones
• How it all started…
o 500 BC: Abacus (China) – The earliest mechanical computer/calculating device.
• Operated to perform decimal arithmetic with carry propagation digit by digit
o 1642: Mechanical Adder/Subtractor (Blaise Pascal)
o 1827: Difference Engine (Charles Babbage)
o 1941: First binary mechanical computer (Konrad Zuse; Germany)
o 1944: Harvard Mark I (IBM)
• The first electromechanical decimal computer, proposed by Howard Aiken
• Computer Generations
o 1st 2nd 3rd 4th 5th
o Division into generations marked primarily by changes in hardware and software
technologies



THE STATE OF COMPUTING
Computer Development Milestones
• First Generation (1945 – 54)
o Technology & Architecture:
• Vacuum Tubes
• Relay Memories
• CPU driven by PC and accumulator
• Fixed Point Arithmetic
o Software and Applications:
• Machine/Assembly Languages
• Single user
• No subroutine linkage
• Programmed I/O using CPU
o Representative Systems: ENIAC, Princeton IAS, IBM 701



THE STATE OF COMPUTING
Computer Development Milestones
• Second Generation (1955 – 64)
o Technology & Architecture:
• Discrete Transistors
• Core Memories
• Floating Point Arithmetic
• I/O Processors
• Multiplexed memory access
o Software and Applications:
• High level languages used with compilers
• Subroutine libraries
• Batch processing monitor
o Representative Systems: IBM 7090, CDC 1604, Univac LARC



THE STATE OF COMPUTING
Computer Development Milestones
• Third Generation (1965 – 74)
o Technology & Architecture:
• IC Chips (SSI/MSI)
• Microprogramming
• Pipelining
• Cache
• Look-ahead processors
o Software and Applications:
• Multiprogramming and Timesharing OS
• Multiuser applications
o Representative Systems: IBM 360/370, CDC 6600, TI-ASC, PDP-8



THE STATE OF COMPUTING
Computer Development Milestones
• Fourth Generation (1975 – 90)
o Technology & Architecture:
• LSI/VLSI
• Semiconductor memories
• Multiprocessors
• Multi-computers
• Vector supercomputers
o Software and Applications:
• Multiprocessor OS
• Languages, Compilers and environment for parallel processing
o Representative Systems: VAX 9000, Cray X-MP, IBM 3090



THE STATE OF COMPUTING
Computer Development Milestones
• Fifth Generation (1991 onwards)
o Technology & Architecture:
• Advanced VLSI processors
• Scalable Architectures
• Superscalar processors
o Software and Applications:
• Systems on a chip
• Massively parallel processing
• Grand challenge applications
• Heterogeneous processing
o Representative Systems: S-81, IBM ES/9000, Intel Paragon, nCUBE 6480, MPP, VPP500

THE STATE OF COMPUTING
Elements of Modern Computers
• Computing Problems
• Algorithms and Data Structures
• Hardware Resources
• Operating System
• System Software Support
• Compiler Support



THE STATE OF COMPUTING
Evolution of Computer Architecture
• The study of computer architecture involves both of the following:
o Hardware organization
o Programming/software requirements
• The evolution of computer architecture is believed to have started
with von Neumann architecture
o Built as a sequential machine
o Executing scalar data
• Major leaps in this context came as…
o Look-ahead, parallelism and pipelining
o Flynn’s classification
o Parallel/Vector Computers
o Development Layers



THE STATE OF COMPUTING
Evolution of Computer Architecture

[Figure slides: evolution of computer architecture diagrams]


THE STATE OF COMPUTING
System Attributes to Performance
• Machine Capability and Program Behaviour
• Peak Performance
• Turnaround time
• Cycle Time, Clock Rate and Cycles Per Instruction (CPI)
• Performance Factors
o Instruction Count, Average CPI, Cycle Time, Memory Cycle Time and No. of memory cycles
• System Attributes
o Instruction Set Architecture, Compiler Technology, Processor Implementation and control,
Cache and Memory Hierarchy
• MIPS Rate, FLOPS and Throughput Rate
• Programming Environments – Implicit and Explicit Parallelism



THE STATE OF COMPUTING
Evolution of Computer Architecture

[Figure slide: evolution of computer architecture diagram]
THE STATE OF COMPUTING
System Attributes to Performance
• Processor cycle time 𝜏
• Clock rate 𝑓 = 1/𝜏
• Average no. of cycles per instruction 𝐶𝑃𝐼
• No. of instructions in program 𝐼𝑐
• CPU time 𝑇 = 𝐼𝑐 × 𝐶𝑃𝐼 × 𝜏
• Memory cycle time 𝑘𝜏 (𝑘 times the processor cycle)
• No. of processor cycles needed per instruction 𝑝
• No. of memory references needed per instruction 𝑚
• Effective CPU time 𝑇 = 𝐼𝑐 × (𝑝 + 𝑚 × 𝑘) × 𝜏
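The effective CPU time formula can be checked numerically. A minimal sketch, with illustrative parameter values that are assumptions rather than numbers from the slides:

```python
# Effective CPU time: T = Ic * (p + m*k) * tau.
# All numbers below are illustrative, not taken from the slides.
Ic = 1_000_000      # instruction count
p = 2               # processor cycles per instruction
m = 0.4             # memory references per instruction
k = 4               # memory access latency, in processor cycles
f = 500e6           # 500 MHz clock rate
tau = 1 / f         # processor cycle time in seconds
T = Ic * (p + m * k) * tau
print(f"{T * 1e3:.2f} ms")   # 3.6 cycles/instruction at 500 MHz -> 7.20 ms
```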



THE STATE OF COMPUTING
System Attributes to Performance
• MIPS Rate
  MIPS = 𝐼𝑐 / (𝑇 × 10⁶) = 𝑓 / (𝐶𝑃𝐼 × 10⁶) = (𝑓 × 𝐼𝑐) / (𝐶 × 10⁶),
  where 𝐶 = 𝐼𝑐 × 𝐶𝑃𝐼 is the total cycle count of the program
• Throughput Rate
  𝑊𝑝 = 𝑓 / (𝐼𝑐 × 𝐶𝑃𝐼) = (MIPS × 10⁶) / 𝐼𝑐 programs per second



THE STATE OF COMPUTING
System Attributes to Performance
• A benchmark program contains 450,000 arithmetic instructions,
320,000 data transfer instructions and 230,000 control transfer
instructions. Each arithmetic instruction takes 1 clock cycle to
execute, whereas each data transfer and control transfer
instruction takes 2 clock cycles. On a 400 MHz processor, determine:
o Effective no. of cycles per instruction (CPI)
o Instruction execution rate (MIPS rate)
o Execution time for this program
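A worked solution, as a sketch that applies the CPI, MIPS and CPU-time formulas directly to the numbers in the exercise:

```python
# Solution sketch for the benchmark exercise above.
n_arith, n_data, n_ctrl = 450_000, 320_000, 230_000
cycles = n_arith * 1 + (n_data + n_ctrl) * 2   # total clock cycles
Ic = n_arith + n_data + n_ctrl                 # 1,000,000 instructions
CPI = cycles / Ic                              # effective CPI
f = 400e6                                      # 400 MHz clock
mips = f / (CPI * 1e6)                         # MIPS rate
T = Ic * CPI / f                               # execution time in seconds
print(CPI, round(mips, 2), T)   # 1.55, 258.06, 0.003875 s
```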

Consider the execution of an object code with 200,000 instructions on
a 40-MHz processor. The program consists of four major types of instructions. The
instruction mix and the number of cycles (CPI) needed for each instruction type are
given below, based on the result of a program trace experiment. Calculate the average
CPI and the MIPS rate.

Instruction Type                    CPI    Instruction Mix
Arithmetic and logic                 1          60%
Load/store with cache hit            2          18%
Branch                               4          12%
Memory reference with cache miss     8          10%
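A worked solution: the average CPI is the mix-weighted sum of the per-type CPIs, and the MIPS rate follows from MIPS = f / (CPI × 10⁶). The dictionary keys are illustrative labels only:

```python
# Average CPI as a mix-weighted sum over the instruction types above.
mix = {"alu": (1, 0.60), "load_store_hit": (2, 0.18),
       "branch": (4, 0.12), "cache_miss": (8, 0.10)}
CPI = sum(cpi * frac for cpi, frac in mix.values())
f = 40e6                        # 40 MHz clock
mips = f / (CPI * 1e6)          # MIPS rate
print(round(CPI, 2), round(mips, 2))   # 2.24, 17.86
```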



THE STATE OF COMPUTING
System Attributes to Performance
Performance Factors vs. System Attributes

Performance factors: instruction count (Ic), average cycles per instruction
(CPI = p + m × k, where p is processor cycles per instruction, m is memory
references per instruction, and k is the memory access latency in cycles),
and processor cycle time (τ).

Influence of each system attribute on the performance factors
(per Hwang & Jotwani):

System Attribute                        Ic    p    m    k    τ
Instruction-set architecture            ✓     ✓
Compiler technology                     ✓     ✓    ✓
Processor implementation and control          ✓              ✓
Cache and memory hierarchy                               ✓   ✓
Multiprocessors and Multicomputers
• Shared Memory Multiprocessors
o The UMA Model
o The NUMA Model
o The COMA Model
o The CC-NUMA Model
• Distributed-Memory Multicomputers
o The NORMA Machines
o Message Passing multicomputers
• Taxonomy of MIMD Computers
• Representative Systems
o Multiprocessors: BBN TC-2000, MPP, S-81, IBM ES/9000 Model 900/VF
o Multicomputers: Intel Paragon XP/S, nCUBE/2 6480, SuperNode 1000, CM5, KSR-1



Multiprocessors and Multicomputers

[Figure slides: multiprocessor and multicomputer system organization diagrams]


Multivector and SIMD Computers
• Vector Processors
o Vector Processor Variants
• Vector Supercomputers
• Attached Processors
o Vector Processor Models/Architectures
• Register-to-register architecture
• Memory-to-memory architecture
o Representative Systems:
• Cray-1
• Cray Y-MP (2, 4, or 8 processors with 16 Gflops peak performance)
• Convex C1, C2, C3 series (C3800 family with 8 processors, 4 GB main memory, 2
Gflops peak performance)
• DEC VAX 9000 (pipeline chaining support)



Multivector and SIMD Computers

[Figure slide: multivector/SIMD computer diagram]
Multivector and SIMD Computers
• SIMD Supercomputers
o SIMD Machine Model
• S = < N, C, I, M, R >
• N: No. of PEs in the machine
• C: Set of instructions (scalar/program flow) directly executed by control unit
• I: Set of instructions broadcast by CU to all PEs for parallel execution
• M: Set of masking schemes
• R: Set of data routing functions
o Representative Systems:
• MasPar MP-1 (1024 to 16384 PEs)
• CM-2 (65536 PEs)
• DAP600 Family (up to 4096 PEs)
• Illiac-IV (64 PEs)
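The five-tuple machine model above can be made concrete with a toy simulation: a control unit broadcasts one instruction to all PEs, and a masking scheme from M selects which PEs execute it. All names and structure below are illustrative assumptions, not code from the textbook:

```python
# Toy SIMD machine in the spirit of S = <N, C, I, M, R>.
N = 8                             # N: number of PEs
registers = [0] * N               # one local register per PE

def broadcast(op, operand, mask):
    """CU broadcasts op (from instruction set I) to all PEs;
    only PEs with mask[i] == 1 execute it, in lockstep."""
    for i in range(N):
        if mask[i]:
            registers[i] = op(registers[i], operand)

# One masking scheme from M: enable only even-numbered PEs.
even_mask = [1 if i % 2 == 0 else 0 for i in range(N)]
broadcast(lambda r, x: r + x, 5, even_mask)   # parallel add on even PEs
print(registers)   # [5, 0, 5, 0, 5, 0, 5, 0]
```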

Multivector and SIMD Computers

[Figure slide: SIMD computer organization diagram]


PRAM and VLSI Models

• Parallel Random Access Machines


o Time and Space Complexities
• Time complexity
• Space complexity
• Serial and Parallel complexity
• Deterministic and Non-deterministic algorithm
o PRAM
• Developed by Fortune and Wyllie (1978)
• Objective:
o Modelling idealized parallel computers with zero synchronization or memory
access overhead
• An n-processor PRAM has a globally addressable memory



PRAM and VLSI Models

[Figure slide: PRAM model diagram]


PRAM and VLSI Models
• Parallel Random Access Machines
o PRAM Models
o PRAM Variants
• EREW-PRAM Model
• CREW-PRAM Model
• ERCW-PRAM Model
• CRCW-PRAM Model
o Discrepancy with Physical Models
• Most popular variants: EREW and CRCW
• An SIMD machine with shared memory is the closest architecture modelled by a PRAM
• A PRAM allows different instructions to be executed on different processors
simultaneously; thus, a PRAM really operates in synchronized MIMD mode with
shared memory
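The synchronized, lockstep operation can be illustrated by simulating an EREW-style parallel sum, where in each step every shared-memory cell is read and written by at most one processor. A sketch; the function name `pram_sum` and the loop structure are illustrative:

```python
import math

def pram_sum(data):
    """Simulated EREW-PRAM reduction: in step d, 'processor' i combines
    cells i and i + 2**d; each cell is touched by exactly one processor
    per step (exclusive read, exclusive write), so ~log2(n) steps suffice."""
    cells = list(data)                      # shared memory
    n = len(cells)
    steps = math.ceil(math.log2(n)) if n > 1 else 0
    for d in range(steps):
        stride = 2 ** d
        # all processors act in lockstep within one synchronized step
        for i in range(0, n, 2 * stride):
            if i + stride < n:
                cells[i] += cells[i + stride]
    return cells[0]

print(pram_sum(range(1, 9)))   # 36, computed in 3 synchronized steps
```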



PRAM and VLSI Models

• VLSI Complexity Model
o The 𝑨𝑻² Model
• Memory bound on chip area 𝑨
• I/O bound on volume 𝑨𝑻 (chip area × time)
• Bisection communication bound: cross-section area √𝑨 × 𝑻
• The square of this cross-section area, 𝑨𝑻², is used as the lower bound



Architectural Development Tracks

[Figure slides: architectural development track diagrams]
