Pipelining, Introduction To Parallel Processing and Operating System
PIPELINING
Greater performance can be achieved by taking advantage of
improvements in technology, such as faster circuitry.
In addition, organizational enhancements to the processor can
improve performance.
We have already seen some examples of this, such as the use of
multiple registers rather than a single accumulator, and the use
of a cache memory.
Another approach is instruction pipelining.
PIPELINING
In a pipeline, new inputs are accepted at one end before previously
accepted inputs appear as outputs at the other end.
Similarly, the processing of an instruction can be broken into a number of stages.
TWO-STAGE PIPELINE
The simplest arrangement divides instruction processing into two stages:
fetch instruction and execute instruction. While one instruction is being
executed, the next instruction is fetched, so the two stages overlap.
PROBLEMS WITH TWO-STAGE PIPELINE
The execution time will generally be longer than the
fetch time. Thus, the fetch stage may have to wait for
some time before it can empty its buffer.
A conditional branch instruction makes the address of
the next instruction to be fetched unknown. Thus, the
fetch stage must wait until it receives the next instruction
address from the execute stage. The execute stage may
then have to wait while the next instruction is fetched.
SIX-STAGE PIPELINE
To gain further speedup, the pipeline must have more stages.
Fetch instruction (FI): Read the next expected instruction into a
buffer.
Decode instruction (DI): Determine the opcode and the operand
specifiers.
Calculate operands (CO): Calculate the effective address of each
source operand. This may involve displacement, register indirect,
indirect, or other forms of address calculation.
Fetch operands (FO): Fetch each operand from memory.
Operands in registers need not be fetched.
Execute instruction (EI): Perform the indicated operation and
store the result, if any, in the specified destination operand
location.
Write operand (WO): Store the result in memory.
SIX-STAGE PIPELINE
With this decomposition, the various stages will be of
more nearly equal duration.
For the sake of illustration, let us assume equal duration.
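As a rough illustration, here is a minimal Python sketch (not from the slides; the function names are invented) of an ideal pipeline in which every stage takes one time unit and nothing ever stalls:

```python
# Minimal sketch: an ideal k-stage instruction pipeline where every
# stage takes exactly one time unit and no stalls ever occur.

STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]  # the six stages above

def total_time(n_instructions, n_stages):
    # The first instruction needs n_stages time units; after that the
    # pipeline completes one instruction per time unit.
    return n_stages + (n_instructions - 1)

def timetable(n_instructions, stages=STAGES):
    # Instruction i (0-based) occupies stage j at time unit i + j + 1.
    return [{s: i + j + 1 for j, s in enumerate(stages)}
            for i in range(n_instructions)]

print(total_time(9, 6))  # 14 time units, versus 9 * 6 = 54 unpipelined
print(total_time(9, 2))  # 10 time units for a two-stage pipeline
print(timetable(2)[1])   # {'FI': 2, 'DI': 3, ..., 'WO': 7}
```

Nine instructions that would take 54 time units unpipelined complete in 14; the same formula explains why the two-stage pipeline of the earlier slides gives at best a factor-of-two speedup.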
SIX-STAGE PIPELINE
Factors that limit performance
If the six stages are not of equal duration, there will be some
waiting involved at various pipeline stages
Conditional branch instructions, which invalidate several instruction
fetches
Interrupts
BRANCH PENALTY
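The slide's timing diagram is not reproduced here; as a hedged sketch of the arithmetic it illustrates, assume the branch outcome is known only at the end of the EI stage, so everything fetched behind a taken branch must be flushed:

```python
# Sketch of the branch penalty in a 6-stage pipeline, assuming a
# conditional branch is resolved only at the end of stage 5 (EI).

N_STAGES = 6
RESOLVE_STAGE = 5  # EI

def cycles(n_instructions, n_taken_branches=0):
    """Total time units, adding a flush penalty per taken branch."""
    base = N_STAGES + (n_instructions - 1)
    # While the branch moves from FI to EI, RESOLVE_STAGE - 1 younger
    # instructions enter the pipeline and must be discarded.
    penalty_per_branch = RESOLVE_STAGE - 1
    return base + penalty_per_branch * n_taken_branches

print(cycles(9))     # 14: ideal, no branches
print(cycles(9, 1))  # 18: one taken branch costs 4 extra time units
```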
THE LOGIC NEEDED FOR A SIX-STAGE INSTRUCTION PIPELINE
DEALING WITH BRANCHES
A variety of approaches have been taken for dealing with
conditional branches:
Multiple streams
Prefetch branch target
Loop buffer
Branch prediction
Delayed branch
MULTIPLE STREAMS
A simple pipeline suffers a penalty for a branch
instruction because it must choose one of two
instructions to fetch next and may make the wrong
choice. A brute-force approach is to replicate the initial
portions of the pipeline and allow the pipeline to fetch
both instructions, making use of two streams.
There are two problems with this approach:
With multiple pipelines there are contention delays for access
to the registers and to memory.
Additional branch instructions may enter the pipeline (either
stream) before the original branch decision is resolved. Each
such instruction needs an additional stream.
PREFETCH BRANCH TARGET
When a conditional branch is recognized, the target of
the branch is prefetched, in addition to the instruction
following the branch.
This target is then saved until the branch instruction is
executed. If the branch is taken, the target has already
been prefetched.
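A minimal Python sketch of this policy (illustrative; the fetch function and the addresses are invented):

```python
def fetch(address):
    return f"instr@{address}"   # stand-in for a memory fetch

def on_conditional_branch(branch_addr, target_addr):
    # Prefetch both the fall-through instruction and the branch target,
    # and keep the target on hand until the branch is executed.
    fall_through = fetch(branch_addr + 1)
    target = fetch(target_addr)
    return fall_through, target

def resolve(taken, fall_through, target):
    # If the branch is taken, the target has already been prefetched.
    return target if taken else fall_through

ft, tgt = on_conditional_branch(100, 160)
print(resolve(True, ft, tgt))   # instr@160, with no extra fetch delay
```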
LOOP BUFFER
A loop buffer is a small, very-high-speed memory
maintained by the instruction fetch stage of the pipeline
and containing the n most recently fetched instructions, in
sequence.
If a branch is to be taken, the hardware first checks
whether the branch target is within the buffer. If so, the
next instruction is fetched from the buffer.
Benefits:
Reduces memory accesses
Useful for the rather common occurrence of IF–THEN and IF–THEN–ELSE sequences
Well suited to dealing with loops, or iterations; hence the name loop buffer
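A toy Python sketch of the loop-buffer check (the buffer size and address scheme are assumptions for illustration):

```python
from collections import deque

class LoopBuffer:
    """Toy loop buffer: holds the n most recently fetched
    instructions, in sequence, keyed by their addresses."""

    def __init__(self, n=8):
        self.buffer = deque(maxlen=n)   # (address, instruction) pairs

    def record_fetch(self, address, instruction):
        self.buffer.append((address, instruction))

    def lookup(self, target_address):
        """On a taken branch, check the buffer before going to memory."""
        for address, instruction in self.buffer:
            if address == target_address:
                return instruction      # hit: no memory access needed
        return None                     # miss: fetch from memory instead

buf = LoopBuffer(n=4)
for addr in (100, 104, 108, 112):
    buf.record_fetch(addr, f"instr@{addr}")
print(buf.lookup(104))  # hit: a backward branch into the buffered loop body
print(buf.lookup(200))  # miss: target lies outside the buffer
```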
BRANCH PREDICTION (1)
Predict never taken
Assume that the jump will not happen
Always fetch the next sequential instruction
Used by the Motorola 68020 and the VAX-11/780
The VAX will not prefetch after a branch if a page fault would
result (an O/S versus CPU-design trade-off)
Predict always taken
Assume that the jump will happen
Always fetch the target instruction
BRANCH PREDICTION (2)
Predict by Opcode
Some instructions are more likely to result in a jump than
others
Can get up to 75% success
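The three static strategies can be compared on a made-up branch trace; the trace and the set of "likely taken" opcodes below are invented inputs, so the accuracies are only illustrative:

```python
# Toy comparison of three static prediction strategies.
# Each trace entry is (opcode, actually_taken); both are invented.

TRACE = [("BEQ", True), ("BNE", False), ("LOOP", True),
         ("LOOP", True), ("BEQ", False), ("LOOP", True)]

LIKELY_TAKEN = {"LOOP"}  # assumption: loop-closing branches usually jump

def accuracy(predict):
    hits = sum(predict(op) == taken for op, taken in TRACE)
    return hits / len(TRACE)

print(accuracy(lambda op: False))               # predict never taken: 0.33
print(accuracy(lambda op: True))                # predict always taken: 0.67
print(accuracy(lambda op: op in LIKELY_TAKEN))  # predict by opcode: 0.83
```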
BRANCH PREDICTION (3)
Delayed Branch
Do not take the jump until you have to
Rearrange instructions so that useful work fills the slot after the
branch (a sketch follows)
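A toy Python executor for a one-slot delayed branch (the instruction encoding is invented): the instruction immediately after a branch always executes, so the compiler tries to put useful work there instead of a NOP:

```python
# Toy executor for a one-slot delayed branch: the instruction right
# after a branch (the delay slot) always runs before control moves.
# Assumes no program ends with a branch.

def run(program):
    executed, pc = [], 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "BRANCH":
            executed.append(program[pc + 1])  # delay slot always executes
            pc = arg                          # then control transfers
        else:
            executed.append((op, arg))
            pc += 1
    return executed

# A naive schedule wastes the delay slot on a NOP ...
naive = [("ADD", "r1"), ("BRANCH", 4), ("NOP", None),
         ("SUB", "r2"), ("HALT", None)]
# ... while the rearranged schedule does useful work in the slot.
scheduled = [("BRANCH", 3), ("ADD", "r1"), ("SUB", "r2"), ("HALT", None)]

print(run(naive))      # executes ADD, NOP, HALT (one wasted slot)
print(run(scheduled))  # executes ADD, HALT (same work, no wasted slot)
```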
BRANCH PREDICTION FLOWCHART
PARALLEL PROCESSING
A traditional way to increase system performance is
to use multiple processors that can execute in parallel to
support a given workload.
The two most common multiple-processor
organizations are symmetric multiprocessors (SMPs) and
clusters.
More recently, nonuniform memory access (NUMA)
systems have been introduced commercially.
Symmetric multiprocessors
An SMP consists of multiple similar processors within
the same computer, interconnected by a bus or some sort
of switching arrangement.
Chip multiprocessing:
When more than one processor is implemented on a single chip,
the configuration is referred to as chip multiprocessing.
Multithreaded processor:
A related design scheme is to replicate some of the
components of a single processor so that the processor can
execute multiple threads concurrently; this is known as a
multithreaded processor.
Cluster:
A cluster is a group of interconnected, whole computers
working together as a unified computing resource that can
create the illusion of being one machine.
The term whole computer means a system that can run on
its own, apart from the cluster.
Non-uniform memory access
A NUMA system is a shared-memory multiprocessor in
which the access time from a given processor to a word
in memory varies with the location of the memory word.
MULTIPLE PROCESSOR ORGANIZATION
SINGLE INSTRUCTION, SINGLE DATA STREAM - SISD
A single processor executes a single instruction stream
Data is stored in a single memory
Uniprocessors fall into this category
SINGLE INSTRUCTION, MULTIPLE DATA STREAM - SIMD
A single machine instruction controls the simultaneous execution
of a number of processing elements on a lockstep basis
Each processing element has an associated data memory
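As a loose software analogy (NumPy, not from the slides), a single vector operation applies the same instruction to many data elements at once, which is the spirit of SIMD lockstep execution:

```python
import numpy as np

# One operation, many data elements: each position behaves like a
# processing element applying the same instruction to its own datum.
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])
print(a + b)  # [11 22 33 44] -- a single vector add over four lanes
```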
MULTIPLE INSTRUCTION, SINGLE DATA STREAM - MISD
A sequence of data is transmitted to a set of processors
Each processor executes a different instruction sequence
MULTIPLE INSTRUCTION, MULTIPLE DATA STREAM- MIMD
A set of processors simultaneously executes different instruction
sequences on different sets of data
SMPs, clusters, and NUMA systems fall into this category (a loose
software analogy follows)
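As a loose software analogy (Python multiprocessing, not from the slides), independent processes running different functions over different data approximate the MIMD model:

```python
from multiprocessing import Process

# Two "processors", different instruction sequences, different data:
# the essence of MIMD, approximated with OS processes.
def summer(data):
    print("sum:", sum(data))

def scaler(data):
    print("scaled:", [x * 2 for x in data])

if __name__ == "__main__":
    p1 = Process(target=summer, args=([1, 2, 3],))
    p2 = Process(target=scaler, args=([4, 5, 6],))
    p1.start(); p2.start()
    p1.join(); p2.join()
```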
TAXONOMY OF PARALLEL PROCESSOR ARCHITECTURES
OPERATING SYSTEM
Operating Systems Definition: The set of software
products that jointly controls the system resources and
processes using these resources on a computer.
An operating system (OS) is a software program that
manages the hardware and software resources of a
computer.
Layers and Views of a Computer System:
Operating System Services:
Program creation
Program execution
Access to I/O devices
Controlled access to files
System access
Error detection and response
Accounting
Kernel:
An operating system generally consists of two parts:
• Kernel space (kernel mode) and
• User space (user mode).
The kernel is the smallest and most central component of an
operating system.
Its services include managing memory and devices, and providing
an interface through which software applications use system
resources.
Additional services, such as protection of programs and
multitasking, may be included depending on the architecture of
the operating system.
There are three broad categories of kernel models
available, namely:
I. Monolithic kernel
II. Microkernel
III. Exokernel
Why Operating Systems?
Operating systems provide a layer of abstraction
between the programmer and the machine
It screens the complexities of the computer from the
programmer
It allows the computer to be treated as a virtual machine
Operating system types:
Real-time:
A real-time operating system is a multitasking operating
system that aims at executing real-time applications.
Distributed:
A distributed operating system manages a group of
independent computers and makes them appear to be a
single computer.
RELATIVE PROCESSING POWER
Determined by:
Number of cores
Clock speed (in GHz)
Size of cache
Intel technologies such as Turbo Boost and Hyper-Threading
NUMBER OF CORES
The more cores there are, the more tasks (known as
threads) can be served at the same time.
The lowest number of cores is found in Core i3 CPUs: currently,
all Core i3s are dual-core processors.
INTEL TURBO BOOST
The Intel Turbo Boost Technology allows a processor to
dynamically increase its clock speed whenever the need
arises.
The maximum amount that Turbo Boost can raise clock
speed at any given time is dependent on the number of
active cores, the estimated current consumption, the
estimated power consumption, and the processor
temperature.
Because none of the Core i3 CPUs have Turbo Boost, the
i5-4570T can outrun them whenever it needs to.
CACHE SIZE
Whenever the CPU finds that it keeps using the same data over and
over, it stores that data in its cache.
With a larger cache, more data can be accessed quickly (a software
analogy follows).
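As a software analogy (Python's functools.lru_cache, not from the slides), a cache keeps recently used results close at hand, and a larger one evicts less often:

```python
from functools import lru_cache

@lru_cache(maxsize=2)        # analogy: a tiny cache holding two entries
def slow_lookup(address):
    print(f"miss: fetching {address} the slow way")
    return address * 10      # stand-in for an expensive memory access

slow_lookup(1); slow_lookup(2)
slow_lookup(1)               # hit: served from the cache, no 'miss' printed
slow_lookup(3)               # evicts the least recently used entry (2)
slow_lookup(2)               # miss again: the small cache already evicted it
```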
The End!
Thank You So Much!