Pipelining, Introduction To Parallel Processing and Operating System

The document discusses pipelining as a technique to improve processor performance by overlapping the execution of instructions, and describes different pipeline stages and designs including two-stage, six-stage, and dealing with branches. It also introduces parallel processing concepts such as symmetric multiprocessors, clusters, and non-uniform memory access systems. Finally, it categorizes different multiple processor organizations including single instruction single data, single instruction multiple data, multiple instruction single data, and multiple instruction multiple data architectures.

Uploaded by

rajkumar

PIPELINING, INTRODUCTION TO PARALLEL PROCESSING

AND OPERATING SYSTEM

1
PIPELINING
 Greater performance can be achieved by taking advantage of
improvements in technology, such as faster circuitry.
 In addition, organizational enhancements to the processor can
improve performance.
 We have already seen some examples of this, such as the use of
multiple registers rather than a single accumulator, and the use
of a cache memory.
 Another approach is instruction pipelining.

2
PIPELINING
 New inputs are accepted at one end before previously
accepted inputs appear as outputs at the other end.
 An instruction is processed in a number of stages.

 A two-stage instruction pipeline has two stages:


 Fetch
 Execute
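The fetch/execute overlap can be sketched in a few lines of Python (a minimal illustration, not from the slides; the instruction labels and the assumption that both stages take one equal time unit are hypothetical):

```python
# Sketch of a two-stage pipeline: while instruction i executes,
# instruction i+1 is fetched. Both stages are assumed to take
# one time unit each.
def two_stage_timeline(n_instructions):
    timeline = []  # (time, instruction being fetched, instruction executing)
    for t in range(n_instructions + 1):
        fetching = t if t < n_instructions else None
        executing = t - 1 if t >= 1 else None
        timeline.append((t, fetching, executing))
    return timeline

for t, f, e in two_stage_timeline(3):
    print(f"t={t}: fetch I{f} | execute I{e}")
```

With the overlap, 3 instructions finish in 4 time units instead of the 6 a purely sequential fetch-then-execute machine would need.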

3
TWO-STAGE PIPELINING

4
PROBLEMS WITH TWO-STAGE PIPELINE
 The execution time will generally be longer than the
fetch time. Thus, the fetch stage may have to wait for
some time before it can empty its buffer.
 A conditional branch instruction makes the address of
the next instruction to be fetched unknown. Thus, the
fetch stage must wait until it receives the next instruction
address from the execute stage. The execute stage may
then have to wait while the next instruction is fetched.

5
SIX-STAGE PIPELINE
 To gain further speedup, the pipeline must have more stages.
 Fetch instruction (FI): Read the next expected instruction into a
buffer.
 Decode instruction (DI): Determine the opcode and the operand
specifiers.
 Calculate operands (CO): Calculate the effective address of each
source operand. This may involve displacement, register indirect,
indirect, or other forms of address calculation.
 Fetch operands (FO): Fetch each operand from memory.
Operands in registers need not be fetched.
 Execute instruction (EI): Perform the indicated operation and
store the result, if any, in the specified destination operand
location.
 Write operand (WO): Store the result in memory.
6
SIX-STAGE PIPELINE
 With this decomposition, the various stages will be of
more nearly equal duration.
 For the sake of illustration, let us assume equal duration.

 Using this assumption, the figure below shows that a six-stage
pipeline can reduce the execution time for 9 instructions from
54 time units to 14 time units.
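The 54-to-14 figure follows from the standard pipeline timing relation: with k stages of equal duration and n instructions, an unpipelined processor needs n × k time units, while the pipeline needs k + (n − 1). A quick Python check restating the slide's arithmetic:

```python
# Pipeline timing: k stages of equal duration, n instructions.
def unpipelined_time(k, n):
    return n * k            # every instruction uses all k stages in turn

def pipelined_time(k, n):
    return k + (n - 1)      # first instruction fills the pipe (k units),
                            # then one instruction completes per unit

k, n = 6, 9
print(unpipelined_time(k, n))  # 54 time units
print(pipelined_time(k, n))    # 14 time units
```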

7
SIX-STAGE PIPELINE

8
SIX-STAGE PIPELINE
 Factors that limit performance
 If the six stages are not of equal duration, there will be some
waiting involved at various pipeline stages.
 A conditional branch instruction may invalidate several
instruction fetches.
 Interrupts

9
BRANCH PENALTY

10
THE LOGIC NEEDED FOR
SIX STAGE
INSTRUCTION PIPELINE

11
DEALING WITH BRANCHES
 A variety of approaches have been taken for dealing with
conditional branches:
 Multiple streams
 Prefetch branch target
 Loop buffer
 Branch prediction
 Delayed branch

12
MULTIPLE STREAMS
 A simple pipeline suffers a penalty for a branch
instruction because it must choose one of two
instructions to fetch next and may make the wrong
choice. A brute-force approach is to replicate the initial
portions of the pipeline and allow the pipeline to fetch
both instructions, making use of two streams.
 There are two problems with this approach:
 With multiple pipelines there are contention delays for access
to the registers and to memory.
 Additional branch instructions may enter the pipeline (either
stream) before the original branch decision is resolved. Each
such instruction needs an additional stream.
13
PREFETCH BRANCH TARGET
 When a conditional branch is recognized, the target of
the branch is prefetched, in addition to the instruction
following the branch.
 This target is then saved until the branch instruction is
executed. If the branch is taken, the target has already
been prefetched.

14
LOOP BUFFER
 A loop buffer is a small, very-high-speed memory
maintained by the instruction fetch stage of the pipeline
and containing the n most recently fetched instructions, in
sequence.
 If a branch is to be taken, the hardware first checks
whether the branch target is within the buffer. If so, the
next instruction is fetched from the buffer.
 Benefits:
 Reduces memory accesses
 Useful for the rather common occurrence of IF–THEN and
IF–THEN–ELSE sequences
 Well suited to dealing with loops, or iterations; hence the name
loop buffer
15
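A loop buffer can be sketched as a small fixed-size queue of recently fetched instruction addresses (a hypothetical Python illustration; the addresses and buffer size are made up):

```python
from collections import deque

# Minimal loop-buffer sketch: keep the n most recently fetched
# instruction addresses; on a taken branch, check the buffer before
# going to memory.
class LoopBuffer:
    def __init__(self, n):
        self.addresses = deque(maxlen=n)  # oldest entry evicted automatically

    def record_fetch(self, addr):
        self.addresses.append(addr)

    def hit(self, target):
        return target in self.addresses

buf = LoopBuffer(n=4)
for addr in (100, 104, 108, 112):   # fetching a small loop body
    buf.record_fetch(addr)
print(buf.hit(104))  # True: backward branch lands inside the buffer
print(buf.hit(200))  # False: target must be fetched from memory
```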
BRANCH PREDICTION (1)
 Predict never taken
 Assume that the jump will not happen
 Always fetch the next instruction
 Used by the Motorola 68020 and the VAX-11/780
 The VAX will not prefetch after a branch if a page fault
would result (an O/S versus CPU design trade-off)
 Predict always taken
 Assume that the jump will happen
 Always fetch the target instruction

16
BRANCH PREDICTION (2)
 Predict by opcode
 Some instructions are more likely to result in a jump than
others
 Can achieve up to 75% success

 Taken/not taken switch


 Based on previous history
 Good for loops
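A taken/not-taken switch is commonly realized as a 2-bit saturating counter per branch, so the prediction flips only after two consecutive mispredictions, which suits loops. A minimal sketch (the branch history shown is hypothetical):

```python
# 2-bit saturating-counter branch predictor: states 0-1 predict
# "not taken", states 2-3 predict "taken". Two wrong guesses in a
# row are needed to flip the prediction.
class TwoBitPredictor:
    def __init__(self):
        self.counter = 0

    def predict(self):
        return self.counter >= 2  # True means "predict taken"

    def update(self, taken):
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

p = TwoBitPredictor()
history = [True, True, True, False, True, True]  # loop iterations, one exit
correct = 0
for actual in history:
    correct += (p.predict() == actual)
    p.update(actual)
print(correct)  # 3 of 6 correct: two warm-up misses plus the single loop exit
```

After the warm-up, the predictor stays in the "taken" states across the single not-taken exit, which is exactly the loop behavior the slide alludes to.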

17
BRANCH PREDICTION (3)
 Delayed branch
 Do not take the jump until you have to
 Rearrange instructions so useful work is done while the
branch resolves

18
BRANCH PREDICTION FLOWCHART

19
PARALLEL PROCESSING
A traditional way to increase system performance is
to use multiple processors that can execute in parallel to
support a given workload.
The two most common multiple-processor
organizations are symmetric multiprocessors (SMPs) and
clusters.
More recently, nonuniform memory access (NUMA)
systems have been introduced commercially.

20
Symmetric multiprocessors
An SMP consists of multiple similar processors within
the same computer, interconnected by a bus or some sort
of switching arrangement.

 The most critical problem to address in an SMP is
that of cache coherence.

 Each processor has its own cache, and so it is possible
for a given line of data to be present in more than one
cache.
21
If such a line is altered in one cache, then both
main memory and the other cache have an invalid
version of that line. Cache coherence protocols are
designed to cope with this problem.

Chip multiprocessing:
When more than one processor is implemented on a
single chip, the configuration is referred to as chip
multiprocessing.

22
Multithreaded processor:
A related design scheme is to replicate some of the
components of a single processor so that the processor can
execute multiple threads concurrently; this is known as a
multithreaded processor.

Cluster:
A cluster is a group of interconnected, whole computers
working together as a unified computing resource that can
create the illusion of being one machine.
The term whole computer means a system that can run on
its own, apart from the cluster.
23
Non-uniform memory access

A NUMA system is a shared-memory multiprocessor in
which the access time from a given processor to a word
in memory varies with the location of the memory word.

24
MULTIPLE PROCESSOR ORGANIZATION

I. Single instruction, single data stream - SISD


II. Single instruction, multiple data stream - SIMD
III. Multiple instruction, single data stream - MISD
IV. Multiple instruction, multiple data stream- MIMD

25
SINGLE INSTRUCTION, SINGLE DATA
STREAM - SISD
 Single processor
 Single instruction stream
 Data stored in single memory
 Uni-processor

26
SINGLE INSTRUCTION, MULTIPLE DATA
STREAM - SIMD
 A single machine instruction controls the simultaneous
execution of a number of processing elements on a
lockstep basis
 Each processing element has an associated data memory

 Each instruction is executed on a different set of data by
different processors
 Vector and array processors

27
MULTIPLE INSTRUCTION, SINGLE
DATA STREAM - MISD
 A sequence of data is transmitted to a set of processors
 Each processor executes a different instruction sequence

 Has never been implemented

28
MULTIPLE INSTRUCTION, MULTIPLE DATA STREAM- MIMD
 A set of processors simultaneously executes different
instruction sequences on different sets of data
 SMPs, clusters, and NUMA systems

29
TAXONOMY OF PARALLEL PROCESSOR
ARCHITECTURES

30
OPERATING SYSTEM
Operating Systems Definition: The set of software
products that jointly controls the system resources and
processes using these resources on a computer.

It can be thought of as having two objectives:


Convenience: An OS makes a computer more
convenient to use.
Efficiency: An OS allows the computer system resources
to be used in an efficient manner.

31
An operating system (OS) is a software program that
manages the hardware and software resources of a
computer.

The OS performs basic tasks, such as:


 Controlling and allocating memory
 Prioritizing the processing of instructions
 Controlling input and output devices
 Facilitating networking
 Managing files.

32
Layers and Views of a Computer System:

33
Operating System Services:
 Program creation
 Program execution
 Access to I/O devices
 Controlled access to files
 System access
 Error detection and response
 Accounting

34
Kernel:
An operating system generally consists of two parts:
• Kernel space (kernel mode) and
• User space (user mode).
The kernel is the smallest and most central component of
an operating system.
Its services include managing memory and devices, and
providing an interface for software applications to
use system resources.
Additional services, such as managing protection of
programs and multitasking, may be included depending
on the architecture of the operating system.
35
There are three broad categories of kernel models
available, namely:
I. Monolithic kernel
II. Microkernel
III. Exokernel

36
Why Operating Systems?
Operating systems provide a layer of abstraction
between the programmer and the machine.
They screen the complexities of the computer from the
programmer.
They allow the computer to be treated as a virtual machine.

37
Operating system types:

Real-time:
A real-time operating system is a multitasking operating
system that aims at executing real-time applications.

 Real-time operating systems often use specialized


scheduling algorithms so that they can achieve a
deterministic nature of behavior.

 The main objective of real-time operating systems is
their quick and predictable response to events.
38
Multi-user:
A multi-user operating system allows multiple users to
access a computer system concurrently.

 Time-sharing systems can be classified as multi-user
systems, as they enable multiple users to access a
computer through the sharing of time.

 Single-user operating systems, as opposed to multi-user
operating systems, are usable by only one user at a
time.
39
Being able to use multiple accounts on a Windows
operating system does not make it a multi-user system.
For a UNIX-like operating system, it is possible for two
users to log in at the same time, and this capability
makes it a multi-user operating system.

Below are some examples of multi-user operating


systems.
• Linux
• Unix
• Windows 2000

40
Distributed:
A distributed operating system manages a group of
independent computers and makes them appear to be a
single computer.

The development of networked computers that could


be linked and communicate with each other gave rise to
distributed computing.

 Distributed computations are carried out on more than
one machine. When computers in a group work in
cooperation, they form a distributed system.
41
Embedded:
Embedded operating systems are designed to be used
in embedded computer systems.

 They are designed to operate on small machines such as
PDAs with limited autonomy.

They are able to operate with a limited number of


resources.

They are very compact and extremely efficient by


design.
Windows CE and Minix 3 are some examples of
embedded operating systems.
42


CORE I3, CORE I5, CORE I7 — THE
DIFFERENCE IN A NUTSHELL
 Core i7s are better than Core i5s, which are in turn better
than Core i3s.
 Core i7 does not have seven cores nor does Core i3 have
three cores.
 The numbers are simply indicative of their relative
processing powers.

43
RELATIVE PROCESSING POWER
 Determined by:
 number of cores,
 clock speed (in GHz), size of cache,
 Intel technologies like Turbo Boost and Hyper-Threading

44
NUMBER OF CORES
 The more cores there are, the more tasks (known as
threads) can be served at the same time.
 The lowest number of cores is found in Core i3
CPUs, which have only two cores. Currently, all
Core i3s are dual-core processors.

45
INTEL TURBO BOOST
 The Intel Turbo Boost Technology allows a processor to
dynamically increase its clock speed whenever the need
arises.
 The maximum amount that Turbo Boost can raise clock
speed at any given time is dependent on the number of
active cores, the estimated current consumption, the
estimated power consumption, and the processor
temperature.
 Because none of the Core i3 CPUs have Turbo Boost, the
i5-4570T can outrun them whenever it needs to.

46
CACHE SIZE
 Whenever the CPU finds that it keeps using the same
data over and over, it stores that data in its cache.
 With a larger cache, more data can be accessed quickly.

 The Haswell (fourth generation) Core i3 processors have


either 3MB or 4MB of cache.
 The Haswell Core i5s have either 4MB or 6MB of cache.

 Finally, all Core i7 CPUs have 8MB of cache, except for
the i7-4770R, which has 6MB.
 This is clearly one reason why an i7 outperforms an i5
— and why an i5 outperforms an i3.
47
HYPER-THREADING
 Strictly speaking, only one thread can be served by one core at a
time.
 So if a CPU is a dual core, then supposedly only two threads can
be served simultaneously.
 However, Intel has a technology called Hyper-Threading. This
enables a single core to serve multiple threads.
 For instance, a Core i3, which is only a dual core, can actually
serve two threads per core. In other words, a total of four threads
can run simultaneously.
 Thus, even though Core i5 processors are quad cores, since they
don’t support Hyper-Threading (again, except the i5-4570T), the
number of threads they can serve at the same time is just about
equal to that of their Core i3 counterparts.
48
HYPER-THREADING
 Core i7s are not only quad cores; they also support
Hyper-Threading.
 Thus, a total of eight threads can run on them at the same
time.
 Combine that with 8MB of cache and Intel Turbo Boost
Technology, which all of them have, and you’ll see what
sets the Core i7 apart from its siblings.
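The core and thread counts above reduce to simple arithmetic: logical threads = cores x threads per core. A small sketch (families generalized as on the slides; individual SKUs such as the i5-4570T differ):

```python
# Logical thread capacity: cores x threads-per-core, where
# Hyper-Threading doubles the threads each core can serve.
def logical_threads(cores, hyper_threading):
    return cores * (2 if hyper_threading else 1)

print(logical_threads(2, True))   # Core i3: dual core with HT -> 4 threads
print(logical_threads(4, False))  # typical Core i5: quad core, no HT -> 4 threads
print(logical_threads(4, True))   # Core i7: quad core with HT -> 8 threads
```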

49
The End!
Thank You So Much!

50
