GPU v1.1

The document provides an overview of Instruction Set Architecture (ISA), highlighting the differences between Complex Instruction Set Computers (CISC) and Reduced Instruction Set Computers (RISC). It explains common and complex instructions, the concept of SIMD (Single Instruction Multiple Data) and SIMT (Single Instruction Multiple Threads), and contrasts CPU and GPU functionalities. Additionally, it discusses the distinctions between programs, processes, threads, and tasks, emphasizing the advantages of using tasks over threads in programming.


The path to GPU

Instruction Set Architecture (ISA)


In computer science, an instruction set architecture (ISA) is an abstract model of a computer. It is also
referred to as architecture or computer architecture. A realization of an ISA, such as a central processing
unit (CPU), is called an implementation.

An ISA may be classified in a number of different ways. A common classification is by architectural complexity:
- A complex instruction set computer (CISC) has many specialized instructions, some of which may be only rarely used in practical programs.
- A reduced instruction set computer (RISC) simplifies the processor by efficiently implementing only the instructions that are frequently used in programs; less common operations are implemented as subroutines, whose additional execution time is offset by their infrequent use.
Common instructions
Examples of operations common to many instruction sets include:
- Data handling and memory operations
- set a register to a fixed constant value; copy data from a memory location to a register, or vice versa; load and store operations; read and write data from hardware devices […]
- Arithmetic and logic operations
- add, subtract, multiply, or divide the values of two registers, placing the result in a register; increment or decrement a register; perform bitwise operations, such as negating each bit in a register; compare two values in registers; floating-point instructions for arithmetic on floating-point numbers […]
- Control flow operations
- branch to another location in the program and execute instructions there; conditionally branch to another location if a certain condition holds; indirectly branch to another location; call another block of code […]
- Coprocessor instructions
- load/store data to and from a coprocessor, or exchange data with CPU registers; perform coprocessor operations
Complex instructions
A single complex instruction does something that may take many instructions on other computers: such
instructions are typified by instructions that take multiple steps, control multiple functional units, or
otherwise appear on a larger scale than the bulk of simple instructions implemented by the given
processor.

Some examples of complex instructions include:

- transferring multiple registers to or from memory (especially the stack) at once
- moving large blocks of memory (e.g. string copy or DMA transfer)
- complicated integer and floating-point arithmetic (e.g. square root, or transcendental functions such as logarithm, sine, cosine, etc.)
- SIMD instructions: a single instruction performing an operation on many homogeneous values in parallel, possibly in dedicated SIMD registers
- performing an atomic test-and-set instruction or other read-modify-write atomic instruction
- instructions that perform ALU operations with an operand from memory rather than a register
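The atomic test-and-set instruction mentioned above can be sketched in software. This is only an illustrative analogy in Python: real hardware performs the read-modify-write in a single uninterruptible instruction, whereas here a lock emulates that atomicity, and the `TestAndSet` class is an invented name for the sketch.

```python
import threading

class TestAndSet:
    """Software sketch of an atomic test-and-set cell.

    Real hardware does the read-modify-write in one uninterruptible
    instruction; a lock stands in for that atomicity here.
    """
    def __init__(self):
        self._flag = False
        self._guard = threading.Lock()

    def test_and_set(self):
        # Atomically: read the old value, set the flag to True,
        # and return the old value.
        with self._guard:
            old = self._flag
            self._flag = True
            return old

    def clear(self):
        with self._guard:
            self._flag = False

# A spinlock can be built on this: a caller owns the lock only if
# the OLD value it got back was False.
cell = TestAndSet()
first = cell.test_and_set()   # False: this caller acquired the lock
second = cell.test_and_set()  # True: a second caller must retry
cell.clear()                  # release
```

The key point is that the read and the write cannot be separated: two callers can never both observe `False`, which is why such instructions are the building block of locks.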
Complex instructions
Complex instructions are more common in CISC instruction sets than in RISC instruction sets, but RISC
instruction sets may include them as well.

RISC instruction sets generally do not include ALU operations with memory operands, or instructions to
move large blocks of memory, but most RISC instruction sets include SIMD or vector instructions that
perform the same arithmetic operation on multiple pieces of data at the same time.

SIMD instructions can manipulate large vectors and matrices in minimal time.
SIMD instructions allow easy parallelization of algorithms commonly involved in sound, image, and video processing.

Various SIMD implementations have been brought to market under trade names such as MMX, 3DNow!, AltiVec, SSE, NEON, AVX…
Top CPU manufacturers (Intel, AMD, etc.) typically include SIMD instruction sets in their products.
Parallelism
Task parallelism vs Data parallelism

Task parallelism
• Different operations are performed concurrently
• Task parallelism is achieved when the processors execute different threads (or processes) on the same or on different data
• Example: scheduling on a multicore

Data parallelism
• Distribution of data across different parallel computing nodes
• Data parallelism is achieved when each processor performs the same task on different pieces of the data
• Example:
  for each element a
      perform the same (set of) instruction(s) on a
  end
Each "PU" (processing unit) does not necessarily correspond to a processor, just some functional unit that can perform processing. The PUs are indicated as such to show the relationship between instructions, data, and the processing of the data.
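The two flavors of parallelism above can be sketched with ordinary threads. This is a minimal Python illustration (the `square`, `total`, and `largest` functions are invented for the example): data parallelism applies the same operation to different pieces of data, task parallelism runs different operations concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(8))

# Data parallelism: every worker applies the SAME operation
# to a different element of the data.
def square(x):
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    squares = list(pool.map(square, data))

# Task parallelism: DIFFERENT operations run concurrently,
# here on the same data.
def total(xs):
    return sum(xs)

def largest(xs):
    return max(xs)

with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(total, data)
    f2 = pool.submit(largest, data)
    results = (f1.result(), f2.result())

print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49]
print(results)  # (28, 7)
```

In the first pool the instruction stream is identical across workers and only the data differs; in the second pool the instruction streams themselves differ.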
SIMD
Single instruction multiple data (SIMD) is a class of parallel computers in Flynn's taxonomy.
It describes computers with multiple processing elements that perform the same operation on multiple
data points simultaneously.
Such machines exploit data level parallelism, but not concurrency: there are simultaneous (parallel)
computations, but only a single process (instruction) at a given moment.

An application that may take advantage of SIMD is one where the same value is being added to (or subtracted from) a large number of
data points, a common operation in many multimedia applications. One example would be changing the brightness of an image. Each
pixel of an image consists of three values for the brightness of the red (R), green (G) and blue (B) portions of the color. To change the
brightness, the R, G and B values are read from memory, a value is added to (or subtracted from) them, and the resulting values are
written back out to memory.
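The brightness example above can be written compactly with NumPy, whose vectorized elementwise operations are typically compiled down to SIMD instructions under the hood (a sketch, not a claim about any specific CPU): one expression updates every R, G, and B value of every pixel.

```python
import numpy as np

# A 2x2 RGB image: each pixel holds (R, G, B) byte values.
image = np.array([[[10, 20, 30], [40, 50, 60]],
                  [[70, 80, 90], [200, 210, 250]]], dtype=np.uint8)

# Brighten by 40. Widening to int16 and clipping avoids uint8
# wrap-around: 250 + 40 must saturate at 255, not wrap to 34.
brighter = np.clip(image.astype(np.int16) + 40, 0, 255).astype(np.uint8)

print(brighter[1, 1])  # [240 250 255]
```

The whole image is processed by one vectorized add and one clip, which is exactly the "same value added to many data points" pattern that SIMD hardware accelerates.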
SIMD Machines

These machines (also called supercomputers) are characterized by having:
• A control component (comparable to the CPU of an ordinary personal computer)
• Several PEs (Processing Elements) that carry out the computation.

Instructions are executed partly by the control component and partly by the machine's PEs. Machines in the SIMD family are also called Array Processors.

SIMD machines can follow two approaches:
• MPP (Massively Parallel Processors): the number of PE components ranges from 1,024 to 65,536 (that is, from 1K to 64K). The PEs must be connected through a suitable interconnection network (inter-PE network).
• CPU approach: a few PEs (usually 8) are incorporated inside the CPU itself. This requires extending the ISA (Instruction Set Architecture) with instructions dedicated to issuing simultaneous directives to the CPU's PEs.

All SIMD machines share the property that when a directive arrives, it can be carried out by the n execution units simultaneously, but on different sets of data. The speedup therefore lies between 1 and n.
SIMT
Single Instruction Multiple Threads (SIMT) ≈ SIMD + multithreading

A thread of execution is the smallest sequence of programmed instructions that can be managed independently by a
scheduler (which is typically a part of the operating system).
A thread is a component of a process: multiple threads can exist within one process, executing concurrently and sharing
resources such as memory, while different processes do not share these resources; in particular, the threads of a
process share its executable code and the values of its dynamically allocated variables and non-thread-local global
variables at any given time.

In SIMT, multiple threads perform the same instruction on different data sets. The main advantage of SIMT
is that it reduces the latency that comes with instruction prefetching.
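The SIMT pattern can be mimicked with OS threads, in the style of a GPU kernel where each thread uses its index to pick its data item. This is only an analogy (real SIMT threads execute in lock-step groups on GPU hardware, and `kernel` is an invented name):

```python
from concurrent.futures import ThreadPoolExecutor

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
out = [0] * len(a)

# Every "thread" runs the SAME instruction sequence; only the
# index i, and hence the data it touches, differs - the SIMT idea.
def kernel(i):
    out[i] = a[i] + b[i]

with ThreadPoolExecutor(max_workers=len(a)) as pool:
    list(pool.map(kernel, range(len(a))))

print(out)  # [11, 22, 33, 44]
```

On a GPU, thousands of such index-parameterized threads are launched at once, which is why the model scales so well for elementwise work.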
SIMD vs SIMT
The CPU is the brain of every embedded system: it comprises the arithmetic logic unit (ALU), used to store information and perform calculations quickly, and the control unit (CU), which handles instruction sequencing and branching. The CPU interacts with more computer components, such as memory and input/output devices, to execute instructions.

The GPU is used to produce images, for example in computer games. It is faster than the CPU and emphasizes high throughput. It is generally incorporated alongside other electronic components, sharing RAM with them, which suits the bulk of its computing tasks. It contains more ALU units than a CPU.
CPU | GPU

CPU stands for Central Processing Unit. | GPU stands for Graphics Processing Unit.
CPU consumes or needs more memory than GPU. | GPU consumes or requires less memory than CPU.
The speed of the CPU is less than the GPU's speed. | GPU is faster than the CPU.
CPU contains a few powerful cores. | GPU contains many weaker cores.
CPU is suitable for serial instruction processing. | GPU is not suitable for serial instruction processing.
CPU is not suitable for parallel instruction processing. | GPU is suitable for parallel instruction processing.
CPU emphasizes low latency. | GPU emphasizes high throughput.

CPU vs GPU
CPU vs GPU
GPU vs CPU
CPUs vs GPUs As Fast As Possible

https://www.youtube.com/watch?v=1kypaBjJ-pg
GPU vs CPU
What is a GPU vs a CPU?

https://www.youtube.com/watch?v=XKOI9-G-wk8
GPU vs CPU
GPUs: Explained

https://www.youtube.com/watch?v=LfdK-v0SbGI
Game Streaming
Is there ANY hope for game streaming? We tried them all

https://www.youtube.com/watch?v=d3dNoCRzbAs
Appendix: Process vs Thread vs Task
Program vs Process vs Thread
A program can be described as any executable file: it contains a certain set of instructions written with the intent of carrying out a specific operation. It resides on disk as a passive entity, and it does not go away when the system reboots.

Any running instance of a program is called a process; equivalently, a process is a program under execution. One program can have N processes. A process resides in main memory and hence disappears whenever the machine reboots. Multiple processes can run in parallel on a multiprocessor system.

A thread is commonly described as a lightweight process. One process can have N threads. All threads associated with a process share that process's memory: this allows threads to read from and write to common shared data structures and variables, and makes communication between threads easy. Communication between two or more processes – known as Inter-Process Communication, or IPC – is considerably harder and more resource-intensive.
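The memory-sharing point can be shown directly: several threads updating one shared object all see the same data. A minimal Python sketch (`bump` and `counter` are invented names for the example):

```python
import threading

counter = {"value": 0}
lock = threading.Lock()

def bump():
    # Threads share their process's memory, so this update lands
    # in the SAME dict every other thread sees. The lock makes the
    # read-modify-write safe against interleaving.
    with lock:
        counter["value"] += 1

threads = [threading.Thread(target=bump) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter["value"])  # 4: all increments hit the shared dict
```

A separate process, by contrast, would operate on its own copy of `counter`; getting the result back to the parent would require explicit IPC (pipes, sockets, shared-memory segments, ...).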
Thread & Threadpool
A Thread represents an actual OS-level thread. Thread allows the highest degree of control: you can Abort, Suspend, or Resume a thread, you can observe its state, and you can set thread-level properties like the stack size, apartment state, or culture.

The problem with threads is that OS threads are costly: each thread you create consumes a non-trivial amount of memory for its stack, and adds CPU overhead as the processor context-switches between threads. Instead, it is better to have a small pool of threads execute your code as work becomes available.

The .NET Framework Common Language Runtime (CLR) and the Java Virtual Machine offer a ThreadPool solution: a wrapper around a pool of threads maintained by the CLR or the virtual machine itself, giving you almost no control. You can submit work to execute at some point, and you can control the size of the pool, but you can't set anything else. You can't even tell when the pool will start running the work you submit to it.

Using the ThreadPool avoids the overhead of creating too many threads. However, if you submit too many long-running tasks to the thread pool, it can fill up, and later work that you submit can end up waiting for the earlier long-running items to finish. In addition, the ThreadPool offers no way to find out when a work item has completed, nor a way to get a result. Therefore, the ThreadPool is best used for short operations where the caller does not need a result.
Task
A Task is something you want done; it is a set of program instructions that are loaded in memory.

A task will by default use the Threadpool, therefore it does not create its own OS thread.

Tasks are executed by a TaskScheduler (the default scheduler simply runs them on the ThreadPool).

Unlike the ThreadPool, a Task also allows you to find out when it finishes and to return a result. You can call ContinueWith on an existing Task to make it run more code once the task finishes. You can also synchronously wait for a task to finish by calling Wait or, for a generic task, by getting the Result property. Like Thread.Join, this blocks the calling thread until the task finishes. Synchronously waiting for a task is usually a bad idea: it prevents the calling thread from doing any other work, and can also lead to deadlocks if the task ends up waiting (even asynchronously) for the current thread.

Since tasks still run on the ThreadPool, they should not be used for long-running operations, as they can still fill up the thread pool and block new work. Instead, Task provides a LongRunning option, which tells the TaskScheduler to spin up a new thread rather than running on the ThreadPool.
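The Task pattern (get a result, attach a continuation, synchronously wait) exists in most runtimes. A rough Python analogy, with `Future.result` standing in for Wait/Result and `add_done_callback` standing in for ContinueWith (an analogy only, not the .NET API itself):

```python
from concurrent.futures import ThreadPoolExecutor

results = []

with ThreadPoolExecutor(max_workers=1) as pool:
    task = pool.submit(lambda: 2 + 2)  # start the work

    # Continuation: runs once the task finishes, receiving the
    # completed future (like ContinueWith receiving the Task).
    task.add_done_callback(lambda t: results.append(t.result() * 10))

    # Synchronous wait: blocks the caller until the task is done
    # (like Wait / reading Result).
    value = task.result()

print(value)    # 4
print(results)  # [40]
```

As the text warns, the blocking `result()` call should be used sparingly; continuations keep the calling thread free.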
Thread vs Task
• Task is more abstract than Thread. It is generally advised to use tasks instead of threads, since tasks are created on the thread pool, which already holds system-created threads, improving performance.
• A task can return a result. There is no direct mechanism to return a result from a thread.
• Task supports cancellation through the use of cancellation tokens; Thread doesn't.
• A task can involve multiple threads at the same time; a thread can only run one task at a time.
• You can attach a task to a parent task, and thus decide whether the parent or the child finishes first.
• With a thread, if an exception occurs in a long-running method, it is not possible to catch it in the parent function; with tasks, the same exception can easily be caught.
• You can easily build chains of tasks. You can specify when a task should start relative to the previous task, and whether there should be a synchronization-context switch. That gives you the opportunity to run a long-running task in the background and afterwards a UI-refreshing task on the UI thread.
• A task is by default a background task; you cannot have a foreground task. A thread, on the other hand, can be background or foreground.
• The default TaskScheduler uses thread pooling, so some tasks may not start until other pending tasks have completed. If you use Thread directly, every use starts a new thread.
