GPU v1.1

The document provides an overview of Instruction Set Architecture (ISA), highlighting the differences between Complex Instruction Set Computers (CISC) and Reduced Instruction Set Computers (RISC). It explains common and complex instructions, the concept of SIMD (Single Instruction Multiple Data) and SIMT (Single Instruction Multiple Threads), and contrasts CPU and GPU functionalities. Additionally, it discusses the distinctions between programs, processes, threads, and tasks, emphasizing the advantages of using tasks over threads in programming.


The path to GPU

Instruction Set Architecture (ISA)


In computer science, an instruction set architecture (ISA) is an abstract model of a computer. It is also
referred to as architecture or computer architecture. A realization of an ISA, such as a central processing
unit (CPU), is called an implementation.

An ISA may be classified in a number of different ways. A common classification is by architectural complexity:
- A complex instruction set computer (CISC) has many specialized instructions, some of which may be only rarely used in practical programs.
- A reduced instruction set computer (RISC) simplifies the processor by efficiently implementing only the instructions that are frequently used in programs; less common operations are implemented as subroutines, whose additional execution time is offset by their infrequent use.
Common instructions
Examples of operations common to many instruction sets include:
- Data handling and memory operations
- set a register to a fixed constant value; copy data from a memory location to a register, or vice versa; load and store operations; read and write data from hardware devices […]
- Arithmetic and logic operations
- add, subtract, multiply, or divide the values of two registers, placing the result in a register; increment or decrement a register; perform bitwise operations, such as negating each bit in a register; compare two values in registers; floating-point instructions for arithmetic on floating-point numbers […]
- Control flow operations
- branch to another location in the program and execute instructions there; conditionally branch to another location if a certain condition holds; indirectly branch to another location; call another block of code […]
- Coprocessor instructions
- load/store data to and from a coprocessor, or exchange data with CPU registers; perform coprocessor operations
Complex instructions
A single complex instruction does something that may take many instructions on other computers: such
instructions are typified by instructions that take multiple steps, control multiple functional units, or
otherwise appear on a larger scale than the bulk of simple instructions implemented by the given
processor.

Some examples of complex instructions include:

- transferring multiple registers to or from memory (especially the stack) at once
- moving large blocks of memory (e.g. string copy or DMA transfer)
- complicated integer and floating-point arithmetic (e.g. square root, or transcendental functions such as logarithm, sine, cosine, etc.)
- SIMD instructions: a single instruction performing an operation on many homogeneous values in parallel, possibly in dedicated SIMD registers
- performing an atomic test-and-set instruction or other read-modify-write atomic instruction
- instructions that perform ALU operations with an operand from memory rather than a register
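The atomic test-and-set instruction mentioned above can be sketched in software. This is only an illustrative analogy in Python: real hardware performs the read-modify-write in a single uninterruptible instruction, whereas here a lock emulates that atomicity, and the `TestAndSet` class is an invented name for the sketch.

```python
import threading

class TestAndSet:
    """Software sketch of an atomic test-and-set cell.

    Real hardware does the read-modify-write in one uninterruptible
    instruction; a lock stands in for that atomicity here.
    """
    def __init__(self):
        self._flag = False
        self._guard = threading.Lock()

    def test_and_set(self):
        # Atomically: read the old value, set the flag to True,
        # and return the old value.
        with self._guard:
            old = self._flag
            self._flag = True
            return old

    def clear(self):
        with self._guard:
            self._flag = False

# A spinlock can be built on this: a caller owns the lock only if
# the OLD value it got back was False.
cell = TestAndSet()
first = cell.test_and_set()   # False: this caller acquired the lock
second = cell.test_and_set()  # True: a second caller must retry
cell.clear()                  # release
```

The key point is that the read and the write cannot be separated: two callers can never both observe `False`, which is why such instructions are the building block of locks.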
Complex instructions
Complex instructions are more common in CISC instruction sets than in RISC instruction sets, but RISC
instruction sets may include them as well.

RISC instruction sets generally do not include ALU operations with memory operands, or instructions to
move large blocks of memory, but most RISC instruction sets include SIMD or vector instructions that
perform the same arithmetic operation on multiple pieces of data at the same time.

SIMD instructions can manipulate large vectors and matrices in minimal time.
SIMD instructions allow easy parallelization of algorithms commonly involved in sound, image, and video processing.

Various SIMD implementations have been brought to market under trade names such as MMX, 3DNow!, AltiVec, SSE, NEON, AVX…
Top CPU manufacturers (Intel, AMD, etc.) typically include SIMD instruction sets in their products.
Parallelism
Task parallelism vs Data parallelism

Task parallelism
• Different operations are performed concurrently
• Task parallelism is achieved when the processors execute different threads (or processes) on the same or on different data
• Example: scheduling on a multicore

Data parallelism
• Distribution of data across different parallel computing nodes
• Data parallelism is achieved when each processor performs the same task on different pieces of the data
• Example:
  for each element a
      perform the same (set of) instruction(s) on a
  end
Each "PU" (processing unit) does not necessarily correspond to a processor, just some functional unit that can perform processing. The PUs are indicated as such to show the relationship between instructions, data, and the processing of the data.
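The two flavors of parallelism above can be sketched with ordinary threads. This is a minimal Python illustration (the `square`, `total`, and `largest` functions are invented for the example): data parallelism applies the same operation to different pieces of data, task parallelism runs different operations concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(8))

# Data parallelism: every worker applies the SAME operation
# to a different element of the data.
def square(x):
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    squares = list(pool.map(square, data))

# Task parallelism: DIFFERENT operations run concurrently,
# here on the same data.
def total(xs):
    return sum(xs)

def largest(xs):
    return max(xs)

with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(total, data)
    f2 = pool.submit(largest, data)
    results = (f1.result(), f2.result())

print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49]
print(results)  # (28, 7)
```

In the first pool the instruction stream is identical across workers and only the data differs; in the second pool the instruction streams themselves differ.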
SIMD
Single instruction multiple data (SIMD) is a class of parallel computers in Flynn's taxonomy.
It describes computers with multiple processing elements that perform the same operation on multiple
data points simultaneously.
Such machines exploit data level parallelism, but not concurrency: there are simultaneous (parallel)
computations, but only a single process (instruction) at a given moment.

An application that may take advantage of SIMD is one where the same value is being added to (or subtracted from) a large number of
data points, a common operation in many multimedia applications. One example would be changing the brightness of an image. Each
pixel of an image consists of three values for the brightness of the red (R), green (G) and blue (B) portions of the color. To change the
brightness, the R, G and B values are read from memory, a value is added to (or subtracted from) them, and the resulting values are
written back out to memory.
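The brightness example above can be written compactly with NumPy, whose vectorized elementwise operations are typically compiled down to SIMD instructions under the hood (a sketch, not a claim about any specific CPU): one expression updates every R, G, and B value of every pixel.

```python
import numpy as np

# A 2x2 RGB image: each pixel holds (R, G, B) byte values.
image = np.array([[[10, 20, 30], [40, 50, 60]],
                  [[70, 80, 90], [200, 210, 250]]], dtype=np.uint8)

# Brighten by 40. Widening to int16 and clipping avoids uint8
# wrap-around: 250 + 40 must saturate at 255, not wrap to 34.
brighter = np.clip(image.astype(np.int16) + 40, 0, 255).astype(np.uint8)

print(brighter[1, 1])  # [240 250 255]
```

The whole image is processed by one vectorized add and one clip, which is exactly the "same value added to many data points" pattern that SIMD hardware accelerates.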
SIMD Machines

These machines (also called supercomputers) are characterized by having:
• A control component (comparable to the CPU of an ordinary personal computer)
• Several PEs (Processing Elements) that carry out the computation.

Instructions are executed partly by the control component and partly by the machine's PEs. Machines in the SIMD family are also called Array Processors.

SIMD machines can follow two approaches:
• MPP (Massively Parallel Processors): the number of PE components ranges from 1,024 to 65,536 (that is, from 1K to 64K). The PEs must be connected through a suitable interconnection network (inter-PE network).
• CPU approach: a few PEs (usually 8) are incorporated inside the CPU itself. This requires extending the ISA (Instruction Set Architecture) with instructions dedicated to issuing simultaneous directives to the CPU's PEs.

All SIMD machines share the property that when a directive arrives, it can be carried out by the n execution units simultaneously, but on different sets of data. The speedup therefore lies between 1 and n.
SIMT
Single Instruction Multiple Threads (SIMT) ≈ SIMD + multithreading

A thread of execution is the smallest sequence of programmed instructions that can be managed independently by a
scheduler (which is typically a part of the operating system).
A thread is a component of a process: multiple threads can exist within one process, executing concurrently and sharing
resources such as memory, while different processes do not share these resources; in particular, the threads of a
process share its executable code and the values of its dynamically allocated variables and non-thread-local global
variables at any given time.

In SIMT, multiple threads perform the same instruction on different data sets. The main advantage of SIMT
is that it reduces the latency that comes with instruction prefetching.
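The SIMT pattern can be mimicked with OS threads, in the style of a GPU kernel where each thread uses its index to pick its data item. This is only an analogy (real SIMT threads execute in lock-step groups on GPU hardware, and `kernel` is an invented name):

```python
from concurrent.futures import ThreadPoolExecutor

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
out = [0] * len(a)

# Every "thread" runs the SAME instruction sequence; only the
# index i, and hence the data it touches, differs - the SIMT idea.
def kernel(i):
    out[i] = a[i] + b[i]

with ThreadPoolExecutor(max_workers=len(a)) as pool:
    list(pool.map(kernel, range(len(a))))

print(out)  # [11, 22, 33, 44]
```

On a GPU, thousands of such index-parameterized threads are launched at once, which is why the model scales so well for elementwise work.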
SIMD vs SIMT
The CPU is the brain of every embedded system: it comprises the arithmetic logic unit (ALU), used to store information and perform calculations quickly, and the control unit (CU), which handles instruction sequencing and branching. The CPU interacts with more computer components, such as memory and input/output devices, to execute instructions.

The GPU is used to produce images, for example in computer games. It is faster than the CPU and emphasizes high throughput. It is generally incorporated alongside other electronic components, sharing RAM with them, which suits the bulk of its computing tasks. It contains more ALU units than a CPU.
CPU | GPU

CPU stands for Central Processing Unit. | GPU stands for Graphics Processing Unit.
CPU consumes or needs more memory than GPU. | GPU consumes or requires less memory than CPU.
The speed of the CPU is less than the GPU's speed. | GPU is faster than the CPU.
CPU contains a few powerful cores. | GPU contains many weaker cores.
CPU is suitable for serial instruction processing. | GPU is not suitable for serial instruction processing.
CPU is not suitable for parallel instruction processing. | GPU is suitable for parallel instruction processing.
CPU emphasizes low latency. | GPU emphasizes high throughput.

CPU vs GPU
CPU vs GPU
GPU vs CPU
CPUs vs GPUs As Fast As Possible

https://www.youtube.com/watch?v=1kypaBjJ-pg
GPU vs CPU
What is a GPU vs a CPU?

https://www.youtube.com/watch?v=XKOI9-G-wk8
GPU vs CPU
GPUs: Explained

https://www.youtube.com/watch?v=LfdK-v0SbGI
Game Streaming
Is there ANY hope for game streaming? We tried them all

https://www.youtube.com/watch?v=d3dNoCRzbAs
Appendix: Process vs Thread vs Task
Program vs Process vs Thread
A program can be described as any executable file: it contains a certain set of instructions written with the intent of carrying out a specific operation. It resides on disk as a passive entity, and it does not go away when the system reboots.

Any running instance of a program is called a process; equivalently, a process is a program under execution. One program can have N processes. A process resides in main memory and hence disappears whenever the machine reboots. Multiple processes can run in parallel on a multiprocessor system.

A thread is commonly described as a lightweight process. One process can have N threads. All threads associated with a process share that process's memory: this allows threads to read from and write to common shared data structures and variables, and makes communication between threads easy. Communication between two or more processes – known as Inter-Process Communication, or IPC – is considerably harder and more resource-intensive.
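The memory-sharing point can be shown directly: several threads updating one shared object all see the same data. A minimal Python sketch (`bump` and `counter` are invented names for the example):

```python
import threading

counter = {"value": 0}
lock = threading.Lock()

def bump():
    # Threads share their process's memory, so this update lands
    # in the SAME dict every other thread sees. The lock makes the
    # read-modify-write safe against interleaving.
    with lock:
        counter["value"] += 1

threads = [threading.Thread(target=bump) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter["value"])  # 4: all increments hit the shared dict
```

A separate process, by contrast, would operate on its own copy of `counter`; getting the result back to the parent would require explicit IPC (pipes, sockets, shared-memory segments, ...).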
Thread & Threadpool
A Thread represents an actual OS-level thread. Thread allows the highest degree of control: you can Abort, Suspend, or Resume a thread, you can observe its state, and you can set thread-level properties like the stack size, apartment state, or culture.

The problem with threads is that OS threads are costly: each thread you create consumes a non-trivial amount of memory for its stack, and adds CPU overhead as the processor context-switches between threads. Instead, it is better to have a small pool of threads execute your code as work becomes available.

The .NET Framework Common Language Runtime (CLR) and the Java Virtual Machine offer a ThreadPool solution: a wrapper around a pool of threads maintained by the CLR or the virtual machine itself, giving you almost no control. You can submit work to execute at some point, and you can control the size of the pool, but you can't set anything else. You can't even tell when the pool will start running the work you submit to it.

Using the ThreadPool avoids the overhead of creating too many threads. However, if you submit too many long-running tasks to the thread pool, it can fill up, and later work that you submit can end up waiting for the earlier long-running items to finish. In addition, the ThreadPool offers no way to find out when a work item has completed, nor a way to get a result. Therefore, the ThreadPool is best used for short operations where the caller does not need a result.
Task
A Task is something you want done; it is a set of program instructions that are loaded in memory.

A task will by default use the Threadpool, therefore it does not create its own OS thread.

Tasks are executed by a TaskScheduler (the default scheduler simply runs them on the ThreadPool).

Unlike the ThreadPool, a Task also allows you to find out when it finishes and to return a result. You can call ContinueWith on an existing Task to make it run more code once the task finishes. You can also synchronously wait for a task to finish by calling Wait or, for a generic task, by getting the Result property. Like Thread.Join, this blocks the calling thread until the task finishes. Synchronously waiting for a task is usually a bad idea: it prevents the calling thread from doing any other work, and can also lead to deadlocks if the task ends up waiting (even asynchronously) for the current thread.

Since tasks still run on the ThreadPool, they should not be used for long-running operations, as they can still fill up the thread pool and block new work. Instead, Task provides a LongRunning option, which tells the TaskScheduler to spin up a new thread rather than running on the ThreadPool.
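The Task pattern (get a result, attach a continuation, synchronously wait) exists in most runtimes. A rough Python analogy, with `Future.result` standing in for Wait/Result and `add_done_callback` standing in for ContinueWith (an analogy only, not the .NET API itself):

```python
from concurrent.futures import ThreadPoolExecutor

results = []

with ThreadPoolExecutor(max_workers=1) as pool:
    task = pool.submit(lambda: 2 + 2)  # start the work

    # Continuation: runs once the task finishes, receiving the
    # completed future (like ContinueWith receiving the Task).
    task.add_done_callback(lambda t: results.append(t.result() * 10))

    # Synchronous wait: blocks the caller until the task is done
    # (like Wait / reading Result).
    value = task.result()

print(value)    # 4
print(results)  # [40]
```

As the text warns, the blocking `result()` call should be used sparingly; continuations keep the calling thread free.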
Thread vs Task
• Task is more abstract than Thread. It is generally advised to use tasks instead of threads, since tasks are created on the thread pool, which already holds system-created threads, improving performance.
• A task can return a result. There is no direct mechanism to return a result from a thread.
• Task supports cancellation through the use of cancellation tokens; Thread doesn't.
• A task can involve multiple threads at the same time; a thread can only run one task at a time.
• You can attach a task to a parent task, and thus decide whether the parent or the child finishes first.
• With a thread, if an exception occurs in a long-running method, it is not possible to catch it in the parent function; with tasks, the same exception can easily be caught.
• You can easily build chains of tasks. You can specify when a task should start relative to the previous task, and whether there should be a synchronization-context switch. That gives you the opportunity to run a long-running task in the background and afterwards a UI-refreshing task on the UI thread.
• A task is by default a background task; you cannot have a foreground task. A thread, on the other hand, can be background or foreground.
• The default TaskScheduler uses thread pooling, so some tasks may not start until other pending tasks have completed. If you use Thread directly, every use starts a new thread.
