GPU v1.1
RISC instruction sets generally do not include ALU operations with memory operands, or instructions to
move large blocks of memory, but most RISC instruction sets include SIMD or vector instructions that
perform the same arithmetic operation on multiple pieces of data at the same time.
SIMD instructions make it possible to manipulate large vectors and matrices in minimal time.
SIMD instructions allow easy parallelization of algorithms commonly involved in sound, image, and video
processing.
Various SIMD implementations have been brought to market under trade names such as MMX, 3DNow!, AltiVec, SSE, NEON, and AVX.
The top CPU manufacturers (Intel, AMD, etc.) typically include SIMD instruction sets in their products.
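As a concrete illustration (not part of the original slides; class and method names are made up), the sketch below adds a constant to an array using Java's incubating Vector API (module jdk.incubator.vector, JDK 16+, enabled with --add-modules jdk.incubator.vector). The lane width is whatever the platform's SIMD unit prefers.

    import jdk.incubator.vector.FloatVector;
    import jdk.incubator.vector.VectorSpecies;

    public class SimdAdd {
        // Preferred species, e.g. 8 float lanes on an AVX2 machine.
        static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

        // c[i] = a[i] + k, several elements per instruction where the hardware allows it.
        static void addConstant(float[] a, float k, float[] c) {
            int i = 0;
            int upper = SPECIES.loopBound(a.length);
            for (; i < upper; i += SPECIES.length()) {
                FloatVector va = FloatVector.fromArray(SPECIES, a, i);
                va.add(k).intoArray(c, i);
            }
            for (; i < a.length; i++) {   // scalar tail for the leftover elements
                c[i] = a[i] + k;
            }
        }
    }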
Parallelism
Task parallelism vs Data parallelism

Task parallelism
• Different operations are performed concurrently
• Task parallelism is achieved when the processors execute different threads (or processes) on the same or different data
• Example: scheduling on a multicore

Data parallelism
• Distribution of data across different parallel computing nodes
• Data parallelism is achieved when each processor performs the same task on different pieces of the data:
  for each element a
      perform the same (set of) instruction(s) on a
  end
(A small code sketch of the two styles follows below.)
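As an added illustration (class and variable names are made up), the JVM sketch below runs two different operations concurrently on a small pool (task parallelism) and then applies the same operation to every element with a parallel stream (data parallelism).

    import java.util.Arrays;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ParallelismStyles {
        public static void main(String[] args) {
            int[] data = {1, 2, 3, 4, 5, 6, 7, 8};

            // Task parallelism: two different operations run concurrently.
            ExecutorService pool = Executors.newFixedThreadPool(2);
            pool.submit(() -> System.out.println("sum = " + Arrays.stream(data).sum()));
            pool.submit(() -> System.out.println("max = " + Arrays.stream(data).max().getAsInt()));
            pool.shutdown();

            // Data parallelism: the same operation on different pieces of the data.
            int[] doubled = Arrays.stream(data).parallel().map(x -> 2 * x).toArray();
            System.out.println(Arrays.toString(doubled));
        }
    }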
Each "PU"
(processing unit)
does not necessarily
correspond to a
processor, just
some functional
unit that can
perform processing.
The PU's are
indicated as such to
show relationship
between
instructions, data,
and the processing
of the data.
SIMD
Single instruction multiple data (SIMD) is a class of parallel computers in Flynn's taxonomy.
It describes computers with multiple processing elements that perform the same operation on multiple
data points simultaneously.
Such machines exploit data level parallelism, but not concurrency: there are simultaneous (parallel)
computations, but only a single process (instruction) at a given moment.
An application that may take advantage of SIMD is one where the same value is being added to (or subtracted from) a large number of
data points, a common operation in many multimedia applications. One example would be changing the brightness of an image. Each
pixel of an image consists of three values for the brightness of the red (R), green (G) and blue (B) portions of the color. To change the
brightness, the R, G and B values are read from memory, a value is added to (or subtracted from) them, and the resulting values are
written back out to memory.
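A scalar sketch of that brightness adjustment (added for illustration; a SIMD unit applies the same add to many channel values with one instruction):

    // Adjust brightness of R, G, B channel values stored as ints in the range 0..255.
    static void adjustBrightness(int[] channels, int delta) {
        for (int i = 0; i < channels.length; i++) {
            // Same operation on every data point: read, add, clamp, write back.
            channels[i] = Math.min(255, Math.max(0, channels[i] + delta));
        }
    }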
SIMD Machines
These machines (also called supercomputers) are characterized by having:
• A control component (which can be likened to the CPU of an ordinary personal computer)
• Several PEs (Processing Elements) that carry out the computation.
All SIMD machines share the property that when a directive arrives, it can be carried out by the n executors simultaneously, but on different sets of data. The speedup therefore lies between 1 and n.
SIMT
Single Instruction Multiple Threads (SIMT) ≈ SIMD + multithreading
A thread of execution is the smallest sequence of programmed instructions that can be managed independently by a
scheduler (which is typically a part of the operating system).
A thread is a component of a process: multiple threads can exist within one process, executing concurrently and sharing
resources such as memory, while different processes do not share these resources; in particular, the threads of a
process share its executable code and the values of its dynamically allocated variables and non-thread-local global
variables at any given time.
In SIMT, multiple threads perform the same instruction on different data sets. The main advantage of SIMT
is that it reduces the latency that comes with instruction prefetching.
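There is no SIMT hardware on the JVM, but the idea can be approximated, purely as an illustration, by several CPU threads running the same routine on different slices of the data, each selected by a thread index much like a GPU thread uses its thread ID:

    // Illustrative only: numThreads threads execute the same "kernel" on different elements.
    static void simtLikeScale(float[] data, float factor, int numThreads) throws InterruptedException {
        Thread[] workers = new Thread[numThreads];
        for (int t = 0; t < numThreads; t++) {
            final int tid = t;                      // analogous to a GPU thread ID
            workers[t] = new Thread(() -> {
                for (int i = tid; i < data.length; i += numThreads) {
                    data[i] *= factor;              // same instruction, different data elements
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
    }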
SIMD vs SIMT
CPU: the CPU is the brain of the system. It comprises the arithmetic logic unit (ALU), used to store information and perform calculations quickly, and the control unit (CU), which performs instruction sequencing and branching. The CPU interacts with more computer components, such as memory and input/output, to execute instructions.
GPU: the GPU is used to produce the images in computer games. It is faster than the CPU and emphasizes high throughput. It is generally integrated with other electronics, sharing RAM with them, which suits the most demanding computing tasks. It contains many more ALU units than a CPU.
CPU vs GPU
• CPU stands for Central Processing Unit; GPU stands for Graphics Processing Unit.
• The CPU consumes or needs more memory than the GPU; the GPU requires less memory than the CPU.
• The speed of the CPU is lower than the GPU's; the GPU is the faster of the two.
• The CPU contains a few powerful cores; the GPU contains many weaker cores.
• The CPU is suitable for serial instruction processing; the GPU is not.
• The CPU is not suitable for parallel instruction processing; the GPU is.
• The CPU emphasizes low latency; the GPU emphasizes high throughput.
GPU vs CPU
CPUs vs GPUs As Fast As Possible
https://www.youtube.com/watch?v=1kypaBjJ-pg
GPU vs CPU
What is a GPU vs a CPU?
https://www.youtube.com/watch?v=XKOI9-G-wk8
GPU vs CPU
GPUs: Explained
https://www.youtube.com/watch?v=LfdK-v0SbGI
Game Streaming
Is there ANY hope for game streaming? We tried them all
https://www.youtube.com/watch?v=d3dNoCRzbAs
Appendix: Process vs Thread vs Task
Program vs Process vs Thread
A program can be described as any executable file: it contains a certain set of instructions written with the intent of carrying out a specific operation. It resides on disk (secondary storage) and is a passive entity which does not go away when the system reboots.
Any running instance of a program is called a process; it can also be described as a program under execution. One program can have N processes. A process resides in main memory and hence disappears whenever the machine reboots. Multiple processes can run in parallel on a multiprocessor system.
A thread is commonly described as a lightweight process. One process can have N threads. All threads associated with a common process share the same memory as the process: this allows threads to read from and write to common shared data structures and variables, and also makes communication between threads easier. Communication between two or more processes, known as Inter-Process Communication (IPC), is quite difficult and resource-intensive.
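A minimal sketch (added for illustration): two threads of the same process updating a shared counter, something two separate processes could not do without IPC.

    import java.util.concurrent.atomic.AtomicInteger;

    public class SharedCounter {
        // Shared by all threads of this process.
        static final AtomicInteger counter = new AtomicInteger();

        public static void main(String[] args) throws InterruptedException {
            Runnable work = () -> {
                for (int i = 0; i < 1_000; i++) counter.incrementAndGet();
            };
            Thread t1 = new Thread(work);
            Thread t2 = new Thread(work);
            t1.start(); t2.start();
            t1.join();  t2.join();
            System.out.println(counter.get()); // 2000: both threads see the same memory
        }
    }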
Thread & Threadpool
A thread represents an actual OS-level thread. Thread allows the highest degree of control; you can Abort or Suspend
or Resume a thread, you can observe its state, and you can set thread-level properties like the stack size, apartment
state, or culture.
The problem with threads is that OS threads are costly: each thread consumes a non-trivial amount of memory for its stack, and adds CPU overhead as the processor context-switches between threads. Instead, it is better to have a small pool of threads execute your code as work becomes available.
The .NET Framework Common Language Runtime (CLR) and the Java Virtual Machine offer a ThreadPool solution: a wrapper around a pool of threads maintained by the CLR or the Virtual Machine itself, giving you almost no control; you can submit work to execute at some point, and you can control the size of the pool, but you can't set anything else. You can't even tell when the pool will start running the work you submit to it.
Using ThreadPool avoids the overhead of creating too many threads. However, if you submit too many long-running
tasks to the threadpool, it can get full, and later work that you submit can end up waiting for the earlier long-running
items to finish. In addition, the ThreadPool offers no way to find out when a work item has been completed, nor a way
to get the result. Therefore, ThreadPool is best used for short operations where the caller does not need the result.
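The text above refers to the .NET ThreadPool; as a rough JVM counterpart (shown only for illustration), the sketch below submits short work items to a fixed-size ExecutorService instead of creating one thread per item.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class PoolVsThread {
        public static void main(String[] args) throws InterruptedException {
            // Raw OS-level thread: full control, but comparatively expensive to create.
            Thread t = new Thread(() -> System.out.println("raw thread"));
            t.start();
            t.join();

            // Thread pool: a few threads are reused for many short work items.
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 0; i < 10; i++) {
                final int id = i;
                pool.submit(() -> System.out.println("work item " + id));
            }
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }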
Task
A Task is something you want done; it is a set of program instructions that are loaded in memory.
A Task will by default use the ThreadPool, therefore it does not create its own OS thread.
Tasks are executed by a TaskScheduler (the default scheduler simply runs them on the ThreadPool).
Unlike the ThreadPool, Task also allows you to find out when it finishes, and to return a result. You can call
ContinueWith on an existing Task to make it run more code once the task finishes. You can also synchronously wait for a
task to finish by calling Wait or, for a generic task, by getting the Result property. Like Thread.Join, this will block the
calling thread until the task finishes. Synchronously waiting for a task is usually a bad idea; it prevents the calling thread
from doing any other work, and can also lead to deadlocks if the task ends up waiting (even asynchronously) for the
current thread.
Since tasks still run on the ThreadPool, they should not be used for long-running operations, since they can still fill up
the thread pool and block new work. Instead, Task provides a LongRunning option, which will tell the TaskScheduler to
spin up a new thread rather than running on the ThreadPool.
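These paragraphs describe the .NET Task API (ContinueWith, Wait, Result, LongRunning). Purely as an illustrative JVM analogue, CompletableFuture offers similar continuations and result retrieval:

    import java.util.concurrent.CompletableFuture;

    public class TaskLikeExample {
        public static void main(String[] args) {
            // Runs on the common pool, analogous to a Task running on the ThreadPool.
            CompletableFuture<Integer> task =
                    CompletableFuture.supplyAsync(() -> 21 * 2);

            // Continuation, roughly analogous to ContinueWith.
            CompletableFuture<String> next =
                    task.thenApply(n -> "answer = " + n);

            // Blocking wait for the result, roughly analogous to Wait / Result.
            System.out.println(next.join());
        }
    }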
Thread vs Task
• Task is more abstract than Thread. It is generally advisable to use tasks instead of threads, because a task runs on the thread pool, which reuses threads the system has already created, improving performance.
• A task can return a result. There is no direct mechanism to return a result from a thread.
• Task supports cancellation through the use of cancellation tokens, but Thread doesn't (a JVM sketch of cancellation and chaining follows this list).
• A task can internally use multiple threads at the same time, while a thread runs only one piece of work at a time.
• You can attach a task to a parent task, and thus decide whether the parent or the child finishes first.
• With a thread, an exception thrown in a long-running method cannot be caught in the parent function; with a task it can easily be caught.
• You can easily build chains of tasks. You can specify when a task should start after the previous task, and whether there should be a synchronization-context switch. This lets you run a long-running task in the background and then a UI-refreshing task on the UI thread.
• A task is by default a background task; you cannot have a foreground task. A thread, on the other hand, can be background or foreground.
• The default TaskScheduler uses thread pooling, so some tasks may not start until other pending tasks have completed. If you use Thread directly, every use starts a new thread.
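As referenced in the list above, a rough JVM analogue of cancellation and chaining (cancellation tokens themselves are a .NET concept): a Future returned by an ExecutorService can be cancelled, and CompletableFuture stages can be chained.

    import java.util.concurrent.*;

    public class CancelAndChain {
        public static void main(String[] args) {
            ExecutorService pool = Executors.newFixedThreadPool(2);

            // Cancellation: roughly analogous to signalling a cancellation token.
            Future<?> longJob = pool.submit(() -> {
                try {
                    Thread.sleep(60_000);          // pretend long-running work
                } catch (InterruptedException e) {
                    System.out.println("job cancelled");
                }
            });
            longJob.cancel(true);                  // requests cancellation (interrupts if already running)

            // Chaining: run a follow-up stage once the first one finishes.
            CompletableFuture.supplyAsync(() -> "long computation", pool)
                    .thenAccept(result -> System.out.println("then: " + result))
                    .join();

            pool.shutdown();
        }
    }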