0% found this document useful (0 votes)

180 views77 pages

TDCI Arch

The document provides an overview of the evolution of GPU architecture from 1995 to today. It discusses key innovations like hardware transform and lighting, vertex and pixel shaders, unified shader architectures, and how GPUs evolved from fixed function pipelines to general purpose parallel processors using a stream processing model.

Uploaded by

Cristian Garcia Alarcòn

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

180 views77 pages

TDCI Arch

Uploaded by

Cristian Garcia Alarcòn

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 77

An Introduction to

Modern GPU Architecture

Ashu Rege
Director of Developer Technology

Agenda
Evolution of GPUs
Computing Revolution
Stream Processing
Architecture details of modern GPUs

Evolution of GPUs

Evolution of GPUs
(1995-1999)
1995 NV1
1997 Riva 128 (NV3), DX3
1998 Riva TNT (NV4), DX5
32 bit color, 24 bit Z, 8 bit stencil
Dual texture, bilinear filtering
2 pixels per clock (ppc)

1999 Riva TNT2 (NV5), DX6

Faster TNT
128b memory interface
32 MB memory
The chip that would not die

Virtua Fighter
(SEGA Corporation)

NV1
50K triangles/sec
1M pixel ops/sec
1M transistors
16-bit color
Nearest filtering

1995

Evolution of GPUs
(Fixed Function)

GeForce 256 (NV10)

DirectX 7.0
Hardware T&L
Cubemaps
DOT3 bump mapping
Register combiners

2x Anisotropic filtering
Trilinear filtering
DXT texture compression
4 ppc
Term GPU introduced

Deus Ex
(Eidos/Ion Storm)

NV10
15M triangles/sec
480M pixel ops/sec
23M transistors
32-bit color
Trilinear filtering

1999

NV10 Register Combiners

Input RGB, Alpha
Registers

RGB Function

Input
Mappings

RGB
Scale/Bias

Next Combiners
RGB Registers

Alpha
Scale/Bias

Next Combiners
Alpha Registers

A
A op1 B
B

RGB
Portion

C op2 D
C

D
Input Alpha, Blue
Registers

Input
Mappings

AB op3 CD

Alpha
Function
A
AB

Alpha
Portion

B
CD
C

AB op4 CD

Evolution of GPUs
(Shader Model 1.0)
GeForce 3 (NV20)
NV2A Xbox GPU

DirectX 8.0
Vertex and Pixel Shaders
3D Textures
Hardware Shadow Maps
8x Anisotropic filtering
Multisample AA (MSAA)
4 ppc

Ragnarok Online
(Atari/Gravity)

NV20
100M triangles/sec
1G pixel ops/sec
57M transistors
Vertex/Pixel shaders
MSAA

2001

Evolution of GPUs
(Shader Model 2.0)
GeForce FX Series (NV3x)
DirectX 9.0
Floating Point and Long
Vertex and Pixel Shaders
Shader Model 2.0
256 vertex ops
32 tex + 64 arith pixel ops

Shader Model 2.0a

256 vertex ops
Up to 512 ops

Shading Languages

Dawn Demo
(NVIDIA)

NV30
200M triangles/sec
2G pixel ops/sec
125M transistors
Shader Model 2.0a

HLSL, Cg, GLSL

2003

Evolution of GPUs
(Shader Model 3.0)

GeForce 6 Series (NV4x)

DirectX 9.0c
Shader Model 3.0
Dynamic Flow Control in
Vertex and Pixel Shaders1
Branching, Looping, Predication,

Vertex Texture Fetch

High Dynamic Range (HDR)
64 bit render target
FP16x4 Texture Filtering and Blending
1Some flow control first introduced in SM2.0a

Far Cry HDR

(Ubisoft/Crytek)

NV40
600M triangles/sec
12.8G pixel ops/sec
220M transistors
Shader Model 3.0
Rotated Grid MSAA
16x Aniso, SLI

2004

Far Cry No HDR/HDR Comparison

Evolution of GPUs
(Shader Model 4.0)
GeForce 8 Series (G8x)
DirectX 10.0

Shader Model 4.0

Geometry Shaders
No caps bits
Unified Shaders

New Driver Model in Vista

CUDA based GPU computing
GPUs become true computing
processors measured in GFLOPS

Crysis
(EA/Crytek)

G80
Unified Shader Cores w/
Stream Processors
681M transistors
Shader Model 4.0
8x MSAA, CSAA

2006

Crysis. Images courtesy of Crytek.

As Of Today

GeForce GTX 280 (GT200)

DX10
1.4 billion transistors
576 mm2 in 65nm CMOS

240 stream processors

933 GFLOPS peak
1.3GHz processor clock
1GB DRAM
512 pin DRAM interface
142 GB/s peak

Stunning Graphics Realism

Lush, Rich Worlds

Crysis 2006 Crytek / Electronic Arts

Hellgate: London 2005-2006 Flagship Studios, Inc. Licensed by NAMCO BANDAI Games America, Inc.

Incredible Physics Effects

Core of the Definitive Gaming Platform

What Is Behind This Computing Revolution?

Unified Scalar Shader Architecture
Highly Data Parallel Stream Processing
Next, lets try to understand what these terms mean

Unified Scalar Shader Architecture

Graphics Pipelines For Last 20 Years

Processor per function
Vertex
Triangle

T&L evolved to vertex shading

Triangle, point, line setup

Pixel

Flat shading, texturing, eventually pixel

shading

ROP

Blending, Z-buffering, antialiasing

Memory

Wider and faster over the years

Shaders in Direct3D
DirectX 9:
Vertex Shader, Pixel Shader
DirectX 10:
Vertex Shader, Geometry Shader, Pixel Shader
DirectX 11:
Vertex Shader, Hull Shader, Domain Shader, Geometry Shader, Pixel
Shader, Compute Shader
Observation: All of these shaders require the same basic
functionality: Texturing (or Data Loads) and Math Ops.

Unified Pipeline

Physics
Vertex

Geometry
(new in DX10)

Future

Texture +
Floating
Point
Processor

Pixel

Compute

ROP
Memory

(CUDA, DX11
Compute, OpenCL)

Why Unify?
Vertex Shader

Pixel Shader
Idle hardware

Vertex Shader
Idle hardware

Unbalanced
and inefficient
utilization in nonunified architecture

Heavy Geometry
Workload Perf = 4

Pixel Shader
Heavy Pixel
Workload Perf = 8

Why Unify?
Unified Shader
Vertex Workload
Pixel

Optimal utilization
In unified architecture

Heavy Geometry
Workload Perf = 11

Unified Shader
Pixel Workload
Vertex

Heavy Pixel
Workload Perf = 11

Why Scalar Instruction Shader (1)

Vector ALU efficiency varies

4
MAD r2.xyzw, r0.xyzw, r1.xyzw 100% utilization

3
DP3 r2.w, r0.xyz, r1.xyz 75%

2
MUL r2.xy, r0.xy, r1.xy 50%

1
ADD r2.w, r0.x, r1.x 25%

Why Scalar Instruction Shader (2)

Vector ALU with co-issue better but not perfect

DP3 r2.x, r0.xyz, r1.xyz

} 100%
4
ADD r2.w, r0.w, r1.w

3
DP3 r2.w, r0.xyz, r1.xyz

Cannot co-issue
1
ADD r2.w, r0.w, r2.w
Vector/VLIW architecture More compiler work required
G8x, GT200: scalar always 100% efficient, simple to compile
Up to 2x effective throughput advantage relative to vector

Complex Shader Performance on Scalar Arch.

Procedural Perlin Noise Fire
Procedural Fire
5
4.5
4
3.5
3
2.5
2
1.5
1
0.5
0
7900GTX

8800GTX

Conclusion
Build a unified architecture with scalar cores where all shader
operations are done on the same processors

Stream Processing

The Supercomputing Revolution (1)

The Supercomputing Revolution (2)

What Accounts For This Difference?

Need to understand how CPUs and GPUs differ
Latency Intolerance versus Latency Tolerance
Task Parallelism versus Data Parallelism
Multi-threaded Cores versus SIMT (Single Instruction Multiple Thread) Cores
10s of Threads versus 10,000s of Threads

Latency and Throughput

Latency is a time delay between the moment something is initiated,
and the moment one of its effects begins or becomes detectable
For example, the time delay between a request for texture reading and texture
data returns

Throughput is the amount of work done in a given amount of time

For example, how many triangles processed per second

CPUs are low latency low throughput processors

GPUs are high latency high throughput processors

Latency (1)
GPUs are designed for tasks that can tolerate latency
Example: Graphics in a game (simplified scenario):
CPU

Generate
Frame 0

Generate
Frame 1

Generate
Frame 2

GPU

Idle

Render
Frame 0

Render
Frame 1

Latency between frame generation and rendering (order of milliseconds)

To be efficient, GPUs must have high throughput, i.e. processing

millions of pixels in a single frame

Latency (2)
CPUs are designed to minimize latency
Example: Mouse or keyboard input

Caches are needed to minimize latency

CPUs are designed to maximize running operations out of cache
Instruction pre-fetch
Out-of-order execution, flow control

CPUs need a large cache, GPUs do not

GPUs can dedicate more of the transistor area to computation horsepower

CPU versus GPU Transistor Allocation

GPUs can have more ALUs for the same sized chip and therefore run
many more threads of computation
ALU

ALU

Control

Cache
DRAM

DRAM

CPU

GPU

Modern GPUs run 10,000s of threads concurrently

Managing Threads On A GPU

How do we:
Avoid synchronization issues between so many threads?
Dispatch, schedule, cache, and context switch 10,000s of threads?
Program 10,000s of threads?

Design GPUs to run specific types of threads:

Independent of each other no synchronization issues
SIMD (Single Instruction Multiple Data) threads minimize thread management
Reduce hardware overhead for scheduling, caching etc.
Program blocks of threads (e.g. one pixel shader per draw call, or group of pixels)

Any problems which can be solved with this type of computation?

Data Parallel Problems

Plenty of problems fall into this category (luckily )
Graphics, image & video processing, physics, scientific computing,

This type of parallelism is called data parallelism

And GPUs are the perfect solution for them!
In fact the more the data, the more efficient GPUs become at these algorithms
Bonus: You can relatively easily add more processing cores to a GPU and
increase the throughput

Parallelism in CPUs v. GPUs

CPUs use task parallelism
Multiple tasks map to multiple
threads

GPUs use data parallelism

SIMD model (Single Instruction
Multiple Data)
Same instruction on different data

Tasks run different instructions

10s of relatively heavyweight threads
run on 10s of cores
Each thread managed and scheduled
explicitly
Each thread has to be individually
programmed

10,000s of lightweight threads on 100s

of cores
Threads are managed and scheduled
by hardware
Programming done for batches of
threads (e.g. one pixel shader per
group of pixels, or draw call)

Stream Processing
What we just described:
Given a (typically large) set of data (stream)
Run the same series of operations (kernel or shader) on all of the data (SIMD)

GPUs use various optimizations to improve throughput:

Some on-chip memory and local caches to reduce bandwidth to external memory
Batch groups of threads to minimize incoherent memory access
Bad access patterns will lead to higher latency and/or thread stalls.

Eliminate unnecessary operations by exiting or killing threads

Example: Z-Culling and Early-Z to kill pixels which will not be displayed

To Summarize
GPUs use stream processing to achieve high throughput

GPUs designed to solve problems that tolerate high latencies

High latency tolerance
Lower cache requirements
Less transistor area for cache
More area for computing units
More computing units
10,000s of SIMD threads and high throughput
GPUs win

Additionally:
Threads managed by hardware
You are not required to write code for each
thread and manage them yourself
Easier to increase parallelism by adding more processors

So, fundamental unit of a modern GPU is a stream processor

G80 and GT200 Streaming Processor

Architecture

Building a Programmable GPU

The future of high throughput computing is programmable
stream processing
So build the architecture around the unified scalar stream
processing cores
GeForce 8800 GTX (G80) was the first GPU architecture
built with this new paradigm

G80 Replaces The Pipeline Model

Host
Input Assembler

Vtx Thread Issue

L2
FB

Pixel Thread Issue

L2
FB

Geom Thread Issue

L2
FB

Thread Processor

128 Unified Streaming

Processors

Setup / Rstr / ZCull

L2
FB

GT200 Adds More Processing Power

Host CPU

System Memory

GPU

Host Interface
Viewport / Clip /
Setup / Raster /
ZCull

Input Assemble
Vertex Work
Distribution

Geometry Work
Distribution

Pixel Work
Distribution

Compute Work
Distribution

Interconnection Network
ROP

DRAM

ROP

DRAM

ROP

DRAM

ROP

DRAM

ROP

DRAM

ROP

DRAM

ROP

DRAM

ROP

DRAM

8800GTX (high-end G80)

16 Stream Multiprocessors
Each one contains 8 unified streaming processors 128 in total

GTX280 (high-end GT200)

24 Stream Multiprocessors
Each one contains 8 unified streaming processors 240 in total

Inside a Stream Multiprocessor (SM)

Scalar register-based ISA

I-Cache

Multithreaded Instruction Unit

Up to 1024 concurrent threads
Hardware thread scheduling
In-order issue

8 SP: Thread Processors

IEEE 754 32-bit floating point
32-bit and 64-bit integer
16K 32-bit registers

2 SFU: Special Function Units

sin, cos, log, exp

MT Issue
C-Cache
TPC

SP SP
SP SP
SP SP
SP SP
SFU SFU

Double Precision Unit

IEEE 754 64-bit floating point
Fused multiply-add

16KB Shared Memory

DP
Shared
Memory

Multiprocessor Programming Model

Workloads are partitioned into blocks of threads among
multiprocessors
a block runs to completion
a block doesnt run until resources are available

Allocation of hardware resources

shared memory is partitioned among blocks
registers are partitioned among threads

Hardware thread scheduling

any thread not waiting for something can run
context switching is free every cycle

Memory Hierarchy of G80 and GT200

SM can directly access device memory (video memory)
Not cached
Read & write
GT200: 140 GB/s peak

SM can access device memory via texture unit

Cached
Read-only, for textures and constants
GT200: 48 GTexels/s peak

On-chip shared memory shared among threads in an SM

important for communication amongst threads
provides low-latency temporary storage
G80 & GT200: 16KB per SM

Performance Per Millimeter

For GPU, performance == throughput
Cache are limited in the memory hierarchy

Strategy: hide latency with computation, not cache

Heavy multithreading
Switch to another group of threads when the current group is waiting for memory
access

Implication: need large number of threads to hide latency

Occupancy: typically 128 threads/SM minimum
Maximum 1024 threads/SM on GT200 (total 1024 * 24 = 24,576 threads)

Strategy: Single Instruction Multiple Thread (SIMT)

SIMT Thread Execution

Group 32 threads (vertices, pixels or primitives)
into warps
Threads in warp execute same instruction at a time
Shared instruction fetch/dispatch
Hardware automatically handles divergence
(branches)

Warps are the primitive unit of scheduling

Pick 1 of 24 warps for each instruction slot

SIMT execution is an implementation choice

Shared control logic leaves more space for ALUs
Largely invisible to programmer

I-Cache
MT Issue
C-Cache
TPC

SP SP
SP SP
SP SP
SP SP
SFU SFU

DP
Shared
Memory

Shader Branching Performance

G8x/G9x/GT200 branch
efficiency is 32 threads
(1 warp)

G80 32 pixel coherence

PS Branching Efficiency

48 pixel coherence
16

If threads diverge, both

sides of branch will execute
on all 32
More efficient compared to
architecture with branch
efficiency of 48 threads

number of coherent 4x4 tiles

14
12
10
8
6
4
2
0%

20%

40%

60%

80%

100%

120%

Conclusion:
G80 and GT200 Streaming Processor Architecture
Execute in blocks can maximally exploits data parallelism
Minimize incoherent memory access
Adding more ALU yields better performance

Performs data processing in SIMT fashion

Group 32 threads into warps
Threads in warp execute same instruction at a time

Thread scheduling is automatically handled by hardware

Context switching is free (every cycle)
Transparent scalability. Easy for programming

Memory latency is covered by large number of in-flight threads

Cache is mainly used for read-only memory access (texture, constants).

Stream Processing for Graphics

DX10 Graphics Pipeline (Logical View)

GPU

CPU

Vertex
Assembly

Vertex
Vertex
Vertex
Vertex
Shader
Shader
Shader
Shader

Vertex
Vertex
Vertex
Geometry
Shader
Shader
Shader
Shader

Rasterizer

Texture

Framebuffer

Vertex
Vertex
Vertex
Pixel
Shader
Shader
Shader
Shader

Blending

DX10 API Mapped To GPU

A-B

FB
H

GEOM
GS

SETUP

SHADER

TEXTURE

FB
H

I
Unified Shading Units

RASTERIZER

D, J

Fixed Function Units

Vertex Shading Path
Geometry Shading Path

ROP

Pixel Shading Path

A. Front End
B. Index and Vertex Input
C. Topology and Viewport Process
D. Z-Cull
E. Texture Fetch

F. Depth Test, Alpha-Blending, AA

G. Shader Cores
H. Frame Buffer Controller
I. Primitive Setup
J. Rasterization

Texturing
G80
Every 2 SMs connect to a TMU (Texture Mapping
Unit)
8 TMUs in total
SP

GT200
Every 3 SMs connect to a TMU
10 TMUs in total

Each TMU contains 4 TA (Texture

Addressing) and 8 TF (Texture Filtering)
TA:TF = 1:2, allowing full-speed FP16 bilinear
filtering and free 2xAF

Decoupled shader & texture pipelines

G7x Architecture

Math

Math
Tex

G8x and later Architecture

Math

Math
Tex

A
Tex

Cover texture latency with thousands of threads in flight

A
Tex

Texture Filtering Quality

16xAF, 7900GTX vs. 8800GTX

NVIDIA 7900 GTX

NVIDIA 8800 GTX and after

Detail of a single ROP pixel pipeline

Input Shaded
InputData
Shaded
Fragment

Fragment Data

Pixel X-Bar Interconnect

INT8, FP16 & FP32 formats Support

Up to 16x Multisampling

MSAA
CSAA

AA with HDR
Color & Z Compression (2x)

Z Comp

C Comp

Multiple Render Targets (8)

Z ROP

C ROP

Frame Buffer
Partition

ROP Unit in GPU: Early Z

Z-Buffer removes pixels not
visible
Z-Culling removes regions of
non-visible pixels @ high rate
Early-Z removes individual
pixel quads (2x2) before they
are processed

Z-Cull
Early-Z

Pixel
Killed

Shader
Texture

Z-Check

Memory

Pixel
Killed

Next Generation Antialiasing

High Quality AA with incredible performance

Highest Quality AA
AA with HDR
Coverage Sampling AA (CSAA)

3DMark05 Futuremark

Review: Aliased Rendering

Only one sample per pixel
Coverage, Color, Depth and Stencil

Precisely evaluated, but under-sampled

Anti-Aliasing Fundamentals
Pixels in an image need to represent finite areas from the scene
description, not just infinitely small sample points
Ideal anti-aliasing would be a smooth convolution over the
pixel area
Too expensive for general scenes

So we approximate by taking many samples

The more samples the better

Anti-Aliasing
Anti-aliasing properly accounts for the contribution of all
the primitives that intersect a pixel

Supersampling
Store a color and depth value for each
sub-sample at every pixel. Calculate unique color and depth
values at every sub-sample for every pixel of every drawn
triangle
Strengths
Robust image quality by brute force

Weaknesses
Inefficient: Expensive pixel shaders and texture fetches are executed for
every sub-sample; wasteful because color results within a pixel are
nearly identical

Multisampling (SGI, 1993)

Store a unique color and depth value for each sub-sample for
each pixel, but re-use one calculated color for all color subsamples for a polygon
Strengths
Only one color value calculated per pixel per triangle
Z and stencil evaluated precisely; interpenetrations and bulkheads
handled correctly

Weaknesses
Memory footprint N times larger than 1x
Expensive to extend to 8x quality and beyond

Motivation for CSAA

Multisampling evolved from 1 2 4 samples
Beyond 4 sub-samples, storage cost increases faster than the
image quality improves
Even more true with HDR
64b and 128b per color sub-sample!

For the vast majority of edge pixels, 2 colors

are enough
What matters is more detailed coverage information

Coverage Sampled Antialiasing

Compute and store boolean coverage at
16 sub-samples
Compress the redundant color and depth/stencil information
into the memory footprint and bandwidth of 4 or 8 multisamples
Performance of 4xMSAA with 16x quality

CSAA Quality Levels

Quality Level:

8xQ

16x

16xQ

Texture/Shader
Samples

Stored Color/Z
Samples

Coverage Samples

Half Life 2

NVIDIA 4x

NVIDIA 16xQ

Half Life 2 Valve Software

Shadow Edges in FEAR

FEAR - Monolith

FEAR 4x

FEAR - Monolith

FEAR 16x

Higher Quality Object Edges

FEAR - Monolith

Summary of CSAA Advantages

Only need to traverse the scene once

Small, fixed video memory footprint
Handles inter-penetrating objects
Pixel shaders are only run where color detail is needed
16x stored positional coverage

16x quality for ~4x performance

GPU beyond Graphics

A scalable parallel programming model and software

environment for parallel computing
Minimal extensions to familiar C/C++ environment
Heterogeneous serial-parallel programming model

NVIDIA Corporation 2008

The Democratization of Parallel Computing

GPUs and CUDA bring parallel computing to the masses
Over 100M CUDA-capable GPUs sold to date
60K CUDA developers
A developer kit (i.e. GPU) costs ~$200 (for 500 GFLOPS)

Data-parallel supercomputers are everywhere!

CUDA makes this power accessible
Were already seeing innovations in data-parallel computing

Massively parallel computing has become a commodity technology!

NVIDIA Corporation 2008

More about CUDA at Seminar

Parallel Computing with CUDA

Questions?

Rust Programming For Your First Finance Job - A Comprehensive - Publishing, Reactive & Van Der Post, Hayden - 2024 - Reactive Publishing - Anna's Archive
No ratings yet
Rust Programming For Your First Finance Job - A Comprehensive - Publishing, Reactive & Van Der Post, Hayden - 2024 - Reactive Publishing - Anna's Archive
537 pages
DXL Reference Manual
No ratings yet
DXL Reference Manual
998 pages
Cuda by Example - Book
No ratings yet
Cuda by Example - Book
311 pages
15890-Modern Processor Design
50% (4)
15890-Modern Processor Design
331 pages
Uvm LRM
No ratings yet
Uvm LRM
472 pages
(GPU Zen (Book 1) ) Wolfgang Engel - GPU Zen - Advanced Rendering Techniques (2017, Bowker Identifier Services)
No ratings yet
(GPU Zen (Book 1) ) Wolfgang Engel - GPU Zen - Advanced Rendering Techniques (2017, Bowker Identifier Services)
360 pages
Graphics Processing Unit
No ratings yet
Graphics Processing Unit
21 pages
Boolean Algebra & Logic Gates
100% (1)
Boolean Algebra & Logic Gates
72 pages
HEVC
No ratings yet
HEVC
208 pages
Banafa A Quantum Computing and Other Transformative Technologies 2023
No ratings yet
Banafa A Quantum Computing and Other Transformative Technologies 2023
260 pages
Efficient Hardware Architectures For Accelerating Deep Neural Networks Survey
No ratings yet
Efficient Hardware Architectures For Accelerating Deep Neural Networks Survey
42 pages
NVIDIA GPU Computing - A Journey From PC Gaming To Deep Learning
100% (1)
NVIDIA GPU Computing - A Journey From PC Gaming To Deep Learning
91 pages
Parallel Processing With Cuda
No ratings yet
Parallel Processing With Cuda
25 pages
Main GPU
No ratings yet
Main GPU
87 pages
001 Intro
No ratings yet
001 Intro
66 pages
The Evolution of Gpus For General Purpose Computing
No ratings yet
The Evolution of Gpus For General Purpose Computing
38 pages
GPU Introduction
No ratings yet
GPU Introduction
52 pages
Arm Assembly Language Programming
100% (4)
Arm Assembly Language Programming
170 pages
GPU Architecture and Function: Michael Foster and Ian Frasch
No ratings yet
GPU Architecture and Function: Michael Foster and Ian Frasch
35 pages
Lecture GPUArchCUDA01
No ratings yet
Lecture GPUArchCUDA01
57 pages
OpenACC Fundamentals
No ratings yet
OpenACC Fundamentals
38 pages
Open Quantum Assembly Language
No ratings yet
Open Quantum Assembly Language
45 pages
Uvm Tutorial
100% (2)
Uvm Tutorial
36 pages
Lab 4 - TCP, UDP - 2020 PDF
No ratings yet
Lab 4 - TCP, UDP - 2020 PDF
16 pages
C Programming For Embedded Microcontrollers, 2008, Warwick A. Smith, 0905705807, 9780905705804, Elektor International Media, 2008
100% (1)
C Programming For Embedded Microcontrollers, 2008, Warwick A. Smith, 0905705807, 9780905705804, Elektor International Media, 2008
21 pages
Verilog Session 1,2,3
No ratings yet
Verilog Session 1,2,3
68 pages
Python (Punithavathi)
No ratings yet
Python (Punithavathi)
228 pages
Formal Verificaiton of A FIFO
No ratings yet
Formal Verificaiton of A FIFO
8 pages
Introduction To OpenCL Programming (201005)
No ratings yet
Introduction To OpenCL Programming (201005)
132 pages
Machine Learning Lab
No ratings yet
Machine Learning Lab
88 pages
OpenACC For Programmers 2018
No ratings yet
OpenACC For Programmers 2018
317 pages
Graph Representation Learning
No ratings yet
Graph Representation Learning
141 pages
Module 4.1 - Memory and Data Locality: GPU Teaching Kit
No ratings yet
Module 4.1 - Memory and Data Locality: GPU Teaching Kit
132 pages
Audio Sample Selection With Generative Adversarial Networks
No ratings yet
Audio Sample Selection With Generative Adversarial Networks
80 pages
1 Introduction To Parallel Computing
No ratings yet
1 Introduction To Parallel Computing
58 pages
Arm Cortex m3 and m4 Mcu Architecture
No ratings yet
Arm Cortex m3 and m4 Mcu Architecture
7 pages
Quantum Computing Cribsheet
No ratings yet
Quantum Computing Cribsheet
1 page
Linux Io Stack Diagram v0.1
No ratings yet
Linux Io Stack Diagram v0.1
1 page
EEE3342LabManual Small
No ratings yet
EEE3342LabManual Small
133 pages
Deep Learning For Edge Computing Applications A ST
No ratings yet
Deep Learning For Edge Computing Applications A ST
14 pages
The Generative Enterprise
No ratings yet
The Generative Enterprise
26 pages
How To Create An Operating System
50% (2)
How To Create An Operating System
29 pages
MFML PDF
No ratings yet
MFML PDF
101 pages
Machine Learning, Deep Learning, Computer Vision On Raspberry Pi2019-20
No ratings yet
Machine Learning, Deep Learning, Computer Vision On Raspberry Pi2019-20
2 pages
VHDL File
No ratings yet
VHDL File
25 pages
HEVC
No ratings yet
HEVC
50 pages
Manual
100% (4)
Manual
30 pages
Mathematics For AI
No ratings yet
Mathematics For AI
8 pages
Experiment No 2 Aim: To Study Virtualization and Install KVM. Virtualization
No ratings yet
Experiment No 2 Aim: To Study Virtualization and Install KVM. Virtualization
12 pages
Why GPU?: CS8803SC Software and Hardware Cooperative Computing
No ratings yet
Why GPU?: CS8803SC Software and Hardware Cooperative Computing
14 pages
State Machines For Event Driven Systems Samek
No ratings yet
State Machines For Event Driven Systems Samek
10 pages
Neuromorphic Computing
No ratings yet
Neuromorphic Computing
14 pages
Artificial Intelligence by Bala
No ratings yet
Artificial Intelligence by Bala
14 pages
Sensor Fusion
No ratings yet
Sensor Fusion
30 pages
Dokumen - Pub Introduction To Computer Graphics Using Opengl and Java 3nbsped 3031281349 9783031281341 9783031281358
No ratings yet
Dokumen - Pub Introduction To Computer Graphics Using Opengl and Java 3nbsped 3031281349 9783031281341 9783031281358
476 pages
Vodoo CPI Dusurme
No ratings yet
Vodoo CPI Dusurme
30 pages
Pre Silicon Validation Whitepaper
No ratings yet
Pre Silicon Validation Whitepaper
3 pages
Ds Viva Q
No ratings yet
Ds Viva Q
13 pages
Stronghold Legends Manual English
100% (1)
Stronghold Legends Manual English
51 pages
Sims 2 - (Comandos)
No ratings yet
Sims 2 - (Comandos)
10 pages
Install TensorFlow With Pip - TensorFlow
No ratings yet
Install TensorFlow With Pip - TensorFlow
3 pages
We Are Intechopen, The World'S Leading Publisher of Open Access Books Built by Scientists, For Scientists
No ratings yet
We Are Intechopen, The World'S Leading Publisher of Open Access Books Built by Scientists, For Scientists
22 pages
NVIDIA OpenCL JumpStart Guide
No ratings yet
NVIDIA OpenCL JumpStart Guide
15 pages
Introduction To The Universal Render Pipeline For Advanced Unity Creators 2024
No ratings yet
Introduction To The Universal Render Pipeline For Advanced Unity Creators 2024
140 pages
Latestlog
No ratings yet
Latestlog
24 pages
Light Map Tutorial
No ratings yet
Light Map Tutorial
14 pages
El Mansouri Jalal Rendering Rainbow Six PDF
No ratings yet
El Mansouri Jalal Rendering Rainbow Six PDF
82 pages
OpenGL ES 2 0 Programming Guide 1st Edition Aaftab Munshi Download PDF
100% (2)
OpenGL ES 2 0 Programming Guide 1st Edition Aaftab Munshi Download PDF
45 pages
GLDEFS
No ratings yet
GLDEFS
29 pages
Computer Graphics - Handouts
No ratings yet
Computer Graphics - Handouts
26 pages
SystemVerilog DPI With SystemC
No ratings yet
SystemVerilog DPI With SystemC
17 pages
User Manual Extreme PBR Combo +
No ratings yet
User Manual Extreme PBR Combo +
15 pages
Log
No ratings yet
Log
73 pages
Invector Third Person Controller - Basic Locomotion Guide
No ratings yet
Invector Third Person Controller - Basic Locomotion Guide
34 pages
Z-Buffer Optimizations: Patrick Cozzi Analytical Graphics, Inc
No ratings yet
Z-Buffer Optimizations: Patrick Cozzi Analytical Graphics, Inc
37 pages
Cgrfiii N
No ratings yet
Cgrfiii N
33 pages
vg2005 Stegmaier
No ratings yet
vg2005 Stegmaier
9 pages
RVGL Changelog PDF
No ratings yet
RVGL Changelog PDF
43 pages
True Impostors
No ratings yet
True Impostors
7 pages
2006-1252: Concept Modeling With Nurbs, Polygon, and Subdivision Surfaces
No ratings yet
2006-1252: Concept Modeling With Nurbs, Polygon, and Subdivision Surfaces
7 pages
Opengl: Introduction To Opengl - Opengl Architecture
No ratings yet
Opengl: Introduction To Opengl - Opengl Architecture
28 pages
Cga 9258atd
No ratings yet
Cga 9258atd
2 pages
Shader Mixer Tutorial
No ratings yet
Shader Mixer Tutorial
16 pages
Lab 4 - Modeling An Apple
No ratings yet
Lab 4 - Modeling An Apple
11 pages
UserBenchmark - Intel Arc A750 Vs Nvidia GTX 1660S (Super) PDF
No ratings yet
UserBenchmark - Intel Arc A750 Vs Nvidia GTX 1660S (Super) PDF
6 pages
Maya Bifrost Overview Sheet
No ratings yet
Maya Bifrost Overview Sheet
3 pages
N Switch
No ratings yet
N Switch
3 pages
Introduction To Replacement Shaders & Shader Keywords PDF
No ratings yet
Introduction To Replacement Shaders & Shader Keywords PDF
12 pages
A Survey Paper On OpenGL ES
No ratings yet
A Survey Paper On OpenGL ES
4 pages