00_CourseIntroduction
• TensorFlow (Google)
• Optimization
• GPUs with Python 3
• Computer lab: 1/2
• Due: September 15
• We have a project for the course, UPPMAX 2023/2-23.
• You need to set up an account on SUPR and on UPPMAX now; see the basic
information at https://www.uppmax.uu.se/support/getting-started/ and specifically
step 1 at https://www.uppmax.uu.se/support/getting-started/applying-for-a-user-account/
(if you do not have an account yet).
HOMEWORK
Week  Day        Date        Time         Room (Ångström)                Type       Content
w35   Tuesday    2023-08-29  10:15-12:00  101130                         Lecture    Lecture 1: Syllabus, introduction, GPU hardware
w35   Thursday   2023-08-31  10:15-12:00  101130                         Lecture    Lecture 2: Background, computer architecture
w36   Monday     2023-09-04  15:15-17:00  4103/4102 (computer room)      Lab        Lab 1: Lab assignment 1
w36   Tuesday    2023-09-05  08:15-10:00  101130                         Lecture    Lecture 3: CUDA programming + NVIDIA whitepaper
w36   Thursday   2023-09-07  10:15-12:00  101130                         Lecture    Lecture 4: CUDA programming + final project
w37   Tuesday    2023-09-12  08:15-10:00  4103/4102 (computer room)      Lab        Lab 2: Lab assignment 1/2
w37   Thursday   2023-09-14  15:15-17:00  6K1101/6K1107 (computer room)  Lab        Lab 3
w37   Friday     2023-09-15  13:15-15:00  101130                         Lecture    Lecture 5: Neural networks + TensorFlow
w38   Monday     2023-09-18  10:15-12:00  10K1203 (computer room)        Lab        Lab 4
w38   Friday     2023-09-22  15:15-17:00  10K1203 (computer room)        Lab        Lab 5
w39   Monday     2023-09-25  13:15-15:00  101130                         Lecture    Lecture 6: Kokkos
w39   Tuesday    2023-09-26  15:15-17:00  4103/4104                      Lab        Lab 6
w39   Friday     2023-09-29  15:15-17:00  10K1203 (computer room)        Lab        Lab 7
w40   Monday     2023-10-02  15:15-17:00  4103/4102 (computer room)      Lab        Lab 8
w40   Tuesday    2023-10-03  13:15-15:00  2004                           Lecture    Lecture 7: Kokkos
w40   Friday     2023-10-06  13:15-15:00  10K1203 (computer room)        Lab        Lab 9: Project assignment
w41   Wednesday  2023-10-11  15:15-17:00  10K1203 (computer room)        Lab        Lab 10: Time to work on the project on your own
w41   Friday     2023-10-13  13:15-15:00  10K1203 (computer room)        Lab        Lab 11: Project assignment
w43   Wednesday  2023-10-25  08:15-17:00  101125                         Oral exam  Discussion of the project
• GPU Hardware (today)
• CPU hardware (Thursday)
• GPUs (graphics processing units) were designed for 2D
rendering of a 3D scene (transform, clipping and
lighting)
• Question: Can these extended computing capabilities
be used for something else?
Hardware                                            Compute      Memory BW  Power  What
NVIDIA Ampere A100 GPU                              9.7 TFlop/s  1.6 TB/s   400 W  HPC-oriented
Intel Xeon Platinum (Ice Lake), 40 cores, 2.3 GHz   2.9 TFlop/s  205 GB/s   270 W  generic CPU
AMD Epyc 7713, 64 cores, 2.0 GHz                    2.0 TFlop/s  205 GB/s   225 W  generic CPU
Fujitsu A64FX, 48 cores, 2.2 GHz                    3.1 TFlop/s  900 GB/s   130 W  special-purpose HPC
Source: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
• The CPU must be good at everything
• The GPU focuses on massively parallel computations with independent work
• Less flexible and more specialized
• Provides lots of resources to keep many threads in flight, e.g. registers to hold
operands
• Multithreading can hide latency → limited-size caches suffice
• Control logic is shared across many threads
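As a sketch of what "many threads in flight" means in practice: a single CUDA kernel launch can create far more threads than there are cores, and the warp schedulers switch between them to hide memory latency. A minimal SAXPY kernel (an illustrative example, not from the slides) looks like this:

```cuda
// SAXPY: y = a*x + y. Each thread handles exactly one element,
// so all threads do independent work and can run in any order.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                       // guard the last, possibly partial, block
        y[i] = a * x[i] + y[i];
}

// Launch example: for n = 1<<20 elements, (n + 255) / 256 blocks of
// 256 threads gives about one million threads -- far more than the
// number of CUDA cores. This oversubscription is what hides latency:
//   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
```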
1. SIMD width is not wide enough to amortize the control logic sufficiently
2. SIMD execution units work with caches that are primarily designed around
latency tradeoffs
3. Traditional CPU programs expose less parallelism, and the out-of-order
mechanism does not reach far enough
4. Memory access latency is covered by prefetching, which is more power-hungry
than the actual accesses
5. SIMD execution units "extend" the latency-optimized hardware → they have lower
latency and thus bigger, more power-hungry transistors
6. Memory outside the caches (RAM) uses the less parallel but lower-latency
DDR RAM architecture rather than GDDR or HBM (high-bandwidth
memory)
7. Higher clock frequencies (NVIDIA A100: ~1.4 GHz, CPUs: >2 GHz all-core)
• Classical CPU combined with a GPU
• The GPU is an accelerator for suitable tasks (parallel, compute-intensive)
• The CPU runs the remaining tasks (bookkeeping, setup, IO, . . . )
• CPU and GPU might also share parallel tasks
• Typical hardware is a hybrid of accelerator and host
• Case in point, the UPPMAX GPU nodes: Intel host CPU and NVIDIA T4 accelerator
• CUDA core or Streaming Processor
• A collection of SMs + memory
• CUDA cores
(INT/FP32/FP64)
• LD/ST (load/store) units
• Special function units
• Tensor Core
• Register file
• Warp scheduler
• Data caches
• Instruction buffers/caches
• Texture units
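Several of these per-SM resources can be inspected at run time with the CUDA runtime API's cudaGetDeviceProperties (a small sketch; the struct fields shown are from the cudaDeviceProp documentation):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);          // properties of device 0
    printf("SMs:               %d\n",  prop.multiProcessorCount);
    printf("Warp size:         %d\n",  prop.warpSize);
    printf("Registers per SM:  %d\n",  prop.regsPerMultiprocessor);
    printf("Shared mem per SM: %zu B\n", prop.sharedMemPerMultiprocessor);
    return 0;
}
```

Running this on one of the UPPMAX T4 nodes is a quick way to relate the component list above to concrete numbers.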
nvidia-ampere-architecture-whitepaper.pdf
• The US supercomputer Frontier (the first to reach 1 Exaflop/s in double precision) uses AMD GPUs;
it is also among the greenest supercomputers (TFlop/s per Watt)
In the CUDA programming model, the host code manages:
• both host and device memory;
• data transfer between host and device;
• starting device kernels.
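These three responsibilities map onto a minimal CUDA host program roughly as follows (a sketch with error checking omitted; the kernel is a made-up example):

```cuda
#include <cuda_runtime.h>

__global__ void scale(int n, float *d) {      // trivial device kernel
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *h = new float[n];                   // host memory
    float *d;
    cudaMalloc(&d, n * sizeof(float));         // device memory

    cudaMemcpy(d, h, n * sizeof(float),        // transfer host -> device
               cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(n, d);     // start a device kernel
    cudaMemcpy(h, d, n * sizeof(float),        // transfer device -> host
               cudaMemcpyDeviceToHost);        // (also synchronizes)

    cudaFree(d);                               // release device memory
    delete[] h;
    return 0;
}
```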