
Lecture 01

Naveen Mathew Nathan S.


8/27/2019

Introduction
Book:
Computational science:
• simulations: obtaining numerical solutions for experimental settings that arise from theory
• data science: arises from observations
Numerical methods: e.g., converting differential equations into algebraic equations.
Example: given observed pressure, temperature, and velocity, use the governing differential equations and solve numerically.
This did not work because the time step was too large.
Equations: ∂e/∂t + v · ∇e = −(P/ρ) ∇ · v + h.o.t., ∂v/∂t = . . . + h.o.t. (h.o.t. = higher-order terms)
Memory hierarchy: CPU processor registers → CPU cache (levels 1, 2, and 3) → physical memory (RAM) → solid-state memory (non-volatile, flash-based) → virtual memory (file-based storage)
Writing code to maximize the use of cache can speed up computation.
Pipelining:
CPUs don’t execute just one instruction at a time: while one instruction is partway through, the next one starts, so several are in flight in overlapping stages (process some of a task, start another, process, start, . . .).
The compiler (by ordering instructions) and the hardware handle pipelining.
Vectorization:
Adding two double vectors, c = a + b with len(a) = 8: done naively, this is 8 separate 64-bit load-add-store operations.
With vectorization, the additions are performed in chunks: all 8 elements are added in one wide SIMD operation.
Heat dissipation becomes a problem as the FLOP rate of a processor increases. Alternative: more threads
of execution (greater concurrency). Also: hybrid parallelism.
Multiplying 2 floats ~ 20 pJ; reading an operand from on-chip memory at the far end of the chip ~ 1 nJ; reading an operand
from off-chip RAM ~ 16 nJ.
In the future, FLOPs will be essentially free, but storage and communication will be expensive.

Parallel computing models

Shared memory (Eg: OpenMP)

• Requires special hardware; each process can see all memory; parallelism implemented via compiler
directives

!$OMP PARALLEL SHARED(A, B) PRIVATE(i)
!$OMP DO SCHEDULE(STATIC)
do i = 1, 199
  A(i) = B(i) + . . .
enddo
!$OMP END DO
!$OMP END PARALLEL

Message passing (Eg: MPI)

• Each process has private memory, parallelism implemented via explicit transfers, can work on any
networked CPUs
call mpi_init(err)
call mpi_comm_rank(MPI_COMM_WORLD, me, err)
call mpi_sendrecv(parameters)
do i = 1, 199
  A(i) = B(i) + . . .
enddo
call mpi_finalize(err)

Concepts

• Speedup: S_N = t_1 / t_N, where t_i is the time on i processors


• Parallel efficiency: speedup/N (efficiency = 1 => perfect parallel performance)
• Scaling: strong-scaling: speedup ∝ N for fixed workload, weak-scaling: speedup constant for workload
∝ N (increasing workload and number of processors)
• Amdahl’s law: S_N = N / (B·N + (1 − B)), where B is the fraction of the algorithm that is serial and N is the number of processors

Important notes:
Where a GPU may not be best: when the pattern of computation differs from one data unit to the next. GPUs are
better for structured data.
A CPU can handle a varied mix of tasks: a previous gate decides whether a gate receives a signal (branching).
A GPU doesn’t allow pipelining, but allows a large number of processing units to be embedded.
