Presentation 3
Computing
(DJ19DSC802)
Basics of Parallelization:
• Data parallelism
• Functional parallelism
• Parallel scalability
• Factors that limit parallel execution
• Scalability metrics
• Refined performance model
• Load imbalance
Technical Challenges:
• Quantum Tunneling:
A transistor smaller than about 5 nm will not be able to stop the flow of electrons,
because electrons tunnel through its depletion region. The electrons do not
perceive the depletion region and ‘tunnel’ through it as if it did not exist. A
transistor that cannot stop the flow of electrons is of little use as a switch.
• Size of Atom: we are now slowly approaching the size of an atom itself and you
cannot build a transistor smaller than an atom! The Silicon atom has a diameter of
around 1 nm and right now we are manufacturing transistors with gates at about 10
times that size. In a few years, not taking into account quantum effects, we will not be
able to go any smaller considering that we are reaching the physical limit of how small
something can be.
• Heating and Current Effects: As we go smaller, transistors tend to get more “leaky”,
meaning that even in their OFF state, they let some current pass through. This is called
the leakage current.
• https://medium.com/@csoham358/beginners-guide-to-moore-s-law-3e00dd8b5057
Dennard Scaling
• Power = alpha * C * F * V^2
• alpha = fraction of time the transistor is switching
• C = capacitance
• F = frequency
• V = voltage
• Capacitance is related to area
• So, as transistors shrank and the voltage was reduced,
circuits could operate at higher frequencies at the same power
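The scaling argument can be checked numerically. The following is a minimal sketch, assuming the classic Dennard rules for a linear shrink factor k (capacitance and voltage scale by 1/k, frequency by k, area by 1/k^2). These rules and the function names dynamic_power and dennard_density_ratio are assumptions for illustration, not taken from the slides.

```c
/* Dynamic power from the slide: Power = alpha * C * F * V^2. */
double dynamic_power(double alpha, double c, double f, double v) {
    return alpha * c * f * v * v;
}

/* Ratio of power density (power per unit area) after a shrink by factor k
   to the density before it, under the assumed ideal scaling rules:
   C -> C/k, V -> V/k, F -> F*k, area -> area/k^2. */
double dennard_density_ratio(double alpha, double c, double f, double v, double k) {
    double before = dynamic_power(alpha, c, f, v);             /* area taken as 1 */
    double after = dynamic_power(alpha, c / k, f * k, v / k);  /* shrunk transistor */
    double area_after = 1.0 / (k * k);
    return (after / area_after) / before;
}
```

For example, dennard_density_ratio(0.5, 1.0, 1.0, 1.0, 2.0) returns 1.0: the frequency has doubled, yet the power per unit area is unchanged.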
End of Dennard Scaling
• Dennard scaling ignored the “leakage current” and “threshold
voltage”, which establish a baseline of power per transistor.
• As transistors get smaller, power density increases because leakage
current and threshold voltage do not scale down with size.
• These effects created a “Power Wall” that has limited practical processor
frequency to around 4 GHz since 2006.
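One way to see why power density rises is sketched below. It is a simplified illustration, assuming capacitance, frequency and area still follow the ideal rules for a shrink factor k while the supply voltage stays fixed because the threshold voltage cannot be lowered further. It ignores the leakage power itself, and the function name is hypothetical.

```c
/* Power density ratio after shrinking by k when voltage no longer scales.
   Assumed rules: C -> C/k, F -> F*k, area -> area/k^2, V unchanged.
   Dynamic power is alpha * C * F * V^2, so alpha and V^2 cancel in the ratio. */
double density_ratio_fixed_voltage(double k) {
    double power_ratio = (1.0 / k) * k;  /* C/k times F*k: power per transistor unchanged */
    double area_ratio = 1.0 / (k * k);   /* each transistor occupies 1/k^2 of the area */
    return power_ratio / area_ratio;     /* power per unit area grows as k^2 */
}
```

Under these assumptions a shrink by k = 2 gives density_ratio_fixed_voltage(2.0) = 4.0, so the same chip area must dissipate four times the power unless the frequency is held back.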
Memory Latency and Bandwidth
Latency is the delay between the CPU requesting data and that data
actually being available for use. It is typically measured in
nanoseconds (ns).
Bandwidth is the rate at which data can be transferred between
memory and the CPU, usually measured in megabytes per second (MB/s)
or gigabytes per second (GB/s).
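The two quantities combine in a simple cost estimate. The sketch below assumes a basic model in which a transfer costs one latency plus the data size divided by the bandwidth. The model and the function name transfer_time_ns are illustrative assumptions, not part of the slides.

```c
/* Estimated time in nanoseconds to fetch `bytes` from memory:
   a fixed latency plus the time to stream the data at the given bandwidth.
   1 GB/s is taken as 1e9 bytes/s, which equals 1 byte per nanosecond. */
double transfer_time_ns(double latency_ns, double bandwidth_gb_per_s, double bytes) {
    return latency_ns + bytes / bandwidth_gb_per_s;
}
```

With a 100 ns latency and 10 GB/s bandwidth, a 64-byte transfer takes about 106.4 ns, so latency dominates. A transfer of 1,000,000 bytes takes about 100,100 ns, so bandwidth dominates.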
After each core completes execution of this code, its variable my_sum will store the sum
of the values computed by its calls to Compute_next_value. For example, suppose there are
eight cores, n = 24, and the 24 calls to Compute_next_value leave the cores with
my_sum values 8, 19, 7, 15, 7, 13, 12 and 14.
If core 0 is the master core, it adds the eight values:
sum = 8 + 19 + 7 + 15 + 7 + 13 + 12 + 14 = 95
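The master-core summation can be sketched in C as follows. This is a serial stand-in: the array my_sum[] plays the role of the partial sums that the other cores would send to the master, and the function name master_global_sum is hypothetical.

```c
/* Master-core global sum: the master adds every core's partial sum in turn.
   Requires num_cores >= 1. */
int master_global_sum(const int my_sum[], int num_cores) {
    int sum = my_sum[0];              /* the master starts with its own value */
    for (int core = 1; core < num_cores; core++) {
        sum += my_sum[core];          /* "receive" one partial sum and add it */
    }
    return sum;
}
```

With the eight partial sums 8, 19, 7, 15, 7, 13, 12, 14 the function returns 95. The master performs 7 additions one after another, so the work on the master grows linearly with the number of cores.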
Parallel Programming Platforms
- Ex: Instruction level parallelism (when number of cores is
large)
If n is a power of 2, these operations can be performed in log2(n) steps.
Ts = Θ(n), Tp = Θ(log(n))
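The log2(n) step count can be illustrated with a tree-structured sum. The sketch below runs serially and only counts the steps. On a real parallel machine the additions inside one step are independent and could run at the same time. The function name tree_global_sum is hypothetical.

```c
/* Tree-structured global sum. In each step, elements that are `stride`
   apart are combined pairwise; the pairs within one step are independent.
   Overwrites vals[]; requires n >= 1. Returns the total and stores the
   number of steps in *steps. */
int tree_global_sum(int vals[], int n, int *steps) {
    *steps = 0;
    for (int stride = 1; stride < n; stride *= 2) {
        for (int i = 0; i + stride < n; i += 2 * stride) {
            vals[i] += vals[i + stride];   /* one pairwise addition */
        }
        (*steps)++;                        /* one parallel step completed */
    }
    return vals[0];
}
```

For the eight values 8, 19, 7, 15, 7, 13, 12, 14 this returns 95 after 3 steps, compared with 7 sequential additions when a single core adds everything.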
• In a distributed memory system, each core has its own, private memory,
and the cores must communicate explicitly by doing something like sending
messages across a network.
Parallel Programming Platforms
There are two main types of parallel systems:
Shared memory systems and distributed-memory systems.
• MPI is a library of type definitions, functions, and macros that can
be used in C programs.
(a) Shared Memory System (b) Distributed Memory System (c) GPU Architecture
Limitation of Memory System Performance
• Performance of a program relies on
• the speed of processor and
• the speed of the memory system (feed data to the
processor)
• Every time a memory request is made, the processor must wait 100 cycles
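The effect of the 100-cycle wait on achievable performance can be estimated with a small sketch. It assumes no caches and no overlap between computation and memory access, and the clock rate passed in is a hypothetical example value. The function name is also hypothetical.

```c
/* Peak rate of operations when every group of ops_per_fetch operations
   must first wait stall_cycles for one memory request
   (assumes no caches and no overlap of compute with memory access). */
double memory_bound_ops_per_sec(double clock_hz, double stall_cycles, double ops_per_fetch) {
    return ops_per_fetch * clock_hz / stall_cycles;
}
```

For an assumed 1 GHz processor that performs one operation per fetched operand, memory_bound_ops_per_sec(1e9, 100.0, 1.0) gives 1e7 operations per second, a small fraction of what the processor could otherwise sustain.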
Basic concepts
Ts = Θ(n), Tp = Θ(log(n))
F = fraction of the calculation that is serial
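These quantities can be tied together in a short sketch. It assumes the serial fraction F is meant in the sense of Amdahl's law, which the slide does not name explicitly, and the function names are hypothetical.

```c
/* Speedup S = Ts / Tp, the ratio of serial to parallel run time. */
double speedup(double t_serial, double t_parallel) {
    return t_serial / t_parallel;
}

/* Amdahl's law (assumed model): speedup on p cores when a fraction F
   of the calculation is serial and the rest parallelizes perfectly. */
double amdahl_speedup(double serial_fraction, int p) {
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p);
}
```

With F = 0 the speedup equals the number of cores, and with F = 1 it stays at 1 regardless of p. For any F > 0 the speedup is bounded above by 1/F, so with F = 0.1 it cannot exceed 10 however many cores are added.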