Unit IV - PARALLELISM
CHRIST (Deemed to be University)
Dr. Shamanth N.
What is Parallelism?
● Doing Things Simultaneously
○ Same thing or different things
○ Solving one larger problem
● Serial Computing
○ Problem is broken into stream of instructions that are executed
sequentially one after another on a single processor.
○ One instruction executes at a time.
● Parallel Computing
○ Problem divided into parts that can be solved concurrently.
○ Each part further broken into stream of instructions
○ Instructions from different parts executes simultaneously.
[Figure: Serial computation: a problem is broken into a single stream of instructions (N, N-1, ..., 2, 1) executed one at a time on one CPU. Parallel computing: the problem is divided into parts, each with its own instruction stream, executed simultaneously on multiple CPUs.]
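The serial and parallel models above can be sketched in a few lines. This is a toy illustration, not real hardware: the function names and the use of `ThreadPoolExecutor` are my own choices, and in CPython threads illustrate the decomposition rather than true simultaneous execution.

```python
from concurrent.futures import ThreadPoolExecutor

def serial_sum(data):
    # Serial computing: one instruction stream, one element at a time.
    total = 0
    for x in data:
        total += x
    return total

def parallel_sum(data, parts=4):
    # Parallel computing: divide the problem into parts that can be
    # solved concurrently, then combine the partial results.
    chunk = (len(data) + parts - 1) // parts
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = pool.map(sum, chunks)
    return sum(partials)

data = list(range(1, 101))
assert serial_sum(data) == parallel_sum(data) == 5050
```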
Types of Parallelism
● Bit-level parallelism
● Instruction-level parallelism
● Data parallelism
● Task parallelism
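The last two forms can be contrasted in a short sketch (a toy example of my own; the thread pool stands in for parallel hardware):

```python
from concurrent.futures import ThreadPoolExecutor

numbers = [1, 2, 3, 4]

with ThreadPoolExecutor() as pool:
    # Data parallelism: the SAME operation applied to different data items.
    squares = list(pool.map(lambda x: x * x, numbers))

    # Task parallelism: DIFFERENT operations running concurrently.
    f_min = pool.submit(min, numbers)
    f_max = pool.submit(max, numbers)

assert squares == [1, 4, 9, 16]
assert (f_min.result(), f_max.result()) == (1, 4)
```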
Applications of Parallel Computing
● Scientific Computing
○ Numerically intensive simulations
● Database Operations and Information Systems
○ Web-based services, Web search engines, Online transaction processing
○ Client and inventory database management, Data mining, MIS
○ Geographic information systems, Seismic data processing
● Artificial Intelligence, Machine Learning, Deep Learning
● Real-Time Systems and Control Applications
○ Hardware and robotic control, Speech processing, Pattern recognition
Flynn's Classification
● SISD: Single Instruction, Single Data
● SIMD: Single Instruction, Multiple Data
● MISD: Multiple Instruction, Single Data
● MIMD: Multiple Instruction, Multiple Data
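The SIMD and MIMD execution models can be caricatured in plain Python. This is purely a pedagogical sketch of my own (no real lanes or processors): in SIMD every "lane" runs the same instruction on its own data element; in MIMD each "processor" runs its own instruction stream on its own data.

```python
# SIMD: ONE operation, applied across all data lanes.
def simd(op, lanes):
    return [op(x) for x in lanes]

# MIMD: each (operation, data) pair is an independent instruction stream.
def mimd(programs):
    return [op(x) for op, x in programs]

assert simd(lambda x: x * 2, [1, 2, 3]) == [2, 4, 6]
assert mimd([(abs, -5), (len, "hi"), (sum, [1, 2])]) == [5, 2, 3]
```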
SISD
Vector Processors
● Hardware need only check for data hazards between two vector
instructions once per vector operand
● The cost of the latency to main memory is seen only once for the entire
vector, rather than once for each word of the vector.
● Control hazards that would normally arise from the loop branch are
non-existent.
● The savings in instruction bandwidth and hazard checking plus the
efficient use of memory bandwidth give vector architectures
advantages in power and energy versus scalar architectures.
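The contrast in the bullets above can be sketched with a DAXPY-style loop (the classic vector kernel y = a*x + y). The function names are mine, and Python lists only model the idea: in the scalar version every element costs a fetch, a hazard check, and a loop branch; in the vector version one instruction conceptually covers whole vector operands, so those costs are paid once per vector.

```python
def scalar_daxpy(a, x, y):
    # Scalar loop: one instruction fetch, hazard check, and loop branch
    # PER element, plus memory latency on every access.
    out = []
    for i in range(len(x)):
        out.append(a * x[i] + y[i])
    return out

def vector_daxpy(a, x, y):
    # Vector model: conceptually ONE instruction over whole vector
    # operands. Hazard checks and main-memory latency are seen once per
    # vector rather than once per word, and the loop branch disappears.
    return [a * xi + yi for xi, yi in zip(x, y)]

assert scalar_daxpy(2, [1, 2, 3], [10, 20, 30]) == [12, 24, 36]
assert vector_daxpy(2, [1, 2, 3], [10, 20, 30]) == [12, 24, 36]
```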
Hardware Multithreading
• FINE-GRAINED MULTITHREADING A version of hardware multithreading that
switches between threads after every instruction, interleaving their execution.
Advantages: Instructions from other threads can be executed when one thread stalls. This
interleaving improves throughput.
Disadvantages: It slows down the execution of the individual threads, since a thread that is
ready to execute without stalls will be delayed by instructions from other threads.
Hardware Multithreading (contd..)
• COARSE-GRAINED MULTITHREADING A version of hardware multithreading
that suggests switching between threads only after significant events, such as a cache
miss.
• Cycle i: instruction j from thread A is issued.
• Cycle i + 1: instruction j + 1 from thread A is issued.
• Cycle i + 2: instruction j + 2 from thread A is issued, which is a load instruction that
misses in all caches.
• Cycle i + 3: thread scheduler invoked, switches to thread B.
• Cycle i + 4: instruction k from thread B is issued.
• Cycle i + 5: instruction k + 1 from thread B is issued.
Advantages: relieves the need to have thread switching be extremely fast and is much less likely to
slow down the execution of an individual thread, since instructions from other threads will only
be issued when a thread encounters a costly stall.
Disadvantages: The new thread that begins executing after the stall must fill the pipeline before
instructions will be able to complete. This is called start-up overhead.
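The cycle-by-cycle trace above can be mimicked with a small scheduler model. This is a toy of my own design: it ignores the pipeline start-up overhead (the slide's lost cycle i + 3) and simply switches to the next thread immediately after a stalling instruction.

```python
def coarse_grained_schedule(threads, stall_instrs, steps):
    """Toy coarse-grained multithreading: issue from the current thread
    until it hits a significant event (e.g. a load that misses in all
    caches), then switch to the next thread. Switch latency is ignored."""
    trace = []
    current = 0
    pcs = [0] * len(threads)          # one program counter per thread
    for _ in range(steps):
        instr = f"{threads[current]}{pcs[current]}"
        trace.append(instr)           # issue one instruction this cycle
        pcs[current] += 1
        if instr in stall_instrs:     # costly stall: invoke the scheduler
            current = (current + 1) % len(threads)
    return trace

# Thread A's third instruction (A2) is a load that misses in all caches,
# so the scheduler switches to thread B, mirroring the trace above.
trace = coarse_grained_schedule(["A", "B"], {"A2"}, 6)
assert trace == ["A0", "A1", "A2", "B0", "B1", "B2"]
```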
Hardware Multithreading (contd..)
• SIMULTANEOUS MULTITHREADING (SMT) A version of multithreading that
lowers the cost of multithreading by utilizing the resources needed for a
multiple-issue, dynamically scheduled microarchitecture.
• Cycle i: instructions j and j + 1 from thread A and
instruction k from thread B are simultaneously issued.
• Cycle i + 1: instruction j + 2 from thread A,
instruction k + 1 from thread B, and instruction m from
thread C are all simultaneously issued.
Graphics Processing Unit
CPU vs GPU
CPU
● At the heart of every computer is a central processing unit, or CPU. The
CPU handles the core processing tasks in a computer—the literal
computation that drives every single action in a computer system.
Standard components in CPU
● Core(s): The central architecture of the CPU is the “core,” where all
computation and logic happens. A core typically functions through what is
called the “instruction cycle,” where instructions are pulled from memory
(fetch), decoded into processing language (decode), and executed through
the logical gates of the core (execute). Initially, all CPUs were single-core,
but with the proliferation of multi-core CPUs, we’ve seen an increase in
processing power.
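The fetch-decode-execute cycle described above can be modeled with a minimal interpreter. This is a didactic sketch, not a real ISA: the two-opcode "instruction set" and encoding are invented for illustration.

```python
def run(program, x):
    """Minimal model of a core's instruction cycle: fetch an encoded
    instruction from 'memory', decode it into an operation, execute it."""
    ops = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}
    pc = 0                            # program counter
    while pc < len(program):
        instr = program[pc]           # fetch: pull instruction from memory
        op, operand = instr.split()   # decode: translate into an operation
        x = ops[op](x, int(operand))  # execute: run it through the "ALU"
        pc += 1
    return x

assert run(["ADD 3", "MUL 2"], 1) == 8   # (1 + 3) * 2
```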
CPU vs GPU
Standard components in CPU …
● Cache: Cache is super-fast memory built either within the CPU or in
CPU-specific motherboards to facilitate quick access to data the CPU is
currently using. Since CPUs work so fast to complete millions of
calculations per second, they require ultra-fast (and expensive) memory to
do it—memory that is much faster than hard drive storage or even the
fastest RAM.
● Memory Management Unit (MMU): The MMU controls data movement
between the CPU and RAM during the instruction cycle.
● CPU Clock and Control Unit: Every CPU works on synchronizing
processing tasks through a clock. The CPU clock determines the
frequency at which the CPU can generate electrical pulses, its primary
way of processing and transmitting data, and how rapidly the CPU can
work. So, the higher the CPU clock rate, the faster it will run and quicker
processor-intensive tasks can be completed.
CPU vs GPU
GPU
● Graphics processing is generally considered one of the more complex
processing tasks for the CPU. Solving that complexity has led to
technology with applications far beyond graphics.
● The challenge in processing graphics is that graphics call on complex
mathematics to render, and those complex mathematics must compute in
parallel to work correctly. For example, a graphically intense video game
might contain hundreds or thousands of polygons on the screen at any
given time, each with its individual movement, color, lighting, and so on.
CPUs aren’t made to handle that kind of workload. That’s where graphical
processing units (GPUs) come into play.
CPU vs GPU
GPU
● GPUs are similar in function to CPUs: they contain cores, memory, and
other components. Instead of emphasizing context switching to manage
multiple tasks, GPU acceleration emphasizes parallel data processing
through a large number of cores.
CPU vs GPU
Advantages of a CPU
● Flexibility: CPUs are flexible and resilient and can handle a variety of
tasks outside of graphics processing. Because of their serial processing
capabilities, a CPU can multitask across multiple activities in a
computer. As a result, a strong CPU can provide more speed for typical
computer use than a GPU.
● Contextual Power: In specific situations, the CPU will outperform the
GPU. For example, the CPU is significantly faster when handling several
different types of system operations (random access memory, mid-range
computational operations, managing an operating system, I/O operations).
CPU vs GPU
Advantages of a CPU (contd..)
● Precision: CPUs can work on mid-range mathematical equations with a
higher level of precision. CPUs can handle the computational depth and
complexity more readily, becoming increasingly crucial for specific
applications.
● Access to Memory: CPUs usually contain significant local cache memory,
which means they can handle a larger set of linear instructions and, hence,
more complex system and computational operations.
● Cost and Availability: CPUs are more readily available, more widely
manufactured, and cost-effective for consumer and enterprise use.
Additionally, hardware manufacturers still create thousands of
motherboard designs to house a wide range of CPUs.
CPU vs GPU
Disadvantages of a CPU
● Parallel Processing: CPUs cannot handle parallel processing like a GPU,
so large tasks that require thousands or millions of identical operations
will choke a CPU’s capacity to process data.
● Slow Evolution: As Moore’s Law reaches its limits, the development of
more powerful CPUs is slowing, which means less improvement year after
year. The expansion of multi-core CPUs has mitigated this somewhat.
● Compatibility: Not every system or software is compatible with every
processor. For example, applications written for x86 Intel Processors will
not run on ARM processors. This is less of a problem as more computer
manufacturers use standard processor sets (see Apple’s move to Intel
processors), but it still presents issues between PCs and mobile devices.
CPU vs GPU
Advantages of a GPU
● High Data Throughput: a GPU consists of hundreds of cores performing
the same operation on multiple data items in parallel. Because of that, a
GPU can push vast volumes of processed data through a workload,
speeding up specific tasks beyond what a CPU can handle.
● Massive Parallel Computing: Whereas CPUs excel in more complex
computations, GPUs excel in extensive calculations with numerous
similar operations, such as computing matrices or modeling complex
systems.
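Matrix computation is a good illustration of why GPUs excel here: every output element is an independent, identical calculation. The sketch below is my own toy model in plain Python (real GPU code would use CUDA or a library); `one_thread` stands in for the work a single GPU thread would do, and all such threads could run in parallel.

```python
def matmul_gpu_style(A, B):
    """Toy model of the GPU execution style for C = A x B: each output
    element C[i][j] is an independent, identical dot product, so on real
    hardware one thread per element could compute them all in parallel."""
    n, k, m = len(A), len(B), len(B[0])

    def one_thread(i, j):
        # The work a single GPU thread would do for its output element.
        return sum(A[i][t] * B[t][j] for t in range(k))

    return [[one_thread(i, j) for j in range(m)] for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert matmul_gpu_style(A, B) == [[19, 22], [43, 50]]
```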
CPU vs GPU
Disadvantages of a GPU
● Multitasking: GPUs aren’t built for multitasking, so they don’t have much
impact in areas like general-purpose computing.
● Cost: While the price of GPUs has fallen somewhat over the years, they
are still significantly more expensive than CPUs. This cost rises more
when talking about a GPU built for specific tasks like mining or analytics.
● Power and Complexity: While GPUs can handle large amounts of parallel
computing and data throughput, they struggle when the processing
requirements become more chaotic. Branching logic paths, sequential
operations, and other approaches to computing impede the effectiveness
of a GPU.
Questions
● SIMD is at its weakest in case or switch statements, where each execution unit
must perform a different operation on its data, depending on what data it has.
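This weakness, often called branch divergence, can be sketched as follows. The example is my own toy model: when lanes need different operations depending on their data, a SIMD machine typically executes every branch for all lanes and masks off the unwanted results, so every lane effectively pays for both paths.

```python
def simd_with_branch(lanes):
    """Toy model of SIMD branch divergence: a data-dependent 'switch'
    (even vs. odd) forces BOTH paths to run across ALL lanes, with a
    per-lane predicate selecting which result to keep."""
    halved  = [x // 2 for x in lanes]      # pass 1: even path, all lanes
    tripled = [3 * x + 1 for x in lanes]   # pass 2: odd path, all lanes
    # merge: each lane keeps the result its own predicate selects
    return [h if x % 2 == 0 else t
            for x, h, t in zip(lanes, halved, tripled)]

assert simd_with_branch([4, 5, 6, 7]) == [2, 16, 3, 22]
```

Two full passes over the data for one conditional is exactly the cost the statement above describes; a scalar CPU would execute only the branch each element actually takes.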