
14 Parallelism and Data Partitioning and Repartitioning Explanation

The document provides an introduction to Ab Initio, focusing on concepts of parallelism including component, pipeline, and data parallelism. It explains how each type of parallelism operates and the importance of data partitioning in processing tasks. Additionally, it discusses the methods and components used to implement these parallelism strategies effectively.

CapGemini

Ab Initio Session 14
Introduction to Ab Initio

Ab Initio Training 1


➢Concepts of Parallelism
➢Explanation of Data partitioning
➢Concept of Repartitioning


Forms of Parallelism

➢Component parallelism
➢Pipeline parallelism
➢Data parallelism


Component parallelism

➢A graph with multiple processes running simultaneously on
separate data uses component parallelism.
➢Here, two or more components process records in parallel,
each working on its own data.
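
The idea can be illustrated outside Ab Initio with a short Python sketch. This is only an analogy, not Ab Initio code: the two sort "components", the sample records, and the function names are assumptions made for illustration. Each sort works on its own dataset, so neither has to wait for the other.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data; only the dataset names are taken from the slides.
customers = [{"id": 3, "name": "Cy"}, {"id": 1, "name": "Al"}, {"id": 2, "name": "Bo"}]
transactions = [{"txn": 12, "amount": 5.0}, {"txn": 10, "amount": 9.5}, {"txn": 11, "amount": 1.25}]


def sort_by(records, key):
    """One 'component': sorts its own input, independent of any other component."""
    return sorted(records, key=lambda rec: rec[key])


def run_graph():
    # Two components on separate data, submitted so they can run at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        cust_future = pool.submit(sort_by, customers, "id")
        txn_future = pool.submit(sort_by, transactions, "txn")
        return cust_future.result(), txn_future.result()


sorted_customers, sorted_transactions = run_graph()
```

Because the two sorts share no data, the order in which they finish does not affect either result.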


Component Parallelism

[Figure: two components running at the same time on separate data, one sorting Customers and the other sorting Transactions.]


Pipeline Parallelism

➢A graph with multiple components running simultaneously on
the same data uses pipeline parallelism. Each component in the
pipeline continuously reads from upstream components,
processes data, and writes to downstream components. Because a
downstream component can process records already written
by an upstream component, both components can operate in
parallel.
NOTE: To limit the number of components running
simultaneously, set phases in the graph.
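
A minimal Python sketch of the same idea, assuming two hypothetical components connected by a small bounded queue that stands in for the flow between them. It is an analogy only and does not use any Ab Initio API; the transformations (doubling, then adding one) are arbitrary placeholders.

```python
import queue
import threading

DONE = object()  # sentinel that marks the end of the record stream


def upstream(records, out_q):
    """Upstream component: writes each record downstream as soon as it is processed."""
    for rec in records:
        out_q.put(rec * 2)
    out_q.put(DONE)


def downstream(in_q, results):
    """Downstream component: starts work on a record while upstream is still running."""
    while True:
        rec = in_q.get()
        if rec is DONE:
            break
        results.append(rec + 1)


def run_pipeline(records):
    flow = queue.Queue(maxsize=2)  # small buffer between the two components
    results = []
    threads = [
        threading.Thread(target=upstream, args=(records, flow)),
        threading.Thread(target=downstream, args=(flow, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


output = run_pipeline(range(1, 6))
```

The bounded queue means the upstream thread can be at most a couple of records ahead, so both components are active over the same stream rather than running one after the other.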




Pipeline Parallelism

[Figure: two components in a pipeline working at the same time, one processing record 100 while the other processes record 99.]


Pipeline Parallelism-cont.

➢Records are processed in a pipeline: a component does not have
to wait until all records have been processed upstream. Each
record is passed to the next component in the pipeline as soon
as it has been processed.


Data Parallelism

➢A graph that deals with data divided into segments and operates
on each segment simultaneously uses data parallelism. Nearly
all commercial data processing tasks can use data parallelism.
To support this form of parallelism, Ab Initio provides Partition
components to segment data, and Departition components to
merge segmented data back together.
➢Partitioning the data is what makes data parallelism possible.
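
The partition, process, departition cycle can be sketched in Python as follows. This is an illustration under stated assumptions, not Ab Initio code: the function names are hypothetical, the partitioning step simply deals records out in turn, and squaring stands in for whatever work each segment needs.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, degree):
    """Segment the data: deal records out in turn to `degree` partitions."""
    return [records[i::degree] for i in range(degree)]


def process_partition(part):
    """The same operation is applied to every segment (placeholder work: square)."""
    return [rec * rec for rec in part]


def departition(parts):
    """Merge the segmented data back into a single list, partition by partition."""
    return [rec for part in parts for rec in part]


def run_data_parallel(records, degree=3):
    parts = partition(records, degree)
    # One worker per partition, so all segments are processed simultaneously.
    with ThreadPoolExecutor(max_workers=degree) as pool:
        processed = list(pool.map(process_partition, parts))
    return departition(processed)


result = run_data_parallel(list(range(1, 8)), degree=3)
```

Note that the merged output follows partition order rather than the original record order, which is one reason the choice of departitioning step matters in practice.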



Two Ways of Looking at Data Parallelism

Expanded View: [figure]

Global View: [figure]


Data Parallelism

➢Scales with data.
➢Requires data partitioning.
➢Different partitioning methods are available; which one to use
depends on the application.
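
The slides do not list the partitioning methods, so the Python sketch below contrasts two common ones chosen as assumptions for illustration: dealing records out in turn (balanced partitions) and partitioning by a key (all records with the same key land in the same partition). The field names and sample records are hypothetical.

```python
import zlib


def partition_round_robin(records, n):
    """Deal records out in turn; partitions end up nearly equal in size."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def partition_by_key(records, key, n):
    """Send every record with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for rec in records:
        # crc32 is stable across runs, unlike the built-in hash() on strings.
        idx = zlib.crc32(str(rec[key]).encode("utf-8")) % n
        parts[idx].append(rec)
    return parts


def partitions_holding(parts, key, value):
    """Indexes of the partitions that contain at least one record with this key value."""
    return [i for i, part in enumerate(parts) if any(rec[key] == value for rec in part)]


pairs = [("A", 1), ("B", 2), ("A", 3), ("C", 4), ("B", 5), ("A", 6)]
records = [{"cust": cust, "amount": amount} for cust, amount in pairs]

by_turn = partition_round_robin(records, 2)
by_key = partition_by_key(records, "cust", 2)
```

Dealing records out in turn suits work that treats each record independently, while key-based partitioning suits work that must see all records for a key together, such as per-customer totals.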


A Data Parallel Application: The Expanded View


A Data Parallel Application: The Global View

[Figure labels: Degree of Parallelism (Abstract); Fan-out Flow; Multifile]


Thank You
