Intel OpenMP Webinar

Vasanth Tovinkere | Architect, Flow Graph Analyzer

Intel® Corporation

*Other names and brands may be claimed as the property of others

What will be covered today
Task-based parallelism and task graphs
• Challenges

Overview of Intel® Advisor - Flow Graph Analyzer (FGA)

Walking through a sample
Summary

Optimization Notice
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Task-based parallelism
Advantages of task-based parallelism
• Makes parallelization efficient for irregular and runtime-dependent execution
• Promotes higher-level thinking
• Improves load balancing
Tasks with dependencies
• Fall into two categories: explicit and implicit
• Extend the expressiveness of task-based parallel programming
• Reduce the need for global synchronization mechanisms such as task barriers
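A minimal sketch of the last point, assuming a toy three-stage computation (the variables `a`, `b`, `c` and their values are illustrative, not from the webinar): each OpenMP* task waits only on the variable it reads, so no barrier is needed between stages. Compiled without OpenMP support, the pragmas are ignored and the code runs serially with the same result.

```cpp
// Three-stage chain expressed with OpenMP task dependences.
// Stage names and values are illustrative only.
int run_chain() {
    int a = 0, b = 0, c = 0;  // declared outside the region, so shared by all tasks
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: a)
        a = 1;                // stage 1 produces a
        #pragma omp task depend(in: a) depend(out: b)
        b = a + 1;            // stage 2 starts once a has been written
        #pragma omp task depend(in: b) depend(out: c)
        c = b * 10;           // stage 3 starts once b has been written
        #pragma omp taskwait  // one wait for the whole chain, no per-stage barrier
    }
    return c;
}
```

`run_chain()` returns 20 with or without OpenMP enabled.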

Applications often contain multiple levels of parallelism
Figure: task parallelism / message passing at the top level (visible in FGA); fork-join regions below it (visible in FGA); SIMD vectorization inside each fork-join region.

Asynchronous task graphs (implicit vs. explicit)
Figure: both versions build the same two-node graph, in which the “Hello” task runs before the “World!” task.

OpenMP* (implicit dependency, derived from the depend clause; in this case the variable ‘s’):

    #pragma omp parallel
    {
        #pragma omp single
        {
            std::string s;
            #pragma omp task depend(out: s)
            {
                s = "Hello ";
                cout << s;
            }
            // out after out: this task waits for the previous writer of s
            #pragma omp task depend(out: s)
            {
                s = "World!\n";
                cout << s;
            }
        }
    }

Threading Building Blocks (TBB) (explicit dependency, expressed through the make_edge() call):

    // assumes #include <tbb/flow_graph.h>, using namespace tbb::flow; and using namespace std;
    graph g;
    continue_node<continue_msg> h( g,
        []( const continue_msg & ) {
            cout << "Hello ";
        } );
    continue_node<continue_msg> w( g,
        []( const continue_msg & ) {
            cout << "World!\n";
        } );
    make_edge(h, w);            // explicit edge: h must finish before w runs
    h.try_put(continue_msg());  // start the graph
    g.wait_for_all();           // wait until all tasks complete
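The implicit dependencies derived from depend clauses are not limited to a two-node chain. A minimal sketch, with hypothetical variables, of a diamond-shaped graph in which two tasks read the same input and a final task joins their results:

```cpp
// Diamond-shaped graph built purely from depend clauses (illustrative):
//   a -> b, a -> c, (b, c) -> d
int run_diamond() {
    int a = 0, b = 0, c = 0, d = 0;  // shared by all tasks created below
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: a)
        a = 10;
        // Both tasks only read a, so the runtime may run them concurrently.
        #pragma omp task depend(in: a) depend(out: b)
        b = a + 1;
        #pragma omp task depend(in: a) depend(out: c)
        c = a * 2;
        // Runs only after both b and c have been written.
        #pragma omp task depend(in: b, c) depend(out: d)
        d = b + c;
        #pragma omp taskwait
    }
    return d;
}
```

`run_diamond()` returns 31; the two middle tasks have no dependence on each other, so an OpenMP runtime is free to execute them at the same time.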

Challenges with asynchronous task graphs

Creating implicit or explicit task graphs programmatically is easy

• Determining what was created is hard in many cases
New programming paradigm
Allows you to stream data through the graph, which makes debugging
challenging
Graph algorithms can be latency-bound or throughput-bound
Parallelism is unstructured in certain types of graphs, so performance analysis
can be challenging

Intel® Advisor – Flow Graph Analyzer
• Toolbar: supports basic file and editing operations, visualization, and analytics that operate on the graph or performance traces
• Palette: the supported TBB node types, organized in like groups
• Canvas: visualizes the graphs
• Hierarchical view: the graph displayed as a tree
• Treemap: the general health of the graph. The area of each square represents the CPU time taken by a node as a percentage of the application run, and the color indicates the concurrency observed when that node was active
• A separate view displays the execution trace data, graph statistics, and output generated by custom analytics, and allows interactions with this data

Workflows and UI features

Workflows: Create, Debug, Visualize and Analyze
Design mode
• Allows you to create a graph topology interactively
• Validate the graph and explore what-if scenarios
• Add C/C++ code to the node body
• Export C++ code using Threading Building Blocks (TBB) flow graph API
Analysis mode
• Compile your application (with tracing enabled)
• Capture execution traces during the application run
• Visualize/analyze in Flow Graph Analyzer
Creating Asynchronous Task-graphs

Intel® Advisor – Flow Graph Analyzer (Design mode)
• Graph creation
• Drag-and-drop support
• Interactive canvas
• Analytics and modeling
• Validation
• Code generation
Callout on the canvas: “Let’s make this our ‘hello’ node”

Intel® Advisor – Flow Graph Analyzer (Design mode)
Serialization
• GraphML* file format (uses extensions)
• C/C++ code generated from the graph

Challenges with asynchronous task graphs

✓ New programming paradigm

Intel® Advisor – Flow Graph Analyzer (Design mode)
Compiling and collecting traces

The PATH must be updated so that fgtrun.bat and fgt2xml.exe can be run from the command line.
>cl hello_world.cpp /O2 /DTBB_USE_THREADING_TOOLS ... /link tbb.lib /OUT:hello_world.exe

>set FGT_ROOT=<installation-directory>\fga\fgt

>set INTEL_LIBITTNOTIFY64=<installation-directory>\fga\fgt\windows\bin\intel64\<vc-version>\fgt.dll

>hello_world.exe

Traces are saved to a unique directory _fgt_<date>_<time>

>fgt2xml.exe <name-for-the-trace-data-file>

fgt2xml.exe automatically converts the latest timestamped trace directory.

Understanding Graph Execution

Examining the trace data: what’s possible?
• The “hello” node appears in all views, each representing different information.
• Shows trace information for the case when 1 message is sent to the “hello” node.
• How did we get the node names to be the same as those in the C++ code?

Examining the trace data: correlation
• The “hello” node appears in all views, each representing different information.
• Shows trace information for the case when 25 messages are sent to the “hello” node.

Interacting with the canvas
• Clicking on a node on the canvas can highlight the corresponding node’s tasks in the timeline. This is turned OFF by default.

Interacting with the timeline
• Clicking on a task in the timeline will highlight the corresponding node in the canvas.

• Clicking on a section with low concurrency will highlight the nodes that are active at that time.
• These nodes would be the starting point of a cause-and-effect analysis to see if they were responsible for the lower concurrency.
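The low-concurrency interaction can be approximated with a few lines of code. A minimal sketch, assuming a hypothetical trace record of (node name, start, end) per task; this is not FGA’s trace format or API:

```cpp
#include <string>
#include <vector>

// Hypothetical trace record: one executed task instance of a graph node.
struct TaskRecord {
    std::string node;  // node name, e.g. "hello"
    double start;      // start time (arbitrary units)
    double end;        // end time (exclusive)
};

// Number of tasks running at time t, i.e. the concurrency at that instant.
int concurrency_at(const std::vector<TaskRecord>& trace, double t) {
    int active = 0;
    for (const auto& r : trace)
        if (r.start <= t && t < r.end) ++active;
    return active;
}

// Distinct names of the nodes with at least one task running at time t:
// the candidates for a cause-and-effect analysis of low concurrency.
std::vector<std::string> nodes_active_at(const std::vector<TaskRecord>& trace, double t) {
    std::vector<std::string> names;
    for (const auto& r : trace) {
        if (r.start <= t && t < r.end) {
            bool seen = false;
            for (const auto& n : names)
                if (n == r.node) seen = true;
            if (!seen) names.push_back(r.node);
        }
    }
    return names;
}
```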

Examining the trace data through Trace Playback
• Plays back execution traces to show how data is flowing through the graph
• Shows which sections of the graph result in good or poor scaling

Examining the trace data: node view
• Node view captures all execution traces for a given node and presents them in a single swim lane for the node
• Each node swim lane is composed of multiple swim lanes representing the threads that executed an instance of the node
• Provides a compact representation of a node’s execution

Challenges with asynchronous task graphs

✓ New programming paradigm
✓ Allows you to stream data through the graph, which makes debugging challenging

Examining the trace data with data analysis
• How do we know which instance of the Hello task is in response to which input message?
• Data analysis helps answer the following questions:
  • Are the tasks operating on data retiring in order?
  • Are they out of order?
• We need to track the data flowing through the graph
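Once each message carries an ID, the in-order question reduces to inspecting the sequence of IDs in retirement order. A minimal sketch, assuming IDs are assigned in submission order (0, 1, 2, ...) and that the retirement order has already been extracted from the trace:

```cpp
#include <cstddef>
#include <vector>

// True if the data IDs retired (completed) in the order they were submitted.
bool retired_in_order(const std::vector<int>& retire_order) {
    for (std::size_t i = 1; i < retire_order.size(); ++i)
        if (retire_order[i] < retire_order[i - 1]) return false;
    return true;
}

// Number of pairs (i, j), i < j, where a later-submitted item retired first.
// The O(n^2) scan is fine for the handful of messages in this example.
std::size_t inversions(const std::vector<int>& retire_order) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < retire_order.size(); ++i)
        for (std::size_t j = i + 1; j < retire_order.size(); ++j)
            if (retire_order[i] > retire_order[j]) ++count;
    return count;
}
```

A count of zero means the data retired in order; a larger count gives a rough measure of how far out of order it retired.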

Examining the trace data with data analysis, cont.
It is harder to track the data in dependency graphs because the data ID cannot be propagated from one node to the next:
• continue_node requires an input of type continue_msg

    continue_node<continue_msg> hello( hello_world_g0, []( const continue_msg & ) {
        cout << "Hello ";
    } );

    continue_node<continue_msg> world( hello_world_g0, []( const continue_msg & ) {
        cout << "World!\n";
    } );

We are going to convert the Hello World example to use function_node instead, so we can send the ID from one node to the next.

Examining the trace data with data analysis, cont.
Data tracking, an experimental feature, allows you to track which task instance corresponds to which inputs.
1. We changed our graph to use a function_node instead of a continue_node.
2. We have a source_node that streams 25 messages (data items) through the graph.
3. We modified the graph to emit the data ID from the source node to hello, and from hello to world.
4. We added a user event API call to tell the tool which data each node is processing.

This gives you insight into scheduler behavior.
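The same ID-propagation idea can be sketched with the C++ standard library instead of TBB. This is an illustrative analogue, not the webinar’s function_node code: `run_id_pipeline`, `hello` and `world` are hypothetical, each message is handled by its own `std::async` task, and the log entry in `world` stands in for the user event that tells the tool which data is being processed.

```cpp
#include <future>
#include <mutex>
#include <vector>

// Standard-library analogue (not TBB) of the modified graph: the "source"
// emits IDs 0..n-1, "hello" receives and forwards each ID, and "world"
// records the ID it processed.
std::vector<int> run_id_pipeline(int n) {
    std::mutex m;
    std::vector<int> world_log;  // order in which "world" handled each ID

    auto hello = [](int id) { /* real work would go here */ return id; };
    auto world = [&](int id) {
        std::lock_guard<std::mutex> lock(m);
        world_log.push_back(id);  // stands in for "now processing data id"
    };

    std::vector<std::future<void>> inflight;
    for (int id = 0; id < n; ++id)  // the "source" streams n messages
        inflight.push_back(std::async(std::launch::async,
                                      [=] { world(hello(id)); }));
    for (auto& f : inflight) f.get();  // comparable to g.wait_for_all()
    return world_log;
}
```

Every ID from 0 to n-1 appears exactly once in the returned log, but their order depends on thread scheduling, which is the kind of behavior the data-analysis view exposes.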

Examining the trace data with data analysis, cont.
Data tracking, an experimental feature, allows you to track which task instance corresponds to which inputs.
• Statistics for the graph are organized by the data operated on and can be seen in the Data Analysis tab under Statistics
• Using data analysis, the questions posed earlier can be answered
• You can examine the trace data to see if the data is retiring in order or out of order
• If the algorithm is meant to be latency-bound, then order is important. If it is throughput-bound, data can retire out of order
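Whether a graph is treated as latency-bound or throughput-bound changes which number matters. A minimal sketch of the two metrics, assuming hypothetical per-message timestamps (when a message entered the graph and when its last task retired):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-message record (not an FGA export format).
struct MessageTiming {
    double enter;   // time the message entered the graph
    double retire;  // time its last task completed
};

// Latency view: mean time from entering the graph to retiring.
double average_latency(const std::vector<MessageTiming>& msgs) {
    double total = 0.0;
    for (const auto& m : msgs) total += m.retire - m.enter;
    return msgs.empty() ? 0.0 : total / msgs.size();
}

// Throughput view: messages retired per unit time over the whole run.
double throughput(const std::vector<MessageTiming>& msgs) {
    if (msgs.empty()) return 0.0;
    double first = msgs.front().enter, last = msgs.front().retire;
    for (const auto& m : msgs) {
        first = std::min(first, m.enter);
        last = std::max(last, m.retire);
    }
    double span = last - first;
    return span > 0.0 ? msgs.size() / span : 0.0;
}
```

A throughput-bound graph can accept a worse average latency, and out-of-order retirement, as long as the overall rate of retired messages goes up.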

Challenges with asynchronous task graphs

✓ New programming paradigm
✓ Allows you to stream data through the graph, which makes debugging challenging
✓ Graph algorithms can be latency-bound or throughput-bound

Understanding the performance

A simulation example

Goes through multiple time steps

Graph is created once programmatically and executed for each time step
• A message is sent to the graph to trigger each time step
• Wait for the graph to process the message (current time step) before the
next time step is triggered
• Implemented as a dependency graph using TBB continue_node
Measured performance shows some scaling relative to the serial implementation
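The “build once, trigger every time step” structure can be sketched with a small serial dependency graph in which a node runs once all of its predecessors have completed, loosely mirroring continue_node semantics. This is an illustrative stand-in, not TBB and not the simulation from the webinar; in TBB each step would be a try_put() followed by wait_for_all(). `DepGraph` and its members are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Serial dependency graph: a node fires when all predecessors have finished.
struct DepGraph {
    std::vector<std::function<void()>> body;
    std::vector<std::vector<std::size_t>> succ;  // successor lists
    std::vector<int> num_preds;                  // predecessor counts

    std::size_t add_node(std::function<void()> f) {
        body.push_back(std::move(f));
        succ.emplace_back();
        num_preds.push_back(0);
        return body.size() - 1;
    }
    void make_edge(std::size_t from, std::size_t to) {
        succ[from].push_back(to);
        ++num_preds[to];
    }
    // One "time step": every node runs once, in dependency order.
    // The graph itself is built once and reused for every step.
    void run_step() {
        std::vector<int> pending = num_preds;  // fresh counters each step
        std::queue<std::size_t> ready;
        for (std::size_t i = 0; i < body.size(); ++i)
            if (pending[i] == 0) ready.push(i);
        while (!ready.empty()) {
            std::size_t n = ready.front();
            ready.pop();
            body[n]();
            for (std::size_t s : succ[n])
                if (--pending[s] == 0) ready.push(s);
        }
    }
};
```

Calling `run_step()` in a loop corresponds to triggering one time step and waiting for it to finish before the next one starts.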

Example: performance analysis
• A complex graph was created programmatically
• The graph has 1319 nodes and 3066 edges
• The general health of the graph shows a mix of red, yellow and green
• Concurrency observed over time ranges from good, where all cores are kept busy, to very few cores kept busy
• What do the colors mean?

Challenges with asynchronous task graphs

✓ Creating implicit or explicit task graphs programmatically is easy

Example: identifying problem areas
• What was run and how much was run?
• The run captures 11 time steps
• There appears to be one node that consumes a lot of CPU time
• This node also has poor observed concurrency when it executes
• Clicking on the node takes you to the node in the graph visualization
• You can also sort on the appropriate column in the statistics table
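Sorting the statistics table by CPU time amounts to aggregating task durations per node. A minimal sketch, assuming a hypothetical trace record per task and measuring each node’s share against the total time of all traced tasks (the treemap described earlier uses the share of the application run, which may be computed differently):

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical trace record: one task instance of a node.
struct Task {
    std::string node;  // node name
    double start;      // start time
    double end;        // end time
};

// Per-node share of total task time (in percent), sorted in descending
// order, like sorting the statistics table on its CPU time column.
std::vector<std::pair<std::string, double>>
cpu_time_share(const std::vector<Task>& tasks) {
    std::map<std::string, double> per_node;  // summed task time per node
    double total = 0.0;
    for (const auto& t : tasks) {
        per_node[t.node] += t.end - t.start;
        total += t.end - t.start;
    }
    std::vector<std::pair<std::string, double>> rows;
    for (const auto& kv : per_node)
        rows.emplace_back(kv.first, total > 0.0 ? 100.0 * kv.second / total : 0.0);
    std::sort(rows.begin(), rows.end(),
              [](const std::pair<std::string, double>& x,
                 const std::pair<std::string, double>& y) { return x.second > y.second; });
    return rows;
}
```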

Example: identifying problem areas, cont.
Clicking on the node takes you to the node in the graph visualization.
1. To see all tasks belonging to this node in the execution trace, you will have to enable this interaction.
2. Click on the Show/Hide tasks button.
3. Now select the node in the canvas.

• When this node is executing, resource utilization is very poor.
• Improving the performance of this one node will substantially improve the performance of the graph.

Example: performance analysis
Analysis features
1. Critical Path
2. Rule-check

Critical Path
• Computes the critical path(s) for the graph using the execution trace information
• The most dominant task, with the maximum CPU time and a correspondingly low concurrency (continue_node_1009), is on this critical path
• Critical path analysis reduces the complexity of large graphs by isolating a small set of nodes for analysis and performance tuning
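The core of critical-path analysis is a longest-path computation over the graph. A minimal sketch, assuming per-node durations already aggregated from the trace and nodes numbered in topological order; FGA’s own algorithm may differ, for example by working on individual task instances:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Longest (critical) path through a DAG weighted by node durations.
// Assumes nodes are numbered topologically: every edge goes from a lower
// to a higher index. preds[v] lists the predecessors of node v.
std::vector<std::size_t> critical_path(
        const std::vector<double>& duration,
        const std::vector<std::vector<std::size_t>>& preds) {
    const std::size_t n = duration.size();
    std::vector<double> finish(n, 0.0);        // earliest finish time per node
    std::vector<std::size_t> best_pred(n, n);  // n means "no predecessor"
    for (std::size_t v = 0; v < n; ++v) {
        double start = 0.0;
        for (std::size_t p : preds[v])
            if (finish[p] > start) { start = finish[p]; best_pred[v] = p; }
        finish[v] = start + duration[v];
    }
    std::vector<std::size_t> path;
    if (n == 0) return path;
    // The critical path ends at the node that finishes last.
    std::size_t last = static_cast<std::size_t>(
        std::max_element(finish.begin(), finish.end()) - finish.begin());
    for (std::size_t v = last; v != n; v = best_pred[v]) path.push_back(v);
    std::reverse(path.begin(), path.end());
    return path;
}
```

Only the nodes returned here determine the end-to-end time, which is why tuning can focus on them.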

Example: performance analysis
Analysis features
1. Critical Path
2. Rule-check

Rule check
• Rule-check runs registered rules, which may include validation and performance rules
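A rule registry of this kind can be sketched as a list of callables that each inspect graph statistics and return findings. Everything here is hypothetical (the `NodeStats` fields, the two rules and the 1e-6 s threshold) and only illustrates the idea of validation and performance rules:

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-node summary that rules can inspect.
struct NodeStats {
    std::string name;
    int num_tasks;      // task instances executed
    double total_time;  // summed task time in seconds (assumed unit)
};

// A rule inspects the statistics and returns human-readable findings.
using Rule = std::function<std::vector<std::string>(const std::vector<NodeStats>&)>;

struct RuleChecker {
    std::vector<Rule> rules;
    void register_rule(Rule r) { rules.push_back(std::move(r)); }
    std::vector<std::string> run(const std::vector<NodeStats>& stats) const {
        std::vector<std::string> findings;
        for (const auto& rule : rules)
            for (const auto& f : rule(stats)) findings.push_back(f);
        return findings;
    }
};

// Example validation rule: nodes that never executed during the run.
std::vector<std::string> never_executed_rule(const std::vector<NodeStats>& stats) {
    std::vector<std::string> out;
    for (const auto& s : stats)
        if (s.num_tasks == 0) out.push_back(s.name + ": never executed");
    return out;
}

// Example performance rule: tasks so small that scheduling overhead may dominate.
std::vector<std::string> tiny_tasks_rule(const std::vector<NodeStats>& stats) {
    const double threshold = 1e-6;  // arbitrary placeholder, in seconds
    std::vector<std::string> out;
    for (const auto& s : stats)
        if (s.num_tasks > 0 && s.total_time / s.num_tasks < threshold)
            out.push_back(s.name + ": average task time below threshold");
    return out;
}
```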

Challenges with asynchronous task graphs
✓ Creating implicit or explicit task graphs programmatically is easy
✓ Determining what was created is hard in many cases
✓ New programming paradigm
✓ Allows you to stream data through the graph, which makes debugging challenging
✓ Graph algorithms can be latency-bound or throughput-bound
✓ Parallelism is unstructured in certain types of graphs, so performance analysis can be challenging

Applications often contain multiple levels of parallelism
Figure: task parallelism / message passing at the top level (visible in FGA); fork-join regions such as #pragma omp parallel for and tbb::parallel_for below it (visible in FGA); SIMD vectorization inside each fork-join region.
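A minimal sketch of the two lower levels in this figure, written in OpenMP* (the function and data are illustrative): the outer loop is a fork-join construct, and the inner loop is marked for SIMD vectorization. Compiled without OpenMP support, both pragmas are ignored and the result is unchanged.

```cpp
#include <vector>

// Two levels of parallelism in one loop nest (illustrative):
//   outer loop: fork-join threading with "#pragma omp parallel for"
//   inner loop: SIMD vectorization with "#pragma omp simd"
double row_sums_total(const std::vector<std::vector<double>>& m) {
    double total = 0.0;
    #pragma omp parallel for reduction(+: total)
    for (long i = 0; i < static_cast<long>(m.size()); ++i) {
        const std::vector<double>& row = m[i];
        double row_sum = 0.0;
        #pragma omp simd reduction(+: row_sum)
        for (long j = 0; j < static_cast<long>(row.size()); ++j)
            row_sum += row[j];
        total += row_sum;  // combined across threads by the outer reduction
    }
    return total;
}
```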

Fork-join parallelism: tbb::parallel_for
• Captures the execution task graph for a fork-join construct and provides additional analytics that present information about the construct:
  1. Imbalance
  2. Efficiency
• Timeline shows trace information for the graph and any nested parallelism that is present
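The slides do not define the imbalance and efficiency metrics, so the formulas below are common definitions assumed for illustration, not necessarily the ones FGA uses: efficiency is total busy thread time divided by (threads × region wall time), and imbalance is (max − mean) / max of per-thread busy time.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Load-balance metrics for one fork-join region (assumed definitions).
struct RegionMetrics {
    double efficiency;  // fraction of available thread-time spent working
    double imbalance;   // (max - mean) / max of per-thread busy time
};

// busy[i]: time thread i spent executing work inside the region.
// wall:    elapsed time of the region from fork to join.
RegionMetrics region_metrics(const std::vector<double>& busy, double wall) {
    RegionMetrics r{0.0, 0.0};
    if (busy.empty() || wall <= 0.0) return r;
    double sum = std::accumulate(busy.begin(), busy.end(), 0.0);
    double mx = *std::max_element(busy.begin(), busy.end());
    double mean = sum / busy.size();
    r.efficiency = sum / (busy.size() * wall);
    r.imbalance = mx > 0.0 ? (mx - mean) / mx : 0.0;
    return r;
}
```

A perfectly balanced region gives efficiency 1 and imbalance 0; one overloaded thread lowers efficiency and raises imbalance together.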

Multi-level parallelism in OpenMP*
• The top level here shows just one entity, which is a parallel region in this OpenMP* example
• Double-click on the parallel region node to see the activity within the region
• Top-level treemap shows poor resource utilization
• Hovering the mouse over the treemap shows activity in the parallel region; double-click to show the details

Intel® Advisor – Flow Graph Analyzer
• Product feature in Intel® Parallel Studio XE 2019
• Tool supports analysis and design of parallel applications using OpenMP* and Threading Building Blocks
• Available for Windows*, Linux* and macOS*

https://software.intel.com/en-us/articles/getting-started-with-flow-graph-analyzer

Summary
Asynchronous task graphs improve the efficiency of irregular and runtime-dependent execution
• TBB and OpenMP* provide mechanisms to program in this manner
Flow Graph Analyzer helps you create, debug, visualize and analyze such graphs
• Critical path analysis is crucial in reducing the complexity of the analysis problem to a handful of nodes
• Runtime-specific analyses, such as the lightweight policy analysis for TBB, target additional performance improvements

Getting started with FGA
https://software.intel.com/en-us/articles/getting-started-with-flow-graph-analyzer

Driving Code Performance with Intel® Advisor’s Flow Graph Analyzer
https://software.intel.com/en-us/download/parallel-universe-magazine-issue-30-october-2017

IWOMP 2018: Visualization of OpenMP* Task Dependencies Using Intel® Advisor – Flow Graph Analyzer
https://link.springer.com/chapter/10.1007%2F978-3-319-98521-3_12

CPUs, GPUs, FPGAs: Managing the alphabet soup with Intel Threading Building Blocks
https://software.intel.com/en-us/videos/cpus-gpus-fpgas-managing-the-alphabet-soup-with-intel-threading-building-blocks

Legal Disclaimer & Optimization Notice
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance
tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any
change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully
evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete
information visit www.intel.com/benchmarks.

INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY
INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS
ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS
FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY
RIGHT.

Copyright © 2018, Intel Corporation. All rights reserved. Intel, the Intel logo, Pentium, Xeon, Core, VTune, OpenVINO, Cilk, are trademarks of
Intel Corporation or its subsidiaries in the U.S. and other countries.

Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the
availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent
optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture
are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the
specific instruction sets covered by this notice.
Notice revision #20110804
