Profiling and Performance Improvement

Zynq
Vivado 2015.2 Version

This material exempt per Department of Commerce license exception TSU © Copyright 2015 Xilinx
Objectives

After completing this module, you will be able to:


– Describe what profiling is and how it works
– Use profiling reports to evaluate software efficiency
– Discuss the tradeoffs between software and hardware implementations
– Describe the function of the gprof tool
– List methods of improving performance

Profiling and Performance 23-2 © Copyright 2015 Xilinx


Outline

Introduction
Software Profiling in XSDK
Performance Improvement
Summary



Embedded Systems



Hardware and Software Partitioning

Determine the software "critical path" by profiling
– Profiling measures where the CPU is spending its cycles on a function-by-function or task-by-task basis
– Similar to timing analysis in hardware
– Informs the system designer which software routines may be candidates for hardware acceleration
Functions can be rewritten to improve efficiency in a number of ways
– Implementation in assembly code rather than C
– Writing faster C code, for example by limiting pointer use
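
As a rough illustration of "limiting pointer use", the sketch below contrasts a loop that accumulates through an output pointer with one that accumulates in a local variable. The function names are hypothetical and not part of the original slides. Whether the second form is actually faster depends on the compiler and optimization level, so the result should be confirmed by profiling.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulates through a pointer. Because sum may alias data[], the
 * compiler may have to load and store *sum on every iteration. */
void sum_via_pointer(const int32_t *data, size_t n, int32_t *sum)
{
    *sum = 0;
    for (size_t i = 0; i < n; i++) {
        *sum += data[i];
    }
}

/* Accumulates in a local variable. The running total can stay in a
 * register and is returned once at the end. */
int32_t sum_local(const int32_t *data, size_t n)
{
    int32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}
```

Both functions compute the same result; only the memory traffic inside the loop may differ.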



What is Profiling?

Profiling is an analysis of software performance
– Where execution time is being spent
– How many times functions are being called
– Which algorithms to consider moving to hardware

Results come in two useful formats
– Samples per function: how much time is spent in each routine
– Function call graph: which routine calls which function, and how many times
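
A samples-per-function count can be turned into an approximate time by dividing by the sampling frequency. The helpers below are a minimal sketch of that arithmetic; the function names and the example numbers are assumptions for illustration, not values from the slides.

```c
/* Approximate time spent in a routine: each sample represents one
 * timer period, so time = samples / sampling frequency. */
double samples_to_seconds(unsigned long samples, double sample_hz)
{
    return (double)samples / sample_hz;
}

/* Share of all samples that landed in one routine, in percent. */
double percent_of_total(unsigned long samples, unsigned long total_samples)
{
    if (total_samples == 0) {
        return 0.0;
    }
    return 100.0 * (double)samples / (double)total_samples;
}
```

For example, 2500 samples collected at a 10 kHz sampling frequency correspond to roughly 0.25 seconds in that routine.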


How Does Profiling Work?

Profiling is intrusive in both hardware and software
– Requires a hardware timer
– Requires a dedicated profile RAM area
– Executable is modified with profiler routines
A dedicated hardware timer interrupts the processor at a fixed interval
– The interrupt routine keeps track of the program counter (PC) at each interrupt
– A histogram of PC locations is kept in profile RAM
– Interrupt interval time is programmable
Every function call in the software application is annotated by the compiler to track which functions are called and by which callers
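
The PC histogram described above can be sketched as follows. This is a simplified model, not the actual Xilinx profiler code: the address range, bin size, 16-bit counter width, and the function names `profile_sample` and `samples_in_range` are all assumptions chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical memory map; real values come from the linker script
 * and the profiler settings. */
#define TEXT_START 0x00100000u
#define TEXT_SIZE  0x00001000u
#define BIN_SIZE   4u
#define NUM_BINS   (TEXT_SIZE / BIN_SIZE)

/* One counter per bin of code addresses (the "profile RAM"). */
static uint16_t pc_histogram[NUM_BINS];

/* What a timer interrupt handler would do with the interrupted PC. */
void profile_sample(uint32_t pc)
{
    if (pc < TEXT_START || pc >= TEXT_START + TEXT_SIZE) {
        return; /* PC outside the profiled region: ignore */
    }
    uint32_t bin = (pc - TEXT_START) / BIN_SIZE;
    if (pc_histogram[bin] < UINT16_MAX) {
        pc_histogram[bin]++; /* saturate instead of wrapping */
    }
}

/* Sum the samples that fall inside one function's range [start, end). */
uint32_t samples_in_range(uint32_t start, uint32_t end)
{
    uint32_t total = 0;
    if (start < TEXT_START) {
        start = TEXT_START;
    }
    if (end > TEXT_START + TEXT_SIZE) {
        end = TEXT_START + TEXT_SIZE;
    }
    if (start >= end) {
        return 0;
    }
    uint32_t first = (start - TEXT_START) / BIN_SIZE;
    uint32_t last = (end - TEXT_START + BIN_SIZE - 1u) / BIN_SIZE;
    for (uint32_t bin = first; bin < last; bin++) {
        total += pc_histogram[bin];
    }
    return total;
}
```

A report tool can then map each function's address range from the ELF symbol table onto the bins to obtain samples per function.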



Profiling Procedure

Set the board support package (BSP) option to include the profiler in the BSP
Enable profiling in the compiler by adding the -pg option in the board support package settings
Compile, link, and generate the ELF executable
Create a run configuration for the executable
– Configure the profiler memory
– Set the timer interrupt interval (sampling frequency)
Download the executable to hardware or to a software simulator
Run the software application until completion or for a chosen amount of time
Execute the GNU gprof tool to generate the report output



Configuring the Software Platform Settings

Select Xilinx Tools > Board Support Package Settings
Select standalone
Enable software profiling
Select the profiling timer
Select CPU instance (ps7_cortexa9_0)
– Add -pg to the Value column for the extra_compiler_flags option



Profile Configuration: Create a Run Configuration

If any of the embedded design resides in programmable logic, download the bitstream to the programmable logic
– Select Xilinx Tools > Program FPGA
Select Run > Run Configurations and create a new configuration
– Give it an appropriate name
– Select the ELF file that was compiled with -pg



Set Profile Option in Run Configuration

In the Profile Options tab
– Enable profiling
– Set the sampling frequency at which the timer will interrupt
• A higher frequency gives finer resolution
– Set the memory bin size
– Set the location of the RAM that the profiler can use
• Make sure that the software application is not using this memory
Click Run to download the program and begin execution
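
The options above trade resolution against memory and timer load. The helpers below are a back-of-the-envelope sketch of that tradeoff. They assume a histogram with one fixed-width counter per bin and cover only the histogram, not the call-graph records. The function names, the counter width, and the example clock values are assumptions, not settings taken from XSDK.

```c
#include <stdint.h>

/* Number of histogram bins needed to cover the code section,
 * rounding up when the section is not a multiple of the bin size. */
uint32_t profile_bins(uint32_t text_bytes, uint32_t bin_bytes)
{
    if (bin_bytes == 0) {
        return 0;
    }
    return text_bytes / bin_bytes + (text_bytes % bin_bytes != 0 ? 1u : 0u);
}

/* Histogram RAM in bytes for a given counter width (assumed value). */
uint32_t profile_ram_bytes(uint32_t text_bytes, uint32_t bin_bytes,
                           uint32_t counter_bytes)
{
    return profile_bins(text_bytes, bin_bytes) * counter_bytes;
}

/* Timer reload value: timer clock ticks between two samples. */
uint32_t timer_ticks_per_sample(uint32_t timer_hz, uint32_t sample_hz)
{
    if (sample_hz == 0) {
        return 0;
    }
    return timer_hz / sample_hz;
}
```

For instance, under these assumptions a 64 KB code section with 4-byte bins and 2-byte counters needs 32 KB of histogram RAM, and a 100 MHz timer sampling at 10 kHz interrupts every 10,000 ticks. A smaller bin size gives finer address resolution at the cost of more profiler RAM.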



Profiling Reports

Profile scratch memory is populated with statistics while the program is executing
– Intrusive profiling routines and the fixed interval timer interrupt use this memory
– Statistics are stored in gmon.out when execution completes or is halted
The gprof tool reads gmon.out and assembles the information into a user-configurable report
gprof is launched by the user after execution completes



Viewing Profiling Reports: Launching gprof

Double-click gmon.out to launch gprof
Point to the executable ELF; it is usually selected by default
The gprof report launches
The report toolbar controls report options and view capabilities
– Sort samples per file
– Sort samples per function
– Sort samples per line
– Display function call graph
– Switch sample/time



Profiled Output in XSDK

[Figure: four gprof report views in XSDK: (1) sort samples per file, (2) sort samples per function, (3) sort samples per line, (4) display function call graph]

Profiling Report Options

gprof report options allow flexible report views and export
1. Show/hide columns
2. Export to CSV
3. Sorting
4. Switch between time and samples



Coding Style Can Impact Profiling

Effective profiling is based on how much time is spent in functions and how often they are called
– If your code is just a fall-through main, profiling is not useful because 100 percent of execution time will be in main with no calls to other functions
– Carefully architect the application with a structured architecture by using functions
– The compiler does not treat macros as functions; a macro is expanded and treated as inline code
– Separate algorithms logically into functions that will help you analyze the flat profile view
– Think ahead when architecting code: is this algorithm a candidate for implementing in programmable logic?
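
The difference between a macro and a function, as seen by the profiler, can be sketched as below. All names are hypothetical. The `noinline` attribute is a GCC/Clang extension, used here on the assumption that an optimizing compiler might otherwise inline a small function so that it no longer appears as a separate entry in the profile.

```c
/* Macro: expanded in place, so its cost is attributed to the caller. */
#define SCALE_MACRO(x) ((x) * 3 + 1)

/* Function: a separate symbol that can show up in the flat profile
 * and call graph. noinline (GCC/Clang) keeps it from being inlined. */
__attribute__((noinline)) int scale_sample(int x)
{
    return x * 3 + 1;
}

/* All work is attributed to this function in the profile. */
int process_with_macro(const int *in, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += SCALE_MACRO(in[i]);
    }
    return total;
}

/* The scaling work is attributed to scale_sample, with a call count. */
int process_with_function(const int *in, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += scale_sample(in[i]);
    }
    return total;
}
```

Both versions return the same value; the second simply exposes the scaling step as its own line in the profiling report, at the cost of call overhead.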



Task Implementation Decision

Keep it in software
– Not in critical path
– Enough "free" cycles
– Easier to code in software than in hardware
• Uses math library functions
– NEON co-processor
• Supports integer vector operations
• Single-precision floating-point operations

Move to hardware
– Programmable logic co-processor
• Customized to user's needs
• Excellent for iterative and pipelined processing
– Add soft core processor in PL
• Both Cortex-A9 and MicroBlaze processors can co-exist in the AP SoC



Mechanisms to Improve Performance

Slow software tasks can be accelerated by moving them to hardware
– After thoroughly profiling
– Good candidates are where the software spends most of its time
Many mechanisms
– Enabling caches if they are turned off by default
– Code optimization: use of macros, increasing compiler optimization
– Dual-port block RAM
– Custom AXI peripheral
• Vivado HLS
• System Generator for DSP



Using Block RAM

Leverage the dual-port nature of Xilinx block RAM


Useful for data in block or frame format
– Video
– 2D matrix maps
Advantages
– Low silicon overhead
– Fast and deterministic latency



Hardware Accelerator Communication Channels



Enhanced Accelerator SoC Integration Example

ARM MPCore: accelerator coherence port (ACP)


– Sharing benefits of the ARM MPCore optimized coherency design
– Accelerators gain access to CPU cache hierarchy
– Compatible with standard un-cached peripherals and accelerators



Summary

Profiling allows you to analyze the software and determine where the CPU's time is spent
Profiling can help you rearrange or rewrite the code, or help you consider whether a function can be targeted to hardware
The gprof tool is used to generate a profiling report from collected statistics
A hardware timer and memory are required to use the profiling tool
– Sampling frequency will have a direct impact on the amount of memory used to collect samples
Profiling in XSDK is provided by the Standalone BSP as a GNU service
Enabling cache can improve performance
Porting software into hardware can improve system performance
– Vivado HLS
– System Generator
