Christopher A. Wood (caw4567@rit.edu)

The document discusses various techniques for optimizing code performance. It begins by explaining the need to measure performance through profiling to identify hotspots in the code. Several levels of optimization are described, from high-level design changes to low-level tweaks of compiler settings and assembly code. Specific optimization strategies discussed include improving parallelism, data access patterns, control flow, and memory usage. The document also provides an overview of a RISC CPU architecture and its performance characteristics. Common misconceptions about optimization are debunked, and additional resources are referenced.

© Attribution Non-Commercial (BY-NC)


Christopher A. Wood (caw4567@rit.edu)

Levels of optimization:
- Code architecture and design
- High-level source code changes
- Compiler settings
- Assembly tweaks

1. Measure performance: dynamic program analysis using a software profiler.
2. Identify hotspots: the portions of the code that consume the most CPU cycles and computation time.
3. Identify the cause of hotspots: I/O overhead? An inefficient algorithm? Poor design?
4. Change the program: source code tweaks or design changes?

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." -Donald Knuth

Design changes tend to have the biggest impact on code performance, so analysis of the code architecture is the best starting point:
- Mathematical analysis
- Understanding technological considerations
- Parallelism
- Changing the scope of analysis (module- and global-based)

- Data bandwidth performance: keep data in devices that can be accessed faster.
- Arithmetic operation performance: know your order of operations and the performance of mathematical functions; think at the bit level.

- Control flow: software control flow structures (e.g. indirect function calls, switch statements, branches) perform differently. Be conscious of processor pipeline branch predictions.
- Memory usage: especially important with embedded devices.

- High-performance, dual-issue, superscalar 32-bit RISC CPU
- Seven-stage, highly pipelined microarchitecture
- Dual instruction fetch, decode, and out-of-order issue
- Separate instruction and data cache arrays
- Memory Management Unit (MMU) with separate instruction and data shadow TLBs

- Soft processor core designed specifically for Xilinx FPGAs
- Implemented using the general-purpose memory and logic fabric of the FPGA
- Versatile interconnect system to support embedded applications, connected to the PLB as its primary I/O bus
- User-configurable memory aspects (cache size, pipeline depth, embedded peripherals, MMU, etc.)
- Capable of hosting operating systems that require hardware support (e.g. page tables and address space protection in Linux)

- Is it an option on the target platform?
- Can portions of your algorithm be performed in parallel? E.g. if your algorithm operates on bytes, you may be able to operate on 2, 4, or 8 of them simultaneously using word-based instructions provided by the CPU.
- Can other hardware components perform computations in parallel with the processor?

- Look at the software from both a source code and a design perspective
- Analyze the flow of data in your algorithm
- High-level API usage
- Code size!

Common misconceptions:
- Improved hardware makes software optimization unimportant
- Using tables always beats recalculating
- Using C compilers makes it impossible to optimize code for performance
- Globals are faster than locals
- Using smaller data types is faster than larger ones

- Powers of 2
- Optimize loop overhead
- Loop manipulation (rolling/unrolling/jamming)
- Declare local functions as static
- Pass by value vs. pass by reference
- Unsigned vs. signed
- Leverage early termination of if statements
- Register usage (global variables aren't placed there)

Additional resources:
- http://www.azillionmonkeys.com/qed/optimize.html
- http://www.cs.ucsb.edu/~nagy/docs/MAEMostafa.pdf
- http://www.codeproject.com/KB/cpp/C___Code_Optimization.aspx
- http://developer.amd.com/documentation/articles/pages/6212004126.aspx
- https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/2D417029AE3F3089872570F8006D4E99/$file/PowerPC440x6_um_29Sept10_pub.pdf
