Christopher A. Wood, caw4567@rit.edu
Code architecture and design
High-level source code changes
Compiler settings
Assembly tweaks
1. Measure performance: dynamic program analysis using a software profiler
2. Identify hotspots: portions of the code that consume the most CPU cycles and computation time
3. I/O overhead, inefficient algorithm, poor design?
4. Source code tweaks or design changes?
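As a rough illustration of the measurement step, the sketch below times a function with the standard C `clock()` call. It is a crude stand-in for a real profiler, not a replacement for one. The names `busy_sum` and `time_function` and the workload itself are hypothetical.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical workload: sums the integers 0..n-1. */
uint32_t busy_sum(uint32_t n)
{
    volatile uint32_t total = 0;   /* volatile keeps the loop from being optimized away */
    for (uint32_t i = 0; i < n; i++) {
        total += i;
    }
    return total;
}

/* Runs fn(arg) `repeats` times and returns the elapsed CPU time in seconds. */
double time_function(uint32_t (*fn)(uint32_t), uint32_t arg, int repeats)
{
    clock_t start = clock();
    for (int r = 0; r < repeats; r++) {
        (void)fn(arg);
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_function(busy_sum, 1000000, 100)` before and after a change gives a coarse comparison. A profiler goes further by attributing time to individual functions, which is what identifying hotspots needs.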
-Donald Knuth
Design changes tend to have the biggest impact on code performance.
Analysis of the code architecture is the best starting point.
Mathematical analysis
Understanding technological considerations
Parallelism
based)
Control flow: different constructs (function calls, switch statements, branches) perform differently. Be conscious of the processor pipeline's branch prediction.
Memory usage: especially important with embedded devices.
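To make the control-flow point concrete, here is a minimal sketch of the same computation written with and without a data-dependent branch. The function names are hypothetical. Whether the branch-free form is actually faster depends on the compiler, the data, and the target pipeline, so it should be measured rather than assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Branching version: the `if` is hard to predict when the data is random. */
size_t count_at_least_branchy(const uint8_t *data, size_t len, uint8_t threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (data[i] >= threshold) {
            count++;
        }
    }
    return count;
}

/* Branch-free version: the comparison yields 0 or 1, which is added directly. */
size_t count_at_least_branchless(const uint8_t *data, size_t len, uint8_t threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        count += (size_t)(data[i] >= threshold);
    }
    return count;
}
```

Both functions return the same count. A modern compiler may well emit identical machine code for them, which is one more reason to inspect the generated assembly before keeping the less readable variant.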
High-performance, dual-issue, superscalar 32-bit RISC CPU
Seven-stage, highly pipelined microarchitecture
Dual instruction fetch, decode, and out-of-order issue
Separate instruction and data cache arrays
Memory Management Unit (MMU) with separate instruction and data shadow TLBs
Soft processor core designed specifically for Xilinx FPGAs
Implemented using the general-purpose memory and logic fabric of the FPGA
Versatile interconnect system to support embedded applications
Connected to the PLB, its primary I/O bus
User-configured memory aspects (cache size, pipeline depth, embedded peripherals, MMU, etc.)
Capable of hosting operating systems that require hardware support (e.g. page tables and address space protection in Linux)
Is it an option on the target platform?
Can portions of your algorithm be performed in parallel?
E.g. if your algorithm operates on bytes you may
Can other hardware components perform computations in parallel with the processor?
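The byte-oriented example above is cut off in the source, so the following is only one plausible reading of it: on a 32-bit processor, a byte-wise operation can sometimes handle four bytes per instruction by working on whole words. The sketch assumes a simple XOR with a single key byte; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XORs every byte of buf with key, one byte per iteration. */
void xor_bytes(uint8_t *buf, size_t len, uint8_t key)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= key;
    }
}

/* Same result, but processes four bytes per iteration in a 32-bit word. */
void xor_words(uint8_t *buf, size_t len, uint8_t key)
{
    const uint32_t key4 = (uint32_t)key * 0x01010101u; /* key copied into all four bytes */
    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        uint32_t word;
        memcpy(&word, buf + i, sizeof word);  /* avoids alignment and aliasing problems */
        word ^= key4;
        memcpy(buf + i, &word, sizeof word);
    }
    for (; i < len; i++) {                    /* leftover bytes */
        buf[i] ^= key;
    }
}
```

Because the key is replicated into every byte of the word, the result does not depend on the byte order of the target. Compilers typically turn the fixed-size `memcpy` calls into plain word loads and stores.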
Look at the software from both a source code and design perspective
Analyze the flow of data in your algorithm
High-level API usage
Code size!
Common misconceptions:
Improved hardware makes software optimization unimportant
Using tables always beats recalculating
Using C compilers makes it impossible to optimize code for performance
Globals are faster than locals
Using smaller data types is faster than larger ones
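The claim about tables is easy to test rather than assume. The sketch below counts the set bits in a byte two ways, once with a small lookup table and once by computation; the names are hypothetical. Which one is faster depends on cache behaviour and the instruction set of the target, so neither is presented here as the winner.

```c
#include <stdint.h>

/* Table-based: bit counts for every 4-bit value. */
static const uint8_t NIBBLE_BITS[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

unsigned popcount_table(uint8_t x)
{
    return (unsigned)NIBBLE_BITS[x & 0x0Fu] + (unsigned)NIBBLE_BITS[x >> 4];
}

/* Computed: clears the lowest set bit once per iteration. */
unsigned popcount_loop(uint8_t x)
{
    unsigned count = 0;
    while (x) {
        x &= (uint8_t)(x - 1u);
        count++;
    }
    return count;
}
```

On a device with a small or slow data cache, the table lookup can cost more than the few arithmetic instructions it replaces, so timing both on the real hardware is the only reliable answer.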
Powers of 2
Optimize loop overhead
Loop manipulation (rolling/unrolling/jamming)
Declare local functions as static
Pass by value and pass by reference
Unsigned vs. signed
Leverage early termination of if statements
Register usage (global variables aren't placed there)
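Two of these tips can be sketched briefly: using a power-of-2 size so that a mask replaces the modulo operator, and unrolling a loop to cut the number of loop tests. The names and the `RING_SIZE` value are hypothetical, and an optimizing compiler may already apply both transformations by itself.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 16u   /* hypothetical buffer size; must be a power of 2 */

/* Power of 2: wrap an index with a mask instead of the % operator. */
uint32_t ring_next(uint32_t index)
{
    return (index + 1u) & (RING_SIZE - 1u);
}

/* Straightforward loop: one addition and one loop test per element. */
uint32_t sum_simple(const uint32_t *v, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

/* Unrolled by four: one loop test per four elements, then a cleanup loop. */
uint32_t sum_unrolled(const uint32_t *v, size_t n)
{
    uint32_t total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    }
    for (; i < n; i++) {
        total += v[i];
    }
    return total;
}
```

Unrolling trades code size for fewer branches, which matters on an embedded target with a small instruction cache. Comparing the unrolled version against the compiler's output at a higher optimization level shows whether the hand-written form still pays off.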