Christopher A. Wood (caw4567@rit.edu)

The document discusses various techniques for optimizing code performance. It begins by explaining the need to measure performance through profiling to identify hotspots in the code. Several levels of optimization are described, from high-level design changes to low-level tweaks of compiler settings and assembly code. Specific optimization strategies discussed include improving parallelism, data access patterns, control flow, and memory usage. The document also provides an overview of a RISC CPU architecture and its performance characteristics. Common misconceptions about optimization are debunked, and additional resources are referenced.

© Attribution Non-Commercial (BY-NC)


Christopher A. Wood (caw4567@rit.edu)

Levels of optimization:
- Code architecture and design
- High-level source code changes
- Compiler settings
- Assembly tweaks

1. Measure performance: dynamic program analysis using a software profiler.
2. Identify hotspots: the portions of the code that consume the most CPU cycles and computation time.
3. Identify the cause of hotspots: I/O overhead? An inefficient algorithm? Poor design?
4. Change the program: source code tweaks or design changes?

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." -Donald Knuth

Design changes tend to have the biggest impact on code performance, so analysis of the code architecture is the best starting point:
- Mathematical analysis
- Understanding technological considerations
- Parallelism
- Changing the scope of analysis (module- and global-based)

- Data bandwidth performance: keep data in devices that can be accessed faster.
- Arithmetic operation performance: know your order of operations and the performance of mathematical functions; think at the bit level.

- Control flow: software control flow structures (e.g. indirect function calls, switch statements, branches) perform differently. Be conscious of processor pipeline branch predictions.
- Memory usage: especially important with embedded devices.

- High-performance, dual-issue, superscalar 32-bit RISC CPU
- Seven-stage, highly pipelined microarchitecture
- Dual instruction fetch, decode, and out-of-order issue
- Separate instruction and data cache arrays
- Memory Management Unit (MMU) with separate instruction and data shadow TLBs

- Soft processor core designed specifically for Xilinx FPGAs
- Implemented using the general-purpose memory and logic fabric of the FPGA
- Versatile interconnect system to support embedded applications, connected to the PLB as its primary I/O bus
- User-configurable memory aspects (cache size, pipeline depth, embedded peripherals, MMU, etc.)
- Capable of hosting operating systems that require hardware support (e.g. page tables and address space protection in Linux)

- Is it an option on the target platform?
- Can portions of your algorithm be performed in parallel? E.g. if your algorithm operates on bytes, you may be able to operate on 2, 4, or 8 of them simultaneously using word-based instructions provided by the CPU.
- Can other hardware components perform computations in parallel with the processor?

- Look at the software from both a source code and a design perspective
- Analyze the flow of data in your algorithm
- High-level API usage
- Code size!

Common misconceptions:
- Improved hardware makes software optimization unimportant
- Using tables always beats recalculating
- Using C compilers makes it impossible to optimize code for performance
- Globals are faster than locals
- Using smaller data types is faster than larger ones

- Powers of 2
- Optimize loop overhead
- Loop manipulation (rolling/unrolling/jamming)
- Declare local functions as static
- Pass by value vs. pass by reference
- Unsigned vs. signed
- Leverage early termination of if statements
- Register usage (global variables aren't placed there)

Additional resources:
- http://www.azillionmonkeys.com/qed/optimize.html
- http://www.cs.ucsb.edu/~nagy/docs/MAEMostafa.pdf
- http://www.codeproject.com/KB/cpp/C___Code_Optimization.aspx
- http://developer.amd.com/documentation/articles/pages/6212004126.aspx
- https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/2D417029AE3F3089872570F8006D4E99/$file/PowerPC440x6_um_29Sept10_pub.pdf
