Profiling & Code Optimization

Lecture 6

Introduction to Embedded Systems

Administrivia
• Lecture first half
• Quiz #1 for the second half of today’s lecture

Summary of Previous Lecture
• Overview of the ARM Debug Monitor
• Loading a Program
• The ARM Image Format
• What happens on program startup?

Outline of This Lecture
• Profiling
– Amdahl’s Law
– The 80/20 rule
– Profiling in the ARM environment

• Improving program performance

– Standard compiler optimizations
– Aggressive compiler optimizations
– Architectural code optimizations

Quote of the Day

I haven’t failed. I’ve found 10,000 ways that won’t work.

– Thomas Edison (attributed)

Profiling and Benchmark Analysis
• Problem: You're given a program's source code (which someone else
wrote) and asked to improve its performance by at least 20%

• Where do you begin?

– Look at source code and try to find inefficient C code
– Try rewriting some of it in assembly
– Rewrite using a different algorithm
– (Remove random portions of the code) 

Gene Amdahl
• One of the original architects of
the IBM 360 mainframe series

• Founded four companies

– Amdahl Corporation
– Trilogy Systems (Part of Elxsi)
– Andor Systems
– Commercial Data Servers (CDS)

• A relatively small number of sequential instructions can limit program speedup, so adding more processors may not make the program run faster.

Amdahl’s Law

Profiling and Benchmark Analysis (cont’d)
• Most important question ...
– Where is the program spending most of its time?

• Amdahl's Law
– The performance improvement gained from using some faster mode of
execution is limited by the fraction of the total time the faster mode can be
used

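• Example:
[Figure: a program's run time split into an optimizable part and an unoptimizable part. A 2x speedup shrinks only the optimizable part; the unoptimizable part is unchanged.]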
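The limit described by Amdahl's Law can be written as overall speedup = 1 / ((1 - f) + f / s), where f is the fraction of run time that can use the faster mode and s is the speedup of that fraction. The following is a minimal sketch of that formula; the function name and parameter names are illustrative and do not come from the lecture.

```c
/* Overall speedup predicted by Amdahl's Law.
 * fraction: share of the original run time that can use the faster mode (0..1)
 * factor:   speedup applied to that share (must be > 0)
 * Both names are hypothetical, chosen for this sketch. */
double amdahl_speedup(double fraction, double factor)
{
    return 1.0 / ((1.0 - fraction) + fraction / factor);
}
```

Assuming, purely for illustration, that half of the run time is optimizable and that part is made 2x faster, amdahl_speedup(0.5, 2.0) gives about 1.33, not 2. If nothing is optimizable (fraction 0), the result is 1 regardless of the factor.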

Profiling and Benchmark Analysis (cont’d)
• How do we figure out where a program is spending its time?

– If we could count every static instruction, we would know which routines (functions) were the biggest
• Big deal: large functions that aren't executed often don't really matter
– If we could count every dynamic instruction, we would know which routines executed the most instructions
• Excellent! It tells us the “relative importance” of each function
• But it doesn't account for the memory system (stalls)
– If we could count how many cycles were spent in each routine, we would know which routines took the most time

Profiling
• Profiling: collecting statistics from example executions
– Very useful for estimating importance of each routine
– Common profiling approaches:
• Instrument all procedure call/return points (expensive: e.g., 20% overhead)
• Sample the PC every X milliseconds; as long as the program run is significantly longer than the sampling period, the accuracy of profiling is pretty good
– Usually results in output such as
Routine          % of Execution Time
function_a       60%
function_b       27%
function_c       4%
...
function_zzz     0.01%
– Often over 80% of the time is spent in less than 20% of the code (80/20 rule)
– Can now do more accurate profiling with on-chip counters and analysis tools
• Alpha, Pentium, Pentium Pro, PowerPC
• DEC Atom analysis tool
• Both are covered in Advanced Computer Architecture courses
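The PC-sampling approach can be sketched in a few lines: each sampled program-counter value is attributed to the routine whose address range contains it, and the per-routine counts become the "% of Execution Time" column. This is only an illustration; the struct layout, function names, and address ranges are hypothetical and are not how armsd or any particular profiler stores its data.

```c
#include <stddef.h>

/* Hypothetical symbol-table entry: a routine occupies addresses [start, end). */
struct routine {
    const char   *name;
    unsigned long start;
    unsigned long end;
    unsigned long samples;   /* number of PC samples that landed in this routine */
};

/* Attribute each sampled PC value to the routine that contains it.
 * Returns the number of samples that matched some routine. */
size_t attribute_samples(struct routine *table, size_t n_routines,
                         const unsigned long *pcs, size_t n_pcs)
{
    size_t matched = 0;
    for (size_t i = 0; i < n_pcs; i++) {
        for (size_t r = 0; r < n_routines; r++) {
            if (pcs[i] >= table[r].start && pcs[i] < table[r].end) {
                table[r].samples++;
                matched++;
                break;       /* address ranges are assumed not to overlap */
            }
        }
    }
    return matched;
}

/* Share of the matched samples that fell in one routine, as a percentage. */
double percent_of_time(const struct routine *r, size_t matched)
{
    return matched ? 100.0 * (double)r->samples / (double)matched : 0.0;
}
```

The estimate is only as good as the number of samples, which is why the run needs to be much longer than the sampling period.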
Timing execution with armsd
• The simulator simulates every cycle
– Can gather very accurate timings for each function

• Run the simulator to determine total time

– Section 4.8 of the ARM Developer Suite AXD and armsd Debugger’s Guide

• Compiler can optimize for speed

prompt> armcc -Otime -o sort sorts.c

• Can also optimize for size

prompt> armcc -Ospace -o sort sorts.c

• Re-run the simulator to determine new total time

– new time is 2,059,629 µsecs, an improvement of 4.5% (compared to -g)

Profiling with armsd
• No compile-time options needed

• Run the simulator to profile, capturing call-graph data

prompt> armsd
armsd: load/callgraph sorts
armsd: ProfOn
armsd: go
armsd: ProfWrite sorts.prf
armsd: quit
prompt> armprof -Parent sorts.prf > profile

• To profile for only samples, skip the “/callgraph” portion

– avoids the 20% overhead (in this example)

armprof output
Name                   cum%     self%    desc%    calls
main                   96.4%    0.16%    95.88%   0
  qsort                         0.44%    0.75%    1
  _printf                       0.00%    0.00%    3
  clock                         0.00%    0.00%    6
  _sprintf                      0.34%    3.56%    1000
  randomise                     0.12%    0.69%    1
  shell_sort                    1.59%    3.43%    1
  insert_sort                   19.91%   59.44%   1

  main                          19.91%   59.44%   1
insert_sort            79.35%   19.91%   59.44%   1
  strcmp                        59.44%   0.00%    243432

  qs_string_compare             3.17%    0.00%    13021
  shell_sort                    3.43%    0.00%    14059
  insert_sort                   59.44%   0.00%    243432
strcmp                 66.05%   66.05%   0.00%    270512

Optimizing “sorts”
• Almost 60% of time spent in strcmp called by insert_sort
• strcmp compares two strings and returns int
– 0 if equal, negative if first is “less than” second, positive otherwise
• Replace “strcmp(a,b)” call with some initial compares
if (a[0] < b[0]) {
    /* result is negative */
}
if (a[0] == b[0]) {
    if (a[1] < b[1]) {
        /* result is negative */
    }
    if (a[1] == b[1]) {
        if (strcmp(a,b) <= 0) {
            /* result is negative or zero */
        }
    }
}

• Result of this change is a 20% reduction in execution time

– Avoids some procedure call overheads (inlining)
– Avoids some loop control overheads (loop unrolling)
– Handles common cases efficiently and other cases correctly
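One way the initial compares could be packaged as a drop-in comparison function is sketched below. This is not the code used in the lecture's “sorts” program; the name fast_strcmp is hypothetical. The sketch also guards two details the slide pseudocode leaves out: it stops at the terminating NUL before reading a second character, and it compares characters as unsigned char, as strcmp does.

```c
#include <string.h>

/* Returns <0, 0, or >0 like strcmp, with a fast path for the first two
 * characters. fast_strcmp is a hypothetical name for this sketch. */
int fast_strcmp(const char *a, const char *b)
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;

    if (ua[0] != ub[0])
        return (ua[0] < ub[0]) ? -1 : 1;   /* decided by the first character */
    if (ua[0] == '\0')
        return 0;                          /* both strings are empty */
    if (ua[1] != ub[1])
        return (ua[1] < ub[1]) ? -1 : 1;   /* decided by the second character */
    if (ua[1] == '\0')
        return 0;                          /* both strings are one character long */
    return strcmp(a + 2, b + 2);           /* uncommon case: fall back to strcmp */
}
```

Whether this pays off depends on the data: it helps when most comparisons are decided within the first two characters, which is worth confirming by profiling again after the change.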

Improving Program Performance
• Compiler writers try to apply several standard optimizations
– Do not always succeed

• Compiler writers sometimes apply aggressive optimizations

– Often not “informed” enough to know that change will help rather than hurt

• Optimizations based on specific architecture/implementation characteristics can be very helpful
– Much harder for compiler writers because it requires multiple, generally very different, “back-end” implementations

• How can one help?

– Better code, algorithms and data structures (of course)
– Reorganize code to help compiler find opportunities for improvement
– Replace poorly optimized code with assembly code (i.e., bypass compiler)

Standard Compiler Optimizations
• Common Subexpression Elimination
– Formally, “An occurrence of an expression E is called a common subexpression if E was previously computed, and the values of variables in E have not changed since the previous computation.”
– We can avoid recomputing the expression if we can use the previously computed value.
– Benefit: less code to be executed

Before:
b:
    t6 = 4 * i
    x = a[t6]
    t7 = 4 * i
    t8 = 4 * j
    t9 = a[t8]
    a[t7] = t9
    t10 = 4 * j
    a[t10] = x
    goto b

After:
b:
    t6 = 4 * i
    x = a[t6]
    t8 = 4 * j
    t9 = a[t8]
    a[t6] = t9
    a[t8] = x
    goto b
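The same idea can be shown at the C source level. In this small sketch (function names are illustrative only), the product x * y appears twice in the first version and is computed once in the second. An optimizing compiler will usually do this on its own for simple cases like this one; writing it by hand matters mostly when the compiler cannot prove the value is unchanged.

```c
/* Before: the subexpression (x * y) is written twice. */
int poly_before(int x, int y, int z)
{
    return (x * y + z) * (x * y - z);
}

/* After: the shared product is computed once and reused. */
int poly_after(int x, int y, int z)
{
    int t = x * y;              /* common subexpression */
    return (t + z) * (t - z);
}
```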

Standard Compiler Optimizations
• Dead-Code Elimination
– If code is definitely not going to be executed during any run of a program,
then it is called dead code and can be removed.
– Example:
debug = 0;
...
if (debug){
print .....
}
– You can help by using ASSERTs and #ifdefs to tell the compiler about
dead code
• It is often difficult for the compiler to identify dead code itself
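A common way to use #ifdef for this is to make debug output disappear at preprocessing time, so the compiler never has to prove that a run-time debug flag is always zero. The sketch below assumes a build-time macro named DEBUG and a helper macro named DEBUG_PRINT; both names, and the sum_array function, are hypothetical.

```c
#include <stdio.h>

/* When DEBUG is not defined, DEBUG_PRINT expands to nothing, so the
 * debug code is removed before the compiler ever sees it. */
#ifdef DEBUG
#define DEBUG_PRINT(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEBUG_PRINT(...) ((void)0)
#endif

/* Sums n integers, tracing each step only in DEBUG builds. */
int sum_array(const int *a, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += a[i];
        DEBUG_PRINT("i=%d total=%d\n", i, total);
    }
    return total;
}
```

Building with the macro defined (for example by passing a definition of DEBUG on the compiler command line) restores the trace output without touching the source.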

Standard Compiler Optimizations (cont’d)
• Induction Variables and Strength Reduction
– A variable X is called an induction variable of a loop L if every time the variable X changes value, it is incremented or decremented by some constant
– When there are 2 or more induction variables in a loop, it may be possible to
get rid of all but one
– It is also frequently possible to perform strength reduction on induction
variables
• the strength of an instruction corresponds to its execution cost
– Benefit: fewer and less expensive operations

Before:
    t4 = 0
label_XXX
    j = j + 1
    t4 = 4 * j
    t5 = a[t4]
    if (t5 > v) goto label_XXX

After:
    t4 = 0
label_XXX
    t4 += 4
    t5 = a[t4]
    if (t5 > v) goto label_XXX
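A source-level sketch of the same transformation: in the first function the index is recomputed with a multiply on every iteration, while the second keeps a running offset that is advanced by an add. The function names and the strided-sum task are illustrative only, and a modern compiler may well perform this rewrite itself.

```c
/* Before: the index j * stride costs one multiply per iteration. */
long sum_strided_before(const int *a, int n, int stride)
{
    long sum = 0;
    for (int j = 0; j < n; j++)
        sum += a[j * stride];
    return sum;
}

/* After: offset is an induction variable that tracks j * stride,
 * so the multiply is replaced by an add. */
long sum_strided_after(const int *a, int n, int stride)
{
    long sum = 0;
    int offset = 0;
    for (int j = 0; j < n; j++) {
        sum += a[offset];
        offset += stride;
    }
    return sum;
}
```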

Aggressive Compiler Optimizations
• Inlining of functions
– Replacing a call to a function with the function's code is called “inlining”
– Benefit: reduction in procedure call overheads and opportunity for additional code
optimizations
– Danger: code bloat and negative instruction cache effects
– Appropriate when the function is small and/or called from a small number of sites
Before:
    MOV   r0, r4                        ; r4 -> r0 (param 1)
    MOV   r1, #4                        ; 4 -> r1 (param 2)
    BL    c_add                         ; call c_add
    MOV   r5, r0                        ; r0 (result) -> r5
    SWI   0x11                          ; terminate
c_add
    MOV   r12, r13                      ; save sp
    STMDB r13!, {r0,r1,r11,r12,r14,pc}  ; save regs
    SUB   r11, r12, #4                  ; (sp - 4) -> r11
    MOV   r2, r0                        ; param 1 -> r2
    ADD   r3, r2, r1                    ; param 1 + param 2 -> r3
    MOV   r0, r3                        ; move result to r0
    LDMDB r11, {r11, r13, pc}           ; restore regs

After:
    ADD   r5, r4, #4
    SWI   0x11
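In C, a small helper like the c_add routine in the assembly above can be offered to the compiler as an inlining candidate. The sketch below uses the C99 static inline form; the keyword is only a hint, and whether the call actually collapses to a single ADD depends on the compiler and the optimization level. The caller name add_four is hypothetical.

```c
/* Small helper that is a natural inlining candidate.
 * 'static inline' is a request, not a guarantee. */
static inline int c_add(int a, int b)
{
    return a + b;
}

/* Caller: once c_add is inlined, the call overhead (argument moves,
 * branch, register save/restore) can disappear. */
int add_four(int x)
{
    return c_add(x, 4);
}
```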

Aggressive Compiler Optimizations (2)
• Loop Unrolling
– Doing multiple iterations of work in each iteration is called “loop unrolling”
– Benefit: reduction in looping overheads and opportunity for more code optimizations
– Danger: code bloat, negative instruction cache effects, and iteration counts that are not a multiple of the unroll factor
– Appropriate when the loop body is small and/or the loop appears at a small number of sites
Before:
      MOV  r4, #0
sym1: CMP  r4, #0x10
      BLT  sym3
      B    sym4
sym2:
      ADD  r4, r4, #1
      B    sym1
sym3: LDR  r0, [r13, r4, lsl #2]
      ADD  r5, r0, r5
      B    sym2
sym4:

After:
      MOV  r4, #0
sym1: CMP  r4, #4
      BLT  sym3
      B    sym4
sym2: ADD  r4, r4, #1
      B    sym1
sym3: LDR  r1, [r13, r4, lsl #2]    ; 1
      ADD  r0, r13, r4, lsl #2      ; 2
      LDR  r0, [r0, #4]
      ADD  r1, r1, r0
      ADD  r0, r13, r4, lsl #2      ; 3
      LDR  r0, [r0, #8]
      ADD  r1, r1, r0
      ADD  r0, r13, r4, lsl #2      ; 4
      LDR  r0, [r0, #0xc]
      ADD  r6, r1, r0
      B    sym2
sym4:

Loop in sym3 is unrolled 4 times
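A hand-unrolled loop in C might look like the sketch below, which sums an array four elements per iteration and then finishes any leftover elements in a second loop. That cleanup loop is what handles iteration counts that are not a multiple of four. The function names are illustrative; this is not a translation of the assembly above.

```c
/* Reference version: one element per iteration. */
long sum_rolled(const int *a, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Unrolled by 4: one loop test and one index update per four elements. */
long sum_unrolled(const int *a, int n)
{
    long sum = 0;
    int i = 0;

    for (; i + 4 <= n; i += 4)
        sum += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];

    for (; i < n; i++)          /* cleanup: the remaining 0 to 3 elements */
        sum += a[i];

    return sum;
}
```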

Architectural/Code Optimizations
• Often, it is important to understand the architecture's implementation in
order to effectively optimize code
– Much more difficult for compilers to do because it requires a different
compiler backend for every implementation

• One example of this is the ARM barrel shifter

– Can convert Y * Constant into a series of adds and shifts
• Y * 9 = Y * 8 + Y * 1
• Assume R1 holds Y and R2 will hold the result
– ADD R2, R1, R1, LSL #3 ; R2 = R1 + (R1 << 3); LSL #3 is the same as multiplying by 8
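The same shift-and-add identity, written in C as a small sketch. The function name times9 is hypothetical, and unsigned arithmetic is used so that the shift is well defined for every input.

```c
/* y * 9 computed as (y * 8) + y, mirroring ADD R2, R1, R1, LSL #3. */
unsigned int times9(unsigned int y)
{
    return (y << 3) + y;
}
```

In practice a compiler targeting ARM will typically pick this form itself for a constant multiply; the point of the example is to show what the single ADD-with-shift instruction computes.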

• Another example is the ARM 7500 write buffer specifics

ARM Path to Memory
• Normally, a STR will write data directly to memory
• Example
– STR r1, SP!
– Writes contents of r1 to memory
– Requires n cycles, where n is the time necessary to access memory (typically 5-100 cycles)
– Very costly to performance but doesn't really matter what the code looks like
[Figure: ARM datapath. Address Register, Address Incrementer, Incrementer Bus, ALU Bus, Register Bank, A Bus, B Bus, Barrel Shifter, 32-bit ALU, Read Data/Instr Register, Mem Addr Register, Write Data Register, Dout[31:0], Data[31:0], RAM]

ARM Write Buffer
• “Write buffer” holds writes and slowly retires them to memory while the processor continues to execute other instructions
• Allows multiple writes to occur back-to-back
• Now the order of code does matter
[Figure: the same ARM datapath with a Write Buffer (holds address and data) added on the path to memory. Address Register, Address Incrementer, Incrementer Bus, ALU Bus, Register Bank, A Bus, B Bus, Barrel Shifter, 32-bit ALU, Mem Addr Register, Write Data Register, Read Data/Instr Register, Dout[31:0], Data[31:0], RAM]

Critical Thinking
• When is optimization a bad thing?

Summary of Lecture
• Profiling
– Amdahl’s Law
– The 80/20 rule
– Profiling in the ARM environment

• Improving program performance

– Standard compiler optimizations
• Common subexpression elimination
• Dead-code elimination
• Induction variables
– Aggressive compiler optimizations
• Inlining of functions
• Loop unrolling
– Architectural code optimizations

And Now For Something Completely Different

Good luck on Quiz #1!
