Assembly Language Programming With Ubuntu
Ed Jorgensen
Version 1.0.34
March 2016
Cover image:
AMD Opteron, the first CPU to introduce the x86-64 extensions in 2003.
Source: http://en.wikipedia.org/wiki/File:AMD_Opteron_146_Venus,_2005.jpg
Cover background:
By Benjamint444 (Own work)
Source: http://commons.wikimedia.org/wiki/File%3ASwirly_belt444.jpg
Table of Contents
1.0 Introduction...............................................................................................................1
1.1 Prerequisites...........................................................................................................1
1.2 What is Assembly Language..................................................................................2
1.3 Why Learn Assembly Language............................................................................2
1.3.1 Gain a Better Understanding of Architecture Issues......................................3
1.3.2 Understanding the Tool Chain........................................................................3
1.3.3 Improve Algorithm Development Skills........................................................3
1.3.4 Improve Understanding of Functions/Procedures........................................3
1.3.5 Gain an Understanding of I/O Buffering........................................................4
1.3.6 Understand Compiler Scope...........................................................................4
1.3.7 Introduction to Multi-processing Concepts....................................................4
1.3.8 Introduction to Interrupt Processing Concepts...............................................4
1.4 Additional References............................................................................................4
1.4.1 Ubuntu References.........................................................................................5
1.4.2 BASH Command Line References.................................................................5
1.4.3 Architecture References..................................................................................5
1.4.4 Tool Chain References....................................................................................5
1.4.4.1 YASM References...................................................................................6
1.4.4.2 DDD Debugger References....................................................................6
2.0 Architecture Overview..............................................................................................7
2.1 Architecture Overview...........................................................................................7
2.2 Data Storage Sizes..................................................................................................8
2.3 Central Processing Unit..........................................................................................9
2.3.1 CPU Registers..............................................................................................10
2.3.1.1 General Purpose Registers (GPRs).......................................................10
2.3.1.2 Stack Pointer Register (RSP)................................................................12
2.3.1.3 Base Pointer Register (RBP).................................................................12
2.3.1.4 Instruction Pointer Register (RIP)........................................................12
2.3.1.5 Flag Register (rFlags)...........................................................................12
2.3.1.6 XMM Registers.....................................................................................13
2.3.2 Cache Memory.............................................................................................14
2.4 Main Memory.......................................................................................................16
2.5 Memory Layout....................................................................................................17
2.6 Memory Hierarchy...............................................................................................17
2.7 Exercises...............................................................................................................19
2.7.1 Quiz Questions.............................................................................................19
3.0 Data Representation...............................................................................................21
3.1 Integer Representation..........................................................................................21
3.1.1 Two's Complement........................................................................23
3.1.2 Byte Example...............................................................................................23
3.1.3 Word Example..............................................................................................24
3.2 Unsigned and Signed Addition.............................................................................24
3.3 Floating-point Representation..............................................................................24
3.3.1 IEEE 32-bit Representation..........................................................................25
3.3.1.1 IEEE 32-bit Representation Examples.................................................26
3.3.1.1.1 Example -7.75₁₀......................................................................26
3.3.1.1.2 Example -0.125₁₀....................................................................26
3.3.1.1.3 Example 41440000₁₆..............................................................27
3.3.2 IEEE 64-bit Representation..........................................................................27
3.3.3 Not a Number (NaN)....................................................................................27
3.4 Characters and Strings..........................................................................................28
3.4.1 Character Representation..............................................................................28
3.4.1.1 American Standard Code for Information Interchange.........................28
3.4.1.2 Unicode.................................................................................................29
3.4.2 String Representation...................................................................................29
3.5 Exercises...............................................................................................................30
3.5.1 Quiz Questions.............................................................................................30
4.0 Program Format.....................................................................................................33
4.1 Comments.............................................................................................................33
4.2 Numeric Values....................................................................................................33
4.3 Defining Constants...............................................................................................34
4.4 Data Section.........................................................................................................34
4.5 BSS Section..........................................................................................................35
4.6 Text Section..........................................................................................................36
4.7 Example Program.................................................................................................37
4.8 Exercises...............................................................................................................39
4.8.1 Quiz Questions.............................................................................................39
5.0 Tool Chain................................................................................................................41
5.1 Assemble/Link/Load Overview............................................................................41
5.2 Assembler.............................................................................................................43
5.2.1 Assemble Commands...................................................................................43
5.2.2 List File.........................................................................................................43
5.2.3 Two-Pass Assembler.....................................................................................45
5.2.3.1 First Pass...............................................................................................46
5.2.3.2 Second Pass...........................................................................................46
5.2.4 Assembler Directives....................................................................................47
5.3 Linker...................................................................................................................47
5.3.1 Linking Multiple Files..................................................................................48
5.3.2 Linking Process............................................................................................48
5.3.3 Dynamic Linking..........................................................................................50
5.4 Assemble/Link Script...........................................................................................50
5.5 Loader...................................................................................................................52
5.6 Debugger..............................................................................................................52
5.7 Exercises...............................................................................................................53
5.7.1 Quiz Questions.............................................................................................53
6.0 DDD Debugger........................................................................................................55
6.1 Starting DDD........................................................................................................55
6.1.1 DDD Configuration Settings........................................................................57
6.2 Program Execution with DDD.............................................................................57
6.2.1 Setting Breakpoints......................................................................................57
6.2.2 Executing Programs......................................................................................58
6.2.2.1 Run / Continue......................................................................................60
6.2.2.2 Next / Step............................................................................................60
6.2.3 Displaying Register Contents.......................................................................60
6.2.4 DDD/GDB Commands Summary................................................................62
6.2.4.1 DDD/GDB Commands, Examples.......................................................64
6.2.5 Displaying Stack Contents...........................................................................65
6.2.6 Debugger Commands File (interactive).......................................................65
6.2.6.1 Debugger Commands File (non-interactive)........................................66
6.2.6.2 Debugger Commands File (non-interactive)........................................67
6.3 Exercises...............................................................................................................67
6.3.1 Quiz Questions.............................................................................................67
6.3.2 Suggested Projects........................................................................................68
7.0 Instruction Set Overview........................................................................................71
7.1 Notational Conventions........................................................................................71
7.1.1 Operand Notation.........................................................................................72
7.2 Data Movement....................................................................................................73
7.3 Addresses vs Values.............................................................................................75
7.4 Conversion Instructions........................................................................................76
7.4.1 Narrowing Conversions................................................................................76
7.4.2 Widening Conversions..................................................................................76
7.4.2.1 Unsigned Conversions..........................................................................77
7.4.2.2 Signed Conversions..............................................................................78
7.5 Integer Arithmetic Instructions.............................................................................80
7.5.1 Addition........................................................................................................80
7.5.1.1 Addition with Carry..............................................................................83
7.5.2 Subtraction....................................................................................................86
7.5.3 Integer Multiplication...................................................................................89
7.5.3.1 Unsigned Multiplication.......................................................................89
7.5.3.2 Signed Multiplication...........................................................................93
7.5.4 Integer Division............................................................................................96
7.6 Logical Instructions............................................................................................103
7.6.1 Logical Operations.....................................................................................104
7.6.2 Shift Operations..........................................................................................105
7.6.2.1 Logical Shift.......................................................................................105
7.6.2.2 Arithmetic Shift...................................................................................107
7.6.3 Rotate Operations.......................................................................................109
7.7 Control Instructions............................................................................................110
7.7.1 Labels..........................................................................................................110
7.7.2 Unconditional Control Instructions............................................................111
7.7.3 Conditional Control Instructions................................................................111
7.7.3.1 Jump Out Of Range............................................................................114
7.7.4 Iteration.......................................................................................................117
7.8 Example Program, Sum of Squares....................................................................119
7.9 Exercises.............................................................................................................120
7.9.1 Quiz Questions...........................................................................................120
7.9.2 Suggested Projects......................................................................................124
8.0 Addressing Modes.................................................................................................129
8.1 Addresses vs Values...........................................................................................129
8.1.1 Register Mode Addressing.........................................................................130
8.1.2 Immediate Mode Addressing......................................................................130
8.1.3 Memory Mode Addressing.........................................................................131
8.2 Example Program, List Summation...................................................................134
8.3 Example Program, Pyramid Areas and Volumes................................................135
8.4 Exercises.............................................................................................................140
8.4.1 Quiz Questions...........................................................................................140
8.4.2 Suggested Projects......................................................................................144
9.0 Process Stack.........................................................................................................147
9.1 Stack Example....................................................................................................147
9.2 Stack Instructions...............................................................................................148
9.3 Stack Implementation.........................................................................................149
9.3.1 Stack Layout...............................................................................................150
9.3.2 Stack Operations.........................................................................................151
9.4 Stack Example....................................................................................................153
9.5 Exercises.............................................................................................................154
9.5.1 Quiz Questions...........................................................................................155
9.5.2 Suggested Projects......................................................................................156
10.0 Program Development........................................................................................157
10.1 Understand the Problem...................................................................................157
10.2 Create the Algorithm........................................................................................158
10.3 Implement the Program....................................................................................160
10.4 Test/Debug the Program...................................................................................162
10.5 Error Terminology............................................................................................163
10.5.1 Assembler Error........................................................................................163
10.5.2 Run-time Error..........................................................................................164
10.5.3 Logic Error...............................................................................................164
10.6 Exercises...........................................................................................................164
10.6.1 Quiz Questions.........................................................................................164
10.6.2 Suggested Projects....................................................................................165
11.0 Macros..................................................................................................................167
11.1 Single-Line Macros..........................................................................................167
11.2 Multi-Line Macros...........................................................................................168
11.2.1 Macro Definition......................................................................................168
11.2.2 Using a Macro.........................................................................168
11.3 Macro Example................................................................................................169
11.4 Debugging Macros...........................................................................................171
11.5 Exercises...........................................................................................................172
11.5.1 Quiz Questions..........................................................................................172
11.5.2 Suggested Projects....................................................................................172
12.0 Functions..............................................................................................................173
12.1 Stack Dynamic Local Variables.......................................................................173
12.2 Function Declaration........................................................................................174
12.3 Standard Calling Convention...........................................................................174
12.4 Linkage.............................................................................................................175
12.5 Argument Transmission....................................................................................176
12.6 Calling Convention..........................................................................................176
12.6.1 Parameter Passing...................................................................................177
12.6.2 Register Usage..........................................................................................178
12.6.3 Call Frame................................................................................................179
12.6.3.1 Red Zone...........................................................................................181
12.7 Example, Statistical Function 1 (leaf)..............................................................181
12.7.1 Caller........................................................................................................182
12.7.2 Callee........................................................................................................182
12.8 Example, Statistical Function 2 (non-leaf)........................................................184
12.8.1 Caller........................................................................................................184
12.8.2 Callee........................................................................................................185
12.9 Stack-Based Local Variables............................................................................188
12.10 Summary........................................................................................................191
12.11 Exercises.........................................................................................................193
12.11.1 Quiz Questions........................................................................................193
12.11.2 Suggested Projects..................................................................................194
13.0 System Services...................................................................................................197
13.1 Calling System Services...................................................................................197
13.2 Newline Character............................................................................................198
13.3 Console Output.................................................................................................199
13.3.1 Example, Console Output.........................................................................200
13.4 Console Input...................................................................................................203
13.4.1 Example, Console Input...........................................................................204
13.5 File Open Operations.......................................................................................208
13.5.1 File Open..................................................................................................209
13.5.2 File Open/Create.......................................................................................210
13.6 File Read...........................................................................................................211
13.7 File Write..........................................................................................................212
13.8 File Operations Examples................................................................................212
13.8.1 Example, File Write..................................................................................212
13.8.2 Example, File Read..................................................................................218
13.9 Exercises...........................................................................................................224
13.9.1 Quiz Questions.........................................................................................224
13.9.2 Suggested Projects....................................................................................225
14.0 Multiple Source Files..........................................................................................227
14.1 Extern Statement..............................................................................................227
14.2 Example, Sum and Average..............................................................................228
14.2.1 Assembly Main.........................................................................................228
14.2.2 Function Source........................................................................................230
14.2.3 Assemble and Link...................................................................................232
14.3 Interfacing with a High-Level Language.........................................................232
14.3.1 Example, C++ Main / Assembly Function...............................................232
14.3.2 Compile, Assemble, and Link..................................................................234
14.4 Exercises...........................................................................................................235
14.4.1 Quiz Questions.........................................................................................235
14.4.2 Suggested Projects....................................................................................235
15.0 Stack Buffer Overflow........................................................................................237
15.1 Understanding a Stack Buffer Overflow..........................................................238
15.2 Code to Inject...................................................................................................239
15.3 Code Injection..................................................................................................242
15.4 Code Injection Protections...............................................................................243
15.4.1 Stack Smashing Protector (or Canaries)..........................................244
15.4.2 Data Execution Prevention.......................................................................244
15.4.3 Address Space Layout Randomization............................................244
15.5 Exercises...........................................................................................................244
15.5.1 Quiz Questions.........................................................................................244
15.5.2 Suggested Projects....................................................................................245
16.0 Command Line Arguments................................................................................247
16.1 Parsing Command Line Arguments.................................................................247
16.2 High-Level Language Example.......................................................................248
16.3 Argument Count and Argument Vector Table..................................................249
16.4 Assembly Language Example..........................................................................250
16.5 Exercises...........................................................................................................254
16.5.1 Quiz Questions.........................................................................................254
16.5.2 Suggested Projects....................................................................................255
17.0 Input/Output Buffering......................................................................................257
17.1 Why Buffer.......................................................................................................257
17.2 Buffering Algorithm.........................................................................................259
17.3 Exercises...........................................................................................................262
17.3.1 Quiz Questions.........................................................................................262
17.3.2 Suggested Projects....................................................................................263
18.0 Floating Point Instructions.................................................................................265
18.1 Floating Point Values.......................................................................................265
18.2 Floating Point Registers...................................................................................265
18.3 Data Movement................................................................................................266
18.4 Integer / Floating Point Conversion Instructions.............................................268
18.5 Floating Point Arithmetic Instructions.............................................................270
18.5.1 Floating Point Addition............................................................................270
18.5.2 Floating Point Subtraction........................................................................272
18.5.3 Floating Point Multiplication...................................................................273
18.5.4 Floating Point Division.............................................................................275
18.5.5 Floating Point Square Root......................................................................277
18.6 Floating Point Control Instructions..................................................................279
18.6.1 Floating Point Comparison.......................................................................280
18.7 Floating Point Calling Conventions.................................................................283
18.8 Example Program, Sum and Average...............................................................283
18.9 Example Program, Absolute Value...................................................................285
18.10 Exercises.........................................................................................................286
18.10.1 Quiz Questions.......................................................................................286
18.10.2 Suggested Projects..................................................................................287
19.0 Parallel Processing..............................................................................................289
19.1 Distributed Computing.....................................................................................290
19.2 Multiprocessing................................................................................................290
19.2.1 POSIX Threads.........................................................................................291
19.2.2 Race Conditions........................................................................................292
19.3 Exercises...........................................................................................................295
19.3.1 Quiz Questions.........................................................................................295
19.3.2 Suggested Projects....................................................................................296
20.0 Interrupts.............................................................................................................297
20.1 Multi-user Operating System...........................................................................297
20.1.1 Interrupt Classification.............................................................................298
20.1.2 Interrupt Timing........................................................................................298
20.1.2.1 Asynchronous Interrupts...................................................................298
20.1.2.2 Synchronous Interrupts.....................................................................298
20.1.3 Interrupt Categories..................................................................................299
20.1.3.1 Hardware Interrupt............................................................................299
20.1.3.1.1 Exceptions.................................................................................299
20.1.3.2 Software Interrupts...........................................................................300
20.2 Interrupt Types and Levels...............................................................................300
20.2.1 Interrupt Types..........................................................................................300
20.2.2 Privilege Levels........................................................................................300
20.3 Interrupt Processing..........................................................................................302
20.3.1 Interrupt Service Routine (ISR)................................................................302
20.3.2 Processing Steps.......................................................................................302
20.3.2.1 Suspension........................................................................................302
20.3.2.2 Obtaining ISR Address.....................................................................302
20.3.2.3 Jump to ISR......................................................................................303
20.3.2.4 Execute ISR...................................................................303
20.3.2.5 Resumption.......................................................................................304
20.4 Interrupt Processing Summary......................................................304
20.5 Exercises...........................................................................................................305
20.5.1 Quiz Questions.........................................................................................305
20.5.2 Suggested Projects....................................................................................306
21.0 Appendix A ASCII Table.................................................................................307
22.0 Appendix B Instruction Set Summary...........................................................309
22.1 Notation............................................................................................................309
22.2 Data Movement Instructions............................................................................310
22.3 Data Conversion Instructions...........................................................................310
22.4 Integer Arithmetic Instructions.........................................................................311
22.5 Logical, Shift, and Rotate Instructions.............................................................313
22.6 Control Instructions..........................................................................................315
22.7 Stack Instructions.............................................................................................317
22.8 Function Instructions........................................................................................317
22.9 Floating Point Data Movement Instructions....................................................317
22.10 Floating Point Data Conversion Instructions.................................................318
22.11 Floating Point Arithmetic Instructions...........................................................320
22.12 Floating Point Control Instructions................................................................323
23.0 Appendix C System Services...........................................................................325
23.1 Return Codes....................................................................................................325
23.2 Basic System Services......................................................................................325
23.3 File Modes........................................................................................................327
23.4 Error Codes......................................................................................................328
24.0 Appendix D Quiz Question Answers..............................................................331
24.1 Quiz Question Answers, Chapter 1..................................................................331
24.2 Quiz Question Answers, Chapter 2..................................................................331
24.3 Quiz Question Answers, Chapter 3..................................................................332
24.4 Quiz Question Answers, Chapter 4..................................................................334
24.5 Quiz Question Answers, Chapter 5..................................................................335
24.6 Quiz Question Answers, Chapter 6..................................................................336
24.7 Quiz Question Answers, Chapter 7..................................................................337
24.8 Quiz Question Answers, Chapter 8..................................................................340
24.9 Quiz Question Answers, Chapter 9..................................................................341
24.10 Quiz Question Answers, Chapter 10..............................................................341
24.11 Quiz Question Answers, Chapter 11...............................................................342
24.12 Quiz Question Answers, Chapter 12..............................................................342
24.13 Quiz Question Answers, Chapter 13..............................................................343
24.14 Quiz Question Answers, Chapter 14..............................................................343
24.15 Quiz Question Answers, Chapter 15..............................................................344
24.16 Quiz Question Answers, Chapter 16..............................................................344
24.17 Quiz Question Answers, Chapter 17..............................................................345
24.18 Quiz Question Answers, Chapter 18..............................................................345
24.19 Quiz Question Answers, Chapter 19..............................................................346
24.20 Quiz Question Answers, Chapter 20..............................................................346
25.0 Alphabetical Index..............................................................................................349
Illustration Index
Illustration 1: Computer Architecture................................................................................7
Illustration 2: CPU Block Diagram..................................................................................15
Illustration 3: Little Endian Data Layout.........................................................................16
Illustration 4: General Memory Layout...........................................................................17
Illustration 5: Memory Hierarchy....................................................................................18
Illustration 6: Overview: Assemble, Link, Load..............................................................42
Illustration 7: Little Endian, Multiple Variable Data Layout...........................................44
Illustration 8: Linking Multiple Files...............................................................................49
Illustration 9: Initial Debugger Screen.............................................................................56
Illustration 10: Debugger Screen with Breakpoint Set....................................................58
Illustration 11: Debugger Screen with Green Arrow.......................................................59
Illustration 12: DDD Command Bar................................................................................60
Illustration 13: Register Window.....................................................................................61
Illustration 14: MOV Instruction Overview.....................................................................73
Illustration 15: Integer Multiplication Overview.............................................................90
Illustration 16: Integer Division Overview......................................................................98
Illustration 17: Logical Operations................................................................................104
Illustration 18: Logical Shift Overview.........................................................................106
Illustration 19: Logical Shift Operations.......................................................................106
Illustration 20: Arithmetic Right Shift...........................................................................108
Illustration 21: Process Memory Layout........................................................................150
Illustration 22: Process Memory Layout Example........................................................151
Illustration 23: Stack Call Frame Example....................................................................238
Illustration 24: Stack Call Frame Corruption.................................................................243
Illustration 25: Argument Vector Layout.......................................................................250
Illustration 26: Privilege Levels.....................................................................................301
Illustration 27: Interrupt Processing Overview..............................................................304
Chapter
1
1.0 Introduction
The purpose of this text is to provide a reference for university-level assembly language
and systems programming courses. Specifically, this text addresses the x86-64
instruction set for the popular x86-64 class of processors using the Ubuntu 64-bit
Operating System (OS). While the provided code and various examples should work
under any Linux-based 64-bit OS, they have only been tested under Ubuntu 14.04 LTS
(64-bit).
The x86-64 is a Complex Instruction Set Computing (CISC) CPU design. This
refers to the internal processor design philosophy. CISC processors typically include a
wide variety of instructions (sometimes overlapping), varying instruction sizes, and a
wide range of addressing modes. The term was retroactively coined in contrast to
Reduced Instruction Set Computer (RISC).
1.1 Prerequisites
It must be noted that the text is not geared toward learning how to program. It is
assumed that the reader has already become proficient in a high-level programming
language. Specifically, the text is generally geared toward a compiled, C-based
high-level language such as C, C++, or Java. Many of the explanations and examples assume
the reader is already familiar with programming concepts such as declarations,
arithmetic operations, control structures, iteration, functions and function calls,
indirection (i.e., pointers), and variable scoping issues.
Additionally, the reader should be comfortable using a Linux-based operating system
including using the command line. If the reader is new to Linux, the Additional
References section has links to some useful documentation.
Chapter
2
[Illustration 1: Computer Architecture. The CPU, Primary Storage (Random Access
Memory, RAM), Screen / Keyboard / Mouse, and Secondary Storage (i.e., SSD / Disk
Drive / Other Storage Media) are connected by the BUS (Interconnection).]
Storage            Size (bits)   Size (bytes)
Byte               8-bits        1 byte
Word               16-bits       2 bytes
Double-word        32-bits       4 bytes
Quadword           64-bits       8 bytes
Double quadword    128-bits      16 bytes
C/C++ Declaration   Storage       Size (bits)   Size (bytes)
char                Byte          8-bits        1 byte
short               Word          16-bits       2 bytes
int                 Double-word   32-bits       4 bytes
unsigned int        Double-word   32-bits       4 bytes
long [5]            Quadword      64-bits       8 bytes
long long           Quadword      64-bits       8 bytes
char *              Quadword      64-bits       8 bytes
int *               Quadword      64-bits       8 bytes
float               Double-word   32-bits       4 bytes
double              Quadword      64-bits       8 bytes
The asterisk indicates an address variable. For example, int * means the address of an
integer. Other high level languages typically have similar mappings.
5 Note, the 'long' type declaration is compiler dependent. Type shown is for gcc and g++ compilers.
6 For more information, refer to: http://en.wikipedia.org/wiki/Central_processing_unit
7 For more information, refer to: http://en.wikipedia.org/wiki/Die_(integrated_circuit)
8 For more information, refer to: http://en.wikipedia.org/wiki/Arithmetic_logic_unit
9 For more information, refer to: http://en.wikipedia.org/wiki/Processor_register
10 For more information, refer to: http://en.wikipedia.org/wiki/Cache_(computing)
There are sixteen, 64-bit General Purpose Registers (GPRs). The GPRs are described in
the following table. A GPR can be accessed using all 64-bits, or only a portion
(subset) of the register can be accessed.
64-bit register   Lowest 32-bits   Lowest 16-bits   Lowest 8-bits
rax               eax              ax               al
rbx               ebx              bx               bl
rcx               ecx              cx               cl
rdx               edx              dx               dl
rsi               esi              si               sil
rdi               edi              di               dil
rbp               ebp              bp               bpl
rsp               esp              sp               spl
r8                r8d              r8w              r8b
r9                r9d              r9w              r9b
r10               r10d             r10w             r10b
r11               r11d             r11w             r11b
r12               r12d             r12w             r12b
r13               r13d             r13w             r13b
r14               r14d             r14w             r14b
r15               r15d             r15w             r15b
Additionally, some of the GPR registers are used for dedicated purposes as described in
the later sections.
When using data element sizes less than 64-bits (i.e., 32-bit, 16-bit, or 8-bit), the lower
portion of the register can be accessed by using a different register name, as shown in
the table. For example, when accessing the lower portions of the 64-bit rax register, the
layout is as follows:
    |<--------------------------- rax --------------------------->|
    |                             |<------------ eax ------------>|
    |                             |               |<----- ax ---->|
    |                             |               |  ah   |  al   |
As shown in the diagram, the first four registers, rax, rbx, rcx, and rdx also allow the
bits 8-15 to be accessed with ah, bh, ch, and dh register names. This is provided for
legacy support and will not be used in this text.
The ability to access portions of the register means that, if the quadword rax register is
set to 50,000,000,000₁₀ (fifty billion), the rax register would contain the following value
in hex.
rax = 0000000BA43B7400
If a subsequent operation sets the double-word eax register to 1,000,000₁₀ (one million,
which is 000F4240₁₆), the rax register would contain the following value in hex.
rax = 00000000000F4240
Note that when the lower 32-bit eax portion of the 64-bit rax register is set, the upper
32-bits are cleared (set to 0).
If a subsequent operation sets the word sized ax register to 15,000₁₀ (fifteen thousand,
which is 3A98₁₆), the rax register would contain the following value in hex.
rax = 00000000000F3A98
In this case, when the lower 16-bit ax portion of the 64-bit rax register is set, the upper
48-bits are unaffected. Similarly, when the lower 8-bit al portion of the 64-bit rax
register is set, the upper 56-bits are unaffected.
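The sub-register write rules can be modeled in a high-level language. The following is a minimal C sketch, not the CPU's actual mechanism: C cannot name registers directly, so the hypothetical helpers write32, write16, and write8 (names chosen only for this illustration) use masks to mimic what happens to a 64-bit register such as rax. On x86-64, a write to a 32-bit register name zero-extends into the full register, while 16-bit and 8-bit writes leave the remaining upper bits unchanged.

```c
#include <stdint.h>

/* Hypothetical helpers that model sub-register writes on x86-64. */

/* Writing a 32-bit name (e.g., eax) clears the upper 32 bits. */
uint64_t write32(uint64_t reg, uint32_t value)
{
    (void)reg;                          /* previous contents are discarded */
    return (uint64_t)value;
}

/* Writing a 16-bit name (e.g., ax) preserves the upper 48 bits. */
uint64_t write16(uint64_t reg, uint16_t value)
{
    return (reg & ~UINT64_C(0xFFFF)) | value;
}

/* Writing the low 8-bit name (e.g., al) preserves the upper 56 bits. */
uint64_t write8(uint64_t reg, uint8_t value)
{
    return (reg & ~UINT64_C(0xFF)) | value;
}
```

Starting from 50,000,000,000 and applying write32 with 1,000,000, then write16 with 15,000, walks through the same sequence of values as the rax example in this section.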
2.3.1.2 Stack Pointer Register (RSP)
One of the GPRs, rsp, is used to point to the current top of the stack. The rsp
register should not be used for data or other uses. Additional information regarding the
stack and stack operations is provided in Chapter 9, Process Stack.
2.3.1.3 Base Pointer Register (RBP)
One of the GPRs, rbp, is used as a base pointer during function calls. The
rbp register should not be used for data or other uses. Additional information regarding
functions and function calls is provided in Chapter 12, Functions.
2.3.1.4 Instruction Pointer Register (RIP)
In addition to the GPRs, there is a special register, rip, that is used by the CPU to point to
the next instruction to be executed. Specifically, since rip points to the next
instruction, the instruction being pointed to by rip, and shown in the
debugger, has not yet been executed. This is an important distinction which can
be confusing when reviewing code in a debugger.
2.3.1.5 Flag Register (rFlags)
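The flag register, rFlags, is used for status and CPU control information. The rFlags
register is updated by the CPU after each instruction and is not directly accessible by
programs. This register stores status information about the instruction that was just
executed. Of the 64-bits in the rFlags register, many are reserved for future use.
The following table shows some of the status bits in the flag register.
Name        Symbol   Bit   Use
Carry       CF       0     Used to indicate if the previous operation resulted in a carry.
Parity      PF       2     Used to indicate if the low byte of the previous result has an
                           even number of 1 bits.
Adjust      AF       4     Used to support Binary Coded Decimal operations.
Zero        ZF       6     Used to indicate if the previous operation resulted in a zero
                           result.
Sign        SF       7     Used to indicate if the most significant bit of the previous
                           result is set (a negative signed result).
Direction   DF       10    Used to specify the direction (increment or decrement) for
                           some string operations.
Overflow    OF       11    Used to indicate if the previous operation resulted in a
                           signed overflow.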
There are a number of additional bits not specified in this text. More information can be
obtained from the additional references noted in Chapter 1, Introduction.
2.3.1.6 XMM Registers
There is a set of dedicated registers used to support 64-bit and 32-bit floating-point
operations and Single Instruction Multiple Data (SIMD) instructions. The SIMD
instructions allow a single instruction to be applied simultaneously to multiple data
items. Used effectively, this can result in a significant performance increase. Typical
applications include some graphics processing and digital signal processing.
The XMM registers are as follows:
128-bit Registers
xmm0
xmm1
xmm2
xmm3
xmm4
xmm5
xmm6
xmm7
xmm8
xmm9
xmm10
xmm11
xmm12
xmm13
xmm14
xmm15
Note, some of the more recent x86-64 processors support 256-bit registers (the YMM
registers, which extend the XMM registers). This will not be an issue for the programs
in this text.
Additionally, the XMM registers are used to support the Streaming SIMD Extensions
(SSE). The SSE instructions are out of the scope of this text. More information can be
obtained from the Intel references (as noted in Chapter 1, Introduction).
[Illustration 2: CPU Block Diagram. A CPU chip containing Core 0 and Core 1, each
with its own L1 cache, a shared L2 cache, and a connection to the BUS.]
Current chip designs typically include an L1 cache per core and a shared L2 cache.
Many of the newer CPU chips will have an additional L3 cache.
As can be noted from the diagram, all memory accesses travel through each level of
cache. As such, there is a potential for multiple, duplicate copies of the value (CPU
register, L1 cache, L2 cache, and main memory). This complication is managed by the
CPU and is not something the programmer can change. Understanding the cache and
associated performance gain is useful in understanding how a computer works.
variable name    value    address (in hex)
                          0100100C
                 00       0100100B    (MSB)
                 4C       0100100A
                 4B       01001009
var1 ->          40       01001008    (LSB)
                          01001007
Illustration 3: Little Endian Data Layout
Based on the little-endian architecture, the LSB is stored in the lowest memory address
and the MSB is stored in the highest memory location.
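The byte ordering can be reproduced with a short C sketch. The bytes shown in the illustration (00 4C 4B 40 from highest to lowest address) correspond to the 32-bit value 0x004C4B40, which is 5,000,000 in decimal. The helper name store_le32 is hypothetical and used only for this illustration; it places the bytes explicitly with shifts, so the result does not depend on the machine running it.

```c
#include <stdint.h>

/* Store a 32-bit value into four consecutive bytes in little-endian
   order: bytes[0] is the lowest address and receives the LSB. */
void store_le32(uint32_t value, uint8_t bytes[4])
{
    for (int i = 0; i < 4; i++) {
        bytes[i] = (uint8_t)(value >> (8 * i));
    }
}
```

Calling store_le32 with 5,000,000 yields 0x40 at the lowest address and 0x00 at the highest, matching the layout in the illustration.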
(high memory)    stack
                 .
                 .
                 .
                 heap
                 BSS (uninitialized data)
                 data
                 text (code)
(low memory)     reserved
Illustration 4: General Memory Layout
    CPU Registers
    Cache
    Primary Storage: Main Memory (RAM)
    Secondary Storage (disk drives, SSD's, etc.)
    Tertiary Storage (remote storage, optical, backups, etc.)
    (moving down the levels: larger, slower, and less expensive)
Illustration 5: Memory Hierarchy
The top of the triangle represents the fastest, smallest, and most expensive
memory. As we move down levels, the memory becomes slower, larger, and
less expensive. The goal is to use an effective balance between the small, fast,
expensive memory and the large, slower, and cheaper memory.
Memory Unit                    Example Size                        Typical Speed
Registers                      16, 64-bit registers                ~1 nanoseconds [13]
Cache Memory                   4 - 8+ Megabytes [14]               ~5-60 nanoseconds
(L1 and L2)
Primary Storage                2 - 32+ Gigabytes [15]              ~100-150 nanoseconds
(i.e., main memory)
Secondary Storage              500 Gigabytes - 4+ Terabytes [16]   ~3-15 milliseconds [17]
(i.e., disk, SSD's, etc.)
Based on this table, a primary storage access at 100 nanoseconds (100 x 10⁻⁹ seconds) is
30,000 times faster than a secondary storage access at 3 milliseconds (3 x 10⁻³ seconds).
The typical speeds improve over time (and these are already out of date). The key point
is that the relative difference between each memory unit is significant. This difference
between the memory units applies even as newer, faster SSDs are being utilized.
2.7 Exercises
Below are some questions based on this chapter.
13 For more information, refer to: http://en.wikipedia.org/wiki/Nanosecond
14 For more information, refer to: http://en.wikipedia.org/wiki/Megabyte
15 For more information, refer to: http://en.wikipedia.org/wiki/Gigabyte
16 For more information, refer to: http://en.wikipedia.org/wiki/Terabyte
17 For more information, refer to: http://en.wikipedia.org/wiki/Millisecond
   1. al
   2. rcx
   3. bx
   4. edx
   5. r11
   6. r8b
   7. sil
   8. r14w

   1. al
   2. ax
   3. eax
   4. rax
Chapter
3
Size                     Values   Unsigned Range        Signed Range
Bytes (8 bits)           2⁸       0 to 255              -128 to +127
Words (16 bits)          2¹⁶      0 to 65,535           -32,768 to +32,767
Double-words (32 bits)   2³²      0 to 4,294,967,295    -2,147,483,648 to +2,147,483,647
Quadword                 2⁶⁴      0 to 2⁶⁴ - 1          -(2⁶³) to 2⁶³ - 1
Double quadword          2¹²⁸     0 to 2¹²⁸ - 1         -(2¹²⁷) to 2¹²⁷ - 1
In order to determine if a value can be represented, you will need to know the size of
the storage element (byte, word, double-word, quadword, etc.) being used and if the
values are signed or unsigned.
For representing unsigned values within the range of a given storage size,
standard binary is used.
For representing signed values within the range, two's complement is used.
Specifically, the two's complement encoding process applies to the values in the
negative range. For values within the positive range, standard binary is used.
For example, the unsigned byte range can be represented using a number line as follows:
    0 <------------------------------------------------------> 255
For example, the signed byte range can also be represented using a number line as
follows:
    -128 <---------------------------------------------------> +127
The same concept applies to words and double-words, which have larger ranges.
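When the unsigned and signed values are outside the overlapping range, the same bit
pattern represents two different values (for example, the byte 0xF1 is 241 when
unsigned and -15 when signed).
This overlap can cause confusion unless the data types are clearly and correctly defined.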
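In the following byte-size examples, Step 1 inverts each bit of the positive value and
Step 2 adds 1.
9 (8+1) =        00001001        12 (8+4) =        00001100
Step 1           11110110        Step 1            11110011
Step 2           11110111        Step 2            11110100
-9 (in hex) =    F7              -12 (in hex) =    F4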
18 (16+2) =    0000000000010010        40 (32+8) =    0000000000101000
Step 1         1111111111101101        Step 1         1111111111010111
Step 2         1111111111101110        Step 2         1111111111011000
-18 (hex) =    0xFFEE                  -40 (hex) =    0xFFD8
Note, all bits for the given size, words in these examples, must be specified.
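The two steps (invert every bit, then add 1) can be checked with a short C sketch. The function names negate8 and negate16 are hypothetical and used only for this illustration; the fixed-width types from stdint.h ensure that all bits for the given size are kept and nothing more.

```c
#include <stdint.h>

/* Two's complement of a byte: Step 1 inverts the bits, Step 2 adds 1. */
uint8_t negate8(uint8_t value)
{
    uint8_t inverted = (uint8_t)~value;     /* Step 1 */
    return (uint8_t)(inverted + 1);         /* Step 2 */
}

/* Two's complement of a word (16 bits), using the same two steps. */
uint16_t negate16(uint16_t value)
{
    uint16_t inverted = (uint16_t)~value;   /* Step 1 */
    return (uint16_t)(inverted + 1);        /* Step 2 */
}
```

For instance, negate8(9) gives 0xF7 and negate16(18) gives 0xFFEE, the byte-size and word-size encodings of -9 and -18.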
Unsigned                    Signed
   241    11110001           -15    11110001
 +   7    00000111         +   7    00000111
   248    11111000            -8    11111000
   248 =  F8                  -8 =  F8
The final result of 0xF8 may be interpreted as 248 for unsigned representation and -8 for
a signed representation. Additionally, 0xF8 is the ° (degree symbol) in some extended
ASCII tables. As such, it is very important to have a clear definition of the sizes (byte,
word, double-word, etc.) and types (signed, unsigned) of data for the operations being
performed.
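A small C sketch can make the point concrete: the addition produces one bit pattern, and only the chosen interpretation differs. The helper names add_bytes and as_signed are hypothetical and used only for this illustration.

```c
#include <stdint.h>

/* Add two bytes; the result is truncated to 8 bits, as in a byte register. */
uint8_t add_bytes(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Interpret an 8-bit pattern as a signed (two's complement) value. */
int as_signed(uint8_t bits)
{
    return (bits & 0x80) ? (int)bits - 256 : (int)bits;
}
```

Adding 0xF1 and 0x07 yields 0xF8 in both cases; read as unsigned the operands are 241 and 7 with result 248, and read through as_signed they are -15 and 7 with result -8.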
    bit 31 | bits 30 - 23      | bits 22 - 0
    s      | biased exponent   | fraction
Where s is the sign (0 => positive and 1 => negative). More formally, this can be
written as:
$N = (-1)^{S} \times 1.F \times 2^{E-127}$
When representing floating-point values, the first step is to convert the floating-point
value into binary. The following table provides a brief reminder of how binary handles
fractional components:
    ...   2³   2²   2¹   2⁰   .   2⁻¹   2⁻²   2⁻³   ...
For example, 100.101₂ would be 4.625₁₀. For repeating decimals, calculating the binary
value can be time consuming. However, there is a limit since computers have finite
storage sizes (32-bits in this example).
The next step is to show the value in normalized scientific notation in binary. This
means that the number should have a single, non-zero leading digit to the left of the
decimal point. For example, 8.125₁₀ is 1000.001₂ (or 1000.001₂ x 2⁰) and in binary
normalized scientific notation that would be written as 1.000001 x 2³ (since the decimal
point was moved three places to the left). Of course, if the number was 0.125₁₀ the
binary would be 0.001₂ (or 0.001₂ x 2⁰) and the normalized scientific notation would be
1.0 x 2⁻³ (since the decimal point was moved three places to the right). The numbers
after the leading 1, not including the leading 1, are stored left-justified in the fraction
portion of the double-word.
The next step is to calculate the biased exponent, which is the exponent from the
normalized scientific notation plus the bias. The bias for the IEEE 754 32-bit
floating-point standard is 127₁₀. The result should be converted to a byte (8 bits) and
stored in the biased exponent portion of the double-word.
For example, to find the IEEE 754 32-bit floating-point representation for -7.75₁₀:
Example 1:    -7.75
   determine sign                      -7.75 => 1 (since negative)
   convert to binary                   -7.75 = -0111.11₂
   normalized scientific notation      = 1.1111 x 2²
   compute biased exponent             2₁₀ + 127₁₀ = 129₁₀
   and convert to binary               = 10000001₂
   write components in binary:
      sign  exponent  mantissa
      1     10000001  11110000000000000000000
   convert to hex (split into groups of 4)
      11000000111110000000000000000000
      1100 0000 1111 1000 0000 0000 0000 0000
      C    0    F    8    0    0    0    0
   final result:                       C0F8 0000₁₆
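For example, to find the IEEE 754 32-bit floating-point representation for -0.125₁₀:
Example 2:    -0.125
   determine sign                      -0.125 => 1 (since negative)
   convert to binary                   -0.125 = -0.001₂
   normalized scientific notation      = 1.0 x 2⁻³
   compute biased exponent             -3₁₀ + 127₁₀ = 124₁₀
   and convert to binary               = 01111100₂
   write components in binary:
      sign  exponent  mantissa
      1     01111100  00000000000000000000000
   convert to hex (split into groups of 4)
      1011 1110 0000 0000 0000 0000 0000 0000
      B    E    0    0    0    0    0    0
   final result:                       BE00 0000₁₆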
For example, given the IEEE 754 32-bit floating-point representation 41440000₁₆ find
the decimal value:
Example 3:    41440000₁₆
   convert to binary                   0100 0001 0100 0100 0000 0000 0000 0000₂
   split into components               0 10000010 10001000000000000000000₂
   determine exponent                  10000010₂ = 130₁₀
   and remove bias                     130₁₀ - 127₁₀ = 3₁₀
   determine sign                      0 => positive
   write result                        +1.10001 x 2³ = +1100.01 = +12.25
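The worked examples can be cross-checked with a short C sketch. It assumes the C float type is the IEEE 754 32-bit format (true for gcc on x86-64, but an assumption in general), and the helper names below are hypothetical, chosen only for this illustration. The bits are copied with memcpy, which avoids any numeric conversion.

```c
#include <stdint.h>
#include <string.h>

/* Copy the raw 32 bits of a float into an integer (assumes IEEE 754 float). */
uint32_t float_to_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Copy a raw 32-bit pattern into a float (assumes IEEE 754 float). */
float bits_to_float(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Sign bit (bit 31): 0 => positive, 1 => negative. */
uint32_t sign_of(uint32_t bits)
{
    return bits >> 31;
}

/* Biased exponent (bits 30-23) with the bias of 127 removed. */
int unbiased_exponent(uint32_t bits)
{
    return (int)((bits >> 23) & 0xFF) - 127;
}

/* Fraction field (bits 22-0), not including the implied leading 1. */
uint32_t fraction_of(uint32_t bits)
{
    return bits & 0x7FFFFF;
}
```

Under the stated assumption, float_to_bits(-7.75f) returns 0xC0F80000 and bits_to_float(0x41440000) returns 12.25, in agreement with Example 1 and Example 3.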
    bit 63 | bits 62 - 52      | bits 51 - 0
    s      | biased exponent   | fraction
The representation process is the same; however, the format allows for an 11-bit biased
exponent (which supports larger and smaller values). The 11-bit biased exponent uses a
bias of 1023.
Characters are represented using the American Standard Code for Information
Interchange (ASCII [20]). Based on the ASCII table, each character and control character
is assigned a numeric value. When using ASCII, the character displayed is based on the
assigned numeric value. This only works if everyone agrees on common values, which
is the purpose of the ASCII table. For example, the letter "A" is defined as 65₁₀ (0x41).
The 0x41 is stored in computer memory, and when displayed to the console, the letter
"A" is shown. Refer to Appendix A for the complete ASCII table.
Additionally, numeric symbols can be represented in ASCII. For example, "9" is
represented as 57₁₀ (0x39) in computer memory. The "9" can be displayed as output to
the console. If sent to the console, the integer value 9₁₀ (0x09) would be interpreted as
an ASCII value, which in this case would be a tab.
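The difference between a numeric symbol and its integer value can be sketched in C, where character constants are simply small integers. The helper names digit_value and digit_char are hypothetical and used only for this illustration; they rely on the ASCII digits "0" through "9" having consecutive codes.

```c
/* Convert an ASCII digit character ('0'..'9') to its integer value (0..9). */
int digit_value(char symbol)
{
    return symbol - '0';
}

/* Convert an integer value (0..9) to the matching ASCII digit character. */
char digit_char(int value)
{
    return (char)('0' + value);
}
```

For example, digit_value('9') is 9 while the character '9' itself is stored as 57 (0x39), and the integer 9 sent unconverted to the console is the tab control character.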
18 For more information, refer to: http://en.wikipedia.org/wiki/Hello,_World!_program
19 For more information, refer to: http://en.wikipedia.org/wiki/Character_(computing)
20 For more information, refer to: http://en.wikipedia.org/wiki/ASCII
It should be noted that Unicode [21] encodings may use more than one byte for each
character (for example, UTF-16 uses 2 or 4 bytes). The additional space supports a much
wider range of characters, which allows for many non-English languages. Details
regarding Unicode representation are not addressed in this text.
Character                "H"     "e"     "l"     "l"     "o"     NULL
ASCII Value (decimal)    72      101     108     108     111     0
ASCII Value (hex)        0x48    0x65    0x6C    0x6C    0x6F    0x0
A string may consist partially or completely of numeric symbols. For example, the
string "19653" would be represented as follows:
Character                "1"     "9"     "6"     "5"     "3"     NULL
ASCII Value (decimal)    49      57      54      53      51      0
ASCII Value (hex)        0x31    0x39    0x36    0x35    0x33    0x0
Again, it is very important to understand the difference between the string "19653"
(using 6 bytes) and the single integer 19,653₁₀ (which can be stored in a single word or 2
bytes).
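The storage difference can be sketched in C. The helper names string_storage_bytes and digits_to_u16 are hypothetical, written only for this illustration; digits_to_u16 assumes the string holds only decimal digits and a value that fits in 16 bits, and performs no error checking.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes needed to store a string, including the terminating NULL. */
size_t string_storage_bytes(const char *text)
{
    return strlen(text) + 1;
}

/* Convert a string of ASCII digits into a 16-bit unsigned integer.
   Assumes only the characters '0'..'9' and a value of at most 65,535. */
uint16_t digits_to_u16(const char *text)
{
    uint16_t value = 0;
    while (*text >= '0' && *text <= '9') {
        value = (uint16_t)(value * 10 + (*text - '0'));
        text++;
    }
    return value;
}
```

The string "19653" needs 6 bytes (five characters plus NULL), while the integer produced by digits_to_u16("19653") fits in a 2-byte word.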
21 For more information, refer to: http://en.wikipedia.org/wiki/Unicode
22 For more information, refer to: http://en.wikipedia.org/wiki/String_(computer_science)
3.5 Exercises
Below are some questions based on this chapter.
   1. -3₁₀
   2. +11₁₀
   3. -9₁₀
   4. -21₁₀
4) Provide the hex, word size, two's complement values of the following decimal
values. Note, four hex digits expected.
   1. -17₁₀
   2. +17₁₀
   3. -31₁₀
   4. -138₁₀
5) Provide the hex, double-word size, two's complement values of the following
decimal values. Note, eight hex digits expected.
   1. -11₁₀
   2. -27₁₀
   3. +7₁₀
   4. -261₁₀
6) Provide the decimal values of the following hex, double-word sized, two's
complement values.
   1. FFFFFFFB₁₆
   2. FFFFFFEA₁₆
   3. FFFFFFF3₁₆
   4. FFFFFFF8₁₆

   1. 0.1
   2. 0.2
   3. 0.3
   4. 0.4
   5. 0.5
8) Provide the decimal representation of the following IEEE 32-bit floating point
values.
   1. 0xC1440000
   2. 0x41440000
   3. 0xC0D00000
   4. 0x40F00000
2. a
3.
4.
5.
tab
11) What are the ASCII values, in hex, for each of the following strings:
   1. "World"
   2. "123"
   3. "Yes!?"
Chapter
4
The following sections summarize the basic formatting requirements. Only the basic
formatting and assembler syntax is presented. For additional information, refer to the
yasm reference manual (as noted in Chapter 1, Introduction).
4.1 Comments
The semicolon (;) is used to note program comments. Comments (using the ;) may be
placed anywhere, including after an instruction. Any characters after the ; are ignored by
the assembler. This can be used to explain steps taken in the code or to comment out
sections of code.
equ
10000
<dataType><initialValue>
Refer to the following sections for a series of examples using various data types.
The supported data types are as follows:
Declaration
db        8-bit variable(s)
dw        16-bit variable(s)
dd        32-bit variable(s)
dq        64-bit variable(s)
ddq
dt
These are the primary assembler directives for initialized data declarations. Other
directives are referenced in different sections.
Initialized arrays are defined with comma separated values.
Some simple examples include:
bVar        db      10                  ; byte variable
cVar        db      "H"                 ; single character
str         db      "Hello World"       ; string
wVar        dw      5000                ; word variable
dVar        dd      50000               ; 32-bit variable
arr         dd      100, 200, 300       ; 3 element array
flt1        dd      3.14159             ; 32-bit float
qVar        dq      1000000000          ; 64-bit variable
The value specified must be able to fit in the specified data type. For example, if the
value of a byte-sized variable is defined as 500, it would generate an assembler error.
Refer to the following sections for a series of examples using various data types.
Declaration
resb      8-bit variable(s)
resw      16-bit variable(s)
resd      32-bit variable(s)
resq      64-bit variable(s)
resdq     128-bit variable(s)
These are the primary assembler directives for uninitialized data declarations. Other
directives are referenced in different sections.
Some simple examples include:
bArr        resb    10                  ; 10 element byte array
wArr        resw    50                  ; 50 element word array
dArr        resd    100                 ; 100 element double array
qArr        resq    200                 ; 200 element quad array
No special label or directives are required to terminate the program. However, a system
service should be used to inform the operating system that the program should be
terminated.
Refer to the example program in the following section.
section     .data

; -----
;  Define constants

EXIT_SUCCESS    equ     0               ; successful operation
SYS_exit        equ     60              ; call code for terminate

; -----
;  Byte (8-bit) variable declarations

bVar1           db      17
bVar2           db      9
bResult         db      0

; -----
;  Word (16-bit) variable declarations

wVar1           dw      17000
wVar2           dw      9000
wResult         dw      0

; -----
;  Double-word (32-bit) variable declarations

dVar1           dd      17000000
dVar2           dd      9000000
dResult         dd      0

; -----
;  Quadword (64-bit) variable declarations

qVar1           dq      170000000
qVar2           dq      90000000
qResult         dq      0

;*************************************************************
;CodeSection
section .text
global_start
_start:
;Performsaseriesofverybasicadditionoperations
;todemonstratebasicprogramformat.
;
;Byteexample
;
bResult=bVar1+bVar2
mov
add
mov
al,byte[bVar1]
al,byte[bVar2]
byte[bResult],al
;
;Wordexample
;
wResult=wVar1+wVar2
mov
add
mov
ax,word[wVar1]
ax,word[wVar2]
word[wResult],ax
;
;Doublewordexample
;
dResult=dVar1+dVar2
mov
add
mov
Page 38
eax,dword[dVar1]
eax,dword[dVar2]
dword[dResult],eax
rax,qword[qVar1]
rax,qword[qVar2]
qword[qResult],rax
;************************************************************
;
Done,terminateprogram.
last:
mov
rax,SYS_exit
mov
rdi,EXIT_SUCCESS
syscall
;Callcodeforexit
;Exitprogramwithsuccess
This example program will be referenced and further explained in the following
chapters.
4.8 Exercises
Below are some questions based on this chapter.
Chapter
5
Assembler
Linker
Loader
Debugger
While there are many options for the tool chain, this text uses a fairly standard set of
open-source tools that work well together and fully support the x86 64-bit environment.
Each of these programming tools is explained in the following sections.
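[Illustration 6 (flow diagram): ASSEMBLE reads the assembly language source file and
produces a list file and an object file. LINK combines the object file with other object
files (if any) and library routines (if any) to produce the executable file. LOAD places
the executable file, along with shared object files (if any), into RAM.]
Illustration 6: Overview: Assemble, Link, Load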
The assemble, link, and load steps are described in more detail in the following sections.
5.2 Assembler
The assembler24 is a program that will read an assembly language input file and convert
the code into a machine language binary file. The input file is an assembly language
source file containing assembly language instructions in human readable form. The
machine language output is referred to as an object file. As part of this process, the
comments are removed, and the variable names and labels are converted into
appropriate addresses (as required by the CPU during execution).
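The assembler used in this text is the yasm assembler. Links to the yasm web site and
documentation can be found in Chapter 1, Introduction. The assemble command for the
example program from the previous chapter is as follows:
yasm -g dwarf2 -f elf64 example.asm -l example.lst
Note, the -l is a dash lower-case letter L (which is easily confused with the number 1).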
The -g dwarf2 option is used to inform the assembler to include debugging information
in the final object file. This increases the size of the object file, but is necessary to allow
effective debugging. The -f elf64 informs the assembler to create the object file in the
ELF6425 format which is appropriate for 64-bit, Linux based systems. The
example.asm is the name of the assembly language source file for input. The -l
example.lst (dash lower-case letter L) informs the assembler to create a list file named
example.lst.
If an error occurs during the assembly process, it must be resolved before continuing to
the link step.
On the first line, the 36 is the line number. The next number, 0x00000009, is the
relative address in the data area of where that variable will be stored. Since dVar1 is a
double-word, which requires four bytes, the address for the next variable is
0x0000000D. The dVar1 variable uses 4 bytes as addresses 0x00000009, 0x0000000A,
0x0000000B, and 0x0000000C. The rest of the line is the data declaration as typed in
the original assembly language source file.
The 0x40660301 is the value, in hex, as placed in memory. The decimal value 17,000,000
is 0x01036640 in hex. Recalling that the architecture is little-endian, the least significant
byte (0x40) is placed in the lowest memory address. As such, the 0x40 is placed in relative
address 0x00000009, the next byte, 0x66, is placed in address 0x0000000A, and so forth.
This can be confusing as at first glance the number may appear backwards or garbled
(depending on how it is viewed).
To help visualize, the memory picture would be as follows:
    variable name       value       address
                        00          0x00000010
                        89          0x0000000F
                        54          0x0000000E
    dVar2               40          0x0000000D
                        01          0x0000000C
                        03          0x0000000B
                        66          0x0000000A
    dVar1               40          0x00000009
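The byte ordering can be checked outside the assembler. The following Python sketch is not part of the tool chain described in this text; it only models how the two double-word values (17,000,000 and 9,000,000) would be laid out in little-endian order, starting at the relative addresses given in the list file. The helper name layout is an assumption made for illustration.

```python
# Model of little-endian storage for a 32-bit value.
# The function name and the use of Python are illustrative assumptions.

def layout(value, start_addr, size=4):
    """Return (address, byte) pairs, lowest address first."""
    data = value.to_bytes(size, byteorder="little")
    return [(start_addr + i, b) for i, b in enumerate(data)]

dvar1 = layout(17000000, 0x09)   # dVar1 starts at relative address 0x09
dvar2 = layout(9000000, 0x0D)    # dVar2 starts at relative address 0x0D

for addr, byte in dvar1 + dvar2:
    print(f"0x{addr:08X}  {byte:02x}")
```

Reading the printed bytes from the lowest address upward gives 40 66 03 01 for dVar1, which is the order shown in the list file.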
Again, the numbers to the left are the line numbers. The next number, 0x0000005A, is
the relative address of where the line of code will be placed.
The next number, 0x48C7C03C000000, is the machine language version of the
instruction, in hex, that the CPU reads and understands. The rest of the line is the
original assembly language source instruction.
The label, last:, does not have a machine language instruction since the label is used to
reference a specific address and is not an executable instruction.
rax,0
skipRest
The steps taken on the first pass vary based on the design of the specific assembler.
However, some of the basic operations performed on the first pass include the
following:
Expand macros
rax,BUFF+5
The steps taken on the second pass vary based on the design of the specific assembler.
However, some of the basic operations performed on the second pass include the
following:
The term code generation refers to the conversion of the programmer provided assembly
language instruction into the CPU executable machine language instruction. Due to the
one-to-one correspondence, this can be done for instructions that do not use symbols on
either the first or second pass.
It should be noted that, based on the assembler design, much of the code generation
might be done on the first pass or all done on the second pass. Either way, the final
generation is performed on the second pass. This will require using the symbol table to
check program symbols and obtain the appropriate addresses from the table.
The list file, while optional, can be useful for debugging. If requested, it would be
generated on the second pass.
If there are no errors, the final object file is created on the second pass.
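The two-pass idea can be sketched with a toy model. The following Python code is not how yasm is implemented; it assumes a made-up source format where each line carries an optional label, a mnemonic, an operand, and an invented size in bytes. The first pass only assigns addresses and fills the symbol table, and the second pass uses the table to resolve a forward reference to skipRest. For simplicity the resolved operand is the label's address rather than the relative displacement a real jump would encode.

```python
# Toy two-pass "assembler": a model of the idea only, not yasm.
# Each source line is (label or None, mnemonic, operand, size in bytes).
# The sizes are made-up values for illustration.

SOURCE = [
    (None,       "mov", "rax",      7),
    (None,       "jmp", "skipRest", 5),   # forward reference
    (None,       "mov", "rbx",      7),
    ("skipRest", "mov", "rcx",      7),
]

def first_pass(source):
    """Assign an address to every line and record label addresses."""
    symbols, address = {}, 0
    for label, _mnemonic, _operand, size in source:
        if label is not None:
            symbols[label] = address
        address += size
    return symbols

def second_pass(source, symbols):
    """Replace label operands with addresses from the symbol table."""
    output, address = [], 0
    for _label, mnemonic, operand, size in source:
        resolved = symbols.get(operand, operand)
        output.append((address, mnemonic, resolved))
        address += size
    return output

symbols = first_pass(SOURCE)
listing = second_pass(SOURCE, symbols)
for address, mnemonic, operand in listing:
    print(f"0x{address:04X}  {mnemonic} {operand}")
```

The jmp line cannot be completed on the first pass because skipRest has not been seen yet, which is why the symbol table must be finished before final code generation.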
5.3 Linker
The linker27, sometimes referred to as a linkage editor, will combine one or more object
files into a single executable file. Additionally, any routines from user or system
libraries are included as necessary. The GNU gold linker, ld28, is used. The appropriate
linker command for the example program from the previous chapter is as follows:
ld -g -o example example.o
Note, the -o is a dash lower-case letter O, which can be confused with the number 0.
The -g option is used to inform the linker to include debugging information in the final
executable file. This increases the size of the executable file, but is necessary to allow
effective debugging. The -o example specifies to create the executable file named
example (with no extension). If the -o <fileName> option is omitted, the output file is
named a.out (by default). The example.o is the name of the input object file read by the
27 For more information, refer to: http://en.wikipedia.org/wiki/Linker_(computing)
28 For more information, refer to: http://en.wikipedia.org/wiki/Gold_(linker)
    main.o                          executable
        ...                             ...
        call fnc1 - R                   call 0x0400
        ...                             ...
    funcs.o
        ...                             ...
        0x0100: fnc1:                   0x0400: fnc1:
Here, the function fnc1 is external to the main.o object file and is marked with an R.
The actual function fnc1 is in the funcs.o file, which starts its relative addressing from
0x0 (in the text section) since it does not know about the main code. When the object
files are combined, the original relative address of fnc1 (shown as 0x0100:) is changed
to its final address in the executable file (shown as 0x0400:). The linker must insert this
final address into the call statement in main (shown as call 0x0400) in order to
complete the linking process and ensure the function call will work correctly.
This will occur with all relocatable addresses for both code and data.
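The relocation step can be modeled in a few lines. The following Python sketch is a toy, not a description of how ld works internally; the dictionary fields and the object sizes (0x0300 and 0x0200) are invented so that fnc1 ends up at the final address 0x0400 used in the discussion above.

```python
# Toy model of relocation; object files are plain dictionaries
# with made-up fields and sizes chosen for illustration.

main_o = {
    "size": 0x0300,
    "symbols": {},
    "relocations": [("call", "fnc1")],   # marked R: address not yet known
}
funcs_o = {
    "size": 0x0200,
    "symbols": {"fnc1": 0x0100},         # relative to the start of funcs.o
    "relocations": [],
}

def link(objects):
    """Place objects one after another and resolve relocatable symbols."""
    final_addresses, base = {}, 0
    for obj in objects:
        for name, offset in obj["symbols"].items():
            final_addresses[name] = base + offset
        base += obj["size"]
    resolved = []
    for obj in objects:
        for instruction, name in obj["relocations"]:
            resolved.append((instruction, final_addresses[name]))
    return final_addresses, resolved

addresses, calls = link([main_o, funcs_o])
print(hex(addresses["fnc1"]), calls)
```

The relative address 0x0100 inside funcs.o becomes 0x0400 only once the linker knows where funcs.o is placed, and the call in main is patched with that final value.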
Often-used libraries (for example the standard system libraries) can be stored in
only one location, not duplicated in every single binary.
In Linux/Unix, the dynamically linked object files typically have a .so (shared object)
extension. In Windows, they have a .dll (dynamically linked library) extension.
Further details of dynamic linking are outside the scope of this text.
The above script should be placed in a file. For this example, the file will be named
asm64 and placed in the current working directory (where the source files are located).
Once created, execute privilege will need to be added to the script file as follows:
chmod +x asm64
This will only need to be done once for each script file.
The script file will read the source file name from the command line. For example, to
use the script file to assemble the example from the previous chapter (named example.asm),
type the following:
./asm64 example
The ".asm" extension on the example.asm file should not be included (since it is added
in the script). The script file will assemble and link the source file, creating the list file,
object file, and executable file.
30 For more information, refer to: http://en.wikipedia.org/wiki/Bash_(Unix_shell)
5.5 Loader
The loader31 is a part of the operating-system that will load the program from secondary
storage into primary storage (i.e., main memory). In broad terms, the loader will
attempt to find, and if found, read a properly formatted executable file, create a new
process, and load the code into memory and mark the program as ready for execution.
The operating-system scheduler will make the decisions about which process is
executed when.
The loader is implicitly invoked by typing the program name. For example, on the
previous example program, named example, the Linux command would be:
./example
which will execute the file named example created via the previous steps (assemble and
link). Since the example program does not perform any output, nothing will be
displayed to the console. As such, a debugger can be used to check the results.
5.6 Debugger
The debugger32 is used to control execution of a program. This allows for testing and
debugging activities to be performed.
In the previous example, the program computed a series of calculations, but did not
output any of the results. The debugger can be used to check the results. The
executable file is created with the assemble and link command previously described and
must include the -g option.
The debugger used is the GNU DDD debugger which provides a visual front-end for the
GNU command line debugger, gdb. The DDD web site and documentation are noted in
the references section of Chapter 1, Introduction.
The ddd debugger is started with the executable file. For example, using the previous
sample program, example, the command would be:
ddd example
5.7 Exercises
Below are some questions based on this chapter.
Chapter
6
Upon starting DDD/GDB, something similar to the screen, shown below, should be
displayed (with the appropriate source code displayed).
In the GDB Command Console, at the (gdb) prompt, type: break last
As needed, additional breakpoints can be set. However, clicking the Run command will restart execution from the beginning and stop at the initial breakpoint.
The next command will execute to the next instruction. This includes executing an
entire function if necessary. The step command will execute one step, stepping into
functions if necessary. For a single, non-function instruction, there is no difference
between the next and step commands.
Description
quit | q
break | b <label/addr>
continue | c
step | s
next | n
F3
where
x/<n><f><u> $rsp
x/<n><f><u> &<variable>
    <f> format:     d   decimal (signed)
                    x   hex
                    u   decimal (unsigned)
                    c   character
                    s   string
                    f   floating point
    <u> unit size:  b   byte (8-bits)
                    h   halfword (16-bits)
                    w   word (32-bits)
                    g   giant (64-bits)
source <filename>
set logging on
More information can be obtained via the built-in help facility or from the
documentation on the ddd web site (referenced from Chapter 1).
db
dw
dd
dq
db
dd
5
2000
100000
1234567890
"Assembly",0
6.28
dd
100001,100002,100003,100004,100005
In the examine memory command x/5dw &list1, the 5 is the array length. The d indicates
signed data (u would have been unsigned data). The w indicates 32-bit sized data (which is
what the dd, define double, definition declares in the source file). The &list1 is the
address of the variable. Note, the name refers to the first element (that is all). As such,
it is very possible to display fewer or more elements than are actually declared in the array.
The examine memory command to display the top 6 items on the stack would be as
follows:
x/6ug $rsp
Due to the stack implementation, the first item shown will always be the current top of
the stack.
Note 1: this example assumes a label 'last' is defined in the source program (as is
done in the example program).
Note 2: this example exits the debugger. If that is not desired, the 'exit' command
can be removed. When exiting from the input file, the debugger may request user
confirmation of the exit (yes or no).
These commands should be placed in a file (such as gdbIn.txt) so they can be read from
within the debugger.
6.2.6.1 Debugger Commands File (non-interactive)
The debugger command to read a file is "source <filename>". For example, if the
command file is named gdbIn.txt, the command would be:
(gdb) source gdbIn.txt
Based on the above commands, the output will be placed in the file out.txt. The output
file name can be changed as desired.
It is possible to obtain the output file directly without an interactive DDD session. The
following command, entered at the command line, will execute the commands in the
input file on the given program, create the output file, and exit the program.
gdb < gdbIn.txt prog
Which will create the output file (as specified in the gdbIn.txt file) and exit the
debugger. This is the fastest option for obtaining the final output file for a working
program. Again, this would only be useful if the program is working or very close to
working correctly.
6.3 Exercises
Below are some quiz questions based on this chapter.
2.
3.
4.
5.
6.
7.
12) Provide the debugger command to display each of the following variables in
hexadecimal format.
1.
2.
3.
4.
5.
6.
7.
13) What is the debugger command to display the value at the current top of the
stack?
14) What is the debugger command to display five (5) values at the current top of the
stack?
Chapter
7
Data Movement
Conversion Instructions
Arithmetic Instructions
Logical Instructions
Control Instructions
The instructions for function calls are discussed in Chapter 12, Functions.
A complete listing of the instructions covered in this text is located in Appendix B for
reference.
Description
Register operand. The operand must be a register.
<reg8>,<reg16>,
<reg32>,<reg64>
<dest>
<RXdest>
<src>
<imm>
<mem>
<op>or<operand>
<op8>,<op16>,
<op32>,<op64>
<label>
mov <dest>, <src>
The source operand is copied into the destination operand. The value of the source
operand is unchanged. The destination and source operand must be of the same size
(both bytes, both words, etc.). The destination operand can not be an immediate. Both
operands can not be memory. If a memory to memory operation is required, two
instructions must be used.
    mov     eax, dword [myVariable]

    mov             what to do
    eax             where to place
    dword           how much to get
    [myVariable]    memory location
    mov     eax, 100        ; eax = 0x00000064
    mov     rcx, -1         ; rcx = 0xffffffffffffffff
    mov     ecx, eax        ; ecx = 0x00000064
Initially, the rcx register is set to -1 (which is all 0xF's). When the positive number
Explanation
mov <dest>, <src>
Examples:
    mov     ax, 42
    mov     cl, byte [bvar]
    mov     dword [dVar], eax
    mov     qword [qVar], rdx
For example, assuming the following data declarations:
    dValue  dd      0
    bNum    db      42
    wNum    dw      5000
    dNum    dd      73000
    qNum    dq      73000000
    bAns    db      0
    wAns    dw      0
    dAns    dd      0
    qAns    dq      0

    mov     dword [dValue], 27      ; dValue = 27

    mov     al, byte [bNum]
    mov     byte [bAns], al         ; bAns = bNum

    mov     ax, word [wNum]
    mov     word [wAns], ax         ; wAns = wNum

    mov     eax, dword [dNum]
    mov     dword [dAns], eax       ; dAns = dNum

    mov     rax, qword [qNum]
    mov     qword [qAns], rax       ; qAns = qNum
For some instructions, including those above, the explicit type specification (e.g., byte,
word, dword, qword) can be omitted as the other operand will clearly define the size. In
the text it will be included for consistency and good programming practice.
    mov     rax, qword [var1]       ; value of var1 in rax
    mov     rax, var1               ; address of var1 in rax
Since omitting the brackets is not an error, the assembler will not generate error
messages or warnings. This can lead to confusion.
In addition, the address of a variable can be obtained with the load effective address, or
lea, instruction. The load effective address instruction is summarized as follows:
Instruction
Explanation
lea <reg64>, <mem>
Examples:
    lea     rcx, byte [bvar]
    lea     rsi, dword [dVar]
    mov     rax, 50
    mov     byte [bVal], al
This example is reasonable since the value of 50 will fit in a byte value. However, if the
value of 500 (0x1f4) is placed in the rax register, the al register can still be accessed.
    mov     rax, 500
    mov     byte [bVal], al
In this example, the bVal variable will contain 0xf4 which may lead to incorrect results.
The programmer is responsible for ensuring that narrowing conversions are performed
appropriately. Unlike a compiler, no warnings or error messages will be generated.
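The effect of the narrowing conversion can be checked with a small model. The following Python sketch is not assembly; it assumes that reading al is equivalent to keeping only the low 8 bits of the value in rax, and the function name low_byte is made up for illustration.

```python
# Model of a narrowing conversion: storing al after rax was loaded.
# Python integers are unbounded, so the masking is done explicitly.

def low_byte(value64):
    """Return the value held in al when rax holds value64."""
    return value64 & 0xFF

fits = low_byte(50)         # 50 fits in 8 bits, so it is preserved
truncated = low_byte(500)   # 500 is 0x1f4; only 0xf4 survives

print(fits, hex(truncated))
```

The second case shows why the stored byte is 0xf4 (244 decimal) rather than 500: the upper bits are silently dropped.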
For unsigned widening conversions, the upper part of the memory location or register
must be set to zero. Since an unsigned value can only be positive, the upper-order bits
can only be zero. For example, to convert the byte value of 50 in the al register to a
quadword value in rbx, the following operations can be performed.
    mov     al, 50
    mov     rbx, 0
    mov     bl, al
Since the rbx register was set to 0 and then the lower 8-bits were set to the value from al
(50 in this example), the entire 64-bit rbx register is now 50.
This general process can be performed on memory or other registers. It is the
programmer's responsibility to ensure that the values are appropriate for the data sizes
being used.
An unsigned conversion from a smaller size to a larger size can also be performed with a
special move instruction, as follows:
movzx <dest>, <src>
Which will fill the upper-order bits with zero. The movzx instruction does not allow a
quadword destination operand with a double-word source operand. As previously noted,
a mov instruction with a double-word register destination operand with a double-word
source operand will zero the upper-order double-word of the quadword destination
register.
A summary of the instructions that perform the unsigned widening conversion is as
follows:
Instruction                     Explanation
movzx <dest>, <src>
    movzx <reg16>, <op8>
    movzx <reg32>, <op8>
    movzx <reg32>, <op16>
    movzx <reg64>, <op8>
    movzx <reg64>, <op16>
Examples:
    movzx   cx, byte [bVar]
    movzx   dx, al
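The difference between writing part of a register and zero-extending into it can be modeled as follows. This Python sketch is only an illustration of the behavior described in this section; the function names are made up, and registers are represented as plain 64-bit integers.

```python
# Model of unsigned widening; registers are plain 64-bit integers.
MASK64 = 0xFFFFFFFFFFFFFFFF

def movzx(src_value, src_bits):
    """Zero-extend: keep the low src_bits, all upper bits become 0."""
    return src_value & ((1 << src_bits) - 1)

def write_low8(reg64, value8):
    """Model 'mov bl, al': only the low 8 bits of the register change."""
    return (reg64 & ~0xFF & MASK64) | (value8 & 0xFF)

def write_low32(reg64, value32):
    """Model a 32-bit register write: the upper 32 bits are cleared,
    so the previous contents (reg64) do not matter."""
    return value32 & 0xFFFFFFFF

rbx = write_low8(0, 50)            # mov rbx, 0   then  mov bl, al
stale = write_low8(MASK64, 50)     # rbx not cleared first
rcx = write_low32(MASK64, 100)     # mov rcx, -1  then  mov ecx, eax

print(rbx, hex(stale), rcx)
```

The stale case shows why the register must be cleared (or movzx used) before an 8-bit or 16-bit write, while a 32-bit write needs no such step.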
For signed widening conversions, the upper-order bits must be set to either 0's or 1's
depending on if the original value was positive or negative.
This is performed by a sign-extend operation. Specifically, the upper-order bit of the
original value indicates if the value is positive (with a 0) or negative (with a 1). The
upper-order bit of the original value is extended into the higher bits of the new, widened
value.
For example, given that the ax register is set to -7 (0xfff9), the bits would be set as
follows:
    bit:    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
    value:   1  1  1  1  1  1  1  1  1  1  1  1  1  0  0  1
Since the value is negative, the upper-order bit (bit 15) is a 1. To convert the word value
in the ax register into a double-word value in the eax register, the upper order bit (1 in
this example) is extended or copied into the entire upper-order word (bits 31-16)
resulting in the following:
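    bit:    31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
    value:   1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
    bit:    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
    value:   1  1  1  1  1  1  1  1  1  1  1  1  1  0  0  1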
There are a series of dedicated instructions used to convert signed values in the A
register from a smaller size into a larger size. These instructions work only on the A
register, sometimes using the D register for the result. For example, the cwd instruction
will convert a signed value in the ax register into a double-word value in the dx (upper
order portion) and ax (lower order portion) registers. This is typically by convention
written as dx:ax. The cwde instruction will convert a signed value in the ax register
into a double-word value in the eax register.
movsx <dest>, <src>
movsxd <dest>, <src>
Which will perform the sign extension operation on the source argument. The movsx
instruction is the general form and the movsxd instruction is used to allow a quadword
destination operand with a double-word source operand.
A summary of the instructions that perform the signed widening conversion are as
follows:
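Instruction                     Explanation
cbw                             Convert byte in al into word in ax.
cwd                             Convert word in ax into double-word in dx:ax.
cwde                            Convert word in ax into double-word in eax.
cdq                             Convert double-word in eax into quadword in edx:eax.
cdqe                            Convert double-word in eax into quadword in rax.
cqo                             Convert quadword in rax into double quadword in rdx:rax.
movsx <dest>, <src>             Sign-extend <src> into <dest>.
    movsx <reg16>, <op8>
    movsx <reg32>, <op8>
    movsx <reg32>, <op16>
    movsx <reg64>, <op8>
    movsx <reg64>, <op16>
movsxd <reg64>, <op32>          Sign-extend double-word <src> into quadword <dest>.
Examples:
    cqo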
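Sign extension can be modeled directly from the description above: the upper-order bit of the original value is copied into all of the new upper bits. The following Python sketch is an illustration only; the function name sign_extend is an assumption, and the last line shows how the same 32-bit result would be split for the dx:ax form.

```python
# Model of sign extension from from_bits to to_bits.

def sign_extend(value, from_bits, to_bits):
    """Copy the upper-order bit of value into the new upper bits."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):          # negative: fill with 1s
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value

eax = sign_extend(0xFFF9, 16, 32)        # ax = -7, widened as cwde would
positive = sign_extend(0x0007, 16, 32)   # ax = +7, upper bits stay 0

# cwd leaves the same 32-bit result split across dx (upper) and ax (lower)
dx, ax = eax >> 16, eax & 0xFFFF

print(hex(eax), hex(positive), hex(dx), hex(ax))
```

For the value -7, the 16-bit pattern 0xfff9 becomes 0xfffffff9, which matches the bit layout shown earlier in this section.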
7.5.1 Addition
The general form of the integer addition instruction is as follows:
add <dest>, <src>
For example, assuming the following data declarations:
    bNum1   db      42
    bNum2   db      73
    bAns    db      0

    wNum1   dw      4321
    wNum2   dw      1234
    wAns    dw      0

    dNum1   dd      42000
    dNum2   dd      73000
    dAns    dd      0

    qNum1   dq      42000000
    qNum2   dq      73000000
    qAns    dq      0
For some instructions, including those above, the explicit type specification (e.g., byte,
word, dword, qword) can be omitted as the other operand will clearly define the size. It
is included for consistency and good programming practice.
In addition to the basic add instruction, there is an increment instruction that will add
one to the specified operand. The general form of the increment instruction is as
follows:
inc <operand>
The result is exactly the same as using the add instruction (and adding one). When
using a memory operand, the explicit type specification (e.g., byte, word, dword, qword)
is required to clearly define the size.
For example, assuming the following data declarations:
    bNum    db      42
    wNum    dw      4321
    dNum    dd      42000
    qNum    dq      42000000

    inc     rax                     ; rax = rax + 1
    inc     byte [bNum]             ; bNum = bNum + 1
    inc     word [wNum]             ; wNum = wNum + 1
    inc     dword [dNum]            ; dNum = dNum + 1
    inc     qword [qNum]            ; qNum = qNum + 1
The addition instruction operates the same on signed and unsigned data. It is the
programmer's responsibility to ensure that the data types and sizes are appropriate for the
operations being performed.
The integer addition instructions are summarized as follows:
Instruction                     Explanation
add <dest>, <src>               Examples:
                                    add     cx, word [wVvar]
                                    add     rax, 42
                                    add     dword [dVar], eax
                                    add     qword [qVar], 300
inc <operand>                   Increment <operand> by 1.
                                Note, <operand> can not be an immediate.
                                Examples:
                                    inc     word [wVvar]
                                    inc     rax
                                    inc     dword [dVar]
                                    inc     qword [qVar]
The add with carry is a special add instruction that will include a carry from a previous
addition operation. This is useful when adding very large numbers, specifically
numbers larger than the register size of the machine.
      17
    + 25
    ----
      42
As you may recall, the least significant digits (7 and 5) are added first. The result of 12
is noted as a 2 with a 1 carry. The most significant digits (1 and 2) are added along with
the previous carry (1 in this example) resulting in a 4.
As such, two addition operations are required. Since there is no carry into the
least significant portion, a regular addition instruction is used. The second addition
operation would need to include a possible carry from the previous operation and must
be an add with carry instruction. Additionally, the add with carry must immediately
follow the initial addition operation to ensure that the rFlag register is not altered by an
unrelated instruction (thus possibly altering the carry bit).
For assembly language programs the Least Significant Quadword (LSQ) is added with
the add instruction and then immediately the Most Significant Quadword (MSQ) is
added with the adc which will add the quadwords and include a carry from the previous
addition operation.
The general form of the integer add with carry instruction is as follows:
adc <dest>, <src>
Specifically, the source and destination operands along with the carry bit are added and
the result is placed in the destination operand (over-writing the previous value). The
carry bit is part of the rFlag register. The value of the source operand is unchanged.
The destination and source operand must be of the same size (both bytes, both words,
etc.). The destination operand can not be an immediate. Both operands can not be
memory. If a memory to memory addition operation is required, two instructions must
be used.
    dquad1  ddq     0x1A000000000000000
    dquad2  ddq     0x2C000000000000000
    dqSum   ddq     0
Each of the variables dquad1, dquad2, and dqSum are 128-bits and thus will exceed the
machine 64-bit register size. However, two 64-bit registers can be used for each of the
128-bit values. For example,
    mov     rax, qword [dquad1]
    mov     rdx, qword [dquad1+8]
If the LSQ's are added and then the MSQ's are added including any carry, the 128-bit
result can be correctly obtained. For example,
    mov     rax, qword [dquad1]
    mov     rdx, qword [dquad1+8]

    add     rax, qword [dquad2]
    adc     rdx, qword [dquad2+8]

    mov     qword [dqSum], rax
    mov     qword [dqSum+8], rdx
Initially, the LSQ of dquad1 is placed in rax and the MSQ is placed in rdx. Then the
add instruction will add the 64-bit rax with the LSQ of dquad2 and, in this example,
provide a carry of 1 with the result in rax. Then the rdx is added with the MSQ of
dquad2 along with the carry via the adc instruction and the result placed in rdx.
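The add followed by adc sequence can be traced with a small model. The following Python sketch is not assembly; it assumes each register holds a 64-bit unsigned value and represents the carry bit as the overflow out of bit 63. The helper names add64 and split are made up for illustration.

```python
# Model of a 128-bit addition done as add (LSQ) followed by adc (MSQ).
MASK64 = 0xFFFFFFFFFFFFFFFF

def add64(a, b, carry_in=0):
    """Return (64-bit result, carry out); carry_in=0 models add,
    carry_in=previous carry models adc."""
    total = a + b + carry_in
    return total & MASK64, total >> 64

def split(value128):
    """Return (LSQ, MSQ) of a 128-bit value."""
    return value128 & MASK64, value128 >> 64

dquad1 = 0x1A << 60     # 0x1A000000000000000
dquad2 = 0x2C << 60     # 0x2C000000000000000

rax, rdx = split(dquad1)
lsq2, msq2 = split(dquad2)
rax, carry = add64(rax, lsq2)           # add rax, qword [dquad2]
rdx, _ = add64(rdx, msq2, carry)        # adc rdx, qword [dquad2+8]
dq_sum = (rdx << 64) | rax

print(hex(rax), carry, hex(rdx), hex(dq_sum))
```

With these values the LSQ addition overflows, so the carry of 1 is what turns the MSQ result from 3 into 4.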
The integer add with carry instruction is summarized as follows:
Instruction                     Explanation
adc <dest>, <src>               Examples:
                                    adc     rcx, qword [dVvar1]
                                    adc     rax, 42
7.5.2 Subtraction
The general form of the integer subtraction instruction is as follows:
sub <dest>, <src>
Specifically, the source operand is subtracted from the destination operand and the result
is placed in the destination operand (over-writing the previous value). The value of the
source operand is unchanged. The destination and source operand must be of the same
size (both bytes, both words, etc.). The destination operand can not be an immediate.
Both operands can not be memory. If a memory to memory subtraction operation is
required, two instructions must be used.
For example, assuming the following data declarations:
    bNum1   db      73
    bNum2   db      42
    bAns    db      0

    wNum1   dw      1234
    wNum2   dw      4321
    wAns    dw      0

    dNum1   dd      73000
    dNum2   dd      42000
    dAns    dd      0

    qNum1   dq      73000000
    qNum2   dq      73000000
    qAns    dq      0
For some instructions, including those above, the explicit type specification (e.g., byte,
word, dword, qword) can be omitted as the other operand will clearly define the size. It
is included for consistency and good programming practices.
In addition to the basic subtract instruction, there is a decrement instruction that will
subtract one from the specified operand. The general form of the decrement instruction
is as follows:
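dec <operand>
For example, assuming the following data declarations:
    bNum    db      42
    wNum    dw      4321
    dNum    dd      42000
    qNum    dq      42000000

    dec     rax                     ; rax = rax - 1
    dec     byte [bNum]             ; bNum = bNum - 1
    dec     word [wNum]             ; wNum = wNum - 1
    dec     dword [dNum]            ; dNum = dNum - 1
    dec     qword [qNum]            ; qNum = qNum - 1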
The subtraction instructions operate the same on signed and unsigned data. It is the
programmer's responsibility to ensure that the data types and sizes are appropriate for the
operations being performed.
The integer subtraction instructions are summarized as follows:
Instruction                     Explanation
sub <dest>, <src>               Examples:
                                    sub     cx, word [wVvar]
                                    sub     rax, 42
                                    sub     dword [dVar], eax
                                    sub     qword [qVar], 300
dec <operand>                   Decrement <operand> by 1.
                                Note, <operand> can not be an immediate.
                                Examples:
                                    dec     word [wVvar]
                                    dec     rax
                                    dec     dword [dVar]
                                    dec     qword [qVar]
<src>
    Size            Operands            Result
    Bytes           al  x <op8>         ax (ah:al)
    Words           ax  x <op16>        dx:ax
    Double words    eax x <op32>        edx:eax
    Quad words      rax x <op64>        rdx:rax
For example, assuming the following data declarations:
    bNumA   db      42
    bNumB   db      73
    wAns    dw      0
    wAns1   dw      0

    wNumA   dw      4321
    wNumB   dw      1234
    dAns2   dd      0

    dNumA   dd      42000
    dNumB   dd      73000
    qAns3   dq      0

    qNumA   dq      420000
    qNumB   dq      730000
    dqAns4  ddq     0
    wAns  = bNumA * bNumA       ; bNumA squared
    bAns1 = bNumA * bNumB
    wAns1 = bNumA * bNumB
    wAns2 = wNumA * wNumB
    dAns2 = wNumA * wNumB
    dAns3 = dNumA * dNumB
    qAns3 = dNumA * dNumB
    mov     al, byte [bNumA]
    mul     al                      ; result in ax
    mov     word [wAns], ax

    mov     al, byte [bNumA]
    mul     byte [bNumB]            ; result in ax
    mov     word [wAns1], ax

    mov     ax, word [wNumA]
    mul     word [wNumB]            ; result in dx:ax
    mov     word [dAns2], ax
    mov     word [dAns2+2], dx

    mov     eax, dword [dNumA]
    mul     dword [dNumB]           ; result in edx:eax
    mov     dword [qAns3], eax
    mov     dword [qAns3+4], edx

    mov     rax, qword [qNumA]
    mul     qword [qNumB]           ; result in rdx:rax
    mov     qword [dqAns4], rax
    mov     qword [dqAns4+8], rdx
For some instructions, including those above, the explicit type specification (e.g., byte,
word, dword, qword) is required to clearly define the size.
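The way the double-sized product is split across two registers can be modeled as follows. This Python sketch is an illustration, not assembly; the function name mul and its bits parameter are assumptions, and the operand values are taken from the data declarations used in this section.

```python
# Model of the single-operand unsigned multiply: the product is twice
# the operand size and is returned as (upper half, lower half).

def mul(a_reg, operand, bits):
    """Multiply two unsigned values of the given size in bits."""
    mask = (1 << bits) - 1
    product = (a_reg & mask) * (operand & mask)
    return product >> bits, product & mask

ah, al = mul(42, 73, 8)          # mul byte [bNumB]  ->  ah:al (ax)
dx, ax = mul(4321, 1234, 16)     # mul word [wNumB]  ->  dx:ax

print(ah, al, dx, ax)
```

Storing the lower half at dAns2 and the upper half at dAns2+2, as in the word example, reassembles the full 32-bit product in little-endian order.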
Instruction                     Explanation
mul <src>
    mul <op8>
    mul <op16>
    mul <op32>
    mul <op64>
Examples:
    mul     word [wVvar]
    mul     al
    mul     dword [dVar]
    mul     qword [qVar]
The signed multiplication allows a wider range of operands and operand sizes. The
general forms of the signed multiplication are as follows:
imul <source>
imul <dest>, <src/imm>
imul <dest>, <src>, <imm>
In all cases, the destination operand must be a register. For the multiple operand
multiply instruction, byte operands are not supported.
When using a single operand multiply instruction, the imul is the same layout as the
mul (as previously presented). However, the operands are interpreted only as signed.
When two operands are used, the destination operand and the source operand are
multiplied and the result placed in the destination operand (over-writing the previous
value).
Specifically, the action performed is:
<dest> = <dest> * <src/imm>
For three operands, the <src> operand must be a register or memory location, but not an
immediate. The <imm> operand must be an immediate value. The size of the
immediate value is limited to the size of the source operand, up to a double-word size
(32-bit), even for quadword multiplications. The final result is truncated to the size of
the destination operand. A byte sized destination operand is not supported.
For example, assuming the following data declarations:
    wNumA   dw      1200
    wNumB   dw      2000
    wAns1   dw      0
    wAns2   dw      0

    dNumA   dd      42000
    dNumB   dd      13000
    dAns1   dd      0
    dAns2   dd      0

    qNumA   dq      120000
    qNumB   dq      230000
    qAns1   dq      0
    qAns2   dq      0
;resultinax
;resultinax
;resultineax
;resultineax
;resultinrax
;resultinrax
In these examples, the multiplication result is truncated to the size of the destination
operand. For a full sized result, the single operand instruction should be used (as fully
described in the section regarding unsigned multiplication).
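The truncation can be illustrated with a small model. The following Python sketch is not assembly; the function names are made up, the multiplier -13 is an arbitrary value chosen for illustration, and 1200 and 2000 are the word values from the declarations above.

```python
# Model of the multi-operand signed multiply: the product is truncated
# to the size of the destination operand.

def to_signed(value, bits):
    """Interpret the low 'bits' of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def imul_two_operand(dest, src, bits):
    """Signed multiply with the result truncated to 'bits' bits."""
    return to_signed(to_signed(dest, bits) * to_signed(src, bits), bits)

small = imul_two_operand(1200, -13, 16)      # -15600 fits in 16 bits
wrapped = imul_two_operand(1200, 2000, 16)   # 2,400,000 does not fit

print(small, wrapped)
```

The second result is not 2,400,000 because only the low 16 bits are kept, which is why the single operand form is needed when the full product matters.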
Instruction                             Explanation
imul <src>
imul <dest>, <src/imm32>
imul <dest>, <src>, <imm32>
    imul <op8>
    imul <op16>
    imul <op32>
    imul <op64>
    imul <reg16>, <op16/imm>
    imul <reg32>, <op32/imm>
    imul <reg64>, <op64/imm>
    imul <reg16>, <op16>, <imm>
    imul <reg32>, <op32>, <imm>
    imul <reg64>, <op64>, <imm>
Examples:
    dividend / divisor = quotient

The dividend is sized as follows:
    Byte Divide:            ax for 16-bits
    Word Divide:            dx:ax for 32-bits
    Double-word Divide:     edx:eax for 64-bits
    Quadword Divide:        rdx:rax for 128-bits
Setting the dividend (top operand) correctly is a key source of problems. For the word,
double-word, and quadword division operations, the dividend requires both the D
register (for the upper order portion) and A (for the lower order portion).
Setting these correctly depends on the data type. If a previous multiplication was
performed, the D and A registers may already be set correctly. Otherwise, a data item
may need to be converted from its current size to a larger size with the upper order
portion being placed in the D register. For unsigned data, the upper portion will always
be zero. For signed data, the existing data must be sign extended as noted in a previous
section, Signed Conversions.
The divisor can be a memory location or register, but not an immediate. Additionally,
the result will be placed in the A register (al/ax/eax/rax) and the remainder in either the
ah, dx, edx, or rdx register. Refer to the Integer Division Overview table to see the
layout more clearly.
The use of a larger size operand for the dividend matches the single operand
multiplication. For simple divisions, an appropriate conversion may be required in order
to ensure the dividend is set correctly. For unsigned divisions, the upper-order part of
the dividend can be set to zero. For signed divisions, the upper-order part of the dividend
can be set with an applicable conversion instruction.
As always, division by zero will crash the program and damage the space-time
continuum. So, try not to divide by zero.
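One way to avoid a divide by zero is to test the divisor before dividing. The following is a minimal sketch, assuming unsigned double-word data; the names dividend, divisor, quot, and errFlg are hypothetical and used only for illustration.

```nasm
; Sketch: guard an unsigned 32-bit division against a zero divisor.
; dividend, divisor, quot, and errFlg are hypothetical names.

section .data
EXIT_SUCCESS    equ     0
SYS_exit        equ     60

dividend        dd      42000
divisor         dd      0
quot            dd      0
errFlg          db      0

section .text
global _start
_start:
        cmp     dword [divisor], 0      ; is the divisor zero?
        je      divByZero               ; yes, skip the divide

        mov     eax, dword [dividend]
        mov     edx, 0                  ; unsigned, upper half is 0
        div     dword [divisor]         ; eax = edx:eax / divisor
        mov     dword [quot], eax
        jmp     done

divByZero:
        mov     byte [errFlg], 1        ; record the error instead

done:
        mov     rax, SYS_exit
        mov     rdi, EXIT_SUCCESS
        syscall
```

With the divisor set to 0 as shown, the div instruction is never executed and errFlg is set to 1.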
Size            Dividend    Divisor   Quotient   Remainder
Bytes           ax          op8       al         ah
Words           dx:ax       op16      ax         dx
Double Words    edx:eax     op32      eax        edx
Quad Words      rdx:rax     op64      rax        rdx
bNumA           db      63
bNumB           db      17
bNumC           db      5
bAns1           db      0
bAns2           db      0
bRem2           db      0
bAns3           db      0

wNumA           dw      4321
wNumB           dw      1234
wNumC           dw      167
wAns1           dw      0
wAns2           dw      0
wRem2           dw      0
wAns3           dw      0

dNumA           dd      42000
dNumB           dd      3157
dNumC           dd      293
dAns1           dd      0
dAns2           dd      0
dRem2           dd      0
dAns3           dd      0

qNumA           dq      730000
qNumB           dq      13456
qNumC           dq      1279
qAns1           dq      0
qAns2           dq      0
qRem2           dq      0
qAns3           dq      0
bAns1 = bNumA / 3                       ; unsigned
bAns2 = bNumA / bNumB                   ; unsigned
bRem2 = bNumA % bNumB                   ; % is modulus
bAns3 = (bNumA * bNumC) / bNumB         ; unsigned

wAns1 = wNumA / 5                       ; unsigned
wAns2 = wNumA / wNumB                   ; unsigned
wRem2 = wNumA % wNumB                   ; % is modulus
wAns3 = (wNumA * wNumC) / wNumB         ; unsigned

dAns1 = dNumA / 7                       ; signed
dAns2 = dNumA / dNumB                   ; signed
dRem2 = dNumA % dNumB                   ; % is modulus
dAns3 = (dNumA * dNumC) / dNumB         ; signed

qAns1 = qNumA / 9                       ; signed
qAns2 = qNumA / qNumB                   ; signed
qRem2 = qNumA % qNumB                   ; % is modulus
qAns3 = (qNumA * qNumC) / qNumB         ; signed
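; -----
;  example byte operations, unsigned

; bAns1 = bNumA / 3 (unsigned)
        mov     al, byte [bNumA]
        mov     ah, 0
        mov     bl, 3
        div     bl                      ; al = ax / 3
        mov     byte [bAns1], al

; bAns2 = bNumA / bNumB (unsigned)
        mov     ax, 0
        mov     al, byte [bNumA]
        div     byte [bNumB]            ; al = ax / bNumB
        mov     byte [bAns2], al
        mov     byte [bRem2], ah        ; ah = ax % bNumB

; bAns3 = (bNumA * bNumC) / bNumB (unsigned)
        mov     al, byte [bNumA]
        mul     byte [bNumC]            ; result in ax
        div     byte [bNumB]            ; al = ax / bNumB
        mov     byte [bAns3], al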
; -----
;  example word operations, unsigned

; wAns1 = wNumA / 5 (unsigned)
        mov     ax, word [wNumA]
        mov     dx, 0
        mov     bx, 5
        div     bx                      ; ax = dx:ax / 5
        mov     word [wAns1], ax

; wAns2 = wNumA / wNumB (unsigned)
        mov     dx, 0
        mov     ax, word [wNumA]
        div     word [wNumB]            ; ax = dx:ax / wNumB
        mov     word [wAns2], ax
        mov     word [wRem2], dx

; wAns3 = (wNumA * wNumC) / wNumB (unsigned)
        mov     ax, word [wNumA]
        mul     word [wNumC]            ; result in dx:ax
        div     word [wNumB]            ; ax = dx:ax / wNumB
        mov     word [wAns3], ax
; -----
;  example double-word operations, signed

; dAns1 = dNumA / 7 (signed)
        mov     eax, dword [dNumA]
        cdq                             ; eax -> edx:eax
        mov     ebx, 7
        idiv    ebx                     ; eax = edx:eax / 7
        mov     dword [dAns1], eax

; dAns2 = dNumA / dNumB (signed)
        mov     eax, dword [dNumA]
        cdq                             ; eax -> edx:eax
        idiv    dword [dNumB]           ; eax = edx:eax / dNumB
        mov     dword [dAns2], eax
        mov     dword [dRem2], edx      ; edx = edx:eax % dNumB

; dAns3 = (dNumA * dNumC) / dNumB (signed)
        mov     eax, dword [dNumA]
        imul    dword [dNumC]           ; result in edx:eax
        idiv    dword [dNumB]           ; eax = edx:eax / dNumB
        mov     dword [dAns3], eax
; -----
;  example quadword operations, signed

; qAns1 = qNumA / 9 (signed)
        mov     rax, qword [qNumA]
        cqo                             ; rax -> rdx:rax
        mov     rbx, 9
        idiv    rbx                     ; rax = rdx:rax / 9
        mov     qword [qAns1], rax

; qAns2 = qNumA / qNumB (signed)
        mov     rax, qword [qNumA]
        cqo                             ; rax -> rdx:rax
        idiv    qword [qNumB]           ; rax = rdx:rax / qNumB
        mov     qword [qAns2], rax
        mov     qword [qRem2], rdx      ; rdx = rdx:rax % qNumB

; qAns3 = (qNumA * qNumC) / qNumB (signed)
        mov     rax, qword [qNumA]
        imul    qword [qNumC]           ; result in rdx:rax
        idiv    qword [qNumB]           ; rax = rdx:rax / qNumB
        mov     qword [qAns3], rax
For some instructions, including those above, the explicit type specification (e.g., byte,
word, dword, qword) is required to clearly define the size.
Explanation
div <src>
div <op8>
div <op16>
div <op32>
div <op64>
        Examples:
        div     word [wVar]
        div     bl
        div     dword [dVar]
        div     qword [qVar]

idiv <src>
idiv <op8>
idiv <op16>
idiv <op32>
idiv <op64>
        Examples:
        idiv    word [wVar]
        idiv    bl
        idiv    dword [dVar]
        idiv    qword [qVar]
         0 1 0 1             0 1 0 1             0 1 0 1
    and  0 0 1 1        or   0 0 1 1        xor  0 0 1 1
         -------             -------             -------
         0 0 0 1             0 1 1 1             0 1 1 0
Explanation
and <dest>, <src>
        Examples:
        and     ax, bx
        and     rcx, rdx
        and     eax, dword [dNum]
        and     qword [qNum], rdx

or <dest>, <src>

xor <dest>, <src>
        Examples:
        xor     ax, bx
        xor     rcx, rdx
        xor     eax, dword [dNum]
        xor     qword [qNum], rdx

not <op>
        Examples:
        not     bx
        not     rdx
        not     dword [dNum]
        not     qword [qNum]
The & refers to the logical AND operation, the || refers to the logical OR operation, and
the ^ refers to the logical XOR operation as per C/C++ conventions. The ¬ refers to the
logical NOT operation.
A more complete list of the instructions is located in Appendix B.
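As an illustration of the logical instructions, the following minimal sketch applies and, or, not, and xor to a double-word value. The variable names val, lowByte, flagged, and inverted are hypothetical and not part of the text's examples.

```nasm
; Sketch: bit masking with the logical instructions.
; val, lowByte, flagged, and inverted are hypothetical names.

section .data
EXIT_SUCCESS    equ     0
SYS_exit        equ     60

val             dd      0x12345678
lowByte         dd      0
flagged         dd      0
inverted        dd      0

section .text
global _start
_start:
        mov     eax, dword [val]
        and     eax, 0x000000ff         ; keep low 8 bits, eax = 0x00000078
        mov     dword [lowByte], eax

        mov     eax, dword [val]
        or      eax, 0x00000001         ; force bit 0 on, eax = 0x12345679
        mov     dword [flagged], eax

        mov     eax, dword [val]
        not     eax                     ; invert every bit, eax = 0xEDCBA987
        mov     dword [inverted], eax

        xor     eax, eax                ; a value xor itself is 0

        mov     rax, SYS_exit
        mov     rdi, EXIT_SUCCESS
        syscall
```

The and with a mask keeps only the bits that are set in the mask, while the or forces the mask bits on and leaves the others unchanged.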
The logical shift is a bitwise operation that shifts all the bits of its source register by the
specified number of bits and places the result into the destination register. The bits can be
Logical shift right by one bit:
        0 0 0 1 0 1 1 1 = 23
        0 0 0 0 1 0 1 1 = 11
Logical shift left by two bits:
        0 0 0 0 1 1 0 1 = 13
        0 0 1 1 0 1 0 0 = 52

Explanation
shl <dest>, <imm>
shl <dest>, cl
        Examples:
        shl     ax, 8
        shl     rcx, 32
        shl     eax, cl
        shl     qword [qNum], cl

shr <dest>, <imm>
shr <dest>, cl
The arithmetic shift right is also a bitwise operation that shifts all the bits of its source
register by the specified number of bits and places the result into the destination register.
Every bit in the source operand is moved the specified number of bit positions, and the
newly vacant bit-positions on the left are filled in. The original leftmost bit (the sign
bit) is replicated to fill in all the vacant positions. This is referred to as sign extension.
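The difference between the arithmetic and logical right shifts only shows up for negative values. The following minimal sketch shifts the same negative double-word both ways; the names num, arithRes, and logicRes are hypothetical.

```nasm
; Sketch: sar replicates the sign bit, shr fills with zeros.
; num, arithRes, and logicRes are hypothetical names.

section .data
EXIT_SUCCESS    equ     0
SYS_exit        equ     60

num             dd      -16             ; 0xFFFFFFF0
arithRes        dd      0
logicRes        dd      0

section .text
global _start
_start:
        mov     eax, dword [num]
        sar     eax, 2                  ; eax = 0xFFFFFFFC (-4)
        mov     dword [arithRes], eax

        mov     eax, dword [num]
        shr     eax, 2                  ; eax = 0x3FFFFFFC
        mov     dword [logicRes], eax

        mov     rax, SYS_exit
        mov     rdi, EXIT_SUCCESS
        syscall
```

The arithmetic shift gives -16 / 4 = -4, while the logical shift treats the value as unsigned and produces a large positive number.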
Explanation
sal <dest>, <imm>
sal <dest>, cl
        Examples:
        sal     ax, 8
        sal     rcx, 32
        sal     eax, cl
        sal     qword [qNum], cl

sar <dest>, <imm>
sar <dest>, cl
        Examples:
        sar     ax, 8
        sar     rcx, 32
        sar     eax, cl
        sar     qword [qNum], cl

rol <dest>, <imm>
rol <dest>, cl
        Examples:
        rol     ax, 8
        rol     rcx, 32
        rol     eax, cl
        rol     qword [qNum], cl

ror <dest>, <imm>
ror <dest>, cl
        Examples:
        ror     ax, 8
        ror     rcx, 32
        ror     eax, cl
        ror     qword [qNum], cl
7.7.1 Labels
A program label is the target, or a location to jump to, for control statements. For
example, the start of a loop might be marked with a label such as loopStart. The code
may be re-executed by jumping to the label.
A label must start with a letter, followed by letters, numbers, or symbols (limited to
_), terminated with a colon (:). Labels in yasm are case sensitive.
Explanation
jmp <label>
        Examples:
        jmp     startLoop
        jmp     ifDone
        jmp     last
        cmp     <op1>, <op2>
Where <op1> and <op2> are not changed and must be of the same size. Either, but not
both, may be a memory operand. The <op1> operand can not be an immediate, but the
<op2> operand may be an immediate value.
The conditional control instructions include the jump equal (je) and jump not equal
(jne) which work the same for both signed and unsigned data.
The signed conditional control instructions include the basic set of comparison
operations; jump less than (jl), jump less than or equal (jle), jump greater than (jg), and
jump greater than or equal (jge).
The unsigned conditional control instructions include the basic set of comparison
operations; jump below (jb), jump below or equal (jbe), jump above (ja), and
jump above or equal (jae).
The general form of the conditional jump instructions along with an explanatory
comment are as follows:
        je      <label>                 ; if <op1> == <op2>
        jne     <label>                 ; if <op1> != <op2>

        jl      <label>                 ; signed, if <op1> < <op2>
        jle     <label>                 ; signed, if <op1> <= <op2>
        jg      <label>                 ; signed, if <op1> > <op2>
        jge     <label>                 ; signed, if <op1> >= <op2>

        jb      <label>                 ; unsigned, if <op1> < <op2>
        jbe     <label>                 ; unsigned, if <op1> <= <op2>
        ja      <label>                 ; unsigned, if <op1> > <op2>
        jae     <label>                 ; unsigned, if <op1> >= <op2>
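Choosing between the signed and unsigned jumps matters because the same bit pattern compares differently under each interpretation. The following minimal sketch compares -1 (0xFFFFFFFF) against 1 both ways; the names signedFlag and unsignedFlag are hypothetical.

```nasm
; Sketch: the same compare, interpreted as signed and as unsigned.
; signedFlag and unsignedFlag are hypothetical names.

section .data
EXIT_SUCCESS    equ     0
SYS_exit        equ     60

signedFlag      db      0
unsignedFlag    db      0

section .text
global _start
_start:
        mov     eax, -1                 ; 0xFFFFFFFF
        cmp     eax, 1
        jl      signedLess              ; taken, as signed -1 < 1
        jmp     checkUnsigned
signedLess:
        mov     byte [signedFlag], 1

checkUnsigned:
        mov     eax, -1
        cmp     eax, 1
        jb      unsignedBelow           ; not taken, 0xFFFFFFFF is above 1
        jmp     done
unsignedBelow:
        mov     byte [unsignedFlag], 1

done:
        mov     rax, SYS_exit
        mov     rdi, EXIT_SUCCESS
        syscall
```

After execution, signedFlag is 1 and unsignedFlag is still 0.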
dq
dq
0
0
;ifcurrNum<=myMax
;skipsetnewmax
Note that the logic for the IF statement has been reversed. The compare and conditional
jump provide functionality for jump or not jump. As such, if the condition from the
original IF statement is false, the code must not be executed. Thus, when false, in order
to skip the execution, the conditional jump will jump to the target label immediately
following the code to be skipped (not executed). While there is only one line in this
example, there can be many lines of code.
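The reversed logic can be sketched as follows for an update of a running maximum, written in a high-level language as if (currNum > myMax) myMax = currNum. This is a minimal sketch that assumes unsigned quadword data; the label notNewMax and the initial value 42 are hypothetical.

```nasm
; Sketch: IF statement implemented with reversed logic.
; Assumes unsigned data; notNewMax is a hypothetical label.

section .data
EXIT_SUCCESS    equ     0
SYS_exit        equ     60

currNum         dq      42
myMax           dq      0

section .text
global _start
_start:
        mov     rax, qword [currNum]
        cmp     rax, qword [myMax]      ; if currNum <= myMax
        jbe     notNewMax               ;   skip set new max
        mov     qword [myMax], rax
notNewMax:

        mov     rax, SYS_exit
        mov     rdi, EXIT_SUCCESS
        syscall
```

The original condition is greater than, so the jump uses the opposite condition (below or equal) to skip over the assignment.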
A more complex example might be as follows:
if(x!=0){
ans=x/y;
errFlg=FALSE;
}else{
ans=0;
errFlg=TRUE;
}
The basic compare and conditional jump do not provide an IF-ELSE structure. It must
be created. Assuming the x and y variables are signed double-words that will be set
during the program execution, and the following declarations:
TRUE            equ     1
FALSE           equ     0
x               dd      0
y               dd      0
ans             dd      0
errFlg          db      FALSE
The following code could be used to implement the above IF-ELSE statement.
        cmp     dword [x], 0            ; if statement
        je      doElse
        mov     eax, dword [x]
        cdq
        idiv    dword [y]
        mov     dword [ans], eax
        mov     byte [errFlg], FALSE
        jmp     skpElse
doElse:
        mov     dword [ans], 0
        mov     byte [errFlg], TRUE
skpElse:
In this example, since the data was signed, a signed division (idiv) and the appropriate
conversion (cdq in this case) were required. It should also be noted that the edx register
was over-written even though it did not appear explicitly. If a value was previously placed
in edx (or rdx), it has been altered.
7.7.3.1 Jump Out Of Range
The target label is referred to as a short-jump. Specifically, that means the target label
must be within ±128 bytes from the conditional jump instruction. While this limit is
not typically a problem, for very large loops, the assembler may generate an error
referring to jump out-of-range. The unconditional jump (jmp) is not limited in range.
If a jump out-of-range is generated, it can be eliminated by reversing the logic and
using an unconditional jump for the long jump. For example, the following code:
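        cmp     rcx, 0
        jne     startOfLoop
might generate a jump out-of-range assembler error. Reversing the condition (a je to a
label placed immediately after an unconditional jmp startOfLoop) accomplishes the same
thing, using an unconditional jump for the long jump and adding a conditional jump to a
very close label.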
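A minimal sketch of that rewrite is shown below inside a small counting loop. The label endOfLoop and the variable count are hypothetical, and the loop body is kept short here; in practice the body would be large enough to put startOfLoop out of range.

```nasm
; Sketch: replacing an out-of-range conditional jump.
; endOfLoop and count are hypothetical names.

section .data
EXIT_SUCCESS    equ     0
SYS_exit        equ     60

count           dq      0

section .text
global _start
_start:
        mov     rcx, 5
startOfLoop:
        inc     qword [count]
        ; ... imagine a very large loop body here ...
        dec     rcx
        cmp     rcx, 0
        je      endOfLoop               ; reversed condition, nearby label
        jmp     startOfLoop             ; unconditional, no range limit
endOfLoop:

        mov     rax, SYS_exit
        mov     rdi, EXIT_SUCCESS
        syscall
```

The loop still executes five times; only the form of the jump back to the top has changed.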
Explanation
cmp <op1>, <op2>
        Examples:
        cmp     rax, 5
        cmp     ecx, edx
        cmp     ax, word [wNum]

je <label>
        Examples:
        cmp     rax, 5
        je      wasEqual

jne <label>
        Examples:
        cmp     rax, 5
        jne     wasNotEqual

jl <label>
        Examples:
        cmp     rax, 5
        jl      wasLess

jle <label>
        Examples:
        cmp     rax, 5
        jle     wasLessOrEqual

jg <label>
        Examples:
        cmp     rax, 5
        jg      wasGreater

jge <label>
        Examples:
        cmp     rax, 5
        jge     wasGreaterOrEqual

jb <label>
        Examples:
        cmp     rax, 5
        jb      wasLess

jbe <label>
        Examples:
        cmp     rax, 5
        jbe     wasLessOrEqual

ja <label>
        Examples:
        cmp     rax, 5
        ja      wasGreater

jae <label>
        Examples:
        cmp     rax, 5
        jae     wasGreaterOrEqual
7.7.4 Iteration
The basic control instructions outlined provide a means to iterate or loop.
A basic loop can be implemented consisting of a counter which is checked at either the
bottom or top of a loop with a compare and conditional jump.
For example, assuming the following declarations:
maxN            dq      30
sum             dq      0
The following code would sum the first maxN odd integers (1, 3, 5, ...):
        mov     rcx, qword [maxN]       ; loop counter
        mov     rax, 1                  ; odd integer counter
sumLoop:
        add     qword [sum], rax        ; sum current odd integer
        add     rax, 2                  ; set next odd integer
        dec     rcx                     ; decrement loop counter
        cmp     rcx, 0
        jne     sumLoop
This is just one of many different ways to accomplish the odd integer summation task.
In this example, rcx was used as a loop counter and rax was used for the current odd
integer (appropriately initialized to 1 and incremented by 2).
The loop instruction will perform the decrement of the rcx register, a comparison to 0,
and a jump to the specified label if rcx ≠ 0. The label must be defined exactly once.
As such, the loop instruction provides the same functionality as the three lines of code
from the previous example program. The following sets of code are equivalent;
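Code Set 1                      Code Set 2
loop    <label>                 dec     rcx
                                cmp     rcx, 0
                                jne     <label>

For example, the previous program can be written as follows:
        mov     rcx, qword [maxN]       ; loop counter
        mov     rax, 1                  ; odd integer counter
sumLoop:
        add     qword [sum], rax        ; sum current odd integer
        add     rax, 2                  ; set next odd integer
        loop    sumLoop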
Both code examples produce the exact same result in the same manner.
Since the rcx register is decremented and then checked, forgetting to set the rcx register
could result in looping an unknown number of times. This is likely to generate an error
during the loop execution, which can be very misleading when debugging.
The loop instruction can be useful when coding, but it is limited to the rcx register and
to counting down. If nesting loops are required, the use of a loop instruction for both
the inner and outer loop can cause a conflict unless additional actions are taken (i.e.,
save/restore rcx register as required for inner loop).
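One way to handle nested loops is to save the outer counter on the stack while the inner loop uses rcx. The following is a minimal sketch; the labels outerLoop and innerLoop and the variable total are hypothetical.

```nasm
; Sketch: nested loops that both use the loop instruction.
; outerLoop, innerLoop, and total are hypothetical names.

section .data
EXIT_SUCCESS    equ     0
SYS_exit        equ     60

total           dq      0

section .text
global _start
_start:
        mov     rcx, 3                  ; outer loop count
outerLoop:
        push    rcx                     ; save outer counter

        mov     rcx, 4                  ; inner loop count
innerLoop:
        inc     qword [total]
        loop    innerLoop

        pop     rcx                     ; restore outer counter
        loop    outerLoop

        mov     rax, SYS_exit
        mov     rdi, EXIT_SUCCESS
        syscall
```

The inner body executes 3 * 4 = 12 times, so total is 12 when the program ends. Without the push and pop, the inner loop would leave rcx at 0 and the outer loop instruction would decrement it to a very large value.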
While some of the programming examples in this text will use the loop instruction, it is
not required.
Explanation
loop <label>
        Examples:
        loop    startLoop
        loop    ifDone
        loop    sumLoop
1^2 + 2^2 + ... + 10^2 = 385
The example main initializes the n value to 10 to match the above example.
;  Simple example program to compute the
;  sum of squares from 1 to N.

; **********************************************
;  Data declarations

section .data

; -----
;  Define constants

SUCCESS         equ     0               ; Successful operation
SYS_exit        equ     60              ; call code for terminate

;  Define Data.

n               dd      10
sumOfSquares    dq      0

; **********************************************

section .text
global _start
_start:

        mov     rbx, 1                  ; i
        mov     ecx, dword [n]
sumLoop:
        mov     rax, rbx                ; get i
        mul     rax                     ; i^2
        add     qword [sumOfSquares], rax
        inc     rbx
        loop    sumLoop

; -----
;  Done, terminate program.

last:
        mov     rax, SYS_exit           ; call code for exit
        mov     rdi, SUCCESS            ; exit with success
        syscall
The debugger can be used to examine the results and verify correct execution of the
program.
7.9 Exercises
Below are some quiz questions and suggested projects based on this chapter.
1.  mov rax, 54
2.  mov ax, 54
3.  mov al, 354
4.  mov rax, r11
5.  mov rax, r11d
6.  mov 54, ecx
7.  mov rax, qword [qVar]
8.  mov rax, qword [bVar]
9.  mov rax, [qVar]
10. mov rax, qVar
11. mov eax, dword [bVar]
12. mov qword [qVar2], qword [qVar1]
13. mov qword [bVar2], qword [qVar1]
14. mov r15, 54
15. mov r16, 54
16. mov r11b, 54
2) Explain what each of the following instructions does.
1.  movzx rsi, byte [bVar1]
2.  movsx rsi, byte [bVar1]
1.  add [dVar], eax
2.  add dword [dVar], 1
rax,9
rbx,2
rbx,rax
What would be in the rax and rbx registers after execution? Show answer in
hex, full register size.
9) Given the following code fragment:
        mov     rax, 9
        mov     rbx, 2
        sub     rax, rbx
What would be in the rax and rbx registers after execution? Show answer in
hex, full register size.
10) Given the following code fragment:
        mov     rax, 9
        mov     rbx, 2
        sub     rbx, rax
What would be in the rax and rbx registers after execution? Show answer in
hex, full register size.
What would be in the rax and rdx registers after execution? Show answer in
hex, full register size.
12) Given the following code fragment:
mov rax,5
cqo
mov rbx,3
idiv rbx
What would be in the rax and rdx registers after execution? Show answer in
hex, full register size.
13) Given the following code fragment:
mov rax,11
cqo
mov rbx,4
idiv rbx
What would be in the rax and rdx registers after execution? Show answer in
hex, full register size.
14) Explain why each of the following statements will not work.
1.  mov 42, eax
2.  div
3.  mov dword [num1], dword [num1]
4.  mov dword [ax], 800
15) Explain why the following code fragment will not work correctly.
mov eax,500
mov ebx,10
idiv ebx
eax,500
ebx,10
ebx
17) Explain why the following code fragment will not work correctly.
        mov     ax, 500
        cwd
        mov     bx, 10
        idiv    bx
        mov     dword [ans], eax
18) Under what circumstances can the three operand multiply be used?
1.  bAns1 = bNum1 + bNum2
2.  bAns2 = bNum1 + bNum3
3.  bAns3 = bNum3 + bNum4
4.  bAns6 = bNum1 - bNum2
5.  bAns7 = bNum1 - bNum3
6.  bAns8 = bNum2 - bNum4
7.  wAns11 = bNum1 * bNum3
8.  wAns12 = bNum2 * bNum2
9.  wAns13 = bNum2 * bNum4
10. bAns16 = bNum1 / bNum2
11. bAns17 = bNum3 / bNum4
1.  wAns1 = wNum1 + wNum2
2.  wAns2 = wNum1 + wNum3
3.  wAns3 = wNum3 + wNum4
4.  wAns6 = wNum1 - wNum2
5.  wAns7 = wNum1 - wNum3
6.  wAns8 = wNum2 - wNum4
7.  dAns11 = wNum1 * wNum3
8.  dAns12 = wNum2 * wNum2
9.  dAns13 = wNum2 * wNum4
10. wAns16 = wNum1 / wNum2
11. wAns17 = wNum3 / wNum4
12. wAns18 = dNum1 / wNum4
13. wRem18 = dNum1 % wNum4
Use the debugger to execute the program and display the final results. Create a
debugger input file to show the results in both decimal and hexadecimal.
4) Repeat the previous program using signed values and signed operations. Use the
debugger to execute the program and display the final results. Create a debugger
input file to show the results in both decimal and hexadecimal.
1.  dAns1 = dNum1 + dNum2
2.  dAns2 = dNum1 + dNum3
3.  dAns3 = dNum3 + dNum4
4.  dAns6 = dNum1 - dNum2
5.  dAns7 = dNum1 - dNum3
6.  dAns8 = dNum2 - dNum4
7.  qAns11 = dNum1 * dNum3
8.  qAns12 = dNum2 * dNum2
9.  qAns13 = dNum2 * dNum4
10. dAns16 = dNum1 / dNum2
11. dAns17 = dNum3 / dNum4
12. dAns18 = qNum1 / dNum4
13. dRem18 = qNum1 % dNum4
Use the debugger to execute the program and display the final results. Create a
debugger input file to show the results in both decimal and hexadecimal.
6) Repeat the previous program using signed values and signed operations. Use the
debugger to execute the program and display the final results. Create a debugger
input file to show the results in both decimal and hexadecimal.
7) Implement the example program to compute the sum of squares from 1 to n.
Use the debugger to execute the program and display the final results. Create a
debugger input file to show the results in both decimal and hexadecimal.
8) Create a program to compute the square of the sum from 1 to n. Specifically,
compute the sum of integers from 1 to n and then square the value. Use the
debugger to execute the program and display the final results. Create a debugger
input file to show the results in both decimal and hexadecimal.
fibonacci(n) = n                                        if n = 0 or n = 1
fibonacci(n) = fibonacci(n-2) + fibonacci(n-1)          if n ≥ 2
Use the debugger to execute the program and display the final results. Test the
program for various values of n. Create a debugger input file to show the results
in both decimal and hexadecimal.
Chapter
8
Register
Immediate
Memory
Each of these modes is described with examples in the following sections. Additionally,
a simple example for accessing an array is presented.
        mov     rax, qword [var1]       ; value of var1 in rax
        mov     rax, var1               ; address of var1 in rax
Since omitting the brackets is not an error, the assembler will not generate error
messages or warnings.
        mov     eax, [rbx]
moves a double-word from memory. However, for some instructions the size can be
ambiguous. For example,
        inc     [rbx]                   ; error
is ambiguous since it is not clear if the memory being accessed is a byte, word,
double-word, or quadword. In such a case, the operand size must be specified with the
byte, word, dword, or qword size qualifier. For example,
        inc     byte [rbx]
        inc     word [rbx]
        inc     dword [rbx]
each instruction requires the size specification in order to be clear and legal.
For
eax,123
The destination operand, eax , is register mode addressing. The 123 is immediate mode
addressing. It should be clear that the destination operand in this example can not be
immediate mode.
        mov     rax, qword [qNum]
Will access the memory location of the variable qNum and retrieve the value stored
there. This requires that the CPU wait until the value is retrieved before completing the
operation and thus might take slightly longer to complete than a similar operation using
an immediate value.
When accessing arrays, a more generalized method is required. Specifically, an address
can be placed in a register and indirection performed using the register (instead of the
variable name).
For example, assuming the following declaration:
lst
dd
101,103,105,107
The decimal value of 101 is 0x00000065 in hex. The memory picture would be as
follows:
Value   Address     Offset      Index
00      0x6000ef    lst + 15
00      0x6000ee    lst + 14
00      0x6000ed    lst + 13
6b      0x6000ec    lst + 12    lst[3]
00      0x6000eb    lst + 11
00      0x6000ea    lst + 10
00      0x6000e9    lst + 9
69      0x6000e8    lst + 8     lst[2]
00      0x6000e7    lst + 7
00      0x6000e6    lst + 6
00      0x6000e5    lst + 5
67      0x6000e4    lst + 4     lst[1]
00      0x6000e3    lst + 3
00      0x6000e2    lst + 2
00      0x6000e1    lst + 1
65      0x6000e0    lst + 0     lst[0]
        mov     eax, dword [list]

        mov     rbx, list
        mov     eax, dword [rbx]
In this example, the starting address, or base address, of the list is placed in rbx (first
line) and then the value at that address is accessed and placed in the eax register (second
line). This allows us to easily access other elements in the array.
Recall that memory is byte addressable, which means that each address is one byte of
information. A double-word variable is 32-bits or 4 bytes so each array element uses 4
bytes of memory. As such, the next element (103) is the starting address (lst) plus 4, and
the next element (105) is the starting address (lst) plus 8.
The offset increases by 4 for each successive element. A list of bytes would increase by
1, a list of words would increase by 2, a list of double-words would increase by 4, and a
list of quadwords would increase by 8.
The offset is the amount added to the base address. The index is the array element
number as used in a high level language.
There are several ways to access the array elements. One is to use a base address and add
a displacement. For example, given the initializations:
        mov     rbx, list
        mov     rsi, 8
Each of the following instructions accesses the third element (105 in the above list).
        mov     eax, dword [list+8]
        mov     eax, dword [rbx+8]
        mov     eax, dword [rbx+rsi]
Where baseAddr is a register or a variable name. The indexReg must be a register. The
scaleValue is an immediate value of 1, 2, 4, 8 (1 is legal, but not useful). The
displacement must be an immediate value. The total represents a 64-bit address.
Elements may be used in any combination, but must be legal and result in a valid
address.
Some examples of memory addressing for the source operand are as follows:
        mov     eax, dword [var1]
        mov     rax, qword [rbx+rsi]
        mov     ax, word [lst+4]
        mov     bx, word [lst+rdx+2]
        mov     rcx, qword [lst+(rsi*8)]
        mov     al, byte [buff1+rcx]
        mov     eax, dword [rbx+(rsi*4)+16]
For example, to access the 3rd element of the previously defined double-word array
(which is index 2 since indexes start at 0):
        mov     rsi, 2                  ; index=2
        mov     eax, dword [list+rsi*4] ; get lst[2]
Since addresses are always qword (on a 64-bit architecture), a 64-bit register is used for
the memory mode addressing (even when accessing double-word values). This allows a
register to be used more like an array index (from a high level language).
For example, the memory operand, [lst+rsi*4], is analogous to lst[rsi] from a high level
language. The rsi register is multiplied by the data size (4 in this example since each
element is 4 bytes).
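The same pattern applies to other element sizes by changing the scale value. The following minimal sketch reads index 2 from a word array (scale 2) and from a quadword array (scale 8); the names wLst, qLst, wVal, and qVal are hypothetical.

```nasm
; Sketch: scaled indexing for word and quadword arrays.
; wLst, qLst, wVal, and qVal are hypothetical names.

section .data
EXIT_SUCCESS    equ     0
SYS_exit        equ     60

wLst            dw      10, 20, 30, 40
qLst            dq      100, 200, 300, 400
wVal            dw      0
qVal            dq      0

section .text
global _start
_start:
        mov     rsi, 2                          ; index=2

        mov     ax, word [wLst+rsi*2]           ; get wLst[2], ax = 30
        mov     word [wVal], ax

        mov     rbx, qword [qLst+rsi*8]         ; get qLst[2], rbx = 300
        mov     qword [qVal], rbx

        mov     rax, SYS_exit
        mov     rdi, EXIT_SUCCESS
        syscall
```

The index register holds the same value in both accesses; only the scale value changes to match the element size.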
EXIT_SUCCESS    equ     0               ; successful operation
SYS_exit        equ     60              ; call code for terminate

; -----
;  Define Data.

section .data
lst             dd      1002, 1004, 1006, 1008, 10010
len             dd      5
sum             dd      0

; ********************************************************

section .text
global _start
_start:

; -----
;  Summation loop.

        mov     ecx, dword [len]                ; get length value
        mov     rsi, 0                          ; index=0
sumLoop:
        mov     eax, dword [lst+(rsi*4)]        ; get lst[rsi]
        add     dword [sum], eax                ; update sum
        inc     rsi                             ; next item
        loop    sumLoop

; -----
;  Done, terminate program.

last:
        mov     rax, SYS_exit                   ; call code for exit
        mov     rdi, EXIT_SUCCESS               ; exit with success
        syscall
The ()'s within the [ ]'s are not required and added only for clarity. As such, the [lst+
(rsi*4)], is exactly the same as [lst+rsi*4].
equ
equ
0
60
;successfuloperation
;callcodeforterminate
;
;ProvidedData
aSides
db
db
db
db
db
db
db
db
db
10,14,13,37,54
31,13,20,61,36
14,53,44,19,42
27,41,53,62,10
19,18,14,10,15
15,11,22,33,70
15,23,15,63,26
24,33,10,61,15
14,34,13,71,81
dw
dw
dw
dw
dw
dw
dw
dw
dw
dw
1233,1114,1773,1131,1675
1164,1973,1974,1123,1156
1344,1752,1973,1142,1456
1165,1754,1273,1175,1546
1153,1673,1453,1567,1535
1144,1579,1764,1567,1334
1456,1563,1564,1753,1165
1646,1862,1457,1167,1534
1867,1864,1757,1755,1453
1863,1673,1275,1756,1353
heights
dd
dd
dd
dd
dd
dd
dd
dd
dd
dd
14145,11134,15123,15123,14123
18454,15454,12156,12164,12542
18453,18453,11184,15142,12354
14564,14134,12156,12344,13142
11153,18543,17156,12352,15434
18455,14134,12123,15324,13453
11134,14134,15156,15234,17142
19567,14134,12134,17546,16123
11134,14134,14576,15457,17142
13153,11153,12184,14142,17134
length
dd 50
taMin
taMax
taSum
taAve
dd
dd
dd
dd
0
0
0
0
volMin
dd 0
volMax
dd 0
volSum
dd 0
volAve
dd 0
;
;Additionalvariables
ddTwo
ddThree
dd 2
dd 3
resd
resd
50
50
;*************************************************
section .text
global _start
_start:
;  Calculate volume, lateral and total surface areas

        mov     ecx, dword [length]             ; length counter
        mov     rsi, 0                          ; index

calculationLoop:

;  totalAreas(n) = aSides(n) * (2*aSides(n)*sSides(n))

        movzx   r8d, byte [aSides+rsi]          ; aSides[i]
        movzx   r9d, word [sSides+rsi*2]        ; sSides[i]
        mov     eax, r8d
        mul     dword [ddTwo]
        mul     r9d
        mul     r8d
        mov     dword [totalAreas+rsi*4], eax

;  volumes(n) = (aSides(n)^2 * heights(n)) / 3

        movzx   eax, byte [aSides+rsi]
        mul     eax
        mul     dword [heights+rsi*4]           ; heights[i]
        div     dword [ddThree]
        mov     dword [volumes+rsi*4], eax

        inc     rsi
        loop    calculationLoop
; -----
;  Find min, max, sum, and average for the total
;  areas and volumes.

        mov     eax, dword [totalAreas]
        mov     dword [taMin], eax
        mov     dword [taMax], eax

        mov     eax, dword [volumes]
        mov     dword [volMin], eax
        mov     dword [volMax], eax

        mov     dword [taSum], 0
        mov     dword [volSum], 0

        mov     ecx, dword [length]
        mov     rsi, 0

statsLoop:
        mov     eax, dword [totalAreas+rsi*4]
        add     dword [taSum], eax

        cmp     eax, dword [taMin]
        jae     notNewTaMin
        mov     dword [taMin], eax
notNewTaMin:
        cmp     eax, dword [taMax]
        jbe     notNewTaMax
        mov     dword [taMax], eax
notNewTaMax:
        mov     eax, dword [volumes+rsi*4]
        add     dword [volSum], eax

        cmp     eax, dword [volMin]
        jae     notNewVolMin
        mov     dword [volMin], eax
notNewVolMin:
        cmp     eax, dword [volMax]
        jbe     notNewVolMax
        mov     dword [volMax], eax
notNewVolMax:
        inc     rsi
        loop    statsLoop
; -----
;  Calculate averages.

        mov     eax, dword [taSum]
        mov     edx, 0
        div     dword [length]
        mov     dword [taAve], eax

        mov     eax, dword [volSum]
        mov     edx, 0
        div     dword [length]
        mov     dword [volAve], eax

; -----
;  Done, terminate program.

last:
        mov     rax, SYS_exit                   ; call code for exit
        mov     rdi, EXIT_SUCCESS               ; exit with success
        syscall
This is one example. There are multiple other valid approaches to solving this problem.
8.4 Exercises
Below are some quiz questions and suggested projects based on this chapter.
1.  mov rdx, qword [qVar1]
2.  mov rdx, qVar1
2) What is the address mode of the source operand for each of the instructions listed
below. Respond with Register, Immediate, Memory, or Illegal Instruction.
Note,   mov     <dest>, <source>
        mov     ebx, 14
        mov     ecx, dword [rbx]
        mov     byte [rbx+4], 10
        mov     10, rcx
        mov     dl, ah
        mov     ax, word [rsi+4]
        mov     cx, word [rbx+rsi]
        mov     ax, byte [rbx]
rax,3
rbx,ans1
eax,dword[rbx]
What would be in the eax register after execution? Show answer in hex, full
register size.
dd2,3,4,5,6,7
rbx,list1
rbx,4
eax,dword[rbx]
edx,dword[list1]
What would be in the eax and edx registers after execution? Show answer in
hex, full register size.
5) Given the following variable declarations and code fragment:
lst     dd      2, 3, 5, 7, 9

        mov     rsi, 4
        mov     eax, 1
        mov     rcx, 2
lp:     add     eax, dword [lst+rsi]
        add     rsi, 4
        loop    lp
        mov     ebx, dword [lst]
What would be in the eax, ebx, rcx, and rsi registers after execution? Show
answer in hex, full register size. Note, pay close attention to the register sizes
(32-bit vs 64-bit).
6) Given the following variable declarations and code fragment:
list    dd      8, 6, 4, 2, 1, 0

        mov     rbx, list
        mov     rsi, 1
        mov     rcx, 3
        mov     edx, dword [rbx]
lp:     mov     eax, dword [list+rsi*4]
        inc     rsi
        loop    lp
        imul    dword [list]
list    dd      8, 7, 6, 5, 4, 3, 2, 1, 0

        mov     rbx, list
        mov     rsi, 0
        mov     rcx, 3
        mov     edx, dword [rbx]
lp:     add     eax, dword [list+rsi*4]
        inc     rsi
        loop    lp
        cwd
        idiv    dword [list]
What would be in the eax, edx, rcx, and rsi registers after execution? Show
answer in hex, full register size. Note, pay close attention to the register sizes
(32-bit vs 64-bit).
8) Given the following variable declarations and code fragment:
list    dd      2, 7, 4, 5, 6, 3

        mov     rbx, list
        mov     rsi, 1
        mov     rcx, 2
        mov     eax, 0
        mov     edx, dword [rbx+4]
lp:     add     eax, dword [rbx+rsi*4]
        add     rsi, 2
        loop    lp
        imul    dword [rbx]
What would be in the eax, edx, rcx, and rsi registers after execution? Show
answer in hex, full register size. Note, pay close attention to the register sizes
(32-bit vs 64-bit).
Use the debugger to execute the program and display the final results. Create a
debugger input file to show the results.
Chapter
9
a[0]
a[1]
a[2]
a[0]
a[1]
a[2]
The initial push will push the 7, followed by the 19, and finally the 37. Since the stack
is last-in, first-out, the first item popped off the stack will be the last item pushed, or 37
in this example. The 37 is placed in the first element of the array (over-writing the 7).
Operation       Stack (top first)       Array
push a[0]       7                       a = {7, 19, 37}
push a[1]       19, 7                   a = {7, 19, 37}
push a[2]       37, 19, 7               a = {7, 19, 37}
pop a[0]        19, 7                   a = {37, 19, 37}
pop a[1]        7                       a = {37, 19, 37}
pop a[2]        empty                   a = {37, 19, 7}
The following sections provide more detail regarding the stack implementation and
applicable stack operations and instructions.
        push    <operand>
        pop     <operand>
The operand can be a register or memory. A push may also use an immediate value (for
example, the address of a variable), but a pop may not. In general, push and pop
operations will push the architecture size. Since the architecture is 64-bit, we will
push and pop quadwords.
The stack is implemented in reverse in memory. Refer to the following sections for a
detailed explanation of why.
Explanation
push <op64>
        push    rax
        push    qword [qVal]            ; value
        push    qVal                    ; address

pop <op64>
        pop     rax
        pop     qword [qVal]
        pop     rsi
If more than 64-bits must be pushed, multiple push operations would be required. While
it is possible to push and pop operands less than 64-bits, it is not recommended.
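As a small illustration of the last-in, first-out behavior, the following minimal sketch swaps two registers by pushing both and popping them in the same register order. The values 7 and 19 are arbitrary.

```nasm
; Sketch: swapping two registers using the stack (LIFO order).

section .data
EXIT_SUCCESS    equ     0
SYS_exit        equ     60

section .text
global _start
_start:
        mov     rax, 7
        mov     rbx, 19

        push    rax                     ; stack: 7
        push    rbx                     ; stack: 19, 7 (19 on top)

        pop     rax                     ; rax = 19 (last item pushed)
        pop     rbx                     ; rbx = 7

        mov     rax, SYS_exit
        mov     rdi, EXIT_SUCCESS
        syscall
```

Every push is matched by a pop, so the stack is back to its starting state before the program exits.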
A more complete list of the instructions is located in Appendix B.
high memory
stack
.
.
. . . available memory . . .
.
.
heap
uninitialized data
data
text (code)
low memory
reserved
Process A
Process B
stack
stack
heap
heap
bss
bss
data
data
text (code)
text (code)
reserved
reserved
        mov     rax, 6700               ; 6700₁₀ = 00001A2C₁₆
        push    rax
        mov     rax, 31                 ; 31₁₀ = 0000001F₁₆
        push    rax
Would produce the following stack configuration (where each box is a byte):
...
00
00
00
00
00
00
1A
2C
00
00
00
00
00
00
00
rsp
1F
...
...
...
section .data

EXIT_SUCCESS    equ     0               ; successful operation
SYS_exit        equ     60              ; call code for terminate

; -----
;  Define Data.

numbers         dq      121, 122, 123, 124, 125
len             dq      5

; ****************************************************

section .text
global _start
_start:

        mov     rcx, qword [len]
        mov     rbx, numbers
        mov     r12, 0
        mov     rax, 0
pushLoop:
        push    qword [rbx+r12*8]
        inc     r12
        loop    pushLoop

; -----
;  All the numbers are on stack (in reverse order).
;  Loop to get them back off. Put them back into
;  the original list...

        mov     rcx, qword [len]
        mov     rbx, numbers
        mov     r12, 0
popLoop:
        pop     rax
        mov     qword [rbx+r12*8], rax
        inc     r12
        loop    popLoop

; -----
;  Done, terminate program.

last:
        mov     rax, SYS_exit           ; call code for exit
        mov     rdi, EXIT_SUCCESS       ; exit with success
        syscall
There are other ways to accomplish this function (reversing a list), however this is
meant to demonstrate the stack operations.
9.5 Exercises
Below are some quiz questions and suggested projects based on this chapter.
r10,1
r11,2
r12,3
r10
r11
r12
r10
r11
r12
What would be in the r10 , r11, and r12 registers after execution? Show answer
in hex, full register size.
5) Given the following variable declarations and code fragment:
lst     dd      1, 3, 5, 7, 9

        mov     rsi, 0
        mov     rcx, 5
lp1:    push    dword [lst+rsi]
        inc     rsi
        loop    lp1
        mov     rsi, 0
        mov     rcx, 5
lp2:    pop     dword [lst+rsi]
        inc     rsi
        loop    lp2
        mov     ebx, dword [lst]
Chapter
10
To help demonstrate this process in detail, these steps will be applied to a simple
example problem in the following sections.
10.1
Character       "1"     "4"     "9"     "8"     NULL
Decimal         49      52      57      56      0
Hex             0x31    0x34    0x39    0x38    0x0
The goal is to convert the single integer number into the appropriate series of characters
to form a NULL terminated string.
10.2
The algorithm is the name for the unambiguous, ordered sequence of steps involved in
solving the problem. Once the problem is understood, a series of steps can be
developed to solve that problem. There can be, and usually are, multiple correct solutions
to a given problem.
The process for creating an algorithm can be different for different people. In general,
some time should be devoted to thinking about possible solutions. This may involve
working on some possible solutions using a scratch piece of paper. Once an approach is
selected, that solution can be developed into an algorithm. The algorithm should be
written down, reviewed, and refined. The algorithm is then used as the outline of the
program.
For example, we will consider the integer to ASCII conversion problem outlined in the
previous section. To convert a single digit integer (0-9) into a character, 48₁₀ (or "0" or
0x30) can be added to the integer. For example, 0x01 + 0x30 is 0x31 which is the
ASCII value of "1". It should be obvious that this trick will only work for single digit
numbers (0-9).
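The single digit conversion can be sketched as follows. This is a minimal example for one digit only; the names digit and chr are hypothetical.

```nasm
; Sketch: convert a single digit integer (0-9) to its ASCII character.
; digit and chr are hypothetical names.

section .data
EXIT_SUCCESS    equ     0
SYS_exit        equ     60

digit           db      7
chr             db      0

section .text
global _start
_start:
        mov     al, byte [digit]        ; al = 0x07
        add     al, "0"                 ; al = 0x07 + 0x30 = 0x37 ("7")
        mov     byte [chr], al

        mov     rax, SYS_exit
        mov     rdi, EXIT_SUCCESS
        syscall
```

The assembler treats "0" as the value 0x30, so the addition can be written with the character itself instead of the numeric constant.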
In order to convert a larger integer (≥ 10) into a string, the integer must be broken into
its component digits. For example, 123₁₀ (0x7B) would be 1, 2, and 3. This can be
accomplished by repeatedly performing integer division by 10 until a 0 result is
obtained.
        123 / 10 = 12   remainder 3
         12 / 10 = 1    remainder 2
          1 / 10 = 0    remainder 1
As can be seen, the remainder represents the individual digits. However, they are
obtained in reverse order. To address this, the program can push the remainder and,
when done dividing, pop the remainders and convert to ASCII and store in a string
(which is an array of bytes).
This process forms the basis for the algorithm. It should be noted, that there are many
ways to develop this algorithm. One such approach is shown as follows.
;  Part A - Successive division
;       digitCount = 0
;       get integer
;  divideLoop:
;       divide number by 10
;       push remainder
;       increment digitCount
;       if (result > 0) goto divideLoop

;  Part B - Convert remainders and store
;       get starting address of string (array of bytes)
;       idx = 0
;  popLoop:
;       pop intDigit
;       charDigit = intDigit + "0" (0x30)
;       string[idx] = charDigit
;       increment idx
;       decrement digitCount
;       if (digitCount > 0) goto popLoop
;       string[idx] = NULL
10.3
Based on the algorithm, a program can be developed and implemented. The algorithm
is expanded and the code added based on the steps outlined in the algorithm. This allows
the programmer to focus on the specific issues for the current section being coded
including the data types and data sizes. This example addresses only unsigned data so
the unsigned divide (DIV, not IDIV) is used. Since the integer is a double-word, it must
be converted into a quadword for the division. However, the result and the remainder
after division will also be double-words. Since the stack is quadwords, the entire
quadword register will be pushed. The upper order portion of the register will not be
accessed, so its contents are not relevant.
One possible implementation of the algorithm is as follows:
; Simple example program to convert an
; integer into an ASCII string.

; *********************************************************
; Data declarations

section .data

; -----
; Define constants

NULL            equ     0
EXIT_SUCCESS    equ     0               ; successful operation
SYS_exit        equ     60              ; code for terminate

; -----
; Define Data.

intNum          dd      1498

section .bss
strNum          resb    10

; *********************************************************

section .text
global _start
_start:

; Convert an integer to an ASCII string.
; -----
; Part A - Successive division

    mov     eax, dword [intNum]         ; get integer
    mov     rcx, 0                      ; digitCount = 0
    mov     ebx, 10                     ; set for dividing by 10

divideLoop:
    mov     edx, 0
    div     ebx                         ; divide number by 10

    push    rdx                         ; push remainder
    inc     rcx                         ; increment digitCount

    cmp     eax, 0                      ; if (result > 0)
    jne     divideLoop                  ;    goto divideLoop

; -----
; Part B - Convert remainders and store

    mov     rbx, strNum                 ; get addr of string
    mov     rdi, 0                      ; idx = 0

popLoop:
    pop     rax                         ; pop intDigit

    add     al, "0"                     ; char = int + "0"

    mov     byte [rbx+rdi], al          ; string[idx] = char
    inc     rdi                         ; increment idx
    loop    popLoop                     ; if (digitCount > 0)
                                        ;    goto popLoop

    mov     byte [rbx+rdi], NULL        ; string[idx] = NULL

; -----
; Done, terminate program.

last:
    mov     rax, SYS_exit               ; call code for exit
    mov     rdi, EXIT_SUCCESS           ; exit with success
    syscall
There are many different valid implementations for this algorithm. The program should
be assembled to address any typos or syntax errors.
10.4
Once the program is written, testing should be performed to ensure that the program
works. The testing will be based on the specific parameters of the program.
In this case, the program can be executed using the debugger and stopped near the end
of the program (e.g., at the label last in this example). After starting the debugger
with ddd, the commands b last and run can be entered, which will run the program
up to, but not executing, the line referenced by the label last. The resulting string,
strNum, can be viewed in the debugger with x/s &strNum, which will display the string
address and the contents, which should be "1498". For example:
(gdb) x/s &strNum
0x600104:    "1498"
If the string is not displayed properly, it might be worth checking each character of the
five (5) byte array with the x/5cb &strNum debugger command. The output will
show the address of the string followed by both the decimal and ASCII representation
of each byte.
10.5
Error Terminology
In case the program does not work, it helps to understand some basic terminology about
where or what the error might be. Using the correct terminology ensures that you can
communicate effectively about the problem with others.
10.6
Exercises
Below are some quiz questions and suggested projects based on this chapter.
Chapter
11
11.0 Macros
An assembly language macro is a predefined set of instructions that can easily be
inserted wherever needed. Once defined, the macro can be used as many times as
necessary. It is useful when the same set of code must be utilized numerous times. A
macro can be useful to reduce the amount of coding, streamline programs, and reduce
errors from repetitive coding.
The assembler contains a powerful macro processor, which supports conditional
assembly, multi-level file inclusion, two forms of macros (single-line and multi-line), and a 'context stack' mechanism for extra macro power.
Before using a macro, it must be defined. Macro definitions should be placed in the
source file before the data and code sections. The macro is used in the text (code)
section. The following sections will present a detailed example with the definition and
use.
11.1
Single-Line Macros
There are two key types of macros: single-line macros and multi-line macros. Each of
these is described in the following sections.
Single-line macros are defined using the %define directive. The definitions work in a
similar way to C/C++; so you can do things like:
%define mulby4(x) shl x, 2
and then use the macro by entering:
mulby4(rax)
in the source, which will multiply the contents of the rax register by 4 (via shifting left
two bits).
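Since single-line macros are described as working much like those in C/C++, a C counterpart may make the text substitution easier to picture. This is only an illustrative sketch; the names MULBY4 and times_four are hypothetical and do not appear in the text.

```c
/* C analogue of: %define mulby4(x) shl x, 2
   The preprocessor pastes the argument into the body before compilation. */
#define MULBY4(x) ((x) << 2)

unsigned int times_four(unsigned int value)
{
    return MULBY4(value);    /* expands to ((value) << 2) */
}
```

As with the assembler version, no call is made at run time; the shift is placed directly at the point of use.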
11.2
Multi-Line Macros
Multi-line macros can include a varying number of lines (including one). The multi-line macros are more useful and the following sections will focus primarily on multi-line macros.
The arguments can be referenced within the macro by %<number>, with %1 being the
first argument, %2 the second argument, and so forth.
In order to use labels, the label names within the macro must be prefixed with
a %%.
This will ensure that calling the same macro multiple times will use a different label
each time. For example, a macro definition for the absolute value function would be as
follows:
%macro abs 1
    cmp     %1, 0
    jge     %%done
    neg     %1
%%done:
%endmacro
To invoke the macro, it can be used as follows:
    mov     eax, -3
    abs     eax
    abs     qword [qVar]
The list file will display the code as follows (for the first invocation):
    27 00000000 B8FDFFFFFF             mov eax, -3
    28                                 abs eax
    29 00000005 3D00000000       <1>   cmp %1, 0
    30 0000000A 7D02             <1>   jge %%done
    31 0000000C F7D8             <1>   neg %1
    32                           <1>   %%done:
The macro will be copied from the definition into the code, with the appropriate
arguments replaced in the body of the macro, each time it is used. The <1> indicates
code copied from a macro definition. In both cases, the %1 argument was replaced with
the given argument; eax in this example.
Macros use more memory, but do not require the overhead for transfer of control (like
procedures).
11.3
Macro Example
The following example program demonstrates the definition and use of a simple macro.
; Example Program to demonstrate a simple macro

; **************************************************
; Define the macro
;    called with three arguments:
;        aver <lst>, <len>, <ave>

%macro aver 3
    mov     eax, 0
    mov     ecx, dword [%2]             ; length
    mov     r12, 0
    lea     rbx, [%1]

%%sumLoop:
    add     eax, dword [rbx+r12*4]      ; get list[n]
    inc     r12
    loop    %%sumLoop

    cdq
    idiv    dword [%2]
    mov     dword [%3], eax

%endmacro

; **************************************************
; Data declarations

section .data

; -----
; Define constants

EXIT_SUCCESS    equ     0               ; success code
SYS_exit        equ     60              ; code for terminate

; -----
; Define Data.

section .data
list1           dd      4, 5, 2, 3, 1
len1            dd      5
ave1            dd      0

list2           dd      2, 6, 3, 2, 1, 8, 19
len2            dd      7
ave2            dd      0

; **************************************************

section .text
global _start
_start:

; -----
; Use the macro in the program

    aver    list1, len1, ave1           ; 1st, data set 1

    aver    list2, len2, ave2           ; 2nd, data set 2

; -----
; Done, terminate program.

last:
    mov     rax, SYS_exit               ; exit
    mov     rdi, EXIT_SUCCESS           ; success
    syscall
In this example, the macro is invoked twice. Each time the macro is used, it is copied
(from the definition) into the text section. As such, macros typically use more
memory.
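The same copy-on-use behavior can be seen with a multi-statement C macro. The sketch below is a hypothetical C analogue of the aver macro, written under the assumption that the lists hold the values shown in the data section; the names AVER and compute_averages are illustrative only.

```c
/* Hypothetical C analogue of the aver macro. The body is pasted in at
   every use, so two uses produce two separate copies of the loop. */
#define AVER(lst, len, ave)                          \
    do {                                             \
        int sum_ = 0;                                \
        for (int i_ = 0; i_ < (len); i_++)           \
            sum_ += (lst)[i_];       /* list[n] */   \
        (ave) = sum_ / (len);                        \
    } while (0)

int list1[] = {4, 5, 2, 3, 1};
int list2[] = {2, 6, 3, 2, 1, 8, 19};

void compute_averages(int *ave1, int *ave2)
{
    AVER(list1, 5, *ave1);    /* first expansion, data set 1  */
    AVER(list2, 7, *ave2);    /* second expansion, data set 2 */
}
```

The local names inside the macro body play a role loosely similar to the %% label prefix: each expansion gets its own copy, so repeated use does not cause name clashes.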
11.4
Debugging Macros
The code for a macro will not be displayed in the debugger source window. When a
macro is working correctly, this is very convenient. However, when debugging macros,
the code must be viewable.
In order to see the macro code, display the machine code window (View → Machine
Code Window). In the window, the machine code for the instructions is displayed.
The step and next commands will execute the entire macro. In order to execute the
macro instructions one at a time, the stepi and nexti commands must be used.
The code, when viewed, will be the expanded code (as opposed to the original macro
definition).
11.5
Exercises
Below are some quiz questions and suggested projects based on this chapter.
Chapter
12
12.0 Functions
Functions and procedures (i.e., void functions) help break up a program into smaller
parts, making it easier to code, debug, and maintain. Function calls involve two main
actions:
Linkage
Since the function can be called from multiple different places in the code,
the function must be able to return to the correct place in which it was
originally called.
Argument Transmission
The function must be able to access parameters to operate on or to return
results (i.e., access call-by-reference parameters).
The specifics of how each of these actions are accomplished is explained in the
following sections.
12.1
In a high level language, non-static local variables declared in a function are stack
dynamic local variables by default. Some C++ texts refer to such variables as
automatics. This means that the local variables are created by allocating space on the
stack and assigning these stack locations to the variables. When the function completes,
the space is recovered and reused for other purposes. This requires a small amount of
additional run-time overhead, but makes a more efficient overall use of memory. If a
function with a large number of local variables is never called, the memory for the local
variables is never allocated. This helps reduce the overall memory footprint of the
program, which generally helps the overall performance of the program.
12.2
Function Declaration
A function must be written before it can be used. Functions are located in the code
segment. The general format is:
global <procName>
<procName>:
    ; function body
    ret
A function may be defined only once. There is no specific order required for how
functions are defined. However, functions can not be nested. A function definition
should be started and ended before the next function's definition can be started.
Refer to the sample functions for examples of function declarations and usage.
12.3
12.4
Linkage
The linkage is about getting to and returning from a function call correctly. There are
two instructions that handle the linkage, the call <funcName> and ret instructions.
The call transfers control to the named function, and ret returns control back to the
calling routine.
The call works by saving the address of where to return to when the function
completes (referred to as the return address). This is accomplished by placing the
contents of the rip register on the stack. Recall that the rip register points to the next
instruction to be executed (which is the instruction immediately after the call).
The ret instruction is used in a procedure to return. The ret instruction pops
the current top of the stack (rsp) into the rip register. Thus, the appropriate
return address is restored.
Since the stack is used to support the linkage, it is important that the stack not be
corrupted within the function. Specifically, any items pushed must be popped. Pushing a
value and not popping it would result in that value, rather than the return address, being
popped off the stack and placed in the rip register. This would cause the processor to
attempt to execute code at that location. Most likely the invalid location will cause the
process to crash.
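To make the linkage mechanism concrete, the following C sketch models a very small machine with a stack and an instruction pointer. It is a toy model for illustration only, not how the processor is implemented; the Machine type and the m_push, m_pop, m_call, and m_ret names are all hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint64_t stack[16];   /* tiny model stack */
    int      top;         /* number of items currently on the stack */
    uint64_t rip;         /* model instruction pointer */
} Machine;

void m_push(Machine *m, uint64_t v) { m->stack[m->top++] = v; }
uint64_t m_pop(Machine *m)          { return m->stack[--m->top]; }

/* call: save the address of the next instruction, then jump */
void m_call(Machine *m, uint64_t target, uint64_t returnAddr)
{
    m_push(m, returnAddr);
    m->rip = target;
}

/* ret: pop whatever is on top of the stack into rip */
void m_ret(Machine *m)
{
    m->rip = m_pop(m);
}
```

If a function in this model pushes a value and executes m_ret without popping it, the pushed value, not the return address, ends up in rip, which is the stack corruption described above.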
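The function calling or linkage instructions are summarized as follows:
Instruction          Explanation
call <funcName>      Calls a function. Pushes the return address (the contents
                     of rip) onto the stack and jumps to <funcName>.
    Example:         call printString
ret                  Returns from a function. Pops the return address from the
                     stack into rip.
    Example:         ret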
12.5
Argument Transmission
In general, the calling routine is referred to as the caller and the routine being called is
referred to as the callee.
12.6
Calling Convention
The function prologue is the code at the beginning of a function and the function
epilogue is the code at the end of a function. The operations performed by the prologue
and epilogue are generally specified by the standard calling convention and deal with
stack, registers, passed arguments (if any), and stack dynamic local variables (if any).
The general idea is that the program state (i.e., contents of specific registers and the
stack) is saved, the function executed, and then the state is restored. Of course, the
function will often require extensive use of the registers and the stack. The prologue
code helps save the state and the epilogue code restores the state.
                 Argument Size
Argument    64-bits    32-bits    16-bits    8-bits
1st         rdi        edi        di         dil
2nd         rsi        esi        si         sil
3rd         rdx        edx        dx         dl
4th         rcx        ecx        cx         cl
5th         r8         r8d        r8w        r8b
6th         r9         r9d        r9w        r9b
The seventh and any additional arguments are passed on the stack. The standard calling
convention requires that, when passing arguments (values or addresses) on the stack, the
arguments should be pushed in reverse order. That is, someFunc(one, two, three,
four, five, six, seven, eight, nine) would imply a push order
of: nine, eight, and then seven.
Additionally, when the function is completed, the calling routine is responsible for
clearing the arguments from the stack. Instead of doing a series of pop instructions, the
stack pointer, rsp, is adjusted as necessary to clear the arguments off the stack. Since
each argument is 8 bytes, the adjustment would be adding [(number of stack-based
arguments) * 8] to the rsp.
For value returning functions, the result is placed in the A register based on the size of
the value being returned. Specifically, the values are returned as follows:
Return Value Size    Location
byte                 al
word                 ax
double-word          eax
quadword             rax
Register    Usage
rax         Return Value
rbx         Callee Saved
rcx         4th Argument
rdx         3rd Argument
rsi         2nd Argument
rdi         1st Argument
rbp         Callee Saved
rsp         Stack Pointer
r8          5th Argument
r9          6th Argument
r10         Temporary
r11         Temporary
r12         Callee Saved
r13         Callee Saved
r14         Callee Saved
r15         Callee Saved
Other items may be placed in the call frame such as static links for dynamically scoped
languages. Such topics are outside the scope of this text and will not be discussed here.
For some functions, a full call frame may not be required. For example, if the function:
Passes its arguments only in registers (i.e., does not use the stack).
This can occur for simpler, smaller leaf functions. However, if any of these conditions
is not true, the stack and thus a full call frame is required.
For non-leaf or more complex functions, some form of a call frame is required.
The standard calling convention does not explicitly require use of the frame pointer
register, rbp. Compilers are allowed to optimize the call frame and not use the frame
pointer. To simplify and clarify accessing stack-based arguments (if any) and stack
dynamic local variables, this text will utilize the frame pointer register. This is similar to
how many other architectures use a frame pointer register.
As such, if there are any stack-based arguments or any local variables needed within a
function, the frame pointer register, rbp, should be pushed and then set to the current
value of the stack pointer, rsp. As additional pushes and pops are performed (thus
changing rsp), the rbp register remains unchanged and serves as a fixed reference point.
    ...
    <8th Argument>              <- rbp + 24
    <7th Argument>              <- rbp + 16
    rip (return address)
    rbp                         <- rbp
    rbx
    r12
    r13                         <- rsp
    ...
The stack-based arguments are accessed relative to the rbp. Each item pushed is a
quadword which uses 8 bytes. For example, [rbp+16] is the location of the first passed
argument (7th integer argument) and [rbp+24] is the location of the second passed
argument (8th integer argument).
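The offset arithmetic can be written out as a small helper. The C sketch below is purely illustrative and assumes the callee's prologue begins with push rbp followed by mov rbp, rsp, as this text does; the function names stack_arg_offset and stack_cleanup_bytes are hypothetical.

```c
/* Offset from rbp of the n-th integer argument (n >= 7), assuming the
   callee's prologue starts with: push rbp / mov rbp, rsp. */
int stack_arg_offset(int argNumber)
{
    const int savedRbp   = 8;    /* [rbp]   holds the caller's rbp   */
    const int returnAddr = 8;    /* [rbp+8] holds the return address */
    return savedRbp + returnAddr + (argNumber - 7) * 8;
}

/* Bytes the caller adds to rsp after the call to clear the
   stack-based arguments (only arguments beyond the sixth). */
int stack_cleanup_bytes(int totalArgs)
{
    return (totalArgs > 6) ? (totalArgs - 6) * 8 : 0;
}
```

Note that registers pushed by the callee after rbp is set do not change these offsets, since they are placed below rbp on the stack.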
In addition, the call frame would contain the assigned locations of local variables (if
any). The section on local variables details the specifics regarding allocating and using
local variables.
In the Linux standard calling convention, the first 128 bytes after the stack pointer, rsp,
are reserved. For example, extending the previous example, the call frame would be as
follows:
    ...
    <8th Argument>              <- rbp + 24
    <7th Argument>              <- rbp + 16
    rip (return address)
    rbp                         <- rbp
    rbx
    r10
    r12                         <- rsp
    ...
    Red Zone (128 bytes)
    ...
This red zone may be used by the function without any adjustment to the stack pointer.
The purpose is to allow compiler optimizations for the allocation of local variables.
12.7
This simple example will demonstrate calling a simple void function to find the sum and
average of an array of numbers. The High-Level Language (HLL) call for C/C++ is as
follows:
stats1(arr, len, sum, ave);
As per the C/C++ convention, the array, arr, is call-by-reference and the length, len, is
call-by-value. The arguments for sum and ave are both call-by-reference (since there
are no values as yet). For this example, the array arr, sum, and ave variables are all
signed double-word integers. Of course, in context, the len must be unsigned.
12.7.1 Caller
In this case, there are 4 arguments, and all arguments are passed in registers in
accordance with the standard calling convention. The assembly language code in the
calling routine for the call to the stats function would be as follows:
; stats1(arr, len, sum, ave);
    mov     rcx, ave                    ; 4th arg, addr of ave
    mov     rdx, sum                    ; 3rd arg, addr of sum
    mov     esi, dword [len]            ; 2nd arg, value of len
    mov     rdi, arr                    ; 1st arg, addr of arr
    call    stats1
There is no specific required order for setting the argument registers. This example sets
them in reverse order in preparation for the next, extended example.
Note, the setting of the esi register also sets the upper order double-word to zero, thus
ensuring the rsi register is set appropriately for this specific usage since length is
unsigned.
No return value is provided by this void routine. If the function was a value returning
function, the value returned would be in the A register.
12.7.2 Callee
The function being called, the callee, must perform the prologue and epilogue operations
(as specified by the standard calling convention). Of course, the function must also
perform the summation of values in the array, compute the average, and return the sum
and average values.
The following code implements the stats1 example.
; Simple example function to find and return
; the sum and average of an array.

; HLL call:
;    stats1(arr, len, sum, ave);
; -----
; Arguments:
;    arr, address - rdi
;    len, value - rsi
;    sum, address - rdx
;    ave, address - rcx

global stats1
stats1:
    push    r12                         ; prologue

    mov     r12, 0                      ; counter/index
    mov     rax, 0                      ; running sum

sumLoop:
    add     eax, dword [rdi+r12*4]      ; sum += arr[i]
    inc     r12
    cmp     r12, rsi
    jl      sumLoop

    mov     dword [rdx], eax            ; return sum

    cdq                                 ; sign-extend eax into edx:eax
    idiv    esi                         ; compute average (32-bit divide)
    mov     dword [rcx], eax            ; return ave

    pop     r12                         ; epilogue
    ret
The choice of the r12 register is arbitrary, however a 'saved register' was selected.
The call frame for this function would be as follows:
    ...
    rip (return address)
    r12                         <- rsp
    ...
The minimal use of the stack helps reduce the function call run-time overhead.
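For comparison, the behavior expected of the stats1 function can be expressed in C. This is a sketch of one possible high-level equivalent rather than code from the text; it assumes len is greater than zero, as the assembly version also does.

```c
#include <stdint.h>

/* arr is passed by reference, len by value, and sum and ave by
   reference so the results can be returned through them. */
void stats1(const int32_t *arr, uint32_t len, int32_t *sum, int32_t *ave)
{
    int32_t total = 0;                  /* running sum */

    for (uint32_t i = 0; i < len; i++)
        total += arr[i];                /* sum += arr[i] */

    *sum = total;                       /* return sum */
    *ave = total / (int32_t)len;        /* signed divide, truncates */
}
```

A C compiler following the standard calling convention would receive these four arguments in rdi, esi, rdx, and rcx, matching the register assignments used by the caller above.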
12.8
This extended example will demonstrate calling a simple void function to find the
minimum, median, maximum, sum and average of an array of numbers. The High-Level Language (HLL) call for C/C++ is as follows:
stats2(arr, len, min, med1, med2, max, sum, ave);
For this example, it is assumed that the array is sorted in ascending order. Additionally,
for this example, the median will be the middle value. For an even length list, there are
two middle values, med1 and med2, both of which are returned. For an odd length list,
the single middle value is returned in both med1 and med2.
As per the C/C++ convention, the array, arr, is call-by-reference and the length, len, is
call-by-value. The arguments for min, med1, med2, max, sum, and ave are all call-by-reference (since there are no values as yet). For this example, the array arr, min, med1,
med2, max, sum, and ave variables are all signed double-word integers. Of course, in
context, the len must be unsigned.
12.8.1 Caller
In this case, there are 8 arguments and only the first six can be passed in registers. The
last two arguments are passed on the stack. The assembly language code in the calling
routine for the call to the stats function would be as follows:
; stats2(arr, len, min, med1, med2, max, sum, ave);
    push    ave                         ; 8th arg, addr of ave
    push    sum                         ; 7th arg, addr of sum
    mov     r9, max                     ; 6th arg, addr of max
    mov     r8, med2                    ; 5th arg, addr of med2
    mov     rcx, med1                   ; 4th arg, addr of med1
    mov     rdx, min                    ; 3rd arg, addr of min
    mov     esi, dword [len]            ; 2nd arg, value of len
    mov     rdi, arr                    ; 1st arg, addr of arr
    call    stats2
    add     rsp, 16                     ; clear passed arguments
The 7th and 8th arguments are passed on the stack and pushed in reverse order in
accordance with the standard calling convention. After the function is completed, the
arguments are cleared from the stack by adjusting the stack pointer register (rsp). Since
two arguments, 8 bytes each, were passed on the stack, 16 is added to the stack pointer.
12.8.2 Callee
The function being called, the callee, must perform the prologue and epilogue operations
(as specified by the standard calling convention). Of course, the function must also
perform the summation of values in the array, find the minimum, medians, and maximum,
compute the average, and return all the values.
When call-by-reference arguments are passed on the stack, it requires two steps to
return the values: first the address is obtained from the stack, and then that address is
used to store the value.
A common error is to attempt to return a value to a stack-based location in a single step,
which will not change the referenced variable. For example, assuming the double-word
value to be returned is in the eax register and the 7th argument is call-by-reference and
where the eax value is to be returned, the appropriate code would be as follows:
    mov     r12, qword [rbp+16]
    mov     dword [r12], eax
These steps can not be combined into a single step. The following code
    mov     dword [rbp+16], eax
would overwrite the address passed on the stack and not change the referenced variable.
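The same mistake has a close parallel in C, where a pointer parameter is a local copy of the address. The sketch below is only an analogy under that assumption; the function names return_sum_ok and return_sum_bad are hypothetical.

```c
#include <stdint.h>

/* Correct: follow the passed address, then store through it.
   Comparable to: mov r12, qword [rbp+16] / mov dword [r12], eax */
void return_sum_ok(int32_t *sum, int32_t value)
{
    *sum = value;
}

/* Incorrect: overwrites the function's own copy of the address, so the
   caller's variable is untouched.
   Comparable to: mov dword [rbp+16], eax */
void return_sum_bad(int32_t *sum, int32_t value)
{
    sum = (int32_t *)(intptr_t)value;   /* only the argument slot changes */
    (void)sum;                          /* the new "address" is never used */
}
```

After return_sum_bad(&total, 42) the caller's total is unchanged, while return_sum_ok(&total, 42) stores 42 as intended.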
The following code implements the stats2 example.
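; Simple example function to find and return the minimum,
; medians, maximum, sum, and average of a sorted array.

; HLL call:
;    stats2(arr, len, min, med1, med2, max, sum, ave);
; -----
; Arguments:
;    arr, address - rdi
;    len, value - rsi
;    min, address - rdx
;    med1, address - rcx
;    med2, address - r8
;    max, address - r9
;    sum, address - stack (rbp+16)
;    ave, address - stack (rbp+24)

global stats2
stats2:
    push    rbp                         ; prologue
    mov     rbp, rsp
    push    rbx
    push    r12

; -----
; Get min and max.

    mov     eax, dword [rdi]            ; get min
    mov     dword [rdx], eax            ; return min

    mov     r12, rsi
    dec     r12                         ; len-1
    mov     eax, dword [rdi+r12*4]      ; get max
    mov     dword [r9], eax             ; return max

; -----
; Get medians

    mov     rax, rsi
    mov     rdx, 0
    mov     r12, 2
    div     r12                         ; rax = length/2

    cmp     rdx, 0                      ; even/odd length?
    je      evenLength

    mov     r12d, dword [rdi+rax*4]     ; get arr[len/2]
    mov     dword [rcx], r12d           ; return med1
    mov     dword [r8], r12d            ; return med2
    jmp     medDone

evenLength:
    mov     r12d, dword [rdi+rax*4]     ; get arr[len/2]
    mov     dword [r8], r12d            ; return med2
    dec     rax
    mov     r12d, dword [rdi+rax*4]     ; get arr[len/2-1]
    mov     dword [rcx], r12d           ; return med1
medDone:

; -----
; Find sum

    mov     r12, 0                      ; counter/index
    mov     rax, 0                      ; running sum

sumLoop:
    add     eax, dword [rdi+r12*4]      ; sum += arr[i]
    inc     r12
    cmp     r12, rsi
    jl      sumLoop

    mov     r12, qword [rbp+16]         ; get sum addr
    mov     dword [r12], eax            ; return sum

; -----
; Calculate average.

    cdq                                 ; sign-extend eax into edx:eax
    idiv    esi                         ; compute average (32-bit divide)
    mov     r12, qword [rbp+24]         ; get ave addr
    mov     dword [r12], eax            ; return ave

    pop     r12                         ; epilogue
    pop     rbx
    pop     rbp
    ret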
The call frame for this function would be as follows:
    ...
    <8th Argument>              <- rbp + 24
    <7th Argument>              <- rbp + 16
    rip (return address)
    rbp                         <- rbp
    rbx
    r12                         <- rsp
    ...
The preserved registers, rbx and r12 in this example, are pushed in arbitrary order.
However, when popped, they must be popped in the exact reverse order as required to
correctly restore their original values.
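As a cross-check on the expected results, the stats2 function can be sketched in C. This is an illustrative equivalent, not the text's code, and it assumes a non-empty array already sorted in ascending order.

```c
#include <stdint.h>

void stats2(const int32_t *arr, uint32_t len,
            int32_t *min, int32_t *med1, int32_t *med2,
            int32_t *max, int32_t *sum, int32_t *ave)
{
    *min = arr[0];                  /* sorted: first is smallest */
    *max = arr[len - 1];            /* sorted: last is largest   */

    uint32_t mid = len / 2;
    if (len % 2 == 0) {             /* even length: two middle values */
        *med1 = arr[mid - 1];
        *med2 = arr[mid];
    } else {                        /* odd length: same value in both */
        *med1 = arr[mid];
        *med2 = arr[mid];
    }

    int32_t total = 0;              /* running sum */
    for (uint32_t i = 0; i < len; i++)
        total += arr[i];

    *sum = total;                   /* 7th argument, passed on the stack */
    *ave = total / (int32_t)len;    /* 8th argument, passed on the stack */
}
```

With eight arguments, a compiler following the standard calling convention would pass sum and ave on the stack, which is why the assembly version needs the two-step store for those two results.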
12.9
If local variables are required, they are allocated on the stack. By adjusting rsp,
additional space is allocated on the stack for locals. As such, when the function is
completed, the memory used for the stack-based local variables is released (and no
longer uses memory).
Further expanding the previous example, if we assume all array values are between 0
and 99, and we wish to find the mode (number that occurs the most often), a single
double-word variable count and a one hundred (100) element local double-word array,
tmpArr[100] might be used.
As before, the frame register, rbp, is pushed on the stack and set to the current value of
rsp. The frame register plus an appropriate offset will allow accessing any arguments
passed on the stack. For example, rbp+16 is the location of the first stack-based
argument (7th integer argument).
After the frame register is pushed, an adjustment to the stack pointer register, rsp, is
made to allocate space for the local variables, the count variable and a 100-element array
in this example. Since the count variable is one double-word, 4 bytes are needed. The
temporary array is 100 double-word elements, so 400 bytes are required. Thus, a total of
404 bytes is required, which is allocated by subtracting 404 from rsp.
When the function is done, the locals are cleared by restoring rsp from rbp. This is
generally better than adding the offset back to rsp since the allocated space may be
altered as needed without requiring adjustments to the epilogue code.
It should be clear that variables allocated in this manner are uninitialized. Should the
function require the variables to be initialized, possibly to 0, such initializations must be
explicitly performed.
For this example, the call frame would be formatted as follows:
    ...
    <value of len>              <- rbp + 24
    <addr of list>              <- rbp + 16
    rip (return address)
    rbp                         <- rbp
    tmpArr[99]
    tmpArr[98]
    ...
    tmpArr[1]
    tmpArr[0]                   <- rbp - 400
    count                       <- rbp - 404
    rbx
    r12                         <- rsp
    ...
The prologue code for this example would be as follows:
    push    rbp                         ; prologue
    mov     rbp, rsp
    sub     rsp, 404                    ; allocate locals
    push    rbx
    push    r12
The local variables can be accessed relative to the frame pointer register, rbp. For
example, to initialize the count variable, now located at rbp-404, the following
instruction could be used:
    mov     dword [rbp-404], 0
To access the tmpArr, the starting address must be obtained, which can be performed
with the lea instruction. For example,
    lea     rbx, dword [rbp-400]
which will set the appropriate stack address in the rbx register, where rbx was chosen
arbitrarily. The dword qualifier in this example is not required, and may be misleading,
since addresses are always 64-bits (on a 64-bit architecture). Once set as above, the
tmpArr starting address in rbx is used in the usual manner.
For example, a small incomplete function code fragment demonstrating the accessing of
stack-based local variables is as follows:
; -----
; Example function

global expFunc
expFunc:
    push    rbp                         ; prologue
    mov     rbp, rsp
    sub     rsp, 404                    ; allocate locals
    push    rbx
    push    r12

; -----
; Initialize count local variable to 0.

    mov     dword [rbp-404], 0

; -----
; Increment count variable (for example)...

    inc     dword [rbp-404]             ; count++

; -----
; Loop to initialize tmpArr to all 0's.

    lea     rbx, dword [rbp-400]        ; tmpArr addr
    mov     r12, 0                      ; index
zeroLoop:
    mov     dword [rbx+r12*4], 0        ; tmpArr[index] = 0
    inc     r12
    cmp     r12, 100
    jl      zeroLoop

; -----
; Done, restore all and return to calling routine.

    pop     r12                         ; epilogue
    pop     rbx
    mov     rsp, rbp                    ; clear locals
    pop     rbp
    ret
Note, this example function focuses only on how stack-based local variables are
accessed and does not perform anything useful.
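To show what such locals might be used for, the following C sketch finds the mode using a count variable and a 100-element local array, as suggested earlier in this section. It is one possible approach, not the text's solution; the function name find_mode and the tie-breaking rule (smallest value wins) are assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Returns the most frequent value. Assumes every element is in the
   range 0..99 and len > 0. On ties, the smallest value is returned. */
int32_t find_mode(const int32_t *arr, uint32_t len)
{
    uint32_t count = 0;             /* best occurrence count so far */
    uint32_t tmpArr[100];           /* per-value tallies, on the stack */
    int32_t  mode = 0;

    /* Stack-based locals start uninitialized, so clear explicitly. */
    memset(tmpArr, 0, sizeof tmpArr);

    for (uint32_t i = 0; i < len; i++)
        tmpArr[arr[i]]++;           /* tally each value */

    for (int32_t v = 0; v < 100; v++) {
        if (tmpArr[v] > count) {    /* strictly greater keeps smallest */
            count = tmpArr[v];
            mode = v;
        }
    }
    return mode;
}
```

The explicit memset corresponds to the zeroLoop in the assembly fragment: in both cases the locals hold whatever was previously on the stack until they are initialized.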
12.10 Summary
This section presents a brief summary of the standard calling convention requirements
which are as follows:
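Caller Operations:
The caller executes a call instruction to pass control to the function (callee).
Upon return, if any arguments were passed on the stack, the caller clears them by
adjusting the stack pointer:
    add     rsp, <argCount*8>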
Callee Operations:
Function Prologue
If arguments are passed on stack, the callee must save rbp to the stack and
move the value of rsp into rbp. This allows the callee to use rbp as a frame
pointer to access arguments on the stack in a uniform manner.
The callee may then access its parameters relative to rbp. The quadword
at [rbp] holds the previous value of rbp as it was pushed; the next
quadword, at [rbp+8], holds the return address, pushed by the call. The
parameters start after that, at [rbp+16].
If local variables are needed, the callee decreases rsp further to allocate
space on the stack for the local variables. The local variables are accessible
at negative offsets from rbp.
The callee, if it wishes to return a value to the caller, should leave the value
in al, ax, eax, rax, depending on the size of the value being returned.
A floating-point result is returned in xmm0.
If altered, registers rbx, r12, r13, r14, r15 and rbp must be saved on the
stack.
Function Execution
The function code is executed.
Function Epilogue
Restores any pushed registers.
If local variables were used, the callee restores rsp from rbp to clear the
stack-based local variables.
The callee restores (i.e., pops) the previous value of rbp.
The callee returns via the ret instruction (return).
Refer to the sample functions to see specific examples of the calling convention.
12.11 Exercises
Below are some quiz questions and suggested projects based on this chapter.
1. If three arguments are passed on the stack, what is the value for the
<immediate>
15) If there are seven (7) arguments passed to a function, and the function itself
pushes the rbp, rbx, and r12 registers (in that order), what is the correct offset of
the stack-based argument when using the standard calling convention.
16) What, if any, is the limiting factor for how many times a function can be called?
17) If a function must return a result for the variable sum, how should the sum
variable be passed (call-by-reference or call-by-value)?
18) If there are eight (8) arguments passed to a function, and the function itself
pushes the rbp, rbx, and r12 registers (in that order), what are the correct offsets
for each of the two stack-based arguments (7th and 8th) when using the standard
calling convention?
19) What is the advantage of using stack dynamic local variables (as opposed to
using all global variables)?
The main should call the function on at least three different data sets. Use the
debugger to execute the program and display the final results. Create a debugger
input file to show the results.
4) Update the program from the previous question to add a stats function that finds
the minimum, median, maximum, sum, and average for the sorted list. The stats
function should be called after the sort function to make the minimum and
maximum easier to find. Use the debugger to execute the program and display
the final results. Create a debugger input file to show the results.
5) Update the program from the previous question to add an integer square root
function and a standard deviation function. To estimate the square root of a
number, use the following algorithm:
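$$ iSqrt_{est} = iNumber $$
$$ iSqrt_{est} = \frac{\left( \dfrac{iNumber}{iSqrt_{est}} \right) + iSqrt_{est}}{2} \qquad \text{(iterate 50 times)} $$
The standard deviation is computed as follows:
$$ iStandardDeviation = \sqrt{ \frac{ \displaystyle\sum_{i=0}^{length-1} \left( list[i] - average \right)^{2} }{ length } } $$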
Use the
Chapter
13
13.1
A system service call is logically similar to calling a function, where the function code is
located within the operating system. The function may require privileges to operate
which is why control must be transferred to the operating system.
When calling system services, arguments are placed in registers in much the same way
as for function calls. System services do not typically use stack-based arguments. This
limits the arguments of a system service to six (6), which does not present a significant
limitation.
To call a system service, the first step is to determine which system service is desired.
There are many system services (see Appendix C). The general process is that the
system service call code is placed in the rax register. The call code is a number that has
been assigned for the specific system service being requested. These are assigned as
part of the operating system and can not be changed by application programs. To
simplify the process, this text will define a very small subset of system service call
codes to a set of constants. For this text, and the associated examples, the subset of
system call code constants are defined and shown in the source file to help provide
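Register    Usage
rax         Call code
rdi         1st argument (if needed)
rsi         2nd argument (if needed)
rdx         3rd argument (if needed)
r10         4th argument (if needed)
r8          5th argument (if needed)
r9          6th argument (if needed)
Note that for system services the 4th argument is passed in r10, not in rcx as for
function calls, since the syscall instruction itself overwrites the rcx register.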
Each system call will use a different number of arguments (from none up to 6).
However, the system service call code is always required.
After the call code and any arguments are set, the syscall instruction is executed. The
syscall instruction will pause the current process and transfer control to the operating
system which will attempt to perform the service specified in the rax register. When the
system service returns, the process will be resumed.
13.2
Newline Character
As a refresher, in the context of output, a newline means move the cursor to the start of
the next line. In many languages, including C, it is often noted as \n as part of a
string. C++ uses endl in the context of a cout statement. For example, "Hello World 1"
and "Hello\nWorld 2" would be displayed as follows:
Hello World 1
Hello
World 2
13.3
Console Output
The system service to output characters to the console is the system write (SYS_write).
As in a high-level language, characters are written to standard out (STDOUT), which is
the console. STDOUT is the default file descriptor for the console. The file
descriptor is already opened and available for use in programs (assembly and high-level
languages).
The arguments for the write system service are as follows:
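Register    SYS_write
rax         Call code = SYS_write (1)
rdi         Output location, STDOUT (1)
rsi         Address of characters to output
rdx         Number of characters to output
Assuming the following declarations:
STDOUT          equ     1               ; standard output
SYS_write       equ     1               ; call code for write

msg             db      "Hello World"
msgLen          dq      11
the message can be written to the console as follows:
    mov     rax, SYS_write
    mov     rdi, STDOUT
    mov     rsi, msg                    ; msg address
    mov     rdx, qword [msgLen]         ; length value
    syscall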
Refer to the next section for a complete program to display the above message. It
should be noted that the operating system does not check if the string is valid.
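For readers more familiar with C, the same system service is reachable through the POSIX write() function, which takes the file descriptor, the address of the characters, and the count. The sketch below is illustrative only; the helper name print_string_fd is hypothetical, and the file descriptor is a parameter so that the function can be exercised with something other than the console.

```c
#include <string.h>
#include <unistd.h>

/* Write a NUL-terminated string to the given file descriptor.
   Returns the number of bytes written, or -1 on error. */
long print_string_fd(int fd, const char *str)
{
    size_t count = strlen(str);     /* count characters, excluding NUL */

    if (count == 0)
        return 0;                   /* nothing to print */

    return (long)write(fd, str, count);
}
```

Calling print_string_fd(STDOUT_FILENO, "Hello World\n") writes the message to the console. As with the system service, write() outputs exactly count bytes and does not look for a terminating NULL, so the length must be determined by the caller.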
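section .data

; -----
; Define standard constants.

LF              equ     10              ; line feed
NULL            equ     0               ; end of string

TRUE            equ     1
FALSE           equ     0

EXIT_SUCCESS    equ     0               ; success code

STDIN           equ     0               ; standard input
STDOUT          equ     1               ; standard output
STDERR          equ     2               ; standard error

SYS_read        equ     0               ; read
SYS_write       equ     1               ; write
SYS_open        equ     2               ; file open
SYS_close       equ     3               ; file close
SYS_fork        equ     57              ; fork
SYS_exit        equ     60              ; terminate
SYS_creat       equ     85              ; file open/create
SYS_time        equ     201             ; get time

; -----
; Define some strings.

message1        db      "Hello World.", LF, NULL
message2        db      "Enter Answer: ", NULL
newLine         db      LF, NULL

; -----

section .text
global _start
_start:

; -----
; Display first message.

    mov     rdi, message1
    call    printString

; -----
; Display second message and then newline

    mov     rdi, message2
    call    printString

    mov     rdi, newLine
    call    printString

; -----
; Example program done.

exampleDone:
    mov     rax, SYS_exit
    mov     rdi, EXIT_SUCCESS
    syscall

; *********************************************************
; Function to display a NULL terminated string.
;    1) Count characters in string (excluding NULL)
;    2) Use syscall to output characters
; Arguments:
;    string, address - rdi

global printString
printString:
    push    rbx                         ; prologue

; -----
; Count characters in string.

    mov     rbx, rdi
    mov     rdx, 0
strCountLoop:
    cmp     byte [rbx], NULL
    je      strCountDone
    inc     rdx
    inc     rbx
    jmp     strCountLoop
strCountDone:

    cmp     rdx, 0
    je      prtDone

; -----
; Call OS to output string.

    mov     rax, SYS_write              ; system code for write()
    mov     rsi, rdi                    ; address of chars to write
    mov     rdi, STDOUT                 ; standard out
                                        ; RDX = count to write, set above
    syscall                             ; system call

; -----
; String printed, return to calling routine.

prtDone:
    pop     rbx
    ret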
The newline (LF) was provided as part of the first string (message1), thus placing the
cursor at the start of the next line. The second message leaves the cursor on the
same line, which would be appropriate for reading input from the user (which is not part
of this example). A final newline is printed since no actual input is obtained in this
example.
The additional, unused constants are included for reference.
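As a cross-check on the algorithm, the count-then-write approach of printString can be sketched in C. This is a hypothetical counterpart written for illustration, not code from the text; it relies on the C library's write() function, which uses the same system service. Unlike the assembly routine, it returns the number of characters written.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical C counterpart of printString.
   Returns the number of characters written, or -1 on error. */
long printString(const char *str)
{
    size_t count = 0;

    while (str[count] != '\0')      /* count characters, excluding the NULL */
        count++;

    if (count == 0)                 /* empty string, nothing to print */
        return 0;

    return (long)write(STDOUT_FILENO, str, count);
}
```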
13.4 Console Input
The system service to read characters from the console is the system read (SYS_read).
As in a high level language, for the console, characters are read from standard input
(STDIN). STDIN is the default file descriptor for reading characters from the
keyboard. The file descriptor is already open and available for use in programs
(assembly and high level languages).
Reading characters interactively from the keyboard presents an additional complication.
When using the system service to read from the keyboard, much like the write system
service, the number of characters to read is required. Of course, we will need to declare
an appropriate amount of space to store the characters being read. If we request 10
characters and the user types more than 10, only the first 10 are returned by that call.
The additional characters remain in the input stream, where the next read will see them.
If the user types fewer than 10 characters, for example 5 characters, all five characters
will be read plus the newline (LF) for a total of six characters.
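The arguments for the read system service are as follows:
    Register    SYS_read
    rax         Call code = SYS_read (0)
    rdi         Input location, STDIN (0)
    rsi         Address of where to store the characters read
    rdx         Number of characters to read
Assuming the following declarations:
    STDIN       equ     0               ; standard input
    SYS_read    equ     0               ; call code for read
    inChar      db      0
For example, to read a single character from the keyboard, the system read (SYS_read)
would be used. The code would be as follows:
    mov     rax, SYS_read
    mov     rdi, STDIN
    mov     rsi, inChar             ; msg address
    mov     rdx, 1                  ; read count
    syscall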
Refer to the next section for a complete program to read characters from the keyboard.
section     .data

; -----
;  Define standard constants.

LF              equ     10          ; line feed
NULL            equ     0           ; end of string

TRUE            equ     1
FALSE           equ     0

EXIT_SUCCESS    equ     0           ; success code

STDIN           equ     0           ; standard input
STDOUT          equ     1           ; standard output
STDERR          equ     2           ; standard error

SYS_read        equ     0           ; read
SYS_write       equ     1           ; write
SYS_open        equ     2           ; file open
SYS_close       equ     3           ; file close
SYS_fork        equ     57          ; fork
SYS_exit        equ     60          ; terminate
SYS_creat       equ     85          ; file open/create
SYS_time        equ     201         ; get time

; -----
;  Define some strings.

STRLEN          equ     50

pmpt            db      "Enter Text: ", NULL
newLine         db      LF, NULL

section     .bss
chr             resb    1
inLine          resb    STRLEN+2    ; total of 52

; -----
section     .text
global _start
_start:

; -----
;  Display prompt.

    mov     rdi, pmpt
    call    printString

; -----
;  Read characters from user (one at a time).

    mov     rbx, inLine             ; inLine addr
    mov     r12, 0                  ; char count
readCharacters:
    mov     rax, SYS_read           ; system code for read
    mov     rdi, STDIN              ; standard in
    lea     rsi, [chr]              ; address of chr
    mov     rdx, 1                  ; count (how many to read)
    syscall                         ; do syscall

    mov     al, byte [chr]          ; get character just read
    cmp     al, LF                  ; if linefeed, input done
    je      readDone

    inc     r12                     ; count++
    cmp     r12, STRLEN             ; if # chars >= STRLEN
    jge     readCharacters          ;   stop placing in buffer

    mov     byte [rbx], al          ; inLine[i] = chr
    inc     rbx                     ; update tmpStr addr

    jmp     readCharacters
readDone:
    mov     byte [rbx], NULL        ; add NULL termination

; -----
;  Output the line to verify successful read.

    mov     rdi, inLine
    call    printString

; -----
;  Example done.

exampleDone:
    mov     rax, SYS_exit
    mov     rdi, EXIT_SUCCESS
    syscall

; ******************************************************
;  Generic procedure to display a string to the screen.
;  String must be NULL terminated.
;  Algorithm:
;    Count characters in string (excluding NULL)
;    Use syscall to output characters
;  Arguments:
;    1) address, string
;  Returns:
;    nothing

global printString
printString:
    push    rbx

; -----
;  Count characters in string.

    mov     rbx, rdi
    mov     rdx, 0
strCountLoop:
    cmp     byte [rbx], NULL
    je      strCountDone
    inc     rdx
    inc     rbx
    jmp     strCountLoop
strCountDone:

    cmp     rdx, 0                  ; if empty string, nothing to print
    je      prtDone

; -----
;  Call OS to output string.

    mov     rax, SYS_write          ; system code for write()
    mov     rsi, rdi                ; address of chars to write
    mov     rdi, STDOUT             ; standard out
                                    ; RDX = count to write, set above
    syscall                         ; system call

; -----
;  String printed, return to calling routine.

prtDone:
    pop     rbx
    ret
If we were to completely stop reading at 50 (STRLEN) characters and the user enters
more characters, the characters might cause input errors for successive read operations.
To address any extra characters the user might enter, the extra characters are read from
the keyboard but not placed in the input buffer (inLine above). This ensures that the
extra input is removed from the input stream but does not overrun the array.
The additional, unused constants are included for reference.
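The read-and-discard strategy described above can also be sketched in C. The function name readLine and its parameters are hypothetical, chosen for illustration. As in the assembly example, characters are read one at a time until a linefeed is found, and any characters that do not fit in the buffer are still consumed but are not stored.

```c
#include <stddef.h>
#include <unistd.h>

/* Read one line from fd into buf, which has room for max bytes
   (including the terminating NULL). Characters beyond max-1 are
   read from the input stream but discarded.
   Returns the number of characters stored in buf. */
size_t readLine(int fd, char *buf, size_t max)
{
    size_t count = 0;
    char chr;

    if (max == 0)
        return 0;

    /* Stop at the linefeed, at end of input, or on a read error. */
    while (read(fd, &chr, 1) == 1 && chr != '\n') {
        if (count < max - 1)        /* room left, keep the character */
            buf[count++] = chr;
    }                               /* otherwise it is read and dropped */

    buf[count] = '\0';
    return count;
}
```

Because the extra characters are consumed, the next call starts cleanly at the beginning of the following line.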
13.5
In order to perform file operations such as read and write, the file must first be opened.
There are two file open operations, open and open/create. Each of the two open
operations is explained in the following sections.
One of these access modes must be used. Additional access modes may be used by
OR'ing with one of these. This might include modes such as append mode (which is not
addressed in this text). Refer to Appendix C, System Services for additional information
regarding the file access modes.
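The arguments for the file open system service are as follows:
    Register    SYS_open
    rax         Call code = SYS_open (2)
    rdi         Address of NULL terminated file name string
    rsi         File access mode flag
Assuming the following declarations:
    SYS_open    equ     2           ; file open

    O_RDONLY    equ     000000q     ; read only
    O_WRONLY    equ     000001q     ; write only
    O_RDWR      equ     000002q     ; read and write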
After the system call, the rax register will contain the return value. If the file open
operation fails, rax will contain a negative value (i.e., < 0). The specific negative value
provides an indication of the type of error encountered. Refer to Appendix C, System
Services for additional information on error codes. Typical errors might include invalid
file descriptor, file not found, or file permissions error.
If the file open operation succeeds, rax contains the file descriptor. The file descriptor
will be required for further file operations and should be saved.
Refer to the section on Example File Read for a complete example that opens a file.
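The arguments for the file open/create system service are as follows:
    Register    SYS_creat
    rax         Call code = SYS_creat (85)
    rdi         Address of NULL terminated file name string
    rsi         File permission flags (mode)
Assuming the following declarations:
    SYS_creat   equ     85          ; file open/create

    O_CREAT     equ     0x40
    O_TRUNC     equ     0x200
    O_APPEND    equ     0x400

    S_IRUSR     equ     00400q      ; owner, read permission
    S_IWUSR     equ     00200q      ; owner, write permission
    S_IXUSR     equ     00100q      ; owner, execute permission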
The file permission flags S_IRUSR | S_IWUSR give the owner of the file both read and
write permission, which is typical. The | is a bitwise OR operation, thus combining the
selections.
If the file open/create operation does not succeed, a negative value is returned in the rax
register. If file open/create operation succeeds, a file descriptor is returned. The file
descriptor is used for all subsequent file operations.
Refer to the section on Example File Write for a complete example file open/create.
13.6 File Read
A file must be opened with the appropriate file access flags before it can be read.
The arguments for the file read system service are as follows:
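    Register    SYS_read
    rax         Call code = SYS_read (0)
    rdi         File descriptor (of an open file)
    rsi         Address of where to place the characters read
    rdx         Count of characters to read
Assuming the following declaration:
    SYS_read    equ     0           ; file read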
If the file read operation does not succeed, a negative value is returned in the rax
register. If the file read operation succeeds, the number of characters actually read is
returned.
Refer to the next section on example file read for a complete file read example.
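13.7 File Write
The arguments for the file write system service are as follows:
    Register    SYS_write
    rax         Call code = SYS_write (1)
    rdi         File descriptor (of an open file)
    rsi         Address of characters to write
    rdx         Count of characters to write
Assuming the following declaration:
    SYS_write   equ     1           ; file write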
If the file write operation does not succeed, a negative value is returned in the rax
register. If the file write operation does succeed, the number of characters actually
written is returned.
Refer to the section on Example File Write for a complete file write example.
13.8
This section contains some simple example programs to demonstrate very basic file I/O
operations. The more complex issues regarding file I/O buffering are addressed in a
subsequent chapter.
section     .data

; -----
;  Define standard constants.

LF              equ     10          ; line feed
NULL            equ     0           ; end of string

TRUE            equ     1
FALSE           equ     0

EXIT_SUCCESS    equ     0           ; success code

STDIN           equ     0           ; standard input
STDOUT          equ     1           ; standard output
STDERR          equ     2           ; standard error

SYS_read        equ     0           ; read
SYS_write       equ     1           ; write
SYS_open        equ     2           ; file open
SYS_close       equ     3           ; file close
SYS_fork        equ     57          ; fork
SYS_exit        equ     60          ; terminate
SYS_creat       equ     85          ; file open/create
SYS_time        equ     201         ; get time

O_CREAT         equ     0x40
O_TRUNC         equ     0x200
O_APPEND        equ     0x400

O_RDONLY        equ     000000q     ; read only
O_WRONLY        equ     000001q     ; write only
O_RDWR          equ     000002q     ; read and write

S_IRUSR         equ     00400q
S_IWUSR         equ     00200q
S_IXUSR         equ     00100q

; -----
;  Variables for main.

newLine         db      LF, NULL
header          db      LF, "File Write Example."
                db      LF, LF, NULL

fileName        db      "url.txt", NULL
url             db      "http://www.google.com"
                db      LF, NULL
len             dq      $-url-1

writeDone       db      "Write Completed.", LF, NULL
fileDescriptor  dq      0
errMsgOpen      db      "Error opening file.", LF, NULL
errMsgWrite     db      "Error writing to file.", LF, NULL

; -----
section     .text
global _start
_start:

; -----
;  Display header line...

    mov     rdi, header
    call    printString

; -----
;  Attempt to open file.
;    Use system service for file open

;  System Service - Open/Create
;    rax = SYS_creat
;    rdi = address of file name string
;    rsi = attributes (i.e., read only, etc.)
;  Returns:
;    if error -> rax < 0
;    if success -> rax = file descriptor number

;    The file descriptor points to the File Control Block (FCB).
;    The FCB is maintained by the OS.
;    The file descriptor is used for all subsequent file
;    operations (read, write, close).

openInputFile:
    mov     rax, SYS_creat              ; file open/create
    mov     rdi, fileName               ; file name string
    mov     rsi, S_IRUSR | S_IWUSR      ; allow read/write
    syscall                             ; call the kernel

    cmp     rax, 0                      ; check for success
    jl      errorOnOpen

    mov     qword [fileDescriptor], rax ; save descriptor

; -----
;  Write to file.
;    In this example, the characters to write are in a
;    predefined string containing a URL.

;  System Service - write
;    rax = SYS_write
;    rdi = file descriptor
;    rsi = address of characters to write
;    rdx = count of characters to write
;  Returns:
;    if error -> rax < 0
;    if success -> rax = count of characters actually written

    mov     rax, SYS_write
    mov     rdi, qword [fileDescriptor]
    mov     rsi, url
    mov     rdx, qword [len]
    syscall

    cmp     rax, 0
    jl      errorOnWrite

    mov     rdi, writeDone
    call    printString

; -----
;  Close the file.

;  System Service - close
;    rax = SYS_close
;    rdi = file descriptor

    mov     rax, SYS_close
    mov     rdi, qword [fileDescriptor]
    syscall

    jmp     exampleDone

; -----
;  Error on open.
;    note, rax contains an error code which is not used
;    for this example.

errorOnOpen:
    mov     rdi, errMsgOpen
    call    printString

    jmp     exampleDone

; -----
;  Error on write.
;    note, rax contains an error code which is not used
;    for this example.

errorOnWrite:
    mov     rdi, errMsgWrite
    call    printString

    jmp     exampleDone

; -----
;  Example program done.

exampleDone:
    mov     rax, SYS_exit
    mov     rdi, EXIT_SUCCESS
    syscall

; **********************************************************
;  Generic procedure to display a string to the screen.
;  String must be NULL terminated.
;  Algorithm:
;    Count characters in string (excluding NULL)
;    Use syscall to output characters
;  Arguments:
;    1) address, string
;  Returns:
;    nothing

global printString
printString:
    push    rbp
    mov     rbp, rsp
    push    rbx

; -----
;  Count characters in string.

    mov     rbx, rdi
    mov     rdx, 0
strCountLoop:
    cmp     byte [rbx], NULL
    je      strCountDone
    inc     rdx
    inc     rbx
    jmp     strCountLoop
strCountDone:

    cmp     rdx, 0                  ; if empty string, nothing to print
    je      prtDone

; -----
;  Call OS to output string.

    mov     eax, SYS_write          ; code for write()
    mov     rsi, rdi                ; addr of characters
    mov     rdi, STDOUT             ; file descriptor
                                    ; count set above
    syscall                         ; system call

; -----
;  String printed, return to calling routine.

prtDone:
    pop     rbx
    pop     rbp
    ret

; *******************************************************
This example creates the file which is read by the next example.
section     .data

; -----
;  Define standard constants.

LF              equ     10          ; line feed
NULL            equ     0           ; end of string

TRUE            equ     1
FALSE           equ     0

EXIT_SUCCESS    equ     0           ; success code

STDIN           equ     0           ; standard input
STDOUT          equ     1           ; standard output
STDERR          equ     2           ; standard error

SYS_read        equ     0           ; read
SYS_write       equ     1           ; write
SYS_open        equ     2           ; file open
SYS_close       equ     3           ; file close
SYS_fork        equ     57          ; fork
SYS_exit        equ     60          ; terminate
SYS_creat       equ     85          ; file open/create
SYS_time        equ     201         ; get time

O_CREAT         equ     0x40
O_TRUNC         equ     0x200
O_APPEND        equ     0x400

O_RDONLY        equ     000000q     ; read only
O_WRONLY        equ     000001q     ; write only
O_RDWR          equ     000002q     ; read and write

S_IRUSR         equ     00400q
S_IWUSR         equ     00200q
S_IXUSR         equ     00100q

; -----
;  Variables/constants for main.

BUFF_SIZE       equ     255

newLine         db      LF, NULL
header          db      LF, "File Read Example."
                db      LF, LF, NULL

fileName        db      "url.txt", NULL
fileDescriptor  dq      0

errMsgOpen      db      "Error opening the file.", LF, NULL
errMsgRead      db      "Error reading from the file.", LF, NULL

; -----
section     .bss
readBuffer      resb    BUFF_SIZE+1     ; +1 leaves room for the NULL

; -----
section     .text
global _start
_start:

; -----
;  Display header line...

    mov     rdi, header
    call    printString

; -----
;  Attempt to open file.
;    Use system service for file open

;  System Service - Open
;    rax = SYS_open
;    rdi = address of file name string
;    rsi = attributes (i.e., read only, etc.)
;  Returns:
;    if error -> rax < 0
;    if success -> rax = file descriptor number

;    The file descriptor points to the File Control Block (FCB).
;    The FCB is maintained by the OS.
;    The file descriptor is used for all subsequent file
;    operations (read, write, close).

openInputFile:
    mov     rax, SYS_open               ; file open
    mov     rdi, fileName               ; file name string
    mov     rsi, O_RDONLY               ; read only access
    syscall                             ; call the kernel

    cmp     rax, 0                      ; check for success
    jl      errorOnOpen

    mov     qword [fileDescriptor], rax ; save descriptor

; -----
;  Read from file.
;    In this example, we know that the file has exactly 1 line.

;  System Service - Read
;    rax = SYS_read
;    rdi = file descriptor
;    rsi = address of where to place data
;    rdx = count of characters to read
;  Returns:
;    if error -> rax < 0
;    if success -> rax = count of characters actually read

    mov     rax, SYS_read
    mov     rdi, qword [fileDescriptor]
    mov     rsi, readBuffer
    mov     rdx, BUFF_SIZE
    syscall

    cmp     rax, 0
    jl      errorOnRead

; -----
;  Print the buffer.
;    add the NULL for the print string

    mov     rsi, readBuffer
    mov     byte [rsi+rax], NULL

    mov     rdi, readBuffer
    call    printString

printNewLine:
    mov     rdi, newLine
    call    printString

; -----
;  Close the file.

;  System Service - close
;    rax = SYS_close
;    rdi = file descriptor

    mov     rax, SYS_close
    mov     rdi, qword [fileDescriptor]
    syscall

    jmp     exampleDone

; -----
;  Error on open.
;    note, rax contains an error code which is not used
;    for this example.

errorOnOpen:
    mov     rdi, errMsgOpen
    call    printString

    jmp     exampleDone

; -----
;  Error on read.
;    note, rax contains an error code which is not used
;    for this example.

errorOnRead:
    mov     rdi, errMsgRead
    call    printString

    jmp     exampleDone

; -----
;  Example program done.

exampleDone:
    mov     rax, SYS_exit
    mov     rdi, EXIT_SUCCESS
    syscall

; **********************************************************
;  Generic procedure to display a string to the screen.
;  String must be NULL terminated.
;  Algorithm:
;    Count characters in string (excluding NULL)
;    Use syscall to output characters
;  Arguments:
;    1) address, string
;  Returns:
;    nothing

global printString
printString:
    push    rbp
    mov     rbp, rsp
    push    rbx

; -----
;  Count characters in string.

    mov     rbx, rdi
    mov     rdx, 0
strCountLoop:
    cmp     byte [rbx], NULL
    je      strCountDone
    inc     rdx
    inc     rbx
    jmp     strCountLoop
strCountDone:

    cmp     rdx, 0                  ; if empty string, nothing to print
    je      prtDone

; -----
;  Call OS to output string.

    mov     eax, SYS_write          ; code for write()
    mov     rsi, rdi                ; addr of characters
    mov     rdi, STDOUT             ; file descriptor
                                    ; count set above
    syscall                         ; system call

; -----
;  String printed, return to calling routine.

prtDone:
    pop     rbx
    pop     rbp
    ret

; *******************************************************
The printString() function is the exact same in both examples and is only repeated to
allow each program to be assembled and executed independently.
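For readers who want to compare against a high-level language, the same create/write and open/read sequence can be sketched in C. The function names writeTextFile and readTextFile are hypothetical and are used only for illustration. The C library functions creat(), open(), read(), write(), and close() invoke the corresponding system services, but report an error as -1 rather than as a negative error code.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create (or truncate) the file and write text to it.
   Returns the count of characters written, or -1 on error. */
long writeTextFile(const char *path, const char *text)
{
    int fd = creat(path, S_IRUSR | S_IWUSR);    /* owner read/write */
    if (fd < 0)
        return -1;

    long count = (long)write(fd, text, strlen(text));
    close(fd);
    return count;
}

/* Read up to size-1 characters and add the NULL termination.
   Returns the count of characters read, or -1 on error. */
long readTextFile(const char *path, char *buf, size_t size)
{
    if (size == 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    long count = (long)read(fd, buf, size - 1);
    if (count >= 0)
        buf[count] = '\0';
    close(fd);
    return count;
}
```

As in the assembly examples, the descriptor returned by the open is used for every later operation on the file, and the buffer is sized so that the NULL always fits.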
13.9 Exercises
Below are some quiz questions and suggested projects based on this chapter.
Chapter 14
14.1 Extern Statement
If a function is called from a source file and the function code is not located in the
current source file, the assembler will generate an error. The same applies to variables
accessed that are not located in the current file. In order to inform the assembler that the
function code or variable is in another file, the extern statement is used. The syntax is
as follows:
    extern <symbolName>
The symbol name would be the name of the function or variable that is located in a
different source file. In general, global variables accessed across multiple files are
considered poor programming practice and should be used sparingly (if at all). Data is
typically passed between functions as arguments for the function call.
The examples in the text focus on using external functions only with no globally
declared variables.
14.2
The following is a simple example of a main that calls an assembly language function,
stats(), to compute the integer sum and integer average for a list of signed integers. The
main and the function are in different source files and are presented as an example of
how multiple source files are used. The example itself is really too small to actually
require multiple source files.
section     .data

; -----
;  Define constants.

LF              equ     10          ; line feed
NULL            equ     0           ; end of string

TRUE            equ     1
FALSE           equ     0

EXIT_SUCCESS    equ     0           ; success code
SYS_exit        equ     60          ; terminate

; -----
;  Declare the data.

lst1            dd      1, 2, 3, 4, 5
                dd      7, 9, 11
len1            dd      8

lst2            dd      2, 3, 4, 5, 6
                dd      7, 10, 12, 14, 16
len2            dd      10

section     .bss
sum1            resd    1
ave1            resd    1

sum2            resd    1
ave2            resd    1

; -----
extern stats

section     .text
global _start
_start:

; -----
;  Call the function.
;    HLL Call: stats(lst, len, &sum, &ave);

    mov     rdi, lst1               ; data set 1
    mov     esi, dword [len1]
    mov     rdx, sum1
    mov     rcx, ave1
    call    stats

    mov     rdi, lst2               ; data set 2
    mov     esi, dword [len2]
    mov     rdx, sum2
    mov     rcx, ave2
    call    stats

; -----
;  Example program done.

exampleDone:
    mov     rax, SYS_exit
    mov     rdi, EXIT_SUCCESS
    syscall
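; -----
;  Function to find and return the integer sum and
;  integer average for a list of signed integers.
;    HLL Call: stats(lst, len, &sum, &ave);
;  Arguments:
;    rdi = address of list, esi = length,
;    rdx = address of sum, rcx = address of average

section     .text
global stats
stats:
    push    r12

; -----
;  Find and return sum.

    mov     r11, 0                          ; i = 0
    mov     r12d, 0                         ; sum = 0
sumLoop:
    mov     eax, dword [rdi+r11*4]          ; get lst[i]
    add     r12d, eax                       ; update sum
    inc     r11                             ; i++
    cmp     r11, rsi
    jl      sumLoop

    mov     dword [rdx], r12d               ; return sum

; -----
;  Find and return average.

    mov     eax, r12d
    cdq
    idiv    esi

    mov     dword [rcx], eax                ; return average

; -----
;  Done, return to calling function.

    pop     r12
    ret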
The above source file can be assembled with the same assemble command as described
in Chapter 5, Tool chain. No extern statement is required since no external functions
are called.
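To make the expected results easy to check, a C reference version of the same computation is sketched below. The name statsRef is hypothetical (it is chosen to avoid a clash with the assembly stats() function) and the sketch assumes len is greater than zero. The integer division in C truncates toward zero, which matches the behavior of the idiv instruction.

```c
/* Hypothetical C reference version of stats(); assumes len > 0. */
void statsRef(const int lst[], int len, int *sum, int *ave)
{
    int total = 0;

    for (int i = 0; i < len; i++)   /* find sum */
        total += lst[i];

    *sum = total;                   /* return sum */
    *ave = total / len;             /* return average (truncated) */
}
```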
14.3
This section provides information on how a high-level language can call an assembly
language function and how an assembly language function can call a high-level
language function. This chapter presents examples for both.
In brief, the answer of how this is accomplished is through the standard calling
convention. As such, no additional or special code is needed when interfacing with a
high-level language. The compiler or assembler will need to be informed about the
external routines in order to avoid error messages about not being able to find the source
code for the non-local routines.
The general process for linking multiple files was described in Chapter 5, Tool
Chain. The process of using multiple source files was described in Chapter 14, Multiple
Source File. It does not matter if the object files are from a high-level language or from
an assembly language source.
At this point, the compiler does not know the external function is written in assembly
(nor does it matter).
The C compiler is pre-installed on Ubuntu. However, the C++ compiler is not installed
by default.
A C version of the same program is also presented for completeness.
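#include <stdio.h>

extern void stats(int[], int, int *, int *);

int main()
{
    int lst[] = {1, -2, 3, -4, 5, 7, 9, 11};
    int len = 8;
    int sum, ave;

    stats(lst, len, &sum, &ave);
    printf("Stats:\n  Sum = %d\n  Ave = %d\n", sum, ave);
    return 0;
}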
The stats() function referenced here should be used unchanged from the previous
example.
The file names can be changed as desired. Upon execution, the output would be as
follows:
    ./main
    Stats:
      Sum = 30
      Ave = 3
If the C main is used, and assuming that the C main is named main.c and the assembly
source file is named stats.asm, the commands to compile, assemble, and link are as
follows:
    gcc -g -Wall -c main.c
    yasm -g dwarf2 -f elf64 stats.asm -l stats.lst
    gcc -g -o main main.o stats.o
14.4 Exercises
Below are some quiz questions and suggested projects based on this chapter.
Chapter 15
15.1
When a program calls a function, the standard calling convention provides the
guidance for how the parameters are passed, how the return address is saved, how and
which registers must be preserved, and how stack-based local variables are to be
allocated.
For example, consider the function call:
    expFunc(arg1, arg2, arg3);
In addition, we will assume that the function, expFunc(), reads in a line of text from the
user and stores it in a locally declared array. The local variables include the array of 48
bytes and one quadword local variable (8 bytes). This can be accomplished easily in a
high level language or as is done in Chapter 13, System Services, Console Input.
The resulting call frame would be as follows:
...
rip
return address
rbp
r12
r13
...
15.2 Code to Inject
Before discussing how the stack buffer overflow might be exploited, we will review
what code might be injected. The code to be injected could be many things. We will
assume the program is being executed in a controlled environment with no console
access for the user. The lack of console access would limit what a user could do ideally
to only what the program allowed. This might be the case if the program is server-based
42 For more information, refer to:
http://en.wikipedia.org/wiki/C_standard_library#Buffer_overflow_vulnerabilities
A new console will appear. The reader is encouraged to try this code example and see it
work.
The list file for this code fragment would be as follows:
    40 00000000 48C7C03B000000        mov rax, 59
    41 00000007 48C7C7[00000000]      mov rdi, progName
    42 0000000E 0F05                  syscall
Recall that the first column is the line number, the second column is relative address in
the code section and the third column is the machine language or the hex representation
of the human readable instruction shown in the fourth column. The [00000000]
represents the relative address of the string in the data section (progName in this
example). It is zero since it is the first (and only) variable in the data section for this
example.
This can be done for most of the bytes except the 0x00. The 0x00 is a NULL, which is a
non-printable ASCII character (used to mark string terminations). As such, the NULL
cannot be entered from the keyboard. Additionally, the [00000000] address would not
make sense if the code is injected into another program.
To address these issues, we can re-write the example code to eliminate the NULL's and
change the address reference. The NULL's can be eliminated by using different
instructions. For example, setting rax to 59 can be accomplished by xor'ing rax with
itself and placing the 59 in al (having already ensured the upper 56 bits are 0 via the
xor). The string can be placed on the stack and the current rsp used as the address of the
string. The string, /bin/sh, is 7 bytes and the stack operation will require a push of 8
bytes. Again, the NULL cannot be entered and is not counted. An extra, unnecessary
/ can be added to the string, which will not impact the operation, providing exactly 8
bytes in the string. Since the architecture is little-endian, in order to ensure that the
start of the string is in low memory, it must be in the least significant byte of the
push. This will make the string appear backwards.
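The byte ordering can be checked with a short C sketch. The function name qwordToBytes is hypothetical and is used only for illustration. It lays out the 8 bytes of a quadword the way a little-endian push stores them in memory, least significant byte at the lowest address.

```c
#include <stdint.h>

/* Store the 8 bytes of value into out[] in little-endian order
   (least significant byte first), then add the NULL termination. */
void qwordToBytes(uint64_t value, char out[9])
{
    for (int i = 0; i < 8; i++)
        out[i] = (char)((value >> (8 * i)) & 0xFF);
    out[8] = '\0';
}
```

Applied to the quadword 0x68732f6e69622f2f, the low byte 0x2f (the / character) lands first in memory and the high byte 0x68 (the h character) lands last, so the bytes read in memory order as //bin/sh.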
The revised program fragment would be as follows:
    xor     rax, rax                    ; clear rax
    push    rax                         ; place NULLs on stack
    mov     rbx, 0x68732f6e69622f2f     ; string -> "//bin/sh"
    push    rbx                         ; put string in memory
    mov     al, 59                      ; place call code in rax
    mov     rdi, rsp                    ; rdi = addr of string
    syscall                             ; system call
In this revised code there are no NULL's and the address reference is obtained from the
stack pointer (rsp), which points to the correct string.
There is an assembly language instruction, nop, which performs no operation and has a
machine code of 0x90. In this example, the nop instruction is used simply to round out
the machine code to a multiple of 8 bytes.
The series of hex values that would need to be entered is as follows:
    0x48 0x31 0xC0 0x50 0x48 0xBB 0x2F 0x2F
    0x62 0x69 0x6E 0x2F 0x73 0x68 0x53 0xB0
    0x3B 0x48 0x89 0xE7 0x0F 0x05 0x90 0x90
15.3 Code Injection
If the code to inject is available and can be entered, the next step would be actually
getting the code executed.
Based on the previous example call frame, the code would be entered preceded by a
series of nop's (0x90). The exact spot where the rip is stored in the stack can be
determined through trial and error. When the first byte of the 8 byte address is altered,
the program will not be able to return to the calling routine and likely crash. If the bytes
of the rbp are corrupted, the program may fail in some way, but it will be different than
the immediate crash caused by the corrupted rip. The code entered would be extended
by 1 byte on each of many successive attempts. Finding this exact location in this
manner will take patience.
Once the rip location has been determined, the 8 bytes that are entered there will need to
be the address of where the injected code is in the stack where the user input was stored.
This also would be determined through trial and error. However, the exact address of
    ...
    return address (altered)
    9090909090909090        <- trailing nop's
    9090909090909090
    050FE78948539090        <- injected code
    68732F6E69622F2F
    BB483BB052C03148
    9090909090909090        <- nop's
    9090909090909090
    rbx
    r12
    r13
    ...
15.4
A number of methods have been developed and implemented to protect against the stack
buffer overflow. Some of these methods are summarized here. It must be noted that
none of these methods is completely perfect.
15.5 Exercises
Below are some quiz questions and suggested projects based on this chapter.
Chapter 16
16.1
The operating system is responsible for parsing, or reading, the command line
arguments and delivering the information to the program. It is the responsibility of the
program to determine what is considered correct and incorrect. When reading the
command line, the operating system will consider an argument a set of non-space
characters (i.e., a string). The space or spaces between arguments are removed or ignored
by the operating system. In addition, the program name itself is considered the first, and
possibly only, argument. If other arguments are entered, at least one space is required
between each argument.
All arguments are delivered to the program as character strings, even numeric
information. If needed, the program must convert the character data into a numeric
value (float or integer).
For example, executing the program expProg with the following command line
arguments:
    ./expProg one 42 three
will result in four arguments as follows:
    ./expProg
    one
    42
    three
The command line arguments are delivered to the program as parameters. As such, the
program is like a function being called by the operating system. So, the standard calling
convention is used to convey the parameters.
16.2
In general, it is assumed that the reader is already familiar with basic C/C++
command line handling. This section presents a brief summary of how the C/C++
language handles the delivery of command line information to the program.
The count of arguments is passed to the main program as the first integer parameter,
typically called argc. The second parameter, typically called argv, is an array of
addresses for each string associated with the corresponding argument.
For example, the following C++ example program will read the command line
arguments and display them to the screen.
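#include <iomanip>
#include <iostream>
#include <string>

using namespace std;

int main(int argc, char *argv[])
{
    string bars;
    bars.append(50, '-');

    cout << bars << endl;
    cout << "Command Line Arguments Example"
         << endl << endl;

    cout << "Total arguments provided: " <<
         argc << endl;
    cout << "The name used to start the program: "
         << argv[0] << endl;

    // Display each remaining argument, one per line.
    if (argc > 1) {
        cout << endl << "The arguments are:" << endl;
        for (int n = 1; n < argc; n++)
            cout << setw(2) << n << ": " << argv[n] << endl;
    }

    return 0;
}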
Assuming the program is named argsExp, executing this program will produce the
following output:
    ./argsExp one 34 three
    --------------------------------------------------
    Command Line Arguments Example

    Total arguments provided: 4
    The name used to start the program: ./argsExp

    The arguments are:
     1: one
     2: 34
     3: three
It should be noted that the parameter '34' is a string. This example simply printed the
string to the console. If a parameter is to be used as a numeric value, the program must
convert it as required. This includes all error checking as necessary.
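As an illustration of the conversion and error checking involved, the following C sketch converts one argument string to an integer. The function name argToInt is hypothetical. It uses the standard library function strtol() and rejects empty strings, strings with non-digit characters, and values that do not fit in an int.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Convert a command line argument to an int.
   Returns 1 on success (result in *value), 0 if the argument
   is not a valid, in-range integer. */
int argToInt(const char *arg, int *value)
{
    char *end;
    long n;

    errno = 0;
    n = strtol(arg, &end, 10);

    if (end == arg || *end != '\0')     /* no digits, or trailing junk */
        return 0;
    if (errno == ERANGE || n < INT_MIN || n > INT_MAX)
        return 0;                       /* value out of range */

    *value = (int)n;
    return 1;
}
```

With the command line shown above, the argument 34 would convert successfully, while one and three would be reported as invalid.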
16.3
Since the operating system will call the main program as a function, the standard calling
convention applies. The argument count and the argument vector address are passed as
    Table Contents                          C/C++ Reference
    quadword address of 4th argument        argv[3]
    quadword address of 3rd argument        argv[2]
    quadword address of 2nd argument        argv[1]
    quadword address of 1st argument        argv[0]
Each string is NULL terminated by the operating system and will not contain a new line
character.
16.4
An example assembly language program to read and display the command line
arguments is included for reference.
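section     .data

; -----
;  Define standard constants.

LF              equ     10          ; line feed
NULL            equ     0           ; end of string

TRUE            equ     1
FALSE           equ     0

EXIT_SUCCESS    equ     0           ; success code

STDIN           equ     0           ; standard input
STDOUT          equ     1           ; standard output
STDERR          equ     2           ; standard error

SYS_read        equ     0           ; read
SYS_write       equ     1           ; write
SYS_open        equ     2           ; file open
SYS_close       equ     3           ; file close
SYS_fork        equ     57          ; fork
SYS_exit        equ     60          ; terminate
SYS_creat       equ     85          ; file open/create
SYS_time        equ     201         ; get time

; -----
;  Variables for main.

newLine         db      LF, NULL

; -----
section     .text
global main
main:

; -----
;  Get command line argument count (rdi) and
;  argument vector address (rsi).

    mov     r12, rdi                ; save for later use...
    mov     r13, rsi

; -----
;  Simple loop to display each argument to the screen.
;  Each argument is a NULL terminated string, so can just
;  print directly.

printArguments:
    mov     rdi, newLine
    call    printString

    mov     rbx, 0
printLoop:
    mov     rdi, qword [r13+rbx*8]
    call    printString

    mov     rdi, newLine
    call    printString

    inc     rbx
    cmp     rbx, r12
    jl      printLoop

; -----
;  Example program done.

exampleDone:
    mov     rax, SYS_exit
    mov     rdi, EXIT_SUCCESS
    syscall

; **********************************************************
;  Generic procedure to display a string to the screen.
;  String must be NULL terminated.
;  Algorithm:
;    Count characters in string (excluding NULL)
;    Use syscall to output characters
;  Arguments:
;    1) address, string
;  Returns:
;    nothing

global printString
printString:
    push    rbp
    mov     rbp, rsp
    push    rbx

; -----
;  Count characters in string.

    mov     rbx, rdi
    mov     rdx, 0
strCountLoop:
    cmp     byte [rbx], NULL
    je      strCountDone
    inc     rdx
    inc     rbx
    jmp     strCountLoop
strCountDone:

    cmp     rdx, 0                  ; if empty string, nothing to print
    je      prtDone

; -----
;  Call OS to output string.

    mov     eax, SYS_write          ; code for write()
    mov     rsi, rdi                ; addr of characters
    mov     edi, STDOUT             ; file descriptor
                                    ; count set above
    syscall                         ; system call

; -----
;  String printed, return to calling routine.

prtDone:
    pop     rbx
    pop     rbp
    ret

; *******************************************************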
The printString() function is repeated in this example and is unchanged from the
previous examples.
16.5 Exercises
Below are some quiz questions and suggested projects based on this chapter.
Chapter 17
17.1 Why Buffer
17.2 Buffering Algorithm
The first step when developing an algorithm is to understand the problem. We will
assume the file is already open and available for reading. For our buffering problem, we
wish to provide a myGetLine() function to a calling routine.
The routine might be called as follows:
    status = myGetLine(fileDescriptor, textLine, MAXLEN);
The file opening and associated error checking is required but would be addressed
separately and is outlined in Chapter 13, System Services. Once a file descriptor is
obtained, it can be made available to the myGetLine() function. It is shown here as an
explicit parameter for clarity.
As you may already be aware, there is no special end of file code or character. We must
infer the end of file based on the number of characters actually read. The file read
system service will return the actual number of characters read. If the process requests
100,000 characters be read and fewer than 100,000 are read, the end of file has been
reached and all characters have been read. Further attempts at reading the file will
return zero characters. While it is easy to recognize the end of file and thus the last time
the file needs to be read, the remaining characters in the buffer need to be processed. So
while the program may know the end of file has been found, the program must continue
until the buffer is depleted.
The calling routine should not need to know any of the details regarding the buffering,
buffer management, or file operations. If a line of text from the file is available, the
myGetLine() function will return the line from the file and a TRUE status. The line is
This algorithm outline does not verify that the text line buffer is not overwritten nor
does it handle the case when the file size is an exact multiple of the buffer size.
Refining, implementing, and testing the algorithm is left to the reader as an exercise.
As presented, this algorithm will require statically declared variables. Stack dynamic
variables can not be used here since the information is required to be maintained
between successive calls.
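One possible shape for such a routine is sketched below in C. This is only an illustrative sketch under stated assumptions, not the text's algorithm and not a substitute for the exercise. The buffer size BUFF_SIZE is an assumed value, and the static variables allow only one file to be read at a time. The sketch treats a read that returns zero characters as the end of file, which also covers a file whose size is an exact multiple of the buffer size, and it discards characters that do not fit in the caller's line buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

#define BUFF_SIZE 4096          /* assumed buffer size */

/* Return the next line from fd in textLine (without the linefeed).
   Returns true if a line was returned, false once the file and the
   buffer are both depleted. State is kept in static variables so it
   is maintained between successive calls. */
bool myGetLine(int fd, char *textLine, size_t maxLen)
{
    static char buffer[BUFF_SIZE];
    static ssize_t buffMax = 0;     /* count of valid chars in buffer */
    static ssize_t curr = 0;        /* index of next unread char */
    static bool eof = false;
    size_t i = 0;

    if (maxLen == 0)
        return false;

    for (;;) {
        if (curr >= buffMax) {      /* buffer depleted, refill it */
            if (eof)
                break;
            buffMax = read(fd, buffer, BUFF_SIZE);
            curr = 0;
            if (buffMax <= 0) {     /* end of file (or read error) */
                buffMax = 0;
                eof = true;
                break;
            }
        }

        char chr = buffer[curr++];
        if (chr == '\n') {          /* end of line found */
            textLine[i] = '\0';
            return true;
        }
        if (i < maxLen - 1)         /* keep char only if it fits */
            textLine[i++] = chr;
    }

    textLine[i] = '\0';
    return i > 0;                   /* final line without a linefeed */
}
```

The calling routine simply loops while the function returns true and needs no knowledge of the buffer or of the underlying file reads.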
17.3 Exercises
Below are some quiz questions and suggested projects based on this chapter.
where the inFile.txt exists and contains standard ASCII text. If the inFile.txt
does not exist, an error should be generated and the program terminated. The
program should open/create the file newFile.txt and write the lines, with the
line numbers, to the newFile.txt file. The output file should be created, deleting
any old version if one exists. Use a text editor to verify that the line numbers
track correctly in the output file.
Chapter 18
Data Movement
Arithmetic Instructions
A complete listing of the instructions covered in this text is located in Appendix B for
reference.
18.1
Floating point values are typically represented as either single precision (32-bits) or
double precision (64-bits). In C and C++ single precision floating point variables are
typically declared as float type and double precision floating point variables are
declared as double type. As noted in the following sections, assembly language
instructions will use an s (lower-case letter S) qualifier to refer to single precision and a
d (lower-case letter D) qualifier to refer to double precision.
18.2
There is a set of dedicated registers, referred to as XMM registers, used to support
floating point operations. Floating point operations must use the floating point registers.
18.3 Data Movement
Typically, data must be moved into a CPU floating point register in order to be operated
upon. Once the calculations are completed, the result may be copied from the register and
placed into a variable. There are a number of simple formulas in the example program
that perform these steps. These basic data movement operations are performed with the
move instructions.
The general form of the move instructions is:
    movss       <dest>, <src>
    movsd       <dest>, <src>
For the movss instruction, a single 32-bit source operand is copied into the destination
operand. For the movsd instruction, a single 64-bit source operand is copied into the
destination operand. The value of the source operand is unchanged. The destination
and source operands must be of the correct size for the instruction (32 or 64 bits).
Neither operand can be an immediate value. Both operands cannot be memory;
however, one can be. If a memory to memory operation is required, two instructions
must be used.
These move instructions load one value, using the lower 32 or 64-bits, into or out of the
register. Other move instructions are required to load multiple values.
The floating point move instructions are summarized as follows:
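Instruction                 Explanation
movss <dest>, <src>         Copy the 32-bit source operand to the 32-bit
                            destination operand.
    Examples:               movss   xmm0, dword [x]
                            movss   dword [fltSVar], xmm1
                            movss   xmm3, xmm2
movsd <dest>, <src>         Copy the 64-bit source operand to the 64-bit
                            destination operand.
    Examples:               movsd   xmm0, qword [y]
                            movsd   qword [fltDVar], xmm1
                            movsd   xmm3, xmm2
Note 1, both operands cannot be memory.
Note 2, operands cannot be an immediate.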
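For example, assuming the following data declarations:
    fSVar1      dd      3.14
    fSVar2      dd      0.0                 ; single precision variables
    fDVar1      dq      6.28
    fDVar2      dq      0.0                 ; double precision variables
The following instructions could be used:
    movss       xmm0, dword [fSVar1]
    movss       dword [fSVar2], xmm0        ; fSVar2 = fSVar1
    movsd       xmm1, qword [fDVar1]
    movsd       qword [fDVar2], xmm1        ; fDVar2 = fDVar1
    movss       xmm2, xmm0                  ; xmm2 = xmm0 (32-bit)
    movsd       xmm3, xmm1                  ; xmm3 = xmm1 (64-bit)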
For some instructions, including those above, the explicit type specification (e.g., byte,
word, dword, qword) can be omitted as the other operand will clearly define the size. It
is included for consistency and good programming practices.
18.4
If integer values are required during floating point calculations, the integers must be
converted into floating point values. If single precision and double precision floating
point values are required for a series of calculations, they must all be converted to
either single or double precision so that the operations are performed on a consistent size/type.
Refer to Chapter 3 for a more detailed explanation of the representation details for
floating point values. It is assumed the reader understands the representation details and
recognizes the requirement to ensure consistent formats before performing floating
point operations.
These basic data conversion operations are performed with the convert instructions.
The floating point conversion instructions are summarized as follows:
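Instruction                 Explanation
cvtss2sd <RXdest>, <src>    Convert the 32-bit floating point source operand
                            to a 64-bit floating point value and place the
                            result in the destination operand.
    Examples:               cvtss2sd  xmm0, dword [fltSVar]
                            cvtss2sd  xmm3, xmm2
cvtsd2ss <RXdest>, <src>    Convert the 64-bit floating point source operand
                            to a 32-bit floating point value and place the
                            result in the destination operand.
    Examples:               cvtsd2ss  xmm0, qword [fltDVar]
                            cvtsd2ss  xmm3, xmm2
cvtss2si <reg>, <src>       Convert the 32-bit floating point source operand
                            to a 32-bit integer and place the result in the
                            destination operand.
    Examples:               cvtss2si  eax, xmm0
                            cvtss2si  eax, dword [fltSVar]
cvtsd2si <reg>, <src>       Convert the 64-bit floating point source operand
                            to a 32-bit integer and place the result in the
                            destination operand.
    Examples:               cvtsd2si  eax, xmm0
                            cvtsd2si  eax, qword [fltDVar]
cvtsi2ss <RXdest>, <src>    Convert the 32-bit integer source operand to a
                            32-bit floating point value and place the result
                            in the destination operand.
    Examples:               cvtsi2ss  xmm0, eax
cvtsi2sd <RXdest>, <src>    Convert the 32-bit integer source operand to a
                            64-bit floating point value and place the result
                            in the destination operand.
    Examples:               cvtsi2sd  xmm0, eax
Note 1, an <RXdest> destination must be a floating point register and a <reg>
destination must be an integer register.
Note 2, a floating point source may be a floating point register or memory and
an integer source may be an integer register or memory.
Note 3, the source operand cannot be an immediate.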
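As a minimal sketch of how these conversions combine, the following program (yasm syntax) converts an integer total and count to double precision, divides them, and converts the quotient back to an integer. The variable names iTotal, iCount, dAve, and iAve are hypothetical and used only for illustration. The sketch assumes the default rounding mode (round to nearest), under which cvtsd2si rounds rather than truncates.

```nasm
; Sketch: integer -> double -> integer round trip.
; iTotal, iCount, dAve, and iAve are hypothetical names.

section .data
EXIT_SUCCESS    equ     0
SYS_exit        equ     60

iTotal          dd      47
iCount          dd      4
dAve            dq      0.0
iAve            dd      0

section .text
global _start
_start:
        cvtsi2sd    xmm0, dword [iTotal]    ; xmm0 = 47.0
        cvtsi2sd    xmm1, dword [iCount]    ; xmm1 = 4.0
        divsd       xmm0, xmm1              ; xmm0 = 11.75
        movsd       qword [dAve], xmm0      ; dAve = 11.75

        cvtsd2si    eax, xmm0               ; eax = 12 (round to nearest)
        mov         dword [iAve], eax

        mov         rax, SYS_exit
        mov         rdi, EXIT_SUCCESS
        syscall
```

Stepping through the program in the debugger should show dAve as 11.75 and iAve as 12.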
18.5
The floating point arithmetic instructions perform arithmetic operations such as addition,
subtraction, multiplication, and division on single or double precision floating point values.
The following sections present the basic arithmetic operations.
Specifically, the source and destination operands are added and the result is placed in the
destination operand (over-writing the previous value). The destination operand must be
a floating point register. The source operand may not be an immediate value. The value
of the source operand is unchanged. The destination and source operand must be of the
same size (double words or quad words). If a memory to memory addition operation is
required, two instructions must be used.
For example, assuming the following data declarations:
    fSNum1      dd      43.75
    fSNum2      dd      15.5
    fSAns       dd      0.0
    fDNum3      dq      200.12
    fDNum4      dq      73.2134
    fDAns       dq      0.0
The following instructions could be used:
    ; fSAns = fSNum1 + fSNum2
    movss       xmm0, dword [fSNum1]
    addss       xmm0, dword [fSNum2]
    movss       dword [fSAns], xmm0
    ; fDAns = fDNum3 + fDNum4
    movsd       xmm0, qword [fDNum3]
    addsd       xmm0, qword [fDNum4]
    movsd       qword [fDAns], xmm0
For some instructions, including those above, the explicit type specification (e.g., dword,
qword) can be omitted as the other operand or the instruction itself clearly defines the
size. It is included for consistency and good programming practices.
The floating point addition instructions are summarized as follows:
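Instruction                 Explanation
addss <RXdest>, <src>       Add two 32-bit floating point operands,
                            (<RXdest> + <src>), and place the result in
                            <RXdest> (over-writing the previous value).
    Examples:               addss   xmm0, xmm3
                            addss   xmm5, dword [fSVar]
addsd <RXdest>, <src>       Add two 64-bit floating point operands,
                            (<RXdest> + <src>), and place the result in
                            <RXdest> (over-writing the previous value).
    Examples:               addsd   xmm0, xmm3
                            addsd   xmm5, qword [fDVar]
Note 1, destination operand must be a floating point register.
Note 2, source operand cannot be an immediate.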
Specifically, the source operand is subtracted from the destination operand and the result
is placed in the destination operand (over-writing the previous value). The destination operand
must be a floating point register. The source operand may not be an immediate value.
The value of the source operand is unchanged. The destination and source operand must
be of the same size (double words or quad words). If a memory to memory subtraction
operation is required, two instructions must be used.
For example, assuming the following data declarations:
    fSNum1      dd      43.75
    fSNum2      dd      15.5
    fSAns       dd      0.0
    fDNum3      dq      200.12
    fDNum4      dq      73.2134
    fDAns       dq      0.0
The following instructions could be used:
    ; fSAns = fSNum1 - fSNum2
    movss       xmm0, dword [fSNum1]
    subss       xmm0, dword [fSNum2]
    movss       dword [fSAns], xmm0
    ; fDAns = fDNum3 - fDNum4
    movsd       xmm0, qword [fDNum3]
    subsd       xmm0, qword [fDNum4]
    movsd       qword [fDAns], xmm0
For some instructions, including those above, the explicit type specification (e.g., dword,
qword) can be omitted as the other operand or the instruction itself clearly defines the
size. It is included for consistency and good programming practices.
The floating point subtraction instructions are summarized as follows:
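Instruction                 Explanation
subss <RXdest>, <src>       Subtract two 32-bit floating point operands,
                            (<RXdest> - <src>), and place the result in
                            <RXdest> (over-writing the previous value).
    Examples:               subss   xmm0, xmm3
                            subss   xmm5, dword [fSVar]
subsd <RXdest>, <src>       Subtract two 64-bit floating point operands,
                            (<RXdest> - <src>), and place the result in
                            <RXdest> (over-writing the previous value).
    Examples:               subsd   xmm0, xmm3
                            subsd   xmm5, qword [fDVar]
Note 1, destination operand must be a floating point register.
Note 2, source operand cannot be an immediate.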
Specifically, the source and destination operands are multiplied and the result is placed
in the destination operand (over-writing the previous value). The destination operand
must be a floating point register. The source operand may not be an immediate value.
The value of the source operand is unchanged. The destination and source operand must
be of the same size (double words or quad words). If a memory to memory multiplication
operation is required, two instructions must be used.
For example, assuming the following data declarations:
    fSNum1      dd      43.75
    fSNum2      dd      15.5
    fSAns       dd      0.0
    fDNum3      dq      200.12
    fDNum4      dq      73.2134
    fDAns       dq      0.0
The following instructions could be used:
    ; fSAns = fSNum1 * fSNum2
    movss       xmm0, dword [fSNum1]
    mulss       xmm0, dword [fSNum2]
    movss       dword [fSAns], xmm0
    ; fDAns = fDNum3 * fDNum4
    movsd       xmm0, qword [fDNum3]
    mulsd       xmm0, qword [fDNum4]
    movsd       qword [fDAns], xmm0
For some instructions, including those above, the explicit type specification (e.g., dword,
qword) can be omitted as the other operand or the instruction itself clearly defines the
size. It is included for consistency and good programming practices.
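The floating point multiplication instructions are summarized as follows:
Instruction                 Explanation
mulss <RXdest>, <src>       Multiply two 32-bit floating point operands,
                            (<RXdest> * <src>), and place the result in
                            <RXdest> (over-writing the previous value).
    Examples:               mulss   xmm0, xmm3
                            mulss   xmm5, dword [fSVar]
mulsd <RXdest>, <src>       Multiply two 64-bit floating point operands,
                            (<RXdest> * <src>), and place the result in
                            <RXdest> (over-writing the previous value).
    Examples:               mulsd   xmm0, xmm3
                            mulsd   xmm5, qword [fDVar]
Note 1, destination operand must be a floating point register.
Note 2, source operand cannot be an immediate.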
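Specifically, the destination operand is divided by the source operand and the result is
placed in the destination operand (over-writing the previous value). The destination operand
must be a floating point register. The source operand may not be an immediate value. The
value of the source operand is unchanged. The destination and source operand must
be of the same size (double words or quad words). If a memory to memory division
operation is required, two instructions must be used.
For example, assuming the following data declarations:
    fSNum1      dd      43.75
    fSNum2      dd      15.5
    fSAns       dd      0.0
    fDNum3      dq      200.12
    fDNum4      dq      73.2134
    fDAns       dq      0.0
The following instructions could be used:
    ; fSAns = fSNum1 / fSNum2
    movss       xmm0, dword [fSNum1]
    divss       xmm0, dword [fSNum2]
    movss       dword [fSAns], xmm0
    ; fDAns = fDNum3 / fDNum4
    movsd       xmm0, qword [fDNum3]
    divsd       xmm0, qword [fDNum4]
    movsd       qword [fDAns], xmm0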
For some instructions, including those above, the explicit type specification (e.g., dword,
qword) can be omitted as the other operand or the instruction itself clearly defines the
size. It is included for consistency and good programming practices.
The floating point division instructions are summarized as follows:
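Instruction                 Explanation
divss <RXdest>, <src>       Divide two 32-bit floating point operands,
                            (<RXdest> / <src>), and place the result in
                            <RXdest> (over-writing the previous value).
    Examples:               divss   xmm0, xmm3
                            divss   xmm5, dword [fSVar]
divsd <RXdest>, <src>       Divide two 64-bit floating point operands,
                            (<RXdest> / <src>), and place the result in
                            <RXdest> (over-writing the previous value).
    Examples:               divsd   xmm0, xmm3
                            divsd   xmm5, qword [fDVar]
Note 1, destination operand must be a floating point register.
Note 2, source operand cannot be an immediate.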
Specifically, the square root of the source operand is placed in the destination operand
(over-writing the previous value). The destination operand must be a floating point
register. The source operand may not be an immediate value. The value of the source
operand is unchanged. The destination and source operand must be of the same size
(double words or quad words). If a memory to memory square root operation is required,
two instructions must be used.
For example, assuming the following data declarations:
    fSNum1      dd      1213.0
    fSAns       dd      0.0
    fDNum3      dq      172935.123
    fDAns       dq      0.0
The following instructions could be used:
    ; fSAns = sqrt(fSNum1)
    sqrtss      xmm0, dword [fSNum1]
    movss       dword [fSAns], xmm0
    ; fDAns = sqrt(fDNum3)
    sqrtsd      xmm0, qword [fDNum3]
    movsd       qword [fDAns], xmm0
For some instructions, including those above, the explicit type specification (e.g., dword,
qword) can be omitted as the other operand or the instruction itself clearly defines the
size. It is included for consistency and good programming practices.
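The floating point square root instructions are summarized as follows:
Instruction                 Explanation
sqrtss <RXdest>, <src>      Take the square root of the 32-bit floating point
                            source operand and place the result in <RXdest>
                            (over-writing the previous value).
    Examples:               sqrtss  xmm0, xmm3
                            sqrtss  xmm7, dword [fSVar]
sqrtsd <RXdest>, <src>      Take the square root of the 64-bit floating point
                            source operand and place the result in <RXdest>
                            (over-writing the previous value).
    Examples:               sqrtsd  xmm0, xmm3
                            sqrtsd  xmm7, qword [fDVar]
Note 1, destination operand must be a floating point register.
Note 2, source operand cannot be an immediate.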
18.6
Where <RXsrc> and <src> are compared as floating point values and must be the same
size. The results of the comparison are placed in the rFlag register. Neither operand is
changed. The <RXsrc> operand must be one of the xmm registers. The <src> operand
can be an xmm register or a memory location, but may not be an immediate value. One
of the unsigned conditional jump instructions can be used to read the rFlag register.
The conditional control instructions include the jump equal (je) and jump not equal
(jne). The unsigned conditional control instructions include the basic set of comparison
operations: jump below (jb), jump below or equal (jbe), jump above (ja), and
jump above or equal (jae).
The general form of the conditional jump instructions, along with an explanatory
comment, is as follows:
    je      <label>         ; if <op1> == <op2>
    jne     <label>         ; if <op1> != <op2>
    jb      <label>         ; unsigned, if <op1> < <op2>
    jbe     <label>         ; unsigned, if <op1> <= <op2>
    ja      <label>         ; unsigned, if <op1> > <op2>
    jae     <label>         ; unsigned, if <op1> >= <op2>
    fltNum      dq      7.5
    fltMax      dq      5.25
Assuming that the values are updating appropriately within the program (not shown),
the following instructions could be used:
    movsd       xmm1, qword [fltNum]
    ucomisd     xmm1, qword [fltMax]        ; if fltNum <= fltMax
    jbe         notNewFltMax                ;   skip set new max
    movsd       qword [fltMax], xmm1
notNewFltMax:
As with integer comparisons, the floating point compare and conditional jump provide
functionality for jump or not jump. As such, if the condition from the original IF
statement is false, the code to update the fltMax should not be executed. Thus, when
false, in order to skip the execution, the conditional jump will jump to the target label
immediately following the code to be skipped (not executed). While there is only one
line in this example, there can be many lines of code.
A more complex example might be as follows:
    if (x != 0.0) {
        ans = x / y;
        errFlg = FALSE;
    } else {
        errFlg = TRUE;
    }
The basic compare and conditional jump do not provide an IF-ELSE structure. It must
be created. Assuming the x and y variables are double precision floating point values
that will be set during the program execution, and the following declarations:
    TRUE        equ     1
    FALSE       equ     0
    fltZero     dq      0.0
    x           dq      10.1
    y           dq      3.7
    ans         dq      0.0
    errFlg      db      FALSE
The following code could be used to implement the above IF-ELSE statement.
    movsd       xmm1, qword [x]             ; if statement
    ucomisd     xmm1, qword [fltZero]
    je          doElse
    divsd       xmm1, qword [y]
    movsd       qword [ans], xmm1
    mov         byte [errFlg], FALSE
    jmp         skpElse
doElse:
    mov         byte [errFlg], TRUE
skpElse:
Floating point comparisons can be very tricky due to the inexact nature of the floating point
representations and rounding errors. For example, the value 0.1 added 10 times should
be 1.0. However, implementing a program to perform this summation and checking the
result will show that:
$$\sum_{i=1}^{10} 0.1 \neq 1.0$$
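The summation can be tried directly. The sketch below (yasm syntax), in which dTenth, dOne, dSum, and areSame are hypothetical names, adds 0.1 ten times in double precision and compares the total to 1.0 with ucomisd.

```nasm
; Sketch: add 0.1 ten times and compare the total to 1.0.
; dTenth, dOne, dSum, and areSame are hypothetical names.

section .data
TRUE            equ     1
FALSE           equ     0
EXIT_SUCCESS    equ     0
SYS_exit        equ     60

dTenth          dq      0.1
dOne            dq      1.0
dSum            dq      0.0
areSame         db      FALSE

section .text
global _start
_start:
        mov         rcx, 10                 ; loop count
        movsd       xmm0, qword [dSum]      ; running total = 0.0
sumLoop:
        addsd       xmm0, qword [dTenth]    ; total += 0.1
        loop        sumLoop
        movsd       qword [dSum], xmm0

        ucomisd     xmm0, qword [dOne]      ; compare total to 1.0
        jne         sumDone                 ; not equal, areSame stays FALSE
        mov         byte [areSame], TRUE
sumDone:
        mov         rax, SYS_exit
        mov         rdi, EXIT_SUCCESS
        syscall
```

On completion, areSame is still FALSE. In the debugger, dSum displays as 0.9999999999999999 rather than 1.0.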
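The floating point comparison instructions are summarized as follows:
Instruction                 Explanation
ucomiss <RXsrc>, <src>      Compare two 32-bit floating point operands,
                            (<RXsrc> with <src>). Results are placed in the
                            rFlag register. Neither operand is changed.
    Examples:               ucomiss xmm0, xmm3
                            ucomiss xmm5, dword [fSVar]
ucomisd <RXsrc>, <src>      Compare two 64-bit floating point operands,
                            (<RXsrc> with <src>). Results are placed in the
                            rFlag register. Neither operand is changed.
    Examples:               ucomisd xmm0, xmm3
                            ucomisd xmm5, qword [fDVar]
Note 1, <RXsrc> operand must be a floating point register.
Note 2, <src> operand cannot be an immediate.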
46 See: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
18.7
The standard calling conventions detailed in Chapter 12, Functions, still fully apply.
This section addresses the usage of the floating point registers when calling floating
point functions.
When using floating point registers, none of the registers are preserved across a floating
point function call.
The first eight (8) floating point arguments are passed in floating point registers xmm0
through xmm7. Any additional arguments are placed on the stack in reverse order in the
manner described in Chapter 12, Functions. A value returning floating point function will
return the result in xmm0.
Since none of the floating point registers are preserved, any floating point value needed
after a function call must be saved (for example, in a variable or on the stack) before the
call.
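The register usage rules above can be illustrated with a short sketch (yasm syntax). The function hypotenuse() and the variables dSideA, dSideB, dScale, and dAns are hypothetical and used only for illustration. The function receives its two double precision arguments in xmm0 and xmm1 and returns its result in xmm0. Since the call alters xmm1, the caller does not keep a needed value in a floating point register across the call.

```nasm
; Sketch: passing double precision arguments and using the result.
; hypotenuse, dSideA, dSideB, dScale, and dAns are hypothetical names.

section .data
EXIT_SUCCESS    equ     0
SYS_exit        equ     60

dSideA          dq      3.0
dSideB          dq      4.0
dScale          dq      2.0
dAns            dq      0.0

section .text

; double hypotenuse(double a, double b)
;   a in xmm0, b in xmm1, result returned in xmm0
global hypotenuse
hypotenuse:
        mulsd       xmm0, xmm0              ; a * a
        mulsd       xmm1, xmm1              ; b * b (xmm1 is altered)
        addsd       xmm0, xmm1
        sqrtsd      xmm0, xmm0              ; sqrt(a*a + b*b)
        ret

global _start
_start:
        movsd       xmm0, qword [dSideA]    ; first float argument
        movsd       xmm1, qword [dSideB]    ; second float argument
        call        hypotenuse              ; xmm0 = 5.0

        ; dScale is read from memory after the call instead of being
        ; held in a floating point register across the call.
        mulsd       xmm0, qword [dScale]    ; xmm0 = 10.0
        movsd       qword [dAns], xmm0

        mov         rax, SYS_exit
        mov         rdi, EXIT_SUCCESS
        syscall
```

With the values shown, the debugger should display dAns as 10.0 once the program completes.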
18.8
This example is a simple assembly language program to calculate the sum and average
for a list of floating point values.
; Floating Point Example Program
; ***********************************************************
section .data
; -----
;  Define constants.
NULL            equ     0                   ; end of string
TRUE            equ     1
FALSE           equ     0
EXIT_SUCCESS    equ     0                   ; Successful operation
SYS_exit        equ     60                  ; system call code for terminate
; -----
fltLst          dq      21.34, 6.15, 9.12, 10.05, 7.75
                dq      1.44, 14.50, 3.32, 75.71, 11.87
                dq      17.23, 18.25, 13.65, 24.24, 8.88
length          dd      15
lstSum          dq      0.0
lstAve          dq      0.0
; ***********************************************************
section .text
global _start
_start:
; -----
;  Loop to find floating point sum.
        mov         ecx, [length]
        mov         rbx, fltLst
        mov         rsi, 0
        movsd       xmm1, qword [lstSum]
sumLp:
        movsd       xmm0, qword [rbx+rsi*8] ; get fltLst[i]
        addsd       xmm1, xmm0              ; update sum
        inc         rsi                     ; i++
        loop        sumLp
        movsd       qword [lstSum], xmm1    ; save sum
; -----
;  Compute average of entire list.
        cvtsi2sd    xmm0, dword [length]
        divsd       xmm1, xmm0
        movsd       qword [lstAve], xmm1
; -----
;  Done, terminate program.
last:
        mov         rax, SYS_exit
        mov         rdi, EXIT_SUCCESS       ; exit w/success
        syscall
The debugger can be used to examine the results and verify correct execution of the
program.
18.9
This example is a simple assembly language program to find the absolute value of a
floating point value. Recall that if a value is negative, it must be made
positive and if the value is already positive, nothing should be done.
; Floating Point Absolute Value Example
section .data
; -----
;  Define constants.
TRUE            equ     1
FALSE           equ     0
EXIT_SUCCESS    equ     0                   ; successful operation
SYS_exit        equ     60                  ; call code for terminate
; -----
;  Define some test variables.
dZero           dq      0.0
dNegOne         dq      -1.0
fltVal          dq      -8.25
; *********************************************************
section .text
global _start
_start:
; -----
;  Perform absolute value function on fltVal.
        movsd       xmm0, qword [fltVal]
        ucomisd     xmm0, qword [dZero]
        jae         isPos
        mulsd       xmm0, qword [dNegOne]
        movsd       qword [fltVal], xmm0
isPos:
; -----
;  Done, terminate program.
last:
        mov         rax, SYS_exit
        mov         rdi, EXIT_SUCCESS       ; exit w/success
        syscall
In this example, the final result for |fltVal| was saved to memory. Depending on the context,
this may not be required.
18.10 Exercises
Below are some quiz questions and suggested projects based on this chapter.
$$\sum_{i=1}^{10} 0.1$$
Compare the results of the summation to the value 1.0 and display a message
"Are Same" if the summation result equals 1.0 and the message "Are Not Same"
if the result of the summation does not equal 1.0. Use the debugger as needed to
debug the program. When working, execute the program without the debugger
and verify that the expected results are displayed to the console.
Chapter 19
19.1 Distributed Computing
19.2 Multiprocessing
As noted in Chapter 2, Architecture Overview, most current CPU chips include multiple
cores. The CPU cores all have equal access to the main memory resource.
Multiprocessing is a form of parallel processing that specifically refers to using multiple
cores to perform simultaneous execution of multiple processes.
49 For more information, refer to: http://en.wikipedia.org/wiki/Distributed_computing
50 For more information, refer to: http://en.wikipedia.org/wiki/Multiprocessing
51 For more information, refer to: http://en.wikipedia.org/wiki/List_of_distributed_computing_projects
52 For more information, refer to: http://en.wikipedia.org/wiki/Folding@home
$$\text{myValue} = \left( \frac{\text{myValue}}{X} \right) + Y$$
We could write a high level language program something along the lines of:
    for (int i=0; i < MAX; i++)
        myValue = (myValue / X) + Y;
This code would be repeated in each of the thread functions. It may not be obvious, but
assuming both threads are simultaneously executing, this will cause a race condition on
the myValue variable. Specifically, each of the two threads is attempting to update the
variable simultaneously and some of the updates to the variable may be lost.
To further simplify this example, we will assume that X and Y are both set to 1. As
such, the result of each calculation would be to increment myValue by 1. If myValue is
57 For more information, refer to: http://en.wikipedia.org/wiki/Mutual_exclusion
58 For more information, refer to: http://en.wikipedia.org/wiki/Synchronization_(computer_science)
    pthreadID0  dd      0, 0, 0, 0, 0
The following code fragment would create and start threadFunction0() executing.
; -----
;  pthread_create(&pthreadID0, NULL,
;                 &threadFunction0, NULL);
        mov         rdi, pthreadID0
        mov         rsi, NULL
        mov         rdx, threadFunction0
        mov         rcx, NULL
        call        pthread_create
;  pthread_join(pthreadID0, NULL);
        mov         rdi, qword [pthreadID0]
        mov         rsi, NULL
        call        pthread_join
If the thread function is not done, the join call will wait until it is completed.
The thread function, threadFunction0(), itself might contain the following code:
; -----
global threadFunction0
threadFunction0:
;  Perform MAX / 2 iterations to update myValue.
        mov         rcx, MAX
        shr         rcx, 1                  ; divide by 2
incLoop0:
;  myValue = (myValue / x) + y
        mov         rax, qword [myValue]
        cqo
        div         qword [x]
        add         rax, qword [y]
        mov         qword [myValue], rax
        loop        incLoop0
        ret
Step    Code: Core 0                    Code: Core 1
1       mov  rax, qword [myValue]
2       cqo                             mov  rax, qword [myValue]
3       div  qword [x]                  cqo
4       add  rax, qword [y]             div  qword [x]
5       mov  qword [myValue], rax       add  rax, qword [y]
6                                       mov  qword [myValue], rax
As a reminder, each core has its own set of registers. Thus, the core 0 rax register is
different than the core 1 rax register.
If the variable myValue is currently at 730, two thread executions should increase it to
732. On core 0, at line 1, the 730 is copied into the core 0 rax register. On core 1, at line 2,
the 730 is copied into the core 1 rax register. As execution progresses, lines 2-4 are
performed and the core 0 rax is incremented from 730 to 731. During this time, on core 1,
lines 1-3 are completed and the 730 is also incremented to 731. As the next line, line 5, is
executed, core 0 writes 731 to myValue; core 1 then writes its own 731, so one of the two
increments is lost.
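The lost update described above can be replayed deterministically on a single core. The sketch below (yasm syntax) is only an illustration under the simplifying assumption that X and Y are both 1, so each calculation is an increment. The registers r8 and r9 are hypothetical stand-ins for the core 0 rax and the core 1 rax; no threads are created.

```nasm
; Sketch: single-threaded replay of the lost update.
; r8 stands in for the core 0 rax, r9 for the core 1 rax.

section .data
EXIT_SUCCESS    equ     0
SYS_exit        equ     60

myValue         dq      730

section .text
global _start
_start:
        mov         r8, qword [myValue]     ; core 0 reads 730
        mov         r9, qword [myValue]     ; core 1 reads 730
        inc         r8                      ; core 0 computes 731
        inc         r9                      ; core 1 computes 731
        mov         qword [myValue], r8     ; core 0 writes 731
        mov         qword [myValue], r9     ; core 1 writes 731 again

        ; myValue is now 731, not the expected 732.
        mov         rax, SYS_exit
        mov         rdi, EXIT_SUCCESS
        syscall
```

Both stand-in registers read the same starting value, so the second store simply repeats the first and one increment disappears.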
19.3 Exercises
Below are some quiz questions and suggested projects based on this chapter.
Chapter 20
20.0 Interrupts
In a general sense, an interrupt60 is a pause or hold in the current flow. For example, if
you are talking on the phone and the door bell rings, the phone conversation is placed on
hold, and the door is answered. After the salesperson is sent away, the phone conversation
is resumed (where the conversation left off).
In computer programming, an interrupt is also a pause, or hold, of the currently executing
process. Typically, the current process is interrupted so that some other work can be
performed. An interrupt is usually defined as an event that alters the sequence of
instructions executed by a processor. Such events correspond to signals generated by
software and/or hardware. For example, most Input/Output (I/O) devices generate an
interrupt in order to transmit or receive data. Software programs can also generate
interrupts to initiate I/O as needed, request OS services, or handle unexpected
conditions.
Handling interrupts is a sensitive task. Interrupts can occur at any time, and the kernel tries
to address each interrupt as soon as possible. Additionally, an interrupt can be
interrupted by another interrupt.
20.1
Synchronously occurring interrupts typically occur while under CPU control and are
caused by or on behalf of the currently executing process. The synchronous nature is
related to where the interrupt occurs and not a specific clock time or CPU cycle time.
Synchronous interrupts typically reoccur at the same location (assuming nothing has
changed to resolve the original cause).
Exceptions
An exception is a term for an interrupt that is caused by the current process and needs
the attention of the kernel. Exceptions are synchronously occurring. In this context,
synchronous implies that the exception will occur in a predictable or repeatable manner.
Exceptions are typically divided into categories as follows:
Faults
Traps
Abort
An example of a fault is a page fault which is a request for loading part of the program
from disk storage into memory. The interrupted process restarts with no loss of
continuity.
A trap is typically used for debugging. The process re-starts with no loss of continuity.
An abort is typically an indication that a serious error condition occurred and must be
handled. This includes division by zero, attempt to access an invalid memory address,
or attempt to execute an invalid/illegal instruction. An illegal instruction might be an
20.2
Interrupts have various types and privileges associated with them. The following
sections provide an explanation of the types and privileges. Interrupted processes may
execute at a lower privilege than the interrupt processing code. In order for interrupts to
be effective, the OS must handle this privilege escalation and de-escalation
securely and quickly.
Level       Description
Level 0
Level 1
Level 2
Level 3
[Figure: privilege levels, Level 0 through Level 3]
20.3 Interrupt Processing
The execution of the current program is suspended. As a minimum, this requires saving the
rip and rFlags registers to the system stack. The remaining registers are likely to be
preserved (as a further step), depending on the specific interrupt. The rFlags
register must be preserved immediately since the interrupt may have been generated
asynchronously and that register will change as successive instructions are executed.
This multi-stage process ensures that the program context can be fully restored.
20.3.2.2 Obtaining ISR Address
The ISR addresses are stored in a table referred to as an Interrupt Descriptor Table 61
(IDT). For each ISR, the IDT contains the ISR address and some additional information
including task gate (priority and privilege information) for the ISR. The address and the
additional information are 8 bytes each, for a total of 16 bytes per IDT entry. There are a maximum
61 Note, for Windows this data structure is referred to as the Interrupt Vector Table (IVT).
Once the ISR address is obtained from the IDT, some validation is performed. This
includes ensuring the interrupt is from a legal/valid source and if a privilege level
change is required and allowed. Once the verifications have been completed
successfully, the address of the ISR from the IDT is placed in the rip register, thus
effecting a jump to the ISR routine.
At this point, depending on the specific ISR, a complete process context switch may be
performed. A process context switch involves saving the entire set of CPU registers for
the interrupted process.
In Linux-based OS's, ISRs are typically divided into two parts, referred to as the top-half
and bottom-half. Other OS's refer to these as the First-Level Interrupt Handler (FLIH)
and the Second-Level Interrupt Handler (SLIH).
The top-half or FLIH is executed immediately and is where any critical activities are
performed. The activities are specific to the ISR, but might include acknowledging the
interrupt, resetting hardware (if necessary), and recording any information
only available at the time of the interrupt. The top-half may perform some blocking of other
interrupts (which needs to be minimized).
The bottom-half is where any processing activities (if any) are performed. This helps
ensure that the top-half is completed quickly and that any non-critical processing is
deferred to a more convenient time. If a bottom-half exists, the top-half will create and
schedule the execution of the bottom-half.
Once the top-half completes, the OS scheduler will select a new process.
When the OS is ready to resume the interrupted process, the program context is restored
and an iret instruction is executed (to pop rFlags and rip registers, thus completing
the restoration).
20.4
The following diagram presents an overview of the general flow used for processing
interrupts by the system.
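[Figure: interrupt processing overview. Main memory holds the operating system, the ISR, the executing process, and the IDT, with the processing steps numbered 1 through 4.]
20.5 Exercises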
Below are some quiz questions and suggested projects based on this chapter.
Appendix A
Char  Dec  Hex      Char  Dec  Hex      Char  Dec  Hex      Char  Dec  Hex
NUL     0  0x00     spc    32  0x20     @      64  0x40     `      96  0x60
SOH     1  0x01     !      33  0x21     A      65  0x41     a      97  0x61
STX     2  0x02     "      34  0x22     B      66  0x42     b      98  0x62
ETX     3  0x03     #      35  0x23     C      67  0x43     c      99  0x63
EOT     4  0x04     $      36  0x24     D      68  0x44     d     100  0x64
ENQ     5  0x05     %      37  0x25     E      69  0x45     e     101  0x65
ACK     6  0x06     &      38  0x26     F      70  0x46     f     102  0x66
BEL     7  0x07     '      39  0x27     G      71  0x47     g     103  0x67
BS      8  0x08     (      40  0x28     H      72  0x48     h     104  0x68
TAB     9  0x09     )      41  0x29     I      73  0x49     i     105  0x69
LF     10  0x0A     *      42  0x2A     J      74  0x4A     j     106  0x6A
VT     11  0x0B     +      43  0x2B     K      75  0x4B     k     107  0x6B
FF     12  0x0C     ,      44  0x2C     L      76  0x4C     l     108  0x6C
CR     13  0x0D     -      45  0x2D     M      77  0x4D     m     109  0x6D
SO     14  0x0E     .      46  0x2E     N      78  0x4E     n     110  0x6E
SI     15  0x0F     /      47  0x2F     O      79  0x4F     o     111  0x6F
DLE    16  0x10     0      48  0x30     P      80  0x50     p     112  0x70
DC1    17  0x11     1      49  0x31     Q      81  0x51     q     113  0x71
DC2    18  0x12     2      50  0x32     R      82  0x52     r     114  0x72
DC3    19  0x13     3      51  0x33     S      83  0x53     s     115  0x73
DC4    20  0x14     4      52  0x34     T      84  0x54     t     116  0x74
NAK    21  0x15     5      53  0x35     U      85  0x55     u     117  0x75
SYN    22  0x16     6      54  0x36     V      86  0x56     v     118  0x76
ETB    23  0x17     7      55  0x37     W      87  0x57     w     119  0x77
CAN    24  0x18     8      56  0x38     X      88  0x58     x     120  0x78
EM     25  0x19     9      57  0x39     Y      89  0x59     y     121  0x79
SUB    26  0x1A     :      58  0x3A     Z      90  0x5A     z     122  0x7A
ESC    27  0x1B     ;      59  0x3B     [      91  0x5B     {     123  0x7B
FS     28  0x1C     <      60  0x3C     \      92  0x5C     |     124  0x7C
GS     29  0x1D     =      61  0x3D     ]      93  0x5D     }     125  0x7D
RS     30  0x1E     >      62  0x3E     ^      94  0x5E     ~     126  0x7E
US     31  0x1F     ?      63  0x3F     _      95  0x5F     DEL   127  0x7F
For additional information and a more complete listing of the ASCII codes (including
the extended ASCII characters), refer to http://www.asciitable.com/
Appendix B
22.1
Notation                Description
<reg>                   Register operand. The operand must be a register.
<reg8>, <reg16>,        Register operand with a specific size requirement.
<reg32>, <reg64>
<dest>                  Destination operand.
<RXdest>                Floating point destination register operand.
<src>                   Source operand.
<imm>                   Immediate value.
<mem>                   Memory location. May be a variable name or an indirect
                        reference (i.e., a memory address).
<op> or <operand>       Operand, register or memory.
<op8>, <op16>,          Operand, register or memory, with specific size
<op32>, <op64>          requirement. For example, op8 means a byte sized
                        operand only and reg32 means a double-word sized
                        operand only.
<label>                 Program label.
22.2
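Instruction                 Explanation
mov <dest>, <src>           Copy the source operand to the destination operand.
lea <reg64>, <mem>          Place the address of <mem> into <reg64>.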
22.3
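Instruction                 Explanation
movzx <dest>, <src>         Unsigned widening conversion. The source operand
movzx <reg16>, <op8>        is zero extended into the larger destination
movzx <reg32>, <op8>        operand.
movzx <reg32>, <op16>
movzx <reg64>, <op8>
movzx <reg64>, <op16>
cbw                         Convert byte in al into word in ax.
cwd                         Convert word in ax into double-word in dx:ax.
cwde                        Convert word in ax into double-word in eax.
cdq                         Convert double-word in eax into quad-word in edx:eax.
cdqe                        Convert double-word in eax into quad-word in rax.
cqo                         Convert quad-word in rax into double quad-word in rdx:rax.
movsx <dest>, <src>         Signed widening conversion. The source operand
movsx <reg16>, <op8>        is sign extended into the larger destination
movsx <reg32>, <op8>        operand.
movsx <reg32>, <op16>
movsx <reg64>, <op8>
movsx <reg64>, <op16>
movsxd <reg64>, <op32>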
22.4
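Instruction                 Explanation
add <dest>, <src>           <dest> = <dest> + <src>
inc <operand>               Increment <operand> by 1.
                            Note, <operand> can not be an immediate.
adc <dest>, <src>           <dest> = <dest> + <src> + carry flag
    Examples:               adc     rcx, qword [dVvar1]
                            adc     rax, 42
sub <dest>, <src>           <dest> = <dest> - <src>
dec <operand>               Decrement <operand> by 1.
                            Note, <operand> can not be an immediate.
mul <src>                   Unsigned multiply of the A register by <src>.
mul <op8>                       ax = al * <op8>
mul <op16>                      dx:ax = ax * <op16>
mul <op32>                      edx:eax = eax * <op32>
mul <op64>                      rdx:rax = rax * <op64>
imul <src>                  Signed multiply.
imul <dest>, <src/imm32>    For one operand (same register usage as mul):
imul <dest>, <src>, <imm32>     imul <op8>
                                imul <op16>
                                imul <op32>
                                imul <op64>
                            For two operands:
imul <reg16>, <op16/imm>        <reg16> = <reg16> * <op16/imm>
imul <reg32>, <op32/imm>        <reg32> = <reg32> * <op32/imm>
imul <reg64>, <op64/imm>        <reg64> = <reg64> * <op64/imm>
                            For three operands:
                                <reg16> = <op16> * <imm>
                                <reg32> = <op32> * <imm>
                                <reg64> = <op64> * <imm>
div <src>                   Unsigned divide of the A and D registers by <src>.
div <op8>                       al = ax / <op8>, remainder in ah
div <op16>                      ax = dx:ax / <op16>, remainder in dx
div <op32>                      eax = edx:eax / <op32>, remainder in edx
div <op64>                      rax = rdx:rax / <op64>, remainder in rdx
idiv <src>                  Signed divide, with the same register usage as div.
idiv <op8>
idiv <op16>
idiv <op32>
idiv <op64>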
22.5
Below is a summary of the basic logical, shift, arithmetic shift, and rotate instructions.
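Instruction                 Explanation
and <dest>, <src>           Bit-wise AND of <dest> and <src>, result in <dest>.
or <dest>, <src>            Bit-wise OR of <dest> and <src>, result in <dest>.
xor <dest>, <src>           Bit-wise XOR of <dest> and <src>, result in <dest>.
                            Note 1, both operands can not be memory.
                            Note 2, destination operand can not be an
                            immediate.
not <op>                    Bit-wise NOT (one's complement) of <op>.
shl <dest>, <imm>           Logical shift left of <dest>.
shl <dest>, cl
shr <dest>, <imm>           Logical shift right of <dest>.
shr <dest>, cl
sal <dest>, <imm>           Arithmetic shift left of <dest>.
sal <dest>, cl
sar <dest>, <imm>           Arithmetic shift right of <dest>.
sar <dest>, cl
rol <dest>, <imm>           Rotate left of <dest>.
rol <dest>, cl
ror <dest>, <imm>           Rotate right of <dest>.
ror <dest>, cl
                            The <imm> or the value in cl register must be
                            between 1 and 64.
                            Note, destination operand can not be an
                            immediate.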
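22.6 Control Instructions
Instruction                 Explanation
cmp <op1>, <op2>            Compare <op1> with <op2>. Results are placed in
                            the rFlag register.
je <label>                  Jump to <label> if <op1> == <op2>.
jne <label>                 Jump to <label> if <op1> != <op2>.
jl <label>                  Signed, jump to <label> if <op1> < <op2>.
jle <label>                 Signed, jump to <label> if <op1> <= <op2>.
jg <label>                  Signed, jump to <label> if <op1> > <op2>.
jge <label>                 Signed, jump to <label> if <op1> >= <op2>.
jb <label>                  Unsigned, jump to <label> if <op1> < <op2>.
jbe <label>                 Unsigned, jump to <label> if <op1> <= <op2>.
ja <label>                  Unsigned, jump to <label> if <op1> > <op2>.
jae <label>                 Unsigned, jump to <label> if <op1> >= <op2>.
loop <label>                Decrement rcx and jump to <label> if rcx != 0.
Note, the conditional jumps are based on the preceding comparison instruction.
Label must be defined exactly once.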
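22.7 Stack Instructions
Instruction                 Explanation
push <op64>                 Push the 64-bit operand on the stack. Adjusts rsp
                            accordingly (rsp - 8).
pop <op64>                  Pop the 64-bit operand from the stack. Adjusts rsp
                            accordingly (rsp + 8).
22.8 Function Instructions
Instruction                 Explanation
call <funcName>             Call a function. Push the 64-bit rip register and
                            jump to <funcName>.
ret                         Return from a function. Pop the stack into the rip
                            register, effecting a jump to the line after the
                            call.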
22.9
Below is a summary of the basic floating point data movement instructions.
Instruction                 Explanation
movss <dest>, <src>         Copy the 32-bit source operand to the 32-bit
                            destination operand.
    Examples:               movss   xmm0, dword [x]
                            movss   dword [fltVar], xmm1
                            movss   xmm3, xmm2
movsd <dest>, <src>         Copy the 64-bit source operand to the 64-bit
                            destination operand.
    Examples:               movsd   xmm0, qword [y]
                            movsd   qword [fltdVar], xmm1
                            movsd   xmm3, xmm2
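Instruction                 Explanation
cvtss2sd <RXdest>, <src>    Convert 32-bit floating point source to 64-bit
                            floating point destination.
    Examples:               cvtss2sd  xmm0, dword [fltSVar]
                            cvtss2sd  xmm3, xmm2
cvtsd2ss <RXdest>, <src>    Convert 64-bit floating point source to 32-bit
                            floating point destination.
cvtss2si <reg>, <src>       Convert 32-bit floating point source to 32-bit
                            integer destination.
    Examples:               cvtss2si  eax, xmm0
                            cvtss2si  eax, dword [fltSVar]
cvtsd2si <reg>, <src>       Convert 64-bit floating point source to 32-bit
                            integer destination.
cvtsi2ss <RXdest>, <src>    Convert 32-bit integer source to 32-bit floating
                            point destination.
cvtsi2sd <RXdest>, <src>    Convert 32-bit integer source to 64-bit floating
                            point destination.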
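Instruction                 Explanation
addss <RXdest>, <src>       <RXdest> = <RXdest> + <src>, 32-bit floating point.
    Examples:               addss   xmm0, xmm3
                            addss   xmm5, dword [fSVar]
addsd <RXdest>, <src>       <RXdest> = <RXdest> + <src>, 64-bit floating point.
subss <RXdest>, <src>       <RXdest> = <RXdest> - <src>, 32-bit floating point.
subsd <RXdest>, <src>       <RXdest> = <RXdest> - <src>, 64-bit floating point.
    Examples:               subsd   xmm0, xmm3
                            subsd   xmm5, qword [fDVar]
mulss <RXdest>, <src>       <RXdest> = <RXdest> * <src>, 32-bit floating point.
mulsd <RXdest>, <src>       <RXdest> = <RXdest> * <src>, 64-bit floating point.
divss <RXdest>, <src>       <RXdest> = <RXdest> / <src>, 32-bit floating point.
    Examples:               divss   xmm0, xmm3
                            divss   xmm5, dword [fSVar]
divsd <RXdest>, <src>       <RXdest> = <RXdest> / <src>, 64-bit floating point.
sqrtss <RXdest>, <src>      <RXdest> = square root of <src>, 32-bit floating point.
sqrtsd <RXdest>, <src>      <RXdest> = square root of <src>, 64-bit floating point.
ucomiss <RXsrc>, <src>      Compare two 32-bit floating point operands. Results
                            are placed in the rFlag register.
    Examples:               ucomiss xmm0, xmm3
                            ucomiss xmm5, dword [fSVar]
ucomisd <RXsrc>, <src>      Compare two 64-bit floating point operands. Results
                            are placed in the rFlag register.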
Appendix C
23.1 Return Codes
The system call will return a code in the rax register. If the value returned is less than 0,
that is an indication that an error has occurred. If the operation is successful, the value
returned will depend on the specific system service. Refer to the Error Codes section
for additional information regarding the values of the error codes.
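As a minimal sketch of testing a return code, the following program (yasm syntax) attempts to open a file for reading and checks whether the value returned in rax is negative. The names fileName, fileDesc, openFailed, and done, and the EXIT_FAILURE value of 1, are hypothetical and used only for illustration.

```nasm
; Sketch: test the return code from a system service.
; fileName, fileDesc, and EXIT_FAILURE are hypothetical names.

section .data
NULL            equ     0
EXIT_SUCCESS    equ     0
EXIT_FAILURE    equ     1                   ; assumed failure status
O_RDONLY        equ     0
SYS_open        equ     2
SYS_close       equ     3
SYS_exit        equ     60

fileName        db      "inFile.txt", NULL
fileDesc        dq      0

section .text
global _start
_start:
        mov         rax, SYS_open
        mov         rdi, fileName           ; address of file name
        mov         rsi, O_RDONLY           ; file status flags
        syscall
        cmp         rax, 0
        jl          openFailed              ; negative, error code
        mov         qword [fileDesc], rax   ; otherwise, file descriptor

        mov         rax, SYS_close
        mov         rdi, qword [fileDesc]
        syscall

        mov         rdi, EXIT_SUCCESS
        jmp         done
openFailed:
        mov         rdi, EXIT_FAILURE
done:
        mov         rax, SYS_exit
        syscall
```

If inFile.txt does not exist, rax holds -2 (ENOENT) after the open call and the program exits with the failure status.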
23.2
Call Code   System Service  Description
0           SYS_read        Read characters
                              rdi = file descriptor (of where to read from)
                              rsi = address of where to store characters
                              rdx = count of characters to read
1           SYS_write       Write characters
                              rdi = file descriptor (of where to write to)
                              rsi = address of characters to write
                              rdx = count of characters to write
2           SYS_open        Open a file
                              rdi = address of NULL terminated file name
                              rsi = file status flags (typically O_RDONLY)
3           SYS_close       Close an open file
                              rdi = file descriptor of the file to close
8           SYS_lseek       Reposition the file read/write offset
57          SYS_fork        Create a child process
59          SYS_execve      Execute a program
60          SYS_exit        Terminate the executing process
                              rdi = exit status
85          SYS_creat       Open/Create a file.
                              rdi = address of NULL terminated file name
                              rsi = file mode flags
                              If successful, returns the file descriptor.
23.3 File Modes
When performing file operations, the file mode provides information to the operating
system regarding the file access permissions that will be allowed.
When opening an existing file, one of the following file modes must be specified.
Mode        Value   Description
O_RDONLY    0       Read only.
O_WRONLY    1       Write only. Typically used if information is to be
                    appended to a file.
O_RDWR      2       Allow simultaneous reading and writing.
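When creating a new file, the file permissions must be specified. Below is the
complete set of file permissions. As is standard for Linux file systems, the permission
values are specified in Octal.
Mode        Value     Description
S_IRWXU     00700q    User (file owner) has read, write, and execute permission.
S_IRUSR     00400q    User (file owner) has read permission.
S_IWUSR     00200q    User (file owner) has write permission.
S_IXUSR     00100q    User (file owner) has execute permission.
S_IRWXG     00070q    Group has read, write, and execute permission.
S_IRGRP     00040q    Group has read permission.
S_IWGRP     00020q    Group has write permission.
S_IXGRP     00010q    Group has execute permission.
S_IRWXO     00007q    Others have read, write, and execute permission.
S_IROTH     00004q    Others have read permission.
S_IWOTH     00002q    Others have write permission.
S_IXOTH     00001q    Others have execute permission.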
The text examples only address permissions for the user or owner of the file.
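As a minimal sketch of combining permission values, the following program (yasm syntax) creates a file that the owner can read and write. The names newFileName, newFileDesc, and createDone are hypothetical and used only for illustration. The two permission values are combined with a bit-wise OR at assembly time.

```nasm
; Sketch: create a file with owner read and write permission.
; newFileName, newFileDesc, and createDone are hypothetical names.

section .data
NULL            equ     0
EXIT_SUCCESS    equ     0
SYS_close       equ     3
SYS_exit        equ     60
SYS_creat       equ     85
S_IRUSR         equ     00400q
S_IWUSR         equ     00200q

newFileName     db      "newFile.txt", NULL
newFileDesc     dq      0

section .text
global _start
_start:
        mov         rax, SYS_creat
        mov         rdi, newFileName        ; address of file name
        mov         rsi, S_IRUSR | S_IWUSR  ; file mode flags, 00600q
        syscall
        cmp         rax, 0
        jl          createDone              ; negative, error code
        mov         qword [newFileDesc], rax

        mov         rax, SYS_close
        mov         rdi, qword [newFileDesc]
        syscall
createDone:
        mov         rax, SYS_exit
        mov         rdi, EXIT_SUCCESS
        syscall
```

After a successful run, a directory listing should show newFile.txt with read and write permission for the owner only, subject to the current umask setting.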
23.4 Error Codes
If a system service returns an error, the value of the return code will be negative. The
following is a list of the error codes. The code value is provided along with the Linux
symbolic name. High level languages typically use the name which is not used at the
assembly level and is only provided for reference.
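Error Code  Symbolic Name   Description
-1          EPERM           Operation not permitted.
-2          ENOENT          No such file or directory.
-3          ESRCH           No such process.
-4          EINTR           Interrupted system call.
-5          EIO             I/O Error.
-6          ENXIO           No such device or address.
-7          E2BIG           Argument list too long.
-8          ENOEXEC         Exec format error.
-9          EBADF           Bad file number.
-10         ECHILD          No child process.
-11         EAGAIN          Try again.
-12         ENOMEM          Out of memory.
-13         EACCES          Permission denied.
-14         EFAULT          Bad address.
-15         ENOTBLK         Block device required.
-16         EBUSY           Device or resource busy.
-17         EEXIST          File exists.
-18         EXDEV           Cross-device link.
-19         ENODEV          No such device.
-20         ENOTDIR         Not a directory.
-21         EISDIR          Is a directory.
-22         EINVAL          Invalid argument.
-23         ENFILE          File table overflow.
-24         EMFILE          Too many open files.
-25         ENOTTY          Not a typewriter.
-26         ETXTBSY         Text file busy.
-27         EFBIG           File too large.
-28         ENOSPC          No space left on device.
-29         ESPIPE          Illegal seek.
-30         EROFS           Read-only file system.
-31         EMLINK          Too many links.
-32         EPIPE           Broken pipe.
-33         EDOM            Math argument out of domain of function.
-34         ERANGE          Math result not representable.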
Only the most common error codes are shown. A complete list can be found via the
Internet or by looking at the include files on the current system. For Ubuntu, the
list is typically located in /usr/include/asm-generic/errno-base.h.
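The negative return convention can be sketched as follows. This example attempts to open a file for reading and, on failure, converts the negative return code into a positive error number that is passed back as the program exit status. The names fileName, fileDesc, openError, and done, and the file name itself, are hypothetical and chosen only for illustration.

```asm
; Sketch only: test a system service return code for an error.
; fileName, fileDesc, openError, and done are hypothetical names.

SYS_open        equ     2
SYS_exit        equ     60
O_RDONLY        equ     0
EXIT_SUCCESS    equ     0
NULL            equ     0

section .data
fileName        db      "missing.txt", NULL

section .bss
fileDesc        resq    1

section .text
global _start
_start:
        mov     rax, SYS_open
        mov     rdi, fileName           ; address of file name
        mov     rsi, O_RDONLY           ; open for reading only
        syscall

        cmp     rax, 0
        jl      openError               ; negative value is an error code
        mov     qword [fileDesc], rax   ; success, save file descriptor
        mov     rdi, EXIT_SUCCESS
        jmp     done

openError:
        neg     rax                     ; for example, -2 becomes 2
        mov     rdi, rax                ; exit status = error number

done:
        mov     rax, SYS_exit
        syscall
```

If the named file does not exist, the open should fail with ENOENT, and the exit status (shown by echo $? in the shell) would then be the corresponding positive error number. A program could equally compare rax against a specific negative value to handle one error differently from the rest.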
Appendix
D
24.1
24.2
0x00
0x4C
0x4B
0x40    Low memory
[Figure: layout of the rax register, showing the eax, ax, ah, and al portions]
24.3
24.4
1) yasm
2) With the ; (semicolon).
1. bNum db 10
2. wNum dw 10291
3. dwNum dd 2126010
4. qwNum dq 10000000000
24.5
Many examples
24.6
1. x/db &bVar1
2. x/dh &wVar1
3. x/dw &dVar1
4. x/dg &qVar1
5. x/30db &bArr1
6. x/50dh &wArr1
7. x/75dw &dArr1

1. x/xb &bVar1
2. x/xh &wVar1
3. x/xw &dVar1
4. x/xg &qVar1
5. x/30xb &bArr1
6. x/50xh &wArr1
7. x/75xw &dArr1
24.7
1. mov ah, 0
2. cbw

1. movzx eax, ax
2. cwde

1. mov dx, 0
2. cwd
6) The cwd instruction only converts the signed value in ax into a signed value in
dx:ax (and nothing else). The movsx instruction copies the word source
operand into the double-word destination operand.
7) On the first instruction, the destination operand size can be determined from the
source operand (as double-word). On the second instruction, the destination
operand size must be explicitly specified since the source operand, value of 1,
does not have an inherent size associated with it.
8) The answers are as follows:
1. rax = 0x00000009
2. rbx = 0x0000000B

1. rax = 0x00000007
2. rbx = 0x00000002

1. rax = 0x00000009
2. rbx = 0xFFFFFFF9

1. rax = 0x0000000C
2. rdx = 0x00000000

1. rax = 0x00000001
2. rdx = 0x00000002

1. rax = 0x00000002
2. rdx = 0x00000003
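Answers 6 and 7 above can be illustrated with a short sketch. The variable names wNum and dNum and their initial values are assumptions for this example and are not taken from the quiz questions.

```asm
; Sketch only: cwd versus movsx, and explicit operand sizes.
; wNum and dNum are hypothetical variables.

section .data
wNum            dw      -5
dNum            dd      0

section .text
global _start
_start:
        mov     ax, word [wNum]         ; ax = 0xFFFB (-5)
        cwd                             ; dx:ax = -5, so dx = 0xFFFF
                                        ; (cwd always uses ax and dx:ax)

        movsx   ecx, word [wNum]        ; ecx = 0xFFFFFFFB (-5)
                                        ; (operands chosen by programmer)

        mov     dword [dNum], ecx       ; size is implied by ecx
        mov     dword [dNum], 1         ; immediate 1 has no inherent
                                        ; size, so dword is required

        mov     rax, 60                 ; SYS_exit
        mov     rdi, 0
        syscall
```

Stepping through this in the debugger and displaying dx, ax, and ecx after each conversion is one way to confirm the sign extension.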
24.8
1) The first instruction places the value from qVar1 into the rdx register. The
second instruction places the address of qVar1 into the rdx register.
2) The answers are as follows:
1. Immediate
2. Memory
3. Immediate
4. Illegal, destination operand cannot be an immediate value
5. Register
6. Memory
7. Memory
8. Illegal, source and destination operands not the same size.
3) The answers are as follows:
1. eax = 0x0000000A

1. eax = 0x00000003
2. edx = 0x00000002

1. eax = 0x00000009
2. ebx = 0x00000002
3. rcx = 0x0000000000000000
4. rsi = 0x000000000000000C

1. rax = 0x00000010
2. rcx = 0x0000000000000000
3. edx = 0x00000000
4. rsi = 0x0000000000000004

1. eax = 0x00000002
2. rcx = 0x0000000000000000
3. edx = 0x00000005
4. rsi = 0x0000000000000003

1. eax = 0x00000018
2. edx = 0x00000000
3. rcx = 0x0000000000000000
4. rsi = 0x0000000000000005
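The distinction in answer 1 of this section (value versus address) can be sketched as follows, assuming the two instructions in question take the forms shown. The initial value of qVar1 is an assumption for illustration.

```asm
; Sketch only: value of a variable versus address of a variable.
; The initial value of qVar1 is hypothetical.

section .data
qVar1           dq      42

section .text
global _start
_start:
        mov     rdx, qword [qVar1]      ; rdx = contents of qVar1 (42)
        mov     rdx, qVar1              ; rdx = address of qVar1
        lea     rdx, [qVar1]            ; also the address of qVar1

        mov     rax, 60                 ; SYS_exit
        mov     rdi, 0
        syscall
```

The brackets are what select a memory access. Without them, the variable name is treated as an immediate value, namely its address.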
24.9
1. r10 = 0x0000000000000003
2. r11 = 0x0000000000000002
3. r12 = 0x0000000000000001
func1, func2
func1, func2
Index
automatics.............................................173
base address..........................................132
Base Pointer Register.............................12
Basic System Services..........................325
biased exponent......................................25
BSS.........................................................17
BSS Section............................................35
buffer....................................................258
Buffering Algorithm.............................259
byte addressable..............................16, 132
cache memory...........................................9
Cache Memory.................................14, 19
call code................................................197
call frame..............................................179
Call Frame............................................179
call-by-reference...................................176
call-by-value.........................................176
callee.....................................................176
Callee............................................182, 185
caller.....................................................176
Caller............................................182, 184
Calling Convention...............................176
Calling System Services.......................197
Central Processing Unit............................9
character.................................................28
code generation.......................................47
Code Injection......................................242
Code Injection Protections...................243
Code to Inject.......................................239
Command Line Arguments...................247
Comments...............................................33
Compile, Assemble, and Link..............234
concurrency..........................................289
Conditional Control Instructions..........111
Console Input........................................203
Console Output.....................................199
constant expression.................................46
Conversion Instructions..........................76
CPU register...........................................10
Data Address Space Layout Randomization......244
Data Execution Prevention...................244
Data Movement..............................73, 266
Data representation.................................21
Data Section............................................34
Data Stack Smashing Protector (or Canaries)......244
db............................................................34
dd............................................................35
DDD Configuration Settings..................57
DDD Debugger.......................................55
DDD/GDB Commands Summary..........62
DDD/GDB Commands, Examples.........64
ddq..........................................................35
Debugger................................................52
Debugger Commands File (interactive).....65
Debugger Commands File (non-interactive)...........................................66p.
Debugging Macros...............................171
Defining Constants.................................34
dereferencing........................................131
Destination operand........................72, 309
Direct Memory Addressing..................258
Displaying Register Contents.................60
File Open Operations............................208
File Open/Create...................................210
File Operations Examples.....................212
File Read...............................................211
File Write..............................................212
First Pass.................................................46
Flag Register...........................................12
Floating Point Addition........................270
Floating Point Arithmetic Instructions...270
Floating Point Calling Conventions.....283
Floating Point Comparison...................280
Floating Point Control Instructions......279
Floating point destination register operand........72, 309
Floating Point Division.........................275
Floating Point Instructions...................265
Floating Point Multiplication...............273
Floating Point Registers.......................265
Floating Point Square Root..................277
Floating Point Subtraction....................272
Floating Point Values............................265
Floating-point Representation................24
forward reference....................................46
Function Declaration............................174
Function Source....................................230
Functions..............................................173
General Purpose Registers......................10
GPRs.......................................................10
Hardware Interrupt...............................299
heap.................................................17, 150
I/O Buffering........................................257
IEEE 32-bit Representation....................25
IEEE 64-bit Representation....................27
IEEE 754 32-bit floating-point standard....24
IF statement...........................................111
Immediate Mode Addressing................130
Immediate value.............................72, 309
index.....................................................132
indirection.............................................131
input buffer...........................................258
Input/Output Buffering.........................257
Instruction Pointer Register....................12
Instruction Set Overview........................71
Integer / Floating Point Conversion Instructions......268
Integer Arithmetic Instructions...............80
integer numbers......................................21
Interfacing with a High-Level Language......232
Interrupt Categories..............................299
Interrupt Classification.........................298
Interrupt Processing..............................302
Interrupt Service Routine (ISR)............302
Interrupt Timing....................................298
Interrupt Types......................................300
Interrupt Types and Levels...................300
Interrupts...............................................297
Iteration.................................................117
Jump Out Of Range..............................114
jump out-of-range.................................114
Jump to ISR..........................................303
Labels....................................................110
leaf function..........................................179
Least Significant Byte............................16
light-weight process..............................291
Linkage.................................................175
Linker.....................................................47
Linking Multiple Files............................48
Linking Process......................................48
list file.....................................................43
List File...................................................43
List Summation....................................134
little-endian.............................16, 153, 241
Loader.....................................................52
Logic Error...........................................164
logical AND operation..........................105
logical NOT operation..........................105
logical OR operation.............................105
logical XOR operation..........................105
machine language...............17, 43, 45, 240
Macro Definition..................................168
Macros..................................................167
Main Memory.........................................16
memory hierarchy...........................17, 257
Memory Hierarchy.................................17
memory layout................................17, 150
Memory Mode Addressing...................131
Most Significant Byte.............................16
Multi-Line Macros................................168
Multi-user Operating System...............297
Multiple Source Files...........................227
multiprocessing.....................................290
Multiprocessing....................................290
NaN.................................................28, 279
Narrowing conversions...........................76
Narrowing Conversions..........................76
Newline Character................................198
Next / Step..............................................60
nexti......................................................171
non-volatile memory................................8
NOP slide..............................................243
normalized scientific notation................25
not a number...................................28, 279
Not a Number (NaN)..............................27
Notation................................................309
Notational Conventions..........................71
Numeric Values.......................................33
object file................................................43
Obtaining ISR Address.........................302
offset.....................................................132
Operand Notation...........................72, 309
operands..................................................71
operation.................................................71
ordered floating point comparisons......279
parallel processing................................289
Parallel Processing................................289
Parameters Passing...............................177
Parsing Command Line Arguments......247
pop operation........................................147
POSIX Threads.....................................291
preserved register..................................178
Primary Storage........................................7
Privilege Levels....................................300
Process Stack........................................147
Processing Steps...................................302
processor registers....................................9
Program Development..........................157
Program Format......................................33
prologue................................................176
push operation......................................147
race condition.......................................291
Race Conditions....................................292
Random Access Memory (RAM).............7
RBP.........................................................12
Red Zone..............................................181
register....................................................10
Register Mode Addressing...................130
Register operand.............................72, 309
Register Usage......................................178
resb.........................................................36
resd.........................................................36
resdq.......................................................36
resq.........................................................36
Resumption...........................................304
resw.........................................................36
Return Codes........................................325
rFlags......................................................12
RIP..........................................................12
RSP.........................................................12
Run / Continue........................................60
Run-time Error......................................164
saved register........................................178
Second Pass............................................46
secondary storage.....................................8
Secondary Storage....................................7
section .bss..............................................35
section .data............................................34
section .text.............................................36
segment fault........................................239
Setting Breakpoints................................57
short-jump.............................................114
sign extension.......................................107
sign-extend.............................................78
signed......................................................21
Signed Conversions................................78
Single Instruction Multiple Data............13
Single-Line Macros..............................167
SNaN....................................................279
Software Interrupts...............................300
source file...............................................43
Source operand...............................72, 309
stack buffer overflow............................237
Stack Buffer Overflow..........................237
Stack Dynamic Local Variables............173
Stack Example......................................153
stack frame............................................179
Stack Implementation...........................149
Stack Instructions.................................148
Stack Layout.........................................150
Stack Operations...................................151
Stack Pointer Register............................12
stack smashing......................................237
Stack-Based Local Variables................188
Standard Calling Convention...............174
Starting DDD..........................................55
stepi.......................................................171
string.......................................................29
Summary...............................................191
Suspension............................................302
Suspension Execute ISR.......................303
Suspension Interrupt Processing Summary......304
symbol table............................................46
Synchronous Interrupts.........................298
system call............................................197
System Services....................................197
Text Section............................................36
thread....................................................291
Tool Chain..............................................41
Two-Pass Assembler...............................45
two's complement.................................22p.
Unconditional Control Instructions......111
Understanding a Stack Buffer Overflow......238
Unicode...................................................29
uninitialized data............................17, 150
unordered floating point comparisons...279
unsafe function.....................................239
unsigned..................................................21
Unsigned Conversions............................77
Using a Macro......................................168
volatile memory........................................8
Why Buffer...........................................257
Widening conversions............................76
Widening Conversions............................76
%define.................................................167