Assembly: Arithmetic and Logic: Machine Programming
How does a computer interpret and execute C programs?
Learning Assembly
• Moving data around
• Arithmetic and logical operations
• Control flow
• Function calls
Recap: mov operands
• Register
• Memory Location (at most one of src and dst may be a memory location)
Memory Location Syntax
Syntax              Meaning
0x104               Address 0x104 (no $)
(%rax)              What’s in %rax
4(%rax)             What’s in %rax, plus 4
(%rax, %rdx)        Sum of what’s in %rax and %rdx
4(%rax, %rdx)       Sum of values in %rax and %rdx, plus 4
(, %rcx, 4)         What’s in %rcx, times 4 (multiplier can be 1, 2, 4, 8)
(%rax, %rcx, 2)     What’s in %rax, plus 2 times what’s in %rcx
8(%rax, %rcx, 2)    What’s in %rax, plus 2 times what’s in %rcx, plus 8
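Every row of the memory location table is a special case of one formula: displacement + base + index * scale. The following is a minimal sketch of that calculation in C. The function name effective_address and its parameters are hypothetical, and the register contents are passed in as plain integers.

```c
#include <stdint.h>

/* Hypothetical helper: returns the address named by the operand
   disp(base, index, scale), i.e. disp + base + index * scale.
   base and index stand in for the contents of two registers;
   pass 0 for a part that the operand leaves out. */
uint64_t effective_address(int64_t disp, uint64_t base,
                           uint64_t index, uint64_t scale) {
    return base + index * scale + (uint64_t)disp;
}
```

For example, with 0x100 in %rax and 3 in %rcx, the operand 8(%rax, %rcx, 2) corresponds to effective_address(8, 0x100, 3, 2), which is 0x10e.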
Lecture Plan
• Recap: mov so far
• Data and Register Sizes
• The lea Instruction
• Logical and Arithmetic Operations
• Practice: Reverse Engineering
Data Sizes
Data sizes in assembly have slightly different terminology to get used to:
• A byte is 1 byte.
• A word is 2 bytes.
• A double word is 4 bytes.
• A quad word is 8 bytes.
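To connect this terminology to C, the sketch below gives each size a fixed-width type from <stdint.h>. The typedef names are hypothetical and chosen only to mirror the terms above.

```c
#include <stdint.h>

/* Hypothetical type names mirroring the assembly terminology. */
typedef uint8_t  byte_t;         /* byte:        1 byte  */
typedef uint16_t word_t;         /* word:        2 bytes */
typedef uint32_t double_word_t;  /* double word: 4 bytes */
typedef uint64_t quad_word_t;    /* quad word:   8 bytes */
```

On a typical x86-64 Linux system these sizes match char, short, int, and long (or a pointer) respectively, though that mapping is a property of the platform and not of the C language.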
Register Responsibilities
Some registers take on special responsibilities during program execution.
• %rax stores the return value
• %rdi stores the first parameter to a function
• %rsi stores the second parameter to a function
• %rdx stores the third parameter to a function
• %rip stores the address of the next instruction to execute
• %rsp stores the address of the current top of the stack
mov Variants
• mov can take an optional suffix (b,w,l,q) that specifies the size of data to move:
movb, movw, movl, movq
• mov only updates the specific register bytes or memory locations indicated.
• Exception: a movl that writes to a register also sets the register's high-order 4 bytes to 0.
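The effect of each suffix on an 8-byte register can be modeled with bit masks. This is only a sketch: the write_mov* function names are hypothetical, and each one returns the register's new contents given its old contents and the value being moved.

```c
#include <stdint.h>

/* movb: replace only the low 1 byte; the other 7 bytes are kept. */
uint64_t write_movb(uint64_t reg, uint8_t value) {
    return (reg & ~(uint64_t)0xFF) | value;
}

/* movw: replace only the low 2 bytes; the other 6 bytes are kept. */
uint64_t write_movw(uint64_t reg, uint16_t value) {
    return (reg & ~(uint64_t)0xFFFF) | value;
}

/* movl: replace the low 4 bytes AND zero the high 4 bytes. */
uint64_t write_movl(uint64_t reg, uint32_t value) {
    (void)reg;               /* nothing of the old contents survives */
    return (uint64_t)value;
}

/* movq: replace all 8 bytes. */
uint64_t write_movq(uint64_t reg, uint64_t value) {
    (void)reg;
    return value;
}
```

Starting from 0x1122334455667788, a movb of 0xAA leaves 0x11223344556677AA, while a movl of 0xCCCCCCCC leaves 0x00000000CCCCCCCC.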
Practice: mov And Data Sizes
For each of the following mov instructions, determine the appropriate suffix
based on the operands (e.g. movb, movw, movl or movq).
movz and movs
Instruction    Description
movzbw         Move zero-extended byte to word
movzbl         Move zero-extended byte to double word
movzwl         Move zero-extended word to double word
movzbq         Move zero-extended byte to quad word
movzwq         Move zero-extended word to quad word
Instruction    Description
movsbw         Move sign-extended byte to word
movsbl         Move sign-extended byte to double word
movswl         Move sign-extended word to double word
movsbq         Move sign-extended byte to quad word
movswq         Move sign-extended word to quad word
movslq         Move sign-extended double word to quad word
cltq           Sign-extend %eax to %rax: %rax <- SignExtend(%eax)
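In C, the same two kinds of widening happen when a smaller integer is converted to a larger one: unsigned types are zero-extended and signed types are sign-extended. The sketch below illustrates this with hypothetical function names. The instruction named in each comment is the one a compiler would plausibly use, not a guarantee about generated code.

```c
#include <stdint.h>

/* Like movzbl: widen a byte to a double word, filling the new bits with 0. */
uint32_t zero_extend_byte(uint8_t b) {
    return (uint32_t)b;
}

/* Like movsbl: widen a byte to a double word, copying the sign bit
   into the new bits. */
int32_t sign_extend_byte(int8_t b) {
    return (int32_t)b;
}

/* Like movslq (or cltq on %eax): widen a signed double word to a quad word. */
int64_t sign_extend_double_word(int32_t d) {
    return (int64_t)d;
}
```

The byte 0xF0 becomes 0x000000F0 when zero-extended, but the signed byte -16 (the same bit pattern) becomes 0xFFFFFFF0 when sign-extended, so its value stays -16.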
lea
The lea instruction copies an “effective address” from one place to another.
lea src,dst
Unlike mov, which copies the data stored at the address src to the destination, lea copies
the address that src names (the result of the address calculation) to the destination, without reading memory.
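A rough C analogy for the difference is reading an array element versus taking its address. In the sketch below the function names are hypothetical, and the instructions in the comments show what a compiler might plausibly emit. Actual output depends on the compiler and its flags.

```c
/* mov-like: compute the address of arr[i], then read the value stored there. */
int load_element(const int *arr, long i) {
    return arr[i];        /* roughly: movl (%rdi,%rsi,4), %eax */
}

/* lea-like: compute the same address, but do not read memory. */
const int *address_of_element(const int *arr, long i) {
    return &arr[i];       /* roughly: leaq (%rdi,%rsi,4), %rax */
}
```

Both functions perform the same address calculation (arr + 4 * i). Only the first one goes to that address and fetches what is there.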
Whenever a register is referenced in parentheses, e.g. (%rax), the register's value is treated as a
memory address, and the instruction operates on the value stored at that address (this is called dereferencing).
Examples:
incq 16(%rax)
dec %rdx
not %rcx
Binary Instructions
The following instructions operate on two operands. Each operand can be a register or a
memory location, and the source can also be an immediate, but the two operands cannot both be memory locations.
Read it as, e.g. “Subtract S from D”:
Instruction Effect Description
add S, D D ← D + S Add
sub S, D D ← D - S Subtract
imul S, D D ← D * S Multiply
xor S, D D ← D ^ S Exclusive-or
or S, D D ← D | S Or
and S, D D ← D & S And
Examples:
addq %rcx,(%rax)
xorq $16,(%rax, %rdx, 8)
subq %rdx,8(%rax)
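Each binary instruction behaves like a C compound assignment on the destination. The sketch below keeps the assembly operand order (source first, destination second). The op_* names are hypothetical, and unsigned 64-bit values are used so that wraparound is well defined in C.

```c
#include <stdint.h>

/* Each helper updates the destination in place: D <- D op S.
   Parameters follow the assembly order: source first, then destination. */
void op_add (uint64_t s, uint64_t *d) { *d += s; }  /* add  S, D */
void op_sub (uint64_t s, uint64_t *d) { *d -= s; }  /* sub  S, D */
void op_imul(uint64_t s, uint64_t *d) { *d *= s; }  /* imul S, D (low 64 bits of the product) */
void op_xor (uint64_t s, uint64_t *d) { *d ^= s; }  /* xor  S, D */
void op_or  (uint64_t s, uint64_t *d) { *d |= s; }  /* or   S, D */
void op_and (uint64_t s, uint64_t *d) { *d &= s; }  /* and  S, D */
```

For instance, op_sub(3, &d) with d equal to 10 leaves 7 in d, matching the reading "subtract S from D".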
Assembly Exercise 1
00000000004005ac <sum_example1>:
4005bd: 8b 45 e8 mov %esi,%eax
4005c3: 01 d0 add %edi,%eax
4005cc: c3 retq
Which of the following is most likely to have generated the above assembly?
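// A)
void sum_example1() {
    int x;
    int y;
    int sum = x + y;
}
// B)
int sum_example1(int x, int y) {
    return x + y;
}
// C)
void sum_example1(int x, int y) {
    int sum = x + y;
}
Answer: B. A and C do not return a value, whereas the assembly reads two arguments from %edi and %esi
and leaves their sum in %eax, the return-value register.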
Assembly Exercise 2
0000000000400578 <sum_example2>:
400578: 8b 47 0c mov 0xc(%rdi),%eax
40057b: 03 07 add (%rdi),%eax
40057d: 2b 47 18 sub 0x18(%rdi),%eax
400580: c3 retq
int sum_example2(int arr[]) {
    int sum = 0;
    sum += arr[0];
    sum += arr[3];
    sum -= arr[6];
    return sum;
}
What location or value in the assembly above represents the C code’s sum variable?
Answer: %eax
Disassembling Object Code
Disassembler
gcc -g -O -c example2.c
objdump -d example2.o
• Useful tool for examining object code
• Analyzes bit pattern of series of instructions
• Produces approximate rendition of assembly code
• Can be run on either a.out (complete executable) or .o file
Assembly Exercise 3
0000000000400578 <sum_example2>:
400578: 8b 47 0c mov 0xc(%rdi),%eax
40057b: 03 07 add (%rdi),%eax
40057d: 2b 47 18 sub 0x18(%rdi),%eax
400580: c3 retq
int sum_example2(int arr[]) {
    int sum = 0;
    sum += arr[0];
    sum += arr[3];
    sum -= arr[6];
    return sum;
}
What location or value in the assembly code above represents the C code’s 6 (as in arr[6])?
Answer: 0x18, which is 24 in decimal: 6 elements times 4 bytes per int.
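The displacements 0xc and 0x18 in the disassembly are byte offsets, so each one is the array index multiplied by the element size. A minimal sketch of that arithmetic follows. The function name is hypothetical, and the values in the usage note assume a 4-byte int, as on x86-64.

```c
#include <stddef.h>

/* Byte offset of arr[index] from the start of an int array. */
size_t int_element_offset(size_t index) {
    return index * sizeof(int);
}
```

With a 4-byte int, int_element_offset(3) is 0xc and int_element_offset(6) is 0x18, matching the operands 0xc(%rdi) and 0x18(%rdi). Index 0 has offset 0, which is why arr[0] appears as plain (%rdi).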
Recap
• Recap: mov so far
• Data and Register Sizes
• The lea Instruction
• Logical and Arithmetic Operations
• Practice: Reverse Engineering
Next Time: control flow in assembly (while loops, if statements, and more)
Lecture takeaway: There are assembly instructions for arithmetic and logical operations. They share the
same operand forms as mov, but lea interprets those forms differently: it computes the address without
dereferencing it. There are also different register sizes that may be used in assembly instructions.
A Note About Operand Forms
• Many instructions share the same address operand forms that mov uses.
• E.g. 7(%rax, %rcx, 2).
• These forms work the same way for other instructions, e.g. sub:
• sub 8(%rax,%rdx),%rcx -> Go to 8 + %rax + %rdx, subtract what’s there from %rcx
• The exception is lea:
• It interprets this form as just the calculation, not the dereferencing
• lea 8(%rax,%rdx),%rcx -> Calculate 8 + %rax + %rdx, put it in %rcx
Reverse Engineering 1
int add_to(int x, int arr[], int i) {
int sum = ___?___;
sum += arr[___?___];
return ___?___;
}
----------
add_to:
movslq %edx, %rdx
movl %edi, %eax
addl (%rsi,%rdx,4), %eax
ret
Reverse Engineering 1
int add_to(int x, int arr[], int i) {
int sum = ___?___;
sum += arr[___?___];
return ___?___;
}
----------
// x in %edi, arr in %rsi, i in %edx
add_to:
movslq %edx, %rdx // sign-extend i into full register
movl %edi, %eax // copy x into %eax
addl (%rsi,%rdx,4), %eax // add arr[i] to %eax
ret
Reverse Engineering 1
int add_to(int x, int arr[], int i) {
int sum = x;
sum += arr[i];
return sum;
}
----------
// x in %edi, arr in %rsi, i in %edx
add_to:
movslq %edx, %rdx // sign-extend i into full register
movl %edi, %eax // copy x into %eax
addl (%rsi,%rdx,4), %eax // add arr[i] to %eax
ret
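The movslq at the top of add_to matters because i is a 32-bit int that may be negative, while the index register in (%rsi,%rdx,4) is used as a full 64-bit value. The sketch below compares the byte offset obtained when i is sign-extended with the offset that would result if the upper 32 bits were filled with zeros instead. The function names are hypothetical.

```c
#include <stdint.h>

/* Offset used by (%rsi,%rdx,4) after movslq %edx, %rdx:
   i is sign-extended to 64 bits, then scaled by 4. */
int64_t offset_sign_extended(int32_t i) {
    return (int64_t)i * 4;
}

/* Offset that would result if the upper 32 bits were zero-filled instead. */
uint64_t offset_zero_extended(int32_t i) {
    return (uint64_t)(uint32_t)i * 4;
}
```

For i equal to -1, sign extension gives an offset of -4 (the element just before arr), while zero extension would give 0x3FFFFFFFC, an address roughly 16 GB past arr. For non-negative i the two agree.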
Reverse Engineering 2
int elem_arithmetic(int nums[], int y) {
int z = nums[___?___] * ___?___;
z -= ___?___;
z >>= ___?___;
return ___?___;
}
----------
elem_arithmetic:
movl %esi, %eax
imull (%rdi), %eax
subl 4(%rdi), %eax
sarl $2, %eax
addl $2, %eax
ret
Reverse Engineering 2
int elem_arithmetic(int nums[], int y) {
int z = nums[___?___] * ___?___;
z -= ___?___;
z >>= ___?___;
return ___?___;
}
----------
// nums in %rdi, y in %esi
elem_arithmetic:
movl %esi, %eax // copy y into %eax
imull (%rdi), %eax // multiply %eax by nums[0]
subl 4(%rdi), %eax // subtract nums[1] from %eax
sarl $2, %eax // shift %eax right by 2
addl $2, %eax // add 2 to %eax
ret
Reverse Engineering 2
int elem_arithmetic(int nums[], int y) {
int z = nums[0] * y;
z -= nums[1];
z >>= 2;
return z + 2;
}
----------
// nums in %rdi, y in %esi
elem_arithmetic:
movl %esi, %eax // copy y into %eax
imull (%rdi), %eax // multiply %eax by nums[0]
subl 4(%rdi), %eax // subtract nums[1] from %eax
sarl $2, %eax // shift %eax right by 2
addl $2, %eax // add 2 to %eax
ret
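One way to check a reverse-engineered answer is to rewrite the assembly in C one instruction at a time and compare it against the recovered function on a few inputs. The sketch below does this for elem_arithmetic. The name elem_arithmetic_trace and the local variable eax (standing in for %eax) are hypothetical.

```c
/* Mirrors the assembly one instruction at a time. */
int elem_arithmetic_trace(int nums[], int y) {
    int eax = y;          /* movl  %esi, %eax    */
    eax *= nums[0];       /* imull (%rdi), %eax  */
    eax -= nums[1];       /* subl  4(%rdi), %eax */
    eax >>= 2;            /* sarl  $2, %eax      */
    eax += 2;             /* addl  $2, %eax      */
    return eax;
}
```

With nums = {3, 4} and y = 5 the steps give 5, 15, 11, 2, and finally 4. Note that sarl is an arithmetic (sign-preserving) shift, whereas C leaves >> on a negative int implementation-defined, so the correspondence for negative values relies on the compiler choosing an arithmetic shift, as GCC does on x86-64.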