Modern Computer Architecture and Programming in Assembly Language - TCM - 183 - 1309076

1. The document describes a course on modern computer architecture and programming in assembly language. It covers topics like C/assembly mapping, data movement, arithmetic operations, procedures, calling conventions, and optimization. 2. The course uses online lectures, workshops, and labs. It is organized into sections on hardware organization, mapping C to assembly, instruction types like arithmetic and jumps, procedures and calling conventions, and optimizing code. 3. Sample code is provided to demonstrate mapping a C function to assembly, including setting up the stack frame, passing arguments, performing arithmetic operations, and returning values.

Uploaded by

Jayath Gayan
Modern Computer Architecture and

Programming in Assembly Language

Moscow State University


Faculty of Computational Mathematics and Cybernetics
Spring, 2010/2011
Course objectives

• Further study of the C language


• Understanding C-programs via
assembly language
– Debugging
• Memory bugs
• Linkage bugs
– Performance tuning
– Malware code analysis
• Studying machine-level execution
model
Toolchain
Base textbook

• Computer Systems:
A Programmer's
Perspective, 2/E
(CS:APP2e)

Randal E. Bryant and


David R. O'Hallaron,

Carnegie Mellon
University
Course organization

• Online lectures
– http://asmcourse.cs.msu.ru/
• Online workshops
– http://algcourse.cs.msu.su/teachwiki/
• Online labs
– http://earth.ispras.ru
Agenda

I. Introduction. 3 sample programs.


1. Hardware organization. Assembly instruction. Data movement.
2. Arithmetic operations. Status flags. Condition Codes. Jump
instructions.
3. IA32 stack. Procedures. Call convention.
II. C/Assembly mapping in details.
1. «long long» arithmetic
2. Structure and union. Data alignment.
3. Logical, Shift and Rotate Instructions. Bit fields.
4. Conditional move.
5. Loops: reduction to «if-goto» form.
6. Arrays: multidimensional, multilevel. Code optimization: machine-
(in)dependent.
7. Switch: if-else chain, jump table, decision tree.
8. cdecl convention. Omit frame pointer. fastcall convention.
Von Neumann architecture
Modern hardware organization
IA32 registers
X86-64 registers
void f() {
    static int cntr = 0;        // 1
    int x = 2, y = 1, z = 0;    // 2
    unsigned short w = 282;     // 3
    signed char q = 13;         // 4
    ++cntr;                     // 5
    z = -x + q * w * y - w;     // 6
}

section .bss
    ; Allocation: 4 bytes
    cntr resd 1
section .text
global f
; Entry point
f:
    push ebp
    mov ebp, esp
    sub esp, 16
    mov dword [ebp-16], 2       ; (1)
    mov dword [ebp-12], 1       ; (2)
    mov dword [ebp-8], 0        ; (3)
    mov word [ebp-4], 282       ; (4)
    mov byte [ebp-1], 13        ; (5)
    add dword [cntr], 1         ; (6)
    movsx eax, byte [ebp-1]     ; (7)
    movzx edx, word [ebp-4]     ; (8)
    imul eax, edx               ; (9)
    imul eax, dword [ebp-12]    ; (10)
    sub eax, dword [ebp-16]     ; (11)
    sub eax, edx                ; (12)
    mov dword [ebp-8], eax      ; (13)
    leave
    ret
Variable location

void f() {
static int cntr = 0; // 1
int x = 2, y = 1, z = 0; // 2
unsigned short w = 282; // 3
signed char q = 13; // 4
++cntr; // 5
z = -x + q * w *y - w; // 6
}
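The expression in line 6 mixes a signed char, an unsigned short and ints, so the compiler first widens q and w to int (movsx and movzx in the listing). A minimal C sketch of the same computation follows; the function name and its parameters are illustrative assumptions, not part of the course material.

```c
/* Same arithmetic as line 6 of f(), with q and w passed in so that
 * different values can be tried. eval_expression is a hypothetical name. */
int eval_expression(signed char q, unsigned short w) {
    int x = 2, y = 1, z = 0;

    /* q is sign-extended to int (movsx), w is zero-extended (movzx);
     * the whole expression is then evaluated in 32-bit int arithmetic. */
    z = -x + q * w * y - w;
    return z;
}
```

With the slide's values (q = 13, w = 282) the function returns 3382; a negative q shows why sign extension matters.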
Data retrieval

byte [ebp - 12]


byte [ebp - 11]
dword [ebp - 12]
byte [ebp - 10]
Little-endian
byte [ebp – 9]
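Little-endian order means the least significant byte of a dword sits at the lowest address. The sketch below writes a 32-bit value byte by byte in that order and also probes the host's own byte order; both function names are assumptions made for illustration.

```c
#include <stdint.h>

/* Store a 32-bit value the way IA-32 does: least significant byte
 * at the lowest address. */
void store_le32(uint8_t out[4], uint32_t value) {
    out[0] = (uint8_t)(value & 0xFF);
    out[1] = (uint8_t)((value >> 8) & 0xFF);
    out[2] = (uint8_t)((value >> 16) & 0xFF);
    out[3] = (uint8_t)((value >> 24) & 0xFF);
}

/* Returns 1 when the machine running this code is little-endian. */
int host_is_little_endian(void) {
    uint32_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}
```

For the value 0x1234F00D the bytes are stored in the order 0D, F0, 34, 12.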
Memory segmentation

int x = 2, y = 1, z = 0;
unsigned short w = 282;
signed char q = 13;

static int cntr = 0;

x = 2;
y = 1;
z = 0;
++cntr;
z = -x + q * w *y - w;
Data transfer

mov dword [ebp-16], 2 ; (1)


mov dword [ebp-12], 1 ; (2)
mov dword [ebp-8], 0 ; (3)
mov word [ebp-4], 282 ; (4)
mov byte [ebp-1], 13 ; (5)
nasm: program organization

%include "io.inc" ; macro

section .data ; static variables


var dd 0x1234F00D

section .bss ; zero initialized


cntr resd 1 ; static variables

section .text ; code


global CMAIN
CMAIN: ; entry point
add dword [cntr], 1 ; operand size is required for a memory/immediate pair
mov eax, [var]
io.inc
• I/O macro
– PRINT_UDEC size, data
– PRINT_DEC size, data
– PRINT_HEX size, data
– PRINT_CHAR ch
– PRINT_STRING data
– NEWLINE
– GET_UDEC size, data
– GET_DEC size, data
– GET_HEX size, data
– GET_CHAR data
– GET_STRING data, maxsz
• Program entry point
– CMAIN
• stdlib functions
– CEXTERN
EFLAGS layout
Unsigned overflow diagram

Positive Negative
overflow overflow

x + y x - y
Signed overflow diagram

Positive
overflow

x - y

Negative
overflow
Negative
overflow

x + y
Positive
overflow
Arithmetic instructions: flags

        OF  SF  ZF  PF  CF
ADD     M   M   M   M   M
SUB     M   M   M   M   M
ADC     M   M   M   M   TM
SBB     M   M   M   M   TM
IMUL    M   U   U   U   M
IDIV    U   U   U   U   U
NEG     M   M   M   M   M

M = modified, T = tested, U = undefined after the instruction
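The CF and OF columns correspond to the two overflow diagrams: CF reports unsigned overflow, OF reports signed overflow. A minimal sketch of how both could be computed for a 32-bit add in portable C (function names are illustrative assumptions):

```c
#include <stdint.h>

/* CF after "add": the unsigned result wrapped around. */
int add_sets_cf(uint32_t x, uint32_t y) {
    uint32_t sum = x + y;
    return sum < x;
}

/* OF after "add": both operands have the same sign and the result
 * has the opposite one. Computed on unsigned values to avoid
 * undefined behaviour on signed overflow. */
int add_sets_of(int32_t x, int32_t y) {
    uint32_t ux = (uint32_t)x, uy = (uint32_t)y;
    uint32_t sum = ux + uy;
    return (int)(((ux ^ sum) & (uy ^ sum)) >> 31);
}
```

For example, 0xFFFFFFFF + 1 sets CF but not OF, while INT32_MAX + 1 sets OF but not CF.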


void f() {
    int a[16];
    int i, x = 99, y = 97;          // 1
    if (x < y) {                    // 2
        a[0] = 0;                   // 3
        for (i = 1; i < 16; ++i) {  // 4
            a[i] = y / i;           // 5
        }
    }
}

section .text
global f
f:
    push ebp
    mov ebp, esp
    sub esp, 88
    mov DWORD [ebp-8], 99           ; (1)
    mov DWORD [ebp-4], 97           ; (2)
    mov eax, DWORD [ebp-8]          ; (3)
    sub eax, DWORD [ebp-4]          ; (4)
    jge L5                          ; (5)
    mov DWORD [ebp-76], 0           ; (6)
    mov DWORD [ebp-12], 1           ; (7)
L3:
    cmp DWORD [ebp-12], 15          ; (8)
    jg L5                           ; (9)
    mov ecx, DWORD [ebp-12]         ; (10)
    mov edx, DWORD [ebp-4]          ; (11)
    mov eax, edx                    ; (12)
    sar edx, 31                     ; (13)
    idiv ecx                        ; (14)
    mov DWORD [ebp-76+ecx*4], eax   ; (15)
    add DWORD [ebp-12], 1           ; (16)
    jmp L3                          ; (17)
L5:
    leave
    ret
Flowchart
void f() {
int a[16];
int i, x = 99, y = 97; // 1
if (x < y) { // 2
a[0] = 0; // 3
for (i = 1; i < 16; ++i) { // 4
a[i] = y / i; // 5
}
}
}
Stack frame layout

array layout
Push onto stack
Pop off stack
Stack frame
int main() {
    int a = 1, b = 2, c;
    c = sum(a, b);
    return 0;
}

int sum(int x, int y) {
    int t = x + y;
    return t;
}

%include "io.inc"
section .text
global CMAIN
CMAIN:
    mov DWORD [ebp-16],0x1      ; (1)
    mov DWORD [ebp-12],0x2      ; (2)
    mov eax,DWORD [ebp-12]      ; (3)
    mov DWORD [esp+4],eax       ; (4)
    mov eax,DWORD [ebp-16]      ; (5)
    mov DWORD [esp],eax         ; (6)
    call sum                    ; (7)
    mov DWORD [ebp-8],eax       ; (8)

global sum
sum:
    push ebp                    ; (9)
    mov ebp,esp                 ; (10)
    sub esp,0x10                ; (11)
    mov edx,DWORD [ebp+12]      ; (12)
    mov eax,DWORD [ebp+8]       ; (13)
    add eax,edx                 ; (14)
    mov DWORD [ebp-4],eax       ; (15)
    mov eax,DWORD [ebp-4]       ; (16)
    mov esp, ebp                ; (17)
    pop ebp                     ; (18)
    ret                         ; (19)
64-bit addition
long long f1(long long a, long long b) {
long long c;
c = a + b;
return c;
}

; …
mov eax, DWORD [ebp+16] ; (1)
mov edx, DWORD [ebp+20] ; (2)
add eax, DWORD [ebp+8] ; (3)
adc edx, DWORD [ebp+12] ; (4)
; …
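The add/adc pair adds the low halves first and then feeds the resulting carry into the addition of the high halves. The following sketch emulates that with 32-bit variables; the function name is an assumption used only for illustration.

```c
#include <stdint.h>

/* 64-bit addition built from two 32-bit additions, mirroring add/adc. */
uint64_t add64_by_halves(uint64_t a, uint64_t b) {
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t lo = a_lo + b_lo;          /* add: low halves          */
    uint32_t carry = lo < a_lo;         /* CF produced by the add   */
    uint32_t hi = a_hi + b_hi + carry;  /* adc: high halves plus CF */

    return ((uint64_t)hi << 32) | lo;
}
```

Subtraction with sub/sbb follows the same pattern, with a borrow in place of the carry.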
64-bit addition
64-bit addition: data flow
64-bit subtraction

long long f3(long long a, long long b) {


long long c;
c = a - b;
return c;
}

; …
mov eax, DWORD [ebp+8] ; (1)
mov edx, DWORD [ebp+12] ; (2)
sub eax, DWORD [ebp+16] ; (3)
sbb edx, DWORD [ebp+20] ; (4)
; …
64-bit subtraction: data flow
long long f2(long long a,
             long long b) {
    long long c;
    c = a * b;
    return c;
}

global f2
f2:
    push ebp
    mov ebp, esp
    sub esp, 8
    mov DWORD [esp], ebx        ; (1)
    mov ecx, DWORD [ebp+20]     ; (2)
    mov ebx, DWORD [ebp+8]      ; (3)
    mov DWORD [esp+4], esi      ; (4)
    mov eax, DWORD [ebp+12]     ; (5)
    mov esi, DWORD [ebp+16]     ; (6)
    imul ecx, ebx               ; (7)
    imul eax, esi               ; (8)
    add ecx, eax                ; (9)
    mov eax, esi                ; (10)
    mul ebx                     ; (11)
    mov ebx, DWORD [esp]        ; (12)
    lea esi, [ecx+edx]          ; (13)
    mov edx, esi                ; (14)
    mov esi, DWORD [esp+4]      ; (15)
    mov esp, ebp
    pop ebp
    ret
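The listing computes the low 64 bits of the product from three 32-bit multiplications: two cross terms (imul) and one full-width product of the low halves (mul). A sketch of the same decomposition in C, with an assumed function name:

```c
#include <stdint.h>

/* Low 64 bits of a * b, assembled from 32-bit pieces as in f2. */
uint64_t mul64_by_halves(uint64_t a, uint64_t b) {
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* imul, imul, add: cross terms; only their low 32 bits matter */
    uint32_t cross = b_hi * a_lo + a_hi * b_lo;

    /* mul: full 64-bit product of the low halves (edx:eax) */
    uint64_t low_product = (uint64_t)a_lo * b_lo;

    /* lea: add the cross terms to the high half of that product */
    uint32_t hi = (uint32_t)(low_product >> 32) + cross;

    return ((uint64_t)hi << 32) | (uint32_t)low_product;
}
```

The term a_hi * b_hi is never computed because it only affects bits 64 and above.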
64-bit multiplication
64-bit multiplication: data flow
Contest #1: expression evaluation

• 7 word problems
• Solve 5 problems for grade «excellent»
• Submit via e-judge:
http://earth.ispras.ru/cgi-bin/new-client?contest_id=150&locale_id=0
• Sample problem
– «Watch out for overflow»
Contest #1: «Watch out for overflow»

A water tank is a rectangular parallelepiped and has dimensions


AxBxC decimeters. A pipe is connected to the tank. The pipe has a
throughput of V liters per minute. Determine the number of minutes
the valve on the pipe has to be opened for so that the tank gets filled
with as much water as possible but without an overflow.
The construction of the pipe and valve allows only the maximum
throughput, and the valve can be open only for a whole number of
minutes.
The standard input contains four space-delimited numbers: A, B, C,
and V. All numbers are positive integers and do not exceed 2*10^9.
Print to the standard output the number of minutes for which the
valve is to be opened. It is guaranteed that the correct answer will
never exceed 2*10^9. Do not use conditional control and data transfer
instructions.
Time limit: 1 second
Memory limit: 64 MB
Contest #1: e-judge
Structure field allocation

struct rec {
    int i;
    int j;
    int a[3];
    int *p;
};

struct rec *x;
x->j = x->i;

    mov edx, dword [x]          ; (1)
    mov eax, dword [edx]        ; (2)
    mov dword [edx + 4], eax    ; (3)
Structure field access

struct rec {
    int i;
    int j;
    int a[3];
    int *p;
};

struct rec *x;
int i;

&(x->a[i]);

    mov edx, dword [i]              ; (1)
    mov eax, dword [x]              ; (2)
    lea eax, [eax + 4 * edx + 8]    ; (3)
Structure field access

struct rec {
    int i;
    int j;
    int a[3];
    int *p;
};

struct rec *r;

r->p = &r->a[r->i + r->j];

    mov edx, dword [r]              ; (1)
    mov eax, dword [edx + 4]        ; (2)
    add eax, dword [edx]            ; (3)
    lea eax, [edx + 4 * eax + 8]    ; (4)
    mov dword [edx + 20], eax       ; (5)


struct vs. union
// (1) wrong
struct NODE_S {
    struct NODE_S *left;
    struct NODE_S *right;
    double data;
};

// (2) not bad
union NODE_U {
    struct {
        union NODE_U *left;
        union NODE_U *right;
    } internal;
    double data;
};

// (3) correct
typedef enum {
    N_LEAF,
    N_INTERNAL} nodetype_t;

struct NODE_T {
    nodetype_t type;
    union NODE_U {
        struct {
            struct NODE_T *left;
            struct NODE_T *right;
        } internal;
        double data;
    } info;
};
union vs. copy
unsigned float2bit(float f) {
    union {
        float f;
        unsigned u;
    } temp;
    temp.f = f;
    return temp.u;
}

unsigned copy(unsigned u) {
    return u;
}

global float2bit
float2bit:
    push ebp
    mov ebp, esp
    mov eax, dword [ebp + 8]
    mov esp, ebp
    pop ebp
    ret
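Because the union only reinterprets the same four bytes, float2bit compiles to a plain copy. The sketch below shows the union approach next to a memcpy-based variant that produces the same bits; both function names are assumptions, and the expected values assume IEEE-754 single precision.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bytes of a float through a union, as on the slide. */
uint32_t float_bits_union(float f) {
    union { float f; uint32_t u; } temp;
    temp.f = f;
    return temp.u;
}

/* Same result via memcpy, an alternative way to copy the raw bytes. */
uint32_t float_bits_memcpy(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

For 1.0f both functions return 0x3F800000.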
Data Alignment

typedef struct {
int i;
char c;
int j;
} trifield1; // (2)

typedef struct {
int i;
int j;
char c;
} trifield2; // (3)
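Field offsets and padding can be checked directly with offsetof and sizeof. The sketch assumes a 4-byte int aligned on a 4-byte boundary, as on IA-32; the helper function names are illustrative.

```c
#include <stddef.h>

typedef struct { int i; char c; int j; } trifield1;
typedef struct { int i; int j; char c; } trifield2;

/* In trifield1 three padding bytes follow c so that j is 4-byte aligned. */
size_t trifield1_j_offset(void) { return offsetof(trifield1, j); }

/* In trifield2 the padding comes after c, at the end of the structure,
 * so that every element of an array of trifield2 stays aligned. */
size_t trifield2_c_offset(void) { return offsetof(trifield2, c); }
```

Under these assumptions both structures occupy 12 bytes, although only 9 bytes hold data.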
Logical Instructions

int pierce_arrow(int a, int b)
{
    int t = ~(a | b);
    return t;
}

section .text
global pierce_arrow
pierce_arrow:
    push ebp
    mov ebp, esp
    mov eax, DWORD [ebp+12]     ; (1)
    or eax, DWORD [ebp+8]       ; (2)
    not eax                     ; (3)
    pop ebp
    ret
Shift left
Shift logical right
Shift arithmetic right
Shift: integer promotion

char upndown(char x) {
    return (x << 8) >> 8;
}

section .text
global upndown
upndown:
    push ebp
    mov ebp, esp
    movsx eax, BYTE [ebp+8]
    sal eax, 8
    sar eax, 8
    pop ebp
    ret
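The shifts operate on an int, not on a char, because x is promoted before the first shift; that is why the value survives. The sketch contrasts this with a hypothetical variant that truncates the intermediate result back to 8 bits. It is meant for non-negative x only, since left-shifting a negative int is undefined in C.

```c
/* As on the slide: x is promoted to int, so the bits shifted left
 * by 8 are still there when shifting back. */
char upndown(char x) {
    return (char)((x << 8) >> 8);
}

/* Hypothetical contrast: narrowing the intermediate value to 8 bits
 * discards everything, so the result is always 0. */
char upndown_narrow(char x) {
    unsigned char shifted = (unsigned char)(x << 8);
    return (char)(shifted >> 8);
}
```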
Rotate instructions
unsigned sha256_f1(unsigned x) {
unsigned t;
t = ((x >> 2) | (x << ((sizeof(x) << 3) - 2))); // (1)
t ^= ((x >> 13) | (x << ((sizeof(x) << 3) - 13))); // (2)
t ^= ((x >> 22) | (x << ((sizeof(x) << 3) - 22))); // (3)
return t;
}

global sha256_f1
sha256_f1:
push ebp
mov ebp, esp
mov edx, DWORD [ebp+8] ; (1)
pop ebp ; (2)
mov eax, edx ; (3)
mov ecx, edx ; (4)
ror eax, 13 ; (5)
ror ecx, 2 ; (6)
xor eax, ecx ; (7)
ror edx, 22 ; (8)
xor eax, edx ; (9)
ret
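The compiler recognizes each shift-or pair as a rotation and emits a single ror. A sketch of the same function written with an explicit rotate helper (rotr32 and sha256_f1_ror are assumed names):

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits; masking keeps both shift
 * counts in the range 0..31, so n = 0 is also safe. */
uint32_t rotr32(uint32_t x, unsigned n) {
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
}

/* Same value as sha256_f1: three rotations combined with xor. */
uint32_t sha256_f1_ror(uint32_t x) {
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22);
}
```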
Special arithmetic

int arith(int x,
          int y,
          int z) {
    int t1 = x + y;
    int t2 = z * 48;
    int t3 = t1 & 0xFFFF;
    int t4 = t2 * t3;
    return t4;
}

    ; …
    mov eax, dword [ebp + 16]   ; (1)
    lea eax, [eax + 2 * eax]    ; (2)
    sal eax, 4                  ; (3)
    mov edx, dword [ebp + 12]   ; (4)
    add edx, dword [ebp + 8]    ; (5)
    and edx, 65535              ; (6)
    imul eax, edx               ; (7)
    ; …
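Instructions (2) and (3) replace the multiplication by 48 with one lea and one shift, because 48 = 3 * 16. A minimal sketch of that decomposition (times48 is an assumed name; the shift is written as a multiplication by 16 to stay well defined for negative z):

```c
/* z * 48 computed as (z + 2*z) * 16. */
int times48(int z) {
    int t = z + 2 * z;   /* lea eax, [eax + 2 * eax]  -> 3 * z  */
    return t * 16;       /* sal eax, 4                -> 48 * z */
}

/* The complete function from the slide, using the helper. */
int arith(int x, int y, int z) {
    int t1 = x + y;
    int t2 = times48(z);
    int t3 = t1 & 0xFFFF;
    return t2 * t3;
}
```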
Bit field

struct omg {
    int a : 3;
    int b : 5;
    int c : 2;
    unsigned cntr: 31;
    int sum : 8;
};

void f(struct omg *p) {
    p->cntr++;                          // 1
    p->b = (p->c << 3) | (p->a);        // 2
    p->sum = p->a + p->b + p->c;        // 3
}

section .text
global f
f:
    ; …
    mov esi, DWORD [ebp+8]      ; load
    mov eax, DWORD [esi+4]      ; cntr
    lea edx, [eax+1]            ; cntr++
    and eax, -2147483648        ; mask
    and edx, 2147483647         ; mask
    or eax, edx                 ; merge
    mov DWORD [esi+4], eax      ; store
    ; …
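The increment of the 31-bit field is a read-modify-write on the containing dword: the bit outside the field is preserved, and the field wraps around inside its own 31 bits. A sketch of that masking in C, assuming the field occupies the low 31 bits as in the listing (the function and macro names are illustrative):

```c
#include <stdint.h>

#define CNTR_MASK 0x7FFFFFFFu   /* low 31 bits hold the field */

/* Increment a 31-bit field stored in the low bits of a 32-bit word. */
uint32_t increment_cntr_field(uint32_t word) {
    uint32_t untouched = word & ~CNTR_MASK;       /* and eax, -2147483648 */
    uint32_t next = (word + 1) & CNTR_MASK;       /* lea, and edx, 2147483647 */
    return untouched | next;                      /* or eax, edx */
}
```

When the field holds its maximum value, the increment wraps it to 0 without touching bit 31.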
Bit field

struct omg {
    int a : 3;
    int b : 5;
    int c : 2;
    unsigned cntr: 31;
    int sum : 8;
};

void f(struct omg *p) {
    p->cntr++;                          // 1
    p->b = (p->c << 3) | (p->a);        // 2
    p->sum = p->a + p->b + p->c;        // 3
}

section .text
global f
f:
    ; …
    movzx ebx, BYTE [esi+1]     ; these three instructions
    sal ebx, 6                  ; compute
    sar bl, 3                   ; p->c << 3
    movzx edx, BYTE [esi]       ;
    mov eax, edx                ;
    and edx, 7                  ;
    sal eax, 5                  ;
    sar al, 5                   ; p->a
    or ebx, eax                 ;
    sal ebx, 3                  ;
    or edx, ebx                 ;
    mov BYTE [esi], dl          ;
    ; …
Bit field

struct omg {
    int a : 3;
    int b : 5;
    int c : 2;
    unsigned cntr: 31;
    int sum : 8;
};

void f(struct omg *p) {
    p->cntr++;                          // 1
    p->b = (p->c << 3) | (p->a);        // 2
    p->sum = p->a + p->b + p->c;        // 3
}

section .text
global f
f:
    ; …
    movzx ebx, BYTE [esi+1]     ;
    sal ebx, 6                  ;
    sar bl, 6                   ; p->c
    movzx edx, BYTE [esi]       ;
    sal edx, 5                  ;
    sar dl, 5                   ; p->a
    movzx ecx, BYTE [esi]       ;
    sar cl, 3                   ; p->b
    add ebx, edx                ;
    add ebx, ecx                ;
    mov BYTE [esi+8], bl        ;
    pop ebx                     ;
    pop esi                     ;
    pop ebp                     ;
    ret                         ;
Jcc Condition Description

JE ZF Equal / Zero
JNE ~ZF Not Equal / Not Zero
JS SF Negative
JNS ~SF Non-negative
JG ~(SF^OF)&~ZF Greater (signed)
JGE ~(SF^OF) Greater or Equal (signed)
JL (SF^OF) Less (signed)
JLE (SF^OF)|ZF Less or Equal (signed)
JA ~CF&~ZF Above (unsigned)
JB CF Below (unsigned)
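The conditions in the table can be checked by computing the flags that cmp a, b would produce and then evaluating the formulas. A sketch under that reading (flags_t and all function names are assumptions made for illustration):

```c
#include <stdint.h>

typedef struct { int zf, sf, of, cf; } flags_t;

/* Flags produced by "cmp a, b", that is, by computing a - b. */
flags_t cmp_flags(uint32_t a, uint32_t b) {
    uint32_t diff = a - b;
    flags_t f;
    f.zf = (diff == 0);
    f.sf = (int)(diff >> 31);
    f.cf = (a < b);                                 /* unsigned borrow  */
    f.of = (int)(((a ^ b) & (a ^ diff)) >> 31);     /* signed overflow  */
    return f;
}

int jl_taken(flags_t f) { return f.sf ^ f.of; }              /* SF^OF         */
int jg_taken(flags_t f) { return !(f.sf ^ f.of) && !f.zf; }  /* ~(SF^OF)&~ZF  */
int jb_taken(flags_t f) { return f.cf; }                     /* CF            */
int ja_taken(flags_t f) { return !f.cf && !f.zf; }           /* ~CF&~ZF       */
```

The same bit pattern 0xFFFFFFFF compares as less than 1 under JL (it is -1 when signed) and as above 1 under JA.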
int absdiff(int x, int y) {
    int result;
    if (x > y) {
        result = x-y;
    } else {
        result = y-x;
    }
    return result;
}

absdiff:
    push ebp
    mov ebp, esp
    mov edx, dword [8 + ebp]    ; (1)
    mov eax, dword [12 + ebp]   ; (2)
    cmp edx, eax                ; (3)
    jle .L6                     ; (4)
    sub edx, eax                ; (5)
    mov eax, edx                ; (6)
    jmp .L7                     ; (7)
.L6:                            ; (8)
    sub eax, edx                ; (9)
.L7:                            ; (10)
    pop ebp
    ret
int goto_ad(int x, int y) {
    int result;
    if (x <= y) goto Else;
    result = x-y;
    goto Exit;
Else:
    result = y-x;
Exit:
    return result;
}

absdiff:
    push ebp
    mov ebp, esp
    mov edx, dword [8 + ebp]    ; (1)
    mov eax, dword [12 + ebp]   ; (2)
    cmp edx, eax                ; (3)
    jle .L6                     ; (4)
    sub edx, eax                ; (5)
    mov eax, edx                ; (6)
    jmp .L7                     ; (7)
.L6:                            ; (8)
    sub eax, edx                ; (9)
.L7:                            ; (10)
    pop ebp
    ret
val = Test ? Then_Expr : Else_Expr;

val = x>y ? x-y : y-x;

Goto version:
    nt = !Test;
    if (nt) goto Else;
    val = Then_Expr;
    goto Done;
Else:
    val = Else_Expr;
Done:
    . . .

Conditional move version:
    tmp_val = Then_Expr;
    result = Else_Expr;
    t = Test;
    if (t) result = tmp_val;
    return result;
int absdiff(int x, int y) {
int result;
if (x > y) {
result = x-y;
} else {
result = y-x;
}
return result;
}

x loaded in edi
y loaded in esi
absdiff:
mov edx, edi
sub edx, esi ; tmp_val:edx = x-y
mov eax, esi
sub eax, edi ; result:eax = y-x
cmp edi, esi ; Compare x:y
cmovg eax, edx ; If >, result:eax = tmp_val:edx
ret
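The cmovg version evaluates both expressions unconditionally and only then selects one of them. The same shape can be written in C as below; absdiff_select is an assumed name, and whether a compiler actually emits a conditional move for it depends on the compiler and its options.

```c
/* Both branches are computed up front; the comparison only selects. */
int absdiff_select(int x, int y) {
    int tmp_val = x - y;            /* Then_Expr, always evaluated */
    int result = y - x;             /* Else_Expr, always evaluated */
    if (x > y) result = tmp_val;    /* the step cmovg performs     */
    return result;
}
```

This transformation is only valid when neither expression has side effects or can fault.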
int pcount_do(unsigned x) {
    int result = 0;
    do {
        result += x & 0x1;
        x >>= 1;
    } while (x);
    return result;
}

int pcount_do(unsigned x)
{
    int result = 0;
loop:
    result += x & 0x1;
    x >>= 1;
    if (x)
        goto loop;
    return result;
}
int pcount_do(unsigned x)
{
    int result = 0;
loop:
    result += x & 0x1;
    x >>= 1;
    if (x)
        goto loop;
    return result;
}

    mov ecx, 0          ; result = 0
.L2:                    ; loop:
    mov eax, edx
    and eax, 1          ; t = x & 1
    add ecx, eax        ; result += t
    shr edx, 1          ; x >>= 1
    jne .L2             ; If !0, goto loop

• Register allocation:
edx  x
ecx  result
int pcount_while(unsigned x) {
    int result = 0;
    while (x) {
        result += x & 0x1;
        x >>= 1;
    }
    return result;
}

int pcount_do(unsigned x) {
    int result = 0;
    if (!x) goto done;
loop:
    result += x & 0x1;
    x >>= 1;
    if (x)
        goto loop;
done:
    return result;
}
int pcount_do(unsigned x) {
int result = 0;
loop:
if (!x) goto done;
result += x & 0x1;
x >>= 1;
goto loop;
done:
return result;
}
#define WSIZE 8*sizeof(int)

int pcount_for(unsigned x) {
int i;
int result = 0;
for (i = 0; i < WSIZE; i++) {
unsigned mask = 1 << i;
result += (x & mask) != 0;
}
return result;
}
#define WSIZE 8*sizeof(int)

int pcount_for(unsigned x) {
    int i;
    int result = 0;
    for (i = 0; i < WSIZE; i++) {
        unsigned mask = 1 << i;
        result += (x & mask) != 0;
    }
    return result;
}

int pcount_for_gt(unsigned x) {
    int i;
    int result = 0;
    i = 0;
    if (!(i < WSIZE))
        goto done;
loop:
    {
        unsigned mask = 1 << i;
        result += (x & mask) != 0;
    }
    i++;
    if (i < WSIZE)
        goto loop;
done:
    return result;
}
int fib(int x) { // x >= 1
    int i;
    int predpred = 0;
    int pred = 1;
    int res = 1;
    x--;
    for (i = 0; i < x; i++) {
        res = predpred + pred;
        predpred = pred;
        pred = res;
    }
    return res;
}

fib:
    push ebp
    mov ebp, esp
    push ebx

    mov ecx, dword [ebp + 8]    ; x
    xor edx, edx                ; predpred
    mov ebx, 1                  ; pred
    mov eax, 1                  ; res
    dec ecx

    jecxz .end
.loop:
    lea eax, [edx + ebx]
    mov edx, ebx
    mov ebx, eax
    loop .loop

.end:
    pop ebx
    pop ebp
    ret
• Integer values
– Stored and processed in general purpose registers
– Signed/unsigned values

Intel         ASM   Bytes   C
byte          b     1       [unsigned] char
word          w     2       [unsigned] short
double word   d     4       [unsigned] int
quad word     q     8       [unsigned] long long int

• Floating-point values
– Stored and processed in special floating-point registers

Intel         ASM   Bytes   C
Single        d     4       float
Double        q     8       double
• Arrays — layout in memory
T A[L];
– Array of elements of type T, array length is L
– Stored in a contiguous memory block of size L *
sizeof(T) bytes

char string[12];

x x + 12
int val[5];

x x+4 x+8 x + 12 x + 16 x + 20
double a[3];

x x+8 x + 16 x + 24
char *p[3];

x x+4 x+8 x + 12
•Array element access
T A[L];
– Array of elements of type T, array length is L
– The identifier A can be used as a pointer to element 0. Pointer type is T*

int val[5]; 1 5 2 1 3
x x+4 x+8 x + 12 x + 16 x + 20
• Reference Type Value
val[4] int 3
val int * x
val+1 int * x+4
&val[2] int * x+8
val[5] int ??
*(val+1) int 5
val + i int * x+4i
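The rows of the table can be verified with ordinary pointer arithmetic: adding i to an int pointer advances the address by i * sizeof(int) bytes. A small sketch (byte_offset is an assumed helper name):

```c
#include <stddef.h>

int val[5] = { 1, 5, 2, 1, 3 };

/* Distance in bytes between val and val + i. */
ptrdiff_t byte_offset(int i) {
    return (const char *)(val + i) - (const char *)val;
}
```

With a 4-byte int, byte_offset(1) is 4 and byte_offset(2) is 8, matching the x+4 and x+8 entries. Reading val[5] is outside the array, so the table rightly leaves its value undetermined.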
#define ZLEN 5
typedef int zip_dig[ZLEN];

zip_dig cmu = { 1, 5, 2, 1, 3 };
zip_dig mit = { 0, 2, 1, 3, 9 };
zip_dig ucb = { 9, 4, 7, 2, 0 };

zip_dig cmu; 1 5 2 1 3
16 20 24 28 32 36
zip_dig mit; 0 2 1 3 9
36 40 44 48 52 56
zip_dig ucb; 9 4 7 2 0
56 60 64 68 72 76

• Declaration "zip_dig cmu" is equivalent to "int cmu[5]"
• Arrays are laid out in contiguous memory blocks of 20 bytes each
– Generally it is not guaranteed that individual arrays are laid out without gaps
between them
zip_dig cmu;   1  5  2  1  3
              16 20 24 28 32 36

int get_digit (zip_dig z, int dig) {
    return z[dig];
}

• The edx register contains the starting (base) array address
• The eax register contains the element index
• Element address is edx + 4 * eax

    ; edx = z
    ; eax = dig
    mov eax, dword [edx+4*eax]  ; z[dig]
void zincr(zip_dig z) {
int i;
for (i = 0; i < ZLEN; i++)
z[i]++;
}

; edx = z
mov eax, 0 ; eax = i
.L4: ; loop:
add dword [edx + 4 * eax], 1 ; z[i]++
add eax, 1 ; i++
cmp eax, 5 ; i vs. 5
jne .L4 ; if (!=) goto loop
void zincr_p(zip_dig z) {
    int *zend = z+ZLEN;
    do {
        (*z)++;
        z++;
    } while (z != zend);
}

void zincr_v(zip_dig z) {
    void *vz = z;
    int i = 0;
    do {
        (*((int *) (vz+i)))++;
        i += ISIZE;
    } while (i != ISIZE*ZLEN);
}

    ; edx = z = vz
    mov eax, 0                  ; i = 0
.L8:                            ; loop:
    add dword [edx + eax], 1    ; Increment vz+i
    add eax, 4                  ; i += 4
    cmp eax, 20                 ; i vs. 20
    jne .L8                     ; if (!=) goto loop
#define PCOUNT 4
zip_dig pgh[PCOUNT] =
{{1, 5, 2, 0, 6},
{1, 5, 2, 1, 3 },
{1, 5, 2, 1, 7 },
{1, 5, 2, 2, 1 }};

zip_dig
1 5 2 0 6 1 5 2 1 3 1 5 2 1 7 1 5 2 2 1
pgh[4];

76 96 116 136 156

• "zip_dig pgh[4]" is equivalent to "int pgh[4][5]"
– Variable pgh: array of 4 elements contiguously stored in memory
– Each element is an array of 5 ints contiguously stored in memory.
• Rows are laid out first (Row-Major)
• Declaration
T A[R][C];
– 2D array of elements of type T
– R rows, C columns
– Size of type T is K bytes
• Array size
– R * C * K bytes
• Layout in memory
– Rows first

A[0][0]    • • •  A[0][C-1]
   •                  •
   •                  •
A[R-1][0]  • • •  A[R-1][C-1]

int A[R][C];

A A A A A A
[0] • • • [0] [1] • • • [1] • • • [R-1] • • • [R-1]
[0] [C-1] [0] [C-1] [0] [C-1]

4*R*C bytes
• Row access
– A[i] is an array of C elements
– Each element of type T requires K bytes
– Start address of row i
A + i * (C * K)

int A[R][C];

A[0] A[i] A[R-1]

A A A A A A
[0] ••• [0] • • • [i] ••• [i] • • • [R-1] ••• [R-1]
[0] [C-1] [0] [C-1] [0] [C-1]

A A+i*C*4 A+(R-1)*C*4
#define PCOUNT 4
zip_dig pgh[PCOUNT] =
    {{1, 5, 2, 0, 6},
     {1, 5, 2, 1, 3 },
     {1, 5, 2, 1, 7 },
     {1, 5, 2, 2, 1 }};

int *get_pgh_zip(int index){
    return pgh[index];
}

    ; eax = index
    lea eax, [eax + 4 * eax]    ; 5 * index
    lea eax, [pgh + 4 * eax]    ; pgh + (20 * index)

– pgh[index] is an array of 5 ints
– Starting address is pgh+20*index
– Address is calculated and returned
– Address is calculated as pgh + 4*(index+4*index)
• Array elements
– A[i][j] is element of type T, requiring K bytes
– Element address is
A + i * (C * K) + j * K = A + (i * C + j)* K

int A[R][C];

A[0] A[i] A[R-1]

A A A A A
[0] ••• [0] • • • ••• [i] ••• • • • [R-1] ••• [R-1]
[0] [C-1] [j] [0] [C-1]

A A+i*C*4 A+(R-1)*C*4

A+i*C*4+j*4
int get_pgh_digit (int index, int dig) {
return pgh[index][dig];
}

mov eax, dword [ebp + 8] ; index


lea eax, [eax + 4 * eax] ; 5*index
add eax, dword [ebp + 12] ; 5*index+dig
mov eax, dword [pgh + 4 * eax] ; offset 4*(5*index+dig)

– pgh[index][dig] has int type


– Address: pgh + 20*index + 4*dig =
= pgh + 4*(5*index + dig)

– Address is calculated as
pgh + 4*((index+4*index)+dig)
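The address formula can be reproduced by hand and compared with ordinary two-dimensional indexing. The sketch redefines pgh so that it is self-contained; get_pgh_digit_manual is an assumed name.

```c
#include <stddef.h>

#define ZLEN 5
#define PCOUNT 4
typedef int zip_dig[ZLEN];

zip_dig pgh[PCOUNT] = {
    {1, 5, 2, 0, 6}, {1, 5, 2, 1, 3}, {1, 5, 2, 1, 7}, {1, 5, 2, 2, 1}
};

/* Address computed as the lea/add/mov sequence does it:
 * pgh + sizeof(int) * (5 * index + dig). */
int get_pgh_digit_manual(int index, int dig) {
    const char *base = (const char *)pgh;
    size_t offset = sizeof(int) * (size_t)(ZLEN * index + dig);
    return *(const int *)(base + offset);
}
```

A single memory read is enough here because all rows live in one contiguous block.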
zip_dig cmu = { 1, 5, 2, 1, 3 };
zip_dig mit = { 0, 2, 1, 3, 9 };
zip_dig ucb = { 9, 4, 7, 2, 0 };

#define UCOUNT 3
int *univ[UCOUNT] = {mit, cmu, ucb};

• The univ variable is an array of 3 elements
• Each element is a 4-byte pointer
• Each pointer references an array of ints

univ (address: stored pointer)
160: 36 (mit)
164: 16 (cmu)
168: 56 (ucb)

cmu    1  5  2  1  3
      16 20 24 28 32 36
mit    0  2  1  3  9
      36 40 44 48 52 56
ucb    9  4  7  2  0
      56 60 64 68 72 76
int get_univ_digit (int index, int dig) {
return univ[index][dig];
}

mov eax, dword [ebp + 8] ; index


mov edx, dword [univ + 4 * eax] ; p = univ[index]
mov eax, dword [ebp + 12] ; dig
mov eax, dword [edx + 4 * eax] ; p[dig]

– Access to element Mem[Mem[univ+4*index]+4*dig]


– Two memory reads are required
• First one obtains pointer to a one-dimensional array
• Second one fetches required element from the one-
dimensional array
Multiple dimension array
int get_pgh_digit
  (int index, int dig)
{
    return pgh[index][dig];
}
Mem[pgh+20*index+4*dig]

Multiple level array
int get_univ_digit
  (int index, int dig)
{
    return univ[index][dig];
}
Mem[Mem[univ+4*index]+4*dig]

• Similar in C
• Significant difference in assembly
N x N matrix

• Fixed dimensions
– N is known at compile time

#define N 16
typedef int fix_matrix[N][N];
/* Get element a[i][j] */
int fix_ele
  (fix_matrix a, int i, int j){
    return a[i][j];
}

• Dynamic dimensions require explicit index calculation
– Traditional way to implement multiple dimension arrays

#define IDX(n, i, j) ((i)*(n)+(j))
/* Get element a[i][j] */
int vec_ele
  (int n, int *a, int i, int j){
    return a[IDX(n,i,j)];
}

• Dynamic dimensions with implicit indexing
– Supported in recent gcc versions

/* Get element a[i][j] */
int var_ele
  (int n, int a[n][n], int i, int j){
    return a[i][j];
}
16 x 16 matrix
• Element access
– Address A + i * (C * K) + j * K
– C = 16, K = 4
/* Retrieval of element a[i][j] */
int fix_ele(fix_matrix a, int i, int j) {
return a[i][j];
}

mov edx, dword [ebp + 12] ; i


sal edx, 6 ; i*64
mov eax, dword [ebp + 16] ; j
sal eax, 2 ; j*4
add eax, dword [ebp + 8] ; a + j*4
mov eax, dword [eax + edx] ; *(a + j*4 + i*64)
n x n matrix
• Element access
– Address A + i * (C * K) + j * K
– C = n, K = 4

/* Retrieval of element a[i][j] */


int var_ele(int n, int a[n][n], int i, int j) {
return a[i][j];
}

mov eax, dword [ebp + 8] ; n


sal eax, 2 ; n*4
mov edx, eax ; n*4
imul edx, dword [ebp + 16] ; i*n*4
mov eax, dword [ebp + 20] ; j
sal eax, 2 ; j*4
add eax, dword [ebp + 12] ; a + j*4
mov eax, dword [eax + edx] ; *(a + j*4 + i*n*4)
Optimizing array element access

(figure: matrix a, jth column)

• Calculations
– Process all elements in column j
• Optimization
– Fetch individual elements of the column

#define N 16
typedef int fix_matrix[N][N];

/* Fetch of array column j */
void fix_column
  (fix_matrix a, int j, int *dest)
{
    int i;
    for (i = 0; i < N; i++)
        dest[i] = a[i][j];
}
Optimizing array element access
• Optimization
– Calculate ajp = &a[i][j]
• Initial value is a + 4*j
• Step is 4*N

/* Fetch of array column j */
void fix_column
  (fix_matrix a, int j, int *dest)
{
    int i;
    for (i = 0; i < N; i++)
        dest[i] = a[i][j];
}

Register   Value
ecx        ajp
ebx        dest
edx        i

.L8:                                    ; loop:
    mov eax, dword [ecx]                ; get *ajp
    mov dword [ebx + 4 * edx], eax      ; store in dest[i]
    add edx, 1                          ; i++
    add ecx, 64                         ; ajp += 4*N
    cmp edx, 16                         ; i vs. N
    jne .L8                             ; if !=, goto loop
Optimizing array element access
– Calculate ajp = &a[i][j]
• Initial value is a + 4*j
• Step is 4*n

/* Fetch of array column j */
void var_column
  (int n, int a[n][n],
   int j, int *dest)
{
    int i;
    for (i = 0; i < n; i++)
        dest[i] = a[i][j];
}

Register   Value
ecx        ajp
edi        dest
edx        i
ebx        4*n
esi        n

.L18:                                   ; loop:
    mov eax, dword [ecx]                ; get *ajp
    mov dword [edi + 4 * edx], eax      ; store in dest[i]
    add edx, 1                          ; i++
    add ecx, ebx                        ; ajp += 4*n
    cmp esi, edx                        ; n vs. i
    jg .L18                             ; if (>) goto loop
Optimizing array element access

– Change loop direction
• Exit loop on zero counter
• Negative step
• Initial pointer values change
• It is sufficient to compare only a single index against 0

/* Fetch of array column j */
void var_column
  (int n, int a[n][n],
   int j, int *dest)
{
    int i;
    for (i = n-1; i >=0; i--)
        dest[i] = a[i][j];
}

.L18:                                   ; loop:
    mov eax, dword [ecx]                ; get *ajp
    mov dword [edi + 4 * edx], eax      ; store in dest[i]
    add edx, 1                          ; i++
    add ecx, ebx                        ; ajp += 4*n
    cmp esi, edx                        ; n vs. i
    jg .L18                             ; if (>) goto loop
Optimizing array element access

/* Fetch of array column j */
void var_column
  (int n, int a[n][n],
   int j, int *dest)
{
    int i;
    dest--;
    for (i = n; i != 0; i--)
        dest[i] = a[i-1][j];
}

Register   Initial value
ecx        a+4*n*(n-1)+4*j
edi        dest – 4
edx        n
ebx        4*n
esi        unused now

Machine-dependent optimization
.L18:                                   ; loop:
    mov eax, dword [ecx]                ; get *(ajp+…)
    mov dword [edi + 4 * edx], eax      ; store in dest[i]
    sub ecx, ebx                        ; ajp -= 4*n
    sub edx, 1                          ; i--
    jnz .L18                            ; if (!=) goto loop
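The idea behind these loops, replacing the per-iteration index multiplication by a pointer that advances one row at a time, can also be written in C. The sketch below uses a flattened n x n matrix passed as a plain pointer, which is an assumption made to keep the example portable; var_column_ptr is an illustrative name, and the loop runs upward rather than down to zero.

```c
/* Copy column j of an n x n row-major matrix into dest. */
void var_column_ptr(int n, const int *a, int j, int *dest) {
    const int *ajp = a + j;     /* &a[0][j] */
    for (int i = 0; i < n; i++) {
        dest[i] = *ajp;
        ajp += n;               /* next row: n ints, i.e. 4*n bytes */
    }
}
```

The countdown variant on the slide removes the cmp as well, because sub already sets ZF for the jnz.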
Contest #2:branches, loops, arrays

• 5 word problems
• 2 reverse engineering problems
• Solve any 5 problems for grade «excellent», but at least one
reverse engineering problem.
• Submit via e-judge:
- http://earth.ispras.ru/cgi-bin/new-client?contest_id=151&locale_id=0
- http://earth.ispras.ru/cgi-bin/new-client?contest_id=152&locale_id=0
• Sample word problem
– «Local extrema»
• Sample reverse engineering problem
– «R2»
Contest #2: «Local extrema»
Let us define local minimum of an integer sequence to be such an
element that is strictly less than both its neighbors. Let us define local
maximum of an integer sequence to be such an element that is
strictly greater than both its neighbors.

The standard input contains a non-negative integer N <= 500000


followed by N 32-bit integers comprising the sequence.

Print to the standard output first the number m of local minimums in


the sequence followed by their indices. Then print the number M of
local maximums followed by their indices. Indexing starts at 0. First
and last sequence elements cannot be its local extrema.

Time limit: 1 second


Memory limit: 64 MB
Contest #2: «R2»

Given the following assembly language program, recover its
semantics and express it as a C language program. The input is a
32-bit unsigned integer.

Time limit: 1 second
Memory limit: 64 MB

%include "io.inc"

SECTION .text
GLOBAL CMAIN
CMAIN:
    GET_UDEC 4, EAX
    MOV EBX, EAX
    DEC EBX
    XOR EAX, EBX
    ADD EAX, 1
    RCR EAX, 1
    PRINT_UDEC 4, EAX
    NEWLINE
    XOR EAX, EAX
    RET
CDECL

• Where parameters are placed


– stack
• Parameter order
– «reverse»: from stack «top» to «bottom»
• Which registers may be used by the function
– EAX, EDX, ECX
• Whether the caller or the callee is responsible for cleaning up the
stack on return
– Caller cleans
• Return values
– EAX
– EDX:EAX (64-bit values: high half in EDX, low half in EAX)
– In memory
CDECL

• Parameters placement
– Integer
• Actual value
– Pointer -> Integer
• Actual value
– Array -> Pointer
• Reference
– Structure/union
• Actual value
Function main
#include <stdio.h>

int v;
void nullify(int argc, char* argv[]);

int main(int argc, char* argv[]) {
    nullify(argc, argv);
    return 0;
}

void nullify(int argc, char* argv[]) {
}

CMAIN:
    lea ecx, [esp+4]
    and esp, -16
    push dword [ecx-4]
    push ebp
    mov ebp, esp
    push ecx
    sub esp, 20
    mov eax, dword [ecx+4]
    mov dword [esp+4], eax
    mov eax, dword [ecx]
    mov dword [esp], eax
    call nullify
    mov eax, 0
    add esp, 20
    pop ecx
    pop ebp
    lea esp, [ecx-4]
    ret

nullify:
    ret
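The instruction and esp, -16 in the prologue rounds the stack pointer down to a multiple of 16 by clearing its four low bits. A one-function sketch of the same masking (align_down_16 is an assumed name, and the addresses used with it are arbitrary examples):

```c
#include <stdint.h>

/* Clear the low four bits, as "and esp, -16" does.
 * (uint32_t)-16 is the mask 0xFFFFFFF0. */
uint32_t align_down_16(uint32_t address) {
    return address & (uint32_t)-16;
}
```

An address that is already aligned is left unchanged; any other address moves down by at most 15 bytes.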
Stack alignment
STDCALL

#include <stdio.h>

__attribute__((stdcall))
int sum(int x, int y);

int main() {
    int a = 1, b = 2, c;
    c = sum(a, b);
    printf("%d\n", c);
    return 0;
}

__attribute__((stdcall))
int sum(int x, int y) {
    int t = x + y;
    return t;
}

sum:
    push ebp
    mov ebp, esp
    sub esp, 16
    mov edx, DWORD [ebp+12]
    mov eax, DWORD [ebp+8]
    add eax, edx
    mov DWORD [ebp-4], eax
    mov eax, DWORD [ebp-4]
    leave
    ret 8
STDCALL

    CMAIN:
        ; …
        mov eax, DWORD [ebp-12]
        mov DWORD [esp+4], eax
        mov eax, DWORD [ebp-16]
        mov DWORD [esp], eax
        call sum
        sub esp, 8              ; the callee popped 8 bytes (ret 8); reserve the argument area again
        mov DWORD [ebp-8], eax
        ; …
FASTCALL

    #include <stdio.h>

    __attribute__((fastcall)) int
    sum(int x, int y);

    int main() {
        int a = 1, b = 2, c;
        c = sum(a, b);
        printf("%d\n", c);
        return 0;
    }

    __attribute__((fastcall)) int
    sum(int x, int y) {
        int t = x + y;
        return t;
    }

    CMAIN:
        ; …
        mov edx, DWORD [ebp-12]
        mov ecx, DWORD [ebp-16]
        call sum
        mov DWORD [ebp-8], eax
        ; …

    sum:
        lea eax, [ecx + edx]
        ret
Omit frame pointer

    int f(int x, int y) {
        int numerator =
            (x + y) * (x - y);
        int denominator =
            x * x + y * y;
        if (0 == denominator) {
            denominator = 1;
        }
        return (100 * numerator) /
            denominator;
    }

    f:
        ; setup
        sub esp, 8
        mov DWORD [esp+4], esi
        mov esi, DWORD [esp+16]
        mov ecx, DWORD [esp+12]
        mov DWORD [esp], ebx
        ; …

    Register  Value        Register  Saved address
    esi       y            esi       [esp + 4]
    ecx       x            ebx       [esp]
Omit frame pointer

    f:
        ; …
        mov edx, esi
        imul edx, esi           ; edx = y^2
        mov eax, ecx
        imul eax, ecx           ; eax = x^2
        mov ebx, edx
        add ebx, eax            ; ebx = x^2 + y^2
        jne .L2
        mov ebx, 1
    .L2:
        ; …

    Register  Value
    esi       y
    ecx       x
Omit frame pointer

    f:
        ; …
    .L2:
        lea edx, [esi+ecx]
        sub ecx, esi
        imul edx, ecx
        ; …

    Register  Value
    esi       y
    ecx       x
    ebx       x^2 + y^2
Omit frame pointer

    f:
        ; …
        imul edx, edx, 100
        mov eax, edx
        sar edx, 31
        idiv ebx
        ; …

    Register  Value
    esi       y
    ecx       x - y
    ebx       x^2 + y^2
    edx       (x + y) * (x - y)
Omit frame pointer

    f:
        ; …
        ; finish
        mov esi, DWORD [esp+4]
        mov ebx, DWORD [esp]
        add esp, 8
        ret
Variable-length parameter list

• An ellipsis (...) is placed at the end of the parameter list.
• Data type
– va_list
• Macros
– va_start(va_list, last fixed param)
– va_arg(va_list, cast type)
– va_end(va_list)
Variable-length parameter list

    #include <stdarg.h>

    int average(int count, ...) {
        va_list ap;
        int j;
        int sum = 0;
        va_start(ap, count);
        for (j=0; j<count; j++)
            sum += va_arg(ap, int);
        va_end(ap);
        return sum/count;
    }
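A usage sketch for the function above. The call site has to supply exactly `count` additional `int` arguments, because `va_arg` cannot check how many values were actually passed or what their types are. The wrapper name `average_demo` is hypothetical.

```c
#include <stdarg.h>

/* average() as defined on the slide above */
int average(int count, ...) {
    va_list ap;
    int j;
    int sum = 0;
    va_start(ap, count);
    for (j = 0; j < count; j++)
        sum += va_arg(ap, int);
    va_end(ap);
    return sum / count;
}

/* Hypothetical caller: three variable arguments follow the fixed one. */
int average_demo(void) {
    return average(3, 10, 20, 60);   /* (10 + 20 + 60) / 3 */
}
```

Note that `count` must be nonzero, since the function divides by it.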
Contest #3: function call

• 5 word problems
• 2 reverse engineering problems
• Solve any 5 problems for grade «excellent», including at least one
reverse engineering problem.
• Submit via e-judge
- http://earth.ispras.ru/cgi-bin/new-client?contest_id=153&locale_id=0
- http://earth.ispras.ru/cgi-bin/new-client?contest_id=154&locale_id=0
• Sample word problem
– «GCD of Four»
• Sample reverse engineering problem
– «R3»
Contest #3: «GCD of Four»

The standard input contains four integers, each
greater than zero and less than or equal to 10^9.
Print to the standard output their greatest
common divisor.

Time limit: 1 second
Memory limit: 64 MB
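A possible C reference for this problem, useful for checking an assembly solution. It is a sketch that assumes Euclid's algorithm and the identity gcd(a, b, c, d) = gcd(gcd(a, b), gcd(c, d)); the names `gcd2` and `gcd4` are hypothetical. Inputs up to 10^9 fit into 32 bits.

```c
#include <stdint.h>

/* Euclid's algorithm for two values */
uint32_t gcd2(uint32_t a, uint32_t b) {
    while (b != 0) {
        uint32_t r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* gcd of four values, reduced to three two-argument calls */
uint32_t gcd4(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
    return gcd2(gcd2(a, b), gcd2(c, d));
}
```

In assembly, `gcd2` maps naturally onto a loop around `DIV`, which leaves the remainder in EDX.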
Contest #3: «R3»
Given the following assembly language program, recover its
semantics and express it as a C language program.

The input contains a single integer in bounds 0 to 20,
inclusive.

Time limit: 1 second
Memory limit: 64 MB

    %include "io.inc"

    SECTION .text
    GLOBAL CMAIN
    CMAIN:
        GET_UDEC 4, EAX
        CALL F
        PRINT_UDEC 4, EAX
        NEWLINE
        XOR EAX, EAX
        RET

    F:
        CMP EAX, 0
        JNZ .REC
        MOV EAX, 1
        RET
    .REC:
        DEC EAX
        CALL F
        LEA EAX, [EAX + 2 * EAX]
        RET
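One way to read the recursion, as a hedged sketch: F takes its argument in EAX instead of on the stack, returns 1 for 0, and otherwise triples the result of the call for n - 1. The C name `r3_f` is hypothetical.

```c
#include <stdint.h>

/* Sketch of F: the argument and the result both travel in EAX. */
uint32_t r3_f(uint32_t n) {
    if (n == 0)                    /* CMP EAX, 0 / JNZ .REC */
        return 1;                  /* MOV EAX, 1 / RET */
    uint32_t r = r3_f(n - 1);      /* DEC EAX / CALL F */
    return r + 2 * r;              /* LEA EAX, [EAX + 2 * EAX] */
}
```

Under this reading the program prints 3 to the power n; the bound n <= 20 keeps the result within 32 unsigned bits.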
Acknowledgement

We are grateful to Randal E. Bryant and David R. O'Hallaron for their
great textbook and other course materials, which we found on the site:
http://www.cs.cmu.edu/~213/
In particular, we used examples for the following topics:
1. Loops: reduction to «if-goto» form.
2. Arrays: multidimensional, multilevel.
3. Loops: machine-independent code optimization.
4. Switch: jump table.
Final exam

• 10 problems
• Grading policy
– Maximum 6 points for each problem: 60 points total
• Grade «excellent» >= 48 points (0.8)
• Grade «good» >= 36 points (0.6)
• Grade «poor» >= 24 points (0.4)
Sample problem #1
Fill in the value of register AL in hex and in decimal (signed and unsigned), and
the values of flags CF, OF, ZF and SF after execution of the following
instructions.

(a) MOV AL, 137
    ADD AL, 200

Answer: AL = _____ (hex), _____ (signed dec), _____ (unsigned dec),
CF = __, OF = __, ZF = __, SF = __.

(b) MOV AL, -35
    SUB AL, 216

Answer: AL = _____ (hex), _____ (signed dec), _____ (unsigned dec),
CF = __, OF = __, ZF = __, SF = __.
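Hand-computed answers to problems of this kind can be checked with a small model of 8-bit arithmetic. The sketch below is an illustration only: the type `flags8` and the functions `add8` and `sub8` are hypothetical names, and a negative immediate such as -35 has to be converted to its 8-bit pattern with a cast to `uint8_t` before the call.

```c
#include <stdint.h>

struct flags8 { uint8_t al; int cf, of, zf, sf; };

/* 8-bit ADD: AL = a + b */
struct flags8 add8(uint8_t a, uint8_t b) {
    struct flags8 r;
    unsigned wide = (unsigned)a + b;
    r.al = (uint8_t)wide;
    r.cf = wide > 0xFF;                            /* carry out of bit 7 */
    r.of = ((a ^ r.al) & (b ^ r.al) & 0x80) != 0;  /* result sign differs from both operands */
    r.zf = r.al == 0;
    r.sf = (r.al & 0x80) != 0;
    return r;
}

/* 8-bit SUB: AL = a - b */
struct flags8 sub8(uint8_t a, uint8_t b) {
    struct flags8 r;
    r.al = (uint8_t)(a - b);
    r.cf = a < b;                                  /* borrow */
    r.of = ((a ^ b) & (a ^ r.al) & 0x80) != 0;     /* operand signs differ and result sign differs from a */
    r.zf = r.al == 0;
    r.sf = (r.al & 0x80) != 0;
    return r;
}
```

The signed decimal value of the result is `(int8_t)r.al`, and the unsigned one is `r.al` itself.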
Sample problem #2

Assuming variable A contains the value 0xCAFEBABE, write
out the value of register AX in hex after execution of the
following instructions.

    MOV AX, WORD [A + 2]
    ADD AX, 3 ; Answer: AX = ______
Sample problem #3

Let register EAX contain a positive integer x <= 2^24. Write out
two variants, both consisting of a single assembly
instruction, that multiply x by 5. The result is to remain in
EAX. Two variants are considered distinct if mnemonics of
the used instructions are different.
Answer 1:
Answer 2:
Sample problem #4

Write a program in assembly equivalent to the following C
code fragment.

    short *px, *py;
    *px++ = --*py;
Sample problem #5

Write a program in assembly equivalent to the following C
code fragment.

    int x, y;
    x /= -y;
Sample problem #6

Write a C code fragment equivalent to the following assembly
fragment. Explain in your own words what the code does.

    SECTION .text
    GLOBAL foo
    foo:
        MOV ESI, DWORD [a]
        TEST ESI, ESI
        JE .1
        MOV ECX, DWORD [b]
        TEST ECX, ECX
        JE .1
        MOV EDX, DWORD [ESI]
        MOV EAX, EDX
        SAR EDX, 31
        IDIV ECX
        SUB DWORD [ESI], EDX
    .1:
        XOR EAX, EAX
        RET
Sample problem #7

A C function f has the following body.

    *p = d;
    return x - c;

This body corresponds to the following assembly code.
Recover the function f prototype declaration.

    MOVSX EDX, BYTE [EBP + 12]
    MOV EAX, DWORD [EBP + 16]
    MOV DWORD [EAX], EDX
    MOVSX EAX, WORD [EBP + 8]
    MOV EDX, DWORD [EBP + 20]
    SUB EDX, EAX
    MOV EAX, EDX
Sample problem #8

Write a function in assembly that calculates for given n and k
the number of combinations $C_n^k$:

• $C_n^k = C_{n-1}^{k-1} + C_{n-1}^{k}$, for all integers n, k > 0,
• $C_n^0 = 1$, for all integers n,
• $C_0^k = 0$, for all integers k > 0.

The function must correspond to the following C declaration
and be implemented recursively.

    unsigned int
    combinations(unsigned int n, unsigned int k);
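A C reference for the recurrence, which an assembly solution can mirror case by case. This is a minimal sketch written directly from the three rules above. The order of the checks matters: k = 0 is tested first so that the case n = 0, k = 0 returns 1.

```c
/* Recursive number of combinations, following the three rules. */
unsigned int combinations(unsigned int n, unsigned int k) {
    if (k == 0)
        return 1;      /* C(n, 0) = 1 */
    if (n == 0)
        return 0;      /* C(0, k) = 0 for k > 0 */
    return combinations(n - 1, k - 1) + combinations(n - 1, k);
}
```

The plain recursion takes exponential time, which is acceptable for small arguments; in assembly each recursive call needs n and k preserved across it, for example on the stack.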
Sample problem #9

Write an assembly program that prints the sum of all odd
elements of the principal diagonal of the matrix

    int A[N][N],

where N is a compile-time constant. No matrix input code
is required.
Sample problem #10

Write a C code fragment equivalent to the following assembly
fragment. Explain in your own words what the code does.

    %include "io.inc"

    SECTION .text
    GLOBAL CMAIN
    CMAIN:
        GET_DEC 4, ECX
        MOV EBX, 1
        XOR EAX, EAX
    .L:
        XOR EAX, EBX
        XOR EBX, EAX
        XOR EAX, EBX
        ADD EBX, EAX
        LOOP .L
        PRINT_UDEC 4, EAX
        NEWLINE
        XOR EAX, EAX
        RET
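A hedged C sketch of the loop, with registers mapped to variables: the three XOR instructions swap EAX and EBX, and `LOOP` decrements ECX and repeats while it is nonzero, which corresponds to a do-while loop. The function name `sample10` is hypothetical, and the sketch is meant for inputs n >= 1.

```c
#include <stdint.h>

/* Sketch of the loop in sample problem #10; intended for n >= 1. */
uint32_t sample10(uint32_t n) {
    uint32_t a = 0;            /* EAX */
    uint32_t b = 1;            /* EBX */
    do {
        uint32_t t = a;        /* the three XORs swap EAX and EBX */
        a = b;
        b = t;
        b += a;                /* ADD EBX, EAX */
    } while (--n != 0);        /* LOOP .L */
    return a;
}
```

Read this way, the value printed for n = 1, 2, 3, 4, 5 is 1, 1, 2, 3, 5, that is, the n-th Fibonacci number.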
