Assembly Programming Journal 3
Assembly Programming Journal 3
Assembly Programming Journal 3
:/___\:::::::.
/| \::::::::.
:| _/\:::::::::.
:| _|\ \::::::::::. Feb/March 98
:::\_____\::::::::::. Issue 3
::::::::::::::::::::::.........................................................
A S S E M B L Y P R O G R A M M I N G J O U R N A L
http://asmjournal.freeservers.com
[email protected]
T A B L E O F C O N T E N T S
----------------------------------------------------------------------
Introduction...................................................mammon_
"Extending NASM"...............................................mammon_
____________________________________________________________________________
___ .___ __) (__ _____ ______ ```
._____| \____\ ___/__._) /._) _ (_. \\
| | _\ |_ \ | \/ | |CE ,
.=|_____|___)\___|(_______|______| |===============[ Introduction ]===.
'================================| :=================================='
: . by mammon_
The first thing that you will notice about this issue --well, that it is late--
will probably be the section headers designed by iCE. I had to add a top/upper
left border to them [the horizontal and slanted lines] in order to make them
standout when scrolling though a 100K file such as this one, but other than
they are all his: comments, etc welcome.
I don't have much to say about this issue: I went overboard with the NASM stuff
this month as I have been doing a lot of 'research' work in that area recently;
my articles have been supplemented with Tamas Kaproncai's Win32 NASM pointers.
Iczelion and XBios2 have both produced --as usual-- 2 quality articles this
month, Iczelion's based on his win32 asm tutorial 'the MASM way', and XBios2
once again continuing to replace C with assembler.
A final note, I have not come up with a challenge for the next issue; anyone
with good ideas is welcome to post one to the Message Board or to the APJ
email address.
_m
::/ \::::::.
:/___\:::::::.
/| \::::::::.
:| _/\:::::::::.
:| _|\ \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
An introduction to SPARC assembly
by +Spath.
What is SPARC ?
-----------------
The principles of RISC (Reduced Instructions Set Computer) are born in the
early 80's in two universities (Berkeley and Stanford) ; its philosophy is
the quest for simplicity and CPU speed. SPARC (Scalable Processor ARChitecture)
is a 32 bits RISC architecture created by Sun in 1987. It's an open
architecture, so that any manufaturer can make SPARC processors (like Philips,
VLSI, T.I., Fujitsu... already did). Its key features are :
!8<------------------------------------------------------------------------
/* FILE : hello.s */
main:
save %sp, -112, %sp ! allocate space for stack
sethi %hi(.MyText), %o1 ! load higher part of string offset
or %o1, %lo(.MyText), %o0 ! add lower part of offset
set (.Year), %l1 ! get year address
ld [%l1], %o1 ! load year into %o1
call printf ! print the string
nop ! do nothing (BDS)
Every procedure must save some memory space for itself ; this stack space
will be used to store the out and local registers and all the datas needed by
the procedure (the minimal space is 64 bytes for %o and %l registers). The
stack grows from higher to lower addresses, so that allocating a stack space
is implemented by substracting a value from the current stack pointer ; the
previous stack pointer is called the frame pointer (%fp).
Registers %o0 - %o5 are used to pass the first six parameters to a procedure,
because the current stack pointer (%sp) is stored in %o6 and the calling
program counter (%pc, used to calculate the return address) is stored in %o7.
If a procedure has more than six parameters, the remaining parameters are
passed using the stack space (eg for a caller's stack space of 92 bytes, the
child procedure can get the seventh parameter at [%fp+92]).
As I said before, all instructions are 32 bits long, so that you must use
two steps (with sethi and or) to load a 32 bits data. Note that %hi refers
to the most significant 22 bits and %lo refers to the least significant 10
bits of a register.
Like most RISC machines, a SPARC processor uses a branch delay slot (BDS) to
optimize pipeline efficiency : this means that by default, the instruction
following a branching is executed regardless of whether or not the branch is
taken. So the coder must move (when possible) an instruction from before the
branch to after the branch. Another possibility is to use the 'nop' instruction
or to add the ',a' suffix to the branch instruction, which annul the next
operation.
!8<--------------------------------------------------------------------------
/* FILE: fib.s */
call fibcall !
nop ! do nothing (BDS)
mov %o0, %i0 ! return value = fibcall return value
F1orF0:
ret ! Return to caller
restore ! Restore register windows (BDS)
Endfib:
.size fib,(.-fib)
All procedures share the global registers (%g[0-7]), the remaining registers
%l[0-7], %i[0-7], %o[0-7] constitute the register window. When a procedure
starts execution, it allocates 16 registers (input and local), the output
registers are overlapped with the subroutine's input registers. Here's what
happens if procedure A calls procedure B which calls procedure C :
As you see, for each procedure, the parameters passed to and received from a
subroutine are stored in %o registers. The same way, the parameters taken from
and passed to the calling procedure are stored in %i registers. The current
window pointer (CWP) identifies the current register window : it is stored in
the least significant 5 bits of %psr, and is modified by the 'save' and
'restore' commands.
The condition code register (another part of %psr) contains four flags : Z
(zero), N (negative), C (carry), and V (overflow) ; contrary to x86 assembly,
these bits are not updated by standard arithmetic operations, but by special
instructions (with 'cc' suffix, like 'cmp' which is in fact a 'subcc').
For instance, you have these equivalences :
SPARC x86
subcc r1, r2, r1 <=> sub r1, r2 ; keep result and flags
subcc r1, r2, %g0 <=> cmp r1, r2 ; discard result, keep flags
sub r1, 0, r2 <=> mov r2, r1 ; keep result, discard flags
!8<-----------------------------------------------------------------------
/* FILE: cksum.s */
EndOfCRCZone:
set 0x10954, %o1 ! load precalculated checksum
cmp %o0, %o1 ! is checksum correct ?
be End ! yes : exit
nop ! do nothing (BDS)
End:
ret ! Return to caller
restore ! Restore register windows (BDS)
- assembling : I use gcc for that job, which itself uses as and ld (assembler
and linker) to create ELF executables. CC and cc also work well ; these 3
compilers can also be used to generate ASM source code from C source code
with the "-S" option, which is IMHO a great method to learn assembly on
a new platform.
- debugging : I use adb, which is very basic but also very powerful, but gdb
and dbx may also work.
- reversing : all the previous tools are useful ; I also use a disassembler
(SunOS dis), but some exist for other platforms (see Bruce Ediger's
homepage).
If you plan to give a try to SPARC assembly, here are some links :
http://www.cs.unm.edu/~maccabe/classes/341/labman/labman.html
ISEM (Instructional Sparc EMulator) homepage.
http://www.csn.net/~bediger/
Bruce Ediger Homepage
http://www.cs.earlham.edu/~mutioke/cs63/
a good introduction to SPARC with plenty of links.
http://www.sics.se/~psm/sparcstack.html
a very good overview of SPARC stack and registers.
Final Words
------------
If you use a SPARC based machine, give a try to assembly, it's quite fun.
If not, remember that the best you know your processor, the best you
can code ASM.
::/ \::::::.
:/___\:::::::.
/| \::::::::.
:| _/\:::::::::.
:| _|\ \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
Extending NASM
by mammon_
For this reason NASM comes with a preprocessor that is both simple and powerful;
by writing NASM macros, the high-level functionality of other assemblers can be
emulated rather easily. As thw following macros will demonstrate, most of the
high-level asm features in commercial assemblers really do not do anything very
elaborate; they simply are more convenient for the programmer.
The macros that I will detail below provide some basic C and ASM constructs for
use in NASM. I have made the complete file available at
http://www.eccentrica.org/Mammon/macros.asm
The macro file can be included in a .asm file with the NASM directive
%INCLUDE "macros.asm"
Comments on the usage of each macro are included in the file.
Macro Basics
------------
The fundamenal structure of a NASM macro is
%macro {macroname} {# parameters}
%endmacro
The actual code resides on the line between the %macro and %endmacro tags; this
code will be inserted into your program wherever NASM finds {macroname}. Thus
you could create a macro to push the contents of each register such as:
%macro SAVE_REGS 0
push eax
push ebx
push ecx
push edx
%endmacro
Once you have defined this macro, you can use it in your code like:
SAVE_REGS
call ReadFile
...which the preprocessor will expand to
push eax
push ebx
push ecx
push edx
call ReadFile
before assembling. It should be noted that all preprocessing takes place in a
single stage immediately before compiling starts; to preview what the pre-
processor will send to the assembler, you can invoke nasm with the -e option.
The %macro tag requires that you declare the number of paramters that will be
passed to the macro. This can be a single number or a range, with a few quirks:
%macro LilMac 0 ; takes 0 arguments
%macro LilMac 5 ; takes 5 arguments
%macro LilMac 0-3 ; takes 0-3 arguments
%macro LilMac 1-* ; takes 1 to unlimited arguments
%macro LilMac 1-2+ ; takes 1-2 arguments
%macro LilMac 1-3 0, "OK" ; takes 1-3 arguments, 2-3 default to 0 & "OK"
The last three examples bear some explanation. The "-*" operator in the %macro
tag specifies that the macro can handle any number of parameters; in other
words, there is no maximum number, and the minimum is whatever number is to the
left of the "-*" operator. The "+" operator means that any additional arguments
will be appended to the last argument instead of causing an error, so that:
LilMac 0, OK, This argument is one too many
will result in argument 1 being 0 and argument 2 being "OK, This argument is
one too many." Note that this is a good way to pass commas as part of an argu-
ment (normally they are only separators). Providing defualt arguments after the
number of arguments allows a macro to be called with fewer arguments than it
expects.
%macro SAVE_VARS 1-4 ecx, ebx, eax
will fill a missing 4th argument with eax, 3rd with ebx, and 2nd with ecx. Note
that you have to provide defaults starting with the last argument and working
backwards.
The parameters to the macro are available as %1 for the first argument, %2 for
the second, and so on, with %0 containing a count of all the arguments. There
is an equivalent to the DOS "SHIFT" command called %rotate which will rotate
the parameters to either the left or to the right depending on whether a
positive or negative value was supplied:
Before: %1 %2 %3 %4 Before: %1 %2 %3 %4 Before: %1 %2 %3 %4
%rotate 1 %rotate -1 %rotate 2
After: %4 %1 %2 %3 After: %2 %3 %4 %1 After: %3 %4 %1 %2
So that rotating by 1 will put the value at %1 into %4, and rotating by -1 will
put the value of %1 into %2.
High-Level Calls
----------------
Perhaps the buggest complaint about NASM is its primitive call syntax. In MASM
and TASM, the parameters to a call may be appended to the call itself:
call MessageBox, hOwner, lpszText, lpszTitle, fuStyle
where in NASM the parameters must be pushed onto the stack prior to the call:
push fuStyle
push lpszTitle
push lpszText
push hOwner
call MessageBox
Using NASM's "-*" macro feature along with the %rep directive make a high-level
call easy to replicate:
%macro call 2-*
%define _func %1
%rep &0-1
%rotate 1
push %1
%endrep
call _func
%endmacro
The %define directive simply defines the variable _func [underscores should
prefix variable names in macros so you do not mistakenly use the same name
later in the program] as %1, the name of the function to call. The %rep and
%endrep directives enclose the instructions to be repeated, and %rep takes as a
parameter the number of repetitions [in this case set to the number of macro
parameters minus 1]. Thus, the above macro cycles through the arguments to call
and pushes them last-argument first [C syntax] before making the call.
Switch-Case Blocks
------------------
One of the most awkward C constructs to code in assembly is the SWITCH-CASE
block. It is also rather difficult to re-create as a macro due to variable
number and length of CASE statements.
NASM's preprocessor has a context stack which allows you to create a set of
local variables and addresses which is specific to a particular invocation of a
macro. Thus it becomes possible to refer to labels which will be created in a
future macro by giving them context-dependent names:
%macro MacPart1 0
%push mac ;create a context called "mac"
jmp %$loc ;jump to context-specific label "loc"
%endmacro
%macro MacPart2 0
%ifctx mac ;if we are in context 'mac'
%$loc: ;define label 'loc'
xor eax, eax ;code at this label...
ret
%endif ;end the if block
%pop ;destroy the 'mac' context
%endmacro
As you can see, the context is created and named with a %push directive, and
destroyed with a $pop directive. NASM has a number of preprocessor conditional
IF/ELSE statements; in the above example, the %ifctx [if current context equals]
directive is used to determine if a 'mac' context has been created [Note that
the 'base' NASM conditionals include %if, %elif, %else, and %endif; these carry
over to the %ifctx directive, such that there is available %ifctx, %ifnctx,
%elifctx, %elifnctx, %else, and %endif; all %if directives must be closed with
an %endif directive]. Finally, %$ is used to prefix the name of a context-
specific variable or label. Non-context-specific local labels use the %% prefix:
%macro LOOP_XOR
%%loop:
pop eax
xor eax, ebx
test eax, eax
jnz %%loop
%endmacro
%macro CASE 1
%ifctx switch
%$loc(__curr):
%assign __curr __curr+1
mov ebx, %1
cmp eax, ebx
jne %$loc(__curr)
%endif
%endmacro
%macro DEFAULT 0
%ifctx switch
%$loc(__curr):
%endif
%endmacro
%macro BREAK 0
jmp %$endswitch
%endmacro
%macro ENDSWITCH 0
%ifctx switch
%$endswitch:
%pop
%endif
%endmacro
;==========================================================================-END
If-Then Blocks
--------------
While the preprocessor provides support for if-then directives, it is a slight
bit of work to cause that to generate the equivalent assembly language 'if'
code [ the preprocessor 'if' is resolved before compile time, not at run time].
Using macros, you can create if-then blocks with the following structure:
IF Value, Cond, Value
;if code here
ELSIF Value, Cond, Value
;else-if code here
ELSE
;else code here
ENDIF
An example being:
IF [Passwd], e, [GoodVal] ;e == equals or je
jmp Registered
ELSE
jmp FormatHardDrive
ENDIF
The trickiest part about this macro sequence is the 'Cond' parameter. NASM
allows condition codes [the 'cc' in 'jcc' that you findin opcode refs] to be
passed to macros; these condition codes are simply the 'jcc' with the 'j' cut
off -- 'jnz' becomes 'nz', 'jne' becomes 'ne', 'je' becomes 'e', and so on.
The reason for this is that the condition code is appended to a 'j' later in
the macro:
%macro Jumper %1 %2 %3 ;JUMPER Reg1, cc, Reg2
cmp %1, %3
j%+2 Gotcha
jmp error
%endmacro
The above code appends %2 to the 'j' with the directive j%+2. Note that if you
use j%- instead of j%+, NASM will insert the *inverse* condition code, so that
jz becomes jnz, etc. For example, calling the macro
%macro Jumper2 %1
j%-1 JmpHandler
%endmacro
with the invocation 'Jumper2 nz' would assemble the code 'jz JmpHandler'.
The condition codes can be a bit tricky to work with; it is advisable to add a
sequence such as the following to the macro file:
%define EQUAL e
%define NOTEQUAL ne
%define G-THAN g
%define L-THAN l
%define G-THAN-EQ ge
%define L-THAN-EQ le
%define ZERO z
%DEFINE NOTZERO nz
so that you could call the IF macro as follows:
IF PassWd, EQUAL, GoodVal
;if code here
...etc etc. Note also that the IF-THEN-ELSE macros put the passed values into
eax and ebx for compatison, so these registers will need to be preserved.
;===========================================================-IF-THEN-ELSE Loops
%macro IF 3
%push if
%assign __curr 1
mov eax, %1
mov ebx, %3
cmp eax, ebx
j%+2 %%if_code
jmp %$loc(__curr)
%%if_code:
%endmacro
%macro ELSIF 3
%ifctx if
jmp %$end_if
%$loc(__curr):
%assign __curr __curr+1
mov eax, %1
mov ebx, %3
cmp eax, ebx
j%+2 %%elsif_code
jmp %$loc(__curr)
%%elsif_code:
%else
%error "'ELSIF' can only be used following 'IF'"
%endif
%endmacro
%macro ELSE 0
%ifctx if
jmp %$end_if
%$loc(__curr):
%assign __curr __curr+1
%else
%error "'ELSE' can only be used following an 'IF'"
%endif
%endmacro
%macro ENDIF 0
%$loc(__curr):
%$end_if:
%pop
%endmacro
;==========================================================================-END
For/While Loops
---------------
The DO...FOR and DO...WHILE do nothing differnet from the previous macros, but
are simply a different application of the same principles. The syntax for
calling these macros is:
DO
;code to do here
FOR min, Cond, max, step
DO
;code to do here
WHILE variable, Cond, value
It is perhaps easiest to illustrate this by comparing the macros with C code.
for( x = 0; x <= 100; x++) { SomeFunc() }
Equates to:
DO
call SomeFunc
FOR 0, l, 100, 1
Likewise,
for( x = 0; x != 100; x--) { SomeFunc() }
Equates to:
DO
call SomeFunc
FOR 0, e, 100, -1
The WHILE macro is similar:
while( CurrByte != BadAddr) {SomeFunc() }
Equates to:
DO
call SomeFunc
WHILE CurrByte, ne, BadAddr
Once again, eax and ebx are used in the FOR and WHILE macros.
%macro FOR 4
%ifctx do
pop eax
add eax, %4
cmp eax, %3
j%-2 %%end_loop
jmp %$start_loop
%$init_loop:
mov eax, %1
jmp %$start_loop
%%end_loop:
%pop
%endif
%endmacro
%macro WHILE 3
%ifctx do
pop eax
mov ebx, %3
cmp eax, ebx
j%+2 %%end_loop
jmp %$start_loop
%$init_loop:
mov eax, %1
jmp %$start_loop
%%end_loop:
%pop
%endif
%endmacro
;==========================================================================-END
Data Declarations
-----------------
Declaring data is relatively simple in assembly, but sometimes it helps to make
code more clear if you create macros that assign meaningful data types to
variables, even if those macros simply resolve to a DB or a DD. The following
macros demonstrate this concept. They are invoked as follows:
CHAR Name, String ;e.g. CHAR UserName, "Joe User"
INT Name, Byte ;e.g. INT Timeout, 30
WORD Name, Word ;e.g. WORD Logins
DWORD Name, Dword ;e.g. DWORD Password
Note that when invoked with a name but not a value, these macros create empty
[DB 0] variables.
;============================================================-Data Declarations
%macro CHAR 1-2 0
%1: DB %2,0
%endmacro
Procedure Declarations
----------------------
Procedure declarations are another matter of convenience. It is often useful in
your code to clearly delineate the start and end of a procedure; each of the
PROC macros below does that, as well as creating a stack fram for the procedure.
The ENTRYPROC macro creates a procedure named 'main' and declares main as a
global symbol; the standard PROC declares the provided name as global. These
macros can be used as follows:
PROC ProcName Parameter1, Parameter2, Parameter3
;procedure code here
ENDP
ENTRYPROC
;entry-procedure code here
ENDP
Note that the Parameters to PROC are set up to EQU to offsets from ebp, e.g.
ebp-4, ebp-8, etc. I have also included support for local variables, which
will EQU to positive offsets from ebp' these may be used as follows:
PROC ProcName Parameter1, Parameter2, Parameter3...
LOCALDD Dword_Variable
LOCALDW Word_Variable
LOCALDB Byte_Variable
;procedure code here
ENDP
;=======================================================-Procedure Declarations
%macro PROC 1-9
GLOBAL %1
%1:
%assign _i 4
%rep %0-1
%2 equ [ebp-_i]
%assign _i _i+4
%rotate 1
%endrep
push ebp
mov ebp, esp
%push local
%assign __ll 0
%endmacro
%macro ENDP 0
%ifctx local
%pop
%endif
pop ebp
%endmacro
%macro ENTRYPROC 0
PROC main
%endmacro
%macro LOCALVAR 1
sub esp, 4
%1 equ [ebp + __ll]
%endmacro
%macro LOCALDB 1
%assign __ll __ll+1
LOCALVAR %1
%endmacro
%macro LOCALDW 1
%assign __ll __ll+2
LOCALVAR %1
%endmacro
%macro LOCALDD 1
%assign __ll __ll+4
LOCALVAR %1
%endmacro
;==========================================================================-END
Further Extension
-----------------
Continued experimentation will of course prove fruitful. It is recommended that
you read/print out chapter 4 of the NASM manual for reference. In addition, it
is very helpful to test your macros by cpmpiling the source with "nasm -e",
which will output the preprocessed source code to stdout and will not compile
the program.
____________________________________________________________________________
______ _____. ____ ```
._____/\______.________._________. ._\___ |__\_ /. \\
| | | _ | | (_ | __/ |CE ,
.=|_____/\______| |----)____| |______|______|======[ Win 32 ASM ]===.
'===============| :==================================================='
NASM specific Win32 coding
by Tamas Kaproncai
Contents
========
0. Preface
1. Compiling
2. Include files
3. Library files
4. Importing API functions
5. Calling API functions
6. WinMain
7. Window procedure
8. Sections
9. Self modification
0. Preface
==========
I will introduce the win32 coding and I will focus on the NASM specific part.
1. Compiling
============
I'm working with the following free programs in connection with NASM:
- linker: ALINK v1.5 by Anthony A.J. Williams.
- resource compiler: GoRC v0.50b by Jeremy Gordon.
The process of compiling a win32 program involves a number of steps which can
be divided into three main processes: preparing the include files, preparing
the library files, and writing the actual program.
2. Include files
================
The include files (*.inc) must be generated from existing header files (*.h)
that come with win32-compatible C or Pascal compilers. Files needed:
WIN32N.INC (Thanks for the inital MASM version to S.L.Hutchesson).
The compiler will be NASM version 0.97
http://www.cryogen.com/nasm
Usage: nasmw -fobj -w+orphan-labels -pwin32n.inc %1.asm
3. Library files
================
Files Needed:
WIN32.LIB
The linker will be ALINK
http://www.geocities.com/SiliconValley/Network/4311/#alink
Usage: alink -oPE %1 win32.lib %1.res %2 %3
More lib files can be created with IMPLIB.
Example: IMPLIB DDRAW.DLL
6. WinMain
==========
You don't need to use the name, WinMain:
You must start the program with the label, ..start:
At the begening there is nothing special in the stack, so you should call
GetModuleHandleA for hInstance and GetCommandLineA for the command line.
(Command line consists the full path, the file name and the parameters).
7. Window procedure
===================
There are four parameters on the top of the stack:
PUSH EBP
MOV EBP,ESP
%DEFINE hwnd EBP+8 ;handle of window
%DEFINE message EBP+12 ;message
%DEFINE wParam EBP+16 ;first message parameter
%DEFINE lParam EBP+20 ;second message parameter
8. Sections
===========
You need a code section:
SECTION CODE USE32 CLASS=CODE
and a data section:
SECTION DATA USE32 CLASS=DATA
9. Self modification
====================
You can include your code and data together in one section:
SECTION CODE USE32 CLASS=CODE
In that case you need another object file, with only one line source:
SECTION CODE USE32 CLASS=DATA
EXTERN MessageBoxA
EXTERN ExitProcess
____________________________________________________________________________
______ _____. ____ ```
._____/\______.________._________. ._\___ |__\_ /. \\
| | | _ | | (_ | __/ |CE ,
.=|_____/\______| |----)____| |______|______|======[ Win 32 ASM ]===.
'===============| :==================================================='
More about Text
by Iczelion
We will experiment more with text attributes, ie. font and color.
Preliminary:
------------
Windows color system is based on RGB values, R=red, G=Green, B=Blue. If you
want to specify a color in Windows, you must state your desired color in
terms of these three major colors. Each color value has a range from 0 to
255 (a byte value). For example, if you want pure red color, you should use
255,0,0. Or if you want pure white color, you must use 255,255,255. You can
see from the examples that getting the color you need is very difficult
with this system since you have to have a good grasp of how to mix and
match colors.
For text color and background, you use SetTextColor and SetBkColor, both of
them require a handle to device context and a 32-bit RGB value. The 32-bit
RGB value's structure is defined as:
RGB_value struct
unused db 0
blue db ?
green db ?
red db ?
RGB_value ends
Note that the first byte is not used and should be zero. The order of the
remaining three bytes is reversed,ie. blue, green, red. However, we will
not use this structure since it's cumbersome to initialize and use. We will
create a macro instead. The macro will receive three parameters: red, green
and blue values. It'll produce the desired 32-bit RGB value and store it in
eax. The macro is as follows:
You can put this macro in the include file for future use.
Content:
--------
.data
ClassName db "SimpleWinClass",0
AppName db "Our First Window",0
TestString db "Win32 assembly is great and easy!",0
FontName db "script",0
.data?
hInstance HINSTANCE ?
CommandLine LPSTR ?
.code
start:
invoke GetModuleHandle, NULL
mov hInstance,eax
invoke GetCommandLine
invoke WinMain, hInstance,NULL,CommandLine, SW_SHOWDEFAULT
invoke ExitProcess,eax
WinMain proc
hInst:HINSTANCE,hPrevInst:HINSTANCE,CmdLine:LPSTR,CmdShow:SDWORD
LOCAL wc:WNDCLASSEX
LOCAL msg:MSG
LOCAL hwnd:HWND
mov wc.cbSize,SIZEOF WNDCLASSEX
mov wc.style, CS_HREDRAW or CS_VREDRAW
mov wc.lpfnWndProc, OFFSET WndProc
mov wc.cbClsExtra,NULL
mov wc.cbWndExtra,NULL
push hInstance
pop wc.hInstance
mov wc.hbrBackground,COLOR_WINDOW+1
mov wc.lpszMenuName,NULL
mov wc.lpszClassName,OFFSET ClassName
invoke LoadIcon,NULL,IDI_APPLICATION
mov wc.hIcon,eax
mov wc.hIconSm,0
invoke LoadCursor,NULL,IDC_ARROW
mov wc.hCursor,eax
invoke RegisterClassEx, addr wc
invoke CreateWindowEx,NULL,ADDR ClassName,ADDR AppName,\
WS_OVERLAPPEDWINDOW,CW_USEDEFAULT,\
CW_USEDEFAULT,CW_USEDEFAULT,CW_USEDEFAULT,NULL,NULL,\
hInst,NULL
mov hwnd,eax
invoke ShowWindow, hwnd,SW_SHOWNORMAL
invoke UpdateWindow, hwnd
.WHILE TRUE
invoke GetMessage, ADDR msg,NULL,0,0
.BREAK .IF (!eax)
invoke TranslateMessage, ADDR msg
invoke DispatchMessage, ADDR msg
.ENDW
mov eax,msg.wParam
ret
WinMain endp
WndProc proc hWnd:HWND, uMsg:UINT, wParam:WPARAM, lParam:LPARAM
LOCAL hdc:HDC
LOCAL ps:PAINTSTRUCT
LOCAL hfont:HFONT
mov eax,uMsg
.IF eax==WM_DESTROY
invoke PostQuitMessage,NULL
.ELSEIF eax==WM_PAINT
invoke BeginPaint,hWnd, ADDR ps
mov hdc,eax
invoke CreateFont,24,16,0,0,400,0,0,0,OEM_CHARSET,\
OUT_DEFAULT_PRECIS,CLIP_DEFAULT_PRECIS,\
DEFAULT_QUALITY,DEFAULT_PITCH or
FF_SCRIPT,\
ADDR FontName
invoke SelectObject, hdc, eax
mov hfont,eax
RGB 200,200,50
invoke SetTextColor,hdc,eax
RGB 0,0,255
invoke SetBkColor,hdc,eax
invoke TextOut,hdc,0,0,ADDR TestString,SIZEOF TestString
invoke SelectObject,hdc, hfont
invoke EndPaint,hWnd, ADDR ps
.ELSE
invoke DefWindowProc,hWnd,uMsg,wParam,lParam
ret
.ENDIF
xor eax,eax
ret
WndProc endp
end start
;===========================================================================EOF
invoke CreateFont,24,16,0,0,400,0,0,0,OEM_CHARSET,\
OUT_DEFAULT_PRECIS,CLIP_DEFAULT_PRECIS,\
DEFAULT_QUALITY,DEFAULT_PITCH or FF_SCRIPT,\
ADDR FontName
CreateFont creates a logical font that is the closest match to the given
parameters and the font data available. This function has more parameters
than any other function in Windows. It returns a handle to logical font to
be used by SelectObject function. We will examine its parameters in detail.
nHeight --> The desired height of the characters . 0 means use default size.
nWidth --> The desired width of the characters. Normally this value should be
0 which allows Windows to match the width to the height. However, in our
example, the default width makes the characters hard to read, so I use the
width of 16 instead.
nEscapement --> Specifies the orientation of the next character output
relative to the previous one in tenths of a degree. Normally, set to 0. Set
to 900 to have all the characters go upward from the first character, 1800
to write backwards, or 2700 to write each character from the top down.
nOrientation --> Specifies how much the character should be rotated when
output in tenths of a degree. Set to 900 to have all the characters lying
on their backs, 1800 for upside-down writing, etc.
nWeight --> Sets the line thickness of each character. Windows defines the
following sizes:
FW_DONTCARE equ 0
FW_THIN equ 100
FW_EXTRALIGHT equ 200
FW_ULTRALIGHT equ 200
FW_LIGHT equ 300
FW_NORMAL equ 400
FW_REGULAR equ 400
FW_MEDIUM equ 500
FW_SEMIBOLD equ 600
FW_DEMIBOLD equ 600
FW_BOLD equ 700
FW_EXTRABOLD equ 800
FW_ULTRABOLD equ 800
FW_HEAVY equ 900
FW_BLACK equ 900
cItalic --> 0 for normal, any other value for italic characters.
cUnderline --> 0 for normal, any other value for underlined characters.
cStrikeOut --> 0 for normal, any other value for characters with a line
through the center.
cCharSet --> The character set of the font. Normally should be OEM_CHARSET
which allows Windows to select font which is operating system-dependent.
cOutputPrecision --> Specifies how much the selected font must be closely
matched to the characteristics we want. Normally should be
OUT_DEFAULT_PRECIS which defines default font mapping behavior.
cClipPrecision --> Specifies the clipping precision. The clipping precision
defines how to clip characters that are partially outside the clipping
region. You should be able to get by with CLIP_DEFAULT_PRECIS which defines
the default clipping behavior.
cQuality -->Specifies the output quality. The output quality defines how
carefully GDI must attempt to match the logical-font attributes to those of
an actual physical font. There are three choices: DEFAULT_QUALITY,
PROOF_QUALITY and DRAFT_QUALITY.
cPitchAndFamily --> Specifies pitch and family of the font. You must combine
the pitch value and the family value with "or" operator.
lpFacename A pointer to a null-terminated string that specifies the
typeface of the font.
After we get the handle to the logical font, we must use it to select the
font into the device context by calling SelectObject. SelectObject puts the
new GDI objects such as pens, brushs, and fonts into the device context to
be used by GDI functions. It returns the handle to the replaced object
which we should save for future SelectObject call. After SelectObject call,
any text output function will use the font we just selected into the device
context.
RGB 200,200,50
invoke SetTextColor,hdc,eax
RGB 0,0,255
invoke SetBkColor,hdc,eax
Use RGB macro to create a 32-bit RGB value to be used by SetColorText and
SetBkColor.
Call TextOut function to draw the text on the client area. The text will be
in the font and color we specified previously. The syntax of TextOut is as
follows:
BOOL TextOut(
When we are through with the font, we should restore the old font back into
the device context. You should always restore the object that you replaced
in the device context.
____________________________________________________________________________
______ _____. ____ ```
._____/\______.________._________. ._\___ |__\_ /. \\
| | | _ | | (_ | __/ |CE ,
.=|_____/\______| |----)____| |______|______|======[ Win 32 ASM ]===.
'===============| :==================================================='
Keyboard Input
by Iczelion
Preliminiary:
------------
Since there's only one keyboard in each PC, all running Windows programs
must share it between them. Windows is responsible for sending the key
strokes to the window which has the input focus.
Although there may be several windows on the screen, only one of them has
the input focus. The window which has input focus is the only one which can
receive key strokes. You can differentiate the window which has input focus
from other windows by looking at the title bar which is highlighted.
Actually, there are two main types of keyboard message. You can view a
keyboard as a group of keys. For example, if you press the "a" key, Windows
sends a WM_KEYDOWN message to the window which has input focus, notifying
that a key is pressed. When you release the key, Windows sends a WM_KEYUP
message. In this case, you treat a key as a button. Another way to look at
the keyboard is that it's a character input device. When you press "a" key,
Windows sends a WM_CHAR message to the window which has input focus,
telling it that the user sends "a" character to it. In fact, Windows sends
WM_KEYDOWN, WM_CHAR, and WM_KEYUP messages to the window which has input
focus. The window procedure may decide to process all three messages or
only the messages it's interested in. Most of the time, you can ignore
WM_KEYDOWN and WM_KEYUP since TranslateMessage function call in the message
loop translate WM_KEYDOWN and WM_KEYUP messages to a WM_CHAR message. We
will focus on WM_CHAR in this tutorial.
Content:
-------
;=======================================================================KEY.ASM
include windows.inc
includelib user32.lib
includelib kernel32.lib
includelib gdi32.lib
.data
ClassName db "SimpleWinClass",0
AppName db "Our First Window",0
char WPARAM 20h ; the character the program receives from keyboard
.data?
hInstance HINSTANCE ?
CommandLine LPSTR ?
.code
start:
invoke GetModuleHandle, NULL
mov hInstance,eax
invoke GetCommandLine
invoke WinMain, hInstance,NULL,CommandLine, SW_SHOWDEFAULT
invoke ExitProcess,eax
WinMain proc
hInst:HINSTANCE,hPrevInst:HINSTANCE,CmdLine:LPSTR,CmdShow:SDWORD
LOCAL wc:WNDCLASSEX
LOCAL msg:MSG
LOCAL hwnd:HWND
mov wc.cbSize,SIZEOF WNDCLASSEX
mov wc.style, CS_HREDRAW or CS_VREDRAW
mov wc.lpfnWndProc, OFFSET WndProc
mov wc.cbClsExtra,NULL
mov wc.cbWndExtra,NULL
push hInstance
pop wc.hInstance
mov wc.hbrBackground,COLOR_WINDOW+1
mov wc.lpszMenuName,NULL
mov wc.lpszClassName,OFFSET ClassName
invoke LoadIcon,NULL,IDI_APPLICATION
mov wc.hIcon,eax
mov wc.hIconSm,0
invoke LoadCursor,NULL,IDC_ARROW
mov wc.hCursor,eax
invoke RegisterClassEx, addr wc
invoke CreateWindowEx,NULL,ADDR ClassName,ADDR AppName,\
WS_OVERLAPPEDWINDOW,CW_USEDEFAULT,\
CW_USEDEFAULT,CW_USEDEFAULT,CW_USEDEFAULT,NULL,NULL,\
hInst,NULL
mov hwnd,eax
invoke ShowWindow, hwnd,SW_SHOWNORMAL
invoke UpdateWindow, hwnd
.WHILE TRUE
invoke GetMessage, ADDR msg,NULL,0,0
.BREAK .IF (!eax)
invoke TranslateMessage, ADDR msg
invoke DispatchMessage, ADDR msg
.ENDW
mov eax,msg.wParam
ret
WinMain endp
mov eax,uMsg
.IF eax==WM_DESTROY
invoke PostQuitMessage,NULL
.ELSEIF eax==WM_CHAR
push wParam
pop char
invoke InvalidateRect, hWnd,NULL,TRUE
.ELSEIF eax==WM_PAINT
invoke BeginPaint,hWnd, ADDR ps
mov hdc,eax
invoke TextOut,hdc,0,0,ADDR char,1
invoke EndPaint,hWnd, ADDR ps
.ELSE
invoke DefWindowProc,hWnd,uMsg,wParam,lParam
ret
.ENDIF
xor eax,eax
ret
WndProc endp
end start
;===========================================================================EOF
This is the variable that stores the character received from the keyboard.
Since the character is sent in WPARAM of the window procedure, we define
the variable as type WPARAM for simplicity. The initial value is 20h or the
space since when our window refreshes its client area the first time, there
is no character input. So we want to display space instead.
.ELSEIF eax==WM_CHAR
push wParam
pop char
invoke InvalidateRect, hWnd,NULL,TRUE
This is added in the window procedure to handle the WM_CHAR message. It
just puts the character into the variable named "char" and then calls
InvalidateRect. InvalidateRect makes a specified rectangle in the client
area invalid which forces Windows to send WM_PAINT message to the window
procedure. Its syntax is as follows:
BOOL InvalidateRect(
HWND hWnd, // handle of window with changed update region
CONST RECT * lpRect, // address of rectangle coordinates
BOOL bErase // erase-background flag
);
Actually we can paint the client area during processing WM_CHAR message by
calling GetDC and ReleaseDC pair. There is no problem there. But the fun
begins when our window needs to repaint its client area. Since the codes
that paint the character are in WM_CHAR section, the window procedure will
not be able to repaint our character in the client area. So the bottom line
is: put all necessary data and codes that do painting in WM_PAINT. You can
send WM_PAINT message from anywhere in your code anytime you want to
repaint the client area.
____________________________________________________________________________
::::::::::.___ . ```
::::::::::| _/__. |__ ____ . __. ____ ____ __. \\
:::::: |____ | __/_ _\_ (.___| .___) |__\_ (._) /___) | ,
::::::::::/ / | \ | - | \ | - | - | \/| - |
.=:::::::::/______|_____|_____| (___|_____|______|____|_____|===============.
'=::::::::::==================| . ____ | (____====[ The C Standard lib ]==='
:::::::::: | |------| - |
:::::::::: | |______|______|CE
. :
C string functions: introduction, _strlen
by Xbios2
I. INTRODUCTION
---------------
Beware: this is going to be long...
Yet, string functions are really needed, since almost any program in assembly
uses text for I/O. (An alternative to this would be using animated paper-clips
to communicate with the user :)).
Furthermore, coding those functions in assembly allows for smaller and faster
functions. Actually many of the string functions in C _were_ written in
assembly (e.g. strlen, strcat, strcpy, etc). Those can be divided in two
categories:
Borland C++ 4.02 and KERNEL32.DLL only have traditional functions. Borland's
C++ Builder v1.0 (once given free as a demo) includes both types. MSVCRT.DLL
(version 5) contains 'modern' versions.
The three main aspects considered in these articles (and generally when compar-
ing different versions of the same function) are speed, size and common sense.
'Size' is, obviously, the size of the resulting code. Besides creating smaller
files, small size has two interesting 'side-effects'. It (usually) creates more
elegant code and faster code (it decreases k, but it usually increases l (for
an explanation of k and l see 'speed'). For very small functions like strlen it
has the added advantage of allowing the code to be inlined without wasting too
much space, thus decreasing k even more.
'Speed' indicates the number of cycles needed to execute the function. For
simple string functions the number of cycles needed can be expressed as
c=k+l*n
Point 1 means that for long strings speed is (almost) independent of the value
of k. Especially if l1=l2 both versions will run at (almost) the same speed.
Point 2 means that for small strings k strongly affects the value of c.
For those of you that are fed up with maths, here is a simple example that
demonstrates what I've been trying to say all this time :)
The problem is that none of the above versions can be classified as better than
the other. Think of the parser of a compiler. It receives as input lines from a
text file, which are strings longer than 10 characters, but also has to deal
with tokens, which are short strings (in an assembler, three-char tokens are
very common).
Keep in mind that, while l depends only on the method used to implement the
function, k also depends on the 'push arg/call/prepare stack/resore stack/
ret/get arg' times. So if n is low, overall speed can be increased by inlining
the code, thus subtracting from k the time needed to call the function.
Well, I think you've had enough. Let's see all this stuff in practice.
The win32 API also includes a strlen function, called lstrlenA. It is based on
'repne scasb' as well, but you are _strongly_ advised to avoid it. It runs at
c=56+4*n cycles.
Suppose we need in ecx the length of the string at esi. The following code:
push esi
call _strlen
pop ecx ; restore stack
mov ecx, eax
takes 9 bytes, only one less than the macro version. Plus, of course, the at
least 15 bytes of code in _strlen.
Another 'elegant' version, which is also small and much faster is the following:
; ------------ version 5 ------------
; Elegant, small and fast
; 16 bytes
; c=12+2*n
I believe that version 5 is the best version that could have elegance, speed
and small size together. It can also be converted to a macro and inlined to
drop to a speed of c=8+2*n (it will use one register more, but this register
would anyway be lost if a call to the function was made).
It also has what I believe is the smallest value of k. However, it doesn't have
the smallest value of l. To reduce the cycles needed, data can be read not byte
after byte but dword after dword. Here is a routine given by Agner Fog in his
document on Pentium optimization (which you are _strongly_ advised to read):
_strlen proc
mov eax, [esp+4] ; get pointer
mov edx, 7
add edx, eax ; pointer+7 used in the end
push ebx
mov ebx, [eax] ; read first 4 bytes
add eax, 4 ; increment pointer
l1: lea ecx, [ebx-01010101h] ; subtract 1 from each byte
xor ebx, -1 ; invert all bytes
and ecx, ebx ; and these two
mov ebx, [eax] ; read next 4 bytes
add eax, 4 ; increment pointer
and ecx, 80808080h ; test all sign bits
jz l1 ; no zero bytes, continue loop
test ecx, 00008080h ; test first two bytes
jnz short l2
shr ecx, 16 ; not in the first 2 bytes
add eax, 2
l2: shl cl, 1 ; use carry flag to avoid a branch
pop ebx
sbb eax, edx ; compute length
ret
_strlen endp
; -----------------------------------
The only problem with this routine is that it expects the string to be aligned
on a 4 byte boundary. If the string is misaligned, the speed drops to
c=24+1.75*n. In the extreme case that the string is misalinged AND it ends on a
page boundary, the function will cause an access violation error.
The fastest version (I have found) is the one in the Borland C++ builder
library:
Actually, the original version is 90 bytes long. I have only changed the
'unalgn:' block, to reduce k if the string is unaligned.
This function works well even on unaligned strings, as it first check the unali-
gned bytes, and the proceeds in the main loop with aligned data (for unaligned
strings it runs at c=31+0.75*n cycles). Since all dwords read are aligned,
unaligned strings that end on page boundaries don't cause problems.
This function is not always the fastest. If the string contains characters in
the range 128 to 255 (i.e. signed bytes) the speed drops. If all the characters
are signed (actually if at least one byte in every dword read), the speed
becomes c=1.25*n. Of course most of the time (especially for english text) this
is not the case, but if you have to process strings in another language that
has characters in the range 128 to 255, it is a bit slower.
Another fast version of strlen can be found in MSVCRT.DLL (the one I checked is
version 5.00.7303). It runs at c=20+1*n, and handles unaligned strings almost
like the Builder version. Misaligned strings give a value of k ranging from a
minimum of 25 to a maximum of 52.
What the MSVCRT function lacks completely is common sense and small size. In
fact it is 144 bytes long and it is divided in two different pieces of the dll's
code, causing most jumps to be in the long form.
The main loop MSVCRT uses is good, but the rest of the function isn't. Based on
that function, I came up with the following one:
_strlen proc
mov eax, [esp+4]
xor ecx, ecx
loop2: test al, 3
jz loop1
cmp byte ptr [eax], cl
jz short ret0
cmp byte ptr [eax+1], cl
jz short ret1
cmp byte ptr [eax+2], cl
jnz short adjust
inc eax
ret1: inc eax
ret0: sub eax, [esp+4]
ret
This one has the advantage of having k=17 for aligned strings and k=24 to 25
for misaligned ones.
The only question left to be answered is: 'Which version should we prefer?'.
If your program does not include string handling in it's time-critical parts,
I higly recommend either versions 5 or 4.5 (the inlined macro). As said before,
the size overhead of the inlined version is very small (if any), and it has
another advantage: it keeps the source code more readable, as it only involves
the needed registers (input and output) in one single line.
____________________________________________________________________________
::::::::::.___ . ```
::::::::::| _/__. |__ ____ . __. ____ ____ __. \\
:::::: |____ | __/_ _\_ (.___| .___) |__\_ (._) /___) | ,
::::::::::/ / | \ | - | \ | - | - | \/| - |
.=:::::::::/______|_____|_____| (___|_____|______|____|_____|===============.
'=::::::::::==================| . ____ | (____====[ The C Standard lib ]==='
:::::::::: | |------| - |
:::::::::: | |______|______|CE
. :
C string functions: _strcpy
by Xbios2
I. INTRODUCTION
---------------
C syntax: char *strcpy(char *dest, const char *src);
_strcpy copies string src to dest, stopping after the terminating null character
has been moved, and returns dest.
The 'traditional' way to do this is with the 'rep movs' instruction. BC 4.02
and kernel32 use it. The problem is that it is rather slow (BC _strlen takes
53+5.5*n cycles, lstrlenA takes 74+5.5*n cycles, and optimizing their code
leads to 46+5.5*n cycles wher n the number of chars, see part I of these
articles). This is because even though the 'rep movs' instruction is fast it
needs to know the number of bytes to copy in advance. So, _strcpy includes a
_strlen function before the actual copying, which is implemented through 'repne
scasb', a slow instruction.
strcpy proc
push edi
mov edi, [esp+8] ; dest
mov ecx, [esp+0Ch] ; src
test ecx, 3
jz short loop1
Actually the code above compiles to 130 bytes. The extra 16 bytes are added
because between the loop and the 'one:' label there is the strcat function. So
4 conditional jumps take the 6-byte form, not the 2-byte one.
This function takes 39+1.75*n. This means that the loop takes 7 cycles to
execute (since each time the loop runs, it copies 4 bytes). Here is the explan-
ation of the loop (U and V refer to the pipe the commands run in).
The problem here is that both jumps run in the U pipe so they will not pair.
Generally it's better to have an even number of instructions in each block of
code. Just by moving one instruction this code will run in 6 cycles (i.e.
39+1.5*n cycles):
As you 've noticed, _strcpy makes sure that the data read from src is aligned,
because reading aligned dwords is faster. If src is aligned, the test only takes
one cycle more, so it shouldn't trouble us. Yet, aligning src is not always a
good idea. Suppose that you have an unaligned string and want to copy it in a
buffer that is aligned. So what happens is that by aligning src we misalign
dest. The problem is that misaligned writes are more expensive in cycles than
misaligned reads. So _strcpy should either align dest or leave everything
untouched. (not aligning src introduces an extremely small possibility of an
access violation error, read section V below for details).
_strcpy proc
push ebp
mov ebp, esp
mov ecx, [ebp+0Ch] ; src
mov edx, [ebp+8] ; dest
mov eax, ecx
and eax, 3
jmp algn[eax*4]
; ------------------------------------
algn dd offset loop1
dd offset algn3
dd offset algn2
dd offset algn1
; ------------------------------------
This function runs at 66+1.75*n cycles. The aligning is done in an awful way.
If the aligning code is removed, we gain 39 cycles. By not using ebp, we save
4 more cycles. The loop takes 7 cycles, as shown below:
The first two instructions don't pair because 'or al,al' accesses a register
changed by the previous instruction. Anyway, there are 13 instructions, not 12
as in the MSVCRT function. So, one instruction has to be removed. This instruct-
ion is the unconditional jump (generally unconditional jumps can be avoided).
Notice that if we get through to 'jz short four', the 'mov [edx], eax' instruct-
ion will be executed anyway. So we rewrite the code as:
Notice that we use 'shr eax, 24' instead of 'test eax, 0FF000000h', because we
no longer need the value in eax, and the 'shr' form is two bytes shorter.
; c=25+1.5*n / 80 bytes
_strcpy proc
mov ecx, [esp+8] ; src
mov edx, [esp+4] ; dest
push edx ; save return value
test edx, 3 ; check if dest is aligned
jz short loop1
This function aligns dest instead of src, which, as discussed above, is faster.
It can run at one cycle less, by reading the return value directly from the stack
and not push/popping it, but it would take 8 bytes of code more.
The MSVCRT _strcat actually counts the chars in src and then jumps into the code
of _strcpy to perform the actual copying. Strangely, _strdup is implemented
through 'repne scas' and 'rep movs'. It even has two 'repne scasb' instructions,
one to calculate the length of the string to pass to the malloc function, and
one to calculate the length of the string to copy, even though these two values
are the same. So, even if coding in MS C++, using _strlen/_malloc/_strcpy is
faster than using _strdup.
V. IS IT FOOLPROOF?
-------------------
To be honest, no. But programming is hardly ever so... First of all, any version
of strcpy (of any function, generally) will fail if it tries to read or write
data in a page it doesn't have access. This is hardly the case, but it can
happen:
-If there is no NULL character between the address of src and the last byte in
the valid page.
-If src is longer than the distance between dest and the last valid byte.
The first case is extremely unlikely, because even if src was corrupt and had no
terminating NULL, one is very likely to be encountered somewhere. The second
case is also unlikely, and it means that the programmer didn't allocate enough
space for dest. Anyway, corrupt data or not enough allocated space even if they
don't cause an access violation, they cause problems. But the problem was
created by the programmer, not the function. Yet there are also two cases where
the strings ARE ok but an access violation occurs. These cases appear only on
'optimized' versions of strcpy, not on the 'rep movsb' version.
The first case would appear in a strcpy function that doesn't align src (it
either aligns dest or leaves both unchanged), if src is not aligned and it ends
on a page boundary. Then the last read operation would try to read one to three
bytes on a page it doesn't have access. This doesn't happen on the Builder and
MSVCRT functions and happen on the one I give. Yet, it is really unlikely to
happen, and aligning dest is faster.
The second case would appear if dest points to a character in src (including
the terminating NULL). What happens is that the NULL of src (and any other
found) are overwritten, so no NULL is found and we finally get an access
violation. This doesn't happen in the 'traditional' versions, because we only
copy strlen(src) bytes. But even in those versions the last character copied
wouldn't be a NULL, so dest wouldn't be a proper string.
____________________________________________________________________________
____ . : . ```
| |___ : | ---- _____ ______ _____ | \\
| | |___|_ |______\ |---/ ._____/\____) -- (_) / |____) | ,
| ' | / | \___ __/ | | \/| | - |CE
.==|________| (______|_______/ \==|_____/\__________|____| |______|===.
'=========| |===========/----|___\==================[ The Unix World ]==='
:
X-Windows in Assembly Language: Part II
by mammon_
OK, let's face it: you've seen the tedium of XLib, one *has* to use widgets in
order to get any programming done in XWindows. 'But this is assembly langauge',
the masochist might point out. 'Aren't widgets a little Visual-Basicy?'
Not in the slightest. A widget is simply a C++ class exported for use --much
like the windows API functions, only a little more object oriented...maybe a
good comparison would be MFC or VCL. Xt, or 'X toolkit Intrinsics', is the
interface that widget sets [such as Athena, Qt or GTK] use to interface with
XLib. The Xt include files are in /usr/X11R6/include/X11, its libraries are in
/usr/X11R6/lib, and its exported functions are all prefixed with "Xt".
For the following examples I will be using the Atehna widget set, which is
supplied with XFree86. The include files for Athena are in /usr/X11R6/include/Xaw
and the libraries are in /usr/X11R6/lib.
#include <X11/Intrinsic.h>
#include <X11/StringDefs.h>
#include <X11/Xaw/Command.h>
main(argc,argv)
int argc;
char **argv;
{
XtAppContext app_context;
Widget ShellWidge, ButtnWidge;
ShellWidge = XtVaAppInitialize( &app_context, "toplevel", NULL, 0, &argc, \
argv, NULL, NULL);
ButtnWidge = XtVaCreateManagedWidget("hellbutton", commandWidgetClass, \
ShellWidge, NULL);
XtAddCallback(ButtnWidge, XtNcallback, Quit, 0);
XtRealizeWidget(ShellWidge);
XtAppMainLoop(app_context);
}
// compile with cc -o xthell xthello.c -lXmu -lXaw -lXt -lX11 -L/usr/X11R6/lib
//=========================================================================-EOF
XtAppMainLoop takes the current application context [which was filled with the
XtVaAppInitialize call] and turns control over to the Xt message processing
loop. Note that the program does not have to return; in this example, the exit
call is placed in the callback function.
SECTION .data
AppContext DD 0
ShellWidge DD 0
ButtnWidge DD 0
ARGC times 128 DB 0
ClassName DB "toplevel",0
ButtnName DB "hellbutton",0
XtNcallback DB "callback",0 ;XtNcallback
SECTION .text
bail:
pop eax ; Xt_Pointer call_data
pop ebx ; Xt_Pointer client_data
pop ecx ; Xt_Pointer widget
push dword 0
call exit
;-------------------------- main
main:
mov eax, esp
push dword 0 ;Number of Args
push dword 0 ;Args
push dword 0 ;Fallback Resources
push dword 0 ;argv
push dword ARGC ;&argc
push dword 0 ;Number of Options
push dword 0 ;Options Array
push dword ClassName ;Class Name (String)
push dword AppContext ;Application Context (Ptr)
call XtVaAppInitialize
add esp, 36
mov [ShellWidge], eax
push dword 0
push eax ;Button parent (ShellWidge)
push dword [commandWidgetClass] ;Button widget type
push dword ButtnName ;Button class name
call XtVaCreateManagedWidget
add esp, 16
mov [ButtnWidge], eax
push dword 0 ; client_data
push dword bail ;CallBack function
push dword XtNcallback ; callback type
push eax ;CallBack widget (ButtonWidge)
call XtAddCallback
add esp, 16
push dword [ShellWidge] ;Widget Handle
call XtRealizeWidget
add esp, 4
push dword [AppContext]
call XtAppMainLoop
add esp, 4
ret
;==========================================================================-EOF
This can be compiled with the following commands:
nasm -f elf xthell.asm
gcc -o xthell xthell.o -lXaw -lXt -lX11 -L/usr/X11R6/lib
Most of the operation is the same as the C file; naturally you must push dword
0's instead of NULLs...and do not forget to push the arguments in reverse
order and to clean up the stack afterwards; this is C after all and not stdcall
is used in Windows.
You will have to study up on Athena to learn what the names of the various
widgets are ... I found it helpful to use
grep extern /usr/X11R6/include/Xaw/*
for a general overview. Note that the class names are strings in assembly; also
each of the various 'handles' [widgets, contexts, etc] is simply defined with a
DD 0 -- your generic 32-bit variable. The Callback type turned out to be a
string defined in the Xt header files; I simply recreated it above.
Another interesting gemis the need to call 'exit' rather than simply using a
'ret' as you would in console mode; the latter causes segmentation faults, most
likely due to the XtAppMainLoop call. In addition you *must* provide a pointer
to ARGC whether you check the command line or not; hence the 'ARGC: DB 128'.
In case you didn't notice, the Xt asm example is huge and clunky, with a lot of
not-so-obvious variable definitions. Having included a lengthy introduction to
NASM macros in this issue, I took the opportunity to create an xt.mac file
which will take some of the burden off of experimenting with small Xt apps. The
InitXt and RegisterCallback macros probably are not ready for prime-time just
yet, but they will do for testing purposes.
;=======================================================================-xt.mac
%macro CLASS 2
%1: DB %2,0
%endmacro
%macro WDGTPTR 1
%1: DD 0
%endmacro
%macro CONTEXT 1
%1: DD 0
%endmacro
%macro CHARSTR 2
%1: DB %2,0
%endmacro
%define WIDGET EXTERN
%define XLibAPI EXTERN
%define XtAPI EXTERN
%define PUBLIC GLOBAL
%define NULL dword 0
%define TERM_VARARGS dword 0
%macro InitXt 2
SECTION .data
CONTEXT AppContext
CLASS XtShell, "XtShell"
SECTION .text
EXTERN XtVaAppInitialize
push dword 0 ;Number of Args
push dword 0 ;Args
push dword 0 ;Fallback Resources
push dword 0 ;argv
push dword %2 ;&argc
push dword 0 ;Number of Options
push dword 0 ;Options Array
push dword XtShell ;Class Name (String)
push dword AppContext ;Application Context (Ptr)
call XtVaAppInitialize
add esp, 36
mov [%1], eax
%endmacro
%macro XtMsgLoop 0
EXTERN XtAppMainLoop
push dword [AppContext]
call XtAppMainLoop
add esp, 4
%endmacro
%macro RegisterCallback 1
SECTION .data
CBType: DB "callback",0
SECTION .code
push NULL ;
push dword %1 ;CallBack function
push dword CBType ;
push eax ;CallBack parent (ButtonWidge)
call XtAddCallback
add esp, 16
%endmacro
%macro CALLBACK 1
SECTION .data
Call_Data_%1: DD 0
Client_Data_%1: DD 0
Widget_%1: DD 0
GLOBAL %1
SECTION .text
%1:
pop eax
mov [Call_Data_%1], eax
pop ebx
mov [Client_Data_%1], ebx
pop ecx
mov [Widget_%1], ecx
%endmacro
%define ENDCALLBACK nop
%macro ENTRYPOINT 1
GLOBAL %1
%1:
%endmacro
;==========================================================================-EOF
Most of the macro file should be readily apparent if you are familiar with the
NASM macro facility. I did take the opportunity to clean up the callback
function, so that the parameters to the callback are saved in variables, but for
the most part it does the same as the equivalent code in the preceding asm
example.
;===================================================================-xthell.asm
BITS 32
%INCLUDE "xt.mac"
;========================================================XTRN=====
XtAPI XtVaCreateManagedWidget
XtAPI XtAddCallback
XtAPI XtRealizeWidget
WIDGET commandWidgetClass
EXTERN exit
;========================================================DATA=====
SECTION .data
;------------
WDGTPTR ptrShell
WDGTPTR ptrButton
CLASS XHELL, "XHell"
CLASS HellButton, "HellButton"
CallbackType DB "callback",0 ;XtNcallback
ARGC times 128 DB 0
;========================================================CODE=====
SECTION .text
;------------
CALLBACK bail
push dword 0
call exit
ENDCALLBACK
ENTRYPOINT main
InitXt ptrShell, ARGC
push TERM_VARARGS
push eax ;Button parent (ShellWidge)
push dword [commandWidgetClass] ;Button widget type
push dword HellButton ;Button class name
call XtVaCreateManagedWidget
add esp, 16
mov [ptrButton], eax
RegisterCallback bail
XtMsgLoop
ret
;==========================================================================-EOF
Much prettier and hey, only twice as long as the C version! ;)
Next issue I will dwell on Xt/Athena a little longer and come up with some more
practical methods of automating the coding process.
____________________________________________________________________________
____ ___ _____ _| |_ ____ . ```
.__\ /__ ______ _) /.\ _/__ ___ ______\_ (_. | \\
| \/ | | \/ | \ | | | - | |CE ,
.==|________|______|______|_______|_______|_______| |======================.
'=================================================| :=[ Virtual Machines ]='
An Intro to the Java Virtual Machine
by Cynical Pinnacle
For awhile C/C++ reigned supreme and nothing challenged it but then along
comes Java, creating a splash, and causing outright corporate warfare to claim
right of ownership. Strangely enough the result of this war has not been dead
bodies but buckets and buckets of API's all given away for free. Just stop by
and take a look at Java's Official Website (http://java.sun.com) and what do
you find a good development kit with compiler, symbolic debugger, disassembler,
complete toolkit for creating GUI's, built in support for compression, encrypt-
ion, http, ftp, SMTP, POP3, IHMP, and more. Wow! But how can we take advantage
of all this?
First we have to step back a look at what Java really is. Because one of
the main goals of Java is platform independence (both from the chip and the OS).
The JVM, which supports Java, has to be both a chip and an OS. If any of us
(well lets say us programmers) were to design a chip and a OS in one, we would
fill it with features like built in security, automatic dependency resolution
and linking, network support, video and audio acceleration, along with more
common things such as built in data types (ints, floats, arrays, and objects ),
support for local variables, exception handling, support for debugging, and on
and on. This is really what Java is because it does all the things I mentioned
above and more. This is what Sun has tried to do - design an "Ideal" environ-
ment for writing and executing code, or write once run anywhere. But this
wealth of features comes at the price of speed of execution and further distance
further from the native code of your machine (unless you are running on a real
Java machine). And if you are like me the latter hurts as much as the former.
Still there are a alot of really appealing things about Java. And so the
challenge is to use these appealing features on our own terms. We can do this
by programming at a lower level to at least touch the native language of the
Java Virtual Machine (Java Assembly language!!)
The JVM:
Memory Manager: This is the unit responsible for the famous garbage collect-
ion and heap management. I say clean up your own garbage.
Threads: Java doesn't have to worry so much about memory because it is all
allocated on the stack. Each thread gets its own stack frame (chunk of stack
for its personal use). Switching threads in a stack based machine is easy. You
just make the threads stack top the machine's stack top and go.
Class Loader: This is just like the loaders in NT and 95. It brings up
class files from the disk initializes them (headers memory etc) and passes
execution to the classes entry point.
Security Manager: Want to find out whether or not you can do something.
Ask him.
Execution Engine: This is where the JVM opcodes are translated into native
opcodes. This is the part of the JVM which a low level Java programmer will
interact with most. The Execution engine has a much simpler structure than a
Pentium. At its heart is a stack where instuctions are executed. Note that
the JVM has no registers, which is more a trait of Virtual machines than Java.
Basically opcodes and operands are popped off of the stack and executed by the
VM bases upon a mapping between Java opcodes and Native opcodes. In addition,
there is built in support for local variables (more later). And as mentioned
before the stack is symbollically divided up by the JVM. Each method (read
method to mean function) gets it own stack frame which is allocated when the
method starts and deallocated when the method exits (sound familiar eg x86 push
bp -- mov bp, sp .. leave instructions). The Execution engine understands the
int, long, float, double, byte, char, short, reference (eg pointer) and the
instructions it understands are strongly typed (for example: there is an
instruction called iload which loads an integer from a local variable onto the
stack (like mov eax, myLocalVariable) but there is also a dload, lload, and
fload for doubles, longs, and floats).
Now with a little background it is time to learn or burn but first we need
some tools. First you will need to download the JDK from http://java.sun.com
(I recommend 1.1.x and the current I think is 7) you will also need a Java
assembler which is called Jasmin also free at
http://www.cat.nyu.edu/meyer/jasmin/
and you will need a good editor (I heartily recommend Visual SlickEdit 3.0 or
4.0 at http://www.slickedit.com ). You can get docs about the JVM from
http://www.javasoft.com/
And the best book I have found on the JVM (and Jasmin) is "Java Virtual Machine"
by Jon Meyer and Troy Denning. There are several useful tools with the JDK
comes a program called javap which is a Java .class file disassembler!
With the -c switch it will produce JASM code from any .class file. Note that
reversing Java programs is not nearly as hard as x86. Try it. Take some Java
.class file you have laying around ( Keep it small so you don't get confused )
and disassemble it then you will see what I mean. I can see a very difficult
future for Java Shareware programmers. There are also many other 3rd party
tools out there for mainpulating .class files.
With all of those tools installed we are ready to write the mandatory
"Hello from JASM!" program. First comes the Java Assembler code followed by
its Java equivilant.
;***************************************************************
;Export the class name so Java can find it
class public Hello
; General facts
; .method - means this is a function
;
; public - means you are it is visible externally (low
; level Object Orientedness)
;
; Java always uses explicit paths in Unix notation hence you
; end up with a lot of notation like java/lan/Object.
; java/lang/Object is the path to the superclass. In a lot
; of the functions there is a V, this means it returns void
; the syntax for specifying arguments is strange. I
; recommend reading Jonathan Meyer's Jasmin documentation
; http://www.cat.nyu.edu/meyer/jasmin/
aload_0
; Get out
return
end method
LoopTop:
; These next two lines put the parameters for the println
; function onto the stack in the right order. Java uses
; the pascal calling convention (push left to right and
; the callee cleans up). First a reference to the stream
; object doing the work is pushed onto the stack. Next a
; reference to the string to be printed is pushed.
; Go home
return
;Declare the end of the function
end method
;
; Java equivalent:
;
; public class Hello
; {
; public static void main ( String args[] )
; {
; int i;
; for( i = 0;i < 5; i++ )
; {
; System.out.println( "Hello from JASM!" );
; }
;
; }
;
; }
;***************************************************************
To run this JASM sample cut out the stuff between the *'s and save it to a
file called Hello.j. Next type jasmin Hello.j. This should generate a
Hello.class file. Now type java Hello and you should see the string "Hello
from JASM!" printed out 5 times. This is enough to get you started poking
around in the JVM and looking a little closer at the .class files you find
lying around ;) . More to come?
____________________________________________________________________________
____ . __ ```
._| __/___ : | ___ _____ __) |_ _____ \\
| |_____ |____|__ |_______ ______ ______ _) _ |_\_ __)___| __/_____ ,
| | ( | | __ | __ | ----' | | |_______ |CE
.=|_________| /_______|_______| _____| _____|________|____________________|=.
'===========| |===========|___|==|___|===================[ snippets ]==='
NumFactors
by Troy Benoist
______________________________________________. __________________________
```
| . ```
._____ ___ ____ ___ ___ ____ : | ______ \\
| __/__| |______\_ (_) | (____\_ ( ______ |________ _) __ |___. ,
| | | - | | | - | _ / | __ | ----' |CE
.==|_______|--)___|______|___|___|_______|--(_______|_______ |___________|===.
'=============================================[ issue | _____| challange ]==='
|___\
Convert ASCII hex to binary in 6 bytes
by mammon_ [and help]
The Challenge
-------------
Write a routine for converting ASCII hex to binary in 6 bytes.
The Solution
------------
Well, actually, I cheated: I found the following text on the internet a few
months ago and decided to see if I could beat it:
===================================================================-Asc2hex.txt
An efficient algorithm for converting ASCII hex to binary
Ken Sykes (72177,141)
David Ogle (75676,2612)
add al,90h
daa
adc al,40h
daa
This instruction sequence is, as far as we know, the shortest (16 cycles)
self-contained routine to convert a binary number to hex. Inspired by
this code and the fact that a similar routine to convert ASCII hex to
binary would be useful, we came up with two algorithms that convert an
ASCII hex digit to binary in five assembly instructions or less.
The first algorithm takes advantage of an undocumented feature of the
8088. The AAM instruction (Ascii Adjust for Multiply) divides AL by 10,
placing the remainder in AL and the quotient in AH. The opcode for
AAM is: 0D4h,0Ah. It's no coincidence that the second byte is 10 - the
8088 uses the second byte of the opcode as the divisor! The same rule
applies to the AAD (Ascii Adjust for Division) instruction. With this in
mind, a conversion routine goes as follows (assuming the ASCII digit is in
AL and in the range '0..9,A..F'):
sub al,'0'
aam 16
aad 9
The only problem is the Microsoft Assembler does not accept this form.
By placing the opcodes in data statements, however, the following code
will assemble properly:
sub al,'0'
db 0D4h,10h
db 0D5h,09h
add al,40h
cbw
and ah,09h
add al,ah
and al,0Fh
These algorithms will hopefully trim down the execution times of your
programs, and we welcome any suggestions or improvements on our code.
Happy Hacking!
===========================================================================-EOF
Sadly, I was unable to come up with a smaller version, or a faster one. Tinara
posted a similar solution to the APJ Message Board:
SUB AL, 30h
AAM 10h ; db D4h, 10h
AAD 09h ; db D5h, 09h
...so he gets kudos for uncovering by work what I managed by stealth. As for
next issue, I haven't had time to prepare a challenge, but I'm sure one will
crop up in the next month or so.