FreeForth generates compact and fast i386 native code, as shown in the following example of an iterative (i.e. non-recursive) implementation of the factorial function:
\ cSP = call Stack Pointer (normal = esp, alternate = eax) \ dSP = data Stack Pointer (normal = eax, alternate = esp) \ TOS = Top Of dataStack (normal = ebx, alternate = edx) \ NOS = Next Of dataStack (normal = edx, alternate = ebx) \ esp eax ebx edx : factorial \ n -- n! ; cSP dSP TOS NOS initial=normal 1 \ 0: 94 xchg eax,esp; dSP cSP \ 1: 52 push edx; NOS TOS under swap \ 2: 6A 01 push 1; (push 1; pop edx; takes 3 bytes) \ n 1 ; 4: 5A pop edx; (mov edx,1; would take 5 bytes) BEGIN \ i f ; 5: 90 90 90 nop; nop; nop; (loop aligned on multiple of 4) over* \ i f*i ; 8: 0F AF D3 imul edx,ebx; swap \ f*i i ; TOS NOS renaming 1- \ f*i i-1 ; B: 4B dec ebx; (modifies i386 flags) swap \ i-1 f*i ; NOS TOS renaming 0= UNTIL \ 0 n! ; C: 75 FA jnz 8; (no drop) nip \ n! ; E: 5B pop ebx; ; \ F: 87 DA xchg ebx,edx; TOS NOS final=normal \ 11: 94 xchg eax,esp; cSP dSP final=normal \ 12: C3 retTo push the literal 1 onto the dataStack, the initial single byte instruction [0:94 xchg eax,esp] first allows to use short push/pop instructions to access the dataStack within the definition body, then [1:52 push edx] saves NOS into memory (as does under), then we exchange the roles of ebx and edx (as does swap), then we can load the constant 1 into TOS=edx.
The BEGIN generates 3 nop
to align the loop start address on a multiple of 4 bytes:
this will reduce the number of clock cycles for the
loop conditional jump [C:75,FA jnz 8].
The over*
macro generates the push-pop-less single instruction
[8:OF,AF,D3 imul edx,ebx].
The two swap
generate no code, they only exchange the roles of
ebx and edx at compile time,
then the sequence swap 1- swap
compiles the single instruction
[B:4B dec ebx].
FreeForth conditional jumps don't modify the dataStack,
they only test the i386 flags modified by the last arithmetic or logic
instruction;
then the 0= UNTIL
generates the single instruction
[C:75,FA jnz 8].
The following nip
generates a pop NOS, i.e. here
[E:5B pop ebx].
The final ;
must restore the normal registers before compiling
the final [12:C3 ret]:
the [F:87,DA xchg ebx,edx]
restores the normal use of ebx=TOS and edx=NOS,
and the [11:94 xchg eax,esp]
restores the normal use of esp=cSP and eax=dSP.
This makes a total of 19 bytes, of which 6 for 3 assembly instructions in the loop. For a compiler, as simple as FreeForth is, this is not too bad.
If you need the best performance from your x86, and can write a smaller and faster definition in direct assembler, use fasm to obtain the binary code, then embed it in FreeForth inline strings (note ^ subtracts 64 from the next ASCII, and ~ adds 128 to the previous ASCII):
: factorial \ n -- n! ,"^I~Y~" \ 0: 89 D9 mov ecx,ebx ,";~^A^@^@^@" \ 2: BB 01 00 00 00 mov ebx,1 ,"^P~" \ 7: 90 nop ; align loop on multiple of 4 ,"^O/~Y~" \ 8: 0F AF D9 imul ebx,ecx ,"b~{~" \ B: E2 FB loop 8 ; \ D: C3 retThis takes 14 bytes, of which 5 for 2 instructions in the loop: that's indeed better.
If you want the very best performance from your x86, and accept to trade space for speed, to save the call and ret instructions execution time, you can alternatively define factorial as a macro (i.e. executed at compile time and generating inline code) using inlining literals, but remember the i386 is little-endian, i.e. the least significant byte of an inlining literal will be at the smallest address:
: factorial` \ n -- n! ; inlines 12 bytes \ 89D9(mov ecx,ebx)BB.01000000(mov ebx,1)0FAFD9(imul ebx,ecx)E2FB(loop -5) $BBD989, s08 s1 1, ,4 align` $D9AF0F, ,1 s08 $FBE2, ,2 ;The inlining literal $BBD989, compiles code which, when factorial` is executed (as factorial` is a macro, this will usually be at compile time), will store at the address in ebp (the compilation pointer) the 4 bytes [+0:89,D9,BB,00];
Then, in any of the 3 cases, you can use FreeForth interactivity to test your code: isn't it cool?
5 factorial . ; \ this is your command (an "anonymous" definition) 120 0; \ this is the result of "." followed by FreeForth prompt 6 factorial . ; 720 0;