Albert's home page

ciasdis

Forth assemblers for ciforth

This page is about how the Post-It Fix-Up principle works out in practical program code in Forth. For the impatient: jump to the downloads.

Actual assemblers

Applying the Post-It Fix-Up principle to an 8086 assembler led to the discovery of problems that had to be solved. It turns out that some types of fixups are better considered relative to the end of the instruction than relative to its start. Otherwise there would have to be different fixups for e.g. the byte/cell indication (B| X|), dependent on the length of the opcode. This is still visible in the fig-forth version of the opcodes, which has B| W| besides B1| and W1| . So a new class of fixups, the "fixups from behind" or reverse fixups, was added. It turned out that no other kinds of fixups are needed for the Intel processors, up to the Pentium. Other processors require fixups with built-in data. These so-called data fixups are needed for the 6809 and the DEC Alpha.
A program was added that generates a PostScript file with the first-byte opcodes for the 8080 as well as the 8086 and the 80386, a so-called quick reference card. Comparing that to Intel's documentation led to the discovery of one more bug. I had to redesign the opcodes, so other people could have trouble using this beast without such a reference card and the `SHOW: MOV|SG,' that lists all forms allowed for the move segment instruction.
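
The reason for reverse fixups given above can be illustrated with a small sketch in standard Forth. This is not the ciasdis source: all names (ISS, OR!, POST1, POST2, FIX-START, FIX-END, DEMO-MOV, DEMO-MOVZX, WIDE|) are hypothetical, and it assumes a simplified model in which the size bit sits in the last opcode byte and no operand data has been comma'd in yet.

```forth
\ A sketch only; every name here is hypothetical, not taken from ciasdis.
VARIABLE ISS   \ address where the current instruction starts

\ OR a bit pattern into the byte at addr.
: OR! ( bits addr -- ) DUP >R C@ OR R> C! ;

\ Post-it: lay down a one-byte opcode, leaving zero bits as holes.
: POST1 ( opcode -- ) HERE ISS ! C, ;
\ Post-it for a two-byte opcode, first byte first.
: POST2 ( byte1 byte2 -- ) HERE ISS ! SWAP C, C, ;

\ Ordinary fixup: the bits go into the first byte of the instruction.
: FIX-START ( bits -- ) ISS @ OR! ;
\ Reverse fixup: the bits go into the last byte posted so far.
\ Assumes no operand data has been added after the opcode yet.
: FIX-END ( bits -- ) HERE 1- OR! ;

HEX
: DEMO-MOV,   ( -- ) 88 POST1 ;      \ MOV r/m,r with the size bit clear
: DEMO-MOVZX, ( -- ) 0F B6 POST2 ;   \ MOVZX with the size bit clear
: WIDE|       ( -- ) 1 FIX-END ;     \ set the size bit, counted from the end
DECIMAL
```

With these definitions `DEMO-MOV, WIDE|` lays down 89 and `DEMO-MOVZX, WIDE|` lays down 0F B7 (hex), so one fixup serves both opcode lengths. A fixup counted from the start, like FIX-START, would have hit the 0F prefix in the second case, which is why a start-relative scheme needs a separate fixup per opcode length.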

Because I do not want to give a bad impression -- I use a lot of comments -- I show you the WOCs (Words Of Code):

asgen.frt   1477            (generic part)
as80.frt     389            (8080 assembler)
asi86.frt    802            (8086)
asi386.frt  1362            (80386)
ps.frt       266            (Generates postscript.)
ass.frt      518            (Excerpt for 8086 without error detection)
cassady   ca 400            (The ubiquitous classic 8080 assembler)
As per
cat $1 | sed -e 's/\\ .*//' | sed -e 's/( [^)]*)//g' | wc -w
returning the 'Words Of Code' from file $1, in the sense of Forth. (Hey there, Windows people, how do you do that?) This comparison is based on the version running on figforth 2.146.

These are the figures as of early 2005. Since the early version the assembler has been enhanced with labels, and made to work with reverse engineering. Yet the numbers are within about 10% of what they were.

Compare my 8086 assembler with Cassady's 8080 assembler. My assembler is 5 screens, Cassady's is about 4. (I remember the assembler for the 8086 in Dr. Dobb's. I was awed, and it was a terrible lot of code.)
Even the Pentium assembler, in its non-safe form, takes only about a dozen code screens (see below).

Compared with a good non-bloated assembler written in C, the nasm assembler, the code for the Intel 386 is far more compact: 900 versus 30,000 lines, a factor somewhere between 30 and 40 for the commented source code. (I use this nasm assembler myself.) It would be interesting to compare with the well-known MASM.EXE, but of course it is not Open Source, and its size is unknown. Note: this is a very crude comparison. nasm contains a lot more directives and a macro facility; mine contains a disassembler and very fine diagnostics. These are supposed to cancel out.

This type of assembler certainly lays a burden on the programmer. He has to know exactly what he is doing. But assembly programming is like dancing on a rope. MASM and Intel just want to hide from you where the rope is.

If I now say

 MOV, X| T| AH'| D0| [BX+IP] 0 IB,
I want the message
 AH'| MSG #32 FIXUP'S INCONSISTENT
instead of having SP used where I wanted AH. And after that is fixed:
 IB, MSG #28 UNEXPECTED FIXUP/COMMAER

You can peek at the source of the generic assembler. You will also find the quick reference cards there, ready made in PostScript. In the beginning they were almost indispensable because of how much the mnemonics deviated from Intel's. Later I managed to find a reasonable correspondence principle, and you will find in the ciforth documentation (pdf, PostScript, info) a table to guide you. The documentation has recently been brought up to date. To complement the documentation you have a well-commented source available. It is ISO-Forth with not too many "environmental dependencies" (the ISO standard committee's euphemism for portability problems). You need a facility to find the name of a word from its execution token and a facility to manually walk down a vocabulary chain. A similar assembler is also available for my generic fig-forth for the i86 family.

The difficulty in using the assembler is the reason that I made sure that in the full version of these assemblers you have absolutely flawless error detection and the word SHOW:. E.g. SHOW: PUSH|X," shows all the addressing modes that go with the PUSH|X," instruction.
If it assembles, it works.
Also, the instructions are difficult to understand, but they are very hard to misunderstand, unlike those of traditional assemblers.
For production there is a compatible version of the i86 assembler in the blocks. This is intended to quickly assemble fully debugged code. Because of the lack of error detection it is extremely compact. (I don't use extreme words lightly.) You can load either an Intel 8086 assembler, an Intel 80386 assembler, or an Intel Pentium assembler. The full Pentium assembler in just 17 screens, among them 4 load screens, can be downloaded here. You will have to adapt it a little bit, replacing REQUIRE ASSEMBLER-xx with xxx LOAD where xxx is the number of the screen where you put the LOAD-screen with title ASSEMBLER-xx. You may also have to eliminate the line feeds from the screens, replacing them with spaces. The word TOGGLE is easily implemented if you don't have it, and ALIAS can be replaced by a colon definition if need be. These screens have correctly assembled all the tests for the full assembler, i.e. the combination of all instructions with all addressing modes (with an excerpt of the SIB possibilities). But on incorrect input they will just assemble garbage.
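
If your Forth lacks TOGGLE, a definition along the following lines may do. This is a sketch assuming the classic fig-Forth behaviour (flip the bits selected by a mask in the byte at an address); check it against how the screens actually use the word, since ciforth's own TOGGLE may differ in operand width. The names OLD-WORD and NEW-WORD are purely illustrative.

```forth
\ Assumed fig-Forth style semantics: XOR the byte at addr with mask.
: TOGGLE ( addr mask -- ) OVER C@ XOR SWAP C! ;

\ Replacing an alias by a colon definition: the new word simply
\ calls the old one. OLD-WORD stands in for any existing word.
: OLD-WORD ( n -- n+1 ) 1+ ;
: NEW-WORD ( n -- n+1 ) OLD-WORD ;
```

The colon-definition substitute costs one extra call and only behaves like the original for words that are not immediate; an immediate word would need a different treatment.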


For applications see the reverse engineering page

DOWNLOAD

You can download this older version with all assemblers and all test sets; that will suit you fine if you only want to assemble. However, the 6809 assembler is not included in this package yet.
The newer versions have in fact been enhanced into a reverse engineering system. You can download this system, which includes among other things a Pentium assembler, here; it contains a lot of extras.

Go to the home page of Albert van der Horst