Applying the Post-It Fix-Up principle
to an 8086 assembler led to the
discovery of problems that had to be solved.
It turns out that some types of fixups are better considered
relative to the end of the instruction,
not relative to its start.
Otherwise there would be different fixups for e.g. the byte/cell
indication (B| X|),
dependent on the length of the opcode.
This duplication is still there in the fig-Forth version
of the opcodes, which has B| W| besides B1| and W1| .
So a new class of fixup, the "fixups from behind" or reverse fixups,
was added.
It turned out that no other kinds of fixup are needed for the
Intel processors, up to the Pentium.
Other processors require fixups with built-in data.
These so-called data fixups are needed for the 6809 and the DEC Alpha.
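
The reason for fixups from behind can be illustrated with a toy model. The sketch below is in Python rather than Forth and is not taken from the assembler itself; the helper names (fixup_from_start, fixup_from_end, make_wide) are invented for illustration. It relies on two real Intel encodings: MOV r/m,r (88 /r for bytes, 89 /r for words) and the 80386 MOVZX (0F B6 /r and 0F B7 /r). In both, the width bit sits in the byte just before the ModRM byte, so one mask counted from the end serves opcodes of either length, whereas counting from the start would need a different byte offset for each.

```python
# Toy model of a fixup: OR a bit mask into an instruction template.
# All names here are hypothetical; this is not the assembler's own code.

WIDTH_BIT = 0x01  # bit 0 of the opcode byte selects byte (0) or word (1)

# Templates: opcode byte(s) followed by an empty ModRM byte to be fixed up later.
MOV_TEMPLATE = bytes([0x88, 0x00])          # MOV r/m8, r8 (one-byte opcode)
MOVZX_TEMPLATE = bytes([0x0F, 0xB6, 0x00])  # MOVZX r, r/m8 (two-byte opcode)


def fixup_from_start(template, offset, mask):
    """OR mask into the byte at `offset`, counted from the first byte."""
    code = bytearray(template)
    code[offset] |= mask
    return bytes(code)


def fixup_from_end(template, distance, mask):
    """OR mask into the byte that lies `distance` bytes before the end."""
    code = bytearray(template)
    code[len(code) - distance] |= mask
    return bytes(code)


def make_wide(template):
    """Select the word form; the same call serves both opcode lengths."""
    return fixup_from_end(template, 2, WIDTH_BIT)


print(make_wide(MOV_TEMPLATE).hex())    # 8900
print(make_wide(MOVZX_TEMPLATE).hex())  # 0fb700
```

Counted from the start, the same effect needs offset 0 for MOV but offset 1 for MOVZX, which is the kind of duplication that reverse fixups avoid.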
A program was added that generates a PostScript file
with the first-byte opcodes for the 8080 as well as
the 8086 and the 80386,
a so-called quick reference card.
Comparing that to Intel's documentation
led to the discovery of one more bug.
I had to redesign the opcodes,
so other people could have trouble using this
beast without such a reference card and the `SHOW: MOV|SG,' that lists
all forms allowed for the move segment instruction.
Because I do not want to give a bad impression (I use a lot of comments), I show you the WOCs (Words Of Code):

    asgen.frt     1477   (generic part)
    as80.frt       389   (8080 assembler)
    asi86.frt      802   (8086)
    asi386.frt    1362   (80386)
    ps.frt         266   (Generates postscript.)
    ass.frt        518   (Excerpt for 8086 without error detection)
    cassady     ca 400   (The ubiquitous classic 8080 assembler)

As per

    cat $1 | sed -e 's/\\ .*//' | sed -e 's/( [^)]*)//g' | wc -w

returning the 'Words Of Code' from file $1, in the sense of Forth. (Hey there, Windows people, how do you do that?) This comparison is based on the version running on figforth 2.146.
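
For readers without sed and wc at hand, the counting pipeline above can be approximated as follows. This is a sketch in Python, not part of the original tool set; the function name words_of_code is made up here. It mirrors the two sed expressions line by line (first dropping everything from a backslash followed by a space, then dropping parenthesized comments that open with "( " and close on the same line) and then counts whitespace-separated words as wc -w does.

```python
import re

# Same patterns as the two sed expressions, in Python regex syntax.
BACKSLASH_COMMENT = re.compile(r"\\ .*")     # '\ ' to end of line
PAREN_COMMENT = re.compile(r"\( [^)]*\)")    # '( ... )' within one line


def words_of_code(source):
    """Count whitespace-separated words after stripping Forth comments."""
    total = 0
    for line in source.splitlines():
        line = BACKSLASH_COMMENT.sub("", line)
        line = PAREN_COMMENT.sub("", line)
        total += len(line.split())
    return total


sample = ": SQUARE ( n -- n*n ) DUP * ; \\ square a number\n5 SQUARE .\n"
print(words_of_code(sample))  # 8
```

Like the sed version, it works one line at a time, so a parenthesized comment that spans several lines is not stripped.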
These are the figures as of early 2005. Since the early version the assembler has been enhanced with labels and made to work with reverse engineering. Yet the numbers are within about 10% of what they were.
Compare my 8086 assembler with
Cassady's 8080 assembler.
My assembler is 5 screens, Cassady's is about 4.
(I remember the assembler for the 8086 in Dr. Dobb's.
I was awed, and it was a terrible lot of code.)
Even the Pentium assembler, in its non-safe form,
takes only about a dozen code screens (see below).
Compared with a good non-bloated assembler written in C, the nasm assembler, the code for the Intel 386 is more compact: 900 versus 30,000 lines, a factor of 30 to 40 for the commented source code. (I use this nasm assembler myself.) It would be interesting to compare with the well-known MASM.EXE, but of course it is not Open Source, and its size is unknown. Note: this is a very crude comparison. nasm contains a lot more directives and a macro facility; mine contains a disassembler and very fine diagnostics. These are supposed to cancel.
This type of assembler certainly places a burden on the programmer. He has to know exactly what he is doing. But assembly programming is like dancing on a rope. MASM and Intel just want to hide from you where the rope is.
If I now say

    MOV, X| T| AH'| D0| [BX+IP] 0 IB,

I want the message

    AH'| MSG #32 FIXUP'S INCONSISTENT

instead of SP being used where I wanted AH. And after that is fixed:

    IB, MSG #28 UNEXPECTED FIXUP/COMMAER

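
The first of these messages can be pictured as a width check. The sketch below is a guess at the idea only, written in Python; how the assembler really detects the clash is not described here, and the table and function name are invented. It models just the fixups from the example: X| asks for cell width, B| and AH'| for byte width, and the others have no opinion.

```python
# Hypothetical width bookkeeping; not the assembler's actual mechanism.
BYTE, CELL = "byte", "cell"

# Width each fixup insists on (None = no opinion). Only a few are modelled.
FIXUP_WIDTH = {
    "B|": BYTE,
    "X|": CELL,
    "AH'|": BYTE,  # AH is a byte register
    "T|": None,
    "D0|": None,
}


def first_inconsistent(fixups):
    """Return the first fixup whose width clashes with an earlier one."""
    seen = None
    for name in fixups:
        width = FIXUP_WIDTH[name]
        if width is None:
            continue
        if seen is not None and width != seen:
            return name
        seen = width
    return None


print(first_inconsistent(["X|", "T|", "AH'|", "D0|"]))  # AH'|
print(first_inconsistent(["B|", "T|", "AH'|", "D0|"]))  # None
```

In this model the complaint lands on AH'| because that is the fixup that contradicts the earlier X|, which matches where the message is reported in the example.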
The difficulty in using the assembler is the reason that I
made sure that in the full version of these assemblers
you have absolutely flawless error detection and the word
SHOW:.
And e.g. `SHOW: PUSH|X,' shows all the addressing modes
that go with the `PUSH|X,' instruction.
If it assembles, it works.
The instructions may be difficult to understand,
but unlike those of traditional assemblers
they are very hard to misunderstand.
For production there is a compatible version of the i86 assembler
in the blocks. This is intended to quickly assemble fully debugged
code. Because of the lack of error detection it is extremely compact.
(I don't use extreme words lightly.)
You can load either an Intel 8086 assembler, an Intel 80386 assembler,
or an Intel Pentium assembler.
The full Pentium assembler in just 17 screens,
among them 4 load screens, can be downloaded here.
You will have to adapt a little bit, replacing REQUIRE ASSEMBLER-xx
with xxx LOAD where xxx is the number of the screen
where you put the LOAD-screen with title ASSEMBLER-xx.
Also you may have to eliminate the line feeds from the screens,
replacing them by spaces.
The word TOGGLE is easily implemented,
if you don't have it.
And ALIAS can be replaced by a colon definition if need be.
These screens have correctly assembled all the tests for the full
assembler, i.e. the combination of all instructions with all addressing
modes (with an excerpt of the SIB possibilities).
But on incorrect input they will just assemble garbage.