You then get instructions of this kind:
ES: MOV, X| T| DI'| [MEM +8* AX] FFFFF800 X,
Ugly? Yes, this reflects the ugliness of the instruction set of the Pentium.
You can hide the ugliness,
but then you defeat the purpose of an assembler: absolute control.
The following is copied from an early version of the asgen.frt source.
( Most instruction set follow this basic idea that it contains of three )
( distinct parts: )
( 1. the opcode that identifies the operation )
( 2. modifiers such as the register working on )
( 3. data, as a bit field in the instruction. )
( 4. data, including addresses or offsets. )
( This assembler goes through three stages for each instruction: )
( 1. postit: assemblers the opcode with holes for the modifiers. )
( This has a fixed length. Also posts requirements for commaers. )
( 2. fixup: fill up the holes, either from the beginning or the )
( end of the post. These can also post required commaers )
( 3. fixup's with data. It has user supplied data in addition to )
( opcode bits. Both together fill up bits left by a postit. )
( 4. The commaers. Any user supplied data in addition to )
( opcode, that can be added as separate bytes. Each has a )
( separate command, where checks are built in. )
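The stages described in these comments can be sketched in a few lines of Python. This is only an illustration under assumptions, not the asgen.frt code: the class Assembler, its method names, the commaer name "IB," and the toy one-byte instruction with opcode 0x40 are all invented here, and the real assembler is written in Forth.

```python
class Assembler:
    """Toy model of the postit / fixup / commaer stages (hypothetical, not asgen.frt)."""

    def __init__(self):
        self.code = bytearray()  # bytes assembled so far
        self.start = 0           # offset of the instruction being built
        self.holes = 0           # bits of the current postit still unfilled
        self.pending = []        # commaers that are still required

    def postit(self, opcode, holes, size=1, needs=()):
        """Stage 1: lay down a fixed-length opcode with holes, post requirements."""
        if self.holes or self.pending:
            raise ValueError("previous instruction is incomplete")
        self.start = len(self.code)
        self.code += opcode.to_bytes(size, "little")
        self.holes = holes
        self.pending = list(needs)

    def fixup(self, bits, mask, needs=()):
        """Stage 2: fill a hole with opcode bits; may post more requirements."""
        if mask & ~self.holes:
            raise ValueError("fixup does not match an open hole")
        if bits & ~mask:
            raise ValueError("fixup bits fall outside its hole")
        size = len(self.code) - self.start
        old = int.from_bytes(self.code[self.start:], "little")
        self.code[self.start:] = (old | bits).to_bytes(size, "little")
        self.holes &= ~mask
        self.pending += list(needs)

    def data_fixup(self, value, mask, shift):
        """Stage 3: like a fixup, but the bits come from user-supplied data."""
        self.fixup(value << shift, mask)

    def commaer(self, name, value, size=1):
        """Stage 4: append user data as separate bytes, if it was asked for."""
        if name not in self.pending:
            raise ValueError(f"{name} is not required here")
        self.pending.remove(name)
        self.code += value.to_bytes(size, "little")


asm = Assembler()
asm.postit(0x40, holes=0x0F, needs=["IB,"])  # opcode with a 4-bit hole
asm.fixup(0x03, mask=0x07)                   # register number 3
asm.data_fixup(1, mask=0x08, shift=3)        # user-supplied 1-bit field
asm.commaer("IB,", 0x2A)                     # immediate byte
print(asm.code.hex())                        # 4b2a
```

The point of the bookkeeping in holes and pending is that a fixup or commaer that does not belong to the current postit is rejected on the spot, instead of silently producing a wrong opcode.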
Instead of having a defining word for each "type" of opcode, I now have
defining words for postits (sizes 1, 2, 3 and 4), fixups from the front and
from behind, data fixups, and commaers.
The rest is data and tables.
Not all of those defining words are relevant for all assemblers.
Fixups from the front can be dispensed with in Intel assemblers,
as can data fixups, while DEC Alphas have only 4-byte instructions, etc.
So from these few words
the 8080 assembler uses only 3, the 8086 assembler uses 4,
the DEC Alpha uses 3.
The above is from a Forth perspective.
From a Perl perspective there is a small interpreter that loads
tables, which are in fact lookup tables, so-called hashes
in Perl. During assembly, as a second stage, the mnemonics are looked up.
A small trick -- FAMILY -- saves a lot of errors in tricky magic constants.
This means that similar words are defined in a loop, e.g.
0100 0 8 xFAMILY|R
AX| CX| DX| BX| SP| BP| SI| DI|
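The idea behind FAMILY can be sketched in Python as follows. The reading of the three numbers is an assumption on my part that the text does not spell out: 0100 is taken as a hexadecimal increment, 0 as the first value, and 8 as the number of members. The function name family and the dictionary it returns are hypothetical, chosen only for this illustration.

```python
def family(increment, first, count, names):
    """Define `count` similar words in a loop: the n-th name gets first + n*increment."""
    if len(names) != count:
        raise ValueError("number of names does not match the count")
    return {name: first + n * increment for n, name in enumerate(names)}


# Assumed reading of: 0100 0 8 xFAMILY|R  AX| CX| DX| BX| SP| BP| SI| DI|
REGISTERS = family(0x0100, 0x0000, 8, "AX| CX| DX| BX| SP| BP| SI| DI|".split())

for name, bits in REGISTERS.items():
    print(f"{name:4} {bits:04X}")
```

Only the increment and the start value are typed by hand, so there are two magic constants to get right instead of eight.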
I started by
implementing an 8086 assembler (for fig-Forth!).
You can look at an equivalent ISO Forth version here.
In this vein I went on to make a 386 assembler that was
part of the
generic i86 figforth and later on of the generic i86 ciforth.
If you run in 16-bit protected mode, it automatically
switches to 16 bits.
But testing this beast was a bit of a nightmare.
(As of ciforth 4.2.0 it has
been superseded by a lightweight version compatible with
the great assembler.)
So I went back to the drawing board and separated out the
generic part, i.e. the part that has
no reference to any processor in particular.
Then I used it to implement an 8080 assembler, and I added self-awareness
by making a word that lists all possible opcodes. Then I added a disassembler.
All illegal combinations of instruction pieces are detected and give a
comprehensible error. The assembler is tested by assembling all the possible
opcodes, disassembling the result, and comparing it with the original.
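This round-trip test can be illustrated with a small Python sketch. The instruction set below is a made-up toy (two opcodes, four registers, two widths, all packed into one byte), and the names OPCODES, REGS, WIDTHS, assemble and disassemble are hypothetical. It shows only the testing idea, not the actual test of the Forth assembler.

```python
from itertools import product

# Hypothetical one-byte toy instruction set: opcode in bits 6-7,
# width in bit 2, register in bits 0-1.
OPCODES = {"LD,": 0x40, "ST,": 0x80}
WIDTHS = {"B|": 0x00, "W|": 0x04}
REGS = {"R0|": 0x00, "R1|": 0x01, "R2|": 0x02, "R3|": 0x03}


def invert(table):
    """Turn a name -> bits table into a bits -> name table."""
    return {bits: name for name, bits in table.items()}


def all_instructions():
    """List every legal combination of instruction pieces."""
    return list(product(OPCODES, WIDTHS, REGS))


def assemble(op, width, reg):
    return OPCODES[op] | WIDTHS[width] | REGS[reg]


def disassemble(byte):
    return (invert(OPCODES)[byte & 0xC0],
            invert(WIDTHS)[byte & 0x04],
            invert(REGS)[byte & 0x03])


# Assemble everything, disassemble it again, and compare with the source.
failures = [ins for ins in all_instructions()
            if disassemble(assemble(*ins)) != ins]
print(len(all_instructions()), "instructions,", len(failures), "failures")
```

Because the list of instructions is generated from the same tables that drive the assembler, a new table entry is covered by the test without any extra work.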
Read more about the Forth implementation of the Post-It Fix-Up assembler
Read more about the Perl implementation of the Post-It Fix-Up assembler