

In Forth we add new data types. They must be accommodated by notations that result in a constant object. This can always be accomplished by a compiling word that inspects the next word in the source:
S" /tmp/q.txt"

The basic idea is to add constant objects, sometimes called literals or literal data, by defining prefix words that parse the remainder of the data.
With 0x defined as a prefix, 0x4ABD12 is a denotation. Likewise for the string.
Note that the only difference from the previous solution is that the blank space following HEX: and " is left out. This is easy to implement; see my Forth implementation. Instead of requiring an exact match with a word in the dictionary, a literal is matched if its first part matches a prefix.
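The matching rule can be sketched in standard Forth. The word STARTS-WITH? below is a hypothetical helper, not a ciforth word and not part of the standard; it only illustrates the test "the first part of the literal matches a prefix", and it assumes the String word set (COMPARE) is present.

```forth
\ Hypothetical helper: true if string 2 is a leading part of string 1.
: STARTS-WITH? ( c-addr1 u1 c-addr2 u2 -- flag )
   2>R                          \ stash the prefix on the return stack
   DUP R@ U< IF                 \ literal shorter than the prefix?
      2DROP 2R> 2DROP FALSE EXIT
   THEN
   DROP R@                      \ keep only the first u2 characters
   2R> COMPARE 0= ;             \ equal means the prefix matches

: demo ( -- )
   S" 0x4ABD12" S" 0x" STARTS-WITH? . ;
demo
```

A real prefix lookup would apply such a test against the names in the dictionary, then hand the rest of the literal to the prefix word that matched.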
To examine an example (strings) in ciforth

Once this invention is made, regular numbers can be parsed with the same mechanism by defining the digits 0 to 9 as prefixes in the dictionary, a net simplification of the Forth as a whole.
Strings starting with a blank were a constant source of confusion with the Forth word S". Now it is just as you expect, and no different from other computer languages.


Every sequence of symbols that results in a compile time constant can be considered a denotation. The Forth standard introduces a "NONAME" definition. It is a sequence of instructions, starting with :NONAME and ending with a semicolon ; . It leaves behind an execution token, which is indeed a compile time constant. Initially in ciforth a noname definition had a full header, a dea (dictionary entry address), and there was no separate concept of an execution token. This is the somewhat appalling definition of :NONAME in ciforth (leaving out technicalities).
With an example
:NONAME  2 3 * ;
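In a standard system the token left by this example can be copied and executed like any other constant on the stack. A minimal sketch using only standard words:

```forth
:NONAME  2 3 * ;     \ leaves an execution token on the data stack
DUP EXECUTE .        \ the token can be duplicated and executed
EXECUTE .            \ and executed again from the remaining copy
```

Both executions display 6, since the token always refers to the same anonymous code.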
In order to execute a colon definition in an indirect threaded implementation like ciforth, two properties of the word are needed, the executable code (the machine address to jump to) and the address of the high level code to be interpreted. So they cannot be a single cell. The executable code is the same for all colon definitions and is called docol . The name execution token hints that the header must be thinned out, to leave only those two fields. Creating these fields "manually" we get for the above example, using all carnal knowledge of ciforth that is needed.
HERE docol , HERE CELL+ ,  ] 2 3 * [ '(;) ,
Note that we use the denotation for dea's here and do not bother to discriminate between , and COMPILE, . It is possible to make this into a denotation, but the ; can no longer be used to end it. Choosing { and } which is quite a common notation for code sequences, we get
: {   HERE docol , HERE CELL+ , ] ;
: }   '(;) , POSTPONE [ ;  IMMEDIATE
This results in a fresh, gay notation for the denotation.
{ 2 3 * } EXECUTE .
6 OK


Forth can be in a compile state and in an interpretation state. In interpretation state the constant generated by the denotation resides on the stack. Where is it in compilation state? We only have to look at the traditional solution for numbers in indirect threaded code to answer that. It is built into the code. This is possible by virtue of it being constant.
The sequence of instructions is broken and the data is put in the middle of the code. Remember that the high level code is in fact data, a sequence of execution tokens. The execution token in front of the data takes care that the interpreter pointer is moved past the data and that the data ends up on the stack, such that the compiled behaviour is the same as the execution behaviour. There may be slight complications for native code, but this is not important for the insight we're after. For an integer this is done by LIT .
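The same mechanism is visible from standard Forth through LITERAL, which takes a number that exists at compile time and builds it into the definition. A small sketch; the word name six is made up for illustration:

```forth
: six ( -- n )
   [ 2 3 * ]      \ computed while compiling; the product is on the data stack
   LITERAL ;      \ compiles that number in line; at run time it is pushed
six .
```

The multiplication happens once, during compilation; running six only pushes the stored constant.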
Handling strings need not be much different from handling numbers. There is a difference in that the data has a variable length. There is however also a difference in tradition. Strings were not considered as data to put on the stack, they were just used on the go. For example, ." would get a character from the input stream, put it on the screen, then get another character until a double quote was encountered. The defining word : would get a word ready to incorporate it into the dictionary, moving it to PAD . It wasn't until files came into fashion, with names that could be anything, that S" and the words following it came to be considered a string denotation. Its implementation triggers a lot of debate about so-called STATE-smartness.
A decent method to handle strings is by a denotation prefix " that puts a string on the stack and in compilation mode takes care of having the string in line within a definition. As long as the input buffer remains valid, a string constant (address length pair) can represent the string. In interpretation mode this string is used immediately. Now compilation mode requires a somewhat more complicated equivalent of LIT . A possible implementation involves a branch over the area where the string resides, followed by two LIT 's. An added advantage is that if we just put the string in the dictionary always, we no longer have to worry about the lifetime of strings in interpretation mode.
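Standard Forth has no " prefix, so the sketch below uses S" as a stand-in to show the compiled behaviour described here: the characters sit in line within the definition, and the word yields an address length pair each time it runs. The word name tmp-name is made up for illustration.

```forth
: tmp-name ( -- c-addr u )
   S" /tmp/q.txt" ;      \ the characters are stored inside the definition
tmp-name TYPE            \ displays the string
tmp-name NIP .           \ displays its length
```

Because the string lives in the definition, the pair stays valid long after the input buffer that held the source text has been reused.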
The whole STATE-smartness debate is cut short by using denotations for strings instead of special manipulations, and simply forbidding any use of constants or literals outside of the denotation mechanisms.
(In particular they must not be POSTPONE -ed. I hesitate to mention it because the word POSTPONE should never have come into existence. It is there such that someone who doesn't know whether a word is IMMEDIATE or not, can extend the compiler. However such a person has no business trying to extend the compiler. )


By adding the technique for inlining a string to the noname definition, we get the "quotations", anonymous pieces of code identified by a compile time constant. Given the way we have developed quotations, questions like "can a quotation that happens to be in the middle of a definition use local values of that definition" have a clear answer. Of course not. That would be in violation of the constantness of denotations. (This is separate from the question whether it is desirable to have local values. The TO mechanism parses a string, and that is something we want to get away from, as explained above. So to answer that question: "no".)


I admit that locals can be handy. The price however is high. It blocks the Forth language from evolving in a sensible direction. So adherents of LOCAL will object to my approach, but we'll see.


The main problem with nested compilation is that two definitions are competing to be added to the dictionary. If one of them is half-finished and the other is started, there is a conflict. ciforth has long had the word NESTED-COMPILE that handles this, and it is not terribly involved. The situation with quotations is much easier, because quotations are not linked into the dictionary. Here follows the full implementation of the above quotations in ciforth, which is, it must be admitted, a simplistic Forth.

The additions compared to :NONAME are the jump over the code and the LITERAL, which make it a denotation. The word SKIP is a simple forward jump; the words (FORWARD and FORWARD) handle the calculation and filling in of jump distances. The compilation STATE is fetched and restored, not manipulated via [ and ] . The addition of a SKIP could be made conditional upon STATE, saving futile amounts of memory. That would destroy the rule that a denotation builds a constant on the stack and leaves it to LITERAL to decide whether to compile the constant, or leave it alone.
Now the fresh, gay notation works within a definition.
: test { 2 3 * } ;
test EXECUTE .
6 OK
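
For comparison, a standard system without nested compilation can approximate the effect by building the token beforehand and compiling it with LITERAL. This is a sketch under that assumption, not the ciforth implementation; the name product-xt is made up for illustration.

```forth
:NONAME  2 3 * ;  CONSTANT product-xt      \ token exists before test is compiled
: test ( -- xt )
   [ product-xt ] LITERAL ;                \ compile the token as a constant
test EXECUTE .
```

The anonymous code has to be written outside the definition here, which is exactly the inconvenience the { and } notation removes.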

Premature optimisation makes everything much more difficult. A COMPILE, that tries to be clever messes everything up. There are other and better ways to speed a program up, as I hope to demonstrate one day.

  • Other Forth lectures
  • Go to the home page of Albert van der Horst