Forth lecture 17.

Implementing Lisp in Forth.

This is about implementing lisp in ciforth.
Warning: there is no attempt to portability.
Notation: in the text Forth words are quoted by a single backquote
Implementing a lisp the forth way goes through a number of stages, where at each stage limited functionality can be fully tested.
If we are ever to arrive at the last stage, the lisp will be baptized and its name will be alif (a lisp in Forth). For now only stage 1 is implemented and explained here.

Stage 1: a tiny silly lisp without eval.

Silly means that blank space is required around punctuation.
There is no eval, the REPL is reduced to RPL (pun intended.)

This is a simple implementation of lisp following early descriptions
such as https://cse.sc.edu/~mgv/csce330f13/micromanualLISP.pdf

A representation is required for the Lisp objects.
The represenation choosen is a dea, a (Forth) dictionary
entry address. All dea's for this purposes are indicated as "deal".
Lisp knows atoms and cons.
A dea has 5 fields cfa dfa ffa lfa nfa, which is sufficient to
accomodate this.
This are abreviations for code, data, flag, link, name fields.
The dfa (data) can contain anything. The ffa (flag) contains 64
bit flags. The others contain pointers.
In general variable length data will be accessable through the nfa.
If a deal represents a symbol or number (atom), that symbol or number
is accessable as a string of char's, through the nfa, (name)
containing a pointer to possible dynamic storage.
If a deal represents a cons, mostly a list , the car and cdr are in dfa
resp lfa, the datafield and the link field.
If a handle represents an executable, its symbol is in nfa, and its
action is in cfa.
If a symbol has a value, that value must be a deal, it is stored in the dfa.
Inasfar it is necessary to have type information, that is handled by
bits in the flag field.

Lisp has one namespace, for regular atoms, and maybe another one for atoms
with an associated action. They are mapped on Forth namespaces.
The Forth namespace mechanism uses the lfa, which is free for that
use in atoms (not so in cons cells.)
Note that a list has no name. An atom can have a value that is a list,
which amounts to the same logically, but not implementationwise.

Note that in the deal of a cons 3 cells remain unused. In particular
the nfa must not be used.
Of course cons cells will be simplified and dynamically allocated, if
(ever) this lisp becomes mature.

Implementing a lisp the forth way goes through a number of stages,
where at each stage limited functionality can be fully tested.
After this first stage we will be able to construct lists and display them.
This stage needs a mere 30 lines of code, but they are hard,
for Forthers as well as Lispers.

The Reader takes a line of input and transforms it into a list.
The Forth way to implement this is to use the Forth interpreter
loop. That means that elements of the expression are active and
build a part of the structure. After a line Forth has the word OK
to say "OK". OK can be revectored to display debugging information,
relevant to us is the stack and the situation with respect to namespaces.
In the following the user input is after the prompt >.
Forth code is presented between %-signs.

The debugging information looks like this
  FORTH   [ FORTH   ]
  S[ ]>12
  FORTH   [ FORTH   ]
  S[ 12 ]>

This means that the namespace FORTH is used. Newly defined words are added
to the namespace that is in square brackets. Typing 12 adds it to the stack.

% NAMESPACE lisp                  : ATOMS lisp WORDS PREVIOUS ;   %

We now have added a definition `lisp which is a namespace.
It is known to the interpreter:
  >lisp
  lisp   FORTH   [ FORTH   ]
  S[ ] >

The effect is that the namespace lisp is added to the top of the
search order. It is searched first, then FORTH.

  >WORDS
  lisp   FORTH   [ FORTH   ]
  S[ ]>PREVIOUS
  FORTH   [ FORTH   ]

The above code shows the words in lisp namespace, i.e. nothing.
And removes `lisp from the search-order again.
These three commands together from the commands `ATOMS  using the
characteristic Forth `: `; construction. For now it looks like a noop.

  >ATOMS
  FORTH   [ FORTH   ]
  S[ ]>

% WANT $-PREFIX 0<> TRUE $=                                     %
This means that we want to indicate hex numbers with a $ prefix.

With this bit set in its ffa, : the deal is the empty list.
% $100 CONSTANT nil_mask                                        %
Similar for a list
% $200 CONSTANT atom_mask                                       %
% $400 CONSTANT list_mask                                       %

% : CAR   >DFA @ ;        : CDR   >LFA @ ;
% : CAR!  >DFA ! ;        : CDR!  >LFA ! ;
`CAR fetches the CAR and `CAR! stores it.
It can be demonstrated on a Forth word like `ATOMS.
Note that `ATOMS must be quoted as 'ATOMS otherwise it would be evaluated,
in Forth jargon "executed".

  [ ]>HEX 'ATOMS
  FORTH   [ FORTH   ]
  S[ 415830 ]>CAR
  FORTH   [ FORTH   ]
  S[ 415868 ]>

Like all colon words, the data field's contents points a bit
higher in memory than the dea of ATOMS.
Storing in particular to the lfa cannot be tested  interactively
without utmost care, mostly you will just crash the Forth system.

To understand the word `ATOM: have a look at this code.
  S[ ]>'_ _
  FORTH   [ FORTH   ]
  S[ 402E18 402E18 ]>

The word `_ has the property that when evaluated (as in _ ), it puts its
own dea/handle (i.e. '_) on the stack. This is the behaviour we want in
an atom. To accomplish this in `ATOM: the cfa is copied from the cfa
of `_.

It is custom in Forth to end words that allocate memory with a comma
','.  The five 64-bit words are allocate with the word `, .
The dea of an atom is thus created "manually". For now the heap is used
with no way to recover the memory.

% : ATOM:   NAME $, ALIGN   HERE
% \   Code     Data   Flag      Link  Name
%     '_ >CFA @ , 0 , atom_mask , 0 , SWAP  ,   DUP 'lisp >WID LINK ;

So `ATOM: gets a name from the input stream, stores it permanently
and uses it as the 5th field. The flag field indicates that it is an atom.
The `LINK command adds the dea to the `lisp namespace.
  S[ ]>ATOM: AAP
  FORTH   [ FORTH   ]
  S[ 415C78 ]>

We can check its presence:
  >ATOMS
  AAP
  FORTH   [ FORTH   ]
  S[ 415C78 ]>

It is customary in Forth to indicate the fetch of data by words
ending in monkeytale symbol '@'

% : ATOM@   >NFA @ $@ ;

  S[ 415C78 ]>ATOM@
  FORTH   [ FORTH   ]
  S[ 415C70 3 ]>TYPE
  AAP
  FORTH   [ FORTH   ]
  S[ ]>

`TYPE is the Forth's way to get a string to the terminal.

If an atom doesn't exist, its evaluation must lead to to its
creation. This sounds impossible, and hints that there is a need
for a procedure to look up a name and then take action if it is not
found. It is surprisingly easy to accomplish that with the
standard interpreter, at least in ciforth with the PREFIX mechanism.
Let us add a word `catch-all as the first word to be defined in lisp
like so.
% lisp DEFINITIONS : catch-all ATOM: ;   PREFIX PREVIOUS DEFINITIONS
This is an alias for `ATOM: and it works.

  >lisp catch-all HUIS
  lisp   FORTH   [ FORTH   ]
  S[ ]>ATOMS
  catch-all   HUIS
  lisp   FORTH   [ FORTH   ]
  S[ 415D78 ]>

But it also works if it is concatenated with the atom to be defined
>catch-allMIES
  lisp   FORTH   [ FORTH   ]
  S[ 415D78 415DB0 ]>ATOMS
  AAP   catch-all   HUIS   MIES   lisp   FORTH   [ FORTH   ]
  S[ 415D78 415DB0 ]>

That is what PREFIX does. It is clear that we need such a mechanism
anyway to understand " (QUOTE A) " by making `( a PREFIX.
By patching `catch-all 's name to be empty, every word that doesn't
exist is created as an atom. An empty prefix matches everything.
Also if `lisp is the first namespace to be searched, the search
will never make it to the other namespaces. The underlying Forth
vansishes.

% : ATOM?   >FFA @ atom_mask AND 0<> ;
% : LIST?   >FFA @ list_mask AND 0<> ;
% : NIL?   >FFA @ nil_mask AND 0<> ;

The second data structure is a cons cell, that must at least have
a car (field 2) and a cdr (field4 ). For convenience we create all
cons cells as a list.

% : CONS   SWAP HERE ( cdfln)  0 , SWAP  , list_mask , SWAP , 0 , ;

Two items are passed to `CONS and the deal is returned:

  >1 2 CONS
  FORTH   [ FORTH   ]
  S[ 415FA0 ]>CAR
  FORTH   [ FORTH   ]
  S[ 1 ]>

`CONS@ recovers the cdr  and  car . Note that car is on the most
accessable part of the stack, because in general we will need it first.

% : CONS@ DUP CDR SWAP CAR ;

Now we are in a position to create the infamous empty list NIL and the
symbol for trueity.
% 0 0 CONS CONSTANT NIL
% NIL NIL CAR!   NIL NIL CDR!   NIL >FFA nil_mask atom_mask OR TOGGLE

`NIL and `T must be known in the `FORTH as well as the `lisp namespace.
This is accomplished by defining constants that just return a deal.
We have already seen that a list cannot be in a namespace, which is true
for the empty list too, hence the constants.

% ATOM: T CONSTANT T

The `T is just a regular atom, The constant in the FORTH namespace
leaves its address

  S[ ]>ATOMS
  catch-all   T   [ FORTH   ]
  S[ ]>

You can see that `NIL has not been added to the lisp namespace.
This will be taken care of later.

% : listed  NIL  BEGIN OVER  WHILE CONS REPEAT NIP ;

`listed takes items from the Forth stack to create a list, with 0
as and endsentinel, like so

  FORTH   [ FORTH   ]
  S[ ]>0 10 20 30 40
  FORTH   [ FORTH   ]
  S[ 0 10 20 30 40 ]>listed
  FORTH   [ FORTH   ]
  S[ 416518 ]>

The result is the deal of the first cons cell of the list.
By defining `( as 0 and `) as `listed, it starts to look like a
lisp list. For now we want those words only in `lisp, which is
accomplished by the DEFINITIONS key word. While we're at it
we'll have `NIL in `lisp too. Like so:

% lisp DEFINITIONS
% 'NIL ALIAS NIL
% : ) PREVIOUS listed ;
% : ( lisp 0 ;
% '( PREVIOUS DEFINITIONS ALIAS (

The last alias makes a copy of `( from `lisp in the `FORTH namespace.
Let's try it!

  S[ ]>( ATOM: A   ATOM: B   ATOM: C
  lisp   FORTH   [ FORTH   ]
  S[ 0 4166A0 4166D8 416710 ]>( A B
  lisp   lisp   FORTH   [ FORTH   ]
  S[ 0 4166A0 4166D8 416710 0 4166A0 4166D8 ]>)
  lisp   FORTH   [ FORTH   ]
  S[ 0 4166A0 4166D8 416710 416760 ]>)
  FORTH   [ FORTH   ]
  S[ 416800 ]>

Note that once created atoms `A and `B are known. Also note the
stacking of the `lisp namespace with each `(. After the last closing
bracket we're back in `FORTH.

% : FOR-LIST   >R  BEGIN DUP NIL <> WHILE CONS@ R@ EXECUTE REPEAT
%    R> 2DROP ;
`FOR-LIST accepts a list and a Forth dea. It evaluates the dea
(executes the execution token) for all elements of the list.
In the following example we use ID. (print name) for that.

  >( ATOM: C ATOM: D )
  FORTH   [ FORTH   ]
  S[ 416A60 ]>'ID. FOR-LIST
  C   D
  FORTH   [ FORTH   ]
  S[ ]>

Printing a list is now a matter of recursion. An atom is printed
right away, a list invokes `FOR-LIST
The `:F and ':R are a technicality. They are a forward declaration
of `.lisp , lest it could not be used in the body of `.lisp.

% :F .lisp ;
% :R .lisp DUP LIST? IF &( EMIT '.lisp FOR-LIST &) EMIT SPACE ELSE
%     ATOM@ TYPE SPACE THEN ;

Before using `.lisp we will first seal the `lisp wordlist, by making
catch-all an empty prefix.

% lisp 'catch-all PREVIOUS "" $,   SWAP >NFA !

  >( T ( NIL T ) WE GAAN NAAR ROME )
  FORTH   [ FORTH   ]
  S[ 4300120 ]>.lisp
  (T (() T ) WE GAAN NAAR ROME )
  FORTH   [ FORTH   ]

Previously unknown symbols are created as an atom. Also note
that catch-all has disappeared:

  S[ ]>ATOMS
   T   NIL   )   (   WE   GAAN   NAAR   ROME
  FORTH   [ FORTH   ]
   S[ ]>

That is it for today. The next step is to no longer require the
spaces around the punctuation and add more punctuation
like `' and comment incrementally.
(tiny lisp without evaluation).

  • Other Forth lectures
  • Go to the home page of Albert van der Horst