For scripting we must get rid of messages during startup. At startup a sign-on message is normally presented, showing which Forth and which version you are talking to.
It helps if we have all this information gathered in a single word called .SIGNON. Typically it prints the results of the environment queries NAME, SUPPLIER, VERSION and CPU. Of course CPU is a double number printed in base hexatrentical (base 36).
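The base-36 trick is easier to see from the other side. Presumably the CPU query returns a number whose base-36 digits spell the processor name, for example I386, so that D. in base 36 prints that name. The C sketch below converts in both directions; the names to_base36 and from_base36 are made up for this illustration, and from_base36 assumes its input contains only digits and letters.

```c
#include <ctype.h>
#include <stddef.h>

/* Write the base-36 digits of n (0-9, then A-Z) into buf.
   buf must hold at least 14 bytes for a 64-bit value. */
static void to_base36(unsigned long long n, char *buf)
{
    const char *digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    char tmp[16];
    size_t len = 0;

    do {                                  /* least significant digit first */
        tmp[len++] = digits[n % 36];
        n /= 36;
    } while (n != 0);
    for (size_t i = 0; i < len; i++)      /* reverse into the caller's buffer */
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';
}

/* Inverse: read a string of base-36 digits, case-insensitive. */
static unsigned long long from_base36(const char *s)
{
    unsigned long long n = 0;
    for (; *s != '\0'; s++) {
        int c = toupper((unsigned char)*s);
        int d = isdigit(c) ? c - '0' : c - 'A' + 10;
        n = n * 36 + (unsigned long long)d;
    }
    return n;
}
```

So from_base36("I386") is 843990, and printing 843990 in base 36 gives back I386.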
The FIG tradition printed this sign-on message with each ABORT. Coos Haak insists that ABORT should be silent, because QUIT is supposed to be silent. I am not quite convinced this is a correct interpretation of the ISO Forth standard. I see systems like GForth printing a lot of information on an ABORT (or any THROW) and I think that is a Good Thing. It is a good idea to have diagnostic information printed at the place where the final and fatal exception is caught, but I also think it is good to separate this from code that is supposed to do a reinitialisation. Of course error detection and post-mortem analysis is an area where there should be much room for customization, and a possibility to insert sophisticated tools. Those come later, because they definitely don't belong in the kernel of a Forth system.
So ideally we have roughly this situation.
ABORT executes 2 THROW. The exception is caught and
all possible help is given to find out about the error.
Then a silent reinitialisation is executed that, for lack of a better
name, we could call (ABORT). (Sane people would call it INIT.)
This word has the effect of QUIT plus clearing the stacks.
The bottom line is that COLD calls (ABORT) and this doesn't result in any messages.
We are left with two sources of messages: the sign-on message and the "OK" prompt. The "OK" is silenced with:
'NOOP 'OK 3 CELLS MOVE
At least in ciforth that is easy. The above code copies the behavior of NOOP (a no-operation word) into OK.
If you didn't factor out the printing of "OK" into a separate word, now is the time. It is also a great place to insert a stack print if you are debugging.
Now the last trick. How do we find out whether we are
talking to a terminal? This is a bit system-dependent.
In Linux that goes like this:
CREATE TERMIO 60 ALLOT
HEX 0 5401 TERMIO 36 LINOS 0<
This asks Linux, using operating system call 36 (hexadecimal, i.e. 54 decimal: ioctl on the Intel 386), to fill in TERMIO with the properties of the terminal on file descriptor 0, standard input. If it gives a negative result, the call failed, and we are not connected to a terminal at all, but to a stream such as a pipe or a file.
The constants 60, 5401 and 36 are looted from C after a long
and bloody battle. On a typical Red Hat system, there are 8 files
called termios.h, and one of them includes a file that defines
TCGETS as hexadecimal 5401. (Or includes a file that includes ... ).
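For comparison, this is how a C program asks the same question. The sketch assumes Linux with glibc, where tcgetattr() is implemented with the TCGETS ioctl; POSIX isatty() answers the same question. The names is_terminal and want_signon are hypothetical.

```c
#include <sys/ioctl.h>   /* TCGETS on Linux, included for reference */
#include <termios.h>     /* struct termios, tcgetattr() */
#include <unistd.h>      /* STDIN_FILENO */

/* Return 1 if fd is a terminal, 0 if it is a pipe or a plain file.
   On Linux/glibc tcgetattr() issues ioctl(fd, TCGETS, ...), the same
   request (5401 hex) that the Forth code passes to LINOS. */
static int is_terminal(int fd)
{
    struct termios t;
    return tcgetattr(fd, &t) == 0;
}

/* The decision made in COLD: print the sign-on only for a terminal. */
static int want_signon(void)
{
    return is_terminal(STDIN_FILENO);
}
```

A here document or a pipe makes standard input a non-terminal, so want_signon() returns 0 in exactly the scripting situations discussed below.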
So at last, this is the code to be present in COLD (with HEX in effect, as above):
0 5401 TERMIO 36 LINOS 0< IF 'NOOP 'OK 3 CELLS MOVE ELSE .SIGNON THEN
And in case you don't want to write .SIGNON yourself:
ALSO ENVIRONMENT DECIMAL
: .SIGNON   CR SUPPLIER TYPE SPACE "is proud to present " TYPE CR
    BASE @ 36 BASE ! CPU D. BASE !   NAME TYPE SPACE VERSION TYPE CR ;
PREVIOUS
This assumes that environment queries are Forth words present in an ENVIRONMENT word list. This is not ISO, but this approach is taken by GForth, iForth, tForth, ciforth and probably others.
Simple scripting.
Let's say we have a Forth that shuts up if it senses that we are talking to it through a channel rather than an interactive terminal. Then on a Unix system we already have a practical scripting system, in combination with the powers of the Unix command interpreters (called "shells"). For example, a script to add 1 to 2 and print the result:
forth <<'THEEND'
1 2 + . BYE
THEEND
This uses a feature called a here document. Everything up to the line "THEEND" is passed to the forth program as its standard input. Of course it is more useful to have a script called add, and pass it the parameters 2 and 3:
add 2 3
5
The script would now look like:
forth <<THEEND
$1 $2 + . CR BYE
THEEND
The quotes are missing from THEEND. To the shell this means that it must interpret the lines before passing them on. In this case $1 gets replaced by 2 and $2 by 3. The shell will also make the Unix environment available: a set of strings with information about the environment a program is running in. An environment variable is referred to by a name, not a number. It is likewise preceded by $, for example "$HOME", and expanded by the shell to what it was set to. Environment variables contain such things as the current directory, the user's name, and all sorts of information you care to pass to programs, such as library names, or the preferred place for video editing and CD writer programs to write huge scratch files. The most famous is undoubtedly PATH: a list of directories where the shell looks for programs.
Of course passing Forth code through a shell is dangerous. Unix shells are the kind of tool in that picture of Brody. (On my page I will show the hammer-screwdriver-whatnot if I can get permission.) A shell does so many things that at least one of them is bound to be unexpected, causing problems. (Careful people can put all lines between single quotes by default, but that is ugly.)
As an aside, the command interpreters on MSDOS systems are
plain bad in comparison. The default ones are all called
COMMAND, they change without notice, they are not powerful
and they are not sufficiently documented. There seems to
be an official Korn shell for WINDOWS, but it does not conform
to the specification (says a man named Korn 1)). However, that
being said, the above techniques apply to MSDOS mutatis
mutandis and can achieve useful results.
[ 1) I hope that is no urban legend. Even if it is, it is the kind of
anecdote that is true, even if it isn't. ]
The Unix system, the Bourne shell and the Kernighan & Ritchie C compiler were all designed together. No wonder that they cooperate well. A shell passes the command line arguments and the environment variables to C, as you can see in the declaration of main:
int main(int argc, char *argv[], char *env[]);
A C program has nothing to translate; the parameters are just there, because the shell is expecting a C program. (The third parameter, env, is a widespread Unix extension; ISO C defines only the first two.) On an operating system oriented towards other languages, such as MSDOS where the systems programming language is BASIC, a C program needs a preamble to analyse data areas, and is in that respect no better off than Forth.
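To show that the arguments really are ready to use, here is a small C helper that counts the entries of such a NULL-terminated vector. The name vector_length is hypothetical; ISO C guarantees that argv[argc] is a null pointer, and on Unix the env vector is terminated the same way.

```c
#include <stddef.h>

/* Count the entries of a NULL-terminated vector such as argv or env. */
static size_t vector_length(char *const v[])
{
    size_t n = 0;
    while (v[n] != NULL)
        n++;
    return n;
}
```

Inside main, vector_length(argv) equals argc, and vector_length(env) is the number of environment strings.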
You see that a program also passes an int back. A zero indicates successful completion; any other number identifies an error condition, comparable with a throw code. It is a pity that Forth has no provision in BYE to pass information back. However, it is of course possible to have a variable EXIT-CODE or some such and pass its value to the OS during BYE. This cannot break any existing code. It is implemented in ciforth.
What hook do we need in a Forth system to get at the argument and environment information? Under a Unix system this is typically extremely simple. On a Forth that relies on C for the connection with the operating system, such as gForth, it is both simple and portable. On a Forth defined in assembler it is still quite simple, but system-dependent.
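On the Intel 386 a C function gets its arguments via the stack, and main is no exception: when Linux starts a program it leaves the argument count and the argument and environment pointers on the stack. So it is sufficient to remember the stack pointer at startup.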
The following example is from ciforth for GNU-Linux on Intel 386:
MOV LONG[USINI+(CW*(31))],ESP ;Remember ARGS.
ARGS is defined as a user variable with an offset of 31 cells in the user area.
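In C terms the block that ARGS points to can be modelled as below. This is an illustrative sketch, not ciforth code: a real program finds the block through the stack pointer saved at startup, and the helper names args_argc, args_argv and args_env are made up to mirror the Forth words ARGC, ARGV and ENV defined after the dictionary entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the startup block:  argc, argv[0] .. argv[argc-1], NULL,
   env[0] ..., NULL.  The count sits in the first slot, stored as an
   integer cast to char*. */

/* Like ARGC:  ARGS @ @ */
static intptr_t args_argc(char **block)
{
    return (intptr_t)block[0];
}

/* Like ARGV:  ARGS @ CELL+ */
static char **args_argv(char **block)
{
    return block + 1;
}

/* Like ENV:  ARGS @ @+ 1+ CELLS +  (skip the count, argc pointers, one NULL) */
static char **args_env(char **block)
{
    return block + 1 + args_argc(block) + 1;
}
```

The arithmetic in args_env is the same as in the Forth ENV: one cell for the count, argc cells of argument pointers, and one cell for the NULL that ends them.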
This is the dictionary entry:
ARGS "arguments" --- addr
Return the addr of ARGS, a user variable that contains a system-dependent pointer to any arguments that are passed from the operating system to ciforth during startup. In this ciforth it points to an area with the argument count, followed by a null-terminated array of argument strings, then by a null-terminated array of environment strings.
This leads to the following code. The comments use the Stallman convention, see lecture 3 (forthcoming).
\ Return the NUMBER of arguments passed by Linux
: ARGC   ARGS @ @ ;
\ Return the argument VECTOR passed by Linux
: ARGV   ARGS @ CELL+ ;
\ Return the environment POINTER passed by Linux
: ENV    ARGS @ @+ 1+ CELLS + ;
An indispensable word for dealing with C strings is also
\ For a CSTRING (pointer to zero ended chars) return a STRING.
: Z$@   DUP BEGIN COUNT 0= UNTIL 1- OVER - ;
For example, if forth is started with
lina HELLO_WORLD
the code
ARGV CELL+ @ Z$@ TYPE
would print HELLO_WORLD, the second argument, i.e. the first argument passed to Forth. (The @ fetches the pointer stored in the vector; Z$@ needs the C string itself.)
Looking up an environment string
C data structures are territory alien to Forth, so looking up an environment string is not totally trivial. Let's first define what we want:
GET-ENV "get environment string" sc1 -- sc2
So a string constant sc1 is passed in, and another one is passed out. A string constant is an address-length pair through which you are not supposed to change the characters. See Forth lecture 13 (forthcoming).
For the possibility that an environment string is not found, the following convention is used: the address of sc2 is zero. This is called a NULL-string. Of course an environment string can have zero characters. Then sc2 has a length of zero, but a non-zero address. This convention is C-ish, and born from the impossibility of passing more than one value back. In Forth you could define the stack diagram as (sc1 -- sc2 false/true), but I don't like that.
If you prefer that, you can always do
: GET-ENV   GET-ENV OVER ;
In programming the word GET-ENV I learned something. If you test a word and it fails, it may be too complicated. If a word contains more than, say, 7 words or it contains a nested control structure, you may conclude from the very fact that it fails a test that it is too complicated. What did Jeff Fox say about Chuck Moore? "He doesn't spend time debugging." The reason is that he makes his words so simple that they work the first time. I may never become as good a programmer as Chuck, but I can try to do the same trick. As can you. (And maybe Chuck doesn't get regular expressions right the first time as often as I do.)
Back to looking up strings in the environment. In comparing with a particular environment string, one of three possibilities can occur. That environment string can be a NULL-string, meaning we have reached the end of the environment. Otherwise it can compare equal, or unequal. This is sufficiently complicated to warrant a separate word. Note that in addition we need a flag telling whether we must go on searching. For some reason I cannot recall, I have named this word (MENV). Its implementation is now rather straightforward.
\ For SC and ENVSTRING leave SC / CONTENT and GOON flag.
: (MENV)   DUP 0= IF DROP 2DROP 0. 0 ELSE
    Z$@ &= $/ 2SWAP >R >R 2OVER COMPARE IF RDROP RDROP 1 ELSE 2DROP R> R> 0 THEN
    THEN ;
(&= is a denotation, see Forth lecture 1, denotations (forthcoming). Read CHAR = or [CHAR] = for it in the meantime.)
If I hadn't got that one right the first time, I would have factored out the second line. That is the tricky part. After $/ ("string slash", see Forth lecture 12, forthcoming) we have three strings: the one to look up, the environment name and the environment content. The environment content is put on the return stack. Then we compare, keeping the string to look up. Depending on the outcome, the content or the original string is dropped.
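For comparison, a C sketch of the same search over an explicit env vector. Here match_env plays the role of (MENV): a NULL entry ends the search with a NULL-string, a different name means go on, and an equal name yields the content after the =. The struct sc type and both function names are assumptions made for this illustration; entries without an = are simply skipped.

```c
#include <stddef.h>
#include <string.h>

/* A string constant: address and length.  ptr == NULL is the
   NULL-string ("not found"); an empty value has len 0 but ptr != NULL. */
struct sc { const char *ptr; size_t len; };

/* Role of (MENV): test one "NAME=VALUE" entry.  Returns 1 (go on) if the
   name differs; returns 0 with *value set on a match, or with *value the
   NULL-string when entry is NULL (end of the environment). */
static int match_env(const char *name, size_t nlen, const char *entry,
                     struct sc *value)
{
    if (entry == NULL) {                /* end of environment */
        value->ptr = NULL;
        value->len = 0;
        return 0;
    }
    const char *eq = strchr(entry, '=');
    if (eq == NULL || (size_t)(eq - entry) != nlen ||
        memcmp(entry, name, nlen) != 0)
        return 1;                       /* not this one: go on */
    value->ptr = eq + 1;                /* content, possibly empty */
    value->len = strlen(value->ptr);
    return 0;
}

/* Role of GET-ENV, over an explicit NULL-terminated env vector. */
static struct sc get_env(char *const env[], const char *name, size_t nlen)
{
    struct sc value;
    while (match_env(name, nlen, *env, &value))
        env++;                          /* next entry, like R> ... REPEAT */
    return value;
}
```

Looking up HOME in a vector containing "HOME=/home/albert" yields the 12 characters /home/albert; a name that is only a prefix, such as HOM, is not found.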
GET-ENV itself is now easy and needs no further comment.
( Find a STRING in the environment, leave its VALUE or a NULL-string)
: GET-ENV   ENV BEGIN @+ SWAP >R (MENV) WHILE R> REPEAT RDROP ;
And at last an example:
"HOME" GET-ENV TYPE
/home/albert OK
(" starts a denotation; it leaves a string constant. See lecture 1, forthcoming.)
Options
ARGS @ @ 1 - IF
tests whether any arguments were passed to lina, apart from the program name itself.
Shell variables.
The word GET-ENV can be used to look up a string in the environment. With $ we can make a denotation of it. It remains to be seen whether we want a binary search: it is of no use unless the environment strings are sorted, and nothing guarantees that they are.
Using T] and T[.
Just do
WANT -scripting-
You can now loop outside of a definition:
10 0 DO I . LOOP
This works, but I am not happy with the way conditionals and loops are done in Forth; I want the Algol way.
Regular expressions in C or other languages are handled by creating a compiled string that is interpreted. In Forth it would result in compiling to a temporary definition.
: EMATCH ECOMPILE EXECUTE ;
EMATCH gives -1 if there is no match, and otherwise the number of bracketed expressions. Below that number lie as many strings, one per bracketed expression.
EREPLACE returns a string where \1, \2, etc. are replaced by the expressions returned from EMATCH.
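A rough C equivalent of the EREPLACE idea: copy a template, replacing \1 to \9 by the strings captured by a match. The function name ereplace, its argument order and its error convention are assumptions for this sketch, not the interface of the Forth word.

```c
#include <stddef.h>
#include <string.h>

/* Copy tmpl into out, replacing \1 .. \9 by caps[0] .. caps[8].
   Any other backslash is copied literally.  Returns the length of the
   result, or -1 if out is too small or a reference has no capture. */
static int ereplace(const char *tmpl, const char *const caps[], int ncaps,
                    char *out, size_t outsize)
{
    size_t n = 0;
    if (outsize == 0)
        return -1;
    for (const char *t = tmpl; *t != '\0'; t++) {
        const char *piece = t;          /* default: copy this one char */
        size_t plen = 1;
        if (t[0] == '\\' && t[1] >= '1' && t[1] <= '9') {
            int k = t[1] - '1';
            if (k >= ncaps)
                return -1;
            piece = caps[k];
            plen = strlen(piece);
            t++;                        /* the digit is consumed too */
        }
        if (n + plen >= outsize)        /* keep room for the final NUL */
            return -1;
        memcpy(out + n, piece, plen);
        n += plen;
    }
    out[n] = '\0';
    return (int)n;
}
```

With caps[0] set to "bin.org", the templates "\1$" and "^\1" give "bin.org$" and "^bin.org": anchored patterns of the kind built with EREPLACE in the .if example.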
Strings
In combination with the conditional stuff that generates very volatile strings, we need
: =$ $, CREATE , , DOES> 2@ ;
Used as in
... if ".bin.edu" else ".bin.org" then =$ wwwtail$
Notes
EREPLACE is also handy for
.if www$ 2DUP domain$ 1 "\1$" EREPLACE EMATCH 0= ipadd$ 1 "^\1" EREPLACE EMATCH 0= AND .then .. .else .. .fi
I am tempted to add the following syntactic joke to ", which is non-standard anyway. It parses one more character, which must be a blank, ; or . . If it is a ; a TYPE is compiled after the string. If it is a . a TYPE and a CR are compiled, such that we get
"Your site had "; hits . " hits today!".
The extreme terseness beats Perl, but it is probably not in line with the less terse style in other areas.
First of all we need the double-cell return stack words 2>R 2R> 2R@. If needed they can be defined by:
: 2>R   POSTPONE SWAP POSTPONE >R POSTPONE >R ; IMMEDIATE
: 2R>   POSTPONE R> POSTPONE R> POSTPONE SWAP ; IMMEDIATE
: 2R@   POSTPONE 2R> POSTPONE 2DUP POSTPONE 2>R ; IMMEDIATE
Secondly, the word $/ is indispensable once again.
Its Stallman stack comment is:
Split a STRING on a DELIMITER, leaving the PART before and
the PART after the delimiter.
For example:
"ABCDEF" &C $/ TYPE &| EMIT TYPE
AB|DEF OK
In order to find out how to implement string loops, we imagine how we would print a file:
: .FILE   GET-FILE $DO I$ TYPE CR $LOOP ;
This is equivalent to
: .FILE   GET-FILE
    BEGIN
        ^J $/ 2SWAP 2>R 2>R             \ Rest, then current line, to the return stack.
        2R@ TYPE CR                     \ The loop body; I$ is 2R@ .
        2R> ( current line) 2DROP 2R>   \ Drop the current line, fetch the rest.
    OVER 0= UNTIL                       \ Stop when the rest is a NULL-string.
    2DROP
;
The results of $/ are swapped in order to access the current line using the standard word 2R@. The rest must be fetched back from the return stack before it can be tested; the loop ends when $/ has found no more delimiters and left a NULL-string.
'POSTPONE ALIAS %
: I$     % 2R@ ; IMMEDIATE
: $|DO   % >R % BEGIN % R@ % $/ % 2SWAP % 2>R % 2>R ; IMMEDIATE
: $DO    ^J % LITERAL % $|DO ; IMMEDIATE
: $LOOP  % 2R> % 2DROP % 2R> % OVER % 0= % UNTIL % 2DROP % RDROP ; IMMEDIATE
'% HIDDEN
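The same loop structure in C, as a sketch: split plays the role of $/ (the part before the first delimiter, and the rest, which is a NULL-string when there is no delimiter), and concat_parts runs a loop body once per part until the rest is a NULL-string. The names and the struct sc type are made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

struct sc { const char *ptr; size_t len; };   /* ptr == NULL: NULL-string */

/* Like $/ : split s at the first delim.  Without a delimiter, *before
   is all of s and *after is the NULL-string. */
static void split(struct sc s, char delim, struct sc *before, struct sc *after)
{
    const char *hit = s.ptr ? memchr(s.ptr, delim, s.len) : NULL;
    before->ptr = s.ptr;
    if (hit == NULL) {
        before->len = s.len;
        after->ptr = NULL;
        after->len = 0;
    } else {
        before->len = (size_t)(hit - s.ptr);
        after->ptr = hit + 1;
        after->len = s.len - before->len - 1;
    }
}

/* Like  &| $|DO I$ TYPE $LOOP  with "TYPE" appending to out.
   Returns the number of parts; out must be large enough. */
static size_t concat_parts(struct sc s, char delim, char *out)
{
    size_t count = 0, n = 0;
    struct sc part, rest = s;
    do {
        split(rest, delim, &part, &rest);          /* current part, new rest */
        if (part.len > 0)
            memcpy(out + n, part.ptr, part.len);   /* the loop body */
        n += part.len;
        count++;
    } while (rest.ptr != NULL);                    /* stop at the NULL-string */
    out[n] = '\0';
    return count;
}
```

With "A|" the loop visits two parts, "A" and an empty string: the empty rest after the last | is not a NULL-string, so one more (empty) part is seen.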
Now if we assume that T] and T[ are present, we can add words that do the looping even in interpret mode. These words compile to a temporary area.
WANT T[
: $do    T] POSTPONE $DO ; IMMEDIATE
: $|do   T] POSTPONE $|DO ; IMMEDIATE
: $loop  POSTPONE $LOOP POSTPONE T[ ; IMMEDIATE
: TEST $DO I$ TYPE $LOOP ;
"AAP" TEST
AAP OK
This is an example of splitting a string on a delimiter '|':
: TEST2 &| $|DO I$ TYPE $LOOP ;
"A|B|C|D|E|F" TEST2
ABCDEF OK
or shorter
"A|B|C|D|E|F" &| $|do I$ TYPE $loop
ABCDEF OK
Alternatively
"A|B|C|D|E|F" BEGIN &| $/ OVER WHILE TYPE REPEAT 2DROP
Once you recognize that you can discriminate between an empty string and a non-existing string, there is hardly any merit in a $DO .. $LOOP construct.
Print all lines of the file "aap" that are not empty:
"aap" GET-FILE $do I$ -TRAILING DUP IF TYPE CR ELSE 2DROP THEN $loop
Print all lines that do not start with "\ ":
"x.frt" GET-FILE $do I$ OVER "\ " CORA IF TYPE CR ELSE 2DROP THEN $loop
How to handle regular expressions in Forth.
As you all know, the classic way to implement regular expressions
(Kernighan & Pike FORTRAN techniques) is to
compile the regular expression string into an intermediate code that
is then interpreted, like this.
"ab*[ab]" becomes, in C:
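enum { MATCH_ONE = 256, MATCH_ONE_MULTIPLE, MATCH_SET };  /* illustrative opcodes above the char range */
int imp[] = { MATCH_ONE, 'a', MATCH_ONE_MULTIPLE, 'b', MATCH_SET, 2, 'a', 'b' };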
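To make the intermediate code concrete, here is a small C interpreter for just the three instructions of this example. It is a sketch under assumptions: the opcode values, the function names and the convention that run() matches at the start of the text and returns the matched length (or -1) are made up for this illustration, not taken from any particular package.

```c
#include <stddef.h>

/* Opcode values are assumptions, chosen above the character range. */
enum { MATCH_ONE = 256, MATCH_ONE_MULTIPLE, MATCH_SET };

/* Number of ints taken by the instruction at code[0]. */
static size_t insn_len(const int *code)
{
    return code[0] == MATCH_SET ? 2 + (size_t)code[1] : 2;
}

/* Does the single-character part of the instruction accept c? */
static int accepts(const int *code, char c)
{
    if (code[0] == MATCH_SET) {
        for (int i = 0; i < code[1]; i++)
            if (code[2 + i] == c)
                return 1;
        return 0;
    }
    return code[1] == c;                /* MATCH_ONE, MATCH_ONE_MULTIPLE */
}

/* Interpret ncode ints of compiled expression against text, anchored at
   the start of text.  Returns the length of the first match found by
   greedy backtracking, or -1 if there is none. */
static long run(const int *code, size_t ncode, const char *text)
{
    if (ncode == 0)
        return 0;                       /* nothing left to match */
    size_t len = insn_len(code);
    if (code[0] == MATCH_ONE_MULTIPLE) {
        size_t n = 0;                   /* take as many as possible ... */
        while (text[n] != '\0' && accepts(code, text[n]))
            n++;
        for (;;) {                      /* ... then give them back one by one */
            long rest = run(code + len, ncode - len, text + n);
            if (rest >= 0)
                return (long)n + rest;
            if (n == 0)
                return -1;
            n--;
        }
    }
    if (*text == '\0' || !accepts(code, *text))
        return -1;
    long rest = run(code + len, ncode - len, text + 1);
    return rest >= 0 ? rest + 1 : -1;
}
```

On "abbbc" the star first takes all three b's, the set then fails on c, and one b is given back so that the set can match it: the result is 4. This backtracking is what a direct compilation to Forth code also has to provide.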
But if you imagine that MATCH-xxx is a Forth word that handles inline arguments, it becomes clear that you want to compile the string to Forth code immediately. It becomes
POSTPONE MATCH-ONE [ CHAR a COMPILE, ] ...
However, this somehow doesn't work out.
( CP EP -- CP' EP' FLAG )
Where CP points to the characters and EP to the regular expression.
If there is a match, CP is advanced to CP' and EP to EP', and true is
returned as the FLAG.
Otherwise the pointers are left as is, and false is returned.
See the words RE-MATCH and RE-REPLACE for usage.
The following aspects are handled:
Specific to Forth is that < and > and \w all observe Forth white space. For this the Forth system must supply a word ?BLANK that returns for a character whether it is considered blank in this Forth ( ch -- flag).
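A C sketch of word-boundary tests under that rule. It assumes, as a guess at what ?BLANK typically does, that every character with a code up to 32 counts as blank; forth_blank, at_word_start and at_word_end are hypothetical names.

```c
#include <stddef.h>

/* Assumption: like many Forth systems, treat every character with a
   code from 0 up to 32 (the space and the control characters) as blank. */
static int forth_blank(unsigned char c)
{
    return c <= ' ';
}

/* Start of a word: position i holds a non-blank and is either the
   start of the text or preceded by a blank. */
static int at_word_start(const char *s, size_t i)
{
    return !forth_blank((unsigned char)s[i]) &&
           (i == 0 || forth_blank((unsigned char)s[i - 1]));
}

/* End of a word: position i follows a non-blank and holds a blank;
   the terminating NUL (code 0) counts as blank too. */
static int at_word_end(const char *s, size_t i)
{
    return i > 0 && !forth_blank((unsigned char)s[i - 1]) &&
           forth_blank((unsigned char)s[i]);
}
```

In "DUP 0= IF" the characters 0= form one word under this rule, whereas a conventional \w would stop at the =.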
( CP EP -- CP' EP' FLAG ) as explained above.
Simple matchers advance EP one item.
Quantified matchers match against the whole remaining expression and handle backtracking.
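The same convention can be sketched in C for a tiny subset: literal characters, . and a * after an item. This is not the Forth package; the function names are hypothetical, and the expression is interpreted directly instead of being compiled. On failure the caller's pointers are left untouched, as in the stack diagram above.

```c
#include <stdbool.h>
#include <stddef.h>

static bool match_rest(const char **cp, const char **ep);

/* Simple matcher: a literal or '.' advances CP by one char and EP by
   one item; on failure both are left as is. */
static bool match_simple(const char **cp, const char **ep)
{
    char e = **ep, c = **cp;
    if (e == '\0' || c == '\0' || (e != '.' && e != c))
        return false;
    (*cp)++;
    (*ep)++;
    return true;
}

/* Quantified matcher for "x*": take the longest run first, then back
   off, each time matching against the whole remaining expression. */
static bool match_star(const char **cp, const char **ep)
{
    const char *item = *ep, *after = *ep + 2;   /* skip "x*" */
    const char *c = *cp;
    size_t n = 0;
    while (c[n] != '\0' && (*item == '.' || c[n] == *item))
        n++;
    for (;;) {
        const char *cp2 = c + n, *ep2 = after;
        if (match_rest(&cp2, &ep2)) {
            *cp = cp2;
            *ep = ep2;
            return true;
        }
        if (n == 0)
            return false;
        n--;
    }
}

/* Match the whole remaining expression, committing only on success. */
static bool match_rest(const char **cp, const char **ep)
{
    const char *c = *cp, *e = *ep;
    for (;;) {
        if (*e == '\0') {                       /* expression used up */
            *cp = c;
            *ep = e;
            return true;
        }
        if (e[1] == '*') {                      /* quantifier takes the rest */
            if (!match_star(&c, &e))
                return false;
            *cp = c;
            *ep = e;
            return true;
        }
        if (!match_simple(&c, &e))
            return false;                       /* caller's pointers unchanged */
    }
}
```

Matching "a*ab" against "aaab" shows the backtracking: the star first takes three a's, fails on the remaining "ab", and succeeds after giving one a back.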
You can get the source here.
It presupposes some other small wordsets, so you may prefer
to get the archive.
Even if you don't want these regular expressions you may
want the test set.
RE-MATCH ( sc1 sc2 -- flag )
For STRING and regular expression STRING:
"there IS a match". \0 ..\9 are filled in.
RE-MATCH" ( sc "expression" -- flag)
Only to be used while compiling.
For STRING and "inline regular expression":
"there IS a match". \0 ..\9 are filled in.
RE-REPLACE ( sc -- sc' )
Use the replacement STRING to replace the matched part of the most recent call
of RE-MATCH. Leave the replaced string. This is a static buffer, and
must be copied before it is passed to words in this package.
Other Forth lectures
Go to the home page of Albert van der Horst