[ The following is based on that assumption.

However I recently came to doubt the approach following. A practical working source of regular expressions can be found here , and it is based on the observation that compiled inline arguments get too complicated. ]
Most Forths handle in line arguments, such as for ." ).

The code for MATCH-ONE is rather obvious :

\ If the first character of the STRINCCONSTANT matches
\ the inline character, leave the STRINGCONSTANT with out the
\ matching character, else leave the original STRINGCONSTANT.
\ And leave a flag, it MATCHED.
: MATCH-ONE
   R> @+ SWAP >R     \ Get in line character
   >R OVER C@ R> = IF
        1- SWAP 1+ SWAP TRUE
   ELSE
        FALSE
   THEN
;

It becomes also clear that we have to compile

MATCH-ONE [ CHAR A COMPILE, ] ?EXIT

(Or we could replace the FALSE above with a FALSE ??EXIT something like : ??EXIT >R >R RDROP >R >R ; ) I guess that is dangerous, because of all the stuff that is or isn't stored at the return stack .

Now for the difficult part.

MATCH-ONE-MULTIPLE must be retraced if too many are matched. In the first place this means it must be a coroutine call, othwerwise it will never regain control to inspect whether there was a match.

: MATCH-ONE-MULTIPLE
     ( fetch in line )
     ( match first part of string, i.e. multiple same chars )
     ( leave flag and co-return )
\ Get back here after matching the remainder of the string.
\ This is the responsability of the caller.
     ( remainder matched?)
     BEGIN DUP 0= CHARS-LEFT-TO-MATCH? AND WHILE
        ( match one character less)
        ( backtrack ???? )
     REPEAT
     ( clean up)
     ( leave true flag)
;

(MATCH-ONE-MULTIPLE always leaves a true flag, because it is always possible to match zero characters.) The backtracking is the main problem. The code as it stands now has no way to know where the code for the remaining match hangs out. Previously this was just the inline code after the MATCH-ONE-MULTIPLE to be found via R@. But there is a solution : keep the return address ( on the stack of course) for backtracking : copy that address to the return stack do a coroutine return.

With this technique regular expressions become a (tedious) exercice for the reader. (This is not(!) true.)


Go to the home page of Albert van der Horst