Thoughts On JVM-based Forth Implementation
Forth: A language which looks as weird as it is simple and powerful --and it looks extremely weird!
I, first, came to know Forth with help of my good friend Michael a.k.a. ttmrichter and immediately Forth placed itself on top of the favourite language pyramid in my mind, alongside only Lisp family. Since then, I've been eager to do serious coding in Forth. However, there's a problem: most of Forth'ers work in the system/chip programming field which means that there are hardly any higher level libraries (e.g. UI toolkits or database connectors) for someone like me to try Forth in business applications. It's definitely possible to write all those libraries starting from the scratch but even the thought of it turns my stomach!
Recently I've been thinking about implementing Forth on JVM. JVM Forth has several advantages:
- It will be platform independent.
- It will have access to tons of existing libraries.
- It will be naturally garbage collected.
- it must behave like a Forth!
- it must be inter-operable with JVM
And the key design trick is:
- Cells are objects instead of memory addresses.
I'll try to explain the overall design with common use cases.
|Fig 1: Internal data structures (rev #1)|
|Fig 2: Internal data structure with FOO defined (rev #2)|
Consider the simplest case:
The interpreter reads
1. Checks the dictionary and passes it on to the “number runner” when it finds nothing in dictionary. Number runner converts
1 to a
BigInteger and pushes it onto the stack. The same happens for
Now the interpreter reaches +. The “xt” (execution token) for + is not a memory address, rather it's a
Method. The interpreter
invokes + on all
BigIntegers on top of stack -that's how the arity hassle is solved for now- while popping them from the stack and put the result (another
BigInteger) back onto the stack. So far so good.
Now consider this case:
This is easy too. The interpreter reads
:, looks it up in the dictionary and finds its xt, which is a method again. The method adds a node to the
LinkedList and creates a new entry in the
Hashtable to point to the node. It then reads
2, passes it to the number runner and puts the result in the node. Next,
* is read which when looked up in the dictionary is translated into a
Method and put in a new node. Finally
; is read: another method which returns the execution to the interpreter. See figure 2.
Words dealing with memory
The same logic, as we've seen so far, can apply to words like
HERE which pushes the index of its next node on the
LinkedList onto the stack -for example to be later fetched [
@ or replaced [
Defining “defining words”
Let's examine the almost simplest defining word possible and its execution.
Figure 3 is the internal status prior to reading this chunk of code.
When line 1 is read, nothing special happens. Another entry is added to
Hashtable and a series of nodes are appended to the
LinkedList. See figure 4.
But when line 2 is read, the structure changes as shown in figure 5.
Now it's easy to guess what happens when line 3 is read. The index of
BigInteger(10) is pushed onto the stack, the control jumps to the node with address of
BAR xt, the content of the address (index of node containing 10) is fetched and pushed onto the stack and the control hits 2 consecutive
|Fig 3: Internal data structures before reading the chunk||Fig 4: Internal state after reading BAR||Fig 5: Internal state after reading MY-BAR|
Adding Java inter-op to this design is easy. The interpreter, in case of not finding a word in dictionary, must lookup the currently available Java classes on the classpath for that name before popping up an error.
Consider this case:
System.out.print has not been already defined as a word, the interpreter must invoke the Java method with the same name & package.
The curious reader might ask what if we have overloaded functions with different arities/argument types? Good question. Imagine the following Java class (which we assume is on the classpath).
The approach to be able to call each of them is like below:
On lines 1 and 3, using the / notation we tell the interpreter to execute the method of arity 2. The interpreter should, based on the type of the last two items on stack, decide which version of myMethod/2 it should invoke. Line 2 is too easy to explain!
Using the same idea, we can instantiate Java classes and use them just like native Forth words. For example:
Before concluding anything, I'd like to ask all seasoned Forth programmers who read this article, to share their opinions on the expectations, specially if this design will “remain” a Forth?