Thoughts On JVM-based Forth Implementation

Is it possible and worth to implement Forth on top of JVM?

Forth: A language which looks as weird as it is simple and powerful --and it looks extremely weird!

I, first, came to know Forth with help of my good friend Michael a.k.a. ttmrichter and immediately Forth placed itself on top of the favourite language pyramid in my mind, alongside only Lisp family. Since then, I've been eager to do serious coding in Forth. However, there's a problem: most of Forth'ers work in the system/chip programming field which means that there are hardly any higher level libraries (e.g. UI toolkits or database connectors) for someone like me to try Forth in business applications. It's definitely possible to write all those libraries starting from the scratch but even the thought of it turns my stomach!

Recently I've been thinking about implementing Forth on JVM. JVM Forth has several advantages:

It will be platform independent.
It will have access to tons of existing libraries.
It will be naturally garbage collected.

Design Overview

The most important expectations from the design are:

it must behave like a Forth!
it must be inter-operable with JVM

And the key design trick is:

Cells are objects instead of memory addresses.

I'll try to explain the overall design with common use cases.

Simple Words

First, let's assume that dictionary is -for faster execution- a Hashtable each of its entries pointing to indices in a LinkedList which is the real dictionary in Forth's terms. See figure 1.

Figure 1

Consider the simplest case:

1 2 +

The interpreter reads 1. Checks the dictionary and passes it on to the “number runner” when it finds nothing in dictionary. Number runner converts 1 to a BigInteger and pushes it onto the stack. The same happens for 2.

Now the interpreter reaches +. The “xt” (execution token) for + is not a memory address, rather it's a Method. The interpreter invokes + on all BigIntegers on top of stack -that's how the arity hassle is solved for now- while popping them from the stack and put the result (another BigInteger) back onto the stack. So far so good.

Defining Words

Now consider this case:

: FOO ( n - n ) 2 * ;

This is easy too. The interpreter reads :, looks it up in the dictionary and finds its xt, which is a method again. The method adds a node to the LinkedList and creates a new entry in the Hashtable to point to the node. It then reads 2, passes it to the number runner and puts the result in the node. Next, * is read which when looked up in the dictionary is translated into a Method and put in a new node. Finally ; is read: another method which returns the execution to the interpreter. See figure 2.

Figure 2

Words dealing with memory

The same logic, as we've seen so far, can apply to words like HERE which pushes the index of its next node on the LinkedList onto the stack -for example to be later fetched [LinkedList.get()] with @ or replaced [LinkedList.set()] with !.

Defining "defining words"

Let's examine the almost simplest defining word possible and its execution.

: BAR CREATE C, DOES> C@ ;
10 BAR MY-BAR
MY-BAR

Figure 3 is the internal status prior to reading this chunk of code.

Figure 3

When line 1 is read, nothing special happens. Another entry is added to Hashtable and a series of nodes are appended to the LinkedList. See figure 4.

Figure 4

But when line 2 is read, the structure changes as shown in figure 5.

Figure 5

Now it's easy to guess what happens when line 3 is read. The index of BigInteger(10) is pushed onto the stack, the control jumps to the node with xt address of BAR, the content of the address (index of node containing 10) is fetched and pushed onto the stack and the control hits 2 consecutive EXITs.

Java interoperability

Adding Java inter-op to this design is easy. The interpreter, in case of not finding a word in dictionary, must lookup the currently available Java classes on the classpath for that name before popping up an error.

Consider this case:

10 System.out.print

Assuming that System.out.print has not been already defined as a word, the interpreter must invoke the Java method with the same name & package.

The curious reader might ask what if we have overloaded functions with different arities/argument types? Good question. Imagine the following Java class (which we assume is on the classpath).

public class MyClass {
    public static int myMethod(int, int) {
        // ...
    }

    public static int myMethod(int, int, int) {
        // ...
    }

    public static int myMethod(int, float) {
        // ...
    }
}

The approach to be able to call each of them is like below:

10 20 30   MyClass.myMethod/2    \ myMethod(int, int)
10 20 30   MyClass.myMethod/3    \ myMethod(int, int, int)
10 20 30.1 MyClass.myMethod/2    \ myMethod(int, float)

On lines 1 and 3, using the / notation we tell the interpreter to execute the method of arity 2. The interpreter should, based on the type of the last two items on stack, decide which version of myMethod/2 it should invoke. Line 2 is too easy to explain!

Using the same idea, we can instantiate Java classes and use them just like native Forth words. For example:

"/home/bahman/somefile.txt" java.io.File. \ creates a new File object. Notice the the extra “.” after File
.delete  \ deletes the file.
         \ The extra “.” in front of 'delete' tells the interpreter that this is a member method of the topmost item on the stack

Conclusion?

Before concluding anything, I'd like to ask all seasoned Forth programmers who read this article, to share their opinions on the "expectations", specially if this design will “remain” a Forth?

Search...

Bahman's Musings