You are here: Home / Blogs / Thoughts On JVM-based Forth Implementation

Thoughts On JVM-based Forth Implementation

Is it possible and worth to implement Forth on top of JVM?

Forth: A language which looks as weird as it is simple and powerful --and it looks extremely weird!

I, first, came to know Forth with help of my good friend Michael a.k.a. ttmrichter and immediately Forth placed itself on top of the favourite language pyramid in my mind, alongside only Lisp family. Since then, I've been eager to do serious coding in Forth. However, there's a problem: most of Forth'ers work in the system/chip programming field which means that there are hardly any higher level libraries (e.g. UI toolkits or database connectors) for someone like me to try Forth in business applications. It's definitely possible to write all those libraries starting from the scratch but even the thought of it turns my stomach!

Recently I've been thinking about implementing Forth on JVM. JVM Forth has several advantages:

  • It will be platform independent.
  • It will have access to tons of existing libraries.
  • It will be naturally garbage collected.


Design overview

The most important expectations from the design are:

  • it must behave like a Forth!
  • it must be inter-operable with JVM

And the key design trick is:

  • Cells are objects instead of memory addresses.

I'll try to explain the overall design with common use cases.


Simple words

JVM Forth Sample Internal Data Structures - 1
Fig 1: Internal data structures (rev #1)

Fig2: Internal data structure with FOO defined (rev #2)
Fig 2: Internal data structure with FOO defined (rev #2)

First, let's assume that dictionary is -for faster execution- a Hashtable each of its entries pointing to indices in a LinkedList which is the real dictionary in Forth's terms. See figure 1.

Consider the simplest case:

1 2 + 

The interpreter reads 1. Checks the dictionary and passes it on to the “number runner” when it finds nothing in dictionary. Number runner converts 1 to a BigInteger and pushes it onto the stack. The same happens for 2.

Now the interpreter reaches +. The “xt” (execution token) for + is not a memory address, rather it's a Method. The interpreter invokes + on all BigIntegers on top of stack -that's how the arity hassle is solved for now- while popping them from the stack and put the result (another BigInteger) back onto the stack. So far so good.


Defining words

Now consider this case:

: FOO ( n - n ) 2 * ;

This is easy too. The interpreter reads :, looks it up in the dictionary and finds its xt, which is a method again. The method adds a node to the LinkedList and creates a new entry in the Hashtable to point to the node. It then reads 2, passes it to the number runner and puts the result in the node. Next, * is read which when looked up in the dictionary is translated into a Method and put in a new node. Finally ; is read: another method which returns the execution to the interpreter. See figure 2.


Words dealing with memory

The same logic, as we've seen so far, can apply to words like HERE which pushes the index of its next node on the LinkedList onto the stack -for example to be later fetched [LinkedList.get()] with @ or replaced [LinkedList.set()]with !.


Defining “defining words”

Let's examine the almost simplest defining word possible and its execution.


Figure 3 is the internal status prior to reading this chunk of code.

When line 1 is read, nothing special happens. Another entry is added to Hashtable and a series of nodes are appended to the LinkedList. See figure 4.

But when line 2 is read, the structure changes as shown in figure 5.

Now it's easy to guess what happens when line 3 is read. The index of BigInteger(10) is pushed onto the stack, the control jumps to the node with address of BAR xt, the content of the address (index of node containing 10) is fetched and pushed onto the stack and the control hits 2 consecutive EXITs.

Fig 3: Internal data structures before reading the chunk Fig 4: Internal state after reading BAR Fig 5: Internal state after reading MY-BAR
Fig 3: Internal data structures before reading the chunk Fig 4: Internal state after reading BAR Fig 5: Internal state after reading MY-BAR


Java Inter-operability

Adding Java inter-op to this design is easy. The interpreter, in case of not finding a word in dictionary, must lookup the currently available Java classes on the classpath for that name before popping up an error.

Consider this case:

10 System.out.print

Assuming that System.out.print has not been already defined as a word, the interpreter must invoke the Java method with the same name & package.

The curious reader might ask what if we have overloaded functions with different arities/argument types? Good question.  Imagine the following Java class (which we assume is on the classpath).

public class MyClass {
    public static int myMethod(int, int) {
        // ...

    public static int myMethod(int, int, int) {
        // ...

    public static int myMethod(int, float) {
        // ...

The approach to be able to call each of them is like below:

10 20 30   MyClass.myMethod/2    \ myMethod(int, int)
10 20 30   MyClass.myMethod/3    \ myMethod(int, int, int)
10 20 30.1 MyClass.myMethod/2    \ myMethod(int, float)

On lines 1 and 3, using the / notation we tell the interpreter to execute the method of arity 2. The interpreter should, based on the type of the last two items on stack, decide which version of myMethod/2 it should invoke. Line 2 is too easy to explain!

Using the same idea, we can instantiate Java classes and use them just like native Forth words. For example:

"/home/bahman/somefile.txt" \ creates a new File object. Notice the the extra “.” after File
.delete  \ deletes the file.
         \ The extra “.” in front of 'delete' tells the interpreter that this is a member method of the topmost item on the stack



Before concluding anything, I'd like to ask all seasoned Forth programmers who read this article, to share their opinions on the expectations, specially if this design will “remain” a Forth?

Kaplan says:
Jun 22, 2018 08:59 AM

Spooky – I had quite similar ideas about a RPN language. What on todays computers is not really a point anymore, are space and speed. The main reason for a portable RPN language is Charles Henry Moore's thesis "Problem Oriented Language". Okay, You can program a POL also in a language like eg. groovy. But for smaller tasks it is an overkill and the resulting source is incomprehensibly for others. My thoughts resulted in the programming language 7th []. Of course 7th is interoperable with Java by supporting the ScriptEnging design pattern.
By the way, my data stack is a synchronized ArrayList.

Anonymous says:
Aug 08, 2018 08:08 PM

Nicely made! I had similar ideas for “7th”, but I was mainly interested in the POL (Problem Oriented Language) aspect. So I left the whole machine dependent stuff out. I made instead 7th object-oriented, type-safe and added array, map and complex numbers.
Try it out. You need only to download the Seventh.jar file – – and start…

Add comment

You can add a comment by filling out the form below. Plain text formatting. Comments are moderated.

Question: What is 8 + 1 - 7 ?
Your answer: