Thoughts On JVM-based Forth Implementation
Is it possible, and is it worth it, to implement Forth on top of the JVM?
Forth: a language that looks as weird as it is simple and powerful, and it looks extremely weird!
I first came to know Forth with the help of my good friend Michael, a.k.a. ttmrichter, and Forth immediately placed itself at the top of my mental pyramid of favourite languages, alongside only the Lisp family. Since then, I've been eager to do serious coding in Forth. However, there's a problem: most Forth programmers work in systems and chip programming, which means that there are hardly any higher-level libraries (e.g. UI toolkits or database connectors) for someone like me to try Forth in business applications. It's definitely possible to write all those libraries from scratch, but even the thought of it turns my stomach!
Recently I've been thinking about implementing Forth on the JVM. A JVM Forth would have several advantages:
- It will be platform independent.
- It will have access to tons of existing libraries.
- It will be naturally garbage collected.
Design Overview
The most important expectations from the design are:
- It must behave like a Forth!
- It must be interoperable with the JVM.
- Cells are objects instead of memory addresses.
Simple Words
1 2 +
The interpreter reads `1`, checks the dictionary and, finding nothing there, passes it on to the “number runner”. The number runner converts `1` to a `BigInteger` and pushes it onto the stack. The same happens for `2`.
Now the interpreter reaches `+`. The “xt” (execution token) for `+` is not a memory address; rather, it's a `Method`. The interpreter invokes `+` on all the `BigInteger`s on top of the stack (that's how the arity hassle is solved for now), popping them from the stack as it goes, and puts the result (another `BigInteger`) back onto the stack. So far so good.
Defining Words
Now consider this case:
: FOO ( n - n ) 2 * ;
This is easy too. The interpreter reads `:`, looks it up in the dictionary and finds its xt, which is again a `Method`. The method adds a node to the `LinkedList` and creates a new entry in the `Hashtable` pointing to that node. It then reads `2`, passes it to the number runner and puts the result in the node. Next, `*` is read, which, when looked up in the dictionary, is translated into a `Method` and put in a new node. Finally `;` is read: another method, which returns execution to the interpreter. See figure 2.
[Figure 2]
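To make the walkthrough so far a bit more concrete, here is a minimal sketch in Java. It is only an illustration under several assumptions, not the actual design: the class name `TinyForth`, the `Xt` interface (a stand-in for the reflective `Method`), the `EXIT` marker and the helper names are all hypothetical, and `:` is handled directly by the interpreter loop instead of being looked up as a dictionary word.

```java
import java.math.BigInteger;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Hashtable;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.function.BinaryOperator;

public class TinyForth {
    /** Hypothetical stand-in for the reflective Method used as an xt. */
    interface Xt { void run(TinyForth vm); }

    /** Hypothetical marker node that ends a colon definition. */
    private static final Object EXIT = new Object();

    final Deque<Object> stack = new ArrayDeque<>();                  // cells are objects
    final LinkedList<Object> nodes = new LinkedList<>();             // literals and xts
    final Hashtable<String, Object> dictionary = new Hashtable<>();  // name -> Xt or node index

    TinyForth() {
        dictionary.put("+", (Xt) vm -> vm.fold(BigInteger.ZERO, BigInteger::add));
        dictionary.put("*", (Xt) vm -> vm.fold(BigInteger.ONE, BigInteger::multiply));
    }

    /** Pops every BigInteger on top of the stack and pushes the combined result. */
    void fold(BigInteger seed, BinaryOperator<BigInteger> op) {
        BigInteger acc = seed;
        while (stack.peek() instanceof BigInteger) {
            acc = op.apply((BigInteger) stack.pop(), acc);
        }
        stack.push(acc);
    }

    /** The "number runner": converts a token into a BigInteger cell. */
    static BigInteger numberRunner(String token) {
        return new BigInteger(token);
    }

    /** Dictionary first; anything not found goes to the number runner. */
    Object resolve(String token) {
        Object entry = dictionary.get(token);
        return entry != null ? entry : numberRunner(token);
    }

    void interpret(String source) {
        Iterator<String> tokens = Arrays.asList(source.trim().split("\\s+")).iterator();
        while (tokens.hasNext()) {
            String token = tokens.next();
            if (token.equals(":")) {
                define(tokens);
            } else {
                execute(resolve(token));
            }
        }
    }

    /** Compiles ": NAME ... ;" into nodes and records where they start. */
    void define(Iterator<String> tokens) {
        String name = tokens.next();
        dictionary.put(name, nodes.size());           // entry points at the first node
        for (String t = tokens.next(); !t.equals(";"); t = tokens.next()) {
            if (t.equals("(")) {                      // skip a ( stack comment )
                while (!tokens.next().equals(")")) { /* discard comment tokens */ }
                continue;
            }
            nodes.add(resolve(t));
        }
        nodes.add(EXIT);
    }

    void execute(Object cell) {
        if (cell instanceof Xt) {
            ((Xt) cell).run(this);                    // primitive word
        } else if (cell instanceof Integer) {         // colon definition: walk its nodes
            for (int i = (Integer) cell; nodes.get(i) != EXIT; i++) {
                execute(nodes.get(i));
            }
        } else {
            stack.push(cell);                         // literal
        }
    }

    public static void main(String[] args) {
        TinyForth vm = new TinyForth();
        vm.interpret("1 2 +");
        System.out.println(vm.stack.peek());          // prints 3
        vm.interpret(": FOO ( n - n ) 2 * ; FOO");
        System.out.println(vm.stack.peek());          // prints 6
    }
}
```

Because `*` consumes every `BigInteger` on top of the stack, `FOO` here multiplies whatever numbers are on the stack by 2, which matches the "invoke on all BigIntegers" shortcut described above rather than standard Forth arity.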
Words dealing with memory
The same logic we've seen so far can apply to words like `HERE`, which pushes the index of its next node on the `LinkedList` onto the stack, for example to be fetched later with `@` [`LinkedList.get()`] or replaced with `!` [`LinkedList.set()`].
Defining "defining words"
Let's examine almost the simplest possible defining word and its execution.
: BAR CREATE C, DOES> C@ ;
10 BAR MY-BAR
MY-BAR
Figure 3 shows the internal state prior to reading this chunk of code.
[Figure 3]
When line 1 is read, nothing special happens. Another entry is added to the `Hashtable` and a series of nodes is appended to the `LinkedList`. See figure 4.
[Figure 4]
But when line 2 is read, the structure changes as shown in figure 5.
[Figure 5]
Now it's easy to guess what happens when line 3 is read. The index of `BigInteger(10)` is pushed onto the stack, control jumps to the node with the xt address of `BAR`, the content of the address (the index of the node containing 10) is fetched and pushed onto the stack, and control hits two consecutive `EXIT`s.
Java interoperability
Adding Java interop to this design is easy. When the interpreter does not find a word in the dictionary, it must look that name up among the Java classes currently available on the classpath before raising an error.
Consider this case:
10 System.out.print
Assuming that `System.out.print` has not already been defined as a word, the interpreter must invoke the Java method with the same name and package.
The curious reader might ask: what if we have overloaded methods with different arities or argument types? Good question. Imagine the following Java class (which we assume is on the classpath).
public class MyClass {
    public static int myMethod(int a, int b) {
        // ...
    }
    public static int myMethod(int a, int b, int c) {
        // ...
    }
    public static int myMethod(int a, float b) {
        // ...
    }
}
Each of them can be called as shown below:
10 20 30 MyClass.myMethod/2 \ myMethod(int, int)
10 20 30 MyClass.myMethod/3 \ myMethod(int, int, int)
10 20 30.1 MyClass.myMethod/2 \ myMethod(int, float)
On lines 1 and 3, using the `/` notation, we tell the interpreter to execute the method of arity 2. The interpreter should decide which version of `myMethod/2` to invoke based on the types of the top two items on the stack. Line 2 is too easy to explain!
Using the same idea, we can instantiate Java classes and use them just like native Forth words. For example:
"/home/bahman/somefile.txt" java.io.File. \ creates a new File object. Notice the extra “.” after File
.delete \ deletes the file.
\ The extra “.” in front of 'delete' tells the interpreter that this is a member method of the topmost item on the stack
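The `/` arity lookup described earlier in this section could be sketched with Java reflection roughly as follows. This is a sketch under assumptions, not a definitive implementation: `InteropSketch`, `call`, `accepts` and `wrap` are hypothetical names, the bodies of `myMethod` are made up so the overloads can be told apart, the stack holds boxed `Integer` and `Float` cells rather than `BigInteger`s, and the owning class is passed in directly instead of being resolved from the classpath by name.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Deque;

public class InteropSketch {
    /** Stand-in for the article's MyClass; the method bodies are invented for the demo. */
    public static class MyClass {
        public static int myMethod(int a, int b) { return a + b; }
        public static int myMethod(int a, int b, int c) { return a + b + c; }
        public static int myMethod(int a, float b) { return a * Math.round(b); }
    }

    /** Handles a word such as "myMethod/2": pops 2 cells and invokes the matching overload. */
    static void call(Class<?> owner, String word, Deque<Object> stack) throws Exception {
        int slash = word.lastIndexOf('/');
        String name = word.substring(0, slash);
        int arity = Integer.parseInt(word.substring(slash + 1));

        Object[] args = new Object[arity];
        for (int i = arity - 1; i >= 0; i--) {
            args[i] = stack.pop();                    // top of stack is the last argument
        }
        for (Method m : owner.getMethods()) {
            if (m.getName().equals(name)
                    && Modifier.isStatic(m.getModifiers())
                    && m.getParameterCount() == arity
                    && accepts(m.getParameterTypes(), args)) {
                Object result = m.invoke(null, args);
                if (m.getReturnType() != void.class) {
                    stack.push(result);               // result goes back onto the stack
                }
                return;
            }
        }
        throw new NoSuchMethodException(word);
    }

    /** True when every argument's runtime type matches the parameter type exactly. */
    static boolean accepts(Class<?>[] params, Object[] args) {
        for (int i = 0; i < params.length; i++) {
            if (wrap(params[i]) != args[i].getClass()) {
                return false;
            }
        }
        return true;
    }

    /** Maps the two primitive types used in this demo to their wrapper classes. */
    static Class<?> wrap(Class<?> type) {
        if (type == int.class) return Integer.class;
        if (type == float.class) return Float.class;
        return type;
    }

    public static void main(String[] argv) throws Exception {
        Deque<Object> stack = new ArrayDeque<>();
        stack.push(10); stack.push(20); stack.push(30);
        call(MyClass.class, "myMethod/2", stack);     // myMethod(int, int) on 20 and 30
        System.out.println(stack.peek());             // prints 50

        stack.clear();
        stack.push(10); stack.push(20); stack.push(30.1f);
        call(MyClass.class, "myMethod/2", stack);     // myMethod(int, float) on 20 and 30.1
        System.out.println(stack.peek());             // prints 600
    }
}
```

Exact type matching keeps the sketch short; a real interpreter would also need a policy for widening conversions and for what to do when more than one overload accepts the arguments.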
Conclusion?
Before concluding anything, I'd like to ask all seasoned Forth programmers who read this article to share their opinions on the "expectations", especially on whether this design will “remain” a Forth.