More Thoughts On A JVM-Based Forth

Wrapping up the fundamental design challenges for a Forth implementation on the JVM.

Introduction

This post continues the ideas from my previous post, Thoughts On JVM-based Forth Implementation (please read it first if you haven't done so yet), which introduced a general execution model and an overall stack design. However, three major topics were either discussed only superficially or left untouched. They are covered in this post.

Level of Abstraction

Forth was designed in a period when RAM was an incredibly expensive resource to equip your system with; during the absolute reign of micro-controllers and custom-made boards/systems where 64KB of RAM would have been considered a luxury. Therefore it was designed with almost no abstractions over the raw hardware to allow full utilisation of it by the systems programmer.

Nowadays, however, 2GB of RAM is a normal specification for even a cheap laptop[1]. And the general focus of programming, thanks to higher-level facilities like garbage collection[2], has shifted from managing resources to processing data and interacting with external resources. This requires a modern Forth to raise its abstractions to a new level.
Instead of dealing with bits, bytes and raw memory addresses, JVM-based Forth code should not need to deal with anything more primitive than JVM objects, leaving the tedious tasks of object instantiation, memory management and resource management to the JVM.
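
To make this concrete, here is a minimal sketch of what a data stack of JVM objects might look like. The class name ObjectStack and its methods are my own assumptions for illustration, not part of any existing implementation; the point is only that DUP and SWAP shuffle object references while allocation and reclamation stay with the JVM.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical data stack: every cell is a reference to a JVM object,
// never a raw memory cell or address.
class ObjectStack {
    private final Deque<Object> cells = new ArrayDeque<>();

    void push(Object value) {
        cells.push(value); // ArrayDeque rejects null, so every cell holds a real object
    }

    Object pop() {
        if (cells.isEmpty()) {
            throw new IllegalStateException("stack underflow");
        }
        return cells.pop();
    }

    int depth() {
        return cells.size();
    }

    // DUP ( a -- a a ): duplicates the reference; the object itself is not copied.
    void dup() {
        Object top = pop();
        push(top);
        push(top);
    }

    // SWAP ( a b -- b a )
    void swap() {
        Object b = pop();
        Object a = pop();
        push(b);
        push(a);
    }
}
```

Nothing here frees memory or computes addresses: once a reference is popped and dropped, the garbage collector takes care of the rest.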

Both the advantages and disadvantages of this approach are fairly obvious:
On the downside, the programmer loses ultimate control over the hardware in the traditional sense, as she becomes the master of an abstraction over that hardware; though I have to admit that everyday use cases for manipulating bits and bytes at the raw level have become really hard to find.

On the upside, the programmer is freed from several error-prone burdens, allowing her to focus on the real algorithm.

Built-in Types

Categorically this can be viewed as part of the previous section, but it's important enough to have a section of its own.

Forth offers the programmer one data type: number. In Forth, everything is either a number or a word. This worked fine for the kind of environment Forth was designed for. However, with the new level of abstraction discussed above, it would be impractical to leave the programmer to extract meaning out of mere numbers.

For a JVM-based Forth, it feels very natural and appropriate to natively support the three main categories of types: Scalar, Non-Scalar and Operational.

Scalar types consist of

Number
Boolean
String

Non-Scalar types consist of

List
Map[3]
Set

And Operational types consist of

Word
Stack Address: used to keep track of markers in the stack (such as HERE)
UGO[4] (unknown general object): the output of any external method call that is neither a Scalar nor a Non-Scalar; for example, an arbitrary JVM object.

With a bit of syntactic sugar for supporting the Scalar and Non-Scalar types, programming in a JVM-based Forth can be much faster with the help of modern data structures.
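
The three categories above could be modelled roughly as follows. This is only a sketch under my own assumptions: the names Category, ForthType, ForthWord and StackAddress are hypothetical, and classifying values with instanceof checks is just one possible design.

```java
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

enum Category { SCALAR, NON_SCALAR, OPERATIONAL }

// Hypothetical stand-ins for the two Forth-specific operational types.
interface ForthWord {
    void execute(Deque<Object> stack);
}

final class StackAddress {
    final int index;

    StackAddress(int index) {
        this.index = index;
    }
}

enum ForthType {
    NUMBER(Category.SCALAR), BOOLEAN(Category.SCALAR), STRING(Category.SCALAR),
    LIST(Category.NON_SCALAR), MAP(Category.NON_SCALAR), SET(Category.NON_SCALAR),
    WORD(Category.OPERATIONAL), STACK_ADDRESS(Category.OPERATIONAL), UGO(Category.OPERATIONAL);

    final Category category;

    ForthType(Category category) {
        this.category = category;
    }

    // Classifies a JVM object; anything unrecognised falls through to UGO.
    static ForthType of(Object value) {
        if (value instanceof Number) return NUMBER;
        if (value instanceof Boolean) return BOOLEAN;
        if (value instanceof String) return STRING;
        if (value instanceof List) return LIST;
        if (value instanceof Map) return MAP;
        if (value instanceof Set) return SET;
        if (value instanceof ForthWord) return WORD;
        if (value instanceof StackAddress) return STACK_ADDRESS;
        return UGO;
    }
}
```

With this in place, ForthType.of("hi") yields STRING, while the result of an arbitrary Java call, say a java.io.File, is classified as UGO.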

Java Interop

Forth as a concatenative language has a delicately simple yet expressive syntax. It is simple and expressive in that it allows the developer to build a language, a vocabulary for the problem domain that feels and reads much like natural language; and Forth itself and a Forth programmer rely solely on stack effects of each word to make sense of a chain of words. Clearly, it is very important that each word's effect on the stack remains predictable and contained. Now let's continue with this important point in mind.

Generally, there are two approaches to providing interop with the underlying platform:
1. Direct interop (such as Scala or Groovy) which allows the programmer to directly call the platform's "raw" abstractions; usually this is the right approach when the language covers the platform's type system and execution model.

2. Indirect interop (such as Python or OCaml) which allows the programmer to access the "raw" abstractions only through wrappers and adapters; usually this works for cases where the platform's abstractions and types are alien to the language.

Many Java libraries, naturally, make heavy use of the concept of class and state to perform their duties. This makes their behaviour difficult to document and reason about in terms of stack effects[5]. For example, consider the very common case where an object's method does its job by changing the state of some other object it references.
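
A small illustration of the problem, using nothing but the standard library (the class name HiddenEffectDemo is made up for this sketch). Collections.sort returns void, so called directly as a word its stack effect would read ( list -- ), which says nothing about what actually happened.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class HiddenEffectDemo {
    static List<Integer> run() {
        List<Integer> shared = new ArrayList<>(Arrays.asList(3, 1, 2));
        List<Integer> alias = shared; // a second reference, e.g. one sitting deeper in the stack

        // Consumes one argument and pushes nothing: ( list -- ).
        // The real work is a mutation that the stack effect cannot express.
        Collections.sort(shared);

        return alias; // the alias now sees the sorted contents
    }
}
```

Every other holder of the same list observes the reordering, even though nothing was pushed onto the stack.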

Clearly, it's almost impossible to write readable and maintainable programs in a JVM-based Forth by taking the first approach. The second approach is the practical choice, though it has its own advantages and disadvantages. Its main weakness is that to use any Java library, one first has to write a Forth adapter for it. Yet this weakness may well contribute to the most important characteristic of the language: its expressiveness. For example, an adapter written on top of Java's hideous I/O streams can provide a consistent and pleasant vocabulary for working with files.
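
As a rough sketch of such an adapter, assume each Forth word is backed by a static Java method that pops its inputs from the data stack and pushes its outputs. The class FileWords and the word names write-lines and read-lines are hypothetical; only the java.nio.file calls are real.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Deque;
import java.util.List;

// Hypothetical adapter: each method is the body of one Forth word
// with a fixed, documented stack effect.
class FileWords {

    // write-lines ( list path -- )
    @SuppressWarnings("unchecked")
    static void writeLines(Deque<Object> stack) {
        String path = (String) stack.pop();
        List<String> lines = (List<String>) stack.pop();
        try {
            Files.write(Paths.get(path), lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // read-lines ( path -- list )
    static void readLines(Deque<Object> stack) {
        String path = (String) stack.pop();
        try {
            stack.push(Files.readAllLines(Paths.get(path)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Streams, charsets and checked exceptions stay behind the adapter; the Forth side only ever sees a String and a List, both of which are built-in types.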

Conclusion

Many topics regarding a Forth implementation on the JVM have now been discussed, in theory. I believe the material in this post and the previous one gives a reasonably clear picture to anyone interested in building one. But there's one thing missing: real code!
Stay tuned!

Footnotes

[1] Even mid-range smartphones now come with more than this amount of RAM.
[2] Even though garbage collection was introduced in Lisp as early as the 1960s, it took the IT industry a long time to recognise its value.
[3] Maps can even be used to resemble simple records, with data and words inside. Though appealing at first glance, this can be gravely dangerous and needs careful consideration.
[4] UGO is actually a sign of my failure to fit Java's execution model into that of Forth. It should not exist; a proper adapter over a Java library should use only words to express operations and, occasionally, maps to move data around.
[5] Not to mention method overloading.

Bahman's Musings