other posts

Bard: Had time to make it shorter

Life Lessons Learned From MMORPGs:
Play the game; don't just mash buttons.

Bard has grown yet simpler, and I like it. I may even like it enough to declare, tentatively, that the language design is in its final form for release.

A compiler/vm implementation is coming together nicely. I took a compiler written in Common Lisp and a VM written in Haskell, and I reimplemented them both in Gambit Scheme. One happy side-effect of that choice was that Termite Scheme is built into the VM, and so Erlang-style concurrency primitives are in the toolkit. That doesn't mean that all of Termite will necessarily be available in Bard; the full set of concurrency primitives that Bard provides remains to be seen. But concurrency tools are there in the VM from the get-go.

I decided to go with the VM design in order to facilitate a couple of other goals. One was to have a portable binary format for compiled Bard programs. Another was to support the VM/image style of development that has long been standard in Smalltalk and Lisp environments; it's a good model for rapid development.

I think I'm on track for a public release of this implementation reasonably soon (depending, of course, on what else happens--oh, what a wacky and exciting world we live in!). The release will at least be able to compile and run Bard programs, and will come with a REPL. I don't know very much more than that yet. I have many plans and dreams; we'll see how many of them are realized in the release. The important thing is to release a Bard implementation that can be used to write nontrivial programs. The exact set of initial features is secondary, because they are going to change and expand over time anyway.

The language is very small now. I've been bubbling and frothing with all sorts of ideas during the development process, but I decided that the wise choice was to settle on a very small language that explicitly does not have all of the features I think might be good ideas. Once the initial release is done, then I'll start thinking what to do next, and those thoughts will include musings about expansion both upward (adding more complex and higher-level features) and downward (adding lower-level tools, better access to the underlying hardware, optimizations, and so on).

Bard is still very definitely a Lisp. Its primitive values include nothing, booleans, numbers, names (what other Lisps call symbols and keywords), sequences (including text), frames, and ports.

Frames are associative arrays--maps, dictionaries, hashes, depending on your favorite language. One twist is that Bard considers every value to be a frame, in the way that Smalltalk considers every value to be an object. Of course, primitive values, like integers and booleans, are not represented as associative arrays, but the functions that operate on frames are also defined to operate in a sensible way on all values, which makes for a kind of uniformity that provides some handy utility. You can reason about all values as if they were frames; you can ask questions about them, and you can very easily use any value to construct a frame that describes it and adds arbitrary annotations to the description.

Values are instances of Schemas (or, for the fussy, Schemata). A Schema is a frame that describes a data layout. Each value is an instance of exactly one Schema.

Schemas may be members of Categories. A Category contains a sequence of Supercategories, a sequence of Subcategories, and a sequence of members. The members are Schemata.

A Schema describes a data layout; a Category describes a type. A type is a protocol (that is, a set of generic functions that operate on members of the type), plus supertype and subtype relationships. Bard has a single global graph of Categories arranged into supertype/subtype relationships.

One way to think of this design is that Bard takes CLOS classes and breaks them into two pieces: the piece that is concerned with how to lay out data is the Schema. The piece that is concerned with taxonomy--with subtypesupertype classification and method dispatch--is the Category. One consequence of separating the two is that subtypes don't have to reuse the data layouts of supertypes. Another is that a given Schema can be a member of disjoint types (for example, a single octet can serve simultaneously as a character, an integer, and a set of boolean flags).

Bard functions are generic. Methods are selected based on a value's Schema. Inheritance of methods is supported: if no method specializes on a given Schema, Bard's dispatch mechanism matches against the Categories it belongs to, and their supertypes. Bard uses the C3 class linearization algorithm to order dispatch.

Bard has no global variables, and all but one of its primitive data types are immutable. The exception is a value called a cell: a cell contains a reference to a value, and the value may be updated. Cells are implemented as Termite processes, so Bard programs can put values in cells in other threads, in other programs, and on other machines. Bard's nearest equivalent to global variables are module variables, which refer to cells.

In email, David Moon suggested that this may be going too far; that it may be too much to restrict mutation to cells. He may well be right. If he is, I'll change the language design. Before I do, though, I want to try writing some software with this design and see what I gain and what I lose. If it turns out not to be too restrictive, then it might be pretty nice. Module variables are basically Erlang-style mailboxes, which is awfully handy for writing concurrent and distributed programs. We'll see how it turns out.

Frames are named after the objects of the same name in frame-based AI languages, like FRL, KL-ONE, Cyc, and SK8. In their present form, Bard frames are much much simpler than the frames of these other languages; indeed, they're basically just mappings from names to values, with a very small amount of extra semantics. I chose that name with some thought, though. I wanted to remind myself of SK8 and how advantageous frames were in its design, and keep an open mind about expanding Bard's frames to implement more frame-language features if they turn out to be compelling.

I also wanted to remind myself that in their current form, Bard's frames are extremely flexible, and remind myself to try to keep and expand that flexibility. You can make frames like you would make Java objects, but in some ways they're more like Python objects, in that you can treat the resulting values as dictionaries. You can merge arbitrary frames to create new hybrid values. You can iterate over their slots, filter them, and transform them in various general ways. Even though frames fill the same role as instances of classes in languages like Python and Java, frames are at the same time more like Perl hashes or Clojure maps--just associative arrays that can be manipulated in arbitrary ways. Since every Bard value is logically a frame, this flexibility extends to every Bard value.

Ports are values that represent data sources and data sinks. Like cells, they are internally represented as Termite processes. Again, the concurrency and distribution features are not close to their final form, but we can reason about what is possible by thinking about what Termite can do. Ports can refer to files or processes or memory. They can be local or remote. They are first-class objects, and can be passed to functions or bound to variables. They can be inserted into cells or other ports, so a Bard process can send open a port on a second Bard process and send it to a third Bard process, either locally or remotely.

All Bard values (so far) are serializable, so any Bard data can be sent from one process to another, or saved to and restored from an image file.

That all sounds pretty good; how well does it work? I don't really know yet, because the implementation isn't finished. As I say, though, it's getting pretty far along now. The reader works; the compiler works; the primitive values all work. I need to finish converting the compiler's output for the VM, and I need to finish all the execution cases in the VM.

There are some unanswered questions and quite a few ideas that I think it's best not to implement yet, not until I've seen how a simple implementation of what I've described here really behaves. One question is what to do about continuations? The VM as I've written it is based on Norvig's simple Common Lisp implementation of a Scheme VM, and is capable of full, Scheme-style continuations. Do I expose them at the language level? Or do I instead expose a simpler (though less powerful and general) API, like Dylan's bind-exit? I'm still thinking it over.

I'm not sure if I'll implement modules in the initial release. I;ve implemented them in most of the draft implementations so far, and they're not really hard or anything, but they do add a surprising amount of code, especially in the reader and the front end of the compiler. I like having modules, but I might defer them from the initial release for the sake of simplicity. Or I might decide that doing deferring them would mean that introducing them later will break things, and so go ahead and put them in. We'll see.

The initial release will be pretty much completely unoptimized, and I don't have the remotest idea yet what performance will be like. On the one hand, the Scheme compiler I'm using generates very good code. On the other hand, the Bard implementation has a lot of dynamic stuff, and I've made pretty much no effort at all to optimize for performance. Also, I'm doing a few things in very expensive ways--for example, using Termite processes to implement the closest thing Bard has to global variables. I'm frankly curious to see how it behaves when it all finally comes together.