
Tuesday, September 14, 2010

Tamacola (2)


Once the assembler was done, I was able to test various features of the Tamarin VM; I even wrote a tiny GUI application for Adobe Flash in the assembler. The next step was the compiler.

Another goal of the project was to port Ian Piumarta's COLA framework to Tamarin (this is where the project name came from). Perhaps this is the only real technical contribution of the project. COLA is a meta-language (a programming language for designing other languages) that resembles Scheme. COLA has a nice sub-language called Parsing Expression Grammar (PEG) that makes parsers very terse. My plan was to write a bootstrapping compiler in PEG and COLA, then implement the COLA library, and finally write the real compiler in PEG and Tamacola itself.

I won't go into the details of PEG here. Briefly, it is as simple as a regular expression and as powerful as a context-free grammar.
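To give a rough feel for that combination of simplicity and power, here is a minimal sketch of PEG-style parsing in Python. It is not COLA's PEG syntax or implementation; the helper names (`lit`, `seq`, `many`, `balanced`, `matches`) are made up for this illustration. The rule it encodes, `balanced <- "(" balanced* ")"`, recognizes nested parentheses, which a plain regular expression cannot do.

```python
# Each parser takes (text, pos) and returns (value, new_pos) on success,
# or None on failure. All names here are hypothetical.

def lit(s):
    """Match the literal string s at the current position."""
    def parse(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse

def seq(*parsers):
    """Match every parser in order; fail if any of them fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def many(p):
    """Match p zero or more times, greedily (the PEG '*' operator)."""
    def parse(text, pos):
        values = []
        while True:
            result = p(text, pos)
            if result is None or result[1] == pos:
                return values, pos
            value, pos = result
            values.append(value)
    return parse

def balanced(text, pos):
    """balanced <- "(" balanced* ")" (the rule refers to itself)."""
    return seq(lit("("), many(balanced), lit(")"))(text, pos)

def matches(text):
    """True if the whole input is one balanced group of parentheses."""
    result = balanced(text, 0)
    return result is not None and result[1] == len(text)
```

Calling `matches("(()())")` succeeds, while `matches("(()")` fails because the outer group is never closed.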

When I started writing the compiler, COLA had no library at all except the PEG framework, so I needed to write the necessary libraries myself from scratch. Fortunately, COLA has quite a powerful external function call feature (a kind of FFI), a macro system, and a flexible object-oriented framework, so writing a library was not so hard. But I tried to avoid COLA-specific features as much as possible, because they would become a problem when I rewrote the compiler in Tamacola itself later.

To implement the library, I borrowed function specifications from R6RS wherever possible to avoid unnecessary confusion. There were exceptions: because COLA treats the slash character "/" as special (it is used for namespaces), I took PLT's function names in those cases.

Writing Lisp libraries was an interesting puzzle to me because there were some requirements and constraints for the domain. Those requirements were:

  • Unit testing framework.
  • Library framework.
  • List manipulations.
  • String functions.
  • Bit operations and streams.
  • Pretty printer for debugging.

These requirements were carefully chosen. Because COLA has only modest debugging facilities, a unit testing framework was a must. So my first goal was to implement all the functions needed for unit testing. I needed a pretty printer for debugging, too.

Another "must have" library was bit operators and file / in-memory streams, which are needed by the assembler. Interestingly enough, R6RS doesn't define enough functions to support those. For example, there is no portable way to specify a stream to be binary or text. So I needed a bit of creativity.
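To show why an assembler leans on exactly these two pieces, here is a small Python sketch that writes a variable-length integer to an in-memory binary stream, the kind of encoding bytecode formats commonly use. It is only an illustration, not Tamacola's code; the function names `write_varint` and `read_varint` are hypothetical. Note that Python makes the binary/text distinction explicit (`io.BytesIO` versus `io.StringIO`), which is the distinction I had to work out for myself.

```python
import io

def write_varint(stream, value):
    """Write a non-negative integer 7 bits at a time, low bits first.

    The high bit of each byte is set when more bytes follow.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    while True:
        low7 = value & 0x7F          # keep the lowest 7 bits
        value >>= 7                  # drop them from the value
        if value:
            stream.write(bytes([low7 | 0x80]))  # more bytes follow
        else:
            stream.write(bytes([low7]))         # last byte
            return

def read_varint(stream):
    """Read back an integer written by write_varint."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside an integer")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7

buf = io.BytesIO()                   # in-memory binary stream
write_varint(buf, 300)
encoded = buf.getvalue()             # b'\xac\x02'
decoded = read_varint(io.BytesIO(encoded))
```

Everything here is masks, shifts, and byte-level reads and writes, so a language without bit operators and binary streams cannot express it directly.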

Eventually, I wrote all the libraries and the compiler. And I got a pretty good sense of the minimum set of functions needed for a compiler: a testing framework, a pretty printer, bit operators, and streams. In other words, if your language has those features, your language can be self-hosting.

The real puzzle was the order. Those requirements must be precisely ordered by need. For example, the pretty printer must come after the stream and string functions because it uses them. Although in Lisp you can write functions in any order you like, a precise order makes testing and debugging easy. I kept this discipline. I even implemented the test library twice: the first version was a concise assert function, and the second gave friendlier failure messages using the pretty printer.
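The two-stage test library can be sketched in Python as follows. This is an assumption-laden illustration rather than the original COLA code: `check` and `check_equal` are hypothetical names, and Python's standard `pprint` module stands in for the pretty printer I had to write myself.

```python
import pprint

def check(condition):
    """Stage 1: a bare assert with no dependencies.

    Because it needs nothing else, it can be written first and used to
    test the string, stream, and pretty-printer code that comes later.
    """
    if not condition:
        raise AssertionError("check failed")

def check_equal(expected, actual):
    """Stage 2: written after the pretty printer exists.

    On failure it shows both values in readable form.
    """
    if expected != actual:
        raise AssertionError(
            "expected:\n" + pprint.pformat(expected)
            + "\nactual:\n" + pprint.pformat(actual)
        )
```

The first function is enough to bootstrap everything below the pretty printer; the second only becomes possible once that layer is tested and working.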

It took a few weeks to build a simple compiler, but there was still a long way to go to the point where self-hosting could be done. One thing I learned from this stage was that, even without a debugger, debugging is not so hard if you have enough test cases and a good pretty printer.


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.