Table of Contents
- Intro, How I started the assembler
- How a lisp program is compiled to Tamarin VM
- Tamacola in Tamacola, Bootstrapping Macros, Eval
- How to make your own compiler
I have published the source code of Tamacola, a Lisp compiler which runs on Adobe Flash / Tamarin VM (or Adobe Virtual Machine 2), at http://github.com/propella/tamacola. I'm pretty sure that the current version is useless if you are just looking for a Lisp implementation on Tamarin (las3r and scheme-abc are much better), but Tamacola includes abundant tips if you are interested in making a self-hosting compiler on Tamarin VM. That's why I decided to publish it as-is.
I'm also working on presentation slides for the S3 conference http://www.hpi.uni-potsdam.de/hirschfeld/s3/s3-10/ to show it. I'm writing down random thoughts about the compiler here so that I can later compile them into the thread of a talk.
I've already written about the motivation in the paper (perhaps I will paste the URL in a month), so I won't repeat it here. But in short, I wanted to make a tiny language which bootstraps and runs on Adobe Flash.
A tiny language and bootstrapping seem like contradictory ideas, because bootstrapping requires various language features, which tends to make a language large. On the other hand, this is a nice constraint in practice because it keeps the language from becoming too simple or too fat. Choosing a Scheme-like language as the target was natural to me because I wanted to concentrate on basic implementation techniques instead of language design.
Well, as one reviewer of the paper said, this is not particularly surprising or dramatically different in comparison with previous systems in the area, but some of the stories from the compiler should interest you!
How I started the assembler
In the beginning I created the assembler. Honestly, I wanted to avoid the task because writing an assembler did not seem like a very interesting job. But at that time, I couldn't find a nice AVM2 assembler that suited my project. So I wrote one. In retrospect, this was not bad at all. I came to understand quite well what avm2overview.pdf (the AVM2 specification) says, and I gained self-confidence.
I wrote my assembler in PLT-Scheme because Ian Piumarta's COLA was not finished yet at that time (Tamacola was supposed to be written in COLA and in Tamacola itself; I'll tell you about this later), and because Duncan Mak, a friend of mine, recommended PLT-Scheme. This was actually a good choice. It was my first Scheme application, and PLT's good documentation helped me a lot.
An interesting part of PLT-Scheme is that it encourages a functional programming style; PLT doesn't even support set-car! and set-cdr! in the default library. So it was natural that my assembler was written without side effects except for I/O. This is the first key to the development of the assembler. Unfortunately, because Tamarin doesn't support tail-recursion optimization and Tamarin's stack size is small, I later gave up on eliminating all side effects. But the implementation was purely functional up to that point, and it was quite clean.
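To make the trade-off concrete, here is a minimal sketch in Python (not the Scheme code of the real assembler). It assumes the variable-length integer format that avm2overview.pdf calls u30: seven bits per byte, low bits first, with the high bit meaning "more bytes follow". The function names are made up for illustration. The first list encoder is pure but recurses once per element, so its stack depth grows with the input. The second uses a local mutable buffer and a loop, which is the kind of compromise a small stack without tail-recursion optimization pushes you toward.

```python
def encode_u30(n):
    """Encode one integer as a tuple of 1-5 byte values (pure function)."""
    if not 0 <= n < (1 << 30):
        raise ValueError("u30 out of range: %r" % (n,))
    if n < 0x80:
        return (n,)
    # Emit the low 7 bits with the continuation bit set, then the rest.
    return ((n & 0x7F) | 0x80,) + encode_u30(n >> 7)


def encode_all_recursive(values):
    """Pure version: no mutation, but one stack frame per element."""
    if not values:
        return ()
    return encode_u30(values[0]) + encode_all_recursive(values[1:])


def encode_all_loop(values):
    """Same result, but with a local side effect instead of deep recursion."""
    out = bytearray()
    for v in values:
        out.extend(encode_u30(v))
    return tuple(out)
```

Both versions give the same bytes for a short input. On a long enough list the recursive one runs out of stack while the loop keeps working, even though callers still see a function whose result depends only on its argument.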
Indeed, it had to be clean with bootstrapping in mind. I wanted to make the assembler run in my own language even before adequate debugging facilities were ready. If it were not clean, a tiny bug would cause a few days of debugging. I avoided that nightmare with a functional style and Test Driven Development.
Test Driven Development is the second key. I wrote a test case for virtually every function, even if it looked silly. Scheme has a couple of options for testing frameworks. I chose SRFI-78. It reports an assertion failure only when something goes wrong; otherwise it stays silent. I rather like this terse UNIX taste.
The third key was to write an assembler and a disassembler at the same time. It sounds like an unnecessary job because I only needed an assembler in the end. But I had to analyze the output of asc (the ActionScript compiler in Adobe Flex) and learn how an ActionScript program is converted to Tamarin byte-code. The disassembler was very helpful for reading the byte-code as well as for debugging. If feeding the disassembler's output back through the assembler regenerates the original byte-code, there is a high chance that my implementation is correct, unless my understanding is wrong.
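The round-trip check can be sketched like this. It is a toy in Python, not ABCSX itself: the instruction table, the function names, and the tuple format for instructions are all assumptions made for the example, and the three opcode numbers are illustrative placeholders rather than a checked copy of the AVM2 table. Operands are treated as plain unsigned bytes to keep it short.

```python
# Hypothetical instruction table: mnemonic -> (opcode byte, operand count).
OPCODES = {
    "pushbyte": (0x24, 1),
    "add": (0xA0, 0),
    "returnvalue": (0x48, 0),
}
# Reverse table used by the disassembler.
MNEMONICS = {code: (name, argc) for name, (code, argc) in OPCODES.items()}


def assemble(instructions):
    """Turn a list of (mnemonic, operand...) tuples into bytes."""
    out = bytearray()
    for name, *args in instructions:
        code, argc = OPCODES[name]
        if len(args) != argc:
            raise ValueError("wrong operand count for %s" % name)
        out.append(code)
        out.extend(args)  # each operand must fit in one byte here
    return bytes(out)


def disassemble(code):
    """Turn bytes back into a list of (mnemonic, operand...) tuples."""
    result = []
    i = 0
    while i < len(code):
        name, argc = MNEMONICS[code[i]]
        args = list(code[i + 1:i + 1 + argc])
        if len(args) != argc:
            raise ValueError("truncated operand for %s" % name)
        result.append((name, *args))
        i += 1 + argc
    return result


program = [("pushbyte", 1), ("pushbyte", 2), ("add",), ("returnvalue",)]
binary = assemble(program)
# The round trip: bytes -> instructions -> bytes must give the same bytes.
assert assemble(disassemble(binary)) == binary
```

The same check can be pointed at byte-code produced by another tool. If disassembling and reassembling it does not reproduce the input exactly, one of the two directions disagrees with the format.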
The assembler is named ABCSX http://github.com/propella/abcsx and it was later ported to Gauche, COLA, and Tamacola. I ported it to Gauche because I was curious about the portability of the Scheme language.
I realized there are many places where I could reduce code redundancy in the assembler. An assembler tends to include repetitive processing, but some of it is not captured well by function abstraction. It would be effective to apply macros and a domain-specific language to those parts. I haven't tried to solve this yet, but I want to solve it later.
(to be continued)