website | twitter

Wednesday, September 15, 2010

Tamacola (4)

Tamacola in Tamacola

After I made the Tamacola compiler written in COLA, next thing to do was to implement it in Tamacola itself. A language is called self-hosting if the language is written in the language itself. This implies various advantage.

First, once self-hosting is done, you don't need to use COLA anymore, you can improve or modify any language aspects on Tamarin VM. If I carefully design the environment, it would be possible to do language design only on the Web browser (it needs server side help for security reason, so it hasn't done yet).

Second, self hosting is a benchmark for the language to tell that it is good enough. Scheme is especially simple language, so there are a lot of people who implement toy-Scheme. But because my Tamacola is now self-hosting, I could proudly claim that this is not a toy! Well, this is rather self satisfaction, though.

Third, it provides a rich library including "eval" function. A compiler uses various programming techniques, and those must be useful for other programs, too.

To make it self-hosting, there were two key problem which are macros and eval.

Bootstrapping Macros

I heavily used macros in my compiler, for example, the parser written in PEG was converted a bunch of macro expressions. The problem is, expanding macros requires eval function but I wasn't able to make eval before the parser was done. It's a deadlock! Here is a typical macro written in COLA:

(define-form begin e (cons 'let (cons '() e)))
This is how the macro works. When the compiler find a expression like:
(begin
  (print "Hello")
  (print "World"))
Expressions inside begin is bound to e, the body (cons 'let (cons '() e)) is executed in compile time and the expression is expanded to:
(let ()
  (print "Hello")
  (print "World"))

Such expansion is impossible without eval function because the compiler need to evaluate a list (cons 'let (cons '() e)) given by user. What I would do when I didn't have eval yet. But I realized that macros only include basic list functions like car, cdr, and cons in many cases. And a more complicated macro could be hard corded as a special form in the compiler. So I invented a pattern base macros.

(define-pattern ((begin . e) (let () . e)))

Basically this is a subset of Scheme's syntax-rule. If the compiler finds an expression starting with begin, rest of the expression is bound to e and substituted as a right hand side. Those expansion requires only limited set of list functions, so the compiler doesn't have to provide full eval function. This macro syntax made my compiler readable, and I was able to continue happily.

Even after I implemented more dynamic traditional macro with eval function, I keep using this pattern base macros mainly.

Eval

To implement eval function, you need to understand the dynamic code loading facility provided by the VM. Note that this is not part of AVM2 specification, and Avmshell (a console Tamarin shell program) and Adobe Flash have different API.

Avmshell has straightforward API. You give compiled byte code, and the function returns the value. Because Tamacola is now written in Tamacola, you can invoke the compiler as a library function and get byte code you want to execute.

avmplus.Domain.loadBytes(byteArray:ByteArray)

You can get the domain object by Domain.currentDomain() static method. Those useful functions in Avmshell are found shell/ directory in the Tamarin-central repository.

Flash Player has somewhat tricky API for dynamic code loading. The signature is normal.

flash.display.Loader.loadBytes(bytes:ByteArray, context:LoaderContext = null):void

There are two problems for our purpose. First, this method is not designed mainly for dynamic loading, it only accepts SWF, JPG, PNG, or GIF files, and byte code happen to be accepted inside a SWF file. So I had to construct SWF file to load code. In case if you don't know about SWF file, SWF file is a kind of container format. You can embedded vector graphics, mp3 sounds, and ActionScript byte code. Making a SWF file is not particularly difficult though, it needs nasty bit fiddling.

Second, this is far more problematic, is that this method works as asynchronously. In other words, this doesn't return the result value. Instead, you need to give it a callback function to wait to finish the code. Additionally, this method doesn't return value at all, so if you want the return value, you need to setup some explicit return mechanism by yourself.

Practically, this cause a problem if you want to write a traditional macro definition and use the macro in a same source code. Because a traditional macro need to evaluate a lisp expression in a compile time, but the eval function doesn't return before the compilation thread is done. I could solve the problem by setting up compilation queue or something, but it would cost performance penalty which I don't want. And now I simply gave up.

I have explained pretty much all interesting aspect of the self hosting compiler. I'll talk about how to make a new language on the Tamacola environment later.

No comments:

Post a Comment

 
Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.