
Thursday, December 15, 2011

Various examples in Haskell's FRP.Reactive

After playing with the Flapjax library in JavaScript, I moved on to Reactive to learn more about FRP. Because research on Functional Reactive Programming is most active in Haskell, I thought Haskell would be the better place to study it. Reactive seems to be a nice library, but unfortunately I couldn't find many working code examples, so I show some of my own here as an exercise. I owe a lot to maoe's great article in Japanese.

(This page has been translated into Spanish language by Maria Ramos from Webhostinghub.com.)

I didn't have much time, so I can't offer a thorough explanation here, but I hope it still helps people who are learning Reactive like me. I used Haskell Platform 2010 (slightly old) and ran cabal install reactive --enable-documentation to install Reactive.

The first example shows "Hello, World!" after three seconds. atTime generates a timer event, and <$> maps the function \_ -> putStrLn "Hello, World!" over that event, turning the occurrence into an IO action that writes the string.

This is the same as above, but it fires an event every second.

This produces a running sequence of Fibonacci numbers. scanlE lets a function combine the previously accumulated value with the current event occurrence. Here (0, 1) is the initial value; when an event occurs, the function \(n0, n1) _ -> (n1, n0 + n1) calculates the next value, and the result (the first one is (1, 1)) becomes the accumulator for the next occurrence.
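The accumulation described here can be sketched outside Haskell. The following is a minimal plain-JavaScript illustration, not the Reactive API: an array of "tick" strings stands in for the timer event, and scanl, ticks, pairs, and fibs are names made up for this sketch.

```javascript
// Plain-JavaScript sketch of scanl-style accumulation (not the Reactive API).
// It returns the accumulator after each occurrence, excluding the initial value.
function scanl(step, initial, occurrences) {
  const results = [];
  let acc = initial;
  for (const occurrence of occurrences) {
    acc = step(acc, occurrence);
    results.push(acc);
  }
  return results;
}

// Five timer ticks; the payload is ignored, as with `_` in the Haskell lambda.
const ticks = ["tick", "tick", "tick", "tick", "tick"];
const pairs = scanl(([n0, n1], _tick) => [n1, n0 + n1], [0, 1], ticks);
const fibs = pairs.map(([n0]) => n0);

console.log(fibs); // [1, 1, 2, 3, 5]
```

The first accumulated pair is [1, 1], matching the (1, 1) mentioned above.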

This shows characters as you type. It looks difficult, but you don't have to worry about the run function. The important part is machine :: Event Char -> Event (IO ()), which converts a character input event into an IO action.

This example shows how to merge two events. onType is the same as machine in the previous example, and onClock is the same as in the helloMany.hs example. I used `mappend` to merge the two events.

This shows a simple state machine. The function next defines the state machine, and mealy_ converts the definition into an event. zipE is another way to combine two events. Unlike mappend, it lets you see the values of both events at the same time.
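As a rough illustration of the state machine idea only, here is a plain-JavaScript sketch. It assumes the state advances once per event occurrence and that the payload is ignored; the three states and the names next and mealyStates are hypothetical and are not taken from the Reactive example.

```javascript
// Hypothetical three-state machine; the states are illustrative only.
const next = (state) => ({ red: "green", green: "yellow", yellow: "red" }[state]);

// Advance the state once per occurrence, ignoring the occurrence's payload.
function mealyStates(initial, transition, occurrences) {
  const states = [];
  let state = initial;
  for (const _occurrence of occurrences) {
    state = transition(state);
    states.push(state);
  }
  return states;
}

// Four occurrences of some input event drive four transitions.
const states = mealyStates("red", next, ["a", "b", "c", "d"]);
console.log(states); // ["green", "yellow", "red", "green"]
```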

Tuesday, December 06, 2011

Flapjax vs Tangle

Functional Reactive Programming (FRP) is a framework for dealing with time-varying data in a clean way. It combines the beauty of functional programming with the dynamics of object-oriented programming. The basic principle is as easy as a spreadsheet, but its vague scope and arcane terminology keep you from grasping it. It is not easy to answer questions such as what makes FRP different from the Observer pattern, data flow, and so on. I think a good way to explain FRP is to compare an FRP library against a non-FRP library, which shows where FRP is special and what its pros and cons are.

I examined Flapjax as an example of FRP and took Bret Victor's Tangle as the comparison target. Although Tangle has a goal similar to FRP's (as he wrote, "Tangle is a library for creating reactive documents"), its implementation is quite different from Flapjax.

Flapjax
Side effects are hidden inside the framework. Time-varying data is represented by dependency trees, and you can compose those trees to implement complex behavior.
Tangle
Tangle provides a simple framework and UI widgets, but the data flow is expressed with ordinary imperative programming and assignments.

Because of these differences, I think comparing the two libraries helps to show what FRP is. I hope it gives you a clear idea of FRP.

Simple Calorie Calculator in Tangle

This is the first example from Tangle's documentation. You can change the number of cookies by dragging, and the calories are recalculated as you change the value.

When you eat cookies, you will consume calories.

This nice reactive document consists of two parts: HTML for the view and JavaScript for the model.

<p id="tangle">
  When you eat <span data-var="cookies" class="TKAdjustableNumber" data-min="2" data-max="100"> cookies</span>,
  you will consume <span data-var="calories"></span> calories.
</p>

The HTML part is straightforward: it is just normal HTML apart from some special attributes for Tangle. data-var connects HTML elements to the Tangle object's properties. The class name TKAdjustableNumber makes a draggable input control, and data-min and data-max are its parameters.

var element = document.getElementById("tangle");

new Tangle(element, {
  initialize: function () {
    this.cookies = 4;
  },
  update: function () {
    this.calories = this.cookies * 50;
  }
});

The actual model of the document is described in the second argument of the Tangle constructor (new Tangle). It consists of just two parts: initialize sets up the initial state, and update is invoked whenever you modify the input value. Tangle connects the model to the HTML element obtained by getElementById("tangle").

This initialize-update structure is fairly common among end-user programming languages such as Processing and Arduino.
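To make the initialize-update structure concrete without a browser, here is a minimal sketch of a runner for such a model. It is not Tangle's implementation; createModel, get, and set are hypothetical names, and the only assumption is the one stated above: initialize runs once and update runs after every input change.

```javascript
// Hypothetical runner, not Tangle's implementation: call initialize once,
// then call update after every change to an input variable.
function createModel(spec) {
  const model = {};
  spec.initialize.call(model);
  spec.update.call(model);
  return {
    get: (name) => model[name],
    set(name, value) {
      model[name] = value;      // imperative assignment to the input
      spec.update.call(model);  // recompute the derived values
    },
  };
}

const doc = createModel({
  initialize() { this.cookies = 4; },
  update() { this.calories = this.cookies * 50; },
});

console.log(doc.get("calories")); // 200
doc.set("cookies", 10);
console.log(doc.get("calories")); // 500
```

Note that update is free to read and write any property of the model, which is the imperative style the comparison is about.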

Simple Calorie Calculator in Flapjax

Let's move on to Flapjax. Unfortunately, Flapjax doesn't have a nice input widget like Tangle's, so we use a traditional input field instead. Other than that, the behavior is identical.

When you eat cookies, you will consume calories.

Like the Tangle version, the Flapjax version has an HTML part and a JavaScript part. Note that Flapjax provides "Flapjax Syntax", which allows a simpler notation, but I don't use it here because I want to compare the two as JavaScript libraries.

<p id="flapjax" class="example">
  When you eat <input id="cookies" value="4" /> cookies,
  you will consume <span id="calories"></span> calories.
</p>

Flapjax's HTML part is similar to Tangle's. The element identifiers (cookies and calories) are given by id attributes. Unlike Tangle, the initial number of cookies is written in the input field.

var behavior = extractValueB("cookies");
var colories = behavior.liftB(function (n) { return n * 50; });
insertDomB(colories, "calories");

In Flapjax, time-varying data is called a behavior. The goal of the program is to make a behavior that always holds the calories of the cookies. It's not as difficult as it seems. extractValueB creates a behavior from a form element; in this case, extractValueB("cookies") tracks every change made to the input field named "cookies". This behavior is then processed by the function passed to liftB, so whenever you modify the "cookies" field, colories holds a value that is always 50 times the number of cookies.

Finally, insertDomB inserts the content of colories at the HTML element "calories", so the calories are shown on the screen. This element is updated automatically.
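The idea of a derived behavior can be sketched without Flapjax or a browser. The toy class below is not Flapjax's implementation; ToyBehavior and send are hypothetical names, and only liftB mirrors the method used above.

```javascript
// Toy behavior, not Flapjax's implementation: a current value plus listeners.
class ToyBehavior {
  constructor(value) {
    this.value = value;
    this.listeners = [];
  }
  // Push a new value and notify every dependent behavior.
  send(value) {
    this.value = value;
    this.listeners.forEach((listener) => listener(value));
  }
  // Derive a behavior that always equals fn(current value of this one).
  liftB(fn) {
    const derived = new ToyBehavior(fn(this.value));
    this.listeners.push((value) => derived.send(fn(value)));
    return derived;
  }
}

const cookiesB = new ToyBehavior(4); // stands in for extractValueB("cookies")
const caloriesB = cookiesB.liftB((n) => n * 50);

console.log(caloriesB.value); // 200
cookiesB.send(7);             // simulates typing 7 into the field
console.log(caloriesB.value); // 350
```

The function passed to liftB never assigns anything; the new value is simply its return value.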

Unlike the Tangle version, the program contains no side effects. One advantage of FRP is that you cannot confuse old values with new values. In Tangle's example, this.cookies is the old value (input) and this.calories is the new value (output), but nothing stops you from mixing them up. In Flapjax, a new value is always the return value of a function, so there is no chance of mistaking one for the other.

Implement Adjustable Number Widget in Flapjax

One of the advantages of FRP is its composability. You can build complicated behavior by combining simple behaviors (in imperative programming, by contrast, a bug that involves connected modules with side effects can be hard to debug). To demonstrate this feature, I will show you how to make a Tangle-style draggable widget in Flapjax. This problem is particularly interesting because handling drag and drop involves a state machine, and a state machine does not fit a functional programming style very well. So this example should show the pros and cons of FRP clearly.

When you eat cookies, you will consume calories.

The HTML part is almost identical, except for the adjustable class on the input field, which refers to a Tangle-like (but not as fashionable) stylesheet.

<p id="flapjax-drag" class="example">
  When you eat <input id="cookies-drag" value="4" class="adjustable"/> cookies,
  you will consume <span id="calories-drag"></span> calories.
</p>

The main JavaScript part is also similar to the one above. This time, though, we implement makeAdjustableNumber to make a draggable widget from the element named "cookies-drag".

var element = document.getElementById("cookies-drag");
var behavior = makeAdjustableNumber(element);
var colories = behavior.liftB(function (n) { return n * 50; });
insertDomB(colories, "calories-drag");

A drag gesture consists of three events: mousedown, mousemove, and mouseup. After a mousedown is detected, the program has to track mousemove events to know how far you are dragging. You can build such a state machine with a higher-order event stream. There are two new concepts here. An event stream is similar to a behavior, but it is a stream of discrete events instead of a continuous value. You don't have to worry much about the difference; it's just another object with a slightly different API. A higher-order event stream is an event stream of event streams. It is used to make a stream whose behavior switches depending on the input.

mouseDownMove makes a higher-order event stream that tracks mousedown and mousemove. extractEventE(element,"mousedown") extracts the mousedown events on the element. When the event fires, the function passed to mapE is evaluated. mapE is similar to liftB, but it works on an event stream. Inside the function, extractEventE(document,"mousemove") finds mousemove events and tracks the distance from the mousedown position. Note that I listen on document because you sometimes drag the mouse outside the widget.

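function mouseDownMove (element) {
  // Each mousedown yields a fresh stream of values computed from later mousemove events.
  return extractEventE(element,"mousedown").mapE(function(md) {
    var initValue = parseInt(element.value, 10); // value when the drag starts
    var offset = md.layerX;                      // horizontal position of the mousedown

    return extractEventE(document,"mousemove").mapE(function(mm) {
      var delta = mm.layerX - offset;
      // 20 pixels of drag change the value by one; never go below 1.
      return Math.max(1, Math.round(delta / 20 + initValue));
    });
  });
}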

We also need to handle the mouseup event. The mouseUp function returns a higher-order event stream that finds mouseup events and maps each one to zeroE, a stream that happily does nothing.

function mouseUp (element) {
  return extractEventE(document,"mouseup").mapE(function() {
    return zeroE();
  });
}

The two event streams made by mouseDownMove and mouseUp are merged by the mouseDownMoveUp function to complete the mousedown, mousemove, and mouseup cycle. mergeE merges two event streams. We need one more step, switchE, to convert the higher-order stream into a normal stream, in this case a stream of numbers (distance).

function mouseDownMoveUp(element) {
  var downMoveUp = mouseDownMove(element).mergeE(mouseUp(element));
  return downMoveUp.switchE();
}
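The switching mechanism can be tried without a browser. The following is a toy event stream written for this note, not Flapjax's implementation: ToyStream, subscribe, and emit are hypothetical names, while mapE, mergeE, and switchE imitate the methods used above. The sketch assumes that switchE forwards only the most recently received inner stream.

```javascript
// Toy event stream, not Flapjax's implementation.
class ToyStream {
  constructor() { this.listeners = []; }
  subscribe(listener) {
    this.listeners.push(listener);
    // Return a function that removes this listener again.
    return () => { this.listeners = this.listeners.filter((l) => l !== listener); };
  }
  emit(value) { this.listeners.slice().forEach((l) => l(value)); }
  mapE(fn) {
    const out = new ToyStream();
    this.subscribe((v) => out.emit(fn(v)));
    return out;
  }
  mergeE(other) {
    const out = new ToyStream();
    this.subscribe((v) => out.emit(v));
    other.subscribe((v) => out.emit(v));
    return out;
  }
  // Flatten a stream of streams: forward only the latest inner stream.
  switchE() {
    const out = new ToyStream();
    let unsubscribe = () => {};
    this.subscribe((inner) => {
      unsubscribe();
      unsubscribe = inner.subscribe((v) => out.emit(v));
    });
    return out;
  }
}

const mousedown = new ToyStream();
const mousemove = new ToyStream();
const mouseup = new ToyStream();
const never = new ToyStream(); // plays the role of zeroE(): it never fires

const downMove = mousedown.mapE((start) => mousemove.mapE((x) => x - start));
const up = mouseup.mapE(() => never);
const drag = downMove.mergeE(up).switchE();

const seen = [];
drag.subscribe((delta) => seen.push(delta));

mousemove.emit(5);  // ignored: no mousedown yet
mousedown.emit(10);
mousemove.emit(30); // 20
mousemove.emit(50); // 40
mouseup.emit();
mousemove.emit(90); // ignored: switched to the silent stream
console.log(seen);  // [20, 40]
```

The state machine never appears as an explicit variable; the "dragging or not" state is whichever inner stream switchE is currently forwarding.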

Finally, we connect the event stream to an HTML element. Here I did something slightly dirty: whenever a drag gesture happens, element.value is set directly. Using insertDomB to make an output element would probably be cleaner, but I chose the dirty way to keep it simple. In the last line, the event stream is converted to a behavior by startsWith. That's how makeAdjustableNumber is implemented.

function makeAdjustableNumber (element) {
  var drag = mouseDownMoveUp(element);
  drag.mapE(function(n) { element.value = n; });
  return drag.startsWith(element.value);
}

Honestly, Flapjax doesn't seem very easy to use. Part of the reason might be that I chose plain JavaScript syntax to show the mechanism. Flapjax also provides its own compiler with a cleaner syntax, which should improve readability a lot. Anyway, I hope this short note helps you get a rough idea of Flapjax and FRP.


Thursday, September 22, 2011

Yet Another "Alligator Eggs!" Animation

Bret Victor came to our office yesterday, and we had a great chat. He is a great thinker and has a beautiful sense for visualizing abstract ideas. I really like his work. I want to learn more about his ideas, and as a start I tried to implement his famous early game Alligator Eggs! The game was made to teach lambda calculus to eight-year-old kids, but it's even more fun for adult hackers!

Alligator and an egg : λx.x

This is a green alligator and her egg. This family represents the lambda expression λx.x (because I know you are not eight years old, I use formulas without hesitation!). There is no animation because there is nothing to eat.

An alligator eats an egg : (λx.x) y

But things get fun when there is something to eat in front of the mother alligator, in this case a blue egg. If you click on the diagram, you see what happens (I only tested Chrome, Safari, and Firefox). The alligator eats the poor blue egg. But the price of the sacrifice is high: the mother dies, and we see the new baby.

And then things get curiouser. The new baby doesn't look like the mother at all; rather, it looks like the blue egg, the victim of the slaughter. What an amazing nature lambda land has!

Take first : (λx.λy. x) a b

This is a slightly harder example. There are two alligators, "x" and "y", and two victim eggs, "a" and "b", on the right side. If there are two or more things next to an alligator, the alligator eats the leftmost one first (in jargon, application is left associative). Can you guess what happens after the meal? Alligator "x" eats egg "a", and alligator "y" eats egg "b". Only egg "a" survives (because it transmigrates through the green "x" egg).

You can think of this alligator family (λx.λy. x) as eating two things and leaving the first one. In the same way, can you think of an alligator family that eats two things and leaves the second one? Here is the answer.
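For readers who prefer code to alligators, both families can be sketched as curried JavaScript arrow functions. The names takeFirst and takeSecond are made up for this sketch; the second one corresponds to λx.λy.y.

```javascript
// Each arrow function is one alligator; its parameter is the egg it guards.
const takeFirst = (x) => (y) => x;  // λx.λy.x
const takeSecond = (x) => (y) => y; // λx.λy.y

// Application is left associative: takeFirst("a")("b") feeds "a" first, then "b".
console.log(takeFirst("a")("b"));  // a
console.log(takeSecond("a")("b")); // b
```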

Old alligator : (λx.x) ((λy.y) (λz.z))

There are a few more things to know. Old alligators are not hungry, but they keep guarding their family as long as they guard more than one thing. They behave like parentheses in a lambda expression.

Color rule : (λx.λy.x) (λy.y)

This rule is the trickiest one. There are two blue alligators "y", one on the left and one on the right, but they are not in the same family. The only mother of the blue egg "y" is the right one. It gets trickier when the right family is eaten by the green alligator, because the blue family is reborn where the green egg is, which is underneath the other blue alligator. To tell them apart, the right blue family changes its name and color to "y1" and orange.

Omega (Mockingbird hears the Mockingbird song) : (λx.x x) (λx.x x)

With these rules, you can make various kinds of alligator ecosystems. This is my favorite one. (λx.x x) is called a "Mockingbird", or perhaps we should call it a Mockingalligator. It doubles its prey. So what happens if a mockingalligator eats a mockingalligator? The result is called omega, an infinite loop. They keep eating forever. To stop the endless violence, please click the diagram again. But please do not click three times! Because of a bug of mine, something will go wrong.

Y combinator : λg.(λx.g (x x)) (λx.g (x x))

This one is dangerous but beautiful. The omega ecosystem above keeps killing without producing anything new, but the Y combinator is very fertile. It produces many offspring, so you have to watch it carefully, otherwise it will eventually consume all the CPU power you have!
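In a strictly evaluated language like JavaScript, a literal translation of the Y combinator keeps expanding before g is ever called, which mirrors the warning above. The sketch below therefore swaps in the eta-expanded variant usually called the Z combinator, which is a different term from the formula in the heading, to show how the same self-application can produce useful recursion. The names Z and factorial are illustrative.

```javascript
// Z combinator: like Y, but x(x) is wrapped in a function so that it is
// only evaluated when the recursive call is actually made.
const Z = (g) => ((x) => g((v) => x(x)(v)))((x) => g((v) => x(x)(v)));

// g receives "itself" as self and returns one step of the recursion.
const factorial = Z((self) => (n) => (n === 0 ? 1 : n * self(n - 1)));

console.log(factorial(5)); // 120
```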

3 + 4 : (λa.λb.λs.λz.(a s (b s z))) (λs.λz.(s (s (s z)))) (λs.λz.(s (s (s (s z)))))

Actually, alligators can also do serious jobs. If you design them carefully, you can teach them how to calculate 3 + 4! In this example, the middle family represents three and the right family represents four (count the green eggs). And the result is a family with seven green eggs! These are called Church numerals (I don't have time to explain the theory, so please read the link).
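The same expression can be checked in JavaScript by writing each λ as a curried arrow function. The names three, four, add, and toInt are chosen for this sketch; toInt is a helper that is not part of the lambda expression and only counts how many times s is applied.

```javascript
// Church numerals: the number n applies s to z exactly n times.
const three = (s) => (z) => s(s(s(z)));    // λs.λz.(s (s (s z)))
const four = (s) => (z) => s(s(s(s(z))));  // λs.λz.(s (s (s (s z))))

// λa.λb.λs.λz.(a s (b s z)): apply s b times, then a more times.
const add = (a) => (b) => (s) => (z) => a(s)(b(s)(z));

// Helper for inspection: use "add one" as s and 0 as z.
const toInt = (n) => n((k) => k + 1)(0);

console.log(toInt(add(three)(four))); // 7
```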

I have introduced only a few alligator families. If you want to play with them, visit http://metatoys.org/alligator/ and design your own. You can also download the source from http://github.com/propella/AlligatorEggs. The source code is messy because I haven't written JavaScript recently, but I'll clean it up soon.

Saturday, July 09, 2011

A hidden story behind the EToys Castle

http://metatoys.org/demonCastle/

[Screenshots: Demon Castle]

If you have played with Etoys, you might have seen the Etoys Castle (or the Demon Castle) tutorial. But you would never know how the story ends, because the Etoys distribution only includes the first chapter, and the last slide shows "To Be Continued ...". However, there are actually hidden sequels, and the story has a happy ending.

When I first wrote the story in 2006, there were three chapters. The first chapter was about learning "handles", the second was about the painter, and the third was about scripting. But due to some technical issues, I gave up publishing them. Today I happened to clean up my hard drive and found the old files. It's a shame that I never published the rest, so I gathered the screenshots and made a one-page HTML document.

Friday, September 24, 2010

Tamacola (5)

Tamacola is not just another Lisp; it is designed as a meta-language for making new languages. I'll explain this feature today. Today's goal is to design a subset of Lisp. If you think Lisp is too simple to hold your interest, sorry, please be patient: simple things first.

Prepare your Tamacola environment

To set up the Tamacola environment, you need to download both the Tamacola distribution and the Tamarin VM. They are available at http://www.vpri.org/vp_wiki/index.php/Tamacola. You need to add the location of the avmshell command to the PATH environment variable, and it is also useful to add bin/ in the tamacola tree to PATH. To make sure Tamacola works, please type:
make run-example
This runs all of the examples in the Tamacola distribution and also recompiles the compiler. If you don't see any errors, you are ready to go. Otherwise, please let me know about the problem.

Tamacola command

The tamacola command reads a Tamacola program and runs it immediately. If you want to make Flash content, another command, tamacc (the Tamacola Compiler), is more suitable. Here we play with the interactive shell of the tamacola command, so I'll give a brief explanation. The interactive shell starts with the minus (-) option. Let's try some simple arithmetic. If you didn't set up the PATH environment variable, please specify the directory name, too.
$ tamacola -
Cola/Tamarin
> (+ 3 4)
7
You can also give Tamacola source files as well as compiled binary names. Typically, source code ends with .k, and a binary ends with .abc. Tamacola is smart enough to detect which of the .k and .abc files is newer.

Match against a string constant

Suppose you are in some working directory and have already added the bin/ directory to PATH. We are going to write a very simple language, greeting:

;; greeting.g - A simple PEG example

greeting = "morning" -> "Good Morning!"
         | "evening" -> "Good Evening!"

This silly example answers "Good Morning!" if you say "morning", and "Good Evening!" if you say "evening". The PEG syntax is easy to understand. The left-hand side of = is a rule name; a rule becomes a function once the grammar is built. -> introduces an action: if the left-hand side matches, the right-hand side is returned. | is ordered choice: the parser tries the first case, "morning", and tries the second case, "evening", only if the first fails.

Save this grammar as "greeting.g". To test the language, type these commands:

$ mkpeg greeting.g
$ tamacola greeting.k -
compiling:  greeting.k
Cola/Tamarin
> (parse-collection greeting "morning")
"Good Morning!"
> (parse-collection greeting "evening tokyo")
"Good Evening!"

The mkpeg command converts the grammar file (greeting.g) to Tamacola source (greeting.k), which defines the rule "greeting" and can be read by the tamacola shell. greeting.k is compiled on the fly, and then the command prompt is shown.

The first argument of parse-collection is a parser name (in this case greeting), and the second is an input collection. As the name implies, it accepts any collection as the input stream.

The second case shows an interesting property of PEG. Although the second rule matches the beginning of the input "evening tokyo", the rest of the string, " tokyo", remains unconsumed. PEG doesn't care whether the input is completely consumed. If you really want to make sure that the entire input is matched, you need to tell the parser explicitly where the end of the input is.
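Ordered choice and partial consumption can be sketched in a few lines of JavaScript. This is not the code that mkpeg generates; orderedChoice and wholeInput are hypothetical helpers, and a match result is modelled as an object with ok, value, and rest fields.

```javascript
// Hypothetical helpers; this is not Tamacola's generated code.
// Each alternative is [literal, result]; alternatives are tried in order.
function orderedChoice(alternatives) {
  return (input) => {
    for (const [literal, result] of alternatives) {
      if (input.startsWith(literal)) {
        return { ok: true, value: result, rest: input.slice(literal.length) };
      }
    }
    return { ok: false };
  };
}

const greeting = orderedChoice([
  ["morning", "Good Morning!"],
  ["evening", "Good Evening!"],
]);

console.log(greeting("morning").value);      // Good Morning!
console.log(greeting("evening tokyo").rest); // " tokyo" is left unconsumed
console.log(greeting("night").ok);           // false

// Requiring the end of the input is an explicit extra check.
const wholeInput = (rule) => (input) => {
  const r = rule(input);
  return r.ok && r.rest === "" ? r : { ok: false };
};

console.log(wholeInput(greeting)("evening tokyo").ok); // false
console.log(wholeInput(greeting)("evening").ok);       // true
```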

Number parser

The last example only matched predefined constants. Here we make a parser for any integer.

;; number.g -- A number parser

digit   = [0123456789]
number  = digit+

We convert this grammar specification into a Tamacola program as before, but in this case we give the -n option to specify a namespace. A namespace is useful when you want to use a common name such as "number" as a rule name. Because "number" is already used in the system, you cannot use it without a namespace.

The grammar itself is easy to understand if you have experience with regular expressions. Brackets ([]) match one of the characters inside, and a postfix plus (+) repeats the previous expression one or more times.

$ mkpeg -n number number.g 
$ tamacola number.k -
compiling:  number.k
Cola/Tamarin
> (parse-collection number/number "xyz")
FAIL
> (parse-collection number/number "345")
{token-group:
(53 52 51)}

Because we use the namespace "number", we need to put the namespace before the slash (/) in the function name.

As you might notice, this parser correctly rejects a non-number like "xyz" and accepts "345". But the result is not very useful. The return value of plus is a special object named "token-group", whereas we want the number that the string represents. So we add a conversion function to get the value.

number  = digit+:n      -> (string->number (->string n))
$ tamacola number.k -
compiling:  number.k
Cola/Tamarin
> (parse-collection number/number "345")
345

Now the parser conveniently returns a number. Perhaps you think this is cheating: since string->number is itself a kind of number parser, we should have written a number parser without it! Yes, we could. But that leads to a more interesting topic about left and right recursion, so I leave it for later.

S-expression parser

Now we are going to write a parser for an almost real S-expression. This parser can only handle numbers, symbols, and lists, but it is enough to explain the essence of Tamacola.

;; sexp.g
;; Lexical Parser

spaces  = [ \t\r\n]*

digit   = [0123456789]
number  = digit+ :n spaces              -> (string->number (->string n))

char    = [+-*/abcdefghijklmnopqrstuvwxyz]
symbol  = char+ :s spaces               -> (intern (->string s))
        
sexp    = symbol
        | number
        | "(" sexp*:e ")"               -> (->list e)

In this grammar, the only new operator is the postfix star (*), which repeats zero or more times. The rest is straightforward. To test this grammar, we use Tamacola's simple test framework. Writing test cases is better than using the interactive shell, because you don't have to type the same expression many times.

;; sexp-test.k

(check (parse-collection sexp/spaces "    ")            => 'SPACES)
(check (parse-collection sexp/digit "0")                => 48)
(check (parse-collection sexp/number "345")             => 345)
(check (parse-collection sexp/char "a")                 => 97)
(check (parse-collection sexp/symbol "hello")           => 'hello)

(check (parse-collection sexp/sexp "345")               => 345)
(check (parse-collection sexp/sexp "hello")             => 'hello)
(check (parse-collection sexp/sexp "(hello world)")     => '(hello world))
(check (parse-collection sexp/sexp "(3 4)")             => '(3 4))
(check (parse-collection sexp/sexp "(print 4)")         => '(print 4))

The check function comes from SRFI-78. It complains only if the left-hand value and the right-hand value differ; otherwise it does nothing. I like this UNIX-style conciseness.

By convention, a test program is named by adding the suffix "-test" to the main program's name. I borrowed this custom from the Go language.

Make sure this program prints nothing.

$ tamacola sexp.k sexp-test.k 

Lisp Compiler

The PEG parser can handle any list structure as well as strings, which allows you to write a compiler in PEG. In a string parser, the input is a string and the output is some object (a list in our case), but in a compiler, the input is a Lisp program and the output is assembly code.

;; Compiler

arity   = .*:x                          -> (length (->list x))
insts   = inst* :xs                     -> (concatenate (->list xs)) 
                                        
inst    = is-number:x                   -> `((pushint ,x))
        | is-symbol:x                   -> `((getlex ((ns "") ,(symbol->string x))))
        | '( '+ inst:x inst:y )         -> `(,@x ,@y (add))
        | '( '- inst:x inst:y )         -> `(,@x ,@y (subtract))
        | '( '* inst:x inst:y )         -> `(,@x ,@y (multiply))
        | '( '/ inst:x inst:y )         -> `(,@x ,@y (divide))
        | '( inst:f &arity:n insts:a )  -> `(,@f (pushnull) ,@a (call ,n))

There are some new elements in the grammar. A quoted list '( ) matches a list structure, and a quoted symbol matches a symbol.

A prefix ampersand (&) prevents a rule from consuming the stream even if it matches. For example, the &arity rule examines the rest of the list, but the contents are matched again by the insts rule afterwards.

is-number matches a number, and is-symbol matches a symbol. These rules cannot be described in the PEG grammar, so they are written as Lisp functions.

(define is-number
  (lambda (*stream* *parser*)
    (if (number? (peek *stream*))
        (begin (set-parser-result *parser* (next *stream*))
               #t)
        #f)))

(define is-symbol
  (lambda (*stream* *parser*)
    (if (symbol? (peek *stream*))
        (begin (set-parser-result *parser* (next *stream*))
               #t)
        #f)))

A rule is a function that receives the stream and the parser (an object that stores the result). The rule function returns #t if it matches and #f if it fails.
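The same protocol can be mimicked in JavaScript to see how little a rule needs. ListStream, satisfies, and the result field are hypothetical names modelled on the description above; they are not Tamacola's runtime.

```javascript
// Hypothetical stream and parser objects, modelled on the description above.
class ListStream {
  constructor(items) { this.items = items; this.position = 0; }
  peek() { return this.items[this.position]; }
  next() { return this.items[this.position++]; }
}

// Rule factory: succeed and consume one element when the predicate holds.
const satisfies = (predicate) => (stream, parser) => {
  if (predicate(stream.peek())) {
    parser.result = stream.next();
    return true;
  }
  return false; // nothing is consumed on failure
};

const isNumber = satisfies((x) => typeof x === "number");

const stream = new ListStream([3, "f"]);
const parser = { result: undefined };

console.log(isNumber(stream, parser), parser.result); // true 3
console.log(isNumber(stream, parser), stream.peek()); // false f
```

Because a failing rule leaves the stream position untouched, another alternative can be tried on the same input.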

I think it is easier to look at the test code than to read my explanation.


(check (parse-collection sexp/arity '(a b c))   => 3)

(check (parse-collection sexp/insts '(3 4))     => '((pushint 3)
                                                     (pushint 4)))

(check (parse-collection sexp/inst '(3))        => '((pushint 3)))

(check (parse-collection sexp/inst '((+ 3 4)))  => '((pushint 3)
                                                     (pushint 4)
                                                     (add)))

(check (parse-collection sexp/inst '((f 3 4)))  => '((getlex ((ns "") "f"))
                                                     (pushnull)
                                                     (pushint 3)
                                                     (pushint 4)
                                                     (call 2)))
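To see the shape of the translation without Tamacola, the inst rule can be approximated by a recursive JavaScript function. This is a sketch under simplifying assumptions: lists are modelled as arrays, symbols as strings, and getlex takes a bare name instead of the ((ns "") name) form shown in the tests. The names OPERATORS and compile are made up here.

```javascript
// Lists are modelled as arrays and symbols as strings (an assumption of this sketch).
const OPERATORS = new Map([
  ["+", "add"], ["-", "subtract"], ["*", "multiply"], ["/", "divide"],
]);

function compile(expr) {
  if (typeof expr === "number") return [["pushint", expr]];
  if (typeof expr === "string") return [["getlex", expr]];
  const [head, ...args] = expr;
  if (OPERATORS.has(head) && args.length === 2) {
    // Binary operator: both operands first, then the operation.
    return [...compile(args[0]), ...compile(args[1]), [OPERATORS.get(head)]];
  }
  // Function call: callee, a null receiver, the arguments, then call with the arity.
  return [...compile(head), ["pushnull"], ...args.flatMap((a) => compile(a)), ["call", args.length]];
}

console.log(JSON.stringify(compile(["+", 3, 4])));
// [["pushint",3],["pushint",4],["add"]]
console.log(JSON.stringify(compile(["f", 3, 4])));
// [["getlex","f"],["pushnull"],["pushint",3],["pushint",4],["call",2]]
```

The order of the alternatives matters just as in the grammar: the operator cases are tried before the general function call.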

Put it in an envelope

We still need a little more to construct real assembly code. The details are outside the scope of this post, so I simply show the code.

program = inst:x  -> `(asm
                       (method (((signature
                                  ((return_type *) (param_type ()) (name "program")
                                   (flags 0) (options ()) (param_names ())))
                                 (code ((getlocal 0)
                                        (pushscope)
                                        ,@x
                                       (returnvalue))))))
                      (script (((init (method 0)) (trait ())))))

And the test case.

(check (parse-collection sexp/program '((print 42)))
       => '(asm
            (method
             (((signature ((return_type *) (param_type ()) (name "program")
                           (flags 0) (options ()) (param_names ())))
               (code ((getlocal 0)
                      (pushscope)
                      (getlex ((ns "") "print"))
                      (pushnull)
                      (pushint 42)
                      (call 1)
                      (returnvalue))))))
            (script (((init (method 0)) (trait ()))))))

You can read the entire program in example/sexp.g in the Tamacola distribution. To try the program, please enter:

make -C example test-sexp

Left recursion

We set aside an interesting topic about left and right recursion. Let me show you our number parser again.

digit   = [0123456789]
number  = digit+:n               -> (string->number (->string n))

If we didn't want to use the string->number function, I would write the parser as:

;; Use fold-left
digit1   = [0123456789]:d        -> (- d 48)
number1  = digit1:x digit1*:xs   -> (fold-left
                                      (lambda (n d) (+ (* n 10) d))
                                      x
                                      (->list xs))

The digit1 rule converts the ASCII value of the digit character to its numeric value, and the number1 rule constructs a decimal number. As you can see, you need the fold-left function to construct the number because number notation is essentially left recursive. For example, the number 34567 actually means:

(((3 * 10 + 4) * 10 + 5) * 10 + 6) * 10 + 7

However, a PEG parser doesn't handle left-recursive grammars in general, so I had to reconstruct the left-recursive structure with fold-left. This is not hard at all if you are familiar with functional programming. In functional programming, a list is considered a right-recursive data structure, so it is natural for a list to be parsed in a right-recursive way. However, I admit that it looks awkward to some people.
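The fold-left construction is easy to check in JavaScript, where Array.prototype.reduce plays the role of fold-left. The names digitValue and parseNumber are made up for this sketch, and the input is assumed to contain only the digits 0 to 9.

```javascript
// "0" has character code 48, so subtracting 48 gives the digit's value.
const digitValue = (ch) => ch.charCodeAt(0) - 48;

// Fold from the left: the first digit is the seed, and every later digit
// multiplies the running number by 10 before being added.
function parseNumber(text) {
  const [first, ...rest] = [...text].map(digitValue);
  return rest.reduce((n, d) => n * 10 + d, first);
}

console.log(parseNumber("34567"));                        // 34567
console.log((((3 * 10 + 4) * 10 + 5) * 10 + 6) * 10 + 7); // 34567
```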

Yoshiki Ohshima provides a very useful extension that supports direct left recursion. With his extension, the number parser is written as:

;; Use left-recursion

digit2   = [0123456789]:d        -> (- d 48)
number2  = number2:n digit2:d    -> (+ (* n 10) d)
         | digit2
number2s = number2

You need to load runtime/peg-memo-macro.k to use this extension.

$ tamacola ../runtime/peg-memo-macro.k number.k -
Cola/Tamarin
> (parse-collection number/number2s "345")
345

The real parser and compiler are bigger than the grammars presented here, but I have explained all of the essential ideas. I hope this helps you make your own language!

Wednesday, September 15, 2010

Tamacola (4)

Tamacola in Tamacola

After I wrote the Tamacola compiler in COLA, the next thing to do was to implement it in Tamacola itself. A language is called self-hosting if its compiler is written in the language itself. This has various advantages.

First, once self-hosting is done, you don't need COLA anymore; you can improve or modify any aspect of the language on the Tamarin VM. If I design the environment carefully, it would be possible to do language design entirely in the Web browser (it needs server-side help for security reasons, so this hasn't been done yet).

Second, self-hosting is a benchmark that tells whether the language is good enough. Scheme is an especially simple language, so many people implement a toy Scheme. But because Tamacola is now self-hosting, I can proudly claim that it is not a toy! Well, this is rather self-satisfaction, though.

Third, it provides a rich library, including an "eval" function. A compiler uses various programming techniques, and those are useful for other programs, too.

To make it self-hosting, there were two key problems: macros and eval.

Bootstrapping Macros

I used macros heavily in my compiler; for example, the parser written in PEG was converted into a bunch of macro expressions. The problem is that expanding macros requires the eval function, but I wasn't able to make eval before the parser was done. It's a deadlock! Here is a typical macro written in COLA:

(define-form begin e (cons 'let (cons '() e)))
This is how the macro works. When the compiler finds an expression like:
(begin
  (print "Hello")
  (print "World"))
the expressions inside begin are bound to e, the body (cons 'let (cons '() e)) is executed at compile time, and the expression is expanded to:
(let ()
  (print "Hello")
  (print "World"))

Such expansion is impossible without an eval function because the compiler needs to evaluate the list (cons 'let (cons '() e)) given by the user. What could I do when I didn't have eval yet? I realized that in many cases macros only use basic list functions like car, cdr, and cons. And a more complicated macro could be hard-coded as a special form in the compiler. So I invented pattern-based macros.

(define-pattern ((begin . e) (let () . e)))

Basically this is a subset of Scheme's syntax-rules. If the compiler finds an expression starting with begin, the rest of the expression is bound to e and substituted into the right-hand side. This expansion requires only a limited set of list functions, so the compiler doesn't have to provide a full eval function. This macro syntax made my compiler readable, and I was able to continue happily.
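The matching and substitution behind such a rule can be sketched in a few lines. This is only an illustration in Python, not Tamacola's code: S-expressions are Python lists, symbols are strings, a "." before the last element marks the rest variable, and the names match, substitute, and expand are made up for this sketch.

```python
# Sketch of pattern-based macro expansion (hypothetical names and data layout).
# ["begin", ".", "e"] stands for (begin . e); ["let", [], ".", "e"] for (let () . e).

def match(pattern, expr, bindings):
    """Match expr against pattern, recording variable bindings."""
    if isinstance(pattern, str):
        bindings[pattern] = expr          # a symbol is a pattern variable
        return True
    if not isinstance(expr, list):
        return False
    if len(pattern) >= 2 and pattern[-2] == ".":
        fixed, rest_var = pattern[:-2], pattern[-1]
        if len(expr) < len(fixed):
            return False
        bindings[rest_var] = expr[len(fixed):]   # bind the remaining elements
        return all(match(p, e, bindings) for p, e in zip(fixed, expr))
    return len(pattern) == len(expr) and all(
        match(p, e, bindings) for p, e in zip(pattern, expr))

def substitute(template, bindings):
    """Copy the template, replacing bound variables with their values."""
    if isinstance(template, str):
        return bindings.get(template, template)
    if len(template) >= 2 and template[-2] == ".":
        head = [substitute(t, bindings) for t in template[:-2]]
        return head + substitute(template[-1], bindings)
    return [substitute(t, bindings) for t in template]

def expand(rule, expr):
    """Apply one (pattern, template) rule if expr starts with its keyword."""
    pattern, template = rule
    if not (isinstance(expr, list) and expr and expr[0] == pattern[0]):
        return expr
    bindings = {}
    if not match(pattern[1:], expr[1:], bindings):
        return expr
    return substitute(template, bindings)

BEGIN_RULE = (["begin", ".", "e"], ["let", [], ".", "e"])
program = ["begin", ["print", '"Hello"'], ["print", '"World"']]
expanded = expand(BEGIN_RULE, program)
```

Only list slicing and concatenation are used, which mirrors the point that such expansion needs no general eval.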

Even after I implemented the more dynamic traditional macros with an eval function, I kept using mainly these pattern-based macros.

Eval

To implement an eval function, you need to understand the dynamic code loading facility provided by the VM. Note that this is not part of the AVM2 specification, and Avmshell (a console Tamarin shell program) and Adobe Flash have different APIs.

Avmshell has a straightforward API. You give it compiled byte code, and the function returns the value. Because Tamacola is now written in Tamacola, you can invoke the compiler as a library function and get the byte code you want to execute.

avmplus.Domain.loadBytes(byteArray:ByteArray)

You can get the domain object with the Domain.currentDomain() static method. These useful Avmshell functions are found in the shell/ directory of the Tamarin-central repository.

Flash Player has a somewhat tricky API for dynamic code loading. The signature looks normal.

flash.display.Loader.loadBytes(bytes:ByteArray, context:LoaderContext = null):void

There are two problems for our purpose. First, this method is not designed mainly for dynamic loading; it only accepts SWF, JPG, PNG, or GIF files, and byte code happens to be accepted inside a SWF file. So I had to construct a SWF file to load code. In case you don't know about SWF, it is a kind of container format in which you can embed vector graphics, MP3 sounds, and ActionScript byte code. Making a SWF file is not particularly difficult, though it needs some nasty bit fiddling.

Second, and far more problematic, this method works asynchronously. In other words, it doesn't return the result of the loaded code. Instead, you need to give it a callback function and wait for the code to finish. Additionally, the method returns no value at all, so if you want the result, you need to set up some explicit return mechanism yourself.
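The shape of the problem can be shown with a toy model. This is a Python sketch and not the Flash API: FakeLoader, load_bytes, and run_pending are hypothetical names, and a plain list stands in for the "explicit return mechanism".

```python
# Toy model of an asynchronous loader (all names are hypothetical).

class FakeLoader:
    def __init__(self):
        self.pending = []                 # work that will run "later"

    def load_bytes(self, code, on_complete):
        # Like an asynchronous load, this queues the work and returns nothing.
        self.pending.append((code, on_complete))

    def run_pending(self):
        # Plays the role of the event loop delivering completion events.
        while self.pending:
            code, on_complete = self.pending.pop(0)
            on_complete(code())

results = []                              # explicit place to put return values
loader = FakeLoader()
returned = loader.load_bytes(lambda: 1 + 2, results.append)
before = list(results)                    # still empty: nothing has run yet
loader.run_pending()                      # only now does the callback fire
```

An eval built on such a loader cannot hand its value back to the caller directly; the caller has to continue from inside the callback.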

In practice, this causes a problem if you want to write a traditional macro definition and use the macro in the same source file. A traditional macro needs to evaluate a Lisp expression at compile time, but the eval function doesn't return before the compilation thread is done. I could solve the problem by setting up a compilation queue or something, but it would carry a performance penalty that I don't want. So for now I have simply given up.

I have explained pretty much all the interesting aspects of the self-hosting compiler. I'll talk about how to make a new language in the Tamacola environment later.

Tuesday, September 14, 2010

Tamacola (3)

How a lisp program is compiled to Tamarin VM

Now I'm going to talk a bit about how a Lisp (almost Scheme) program is compiled into Tamarin's byte code. This topic is especially interesting if you are curious about making your own language or VM.

Tamarin VM is made for ActionScript, so its byte code is also specifically designed for ActionScript. In other words, it is slightly tricky to implement a language other than ActionScript. In case you don't know ActionScript, its execution model is almost identical to JavaScript's. The differences between them are optimizations based on explicit type annotations and static fields.

ActionScript and Scheme have these aspects in common:

  • Lexical scope.
  • Function object.
  • Variable arguments with no currying.
  • Dynamic typing (a value has type, not variable).

But there are significant differences.

  • ActionScript doesn't have a simple function call. Everything is a method call.
  • In ActionScript, only a function has a scope. There is no block scope or let expression.
  • Tail call optimization is not supported.
  • The call stack cannot be accessed.

These limitations make it sound as if Tamarin VM is inferior. But no, they actually come from Tamarin VM's advantages and optimizations. If you happen to have a chance to design your own VM, please learn from this lesson. There ain't no such thing as a free optimization. Any optimization kills some generality. I'll explain each case.

ActionScript doesn't have a simple function call, and neither does Tamarin VM. This is rather harmless, though. When you see a function-like expression such as trace("hello"), it actually means (the global object).trace("hello"), and eventually the receiver is passed to the function as the first argument. In other words, if you want to construct a function call with two arguments, you need to pass three arguments, where the first argument is the "this" object. A slightly tricky part is primitive operators like + or -, which don't have a "this" object. Those primitives are special cases.
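The extra receiver argument can be illustrated with a toy stack machine. This is a Python sketch under assumptions: the ("push", ...) and ("call", n) instructions are illustrative and are not a listing of real AVM2 opcodes, and compile_call and run are hypothetical helpers.

```python
# Sketch: compile the Lisp call (add a b) so that a receiver is pushed
# between the function and its arguments (illustrative instruction set).

def compile_call(expr, receiver="null"):
    func, *args = expr
    code = [("push", func), ("push", receiver)]   # function, then "this"
    code += [("push", arg) for arg in args]       # then the real arguments
    code.append(("call", len(args)))              # argc excludes the receiver
    return code

def run(code, env):
    stack = []
    for op, operand in code:
        if op == "push":
            stack.append(env[operand])
        else:                                     # ("call", argc)
            args = [stack.pop() for _ in range(operand)][::-1]
            this = stack.pop()
            func = stack.pop()
            stack.append(func(this, *args))       # receiver goes in first
    return stack.pop()

env = {"null": None, "add": lambda this, x, y: x + y, "a": 1, "b": 2}
code = compile_call(["add", "a", "b"])
value = run(code, env)
```

A two-argument call therefore hands three values to the callee, with a placeholder receiver when the Lisp program has no natural "this".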

ActionScript also has lexical scope, but only a function has a scope. So I have to be careful when I compile a let expression in Scheme. The simplest way to implement a let expression is to use a function. In theory a let expression can always be translated to a lambda, but this is a huge performance disadvantage. So I use ActionScript's "with" statement instead. "With" is an unpopular syntax in ActionScript, but it lets you use any object as a scope object. I borrowed this idea from the Happy-ABC project http://github.com/mzp/scheme-abc.
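For reference, the textbook translation of let into lambda looks like this. It is a Python sketch over list-based S-expressions, and let_to_lambda is a name made up here; it shows the approach that was avoided for performance reasons, not the "with"-based one.

```python
# Sketch: rewrite (let ((name value) ...) body ...) as
# ((lambda (name ...) body ...) value ...). S-expressions are Python lists.

def let_to_lambda(expr):
    _let, bindings, *body = expr
    names = [name for name, _ in bindings]      # variables become parameters
    values = [value for _, value in bindings]   # initial values become arguments
    return [["lambda", names, *body], *values]

source = ["let", [["x", 1], ["y", 2]], ["+", "x", "y"]]
rewritten = let_to_lambda(source)
```

Every let then costs a function object and a call at run time, which is where the performance disadvantage comes from.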

The lack of tail call optimization in Tamarin VM was the most disappointing thing for me. It prevents a functional programming style. I simply gave up on it. Tail call optimization is not a difficult topic at all. If the target were native code like x86, it would be a matter of swapping the stack frame and jumping. But Tamarin VM doesn't allow direct access to the stack or a jump to another function. I understand that it might cause a security issue, but it would be wonderful if the VM provided a special byte code for tail calls.

Finally, you can't access the call stack directly, therefore you can't implement call/cc. The reason I can't call Tamacola a Scheme is the lack of tail call optimization and call/cc. It rules out many experimental language features like generators, processes, and so on. But considering the rich libraries provided by the Flash API, I would say Tamacola will eventually be a reasonably useful language.

I'll tell you about the convoluted self-hosting process and macros tomorrow.

Tamacola (2)

COLA

Once the assembler was done, I was able to test various features of the Tamarin VM; I even wrote a tiny GUI application for Adobe Flash in the assembler. The next step was the compiler.

Another goal of the project was to port Ian Piumarta's COLA framework to Tamarin (the project name came from this). And perhaps this is the only real technical contribution of the project. COLA is a meta language (a programming language to design another language) which resembles Scheme. COLA has a nice sub-language called Parsing Expression Grammar (PEG) that makes parsers very terse. My plan was to write a bootstrapping compiler in PEG and COLA, then to implement the COLA library, and then to write the real compiler in PEG and Tamacola itself.

I won't go into the details of PEG. But briefly, it is as simple as a regular expression and as powerful as a context-free grammar.
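To give a flavor of the PEG style without COLA's syntax, here is a minimal sketch in Python. The combinator names (char_range, many1, choice, action) are made up for this example; each parser takes the text and a position and returns a (value, new position) pair, or None on failure.

```python
# Minimal PEG-style combinators (hypothetical names, not COLA's PEG syntax).

def char_range(lo, hi):
    def parse(text, pos):
        if pos < len(text) and lo <= text[pos] <= hi:
            return text[pos], pos + 1
        return None
    return parse

def many1(parser):
    # One or more repetitions, greedy, as in PEG's e+ operator.
    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            if result is None:
                break
            value, pos = result
            values.append(value)
        return (values, pos) if values else None
    return parse

def choice(*parsers):
    # Ordered choice: the first alternative that succeeds wins.
    def parse(text, pos):
        for parser in parsers:
            result = parser(text, pos)
            if result is not None:
                return result
        return None
    return parse

def action(parser, func):
    # Run func on the parsed value, like a semantic action in a grammar rule.
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, pos = result
        return func(value), pos
    return parse

digit = char_range("0", "9")
letter = char_range("a", "z")
number = action(many1(digit), lambda ds: int("".join(ds)))
word = action(many1(letter), "".join)
token = choice(number, word)
```

Here number("345", 0) yields the integer 345 together with the position after the last digit, and token tries number before word because the choice is ordered.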

At the time I started writing the compiler, COLA had no library at all except the PEG framework, so I needed to write the necessary libraries myself from scratch. Fortunately COLA has quite a powerful external function call feature (a kind of FFI), a macro system, and a flexible object-oriented framework, so writing the libraries was not so hard. But I tried to avoid COLA-specific features as much as possible, because they would be a problem when I rewrote the compiler in Tamacola itself later.

To implement the library, I borrowed function specifications from R6RS as much as possible to avoid unnecessary confusion. There were exceptions: because COLA treats the slash "/" character as special for namespaces, I took PLT's function names in those cases.

Writing Lisp libraries was an interesting puzzle to me because there were some requirements and constraints for the domain. Those requirements are:

  • Unit testing framework.
  • Library framework.
  • List manipulations.
  • String functions.
  • Bit operations and streams.
  • Pretty printer for debugging.

These requirements were carefully chosen. Because COLA has only modest debugging facilities, the unit test framework had to be there. So my first goal was to implement all the functions needed by the unit testing. I needed a pretty printer for debugging, too.

Another "must have" library was bit operators and file / in-memory streams, which are needed by the assembler. Interestingly enough, R6RS doesn't define enough functions to support those. For example, there is no portable way to specify a stream to be binary or text. So I needed a bit of creativity.
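As a rough idea of what an assembler asks of such a library, here is a small Python sketch that writes integers into a binary in-memory stream using shifts and masks. The two encodings (little-endian 16-bit and a 7-bits-per-byte variable-length integer) are chosen for illustration; that Tamacola's assembler uses exactly these helpers is an assumption, and write_u16 and write_varint are made-up names.

```python
import io

def write_u16(stream, value):
    """Write a 16-bit unsigned integer, low byte first."""
    stream.write(bytes([value & 0xFF, (value >> 8) & 0xFF]))

def write_varint(stream, value):
    """Write an unsigned integer 7 bits at a time, low bits first.
    The high bit of each byte says whether another byte follows."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            stream.write(bytes([byte | 0x80]))   # more bytes follow
        else:
            stream.write(bytes([byte]))          # last byte
            return

out = io.BytesIO()            # explicitly a binary stream, not a text one
write_u16(out, 0x1234)
write_varint(out, 300)
data = out.getvalue()
```

Python makes the binary/text distinction by offering separate io.BytesIO and io.StringIO classes, which is the kind of choice the library author has to make when the language standard leaves it open.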

Eventually, I wrote all the libraries and the compiler. And I got a pretty good sense of the minimum set of functions needed for a compiler: a testing framework, a pretty printer, bit operators, and streams. In other words, if your language has those features, your language can be self-hosting.

The real puzzle was the order. Those requirements must be precisely ordered by need. For example, the pretty printer must follow the stream and string functions because the pretty printer uses those functions. Although you can write functions in any order you like in Lisp, a precise order makes testing and debugging easy. I kept this discipline. I even implemented the test library twice: the first was a concise assert function, and the second has friendlier failure messages thanks to the pretty printer.
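Ordering by need is a topological sort of the dependency graph. The Python sketch below uses the standard graphlib module; only the pretty printer's dependencies on streams and strings and the second test library's dependency on the pretty printer come from the text, and the remaining edges and library names are assumptions added to make a complete example.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each library maps to the set of libraries it needs first.
depends_on = {
    "assert": set(),                          # first, concise test library
    "list": {"assert"},                       # assumed edge
    "string": {"assert", "list"},             # assumed edges
    "stream": {"assert", "string"},           # assumed edges
    "pretty-printer": {"stream", "string"},   # stated in the text
    "friendly-test": {"pretty-printer"},      # second test library
}

# static_order() yields every library after the ones it depends on.
order = list(TopologicalSorter(depends_on).static_order())
```

Writing and testing the libraries in this order means each one can rely only on code that already has tests.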

It took a few weeks to build a simple compiler, but there was still a long way to go to the point where self-hosting could be done. One thing I learned from this stage was that, even without a debugger, debugging is not so hard if you have enough test cases and a good pretty printer.

 
Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.