website | twitter

Friday, June 12, 2009

Lazy list and Data flow in Squeak

Introduction

I have been working on lazy list these days. A lazy list is a collection where each element is evaluated only when it's necessary. Lazy lists are very useful to express mathematical infinite sequence naturally like all natural numbers, entire prime numbers, or so.

Lazy lists are also practical when you deal with a large sequence of data. Imagine that you want to replace particular words in a large text, and print out just the first 100 characters. Without lazy list, you must either, 1) replace all words and take first 100 characters, or 2) count character position carefully each time when a word is replaced. 1) requires time, and 2) tend to lost modularity. With lazy lists, you can write such program as simple as former way, but performance is still reasonable.

Non-strict functional language like Haskell supports this feature in everywhere, but I was also interested in implementations of lazy lists in "normal" languages. Even though its long history, lazy lists have not been widely used. Lazy lists are sometimes called as "Stream", and indeed, Unix's pipe can be seen as a kind of lazy list. But it lacks generality. I wondered if I can find some traps to be solved when I implement it myself. Squeak is a powerful tool for the purpose because it has flexible runtime debugging support. And I believe most part of difficulty in the concept comes from difficulty of its runtime behavior.

SICP is a good tutorial and I found further discussions in SRFI-40 and SRFI-41. Ruby has good implementation similar to SRFI-41. And even Squeak has two implementations (LazyList and LazyCollections)! And here is my story.

Natural numbers

You can download a demo image from LazyList.zip or current source code from Collections-Lazy-tak.5.mcz. This source only uses basic functions in my home brewed image, but please let me know if it doesn't work in your Squeak image.

This is an expression to make natural numbers with LazyList.

list := LazyList nat.

I took advantage of Squeak tools, inspector and explore. If you inspect the variable "list" (select "inspect it" in the context menu), you will see internal structure of the list. And if you explore the list (select "explore it"), you see public "interface" of the list. I'll show you what I meant.

LazyList1

And when you open a list at the explorer (the right window), values of public accessors head and tail are shown, and also the inspector is updated. The first element 1 is made only because someone send head message. And you can get further numbers with clicking tail accessor. Note that while explorer triggers the list, inspector is inert so that you can freely observe internal behavior of the list. LazyList2

You can do this by a program. Message head answers the first element of a list, and tail answers following list. This is same as CAR and CDR pair in lisp. Many other methods are defined e.g. take: returns a LazyList including first n elements, and contents convert LazyList to Array.

list head. "=> returns 1"
list tail. "=> returns LazyList .."

(list take: 10) contents. "=> #(1 2 3 4 5 6 7 8 9 10)"

And then, try next expression. You might know that select: is a message where a selected collection specified by a block as a condition is returned (this is "filter" in other language). But how possibly all natural numbers can be used with select:? Unlike other collection methods, it doesn't evaluate a condition block immediately. Instead, this method returns a new list which will be behave as a filter of natural numbers.

list select: [:e | e odd].
LazyList3

Define a new LazyList

Most useful way to construct a new lazy list would be using followedBy: method. The receiver becomes head of new list, and the argument becomes tail. You can use it with any receiver, and the argument must be a LazyList or block that returns a LazyList. LazyList nil is used to specify end of the list.
1 followedBy: (2 followedBy: (3 followedBy: LazyList nil))
"=> same as LazyList withAll: #(1 2 3)"
You can define natural numbers using followedBy: like this.
nat := 1 followedBy: [nat collect: [:e | e + 1]].
It seems very strange! But please be relax. The first element of the list is obviously 1. How about the second? collect: is sometimes called as map function in other languages, and nat collect: [:e | e + 1] answers a LazyList with all element is just larger by 1 than nat itself. Therefore the sequence is generated by recursive fashion. How clever!
nat                                           = 1, ...
nat collect: [:e | e + 1]                     = 2, ...
nat = 1 followedBy: nat collect: [:e | e + 1] = 1, 2, ...
nat collect: [:e | e + 1]                     = 2, 3, ...
nat = 1 followedBy: nat collect: [:e | e + 1] = 1, 2, 3, ...
nat collect: [:e | e + 1]                     = 2, 3, 4, ...
nat = 1 followedBy: nat collect: [:e | e + 1] = 1, 2, 3, 4, ...
Squeak Smalltalk has some great heritage from APL. Many arithmetic functions are defined in collections. So does LazyList. There is a much simple definition of natural number.
nat := 1 followedBy: [nat + 1].
It reminds me an ugly assignment expression with side effect.
x := x + 1.

This is ugly because it looks like an equation but it is not. x is never equal to x + 1! Sometimes it confuses a beginner programmer. Still, it is too useful to ignore. We need some reasoning of such side effect to use safely. On the other hand, nat := 1 followedBy: [nat + 1] is similar enough to x := x + 1 yet, it is actually an equation. This is a significant coincidence, isn't it?

Iteration, Data flow, Lazy list, and Polymorphic functions

One of the problems in x := x + 1 is that value of x depends on the position in the source code. So you have to be careful about order of each expression. Are there any better ways? You can imagine more mathematical programming language to allow you to define a function as a equation.

x1 = 1 --- some initial number
xn = xn - 1 + 1

In this equations, it is clear that right hand side of x is older than left hand side of x. This is pure and easy to reason. But another problem occurs. In practice, the program requires whole history of xs because now a programmer can specify any old value like xn - 100.

So I would add a restriction. What if only addition is allowed in subscript. In other words, what if only "future" value can be referred but "past" so that we can forget our unnecessary past conveniently (it is possible with nice GC).

x1 = 1 --- some initial number
xn + 1 = xn + 1
It makes the relationship between an iteration x := x + 1 and a lazy list more clear. This iteration can be converted a lazy list definition directly.
nat := 1 followedBy: [nat + 1].
      ~~~            ~~~~~~~~~
    x1 = 1;        xn + 1 = x + 1

I know this discussion is still rough. You can find better discussion from Lucid. However, my theory is that data flow structures (mathematical iteration) can be expressed by lazy lists and polymorphic methods without special language.

Here is another example of definition of a lazy list in iterative way to compute Fibonnaci numbers. next is a synonym of tail.

fib := 1 followedBy: [1 followedBy: [fib + fib next]].

This corresponds next mathematical equations.

fib1 = 1
fib2 = 2
fibn + 2 = fibn + fibn + 1

Simulations

As a little bit more realistic example, I will show you a gravity simulation using lazy lists. In this demo, a global variable Y shows height of an object, the first position is 400, and it is increased by Speed. The initial value of Speed is 0 and the value is decreased by 1 while the hight Y is positive. If Y is negative, Speed is negated which means the object is bounced by the ground!

Y := 400 followedBy: [Y + Speed next].

Speed := 0 followedBy: [Y > 0 whenTrue: Speed - 1
                             whenFalse: Speed negated].

LazyList

Morphic part of this demo used a special morph named ListPointer that worked as an interface between LazyList and Morphs.

I began this library as a replacement of Stream and OrderedCollection. But it turned out that combination of polymorphic methods and LazyList has unexpected cute property as data flow. I'm very happy about the fact! If you are interested in rather practical aspect of lazy lists, please check out List >> show method in this library. printString for LazyList is implemented this framework itself.

I already noticed that there are a lot of rooms I could improve it. The design was briefly along with SRFI-40, but I will include more ideas in SRFI-41 and Ruby's lazylist. I have ignored them so far simply because I am lazy!

2 comments:

  1. "lazy lists have not been widely used" - actually in the last couple of years they have, as C# has them as part of Linq. They are implemented using continuations instead of the linked list style of Scheme, and they lack automatic memoization, but they are very useful anyway.

    ReplyDelete
  2. Hi Earwicker, Thank you for the info. I am recently learning LINQ and this is very cute! It is not as natural as Haskell's list though, perhaps this iterator style is straightforward way to implement lazy lists in imperative languages.

    ReplyDelete

 
Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.