Mar 28, 2022

JIT compilation is coming to Unison: an early progress report

In the past couple of weeks, I've been working on a new method of compiling Unison programs to native code. Unison already has a way of packaging a standalone program into a binary file that starts up very quickly, but that still uses the same interpreter written in Haskell. While the interpreter is written to be reasonably efficient given optimized intermediate code, unfortunately there hasn't been an opportunity to spend a significant amount of time on the optimizations that would realize its full potential. Also, even in the best case, an interpreter will always have significant overhead compared to native code.

Thus, compiling to native code seems like an inevitable step. There are several considerations that would make a target more or less suitable/convenient:

  1. Compilation has to be able to work just-in-time. It's fine to also be able to precompile binaries from known code, but one of the primary uses we have for the standalone binaries right now is to implement a server that evaluates Unison code sent to it from another environment. That code is not known ahead of time; it is serialized and sent from the client, and needs to be deserialized, compiled and executed on the server.
  2. Tail calls need to be handled properly. They're a useful feature in any language, but in Unison they're also the only mechanism for writing loops.
  3. Ideally, the target would support delimited continuations as an implementation of abilities. There are ways of implementing algebraic effects without delimited continuations, but continuations are a better match (I'll explain further on).
  4. It would be advantageous if the target already had a reasonable implementation of various low level compilation details. Unison isn't really novel in ways that would require a specialized garbage collector or native code generator, so it would be good if we can avoid needing to implement these things from scratch.
  5. The runtime should be capable of doing some relatively sophisticated things. Right now Unison wraps features of GHC's runtime, like lightweight threads with asynchronous I/O, and software transactional memory. Ideally it wouldn't be necessary to completely reimplement these features.

Based on these considerations, we decided that a Chez Scheme backend was a promising direction. Scheme and Lisp in general are well known for supporting dynamic code loading, and well-behaved tail calls are one of the defining features of Scheme. Continuations are also heavily associated with Scheme, so it's likely that most implementations would be well equipped in that area, too. Since it's meant to be usable as a functional language in its own right, it has garbage collection and reasonable library capabilities. Moreover, it seems to be fairly well equipped as far as generating code for functional languages goes.

Jared Forsyth did some early prototyping of a compilation-to-Scheme pipeline and was getting some nice results compiling pure code, and that really inspired us to look more closely at Scheme as a serious target. We also knew that the rewrite of the Idris language had initially based its backend on Chez (although I think it has several backends again at this point). But when researching, I also discovered that Racket has been moving toward replacing its backend with Chez as well. The Racket report specifically suggests that Chez could be a better target for functional languages than a lot of other common compiler backends, since many of the latter have limitations based around the needs of more traditional imperative languages.


What about LLVM?

For a high-level functional language like Unison, LLVM is a pretty low-level compilation target. It provides something like a platform-independent assembly language, saving you from needing to write a register allocator and native code generator. LLVM can also support tail calls, provided you use one of the right calling conventions.

It doesn't provide any runtime services out of the box, such as a garbage collector, continuations and/or delimited continuations, lightweight threads, async I/O, etc, which means it can be a considerable amount of work to build a serious LLVM-based JIT with these features.

Probably the most significant gap in the capabilities of the Chez runtime is software transactional memory. Unsurprisingly (since it's very difficult to use correctly without an effect checking system), Chez doesn't implement it. Glancing at the original STM paper for GHC, the implementation they describe doesn't sound too difficult, and looks doable using the concurrency primitives present in Chez.

What's working so far

So far, I've written (part of) a compiler from one of the existing intermediate code representations to Scheme code. The intermediate representation uses A-normal form, with nested pattern matching desugared to single-level matching, and all local function definitions floated to an outer-most recursive let block for each top-level definition. This level of normalization isn't all strictly necessary for compilation to Scheme, but it is the format we're already using to send code between Unison instances, so it would need to be handled anyway.
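As a toy illustration of what A-normal form means (this is not Unison's actual intermediate representation, and the functions here are made up for the example): every intermediate result gets bound to a name before being used, much like the x2, x3, ... bindings in the generated output shown later.

```scheme
;; Hypothetical helper functions, purely for illustration.
(define (g x) (* x 2))
(define (h y) (+ y 1))
(define (f a b) (+ a b))

;; Direct style:
(f (g 10) (h 20))

;; The same expression in A-normal form: each intermediate
;; result is let-bound, so every function argument is a
;; variable or constant.
(let* ([t1 (g 10)]
       [t2 (h 20)])
  (f t1 t2)) ; both evaluate to 41
```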

I started by writing the Scheme emitter in Haskell using the existing data structures there. But after some discussion, Paul and I realized that in order for the dynamic loading of Unison code to work in a program compiled to Scheme, it'd be necessary to have the compilation from Unison to Scheme available in Scheme itself. One option is to just write it directly in Scheme, but another option is to write it in Unison, which is then compiled to Scheme. This is what I've done most recently: I have a version of the intermediate data structures defined as Unison data types, along with ways of parsing the binary format for code interchange into those Unison types, and a compiler that emits Scheme definitions from those. Right now it is rather limited, only supporting pure code and a few arithmetic builtins, but it is possible to compile those Unison definitions to Scheme just by running a Unison program. For instance, if we define:

codeTest : '{IO,Exception} ()
codeTest _ = printScheme (termLink addMain)

And execute run codeTest at the prompt, we get output ending with:

    ([(x0 x1)
      (let* ([x2 10000000]
             [x3 (identity x2)])
        (ref-M5Y65PHZ3GWHMCUMIJ3LGLBLKPSYMA x3))])
    (let* ([x2 100]
           [x3 (identity x2)]
           [x4 (x0)])
      (ref-ZQRWH24DGQ437B72KCOZXGRZAJJW2U x3 x4))))


The full Unison API for code loading requires a fair bit more than just this sort of code generation, but it's an important component. Unfortunately, I haven't yet filled in enough primitive operations to be able to compile the compiler itself to Scheme. The above code comes from a nested loop that prints the sum of the numbers from 0 to 10 million, 100 times. Using the compiled Scheme code is around 470x faster than our current interpreter, at least for this simple arithmetic loop.
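For reference, a hand-written Scheme analogue of that benchmark looks something like the following. This is not the generated code shown above, just a sketch of the same nested loop in the tail-recursive style that Unison loops compile down to.

```scheme
;; Sum of the numbers from 0 to n, as a tail-recursive loop.
(define (sum-to n)
  (let loop ([i 0] [acc 0])
    (if (> i n)
        acc
        (loop (+ i 1) (+ acc i)))))

;; Run the sum 100 times, keeping the last result.
(define (run times)
  (let loop ([t times] [r 0])
    (if (= t 0)
        r
        (loop (- t 1) (sum-to 10000000)))))

(display (run 100)) ; prints 50000005000000
```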

While we could produce much better intermediate code for our interpreter by applying standard optimizations (such as worker/wrapper transformations, inlining, etc.), Scheme is providing great performance with our more naive intermediate code, presumably because it already does many of these optimizations itself.

Implementing abilities

One major piece currently missing is the implementation of the primitives for abilities. As alluded to above, the way abilities work in the intermediate code and interpreter is based on delimited continuations. This section walks through some of the technical details, but feel free to skip ahead.

Scheme is well known for its support of continuations. However, the sort in the (R6RS) Scheme standard are not the ones we want. They are based around the operation call-with-current-continuation, which I'll abbreviate to call/cc going forward. The idea is that if we write:

(call/cc (lambda (k) ...))

Then the lambda expression gets called with a special function k that has captured the control flow context leading up to that point in the program. The lambda expression may return normally, in which case its result is supplied to that context. However, calling k also causes the program to act like the argument to k was returned from the lambda expression. This is true even if k escapes the scope of the call/cc block, which means that it has the power to replace the entire remaining control flow context anywhere in the program.
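A minimal example of this behavior in any R6RS Scheme:

```scheme
;; k captures the context (+ 1 _). Returning normally from the
;; lambda supplies its result to that context:
(+ 1 (call/cc (lambda (k) 10)))         ; => 11

;; Invoking k abandons the rest of the lambda body entirely and
;; returns its argument to the captured context instead:
(+ 1 (call/cc (lambda (k) (k 10) 999))) ; => 11; the 999 is never used
```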

An ability handler clearly involves a continuation as well. A pattern match implementing one looks something like:

{ E x y z -> k } -> ...

k being the continuation again. However, these continuations behave much more like functions. Calling k does not completely replace the execution context, but adds to it like a normal function call. It's possible that k would have captured some Exception effects that would jump to a top-level handler, but this is also no different from functions, and those effects can be handled so that they don't escape.

This is where delimited continuations come in. They are instead based around a pair of operations, the simplest of which are shift and reset. shift is similar to call/cc, and a use would look something like:

(shift (lambda (k) ...))

There are three primary differences:

  1. shift only captures into k a portion of the control flow context, limited by the nearest enclosing reset.
  2. k actually behaves like a function. To know what (k x) does, imagine if x were returned from the original (shift ...) expression, and trace the execution until you find the value y that would be returned from the enclosing (reset ...) expression. Then (k x) evaluates to y.
    Here is a concrete example.

    Consider the expression:

      (+ 1 2
         (reset
           (+ 1 2
              (shift (lambda (k) (k (k 5))))
              3)))

    This evaluates to 20, because the k captured is essentially the inner addition, meaning it's equivalent to:

      (let ((f (lambda (x) (+ 1 2 x 3))))
        (+ 1 2 (f (f 5))))

    If the shift is replaced with call/cc (removing the now-redundant reset), then the result is 14, because it is equivalent to:

      (+ 1 2 (+ 1 2 5 3))

    If you want, you can also try using delimited continuations in Unison. shift is the effect part, introduced by the ability:

    structural ability Shift r where
      shift : ((a -> r) -> r) ->{Shift r} a

    reset is the handler part, definable via:

    jitAnnounce.reset : '{e, Shift r} r ->{e} r
    jitAnnounce.reset e =
      h = cases
        { x }          -> x
        {shift e -> k} -> e (x -> (handle k x with h))
      handle !e with h

    Then we can try the above example:

    v =
      use Nat +
      1 + 2 + jitAnnounce.reset '(1 + 2 + shift (k -> k (k 5)) + 3)

    One thing to notice is that these operations use a "deep" handler. Shallow handlers actually correspond more closely to another pair of operations known as control (similar to shift) and prompt (similar to reset). A Unison version of these might be defined something like:

    structural ability Control r where
      control : ((a ->{Control r} r) -> r) ->{Control r} a

    prompt : '{e, Control r} r ->{e} r
    prompt e =
      handle !e with cases
        { x }            -> x
        {control e -> k} -> e k

    I haven't experimented much with these, though, so I'm not sure how precisely they match the theoretical definitions.

  3. shift actually removes the captured portion in k from the control flow context. So, returning directly from the lambda expression yields a value directly to the enclosing reset.

This is essentially how ability handlers work, too. The continuation captured in an ability match only captures a portion of the computation, delimited by the associated handle block. So, handle blocks correspond to the reset operation. The big difference is that we also associate the ability matching, and thus the uses of shift, with the handle block.

However, the intermediate code actually converts this to something more like the Scheme presentation. Ability matching is compiled to code that uses an operation like shift, so that handled ability requests can actually be implemented by calls to the appropriate generated code. Then references to the implementation are threaded around, and requests are just function calls. So the intermediate code is already ideally suited for a runtime that directly supports delimited continuations.

There are theoretical results about implementing delimited continuations using call/cc and a global mutable variable. I believe the original is in Filinski's Representing Monads. Essentially, the Scheme implementation there is:

(define mk (lambda (x) (raise "fell off end")))

(define-syntax reset
  (syntax-rules ()
    [(reset e ...)
     (call/cc
       (lambda (k)
         (let ([ok mk])
           (set! mk (lambda (r) (set! mk ok) (k r)))
           (let ([v (begin e ...)]) (mk v)))))]))

(define-syntax shift
  (syntax-rules ()
    [(shift f)
     (call/cc
       (lambda (sk)
         (let* ([k (lambda (x) (reset (sk x)))]
                [v (f k)])
           (mk v))))]))
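With the definitions above loaded into Chez, the earlier arithmetic example can be checked directly:

```scheme
;; Reproduces the shift/reset example from the previous section,
;; using the call/cc-based encoding above.
(+ 1 2
   (reset
     (+ 1 2
        (shift (lambda (k) (k (k 5))))
        3))) ; => 20
```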

However, there seem to be some issues with doing this. Oleg Kiselyov's page on the downsides of call/cc lists some. The most significant problem seems to be that the correctness of the result hinges on every other side effect being subsequently implemented by reduction to shift and reset. However, this ranges from expensive to impossible. Even in Unison, where declared abilities would reduce somehow to the delimited continuations, it is desirable to have some wired-in pseudo abilities that implement things more efficiently. Encoding every mutable reference cell using shift and reset (which are themselves encoded with closures, call/cc, and a single global variable) is a very expensive strategy. And there are also built-in side effects like I/O and threading that are important for us to use.

There may be better 'portable' encodings than the one above. But it seems like what is really needed is a careful, system-specific implementation of delimited continuations that definitely interacts well with other parts of the system. Luckily, it should be possible to get this for Chez. As mentioned, Racket now has a Chez backend, and Racket has built-in support for delimited continuations.

Technically, Racket uses its own forked copy of Chez, and the upstream version does not have all the features used for Racket's continuations implementation. There is a pull request against the upstream to add the feature, but it seems to be waiting on someone to fix some conflicts. I'm uncertain how necessary the feature is just for implementing delimited continuations (Racket uses it for other features as well). We might have to help get the pull request merged, or try using the Racket fork of Chez for our compilation in the end.


There's still quite a bit of work to be done, but I think the early results so far are promising. The arithmetic examples mentioned above are much faster than the existing system, and that is on code actually generated from Unison definitions, not hand-crafted output that we could hypothetically generate given enough optimizations.

I think the data structures I've ported to Unison may have uses beyond this compiler, as well. Paul recently demoed a debugging ability he wrote completely in Unison. While discussing it with him, it occurred to me that a lot of the runtime introspection capabilities I've implemented as part of the JIT pipeline could also be useful for providing an even richer debugging experience, with access to all local variables and other niceties. That's something to investigate another time, though.

Stay tuned for more progress updates on this in the future!