In the previous part of this series, we introduced a simple distributed tree type and showed how to implement transformation functions like `lib.Tree.map` in a way that "brings the computation to the data". We saw how functions like `lib.Tree.map` are lazy: they don't do any work when called but merely set up a pending transformation that will be applied as the data structure is forced.
In this part of the series we'll focus on functions for our `Tree` data type that evaluate or force the data structure in some way. We'll use a `reduce` function as an example. Again, this will illuminate how small tweaks to the code can cause different runtime behavior.
`sequential.Tree.reduce zero combine tree` has three parameters: a "zero" value to return if a subtree is empty, a `combine` function for combining the results of two subtrees, and the tree to be reduced. Reduce functions for in-memory data structures deal with a set of familiar concerns: do you reduce the left or the right subtree first? Should the implementation be tail-recursive or maintain state on the stack? But because we're now working with distributed computations, our reduce implementation needs to manage the additional dimensions of *where* and *when* the `combine` function for `reduce` should be run.
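Before looking at implementations, here's a hedged sketch of how such a reduce might be called. The tree value `myTree` and the choice of summing with `Nat.+` are assumptions for illustration, not part of the library:

```unison
-- Sketch: summing a hypothetical distributed `myTree : Tree Nat`.
-- The "zero" is 0 (empty subtrees contribute nothing);
-- the combine function adds the results of the two subtrees.
sumTree : Tree Nat ->{Remote} Nat
sumTree tree = sequential.Tree.reduce 0 (x -> y -> x Nat.+ y) tree
```

The interesting design questions are all hidden inside `reduce` itself: the caller just supplies a zero and an associative combine.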
We could write a `reduce` function that behaved in either one of the following ways:
- **Send the function down:** Push the combining function *down* the tree to the data, and send the resulting `Nat` reduced value back to the parent for combining.
- **Send the subtrees up:** Send each forced `Tree Nat` *up* to its parent `Tree.Two`, which calls `reduce` on each subtree, then `combine`s the two `Nat` results.
We ultimately want the *send the function down* option, since sending a `Nat` to the parent will be cheaper than sending a `Tree Nat` (only to immediately `reduce` that to a `Nat`), but we'll illustrate both here. Take a look at the recursive case in processing the `Tree.Two` branch in the following implementation:
```unison
sequential.Tree.reduce :
  a -> (a ->{Remote} a ->{Remote} a) -> Tree a ->{Remote} a
sequential.Tree.reduce zero combine = cases
  Tree.Empty -> zero
  Tree.One valueA -> Value.get valueA
  Tree.Two leftValue rightValue ->
    combine
      (sequential.Tree.reduce zero combine (Value.get leftValue))
      (sequential.Tree.reduce zero combine (Value.get rightValue))
```
It does the following:

1. Evaluate the left subtree and send it to the current location.
2. Evaluate the right subtree and send it to the current location.
3. Recursively `reduce` the left subtree.
4. Recursively `reduce` the right subtree.
5. Apply the `combine` function to the two results from (3) and (4).
We've implemented the *send the subtrees up* approach. If the subtrees are at the same location as the parent, this is fine. But since this is meant to be used in situations where the data cannot fit on one location, there will be nodes in the tree where the parent resides at a different location than one of its subtrees. In these places we're sending more data over the network than we should.

There's another problem with the *send the subtrees up* approach: the reducing and combining always happens where the parent resides. Since this is a recursive function, this means that all the work is ultimately happening at whatever location calls `sequential.Tree.reduce`. That is going to be bad when we try to add parallelism later: we can't have one location doing all the work!
Let's try to write a version of `reduce` that implements the *send the function down* approach, using `Value.map` instead:
```unison
withMap.Tree.reduce :
  a -> (a ->{Remote} a ->{Remote} a) -> Tree a ->{Remote} a
withMap.Tree.reduce zero combine = cases
  Tree.Empty -> zero
  Tree.One valueA -> Value.get valueA
  Tree.Two leftValue rightValue ->
    use Value get map
    use withMap.Tree reduce
    left' = map (t -> reduce zero combine t) leftValue
    right' = map (t -> reduce zero combine t) rightValue
    combine (get left') (get right')
```
This version will:

1. `reduce` the left subtree at its location.
2. `reduce` the right subtree at its location.
3. Send the *reduced* left value to the parent.
4. Send the *reduced* right value to the parent.
5. Combine the two *reduced* values at the parent.
This is the *send the function down* approach. Notice that at no point are we sending a `Tree` to the parent (there are no calls to `Value.get` that return a `Tree`, only calls to `Value.get` that return reduced values).
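Separated from the tree logic, the general pattern is: `Value.map` controls *where* a computation runs, and `Value.get` controls *when* its (now small) result gets shipped to the caller. A minimal sketch, where `expensive` is a placeholder name for any `Remote`-capable function:

```unison
-- Sketch: run `expensive` at the location where `v` already lives,
-- then pull back only the result, never the original value.
runWhereDataLives expensive v = Value.get (Value.map expensive v)
```

`withMap.Tree.reduce` is just this pattern applied recursively at every `Tree.Two` node.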
While this is an improvement in our execution strategy over `sequential.Tree.reduce`, we are still reducing the left and the right subtrees sequentially: first the left, then the right. Why not reduce the two subtrees in parallel?
To make this into a parallel reduce, we can use `Remote.fork` to start a computation running in a background `Task`. An `await` blocks until the forked `Task` completes and returns its result (or raises a failure if the forked task *failed*). Using `Remote.fork` and `await` in our `reduce` function yields something like this:
```unison
parallel.Tree.reduce :
  a -> (a ->{Remote} a ->{Remote} a) -> Tree a ->{Remote} a
parallel.Tree.reduce zero combine = cases
  Tree.Empty -> zero
  Tree.One valueA -> Value.get valueA
  Tree.Two leftValue rightValue ->
    use Remote fork
    use Value get map
    use parallel.Tree reduce
    leftTask =
      fork here! '(get (map (t -> reduce zero combine t) leftValue))
    rightTask =
      fork here! '(get (map (t -> reduce zero combine t) rightValue))
    left' = await leftTask
    right' = await rightTask
    combine left' right'
```
Our left and right `Tree` branches are now being reduced in parallel through the use of `Remote.fork` and `await`. Moreover, all the pending transformations that have been applied to the tree (for instance via `lib.Tree.map`) will be forced in parallel in the same pass as the `parallel.Tree.reduce`.
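Stripped of the tree machinery, the fork/await idiom is simply: start *both* tasks before awaiting *either*. A hedged sketch, where `a` and `b` are placeholder delayed computations:

```unison
-- Sketch: both computations run concurrently because both forks
-- happen before either await. Forking `b` only after awaiting
-- `taskA` would serialize them again.
both a b =
  taskA = Remote.fork here! a
  taskB = Remote.fork here! b
  (await taskA, await taskB)
```

The ordering matters: the easy mistake is `(await (Remote.fork here! a), await (Remote.fork here! b))`, which still works, versus awaiting the first fork before creating the second, which silently loses the parallelism.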
## Operations that only partially evaluate a structure
What if we want to write a function that doesn't require us to fully evaluate the tree? Let's say we want a function `Tree.take` that lists the first `n` elements it finds in the tree. Let's write a function that may not need to force the right subtree at all:
```unison
Tree.take : Nat -> Tree a ->{Remote} [a]
Tree.take n = cases
  Tree.Empty -> []
  Tree.One valueA -> List.singleton (Value.get valueA)
  Tree.Two leftVal rightVal ->
    use List ++ size
    use Nat - >=
    use Value get map
    combine l r =
      if size l >= n then List.take n l
      else
        nextN = n - size l
        l ++ get (map (Tree.take nextN) r)
    combine (get (map (Tree.take n) leftVal)) rightVal
```
The trick is that we have to guard the right branch from being evaluated by keeping it wrapped in a `distributed.Value`. So the function that joins together the left and right branches has to be more careful about the circumstances in which it evaluates the right branch via calls to `Value.get`.
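A usage sketch, with `myTree` as a hypothetical tree value: if the left subtree already supplies enough elements, the `Value` holding the right subtree is never passed to `Value.get`, so it is never shipped over the network or forced.

```unison
-- Sketch: forces only as much of the tree as is needed to
-- gather 10 elements; untouched subtrees stay unevaluated
-- at their own locations.
firstTen : Tree Nat ->{Remote} [Nat]
firstTen tree = Tree.take 10 tree
```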
## Takeaways
- `Value.get` will force the evaluation of a `Remote` value, bringing it to the location of the caller. You'll see it in functions which interpret the distributed data structure.
- Use `Value.map` and `Value.get` in tandem to control *where* and *when* a computation should be run.
- Unison's `Remote.fork` and `await` functions provide a way to introduce parallelization to remote computations.
Whatever runtime behavior you want for your distributed computations can be achieved with only tiny code changes, and the decisions you make are then codified in reusable functions that others can use without needing to be experts in distributed systems.
In the next part, we'll go further, showing how computations on distributed data structures can be made *incremental*, avoiding work that has already been done in a previous execution.