Last week at POPL, I presented “Freeze After Writing: Quasi-Deterministic Parallel Programming with LVars”. In this paper, my co-authors Aaron Turon, Neel Krishnaswami, Ryan Newton, and I added three new features to the LVars programming model: handlers, quiescence, and freezing. I’ve posted the slides from the talk, and sooner or later there should be a video, too! Meanwhile, here’s some of the same material in written form.
Our paper extends the original LVars programming model of least-upper-bound writes and threshold reads. Although the original LVars model generalized IVars while retaining determinism, it was still quite limited. In particular, there are a lot of things it’s hard to do with a threshold read. Suppose we have an LVar representing a container — say, a set. What if we we want to iterate over its elements? What if we want to find out if some element is not in the set? And what if we want to be able to react to the appearance of elements we aren’t expecting? It’s hard to threshold on something unless we know it’s going to be written eventually.
Handlers, quiescence, and freezing extend the LVars model to make it possible to do all those things. By adding them, we push our programming model into the realm of what we call quasi-determinism — a weaker guarantee than determinism. Happily, though, there’s a way to make our way back to determinism from quasi-determinism, and I’ll explain how that works, too.
A challenge: parallel graph traversal
Let’s consider a problem that’s particularly hard to solve using just threshold reads. Suppose we want to do a parallel traversal of a directed graph, starting from a given node and finding all the nodes reachable from it.
Ordinarily, the algorithm for doing this would go something like the following: we mark the start node as seen. Then we look at all its neighbors, perhaps launching a thread for each one, and for each neighbor, we check to see if it’s marked as having been seen. If so, then that thread is done; if not, then we mark it as seen, and then launch a thread for each of its neighbors. This continues until all the reachable nodes have been seen.
As we traverse the graph, the set of nodes that we’ve seen grows monotonically, and we continue adding the neighbors of seen nodes to that set until the set reaches a fixed point. Because this algorithm involves a monotonically growing data structure, it would seem at first glance to be a great match for the LVars model — we could use an LVar to represent the set of seen nodes, with set union as least upper bound. Unfortunately, we can’t express this algorithm using only threshold reads, because there’s no way to check if a node has been seen or not. And, without that test, we won’t know when to stop looking for new nodes.
In this graph, for instance, adding node 11 will lead to an attempt to add its neighbor, node 10, which will lead to an attempt to add node 9, and so on — in this case, infinitely, because there happens to be a cycle in the graph. But those writes aren’t doing anything; they’re no-op writes, because the nodes they’re trying to add to the set are already there. So, the LVar is done changing, all meaningful writes are over with, and it would be safe to terminate if only we had a way to know that the meaningful writes are over with.
Events and event handlers
The way we accomplish this in our model is with what we call events and event handlers. An event is a write that actually moves an LVar’s state upward in the lattice, as opposed to a no-op write, and an event handler is associated with an LVar and listens for state-changing updates to that LVar — such as a new node arriving in the set of seen nodes. The event handler can run a callback in reaction to an event (and a callback can do writes of its own, which can trigger further events, recursively).
Let’s try writing our graph traversal function using event handlers:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
To traverse a graph
g starting from a particular node
startNode and find all the reachable nodes, we first create a new LVar of a set type, called
seen. Then we attach an event handler to the
seen set. The
newHandler function, defined elsewhere, takes the LVar,
seen, as its first argument. As its second argument,
newHandler takes the callback that we want to run every time an event arrives — that is, every time a new node is added to the set. The callback takes that newly arrived node
node as its argument, looks up
node’s neighbors in the graph by calling a function
neighbors (also defined elsewhere), and then maps the
insert function over the list of neighbor nodes that
Then, we add the starting node to the
seen set with
insert startNode seen — and we’re off! The event handler does the rest.
How do we know when we’re done handling events? The
quiesce operation blocks until all of the callbacks running in a particular handler pool, in this case
h, are done running. So, we call
h, and that call will block until we’ve handled all the meaningful writes.
Once we’ve quiesced, we still have one problem: how do we look at the exact contents of the LVar that just quiesced? This is where freezing comes in. The
freeze operation is a non-blocking read that lets us find out the exact contents of an LVar. We call
freeze seen on the last line of our
traverse function, and it returns the LVar’s contents.
Importantly, once an LVar has been frozen, its state can’t change anymore, and any further writes that would change its state instead raise an exception! So it’s a good thing that we quiesced before we froze, so we know that no more writes to the LVar are going to occur, making it safe to look at the LVar’s exact contents.
But wait! Aren’t explicit synchronization barriers bad?
Since we had to make sure to quiesce before freezing, aren’t we just relying on a synchronization barrier being in the right place? Aren’t we right back to nondeterminism if we forget to quiesce?
Actually, it’s not that bad, because even if we forget to quiesce, or even if we do it in the wrong place, the worst thing that can happen by freezing too early is that we raise a write-after-freeze exception. For a program that performs freezes, we can guarantee that there are only two possible outcomes: either the deterministic result of all the effects, or a write-after-freeze exception. Put another way, all executions that produce a final value produce the same final value, but some executions might raise an exception.
Moreover, if we do hit the write-after-freeze exception case, it always means that there’s a synchronization bug in our program, and in principle the error message can even tell us exactly which write happened after which freeze, so we can more easily debug the race!
Making our way back to determinism
We call this property quasi-determinism, and we’ve proved that the LVars model with the addition of handlers, quiescence, and freezing is quasi-deterministic; for more on the quasi-determinism proof, see our paper or the accompanying tech report. But what does quasi-determinism really mean for us as programmers?
The good news is that the particular graph traversal we saw above is deterministic. The bad news is that, in general, freezing introduces quasi-determinism to the programming model, because we might freeze before we’ve quiesced, or because our thread that’s freezing and quiescing might be racing with some other thread to write into a shared LVar.
But, if we could somehow ensure that freezes only ever happen after all writes have completed, then our computations should be deterministic, because then we wouldn’t have any write-after-freeze races. As it turns out, there’s a way to enforce this at the implementation level. The trick is to let the implementation take care of quiescing and freezing for us at the end of a computation. In our LVish library, the mechanism we provide for doing this is called
Freeze after writing
LVar operations can only run inside
Par computations, and the only way to execute a
Par computation is with one of the “run” functions that the LVish library provides. For instance, the version of
traverse above uses
runParIO. But another one of these “run” functions,
runParThenFreeze, executes a
Par computation and then returns a frozen LVar at the end of it, after an implicit global barrier. So, if we have a
Par computation that side-effects an LVar and we run it with
runParThenFreeze, then we don’t have to manually
freeze the LVar ourselves, which means we don’t have to manually
quiesce, either! Here’s what
traverse looks like when it’s written with
1 2 3 4 5 6 7 8 9 10 11 12 13 14
Notice that this second version of
traverse returns a computation of type
Par Det, for “deterministic”. The “
Det” is an effect level, meaning that the computation can only perform deterministic operations — in particular, it’s not allowed to do a
freeze, and if it tried to, then that would be a static type error. On the other hand, in our first version of
traverse, where we call
traverse returns a computation of type
Par QuasiDet, for “quasi-deterministic”. Hence the type of a
Par computation reflects its determinism-by-construction or quasi-determinism-by-construction guarantee.
How can the second version of
traverse be deterministic by construction? Because the
Par computation never has to do any LVar reads at all, let alone any freezing LVar reads! It merely starts off the computation and then returns the
seen LVar. The LVar may well still be growing at the time
traverse returns — but, since
runParThenFreeze cannot return unless all threads launched inside the
Par computation are done, there’s no threat to determinism, and we’ll always get the same set of nodes as our answer. More generally, the fact that it’s possible to safely return an LVar while it is still being written to by a handler makes it possible to set up dataflow-style, pipeline-parallel networks of LVars, letting us combine task parallelism and pipeline parallelism in the same deterministic program. I’ll write more about this style of programming with LVars in a future post.
Try it yourself
In the lvar-examples repository, I’ve written up a couple of self-contained examples so that you can try this code out yourself. Here’s the version with explicit freezing, and here’s the one that uses
runParThenFreeze. There are also many more examples of handlers and freezing in the repo. Enjoy — and please file bugs, since there are certain to be some!