composition.al

Yet another blog post about how parallelism is not concurrency

Recently, an acquaintance asked for an explanation of the difference between parallelism and concurrency. Blog posts about how parallelism is not concurrency are already ground well covered, but I feel compelled to write another one!

Concurrency and parallelism are orthogonal concepts:

  • Concurrency is a way of structuring your programs; it has to do with how programs are written. A lot of software systems are useful to reason about as having multiple threads of control — for instance, separate clients connecting to a server, or separate UI widgets. Programs that are organized this way exhibit concurrency. The fact that multiple threads of control are a good way of organizing and reasoning about certain programs is orthogonal to whether those programs are running on one processor core or four or 1000.

  • Parallelism is a way of making your programs go faster; it has to do with how programs are run. Parallel hardware is hardware that can make multiple things happen at once. Basically every computer that’s made now has parallel hardware resources that we could be exploiting more fully if only we knew how — or if only our runtime systems, whose job it is to schedule the programs onto the hardware, knew how. Programs that exploit these parallel hardware resources exhibit parallelism. The fact that exploiting parallel hardware resources is a good way of making certain programs go faster is orthogonal to whether those programs are structured as one thread or two or thirteen.1

Now, as it turns out, if a program is already organized into multiple threads of control, then that can be a big ol’ hint to the runtime system about how the program can be fruitfully scheduled onto different hardware execution units that can run at the same time. This is a fine way of exploiting parallel resources. On the other hand, if a program has two threads that have to, say, pass a big data structure back and forth between them a lot, then maybe it’s not such a great idea to run them on different cores; maybe it would be better to just run them in an interleaved manner on the same core and keep that data structure in cache. Either way, the program would still be concurrent.

If you happen to be writing your code in a setting where structuring your program into concurrent threads is the best way, or maybe the only way, of telling the runtime system that you’d like it to use parallel resources, then the concepts of “parallelism” and “concurrency” will neatly overlap. Given that, it’s no wonder that they get conflated sometimes.

However, even if a program has “no concurrency” — that is, even if it has only one thread of control — then that doesn’t mean that there isn’t potential for parallelism in there. For instance, if the program is “map this function over every element in this thousand-element array” and the function is a pure function, then we can divide up the array among the cores and have them each handle a part of it. This flavor of parallelism is known as data parallelism, or SIMD (single instruction; multiple data) parallelism, because we’re doing the same thing (“single instruction”) to all the elements (“multiple data”) of the array.2 Data parallelism is in contrast to task parallelism, in which we’re executing unrelated tasks on different cores. Task parallelism is what we get when the runtime system, given a concurrent program with multiple threads of control, schedules each thread to run on a different core.

For a more in-depth discussion of concurrency and parallelism and the interplay between them, I strongly recommend chapter two — “Concurrency meets parallelism” — of Aaron Turon’s dissertation. That said, I think Aaron might actually disagree with my claim that parallelism and concurrency are orthogonal; I think he might say that if you’re executing tasks in parallel, then you have multiple threads of control, and hence concurrency, at some level of abstraction, albeit one that might not be exposed to you as the programmer. That makes sense to me, but nevertheless, I think it’s handy to distinguish a class of programs that run in parallel but do not appear to be concurrent, with the understanding that the way a program appears depends on the level of abstraction from which one is looking.

There are also those people who like to smugly point out that every program is concurrent because all the components of a processor can be thought of as different threads of control, which is one of those overliteralist, true-but-unhelpful statements along the lines of “We’re traveling through time right now.” Don’t be that guy.

An analogy: students as execution units, courses as threads

Let’s dispense with the jargon and explain concurrency and parallelism in a different way: by analogy with students taking courses at a school.

  • Not concurrent, not parallel: A single student takes one course at a time. She completes the entirety of one course before going on to the next course.3 This scenario is analogous to a program that has one thread of control and runs on a single execution unit.
  • Concurrent, but not parallel: A student takes four or so courses during a semester, interleaving the work she does for each. Most of us who went to college probably did this. It’s analogous to a program that has multiple threads of control that take turns running on a single execution unit. When people describe concurrency as “pretend parallelism”, this is the kind of scenario they have in mind: when our student is done, the four courses will all show up on her transcript as having happened in the same semester, and she can pretend that she literally worked on them all at exactly the same time, even though she probably wasn’t actually typing an English paper with her right hand while doing a linear algebra problem set with her left.
  • Not concurrent, but parallel: Two students are enrolled in Math 227 at the same time, doing all the work simultaneously and identically. (Unaware that they are trapped in a hypothetical situation, they are unfairly accused of plagiarism and kicked out of school.) Modulo the bit about getting kicked out of school, this situation is analogous to a data-parallel program that has a single thread of control running on multiple execution units simultaneously, in SIMD fashion.
  • Concurrent and parallel: During the fall semester, one student takes Anthropology 318 while another student takes Biology 150. Together, they resemble a task-parallel program that has multiple threads of control, each of which runs on a different execution unit simultaneously. Or perhaps two students take Anthro 318, doing all the work simultaneously and identically, while three take Bio 150, doing all the work simultaneously and identically. Their situation resembles a program that exhibits both task parallelism and data parallelism. (They are all accused of plagiarism, kicked out of school, and later, together with the two students from Math 227, become the lead plaintiffs in a class-action lawsuit against unpleasant hypothetical situations.)

Pipeline parallelism

One place where this students-and-courses analogy breaks down is pipeline parallelism. Pipeline parallelism occurs when the same thing has to be done several times, but different execution units each do a part of it, and those parts, although they have to occur in sequence, can be staggered. For instance, suppose that each item in an arriving stream of items has to be frobbed and then globbed. We might schedule things so that unit one handles all the frobbing, while execution unit two handles all the globbing (it has a globbing co-processor). When the first item arrives, unit one frobs it while unit two must sit idle, but then unit two can glob it at the same time as unit one is frobbing the second item. Hence we have parallelism, because once the pipeline fills up, there’s enough work to do to keep two execution units busy at the same time. (If, after globbing, the item has to be blobbed and then wobbed, we might be able to keep four execution units busy.)

In our students-and-courses analogy, students are execution units and courses are threads. However, in pipeline parallelism, an execution unit does the same bit of work over and over, and (most) students don’t take courses that way. A better analogy might be to think of professors as execution units, and students as the work they have to do. If we make the simplifying (if admittedly unrealistic) assumption that every professor only teaches one course over and over, and if we further assume that students have to proceed through a fixed sequence of courses one after the other, then we get a pretty good analogy for what happens in pipeline parallelism.

What about threads? Is a pipeline-parallel program also concurrent? On the one hand, we could be said to have multiple threads of control (the “teach Bio 150 over and over” thread, the “teach Bio 151 over and over” thread, and so on), each of which is being executed by exactly one professor. On the other hand, if we slice things the other way and squint a bit, our pipeline-parallel school looks data-parallel, in that each student proceeds through the same sequence of courses; it’s just they aren’t all proceeding through the sequence in lockstep, but are instead staggered in time (that is, a student that just arrived will be taking Bio 150, while one who’s been around for a semester will be taking Bio 151 at the same time). So we could think of pipeline parallelism as “staggered SIMD” parallelism, where the only “thread of control” is the sequence of courses taken.

But Lindsey, don’t you conflate “parallel” and “concurrent” in your own work?!

Sort of. My dissertation on LVars has the word “parallel” in the title, and not the word “concurrent”, but the kind of parallelism I discuss in it is exclusively task parallelism, and the kinds of programs that I discuss running in parallel are exclusively concurrent programs. In this particular, narrow setting, I think it’s actually more or less okay to interchangeably use “concurrent” and “parallel” to describe a program, even though the former term describes the way the program is written and the latter describes the way it is run — because the whole point of a concurrent program that uses LVars is that it can run in parallel and still be deterministic.

So, if the concepts happen to overlap in this setting, what’s the justification for using “parallel” and not “concurrent”? I use “parallel” partly because “deterministic parallelism” is something of a stock phrase, and “deterministic concurrency” isn’t.4 More importantly, though, I use “parallel” because I want to emphasize the fact that making programs go fast is the goal. Using LVars to communicate between concurrent threads makes it so that the order in which such communications occur does not affect the outcome of a program; this gives the scheduler more leverage to schedule threads onto multiple cores in a way that will not introduce bugs, and therefore we can exploit parallel resources and ultimately make programs go fast. That’s the idea, anyway!

Thanks to Scott Feeney, Julia Evans, Greg Hendershott, Paul Stansifer, and Tony Garnock-Jones for reading drafts of this post and giving feedback.

  1. I’m using “thread” and “execution unit” in a generic way in these definitions – “thread” refers not to any particular language feature or OS feature, but to any independent sequence of actions or instructions, and “execution units” could refer to CPU cores, GPU cores, functional units that process vector instructions, or what have you.

  2. Just as I’m using “thread” and “execution unit” generically, I’m using “instruction” generically here – we’re not necessarily talking about executing exactly one operation as defined by a particular instruction-set architecture. The point is just that we’re doing the same thing to all the data.

  3. Incidentally, a couple of colleges have one-course-at-a-time curricula.

  4. I think Bob Harper would say that “deterministic concurrency” is a contradiction in terms, because concurrency is inherently nondeterministic. LVars can’t and don’t change this “nondeterministic nature” of concurrency – they merely make the nondeterminism impossible to observe.

Comments