Notes on Halide and Distributed Halide

Halide is a newish domain-specific language for writing high-performance image processing pipelines, created by a group of researchers at MIT and Adobe. Earlier this year, I read “Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines”, which appeared a few years ago at PLDI ’13, followed by “Distributed Halide”, which came out this spring at PPoPP. I’m a big fan of the Halide work. In this post, I’ll share my observations from having read these papers.

Contributions of the Halide PLDI paper

Although the original paper on Halide appeared at SIGGRAPH 2012, I like the PLDI ’13 paper better as an introduction to this line of work. From my point of view, the Halide PLDI paper makes four main contributions.

  • The first contribution, and the one that’s most interesting to me, is a model of the scheduling tradeoff space for image processing pipelines. The authors consider a three-way tradeoff among parallelism, locality, and avoiding redundant recomputation. There are two main choices to make: domain order and call schedule. I’ll say more about these choices later.
  • The second contribution is a schedule representation, or schedule DSL, if you like, that allows one to express programs anywhere in this space of scheduling choices.
  • The third contribution is a compiler that generates machine code given an algorithm and a schedule.
  • The fourth contribution is an autotuner that can come up with “high-quality” schedules for an algorithm using stochastic search.
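To make the second and third contributions concrete, here is roughly what an algorithm plus a schedule looks like in Halide’s C++-embedded DSL, using the paper’s running blur example. This is a sketch, not a complete program: it assumes libHalide is available, the helper function `make_blur` is my own framing, and the particular tile sizes and vector widths are just the ones the paper uses for illustration.

```cpp
#include "Halide.h"
using namespace Halide;

// Build the two-stage blur pipeline and apply one possible schedule.
// (make_blur is a name I made up for this sketch.)
Func make_blur(ImageParam input) {
    Func blur_x("blur_x"), blur_y("blur_y");
    Var x("x"), y("y"), xi("xi"), yi("yi");

    // The algorithm: what each stage computes.
    blur_x(x, y) = (input(x - 1, y) + input(x, y) + input(x + 1, y)) / 3;
    blur_y(x, y) = (blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3;

    // The schedule: one point in the tradeoff space. Compute blur_y in
    // vectorized, parallel tiles, and compute blur_x as needed per tile.
    blur_y.tile(x, y, xi, yi, 256, 32).vectorize(xi, 8).parallel(y);
    blur_x.compute_at(blur_y, x).vectorize(x, 8);
    return blur_y;
}
```

The key design point is that the first two lines of assignments fix *what* is computed, while the `tile`/`vectorize`/`parallel`/`compute_at` calls only change *how*; swapping the schedule for a different one cannot change the output.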

An image processing pipeline can be relatively simple: say, first blurring an image horizontally by averaging each pixel with its neighbors on either side, then blurring it vertically by averaging each pixel with its neighbors above and below. This two-stage pipeline serves as a running example to introduce the concepts in the paper. Even with a very simple, linear pipeline like this, there are challenging scheduling choices to be made, as the paper demonstrates.

Pipelines can get much more involved, though. The paper’s go-to example of a large, difficult-to-schedule image processing pipeline is one that computes the local Laplacian filters algorithm. Here’s a diagram, taken from the paper, representing the local Laplacian filters pipeline:

The first thing that struck me about this picture is that, despite its name, a “pipeline” is not necessarily linear! It’s really a graph of interdependent stages. The fact that it’s a graph complicates the scheduling problem.

According to the paper, this pipeline has 99 stages, not all of which are obvious from looking at the figure. After spending some time myself trying to get the stages to add up to 99, I eventually went to the halide-dev mailing list and asked. Someone there was able to account for 80 of the alleged 99 stages.