I’m very happy to announce that “Parallelizing Julia with a Non-Invasive DSL”, by Todd Anderson, Paul Liu, Ehsan Totoni, Jan Vitek, Tatiana Shpeisman, and me, will appear at ECOOP 2017 in Barcelona a couple of weeks from now. This paper presents ParallelAccelerator, an open-source library and compiler for high-level, high-performance scientific computing in Julia. ECOOP is an open-access conference, and the paper will be permanently available for free as part of a LIPIcs volume; there’ll be an accompanying open-access artifact as well.
To say that I’m relieved to finally be writing this blog post would be an understatement. We released the first version of ParallelAccelerator way back in October 2015, and I gave my first public talk about the project that same month.[1] In spring 2016, I contributed a post to the Julia blog about ParallelAccelerator, then followed that up with a talk about it at JuliaCon in June 2016. By then, ParallelAccelerator had become one of the top twenty most popular Julia packages, but we were having a terrible time trying to get a paper about it published: our first three attempts had all been rejected.[2] While I was in Boston for JuliaCon, I spoke with Jan Vitek, who patiently listened while I griped about the string of rejections. (This was not how I had hoped my first post-Ph.D. job would go!) I was ready to just send the paper to a low-visibility journal and be done with it, but later on, Jan emailed me after looking at a draft and said that he’d be happy to come on board as a co-author and help us improve the paper to the point where it was publishable in a venue that we could be proud of.
So, with Jan’s help, we reworked the paper into what I now think is a convincing story. It was Jan who coined the phrase “non-invasive DSL” to describe what ParallelAccelerator does: we try not to interfere with the array-style programming model that many Julia programmers (in particular, those who come to Julia from array languages like MATLAB) are used to. Julia already supports programming in array style; ParallelAccelerator helps programmers make array-style code go fast without requiring them to make a lot of invasive changes to their code. We also introduce one new language construct, runStencil, which lets programmers express stencil computations elegantly and efficiently. See our GitHub repo for lots of code examples, some of which are described in detail in the paper.
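To give a rough feel for what “array style” means here, the following is a minimal sketch in plain Julia. It does not use ParallelAccelerator at all, and the function names are made up for illustration. The first function is written in array style and allocates an intermediate array; the second is the kind of hand-written loop, with no intermediate array and no bounds checks, that a programmer would otherwise have to write to get the same result faster.

```julia
# Hypothetical example; these function names are not from ParallelAccelerator.

# Array-style version: operations are written over whole arrays,
# much as a MATLAB programmer might write them.
function scale_and_shift_array(a::Vector{Float64}, b::Vector{Float64})
    t = a .* 2.0      # allocates an intermediate array
    return t .+ b     # allocates the result array
end

# Hand-optimized version: one loop, one output array, bounds checks skipped.
function scale_and_shift_loop(a::Vector{Float64}, b::Vector{Float64})
    @assert length(a) == length(b)   # makes the @inbounds below safe
    out = similar(a)
    @inbounds for i in eachindex(a)
        out[i] = a[i] * 2.0 + b[i]
    end
    return out
end

a = [1.0, 2.0, 3.0]
b = [0.5, 0.5, 0.5]

# Both versions compute the same values.
@assert scale_and_shift_array(a, b) == scale_and_shift_loop(a, b)
```

The point of a non-invasive approach is that the programmer keeps writing the first version and leaves the rewriting to the compiler.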
I wish I could go to ECOOP in Barcelona and celebrate the publication of this paper, but the conference is too close to the release date of my other project for me to be able to do any long-distance air travel! Instead, I’ll be at JuliaCon 2017 in Berkeley, which is local for me (and coincidentally on exactly the same days as ECOOP). My colleagues Todd and Ehsan will be at JuliaCon, too. Todd is currently hard at work getting the ParallelAccelerator compiler working under Julia 0.6, which is on track to be released in time for JuliaCon, and we’ll have a new release of ParallelAccelerator to go with Julia 0.6. Meanwhile, I’ve been working on improving our support for Mac users, GCC users, and OpenBLAS users.[3] If you’re a Julia user and curious about ParallelAccelerator, JuliaCon would be a good time to give it a try!
[1] Back then, we were still calling the project “Prospect”, and using “ParallelAccelerator” to refer to the Julia-specific implementation of Prospect-the-language-agnostic-idea; at some point between late 2015 and now, we abandoned the implementation-versus-idea duality and just stuck with “ParallelAccelerator” in all contexts. As the paper points out, though, the name “ParallelAccelerator” is a bit misleading, since much of the speedup that one can get from it isn’t due to parallelism, but is rather due to other stuff like avoiding unnecessary bounds checks and intermediate array allocations.
[2] Meanwhile, my colleagues’ paper on Latte, a Julia library and compiler for deep neural networks that builds on top of ParallelAccelerator, had already appeared at PLDI. It was when the Latte paper came out that I really began to get irritated that we still hadn’t managed to publish anything about ParallelAccelerator, because there was no ParallelAccelerator paper for the Latte paper to cite! (They had to just cite the ParallelAccelerator GitHub repository instead.)
[3] ParallelAccelerator depends on the presence of an external C++ compiler, ideally one that supports OpenMP, and it works best in the presence of a BLAS library, ideally MKL. Traditionally, our best-supported combination of OS, compiler, and BLAS library has been Linux, ICC, and MKL. However, there are lots of people in the Julia community who are instead using macOS and/or GCC and/or OpenBLAS, and so I’ve been working on making ParallelAccelerator work better in those environments.