ParallelAccelerator is a Julia package for high-performance, high-level array-style programming that my group at Intel Labs released recently. It provides a macro called @acc that Julia programmers can use to annotate functions that are written in array style. Under the hood, it’s a compiler that intercepts the usual Julia JIT compilation process and compiles those @acc-annotated functions to fast, parallel native code. This compiler is written entirely in Julia, except for a small runtime component written in C. In my post, I give examples of how to use @acc, show performance results for those examples, and touch on some aspects of the compiler internals.
My post weighs in at nearly 5,000 words, which makes it the longest post that’s ever appeared on the Julia blog.
I was worried about that, but somehow, the Julia team was okay with it! If it’s too long for you, though, here’s an impressively concise summary from @JuliaFeeds:
Thanks to the Julia team, my colleagues at Intel Labs, and other readers for making many suggestions that helped improve the post (and stopped it from getting even longer)!
There’s also an interesting discussion here, in which Jiahao Chen points out that the oft-recommended (and arguably idiomatic) way to write Julia is, in fact, not in array style, but rather in devectorized[1] style, with explicit loops. Because of that, it would make sense to also compare the performance of @acc-annotated programs with devectorized Julia versions, and that’s something that we plan to work on soon.
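To make the distinction concrete, here is a minimal sketch of the same computation written both ways in plain Julia. The function names saxpy_array and saxpy_loop are hypothetical, chosen for illustration; they don’t come from the post or from ParallelAccelerator, and nothing here uses @acc.

```julia
# Array-style ("vectorized") version: whole-array operations, no explicit loop.
# This is the style of function that one would annotate with @acc.
# (Hypothetical example function, not taken from the post.)
saxpy_array(a, x, y) = a .* x .+ y

# Explicit-loop ("devectorized") version of the same computation.
function saxpy_loop(a, x, y)
    out = similar(x)               # output array with the same shape and element type as x
    for i in eachindex(x, y)       # iterate over the indices shared by x and y
        out[i] = a * x[i] + y[i]
    end
    return out
end

x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]

# Both styles compute the same result.
@assert saxpy_array(2.0, x, y) == saxpy_loop(2.0, x, y)
```

The two functions return the same array; the difference is only in how the computation is expressed, which is exactly why comparing their performance is a fair question.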
[1] In my post, in order to avoid any confusion with the vectorization that compilers do, I don’t use the words “vectorized” or “devectorized”. As programming idioms, though (as opposed to compiler optimizations), “vectorized” and “devectorized” are what I mean when I say “array-style programming” and “programming with explicit loops”, respectively.