What isn't a high-performance DSL?

At work, I’ve been participating in a series of long-running, broad-ranging discussions about the role that domain-specific languages, or DSLs, can play in helping programmers exploit high-performance parallel hardware. One thing that’s interesting about these discussions is that they reveal what people’s assumptions are about what “DSL” means.

For instance, one question that’s come up repeatedly in our discussions is “Is TensorFlow a DSL?” Google describes TensorFlow as “an open-source software library for machine intelligence”; they don’t pitch it as being a DSL. One point of view is that writing code with TensorFlow is “just writing Python” (or C++, or Java, or Go, or what have you), albeit with a domain-specific compiler under the hood that takes the place of some part of the host language’s usual compilation or execution pipeline. On the other hand, it can be fruitful to think of TensorFlow as a language in its own right. Embedded DSLs and frameworks supported by sophisticated runtime systems blur the line between “DSL” and “library”.
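To make the "just writing Python" view concrete, here is a minimal sketch of how an embedded framework can turn ordinary host-language expressions into a graph for a separate back end to consume. Everything in it (`Expr`, `Var`, `BinOp`, `evaluate`) is a hypothetical toy invented for illustration; it is not TensorFlow's API and makes no claim about how TensorFlow is implemented.

```python
from dataclasses import dataclass
from typing import Dict


class Expr:
    """Base class: Python operators build graph nodes instead of computing."""

    def __add__(self, other: "Expr") -> "Expr":
        return BinOp("+", self, other)

    def __mul__(self, other: "Expr") -> "Expr":
        return BinOp("*", self, other)


@dataclass(frozen=True)
class Var(Expr):
    """A named input whose value is supplied later."""
    name: str


@dataclass(frozen=True)
class BinOp(Expr):
    """An addition or multiplication node with two sub-expressions."""
    op: str
    left: Expr
    right: Expr


def evaluate(node: Expr, env: Dict[str, float]) -> float:
    """Stand-in for a domain-specific back end: here it just walks the graph."""
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, BinOp):
        lhs = evaluate(node.left, env)
        rhs = evaluate(node.right, env)
        return lhs + rhs if node.op == "+" else lhs * rhs
    raise TypeError(f"unknown node type: {type(node).__name__}")


# This line looks like plain Python arithmetic, but it only builds a graph.
x, y = Var("x"), Var("y")
graph = x * y + x

# The graph is handed to the back end separately, with concrete inputs.
result = evaluate(graph, {"x": 3.0, "y": 4.0})
```

The user-facing line `x * y + x` is legal Python either way; whether it counts as "a program in another language" depends on whether you look at the syntax or at what ends up executing it.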

With ParallelAccelerator, we went back and forth for quite a while on whether or not to call it a DSL. We’ve always been content with the phrase “domain-specific compiler”, but “domain-specific language” has been more contentious. For the most part, ParallelAccelerator works by speeding up a subset of existing Julia code; all the programmer has to do is take code that’s written in that subset of Julia and wrap it in a macro annotation that tells ParallelAccelerator to compile it. On the other hand, ParallelAccelerator does provide one new user-facing language construct, runStencil, that is not part of Julia or its standard library. Eventually, Jan Vitek coined the phrase “non-invasive DSL” to describe what ParallelAccelerator does. Our (forthcoming) paper describes a non-invasive embedded DSL as one that makes “as few visible changes to the host programming model as possible”. Not only is ParallelAccelerator code “just Julia code” modulo runStencil, it can also be used in a “library-only” mode in which calls to runStencil are implemented as ordinary, serial Julia function calls. This is reminiscent of how Pochoir programs can be compiled either with the Pochoir compiler, resulting in parallel code, or with an ordinary C++ compiler, resulting in serial code.
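The "library-only" idea can be sketched in a few lines. The following is a rough Python analogue, not ParallelAccelerator's actual Julia API: the names `accelerate` and `run_stencil`, and their signatures, are assumptions made up for this example. The point is only that when no special compiler is present, the annotation does nothing and the one new construct is an ordinary serial function.

```python
from typing import Callable, List


def accelerate(func: Callable) -> Callable:
    """Hypothetical annotation marking a function for the DSL compiler.

    In this library-only sketch there is no compiler, so the function
    is returned unchanged and runs as ordinary host-language code.
    """
    return func


def run_stencil(kernel: Callable[[List[float], int], float],
                src: List[float], dst: List[float], radius: int) -> None:
    """Serial fallback for the one new construct.

    Applies `kernel` at every interior index, i.e. every index that has
    `radius` neighbours on both sides, and writes the result into `dst`.
    """
    for i in range(radius, len(src) - radius):
        dst[i] = kernel(src, i)


@accelerate
def blur(src: List[float]) -> List[float]:
    dst = list(src)  # boundary points keep their input values
    run_stencil(lambda a, i: (a[i - 1] + a[i] + a[i + 1]) / 3.0,
                src, dst, radius=1)
    return dst


blurred = blur([0.0, 0.0, 9.0, 0.0, 0.0])
```

A compiling implementation of `accelerate` could swap in a parallel version of `blur` without the user's code changing at all, which is what makes the DSL "non-invasive" in the sense described above.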

The question “Does it have its own compiler?” is, of course, orthogonal to the question “Is it a language?” Or is it? One thing that has surprised me in our group’s discussions is that some participants felt that the term “embedded DSL” implied that there must be some compiler (or interpreter, or JIT, or something) distinct from that of the host language, and that if there wasn’t such a thing, then what we were really dealing with was a library or an API, and certainly not a DSL. My view, on the other hand (perhaps influenced by the Racket philosophy), has been that something we call an “embedded DSL” can be implemented using a syntax extension system that elaborates the DSL’s syntax to the host language’s syntax, with the result then compiled or interpreted using the host language’s compiler or interpreter (or further elaborated to yet another language in a tower of languages).

To back up for a second, though, the point of these discussions has been to focus on a certain class of DSLs — those that help programmers make the best possible use of high-performance parallel hardware. To me, the thing that makes this class of DSLs so interesting and important is the fact that they promise efficiency not in spite of the high-level domain-specific abstractions that make them pleasant to program with, but in fact because of those high-level domain-specific abstractions. Limiting the scope of the problem to a particular domain allows the implementation to make stronger assumptions about programmer intent, and employ optimizations that are tailored to the domain at hand.
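As a small illustration of "efficiency because of the abstraction", consider a toy domain in which every operation is a pure, per-element function. The sketch below is hypothetical (the `Elementwise` class and its methods are invented here and do not come from any of the systems discussed). Because the domain guarantees that stages are pure and touch one element at a time, the implementation is free to fuse them into a single pass, which a general-purpose compiler could not safely assume about arbitrary loops.

```python
from typing import Callable, List, Tuple


class Elementwise:
    """A pipeline of per-element functions, assumed pure (the 'domain')."""

    def __init__(self, stages: Tuple[Callable, ...] = ()):
        self.stages = tuple(stages)

    def then(self, f: Callable) -> "Elementwise":
        """Return a new pipeline with `f` appended as the last stage."""
        return Elementwise(self.stages + (f,))

    def run_naive(self, data: List[float]) -> List[float]:
        # One full pass over the data, and one intermediate list, per stage.
        for f in self.stages:
            data = [f(v) for v in data]
        return data

    def run_fused(self, data: List[float]) -> List[float]:
        # A single pass with no intermediate lists. This reordering is only
        # legal because stages are assumed pure and strictly per-element.
        out = []
        for v in data:
            for f in self.stages:
                v = f(v)
            out.append(v)
        return out


pipeline = Elementwise().then(lambda v: v + 1).then(lambda v: v * 2)
naive = pipeline.run_naive([1, 2, 3])
fused = pipeline.run_fused([1, 2, 3])
```

Both strategies compute the same answer; the fused one is available only because the restricted domain rules out side effects and cross-element dependencies.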

For our particular purposes, then, perhaps we should be less concerned about whether or not something “is a DSL”, and more concerned with whether or not (and the extent to which) it captures domain knowledge and uses it for performance. From my point of view, TensorFlow should probably count; so should Halide; so should DSLs implemented using Delite or SEJITS; so should libraries developed using the AnyDSL framework (whose website expands the acronym “DSL” as “domain-specific library”, rather than “domain-specific language”); and so should plenty of other things. Unfortunately, I don’t have a concise name for this set of things. “High-performance domain-specific systems” is the best thing that comes to mind.