This page lists projects that I’ve contributed to during my research career, along with the dates of my major involvement.
- Parallelizing Julia with a non-invasive DSL (2015-)
- LVars: lattice-based data structures for deterministic parallel and distributed programming (2011-)
- Various contributions to the Rust programming language (2011-2012)
- Parametric polymorphism through run-time sealing (2010-2011)
- Testing abstract representations of CPU semantics (2010)
- A pattern matcher for miniKanren (2009)
Parallelizing Julia with a non-invasive DSL
Computational scientists often prototype their software using a “productivity” language such as MATLAB, Python, R, or Julia. When it comes time to scale up such prototypes to run on a high-performance computing system, though, the code typically has to be rewritten in an “efficiency” language like C++. This rewriting process takes considerable time and expense and often results in brittle code, with which it is harder to experiment. Domain-specific languages (DSLs) are one well-regarded solution to the efficiency/productivity trade-off, but it is difficult to design a DSL that can anticipate any use case that the programmer will eventually encounter.
With my colleagues at Intel Labs, I’m working on ParallelAccelerator, a compiler for the Julia programming language that attempts to bridge this productivity-efficiency gap in scientific computing. ParallelAccelerator discovers and exploits the implicit parallelism in source programs that use parallel programming patterns. For the most part, these patterns are already present in standard Julia, so programmers can use ParallelAccelerator to compile Julia programs with only minor changes to source code. Our results show that ParallelAccelerator enables orders-of-magnitude speedups over plain Julia on a variety of scientific computing workloads, without requiring the programmer to extensively rewrite code or add lots of type annotations to guide the compiler.
I gave a talk about ParallelAccelerator at SPLASH 2015, and our open-source implementation of it is available as the ParallelAccelerator.jl Julia library. My March 2016 guest post on the Julia blog has examples of how to use ParallelAccelerator, performance results, and some discussion of package internals. I followed up the blog post with a talk at JuliaCon 2016.
LVars: lattice-based data structures for deterministic parallel and distributed programming
Parallel programming is notoriously difficult. Programs can yield inconsistent answers, or even crash, due to unpredictable interactions between parallel tasks. But it doesn’t have to be this way: deterministic-by-construction programming models offer programmers the promise of freedom from subtle, hard-to-reproduce nondeterministic bugs in parallel code. To make that promise a reality, though, deterministic-by-construction parallel programming models must sharply restrict sharing of state between parallel tasks, usually either by disallowing sharing entirely or by restricting it to one type of data structure, such as single-assignment locations.
A common theme that emerges in the study of such deterministic-by-construction programming models — from venerable ones like Kahn process networks, to modern single-assignment-based models like Haskell’s monad-par library and Intel’s Concurrent Collections system — is that the determinism of the system hinges on some notion of monotonicity. By taking monotonicity as a starting point, then, we can generalize and unify existing models. For my dissertation research with Ryan Newton, I developed LVars, a new model for deterministic-by-construction parallel programming that generalizes existing single-assignment models to allow multiple assignments that are monotonically increasing with respect to a specified lattice. LVars maintain determinism by allowing only monotonic writes and “threshold” reads to and from shared data.
Finally, LVar-style threshold reads can also apply to the setting of convergent replicated data types (CvRDTs), which specify the behavior of eventually consistent replicated objects in a distributed system. In recent work, I’ve extended the CvRDT model to support deterministic, strongly consistent threshold queries. The technique generalizes to any lattice, and hence any CvRDT, and allows deterministic observations to be made of replicated objects before the replicas’ states converge.
For more on LVars, aside from the posts in the “LVars” category here on this blog, see:
- My 2015 dissertation, Lattice-based Data Structures for Deterministic Parallel and Distributed Programming.
- Our FHPC ‘13, POPL ‘14 (with Aaron Turon and Neel Krishnaswami), and PLDI ‘14 (with Aaron Todd and Sam Tobin-Hochstadt) papers.
- A draft paper on deterministic threshold queries.
- LVish, our Haskell library for parallel programming with LVars (with Aaron Turon and other contributors) (also on GitHub).
- My PLT Redex models of LVar calculi.
ParallelArray array type and a variety of potentially-parallel operations on it, like
Various contributions to the Rust programming language
Rust is a new systems programming language being developed by Mozilla Research with the goals of safety, concurrency, and speed. During the summers of 2011 and 2012, I worked on the Rust team at Mozilla, making various contributions to the self-hosted compiler. Rust remains under active development by a team of paid and volunteer contributors; follow the progress on GitHub!
self keyword in Rust!)
During summer 2012, I made various contributions to Rust, including implementing integer-literal suffix inference and some work on extending the Rust trait system to support Haskell-like default methods. I gave a talk about the latter project at Mozilla in August 2012. (When I started this work, traits were still called
ifaces; I changed the keyword to
Parametric polymorphism through run-time sealing
Languages with parametric polymorphism provide the compile-time guarantee that programs behave uniformly regardless of the types they are instantiated with. In such languages, this parametricity guarantee forms the basis of data abstraction and makes possible Wadler’s “theorems for free!” Ordinarily, embedding untyped code fragments in a statically typed language would break parametricity. However, it’s possible to preserve parametricity in such a setting by the use of dynamic sealing, which enforces data abstraction by using uniquely generated keys to seal values that should be kept opaque.
I worked with Amal Ahmed and Jacob Matthews on an update to their previous work on proving parametricity for a multi-language system that combines System F with a Scheme-like language. Our updated proof employs a cross-language, step-indexed logical relation that uses possible worlds to model the semantics of key generation. Using this possible-worlds semantic model, we can show two results: first, that System F’s parametricity guarantee is preserved when interoperating with Scheme, and, second, that parametricity is preserved in a Scheme-only setting that interacts with System F only to implement a polymorphic contract system. I gave a talk about this project at the Northeastern University Programming Languages Seminar in February 2011.
Testing abstract representations of CPU semantics
Junghee Lim and Thomas Reps’ Transformer Specification Language (TSL) system comprises a domain-specific programming language for specifying the semantics of a processor at the level of the instruction set architecture, together with a run-time system for generating static analysis engines for executables compiled for that instruction set. TSL’s ability to generate static analysis engines from ISA specifications makes it a powerful tool for analyzing executables, but it’s of limited value if the ISA specifications are wrong in the first place. We are therefore interested in verifying the correctness of TSL’s ISA specifications.
During summer 2010, I worked with David Melski and the research team at GrammaTech and implemented a prototype system for testing TSL specifications. Given a CPU emulator generated from a TSL spec, our system attaches the emulator to a Linux process running on a physical CPU, initializes the emulator’s state from that of the process, and compares the states of the emulated and physical CPUs after the execution of each instruction. If the states differ, the specification from which the emulator was generated was probably wrong. Our emulator testing framework is flexible enough that it could be extended to support testing of abstract emulators that have an abstract interpreter at their core, in addition to the concrete CPU emulators it supports now. This work was done as part of the Safety in Numbers research project and appears in the project report; I also gave a talk about my work on the project to my research group in October 2010.
A pattern matcher for miniKanren
miniKanren is a declarative logic programming system implemented in a pure functional subset of Scheme. In 2009, I worked with Andy Keep, Michael Adams, Will Byrd, and Dan Friedman on implementing a pattern matcher for miniKanren. Our self-imposed challenge was to implement the pattern matcher using Scheme’s
syntax-rules macro system, as opposed to a more powerful macro system such as
syntax-case. Our pattern matcher implementation uses CPS macros to control the order in which macro expansion takes place, delaying expansion of a macro that generates variable bindings until after other macro expansions have happened. The ability to control the order in which our macros expand allows us to generate more concise code. However, we discovered a situation in which delayed expansion caused incorrect code to be generated, leaving variables unbound that should have been bound. The problem can be worked around either by using
syntax-case, or by using
syntax-rules but not (as much) CPS. Our paper on this project appeared at the 2009 Scheme Workshop, giving me an upper bound of 4 on my Erdős number.