Research projects

This page lists projects that I’ve contributed to during my research career, along with the dates of my major involvement.

Parallelizing Julia with a non-invasive DSL

Computational scientists often prototype their software using a “productivity” language such as MATLAB, Python, R, or Julia. When it comes time to scale up such prototypes to run on a high-performance computing system, though, the code typically has to be rewritten in an “efficiency” language like C++. This rewriting process takes considerable time and expense and often results in brittle code, with which it is harder to experiment. Domain-specific languages (DSLs) are one well-regarded solution to the efficiency/productivity trade-off, but it is difficult to design a DSL that anticipates every use case that the programmer will eventually encounter.

With my colleagues at Intel Labs, I’m working on ParallelAccelerator, a compiler for the Julia programming language that attempts to bridge this productivity-efficiency gap in scientific computing. ParallelAccelerator discovers and exploits the implicit parallelism in source programs that use parallel programming patterns. For the most part, these patterns are already present in standard Julia, so programmers can use ParallelAccelerator to compile Julia programs with only minor changes to source code. Our results show that ParallelAccelerator enables orders-of-magnitude speedups over plain Julia on a variety of scientific computing workloads, without requiring the programmer to extensively rewrite code or add lots of type annotations to guide the compiler.
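The key idea is that map-style array operations are implicitly parallel: each output element depends only on the corresponding input element, so a compiler may safely run iterations concurrently. Here is a minimal Python sketch of that transformation, done by hand rather than by a compiler; the function names are hypothetical and this is not how ParallelAccelerator itself is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

def squared_plus_one(xs):
    # An element-wise (map-style) computation: each output depends only
    # on the corresponding input, so the iterations are implicitly parallel.
    return [x * x + 1.0 for x in xs]

def parallel_map(f, xs, workers=4):
    # A parallelizing compiler can perform this rewrite automatically;
    # here we apply it manually for illustration.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(f, xs))

xs = [0.0, 1.0, 2.0, 3.0]
assert parallel_map(lambda x: x * x + 1.0, xs) == squared_plus_one(xs)
```

Because the pattern carries no cross-iteration dependencies, the parallel and sequential versions always agree, which is what lets the compiler exploit the pattern without programmer annotations.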

I gave a talk about ParallelAccelerator at SPLASH 2015, and our open-source implementation of it is available as the ParallelAccelerator.jl Julia library. My March 2016 guest post on the Julia blog has examples of how to use ParallelAccelerator, performance results, and some discussion of package internals. I followed up the blog post with a talk at JuliaCon 2016.

LVars: lattice-based data structures for deterministic parallel and distributed programming

Parallel programming is notoriously difficult. Programs can yield inconsistent answers, or even crash, due to unpredictable interactions between parallel tasks. But it doesn’t have to be this way: deterministic-by-construction programming models offer programmers the promise of freedom from subtle, hard-to-reproduce nondeterministic bugs in parallel code. To make that promise a reality, though, deterministic-by-construction parallel programming models must sharply restrict sharing of state between parallel tasks, usually either by disallowing sharing entirely or by restricting it to one type of data structure, such as single-assignment locations.

A common theme that emerges in the study of such deterministic-by-construction programming models — from venerable ones like Kahn process networks, to modern single-assignment-based models like Haskell’s monad-par library and Intel’s Concurrent Collections system — is that the determinism of the system hinges on some notion of monotonicity. By taking monotonicity as a starting point, then, we can generalize and unify existing models. For my dissertation research with Ryan Newton, I developed LVars, a new model for deterministic-by-construction parallel programming that generalizes existing single-assignment models to allow multiple assignments that are monotonically increasing with respect to a specified lattice. LVars maintain determinism by allowing only monotonic writes and “threshold” reads to and from shared data.

Finally, LVar-style threshold reads can also apply to the setting of convergent replicated data types (CvRDTs), which specify the behavior of eventually consistent replicated objects in a distributed system. In recent work, I’ve extended the CvRDT model to support deterministic, strongly consistent threshold queries. The technique generalizes to any lattice, and hence any CvRDT, and allows deterministic observations to be made of replicated objects before the replicas’ states converge.
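As a small illustration of how a threshold query fits a CvRDT, here is a grow-only set sketched in Python; the names are hypothetical and this elides replication and networking entirely.

```python
class GSet:
    """A grow-only set, one of the simplest CvRDTs: the join of two
    replica states is set union, which is commutative, associative,
    and idempotent."""
    def __init__(self, items=()):
        self.items = frozenset(items)

    def add(self, x):
        self.items = self.items | {x}

    def join(self, other):
        # Merging replica states: least upper bound in the subset lattice.
        return GSet(self.items | other.items)

    def threshold_query(self, threshold):
        # Answer only once this replica's state is at or above `threshold`
        # in the lattice; the answer is the threshold itself, so any
        # replica that answers at all gives the same answer.
        return frozenset(threshold) if set(threshold) <= self.items else None

a, b = GSet(), GSet()
a.add("x"); b.add("y")
merged = a.join(b)
assert merged.threshold_query({"x"}) == frozenset({"x"})
assert a.threshold_query({"y"}) is None  # not yet past the threshold
```

A replica below the threshold declines to answer rather than answering inconsistently, which is how the query stays deterministic even before replicas converge.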

For more on LVars, see the posts in the “LVars” category here on this blog.

River Trail: parallel programming in JavaScript

In 2014, I joined Intel Labs and began contributing to River Trail, a library, JIT compiler, and Firefox extension to enable data-parallel programming in JavaScript. The River Trail library provides a ParallelArray array type and a variety of potentially-parallel operations on it, like map, reduce, and filter; it works in conjunction with the Firefox extension, which enables programs written using the library to run in parallel, via OpenCL. Together with my colleagues at Intel, I rewrote the River Trail Firefox extension in JavaScript, replacing the old binary XPCOM extension. This was my first major JavaScript project, and I learned a lot about browser extensions.
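To give a flavor of the programming model, here is a sequential Python mock of the ParallelArray API shape; the real library is JavaScript and runs on OpenCL, and the class name here is invented for illustration.

```python
class ParallelArrayLike:
    """A sequential mock of a ParallelArray-style API: an immutable
    array whose map/filter/reduce return new values, leaving the
    original untouched -- the property that makes parallel execution
    of these operations safe."""
    def __init__(self, data):
        self._data = tuple(data)

    def map(self, f):
        return ParallelArrayLike(f(x) for x in self._data)

    def filter(self, pred):
        return ParallelArrayLike(x for x in self._data if pred(x))

    def reduce(self, f):
        it = iter(self._data)
        acc = next(it)
        for x in it:
            acc = f(acc, x)
        return acc

pa = ParallelArrayLike([1, 2, 3, 4])
assert pa.map(lambda x: x * 2).reduce(lambda a, b: a + b) == 20
assert pa.filter(lambda x: x % 2 == 0)._data == (2, 4)
```

Because every operation returns a fresh value and the element functions are expected to be side-effect-free, a runtime is free to execute the per-element work in parallel.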

For more on River Trail, see the tutorial, as well as the OOPSLA 2013 paper by Stephan Herhut, Richard Hudson, Tatiana Shpeisman, and Jaswanth Sreeram.

Various contributions to the Rust programming language

Rust is a new systems programming language being developed by Mozilla Research with the goals of safety, concurrency, and speed. During the summers of 2011 and 2012, I worked on the Rust team at Mozilla, making various contributions to the self-hosted compiler. Rust remains under active development by a team of paid and volunteer contributors; follow the progress on GitHub!

During summer 2011, I added self-dispatch, object extension, and method overriding to the prototype-based object system that Rust had at the time. Prototype-based object systems are usually found only in dynamic languages such as Self and JavaScript, and adding these features to Rust posed an unusual challenge, since dynamic object extension and method overriding cause the type of the current object to change at run-time. I gave a talk about this project to my research group in September 2011. (Although the prototype-based object system was scrapped soon afterward, a vestige of my work lives on: the self keyword in Rust!)
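For readers unfamiliar with prototype-based objects, here is a minimal Python sketch of delegation with overriding and self-dispatch; the names are invented, and this shows the general technique rather than anything Rust-specific.

```python
class ProtoObject:
    """A toy prototype-based object: method lookup walks the prototype
    chain, and methods can be added or overridden at run time -- the
    dynamic behavior that made static typing tricky."""
    def __init__(self, proto=None, **methods):
        self.proto = proto
        self.methods = dict(methods)

    def send(self, name, *args):
        obj = self
        while obj is not None:
            if name in obj.methods:
                # Self-dispatch: the receiver stays the *original* object,
                # even when the method was found on a prototype.
                return obj.methods[name](self, *args)
            obj = obj.proto
        raise AttributeError(name)

base = ProtoObject(speak=lambda self: "hello from " + self.send("name"),
                   name=lambda self: "base")
child = ProtoObject(proto=base, name=lambda self: "child")  # override
assert base.send("speak") == "hello from base"
assert child.send("speak") == "hello from child"  # self-dispatch in action
```

The overridden `name` changes what the inherited `speak` returns, which illustrates why the effective “type” of `self` shifts at run-time as objects are extended.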

During summer 2012, I made various contributions to Rust, including implementing integer-literal suffix inference and some work on extending the Rust trait system to support Haskell-like default methods. I gave a talk about the latter project at Mozilla in August 2012. (When I started this work, traits were still called ifaces; I changed the keyword to trait!)

Parametric polymorphism through run-time sealing

Languages with parametric polymorphism provide the compile-time guarantee that programs behave uniformly regardless of the types they are instantiated with. In such languages, this parametricity guarantee forms the basis of data abstraction and makes possible Wadler’s “theorems for free!” Ordinarily, embedding untyped code fragments in a statically typed language would break parametricity. However, it’s possible to preserve parametricity in such a setting by the use of dynamic sealing, which enforces data abstraction by using uniquely generated keys to seal values that should be kept opaque.
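The mechanism of dynamic sealing can be sketched in a few lines of Python; the `seal`/`unseal` names follow the standard presentation of the idea, but everything else here is an invented illustration.

```python
import itertools

_fresh = itertools.count()

class Sealed:
    """A value sealed under a key: opaque unless you hold that key."""
    def __init__(self, key, value):
        self._key = key
        self._value = value

def new_key():
    # Each key is freshly generated, so code that did not create the key
    # cannot forge access to values sealed under it.
    return next(_fresh)

def seal(key, value):
    return Sealed(key, value)

def unseal(key, sealed):
    if sealed._key != key:
        raise ValueError("wrong key: abstraction violation")
    return sealed._value

k = new_key()
boxed = seal(k, 42)
assert unseal(k, boxed) == 42
try:
    unseal(new_key(), boxed)  # code without the key cannot look inside
    assert False
except ValueError:
    pass
```

Sealing abstract values before handing them to untyped code, and unsealing them on the way back, is what keeps the untyped side from observing or depending on their representation.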

I worked with Amal Ahmed and Jacob Matthews on an update to their previous work on proving parametricity for a multi-language system that combines System F with a Scheme-like language. Our updated proof employs a cross-language, step-indexed logical relation that uses possible worlds to model the semantics of key generation. Using this possible-worlds semantic model, we can show two results: first, that System F’s parametricity guarantee is preserved when interoperating with Scheme, and, second, that parametricity is preserved in a Scheme-only setting that interacts with System F only to implement a polymorphic contract system. I gave a talk about this project at the Northeastern University Programming Languages Seminar in February 2011.

Testing abstract representations of CPU semantics

Junghee Lim and Thomas Reps’ Transformer Specification Language (TSL) system comprises a domain-specific programming language for specifying the semantics of a processor at the level of the instruction set architecture, together with a run-time system for generating static analysis engines for executables compiled for that instruction set. TSL’s ability to generate static analysis engines from ISA specifications makes it a powerful tool for analyzing executables, but it’s of limited value if the ISA specifications are wrong in the first place. We are therefore interested in verifying the correctness of TSL’s ISA specifications.

During summer 2010, I worked with David Melski and the research team at GrammaTech to implement a prototype system for testing TSL specifications. Given a CPU emulator generated from a TSL spec, our system attaches the emulator to a Linux process running on a physical CPU, initializes the emulator’s state from that of the process, and compares the states of the emulated and physical CPUs after the execution of each instruction. If the states differ, the specification from which the emulator was generated was probably wrong. Our emulator testing framework is flexible enough that it could be extended to support testing of abstract emulators that have an abstract interpreter at their core, in addition to the concrete CPU emulators it supports now. This work was done as part of the Safety in Numbers research project and appears in the project report; I also gave a talk about my work on the project to my research group in October 2010.
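The lockstep-comparison idea can be sketched with two toy machines in Python; the `TinyMachine` interface is entirely hypothetical and stands in for both the generated emulator and the traced physical process.

```python
class TinyMachine:
    """A stand-in for either the spec-generated emulator or the real
    traced process: a program counter plus one accumulator register,
    stepped one instruction at a time."""
    def __init__(self, program):
        self.program, self.pc, self.acc = program, 0, 0

    def step(self):
        self.acc += self.program[self.pc]  # each "instruction" adds
        self.pc += 1

    def state(self):
        return (self.pc, self.acc)

def lockstep_test(emulated, physical, n_steps):
    # Run both machines one instruction at a time and report the first
    # step at which their states diverge (None if they agree throughout).
    # A divergence implicates the spec the emulator was generated from.
    for step in range(n_steps):
        emulated.step()
        physical.step()
        if emulated.state() != physical.state():
            return step
    return None

good = lockstep_test(TinyMachine([1, 2, 3]), TinyMachine([1, 2, 3]), 3)
bad = lockstep_test(TinyMachine([1, 2, 3]), TinyMachine([1, 9, 3]), 3)
assert good is None and bad == 1
```

Reporting the first divergent step, rather than just a pass/fail verdict, points directly at the instruction whose specification should be inspected.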

A pattern matcher for miniKanren

miniKanren is a declarative logic programming system implemented in a pure functional subset of Scheme. In 2009, I worked with Andy Keep, Michael Adams, Will Byrd, and Dan Friedman on implementing a pattern matcher for miniKanren. Our self-imposed challenge was to implement the pattern matcher using Scheme’s syntax-rules macro system, as opposed to a more powerful macro system such as syntax-case. Our pattern matcher implementation uses CPS macros to control the order in which macro expansion takes place, delaying expansion of a macro that generates variable bindings until after other macro expansions have happened. The ability to control the order in which our macros expand allows us to generate more concise code. However, we discovered a situation in which delayed expansion caused incorrect code to be generated, leaving variables unbound that should have been bound. The problem can be worked around either by using syntax-case, or by using syntax-rules but not (as much) CPS. Our paper on this project appeared at the 2009 Scheme Workshop, giving me an upper bound of 4 on my Erdős number.