Why does a Turing-complete type system make type checking undecidable?

Last May, I wrote about an article by Radu Grigore that claimed that “all Java type checkers have bugs”, and what that claim might have meant. Later, Grigore expanded his original short article into a full-length paper that appeared at POPL 2017 a few weeks ago. The appearance of “Java Generics are Turing Complete” at POPL made me go back and reconsider what I wrote last May, given the new version of the paper. It turns out that the new version revises the sentence I quoted at the beginning of my May 2016 post from

For Java, Theorem 1 implies that there cannot exist a formally verified type checker: All Java type checkers have bugs.


For Java, Theorem 1 implies that a formally verified type checker that guarantees partial correctness cannot also guarantee termination.

So now we know, from the horse’s mouth, what “all Java type checkers have bugs” means, or at least what one person means when they say that. But that’s not really what this blog post is about; this post is about how when I looked back at what I had written last May, I made two embarrassing realizations. First, I realized that Radu Grigore and Grigore Rosu are not the same person. Second, although I had stated categorically in my post that a Turing-complete type system makes type checking undecidable, I was forced to confront the fact that I didn’t really know why that was the case.

Why does having a Turing-complete type system make type checking undecidable? My original, fuzzy line of reasoning had gone something like this: If a type system is Turing-complete, that means you can write arbitrary programs within types. Therefore, type checking is more or less interpretation of a Turing-complete language, and we all know that interpreters of Turing-complete languages don’t necessarily halt — and so type checking is undecidable. This line of reasoning doesn’t hold up, though, because the type checker wouldn’t necessarily need to run all the computations encoded in types.

After I’d flailed around for some time and complained about my confusion to a bunch of people, my friend Kamal Marhubi finally offered an explanation that made sense to me, which I’ll try to reconstruct here. First of all, since we’re in a setting where the language of types in language X is Turing-complete, we can write a language-X program like this (positing some syntax):

let x: (if terminates(TM) then unit else bool) = unit;

Here, (if terminates(TM) then unit else bool) is a type annotation on x, and TM is some representation of a Turing machine. If TM terminates, then our program is well-typed; if not, then the program is ill-typed. (We’re also making a few assumptions here, such as the fact that we have bool and unit as base types. This is without loss of generality, because if we have a Turing-complete type system, surely we can encode two different types that the type checker can distinguish between.)

Maybe you see where this is going, but let’s spell it out. We want to show that type-checking language X is undecidable. We’ll start by assuming that type-checking language X is decidable, and then show that that leads to a contradiction.

If type-checking language X is decidable, that means that you can write a predicate is_well_typed that takes a program in language X as its argument and returns True or False. And if you can do that, then you could write a program to check whether an arbitrary Turing machine terminates. It would look something like this (in pseudo-Python):

def halts(TM):
    # some_representation(TM) is the type "if terminates(TM) then unit else bool"
    type = some_representation(TM)
    program = "let x: (%s) = unit;" % type
    return is_well_typed(program)

So we have a decision procedure for whether an arbitrary Turing machine halts. But we know that that’s impossible, because of the halting problem — and so our assumption that type-checking is decidable must be incorrect!

In retrospect, Kamal’s explanation is simple. I think the main reason I spent so much time flailing around is that I got hung up for a while on something someone had suggested to me about how a Turing machine needs to produce output, and that therefore a Turing machine encoded in a type would need to produce its output as the output of the type checker, because there wouldn’t be any other place for that output to go.

The question of whether a Turing machine’s output needs to be observable somewhere is interesting, but I don’t buy that the output of a Turing machine encoded in a type would have to be somehow reflected in the result of type checking. I think it depends whether the type checker is actually a part of the model of computation that we’re saying is Turing-complete. If we’re talking about, say, a little lambda calculus embedded in the types of language X — something that is Turing-complete regardless of the presence or absence of any type checker for language X — then there’s no particular reason why such a type checker would have to cough up any information at all about the result of evaluating expressions written in that little lambda calculus. So, I like Kamal’s explanation because it avoids any silliness with the type checker having to be part of the Turing-complete model of computation.