Last May, I wrote about an article by Radu Grigore that claimed that “all Java type checkers have bugs”, and what that claim might have meant. Later, Grigore expanded his original short article into a fulllength paper that appeared at POPL 2017 a few weeks ago. The appearance of “Java Generics are Turing Complete” at POPL made me go back and reconsider what I wrote last May, given the new version of the paper. It turns out that the new version revises the sentence I quoted at the beginning of my May 2016 post from
For Java, Theorem 1 implies that there cannot exist a formally verified type checker: All Java type checkers have bugs.
to
For Java, Theorem 1 implies that a formally verified type checker that guarantees partial correctness cannot also guarantee termination.
So now we know, from the horse’s mouth, what “all Java type checkers have bugs” means, or at least what one person means when they say that. But that’s not really what this blog post is about; this post is about how when I looked back at what I had written last May, I made two embarrassing realizations. First, I realized that Radu Grigore and Grigore Rosu are not the same person. Second, although I had stated categorically in my post that a Turingcomplete type system makes type checking undecidable, I was forced to confront the fact that I didn’t really know why that was the case.
Why does having a Turingcomplete type system make type checking undecidable? My original, fuzzy line of reasoning had gone something like this: If a type system is Turingcomplete, that means you can write arbitrary programs within types. Therefore, type checking is more or less interpretation of a Turingcomplete language, and we all know that interpreters of Turingcomplete languages don’t necessarily halt — and so type checking is undecidable. This line of reasoning doesn’t hold up, though, because the type checker wouldn’t necessarily need to run all the computations encoded in types.
After I’d flailed around for some time and complained about my confusion to a bunch of people, my friend Kamal Marhubi finally offered an explanation that made sense to me, which I’ll try to reconstruct here. First of all, since we’re in a setting where the language of types in language X is Turingcomplete, we can write a languageX program like this (positing some syntax):
1


Here, (if terminates(TM) then unit else bool)
is a type annotation on x
, and TM
is some representation of a Turing machine. If TM
terminates, then our program is welltyped; if not, then the program is illtyped. (We’re also making a few assumptions here, such as the fact that we have bool
and unit
as base types. This is without loss of generality, because if we have a Turingcomplete type system, surely we can encode two different types that the type checker can distinguish between.)
Maybe you see where this is going, but let’s spell it out. We want to show that typechecking language X is undecidable. We’ll start by assuming that typechecking language X is decidable, and then show that that leads to a contradiction.
If typechecking language X is decidable, that means that you can write a predicate is_well_typed
that takes a program in language X as its argument and returns True
or False
. And if you can do that, then you could write a program to check whether an arbitrary Turing machine terminates. It would look something like this (in pseudoPython):
1 2 3 4 5 

So we have a decision procedure for whether an arbitrary Turing machine halts. But we know that that’s impossible, because of the halting problem — and so our assumption that typechecking is decidable must be incorrect!
In retrospect, Kamal’s explanation is simple. I think the main reason I spent so much time flailing around is that I got hung up for a while on something someone had suggested to me about how a Turing machine needs to produce output, and that therefore a Turing machine encoded in a type would need to produce its output as the output of the type checker, because there wouldn’t be any other place for that output to go.
The question of whether a Turing machine’s output needs to be observable somewhere is interesting, but I don’t buy that the output of a Turing machine encoded in a type would have to be somehow reflected in the result of type checking. I think it depends whether the type checker is actually a part of the model of computation that we’re saying is Turingcomplete. If we’re talking about, say, a little lambda calculus embedded in the types of language X — something that is Turingcomplete regardless of the presence or absence of any type checker for language X — then there’s no particular reason why such a type checker would have to cough up any information at all about the result of evaluating expressions written in that little lambda calculus. So, I like Kamal’s explanation because it avoids any silliness with the type checker having to be part of the Turingcomplete model of computation.