scvalex.net

7. Java Variance Pitfalls

Last week, I found out about a non-obvious pitfall of the Java language caused by the interaction of subclassing and arrays. In short, Java arrays are covariant, so code that I thought was illegal compiles and then throws an exception at runtime. In this post, I give a quick intro to type variance and describe the particular issue I ran into.

To start off, type variance is a way of describing what can stand in for a type: whether something built from a more general type (a superclass) or from a more specific type (a subclass) can be used in its place. Variance comes in three flavours: covariant, contravariant, and invariant.

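A type is covariant if substitution follows the subclass relationship: the version built from a more specific type can be used where the version built from a more general type is expected. For instance, function returns are covariant; so, a function that returns a Cat can be used where a function that returns an Animal is expected, because every Cat it produces is also an Animal. This mirrors plain subtyping, where an object can be used wherever one of its superclasses is required.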
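Java exposes covariant returns directly in method overriding. The following is a minimal sketch; Shelter, CatShelter, adopt, sound, and demo are hypothetical names made up for illustration, and only Animal and Cat echo the text above.

```java
class CovariantReturnDemo {
  static class Animal {
    String sound() { return "..."; }
  }

  static class Cat extends Animal {
    @Override
    String sound() { return "meow"; }
  }

  // Hypothetical base class: declared to return the general type.
  static class Shelter {
    Animal adopt() { return new Animal(); }
  }

  // Hypothetical subclass: the override narrows the return type
  // from Animal to Cat, which Java allows (covariant return).
  static class CatShelter extends Shelter {
    @Override
    Cat adopt() { return new Cat(); }
  }

  static String demo() {
    Shelter shelter = new CatShelter();
    // Callers that only know about Shelter still receive an Animal.
    Animal pet = shelter.adopt();
    return pet.sound();
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints "meow"
  }
}
```

Narrowing the return type is safe because anything the caller could do with an Animal, it can also do with a Cat.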

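A type is contravariant if substitution runs the other way: the version built from a more general type can be used where the version built from a more specific type is expected. For instance, function arguments are contravariant; so, a function that accepts any Animal can be used where a function that accepts a Cat is expected, since it can handle every Cat it is given.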
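In Java, contravariance in argument position can be expressed with a lower-bounded wildcard. This is only a sketch under assumptions: groom and demo are hypothetical helpers, and java.util.function.Consumer stands in for "a function that takes one argument".

```java
import java.util.function.Consumer;

class ContravariantArgDemo {
  static class Animal { }
  static class Cat extends Animal { }

  // Hypothetical helper: needs something that can handle a Cat.
  // "? super Cat" accepts a Consumer<Cat> or a Consumer<Animal>.
  static void groom(Cat cat, Consumer<? super Cat> handler) {
    handler.accept(cat);
  }

  static String demo() {
    StringBuilder log = new StringBuilder();
    // A handler written for any Animal...
    Consumer<Animal> animalHandler =
        a -> log.append(a.getClass().getSimpleName());
    // ...is safe to use where a handler for Cat is required.
    groom(new Cat(), animalHandler);
    return log.toString();
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints "Cat"
  }
}
```

Without the wildcard, a parameter of type Consumer<Cat> would reject animalHandler, because Java generics are invariant by default.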

Finally, a type is invariant if neither substitution is allowed: only the exact type can be used in its place.

Now, back to Java: array types are covariant, so the following will compile just fine:

public class Variance {
  static class Animal { }

  static class Dog extends Animal { }

  static class Cat extends Animal { }

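  public static void main(String[] args) {
    // Compiles: a Dog[] is accepted as an Animal[] (array covariance).
    Animal[] zoo = new Dog[10];
    // Compiles too, but throws ArrayStoreException at runtime.
    zoo[0] = new Cat();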
  }
}

Note that we’re assigning a value of a specific type, Dog[], to a variable of a more general type, Animal[]. No explicit cast is needed: because Dog is a subclass of Animal, Java treats Dog[] as a subtype of Animal[], which is exactly what it means for arrays to be covariant.

Unfortunately, that means we’ve created a hidden requirement: the array that zoo refers to is still a Dog[] at runtime, so its elements must be of type Dog or a subclass of Dog. The JVM checks this on every store, so the assignment zoo[0] = new Cat() fails at runtime with an ArrayStoreException. Ultimately, the problem is that every individual line in the above program makes sense from a typing standpoint, but taken together, they do not.
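To watch the failure without crashing the program, here is a small sketch that catches the exception and prints the array's runtime element type. The class name ArrayStoreDemo and the helper storeFails are made up for this example.

```java
class ArrayStoreDemo {
  static class Animal { }
  static class Dog extends Animal { }
  static class Cat extends Animal { }

  // Hypothetical helper: returns true if storing a Cat into a Dog[]
  // (viewed through an Animal[] reference) is rejected at runtime.
  static boolean storeFails() {
    Animal[] zoo = new Dog[10];
    try {
      zoo[0] = new Cat();
      return false;
    } catch (ArrayStoreException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Animal[] zoo = new Dog[10];
    // The static type is Animal[], but the array itself remembers
    // that it was created as a Dog[].
    System.out.println(zoo.getClass().getComponentType().getSimpleName()); // prints "Dog"
    System.out.println(storeFails()); // prints "true"
  }
}
```

Storing a Dog through the same reference succeeds, so the check depends only on the runtime type of the value being stored.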

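Other languages have this issue too. In C#, arrays of reference types are covariant in the same way, and the bad store throws an ArrayTypeMismatchException at runtime. C++ has a related problem: an array of a derived class decays to a pointer that converts implicitly to a pointer to the base class, and indexing through that base pointer is undefined behaviour.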

As far as I can tell, the best solution right now is to use templates in C++ and generics in Java. Both of these make the types invariant (a List<Dog> is not a List<Animal>), so a program equivalent to the above is rejected with a compile-time error.
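Here is roughly what the zoo example looks like with generics. This is a sketch, with the variable names kennel and zoo and the helper demo chosen for illustration; the lines that the compiler rejects are left commented out so the rest still runs.

```java
import java.util.ArrayList;
import java.util.List;

class InvariantGenericsDemo {
  static class Animal { }
  static class Dog extends Animal { }
  static class Cat extends Animal { }

  static int demo() {
    List<Dog> kennel = new ArrayList<>();
    kennel.add(new Dog());

    // List<Animal> zoo = kennel;  // does not compile: incompatible types
    // kennel.add(new Cat());      // does not compile: Cat is not a Dog

    // A genuine List<Animal> may hold any mix of animals.
    List<Animal> zoo = new ArrayList<>();
    zoo.addAll(kennel); // copying the dogs in is fine
    zoo.add(new Cat()); // and so is adding a Cat afterwards
    return zoo.size();
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints "2"
  }
}
```

The mistake that arrays only catch at runtime is now caught by the compiler, at the cost of having to copy when a List<Animal> is really needed.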

More modern languages have taken a more flexible approach to the issue: Scala takes a page out of OCaml’s book and has variance annotations; these are a way of relaxing the invariance in certain cases where we know it won’t cause problems. C# 4.0 also took this route, with in and out annotations on generic interfaces and delegates (but I don’t know enough about the language to judge how well it works).
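Java sits somewhere in between: it has no annotations on the declaration of a generic type, but its wildcards relax invariance at the point of use. A minimal sketch, assuming a hypothetical describe method that only reads from the list:

```java
import java.util.Arrays;
import java.util.List;

class WildcardDemo {
  static class Animal {
    String name() { return "animal"; }
  }

  static class Dog extends Animal {
    @Override
    String name() { return "dog"; }
  }

  // Hypothetical helper: accepts List<Animal>, List<Dog>, and so on.
  // The wildcard makes the parameter covariant but read-only.
  static String describe(List<? extends Animal> animals) {
    StringBuilder sb = new StringBuilder();
    for (Animal a : animals) {
      sb.append(a.name()).append(' ');
    }
    // animals.add(new Dog());  // does not compile: element type unknown
    return sb.toString().trim();
  }

  static String demo() {
    List<Dog> kennel = Arrays.asList(new Dog(), new Dog());
    return describe(kennel);
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints "dog dog"
  }
}
```

Because the compiler forbids adding to a List<? extends Animal>, the covariance is safe in a way that array covariance is not.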