Unit 24: Type Erasure

Learning Objectives

After completing this unit, students should be able to:

Explain how Java implements generics using type erasure and contrast it with code specialization.
Reason about what type information is (and is not) available at run time when generics are used.
Explain why generic types are not reifiable, and how this affects runtime checks.
Identify and explain heap pollution, including how it can lead to ClassCastException.
Explain why arrays and generics do not mix, and distinguish between generic array declaration and instantiation.

Overview

In earlier units, we introduced generics as a way to write reusable, type-safe code without sacrificing expressiveness. An important question remains: how are generics actually implemented in Java?

In this unit, we uncover the answer by studying type erasure, Java’s design choice for implementing generics. While type erasure preserves backward compatibility and enables code sharing, it also means that generic type information is largely unavailable at run time. This design has far-reaching consequences, including surprising interactions with arrays, the possibility of heap pollution, and limitations on what the compiler and runtime can check.

Understanding type erasure helps explain many “why can’t Java do this?” questions about generics and equips you to reason more precisely about type safety across compile time and run time.

Implementing Generics

There are several ways one could implement generics in a programming language.

For instance, in C#, every instantiation of a generic type causes new code to be generated for that instantiated type. Instantiating Pair<S,T> into Pair<String,Integer> causes a new type to be generated at run time. In C++ and in Rust, instantiating Pair<String,Integer> causes new code to be generated at compile time. This approach is sometimes called code specialization, in contrast to Java's code sharing approach.

Java, instead of creating a new type for every instantiation, erases the type parameters and type arguments during compilation (after type checking, of course). Thus, there is only one representation of the generic type in the generated code, representing all the instantiated generic types, regardless of the type arguments.

Part of the reason for this design is compatibility with older versions of Java. Java introduced generics only in version 5. Prior to version 5, one had to use Object to implement classes general enough to work on multiple types, similar to what we did with Pair here:

Pair v0.1 with Object
class Pair {
  private Object first;
  private Object second;

  public Pair(Object first, Object second) {
    this.first = first;
    this.second = second;
  }

  public Object getFirst() {
    return this.first;
  }

  public Object getSecond() {
    return this.second;
  }
}

The Java type erasure process transforms:

Pair v0.2 with Generics
class Pair<S,T> {
  private S first;
  private T second;

  public Pair(S first, T second) {
    this.first = first;
    this.second = second;
  }

  public S getFirst() {
    return this.first;
  }

  public T getSecond() {
    return this.second;
  }
}

to the version above. Note that each of the type parameters S and T is replaced with Object. If a type parameter is bounded, it is replaced by its bound instead (e.g., if T extends GetAreable, then T is replaced with GetAreable).
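The effect of erasure on bounded and unbounded type parameters can be observed with reflection. The sketch below is illustrative only: the classes Unbounded, Bounded, and BoundErasureDemo are hypothetical names, and GetAreable is a minimal stand-in for the interface from earlier units.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for the GetAreable interface from earlier units.
interface GetAreable {
  double getArea();
}

// T has no bound, so it should erase to Object.
class Unbounded<T> {
  private T item;
  Unbounded(T item) { this.item = item; }
  T get() { return this.item; }
}

// T is bounded by GetAreable, so it should erase to GetAreable.
class Bounded<T extends GetAreable> {
  private T item;
  Bounded(T item) { this.item = item; }
  T get() { return this.item; }
}

class BoundErasureDemo {
  // Returns the erased return type of the no-argument method get() declared in c.
  static Class<?> erasedReturnType(Class<?> c) {
    try {
      Method m = c.getDeclaredMethod("get");
      return m.getReturnType();
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // The first result is Object; the second is GetAreable.
    System.out.println(erasedReturnType(Unbounded.class) == Object.class);
    System.out.println(erasedReturnType(Bounded.class) == GetAreable.class);
  }
}
```

Both comparisons should print true, which matches the rule that a type parameter is replaced by Object or by its bound.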

Where a generic type is instantiated and used, the code

Integer i = new Pair<String,Integer>("hello", 4).getSecond();

is transformed into

Integer i = (Integer) new Pair("hello", 4).getSecond();

The generated code is similar to what we would have written by hand before generics existed, but here the cast is inserted by the compiler after type checking, only where it has proven that the cast is correct and will not lead to a ClassCastException at run time. Thus, type safety is preserved.
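One way to see the code sharing described above is to compare the run-time classes of two differently instantiated pairs. This is a minimal sketch: Pair repeats the v0.2 class from this unit, while ErasureDemo and sameRuntimeClass are hypothetical names introduced for illustration.

```java
// Same shape as Pair v0.2 above.
class Pair<S, T> {
  private S first;
  private T second;

  public Pair(S first, T second) {
    this.first = first;
    this.second = second;
  }

  public S getFirst() { return this.first; }
  public T getSecond() { return this.second; }
}

class ErasureDemo {
  // Returns true if both pairs are instances of the same run-time class.
  static boolean sameRuntimeClass(Pair<?, ?> p, Pair<?, ?> q) {
    return p.getClass() == q.getClass();
  }

  public static void main(String[] args) {
    Pair<String, Integer> p = new Pair<>("hello", 4);
    Pair<Double, Boolean> q = new Pair<>(3.14, true);

    // The compiler inserts the cast to Integer at this call site.
    Integer i = p.getSecond();
    System.out.println(i);                       // prints 4

    // Both objects share one class, Pair, regardless of type arguments.
    System.out.println(sameRuntimeClass(p, q));  // prints true
  }
}
```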

Type erasure has several important implications. We will explore some of them below, and a few others during recitation.

Overloading Based on Type Arguments

Suppose we have the following class. The intention is to overload the method foo to accept either a pair of strings or a pair of integers. This seems reasonable at the source level:

class A {
  void foo(Pair<String, String> p) {
      // body omitted
  }

  void foo(Pair<Integer, Integer> p) {
      // body omitted
  }
}

After type erasure, both methods have the same signature:

class A {
  void foo(Pair p) {
      // body omitted
  }

  void foo(Pair p) {
      // body omitted
  }
}

and thus, the compiler rejects the class: the two methods have the same erasure, so their names clash. This shows that Java does not allow methods to be overloaded based only on different type arguments.
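One simple way around this restriction, shown here only as a sketch, is to give the two methods different names so that their erased signatures no longer collide. The names fooStrings and fooIntegers, and the method bodies, are assumptions made for illustration.

```java
// Compact version of the generic Pair from this unit.
class Pair<S, T> {
  private S first;
  private T second;

  public Pair(S first, T second) {
    this.first = first;
    this.second = second;
  }

  public S getFirst() { return this.first; }
  public T getSecond() { return this.second; }
}

class A {
  // After erasure: String fooStrings(Pair p)
  String fooStrings(Pair<String, String> p) {
    return p.getFirst() + p.getSecond();
  }

  // After erasure: int fooIntegers(Pair p), a different name, so no clash.
  int fooIntegers(Pair<Integer, Integer> p) {
    return p.getFirst() + p.getSecond();
  }

  public static void main(String[] args) {
    A a = new A();
    System.out.println(a.fooStrings(new Pair<>("ab", "cd")));  // prints abcd
    System.out.println(a.fooIntegers(new Pair<>(1, 2)));       // prints 3
  }
}
```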

Using Type Parameters in Static Contexts

Consider the following generic class:

Instance Field with Type Parameter
class B<T> { 
  private T x;

  public void setX(T x) { 
    this.x = x; 
  }

  public T getX() { 
    return this.x; 
  }
}

which can be used in the following way:

B<Integer> bInteger = new B<Integer>();
bInteger.setX(67);
Integer i = bInteger.getX();

If the caller tries to call

String s = bInteger.getX();

the compiler will report an error about type mismatch. The type checking is done at compile time, before type erasure happens. Crucially, the type argument Integer to B<Integer> is used to type check the call to getX().

What if x is a static variable? Consider the following code:

Class Field with Type Parameter
class C<T> { 
  private static T x;

  public static void setX(T x) { 
    C.x = x; 
  }

  public static T getX() { 
    return C.x; 
  }
}

We can access class fields and class methods using only the class name, without creating an instance.

C.setX(67);
String s = C.getX();

There is no type argument to C here, so the compiler is unable to type check the call to getX(). To avoid this, Java does not allow using the type parameter T in a static context. There is only one copy of the static variable x shared across all instantiations of C<T>, regardless of what type argument is used. Thus, it does not make sense to have a static variable or static method that depends on the type parameter T of the class.

As a result, a class method that needs a type parameter must declare its own. For example:

Static Generic Method
class D<T> {
  public static <T> T foo(T x) {
    // body omitted
  }
}

Note that the type parameter T of the static method foo is independent of the type parameter T of the class D.
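The independence of the two type parameters can be seen by calling foo without ever instantiating D. In this sketch, the body of foo simply returns its argument, which is an assumption since the original body is omitted; StaticGenericDemo is a hypothetical name.

```java
class D<T> {
  // This T is declared by the method itself and hides the class's T.
  public static <T> T foo(T x) {
    return x;  // assumed body, for illustration only
  }
}

class StaticGenericDemo {
  public static void main(String[] args) {
    // The type argument is supplied explicitly at the call site...
    String s = D.<String>foo("hello");

    // ...or inferred by the compiler from the argument.
    Integer i = D.foo(67);

    System.out.println(s);  // prints hello
    System.out.println(i);  // prints 67
  }
}
```

Each call picks its own type argument for foo, and no type argument for D is involved.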

Generics and Arrays Can't Mix

Let's consider the hypothetical code below:

// create a new array of pairs
Pair<String,Integer>[] pairArray = new Pair<String,Integer>[2];

// pass around the array of pairs as an array of object
Object[] objArray = pairArray;

// put a pair into the array -- no ArrayStoreException!
objArray[0] = new Pair<Double,Boolean>(3.14, true);

This is similar to what we saw in Unit 21, where we showed that we could get an ArrayStoreException because Java arrays are covariant. Here, however, we would not get an exception when we put a pair of double and boolean into an array meant to store pairs of string and integer! The array store check is done at run time, and due to type erasure, the runtime has no information about the type arguments to Pair. The runtime sees:

// create a new array of pairs
Pair[] pairArray = new Pair[2];

// pass around the array of pairs as an array of object
Object[] objArray = pairArray;

// put a pair into the array -- no ArrayStoreException!
objArray[0] = new Pair(3.14, true);

It checks that we have an array of pairs and that we are putting another pair inside. Everything checks out. This would have caused heap pollution, a term that refers to the situation where a variable of a parameterized type refers to an object that is not of that parameterized type.

Heap pollution is dangerous because the error surfaces later, away from the statement that caused it. We would now get a ClassCastException when we do:

// getting back a string?  -- now we get ClassCastException
String str = pairArray[0].getFirst();

The example above shows why generics and arrays don't mix well together. An array is what is called a reifiable type: a type whose full type information is available at run time. It is because Java arrays are reifiable that the Java runtime can check that what we store into an array matches the type of the array, and throw an ArrayStoreException at us if there is a mismatch. Java generics, however, are not reifiable due to type erasure. Java designers have decided not to mix the two.

The hypothetical code above is not actually valid Java. We can't compile this line:
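Since the array example above is hypothetical, the contrast between reifiable arrays and erased generics can be reproduced with code that does compile. The sketch below makes two assumptions beyond this unit: it uses java.util.List in place of Pair, and it uses a raw type (a legacy feature not discussed above) to bypass the compile-time check. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class HeapPollutionDemo {
  // Arrays are reifiable: the runtime knows this is a String[]
  // and rejects the bad store immediately.
  static boolean arrayStoreIsCaught() {
    Object[] objArray = new String[1];
    try {
      objArray[0] = Integer.valueOf(42);
      return false;
    } catch (ArrayStoreException e) {
      return true;
    }
  }

  // Generics are erased: the bad store succeeds silently, and the
  // failure only appears later at the compiler-inserted cast.
  @SuppressWarnings({"unchecked", "rawtypes"})
  static boolean pollutedReadFails() {
    List<String> strings = new ArrayList<>();
    List raw = strings;  // raw type: type arguments are ignored
    raw.add(42);         // heap pollution, but no exception here
    try {
      String s = strings.get(0);  // implicit cast to String fails
      return false;
    } catch (ClassCastException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(arrayStoreIsCaught());  // prints true
    System.out.println(pollutedReadFails());   // prints true
  }
}
```

The array fails at the point of the bad store, while the list fails only when the element is read back as a String.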

Pair<String,Integer>[] pairArray = new Pair<String,Integer>[2];

The following is illegal as well:

new Pair<S,T>[2];
new T[2];

However, given a generic type T, the following is allowed:

T[] array;

In summary, generic array declaration is fine but generic array instantiation is not.
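The distinction can be sketched in a small class. The declaration T[] compiles, while the instantiation has to go through an Object[] and an unchecked cast. This cast is one commonly used workaround rather than something prescribed by the text above, and it is only safe if the array is never exposed outside the class. Seq is a hypothetical name.

```java
class Seq<T> {
  private T[] array;  // generic array declaration: allowed

  Seq(int size) {
    // new T[size] would not compile, so create an Object[] instead.
    // The cast is unchecked; it is safe here only because the array
    // stays private and items enter it only through set(int, T).
    @SuppressWarnings("unchecked")
    T[] tmp = (T[]) new Object[size];
    this.array = tmp;
  }

  void set(int index, T item) {
    this.array[index] = item;
  }

  T get(int index) {
    return this.array[index];
  }

  public static void main(String[] args) {
    Seq<String> seq = new Seq<>(2);
    seq.set(0, "hello");
    System.out.println(seq.get(0));  // prints hello
  }
}
```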

Bridge Methods

Type erasure has an important consequence when inheritance and method overriding interact with generics. To preserve polymorphism after erasure, the Java compiler sometimes generates additional methods known as bridging methods (or bridge methods).

Consider the following example:

Generic Superclass
class Box<T> {
  void get(T t) {
    // body omitted  
  }
}

class StringBox extends Box<String> {
  @Override
  void get(String s) {
    // body omitted
  }
}

At the source level, StringBox::get(String) clearly overrides Box::get(T).

After erasure, the compiler conceptually sees:

After erasure
class Box {
  void get(Object o) {
    // body omitted
  }
}

class StringBox extends Box {
  void get(String s) {
    // body omitted
  }
}

The two methods now have different descriptors: void get(Object) and void get(String). On its own, StringBox's get(String) would overload rather than override Box's get(Object), breaking polymorphism.

In such a situation, where (i) a class is a subtype of a parameterized type and (ii) type erasure changes the signature of any inherited method, the Java compiler automatically generates a bridging method.

In the case of StringBox, the compiler conceptually generates:

class StringBox extends Box {
  // compiler-generated bridge method
  void get(Object s) {
    this.get((String) s);   
  }

  // programmer-defined method
  void get(String s) {
    // body omitted
  }
}

The bridge method get(Object) simply casts its argument to String and delegates to the programmer-defined get(String). This synthetically generated method preserves the semantics of polymorphism.
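The bridge method can be observed through reflection. In the sketch below, Box and StringBox follow the example above, except that get returns a String so that the dispatched method is visible; that change, and the names BridgeDemo, countBridgeMethods, and callThroughSupertype, are assumptions made for illustration.

```java
import java.lang.reflect.Method;

class Box<T> {
  String get(T t) {
    return "Box";
  }
}

class StringBox extends Box<String> {
  @Override
  String get(String s) {
    return "StringBox";
  }
}

class BridgeDemo {
  // Counts the compiler-generated bridge methods declared in c.
  static int countBridgeMethods(Class<?> c) {
    int count = 0;
    for (Method m : c.getDeclaredMethods()) {
      if (m.isBridge()) {
        count++;
      }
    }
    return count;
  }

  // Calls get through a Box<String> reference. After erasure the call
  // targets get(Object), which is the bridge method in StringBox.
  static String callThroughSupertype() {
    Box<String> box = new StringBox();
    return box.get("hello");
  }

  public static void main(String[] args) {
    System.out.println(countBridgeMethods(Box.class));        // prints 0
    System.out.println(countBridgeMethods(StringBox.class));  // prints 1
    System.out.println(callThroughSupertype());               // prints StringBox
  }
}
```

The call through the supertype reference still reaches the method written in StringBox, which is the behaviour the bridge method exists to preserve.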