Skip to content

Unit 24: Type Erasure

Learning Objectives

Students should

  • understand that generics are implemented with type erasure in Java.
  • understand that type information is not fully available during run-time when generics are used, and problems that this could cause.
  • be aware that arrays and generics don't mix well in Java.
  • know the terms reifiable type and heap pollution.

Implementing Generics

There are several ways one could implement generics in a programming language.

Code Specialization

For instance, in C#, every instantiation of a generic type causes new code to be generated for that instantiated type. For instance, instantiating Pair<S,T> into Pair<String,Integer> causes a new type to be generated during run-time. In C++ and in Rust, instantiating Pair<String,Integer> causes new code to be generated during compile-time. This approach is sometimes called code specialization.

The disadvantages of such approach are

  • codes are duplicated, so the resulting program is larger.
  • changing the client by adding more use cases requires .recompilation of the entire generic Pair class as we may need to generate the specialized code for a new type.

Note, in the example below, the specialized codes are hypothethcal as Java does not use code specialization technique. Additionally, th ename of the class is improper as it contains the character #.

1
2
3
4
5
6
7
8
9
class Pair<S, T> {
  private S first;
  pruvate T second;

  public Pair(S first, T second) {
    this.first = first;
    this.second = second;
  }
}
1
2
3
4
5
6
7
8
9
class Pair#String#Integer {
  private String first;
  pruvate Integer second;

  public Pair(String first, Integer second) {
    this.first = first;
    this.second = second;
  }
}
1
2
3
4
5
6
7
8
9
class Pair#Sender#Receiver {
  private Sender first;
  pruvate Receiver second;

  public Pair(Sender first, Receiver second) {
    this.first = first;
    this.second = second;
  }
}

Code Sharing

Java takes a code sharing approach, instead of creating a new type for every instantiation, it chooses to erase the type parameters and type arguments during compilation (after type checking, of course). Thus, there is only one representation of the generic type in the generated code, representing all the instantiated generic types, regardless of the type arguments. The resulting compiled code does not to be recompiled when encountering new types!

Part of the reason to do this is for compatibility with the older version of Java. Java introduces generics only from version 5 onwards. Prior to version 5, one has to use Object to implement classes that are general enough to works on multiple types, similar to what we did with Pair here:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
class Pair {
  private Object first;
  private Object second;

  public Pair(Object first, Object second) {
    this.first = first;
    this.second = second;
  }

  public Object getFirst() {
    return this.first;
  }

  public Object getSecond() {
    return this.second;
  }
}

The Java type erasure process transforms the code below into the code above. Press on the tab to see the comparison.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
class Pair<S,T> {
  private S      first;
  private T      second;

  public Pair(S      first, T      second) {
    this.first = first;
    this.second = second;
  }

  public S      getFirst() {
    return this.first;
  }

  public T      getSecond() {
    return this.second;
  }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
class Pair      {
  private Object first;
  private Object second;

  public Pair(Object first, Object second) {
    this.first = first;
    this.second = second;
  }

  public Object getFirst() {
    return this.first;
  }

  public Object getSecond() {
    return this.second;
  }
}

Note that each type parameter S and T are replaced with Object. If the type parameter is bounded, it is replaced by the bounds instead (e.g., If T extends GetAreable, then T is replaced with GetAreable).

Where a generic type is instantiated and used, the code

1
Integer i = new Pair<String,Integer>("hello", 4).getSecond();

is transformed into

1
Integer i = (Integer) new Pair("hello", 4).getSecond();

The generated code is similar to what we would write earlier, but this is generated by the compiler after type checking, it ensures that the casting will not lead to ClassCastException during run-time.

Type erasures have several important implications. We will explore some of them below, and a few others during recitation.

Generics and Arrays Can't Mix

Let's consider the hypothetical code below (that does not compile):

1
2
3
4
5
6
7
8
// create a new array of pairs
Pair<String,Integer>[] pairArray = new Pair<String,Integer>[2];

// pass around the array of pairs as an array of object
Object[] objArray = pairArray;

// put a pair into the array -- no ArrayStoreException!
objArray[0] = new Pair<Double,Boolean>(3.14, true);
1
2
3
4
5
6
7
8
// create a new array of pairs
Pair                [] pairArray = new Pair                [2];

// pass around the array of pairs as an array of object
Object[] objArray = pairArray;

// put a pair into the array -- no ArrayStoreException!
objArray[0] = new Pair                (3.14, true);

This is similar to what we have in Unit 21, where we showed we could get an ArrayStoreException due to Java arrays being covariant. We would not, however, get an exception when we try to put a pair of double and boolean, into an array meant to store a pair of string and integer! This type checking is done during run-time, and due to type erasure, the run-time has no information about what is the type arguments to Pair. The run-time sees:

1
2
3
4
5
6
7
8
// create a new array of pairs
Pair[] pairArray = new Pair[2];

// pass around the array of pairs as an array of object
Object[] objArray = pairArray;

// put a pair into the array -- no ArrayStoreException!
objArray[0] = new Pair(3.14, true);

It checks that we have an array of pairs and we are putting another pair inside. Everything checks out. This would have caused a heap pollution, a term that refers to the situation where a variable of a parameterized type refers to an object that is not of that parameterized type.

Heap pollution is dangerous, as now, we will get a ClassCastException when we do:

1
2
// getting back a string?  -- now we get ClassCastException
String str = pairArray[0].getFirst();

The example above shows why generics and arrays don't mix well together. An array is what is called reifiable type -- a type where full type information is available during run-time. It is because Java array is reifiable that the Java run-time can check what we store into the array matches the type of the array and throw an ArrayStoreException at us if there is a mismatch. Java generics, however, is not reifiable due to type erasure. Java designers have decided not to mix the two.

The hypothetical code above actually is not a valid Java syntax. We can't compile this line:

1
Pair<String,Integer>[] pairArray = new Pair<String,Integer>[2];

The following are illegal as well:

1
2
new Pair<S,T>[2];
new T[2];

On the other hand, given a generic type T, the following is allowed.

1
T[] array;

So generic array declaration is fine but generic array instantiation is not.