Unit 24: Type Erasure

Learning Objectives

Students should

understand that generics are implemented with type erasure in Java.
understand that type information is not fully available during run-time when generics are used, and problems that this could cause.
be aware that arrays and generics don't mix well in Java.
know the terms reifiable type and heap pollution.

Implementing Generics

There are several ways one could implement generics in a programming language.

Code Specialization

For instance, in C#, every instantiation of a generic type causes new code to be generated for that instantiated type. For instance, instantiating Pair<S,T> into Pair<String,Integer> causes a new type to be generated during run-time. In C++ and in Rust, instantiating Pair<String,Integer> causes new code to be generated during compile-time. This approach is sometimes called code specialization.

The disadvantages of such approach are

codes are duplicated, so the resulting program is larger.
changing the client by adding more use cases requires recompilation of the entire generic Pair class as we may need to generate the specialized code for a new type.

Note, in the example below, the specialized codes are hypothethical as Java does not use code specialization technique. Additionally, the name of the class is improper as it contains the character #.

Template CodeString/IntegerSender/Receiver

class Pair<S, T> {
  private S first;
  pruvate T second;

  public Pair(S first, T second) {
    this.first = first;
    this.second = second;
  }
}

class Pair#String#Integer {
  private String first;
  pruvate Integer second;

  public Pair(String first, Integer second) {
    this.first = first;
    this.second = second;
  }
}

class Pair#Sender#Receiver {
  private Sender first;
  pruvate Receiver second;

  public Pair(Sender first, Receiver second) {
    this.first = first;
    this.second = second;
  }
}

Java takes a code sharing approach, instead of creating a new type for every instantiation, it chooses to erase the type parameters and type arguments during compilation (after type checking, of course). Thus, there is only one representation of the generic type in the generated code, representing all the instantiated generic types, regardless of the type arguments. The resulting compiled code does not to be recompiled when encountering new types!

Part of the reason to do this is for compatibility with the older version of Java. Java introduces generics only from version 5 onwards. Prior to version 5, one has to use Object to implement classes that are general enough to works on multiple types, similar to what we did with Pair here:

Pair v0.1
class Pair {
  private Object first;
  private Object second;

  public Pair(Object first, Object second) {
    this.first = first;
    this.second = second;
  }

  public Object getFirst() {
    return this.first;
  }

  public Object getSecond() {
    return this.second;
  }
}

The Java type erasure process transforms the code below into the code above. Press on the tab to see the comparison.

Generic CodeType Erased Code (aligned)

Pair v0.2
class Pair<S,T> {
  private S      first;
  private T      second;

  public Pair(S      first, T      second) {
    this.first = first;
    this.second = second;
  }

  public S      getFirst() {
    return this.first;
  }

  public T      getSecond() {
    return this.second;
  }
}

Pair v0.2 (Type Erased)
class Pair      {
  private Object first;
  private Object second;

  public Pair(Object first, Object second) {
    this.first = first;
    this.second = second;
  }

  public Object getFirst() {
    return this.first;
  }

  public Object getSecond() {
    return this.second;
  }
}

Note that each type parameter S and T are replaced with Object. If the type parameter is bounded, it is replaced by the bounds instead (e.g., If T extends GetAreable, then T is replaced with GetAreable).

Where a generic type is instantiated and used, the code

Integer i = new Pair<String,Integer>("hello", 4).getSecond();

is transformed into

Integer i = (Integer) new Pair("hello", 4).getSecond();

The generated code is similar to what we would write earlier, but this is generated by the compiler after type checking, it ensures that the casting will not lead to ClassCastException during run-time.

Type erasures have several important implications. We will explore some of them below, and a few others during recitation.

Generics and Arrays Can't Mix

Let's consider the hypothetical code below (that does not compile):

Generic CodeType Erased Code (aligned)

// create a new array of pairs
Pair<String,Integer>[] pairArray = new Pair<String,Integer>[2];

// pass around the array of pairs as an array of object
Object[] objArray = pairArray;

// put a pair into the array -- no ArrayStoreException!
objArray[0] = new Pair<Double,Boolean>(3.14, true);

// create a new array of pairs
Pair                [] pairArray = new Pair                [2];

// pass around the array of pairs as an array of object
Object[] objArray = pairArray;

// put a pair into the array -- no ArrayStoreException!
objArray[0] = new Pair                (3.14, true);

This is similar to what we have in Unit 21, where we showed we could get an ArrayStoreException due to Java arrays being covariant. We would not, however, get an exception when we try to put a pair of double and boolean, into an array meant to store a pair of string and integer! This type checking is done during run-time, and due to type erasure, the run-time has no information about what is the type arguments to Pair. The run-time sees:

Type Erased Code

// create a new array of pairs
Pair[] pairArray = new Pair[2];

// pass around the array of pairs as an array of object
Object[] objArray = pairArray;

// put a pair into the array -- no ArrayStoreException!
objArray[0] = new Pair(3.14, true);

It checks that we have an array of pairs and we are putting another pair inside. Everything checks out. This would have caused a heap pollution, a term that refers to the situation where a variable of a parameterized type refers to an object that is not of that parameterized type.

Heap pollution is dangerous, as now, we will get a ClassCastException when we do:

// getting back a string?  -- now we get ClassCastException
String str = pairArray[0].getFirst();

The example above shows why generics and arrays don't mix well together. An array is what is called reifiable type -- a type where full type information is available during run-time. It is because Java array is reifiable that the Java run-time can check what we store into the array matches the type of the array and throw an ArrayStoreException at us if there is a mismatch. Java generics, however, is not reifiable due to type erasure. Java designers have decided not to mix the two.

The hypothetical code above actually is not a valid Java syntax. We can't compile this line:

Pair<String,Integer>[] pairArray = new Pair<String,Integer>[2];

The following are illegal as well:

new Pair<S,T>[2];
new T[2];

On the other hand, given a generic type T, the following is allowed.

T[] array;

In summary, generic array declaration is fine but generic array instantiation is not.

Generic Type Rules

Before we add rules for determining type check as well as dynamic binding, we first need two additional terminologies¹: (i) class-level type parameters and (ii) method-level type parameters.

We can then state the type rules as follows.

Generic method signature includes type parameters.
- Two type parameters are considered the same if we can rename all type parameters into the same name. In other words, they are equal up to renaming.
Type check uses type argument for class-level type parameter if available.
- This allows for more type safety checks to be done but it may have counter-intuitive interaction with dynamic binding.
Method descriptor stored in dynamic binding during compile-time step is the type erased descriptor.
- This is because the code we are executing is the type erased version. Type erased code does not have type parameters anymore.
- The type parameters are available as additional information (i.e., metadata) that can be used for compilation but not used at run-time.

We can now apply these rules to some interesting cases. For simplicity of explanation, we will use the following types with the following subtyping relationships: T4 <: T3 <: T2 <: T1.

Method-Level Overriding

Consider the interface I without class-level type parameter. It has a single method-level type parameter which we will try to override.

Original Code

interface I {
  <T extends T1> int foo(T t);
}

Type Erased Code
1 2 3
interface I { int foo(T1 t); }

Since we use type parameter as part of method signature but allow renaming, we have the following correct overriding below.

PASS

class C implements I {
  @Override
  public <T extends T1> int foo(T t) {
    return 0;
  }
}

PASS

class C implements I {
  @Override
  public <S extends T1> int foo(S t) {
    return 0;
  }
}

Unfortunately, the following does not work because we either have different number of type parameters or different type parameter after renaming. Renaming is simply looking at the name and not at the bound of the type.

FAIL: Different Number

class C implements I {
  @Override
  public <T extends T1, S> int foo(T t) {
    return 0;
  }
}

FAIL: Not Renaming

class C<T extends T1> implements I {
  @Override
  public <S extends T> int foo(S t) {
    return 0;
  }
}

One exceptional case is when the type parameter is not present. In which case, for method-level type parameter, we use the type erased version. So the code below pass type checking.

PASS

class C implements I {
  @Override
  public int foo(T1 t) {
    return 0;
  }
}

PASS

class C<S> implements I {
  @Override
  public int foo(T1 t) {
    return 0;
  }
}

Unfortunately, if a method-level type parameter is present -- even if it is not used -- it no longer pass type check. As for class-level type parameter, it will only causes failure if used.

FAIL: Method-Level Type parameter

class C implements I {
  @Override
  public <S> int foo(T1 t) {
    return 0;
  }
}

FAIL: Class-Level Type parameter

class C<S extends T1> implements I {
  @Override
  public int foo(S t) {
    return 0;
  }
}

Class-Level Overriding

In the case of class-level type parameter, we need to activate the second rule too when determining overriding. To do that, we will consider a different interface I<T>.

Original Code

interface I<T> {
  int foo(T t);
}

Type Erased Code

interface I {
  int foo(Object t);
}

Since we use type argument when possible. This means implementing I<String> forces us to implement foo(String) instead of foo(Object).

PASS

class C implements I<String> {
  @Override
  public int foo(String t) {
    return 0;
  }
}

FAIL: Not Using Type Argument

class C implements I<String> {
  @Override
  public int foo(Object t) {
    return 0;
  }
}

On the other hand, if we are implemeting I<S>, we have two options. The code on the left has no compilation error because we simply replace T with S. The code on the right has no compilation error because we have no other information regarding S except that it is bounded by Object.

PASS

class C<S> implements I<S> {
  @Override
  public int foo(S t) {
    return 0;
  }
}

PASS

class C<S> implements I<S> {
  @Override
  public int foo(Object t) {
    return 0;
  }
}

We can test this hypothesis further by compiling the following code below. It compiles because we still have no other information about S but the bound is now T1.

Interface

interface I<T extends T1> {
  int foo(T t);
}

Class

class C<S extends T1> implements I<S> {
  @Override
  public int foo(T1 t) {
    return 0;
  }
}

Overloading

In the case of overloading, we need to ensure that the code has no ambiguity. The following code cannot even be compiled because the type erased version already has ambiguity.

Original Code

class C<T extends T3> {
  void foo(T t) { }
  <T extends T3> void foo(T t) { }
}

Type Erased Code

class C {
  void foo(T3 t) { }
  void foo(T3 t) { }
}

Now, let us look at it in relation to rule (3). This is because at run-time, we are executing the type erased code. But we also need to ensure that there is no ambiguity in which method to be selected. We will look at how rule (2) and (3) interacts at compilation of caller and not the callee. We assume that the following callee is already compiled.

Original Code

class C<T extends T1> {
  void foo(T t) { }
  <T extends T3> void foo(T t) { }
}

Type Erased Code

class C {
  void foo(T1 t) { }
  void foo(T3 t) { }
}

We should first note that the method signature after type erasure is different. So the class C can be compiled. However, this is insufficient to determine if the usage has no ambiguity.

Remember by rule (2), we need to consider the type argument at compilation. But here, we are compiling the caller. The code on the left has no ambiguity while the code on the right has ambiguity. The simplified method signature being checked are given below it.

PASS
1 2
C<T2> c = new C<T2>(); c.foo(new T3());
Simplified method signature checked using type signature.
- foo(T2)
- foo(T3)
FAIL
1 2
C<T3> c = new C<T3>(); c.foo(new T3());
Simplified method signature checked using type signature.
- foo(T3)
- foo(T3)

Assuming that all of these checks are ok, we then move on to the actual dynamic binding involving generic.

Dynamic Binding with Generic

Before continuing, we need to clarify rule (3). The simplified method signature above is only used for type checking. At the end, when we run the code, we are executing the type erased version. This means the method descriptor stored at compile-time step of dynamic binding is the type erased version.

How can we show that? We can use the following two related generic classes.

Superclass

class C<T extends T1> {
  void foo(T t) { }
  <T extends T3> void foo(T t) { }
}

Subclass

class D<T extends T1> extends C<T> {
  @Override void foo(T1 t) { }
  @Override void foo(T3 t) { }
}

We first check that the classes can compile. By the overriding rule, void D::foo(T1) overrides void C::foo(T) and void D::too(T3) overrides <T extends T3> void C::foo(T). Additionally, there is no ambiguity in the type signature after type erasure.

Let us consider the dynamic binding process in more details. We will use rule (2) before step (3) of compile-time step of dynamic binding. This will affect which methods are accessible, compatible, and most specific. Consider the code below and the simplified method signature created from our overloading explanation above.

Usage

C<T4> c = new D<T4>();
c.foo(new T4());

Signature

foo(T4) // from foo at line 2
foo(T3) // from foo at line 3

CTT of target	CTT of param	Accessible	Compatible	Most Specific	Method Descriptor		Method Executed
`C`	`T4`	`foo(T4)` `foo(T3)`	`foo(T4)` `foo(T3)`	`foo(T4)`	`void foo(T1)`		`void D::foo(T1)`

Now, although the most specific method is foo(T4), the method descriptor is the method descriptor corresponding to the method foo at line 2. This means, the method descriptor stored is void foo(T1).

This distinction is important because because when we execute the code, we will execute void D::foo(T1) since this method match the method descriptor we stored during compile-time step of dynamic binding.

We can look at three more usages to assure ourselves that our hypothesis correct. The explanation of which method executed is given as a similar table as above.

Usage

C<T4> c = new D<T4>();
c.foo(new T3());

Signature

foo(T4) // from foo at line 2
foo(T3) // from foo at line 3

CTT of target	CTT of param	Accessible	Compatible	Most Specific	Method Descriptor		Method Executed
`C`	`T3`	`foo(T4)` `foo(T3)`	`foo(T3)`	`foo(T3)`	`void foo(T3)`		`void D::foo(T3)`

Usage

C<T2> c = new D<T2>();
c.foo(new T2());

Signature

foo(T2) // from foo at line 2
foo(T3) // from foo at line 3

CTT of target	CTT of param	Accessible	Compatible	Most Specific	Method Descriptor		Method Executed
`C`	`T2`	`foo(T2)` `foo(T3)`	`foo(T2)`	`foo(T2)`	`void foo(T1)`		`void D::foo(T1)`

Usage

C<T2> c = new D<T2>();
c.foo(new T3());

Signature

foo(T2) // from foo at line 2
foo(T3) // from foo at line 3

CTT of target	CTT of param	Accessible	Compatible	Most Specific	Method Descriptor		Method Executed
`C`	`T3`	`foo(T2)` `foo(T3)`	`foo(T2)` `foo(T3)`	`foo(T3)`	`void foo(T3)`		`void D::foo(T3)`

You can test the hypothesis above by running the following code and check that the output matches our explanation.

class T1 {}
class T2 extends T1 {}
class T3 extends T2 {}
class T4 extends T3 {}

class C<T extends T1> {
  void foo(T t) { System.out.println(1); }
  <T extends T3> void foo(T t) { System.out.println(2); }
}

class D<T extends T1> extends C<T> {
  @Override
  void foo(T1 t) { System.out.println(3); }
  @Override
  void foo(T3 t) { System.out.println(4); }
}

class Main {
  public static void main(String[] args) {
    C<T4> c4 = new D<T4>();
    c4.foo(new T3());  // 4
    c4.foo(new T4());  // 3

    C<T2> c2 = new D<T2>();
    c2.foo(new T2());  // 3
    c2.foo(new T3());  // 4
    c2.foo(new T4());  // 4
  }
}

It is always useful to give names when explaining things. Unless we already have a name for a concept, you may use your own name as long as you also explained the meaning. ↩

Unit 24: Type Erasure

Implementing Generics

Code Specialization

Code Sharing

Generics and Arrays Can't Mix

Generic Type Rules

Method-Level Overriding

Class-Level Overriding

Overloading

Dynamic Binding with Generic