Unit 10: Heap and Stack
Learning Objectives
Students should
- understand when memory is allocated and deallocated on the heap vs. on the stack.
- understand the concept of the call stack in the JVM.
Heap and Stack
The Java Virtual Machine (JVM) manages the memory of a Java program while the program's bytecode instructions are interpreted and executed. Different JVM implementations may manage memory differently, but typically a JVM implementation partitions the memory into several regions, including:
- method area for storing the code for the methods;
- metaspace for storing meta information about classes;
- heap for storing dynamically allocated objects;
- stack for local variables and call frames.
Since the concepts of heap and stack are common to all execution environments (either based on bytecode or machine code), we will focus on them here.
The heap is the region in memory where all objects are allocated and stored, while the stack is the region where all variables (including those of primitive types and object references) are allocated and stored.
Stack
The stack contains variables. Please note that instance and class fields are not variables. As such, fields are not in the stack.
Also recall that the same variable name can appear more than once in a program as long as each occurrence is in a different method. This works because variables are contained within call frames. A call frame is created when we invoke a method and removed when the method finishes.
Like a "stack of books" where we can only take the book at the top and can only put more books at the top, call frames can only be added to or removed from the top of the stack. This behavior is also called Last-In First-Out (LIFO). In other words, the last element that is inserted (i.e., Last-In) is the first element to be removed (i.e., First-Out).
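The LIFO behavior of call frames can be observed with a small sketch. The class and method names below (CallFrameDemo, first, second) are hypothetical and used only for illustration. The log records when each method starts and finishes, which mirrors when its call frame is pushed and popped.

```java
import java.util.ArrayList;
import java.util.List;

class CallFrameDemo {
    // Records the order in which methods are entered and left.
    static final List<String> log = new ArrayList<>();

    static void first() {
        log.add("push first");
        second();               // first's frame stays on the stack while second runs
        log.add("pop first");
    }

    static void second() {
        log.add("push second"); // second's frame is now on top of first's frame
        log.add("pop second");
    }

    static List<String> trace() {
        log.clear();
        first();
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        // prints [push first, push second, pop second, pop first]
        System.out.println(trace());
    }
}
```

The frame for second is the last one added and the first one removed, while the frame for first remains until second has returned.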
Heap
The heap stores dynamically allocated objects. To put it simply, whenever you use the keyword new, a new object is created in the heap.
Unlike the stack, the heap has no concept of LIFO. So, an object can persist across multiple method invocations. This also means that an object can be shared between multiple method invocations.
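As a minimal sketch of this persistence and sharing, consider the hypothetical Counter class below (it is not part of these notes). The object is created inside one method, outlives that method's call frame, and is then updated by another method through a shared reference.

```java
class HeapDemo {
    // Hypothetical class used only for this illustration.
    static class Counter {
        int count;
    }

    static Counter create() {
        Counter c = new Counter(); // the object is allocated on the heap
        return c;                  // the variable c is gone after this returns; the object is not
    }

    static void increment(Counter c) {
        c.count += 1;              // updates the one shared object on the heap
    }

    static int run() {
        Counter counter = create();
        increment(counter);
        increment(counter);
        return counter.count;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 2
    }
}
```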
An object in the heap contains the following information:
- Class name.
- Instance fields and the respective values.
- Captured variables.
The last item, captured variables, will only become relevant in a later unit.
Examples
Constructor
Consider the following two lines of code.
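The two lines discussed here might look like the sketch below. The definition of Point, the constructor arguments (1, 2), and the wrapper class ConstructorDemo are assumptions made so that the example compiles on its own.

```java
class ConstructorDemo {
    // Assumed definition of Point, for illustration only.
    static class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    static Point run() {
        Point p;             // "Line 1": p lives on the stack and is uninitialized
        p = new Point(1, 2); // "Line 2": the object lives on the heap; p now holds its reference
        return p;
    }

    public static void main(String[] args) {
        Point p = run();
        System.out.println(p.x + ", " + p.y); // prints 1.0, 2.0
    }
}
```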
Line 1 declares a variable p. When the JVM executes this line of code, it allocates some memory space for an object reference for p, whose content is currently uninitialized. We show uninitialized variables with the symbol ∅ as their content. Since p is a variable, it resides in the stack.
Line 2 creates a new Point object. When the JVM executes this line of code, it (i) allocates some memory space for a Point object on the heap, (ii) invokes the constructor, and (iii) returns the reference to the newly allocated memory space. The memory address of this space becomes the reference to the object and is assigned to the variable p.
This is shown in the figures below in 3 steps. Note that we assume that the code snippet above is in the static method called main. Technically, the call frame of main should contain a parameter, usually called args, because the typical main method is declared as public static void main(String[] args). We will often omit this parameter because its name and values are unknown.
Notice the crucial difference between the static method main and the constructor. A static method does not have the keyword this. Non-static methods, including constructors, do have the keyword this.
Although we mentioned that this is a keyword, it behaves mostly like a variable[1]. As such, we give it a representation in the stack. Further note that the parameters are ordered so that the leftmost parameter appears at the bottom of the call frame, directly after the keyword this (if any).
Note that we use the symbol ∅ to indicate that the variable is not yet initialized. Java differentiates between uninitialized variables and variables initialized to null. Uninitialized variables cannot be used. Further note that fields that are not explicitly initialized receive default values, but uninitialized variables do not.
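The difference between fields and local variables can be seen in the following sketch. The class names DefaultValueDemo and Holder are hypothetical. The commented-out line is rejected by the Java compiler because a local variable must be assigned before it is read.

```java
class DefaultValueDemo {
    // Hypothetical class whose fields are never explicitly initialized.
    static class Holder {
        Object ref; // defaults to null
        double d;   // defaults to 0.0
        int i;      // defaults to 0
    }

    static String describe() {
        Holder h = new Holder();
        return h.ref + " " + h.d + " " + h.i;
    }

    public static void main(String[] args) {
        int local;                      // uninitialized local variable (shown as ∅)
        // System.out.println(local);   // does not compile: local might not have been initialized
        local = 5;                      // after this assignment, local can be used
        System.out.println(local);      // prints 5
        System.out.println(describe()); // prints null 0.0 0
    }
}
```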
Also, we will often simplify the presentation. First, we will omit the memory address (e.g., 9048ab50). The arrow from the variable p containing the value 9048ab50 to an object located at 9048ab50 is already an abstraction of this. Furthermore, we do not know what the actual address will be, and it will differ from run to run. So, we can omit both the memory address stored in the variable and the address of the object.
Secondly, we are often interested only in the snapshot of the stack and heap diagram at a particular moment. As such, the intermediate call frames (e.g., Point constructor) that are inserted and then removed can be omitted. Only the final effect matters.
Let us illustrate this further with the following code snippet.
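A sketch of what this snippet might look like is given below. The expressions new Point(1, 2) and new Circle(.., 3) and the field name c come from the discussion here. The class definitions, the field name r, and the wrapper class NestedNewDemo are assumptions added so that the example is self-contained.

```java
class NestedNewDemo {
    // Assumed definition of Point, for illustration only.
    static class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    // Assumed definition of Circle: a centre (field c) and a radius (field r).
    static class Circle {
        final Point c;
        final double r;

        Circle(Point c, double r) {
            this.c = c;
            this.r = r;
        }
    }

    static Circle run() {
        Circle c;                           // variable c on the stack, uninitialized
        c = new Circle(new Point(1, 2), 3); // the Point is created first, then the Circle
        return c;
    }

    public static void main(String[] args) {
        Circle c = run();
        System.out.println(c.c.x + ", " + c.c.y + ", " + c.r); // prints 1.0, 2.0, 3.0
    }
}
```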
In this case, new Point(1, 2) is evaluated first to create an object in the heap. Then, we evaluate new Circle(.., 3). The reference to this Circle object is then assigned to the variable c. The final effect is shown below. Note that the field c in the Circle object is drawn as an arrow to the Point object in the heap.
Aliasing
Now, let's look at how aliasing appears in the stack and heap diagram, using the following example.
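The seven lines might look like the body of run in the sketch below. Only the three declarations, the variable names, and the call to center.moveTo(..) on the last line are taken from the discussion here. The concrete values, the order of the assignments, and the class definitions are assumptions.

```java
class AliasDemo {
    // Assumed definition of Point with a mutating moveTo method.
    static class Point {
        double x;
        double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }

        void moveTo(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    // Assumed definition of Circle: a centre (field c) and a radius (field r).
    static class Circle {
        Point c;
        double r;

        Circle(Point c, double r) {
            this.c = c;
            this.r = r;
        }
    }

    static boolean run() {
        Circle c;                        // line 1
        Point center;                    // line 2
        double radius;                   // line 3
        radius = 8;                      // line 4 (assumed value)
        center = new Point(1, 1);        // line 5 (assumed values)
        c = new Circle(center, radius);  // line 6: c.c and center now refer to the same Point
        center.moveTo(2, 2);             // line 7: the change is visible through c.c as well
        return c.c == center && c.c.x == 2.0 && c.c.y == 2.0;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints true
    }
}
```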
In this example, we have three variables, c, center, and radius. Lines 1-3 declare the variables, and as a result, we have three variables allocated on the stack. Again, we assume that the code is in the static method main. Do note the order of these variables in the stack. Since we declared c first, it is located at the bottom of the stack.
Recall that local variables receive no default value, so c, center, and radius remain uninitialized (∅) until they are assigned. Only fields have default values: null for object references, 0.0 for a double, and 0 for an int.
There is a clear example of aliasing here. Note that the field c of variable c is referencing the same object as the variable center. Hence, we can say that the expression c.c is an alias of center. In the stack and heap diagram, this is illustrated by having two different arrows pointing to the same location.
In this case, the expression c.c consists of two arrows. The first is from variable c to the Circle object. The second is from the field c to the Point object. On the other hand, the variable center points directly to the Point object.
We can also see that after Line 7, although the change is made via center.moveTo(..), it is visible through the expression c.c as well, because both refer to the same object.
Call Stack
Now, let's look at what happens when we invoke a method. We have seen what happens when a constructor is invoked, and a method call is similar. Take as an example the distanceTo method in Point, which takes a parameter q, invoked on p1 with p2 as the argument.
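A sketch of such a Point class and its invocation is given below. The names distanceTo, q, p1, and p2 come from the discussion here. The body of distanceTo (the Euclidean distance), the coordinate values, and the wrapper class DistanceDemo are assumptions.

```java
class DistanceDemo {
    // Assumed definition of Point, for illustration only.
    static class Point {
        private final double x;
        private final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }

        // The frame for this call holds this, the parameter q, and the locals dx and dy.
        double distanceTo(Point q) {
            double dx = this.x - q.x;
            double dy = this.y - q.y;
            return Math.sqrt(dx * dx + dy * dy);
        }
    }

    static double run() {
        Point p1 = new Point(0, 0); // assumed values
        Point p2 = new Point(3, 4);
        return p1.distanceTo(p2);   // this refers to p1's object, q to p2's object
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 5.0
    }
}
```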
After declaring p1 and p2 and creating both objects, we have:
When distanceTo is called, the JVM creates a stack frame for this instance method call. This stack frame is a region of memory that conceptually contains (i) the this reference, (ii) the method arguments, and (iii) local variables within the method, among other things[2][3]. When a class method is called, the stack frame does not contain the this reference.
You can see that the references to the objects p1 and p2 are copied onto the stack frame. p1 and this point to the same object, and p2 and q point to the same object.
Within the method, any modification made through this changes the object referenced by p1, and any modification made through q changes the object referenced by p2.
After the method returns, the stack frame for that method is destroyed.
Let's consider a new move method for the class Point that has two parameters (double x, double y) and moves the x and y coordinates of the Point.
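A sketch of such a move method and its invocation is given below. The method name, its parameters, and the caller's variables p1, x, and y come from the discussion here. The body of move, the values, and the wrapper class MoveDemo are assumptions. In particular, the sketch assumes that move sets the coordinates to the given values, and it reassigns the parameters at the end purely to show that the caller's variables are unaffected.

```java
class MoveDemo {
    // Assumed definition of Point, for illustration only.
    static class Point {
        double x;
        double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }

        void move(double x, double y) {
            this.x = x; // updates the object on the heap
            this.y = y;
            x = -1;     // changes only the copies in move's own stack frame
            y = -1;
        }
    }

    static double[] run() {
        Point p1 = new Point(0, 0);
        double x = 5;
        double y = 5;
        p1.move(x, y); // the values of x and y are copied into the new frame
        return new double[] { p1.x, p1.y, x, y };
    }

    public static void main(String[] args) {
        // prints [5.0, 5.0, 5.0, 5.0]
        System.out.println(java.util.Arrays.toString(run()));
    }
}
```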
Again, we create a stack frame, copy the reference to object p1 into this, copy the value of x in the calling method into the parameter x of move, and copy the value of y in the calling method into the parameter y of move.
What is important here is that, as x and y are of primitive types instead of references, we copy their values onto the stack. If we change x or y within move, the x and y of the calling method will not change. This behavior is the same as you would expect in C. However, unlike in C, where you can pass in a pointer to a variable, in Java you cannot pass in a reference to a variable of primitive type in any way. If you want to pass a variable of primitive type into a method and have its value changed, you will have to use a wrapper class. The details of how to do this are left as an exercise.
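One possible shape of such a wrapper is sketched below, as a starting point rather than a full answer to the exercise. DoubleBox is a hypothetical mutable class written for this example. Note that the built-in java.lang.Double is immutable, so it cannot be used to change a value in place this way.

```java
class WrapperDemo {
    // Hypothetical mutable wrapper around a double.
    static class DoubleBox {
        double value;

        DoubleBox(double value) {
            this.value = value;
        }
    }

    static void doublePrimitive(double v) {
        v *= 2;         // changes only the copy in this method's frame
    }

    static void doubleBoxed(DoubleBox box) {
        box.value *= 2; // changes the object on the heap that the caller also references
    }

    static double[] run() {
        double plain = 1.5;
        doublePrimitive(plain);
        DoubleBox boxed = new DoubleBox(1.5);
        doubleBoxed(boxed);
        return new double[] { plain, boxed.value };
    }

    public static void main(String[] args) {
        // prints [1.5, 3.0]
        System.out.println(java.util.Arrays.toString(run()));
    }
}
```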
Summary
To summarize, Java uses call by value for primitive types, and call by reference for objects[4].
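The following sketch shows what this means in practice for objects. The class and method names are hypothetical. A method can modify the object that a parameter refers to, but assigning a new object to the parameter only changes the method's own copy of the reference, so the caller's variable is unaffected.

```java
class PassingDemo {
    // Assumed definition of Point, for illustration only.
    static class Point {
        double x;
        double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    static void mutate(Point q) {
        q.x = 10;              // modifies the object that the caller also references
    }

    static void reassign(Point q) {
        q = new Point(99, 99); // only this frame's copy of the reference changes
    }

    static double[] run() {
        Point p = new Point(1, 2);
        mutate(p);   // p.x is now 10
        reassign(p); // p still refers to the original object
        return new double[] { p.x, p.y };
    }

    public static void main(String[] args) {
        // prints [10.0, 2.0]
        System.out.println(java.util.Arrays.toString(run()));
    }
}
```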
If we make multiple nested method calls, as we usually do, the stack frames get stacked on top of each other.
One final note: the memory allocated on the stack is deallocated when a method returns. The memory allocated on the heap, however, stays there as long as there is a reference to it (either from another object or from a variable in the stack). Unlike C or C++, in Java, you do not have to free the memory allocated to objects. The JVM runs a garbage collector that checks for unreferenced objects on the heap and cleans up the memory automatically.
[1] It can also behave like a function/method in the sense that it can be invoked (e.g., this(..)). In this case, the keyword this represents the constructor of the current class. We will illustrate more of this in the topic of overloading.
[2] This is not that different from how an OS handles function calls in machine code, as you will see in CS2100/CS2106.
[3] The other things are JVM implementation independent and not relevant to our discussion here.
[4] Alternatively, you can think of Java as always using call by value. It's just that the value of a reference is, in fact, just a reference.