Unit 4: Encapsulation

Learning Objectives

Students should

  • understand composite data types as an even higher-level abstraction over variables.
  • understand encapsulation as an object-oriented (OO) principle.
  • understand the meaning of class, object, fields, methods, in the context of OO programming.
  • be able to define a class and instantiate one as an object in Java.
  • appreciate OO as a natural way to model the real world in programs.
  • understand reference types in Java and how they differ from the primitive types.

Abstraction: Composite Data Type

Just as functions allow programmers to group instructions, give the group a name, and refer to it later, a composite data type allows programmers to group primitive types together, give the group a name to become a new type, and refer to it later. This is another powerful abstraction in programming languages that helps us think at a higher conceptual level without worrying about the details. Commonly used examples are mathematical objects such as complex numbers, 2D data points, multi-dimensional vectors, and circles, or everyday objects such as a person or a product.

Defining a composite data type allows programmers to abstract away, and be separated from, the concern of how a complex data type is represented.

For instance, a circle on a 2D plane can be represented by its center (i.e., x, y) and its radius r, or it can be represented by the top-left corner (i.e., x, y) and the width w of its bounding square.

In C, we build a composite data type with struct. For example,

typedef struct {
  double x, y; // (x,y) coordinate of the center.
  double r;    // radius
} circle;

Once we have the struct defined, we have a new data type called circle. However, we are not completely shielded from its representation until we write a set of functions that operate on the circle composite type. For instance,

double circle_area(circle c) { ... };
bool   circle_contains_point(circle c, double x, double y) { ... };
bool   circle_overlaps(circle c1, circle c2) { ... };
  :

Implementing these functions requires knowledge of how a circle is represented. The implementation will be different if we have a different representation of a circle (e.g., x and y may represent the center of the circle or the top-left corner of the bounding square). But once the set of functions that operate on and manipulate circles is available, we can use the circle type without worrying about the internal representation. Of course, this assumes that we only use the functions specifically written to work on the circle type.

Additionally, the circle_overlaps example highlights another advantage of having a composite data type. To see the advantage, imagine that you do not have the data type circle. Then the function to check if two circles overlap would require 6 parameters.

bool   circle_overlaps(double x1, double y1, double r1,
    double x2, double y2, double r2) { ... };
  :

We have used a nice numbering to clearly indicate how the parameters are related. Those with the same suffix belong to the same circle. But another programmer may instead write it in a different order.

bool   circle_overlaps(double x1, double x2,
    double y1, double y2, double r1, double r2) { ... };
  :

Even worse, a lazy programmer may omit the suffixes altogether and make the code unreadable. The use of a composite data type is therefore like a "glue" that binds related data together. That way, we know that all the elements that make up a circle will always be together.

If we decide to change the representation of a circle, then only the set of functions that operate on the circle type needs to be changed, but not the code that uses circles to do other things. In other words, the representation of the circle and the set of functions that operate on and manipulate circles fall on the same side of the abstraction barrier.

If you are the programmer who writes the code for the implementation of circle as well as the functions that operate on and manipulate circles, then you are the implementer. On the other hand, if you are the programmer who uses the functions that manipulate circles, then you are the client.

Abstraction: Class and Object (or, Encapsulation)

We can further bundle the composite data type and its associated functions on the same side of the abstraction barrier together. This bundle is another abstraction called a class.

Class

A class is a data type with a group of functions associated with it.

We call the data in the class fields (or members, or states, or attributes, or properties[1]). The associated functions are called methods. A well-designed class maintains the abstraction barrier, properly wraps the barrier around the internal representation and implementation, and exposes just the right method interface for others to use.

The concept of keeping all the data and functions operating on the data related to a composite data type together within an abstraction barrier is called encapsulation.

Let's see how we can encapsulate the fields and methods associated together, using Circle as an example, in Java.

// Circle v0.1
class Circle {
  double x;
  double y;
  double r;

  double getArea() {
    return 3.141592653589793 * r * r;
  }
}

The code above defines a new class using the keyword class, gives it the name Circle[2], and follows it with a block listing the member variables (with types) and the function definitions.

Just like we can create variables of a given type, we can create objects of a given class. Objects are instances of a class, each allowing the same methods to be called, and each containing the same set of variables of the same types, but (possibly) storing different values.

In Java, the keyword new creates an object of a given class. For instance, to create a Circle object, we can use

1
2
3
Circle c = new Circle();
c.r = 10;    // set the radius to 10
c.getArea(); // return 314.1592653589793

To access the fields and the methods, we use the . notation. For example, object.field or object.method(..). This can be seen in Line 2 and Line 3 of the example above.
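As a small illustration of objects sharing the same fields and methods while storing different values, the sketch below creates two objects from Circle v0.1. The driver class TwoCirclesDemo and the variable names small and large are hypothetical, added only to make the example runnable.

```java
// Circle v0.1, as defined above.
class Circle {
  double x;
  double y;
  double r;

  double getArea() {
    return 3.141592653589793 * r * r;
  }
}

// Hypothetical driver class, not part of the original notes.
class TwoCirclesDemo {
  public static void main(String[] args) {
    Circle small = new Circle();
    Circle large = new Circle();
    small.r = 1; // each object has its own copy of the field r
    large.r = 2;

    // Same method, different results, because the stored values differ.
    System.out.println(small.getArea()); // 3.141592653589793
    System.out.println(large.getArea()); // 12.566370614359172
  }
}
```

Setting large.r does not affect small.r: the two objects are created by separate uses of new, so each holds its own fields.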

A Bad Example

Let us take a moment to appreciate the example Circle v0.1 above. This is a reasonable example because the method getArea computes the area of the circle using the radius stored in the field r. So, we can clearly see that the method is associated with the data. Now consider adding another method that takes in a pen and a paper and writes "CS2030S is easy" on the paper.

// Bad Circle
class Circle {
  double x;
  double y;
  double r;

  double getArea() {
    return 3.141592653589793 * r * r;
  }
  void write(Pen pen, Paper paper) {  // irrelevant method
    pen.write(paper, "CS2030S is easy");
  }
}

Without even knowing the implementation of pen.write(..), we can already see that this method is not associated with any circle at all. In fact, it is not even using any of the fields of the circle.

Object-Oriented Programming

A program written in an object-oriented language such as Java consists of classes, with one main class as the entry point. One can view a running object-oriented (or OO) program as something that instantiates objects of different classes and orchestrates their interactions with each other by calling each other's methods.

One could argue that an object-oriented way of writing programs is much more natural, as it mirrors our world more closely. If we look around us, we see objects everywhere; each object has certain properties, exhibits certain behavior, and allows certain actions. We interact with the objects through their interfaces, and we rarely need to know the internals of the objects we use every day (unless we try to repair them)[3].

To model a problem in an object-oriented manner, we typically model the nouns as classes and objects, the properties or relationships among the classes as fields, and the verbs or actions of the corresponding objects as methods.

Take, for example, the following partial problem description of an online airline reservation system.

Users need to be able to make bookings from an origin to a destination airport which may comprise multiple connecting flights. We record the booking date.

We can identify the following from the problem description.

Nouns     Properties                                          Associated Verbs
User      Bookings made                                       Make booking
Booking   Booking date, origin airport, destination airport   currently unknown
Airport   currently unknown                                   currently unknown

From here, we can try to model the problem in OOP using three classes: User, Booking, and Airport. However, not much is known about Airport. If we assume that the only identifying information about an airport is its airport code, then we may not need to create an Airport class. Instead, we can use a String.

However, for User and Booking, we clearly need to encapsulate the information into a class. In the case of Booking, it is simply because there are 3 different properties that need to be combined. In the case of User, we have an associated verb that needs to be modelled as a method.
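One possible way to turn this model into code is sketched below. The field and method names (bookingDate, origin, destination, bookings, makeBooking) are assumptions made for illustration, as is the choice to store the date as a java.time.LocalDate and the bookings in a java.util.ArrayList. The problem description does not prescribe any of them.

```java
import java.time.LocalDate;
import java.util.ArrayList;

// A booking groups three related properties.
class Booking {
  LocalDate bookingDate;
  String origin;      // airport code of the origin, e.g., "SIN"
  String destination; // airport code of the destination
}

// A user has one property (the bookings made) and one action (make booking).
class User {
  ArrayList<Booking> bookings = new ArrayList<>();

  // Creates a booking, records it under this user, and returns it.
  Booking makeBooking(LocalDate date, String origin, String destination) {
    Booking b = new Booking();
    b.bookingDate = date;
    b.origin = origin;
    b.destination = destination;
    bookings.add(b);
    return b;
  }
}

// Hypothetical driver class.
class BookingDemo {
  public static void main(String[] args) {
    User u = new User();
    u.makeBooking(LocalDate.of(2024, 1, 15), "SIN", "NRT");
    System.out.println(u.bookings.size()); // 1
  }
}
```

The verb "make booking" becomes a method on User, while Booking, which has no known verbs yet, carries only fields.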

When to Stop?

In the discussion above, we put forward the possibility that Airport need not be a class. So the question is, when should we stop modelling a noun as a class? We may be too eager and model everything as a class, including the date to be stored as the booking date. There is also the opposite problem of being too lazy to model. For instance, we may lazily group user and booking together to form a single class with 4 fields.

There is no clear answer to this but as a general guide, you may want to ask the following questions.

  • Are there multiple properties to be stored?
    • If so, then creating a class is good.
    • In the case of airport, if there is only a single piece of data, then we need not make a class.
  • Is there an action associated with the entity?
    • If so, then creating a class is good.
    • In the case of user, although it only has a single property, it has an associated action.
  • Is there a real-world counterpart?
    • If so, model it based on the real world.
    • In the case of user and booking, we have real-world counterparts, so we model them as separate classes.
  • Are there potential changes to the entity?
    • If so, then creating a class is good.
    • For instance, if in the future we plan to store more information about an airport (e.g., the country it is located in), then having it as a class will minimize potential changes to other parts of the code (e.g., if we used String before, we now have to change every such String into Airport).

The guide above is not exhaustive, but it is still a good starting point if this is your first attempt at modelling in OOP.
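To make the last question in the guide concrete, the sketch below models the airport as a class from the start. The class and field names are hypothetical, and Booking here is a cut-down variant that holds only the two airports. Adding the country field later changes only Airport. The field types in Booking stay the same.

```java
// Hypothetical Airport class. Initially only the code is known.
class Airport {
  String code;
  String country; // added later; Booking below is unaffected by this change
}

// Booking refers to airports through the Airport type rather than String.
class Booking {
  Airport origin;
  Airport destination;
}

// Hypothetical driver class.
class AirportDemo {
  public static void main(String[] args) {
    Airport sin = new Airport();
    sin.code = "SIN";
    sin.country = "Singapore";

    Airport nrt = new Airport();
    nrt.code = "NRT";
    nrt.country = "Japan";

    Booking b = new Booking();
    b.origin = sin;
    b.destination = nrt;
    System.out.println(b.origin.code + " -> " + b.destination.code); // SIN -> NRT
  }
}
```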

Reference Types in Java

We mentioned in Unit 2 that there are two kinds of types in Java. You have been introduced to the primitive types. Everything else in Java is a reference type.

The Circle class is an example of a reference type. Unlike primitive variables, which never share their values, a reference variable stores only a reference to the value, and therefore two reference variables can share the same value. For instance,

Circle c1 = new Circle();
Circle c2 = c1;
System.out.println(c2.r); // print 0.0
c1.r = 10.0;
System.out.println(c2.r); // print 10.0

The behavior above is due to the variables c1 and c2 referring to the same Circle object in memory. There is only one object, so changing the field r through c1 is also visible through c2.
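For contrast, the sketch below places a primitive assignment next to a reference assignment. It reuses Circle v0.1. The class ReferenceDemo and the variable names are hypothetical.

```java
// Circle v0.1, as defined earlier in this unit.
class Circle {
  double x;
  double y;
  double r;

  double getArea() {
    return 3.141592653589793 * r * r;
  }
}

// Hypothetical driver class.
class ReferenceDemo {
  public static void main(String[] args) {
    // Primitive: the value is copied, so a and b are independent.
    int a = 1;
    int b = a;
    a = 2;
    System.out.println(b); // 1

    // Reference: only the reference is copied, so c1 and c2
    // refer to the same Circle object.
    Circle c1 = new Circle();
    Circle c2 = c1;
    c1.r = 10.0;
    System.out.println(c2.r);     // 10.0
    System.out.println(c1 == c2); // true: same object

    // A second object is independent of the first.
    Circle c3 = new Circle();
    c3.r = 10.0;
    System.out.println(c1 == c3); // false: different objects, equal fields
  }
}
```

On reference types, == compares the references themselves, which is why c1 == c3 is false even though both circles have the same radius.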

Special Reference Value: null

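A reference variable that has not been assigned an object holds the special reference value null. This is the default for fields and for variables declared in jshell without an initializer; a local variable inside a method must be assigned before it is used, otherwise the code does not compile. A common error for beginners is to declare a reference variable and try to use it without instantiating an object: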

1
2
Circle c1;
c1.r = 10.0;  // error

Line 2 would lead to the following run-time error message:

|  Exception java.lang.NullPointerException

Remember to always instantiate a reference variable before using it.
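A minimal sketch of the same mistake inside a complete program, together with one way to guard against it, is shown below. The classes Holder and NullDemo are hypothetical. The sketch uses a field because a field of reference type defaults to null.

```java
// Circle v0.1, as defined earlier in this unit.
class Circle {
  double x;
  double y;
  double r;

  double getArea() {
    return 3.141592653589793 * r * r;
  }
}

// Hypothetical class with a reference-type field that is never assigned.
class Holder {
  Circle c; // fields of reference type default to null
}

// Hypothetical driver class.
class NullDemo {
  public static void main(String[] args) {
    Holder h = new Holder();
    System.out.println(h.c == null); // true

    try {
      h.c.r = 10.0; // dereferencing null
    } catch (NullPointerException e) {
      System.out.println("caught NullPointerException");
    }

    h.c = new Circle(); // instantiate before use
    h.c.r = 10.0;
    System.out.println(h.c.r); // 10.0
  }
}
```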

Class Diagram (Part 1)

A useful diagram to have when trying to visualize a class is called the class diagram. A class diagram consists of 3 segments:

  1. The class name.
  2. The fields.
  3. The methods.

Between segments, we draw a line to clearly delimit them. For best results, the order in which the fields and methods appear should be identical to how they appear in the code.

Additionally, we omit the implementation of the methods and record only the minimal information needed. Otherwise, there would be no difference between a class diagram and the code. What we want is a diagram that captures the essence of a class so that we can reason about our design without actually writing the code.

For instance, consider the class Circle v0.1 above (reproduced below).

// Circle v0.1
class Circle {
  double x;
  double y;
  double r;

  double getArea() {
    return 3.141592653589793 * r * r;
  }
}

The corresponding class diagram is shown below.

[Figure: Class Diagram 01, the class diagram for Circle v0.1]

We will improve upon this class diagram with additional details while keeping the amount of information minimal to avoid information overload.

We encourage you to practice drawing class diagrams from code and writing code from class diagrams. This will be a useful design tool when dealing with larger programs, especially those involving multiple files. Debugging usually takes longer than expected, so a good design up front will save time.

Quote

"There has never been an unexpectedly short debugging period in the history of computers."

Steven Levy

Further Readings

  • Oracle's Java Tutorial on Classes and Objects.
  • Class Diagram. The version that we will introduce in CS2030/S is a simpler version that is sufficient for the purpose of this course.

  1. Computer scientists just could not decide what to call this :( 

  2. As a convention, we use PascalCase for class name and camelCase for variable and method names in Java. 

  3. This is a standard analogy in an OOP textbook. In practice, however, we often have to write programs that include abstract concepts with no tangible real-world analogy as classes.