Unit 34: Streams in Java
We have been building and using our own functional interfaces and abstractions.
Java provides its own version of functional interfaces that are comparable to ours, in the java.util.function
package. The table below shows some commonly used ones:
CS2030S | java.util.function |
---|---|
BooleanCondition<T>::test |
Predicate<T>::test |
Producer<T>::produce |
Supplier<T>::get |
Transformer<T,R>::transform |
Function<T,R>::apply |
Transformer<T,T>::transform |
UnaryOp<T>::apply |
Combiner<S,T,R>::combine |
BiFunction<S,T,R>::apply |
Besides, some of the abstractions we have built have similar counterparts in Java as well:
CS2030S | Java version |
---|---|
Maybe<T> |
java.util.Optional<T> |
Lazy<T> |
N/A |
InfiniteList<T> |
java.util.stream.Stream<T> |
There is a subtle and important difference between our Maybe<T>
and Optional<T>
, which we will explore in an upcoming unit. We will focus this unit on Stream
since the Java implementation of `stream is an infinite list with much more functionalities, some of which (such as parallel streams) are beyond what we can build ourselves in CS2030S.
Building a Stream
To start, let's see how we can build a stream object:
- We can use the static factory method
of
(e.g.,Stream.of(1, 2, 3)
) - We can use the
generate
anditerate
methods (similar to ourInfiniteList
) - We can convert an array into a
Stream
usingArrays::stream
- We can convert a
List
instance (or anyCollection
instance) into aStream
usingList::stream
Many other APIs in Java return a Stream
instance (e.g., Files::lines
)
Terminal Operations
A Stream
is lazy, just like InfiniteList
.
A terminal operation is an operation on the stream that triggers the evaluation of the stream. A typical way of writing code that operates on streams is to chain a series of intermediate operations together, ending with a terminal operation.
The forEach
method is a terminal operation that takes in a stream and applies a lambda expression to each element.
The lambda expression to apply does not return any value. Java provides the Consumer<T>
functional interface for this. Typical use is
1 2 |
|
Intermediate Stream Operations
An intermediate operation on stream returns another Stream
. Java provides map
, filter
, flatMap
, and other intermediate operations. Intermediate operations are lazy and do not cause the stream to be evaluated.
FlatMapping a Stream
You have seen flatMap
for Box<T>
, Maybe<T>
and Lazy<T>
. The method flatMap
in Stream
behaves similarly -- it takes a lambda expression that transforms every element in the stream into another stream. The resulting stream of streams is then flattened and concatenated together.
For instance,
1 2 |
|
1 2 |
|
Stateful and Bounded Operations
Some intermediate operations are stateful -- they need to keep track of some states to operate. Two examples are sorted
and distinct
.
sorted
returns a stream with the elements in the stream sorted. Without argument, it sorts according to the natural order as defined by implementing the Comparable
interface. You can also pass in a Comparator
to tell sorted
how to sort.
distinct
returns a stream with only distinct elements in the stream.
distinct
and sorted
are also known as bounded
operations, since they should only be called on a finite stream -- calling them on an infinite stream is a bad idea!
Truncating an Infinite List
There are several intermediate operations that convert from infinite stream to finite stream:
limit
takes in anint
\(n\) and returns a stream containing the first \(n\) elements of the stream;takeWhile
takes in a predicate and returns a stream containing the elements of the stream, until the predicate becomes false. The resulting stream might still be infinite if the predicate never becomes false.
For instance,
1 |
|
create a (lazy) finite stream of elements 0 to 4.
Peeking with a Consumer
A particularly useful intermediate operation of Stream
is peek
. peek
takes in a Consumer
, allowing us to apply a lambda on a "fork" of the stream. For instance,
1 |
|
Reducing a Stream
One of the more powerful terminal operations in Stream
is reduce
, also known as fold
or accumulate
elsewhere, the reduce
operation applies a lambda repeatedly on the elements of the stream to reduce it into a single value.
For instance,
1 |
|
The method reduce
takes in an identity value (0
in the example above) and an accumulation function ((x, y) -> x + y
above) and returns the reduced value. The process of reduction is equivalent to the following pseudocode:
1 2 3 4 |
|
Note that there are constraints on the identity and accumulation function, which are placed due to the potential parallelization of reduce
. We will revisit this operation later.
Java also overloaded reduce
with two other versions -- a simpler one (with null
identity) and a more complex one, which supports a different returned type than the type of the elements in the stream. You can read the java API for details.
Element Matching
Stream also provides terminal operations for testing if the elements pass a given predicate:
noneMatch
returns true if none of the elements pass the given predicate.allMatch
returns true if every element passes the given predicate.anyMatch
returns true if at least one element passes the given predicate.
Consumed Once
One of the greatest limitations of Stream
, which does not apply to our InfiniteList
, is that a stream can only be operated on once. We cannot iterate through a stream multiple times. Doing so would lead to an IllegalStateException
being thrown. We have to recreate the stream if we want to operate on the stream more than once.
Example,
1 2 3 |
|
Example: Is this a prime?
Consider the method below, which checks if a given int
is a prime:
1 2 3 4 5 6 7 8 |
|
Let's see how we can rewrite this with Stream
. Due to the overhead of wrapper classes, Java provides specialized streams on primitives: IntStream
, LongStream
, and DoubleStream
, with similar set of methods provided. Since we are dealing with int
here, we will use IntStream
. The code above can be rewritten as:
1 2 3 4 |
|
The IntStream::range(x,y)
method generates a stream of int
from x
to y-1
.
Example: First 500 primes
What if we want to print out the first 500 prime numbers, starting from 2? Normally, we would do the following:
1 2 3 4 5 6 7 8 9 10 11 |
|
The code is still considered simple, and understandable for many, but I am sure some of us will encounter a bug the first time we write this (either forgot to increment the counter or put the increment in the wrong place). If you look at the code, there are a couple of components:
- Lines 3 and 9 deal with iterating through different numbers for primality testing
- Line 5 is the test of whether a number is prime
- Lines 2, 4, and 7, deal with limiting the output to 500 primes
- Line 6 is the action to perform on the prime
With streams, we can write it like the following:
1 2 3 4 |
|
Notice how each of the four components matches neatly with one operation on stream!
With a stream, we no longer have to write loops, we have moved the iterations to within each operation in the stream. We no longer need to maintain states and counters, they are done within each operation as needed as well. This has another powerful implication: our code becomes more declarative, we only need to concern about what we want at each step, much less about how to do it. Doing so makes our code more succinct and less bug-prone.
Caution: Avoid Overusing Streams
We will end this unit with a note of caution.
Using stream in place of loops should make our code simpler, more elegant, and less bug-prone. One should note that not all loops can be translated into stream elegantly. A double-nested loop, for instance, stretches the elegance of streams. A triple-nested loop should perhaps be best written as a loop with appropriate inner components written with lambdas and streams.
As you go through exercises in using streams, you will find more examples of the limitations of streams.