Chapter 29 The Stream API _ Java_ The Complete Reference, Eleventh Edition, 11th Edition
29
The Stream API
Of the many new features recently added to Java, two of the most impor-
tant are lambda expressions and the stream API. Lambda expressions
were described in Chapter 15. The stream API is described here. As you
will see, the stream API is designed with lambda expressions in mind.
Moreover, the stream API provides some of the most significant demon-
strations of the power that lambdas bring to Java.
Stream Basics
Let’s begin by defining the term stream as it applies to the stream API: a
stream is a conduit for data. Thus, a stream represents a sequence of ob-
jects. A stream operates on a data source, such as an array or a collection.
A stream, itself, never provides storage for the data. It simply moves data,
possibly filtering, sorting, or otherwise operating on that data in the
process. As a general rule, however, a stream operation by itself does not
modify the data source. For example, sorting a stream does not change
the order of the source. Rather, sorting a stream results in the creation of
a new stream that produces the sorted result.
NOTE It is necessary to state that the term stream as used here differs
from the use of stream when the I/O classes were described earlier in this
book. Although an I/O stream can act conceptually much like one of the
streams defined by java.util.stream, they are not the same. Thus,
throughout this chapter, when the term stream is used, it refers to objects
based on one of the stream types described here.
Stream Interfaces
The stream API defines several stream interfaces, which are packaged in
java.util.stream and contained in the java.base module. At the founda-
tion is BaseStream, which defines the basic functionality available in all
streams. BaseStream is a generic interface declared like this:
interface BaseStream<T, S extends BaseStream<T, S>>
Here, T specifies the type of the elements in the stream, and S specifies
the type of stream that extends BaseStream. BaseStream extends the
AutoCloseable interface; thus, a stream can be managed in a try-with-re-
sources statement. In general, however, only those streams whose data
source requires closing (such as those connected to a file) will need to be
closed. In most cases, such as those in which the data source is a collec-
tion, there is no need to close the stream. The methods declared by
BaseStream are shown in Table 29-1.
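Because BaseStream extends AutoCloseable, any stream can appear in a try-with-resources statement. The following is only a minimal sketch: the class name CloseDemo and the method closedAfterUse( ) are assumptions made for illustration. It registers a close handler with onClose( ) so that the automatic call to close( ) becomes visible. With a collection as the source, closing is harmless but not required.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

// Hypothetical demo class; not part of the stream API.
class CloseDemo {
    // Returns true if the stream's close handler ran when the try block exited.
    static boolean closedAfterUse(List<String> data) {
        AtomicBoolean closed = new AtomicBoolean(false);

        // onClose( ) registers a handler that runs when close( ) is called.
        try (Stream<String> s = data.stream().onClose(() -> closed.set(true))) {
            s.forEach(System.out::println);
        } // try-with-resources calls s.close( ) here

        return closed.get();
    }

    public static void main(String[] args) {
        System.out.println("Closed: " + closedAfterUse(List.of("alpha", "beta")));
    }
}
```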
Table 29-1 The Methods Declared by BaseStream
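Several stream interfaces are derived from BaseStream. The most general of these is Stream, which is declared like this:
interface Stream<T>
Here, T specifies the type of the elements in the stream. In addition to the methods it inherits from BaseStream, Stream declares many of its own.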
In both tables, notice that many of the methods are notated as being ei-
ther terminal or intermediate. The difference between the two is very im-
portant. A terminal operation consumes the stream. It is used to produce
a result, such as finding the minimum value in the stream, or to execute
some action, as is the case with the forEach( ) method. Once a stream has
been consumed, it cannot be reused. Intermediate operations produce an-
other stream. Thus, intermediate operations can be used to create a pipe-
line that performs a sequence of actions. One other point: intermediate
operations do not take place immediately. Instead, the specified action is
performed when a terminal operation is executed on the new stream cre-
ated by an intermediate operation. This mechanism is referred to as lazy
behavior, and the intermediate operations are referred to as lazy. The use
of lazy behavior enables the stream API to perform more efficiently.
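Lazy behavior can be observed directly. The sketch below is an illustration only; the class name LazyDemo and the method countFilterCalls( ) are assumptions. It counts how often the filter( ) predicate has run before and after a terminal operation. The counting side effect exists purely to make the timing visible and is not something to imitate in real pipelines.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class; not part of the stream API.
class LazyDemo {
    // Returns { predicate calls before the terminal operation, calls after it }.
    static int[] countFilterCalls(List<Integer> source) {
        int[] calls = {0}; // one-element array so the lambda can update the count

        // Intermediate operation: builds the pipeline but does not run it.
        Stream<Integer> odds = source.stream().filter(n -> {
            calls[0]++;
            return n % 2 == 1;
        });
        int before = calls[0];

        // Terminal operation: only now is the predicate applied to the elements.
        odds.forEach(n -> { });
        int after = calls[0];

        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] result = countFilterCalls(List.of(7, 18, 10, 24, 17, 5));
        System.out.println("Predicate calls before terminal operation: " + result[0]);
        System.out.println("Predicate calls after terminal operation: " + result[1]);
    }
}
```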
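Because Stream operates on object references, it cannot operate directly on primitive types. To handle primitive type streams, the stream API defines the following interfaces:
DoubleStream
IntStream
LongStream
These streams all extend BaseStream and have capabilities similar to Stream, except that they operate on primitive types rather than reference types.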
You can obtain a stream in a number of ways. Perhaps the most common
is when a stream is obtained for a collection. Beginning with JDK 8, the
Collection interface was expanded to include two methods that obtain a
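stream from a collection. The first is stream( ), shown here:
default Stream<E> stream( )
Its default implementation returns a sequential stream. The second is parallelStream( ), shown next:
default Stream<E> parallelStream( )
Its default implementation returns a parallel stream, if possible. (If a parallel stream cannot be obtained, a sequential stream may be returned instead.)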
A stream can also be obtained from an array by use of the static stream( ) method, which was added to the Arrays class. One of its forms is shown here:
static <T> Stream<T> stream(T[ ] array)
This method returns a sequential stream to the elements in array. For example, given an array called addresses of type Address, the following obtains a stream to it:
Stream<Address> addrStrm = Arrays.stream(addresses);
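The ways of obtaining a stream described above can be tried with a short sketch such as the following. The class name ObtainStreamDemo and its method names are assumptions, and String is used in place of the Address type so that the example is self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class; not part of the stream API.
class ObtainStreamDemo {
    // Obtains a sequential stream from a collection and counts its elements.
    static long countFromCollection(List<String> items) {
        Stream<String> s = items.stream();
        return s.count();
    }

    // Obtains a (possibly) parallel stream from a collection.
    static long countFromParallel(List<String> items) {
        Stream<String> s = items.parallelStream();
        return s.count();
    }

    // Obtains a sequential stream from an array by use of Arrays.stream( ).
    static long countFromArray(String[] items) {
        Stream<String> s = Arrays.stream(items);
        return s.count();
    }

    public static void main(String[] args) {
        List<String> list = List.of("one", "two", "three");
        String[] array = { "one", "two" };
        System.out.println("From collection: " + countFromCollection(list));
        System.out.println("From parallelStream(): " + countFromParallel(list));
        System.out.println("From array: " + countFromArray(array));
    }
}
```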
Before going any further, let’s work through an example that uses
streams. The following program creates an ArrayList called myList that
holds a collection of integers (which are automatically boxed into the
Integer reference type). Next, it obtains a stream that uses myList as a
source. It then demonstrates various stream operations.
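A minimal sketch of such a program follows. It is reconstructed from the description above and from the output; the class name StreamDemo and every variable name other than myList and myStream are assumptions.

```java
import java.util.ArrayList;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical class name; demonstrates several basic stream operations.
class StreamDemo {
    public static void main(String[] args) {
        // Create a list of Integer values (the int literals are autoboxed).
        ArrayList<Integer> myList = new ArrayList<>();
        myList.add(7);
        myList.add(18);
        myList.add(10);
        myList.add(24);
        myList.add(17);
        myList.add(5);
        System.out.println("Original list: " + myList);

        // Obtain a stream to the list and find the minimum value.
        Stream<Integer> myStream = myList.stream();
        Optional<Integer> minVal = myStream.min(Integer::compare);
        if (minVal.isPresent())
            System.out.println("Minimum value: " + minVal.get());

        // min( ) consumed the stream, so a new one must be obtained.
        myStream = myList.stream();
        Optional<Integer> maxVal = myStream.max(Integer::compare);
        if (maxVal.isPresent())
            System.out.println("Maximum value: " + maxVal.get());

        // sorted( ) is an intermediate operation that yields a new stream.
        Stream<Integer> sortedStream = myList.stream().sorted();
        System.out.print("Sorted stream: ");
        sortedStream.forEach(n -> System.out.print(n + " "));
        System.out.println();

        // Keep only the odd values of the sorted stream.
        Stream<Integer> oddVals =
            myList.stream().sorted().filter(n -> (n % 2) == 1);
        System.out.print("Odd values: ");
        oddVals.forEach(n -> System.out.print(n + " "));
        System.out.println();

        // Two filters can be pipelined.
        oddVals = myList.stream().sorted()
                        .filter(n -> (n % 2) == 1)
                        .filter(n -> n > 5);
        System.out.print("Odd values greater than 5: ");
        oddVals.forEach(n -> System.out.print(n + " "));
        System.out.println();
    }
}
```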
The output is shown here:
Original list: [7, 18, 10, 24, 17, 5]
Minimum value: 5
Maximum value: 24
Sorted stream: 5 7 10 17 18 24
Odd values: 5 7 17
Odd values greater than 5: 7 17
Next, the program obtains the minimum value in the stream (which is, of
course, also the minimum value in the data source) and displays it, as
shown here:
Stream declares min( ) as Optional<T> min(Comparator<? super T> comp). First, notice that the type of min( )’s parameter is a Comparator. This
comparator is used to compare two elements in the stream. In the exam-
ple, min( ) is passed a method reference to Integer’s compare( ) method,
which is used to implement a Comparator capable of comparing two
Integers. Next, notice that the return type of min( ) is Optional. The
Optional class is described in Chapter 20, but briefly, here is how it
works. Optional is a generic class packaged in java.util and declared like
this:
class Optional<T>
Here, T specifies the element type. An Optional instance can either con-
tain a value of type T or be empty. You can use isPresent( ) to determine
if a value is present. Assuming that a value is available, it can be obtained
by calling get( ), or if you are using JDK 10 or later, orElseThrow( ). Here,
get( ) is used. In this example, the object returned will hold the minimum
value of the stream as an Integer object.
One other point about the preceding line: min( ) is a terminal operation
that consumes the stream. Thus, myStream cannot be used again after
min( ) executes.
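The effect of consuming a stream can be checked with a small sketch like the one below. The class name ReuseDemo and the method secondUseFails( ) are assumptions. A second terminal operation on an already-consumed stream is rejected with an IllegalStateException.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class; not part of the stream API.
class ReuseDemo {
    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondUseFails(List<Integer> source) {
        Stream<Integer> myStream = source.stream();
        myStream.min(Integer::compare); // terminal operation: consumes the stream

        try {
            myStream.max(Integer::compare); // attempt to reuse the consumed stream
            return false;
        } catch (IllegalStateException exc) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Reuse rejected: " + secondUseFails(List.of(7, 18, 10)));
    }
}
```

To run both min( ) and max( ), obtain a fresh stream from the source for each call.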
The next lines obtain and display the maximum value in the stream:
The program then obtains a sorted stream through the use of this line:
Reduction Operations
Consider the min( ) and max( ) methods in the preceding example pro-
gram. Both are terminal operations that return a result based on the ele-
ments in the stream. In the language of the stream API, they represent re-
duction operations because each reduces a stream to a single value—in
this case, the minimum and maximum. The stream API refers to these as
special case reductions because they perform a specific function. In addi-
tion to min( ) and max( ), other special case reductions are also avail-
able, such as count( ), which counts the number of elements in a stream.
However, the stream API generalizes this concept by providing the re-
duce( ) method. By using reduce( ), you can return a value from a
stream based on any arbitrary criteria. By definition, all reduction opera-
tions are terminal operations.
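Stream defines three versions of reduce( ). The two we will use first are shown here:
Optional<T> reduce(BinaryOperator<T> accumulator)
T reduce(T identityVal, BinaryOperator<T> accumulator)
The first form returns an object of type Optional, which contains the result. The second form returns an object of type T (the element type of the stream). In both forms, accumulator is a function that operates on two values and produces a result. In the second form, identityVal is a value such that an accumulator operation involving identityVal and any element of the stream yields that element, unchanged. BinaryOperator is a functional interface declared in java.util.function that extends the BiFunction functional interface. BiFunction defines this abstract method:
R apply(T val, U val2)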
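Here, R specifies the result type, T is the type of the first operand, and U is the type of the second operand. Thus, apply( ) applies a function to its two operands (val and val2) and returns the result. When BinaryOperator extends BiFunction, it specifies the same type for all the type parameters. Thus, as it relates to BinaryOperator, apply( ) looks like this:
T apply(T val, T val2)
As it relates to reduce( ), val contains the previous result and val2 contains the next element. The accumulator operation must satisfy three constraints. It must be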
• Stateless
• Non-interfering
• Associative
As explained earlier, stateless means that the operation does not rely on
any state information. Thus, each element is processed independently.
Non-interfering means that the data source is not modified by the opera-
tion. Finally, the operation must be associative. Here, the term associative
is used in its normal, arithmetic sense, which means that, given an asso-
ciative operator used in a sequence of operations, it does not matter
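which pair of operands is processed first. For example,
(10 * 2) * 7
yields the same result as
10 * (2 * 7)
Associativity is of particular importance when reduction operations are applied to parallel streams.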
In the program, the first version of reduce( ) uses the lambda expression
to produce a product of two values. In this case, because the stream con-
tains Integer values, the Integer objects are automatically unboxed for
the multiplication and reboxed to return the result. The two values repre-
sent the current value of the running result and the next element in the
stream. The final result is returned in an object of type Optional. The
value is obtained by calling get( ) on the returned object.
In the second version, the identity value is explicitly specified, which for
multiplication is 1. Notice that the result is returned as an object of the el-
ement type, which is Integer in this case.
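The two forms of reduce( ) just described can be sketched as follows. The class name ReduceDemo and its method names are assumptions; the list values are those used earlier in the chapter.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical demo class; not part of the stream API.
class ReduceDemo {
    // First form: no identity value, so the result is wrapped in an Optional.
    static Optional<Integer> productAsOptional(List<Integer> values) {
        return values.stream().reduce((a, b) -> a * b);
    }

    // Second form: 1 is the identity value for multiplication.
    static int productWithIdentity(List<Integer> values) {
        return values.stream().reduce(1, (a, b) -> a * b);
    }

    public static void main(String[] args) {
        List<Integer> myList = List.of(7, 18, 10, 24, 17, 5);

        Optional<Integer> productObj = productAsOptional(myList);
        if (productObj.isPresent())
            System.out.println("Product as Optional: " + productObj.get());

        System.out.println("Product as int: " + productWithIdentity(myList));
    }
}
```

For an empty stream, the first form returns an empty Optional, whereas the second returns the identity value.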
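A parallel stream can be obtained in two ways: by calling parallelStream( ) on a collection, or by calling the parallel( ) method on a sequential stream. The parallel( ) method is defined by BaseStream, as shown here:
S parallel()
It returns a parallel stream based on the sequential stream that invokes it. (If it is called on a stream that is already parallel, the invoking stream is returned.)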
Once a parallel stream has been obtained, operations on the stream can
occur in parallel, assuming that parallelism is supported by the environ-
ment. For example, the first reduce( ) operation in the preceding pro-
gram can be parallelized by substituting parallelStream( ) for the call to
stream( ):
Optional<Integer> productObj =
myList.parallelStream().reduce((a,b) -> a*b);
The results will be the same, but the multiplications can occur in differ-
ent threads.
When using parallel streams, you might find the following version of reduce( ) especially helpful. It gives you a way to specify how partial results are combined:
<U> U reduce(U identityVal, BiFunction<U, ? super T, U> accumulator, BinaryOperator<U> combiner)
In this version, combiner defines the function that combines two values
that have been produced by the accumulator function. Assuming the pre-
ceding program, the following statement computes the product of the ele-
ments in myList by use of a parallel stream:
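One way to write such a statement is sketched here, wrapped in a small class so that it can be run on its own. The class name ParallelProductDemo and the method parallelProduct( ) are assumptions.

```java
import java.util.List;

// Hypothetical demo class; not part of the stream API.
class ParallelProductDemo {
    // Computes the product of the elements by use of a parallel stream.
    static int parallelProduct(List<Integer> myList) {
        return myList.parallelStream().reduce(
            1,                 // identity value for multiplication
            (a, b) -> a * b,   // accumulator: partial result times next element
            (a, b) -> a * b);  // combiner: product of two partial results
    }

    public static void main(String[] args) {
        List<Integer> myList = List.of(7, 18, 10, 24, 17, 5);
        System.out.println("Product: " + parallelProduct(myList));
    }
}
```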
As you can see, in this example, both the accumulator and combiner per-
form the same function. However, there are cases in which the actions of
the accumulator must differ from those of the combiner. For example,
consider the following program. Here, myList contains a list of double
values. It then uses the combiner version of reduce( ) to compute the
product of the square roots of each element in the list.
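A minimal sketch of such a program is shown next. The class name SqrtProductDemo and the method productOfSqrRoots( ) are assumptions, and the values in the list are chosen only for illustration.

```java
import java.util.List;

// Hypothetical demo class; not part of the stream API.
class SqrtProductDemo {
    // Computes the product of the square roots of the elements in parallel.
    static double productOfSqrRoots(List<Double> myList) {
        return myList.parallelStream().reduce(
            1.0,
            (a, b) -> a * Math.sqrt(b), // accumulator: result times sqrt of next element
            (a, b) -> a * b);           // combiner: multiplies two partial results
    }

    public static void main(String[] args) {
        List<Double> myList = List.of(7.0, 18.0, 10.0, 24.0, 17.0, 5.0);
        System.out.println("Product of square roots: " + productOfSqrRoots(myList));
    }
}
```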
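Notice that the accumulator function multiplies the running result by the square root of the next element, but the combiner simply multiplies two partial results. Thus, the two functions differ. Moreover, for this computation to work correctly, they must differ. For example, if you tried to obtain the product of the square roots of the elements by using the following statement, the result would be wrong:
double productOfSqrRoots2 = myList.parallelStream().reduce(1.0, (a,b) -> a * Math.sqrt(b));
In this version of reduce( ), the accumulator also serves as the combiner. As a result, whenever two partial results are combined, the square root of an already-computed partial result is taken again. No exception is thrown; the value is simply incorrect.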
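You can switch a parallel stream to sequential by calling the sequential( ) method, which is specified by BaseStream. It is shown here:
S sequential( )
In general, a stream can be switched between parallel and sequential on an as-needed basis.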
There is one other aspect of a stream to keep in mind when using parallel
execution: the order of the elements. Streams can be either ordered or
unordered. In general, if the data source is ordered, then the stream will
also be ordered. However, when using a parallel stream, a performance
boost can sometimes be obtained by allowing a stream to be unordered.
When a parallel stream is unordered, each partition of the stream can be
operated on independently, without having to coordinate with the others.
In cases in which the order of the operations does not matter, it is possi-
ble to specify unordered behavior by calling the unordered( ) method,
shown here:
S unordered( )
One other point: the forEach( ) method may not preserve the ordering of
a parallel stream. If you want to perform an operation on each element
in a parallel stream while preserving the order, consider using forEa-
chOrdered( ). It is used just like forEach( ).
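The difference between forEach( ) and forEachOrdered( ) on a parallel stream can be sketched as follows. The class name OrderDemo and its method names are assumptions. A synchronized list is used because several threads may add elements at once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; not part of the stream API.
class OrderDemo {
    // Visits each element of a parallel stream in encounter order.
    static List<Integer> visitInOrder(List<Integer> source) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEachOrdered(seen::add);
        return seen;
    }

    // Visits each element with no ordering guarantee.
    static List<Integer> visitAnyOrder(List<Integer> source) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().unordered().forEach(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> myList = List.of(7, 18, 10, 24, 17, 5);
        System.out.println("forEachOrdered: " + visitInOrder(myList));
        System.out.println("unordered forEach: " + visitAnyOrder(myList));
    }
}
```

The second line of output may differ from run to run; only the first is guaranteed to match the order of the source.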
Mapping
Often it is useful to map the elements of one stream to another. For exam-
ple, a stream that contains a database of name, telephone, and e-mail ad-
dress information might map only the name and e-mail address portions
to another stream. As another example, you might want to apply some
transformation to the elements in a stream. To do this, you could map the
transformed elements to a new stream. Because mapping operations are
quite common, the stream API provides built-in support for them. The
most general mapping method is map( ). It is shown here:
<R> Stream<R> map(Function<? super T, ? extends R> mapFunc)
Here, R specifies the type of elements of the new stream; T is the type of
elements of the invoking stream; and mapFunc is an instance of
Function, which does the mapping. The map function must be stateless
and non-interfering. Since a new stream is returned, map( ) is an inter-
mediate method.
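Function is a functional interface declared in java.util.function. It is declared as shown here:
Function<T, R>
As it relates to map( ), T is the element type and R is the result of the mapping. Function has the abstract method shown here:
R apply(T val)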
Here, val is a reference to the object being mapped. The mapped result is
returned.
Here is an example that uses map( ) to create a new stream that contains
only selected fields from the original stream. In this case, the original
stream contains objects of type NamePhoneEmail, which contains
names, phone numbers, and e-mail addresses. The program then maps
only the names and phone numbers to a new stream of NamePhone ob-
jects. The e-mail addresses are discarded.
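A minimal sketch of such a program is given below. The field names, the enclosing class MapDemo, the method namesAndPhones( ), and the sample entries are all assumptions made for illustration; only the type names NamePhoneEmail and NamePhone come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class; the nested types model the entries described in the text.
class MapDemo {
    static class NamePhoneEmail {
        final String name, phonenum, email;
        NamePhoneEmail(String n, String p, String e) {
            name = n; phonenum = p; email = e;
        }
    }

    static class NamePhone {
        final String name, phonenum;
        NamePhone(String n, String p) {
            name = n; phonenum = p;
        }
    }

    // Maps each entry to a new object that holds only the name and phone number.
    static Stream<NamePhone> namesAndPhones(List<NamePhoneEmail> entries) {
        return entries.stream().map(a -> new NamePhone(a.name, a.phonenum));
    }

    public static void main(String[] args) {
        // Sample data, invented for this sketch.
        List<NamePhoneEmail> myList = new ArrayList<>();
        myList.add(new NamePhoneEmail("Ann", "555-0101", "ann@example.com"));
        myList.add(new NamePhoneEmail("James", "555-0102", "james@example.com"));
        myList.add(new NamePhoneEmail("Mary", "555-0103", "mary@example.com"));

        System.out.println("Original values in myList:");
        myList.stream().forEach(
            a -> System.out.println(a.name + " " + a.phonenum + " " + a.email));
        System.out.println();

        System.out.println("List of names and phone numbers:");
        namesAndPhones(myList).forEach(
            a -> System.out.println(a.name + " " + a.phonenum));
    }
}
```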
The output, shown here, verifies the mapping:
Because you can pipeline more than one intermediate operation to-
gether, you can easily create very powerful actions. For example, the fol-
lowing statement uses filter( ) and then map( ) to produce a new stream
that contains only the name and phone number of the elements with the
name "James":
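Such a pipeline might be written as in the following sketch. It is self-contained, so it declares its own small versions of the entry types; the class name FilterMapDemo, the method jamesOnly( ), the field names, and the sample data are assumptions.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class showing filter( ) followed by map( ).
class FilterMapDemo {
    static class NamePhoneEmail {
        final String name, phonenum, email;
        NamePhoneEmail(String n, String p, String e) {
            name = n; phonenum = p; email = e;
        }
    }

    static class NamePhone {
        final String name, phonenum;
        NamePhone(String n, String p) {
            name = n; phonenum = p;
        }
    }

    // Keeps only the entries named "James", then maps them to NamePhone objects.
    static Stream<NamePhone> jamesOnly(List<NamePhoneEmail> entries) {
        return entries.stream()
                      .filter(a -> a.name.equals("James"))
                      .map(a -> new NamePhone(a.name, a.phonenum));
    }

    public static void main(String[] args) {
        // Sample data, invented for this sketch.
        List<NamePhoneEmail> myList = List.of(
            new NamePhoneEmail("Ann", "555-0101", "ann@example.com"),
            new NamePhoneEmail("James", "555-0102", "james@example.com"),
            new NamePhoneEmail("Mary", "555-0103", "mary@example.com"));

        jamesOnly(myList).forEach(
            a -> System.out.println(a.name + " " + a.phonenum));
    }
}
```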
This type of filter operation is very common when creating database-style
queries. As you gain experience with the stream API, you will find that
such chains of operations can be used to create very sophisticated
queries, merges, and selections on a data stream.
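In addition to map( ), Stream provides mapToInt( ), mapToLong( ), and mapToDouble( ), which map each element to a primitive value and return an IntStream, LongStream, or DoubleStream, respectively. Their mapFunc parameters are of type ToIntFunction, ToLongFunction, and ToDoubleFunction. Each mapFunc must implement the abstract method defined by the specified interface, returning a value of the indicated type. For example, ToDoubleFunction specifies the applyAsDouble(T val) method, which must return a double value derived from its parameter.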
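As an illustration of mapping to a primitive stream, the following sketch uses mapToInt( ) to produce an IntStream that contains the ceiling of each value in a list of Double values. The class name PrimitiveMapDemo, the method ceilings( ), and the sample values are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical demo class; not part of the stream API.
class PrimitiveMapDemo {
    // Maps each Double to the int ceiling of its value.
    static int[] ceilings(List<Double> values) {
        IntStream cStrm = values.stream().mapToInt(a -> (int) Math.ceil(a));
        return cStrm.toArray();
    }

    public static void main(String[] args) {
        List<Double> myList = List.of(1.1, 3.6, 9.2, 4.7, 12.1, 5.0);
        System.out.println("Original values: " + myList);
        System.out.println("Ceilings: " + Arrays.toString(ceilings(myList)));
    }
}
```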
Before leaving the topic of mapping, it is necessary to point out that the
stream API also provides methods that support flat maps. These are
flatMap( ), flatMapToInt( ), flatMapToLong( ), and flatMapToDouble(
). The flat map methods are designed to handle situations in which each
element in the original stream is mapped to more than one element in
the resulting stream.
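A flat map can be sketched as follows. Here, each line of text is mapped to a stream of its words, and flatMap( ) merges those streams into one. The class name FlatMapDemo, the method words( ), and the sample lines are assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of the stream API.
class FlatMapDemo {
    // Maps each line to its words and flattens the results into a single stream.
    static String[] words(List<String> lines) {
        return lines.stream()
                    .flatMap(line -> Arrays.stream(line.split(" ")))
                    .toArray(String[]::new);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("streams move data", "lambdas act on it");
        System.out.println(Arrays.toString(words(lines)));
    }
}
```

With map( ) instead of flatMap( ), the result would be a stream of streams rather than a stream of words.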
Collecting
As the preceding examples have shown, it is possible (indeed, common)
to obtain a stream from a collection. Sometimes it is desirable to obtain
the opposite: to obtain a collection from a stream. To perform such an ac-
tion, the stream API provides the collect( ) method. It has two forms. The
one we will use first is shown here:
<R, A> R collect(Collector<? super T, A, R> collectorFunc)
Here, R specifies the type of the result, and T specifies the element type of
the invoking stream. The internal accumulated type is specified by A. The
collectorFunc specifies how the collection process works. The collect( )
method is a terminal operation.
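The Collector interface is declared in java.util.stream. Rather than implementing it yourself, you can use the predefined collectors returned by the static methods of the Collectors class, which is also packaged in java.util.stream. The toList( ) method returns a collector that can be used to collect elements into a List. The toSet( ) method returns a collector that can be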
used to collect elements into a Set. For example, to collect elements into a
List, you can call collect( ) like this:
collect(Collectors.toList())
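The toList( ) and toSet( ) collectors can be sketched with a short example. To keep it self-contained, it collects Integer values rather than name and phone objects; the class name CollectDemo and its method names are assumptions.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the stream API.
class CollectDemo {
    // Collects the odd values into a List (duplicates are kept).
    static List<Integer> oddsAsList(List<Integer> values) {
        return values.stream()
                     .filter(n -> n % 2 == 1)
                     .collect(Collectors.toList());
    }

    // Collects the odd values into a Set (duplicates are dropped).
    static Set<Integer> oddsAsSet(List<Integer> values) {
        return values.stream()
                     .filter(n -> n % 2 == 1)
                     .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> myList = List.of(7, 18, 7, 24, 17, 5);
        System.out.println("As a List: " + oddsAsList(myList));
        System.out.println("As a Set: " + oddsAsSet(myList));
    }
}
```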
The following program puts the preceding discussion into action. It re-
works the example in the previous section so that it collects the names
and phone numbers into a List and a Set.
The output is shown here:
In the program, the following line collects the name and phone numbers
into a List by using toList( ):
List<NamePhone> npList =
nameAndPhone.collect(Collectors.toList());
After this line executes, the collection referred to by npList can be used
like any other List collection. For example, it can be cycled through by us-
ing a for-each for loop, as shown in the next line:
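The version of collect( ) just used is convenient when a predefined collector does what you need. The second form of collect( ) gives you more control over the collection process:
<R> R collect(Supplier<R> target, BiConsumer<R, ? super T> accumulator, BiConsumer<R, R> combiner)
Here, target specifies how the object that holds the result is created. For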
example, to use a LinkedList as the result collection, you would specify
its constructor. The accumulator function adds an element to the result
and combiner combines two partial results. Thus, these functions work
similarly to the way they do in reduce( ). For both, they must be stateless
and non-interfering. They must also be associative.
Note also that the types of accumulator and combiner are BiConsumer.
This is a functional interface defined in java.util.function. It specifies
the abstract method accept( ) that is shown here:
void accept(T obj, U obj2)
This method performs some type of operation on obj and obj2. As it re-
lates to accumulator, obj specifies the target collection, and obj2 specifies
the element to add to that collection. As it relates to combiner, obj and
obj2 specify two collections that will be combined.
Using the version of collect( ) just described, you could use a LinkedList
as the target in the preceding program, as shown here:
Notice that the first argument to collect( ) is a lambda expression that returns a new LinkedList. The second argument uses the standard collection method add( ) to add an element to the list. The third argument uses
addAll( ) to combine two linked lists. As a point of interest, you can use
any method defined by LinkedList to add an element to the list. For ex-
ample, you could use addFirst( ) to add elements to the start of the list, as
shown here:
Notice that the first argument specifies the HashSet constructor refer-
ence. The second and third specify method references to HashSet’s add( )
and addAll( ) methods.
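The three-argument form of collect( ) discussed in this section can be sketched as follows. The example collects String values so that it stays self-contained; the class name CustomCollectDemo and its method names are assumptions.

```java
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo class; not part of the stream API.
class CustomCollectDemo {
    // Lambdas supply the target, the accumulator, and the combiner.
    static LinkedList<String> toLinkedList(List<String> source) {
        return source.stream().collect(
            () -> new LinkedList<>(),
            (list, element) -> list.add(element),
            (listA, listB) -> listA.addAll(listB));
    }

    // addFirst( ) puts each element at the start, reversing a sequential stream.
    static LinkedList<String> toReversedLinkedList(List<String> source) {
        return source.stream().collect(
            () -> new LinkedList<>(),
            (list, element) -> list.addFirst(element),
            (listA, listB) -> listA.addAll(listB));
    }

    // Constructor and method references do the same job for a HashSet.
    static HashSet<String> toHashSet(List<String> source) {
        return source.stream().collect(
            HashSet::new, HashSet::add, HashSet::addAll);
    }

    public static void main(String[] args) {
        List<String> names = List.of("Ann", "James", "Mary");
        System.out.println("LinkedList: " + toLinkedList(names));
        System.out.println("Reversed: " + toReversedLinkedList(names));
        System.out.println("HashSet size: " + toHashSet(names).size());
    }
}
```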
One last point: In the language of the stream API, the collect( ) method
performs what is called a mutable reduction. This is because the result of
the reduction is a mutable (i.e., changeable) storage object, such as a
collection.
Although a stream is not a data storage object, you can still use an itera-
tor to cycle through its elements in much the same way as you would use
an iterator to cycle through the elements of a collection. The stream API
supports two types of iterators. The first is the traditional Iterator. The
second is Spliterator, which was added by JDK 8. It provides significant
advantages in certain situations when used with parallel streams.
As just mentioned, you can use an iterator with a stream in just the same
way that you do with a collection. Iterators are discussed in Chapter 19,
but a brief review will be useful here. Iterators are objects that imple-
ment the Iterator interface declared in java.util. Its two key methods are
hasNext( ) and next( ). If there is another element to iterate, hasNext( )
returns true, and false otherwise. The next( ) method returns the next
element in the iteration.
NOTE There are additional iterator types that handle the primitive
streams: PrimitiveIterator, PrimitiveIterator.OfDouble,
PrimitiveIterator.OfLong, and PrimitiveIterator.OfInt. These iterators
all extend the Iterator interface and work in the same general way as
those based directly on Iterator.
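To obtain an iterator to a stream, call iterator( ) on the stream. The version used by Stream is shown here:
Iterator<T> iterator( )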
Here, T specifies the element type. (The primitive streams return itera-
tors of the appropriate primitive type.)
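A minimal sketch of iterating through a stream follows. The strings in an ArrayList are iterated, but the process is the same for any type of stream. The class name IteratorDemo and the helper method drain( ) are assumptions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class; not part of the stream API.
class IteratorDemo {
    // Cycles through a stream with an Iterator, returning the elements visited.
    static List<String> drain(Stream<String> myStream) {
        List<String> seen = new ArrayList<>();
        Iterator<String> itr = myStream.iterator();
        while (itr.hasNext())
            seen.add(itr.next());
        return seen;
    }

    public static void main(String[] args) {
        ArrayList<String> myList = new ArrayList<>();
        myList.add("Alpha");
        myList.add("Beta");
        myList.add("Gamma");
        myList.add("Delta");
        myList.add("Phi");
        myList.add("Omega");

        // Obtain a stream to the list, then an iterator to the stream.
        Stream<String> myStream = myList.stream();
        Iterator<String> itr = myStream.iterator();

        // Display each element.
        while (itr.hasNext())
            System.out.println(itr.next());
    }
}
```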
Iterating through a stream of six strings with such an iterator produces output like this:
Alpha
Beta
Gamma
Delta
Phi
Omega
Use Spliterator
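Spliterator offers an alternative to Iterator, especially when parallel processing is involved. One of its methods is tryAdvance( ), which performs an action on the next element and then advances the iterator. It is shown here:
boolean tryAdvance(Consumer<? super T> action)
Here, action specifies the action that is executed on the next element in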
the iteration. tryAdvance( ) returns true if there is a next element. It re-
turns false if no elements remain. As discussed earlier in this chapter,
Consumer declares one method called accept( ) that receives an element
of type T as an argument and returns void.
As long as tryAdvance( ) returns true, the action is applied to the next el-
ement. When tryAdvance( ) returns false, the iteration is complete.
Notice how tryAdvance( ) consolidates the purposes of hasNext( ) and
next( ) provided by Iterator into a single method. This improves the effi-
ciency of the iteration process.
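The use of tryAdvance( ) can be sketched like this. The class name SpliteratorDemo and the helper method drain( ) are assumptions. The loop body is empty because tryAdvance( ) both performs the action and reports whether an element was available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;

// Hypothetical demo class; not part of the stream API.
class SpliteratorDemo {
    // Cycles through a stream with a Spliterator, returning the elements visited.
    static List<String> drain(Stream<String> myStream) {
        List<String> seen = new ArrayList<>();
        Spliterator<String> splitItr = myStream.spliterator();

        // Each call handles one element; false means no elements remain.
        while (splitItr.tryAdvance(n -> seen.add(n))) { }
        return seen;
    }

    public static void main(String[] args) {
        Stream<String> myStream =
            Stream.of("Alpha", "Beta", "Gamma", "Delta", "Phi", "Omega");
        Spliterator<String> splitItr = myStream.spliterator();

        // Display each element.
        while (splitItr.tryAdvance(n -> System.out.println(n))) { }
    }
}
```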
In some cases, you might want to perform some action on each element
collectively, rather than one at a time. To handle this type of situation,
Spliterator provides the forEachRemaining( ) method, shown here:
default void forEachRemaining(Consumer<? super T> action)
This method applies action to each unprocessed element and then re-
turns. For example, assuming the preceding program, the following dis-
plays the strings remaining in the stream:
Notice how this method eliminates the need to provide a loop to cycle
through the elements one at a time. This is another advantage of
Spliterator.
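A short sketch of forEachRemaining( ) follows. To show that only unprocessed elements are handled, it first consumes one element with tryAdvance( ). The class name RemainingDemo and the method restAfterFirst( ) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;

// Hypothetical demo class; not part of the stream API.
class RemainingDemo {
    // Skips the first element, then gathers all remaining ones in a single call.
    static List<String> restAfterFirst(Stream<String> myStream) {
        Spliterator<String> splitItr = myStream.spliterator();
        splitItr.tryAdvance(first -> { }); // process (and ignore) one element

        List<String> rest = new ArrayList<>();
        splitItr.forEachRemaining(n -> rest.add(n)); // no explicit loop needed
        return rest;
    }

    public static void main(String[] args) {
        Stream<String> myStream =
            Stream.of("Alpha", "Beta", "Gamma", "Delta", "Phi", "Omega");
        System.out.println(restAfterFirst(myStream));
    }
}
```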
This chapter has discussed several key aspects of the stream API and in-
troduced the techniques required to use them, but the stream API has
much more to offer. To begin, here are a few of the other methods pro-
vided by Stream that you will find helpful:
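As one illustration, the following sketch exercises a handful of Stream methods that have not otherwise been demonstrated in this chapter: anyMatch( ), allMatch( ), distinct( ), and limit( ). The choice of methods, the class name OtherMethodsDemo, and its method names are assumptions and are not necessarily the ones the chapter has in mind.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the stream API.
class OtherMethodsDemo {
    // anyMatch( ): true if at least one element satisfies the predicate.
    static boolean anyEven(List<Integer> values) {
        return values.stream().anyMatch(n -> n % 2 == 0);
    }

    // allMatch( ): true if every element satisfies the predicate.
    static boolean allPositive(List<Integer> values) {
        return values.stream().allMatch(n -> n > 0);
    }

    // distinct( ) removes duplicates; count( ) counts what remains.
    static long distinctCount(List<Integer> values) {
        return values.stream().distinct().count();
    }

    // limit( ) truncates the stream to at most the given number of elements.
    static List<Integer> smallestTwo(List<Integer> values) {
        return values.stream().sorted().limit(2).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> myList = List.of(7, 18, 7, 5);
        System.out.println("Any even: " + anyEven(myList));
        System.out.println("All positive: " + allPositive(myList));
        System.out.println("Distinct count: " + distinctCount(myList));
        System.out.println("Smallest two: " + smallestTwo(myList));
    }
}
```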
One last point: the stream API is a powerful addition to Java. You will
want to explore all of the capabilities that java.util.stream has to offer.