Mastering Lambdas:
Java Programming in
a Multicore World
Maurice Naftalin
Copyright © 2015 by Maurice Naftalin. All rights reserved. Except as permitted under the United States Copyright Act of 1976, no
part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system,
without the prior written permission of the Publisher, with the exception that the program listings may be entered, stored, and executed in
a computer system, but they may not be reproduced for publication.
ISBN: 978-0-07-182963-2
MHID: 0-07-182963-6
The material in this eBook also appears in the print version of this title: ISBN: 978-0-07-182962-5,
MHID: 0-07-182962-8.
All trademarks are trademarks of their respective owners. Rather than put a trademark symbol after every occurrence of a trademarked
name, we use names in an editorial fashion only, and to the benefit of the trademark owner, with no intention of infringement of the trade-
mark. Where such designations appear in this book, they have been printed with initial caps.
McGraw-Hill Education eBooks are available at special quantity discounts to use as premiums and sales promotions or for use in corporate
training programs. To contact a representative, please visit the Contact Us page at www.mhprofessional.com.
Oracle and Java are registered trademarks of Oracle Corporation and/or its affiliates. All other trademarks are the property of their
respective owners, and McGraw-Hill Education makes no claim of ownership by the mention of products that contain these marks.
Screen displays of copyrighted Oracle software programs have been reproduced herein with the permission of Oracle Corporation and/or
its affiliates.
Information has been obtained by Publisher from sources believed to be reliable. However, because of the possibility of human or
mechanical error by our sources, Publisher, or others, Publisher does not guarantee the accuracy, adequacy, or completeness of any
information included in this work and is not responsible for any errors or omissions or the results obtained from the use of such information.
Oracle Corporation does not make any representations or warranties as to the accuracy, adequacy, or completeness of any information
contained in this Work, and is not responsible for any errors or omissions.
TERMS OF USE
This is a copyrighted work and McGraw-Hill Education and its licensors reserve all rights in and to the work. Use of this work is subject
to these terms. Except as permitted under the Copyright Act of 1976 and the right to store and retrieve one copy of the work, you may
not decompile, disassemble, reverse engineer, reproduce, modify, create derivative works based upon, transmit, distribute, disseminate,
sell, publish or sublicense the work or any part of it without McGraw-Hill Education’s prior consent. You may use the work for your own
noncommercial and personal use; any other use of the work is strictly prohibited. Your right to use the work may be terminated if you fail
to comply with these terms.
THE WORK IS PROVIDED “AS IS.” McGRAW-HILL EDUCATION AND ITS LICENSORS MAKE NO GUARANTEES OR
WARRANTIES AS TO THE ACCURACY, ADEQUACY OR COMPLETENESS OF OR RESULTS TO BE OBTAINED FROM
USING THE WORK, INCLUDING ANY INFORMATION THAT CAN BE ACCESSED THROUGH THE WORK VIA HYPERLINK
OR OTHERWISE, AND EXPRESSLY DISCLAIM ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED
TO IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. McGraw-Hill Education and
its licensors do not warrant or guarantee that the functions contained in the work will meet your requirements or that its operation will be
uninterrupted or error free. Neither McGraw-Hill Education nor its licensors shall be liable to you or anyone else for any inaccuracy, error
or omission, regardless of cause, in the work or for any damages resulting therefrom. McGraw-Hill Education has no responsibility for the
content of any information accessed through the work. Under no circumstances shall McGraw-Hill Education and/or its licensors be liable
for any indirect, incidental, special, punitive, consequential or similar damages that result from the use of or inability to use the work, even
if any of them has been advised of the possibility of such damages. This limitation of liability shall apply to any claim or cause whatsoever
whether such claim or cause arises in contract, tort or otherwise.
To my boys: Joe, Isaac, Daniel, and Ben
About the Author
Maurice Naftalin has over 30 years’ experience in IT as a developer, designer, archi-
tect, manager, teacher, and author. Naftalin, a certified Java programmer, has worked
with every release of Java to date. His experience in Java and business gives him a unique
perspective on the fundamental change that comes with introducing lambda expres-
sions in Java SE 8. Naftalin is a frequent presenter at conferences worldwide, includ-
ing the annual JavaOne. He runs a popular tutorial site in collaboration with Oracle’s
development team, www.lambdafaq.org, focused on the new language features in
Java 8.
Brian Goetz is one of the leading authorities on Java programming. He is the author
of the very successful Java Concurrency in Practice, as well as over 75 articles on Java
development. He was the specification lead for JSR-335 (Lambda Expressions for the
Java Language) and has served on numerous other JCP Expert Groups. Brian is the Java
Language Architect at Oracle.
Contents
Foreword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Foreword
Java SE 8 may well represent the largest change ever to the Java language and
libraries. Since you’re reading a book called Mastering Lambdas, you prob-
ably already know that the biggest new feature is the addition of lambda
expressions. Depending on your perspective, this evolution began in 2009 (when
Project Lambda was launched), or 2006 (when several proposals were made for
adding closures to Java), or 1997 (when inner classes were added), or 1941 (when
Alonzo Church published his foundational work on the theory of computing, from
which lambda expressions get their name).
However long it took to get here, it’s about time! While it may initially appear that
lambda expressions are “just another language feature,” in reality, they will change
the way you think about programming, offering a powerful new tool for applying
abstraction to the programming challenges you face every day. Of course, Java already
gives us powerful tools for abstraction, such as inheritance and generics, but these are
largely about abstracting over data. Lambda expressions complement these by giving
us a better tool for abstracting over behavior.
In embracing lambda expressions, Java has taken a mild turn towards functional
programming. While it might seem that object-oriented programming and functional
programming are competing disciplines, they offer us complementary tools for man-
aging program complexity. And, as hardware parallelism continues to increase, the
building blocks of functional programming—immutable values and pure functions—
become even more effective tools for managing that complexity.
Mastering Lambdas: Java Programming in a Multicore World covers all the details
of the new language and library features of Java SE 8—lambda expressions, default
methods, and the Streams library—in a layered, disciplined fashion. Even more impor-
tantly, it connects the details of the features back to their underlying design decisions,
enabling readers to get the most out of these new features by understanding the mo-
tivation and principles behind them. At the same time, it remains focused on the real
payoff, which is not the features themselves, but what they enable: more expressive,
more powerful, less error-prone user code. And navigating the reader to this payoff is
what this book is really about.
Let Mastering Lambdas be your introduction to programming with this new-and-
improved Java. Once you get started, I’m sure you’ll be hooked!
–Brian Goetz
Java Language Architect, Oracle Corporation
Acknowledgments
I could not have written this book without unfailing help, encouragement, and
feedback from people in the Oracle language team: Brian Goetz, Paul Sandoz,
Aleksey Shipilev, and Dan Smith. Stuart Marks provided comments so
valuable that they helped to shape the book.
I thank the people whose reviews saved me from many mistakes and often sug-
gested new directions: Graham Allan, Maurizio Cimadamore, Chris Czarnecki, John
Kostaras, Kirk Pepperdine, Jeremy Prime, and Philip Wadler. Of course, any remaining
errors are my responsibility alone.
I owe a huge debt to my editor, Brandi Shailer, for her endless patience and opti-
mism during the long gestation of this project.
Introduction
The changes in Java 8 are the biggest in the history of the language. They promise
to alter Java programming toward a functional style that will help you to write
code that is more concise and expressive, and (in many cases) ready to take
advantage of parallel hardware. In this book, you will discover how the seemingly
minor changes that introduce lambda expressions make this possible. You will learn
how lambda expressions enable you to write a Java function in a single line of code,
and how to use this capability with the new Stream API, condensing verbose
collection processing code into simple and readable stream programs. Studying the
mechanisms that create and consume streams will lead to an analysis of their perfor-
mance and enable you to judge when to invoke the parallel execution features of the
API.
Lastly, integrating the new features into the existing Java platform libraries required
the evolution of existing collection interfaces, previously prevented by compatibility
problems. You will learn about how default methods solve these problems and how
to use them in evolving your own APIs.
Intended Audience
This book is aimed at Java developers comfortable with any version from Java 5 onward
who have heard about the exciting changes in Java 8 and want to know about them.
You do not need to have met lambda expressions or closures in any other language,
or to have any experience with functional programming. (If you have, so much the
better, of course.)
The book doesn’t assume familiarity with the platform libraries except for the stan-
dard collections of the Java Collections Framework; if you don’t know them well, be
prepared to consult the Javadoc sometimes.
A few sections contain more advanced material: these are introduced as being
suitable for a second reading.
The changes in Java 8 are the biggest in the history of the language, combining
coordinated changes to the language, the libraries, and the virtual machine.
They promise to alter the way we think about the execution of Java programs
and to make the language fit for use in a world, soon to arrive, of massively parallel
hardware. Yet for such an important innovation, the actual changes to the language
seem quite minor. What is it about these apparently minor modifications that will
make such a big difference? And why should we change a programming model that
has served us so well throughout the lifetime of Java, and indeed for much longer
before that? In this chapter we will explore some of the limitations of that model and
see how the lambda-related features of Java 8 will enable Java to evolve to meet the
challenges of a new generation of hardware architectures.
Before Java 5 introduced the for-each loop, we would have written the loop like
this:
for (Iterator pointItr = pointList.iterator(); pointItr.hasNext(); ) {
((Point) pointItr.next()).translate(1, 1);
}
Or, in a clearer idiom (though less favored because it increases the scope of
pointItr):
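Iterator pointItr = pointList.iterator();
while (pointItr.hasNext()) {
    ((Point) pointItr.next()).translate(1, 1);
}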
Here we are asking pointList to create an Iterator object on our behalf, and
we are then using that object to access the elements of pointList in turn. This version
is still relevant, because today this is the code that the Java compiler generates to
implement the for-each loop. Its key aspect for us is that the order of access to the
elements is controlled by the client code rather than by the collection; this pattern is
known as external iteration.
Before Java 8 this would have been a strange suggestion, since java.util.List
(which is the type of pointList) has no forEach method and, as an interface, cannot
have one added. However, in Chapter 7 we’ll see that Java 8 overcomes this problem
with the introduction of non-abstract interface methods.
The new method Collection.forEach (actually inherited by Collection from
its superinterface Iterable) is an example of internal iteration, so called because,
although the explicit iterative code is no longer obvious, iteration is still taking place
internally. It is now managed by the forEach method, which applies its behavioral
parameter to each element of its collection.
The change from external to internal iteration may seem a small one, simply a mat-
ter of moving the work of iteration across the client-library boundary. But the conse-
quences are not small. The parallelization work that we require can now be defined in
the collection class instead of repeatedly in every client method that must iterate over
the collection. Moreover, the implementation is free to use additional techniques such
as laziness and out-of-order execution—and, indeed, others yet to be discovered—to
get to the answer faster.
So internal iteration is necessary if a programming model is to allow collection
library writers the freedom to choose, for each collection, the best way of implement-
ing bulk processing. But what is to replace the comment in the call of forEach—how
can the collection’s methods be told what task is to be executed on each element?
Of course, this toy code is absurdly specialized; we aren’t really going to write
a new interface for every element type we need to work with. Fortunately, we don’t
need to; there is nothing special about the names PointAction and doForPoint;
if we simply replace them consistently with other names, nothing changes. In the
Java 8 collections library they are called Consumer and accept. So our PointAction
interface becomes:
public interface Consumer<T> {
void accept(T t);
}
Parameterizing the type of the interface allows us to dispense with the spe-
cialized ArrayList subclass and instead add the method forEach directly to
the class itself, as is done by inheritance in Java 8. This method takes a
java.util.function.Consumer, which will receive and process each element of
the collection.
public class ArrayList<E> {
...
public void forEach(Consumer<E> c) {
for (E e : this) {
c.accept(e);
}
}
}
You may think that this code is still pretty clumsy. But notice that the clumsiness is
now concentrated in the representation of each command by an instance of a class.
In many cases, this is overkill. In the present case, for example, all that forEach really
needs is the behavior of the single method accept of the object that has been supplied
to it. State and all the other apparatus that make up the object are included only
because method arguments in Java, if not primitives, have to be object references. But
we have always needed to specify this apparatus—until now.
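Using an anonymous inner class, the call looks like this:

pointList.forEach(new Consumer<Point>() {
    public void accept(Point p) {
        p.translate(1, 1);
    }
});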
Experienced Java developers are so accustomed to seeing code like this that we
have often forgotten how we felt when we first encountered it. Common first reactions
to the syntax for anonymous inner classes used in this way are that it is ugly, verbose,
and difficult to understand quickly, even though all it is really doing is to say “do
this for each element.” You don’t have to agree completely with these judgements
to accept that any attempt to persuade developers to rely on this idiom for every
collection operation is unlikely to be very successful. And this, at last, is our cue for
the introduction of lambda expressions.2
To reduce the verbosity of this call, we should try to identify those places where
we are supplying information that the compiler could instead infer from the context.
One such piece of information is the name of the interface being implemented by the
anonymous inner class. It’s enough for the compiler to know that the declared type of
the parameter to forEach is Consumer<T>; that is sufficient information to allow the
supplied argument to be checked for type compatibility. Let’s de-emphasize the code
that the compiler can infer:
pointList.forEach(new Consumer<Point>() {
public void accept(Point p) {
p.translate(1, 1);
}
});
Second, what about the name of the method being overridden—in this case, accept?
There’s no way that the compiler can infer that in general. But in the case of Consumer
there is no need to infer the name, because the interface has only a single method.
This “one method interface” pattern is so useful for defining callbacks that it has an
official status: any object to be used in the abbreviated form that we are developing
must implement an interface like this, exposing a single abstract method (this is called
2 People are often curious about the origin of the name. The idea of lambda expressions comes from a
model of computation developed in the 1930s by the American mathematician Alonzo Church, in which
the Greek letter λ (lambda) represents functional abstraction. But why that particular letter? Church seems
to have liked to tease: asked about his choice, his usual explanation involved accidents of typesetting, but
in later years he had an alternative answer: “Eeny, meeny, miny, moe.”
a functional interface, or sometimes a SAM interface). That gives the compiler a way
to choose the correct method without ambiguity. Again let’s de-emphasize the code
that can be inferred in this way:
pointList.forEach(new Consumer<Point>() {
public void accept(Point p) {
p.translate(1, 1);
}
});
Finally, the instantiated type of Consumer can often be inferred from the context, in
this case from the fact that when the forEach method calls accept, it supplies it with
an element of pointList, previously declared as a List<Point>. That identifies the
type parameter to Consumer as Point, allowing us to omit the explicit type declaration
of the argument to accept.
This is what’s left when we de-emphasize this last component of the forEach call:
pointList.forEach(new Consumer<Point>() {
public void accept(Point p) {
p.translate(1, 1);
}
});
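Removing the de-emphasized code altogether leaves just the lambda expression:

pointList.forEach(p -> p.translate(1, 1));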
If you are unused to reading lambda expressions, you may find it helpful for the
moment to continue to think of them as an abbreviation for a method declaration,
mentally mapping the parameter list of the lambda to that of the imaginary method,
and its body (often preceded by an added return) to the method body. In the next
chapter, we will see that it is going to be necessary to vary the simple syntax of the
preceding example for lambda expressions with multiple parameters and with more
elaborate bodies and in cases where the compiler cannot infer parameter types. But
if you have followed the reasoning that brought us to this point, you should have a
basic understanding of the motivation for the introduction of lambda expressions and
of the form that they have taken.
The code also contains an implicit assumption, difficult to spot, that the minimum value of an empty list is
Double.MIN_VALUE. But the worst aspect of all is the gap between the developer’s in-
tentions and the way that they are expressed in code. To understand this program, you
have to work out what it’s doing, then guess the developer’s intention (or, if you’re very
fortunate, read the comments), and only then check its correctness by matching the
operation of the program to the informal specification you deduced.3 All this work is
slow and error-prone—indeed, the very purpose of a high-level language is supposed
to be to minimize it by supporting code that is as close as possible to the developer’s
mental model. So how can the gap be closed?
Let’s restate the problem specification: for each integer in the list, create a corresponding point; then find the greatest distance of any of these points from the origin.
If we de-emphasize the parts of the preceding code that do not correspond to the
elements of this informal specification, we see what a poor match there is between
code and problem specification. Omitting the first line, in which the list intList is
initially created, we get:
List<Point> pointList = new ArrayList<>();
for (Integer i : intList) {
pointList.add(new Point(i % 3, i / 3));
}
double maxDistance = Double.MIN_VALUE;
for (Point p : pointList) {
maxDistance = Math.max(p.distance(0, 0), maxDistance);
}
This suggests a new, data-oriented way of looking at the program, one that will
look familiar if you are used to Unix pipes and filters: we can follow the progress of
a single value from the source collection, viewing it as being transformed first from
an Integer to a Point and second from a Point to a double. Both of these transfor-
mations can take place in isolation, without any reference to the other values being
processed—exactly the requirement for parallelization. Only with the third step, find-
ing the greatest distance, is it necessary for the values to interact (and even then, there
are techniques for efficiently computing this in parallel).
This data-oriented view can be represented diagrammatically, as in Figure 1-1. In
this figure it is clear that the rectangular boxes represent operations. The connecting
3 The situation is better than it used to be. Some of us are old enough to remember how much of this
kind of work was involved in writing big programs in assembler (really low-level languages, not far removed
from machine code). Programming languages have become much more expressive since then, but there is
still plenty of room for progress.
The circle at the end of the pipeline represents the terminal operation max. Terminal
operations consume a stream, optionally returning a single value, or—if the stream is
empty—nothing, represented by an empty Optional or one of its specializations (see
p. 65):
OptionalDouble maxDistance = distances.max();
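Client code must then allow for the possibility that no value was produced; for
example, a minimal sketch using OptionalDouble’s orElse method:

double result = maxDistance.orElse(0);   // fall back to 0 if the stream was empty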
Pipelines like that in Figure 1-1 have a beginning, a middle, and an end. We have
seen the operations that defined the middle and the end; what about the beginning?
The values flowing into streams can be supplied by a variety of sources—collections,
arrays, or generating functions. In practice, a common use case will be feeding the
contents of a collection into a stream, as here. Java 8 collections expose a new method
stream() for this purpose, so the start of the pipeline can be represented as:
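intList.stream()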
And the complete code with which this section began has become:
OptionalDouble maxDistance =
intList.stream()
.map(i -> new Point(i % 3, i / 3))
.mapToDouble(p -> p.distance(0, 0))
.max();
This style, often called fluent because “the code flows,” is unfamiliar in the context
of collection processing and may seem initially difficult to read. How-
ever, compared to the successive iterations in the code that introduced this section, it
provides a nice balance of conciseness with a close correspondence to the problem
statement: “map each integer in the source intList to a corresponding Point, map
each Point in the resulting list to its distance from the origin, then find the maximum
of the resulting values.” The structure of the code highlights the key operations, rather
than obscuring them as in the original.
As a bonus, the performance overhead of creating and managing intermediate
collections has disappeared as well: executed sequentially, the stream code is more
than twice as fast as the loop version. Executed in parallel, virtually perfect speedup
is achieved on large data sets (for more details of the experiment, see p. 148).
This illustrates what is meant by the slogan for the introduction of parallelism in
Java 8: explicit but unobtrusive. Parallel execution is achieved by breaking the initial
list of Integer values down recursively, as in the pseudocode for solve, until the
sublists are small enough, then executing the entire pipeline serially, and finally com-
bining the results with max. The process for deciding what is “small enough” takes
into account the number of cores available and, sometimes, characteristics of the list.
Figure 1-2 shows the decomposition of a list for processing by four cores: in this case,
“small enough” is just the list size divided by four. (A connected problem is deciding
when a list is “big enough” to make it worthwhile to incur the overhead of executing
in parallel. Chapter 6 will explore this problem in detail.)
[Figure 1-2: the list is split recursively (fork), the sublists are processed serially, and the partial results are combined with max.]
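Making the pipeline of this section run in parallel requires only a change to how
the stream is obtained; a sketch, using the parallelStream method that Java 8
collections also expose:

OptionalDouble maxDistance =
    intList.parallelStream()
           .map(i -> new Point(i % 3, i / 3))
           .mapToDouble(p -> p.distance(0, 0))
           .max();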
Unobtrusive parallelism is an example of one of the key themes of Java 8; the API
changes that it enables give much greater freedom to library developers. One impor-
tant way in which they can use it is to explore the many opportunities for performance
improvement that are provided by modern—and future—machine architectures.
A class can define its own natural order; in this case, it implements the interface Comparable and so
exposes a compareTo method that an object can use to compare itself with another. Or a Comparator can
be created for the purpose, as in this case.
To start, let’s turn the two concrete component functions that we have into lambda
form. We know the type of the functional interface for the key extractor function—
Comparator—but we also need the type of the functional interface corresponding to
the function p -> p.getX(). Looking in the package devoted to the declaration of
functional interfaces, java.util.function, we find the interface Function:
public interface Function<T,R> {
public R apply(T t);
}
So we can now write the lambda expressions for both key extraction and key
comparison:
Function<Point,Double> keyExtractor = p -> p.getX();
Comparator<Double> keyComparer = (d1, d2) -> Double.compare(d1, d2);
Where natural ordering can’t be used, comparators instead use the compare methods exposed by the wrapper classes.
Using that method allows us to write the following (assuming a static import dec-
laration of Comparator.comparing) instead of building the comparator from the two
lambdas by hand:

Comparator<Point> compareByX = comparing(p -> p.getX());

Compared with the anonymous inner class version, this is a big improvement: more
concise and more immediately understandable, because it isolates and lays emphasis
on the important element, the key extractor, in a way that is possible only because
comparing accepts a simple behavior and uses it to build a more complex one.
To see the improvement in action, imagine that our problem changes slightly so
that instead of finding the single point that is furthest from the origin, we decide to
print all the points in ascending order of their distance. It is straightforward to capture
the necessary ordering:
Comparator<Point> byDistance = comparing(p -> p.distance(0, 0));
And to implement the changed problem speci cation, the stream pipeline needs
only a small corresponding change:
intList.stream()
.map(i -> new Point(i % 3, i / 3))
.sorted(comparing(p -> p.distance(0, 0)))
.forEach(p -> System.out.printf("(%f, %f)", p.getX(), p.getY()));
The change needed to accommodate the new problem statement illustrates some
of the advantages that lambdas will bring. Changing the Comparator was straightfor-
ward because it is being created by composition and we needed to specify only the
single component being changed. The use of the new comparator fits smoothly with
the existing stream operations, and the new code is again close to the problem state-
ment, with a clear correspondence between the changed part of the problem and the
changed part of the code.
1.5 Conclusion
It should be clear by now why the introduction of lambda expressions has been so
keenly awaited. In the earlier sections of this chapter we saw the possibilities they will
create for performance improvement, by allowing library developers to enable auto-
matic parallelization. Although this improvement will not be universally available—
one purpose of this book is to help you to understand exactly when your application
will benefit from “going parallel”—it represents a major step in the right direction.

CHAPTER 2
The Basics of Java Lambda Expressions

Chapter 1 described the motivation for lambda expressions and the background to
their introduction to Java. This chapter defines more precisely what lambda
expressions are, and how and where they can be used in Java programs.
But, as you would expect from the similarity to method declarations, lambdas can
in general have any number of parameters. Except for lambdas that have a single pa-
rameter, like those we have seen, parameter lists must be surrounded by parentheses:
(x, y) -> x + y
() -> 23
Also, until now, parameters have been declared without being given explicit types,
because lambdas are often more readable without them. It is always permissible, how-
ever, to supply parameter types—and sometimes it is necessary, when the compiler
cannot infer them from the context. If you supply any types explicitly, you must supply
all of them, and the parameter list must be enclosed in parentheses:
(int x, int y) -> x + y
Such explicitly typed parameters can be modified in the same way as method
parameters—for example, they can be declared final—and annotated.
1 One common question about lambdas concerns whether they are objects as traditionally defined in
Java. This question has no simple answer, because although lambda expressions do currently evaluate to
object references, they don’t behave like objects in all respects—see, for example, §2.2.1.
The lambda body on the right-hand side of the function arrow can be an expres-
sion, as in all the examples seen so far. (Notice that method calls are expressions,
including those that return void.) Lambdas like these are sometimes called “expres-
sion lambdas.” A more general form is the “statement lambda,” in which the body is
a block—that is, a sequence of statements surrounded by braces:
(Thread t) -> { t.start(); }
() -> { System.gc(); return 0; }
An expression lambda has the form

args -> expr

while a statement lambda has the form

args -> { statements; }
The rules for using or omitting the return keyword in a block body are the same as
those for an ordinary method body—that is, return is required whenever an expres-
sion within the body is to return a value, or can instead be used without an argument
to terminate execution of the body immediately. If the lambda returns void, then
return may be omitted or used without an argument.
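For example:

() -> { System.out.println("hi"); }     // void body: return omitted
(int x) -> { return x + 1; }            // value-bearing body: return required
(int x) -> x + 1                        // expression body: no return keyword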
Lambda expressions are neither required nor allowed to use a throws clause to
declare the exceptions they might throw.
Scoping lambdas like ordinary blocks, rather than like inner classes, prevents
errors, such as mistakenly calling Object methods on the inner class instance
when the enclosing instance was intended.
Parameters are like local declarations in that they may introduce new names:
IntUnaryOperator iuo = i -> { int j = 3; return i + j; };
Lambda parameters and lambda body local declarations may shadow field names
(i.e., a field name can be temporarily redeclared as a parameter or local variable
name):
class Foo {
    Object i, j;
    IntUnaryOperator iuo = i -> { int j = 3; return i + j; };
}
2 The effect of this rule is to exclude from the scope of a lambda any declarations in its supertype (i.e.,
its functional interface). Interfaces can declare—besides abstract methods—static final fields, static nested
classes, and default methods (see Chapter 7). None of these are in scope for an implementing lambda.
Since lambda declarations are scoped like simple blocks, the keywords this and
super have the same meaning as in the enclosing environment: that is, they refer re-
spectively to the enclosing object and its superclass object. For example, the following
program prints the message Hello, world! twice to the console:
public class Hello {
    Runnable r1 = () -> { System.out.println(this); };
    Runnable r2 = () -> { System.out.println(toString()); };
    public String toString() { return "Hello, world!"; }
    // a main method that runs r1 and r2 completes the program
}
If the same program were written using anonymous inner classes instead of lamb-
das, it would print the result of calling the toString method of Object on the inner
classes. The more common use case of accessing the current instance of the enclos-
ing object, which is straightforward for lambdas, requires the awkward syntax Out-
erClass.this for anonymous inner classes.
One question often arises in connection with the rule for interpreting this: can a
lambda refer to itself? A lambda can refer to itself if its name is in scope, but the rules
restricting forward references in initializers (for both local and instance variables) pre-
vent a lambda variable’s initializer from referring to the variable itself. It is still possible
to declare a recursively defined lambda by assigning it elsewhere, for example in a
constructor:
public class Factorial {
IntUnaryOperator fact;
public Factorial() {
fact = i -> i == 0 ? 1 : i * fact.applyAsInt(i - 1);
}
}
This idiom is considered adequate for the relatively unusual occasions on which
a recursive lambda definition is required.

Knowing that a lambda can capture names from its enclosing environment, we
still have to know what we can do with it—and what we should do
with it, which may not be the same thing.
First, notice that many useful lambda expressions do not in fact inherit any names
from their environment. Object-oriented programmers can understand this by analogy
with static methods; although in general the behavior of objects depends on their
state, it is often useful to define methods that do not depend on the system state in any
way. The utility class java.lang.Math, for example, contains only static methods—it
makes no sense to take account of the system state in calculating, for example, the
square root of a number. Lambdas can fulfill the same role; a lambda that will produce
the same result as a call of Math.sqrt could be written like this:

DoubleUnaryOperator sqrt = x -> Math.sqrt(x);
Lambdas like these, which interact with their environment only via arguments
and return values, are called stateless, or non-capturing. Capturing lambdas, by con-
trast, can access the state of their enclosing object. “Capture” is the technical term for
the retention by a lambda of a reference to its environment. The connotation is that
the variable has been ensnared by the lambda and held, to be queried or—in other
languages—modified, when the lambda is later evaluated.
The access provided by capture is restricted; the central principle of the restriction
is that captured variables may not have their values changed. So although the tradi-
tional term is “variable capture,” in fact it would be more accurate to call it “value
capture.” To understand how the principle is implemented, let’s first consider local
variable capture; after that, field capture will be a simpler case.
Traditionally, for local classes in general and anonymous inner classes in particu-
lar, the only local variables from the enclosing method that could be accessed in the
inner class were those declared final. The rule for lambdas in Java 8 is very similar,
with only a little relaxation of the syntactic requirements: to ensure that a lambda can
never change the value of a variable that it has captured from its enclosing environ-
ment, the variable must be effectively final, meaning that it is never anywhere assigned
to after its initialization. (As of Java 8, anonymous and local classes can also access
effectively final variables.)
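For example, a minimal sketch:

int base = 10;                                // assigned once: effectively final
IntUnaryOperator addBase = i -> i + base;     // legal capture
// base = 42;  // uncommenting this assignment would make base no longer
//             // effectively final, and the capture above would not compile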
Essentially, effective finality allows omission of the keyword final from the dec-
laration of a variable that is going to be treated as final. The restriction of capture
to effectively final variables has attracted controversy around the contrast with other
languages, for example JavaScript, that do not have this restriction. The justification
for preventing mutation of local variables by lambdas is that it would introduce very
complex changes that would affect both program correctness and performance, and
that it is in any case unnecessary:
• Correctness: Lifting this restriction would allow the introduction of a new class
of multithreading bugs involving local variables. Local variables in Java have
until now been immune to race conditions and visibility problems because
they are accessible only to the thread executing the method in which they
are declared. But a lambda can be passed from the thread that created it to
a different thread, and that immunity would therefore be lost if the lambda,
when evaluated by the second thread, were given the ability to mutate local
variables.
Further, regardless of the number of threads involved, a lambda can outlive
the call to the method that evaluated it. If captured locals were mutable, they
too would need to outlive the method call that created them. This change
would introduce, among other consequences, a new possibility of memory
leaks involving local variables.
• Performance: Programs that allow multithread access to mutable variables can
be guaranteed correct, if access to the variables is guarded by synchronization.
But the cost of this would frustrate one of the principal purposes of introducing
lambdas—to allow strategies that efficiently distribute evaluation of a function
for different arguments to different threads. Even the ability to read the value of
mutable local variables from a different thread would introduce the necessity
for synchronization or the use of volatile in order to avoid reading stale
data.
• Inessentiality: Another way to view this restriction is to consider the use cases
that it discourages. The idioms that are forbidden by this restriction involve
initialization and mutation, like this simple example for summing the values
in a List<Integer>:
int sum = 0;
integerList.forEach(e -> { sum += e; }); // illegal
The Stream API offers better alternatives. In this simple case, we can write
int sum = integerList.stream()
.mapToInt(Integer::intValue)
.sum();
It is no secret that the restriction to effective finality is easily evaded. For example, if
the local variable is an array reference, the variable can be final but the array contents
will still be mutable. You can use this well-known trick to implement an iterative idiom,
but then executing in parallel, even unintentionally, will expose race conditions. You
can prevent these by synchronization, but only at the cost of increased contention and
reduced performance. (Safer alternatives do exist, like AtomicInteger or LongAdder,
but a still better alternative is to avoid shared mutation altogether.) In short, don’t do it!
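For concreteness, the trick being warned against looks like this:

int[] sum = { 0 };                        // final reference, mutable contents
integerList.forEach(e -> sum[0] += e);    // compiles, but racy if run in parallel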
It appears that no restriction such as effective finality applies to the capture of field
names, but this is actually consistent with the treatment of local variables: a reference
to a field name foo is actually a shorthand for this.foo, with the pseudo-variable
this in the role of an effectively immutable local variable. As with a reference to
a field of any other object, the only value captured when the lambda is created is the
object reference—in this case, this. When the lambda is evaluated, this is dereferenced—
in the same way as any local reference variable—and the field accessed.
It may appear that the arguments against allowing mutation of local variables
should apply to fields as well. But lambdas are intended to provide a gentle impetus
away from mutation, where better alternatives are available, not a wholesale conver-
sion of Java into a functional language. The situation prior to Java 8 was that shared
variable mutation was easily achieved—perhaps too easily!—and the responsibility
on developers was to avoid it or, if that was impossible, to manage it. Mutation of
field values by lambdas doesn’t change that situation.
Every functional interface has a function type that a lambda implementing it must
match. In the simplest and most common case, the function type is just the method
type of the interface’s single abstract method: that is, the method’s type parameters,
formal parameter types, return type, and—where applicable—thrown types. (Function
types were previously called “function descriptors”; you may still come across this
term.) §2.7.1 explains function types in more detail.
The function type is what a lambda must match, allowing for some adaptation of
types through boxing or unboxing, widening or narrowing, etc. For example, suppose
we have declared a variable pointList as a List<Point>, and we now want to
replace every element in it. The method replaceAll is suitable for this purpose:
[Diagram] List<E> (instance methods):
    void replaceAll(UnaryOperator<E> operator)
• The icon at the top right contains “i” or “s” to indicate whether the dia-
gram contains instance or static methods.
• Wildcard bounds on generic parameter types are omitted (for example,
the parameter to forEach is actually Consumer<? super T>).
• If you see a type variable in a method declaration but not in the class
declaration, then you can assume it is a type parameter of the method.
• Diagrams are not necessarily complete—they list only the methods im-
portant for the discussion.
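A call along these lines supplies the lambda (a sketch, assuming the java.awt.Point
class with its public x and y fields):

pointList.replaceAll(p -> new Point(p.x + 1, p.y + 1));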
For this to compile, the lambda expression must match the function type of
UnaryOperator<Point>—that is, the type of its method apply, which here takes a
Point and returns a Point.
This case is straightforward, but in general the matching process offers some new
challenges to type checking. Previously, any well-formed Java expression had a de-
fined type; now, although every lambda in a well-typed statement or expression im-
plements a functional interface, exactly which functional interface it implements is
only partly determined by the lambda itself. Enough information must be provided by
the context to allow an exact type to be inferred, a process known as target typing.4
Here is an example of a lambda expression that, in isolation, has many possible types:
x -> x * 2
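For example, both of these declarations, using interfaces from java.util.function,

IntUnaryOperator iuo = x -> x * 2;
DoubleUnaryOperator duo = x -> x * 2;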
are legal, because in each case the type of the lambda expression is compatible with
the target type, namely the functional interface type of the variable being assigned.
The compiler determines that type for the first case by typing the lambda parameter
as int, making the lambda as a whole a function from int to int. That matches the
function type of IntUnaryOperator, which also takes an int and returns an int.
Similarly, in the second case, the lambda parameter can be typed as double, making
the lambda a function from double to double, which matches the function type of
DoubleUnaryOperator.
Two lambdas can be textually identical, like those we have just seen, but have
different types—and different meanings; the operation on int values is different from
the one on doubles. The types are established at compile time and cannot be changed,
so there is no way of reusing the same lambda text for different types (unless they are
compatible by casting in the usual way). The next section shows an example (p. 31).
Table 2-1 shows the four basic types of functional interfaces declared in
java.util.function, with sample use cases and examples of lambda instances
(with the type parameter T instantiated to String and U to Integer).
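In simplified outline (omitting the interfaces’ default and static methods), the four
basic shapes are:

interface Consumer<T>   { void accept(T t); }
interface Supplier<T>   { T get(); }
interface Predicate<T>  { boolean test(T t); }
interface Function<T,R> { R apply(T t); }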
The forty-odd types defined in java.util.function are evolved from these four
by various combinations of three different routes:
4 Actually, target typing is not altogether new, but its greatly increased use in Java 8 for compiling lambda
expressions is a major departure from traditional type checking of Java programs. Until now, the type of an
expression has first been computed and then checked for compatibility with its context; if the context is a
method call, an appropriate overload is chosen as the context. Untyped lambda expressions, by contrast,
have no single type to be checked against the context, so in the case of a method call the overload must be
chosen rst, but still be compatible with the lambda. This challenge to the compiler is explored further in
§§2.7 and 2.8.
• The function types of Consumer, Predicate, and Function all take a sin-
gle argument. There are corresponding interfaces that take two arguments, for
example:
interface BiConsumer<T,U> { void accept(T t, U u); }
interface BiFunction<T,U,R> { R apply(T t,U u); }
interface ToIntBiFunction<T,U> { int applyAsInt(T t, U u); }
• Common use cases for Function require its parameter and result to have the
same type. We saw an example of this in the parameter to List.replaceAll.
These use cases are met by specializing the variations of Function to corre-
sponding Operators, for example:
interface UnaryOperator<T> extends Function<T,T> { ... }
interface BinaryOperator<T> extends BiFunction<T,T,T> { ... }
interface IntBinaryOperator { int applyAsInt(int left, int right); }
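• The third route, already visible in some of the examples above, is specialization
of type parameters to primitive types, which avoids boxing; for example:

interface IntPredicate { boolean test(int value); }
interface DoubleFunction<R> { R apply(double value); }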
This library is a “starter kit,” intended to cover the common use cases for functional
interfaces. If your use case is not covered, it is easy to declare a functional interface of
your own, although it is good practice to make use of those in the library when possi-
ble. It is also good practice to annotate custom functional interface declarations with
@FunctionalInterface so that the compiler can check that your interface declares
exactly one abstract method, and so that the Javadoc for it will automatically have an
explanatory section added.
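For instance, a hypothetical three-argument function type (not part of the platform
library) could be declared as:

@FunctionalInterface
interface TriFunction<T,U,V,R> {
    R apply(T t, U u, V v);
}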
A lambda expression whose context does not supply a functional interface target
type (one assigned to a variable of type Object, say) does not compile, because the
context does not give the compiler the type information that it needs to compile
the lambda.
Six kinds of contexts can provide appropriate target types:
• Method or constructor arguments, for which the target type is the type of the
appropriate parameter. We have already seen straightforward examples of this
throughout Chapter 1.
• Variable declarations and assignments, for which the target type is the type
being assigned to:
Comparator<String> cc =
(String s1, String s2) -> s1.compareToIgnoreCase(s2);
Array initializers are similar except that the target type is the type of the
array component:
IntBinaryOperator[] calculatorOps = new IntBinaryOperator[]{
(x,y) -> x + y, (x,y) -> x - y, (x,y) -> x * y, (x,y) -> x / y
};
• Return statements, for which the target type is the return type of the method:
Runnable returnDatePrinter() {
return () -> System.out.print(new Date());
}
• Lambda expression bodies, for which the target type is the type expected for
the body, which is derived in turn from the outer target type. Consider
Callable<Runnable> c = () -> () -> System.out.println("hi");
The outer target type here is Callable<Runnable>, whose function type returns
Runnable; so the target type of the lambda body is the function type of Runnable,
which is the type of the run method. This takes no arguments and returns no
value, so it matches the inner lambda.
• Ternary conditional expressions, for which the target type for both arms is
provided by the context. For example:
Callable<Integer> c = flag ? (() -> 23) : (() -> 42);
• Cast expressions, which provide the target type explicitly. For example:
Object o = () -> "hi"; // illegal
Object s = (Supplier) () -> "hi";
Object c = (Callable) () -> "hi";
But when a lambda expression is simply a way of calling a named method of an exist-
ing class, a better way of writing it may be just using the existing name. For example,
consider this code, which prints to the console every element of a list:
pointList.forEach(s -> System.out.print(s));
The lambda expression here simply passes its argument on to the call of print. A
lambda like this, whose only purpose is to supply its arguments to a concrete method,
is fully defined by the type of that method. So, provided the type can be determined
by some means, a shorthand form containing only the method name will provide
as much information as the full lambda but in a more readable form. Instead of the
preceding code we can write:
pointList.forEach(System.out::print);
to mean exactly the same thing. This way of writing a handle to a concrete method
of an existing class is called a method reference. There are four types of method ref-
erence, as shown in Table 2-2. The rest of this section explains the purpose of each
type.
References written with the syntax ReferenceType::Identifier, like
Integer::compareUnsigned, are references to static methods. To see how static
method references can be used, suppose we want to sort an array of Integer by
magnitude, treating the values as unsigned. The natural order for Integer is
numerical (i.e., it takes account of the sign of values), so we will need to provide an
explicit Comparator. We can make use of the static method Integer.compareUnsigned.
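Writing the comparator out as a full lambda, as in this sketch,

Arrays.sort(integerArray, (i1, i2) -> Integer.compareUnsigned(i1, i2));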
would be legal, but more verbose and repetitive than its static method reference equiv-
alent:
Arrays.sort(integerArray, Integer::compareUnsigned);
In fact, this method was introduced in Java 8 partly in anticipation of its use in this
way. In the future, one factor in API design will be the desirability of method signatures
being suitable for functional interface conversion.
Notice in Table 2-2 that the syntax ReferenceType::Identifier doesn’t always
represent a reference to a static method. As we are about to see, this syntax can be
used to reference instance methods as well.
Bound references are so called because the receiver is fixed as part of the method
reference. Every invocation of the method reference System.out::print will have
the same receiver: System.out. Often, however, you want to invoke a method ref-
erence with the method receiver as well as its arguments taken from the arguments
to the method reference. For this, you need an unbound method reference, so called
because the receiver is not fixed; rather, the first argument to the method reference
is used as the receiver. Unbound method references are easiest to understand when
there is only one argument; for example, to create a Comparator using the factory
method comparing (p. 16), we could use an unbound reference to replace the lambda
expression
Comparator<Person> personComp = Comparator.comparing(p -> p.getLastName());
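with the unbound method reference form (assuming a Person class exposing a
getLastName accessor):

Comparator<Person> personComp = Comparator.comparing(Person::getLastName);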
An unbound method reference can be recognized from its syntax: as with static
method references, the form ReferenceType::Identifier is used, but in this case
Identifier refers to an instance method rather than a static one. To explore the dif-
ference between bound and unbound method references, consider calling the method
Map.replaceAll supplying each kind of instance method reference:
[Diagram] Map<K,V> (instance methods):
    void replaceAll(BiFunction<K,V,V> function)
The result of each call would replace the corresponding value, so afterwards the
map would contain
{alpha=X-bravo-charlie, bravo=alpha-Y-charlie, charlie=alpha-bravo-Z}
Now let’s restart the example with map at its initial value, and again call
replaceAll, this time with the unbound method reference String::concat, a ref-
erence to an instance method of String that takes a single argument. It seems strange
to be using a one-argument instance method as a BiFunction, but in fact it is the
method reference that is the BiFunction: it is passed two arguments, the key-value
pair, and takes the first argument as the receiver, so the method itself is called like
this:
key.concat(value)
The first argument to the method reference is shifted into the receiver position,
and the second (and subsequent, if any) arguments are shifted left one position. So the
result of the call
map.replaceAll(String::concat)
is
{alpha=alphaX, bravo=bravoY, charlie=charlieZ}
These complications do not have much general impact on the appealingly simple
idea of a single abstract method, but it is useful to bear them in mind when encoun-
tering corner cases like Comparator.
A lambda expression compiles when it is compatible with the function type of
its target—for example, UnaryOperator<Integer>. These are the conditions for
compatibility to hold:5
Arities The lambda and the function type must have the same number of arguments.
Parameter types If the lambda expression is explicitly typed, the types must exactly
match the parameters of the function type; if the lambda is implicitly typed then, for
return type checking (in the next step), its argument types are assumed to be the same
as those of the function type.
Return types
• If the function type returns void, then the lambda body must be a statement
expression (i.e., an expression that can be used as a statement, like a method
call or an assignment), for example:
(int i) -> i++;
• If the function type has a non-void return type, then the lambda body must
return an assignment-compatible value. For example, this lambda body returns
an int value, to be assigned to the Integer result of the UnaryOperator’s
function type:
UnaryOperator<Integer> b = x -> x.intValue();
Thrown types A lambda can throw a checked exception only if that exception, or
one of its supertypes, is declared to be thrown by the function type.
This last condition can cause problems. For example, suppose we want to declare
a method to centralize the handling of IOException for a number of different I/O
operations implemented by parameterless methods of File. A first thought might be
to declare it like this:
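A sketch of such a declaration (hypothetical; the exact signature here is an
assumption, not the book’s original listing):

static void executeFileOp(File file, Function<File, Boolean> fileOp) {
    fileOp.apply(file);   // intended home of the centralized IOException handling
}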
5 This section is a simplified version of §15.27.3 of the Java 8 Edition of The Java Language Specification
(http://docs.oracle.com/javase/specs/jls/se8/html/).
6 The use of “value-bearing” or “stream-bearing” to describe methods that return values or streams will
soon become common enough to use without explanation; not quite yet, however.
but the function type for Function does not declare any exceptions, so a call like

executeFileOp(f, File::delete);

works only for operations, like delete, that declare no checked exceptions; an
operation that throws IOException could not be supplied this way.

To see how type inference handles such arguments, consider a call of the factory
method comparing whose key extractor maps a String to its length. The single
abstract method of Function is

public U apply(T t)

Since the argument type T to the lambda is known to be String, the type returned
by String.length—which is int, boxed to Integer—can be substituted for U. All
the types are now known, and the type bound on U in the declaration of comparing,
which guarantees that the result of the lambda implements Comparable, can be
checked for Integer. Everything is consistent, and the call can be compiled.
The class Exception has, among others, a no-argument constructor and a con-
structor taking a String message. The first of these could match the function type
of Supplier<Exception>, and the second could match the function type of
Function<String,Exception>. So if we declare method overloads
<T> void foo(Supplier<T> factory)
<T,U> void foo(Function<T,U> transformer)
then a call to
foo(Exception::new);
will fail to compile with the error message “reference to foo is ambiguous.” However,
although we can’t make the constructor reference exact as we could by explicitly
typing a lambda, there is a remedy: since foo is generic, we can specify which type
instantiation we want for it by providing a type witness. This syntax requires the re-
ceiver to be stated explicitly:
this.<Exception>foo(Exception::new);
this.<String,Exception>foo(Exception::new);
Of course, this will only work to distinguish generic methods with different num-
bers of type arguments. Suppose instead that we want to call one of two method
overloads:
void bar(IntFunction<String> f)
void bar(DoubleFunction<String> f)

Neither overload is generic, so a type witness cannot help; instead, a cast supplies
the target type explicitly:

bar((IntFunction<String>) String::valueOf);
bar((DoubleFunction<String>) String::valueOf);
2.9 Conclusion
We’ve now seen all of the syntax changes needed to introduce lambdas; the other
major change for Java 8, the introduction of default methods, supports API evolution by
allowing interfaces to define behavior. Default methods will be covered in Chapter 7.
Some of the details in this chapter may have seemed intricate, but overall, the ideas
are quite simple, much simpler in fact than seemed achievable at many points in the
long debate that led to their final form.
Language evolution is always hard because of the number of interactions that each
new feature can have with existing ones; a case in point is the complexity, outlined
in this chapter, that boxing/unboxing and untyped poly expressions have brought to
method overload resolution. Another is the difficulty of writing lambdas that throw
checked exceptions. Of the many trade-offs that had to be considered in the design
and implementation of lambda syntax, the ones involving type inference will proba-
bly be most noticeable, in particular because it sometimes seems less powerful than
you would expect. That is because the overriding motivation in designing the type
inference rules was to make them simple and consistent—the same motivation that
guided the entire language design.
CHAPTER
3
Introduction to Streams
and Pipelines
In Chapter 1 we saw two main themes in the motivations for introducing lambdas; one of them, composability, recalls a classic statement of the Unix design philosophy:
At its heart is the idea that the power of a system comes more from the
relationships among programs than from the programs themselves. Many
UNIX programs do quite trivial things in isolation, but, combined with
other programs, become general and useful tools.
If you are familiar with the realization of that philosophy in the Unix pipeline,
you will see that reflected clearly in the intermediate operations of streams. But, more
broadly, the principle of composability was central throughout the design of the Java 8
changes: in §1.4 we saw how lambda expressions enable the design of finer-grained,
more composable operations, and later we will see its influence in the design of stream
collectors.
In following the trail that takes us through the technical detail of streams, we should
be sure not to lose sight of their fundamental justification: the more expressive formu-
lation of common aggregate operations. The success of the stream mechanism will be
assessed by how far it achieves the goal of clearer expression of common business
logic.
Once all the collection values have been provided by the stream, it is exhausted and cannot be
used any further. But this is not the same as being empty; streams never hold values
at any point. Streams with non-collection sources behave very similarly: for example,
we could generate and print the first ten powers of two by means of the following
code:
IntStream.iterate(1, i -> i*2)
.limit(10)
.forEachOrdered(System.out::println);
Although, as we will see later, the method iterate generates an infinite stream,
the function represented by the lambda is called only as often—in this case, nine
times—as a value is required for downstream processing.
The central idea behind streams is lazy evaluation: no value is ever computed until
it is required. Java programmers already know about laziness:1 iterators, which we use
every day—explicitly or implicitly—have this characteristic. Creation of an iterator
doesn’t cause any value processing to take place; only a call to its next method makes
it actually return values from its collection. Streams are conceptually quite similar to
iterators, but with important improvements:
• They handle exhaustion in a more client-friendly way. Iterators can signal ex-
haustion only by returning false from a hasNext call, so clients must test for
it each time they require an element. This interaction is inherently fault-prone,
because the time gap between the call of hasNext and next is a window
of opportunity for thread interference. Moreover, it forces element processing
into a sequential straitjacket, implemented by a complex and often inefficient
interaction between client and library.
• Unlike iterators, which always yield their values in a deterministic sequence,
streams can be unordered. We’ll explore this in detail in Chapter 6; for the
moment, all you need to know is that opportunities for optimizing a parallel
stream arise when we are unconcerned about the order in which its values are
presented.
• They have methods (the intermediate operations) that accept behavioral
parameters—transformations on streams—and return the stream resulting from
the transformation. This allows streams to be chained together into pipelines,
as we saw in Chapter 1, providing not only a fluent style of programming,
but also the opportunity of big performance gains. We’ll study intermediate
operations in detail later in this chapter.
1 All programmers should know about laziness, of course, since it is the first of the three great virtues of
a programmer: Laziness, Impatience, and Hubris (Larry Wall et al., Programming Perl, O’Reilly, 2012.)
One big advantage of lazy evaluation can be seen in the “search” methods of
Stream: findFirst, findAny, anyMatch, allMatch, and noneMatch. These are
called “short-circuit” operators because they often make it unnecessary to process
all the elements of a stream. For example, anyMatch has to find only a single stream
element that satisfies its predicate (boolean-valued function) for stream processing to
be complete, just as allMatch has only to find one that fails to satisfy its predicate.
The ability to avoid generating and processing unnecessary elements can obviously
save a great deal of work—and, in the case of infinite streams, it is only the combina-
tion of lazy evaluation and short-circuit operators that makes it possible to complete
stream processing at all. True, this advantage of lazy evaluation can also, in principle,
be realized by iterators (in the case of collections processing) or explicit looping (in
the case of generator functions), but code using the Stream API is much easier to read
and (eventually) to write.
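For instance, here is a minimal sketch (not one of the library examples) showing the combination at work:
// anyMatch stops pulling elements as soon as one satisfies the predicate,
// so this terminates even though iterate describes an infinite stream
boolean found = IntStream.iterate(1, i -> i * 2)
    .anyMatch(i -> i > 1_000);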
Lazy evaluation provides another major advantage: it allows multiple logical op-
erations to be fused together into a single pass on the data. Recall the code developed
in Chapter 1 to first illustrate the idea of a pipeline:
OptionalDouble maxDistance =
intList.parallelStream()
.map(i -> new Point(i % 3, i / 3))
.mapToDouble(p -> p.distance(0, 0))
.max();
The fluent style is natural and easy to read, but to understand what is happening
you need to bear in mind the implication of lazy evaluation. It is easier to see this if,
for demonstration purposes, the pipeline is broken into a stream declaration and the
terminal operation call:
DoubleStream ds =
intList.parallelStream()
.map(i -> new Point(i % 3, i / 3))
.mapToDouble(p -> p.distance(0, 0));
// the pipeline has now been set up, but no data has been processed yet
OptionalDouble maxDistance = ds.max();
Separating out the terminal operation call clarifies the situation: it is directly calling
the transformational code of the behavioral parameters, which is all executed in a
single pass. Moreover, because for each element these operations are executed by a
single thread, optimizations that depend on code and data locality can operate.
This is a very different model from bulk collection processing, which is normally
implemented by a series of passes, each one transforming every element of a collection
and storing the results in a new one. In Chapter 1 we saw how a sequence of loops
(e.g., p. 10) can be translated into an equivalent sequence of stream operations (p. 12).
The stream code is both pithier and more efficient (as reported on p. 148), because it
is implemented by fusing the separate loop operations.
This insight also helps to explain the choice of operations defined in the API,
which avoids any that could not be implemented equally well in sequential and par-
allel modes. Some “obvious” operations conceal a deep sequential bias—for example,
the “takeWhile” operation, which on other platforms processes a stream until it en-
counters a sentinel value. While this operation is not actually infeasible in parallel, its
implementation would have been so expensive—both in design and execution—that
it was deprioritized.
This change in thinking can be reframed: programs with iteration contain two dif-
ferent kinds of information: what is to be done, and how it is to be done. Decoupling
these two kinds of content leads to a development model in which we write parallel-
ready code specifying functional behavior alone, and then separately specify its ex-
ecution mode—ideally, delegating the implementation of the execution to a library.
This is a big change: for many of us, the sequential model is so deeply ingrained that
it will take an effort to adjust to the implications of dethroning it. But that effort will
be worthwhile; not only will the resulting programs be future-proof, but our code will
be clearer, more concise, and more maintainable.
involves executing the Integer methods intValue and valueOf, respectively, before
and after every addition. We would like to avoid such overheads in applications that
process large collections of values; one way of doing this is to define streams whose
elements are already primitive rather than reference values. This decision brings ad-
vantages besides improved performance, in the form of useful numeric-only methods
like sum and the ability to create streams containing ranges of numbers. For example,
using the type IntStream, representing a stream of primitive int values, the preced-
ing code can be rewritten:
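// a sketch reconstructed from the description that follows (assumption: the
// computation is the maximum of a range of int values)
OptionalInt max = IntStream.rangeClosed(1, 1_000_000).max();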
The improvements here are both in readability, with specialized range and max
methods, and in efficiency, with no requirement for boxing and unboxing. §6.6 reports
the performance difference between these two code fragments, for various range sizes.
For sufficiently large data sets, the unboxed code is an order of magnitude faster.
The primitive stream types are IntStream, LongStream, and DoubleStream—
chosen as the most commonly used numeric types, the ones for which the cost-benefit
trade-off was most favorable. In case your application requires streams of one of the
other numeric types, these are also supported: float values can be embedded in a
DoubleStream, and char, short, and byte values in an IntStream. The APIs of
the primitive streams are very similar to one another, and all are sufficiently similar
to the reference stream type Stream to be discussed together in this chapter. As a
preliminary, it is worth noting the possible stream type interconversions:
• For boxing, each of the primitive stream types has a method boxed, which
returns a Stream of the appropriate wrapper type, for example:
Stream<Integer> is = IntStream.rangeClosed(1, 10).boxed();
We will explore each of these stages closely enough to build an intuition for the possibilities of stream
programming; subsequent chapters will explore the features that give them their full
power.
Collection<T> i
stream() Stream<T>
parallelStream() Stream<T>
The contract for parallelStream actually states that its return is a “pos-
sibly parallel Stream.” As we will see, it is the collection’s responsibility to
present its data in parallel, and not every collection can achieve that. Although
this will affect performance, it has no impact on functional behavior. We will
explore this issue fully in Chapter 6.
• java.util.stream.Stream<T>: This interface exposes a number of static
factory methods, with default implementations. In this chapter, we shall be
using Stream.empty and two overloads of Stream.of (the primitive stream
types have analogous methods):
3 The conventions for API diagrams are listed in the box on page 27. In brief: the icon at top right indicates
whether the diagram contains static or instance methods; wildcard bounds on generic types are omitted;
undeclared generic types in method signatures can be assumed to be method type parameters; diagrams
show only selected methods of the class or interface.
Stream<T> s
empty() Stream<T>
of(T) Stream<T>
of(T...) Stream<T>
These methods are enough to start our exploration of the features of streams and
pipelines; in Chapter 5 we will investigate other stream-bearing methods of the plat-
form libraries.
Book i
getTitle() String
getAuthors() List<String>
getPageCounts() int[]
getTopic() Topic
getPubDate() Year
getHeight() double
In exploring the different stream operations, it will be useful to have a few concrete
examples to focus on. Here are declarations for three books from my library:
Book nails = new Book("Fundamentals of Chinese Fingernail Image",
Arrays.asList("Li", "Fu", "Li"),
new int[]{256}, // pageCount per volume
Year.of(2014), // publication date
25.2, // height in cms
MEDICINE);
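The declarations for the other two books, referred to in later examples as dragon (a computing book) and voss (a novel), are sketched here from details appearing in later figures; the author lists, heights, and any values marked below are assumptions:
Book dragon = new Book("Compilers: Principles, Techniques and Tools",
                       Arrays.asList("Aho", "Sethi", "Ullman", "Lam"), // assumed
                       new int[]{1009},
                       Year.of(1986),
                       23.0,          // assumed height in cms
                       COMPUTING);
Book voss = new Book("Voss",
                     Arrays.asList("White"), // assumed
                     new int[]{478},
                     Year.of(1957),  // assumed
                     20.0,           // assumed height in cms
                     FICTION);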
In the examples that follow, a real-life persistence mechanism has been replaced
by a placeholder variable library, declared as a List<Book>. Figure 3-1 presents
examples of code using streams to process my library; they are all explained later in
this chapter, but are collected together here to give you an idea of what is possible
using stream processing.
This section will explore the action of the various intermediate operations by look-
ing at examples of how each could be used in operations to process my library. Al-
though it will mainly discuss operations on the reference stream type Stream, the
ideas carry over to primitive stream types also; for intermediate operations, their API
closely resembles that of Stream.
Filtering
The method filter allows selective processing of stream elements:
Stream<T> i
filter(Predicate<T>) Stream<T>
Its output is a stream containing only those elements of the input stream that satisfy
the supplied Predicate (p. 29). For example, we could isolate the computing books
in my collection by constructing this stream:
Stream<Book> computingBooks = library.stream()
.filter(b -> b.getTopic() == COMPUTING);
Figure 3-2 illustrates the action of filter. In part (a) two Book elements are ar-
riving on the input stream: nails, followed by dragon; in part (b) the output stream
contains only dragon, as that is the only one whose topic is COMPUTING.
Mapping
The method map transforms each stream element individually using the supplied
FunctionăT,Rą (p. 29):
Chapter 3: Introduction to Streams and Pipelines 55
Stream<T> i
map(Function<T,R>) Stream<R>
Its output is a stream containing the results of applying the Function to each
element of the input stream. For example, it could be used to create a Stream of
publication dates:
Stream<Year> publicationDates = library.stream()
.map(Book::getPubDate);
Figure 3-3 illustrates the action of map. In part (a) the input elements are the same
two Book instances; in part (b) the resulting stream contains references to the Year
objects obtained by calling getPubDate on each instance.
Stream also provides variants of map that convert to the primitive stream types:
Stream<T> i
mapToInt(ToIntFunction<T>) IntStream
mapToLong(ToLongFunction<T>) LongStream
mapToDouble(ToDoubleFunction<T>) DoubleStream
One-to-Many Mapping
An alternative (though less efficient) way of implementing the last example would be
to convert the stream of Book into a stream of Author, each one representing an au-
thorship. We could then simply apply the terminal operation count to nd the number
of elements in the stream. But map isn’t suitable for this purpose, because it performs
a one-to-one transformation on elements of the input stream whereas this problem
requires a single Book to be transformed to several Author elements in the output
stream. The operation we need will map each Book into a stream of Author—writing,
say, book.getAuthors().stream()—then flatten the resulting series of streams into
a single stream of Author for all books. That is the operation flatMap:
Stream<T> i
flatMap(Function<T, Stream<R>>) Stream<R>
Figure 3-4 shows how it works. In part (a) the same two Book instances are on the
input stream, in (b) each is mapped to a stream of String, and in (c) these individual
streams are fed into the output stream.
Like the methods corresponding to map for conversion to primitive streams,
there are primitive conversion methods: flatMapToInt, flatMapToLong, and
flatMapToDouble. For example, we could get the total page count of all volumes
of all books by creating an individual IntStream for each Book using IntStream.of
and then concatenating them using flatMapToInt:
int totalPageCount = library.stream()
.flatMapToInt(b -> IntStream.of(b.getPageCounts()))
.sum();
The primitive stream types have only flatMap; there are no type-conversion flat-
mapping operations.
Debugging
As we saw in §3.1, calling the terminal operation of a pipeline results in the execution
of a fusion of its intermediate operations. As a result, the usual debugging technique
of stepping through operations is not available for streams. The alternative provided
by the Stream API, the operation peek, differs from other intermediate operations in
58 Mastering Lambdas
that the output stream contains the same elements, and in the same order, as the input
stream. The purpose of peek is to allow processing to be carried out on elements of
streams that are intermediate in a pipeline; for example, we could print the title of
every book that passes a filter before sending them downstream for further processing
(in this case, accumulating them to a List: see §3.2.4):
List<Book> multipleAuthoredHistories = library.stream()
.filter(b -> b.getTopic() == Book.Topic.HISTORY)
.peek(b -> System.out.println(b.getTitle()))
.filter(b -> b.getAuthors().size() > 1)
.collect(toList());
If you are familiar with Unix pipelines, you will recognize the similarity to tee,
though peek is more general, accepting any appropriately typed Consumer as its ar-
gument. This method was provided to support debugging and, because it works by
means of side-effects, should not be used for any other purpose. The issues of side-
effects and interference are explored in §3.2.3.
Sorting and Deduplicating
Stream<T> i
sorted() Stream<T>
sorted(Comparator<T>) Stream<T>
distinct() Stream<T>
The second overload accepts a Comparator; for example, the static method
Comparator.comparing (p. 16) creates a Comparator from a key extractor:
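For example, we can put the books into alphabetical order of title (this sorted stream is reused in the examples below):
Stream<Book> booksSortedByTitle = library.stream()
    .sorted(Comparator.comparing(Book::getTitle));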
The second operation in this group, distinct, removes duplicates from a stream.
Its output is a stream containing only a single occurrence of each of the elements in
its input—all duplicates, according to the equals method, are dropped. For example,
to use the sorted stream of books we just created to produce a list of authors, but with
duplicates removed, we would write
Stream<String> authorsInBookTitleOrder = library.stream()
.sorted(Comparator.comparing(Book::getTitle))
.flatMap(book -> book.getAuthors().stream())
.distinct();
Since the equals method may not take account of all the fields of a stream el-
ement, “duplicates” may in fact be distinguishable by other methods. So the idea of
stability, discussed in connection with sorted, also applies to distinct: if the in-
put stream is ordered, the relative order of elements is maintained. From a number of
equal elements, the first will be chosen; if the input stream is unordered, any element
may be chosen (likely to be a less expensive operation in parallel pipelines).
Truncating
This category groups two operations that constrain the output from streams:
Stream<T> i
skip(long) Stream<T>
limit(long) Stream<T>
The operations skip and limit are duals: skip discards the first n stream ele-
ments, returning a stream with the remainder, whereas limit preserves the first n
elements, returning a stream containing only those elements. For example, we could
get the first 100 books in alphabetical order of title by writing:
Stream<Book> readingList = library.stream()
.sorted(Comparator.comparing(Book::getTitle))
.limit(100);
Of course, there may be fewer than 100 books in the library; in that case, the
stream that limit returns will contain all of them. Analogously, skip could be used
to create a stream containing all but the rst 100 elements:
Stream<Book> remainderList = library.stream()
.sorted(Comparator.comparing(Book::getTitle))
.skip(100);
3.2.3 Non-interference
Among the features that the Stream API offers programmers is the execution of parallel
operations—even on non-threadsafe data structures. This is a valuable benefit, so it
should not be surprising that it does not come entirely free of charge. The cost is
cooperation with the programming model, according to the rules described in this
section. These are not arbitrary restrictions; indeed, they restrict you only if you are
thinking in sequential mode. If you are already designing parallel-ready code, they
will seem natural because your thinking will be free of hidden assumptions about
which thread will execute the behavioral parameters to stream operations, and about
the order in which elements will be processed. You will be aware that, potentially,
every behavioral parameter can be executed in a different thread, and that the only
ordering constraint is imposed by the encounter order (§6.4) of the stream.
The rules are mostly to protect your program from interference between the mul-
tiple threads required for parallel execution. Sometimes the framework itself provides
that guarantee, as we will see that it does for collectors. But it makes no such guar-
antee for the behavioral parameters to stream operations. Here is an example of how
not to create a classification map, mapping each topic to a list of books on that topic:
// behavioral parameter with state - don't do this!
Map<Topic,List<Book>> booksByTopic = new HashMap<>();
library.parallelStream()
.peek(b -> {
Topic topic = b.getTopic();
List<Book> currentBooksForTopic = booksByTopic.get(topic);
if (currentBooksForTopic == null) {
currentBooksForTopic = new ArrayList<>();
}
currentBooksForTopic.add(b);
booksByTopic.put(topic, currentBooksForTopic); // don't do this!
})
.anyMatch(b -> false); // throw the stream elements away
The lambda that forms the behavioral parameter to peek is not threadsafe.
When it is executed in parallel, only bad things can happen: map entries and list elements will be overwritten or lost.
Moreover, if you set up such a pipeline sequentially as a stream named books and then call a library method supplying books as an argument, the library method
may well call parallel on your stream before calling a terminal operation, result-
ing in the same race conditions as before. The method parallel, like sequential,
provides a hint to the implementation about which execution mode to choose; the
execution mode is fixed for the entire pipeline only at the point that terminal opera-
tion execution begins, so control over it does not remain with you as the creator of the
pipeline. The lesson of this example is once again that a stream with a non-threadsafe
behavioral parameter is an accident waiting to happen.
The classification map example of this section might lead you to think that the
main problem with stateful behavioral parameters is thread safety. In fact, the docu-
mentation of java.util.stream gives a general description of a stateful operation as
“one whose result depends on any state that might change during the execution of the
stream pipeline.” For an example of a stateful operation that goes wrong despite being
threadsafe, let’s allow each Book to refer to another and to contain a boolean field
indicating whether it is referred to. For a test, we could set up a List<Book>, each
referring to an adjacent element in the list (treating the list as circular), then execute
this code:
long count = bookList().stream()
.peek(b -> b.refersTo.referred = true)
.filter(b -> b.referred) // stateful!
.count();
Although each element holds its own flag and the code is threadsafe, the result is nondeterministic: whether a given element passes the filter depends on whether the element referring to it has already been processed when the filter runs.
The terminal operations can be divided into three groups, of which the third, side-effecting operations, is covered at the end of this chapter:
• Search operations, which are used to detect a stream element satisfying some
constraint, so may sometimes complete without processing the entire stream.
• Reductions, which return a single value that in some way summarizes the val-
ues of the stream elements. This topic is big enough to require the next chapter
all to itself. For now, we will restrict our view to two aspects: convenience re-
duction methods like count and max, and simple collectors, which terminate
streams by accumulating their elements in a collection.
Search Operations
The Stream methods that can be classified as “search” operations fall into two groups:
the first group comprises matching operations, which test whether any or all stream
elements satisfy a given Predicate:
Stream<T> i
anyMatch(Predicate<T>) boolean
allMatch(Predicate<T>) boolean
noneMatch(Predicate<T>) boolean
The names are indicative: anyMatch returns true on finding an element that sat-
isfies the predicate; allMatch returns false on finding any element that does not
satisfy it, and returns true otherwise; noneMatch is analogous, returning false on
finding any element that does satisfy it, and returning true otherwise.
For example, planning the organization of my bookshelves is complicated by the
fact that these built-in shelves are not all of the same height. If I need to know whether
my history books can be placed on the top shelf, whose headroom is only 19 cm, I
can write:
boolean withinShelfHeight = library.stream()
.filter(b -> b.getTopic() == HISTORY)
.allMatch(b -> b.getHeight() < 19);
But this is not really what I want if I am planning the allocation of topics to shelves.
It would be much better to determine in a single operation all of the topics that would
fit on this low shelf (or, better still, to calculate for each topic what headroom it re-
quires). This aim can be achieved by wrapping this code in a loop, but only at the cost
of repeatedly calling library.stream—an inefficient and ugly solution. In Chapter 4
we will see much better solutions using collectors.
Note that, in accordance with the standard rules of logic, allMatch called on an
empty stream will always return true.
The second group of search operations is made up of the two “find” methods:
findFirst and findAny:
Stream<T> i
findFirst() Optional<T>
findAny() Optional<T>
These return a stream element if any is available, possibly differing in which one
they return. The return type needs a little explanation. At first sight, we might naïvely
expect to write
Book anyBook = library.stream()
.filter(b -> b.getAuthors().contains("Herman Melville"))
.findAny(); // doesn't compile
But what if library contains no elements, so the stream that it sources is empty?
In this situation, there is no answer for findAny to return. The traditional Java solution
of returning null is unsatisfactory: it is ambiguous as to whether a null was actually
matched in the stream or whether it indicates the absence of a value, as in the result
of Map.get. Moreover, null cannot be used with primitive streams. The alternative
created in Java 8 is the class java.util.Optional<T>; an instance of this class is a
wrapper that may or may not contain a non-null value of type T. The “find” methods
allow for the possibility of an empty stream by returning Optional, so the correct
version of the preceding code is
Optional<Book> anyBook = library.stream()
.filter(b -> b.getAuthors().contains("Herman Melville"))
.findAny();
The use case for findFirst is situations like this, where the problem statement
concerns finding the first match in an ordered stream. If any match in an ordered
stream is acceptable, on the other hand, you should prefer findAny; findFirst
would be liable to do unnecessary work in maintaining an order that is not needed.
With an unordered stream, there is no real distinction between the two methods.
The search operations combine with lazy evaluation to save work, as described
at the beginning of this chapter: the matching methods can return after finding a sin-
gle element that satisfies (or not, depending on the particular matching operation)
its Predicate. The “find” methods always return on finding a single element. Lazy
evaluation ensures that when, in these cases, stream processing stops, no (or few)
unnecessary elements have been generated.
Optional<T> i
get() T
ifPresent(Consumer<T>) void
isPresent() boolean
orElse(T) T
orElseGet(Supplier<T>) T
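For example, a minimal sketch using two of these methods on the anyBook result obtained earlier:
anyBook.ifPresent(b -> System.out.println(b.getTitle())); // print the title, if found
Book fallback = anyBook.orElse(nails); // or substitute a default value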
Reduction Methods
The Stream API is designed, to borrow Larry Wall’s well-known slogan, to make the
easy jobs easy without making the hard jobs impossible. We shall see how to do the
hard jobs, using collectors and spliterators, in the next two chapters. But in practice,
most jobs are easy ones, and for these we have convenience reductions—specialized
variants of Stream.reduce (§4.4), designed for simplicity of use. Here the numeric
primitive streams, for example IntStream, provide more features:
IntStream i
sum() int
min() OptionalInt
max() OptionalInt
count() long
average() OptionalDouble
summaryStatistics() IntSummaryStatistics
These methods are self-explanatory except for the last, summaryStatistics. This
creates an instance of the class IntSummaryStatistics, a value object with five
properties: average, count, max, min, and sum. It is useful in situations in which we
want multiple results from a single pass over the data. For example, we could obtain
and print the summary statistics of the page counts of the books in my library (summing
the page count for works spanning multiple volumes) like this:
IntSummaryStatistics pageCountStatistics = library.stream()
.mapToInt(b -> IntStream.of(b.getPageCounts()).sum())
.summaryStatistics();
System.out.println(pageCountStatistics);
The reference stream type Stream provides counterparts to some of these, taking a Comparator where an ordering is required:
Stream<T> i
count() long
min(Comparator<T>) Optional<T>
max(Comparator<T>) Optional<T>
Stream elements are accumulated into a container by the terminal operation collect, which is parameterized by a Collector:
Stream<T> i
collect(Collector<T, A, R>) R
Simple collectors are obtained from static factory methods of the class Collectors:
Collectors s
toSet() Collector<T, ?, Set<T>>
toList() Collector<T, ?, List<T>>
For example, to collect the titles of the books in my library into a Set, I can write:
Set<String> titles = library.stream()
.map(Book::getTitle)
.collect(Collectors.toSet());
The usual idiom is to statically import the Collectors factory methods, making
the last line of this example
.collect(toSet());
Figure 3-5 shows how this collector works for these two books. In part (a) the input
elements are the two Book instances we saw earlier. In (b) they are being placed in
the Set; internally, that operation is implemented using the method Set.add, with the
usual semantics: elements are unordered and duplicates discarded. In (c) the stream
is exhausted and the populated container is being returned from the collector.
The method toList is closely analogous to toSet, with the method List.add
implementing the accumulation; so, if the stream is ordered, the created List has the
same ordering. (If not, stream elements are added in nondeterministic order.) Creating
a collector that accumulates to a Map, however, is a little more complex. Two method
overloads of toMap are provided for this, each accepting a key-extracting function
from T to K and a value-extracting function from T to U. Both of these functions are
applied to each stream element to produce a key-value pair.
Collectors s
toMap(Function<T,K>, Function<T,U>) Collector<T, ?, Map<K,U>>
toMap(Function<T,K>, Function<T,U>, BinaryOperator<U>) Collector<T, ?, Map<K,U>>
For example, I could use a collector from the first overload of toMap to map each
book title in my collection to its publication date:
Map<String,Year> titleToPubDate = library.stream()
.collect(toMap(Book::getTitle, Book::getPubDate));
Figure 3-6 shows how the two functions provided to the toMap collector work to
populate the Map.
Side-Effecting Operations
We first met these operations at the beginning of this book, as the simplest way of
replacing external iteration. They terminate a stream, applying the same Consumer to
each element in turn. They are the main exception to the convention that the Stream
API does not support operations with side-effects:
Stream<T> i
forEach(Consumer<T>) void
forEachOrdered(Consumer<T>) void
We have seen both of these methods in use already. The main difference between
them is obvious from the names: forEach is designed for efficient execution on par-
allel streams, so it does not preserve encounter order. Less obviously, it makes no
guarantee about the synchronization of its operations, which can be executed on dif-
ferent threads. So, for example, suppose I wanted to calculate the total page count of
all the books in my library, and naïvely declared an instance variable pageCount to
accumulate the total. I might then write:
// don't do this - race conditions!
library.stream()
.forEach(b -> {pageCount += b.getPageCount();});
This code is incorrect because the additions to pageCount can occur in different
threads without synchronization, so is subject to a race condition (simultaneous inter-
fering execution). Of course, the principle of writing parallel-ready code should have
warned me against this code even though the stream is sequential. As reinforcement,
the API documentation for forEach warns me that the action will be executed “in
whatever thread the library chooses.”
That code could be “corrected” by using forEachOrdered, which preserves or-
dering and guarantees synchronization, instead of forEach. Then I could rely on the
result from this code:
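// a sketch reconstructed from the surrounding text: forEachOrdered guarantees
// ordering and synchronization of the accumulating action
library.stream()
    .forEachOrdered(b -> {pageCount += b.getPageCount();});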
but although this code works, it is not parallel-ready; the use of forEachOrdered
forces execution into a sequential mode. Code written in the spirit of the Stream API
is both much more readable and more efficient in the general case:
int totalPageCount = library.stream()
.mapToInt(Book::getPageCount)
.sum();
3.3 Conclusion
We now have an overview of the basic operation of streams and pipelines, but we
have yet to see many practical examples of using this style of programming to solve
real processing problems. In the next two chapters we will build on what we have
learned, exploring the API in more detail and investigating strategies for breaking down
complex queries into step-by-step operations on streams.
CHAPTER
4
Ending Streams:
Collection and Reduction
In the previous chapter we saw that terminal operations can be divided into three
groups: search operations, reductions, and operations with side-effects. Although we
surveyed all three groups then, there is much more to say about reductions; this
chapter takes up that story.
Broadly speaking, reductions are operations which return a single value that in
some way summarizes the values of the stream elements. But this description does
not make a distinction—very important to Java programmers—between summarizing
operations that create new objects and those that change their operands. Traditional
ideas about reduction have been developed for languages supporting immutability; al-
though modern Java programming practice encourages immutability in general, most
pipelines will end in mutable collections. So collection, also called mutable reduc-
tion, is the most important kind of terminal operation in the Stream API. Collection
is a generalization of classical reduction, which in Java programming will be useful
mainly with primitive streams. Collection is like reduction in the sense that it summa-
rizes values of stream elements into a single object, but it does so via mutation.
Here is a very simple comparison between traditional Java bulk processing
code and collection. If we allow a real-life data source to be modeled by an
Iterable<Book>, declared here as library, we would usually accumulate values
from it to a List<Book> like this:
List<Book> bookList = new ArrayList<>();
for (Book b : library) {
bookList.add(b);
}
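The collector version is a one-liner; this sketch assumes, as elsewhere in this book, that library can also serve as a stream source:
List<Book> bookList = library.stream()
    .collect(toList());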
The collector version has a number of advantages, beyond the obvious improve-
ment in conciseness and readability: the stream operations can be safely executed in
parallel, even if the List into which the stream elements are accumulated (ArrayList
is used in the current implementation) is not threadsafe. Further, the collector pattern
turns out to be very flexible, and collectors are easily composed: Figure 4-1 provides
a preview of examples developed later in this chapter to give an advance idea of this
flexibility.
In the imperative version, the key components are the creation of the container
(new ArrayList<>()) and the accumulation of elements into it (bookList.add(b)).
Collectors also have components for these two tasks, called, respectively, the supplier
and the accumulator. We can identify these components on the diagrams introduced
A map partitioning books into fiction (mapped to true) and non-fiction (false):
Map<Boolean,List<Book>> fictionOrNon = library.stream()
.collect(partitioningBy(b -> b.getTopic() == FICTION));
A map associating each topic with the book on that topic having the most authors:
Map<Topic,Optional<Book>> mostAuthorsByTopic = library.stream()
.collect(groupingBy(Book::getTopic,
maxBy(comparing(b -> b.getAuthors().size()))));
A map associating each topic with the total number of volumes on that topic:
Map<Topic,Integer> volumeCountByTopic = library.stream()
.collect(groupingBy(Book::getTopic,
summingInt(b -> b.getPageCounts().length)));
A map from each topic to the concatenation of all the book titles on that topic:
Map<Topic,String> concatenatedTitlesByTopic = library.stream()
.collect(groupingBy(Book::getTopic,
mapping(Book::getTitle, joining(";"))));
in Chapter 3. Figure 4-2 shows the relationship between Figure 3-5 (p. 68), the diagram
for the collector created by Collectors.toSet, and these two components.
The supplier defines the container type, always shown shaded in these diagrams
(in Figure 4-2, a framework-chosen implementation of Set<Book>). The accumulator
populates the container from the incoming stream; in Figure 4-2, it is Set::add.
We already met most of the first group in Chapter 3 (p. 67ff)—those for which
the supplier is the constructor for a framework-supplied collection: for toSet it is an
implementation of Set, for toList of List, and for the two overloads of toMap it is
Chapter 4: Ending Streams: Collection and Reduction 77
an implementation of Map. The accumulators for these collectors are also obvious: for
the Collection implementations it is add, and for the toMap collectors, put.
The remaining method in this group is joining, which returns a collector that
concatenates a stream of String objects into a StringBuilder whose contents are
then returned as a String (this is further explained in §4.3.1):
Collectors s
joining() Collector<CharSequence,?,String>
joining(CharSequence) Collector<CharSequence,?,String>
joining(CharSequence, CharSequence, CharSequence) Collector<CharSequence,?,String>
The first overload of joining simply concatenates the strings in its input stream.
The second overload accepts a CharSequence, inserting it as a delimiter between its
input strings. For example, here is code to concatenate the titles of all books in my
library, separating the titles by a double colon:
String concatenatedTitles = library.stream()
.map(Book::getTitle)
.collect(joining("::"));
The third overload accepts a delimiter, a prefix, and a suffix: for example, for a
book b, the following code will produce a string concatenating the book’s authors,
separated by commas, beginning with the book’s title and ending with a newline:
b.getAuthors().stream().collect(joining(
    ",",
    b.getTitle() + ": ",
    "\n"))
We could use this code to create a list of strings, each containing all the authors’
names for a single book:
List<String> authorsForBooks = library.stream()
.map(b -> b.getAuthors().stream()
.collect(joining(", ", b.getTitle() + ": ", "")))
.collect(toList());
The collectors we have seen so far all accumulate to a collection chosen by the
framework. Many Collectors methods have variants that let you specify a supplier
for the container:
Collectors s
<M extends Map<K,U>> toMap(Function<T,K> keyExtracter,
Function<T,U> valueExtracter,
BinaryOperator<U> mergeFunction,
Supplier<M> mapFactory
) Collector<T,?,M>
For example, on page 69 we saw toMap used with a merge function to construct
a map from book titles to the date of the latest edition. If the contents of the map will
subsequently be required in alphabetic order of titles, placing them into a sorted map
might improve performance:
Map<String,Year> titleToPubDate = library.stream()
.collect(toMap(Book::getTitle,
Book::getPubDate,
(x, y) -> x.isAfter(y) ? x : y, Ê
TreeMap::new));
Since the merge function is actually choosing the greater of two values of
java.time.Year using its natural order, line Ê can be replaced by
BinaryOperator.maxBy(Comparator.naturalOrder());
The three overloads of toMap that we have now seen are increasingly general: the
first one we met would accept only key- and value-extracting functions. The second
accepts in addition a merge function, and this third one also accepts a supplier. This
means that if you want to specify a supplier, you must also provide a merge function.
But if you want to, you can re-create the behavior of the simple overload of toMap in
the presence of duplicate keys by supplying
(x,y) -> { throw new IllegalStateException(); }
A third group of Collectors methods returns collectors with the function of clas-
sifying stream elements. They are related to the collectors returned by toMap, with
the difference that instead of using a value-extracting function, the values they place
in the map are the elements themselves—or rather, lists of the elements, one List
corresponding to each classification key:
Collectors s
groupingBy(Function<T,K>) Collector<T,?,Map<K,List<T>>>
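A sketch of the example that Figure 4-3 illustrates, rebuilding the classification map of §3.2.3 in collector form:
Map<Topic,List<Book>> booksByTopic = library.stream()
    .collect(groupingBy(Book::getTopic));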
Figure 4-3 shows how this example works: for each incoming element, the classi-
fier function is called to determine the key. If the key is not present, it is added, with
the value being a singleton List containing only the current element. If the key and a
List value are already present, the stream element is added to the list. The supplier for
this collector is the no-arguments constructor for a framework-chosen implementation
of Map<Topic,List<Book>>, and the accumulator has the function of adding each
incoming element as just described. Both of these defaults can be overridden, as we
shall see in the next section.
A variant of groupingBy is the convenience method partitioningBy, in which
the key type K is specialized to Boolean:
Collectors s
partitioningBy(Predicate<T>) Collector<T,?,Map<Boolean,List<T>>>
For example, this code would map true to a list of my fiction books and false
to a list of the non-fiction ones:
Map<Boolean,List<Book>> fictionOrNonFiction = library.stream()
.collect(partitioningBy(b -> b.getTopic() == FICTION ||
b.getTopic() == SCIENCE_FICTION));
Rather than provide specialized classifying collectors for these various use cases,
the Collector API provides an extension point by allowing collectors to be composed
together.
Composition allows new collectors to be created by combining the effects of
two or more collectors or other operations. The most important form it takes is to
allow groupingBy to be combined with a second, “downstream,” collector. In this
composition, groupingBy provides the classifier function and the classification keys,
and the elements associated with a given key are forwarded to the downstream
collector for further summarization. Figure 4-5 shows how this works for the simple
groupingBy overload that we already saw, in which the default downstream collector
is the one returned by Collectors.toList.
Collectors s
groupingBy(Function<T,K>, Collector<T,A,D>) Collector<T,?,Map<K,D>>
Returning to the example of representing a distribution, we can now see that the
problem can be solved by composing groupingBy with a different downstream col-
lector, one that counts the incoming elements. And, in fact, there is a Collectors
factory method for exactly this purpose: Collectors.counting. (Its implementation
is explained in §4.4.3). Figure 4-6 shows how it will work in this example.
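In code, a sketch of the resulting distribution, mapping each topic to the number of books on it:
Map<Topic,Long> bookCountByTopic = library.stream()
    .collect(groupingBy(Book::getTopic, counting()));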
• Corresponding to the terminal operations max and min are the Collectors
factory methods maxBy and minBy. For example, we could create a mapping
that would contain, for each topic, the book with the greatest number of au-
thors:
Map<Topic,Optional<Book>> mostAuthorsByTopic = library.stream()
.collect(groupingBy(Book::getTopic,
maxBy(comparing(b -> b.getAuthors().size()))));
or one that would contain the average height of the books, by topic:
Map<Topic,Double> averageHeightByTopic = library.stream()
.collect(groupingBy(Book::getTopic,
averagingDouble(Book::getHeight)));
Another composing collector is returned by the factory method Collectors.mapping: it applies a function to each incoming element, forwarding the result to a downstream collector:
Collectors s
mapping(Function<T,U>, Collector<U,A,R>) Collector<T,?,R>
Figure 4-7 shows how collectors created by this method work. With the help of
Collectors.mapping, it is straightforward to create a mapping from each topic to the
concatenation of all the book titles on that topic:
Map<Topic,String> concatenatedTitlesByTopic = library.stream()
.collect(groupingBy(Book::getTopic,
mapping(Book::getTitle, joining(";"))));
We can now work with the components of individual Map.Entry objects us-
ing the methods Map.Entry.getKey and Map.Entry.getValue. Java 8 added the
comparator-yielding methods comparingByKey and comparingByValue for exactly
this kind of situation, so for the problem of finding the most popular topic, we can
write:
Optional<Topic> mostPopularTopic = entries
.max(Map.Entry.comparingByValue())
.map(Map.Entry::getKey);
We can imitate the fluent style of pipeline processing by chaining the two pipelines
together:
Optional<Topic> mostPopularTopic = library.stream()
.collect(groupingBy(Book::getTopic, counting()))
.entrySet().stream()
.max(Map.Entry.comparingByValue())
.map(Map.Entry::getKey);
At this early stage in the adoption of Java 8, it’s not possible to predict in detail
how idioms will evolve; this one has both advantages and disadvantages.
One way of thinking about problems like this is to work backward from the goal.
The last important step of the new program will also use max, as before, but the result
will now include all the most popular topics instead of only one of them. That suggests
they will need to be in a collection and—if this program is similar to the preceding
one, so that we are seeking a maximum among map entries—leads us to look for a way
of creating a Map<Long,Set<Topic>> (either Set or List would be a reasonable
choice), in which the keys are the popularity of each topic and the value is a collection
of the topics with that popularity. Following this line of reasoning, we are aiming for
a mapping like this—call it targetMap:
{98=[COMPUTING, FICTION], 33=[HISTORY]}
And our starting point in getting to it can be the map produced by the rst two
lines of the previous code:
{COMPUTING=98, FICTION=98, HISTORY=33}
Call this startMap. How can we get from startMap to targetMap? If you have
not found a solution, stop now and try to devise one. Hint: the easiest way to get
between them is by using groupingBy.
If the goal of a groupingBy operation is targetMap, then the keys that the clas-
sifier function is extracting must be the popularities, and since they are the values in
startMap, the classifier function must be Map.Entry.getValue.
The values in targetMap come from the keys of the startMap entries, so the
action of the collector downstream of groupingBy must be, first, to extract the keys
from the incoming entries, and second, to accumulate them into a Set. This suggests
composing groupingBy with a mapping collector to extract the keys, that collector
itself being composed with a toSet collector to accumulate them:
startMap.entrySet().stream()
.collect(groupingBy(Map.Entry::getValue,
mapping(Map.Entry::getKey, toSet())));
Putting everything together, the code to find the most popular topics is:
Optional<Set<Topic>> mostPopularTopics = library.stream()
.collect(groupingBy(Book::getTopic, counting()))
.entrySet().stream()
.collect(groupingBy(Map.Entry::getValue,
mapping(Map.Entry::getKey, toSet())))
.entrySet().stream()
.max(Map.Entry.comparingByKey())
.map(Map.Entry::getValue);
This is by no means the only solution to the problem. You may have considered
one of a variety of alternatives:
• Insertion of Book elements into a sorted Map using Topic as the sort key, and
retrieving an initial subset of the elements.
If you can take the time to explore some of these alternatives, you will be re-
warded with an appreciation of the variety of possible strategies available in this style
of processing.
Figure 4-8 (schematic): two substreams are accumulated in parallel; the supplier ArrayList::new creates empty containers, the accumulator List::add places elements such as nails, voss, and dragon into them, and the combiner List::addAll merges the partial lists.
If we call the type of the stream element T (here Book) and the type of the result
container R (here List<Book>), these three functions are implementations of three
interfaces introduced in §2.4: the supplier is a Supplier<R>, the accumulator a
BiConsumer<R,T>, and the combiner a BinaryOperator<R>.
These are the three essential components of a collector. They must work together
to give consistent results—for example, it is easy to see that Figure 4-8 is only one of a
number of possible executions of the code, all of which must produce the same value.
In §4.3.3 we will study the rules that ensure that our custom collectors guarantee this,
as the predefined collectors do.
A physical example would be a route for a power transmission
line, where the tower spacing must be no more than a constant, say MAX_DISTANCE.
A potential route consisting of a series of tower sites could be analyzed by grouping
the sites in segments, in each of which the towers are separated by no more than
MAX_DISTANCE, whereas the distance between two towers in different segments is
always greater than MAX_DISTANCE. For a very simple example (Figure 4-9), a route
planned exactly along the x-axis might contain these points:
(3,0), (6,0), (8,0), (10,0), (14,0)
Figure 4-9 (schematic): the five points plotted along an x-axis from 0 to 15, with the gaps between neighboring points labeled “Too far” or “Close enough” according to MAX_DISTANCE.
The obvious representation for the solution to this problem is a nested linear data
structure. Since adding a new element to it will require access to the last existing
element, it will be convenient to use Deque rather than List as its basis; conveniently,
Deque has a modern and efficient implementation in ArrayDeque.
Here is a simple iterative solution:
Deque<Deque<Point>> groupByProximity(List<Point> sortedPointList) {
Deque<Deque<Point>> points = new ArrayDeque<>();
points.add(new ArrayDeque<>());
for (Point p : sortedPointList) {
Deque<Point> lastSegment = points.getLast();
if (! lastSegment.isEmpty() &&
lastSegment.getLast().distance(p) > MAX_DISTANCE ) {
Deque<Point> newSegment = new ArrayDeque<>();
newSegment.add(p);
points.add(newSegment);
} else {
lastSegment.add(p);
}
}
return points;
}
This is a good starting point for the collector implementation—in fact, it can serve
with little modification as the basis for the accumulator. The combiner is more of a
challenge: its function is to merge two Deque instances, each representing a solution
over part of the input. There are two possibilities for the merge, depending on the
distance between the last point in the left part and the first point in the right part. If
these are sufficiently close, the last left segment and the first right segment must be
merged; otherwise the right part can be simply appended to the left.
Here are definitions of the three components—supplier, accumulator, and
combiner—of the collector:
• A supplier can often simply be the constructor for the container; it is tempt-
ing to write ArrayDeque::new here. But the correct initial value is an empty
container, ready for use by the accumulator or combiner. In this case that is a
single empty segment:
Supplier<Deque<Deque<Point>>> supplier =
() -> {
Deque<Deque<Point>> ddp = new ArrayDeque<>();
ddp.add(new ArrayDeque<>());
return ddp;
};
• The accumulator has the same task as in the sequential code, that of adding a
single Point to a partial solution:
BiConsumer<Deque<Deque<Point>>,Point> accumulator =
(ddp, p) -> {
Deque<Point> last = ddp.getLast();
if (! last.isEmpty()
&& last.getLast().distance(p) > MAX_DISTANCE ) {
Deque<Point> dp = new ArrayDeque<>();
dp.add(p);
ddp.add(dp);
} else {
last.add(p);
}
};
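• The combiner merges two partial solutions as described earlier: if the last
segment of the left part and the first segment of the right part are close enough,
they are joined; otherwise the right part is simply appended. This listing is a
sketch consistent with that description (assuming only the supplier ever
produces an empty segment):
BinaryOperator<Deque<Deque<Point>>> combiner =
    (left, right) -> {
        Deque<Point> leftLast = left.getLast();
        Deque<Point> rightFirst = right.getFirst();
        if (leftLast.isEmpty()) return right;
        if (rightFirst.isEmpty()) return left;
        if (leftLast.getLast().distance(rightFirst.getFirst()) <= MAX_DISTANCE) {
            leftLast.addAll(rightFirst);  // merge the boundary segments
            right.removeFirst();
        }
        left.addAll(right);               // append the remaining right segments
        return left;
    };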
Figure 4-10 shows the supplier, the accumulator, and the combiner working to-
gether on one possible execution of the sample input from p. 91.
These three functions can be assembled into a Collector using the factory
method Collector.of. The Characteristics parameter to Collector.of is a way
of supplying performance-related metadata about the collector (§6.8):
Collector<T,A,R> s
of(Supplier<A>, BiConsumer<A,T>, BinaryOperator<A>, Characteristics…) Collector<T,A,R>
Now all that is needed to realize Figure 4-10 is to call Collector.of and supply
the resulting collector to Stream.collect:
Deque<Deque<Point>> groupedPoints = sortedPointList.stream()
.collect(Collector.of(supplier, accumulator, combiner));
Performance This collector parallelizes well. Its performance characteristics are ex-
amined in more detail in §6.8.2, but the discussion can be summarized as follows:
executed sequentially, its performance is very close to that of the iterative version
shown earlier. Terminating a stream with no intermediate operation costs, the parallel
speedup is a factor of about 1.8 on a 4-core machine; as the intermediate operations
become more expensive, the relative parallel speedup improves.
Figure 4-10 (schematic): the input points 3, 6, 8, 10, and 14 are split between two threads; each thread's supplier creates an initial container, the accumulators build partial segment groupings, and the combiner merges the two partial solutions into the final grouping.
4.3.1 Finishers
The discussion of collector composition mentioned in passing (p. 84) the method col-
lectingAndThen, which accepts a function to be applied to the container after col-
lection is complete:
Collectors s
collectingAndThen(Collector<T,A,R>, Function<R,RR>) Collector<T,A,RR>
Some collectors always require a finisher for their operation. An example is pro-
vided by joining, implemented by a collector although its result type is String,
which—as an immutable type—is unsuitable for collection. (Later we will see a re-
duction form that could be used instead, but that would perform very poorly in the
course of concatenating strings for every accumulation or combination operation.) In
fact, the container for joining is the mutable type StringBuilder; conversion to
String only takes place in the finishing function, after the reduction is complete. As
this finisher is always needed, it should be built into the collector; there is an overload
of Collector.of for that purpose:
Collector<T,A,R> s
of(Supplier<A> supplier,
BiConsumer<A,T> accumulator,
BinaryOperator<A> combiner,
Function<A,R> finisher,
Characteristics… characteristics
) Collector<T,A,R>
This finally makes sense of the second type parameter of Collector: it is the type
of the intermediate container into which accumulation takes place. When we use a
collector, we don’t have to concern ourselves with this type, which relates only to the
collector’s internal mechanism; and often in writing one, we may just use the output
type, as in the example of the previous section. In the next example (§4.3.2), we have
a choice between using a finishing function to convert the output of the combiner to
the desired type or of applying a map operation subsequently.
To visualize a built-in finisher, consider this code to create a stream of strings, each
containing all the authors’ names for a single book:
Stream<String> concatenatedAuthors = library.stream()
.map(b -> b.getAuthors().stream().collect(joining()));
Figure 4-11 shows successive stages in its execution. In part (a) the characters in
the input strings are accumulated in the internal StringBuilder object that is being
built up; in part (b) the input stream is exhausted, the finisher function is applied to
the intermediate object, and the resulting String is returned.
Figure 4-11 (schematic): in (a) the accumulator appends the characters of "Li", "Fu", and "Li" to an internal StringBuilder; in (b) the stream is exhausted and the finisher, StringBuilder::toString, produces the result "LiFuLi".
If the finisher for joining were not part of the collector but had to be applied
separately, after the collection by groupingBy had taken place, it would be necessary
to iterate over the resulting map, applying the finisher to each value.
The partial result for a single book is its title-displacement pair, and the partial
input associated with that is the page count for the book. That leads to the definition
of a helper class:
class DispRecord {
final String title;
final int disp, length;
DispRecord(String t, int d, int l) {
this.title = t; this.disp = d; this.length = l;
}
int totalDisp() { return disp + length; }
}
This example will illustrate why there is so much enthusiasm for the (very long-term) project of introducing tuples to the Java language. Value class declarations are verbose and potentially inefficient compared to the implementation of value objects as tuples in other languages. That said, we can at least make the best of a bad job: the ability to define convenience methods like totalDisp is some compensation for having to use a value class.
The problem specifies an unordered map from title to displacement, but the calculation of each DispRecord relies on its predecessor, so an ordered container will be required during the collection process. We'll use Deque for this purpose, again in order to take advantage of its convenient provision of access to its last element.
We are now ready to write the components of the collector:
• The supplier is easy to define: in this case, all that is needed is to create an empty container, via ArrayDeque::new:
Supplier<Deque<DispRecord>> supplier = ArrayDeque::new;
• The accumulator is only a little more difficult: its task is to add a DispRecord to the end of an existing Deque. Stop again and outline the code for this.
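One possible accumulator (a sketch; it derives each record's displacement from the deque's current last element, using the Book accessors seen elsewhere in this example):
BiConsumer<Deque<DispRecord>, Book> accumulator = (deque, book) -> {
    int disp = deque.isEmpty() ? 0 : deque.getLast().totalDisp();
    deque.addLast(new DispRecord(
        book.getTitle(), disp, IntStream.of(book.getPageCounts()).sum()));
};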
• Now for the combiner. This has the task of merging two Deque<DispRecord> instances. If you have not already worked it out, stop now to write or outline the code.
The combiner must increase the displacement field of each of the elements in the second one by the total page count of the elements in the first one. This can be calculated from the last of these, again by adding its displacement to its page count. Then the two collections can be merged:
BinaryOperator<Deque<DispRecord>> combiner =
    (left, right) -> {
        if (left.isEmpty()) return right;
        int newDisp = left.getLast().totalDisp();
        List<DispRecord> displacedRecords = right.stream()
            .map(dr -> new DispRecord(
                dr.title, dr.disp + newDisp, dr.length))
            .collect(toList());
        left.addAll(displacedRecords);
        return left;
    };
Now the main objective of the problem is achieved—the book displacements are calculated—but the results are not exactly in the form required by the problem specification, which demanded a mapping from book title to displacement. One way of implementing this requirement is to stream the output from the collector we have just defined into a further collector created by Collectors.toMap. The alternative is to add a finisher to the existing collector. Stop for the last time on this problem to think how a suitable finisher could be defined, and what other change would be required.
In this case, the work of the finisher is just to create a Map from its input. By the time the finisher is invoked, however, the fork/join threads will have completed work on the combiner, so they should be available for a concurrent map merge:
Function<Deque<DispRecord>,Map<String,Integer>> finisher =
    ddr -> ddr.parallelStream().collect(
        toConcurrentMap(dr -> dr.title, dr -> dr.disp));
Figure 4-12 shows the four functions we have defined working together on our three sample books from Chapter 3.
All that is needed to realize Figure 4-12 is to call Collector.of and supply the
resulting collector to Stream.collect:
Map<String,Integer> displacementMap = library.stream()
.collect(Collector.of(supplier, accumulator, combiner, finisher));
[Figure 4-12: the supplier, accumulator, combiner, and finisher cooperating to compute displacement records for the three sample books.]
Performance The different factors that contribute to the performance of this program make it an interesting example; §6.8.3 analyzes them in some detail. To briefly summarize that discussion: as it stands, the program is slower than the iterative version,
for a number of reasons: it is an example of a prefix sum, in which the value of every element depends on the values of the preceding ones. In a naïve parallel algorithm for prefix sum, like this one, the total cost of the combine operations is proportional to the size of the input data set, regardless of the level of parallelism in use.² The combine operation used here is quite expensive (unnecessarily so, in fact). A second problem is caused by the expense of the map merge performed by the finisher; this can be mitigated by presizing the map. A third problem is the low per-element workload of the program—in real-life situations, preprocessing before collection will usually be required. Whether these problems make this program unsuitable for parallelization is discussed in §6.8.3.

² Java 8 provided java.util.Arrays with various overloads of a new method parallelPrefix for computing prefix sums efficiently, but this innovation has not yet reached the Stream API.
The functions we supply can rely on some guarantees about the way that the collect implementation will use them:
• New values will appear only as the second argument to the accumulator; all other values will be results previously returned from supplier, accumulator, or combiner.
• Results from the supplier, accumulator, and combiner may be returned to the caller of collect; otherwise, they will only ever be used as arguments to the accumulator, combiner, or finisher.
• Values passed to the combiner or finisher and not returned are never used again; their contents have been processed and should not be reused.
Finally, the result of the supplier must act as an identity for the combiner: for any partial result s,
s == combiner.apply(s, supplier.get())
s == combiner.apply(supplier.get(), s)
4.4 Reduction
At the start of this chapter, a comparison between collection and its special case,
reduction, came out in favor of collection as more generally useful in Java programs.
However, reduction is useful in some circumstances; in this section, we will explore
the uses of reduction as implemented by the Stream API.
This figure has two significant differences from the collector diagram:
• A base value—the identity for the reduction—is used in place of empty container instances created by a supplier function in the earlier figure.
• The accumulator and combiner are the same, because only one type is involved.
[Figure 4-13: reduction of the values 1, 2, 3 with identity 0 and the + operator, producing 6.]
IntStream i
reduce(IntBinaryOperator) OptionalInt
reduce(int, IntBinaryOperator) int
If the convenience method sum were not available, we could write code to imple-
ment Figure 4-13 using the second overload of reduce:
int sum = IntStream.of(1,2,3)
.reduce(0, (a, b) -> a + b);
Of the convenience methods that we saw in §3.2.4, sum, count, and average are derived in this way from reduce. New functions can be similarly defined: for example, the following code computes the factorial of a variable intArg:
int intArgFactorial = IntStream.rangeClosed(1, intArg)
.reduce(1, (a, b) -> a * b);
Given an empty stream, this variant of reduce returns the supplied identity. By contrast, the first overload of reduce does not accept an identity and so, given an empty stream, must return an empty OptionalInt. This is the variant used to define the convenience methods max and min. If they were not part of the API, we could use the one-argument version of IntStream.reduce to obtain the same effect. For example:
OptionalInt min = IntStream.of(1,2,3)
.reduce((a, b) -> Math.min(a,b));
[Figure 4-14: reduction of the values 1, 2, 3 with Math::min, producing 1.]
Stream<T> i
reduce(BinaryOperator<T>) Optional<T>
reduce(T, BinaryOperator<T>) T
reduce(U, BiFunction<U,T,U>, BinaryOperator<U>) U
We now consider each of these three overloads in turn. The first accepts only a combiner without an identity and so, analogously to the single-argument primitive reduce, returns an Optional. It is useful in several different situations: one is where a binary operator, like BinaryOperator.minBy, returns one of its operands. For example, we could find the first book title in alphabetic order by writing
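// a sketch of such code; Book::getTitle is the accessor used elsewhere
Optional<String> first = library.stream()
    .map(Book::getTitle)
    .reduce((s, t) -> s.compareTo(t) <= 0 ? s : t);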
In fact, this is how the convenience reduction method Stream.min (p. 66) is imple-
mented. Execution of this code can be visualized in the same way as the corresponding
reduction over primitives (Figure 4-14). A binary operator could also return a newly
created object, for example the result of calling one of the binary arithmetic operators
on BigInteger or BigDecimal (notice that we choose these types as examples for
reduction because they are immutable):
Stream<BigInteger> biStream = LongStream.of(1,2,3)
.mapToObj(BigInteger::valueOf);
Optional<BigInteger> bigIntegerSum = biStream
.reduce(BigInteger::add);
The combiner in all three overloads of reduce must respect the same associativity
constraint as for the combiner of a collector and for the same reason: that different
executions splitting a computation in different places must nevertheless return the
same results. So, for all values of q, r, and s:
combiner.apply(combiner.apply(q, r), s) ==
    combiner.apply(q, combiner.apply(r, s))
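The second of the three overloads adds an identity value; used with an immutable identity, the BigInteger sum above no longer needs an Optional (a sketch following the primitive pattern):
BigInteger bigIntegerSum = LongStream.of(1, 2, 3)
    .mapToObj(BigInteger::valueOf)
    .reduce(BigInteger.ZERO, BigInteger::add);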
Notice again the similarity of this to reduction over primitives; it is only appro-
priate with an immutable identity like BigInteger.ZERO. Execution of this code can
again be visualized in the same way as the corresponding reduction over primitives
(Figure 4-13).
Again, as with collectors, the combiner must respect the identity constraint: given
any s and the identity id
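s == combiner.apply(id, s)
s == combiner.apply(s, id)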
The third overload introduces an accumulator, and with it the possibility of return-
ing a different type, for example an aggregate of some kind. Here it is used to calculate
the total number of volumes in my library (as illustrated in Figure 4-15):
int totalVolumes = library.stream()
.reduce(0,
(sum, book) -> sum + book.getPageCounts().length,
Integer::sum);
[Figure 4-15: the accumulator (s,b) -> s + b.getPageCounts().length and the combiner Integer::sum computing a total of 3 volumes.]
Notice that you can (and normally would) write this code as separate map and
reduce operations:
int totalVolumes = library.stream()
.mapToInt(b -> b.getPageCounts().length)
.sum();
However, this overload of reduce is provided for those situations in which significant optimization can be achieved by combining mapping and reducing into a single function.
Once again, as with collectors, the accumulator and combiner functions must be compatible, to ensure that all possible executions of a program will give the same result, however the computation is divided between them. The situation is simpler here, however (things are always simpler without mutation!): for all values of r, s, and t, the following equality must hold:
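combiner.apply(r, accumulator.apply(s, t)) ==
    accumulator.apply(combiner.apply(r, s), t)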
To conclude this section, we should reconsider the claim with which the chapter began, that collection was likely to prove more useful than reduction in Java programming with streams. The fact that the overloads with identity cannot accumulate into mutable types definitely makes them less valuable than collection. But the single-argument overload of reduce can serve the same purpose as collection, if it is combined with a preliminary mapping stage. For the purpose of comparison, this section concludes with a reduction solution to the cumulative page count problem already solved by collection (§4.3). Figure 4-16 shows a visualization of the program; compare it with Figure 4-12 to see the difference in the two approaches.
[Figure 4-16: in the map phase each book is wrapped in a singleton container of DispRecords; in the reduce phase the combiner merges the containers pairwise, accumulating displacements across all three books.]
Figure 4-16 is divided into two parts, labeled “map” and “reduce”. The map section does the work of the supplier and accumulator components of the collection version; first it creates a DispRecord for each Book using a new constructor:
DispRecord(Book b) {
this(b.getTitle(), 0, IntStream.of(b.getPageCounts()).sum());
}
The reduce section is the same as the combiner in the collection version (p. 99).
Finally, the client code has more work to do: in this version, it must compose the
map and reduce stages, as well as explicitly handle the case of an empty stream, rather
than delegating everything to the collector as before:
Map<String, Integer> displacementMap = library.stream()
    .map(DispRecord::new)
    .map(this::wrap)
    .reduce(combiner).orElseGet(ArrayDeque::new)
    .stream()
    .collect(toMap(dr -> dr.title, dr -> dr.disp));
Collectors s
reducing(BinaryOperator<T>) Collector<T,?,Optional<T>>
reducing(T, BinaryOperator<T>) Collector<T,?,T>
reducing(U, Function<T,U>, BinaryOperator<U>) Collector<T,?,U>
These collectors are used for the same reasons as their corresponding reduce overloads, but in a downstream context, typically downstream of a groupingBy collector. For example, to find the tallest book in each topic, we could use the first overload:
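// a sketch; it assumes a Book::getHeight accessor with a Comparable result
Map<Topic,Optional<Book>> tallestByTopic = library.stream()
    .collect(groupingBy(Book::getTopic,
        reducing(BinaryOperator.maxBy(Comparator.comparing(Book::getHeight)))));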
The three-argument overload is used in a similar way. Notice, however, that instead of an accumulator, this overload expects a Function<T,U>, similar in intent to the mapping stage of the book displacement example of the previous section. For example, I could calculate the number of volumes in each topic of my library using this overload:
Map<Topic,Integer> volumesByTopic = library.stream()
    .collect(groupingBy(Book::getTopic,
        reducing(0, b -> b.getPageCounts().length, Integer::sum)));
4.5 Conclusion
In this chapter we have explored what the Stream API offers for summarizing the results of stream processing. These facilities fall into two categories: reduction and collection. The role of reduction is to summarize immutable values, and the Stream API provides various convenience methods to support it, particularly over primitives.
However, the most important technique is collection, a generalization of reduction that adapts it to concurrent accumulation into mutable collections, managing access even to those that are not threadsafe. We reviewed library collectors for accumulation to standard library collections, to custom collections, and to classification mappings. Since the collector pattern lends itself to composition, the library also contains a number of collectors specifically designed to be composed with others. Further, the API provides extension points to allow the development of custom collectors where necessary, and we have seen examples of situations that call for these and techniques for developing them. Overall, collection is a powerful and flexible tool. Mastering its possibilities is central to becoming expert in using the Stream API.
That concludes our investigation of how streams can be ended; in the next chapter,
we will turn our attention to the dual problem of how they can be started.
CHAPTER
5
Starting Streams: Sources
and Spliterators
Chapter 3 briefly introduced the subject of stream sources in general. Among the factory methods declared by Stream itself are iterate and generate, which create infinite streams:
Stream<T> s
iterate(T, UnaryOperator<T>) Stream<T>
generate(Supplier<T>) Stream<T>
The method iterate accepts a seed and a function (represented by the functional interface UnaryOperator) and repeatedly applies the function to produce successive stream elements. Analogous iterate methods are declared by all the primitive streams. For example, to create a sequential ordered IntStream that will deliver alternating values of 1 and –1 indefinitely, we could write:
IntStream alternatingSigns = IntStream.iterate(1, i -> -i);
Each element in the stream is produced by applying the function to the preceding element. Streams produced by iterate are infinite: you get useful results from them only by using an operation that can be applied to a finite initial substream, like limit and the short-circuit “search” operations (p. 63).
The method generate takes a Supplier, representing a function that produces a value without requiring input, and repeatedly calls it to produce successive stream elements. The resulting sequential stream is not ordered, so generate is intended for use in situations where a series of values is drawn from a distribution rather than sequentially related. For example, it can be used to generate constant streams or custom random distributions.
Factory methods in the third group create an ordered stream containing a range of values. They are defined only on the integral primitive types IntStream and LongStream. These are the IntStream versions:
IntStream s
range(int, int) IntStream
rangeClosed(int, int) IntStream
The API includes both range and rangeClosed in order to accommodate conflicting expectations about whether a range should include or exclude the specified end value. So the same stream can be created in two different ways; for example:
IntStream.range(1, 6).forEach(System.out::print); //prints 12345
IntStream.rangeClosed(1, 5).forEach(System.out::print); //prints 12345
This technique, of using the elements of an int range as an index, can be extended
to multiple data sources, providing a workaround for the absence in the Stream API
of an operation to “zip” two streams together.
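For example, two lists can be “zipped” through their common index range (a sketch; names and ages are hypothetical lists of equal size):
List<String> names = Arrays.asList("Ada", "Ben");
List<Integer> ages = Arrays.asList(36, 41);
Stream<String> zipped = IntStream.range(0, names.size())
    .mapToObj(i -> names.get(i) + ":" + ages.get(i));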
The last Stream method, not a stream creator but a stream combinator, is the static
method concat, which creates a new stream from the concatenation of two existing
ones:
Stream<T> s
concat(Stream<T>, Stream<T>) Stream<T>
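For example:
Stream<String> combined =
    Stream.concat(Stream.of("a", "b"), Stream.of("c")); // "a", "b", "c"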
The rest of this section lists the methods that have been added to other platform
classes to expose their data for processing in the new style.
java.util.Arrays This class has methods to stream arrays of reference types and each of the primitive stream types. For each stream type there are two overloads, the first one streaming the entire array and the second streaming a slice of the array defined by an inclusive start and exclusive end index.
Arrays s
stream(T[ ]) Stream<T>
stream(T[ ], int, int) Stream<T>
stream(int[ ]) IntStream
stream(int[ ], int, int) IntStream
stream(long[ ]) LongStream
stream(long[ ], int, int) LongStream
stream(double[ ]) DoubleStream
stream(double[ ], int, int) DoubleStream
It appears that the methods accepting an entire array (rather than a slice) overlap in function with the various specializations of Stream.of, whose varargs parameter can be passed an array argument. However, Stream.of is really intended for the case when you want to supply a fixed and known number of arguments, and a call supplying an array is, in effect, arbitrarily interpreted as a series of individual values to be streamed rather than as a single array. Avoid this minor problem by favoring Arrays.stream for array arguments.
java.io.BufferedReader This class reads text files; prior to Java 8, its principal usage was through the method readLine, which returns one line per call. In Java 8, it now also declares the method lines:
BufferedReader i
lines() Stream<String>
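For example, to count the non-blank lines of a text file (a sketch; path is assumed to be a java.nio.file.Path, and IOException handling is elided):
try (BufferedReader reader = Files.newBufferedReader(path)) {
    long nonBlank = reader.lines()
        .filter(line -> !line.trim().isEmpty())
        .count();
}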
java.nio.file.Files This class has methods that stream directory listings and file contents:
Files s
walk(Path, FileVisitOption...) Stream<Path>
walk(Path, int, FileVisitOption...) Stream<Path>
find(Path, int, BiPredicate<Path,BasicFileAttributes>, FileVisitOption...) Stream<Path>
list(Path) Stream<Path>
The listings produced by walk and find are recursive—that is, they include all subdirectories of the start directory; those produced by list are for the start directory only. The recursive methods take zero or more arguments of the type FileVisitOption, an enum that allows the traversal to be configured, for example by specifying whether symbolic links should be followed. Otherwise, they differ in whether they support a maximum traversal depth and whether they accept a predicate that filters paths for acceptability based on their BasicFileAttributes (such as modification time, access time, or size).
The streams returned from these methods indirectly encapsulate native file handles, so it is good practice to allocate them in a try-with-resources construct. For example, the following code recursively descends the directory tree, starting from the current directory, printing some details about each file:
Path start = new File(".").toPath();
try(Stream<Path> pathStream = Files.walk(start)) {
pathStream
.map(Path::toFile)
.filter(File::isFile)
.map(f -> f.getAbsolutePath() + " " + f.length())
.forEachOrdered(System.out::println);
} catch (IOException e) {
e.printStackTrace();
}
The last two Files methods allow a file to be parsed into lines of text; they conveniently wrap the BufferedReader method. Since creating a BufferedReader requires a Charset to manage byte-to-character conversion, Files.lines can be given a Charset too; the single-argument overload assumes UTF-8:
Files s
lines(Path path, Charset cs) Stream<String>
lines(Path path) Stream<String>
java.util.regex.Pattern This class now declares a stream-bearing analog of its split method:
Pattern i
splitAsStream(CharSequence) Stream<String>
Like split, this method divides its input into substrings, each of which is termi-
nated either by a subsequence matching this pattern or by the end of the input. Here
are some examples to show the effect of different pattern matches at the beginning of the input string. The method exampleSplit returns a string showing the splits made by the supplied pattern on sample output from ls -l:
String exampleSplit(String pattern) {
Pattern p = Pattern.compile(pattern);
String s = "-rw-rw-r-- 1 root admin 21508 26 Feb 2014 /.bashrc";
return p.splitAsStream(s)
.collect(joining("\",\"", "\"", "\""));
}
If the pattern does not match any subsequence of the input, then the resulting
stream contains a single element corresponding to the input sequence (line 2 of
Table 5-1). A match of non-zero width at the beginning of the input sequence results
in the insertion of an empty string at the beginning of the stream (line 3). A match of
zero width does not (line 4).
Pattern  Output
"o+"     "-rw-rw-r-- 1 r","t admin 21508 26 Feb 2014 /.bashrc"
"@"      "-rw-rw-r-- 1 root admin 21508 26 Feb 2014 /.bashrc"
"^-"     "","rw-rw-r-- 1 root admin 21508 26 Feb 2014 /.bashrc"
"^"      "-rw-rw-r-- 1 root admin 21508 26 Feb 2014 /.bashrc"
java.util.jar.JarFile This class now declares a method to stream its entries:
JarFile i
stream() Stream<JarEntry>
java.util.zip.ZipFile The new method added to this class is similar to that for
JarFile:
ZipFile i
stream() Stream<ZipEntry>
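For example (a sketch; "archive.zip" is a placeholder name, and IOException handling is elided):
try (ZipFile zf = new ZipFile("archive.zip")) {
    zf.stream()
        .map(ZipEntry::getName)
        .forEach(System.out::println);
}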
java.lang.CharSequence This interface declares two new methods that present the characters of a sequence as an IntStream:
CharSequence i
chars() IntStream
codePoints() IntStream
The difference between these two methods is in their handling of Unicode supplementary characters, that is, those whose code point is greater than U+FFFF. Supplementary characters include ideographs from East Asian languages and pictorial characters (including emojis). The method chars is generally preferable to codePoints on grounds of efficiency but, unlike codePoints, it does not interpret supplementary characters correctly.
The new methods return IntStream (the most appropriate choice in the absence of a primitive specialized CharStream), so it is useful to know how to convert the int values into a Stream<Character> if the need arises. The key point is that since boxing is required, IntStream.mapToObj must be given the right primitive type to work on:
Stream<Character> chrStream = "example".chars().mapToObj(i -> (char)i);
java.util.BitSet The toString method of this class has always returned a string
representation of the set of indices of the bits that are set. Now the stream method
returns these indices in a form ready for processing:
BitSet i
stream() IntStream
For example:
byte[] bits = {10, 18}; // 01010000 01001000 (little-endian)
BitSet bs = BitSet.valueOf(bits);
System.out.println(bs.stream()
.boxed().collect(toList())); // prints [1, 3, 9, 12]
java.util.Random, java.util.SplittableRandom These classes declare methods to generate streams of pseudorandom values:
Random, SplittableRandom i
ints() IntStream
ints(int, int) IntStream
ints(long) IntStream
ints(long, int, int) IntStream
longs() LongStream
longs(long, long) LongStream
longs(long) LongStream
longs(long, long, long) LongStream
doubles() DoubleStream
doubles(double, double) DoubleStream
doubles(long) DoubleStream
doubles(long, double, double) DoubleStream
// (loop head reconstructed in outline from the discussion that follows)
while (spliterator.estimateSize() > sizeThreshold
        && (newSpliterator = spliterator.trySplit()) != null) {
    makeTask(newSpliterator).fork();
}
// process remaining data by iteratively calling
// this.spliterator.tryAdvance() in this thread
The loop condition for this code shows that a decision to split rather than to process
iteratively requires two criteria to be met:
• That a newly forked task could take advantage of currently unused processing
capacity. In Java 8, the value of sizeThreshold is just the result of dividing
the total data size by the number of threads available, but more sophisticated
criteria might be used in the future by the framework to decide this.
• That the data structure’s own conditions are met for making splitting worth-
while. A Spliterator is permitted to respond to a call of trySplit by re-
turning null instead of a new Spliterator, perhaps because the data set
is small enough that the gains from going parallel would not outweigh the
overhead of splitting, or for some other reason.
Spliterator<T> i
trySplit() Spliterator<T>
tryAdvance(Consumer<T>) boolean
forEachRemaining(Consumer<T>) void
estimateSize() long
characteristics() int
All except for forEachRemaining are abstract. A brief explanation of these meth-
ods follows. Development of a custom spliterator as part of the worked example in
§5.4 should help to clarify further how spliterators work and when you would choose
to write your own.
trySplit As we have seen, this method creates a new Spliterator and trans-
fers some—ideally, half—of its elements to it. Splits that do not achieve this ideal
may still be effective; for example, splitting an approximately balanced binary tree
can still lead to useful parallelism. On the other hand we have BufferedReader,
which cannot even approximate an equal split and, at an extreme, classes that cannot
split at all and whose spliterators always decline to split (which is why the method
Collection.parallelStream is speci ed to return a “possibly parallel” stream).
If order is signi cant for a stream source, then the elements that the spliterator
covers after it has executed trySplit must succeed those covered by the spliterator
that it returned.
tryAdvance This method combines the functions of the Iterator methods next
and hasNext. If no elements remain, tryAdvance returns false; otherwise, it returns
true after calling the accept method of the supplied Consumer with the next element
as the argument. This is the heart of two of the major improvements that spliterators
offer over iterators: fewer method calls are needed for each element, resulting in big
performance improvements, and there is now no risk of the race conditions that can
arise as a result of a collection being mutated between a call of hasNext and a call
of next.
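For example, a spliterator can drive a loop directly (a sketch):
Spliterator<String> sp = Arrays.asList("a", "b", "c").spliterator();
while (sp.tryAdvance(System.out::println)) {} // prints a, b, c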
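Any spliterator can be turned into a stream with the StreamSupport factory; the idiom in question looks like this (a sketch; iterable is a placeholder Iterable<T>):
Stream<T> stream = StreamSupport.stream(iterable.spliterator(), false);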
where the argument false means that in this case the generated stream will be
sequential rather than parallel. This is in fact how the stream-bearing methods of
Collection are implemented.
The default implementation of Iterable.spliterator has poor performance, because any good splitting strategy will depend on the physical structure of the specific Iterable. So implementations will usually override it, as the classes of the Java Collections Framework do. But if you have to use an API that accepts or returns Iterable rather than Collection instances, it is worthwhile knowing this way of performing stream processing directly on them, rather than having to first dump them into a collection.
5.3 Exceptions
We saw in §2.7.2 that lambdas have no special mechanism for managing checked exceptions; any checked exception that a lambda throws must be explicitly declared by its function type. As a result, checked exceptions do not fit well with lazy evaluation as implemented in the Stream API. In this section we will explore the problems; it should also become clear why this section is located in the chapter on creating streams: the most common occurrence of checked exceptions in stream processing is in the API for files used as stream sources.
To disentangle the issues, we will look at a series of increasingly complex scenarios. The basic use case that we are working toward is the problem of listing the contents of every text file in a directory. The idea is that Files.list (p. 115), supplied with a directory path, generates a stream of file paths, one for each file in the directory; these are in turn supplied to Files.lines, which generates a stream of lines from each one. These streams can be concatenated into a single stream by flatMap. If no exceptions were in the picture, we could write, for any directory path start:
Files.list(start)
.flatMap(Files::lines)
.forEachOrdered(System.out::println);
This is the basis of the worked example later in this chapter (§5.4): an implemen-
tation in Java of grep, the Unix utility that matches lines in text les against regular
expression patterns.
Since the problem revolves around the point at which exceptions are thrown,
let’s start with the basic case in which exception throwing is not delayed at all. The
following code creates a stream of Path objects corresponding to the contents of
the directory whose path is start. For the sake of the example, we will just cre-
ate a stream without using it, and exception handling will be delegated to the caller
by rethrowing the checked exception wrapped in the RuntimeException subclass
UncheckedIOException. First, though, we will print a stack trace to show what has
happened:
try (Stream<Path> paths = Files.list(start)) { // line 19
} catch (IOException e) {
    e.printStackTrace();
    throw new UncheckedIOException(e);
}
IOException will be thrown and caught if start is not a path to a directory that can
be opened for reading. For example, if the directory permissions make it inaccessible,
the stack trace will look like this:
java.nio.file.AccessDeniedException: ./fooDir
<omitted: frames for platform-specific filesystem access methods>
at java.nio.file.Files.newDirectoryStream(Files.java:457)
at java.nio.file.Files.list(Files.java:3448)
at ExceptionsExample.main(ExceptionsExample.java:19)
This is in line with our ideas of how exceptions normally work, though perhaps
not obviously in line with our expectations about the lazy evaluation of streams. No
terminal method has been invoked on the stream created by the call of Files.list, so
its elements have not been evaluated. But the attempt to open the directory ./fooDir
has been made eagerly.
Next, suppose instead that the directory has been opened and the stream of paths has been constructed successfully. Now flatMap calls Files.lines, which will first open each file and then read its contents into a stream of lines that will be appended into the stream to be returned. What will happen if the attempt to open one of these files fails? As the introduction to Files mentioned (p. 115), I/O-based stream-creating methods like Files.lines throw the checked exception IOException in this situation. Instead of a directory, we can make a file inaccessible to show the problem; in the interests of having the simplest possible intermediate operation, we use peek to evaluate the behavioral parameter:
try (Stream<Path> paths = Files.list(start)) {
    paths.peek(path -> {
        try {
            Files.lines(path); // line 42
        } catch (IOException e) {
            System.err.println("** exception from Files.lines");
            e.printStackTrace();
        }
    })
    .forEach(line -> {}); // line 47
} catch (IOException e) {
    System.err.println("++ exception from Files.list");
    e.printStackTrace();
    throw new UncheckedIOException(e);
}
The resulting stack trace contains a frame for the lambda at line 42 below a frame for the forEach at line 47. This corresponds with our understanding that it is execution of the terminal operation that initiates evaluation of the pipeline elements.
Finally, we can replace the call of peek with flatMap, which will take the streams of strings from each file and concatenate them into a single stream:
try (Stream<Path> paths = Files.list(start)) {
    paths.flatMap(path -> {
        Stream<String> lines;
        try {
            lines = Files.lines(path);
        } catch (IOException e) {
            e.printStackTrace();
            lines = Stream.of("Unreadable file: " + path);
        }
        return lines;
    })
    .forEach(line -> {}); // line 57
} catch (IOException e) {
    e.printStackTrace();
    throw new UncheckedIOException(e);
}
In this version, the failure to open a file will produce a stack trace very like the last example. But suppose instead that Files.lines succeeds in opening a file and returns a stream, as yet unevaluated, of the lines that it contains. Once the files are open, the terminal operation will call the flattening code to concatenate the separate streams of lines into a single stream. In order to do this, the streams must be evaluated and the now-opened files must be read; this action too can fail. But the lambda whose execution opened the file is now out of scope and the file-reading code is being called directly, so failure notification must take place via an unchecked exception. To emphasize this, we can break the pipeline code down to an equivalent version in which each stage is extracted to a local variable:
Stream<Path> paths = Files.list(start);
Stream<String> lines = paths.flatMap(path -> ...);
lines.forEach(line -> {});
Since flatMap is no longer on the call stack when its lambda fails, its throws clause is irrelevant. For this reason, platform library code used to create streams always wraps checked exceptions in unchecked ones. The most common cases are covered by wrapping IOException in the new java.io.UncheckedIOException; this is thrown when failures occur after a file has been opened. In the case of such a failure, the stack trace will look like this:
Exception in thread "main" java.io.UncheckedIOException: java.nio.charset.MalformedInputException: Input length = 1
    at java.io.BufferedReader$1.hasNext(BufferedReader.java:574)
    <omitted: frames corresponding to stream implementation methods>
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at ExceptionsExample.main(ExceptionsExample.java:57)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
    <omitted: frames for text file reading and charset decoding>
    at java.io.BufferedReader$1.hasNext(BufferedReader.java:571)
    ... 18 more
This stack results from the evaluation of one of the streams of lines raising a MalformedInputException, thrown in the course of processing a file by a Charset that has encountered a byte sequence it cannot decode (for example, as a result of attempting to read a binary file as though it were text). This checked exception is caught by the hasNext method of BufferedReader (called from Files.lines) and wrapped in an UncheckedIOException that rises uncaught to the top of the call stack. The “18 more” stack frames referred to at the bottom of the checked exception stack are the same 18 frames that constitute the whole of the unchecked stack above it; the duplication is omitted for brevity.
So lazily evaluated code called from within pipeline operations can throw only unchecked exceptions, which will end the terminal operation and so stop pipeline processing altogether. If instead we expect errors from which we want to recover, they must be notified by checked exceptions thrown from eagerly evaluated operations. For example, we saw earlier that attempting to read characters from a binary file results in a (checked) MalformedInputException; for the current use case—printing the contents of every text file in a directory—there is a simple recovery action: skip that file. But that action can be taken only if the method throwing the exception is eagerly evaluated, as happens with Files.readAllLines, which eagerly reads an entire file, throwing a checked exception if it encounters an undecodable byte sequence. It is this method that will form the basis of our solution to the recursive grep problem of the next section.
grep -Rh The first version of the problem is to search every file whose name begins with test and has the extension .txt, anywhere in the filesystem under the directory startDir, for mixed decimal numbers. These will match the regular expression “[-+]?[0-9]*\.?[0-9]+”. Our program will imitate the behavior of grep -Rh: the -R option forces grep to search the filesystem recursively, and the -h option prevents the usual prefixing of each matching output line with the name of the file in which it was found. (Implementing that feature will be the next problem after this one.) We will need the following preliminary declarations:
Path start = new File("startDir").toPath();
Pattern pattern = Pattern.compile("[-+]?[0-9]*\\.?[0-9]+");
PathMatcher pathMatcher =
    FileSystems.getDefault().getPathMatcher("glob:**/test*.txt");
Two kinds of pattern matching are in use here: regular expressions and globbing, which you may know as shell pattern matching. Globbing is used here to show how easily it can be introduced into stream-processing programs. Although it is less powerful than regular expression matching, it is much more concise and convenient in those cases where it does work.
A suitable starting point is the code we developed in exploring the subject of
exceptions thrown in pipeline operations (p. 123). Here it is repeated for conve-
nience, only changed in respect of an added assumption that the enclosing method
can rethrow an IOException resulting from a failure to open the start directory, so it
need not be handled in this code:
try (Stream<Path> paths = Files.list(start)) {
    paths.flatMap(path -> {
        Stream<String> lines;
        try {
            lines = Files.lines(path);
        } catch (IOException e) {
            e.printStackTrace();
            lines = Stream.of("Unreadable file: " + path);
        }
        return lines;
    })
    .forEach(line -> {});
} // no longer catching IOException
This requires some alterations before it can become our rst version. Stop reading
for a moment, examine the code carefully, and make a list of the changes that are
needed to make it satisfy the problem description given. (We will still leave calls of
printStackTrace as placeholders for exception handling.)
• The stream of paths must be produced by Files.walk rather than Files.list, so that subdirectories are searched too, and filters are needed to select only regular files whose names match the glob pattern. To allow recovery from unreadable files, Files.lines is replaced by the eagerly evaluated Files.readAllLines.
• Filters are also needed on the text lines returned from the files: to remove the empty strings resulting from processing of non-text files and, of course, to remove lines that do not match the regular expression.
• The terminal operation should print the matching lines. It uses forEach, which does not impose ordering, except accidentally. For the moment, let's diverge from grep by assuming that we only care about getting the matching lines, not about the order in which they appear.
Applying these changes to the preceding code, we get code that reproduces the
behavior of grep -h:
try (Stream<Path> pathStream = Files.walk(start)) {
    pathStream
        .filter(Files::isRegularFile)
        .filter(pathMatcher::matches)
        .flatMap(path -> {
            try {
                return Files.readAllLines(path).stream();
            } catch (IOException e) {
                return Stream.of("");
            }
        })
        .filter(line -> ! line.isEmpty())
        .filter(line -> pattern.matcher(line).find())
        .forEach(System.out::println);
}
grep -R For the first variant on this solution, consider removing the option -h so that grep is required to prefix every output line with the path of the file in which it was found. Before reading on, stop and think how you would change the preceding code to implement this, restructuring it if necessary.
Clearly, the path can only be prefixed to the lines of the file at a point where it is still available, which means within the lambda for the flatMap. In the body of the inner try, the lambda parameter path is in scope and can be prefixed to each line:
try {
    return Files.readAllLines(path).stream()
        .map(line -> path + ": " + line);
} catch (IOException e) {
    return Stream.of("");
}
And the solution to this problem is a small modification of the preceding one:
try (Stream<Path> pathStream = Files.walk(start)) {
    pathStream
        .filter(Files::isRegularFile)
        .filter(pathMatcher::matches)
        .flatMap(path -> {
            try {
                return Files.readAllLines(path).stream()
                    .filter(line -> pattern.matcher(line).find())
                    .map(line -> path + ": " + line);
            } catch (IOException e) {
                return Stream.of("");
            }
        })
        .filter(line -> ! line.isEmpty())
        .forEach(System.out::println);
}
grep -Rc The -c option suppresses normal output, instead printing a count of matching lines for each input file. Again, stop and think how to do this before reading on.
This solution is quite straightforward, once you see that since the output lines are one-to-one with files containing matching text, the output from each file must be collected into a single string before being passed to the terminal operation. So map rather than flatMap is the appropriate operation, with each path, startDir/foo say, being mapped to a string like “startDir/foo: 3”. The line count can be obtained by applying the terminal operation count to the stream of matching lines for each file. This gives us:
try (Stream<Path> pathStream = Files.walk(start)) {
    pathStream
        .filter(Files::isRegularFile)
        .filter(pathMatcher::matches)
        .map(path -> {
            try {
                long matchCount = Files.readAllLines(path).stream()
                    .filter(line -> pattern.matcher(line).find())
                    .count();
                return matchCount == 0 ? "" : path + ": " + matchCount;
            } catch (IOException e) {
                return "";
            }
        })
        .filter(line -> ! line.isEmpty())
        .forEach(System.out::println);
}
grep -b The -b option requires each line of output to be prefixed by its character displacement in the file. So a search for the pattern “W.*t” in this farsighted tribute to immutability:¹
The Moving Finger writes; and, having writ,
Moves on: nor all thy Piety nor Wit
Shall lure it back to cancel half a Line,
Nor all thy Tears wash out a Word of it.
At first sight, this looks very like the book displacement problem of Chapter 4 (p. 90), and indeed one option would be to solve it in just the same way using a custom collector. Closer examination, however, shows an important difference in grep -b that will lead to a better solution. The book displacement problem requires the calculation of a running total, making it an example of a prefix sum, a problem in which the value of every element depends on the values of the preceding ones. In a naïve parallel algorithm for prefix sum, like the book displacement example, the total cost of the combine operations (that is, the total delay imposed by them) is proportional to the size of the input data set, so going parallel may not produce the performance gains that we would like to see.²

¹ The Rubaiyyat of Omar Khayyam (q. 51), trans. Edward Fitzgerald, 1st edition
The current problem can be recast, however, to avoid this difficulty. If we could treat the data of the input file as a byte array in memory, the displacement of each line could be determined without any need to use the displacements of its predecessors. The memory mapping feature of Java NIO provides exactly this ability; its use allows the operating system to manage the correspondence between filestore and memory transparently and efficiently. For medium or large files (memory mapping is usually worthwhile only for files of hundreds of kilobytes in size, or more), memory mapping can offer big performance gains; in particular, random access carries no performance penalty. Memory mapping is implemented in Java by the class java.nio.MappedByteBuffer.
Notice that the earlier grep examples, like most file processing tasks, are also unlikely to benefit from parallelization, because the performance bottleneck will be at the file input stage rather than in stream processing. So the Spliterator technique we are about to use would improve performance for any such task, provided that the file is large enough to justify the overhead of memory mapping.
In this problem, memory mapping has the added advantage of allowing us to avoid
having to calculate the displacement of each line from that of its predecessor; instead,
when the custom spliterator breaks the buffer into lines, the displacement of each line
will be known from its index in the buffer. The function of the spliterator will be to
supply the stream with value objects, each of which encapsulates a line together with
its index:
class DispLine {
    final int disp;
    final String line;
    DispLine(int d, String l) { disp = d; line = l; }
    public String toString() { return disp + " " + line; }
}
Spliterators are usually named by the data structure that they are splitting, but to
describe this spliterator adequately, its name should include not only its data struc-
ture, ByteBuffer, but also the type of the objects that it will yield. Since the name
ByteBufferToDispLineSpliterator is too long to use in code to be printed in book
form, we’ll abbreviate it to LineSpliterator. A LineSpliterator is constructed to
cover a range of the ByteBuffer bounded by two indices, lo and hi, both inclusive,
supplied at construction time:
² The seriousness of this problem depends on how expensive the combine operation is. Java 8 provided java.util.Arrays with various overloads of a new method parallelPrefix for computing prefix sums efficiently, but this innovation has not yet reached the Stream API.
Figure 5-1 shows the buffer at the start of processing. The range of the first spliterator created is the data in the entire buffer, including the terminating newline byte.
[Figure 5-1: a ByteBuffer covering indices 0 to limit()-1, ending with a newline byte.]
If this range is to be split, the aim will be to divide the buffer into two approxi-
mately equal parts, each of them again terminated by a newline. A suitable dividing
point between the parts will be found by making a linear search for a newline, in either
direction, starting from the midpoint of the buffer. This search represents the algorith-
mic part of the overhead of splitting (the rest is the infrastructure overhead of creating
and forking a new fork/join task). Its cost is proportional to the average line length,
rather than to the total buffer size as with the book displacement program (§4.3.2).
Figure 5-2 shows this search process and the coverage of the two spliterators that
result.
[Figure 5-2: the linear search from the buffer midpoint for a newline, and the ranges covered by the two spliterators after the split.]
Suppose, however, that the search fails to detect a newline character before encountering the end of the LineSpliterator range. Given that it starts from the range midpoint, such a failure suggests that there are probably only a few lines in the entire range—possibly only one—so in this situation it is reasonable to decline to split by returning null rather than searching for a newline in the opposite direction.
Once you have understood the algorithm for trySplit, the code is straightforward:
public Spliterator<DispLine> trySplit() {
    int mid = (lo + hi) >>> 1;
    while (bb.get(mid) != '\n') mid++;
    LineSpliterator newSpliterator = null;
    if (mid != hi) {
        newSpliterator = new LineSpliterator(bb, lo, mid);
        lo = mid + 1;
    }
    return newSpliterator;
}
If you have not already written code for tryAdvance, stop now and sketch an
implementation.
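Here is one possibility (a sketch; like the rest of the example, it assumes one byte per character and relies on the invariant that the byte at hi is a newline):
public boolean tryAdvance(Consumer<? super DispLine> action) {
    if (lo > hi) return false;                // no lines remain
    int end = lo;
    while (bb.get(end) != '\n') end++;        // find the terminating newline
    StringBuilder sb = new StringBuilder();
    for (int i = lo; i < end; i++) {
        sb.append((char) bb.get(i));          // one byte per character assumed
    }
    action.accept(new DispLine(lo, sb.toString()));
    lo = end + 1;                             // step past the newline
    return true;
}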
For this option, however, a user would certainly be surprised if the matching lines appeared in a random order.
Summarizing this section, we should note that programming a grep equivalent has shown some strengths of the Stream API: the solutions for the earlier examples are straightforward, and respecifying the problem to simulate the different options required only small corresponding changes in the code. In the last example, we saw how cooperation with Java NIO enables the Stream API to process input from large files with, as we shall see in §6.7, a very high degree of parallel speedup.
5.5 Conclusion
This chapter had three connected purposes: to explore the library facilities for creating streams, to explain the mechanism by which stream creation works, and to show how and why you would write your own implementation of it. We saw that stream-bearing methods have been added to a wide variety of platform classes, allowing streams to fulfill their role of conveying any kind of data for processing through intermediate operations and into a terminal operation. In processing streams from any of these methods, you gain the advantages in expressiveness and readability of parallel-ready code; whether you can effectively extract parallelism depends, as we have seen, on how effectively the stream source can split its data. Splittability is only part of the story, however; in Chapter 6 we will see how it combines with other factors, such as data set size and pipeline workload, to determine the overall speedup that can be obtained by parallelization.
The main example of the chapter showed off some of the strengths of the Stream API: the solutions for the earlier grep options were straightforward, and respecifying the problem to simulate the different options required only small corresponding changes in the code. In the last part of the example, we saw how cooperation with Java NIO enables the Stream API to process input from large files with, as we shall see in §6.7, a very high degree of parallel speedup.
CHAPTER
6
Stream Performance
This book has put forward two reasons for using lambdas and streams: to get better programs, and to get better performance. By now, the code examples should have demonstrated the claim for better programs, but we have not yet examined the claims about performance. In this chapter, those claims will be investigated through the technique of microbenchmarking.
This is a dangerous business: results from microbenchmarking run the risk of being used to guide program design in unsuitable situations. Since the overwhelming majority of situations are unsuitable, that is a serious problem! For example, a programmer's intuition is a highly unreliable guide for locating performance bottlenecks in a complex system; the only sure way to find them is to measure program behavior. So initial development, in which there is no working system to measure, is a prime instance of an unsuitable time for applying optimizations.
This is important, because mistakenly optimizing code that is not performance-
critical is actually harmful.
• It wastes valuable effort, since even successful optimization will not improve
overall system performance.
• It diverts attention from the important objectives of program design—to pro-
duce code that is well structured, readable, and maintainable.
• Source-code optimizations may well prevent just-in-time compilers from ap-
plying their own (current and future) optimizations, because these optimiza-
tions are tuned to work on common idioms.
That said, programmers enjoy learning about code performance, and there are real benefits to doing so. Foremost among these is the ability provided to developers by the Stream API to tune code—by choosing between sequential and parallel execution—without altering its design or behavior. This kind of unobtrusive performance management is new to the Java core libraries, but may become as successful and widespread here as it already is in enterprise programming. To take advantage of it, you need a mental model of the influences on the performance of streams.
However, even if you never act on that model directly, it can help to guide (but not dictate!) your design, complementing the mental model that programmers already have for the execution of traditional Java code.¹ In addition, the situations do exist in which optimization really is justified. Suppose, for a simple example, that you have identified a performance-critical bulk processing step and you now want to decide between the options for implementing it. Should you use a loop or a stream? If you use a stream, you often have a choice of different combinations of intermediate operations and collectors; which will perform best? This chapter will present results like Figure 6-1 to help answer such questions.

¹ Unfortunately, the commonplace mental model for performance is itself sadly out of date; many programmers are unaware of JIT optimizations or the effect of hardware features like pipelining and multicore CPUs.
Figure 6-1 might be an interesting and useful result, but we need to know more. As it stands, it provides no information about the experimental conditions under which its result was obtained. The results will have been influenced both by factors external to the benchmark—varying system load, the behavior of other processes, network and disk activity, the hardware, operating system, and JVM configurations—and by properties of the benchmark itself: the data being processed and the operations that were performed.
Obviously, this is a problem; we want to be able to discuss performance of stream operations without taking all these factors into consideration. How can we get meaningful results from highly simplified observations of such complex systems? To draw an analogy—only a little far-fetched—think of the difficulty of trialing a new drug on humans. The goal of a trial might be simple—to discover if a drug is effective in treating a certain disease—but achieving it isn't simple at all: you can't just administer the drug to a person with the disease and observe whether they are cured. Many factors can influence the result: the background health of the subject, their diet, age, gender, and often many others. All of these factors can affect the result for any individual.
Of course, working computer systems are not nearly so complex as biological systems, but they are complex enough for observational experiments to share the problem: just as you can't draw a conclusion about a drug's effectiveness by simply administering it to someone and observing the outcome, you can't change a line of code in a working system and expect to get a useful result simply by observing the outcome. In both cases there are too many variables involved. In the next section we will look at an experimental method for eliminating these effects as far as we can. Then, in the rest of the chapter, we'll apply that method to measuring performance of stream operations.
Many of the experiments in this chapter are derived from examples used earlier
for explaining stream programming techniques. These examples almost all illustrate
the use of reference streams; for this reason, and because in practice it is reference
streams that will raise the important performance questions, it is these rather than
primitive streams that are the main focus of this chapter.
6.1 Microbenchmarking
Our aim is to discover the execution cost of a particular code fragment. But the argument in this chapter so far has been that a measurement of that cost is meaningful only in a working system under realistic load. Profiling or metering a production system is often not practicable; the alternative, of simulating the production system on a load-testing platform, is more feasible but still involves the difficulty of setting up an environment that reproduces the production system in all aspects: hardware and operating system configuration, demand on resources by other processes, the network and disk environments, and of course simulated typical and extreme loads.
A more practicable alternative is microbenchmarking, which compares two different code fragments operating in an isolated (and unrealistic) system. Each of these results will be meaningless in absolute terms, but the comparison between them can be meaningful—provided all those aspects are controlled. The idea of a controlled experiment is simple in principle. If all the factors listed earlier (the hardware and software environments, the system load, and so on) are held constant and only the code being compared is allowed to vary, then variation in the results must be due to what has been changed.
The apparent simplicity of this pattern hides some serious difficulties. Some of these relate to the measuring process itself, particularly for short run times: System.currentTimeMillis may be much less accurate than its name suggests, because it depends on a system clock that on some platforms may be updated only at a granularity of tens of milliseconds.
Warmup effects Measuring the cost of performing, say, the first 100 executions of a code fragment will often produce quite different results from taking the same measurement a minute later. A variety of initialization overheads may contribute to the earlier measurement, for example class loading—an expensive operation, which the JVM delays until the classes concerned are actually required. Another potent source of inaccuracies is provided by JIT compilation: initially the JVM interprets bytecode directly, but compiles it to native code when it has been executed enough times to qualify as a “hot spot” (and then continues to profile it in the interests of possible further optimization). The compilation overhead, followed by markedly improved performance, will give results far out of line with the steady-state observations that we require.
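Dead code elimination A further hazard: the JIT compiler removes computations whose results it can prove are never used. Consider timing a loop like this sketch, in which sum is never read afterwards:
long start = System.currentTimeMillis();
int sum = 0;
for (int i = 0; i < 1_000_000; i++) {
    sum += i; // result never used: a candidate for dead code elimination (DCE)
}
System.out.println("elapsed: " + (System.currentTimeMillis() - start) + "ms");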
Running this test on a modern JVM will result in very low execution times, because the compiler can detect that since sum is never used, it is unnecessary to calculate it and therefore unnecessary to execute the loop at all. In this case, there is a simple remedy: including the final value of sum in the output statement should ensure that it has to be calculated, although even then a compiler may detect that it can use an algebraic formula to calculate it without iterating; in general it is quite difficult to devise ways of preventing the compiler from applying DCE without distorting the measurements by introducing expensive new operations.
Statistics Given the number of factors that we now know affect microbenchmark-
ing measurements, we can hardly expect that the same experiment, repeated several
times, will produce exactly the same result; the influence of different external factors
will vary from one set of observations to another. We can calculate a point estimate
(such as the mean) for the most probable value, but that will have little meaning if
external factors dominate the experiments. If we can assume that the effects of ex-
ternal factors are varying randomly, then running the same experiment a number of
times can increase our confidence in the result. This can be quantified in a confidence
interval (CI)—a range of values within which we can state with a degree of certainty
(conventionally 95%) that the “true” value lies—that is, the value that we might get
from many experiments. The larger the experiment, the narrower the confidence in-
tervals that we can obtain.
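As an aside—the precise formula is an assumption here, since the text does not spell it out—the conventional two-sided interval for n independent scores with sample mean and sample standard deviation s is

    \bar{x} \;\pm\; t_{n-1,\,0.975} \cdot \frac{s}{\sqrt{n}}

where t is the 97.5th percentile of Student's t distribution with n − 1 degrees of freedom; using a higher percentile (99.95% for a two-sided 99.9% interval) widens the interval accordingly.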
Confidence intervals give us a useful indicator of experimental significance: if CIs
from two different situations do not overlap, it is an indication that these situations
really do give different results, always assuming that the other factors are varying ran-
domly in a way that would not bias one set compared to another. In practice, for the
experiments of this chapter, the variation between conditions is small compared to the
differences in the means, and the number of trials is large enough that the confidence
intervals are in practice too small to appear on the plots.
Open peer review Like all scientific experiments, performance measurements are
subject to flaws and bias. A partial safeguard against these sources of error is to en-
able open peer review of measurement experiments. This means publishing not only
the summarized results, but also enough detail about the experimental conditions to
enable others to reproduce and check them. This information includes the hardware,
operating system, and JVM environments, as well as statistical information about the
results. What follows is an example: a benchmark and its resulting data as used for
Figure 6-1. Instead of tediously reproducing this information for each experiment in
this chapter, a URL refers to the location on the book website of the full setup; as the
experiments are easy to reproduce, raw experimental results are not usually provided.
@State(Scope.Benchmark) // The objects of this scope (e.g., instance
                        // variables) are shared between all threads.
public class FindFirstBenchmark {
    // Class header and field declarations restored for completeness; the
    // class name and the payload and integerList fields are assumptions,
    // not part of the original listing
    public int N;                  // data set size, varied between runs
    int payload;                   // JMH tokens to consume per element
    List<Integer> integerList;

    @Setup(Level.Trial)
    public void setUp() {
        integerList = IntStream.range(0, N).boxed().collect(toList());
    }
@Benchmark
public void iterative(Blackhole bh) {
for (Integer i : integerList) {
Blackhole.consumeCPU(payload);
bh.consume(i);
}
}
@Benchmark
public Optional<Integer> sequential() {
return integerList.stream()
.filter(l -> {
Blackhole.consumeCPU(payload);
return false;
})
.findFirst();
}
@Benchmark
public Optional<Integer> parallel() {
return integerList.stream().parallel()
.filter(l -> {
Blackhole.consumeCPU(payload);
return false;
})
.findFirst();
}
}
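One way to run such a benchmark is through JMH's Runner API; a minimal sketch (the class name FindFirstBenchmark is the one assumed above, and JMH must be on the classpath):

    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.RunnerException;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    public class BenchmarkRunner {
        public static void main(String[] args) throws RunnerException {
            Options opts = new OptionsBuilder()
                    .include(FindFirstBenchmark.class.getSimpleName())
                    .build();       // default settings, as described below
            new Runner(opts).run();
        }
    }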
JMH was used to run these benchmarks with their default settings—for each value
of N, it starts a new JVM, then performs the following sequence (called an “iter-
ation”) for each of the annotated methods:
1. For the warmup, it executes the method repeatedly for one second, 20 times
over.
2. For the measurement, it again executes the method repeatedly for one second,
20 times over; this time, it records the number of times the method could be
called in each one-second run (the “Score”).
Each line in Table 6-1 shows, for a given data set size, the mean execution count
and the confidence interval for a sample of 20 iterations.
This table shows the results, somewhat simplified, of executing the benchmark
above on JDK1.8u5, running Linux (kernel version 3.11.0) on an Intel Core 2 Q8400
Quad CPU (2.66 GHz) with 2MB L2 cache. The column headed “Error (99.9%)” shows
the spread of values in the 99.9 percent confidence interval (99.9% is the JMH default
CI, very demanding by normal statistical standards).
The rest of this chapter will present microbenchmarking results in the form of
graphs like Figure 6-1 without providing this level of detail. But it is important that
you know everything should be available: the full conditions for each experiment,
the statistical methods used to prepare the presentation, and, where necessary, the
raw results. This should give you confidence that you can review the method of the
experiment, if necessary repeat it for yourself, and—most importantly—design and
execute your own measurement experiments.
You should read what follows with a warning in mind: developments in hardware
and in the Java platform will sooner or later change many of the trade-offs discussed
here, generally in favor of parallelism. Like all discussions of specific aspects of perfor-
mance, this material comes with a use-before notice. Only the actual date is missing!
Execution context What is the context in which your program is executing? Remem-
ber that the implementation of parallel streams is via the common fork/join pool,
which by default decides its thread count on the basis of a query to the operating sys-
tem for the number of hardware cores. Operating systems vary in their responses to
this query; for example, they may take account of hyperthreading, a technology that
allows two threads to share a single core, so that the core does not have to idle when
one thread is waiting for data. In a processor-bound application, that may very well
result in parallelizing over more threads than can be used at one time.
Another aspect of the execution context is competition from other applications
running on the same hardware. An implicit assumption in the common fork/join pool
configuration is that there are no other demands on the hardware while parallel oper-
ations are being performed. If the machine is already heavily loaded, then either some
of the fork/join pool threads will be starved of CPU time, or the performance of the
other applications will be degraded.
If either of these problems means that you want to reduce the number of threads in
the common fork/join pool—or indeed, if you want to change it for any other reason—
the system property java.util.concurrent.ForkJoinPool.common.parallelism
can be set to override the default configuration of the fork/join pool. Of course, if you
do set this property, you then have the challenging task of justifying your change by
measuring the resulting change in your application’s performance!
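For example, a minimal sketch (the property is read when the common pool is initialized, so it must be set before any parallel operation runs—most reliably on the command line; MyApp is an illustrative class name):

    // java -Djava.util.concurrent.ForkJoinPool.common.parallelism=2 MyApp
    // or, early in startup, before the common pool is first used:
    System.setProperty(
        "java.util.concurrent.ForkJoinPool.common.parallelism", "2");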
In addition to external competition, you should consider internal competition in
deciding whether to execute streams in parallel. Some applications are already highly
concurrent; for example, consider a web server, which will perform best if it can assign
each user request to a single core for execution. That strategy is obviously incompat-
ible with one that attempts to take over all cores for the execution of a single request.
2 Recall from §3.2.3 that all streams expose the methods parallel and sequential, which set the
execution mode for an entire pipeline. That section makes the point that if there is more than one of these
method calls on a single pipeline, it is the last one that decides the execution mode.
Spliterator and collector performance These are discussed in more detail in §6.7
and §6.8. For now, it is enough to notice that splitting by sources and concurrent
accumulation by collectors become increasingly important as Q decreases; a high
Q value tends to make intermediate operations into the pipeline bottleneck, making
parallelism more worthwhile. In the situation where Q is low, by contrast, the concurrent
performance of the stream source and terminal operation becomes more important in
deciding whether to go parallel.
One problem of applying this model lies in the difficulty of estimating Q in real-life
situations, and in the fact that stream operations with high Q are unlikely to be as sim-
ple as in this experiment, as we saw in the larger examples of Chapters 4 and 5. If in
fact they are, then measuring the cost of intermediate pipeline operations is straightfor-
ward and the results of this experiment will be directly useful to you; but, in practice,
more complex intermediate operations will usually involve some I/O or synchroniza-
tion, resulting in fewer gains from going parallel. As always, if you are in doubt, be
prepared to repeat this experiment, adapting it to your own requirements.
3 This informal statement appeals to Gustafson’s law, which reframes the well-known analysis of Amdahl’s
law to draw a less pessimistic conclusion. In contrast to Amdahl’s law, which sets very tight limits on the
parallel speedup attainable with a fixed N and varying P, the scenario for Gustafson’s law is one in which
N scales with P, giving a constant run time (see http://www.johngustafson.net/pubs/pub13/pub13.htm).
As a stream passes down a pipeline, characteristics can be removed or new ones
added. Here, for example, is a selection of the characteristics of each stage in a
pipeline:
Stream.of(8, 3, 5, 6, 7, 4) // ORDERED, SIZED
.filter(i -> i % 2 == 0) // ORDERED
.sorted() // ORDERED, SORTED
.distinct() // DISTINCT, ORDERED, SORTED
.map(i -> i + 1) // ORDERED
.unordered(); // none
We will consider ORDERED in the next section; here is the meaning of the others:
• SORTED: Elements of streams with this characteristic are sorted in their natu-
ral ordering—that is, their type implements Comparable, and they have been
sorted using its compareTo method. Stream elements may have been sorted in
other orders if a Comparator has been defined and used for the purpose, but
such streams do not have the SORTED characteristic. So, for example, a stream
whose source is a NavigableSet has the characteristic SORTED, but only if
the implementation was created using a default constructor rather than one
that accepts a Comparator. By contrast, this characteristic will not be set in a
stream sourced from a List.
An example of the use of this characteristic is in sorting: the operation
of sorting in natural order can be optimized away entirely when applied to
a stream with the characteristic SORTED. For another example, distinct on
a sequential sorted stream can apply a simple algorithm to avoid storing el-
ements: an element is added to the output stream only if it is unequal to its
predecessor.
• SIZED: A stream has this characteristic if the number of elements is fixed and
accurately known. This is true, for example, of all streams sourced from non-
concurrent collections. These must not be modified in any way during stream
processing, so their size remains known. Concurrent collections, by contrast,
may have elements inserted or deleted during stream processing, so the num-
ber of elements cannot be accurately predicted before the stream is exhausted.
Streams from most of the non-collection stream sources in §5.1 are not sized.
An example of an optimization that makes use of SIZED is accumulating
to a collection; this is more efficient if dynamic resizing of the collection can
be avoided by creating it with the appropriate size.
• DISTINCT: A stream with this characteristic will have no two elements x and
y for which x.equals(y). A stream sourced from a Set will have this char-
acteristic, but not one sourced from a List.
For an example of its usefulness, the operation distinct can be optimized
away entirely when applied to a stream with this characteristic.
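You can observe these flags directly by inspecting a stream’s spliterator; a minimal sketch (the class name is illustrative):

    import java.util.Arrays;
    import java.util.Spliterator;

    public class CharacteristicsDemo {
        public static void main(String[] args) {
            Spliterator<Integer> s =
                    Arrays.asList(1, 2, 3).stream().spliterator();
            // A List-sourced stream is ORDERED and SIZED, but not SORTED
            System.out.println(s.hasCharacteristics(Spliterator.ORDERED)); // true
            System.out.println(s.hasCharacteristics(Spliterator.SIZED));   // true
            System.out.println(s.hasCharacteristics(Spliterator.SORTED));  // false
        }
    }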
6.4 Ordering
Of the characteristics listed in the previous section, ordering is the one that most
deserves attention in a performance chapter, because dispensing with ordering can
remove a big overhead from parallel stream processing for some pipelines. Moreover,
as Chapter 1 emphasized, this is the characteristic that we are most likely to impose
unnecessarily from our long familiarity with iterative processing—think of the friend
in §1.1.1 who insisted that you put the letters for mailing into the letterbox in alphabet-
ical order. So it is very much worthwhile distinguishing those cases in which ordering
really does matter.
A stream is said to have an encounter order if the ordering of the elements is se-
mantically significant. For example, take List: given a sequence of operations to add
elements to a List, its contract specifies how those elements will be positioned and
guarantees to preserve their ordering when returning them, for example by iteration.
Now consider processing the elements of a List using a stream:
String joined = stringList.parallelStream()
.map(String::toUpperCase)
.collect(Collectors.joining());
You would be surprised and disappointed if the string joined did not reflect the
ordering of stringList. You can think of encounter order as the property by which
a stream preserves the “position” of elements passing through it. In this case because
List has an encounter order (data structures can also have this property), so too does
the stream, and the spatial arrangement of the elements is preserved into the terminal
operation. For example, if stringList has the value ["a", "B", "c"], then joined
is guaranteed to be "ABC", not "BAC" or "CAB".
By contrast, if the stream source has no encounter order—if it is a HashSet, say—
then there is no spatial order to be preserved: the stream source has no encounter order
and neither has the stream.4 Since the stream has been explicitly identified as paral-
lel, we must assume that the chunks of the collection may be processed by different
threads on different processors. There is no guarantee that the results of processing,
used by the joining method as soon as they are available, will arrive in the order in
which processing began. If your subsequent use of the concatenated string doesn’t
require ordering—say, for example, that you will use it only to extract the occurrence
count of each character—then to specify it may hurt performance and will be confus-
ingly at odds with the purpose of the program.
4 Note that although you might expect all Set implementations to be intrinsically unordered, encounter
order for a collection actually depends on whether the specification defines an iteration order: among the
Collections Framework implementations of Set, that is the case for LinkedHashSet and for the implemen-
tations of the Set subinterfaces SortedSet and NavigableSet.
You should not need to understand the details of the stream class implementations to
apply this general rule. For example, you might not expect that unordering a stream
would improve performance in a program like this:
long distinctCount = stringList.parallelStream()
.unordered()
.distinct()
.count();
When a pipeline is executed in parallel, by contrast, you can never predict the ordering of side-effects:
AtomicInteger counter = new AtomicInteger(1);
IntStream.rangeClosed(1,5).parallel()
.mapToObj(i -> i + ":" + counter.getAndAdd(1) + " ")
.forEachOrdered(System.out::print);
But notice that in neither case can you depend on the order of side-effects.
The performance of these two programs is compared for different stream lengths in
Figure 6-4, which includes, for reference, a similar comparison for iterative programs.
As expected, the code without boxing easily outperforms the boxed code; for large
data sets the speedup approaches an order of magnitude.
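The two programs compared in Figure 6-4 are not reproduced on this page; sketches of boxed and unboxed variants of such a computation (integerList as before) might look like this:

    // Boxed: every element is handled as an Integer object, with
    // unboxing and reboxing at each addition
    int boxedSum = integerList.stream()
            .reduce(0, (a, b) -> a + b);

    // Unboxed: elements are processed as primitive ints throughout
    int primitiveSum = integerList.stream()
            .mapToInt(Integer::intValue)
            .sum();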
Stream-bearing I/O methods and others like iterate can sometimes benefit from
parallelization, however, if an extra processing step is inserted, pre-buffering their
output in an in-memory data structure, as the discussion of BufferedReader (p. 115)
suggested. Generalizing, this leads to the conclusion that programs using the Stream
API will gain most from parallel speedup when their data source is an in-memory,
random-access structure lending itself to efficient splitting. (Of course, that does not
prevent programs with other data sources benefiting from the Stream API in other
ways.)
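A sketch of that pre-buffering idea (path and the filter predicate are illustrative; toList is statically imported from Collectors):

    try (BufferedReader reader = Files.newBufferedReader(path)) {
        // Sequential, I/O-bound step: buffer the lines in memory
        List<String> lines = reader.lines().collect(toList());
        // Parallel, CPU-bound step over a random-access structure
        long matches = lines.parallelStream()
                .filter(line -> line.contains("ERROR"))
                .count();
        System.out.println(matches);
    }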
A realistically sized example that illustrates this is the program of Chapter 5 simu-
lating grep -b. Reading a sequentially accessed file into a MappedByteBuffer brings
the data into a highly suitable form for parallelization. An experiment to measure
the performance of this program (http://git.io/-CziKQ) compares the speed of split-
ting a MappedByteBuffer into lines by three different algorithms: iterative, stream
sequential, and stream parallel. The overhead of setting up the MappedByteBuffer
is excluded from the measurements in order to focus on the efficiency of the split-
ting algorithm. The results are relatively independent of the data set size: sequential
stream processing is between 1.9 and 2.0 times as fast as iterative processing for files
of between 10,000 and a million lines; over the same range, parallel processing is
between 5.2 and 5.4 times as fast as iterative processing. This is unsurprising, given
the efficient splitting algorithm developed in §5.4.
In fact, Figure 6-5 still does not tell the whole story. Sequential groupingBy cre-
ates new objects, so garbage collection becomes a factor in performance. Parallel GC
biases the experiment in favor of sequential groupingBy because the cores unused
by the sequential operation are available to run GC in parallel, hiding its cost. Run-
ning the same test with four measurement threads (supplying the option -t 4 to JMH)
results in a break-even point below 10,000 elements.
• This problem is an example of a prefix sum, in which the value of every element
depends on the values of the preceding ones. In a naïve parallel algorithm for
prefix sum, like this one, the total cost of the combine operations (that is,
the total delay imposed by them) is proportional to the size of the input data
set, regardless of the level of parallelism in use.5 The gravity of this problem
depends on the cost of the combine operation; in this case, streaming the
right-hand Deque into an intermediate List imposes an overhead that reduces
speedup, for an intermediate workload, by about 20 percent compared to a
simple iterative updating of the right-hand Deque before it is merged with the
left-hand one.
• The finisher accumulates the results into a ConcurrentHashMap, whose initial
size is currently left at the default value (16). For a realistically sized data set,
creating the map with a more appropriate initial size avoids the cost of repeated
resizing during accumulation.
5 Java 8 provided java.util.Arrays with various overloads of a new method parallelPrefix for
computing prefix sums efficiently, but this innovation has not yet reached the Stream API.
When these three changes are applied (http://git.io/wZe_tg), the parallel stream
program shows a speedup of 2.5 (for four cores) over the sequential version, for a data
set of a million elements, each requiring 200 JMH tokens to process.
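As footnote 5 notes, Arrays.parallelPrefix computes prefix sums efficiently outside the Stream API; a minimal sketch:

    import java.util.Arrays;

    public class PrefixSumDemo {
        public static void main(String[] args) {
            long[] values = {1, 2, 3, 4, 5};
            // In-place parallel prefix sum: values becomes {1, 3, 6, 10, 15}
            Arrays.parallelPrefix(values, Long::sum);
            System.out.println(Arrays.toString(values));
        }
    }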
6.9 Conclusion
The last example emphasizes again that the parallelization mechanism of the Stream
API is by no means bound to produce performance improvement. To achieve that, your
application must fall into the focus area for fork/join—CPU-bound processing of in-
memory, random-access data—and must satisfy the other conditions discussed in this
chapter, including the requirements for sufficiently high data set size and per-element
processing cost. Fork/join parallelism is no silver bullet; the good news, however, is
that your investment in writing parallel-ready code will not be wasted, whatever the
outcome of your performance analysis. Your code will be better and, when advances
in technology change the cost-benefit balance of going parallel, it will—as the name
suggests—be ready.
CHAPTER 7
API Evolution with Default Methods
Why did interfaces need to change, and what are the consequences? The short
answer to the first question is: to support API evolution. The need for this has been felt for a long
time, but became pressing with the requirement for stream support in collections.
(The longer answer is that, once introduced, default methods have other uses as well,
as we shall see.) The Java Collections Framework—like its extensions, for example
Google Guava—is strongly interface-based: that is, the capabilities of a collection are
(with some exceptions) defined by the Javadoc contract of its interface. For the existing
collections library to be enhanced to support streams, interface methods like Col-
lection.stream were needed. The alternative—complete replacement of the Java
Collections Framework—would have presented extremely unattractive problems of
compatibility and code maintenance.
So it was decided to change the situation in which adding methods to existing
interfaces created unmanageable compatibility problems. Reviewing these problems
will help us understand the choices that the designers made in implementing that deci-
sion. Consider, for example, adding an abstract method stream to Collection. (But
note that the problem described here applies to all published interfaces, not just those
in the platform library.) Prior to Java 8, Collection declared only abstract methods:
public interface Collection<E> {
int size();
boolean isEmpty();
// 11 more abstract method declarations
}
and a class implemented the interface only if it overrode every abstract method dec-
laration with a concrete method:
class CustomCollection<E> implements Collection<E> {
public int size() { ... }
public boolean isEmpty() { ... }
// 11 more concrete method declarations
}
Now consider the problem of making collections into stream sources. The inter-
face Collection had to be extended with a method stream; without default methods,
that could be done only by adding an abstract method declaration:
public interface Collection<E> {
int size();
boolean isEmpty();
// 11 more abstract methods
Stream<E> stream(); // abstract method - not added in reality
}
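Adding an abstract method in this way would have broken every existing implementation, which is why stream was instead added carrying its own implementation. A sketch close to the actual Java 8 declaration, which builds the stream from the collection’s spliterator (imports of java.util.stream.Stream and StreamSupport assumed):

    public interface Collection<E> extends Iterable<E> {
        int size();
        boolean isEmpty();
        // 11 more abstract method declarations

        default Stream<E> stream() {
            return StreamSupport.stream(spliterator(), false);
        }
    }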
For a client, the interface still has the same meaning: it defines a type and lists
the declarations of the methods that a client can call on that type. For an implemen-
tation, the main element of its relationship with the interface is also unchanged: it is
still required to override the abstract method declarations with concrete ones. A new
element is added to the interface-implementation relationship, however: the imple-
mentation can now choose to override default method declarations—because these
are virtual—and replace or modify their behaviors, just as in traditional Java instance
method overriding.
So now an interface can be extended to allow client code to call new methods
without requiring any immediate change in existing implementations. This removes a
serious block on API evolution, present ever since JDK1.0. It is worth asking why this
block was allowed to remain in place for so long.
Aside from the considerable practical difficulties in making a change of this size,
attitudes to multiple inheritance had to change. Aversion to multiple inheritance was
a strong element in the original design of Java because of the difficulties that it had
brought to other languages, most prominently to C++ in the form of the “diamond
problem” (p. 169). Even so, multiple inheritance of interfaces has always been con-
sidered acceptable because interfaces were essentially type definitions, and multiple
inheritance of types did not involve the same problems. Introducing default methods
extends the role of interfaces to provide behavior, so multiple inheritance of behavior
would now have to be allowed. The Java 8 designers were able to devise rules that
make the problems with behavior inheritance manageable; the classic problems of
multiple inheritance are associated with inheritance of state, which Java does not and
never will support.
To sort a list into reverse natural order before Java 8, you had to call two static
Collections methods:
Collections.sort(integerList, Collections.reverseOrder());
Now, however, the default method List.sort (together with a static in-
terface method Comparator.reverseOrder) allows this code to be rewritten
in a more readable way:
integerList.sort(Comparator.reverseOrder());
Now suppose that we want to expose a method on BookIntf that will return the
names of its authors concatenated into a single string. That would fall into the category
of convenience methods described in the previous section:
interface BookIntf {
    // assumes a static import: import static java.util.stream.Collectors.joining;
    default String getAuthorsAsString() {
        return getAuthors().stream().collect(joining(","));
    }
    ...
}
The need for instance state, as in this example, is the main reason for continuing
to use abstract classes. There are other limitations to default methods that mean that
abstract classes are still needed: default methods can only be implemented in terms of
methods on the same interface (together with accessible static methods on any type).
Further, abstract classes can declare protected state and methods to share with their
subclasses; this option is not available to interfaces, all of whose declarations are
automatically public.
Some debate took place over whether the default keyword should be mandatory, given that it is not
needed to make the syntax unambiguous—an interface method declaration with a
block body can only be a default method. One advantage of making it mandatory is
that it immediately prompts a reader’s understanding, in the same way as the modifier
abstract (also not strictly necessary) does for abstract methods and classes.
The main syntactic difference between default methods and concrete instance
methods is in the modifiers that they are allowed—or, in the case of default,
required—to have. Obviously, default methods may not be declared abstract or
static, since these keywords distinguish the other kinds of interface methods (we
will explore static interface methods in §7.5). They may not be declared final, be-
cause they must always be overridable by instance methods, as we shall see in the next
section. Like all interface methods, they may be declared public but are implicitly
public anyway; no other accessibility is possible.
The keyword this has its usual meaning; it refers to the current ob-
ject, referenced using the type of the interface. For example, the method
Iterable.forEach has the following implementation (slightly simplified
for presentation):
default void forEach(Consumer<T> action) {
for (T t : this) {
action.accept(t);
}
}
The keyword super is also allowed, but only when it is qualified by the name of
a superinterface, as the next section explains.
Interfaces are not allowed to declare a default method with the same sig-
nature as any of the methods of Object. So, for example, you cannot redefine
Object.toString in an interface. This is in line with the principles for default method
inheritance that we will explore in the next section: the most important of these prin-
ciples dictates that default methods can never override instance methods, whether
inherited or not.
A class can inherit competing method declarations from both superclasses and superinter-
faces. The language specification rules for method call resolution are complex, but
they are carefully designed so that they can be understood by reference to two simple
principles.
The first principle ensures that instance methods are chosen in preference to de-
fault methods. This is sometimes stated as “classes win over interfaces.” For example,
in the following code FooBar inherits the method hello from both the interface Foo
and the superclass Bar:
// competing instance and default method declarations of hello()
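The listing itself is not reproduced at this point; a minimal reconstruction consistent with the description and the output below:

    interface Foo {
        default String hello() { return "Hello from Foo"; }
    }

    class Bar {
        public String hello() { return "Hello from Bar"; }
    }

    class FooBar extends Bar implements Foo {
        public static void main(String[] args) {
            System.out.println(new FooBar().hello());
        }
    }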
When FooBar.main is run, the output is Hello from Bar. This principle holds
whether the instance method is declared within the class or inherited from a super-
class, and whether the instance method is abstract or concrete. The motivation for this
rule is to prevent behavioral incompatibility: that is, the addition of a default method
resulting in a change in the behavior of an implementing class. If classes always win,
then a class calling a method inherited from a superclass will continue to call that
method even when one of the interfaces it implements introduces a matching method
declaration of its own.
The second principle ensures that if more than one competing default method is in-
herited by a class, the non-overridden default method is selected. By “non-overridden
method” (this is not a standard term) we mean a default method not overridden by any
other that is also inherited by the class. For example:
// competing default method declarations of hello():
// Bar.hello() overrides Foo.hello(), so is a non-overridden method
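The listings are again not reproduced; a reconstruction of the two cases consistent with the text:

    // Case 1: Bar extends Foo, so Bar.hello() is the non-overridden
    // method and is selected
    interface Foo {
        default String hello() { return "Hello from Foo"; }
    }

    interface Bar extends Foo {
        default String hello() { return "Hello from Bar"; }
    }

    class FooBar implements Foo, Bar {
        public static void main(String[] args) {
            System.out.println(new FooBar().hello()); // Hello from Bar
        }
    }

    // Case 2: if Foo and Bar were instead unrelated interfaces, each
    // declaring its own default hello(), then FooBar would not compile;
    // javac reports an error along the lines of:
    //   class FooBar inherits unrelated defaults for hello()
    //   from types Foo and Bar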
That is a reasonable response; the compiler has no basis for choosing between the
two inherited methods, so prompts you to disambiguate the call. You can do this by
making FooBar itself override hello, using a syntactic form (already present in Java,
but until now only used for a different purpose in inner classes) provided to allow
selection of one of the competing methods:
class FooBar implements Foo, Bar {
public String hello() {
return Bar.super.hello();
}
}
The same syntax can also be used in the body of an interface default method. Note
that it can only be used to resolve a conflict, not to override either of the two main
principles. So you cannot use it to select a method that is not non-overridden:
// Foo.hello() is overridden by Bar.hello(), which is inherited by FooBar
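For example, with Bar extends Foo as above:

    class FooBar implements Foo, Bar {
        public String hello() {
            // return Foo.super.hello(); // compile-time error: Foo.hello()
            //                           // is overridden by Bar.hello()
            return Bar.super.hello();    // OK: the non-overridden method
        }
    }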
Even if FooBar implements both Foo and Bar directly, Foo.hello is not a non-
overridden method and cannot be selected using the super syntax.
How do these principles help to resolve the “diamond problem”? This is the situ-
ation in which a class inherits a declaration by two different routes:
interface Apex { default String hello() { return "Apex"; } }
interface Foo extends Apex {}
interface Bar extends Apex {}
The problem gets its name from the shape of the class diagram:
            Apex
           /    \
         Foo    Bar
           \    /
           FooBar
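The listing that the next sentence refers to is not reproduced; a reconstruction consistent with the text, in which Foo overrides the default inherited from Apex:

    interface Foo extends Apex {
        default String hello() { return "Hello from Foo"; }
    }

    class FooBar implements Foo, Bar {}

    // elsewhere:
    Bar bar = new FooBar();
    System.out.println(bar.hello());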
This code would produce the output Hello from Foo. The static type of bar is unimportant;
what counts is that it refers to an instance of FooBar, whose non-overridden method
is inherited from Foo.
If both Foo and Bar declare default implementations, then they conflict, and
FooBar must provide an overriding declaration.
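The next example concerns behavioral compatibility. The original listing is not reproduced; a reconstruction consistent with the discussion—an interface Intf, an implementation Impl declaring String hello(long), and a Client calling hello(3):

    interface Intf {
        // initially declares no hello method
    }

    class Impl implements Intf {
        public String hello(long l) { return "Hello from Impl"; }
    }

    class Client {
        public static void main(String[] args) {
            System.out.println(new Impl().hello(3));
        }
    }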
In this version, running Client.main obviously results in the output Hello from
Impl. We now consider two different changes to Intf, each of which adds a method
to Intf that almost matches the declaration of Impl.hello. For example, Intf could
declare a version of hello with the same signature but an incompatible return type:
interface Intf {
default void hello(long l) { }
...
}
Between overriding methods, classes always win; but this is not an overriding
method. Compiling Impl against the new version of Intf produces this message:
Error: hello(long) in Impl cannot implement hello(long) in Intf
return type java.lang.String is not compatible with void
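The second change (its listing is also not reproduced) would instead add an overload taking int; a sketch:

    interface Intf {
        default String hello(int i) { return "Hello from Intf"; }
        // ...
    }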
According to the rules for method overloading, this declares a new overload of
hello, which, since it does not override the existing one, is inherited by Impl. So
both the new and the old method overloads are available for the call of hello(3)
from Client and, when Client is compiled, the most specific one is chosen, as de-
fined by the Java Language Specification (§15.12). Since 3 is an int, not a long, the
newly inherited overload is more specific, and the output will now be Hello from
Intf. The change in behavior takes place without any change in Impl! This example
illustrates the difficulty of making fully compatible language changes in the face of
already complex semantic rules. (Method overloading is notoriously difficult in this
respect, as we also saw in §2.8.)
A different kind of behavioral incompatibility is unrelated to syntax problems but is
inherent to dynamic method dispatch. A newly-introduced supertype method may be
unable to respect the invariants of an implementing class, because it has no knowledge
of them. For a real-life case, consider Map.putIfAbsent, introduced in Java 8:
default V putIfAbsent(K key, V value) {
V v = get(key);
if (v == null) {
v = put(key, value);
}
return v;
}
This method, if not overridden, will destroy the thread safety of any implementing
Map: between the time at which a thread evaluates the test and the time at which it
executes the action, the value of v could have been set by another thread. The current
thread would then overwrite that value, contrary to the specification of the method.
There is no true solution to this problem; in the worst case, as here, all implementations
must be inspected to ensure that they override newly introduced default methods.
Notice that the problems of this section aren’t new: any of them could occur with
class inheritance. What makes them more serious is that, whereas it is expected and
understood that changes in a class hierarchy are liable to cause problems like these, it
is new for Java that interface changes can have these effects. Fortunately, they do not
often occur in practice.
With the perspective of the preceding section of this chapter, however, we can see
the importance of minimizing compatibility problems. Actually, these can be elimi-
nated altogether for static interface methods by restricting the syntax that can be used
for referencing them to the specific form DeclaringInterface.MethodName. (Ob-
viously, this solution wasn’t available for default methods, since it’s not compatible
with virtual method dispatch.) So one difference from static class methods is that static
interface methods are not inherited:
interface Bar {static void barHello() {} }
class Foo { static void fooHello() {} }
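A sketch building on these declarations (FooBarClient is an illustrative name):

    class FooBarClient extends Foo implements Bar {
        void demo() {
            fooHello();     // OK: static class methods are inherited
            // barHello();  // compile-time error: static interface
            //              // methods are not inherited
            Bar.barHello(); // OK: qualified by the declaring interface
        }
    }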
For the same reason, you cannot refer to static interface methods by the syntax
ObjectReference.MethodName.1 In other respects, static interface methods are de-
clared in the same way, and have essentially the same properties as static class meth-
ods. Like other interface methods, they may be declared public but are implicitly
public anyway. It is not permitted to declare a static interface method final, since
that would be a meaningless modifier for a method that cannot be inherited anyway.
1 The two differences with static class methods—preventing inheritance, and disallowing calls through
object references—both follow from restricting references to the form DeclaringInterface.MethodName.
7.6 Conclusion
It is no easy task to change a programming language that has been in widespread
use for nearly two decades. When the feature to be changed is as central to the
language as interfaces are to Java, then the problem is even more difficult. Despite
the problems described in this chapter, the addition of behavior to Java interfaces has,
overall, caused remarkably few difficulties.
This is partly because the features have been strictly tailored to the purpose of
enabling API evolution. Some of the controversy around the introduction of default
methods was based on expectations that they would reproduce features like traits or
mixins in other languages. The absence of state in interfaces prevents that, however.
Similarly, as we have seen in this chapter, expectations that interfaces could now re-
place abstract classes, or make utility classes redundant, are exaggerated. But they do
fulfil the purpose of their design, to enable API evolution; the opportunities that this
opens up, initially benefiting the Stream API, will create possibilities for maintaining
and enhancing other APIs far beyond what has been feasible until now.
Conclusion
I should like to end this book on a personal note. I have long since lost count of
the technologies that I have seen, over four decades, announced as the future of
the industry, only to be forgotten within a year or two. A much smaller number
have been slow burners: technologies that have grown a faithful following without
ever reaching the commercial mainstream. In this category, one that stands out is
functional programming, which has attracted some of the best minds and produced
some of the best ideas in software development while remaining a minority interest.
For myself, although I toyed with functional programming in the 1980s, I did not
become a functional programmer. Instead, I followed the industry mainstream towards
object-oriented programming, and eventually to Java. Nearly 20 years later, Java is, by
most measures, the most popular programming language in the world. But, during
the last few years, I have not felt complacent about the good fortune of my choice; a
programmer who knows only Java would have been missing out on many useful new
programming techniques that were appearing in rival languages. Many of these—
for example, lazy evaluation, closures, and pattern matching—have their origins in
functional languages. And this trend continues: functional programmers are optimistic
about their future, above all because trends in hardware manufacturing technology
and costs mean that massively concurrent systems are the future. Data immutability
will be the key to reasoning about such systems.
Java is not about to become a functional language, but Java programmers should
be able to take advantage of some of the insights that functional programming has de-
veloped. The changes of Java 8 are a first step in that direction. They bring immutability
and lazy evaluation into practical Java programming and so address part of the great
and ongoing challenge of partitioning tasks over multiple processors. Despite the com-
mitment to backward compatibility that makes any change to a 20-year-old language
so difficult, the Java design team has shown impressive ingenuity in integrating these
new features into the existing language.
Maurice Naftalin
Pune, August 2014