Second Part of This Tutorial on the Java 8 Stream API: map, mapToInt, mapToLong, mapToDouble
In the first part of this tutorial on the Java 8 Stream API, we covered what streams are and
some of their most common operations.
Without further ado, let's continue with the methods used to program streams in a functional
style. After that, we'll take a look at parallel streams.
Map
map() is used to transform the value or the type of the elements of a stream:
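<R> Stream<R> map(Function<? super T, ? extends R> mapper)

// Primitive versions:
IntStream mapToInt(ToIntFunction<? super T> mapper)
LongStream mapToLong(ToLongFunction<? super T> mapper)
DoubleStream mapToDouble(ToDoubleFunction<? super T> mapper)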
As you can see, map() takes a Function to convert the elements of a stream of type T to type
R, returning a stream of that type R:
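For example, here's a minimal sketch (the strings are illustrative) that maps a Stream<String> to a Stream<Integer>:

Stream.of("a", "ab", "abc")
    .map(String::length) // String (type T) to Integer (type R)
    .forEach(System.out::println);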
Will output:
1
2
3
FlatMap
flatMap() is used to flatten (or combine) the elements of a stream into one (new) stream:
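<R> Stream<R> flatMap(
    Function<? super T, ? extends Stream<? extends R>> mapper)

// Primitive versions:
IntStream flatMapToInt(Function<? super T, ? extends IntStream> mapper)
LongStream flatMapToLong(Function<? super T, ? extends LongStream> mapper)
DoubleStream flatMapToDouble(Function<? super T, ? extends DoubleStream> mapper)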
From its signature (and the signature of the primitive versions) we can see that, in contrast to
map() which returns a single value, flatMap() must return a Stream. If flatMap() maps to
null, the return value will be an empty stream, not null itself.
Let's see how this works. Suppose we have a stream comprising lists of characters:
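For example (the characters here are illustrative):

Stream<List<Character>> stream =
    Stream.of(
        Arrays.asList('a', 'b'),
        Arrays.asList('c', 'd')
    );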
We want to convert all the characters to their int representation. Notice in the code
below that we can't use map() anymore; c represents an object of type List<Character>,
not Character:
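Something like this won't even compile:

stream
    .map(c -> (int) c) // compile error: c is a List<Character>, not a Character
    .forEach(i -> System.out.format("%d ", i));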
Instead, we need to get the elements of the lists into one stream and then convert each
character to an int. Fortunately, we have flatMap() to combine the list elements into a single
Stream object:
stream
.flatMap(l -> l.stream())
.map(c -> (int)c)
.forEach(i -> System.out.format("%d ", i));
Using peek() (which performs the provided action on each element as it is consumed and
returns a stream with the same elements) after flatMap() may clarify how the elements are
processed:
stream
.flatMap(l -> l.stream())
.peek(System.out::print)
.map(c -> (int)c)
.forEach(i -> System.out.format("%d ", i));
As you can see from the output, the stream returned from flatMap() is passed through the
pipeline, as if we were working with a stream of single elements rather than a stream of lists
of elements:
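With the example stream above, the output would be:

a97 b98 c99 d100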
Reduction
A reduction is an operation that takes many elements and combines them into a single
value or object, applying a combining operation repeatedly.
Some examples of reductions include summing N elements, finding the maximum element of
N numbers, or counting elements.
In the following example, we use a for loop to reduce an array of numbers to their sum:
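Here's a minimal sketch (the numbers are illustrative):

int[] numbers = {1, 2, 3, 4, 5};
int sum = 0;
for (int n : numbers) {
    sum += n; // apply the operation once per element
}
System.out.println(sum); // 15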
Of course, making reductions with streams instead of loops has benefits, such as easier
parallelization and improved readability.
The Stream interface has two methods for reduction:
1. collect()
2. reduce()
We can implement reductions with both of these methods, but collect() helps us implement
a type of reduction called mutable reduction, where a container (like a Collection) is used
to accumulate the result of the operation.
reduce() has three versions:

Optional<T> reduce(BinaryOperator<T> accumulator)

T reduce(T identity,
         BinaryOperator<T> accumulator)

<U> U reduce(U identity,
             BiFunction<U, ? super T, U> accumulator,
             BinaryOperator<U> combiner)
Let's start with the version that takes one argument. This is equivalent to:
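// Pseudocode adapted from the Javadoc of Stream.reduce:
boolean foundAny = false;
T result = null;
for (T element : stream) {
    if (!foundAny) {
        foundAny = true;
        result = element;
    } else {
        result = accumulator.apply(result, element);
    }
}
return foundAny ? Optional.of(result) : Optional.empty();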
This code just applies a function for each element, accumulating the result and returning an
Optional wrapping that result, or an empty Optional if there were no elements.
Let's see a concrete example with a sum, which we just saw is a reduce operation (a minimal sketch with illustrative numbers):
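OptionalInt total =
    IntStream.of(1, 2, 3, 4, 5)
        .reduce((a, b) -> a + b);
total.ifPresent(System.out::println); // 15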
Notice how the primitive version of Stream uses the primitive version of Optional.
However, what if you need to have an initial value? For cases like that, we have the version
that takes two arguments:
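An illustrative sketch, assuming an initial value of 4 (chosen to match the discussion below):

int total =
    IntStream.of(1, 2, 3, 4, 5)
        .reduce(4, (a, b) -> a + b);
System.out.println(total); // 19, since the initial value 4 is included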
The first argument is the initial value, and it is called the identity because, strictly
speaking, this value must be an identity for the accumulator function. In other words, for
each value v, accumulator.apply(identity, v) must be equal to v. This version is equivalent to:
T result = identity;
for (T element : stream) {
result = accumulator.apply(result, element);
}
return result;
Notice that this version does not return an Optional object because if the stream is empty,
the identity value is returned.
However, notice that in the example above, the initial value 4 cannot be considered an
identity because, for instance, accumulator.apply(4, 1) is 5, which is not equal to 1.
This can bring some problems when working with parallel streams, which we'll review in a
few moments.
Now, notice that with these versions, you take elements of type T and return a reduced value
of type T as well.
However, if you want to return a reduced value of a different type, you have to use the
three-argument version of reduce(), which is equivalent to the following (the combiner is
left out; it only comes into play with parallel streams):
U result = identity;
for (T element : stream) {
result = accumulator.apply(result, element);
}
return result;
Consider, for example, that we want to get the sum of the lengths of all the strings in a
stream, so we take strings (type T), and we want an integer result (type U):
int length =
Stream.of("Parallel", "streams", "are", "great")
.reduce(0,
(accumInt, str) ->
accumInt + str.length(), //accumulator
(accumInt1, accumInt2) ->
accumInt1 + accumInt2);//combiner
Here's the same example with explicit lambda parameter types, which makes T and U visible:

int length =
Stream.of("Parallel", "streams", "are", "great")
.reduce(0,
(Integer accumInt, String str) ->
accumInt + str.length(), //accumulator
(Integer accumInt1, Integer accumInt2) ->
accumInt1 + accumInt2);//combiner
Since the accumulator function adds a mapping (transformation) step to the accumulation,
this version of reduce() can be written as a combination of map() and the other versions of
the reduce() method (you may know this as the map-reduce pattern):
int length =
Stream.of("Parallel", "streams", "are", "great")
.mapToInt(s -> s.length())
.reduce(0,
(sum, strLength) ->
sum + strLength);
Or simply:
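int length =
    Stream.of("Parallel", "streams", "are", "great")
        .mapToInt(String::length)
        .sum();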
In fact, the calculation operations that we learned about in the first part are implemented as
reduce operations under the hood:
average
count
max
min
sum
Also, notice that if the accumulator returns a value of the same type as the elements, the
combiner function is no longer necessary (it turns out to be the same as the accumulator
function). So, in this case, it's better to use the two-argument version.
Remember that collect() performs a mutable reduction on the elements of a stream, which
means that it uses a mutable object for accumulating, like a Collection or a
StringBuilder. In contrast, reduce() combines two elements to produce a new one and
represents an immutable reduction.
However, let's start with the version that takes three arguments, as it's similar to the
reduce() version that also takes three arguments:
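<R> R collect(Supplier<R> supplier,
              BiConsumer<R, ? super T> accumulator,
              BiConsumer<R, R> combiner)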
As you can see from its signature, first, it takes a Supplier that returns the object that will be
used and returned as a container (accumulator).
The second parameter is an accumulator function, which takes the container and the element
to be added to it.
The third parameter is the combiner function, which merges the intermediate results into the
final one (useful when working with parallel streams).
R result = supplier.get();
for (T element : stream) {
accumulator.accept(result, element);
}
return result;
For example, if we want to "reduce" or "collect" all the elements of a stream into a List, use
the following algorithm:
List<Integer> list =
Stream.of(1, 2, 3, 4, 5)
.collect(
() -> new ArrayList<>(),// Creating the container
(l, i) -> l.add(i), // Adding an element
(l1, l2) -> l1.addAll(l2) // Combining elements
);
Which can be written more concisely using method references:

List<Integer> list =
Stream.of(1, 2, 3, 4, 5)
.collect(
ArrayList::new,
ArrayList::add,
ArrayList::addAll
);
Collectors
The previous version of collect() is useful for learning how collectors work, but in
practice, it's better to use the other version, which takes a Collector:
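<R, A> R collect(Collector<? super T, A, R> collector)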
The Collectors class provides ready-made collectors for the most common operations
(including the calculation methods, since those can be implemented as reductions). For
example, we can rewrite this:
List<Integer> list =
Stream.of(1, 2, 3, 4, 5)
.collect(
ArrayList::new,
ArrayList::add,
ArrayList::addAll
);
As:
List<Integer> list =
Stream.of(1, 2, 3, 4, 5)
.collect(Collectors.toList()); // [1, 2, 3, 4, 5]
Since all these methods are static, we can use static imports.
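For example (a sketch using Collectors.toList() via a static import):

import static java.util.stream.Collectors.toList;
...
List<Integer> list =
    Stream.of(1, 2, 3, 4, 5)
        .collect(toList());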
If we are working with streams of strings, we can join all the elements into one String with:
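A minimal sketch using Collectors.joining(), statically imported (the delimiter argument is optional):

String joined =
    Stream.of("a", "b", "c")
        .collect(joining(", ")); // "a, b, c"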
The calculation methods are easy to use. Except for counting(), they either take a Function
that produces the values to apply the operation to, or (in the case of maxBy() and minBy())
a Comparator to produce the result:
Stream.of(1, 2, 3)
.collect(maxBy(Comparator.naturalOrder()))
.ifPresent(System.out::println); // 3
The Collectors class also provides two methods to group the elements of a stream into
lists, in a style similar to SQL's GROUP BY.
The first method is groupingBy() and it has three versions. This is the first one:
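static <T, K> Collector<T, ?, Map<K, List<T>>>
    groupingBy(Function<? super T, ? extends K> classifier)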
It takes a Function that classifies elements of type T, groups them into lists, and returns
the result in a Map where the keys (of type K) are the values returned by the Function.
For example, if we want to group a stream of numbers by the range they belong to (tens,
twenties, etc.), we can do it with something like this:
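A sketch using the same numbers as the iterative version shown below:

Map<Integer, List<Integer>> map =
    Stream.of(2, 34, 54, 23, 33, 20, 59, 11, 19, 37)
        .collect(Collectors.groupingBy(i -> i / 10 * 10));
System.out.println(map);
// {0=[2], 50=[54, 59], 20=[23, 20], 10=[11, 19], 30=[34, 33, 37]}
// (the key order of the HashMap is not guaranteed)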
The moment you compare this code with the iterative method (with a for loop), you realize
the power of streams and collect(). Just look at how many lines of code are used in the
traditional implementation.
List<Integer> numbers =
    Arrays.asList(2, 34, 54, 23, 33, 20, 59, 11, 19, 37);
Map<Integer, List<Integer>> map = new HashMap<>();
for (Integer i : numbers) {
    int key = i / 10 * 10;
    List<Integer> list = map.get(key);
    if (list == null) {
        list = new ArrayList<>();
        map.put(key, list);
    }
    list.add(i);
}
The second version of groupingBy() takes a downstream collector as an additional argument,
so we can use any collector here; for instance, to count the elements in each group of the
previous example:
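A sketch, assuming the same numbers, with counting() as the downstream collector:

Map<Integer, Long> map =
    Stream.of(2, 34, 54, 23, 33, 20, 59, 11, 19, 37)
        .collect(Collectors.groupingBy(
            i -> i / 10 * 10,
            Collectors.counting()));
System.out.println(map);
// {0=1, 50=2, 20=2, 10=2, 30=3}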
Notice how the type of the values of the Map changes to reflect the type returned by the
downstream collector, counting().
We can even use another groupingBy() to classify the elements in a second level. For
instance, instead of counting, we can further classify the elements as even or odd:
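A sketch using an even/odd classifier for the second level:

Map<Integer, Map<String, List<Integer>>> map =
    Stream.of(2, 34, 54, 23, 33, 20, 59, 11, 19, 37)
        .collect(Collectors.groupingBy(
            i -> i / 10 * 10,
            Collectors.groupingBy(
                i -> i % 2 == 0 ? "EVEN" : "ODD")));
System.out.println(map);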
{
0 = {EVEN=[2]},
50 = {EVEN=[54], ODD=[59]},
20 = {EVEN=[20], ODD=[23]},
10 = {ODD=[11, 19]},
30 = {EVEN=[34], ODD=[33, 37]}
}
The keys of the high-level map are of type Integer because that's what the first
groupingBy() classifier returns.
The type of the values of the high-level map changed (again) to reflect the type returned by
the downstream collector, groupingBy().
In this case, a String is returned; this will be the type of the keys of the second-level map,
and since we are working with an Integer Stream, the values have a type of
List<Integer>.
Seeing the output of these examples, you may be wondering: is there a way to order the
results?
Well, TreeMap is a Map implementation that keeps its keys sorted. Fortunately, the third
version of groupingBy() has a Supplier argument that lets us choose the type of the
resulting Map:
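For example, passing TreeMap::new as the Supplier (a sketch based on the previous example):

Map<Integer, Map<String, List<Integer>>> map =
    Stream.of(2, 34, 54, 23, 33, 20, 59, 11, 19, 37)
        .collect(Collectors.groupingBy(
            i -> i / 10 * 10,
            TreeMap::new,
            Collectors.groupingBy(
                i -> i % 2 == 0 ? "EVEN" : "ODD")));
System.out.println(map);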
{
0 = {EVEN=[2]},
10 = {ODD=[11, 19]},
20 = {EVEN=[20], ODD=[23]},
30 = {EVEN=[34], ODD=[33, 37]},
50 = {EVEN=[54], ODD=[59]}
}
partitioningBy()
The second method for grouping is partitioningBy().
The difference with groupingBy() is that partitioningBy() will return a Map with a
Boolean as the key type, which means there are only two groups, one for true and one for
false.
There are two versions of this method. The first one is:
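static <T> Collector<T, ?, Map<Boolean, List<T>>>
    partitioningBy(Predicate<? super T> predicate)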
It partitions the elements according to a Predicate and organizes them into a Map<Boolean,
List<T>>.
For example, if we want to partition a stream of numbers into the ones that are less than
50 and the ones that aren't, we can do it this way:
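A minimal sketch (the numbers are illustrative):

Map<Boolean, List<Integer>> map =
    Stream.of(45, 9, 65, 77, 12, 89, 31)
        .collect(Collectors.partitioningBy(i -> i < 50));
System.out.println(map);
// {false=[65, 77, 89], true=[45, 9, 12, 31]}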
As you can see, because of the Predicate, the map will always have two elements.
And like groupingBy(), this method has a second version that takes a downstream collector.
For example, if we want to remove duplicates, we just have to collect the elements into a Set
like this:
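A sketch with illustrative numbers that contain duplicates:

Map<Boolean, Set<Integer>> map =
    Stream.of(45, 9, 65, 77, 45, 89, 9)
        .collect(Collectors.partitioningBy(
            i -> i < 50,
            Collectors.toSet()));
System.out.println(map);
// duplicates within each partition are removed; set order may vary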
Unlike groupingBy(), there's no version that allows us to change the type of the Map
returned, but since we only ever need two keys for our groups, that's not a problem.
Parallel Streams
Parallel streams split the stream into multiple parts. Each part is processed by a different
thread at the same time (in parallel).
This means that, by default, the number of threads available to process parallel streams equals
the number of available cores in your machine's processor (CPU).
You can turn a sequential stream into a parallel one with the parallel() method:
Stream<String> parallelStream =
Stream.of("a","b","c").parallel();
You can turn a parallel stream into a sequential one with the sequential() method:
stream
.parallel()
.filter(..)
.sequential()
.forEach(...);
You can check whether a stream is parallel with isParallel():
stream.parallel().isParallel(); // true
And you can turn an ordered stream into an unordered one (or ensure that the stream is
unordered) with unordered():
stream
.parallel()
.unordered()
.collect(...);
But how do parallel streams work? Let's start with the simplest example:
Stream.of("a","b","c","d","e")
.forEach(System.out::print);
Printing a list of elements with a sequential stream will output the expected result:
abcde
Stream.of("a","b","c","d","e")
.parallel()
.forEach(System.out::print);
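Will output something different. A possible output (the order can change with every
execution):

cbade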
Going back to the definition of parallel streams, this output starts making sense. The
differences in output can be attributed to thread scheduling: a different thread may
process a given element each time the code is executed.
Thus, parallel streams are more appropriate for operations where the order of processing
doesn't matter and the operations don't need to keep state (stateless and independent
operations).
In the first part, we mentioned that the findFirst() method returns the first element of a
stream. But when we're using parallel streams, this method has to "know" which element is
the first one:
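For example:

String first =
    Stream.of("a", "b", "c", "d", "e")
        .parallel()
        .findFirst()
        .get();
System.out.println(first);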
The output is always the first element in encounter order:
a
Because of that, if the order doesn't matter, it's better to use findAny() with parallel streams:
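For example:

String any =
    Stream.of("a", "b", "c", "d", "e")
        .parallel()
        .findAny()
        .get();
System.out.println(any);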
This time, the output can be any element of the stream, and it may change between
executions.
Since a parallel stream is processed by multiple cores, it's reasonable to believe that it will be
processed faster than a sequential stream. But as you can see with findFirst(), this is not
always the case.
For example:
Stream<T> distinct()
Stream<T> sorted()
Stream<T> sorted(Comparator<? super T> comparator)
Stream<T> limit(long maxSize)
Stream<T> skip(long n)
The stateful operations above incorporate state from previously processed elements and
usually need to go through the entire stream to produce a result. Thus they work better with
sequential streams since they end up looking through the stream anyway.
But don't assume that performance will be better in every case just because you execute the
stateful operations sequentially first and then turn the stream into a parallel one. It's
even worse to assume that the entire operation can run in parallel, as in the following
example:
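A sketch that reproduces output like the one below (the predicate that drops "d" and the
timing code are assumptions inferred from that output):

long start = System.nanoTime();
Stream.of("a", "b", "c", "d", "e")
    .parallel()
    .filter(s -> {
        System.out.println("Filter:" + s);
        return !"d".equals(s); // "d" never reaches map()
    })
    .map(s -> {
        System.out.println("Map:" + s);
        return s + s;
    })
    .forEach(System.out::println);
long end = System.nanoTime();
System.out.println((end - start) / 1_000_000.0 + " milliseconds");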
Filter:c
Map:c
cc
Filter:a
Map:a
aa
Filter:b
Map:b
bb
Filter:d
Filter:e
Map:e
ee
79.470779 milliseconds
Compare this with the output of the sequential version (just comment out .parallel()):
Filter:a
Map:a
aa
Filter:b
Map:b
bb
Filter:c
Map:c
cc
Filter:d
Filter:e
Map:e
ee
1.554562 milliseconds
But if we have an independent or stateless operation where order doesn't matter, such as
counting the odd numbers in a large range, the parallel version will perform better:
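A sketch (the size of the range is illustrative):

long count =
    IntStream.rangeClosed(1, 100_000_000)
        .parallel()
        .filter(i -> i % 2 != 0)
        .count();
System.out.println(count); // 50000000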
In summary, parallel streams don't always perform better than sequential streams when it
comes to stateful operations, but they usually perform better when ordering is not an issue
and operations are independent and stateless.
That parallel streams process elements independently and that the order of processing
cannot be guaranteed are the most important things you need to know.
In practice, how do you know when to use sequential or parallel streams for better
performance?
For a small set of data, sequential streams are almost always the best choice due to the
overhead of the parallelism. Using parallel streams is simply unnecessary.
Typically avoid using parallel streams with stateful (like sorted()) and order-based
(like findFirst()) operations. Sequential streams do just fine (if not better) in these
cases.
Use parallel streams with operations that are computationally expensive (considering
all the operations in the pipeline).
When in doubt, check the performance with an appropriate benchmark. To
demonstrate, I used an execution time comparison, but that is just one metric; you
may instead care about, say, memory usage.
Finally, avoid mutating state from a stream's operations. If you mutate the state of
variables (especially ones shared by more than one thread), you may run into invalid states.
Consider this example, which implements the factorial of 10 in a very particular way:
class Total {
    public long total = 1;

    public void multiply(long n) { total *= n; }
}
...
Total t = new Total();
LongStream.rangeClosed(1, 10)
    .forEach(t::multiply);
System.out.println(t.total);
Here, we are using a variable to gather the result of the factorial. The output of executing this
snippet of code is:
3628800
However, if we turn the stream into a parallel one:
LongStream.rangeClosed(1, 10)
    .parallel()
    .forEach(t::multiply);
Executing this version may print a different (and incorrect) result each time. The problem
is caused by multiple threads accessing the variable total concurrently. Yes, we can
synchronize the access to this variable, but that defeats the purpose of parallelism.
Remember that reduce() combines the elements of a stream into a single one.
With parallel streams, this method creates intermediate values and then combines them,
avoiding the ordering problem while still allowing streams to be processed in parallel by
eliminating the shared state and keeping it inside the reduction process.
The only requirement is that the applied reducing operation must be associative.
(a op b) op c == a op (b op c)
Or:
a op b op c op d == (a op b) op (c op d)
We can implement our example using parallel() and reduce() in this way:
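long total =
    LongStream.rangeClosed(1, 10)
        .parallel()
        .reduce(1, (a, b) -> a * b); // 1 is the identity for multiplication
System.out.println(total);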
When we execute this snippet of code, it produces the correct result every time (3628800),
because reduce() keeps the partial results inside the reduction process, so the threads
never share mutable state that could throw off the results.
Plus, if we time the execution of the first snippet and this last one, we can see a drastic
improvement in performance.
We can safely use collect() with parallel streams if we follow the same requirements of
associativity and identity. (For example, combining any partially accumulated result with an
empty result container must produce an equivalent result.)
Or, if we are grouping with the Collectors class and ordering is not important, we can use
the method groupingByConcurrent(), the concurrent version of groupingBy().
If you understand when to use parallel streams and the issues associated with concurrent
execution, you should be ready to use parallel streams in practice!
Conclusion
We've touched on the most important parts of the Stream API. I hope you find streams
useful. Please post all your comments and feedback in the discussion section below. Thanks
for reading.