
Depth of Parallel Stream in Java

The document explains the workings of Java parallel streams, detailing how they utilize the ForkJoin Framework for processing data in parallel. It covers the use of Spliterators for efficient task splitting, characteristics affecting parallelism, common pitfalls, and performance tips. Additionally, it provides examples of custom ForkJoinPools and parallel collectors to illustrate best practices.

Uploaded by

splusmm24
Copyright
© All Rights Reserved

Inside Parallel Streams in Java

- by utk

A deep understanding of Java parallel streams requires looking at the following
topics.

🔹 1. What Actually Happens in parallelStream()?


When you call collection.parallelStream(), Java uses the Fork/Join framework
under the hood (specifically ForkJoinPool.commonPool()) to split the source
into smaller chunks, process them on parallel threads, and combine the results.

Internals:

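• Uses a work-stealing algorithm: idle threads "steal" tasks from busy ones.
• Operates on the Spliterator (split + iterator) to divide the source.
• Default parallelism of the common pool: one less than the number of available
processors (Runtime.getRuntime().availableProcessors() - 1), because the calling
thread also does work. Override it with the system property
java.util.concurrent.ForkJoinPool.common.parallelism.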
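The points above can be observed directly. The following is a minimal sketch, not a definitive diagnostic: the class name CommonPoolInspector and the helper threadsUsed are hypothetical names chosen for illustration. It prints the processor count next to the common pool's parallelism and records which threads actually processed elements.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

final class CommonPoolInspector {

    /** Runs a small parallel stream and records the name of every thread that processed an element. */
    static Set<String> threadsUsed(int elements) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set, safe to fill from forEach
        IntStream.range(0, elements)
                .parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Available processors:    " + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: " + ForkJoinPool.commonPool().getParallelism());
        System.out.println("Threads used:            " + threadsUsed(10_000));
    }
}
```

On a multi-core machine the thread set typically contains the calling thread as well as common-pool workers, since the caller takes part in the work. The exact names and counts depend on the machine and JVM.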

🔹 2. How It Splits Work (Using Spliterator)


Each data source (like List, Set, etc.) provides a Spliterator. It's like an advanced iterator
that can split itself for parallel processing.

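```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

List<Integer> list = Arrays.asList(1, 2, 3, 4);
Spliterator<Integer> spliterator = list.spliterator();

// trySplit() hands off a prefix of the elements, or returns null if it cannot split
Spliterator<Integer> firstHalf = spliterator.trySplit();
```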

• Efficient splitting (e.g., arrays, ArrayList) = better parallel performance.
• Poor splitting (e.g., linked lists) = worse performance.
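To see how evenly a source splits, one option is to split its Spliterator once and compare the two estimated sizes. This is a small sketch under that assumption; SplitDemo and splitOnce are hypothetical names, and the sizes come from estimateSize(), which is only an estimate for sources that are not SIZED.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class SplitDemo {

    /** Splits the list's spliterator once and returns {prefix size, remaining size}. */
    static long[] splitOnce(List<Integer> source) {
        Spliterator<Integer> remaining = source.spliterator();
        Spliterator<Integer> prefix = remaining.trySplit(); // null when the source refuses to split
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, remaining.estimateSize() };
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.range(0, 8).boxed().collect(Collectors.toList());

        long[] arrayBacked = splitOnce(new ArrayList<>(numbers));
        long[] linked = splitOnce(new LinkedList<>(numbers));

        System.out.println("ArrayList : " + arrayBacked[0] + " + " + arrayBacked[1]);
        System.out.println("LinkedList: " + linked[0] + " + " + linked[1]);
    }
}
```

An ArrayList can split at its midpoint by index arithmetic alone, whereas a linked list has to walk its nodes to hand off a prefix, so its splits tend to be less balanced and more expensive.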

🔹 3. Stream Characteristics That Affect Parallelism

| Characteristic | Impact on Parallel Stream |
|---|---|
| SIZED (size known) | Helps divide tasks optimally |
| ORDERED | May cause extra effort to maintain order |
| NONNULL / DISTINCT | Lets some operations skip work (e.g., distinct() on an already DISTINCT source) |
| IMMUTABLE | Source cannot change during traversal, better for parallelism |

Check with:

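```java
// Note: spliterator() is a terminal operation, so the stream cannot be reused afterwards
Spliterator<T> spliterator = stream.spliterator();
int characteristics = spliterator.characteristics(); // bit mask of the Spliterator flags
```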
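The returned int is a bit mask, so it is easier to read once decoded. The sketch below is one possible way to do that; CharacteristicsDemo and describe are hypothetical names, while the flag constants and hasCharacteristics are part of the standard Spliterator interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Spliterator;

final class CharacteristicsDemo {

    /** Returns the names of the characteristic flags that the spliterator reports. */
    static List<String> describe(Spliterator<?> spliterator) {
        Map<String, Integer> flags = new LinkedHashMap<>();
        flags.put("ORDERED", Spliterator.ORDERED);
        flags.put("DISTINCT", Spliterator.DISTINCT);
        flags.put("SORTED", Spliterator.SORTED);
        flags.put("SIZED", Spliterator.SIZED);
        flags.put("NONNULL", Spliterator.NONNULL);
        flags.put("IMMUTABLE", Spliterator.IMMUTABLE);
        flags.put("CONCURRENT", Spliterator.CONCURRENT);
        flags.put("SUBSIZED", Spliterator.SUBSIZED);

        List<String> present = new ArrayList<>();
        for (Map.Entry<String, Integer> flag : flags.entrySet()) {
            if (spliterator.hasCharacteristics(flag.getValue())) {
                present.add(flag.getKey());
            }
        }
        return present;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println("ArrayList: " + describe(new ArrayList<>(values).spliterator()));
        System.out.println("HashSet:   " + describe(new HashSet<>(values).spliterator()));
    }
}
```

Comparing the two lines of output shows why a list and a set behave differently in a parallel pipeline: the list reports an encounter order, the set reports distinct elements and no order.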

🔹 4. Common Pitfalls and Misuses

❌ Modifying Shared State (Not Thread-Safe)

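```java
List<Integer> list = new ArrayList<>();
// Unsafe: ArrayList is not thread-safe, so concurrent add() calls can lose
// elements or throw ArrayIndexOutOfBoundsException
IntStream.range(0, 1000).parallel().forEach(list::add);
```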

✅ Use Collectors for Thread Safety

```java
// collect() gives each worker its own container and merges them at the end
List<Integer> list = IntStream.range(0, 1000)
        .parallel()
        .boxed()
        .collect(Collectors.toList());
```
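Why collecting is safe becomes clearer when the three parts of a mutable reduction are spelled out. The following sketch uses the three-argument Stream.collect overload for illustration; MutableReductionDemo and collectInParallel are hypothetical names, and in everyday code Collectors.toList() is the simpler choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

final class MutableReductionDemo {

    /** Collects 0..n-1 in parallel without ever sharing one list between threads. */
    static List<Integer> collectInParallel(int n) {
        ArrayList<Integer> result = IntStream.range(0, n)
                .parallel()
                .boxed()
                .collect(
                        () -> new ArrayList<Integer>(),             // supplier: a fresh container per chunk
                        (container, value) -> container.add(value), // accumulator: fills one container
                        (left, right) -> left.addAll(right));       // combiner: merges two containers
        return result;
    }

    public static void main(String[] args) {
        List<Integer> numbers = collectInParallel(1_000);
        System.out.println("Collected " + numbers.size() + " elements");
    }
}
```

Because each chunk writes only to its own container and the containers are merged in encounter order, no element is lost and the result keeps the source order.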

🔹 5. When NOT to Use Parallel Streams


| Situation | Why to Avoid Parallel Streams |
|---|---|
| Small data sets | Overhead > performance gain |
| Dependent data | Threads might block or wait |
| IO-bound tasks | Blocked threads tie up the shared common pool |
| Stateful or side-effect logic | May cause race conditions |
| Strict ordering required | Causes extra work to maintain order |
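The ordering row above can be illustrated with forEach versus forEachOrdered. This is a minimal sketch with hypothetical names (OrderingDemo, visitUnordered, visitOrdered); a synchronized list is used only so that recording the visit order is itself thread-safe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

final class OrderingDemo {

    /** Visits elements with forEach: on a parallel stream the visiting order is not defined. */
    static List<Integer> visitUnordered(int n) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).parallel().forEach(seen::add);
        return seen;
    }

    /** Visits elements with forEachOrdered: encounter order is kept, at the cost of coordination. */
    static List<Integer> visitOrdered(int n) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).parallel().forEachOrdered(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("forEach:        " + visitUnordered(10));
        System.out.println("forEachOrdered: " + visitOrdered(10));
    }
}
```

forEachOrdered always reports the elements in source order, while the forEach line may differ from run to run. If the order matters for every element, much of the benefit of going parallel can be lost.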

🔹 6. Performance Tips

| Tip | Explanation |
|---|---|
| Prefer arrays or ArrayList | They split better (random access is fast) |
| Avoid boxing/unboxing | Use IntStream, LongStream, etc. to avoid the performance hit |
| Minimize stateful operations | Stateless lambdas are much safer |
| Measure using System.nanoTime() | Gives rough timings; warm up the JVM first (a harness such as JMH is more reliable) |
| Consider a custom ForkJoinPool | To avoid overloading commonPool (used elsewhere) |
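A rough way to apply the measuring tip is sketched below. It assumes a CPU-bound workload on a primitive LongStream (so there is no boxing) and uses hypothetical names (StreamTiming, sumOfSquaresSequential, sumOfSquaresParallel, elapsedNanos). Treat the timings as indicative only; they vary with hardware, JVM warm-up, and load.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

final class StreamTiming {

    static volatile long sink; // keeps results "used" so the JIT cannot discard the work

    static long sumOfSquaresSequential(long n) {
        return LongStream.range(0, n).map(x -> x * x).sum();
    }

    static long sumOfSquaresParallel(long n) {
        return LongStream.range(0, n).parallel().map(x -> x * x).sum();
    }

    /** Runs the task once and returns the elapsed wall-clock time in nanoseconds. */
    static long elapsedNanos(LongSupplier task) {
        long start = System.nanoTime();
        sink = task.getAsLong();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long n = 2_000_000L; // small enough that the sum of squares fits in a long

        for (int i = 0; i < 5; i++) { // warm-up runs so the JIT has compiled both paths
            elapsedNanos(() -> sumOfSquaresSequential(n));
            elapsedNanos(() -> sumOfSquaresParallel(n));
        }

        long sequential = elapsedNanos(() -> sumOfSquaresSequential(n));
        long parallel = elapsedNanos(() -> sumOfSquaresParallel(n));
        System.out.printf("sequential: %d us%n", sequential / 1_000);
        System.out.printf("parallel:   %d us%n", parallel / 1_000);
    }
}
```

Trying smaller values of n is a quick way to find the point where the parallel version stops paying for its overhead.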

🔹 7. Custom ForkJoinPool Example


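```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

ForkJoinPool pool = new ForkJoinPool(4); // Limit parallelism to 4 threads

List<Integer> data = IntStream.range(1, 100_000).boxed().collect(Collectors.toList());

try {
    // A parallel stream started inside a ForkJoinPool task runs in that pool
    // (an implementation behavior, not a documented guarantee).
    long sum = pool.submit(() ->
            data.parallelStream()
                .mapToLong(Integer::longValue) // long: the total exceeds Integer.MAX_VALUE
                .sum()
    ).get();

    System.out.println("Sum: " + sum);
} catch (Exception e) {
    e.printStackTrace();
} finally {
    pool.shutdown(); // release the pool's worker threads
}
```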
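Whether the work really stays inside the custom pool can be checked with ForkJoinTask.getPool(), which returns the pool hosting the current thread. The sketch below is illustrative only: CustomPoolDemo, sumInPool and ranOnlyInPool are hypothetical names, and the behavior it checks is how current JDKs are implemented rather than something the stream API documents.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.stream.IntStream;
import java.util.stream.LongStream;

final class CustomPoolDemo {

    /** Sums the numbers in [1, n) with a parallel stream started inside the given pool. */
    static long sumInPool(ForkJoinPool pool, long n) {
        Callable<Long> task = () -> LongStream.range(1, n).parallel().sum();
        return pool.submit(task).join(); // join() waits for the result without checked exceptions
    }

    /** Reports whether every element was processed by a worker thread of the given pool. */
    static boolean ranOnlyInPool(ForkJoinPool pool, int n) {
        Callable<Boolean> task = () -> IntStream.range(0, n)
                .parallel()
                .allMatch(i -> ForkJoinTask.getPool() == pool);
        return pool.submit(task).join();
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            System.out.println("Sum: " + sumInPool(pool, 100_000));
            System.out.println("Ran only in custom pool: " + ranOnlyInPool(pool, 10_000));
        } finally {
            pool.shutdown(); // always release the worker threads
        }
    }
}
```

Using long for the sum matters here: the total of 1 to 99,999 is 4,999,950,000, which does not fit in an int.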

🔹 8. Parallel Collector Example


```java
// partitioningBy splits the elements into a "true" list and a "false" list
Map<Boolean, List<Integer>> result = IntStream.range(1, 1000)
        .parallel()
        .boxed()
        .collect(Collectors.partitioningBy(i -> i % 2 == 0));

System.out.println("Even numbers: " + result.get(true).size()); // counts 2, 4, ..., 998
```
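For grouping into more than two buckets, the standard library also offers Collectors.groupingByConcurrent, which accumulates into a single ConcurrentMap instead of merging per-thread maps. The sketch below shows it under assumed names (ConcurrentGroupingDemo, countByRemainder); whether it is faster than groupingBy depends on the workload and should be measured.

```java
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class ConcurrentGroupingDemo {

    /** Counts the numbers in [1, limit) by their remainder modulo 3. */
    static ConcurrentMap<Integer, Long> countByRemainder(int limit) {
        return IntStream.range(1, limit)
                .parallel()
                .boxed()
                .collect(Collectors.groupingByConcurrent(i -> i % 3, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println("Counts by remainder: " + countByRemainder(1000));
    }
}
```

The concurrent collector does not preserve encounter order within a group, which is fine for counting but matters if the downstream collector builds lists.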

🔹 Summary Table

| Aspect | Sequential Stream | Parallel Stream |
|---|---|---|
| Threading | Single-threaded | Multi-threaded (ForkJoinPool) |
| Order | Preserved | May not preserve |
| Performance (small data) | Faster | Slower (due to overhead) |
| Performance (large data) | Slower | Can be faster |
| Safety | Simpler | Must manage shared state |