Inside Parallel Streams in Java
- by utk
A deep understanding of Java parallel streams requires looking at the following
areas:
🔹 1. What Actually Happens in parallelStream()?
When you call collection.parallelStream(), Java uses the Fork/Join framework
under the hood, specifically ForkJoinPool.commonPool(), to split the stream's source
into smaller chunks, process those chunks on multiple threads, and combine the partial results.
Internals:
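• Uses a work-stealing algorithm: idle threads "steal" queued tasks from busy ones.
• Operates on the Spliterator (split + iterator) to divide the source.
• Default parallelism of the common pool: typically one less than the number of available
processors (Runtime.getRuntime().availableProcessors() - 1), because the thread that calls the
terminal operation also takes part in the work. It can be overridden with the system property
java.util.concurrent.ForkJoinPool.common.parallelism.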
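The pool settings listed above can be inspected directly. The following is a minimal sketch, not a definitive tool: the class name CommonPoolInfo and the helper workerThreadNames are illustrative names that do not come from the article. It prints the processor count, the common pool's parallelism, and the names of the threads that actually executed a small parallel stream.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

class CommonPoolInfo {

    // Runs a small parallel stream and records which threads executed elements.
    static Set<String> workerThreadNames(int elements) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        IntStream.range(0, elements)
                 .parallel()
                 .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Available processors:    "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        // Usually the calling thread plus some common-pool worker threads.
        System.out.println("Threads used: " + workerThreadNames(10_000));
    }
}
```

The exact thread names and how many of them appear depend on the machine and the JDK, so the output will differ from run to run.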
🔹 2. How It Splits Work (Using Spliterator)
Each data source (like List, Set, etc.) provides a Spliterator. It's like an advanced iterator
that can split itself for parallel processing.
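// Requires java.util.Arrays, java.util.List and java.util.Spliterator.
List<Integer> list = Arrays.asList(1, 2, 3, 4);
Spliterator<Integer> spliterator = list.spliterator();
// trySplit() hands a prefix of the elements to a new Spliterator, or returns null if it cannot split.
Spliterator<Integer> firstHalf = spliterator.trySplit();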
Sources that split cheaply and evenly (arrays, ArrayList) give better parallel performance.
Sources that split poorly (e.g., LinkedList, which must be walked node by node) give worse performance.
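To make the difference in splitting concrete, here is a small sketch that splits an ArrayList and a LinkedList once and reports the estimated size of each piece. SplitDemo and splitOnce are hypothetical names chosen for this example. An ArrayList can split by index arithmetic, whereas a LinkedList has to walk its nodes and copy a batch of elements, so its pieces are generally uneven; the exact sizes are an implementation detail of the JDK in use.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SplitDemo {

    // Splits the list's spliterator once and returns {prefix size, remaining size}.
    static long[] splitOnce(List<Integer> list) {
        Spliterator<Integer> rest = list.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // null if the source cannot be split
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.range(0, 8).boxed().collect(Collectors.toList());

        long[] array = splitOnce(new ArrayList<>(numbers));
        long[] linked = splitOnce(new LinkedList<>(numbers));

        System.out.println("ArrayList:  " + array[0] + " + " + array[1]);
        System.out.println("LinkedList: " + linked[0] + " + " + linked[1]);
    }
}
```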
🔹 3. Stream Characteristics That Affect Parallelism
Characteristic        Impact on Parallel Stream
SIZED (size known)    Helps divide tasks evenly
ORDERED               May cause extra effort to maintain order
NONNULL / DISTINCT    Can reduce validation overhead
IMMUTABLE             Source cannot change during traversal, which suits parallelism
Check with:
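// Note: spliterator() is a terminal operation, so the stream cannot be reused afterwards.
Spliterator<T> spliterator = stream.spliterator();
int characteristics = spliterator.characteristics(); // bit mask; test with hasCharacteristics(...)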
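As a hedged illustration of the check above, the sketch below asks the spliterators of two common collections which characteristics they report. CharacteristicsDemo and describe are illustrative names, and the example inspects the collections' own spliterators rather than a stream pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;

class CharacteristicsDemo {

    // Returns the names of selected characteristics reported by the source's spliterator.
    static List<String> describe(Collection<?> source) {
        Spliterator<?> s = source.spliterator();
        List<String> found = new ArrayList<>();
        if (s.hasCharacteristics(Spliterator.ORDERED))   found.add("ORDERED");
        if (s.hasCharacteristics(Spliterator.SIZED))     found.add("SIZED");
        if (s.hasCharacteristics(Spliterator.SUBSIZED))  found.add("SUBSIZED");
        if (s.hasCharacteristics(Spliterator.DISTINCT))  found.add("DISTINCT");
        if (s.hasCharacteristics(Spliterator.NONNULL))   found.add("NONNULL");
        if (s.hasCharacteristics(Spliterator.IMMUTABLE)) found.add("IMMUTABLE");
        return found;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, 1, 2);
        System.out.println("ArrayList: " + describe(new ArrayList<>(values)));
        System.out.println("HashSet:   " + describe(new HashSet<>(values)));
    }
}
```

An ArrayList reports ORDERED and SIZED, while a HashSet reports DISTINCT but not ORDERED, which is why a parallel stream over a set carries no ordering constraint.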
🔹 4. Common Pitfalls and Misuses
❌ Modifying Shared State (Not Thread-Safe)
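List<Integer> list = new ArrayList<>(); // ArrayList is not thread-safe
// Unsafe: concurrent add() calls can lose elements or throw ArrayIndexOutOfBoundsException
IntStream.range(0, 1000).parallel().forEach(list::add);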
✅ Use Collectors for Thread Safety
List<Integer> list = IntStream.range(0, 1000)
        .parallel()
        .boxed()
        .collect(Collectors.toList());
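Why collecting is safe where a shared list is not can be sketched with the three-argument form of collect, which spells out the pieces that a collector bundles together. SafeCollectDemo and collectInParallel are hypothetical names for this example. Each chunk of work accumulates into its own list, and the partial lists are merged afterwards, so no list is written to by two threads at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

class SafeCollectDemo {

    // Collects 0..n-1 in parallel without sharing one mutable list between threads.
    static List<Integer> collectInParallel(int n) {
        return IntStream.range(0, n)
                .parallel()
                .boxed()
                .collect(
                        ArrayList::new,      // supplier: a fresh list per chunk of work
                        ArrayList::add,      // accumulator: adds to that chunk's own list
                        ArrayList::addAll);  // combiner: merges partial lists in encounter order
    }

    public static void main(String[] args) {
        List<Integer> result = collectInParallel(1000);
        System.out.println("Size:  " + result.size());
        System.out.println("First: " + result.get(0));
        System.out.println("Last:  " + result.get(result.size() - 1));
    }
}
```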
🔹 5. When NOT to Use Parallel Streams
Situation                        Why to Avoid Parallel Streams
Small data sets                  Overhead > performance gain
Dependent data                   Threads might block or wait
IO-bound tasks                   Blocked threads tie up the shared common pool
Stateful or side-effect logic    May cause race conditions
Ordered results required         Causes extra work to maintain order
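The ordering row can be illustrated with a short sketch that records the order in which a parallel stream hands elements to forEach and to forEachOrdered. OrderingDemo and visit are illustrative names. The ordered variant always yields the encounter order, and the coordination needed to achieve that is the extra work the table refers to.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.IntStream;

class OrderingDemo {

    // Visits 0..n-1 in parallel and records the order in which elements were handled.
    static List<Integer> visit(int n, boolean ordered) {
        List<Integer> seen = new CopyOnWriteArrayList<>(); // thread-safe, keeps arrival order
        IntStream stream = IntStream.range(0, n).parallel();
        if (ordered) {
            stream.forEachOrdered(seen::add); // encounter order, at the cost of coordination
        } else {
            stream.forEach(seen::add);        // whatever order the threads happen to run in
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("forEach:        " + visit(20, false));
        System.out.println("forEachOrdered: " + visit(20, true));
    }
}
```

The forEach line may or may not come out sorted on a given run; only the forEachOrdered line is guaranteed to.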
🔹 6. Performance Tips
Tip                                Explanation
Prefer arrays or ArrayList         They split better (random access is fast)
Avoid boxing/unboxing              Use IntStream, LongStream, etc. to avoid the performance hit
Minimize stateful operations       Stateless lambdas are much safer
Measure using System.nanoTime()    For rough timing after a warm-up; use a harness such as JMH for precise benchmarking
Consider a custom ForkJoinPool     To avoid overloading commonPool (used elsewhere)
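The boxing and timing tips can be tried out with a small sketch like the one below. BoxingDemo, primitiveSum, boxedSum and timeMillis are hypothetical names. The timing helper gives only a rough wall-clock figure after a single warm-up pass, so treat the numbers as indicative rather than as a benchmark.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

class BoxingDemo {

    // Sums 1..n on a primitive LongStream: no wrapper objects are created.
    static long primitiveSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    // Same sum, but every element is boxed into a Long first.
    static long boxedSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().boxed().reduce(0L, Long::sum);
    }

    // Rough wall-clock timing in milliseconds; not a substitute for a benchmark harness.
    static long timeMillis(LongSupplier task) {
        long start = System.nanoTime();
        task.getAsLong();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long n = 10_000_000L;
        primitiveSum(n); // warm-up so the JIT compiler has seen both paths
        boxedSum(n);
        System.out.println("primitive: " + timeMillis(() -> primitiveSum(n)) + " ms");
        System.out.println("boxed:     " + timeMillis(() -> boxedSum(n)) + " ms");
    }
}
```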
🔹 7. Custom ForkJoinPool Example
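import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

ForkJoinPool pool = new ForkJoinPool(4); // limit parallelism to 4 threads
List<Integer> data = IntStream.range(1, 100_000).boxed().collect(Collectors.toList());
try {
    // A parallel stream started from inside a ForkJoinPool task runs in that pool.
    // This is observed implementation behavior, not a documented guarantee.
    long sum = pool.submit(() ->
            data.parallelStream()
                .mapToLong(Integer::longValue) // long: the total (4,999,950,000) overflows int
                .sum()
    ).get();
    System.out.println("Sum: " + sum);
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
} finally {
    pool.shutdown(); // release the pool's worker threads
}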
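To see which threads do the work when a stream is submitted to a custom pool, the following sketch records thread names while summing. CustomPoolDemo and sumInPool are illustrative names, and the approach relies on the fact that a parallel stream started inside a ForkJoinPool task runs in that pool, which is how current JDKs behave but is not a documented guarantee.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

class CustomPoolDemo {

    // Sums 1..n inside a dedicated pool and records the names of the threads that did the work.
    static long sumInPool(int parallelism, long n, Set<String> threadNames) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(() ->
                    LongStream.rangeClosed(1, n)
                              .parallel()
                              .peek(i -> threadNames.add(Thread.currentThread().getName()))
                              .sum()
            ).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel sum failed", e);
        } finally {
            pool.shutdown(); // always release the pool's worker threads
        }
    }

    public static void main(String[] args) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        long sum = sumInPool(4, 100_000, names);
        System.out.println("Sum: " + sum);
        System.out.println("Threads used: " + names);
    }
}
```

On typical JDKs the recorded names belong to the custom pool rather than to the common pool, which is the point of isolating the work.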
🔹 8. Parallel Collector Example
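// Splits 1..999 into even (true) and odd (false) numbers; the collector merges
// the per-thread partial results safely.
Map<Boolean, List<Integer>> result = IntStream.range(1, 1000)
        .parallel()
        .boxed()
        .collect(Collectors.partitioningBy(i -> i % 2 == 0));
System.out.println("Even numbers: " + result.get(true).size()); // 499 even numbers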
🔹 Summary Table
Aspect                      Sequential Stream    Parallel Stream
Threading                   Single-threaded      Multi-threaded (ForkJoinPool)
Order                       Preserved            May not be preserved (e.g. with forEach)
Performance (small data)    Faster               Slower (due to overhead)
Performance (large data)    Slower               Can be faster
Safety                      Simpler              Must manage shared state