Mastering Concurrency Programming with Java 8 - Sample Chapter
Master the principles and techniques of multithreaded
programming with the Java 8 concurrency API
Preface
Nowadays, computer systems (and other related systems, such as tablets or
smartphones) allow you to do several tasks simultaneously. This is possible
because they have concurrent operating systems that control several tasks at the
same time. You can also have one application that executes several tasks (reading a
file, showing a message, or reading data over a network) if you work with the
concurrency API of your favorite programming language. Java includes a very
powerful concurrency API that allows you to implement any kind of concurrent
application with little effort. Every version of Java has extended the features this
API provides to programmers, and Java 8 adds the stream API and new methods
and classes that facilitate the implementation of concurrent applications. This book
covers the most important elements of the Java concurrency API, showing you how
to use them in real-world applications. These elements are as follows:
The Phaser class, to execute tasks that can be divided into phases
Synchronization
In concurrency, we can define synchronization as the coordination of two or more
tasks to get the desired results. We have two kinds of synchronization:
Control synchronization: When, for example, one task depends on the end
of another task, the second task can't start before the first has finished
Data access synchronization: When two or more tasks have access to a
shared variable and only one of them can access it at any given time
Keep in mind that synchronization helps you avoid some of the errors you can have
with concurrent tasks (they will be described later in this chapter), but it introduces
some overhead to your algorithm. You have to calculate very carefully the number
of tasks that can be performed independently, without intercommunication, in your
parallel algorithm. This is the granularity of your concurrent algorithm. If you have
coarse-grained granularity (big tasks with little intercommunication), the overhead
due to synchronization will be low, but you may not take advantage of all the
cores of your system. If you have fine-grained granularity (small tasks with a lot of
intercommunication), the overhead due to synchronization will be high and the
throughput of your algorithm may suffer.
There are different mechanisms to get synchronization in a concurrent system.
The most popular mechanisms from a theoretical point of view are:
The last concept related to synchronization you're going to learn in this chapter is
thread safety. A piece of code (or a method or an object) is thread-safe if all users
of the shared data it accesses are protected by synchronization mechanisms or a
nonblocking compare-and-swap (CAS) primitive, or if the data is immutable, so
you can use that code in a concurrent application without any problem.
Immutable object
An immutable object is an object with a very special characteristic. You can't modify
its visible state (the value of its attributes) after its initialization. If you want to
modify an immutable object, you have to create a new one.
Its main advantage is that it is thread-safe. You can use it in concurrent applications
without any problem.
An example of an immutable object is the String class in Java. When you assign a
new value to a String object, you are creating a new string.
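As a minimal sketch of what such a class looks like (the Position class is
hypothetical, not one of the book's examples), an immutable class declares its
attributes final and returns a new instance instead of modifying itself:

public final class Position {

    private final int x;
    private final int y;

    public Position(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int getX() {
        return x;
    }

    public int getY() {
        return y;
    }

    // "Modifying" the position really creates and returns a new object
    public Position moveX(int delta) {
        return new Position(x + delta, y);
    }
}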
Data race
You can have a data race (also named race condition) in your application when two
or more tasks write to a shared variable outside a critical section; that is to say,
without using any synchronization mechanism.
Under these circumstances, the final result of your application may depend on the
order of execution of the tasks. Look at the following example:
package com.packt.java.concurrency;

public class Account {

    private float balance;

    public void modify(float difference) {
        float value = this.balance;
        this.balance = value + difference;
    }
}
Imagine that two different tasks execute the modify() method on the same Account
object. Depending on the order of execution of the statements in the tasks, the final
result can vary. Suppose that the initial balance is 1000 and the two tasks call the
modify() method with 1000 as a parameter. The final result should be 3000, but if
both tasks execute the first statement at the same time and then the second statement
at the same time, the final result will be 2000. As you can see, the modify() method is
not atomic and the Account class is not thread-safe.
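A minimal sketch of one possible fix (synchronized is only one option; Lock objects
or atomic variables also work): declare modify() as synchronized so that the
read-modify-write sequence executes atomically:

public class Account {

    private float balance;

    // Only one thread at a time can execute this method on a given
    // instance, so the read and the write form a single atomic step
    public synchronized void modify(float difference) {
        this.balance = this.balance + difference;
    }
}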
Deadlock
There is a deadlock in your concurrent application when two or more tasks are each
waiting for a shared resource that is held by another of them, so none of them can
obtain the resources they need and all of them are blocked indefinitely. It happens
when four conditions hold simultaneously in the system. They are Coffman's
conditions, which are as follows:
Mutual exclusion: The resources involved in the deadlock are not shareable;
only one task can use each resource at a time
Hold and wait: A task holds the mutual exclusion on a resource while it is
requesting the mutual exclusion on another resource
No preemption: Resources can only be released by the tasks that hold them
Circular wait: There is a circular chain of tasks where each one is waiting for
a resource held by the next
There exist some mechanisms that you can use to avoid deadlocks:
Ignore them: This is the most commonly used mechanism. You suppose that
a deadlock will never occur on your system, and if it occurs, you accept the
consequences of stopping your application and re-executing it.
Detection: The system has a special task that analyzes the state of the system
to detect if a deadlock has occurred. If it detects a deadlock, it can take
action to remedy the problem, for example, finishing one task or forcing the
liberation of a resource.
Livelock
A livelock occurs when two tasks in your system are continuously changing their
states in response to the actions of the other. Consequently, they are in a loop of
state changes and unable to continue.
For example, you have two tasks, Task 1 and Task 2, and both need two
resources: Resource 1 and Resource 2. Suppose that Task 1 has a lock on Resource 1,
and Task 2 has a lock on Resource 2. As they are unable to gain access to the resource
they need, they free their resources and begin the cycle again. This situation can
continue indefinitely, so the tasks will never end their execution.
Resource starvation
Resource starvation occurs when a task in your system never gets a resource that it
needs to continue with its execution. When there is more than one task waiting for
a resource and the resource is released, the system has to choose the next task that
can use it. If your system does not have a good selection algorithm, it can have
threads that wait a long time for the resource.
Fairness is the solution to this problem. All the tasks that are waiting for a resource
must get access to it within a given period of time. One option is to implement
an algorithm that takes into account the time a task has been waiting for a
resource when choosing the next task that will hold it. However, fair
implementations of locks require additional overhead, which may lower your
program's throughput.
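In Java, for example, the ReentrantLock class accepts a fairness flag in its
constructor. A minimal sketch (the FairResource class is hypothetical):

import java.util.concurrent.locks.ReentrantLock;

public class FairResource {

    // true requests a fair ordering policy: the longest-waiting
    // thread acquires the lock next
    private final ReentrantLock lock = new ReentrantLock(true);

    public void use() {
        lock.lock();
        try {
            // work with the shared resource
        } finally {
            lock.unlock();
        }
    }
}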
Priority inversion
Priority inversion occurs when a low-priority task holds a resource that is needed
by a high-priority task, so the low-priority task finishes its execution before the
high-priority task.
Step 1 analysis
In this step, we are going to analyze the sequential version of the algorithm to look
for the parts of its code that can be executed in a parallel way. We should pay special
attention to those parts that are executed most of the time or that execute more code
because, by implementing a concurrent version of those parts, we're going to get a
greater performance improvement.
Good candidates for this process are loops where one step is independent of the
other steps, or portions of code that are independent of other parts of the code (for
example, an algorithm that initializes an application by opening database
connections, loading configuration files, and initializing some objects; all these
tasks are independent of each other).
Step 2 design
Once you know what parts of the code you are going to parallelize, you have to
decide how to do that parallelization.
The changes in the code will affect two main parts of the application:
Task decomposition: You do task decomposition when you split the code into
two or more independent tasks that can be executed at once. Maybe some of
these tasks have to be executed in a given order or have to wait at the same
point. You must use synchronization mechanisms to get this behavior.
Another important point to keep in mind is the granularity of your solution. The
objective of implementing a parallel version of an algorithm is to achieve improved
performance, so you should use all the available processors or cores. On the other
hand, when you use a synchronization mechanism, you introduce some extra
instructions that must be executed. If you split the algorithm into a lot of small tasks
(fine-grained granularity), the extra code introduced by the synchronization can
cause performance degradation. If you split the algorithm into fewer tasks than
you have cores (coarse-grained granularity), you are not taking advantage of all
the resources.
Also, you must take into account the work every thread must do, especially if you
implement a fine-grained granularity. If you have a task longer than the rest, that
task will determine the execution time of the application. You have to find the
equilibrium between these two points.
Step 3 implementation
The next step is to implement the parallel algorithm using a programming language
and, if necessary, a thread library. In the examples of this book, you are going to
use Java to implement all the algorithms.
Step 4 testing
After finishing the implementation, you have to test the parallel algorithm. If you
have a sequential version of the algorithm, you can compare the results of both
algorithms to verify that your parallel implementation is correct.
Testing and debugging a parallel implementation are difficult tasks because
the order of execution of the different tasks of the application is not guaranteed.
In Chapter 11, Testing and Monitoring Concurrent Applications, you will learn tips,
tricks, and tools to do these tasks efficiently.
Step 5 tuning
The last step is to compare the throughput of the parallel and the sequential
algorithms. If the results are not as expected, you must review the algorithm,
looking for the cause of the bad performance of the parallel algorithm.
You can also test different parameters of the algorithm (for example, granularity or
number of tasks) to find the best configuration.
There are different metrics to measure the possible performance improvement you
can obtain by parallelizing an algorithm. The three most popular metrics are:
Speedup = Tsequential / Tconcurrent
Here, Tsequential is the execution time of the sequential version of the algorithm
and Tconcurrent is the execution time of the parallel version.
Amdahl's law: This law calculates the maximum expected speedup of an
algorithm when only a fraction of it can be parallelized:

Speedup ≤ 1 / ((1 − P) + P/N)
Here, P is the percentage of code that can be parallelized and N is the number
of cores of the computer where you're going to execute the algorithm.
For example, if you can parallelize 75% of the code and you have four cores,
the maximum speedup will be given by the following formula:
Speedup ≤ 1 / ((1 − 0.75) + 0.75/4) = 1 / 0.4375 ≈ 2.29
If we use the same example as before, the scaled speedup calculated by
Gustafson's law is:

Scaled speedup = (1 − P) + P × N = (1 − 0.75) + 0.75 × 4 = 3.25
Conclusion
In this section, you learned some important issues you have to take into account
when you want to parallelize a sequential algorithm.
First of all, not every algorithm can be parallelized. For example, if you have to
execute a loop where the result of an iteration depends on the result of the previous
iteration, you can't parallelize that loop. Recurrent algorithms are another example of
algorithms that can't be parallelized, for a similar reason.
Another important thing to keep in mind is that the sequential version of an
algorithm with the best performance can be a bad starting point for parallelization.
If you start parallelizing an algorithm and find yourself in trouble because you
can't easily find independent portions of the code, look for other versions of the
algorithm and verify whether they can be parallelized more easily.
Finally, when you implement a concurrent application (from scratch or based on a
sequential algorithm), you must take into account the following points:
Efficiency: The parallel algorithm must end in less time than the sequential
algorithm. The first goal of parallelizing an algorithm is that its running time
is less than that of the sequential one, or that it can process more data in the
same time.
The Thread class: This class represents all the threads that execute a
concurrent Java application
The ThreadFactory interface: This is the base of the Factory design pattern
that you can use to create customized threads
Synchronization mechanisms
The Java concurrency API includes different synchronization mechanisms. The most
important ones are as follows:
The Semaphore class: The class that implements the classical semaphore to
implement synchronization. Java supports binary and general semaphores.
The CountDownLatch class: A class that allows a task to wait for the
finalization of multiple operations.
The Phaser class: A class that allows you to control the execution of tasks
divided into phases. None of the tasks advance to the next phase until all of
the tasks have finished the current phase.
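As an illustrative sketch (not one of the book's examples), a task can use the
CountDownLatch class to wait for three operations to finish:

import java.util.concurrent.CountDownLatch;

public class LatchExample {

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(3);

        for (int i = 0; i < 3; i++) {
            new Thread(() -> {
                // ... perform one operation ...
                latch.countDown(); // signal that this operation finished
            }).start();
        }

        latch.await(); // blocks until the counter reaches zero
        System.out.println("All operations finished");
    }
}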
Executors
The executor framework is a mechanism that allows you to separate thread creation
and management from the implementation of concurrent tasks. You don't have to
worry about the creation and management of threads, only about creating tasks and
sending them to the executor. The main classes involved in this framework are:
The Future interface: This is an interface that includes the methods to obtain
the value returned by a Callable task and to control its status
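A minimal sketch of these elements working together (the task itself is hypothetical,
not from the book's examples):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureExample {

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);

        // submit() accepts a Callable and returns a Future to control it
        Future<Integer> future = executor.submit(() -> 2 + 2);

        System.out.println("Result: " + future.get()); // blocks until the task finishes
        executor.shutdown();
    }
}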
Parallel streams
Streams and lambda expressions are perhaps the two most important new features
of Java 8. Streams have been added as a method in the Collection interface and
other data sources, and they allow you to process all the elements of a data
structure, generating new structures, filtering data, and implementing algorithms
using the map and reduce technique.
A special kind of stream is a parallel stream, which performs its operations in
parallel. The most important elements involved in the use of parallel streams are:
The Stream interface: This is an interface that defines all the operations that
you can perform on a stream.
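For instance, here is a minimal sketch (not one of the book's examples) that sums
the even numbers of a list in parallel:

import java.util.Arrays;
import java.util.List;

public class ParallelStreamExample {

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

        // The same pipeline runs sequentially with stream()
        // and in parallel with parallelStream()
        int sum = numbers.parallelStream()
                         .filter(n -> n % 2 == 0)
                         .mapToInt(Integer::intValue)
                         .sum();

        System.out.println(sum); // 20
    }
}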
Blocking data structures: These include methods that block the calling task
when, for example, the data structure is empty and you want to get a value.
Concurrency also has its own design patterns. In this section, we describe some
of the most useful concurrency design patterns and their implementation in the
Java language.
Signaling
This design pattern explains how to implement the situation where a task has to
notify another task of an event. The easiest way to implement this pattern is with a
semaphore or a mutex, using the ReentrantLock or Semaphore classes of the Java
language, or even the wait() and notify() methods included in the Object class.
See the following example:
public void task1() {
    section1();
    synchronized (commonObject) {
        commonObject.notify(); // signal that section1() has finished
    }
}

public void task2() throws InterruptedException {
    synchronized (commonObject) {
        commonObject.wait(); // wait() and notify() require owning the monitor
    }
    section2();
}
Under these circumstances, the section2() method will always be executed after
the section1() method. (In production code, you would also guard the wait()
with a condition flag so that a notification sent before the wait is not lost.)
Rendezvous
This design pattern is a generalization of the Signaling pattern. In this case, the first
task waits for an event of the second task and the second task waits for an event of
the first task. The solution is similar to that of Signaling, but in this case you must use
two objects instead of one.
See the following example:
public void task1() throws InterruptedException {
    section1_1();
    synchronized (commonObject1) {
        commonObject1.notify();
    }
    synchronized (commonObject2) {
        commonObject2.wait();
    }
    section1_2();
}

public void task2() throws InterruptedException {
    section2_1();
    synchronized (commonObject2) {
        commonObject2.notify();
    }
    // Note: waiting before notifying in both tasks would cause a deadlock
    synchronized (commonObject1) {
        commonObject1.wait();
    }
    section2_2();
}
Mutex
A mutex is a mechanism that you can use to implement a critical section ensuring
mutual exclusion. That is to say, only one task can execute the portion of code
protected by the mutex at one time. In Java, you can implement a critical section
using the synchronized keyword (that allows you to protect a portion of code or a
full method), the ReentrantLock class, or the Semaphore class.
Look at the following example:
public void task() {
    preCriticalSection();
    lockObject.lock(); // the critical section begins
    try {
        criticalSection();
    } finally {
        lockObject.unlock(); // the critical section ends
    }
    postCriticalSection();
}
Multiplex
The Multiplex design pattern is a generalization of the mutex. In this case, a
determined number of tasks can execute the critical section at once. It is useful, for
example, when you have multiple copies of a resource. The easiest way to implement
this design pattern in Java is using the Semaphore class initialized to the number of
tasks that can execute the critical section at once.
Look at the following example:
public void task() throws InterruptedException {
    preCriticalSection();
    semaphoreObject.acquire(); // blocks while all the permits are taken
    try {
        criticalSection();
    } finally {
        semaphoreObject.release(); // returns the permit
    }
    postCriticalSection();
}
Barrier
This design pattern explains how to implement the situation where you need to
synchronize some tasks at a common point. None of the tasks can continue with
their execution until all the tasks have arrived at the synchronization point. The Java
concurrency API provides the CyclicBarrier class, which is an implementation
of this design pattern.
Look at the following example:
public void task() throws InterruptedException, BrokenBarrierException {
    preSyncPoint();
    barrierObject.await(); // blocks until all the tasks have arrived
    postSyncPoint();
}
Double-checked locking
This design pattern provides a solution to the problem that occurs when you
acquire a lock and then check a condition. If the condition is false, you have
incurred the overhead of acquiring the lock needlessly. An example of this situation
is the lazy initialization of objects. If you have a class implementing the Singleton
design pattern, you may have some code like this:
public class Singleton {

    private static Singleton reference;
    private static final Lock lock = new ReentrantLock();

    public static Singleton getReference() {
        lock.lock();
        try {
            if (reference == null) {
                reference = new Singleton();
            }
        } finally {
            lock.unlock();
        }
        return reference;
    }
}
A solution is to check the condition before acquiring the lock and check it again
once the lock is held:

public static Singleton getReference() {
    if (reference == null) {
        lock.lock();
        try {
            if (reference == null) {
                reference = new Singleton();
            }
        } finally {
            lock.unlock();
        }
    }
    return reference;
}
This solution still has problems. The reference attribute is not declared volatile,
so the Java memory model doesn't guarantee that a task reading it outside the lock
sees a fully constructed object. The best solution to this problem doesn't use any
explicit synchronization mechanism:
public class Singleton {

    private static class LazySingleton {
        private static final Singleton INSTANCE = new Singleton();
    }

    public static Singleton getSingleton() {
        return LazySingleton.INSTANCE;
    }
}
Read-write lock
When you protect access to a shared variable with a lock, only one task can access
that variable, independently of the operation you are going to perform on it.
Sometimes, you will have variables that you modify a few times but read many
times. In this circumstance, a lock provides poor performance because all the read
operations can be made concurrently without any problem. To solve this problem,
there exists the read-write lock design pattern. This pattern defines a special kind
of lock with two internal locks: one for read operations and the other for write
operations. The behavior of this lock is as follows:
If one task is doing a read operation and another task wants to do another
read operation, it can do it
If one task is doing a read operation and another task wants to do a write
operation, it's blocked until all the readers finish
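A minimal sketch using the ReentrantReadWriteLock class, which implements this
pattern in Java (the Counter class is hypothetical):

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class Counter {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int value;

    public int get() {
        lock.readLock().lock(); // many readers can hold this lock at once
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    public void set(int newValue) {
        lock.writeLock().lock(); // exclusive: waits for readers and writers
        try {
            value = newValue;
        } finally {
            lock.writeLock().unlock();
        }
    }
}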
Thread pool
This design pattern tries to remove the overhead of creating a new thread for every
task you want to execute. It is formed by a set of threads and a queue of tasks to
execute. The set of threads usually has a fixed size. When a thread completes the
execution of a task, it doesn't finish its execution; it looks for another task in the
queue. If there is another task, it executes it. If not, the thread waits until a task is
inserted in the queue, but it is not destroyed.
The Java concurrency API includes some classes that implement the
ExecutorService interface, which internally uses a pool of threads.
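A minimal sketch (not from the book's examples): four threads execute one hundred
tasks, so no task pays the cost of creating its own thread:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolExample {

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);

        for (int i = 0; i < 100; i++) {
            final int taskId = i;
            pool.submit(() -> System.out.println("Task " + taskId));
        }

        pool.shutdown(); // the threads finish the pending tasks and terminate
    }
}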
It creates a partial ordering of volatile read, volatile write, lock, and unlock
instructions, known as happens-before. Task synchronization helps us establish
happens-before relations too. If one action happens-before another, then the first
is visible to and ordered before the second.
When a task releases a monitor, the cached data is flushed to the
main memory.
The main objective of the Java memory model is that a properly written concurrent
application will behave correctly on every Java Virtual Machine (JVM), regardless of
the operating system, CPU architecture, and the number of CPUs and cores.
You don't have to worry about the creation and management of threads.
You only create tasks and send them for execution. The Java concurrency
API controls the creation and management of threads for you.
They are optimized to give better performance than using threads directly.
For example, they use a pool of threads to reuse them and avoid thread
creation for every task. You can implement these mechanisms from scratch,
but it will take you a lot of time, and it will be a complex task.
They include advanced features that make the API more powerful. For
example, with executors in Java, you can execute tasks that return a result
in the form of a Future object. Again, you can implement these mechanisms
from scratch, but it's not advisable.
Your application will migrate more easily from one operating system to
another, and it will be more scalable.
Your application might become faster in future Java versions. Java
developers constantly improve the internals, and JVM optimizations
will likely be more tailored to the JDK APIs.
For example, if you need a List in a concurrent application, you should not use the
ArrayList class if you are going to update it from several threads, because it's not
thread-safe. In this case, you can use a thread-safe class such as
ConcurrentLinkedDeque, CopyOnWriteArrayList, or LinkedBlockingDeque. If the
class you want to use is not thread-safe, first look for a thread-safe alternative. It
will probably be more optimized for concurrency than any alternative that you
can implement.
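For instance, a minimal sketch (not from the book's examples) using
ConcurrentLinkedDeque:

import java.util.concurrent.ConcurrentLinkedDeque;

public class DequeExample {

    public static void main(String[] args) {
        ConcurrentLinkedDeque<String> tasks = new ConcurrentLinkedDeque<>();

        // These operations are safe to call from several threads
        // without any external synchronization
        tasks.offerLast("task1");
        tasks.offerLast("task2");

        String next = tasks.pollFirst(); // returns null if the deque is empty
        System.out.println(next);
    }
}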
Another situation where you can take advantage of thread-local variables is with
static attributes. All instances of a class share their static attributes, but if you
declare them with the ThreadLocal class, every thread will have access to its
own copy.
Another option you have is to use something like ConcurrentHashMap<Thread,
MyType> and use it like var.get(Thread.currentThread()) or var.put(Thread.
currentThread(), newValue). Usually, this approach is significantly slower than
ThreadLocal because of possible contention (ThreadLocal has no contention at all).
It has an advantage though: you can clear the map completely and the value will
disappear for every thread; thus, sometimes it's useful to use such an approach.
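A minimal sketch of a per-thread static attribute (the Context class is hypothetical,
not from the book):

public class Context {

    // Every thread that calls BUFFER.get() sees its own independent
    // copy, initialized on first access
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(StringBuilder::new);

    public static void append(String text) {
        BUFFER.get().append(text); // no synchronization needed
    }

    public static String dump() {
        return BUFFER.get().toString();
    }
}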
You can easily test the correctness of the results of your parallel algorithm
You can measure the improvement in performance obtained with the use
of concurrency
But not every algorithm can be parallelized, at least not easily. You might
think that the best starting point is the sequential algorithm with the best
performance for the problem you want to parallelize, but this can be a wrong
assumption. You should look for an algorithm that can be easily parallelized. Then,
you can compare the concurrent algorithm with the best-performing sequential one
to see which offers the best throughput.
For example, when you work with an object-oriented language such as Java, you
implement your application as a collection of objects. Each object has a number
of attributes and some methods to read and change the values of the attributes. If
some tasks share an object and call a method that changes the value of an attribute
of that object, and that method is not protected by a synchronization mechanism,
you will probably have data inconsistency problems.
There is a special kind of object named an immutable object. Its main characteristic
is that you can't modify any attributes after initialization. If you want to modify
the value of an attribute, you must create another object. The String class in Java
is the best example of an immutable object. When you use an operator that seems
to change the value of a String (for example, = or +=), you are really creating a
new object.
The use of immutable objects in a concurrent application has two very
important advantages:
They don't need any synchronization mechanism to be shared between tasks
You won't have data inconsistency problems, because their state never changes
There is a drawback with immutable objects. If you create too many objects, this may
affect the throughput and memory use of the application. If you have a simple object
without internal data structures, it's usually not a problem to make it immutable.
However, making immutable complex objects that incorporate collections of other
objects usually leads to serious performance problems.
For example, the following is a bad use of this tip. You have two tasks that need to
get two Lock objects, and they try to get the locks in a different order:
public void operation1() {
    lock1.lock();
    lock2.lock();
    // ...
}

public void operation2() {
    lock2.lock();
    lock1.lock();
    // ...
}
It's possible that operation1() executes its first statement and operation2()
executes its first statement too; then each will be waiting for the other Lock and
you will have a deadlock. You can avoid this simply by getting the locks in the
same order. If you change operation2() as follows, you will never have a deadlock:
public void operation2() {
    lock1.lock();
    lock2.lock();
    // ...
}
In Java 5, the concurrency API included a new kind of variable called atomic
variables. These variables are classes that support atomic operations on single
values. They include a method, compareAndSet(expectedValue, newValue), that
performs the comparison and the assignment as a single atomic step: if the value
of the variable is equal to expectedValue, the method changes it to newValue and
returns true; otherwise, it returns false. There are more methods that work in a
similar way, such as getAndIncrement() or getAndDecrement(). These methods
are also atomic.
This solution is lock-free; that is to say, it doesn't use locks or any other
synchronization mechanism, so its performance is usually better than that of a
synchronized solution.
The most important atomic variables that you can use in Java are:
AtomicInteger
AtomicLong
AtomicReference
AtomicBoolean
LongAdder
DoubleAdder
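Returning to the earlier Account example, a minimal sketch of one possible
lock-free approach (an int balance instead of the original float, for illustration):

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicExample {

    public static void main(String[] args) {
        AtomicInteger balance = new AtomicInteger(1000);

        // The read-modify-write happens in one atomic step
        balance.getAndAdd(1000);

        // The explicit compare-and-set retry loop behind such methods
        int oldValue;
        int newValue;
        do {
            oldValue = balance.get();
            newValue = oldValue + 1000;
        } while (!balance.compareAndSet(oldValue, newValue));

        System.out.println(balance.get()); // 3000
    }
}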
Avoid executing inside a critical section code that you don't control. For example,
say you are writing a library that accepts a user-defined Callable, which you
sometimes need to launch. You don't know exactly what will be in that Callable.
Maybe it blocks on input/output, acquires some locks, calls other methods of your
library, or just works for a very long time. Thus, whenever possible, try to execute it
while your library does not hold any locks. If that's impossible for your algorithm,
specify this behavior in your library's documentation and possibly specify
limitations on the user-supplied code (for example, that it should not take any
locks). A good example of such documentation can be found in the compute()
method of the ConcurrentHashMap class.
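A minimal sketch of the idea (hypothetical names, not a real library): copy the
internal state inside the critical section and run the user-supplied code only after
releasing the lock:

import java.util.ArrayList;
import java.util.List;

public class EventSource {

    private final List<Runnable> listeners = new ArrayList<>();

    public synchronized void addListener(Runnable listener) {
        listeners.add(listener);
    }

    public void fireEvent() {
        List<Runnable> snapshot;
        synchronized (this) {
            // Copy while holding the lock...
            snapshot = new ArrayList<>(listeners);
        }
        // ...and execute the user-defined code without holding it
        for (Runnable listener : snapshot) {
            listener.run();
        }
    }
}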
Summary
Concurrent programming includes all the tools and techniques needed to have
multiple tasks or processes running at the same time in a computer, communicating
and synchronizing with each other without data loss or inconsistency.
We started this chapter by introducing the basic concepts of concurrency. You must
know and understand terms such as concurrency, parallelism, and synchronization
to fully understand the examples of this book. However, concurrency can generate
some problems, such as data race conditions, deadlocks, livelocks, and others. You
must also know about these potential problems of a concurrent application; this
will help you identify and solve them.
We also explained a simple methodology of five steps introduced by Intel to convert
a sequential algorithm into a concurrent one and showed you some concurrency
design patterns implemented in the Java language and some tips to take into
account when you implement a concurrent application.
Finally, we explained briefly the components of the Java concurrency API. It's a very
rich API with low- and very high-level mechanisms that allow you to implement
powerful concurrency applications easily. We also described the Java memory
model, which determines how concurrent applications manage the memory
and the execution order of the instructions internally.
In the next chapter, you will learn how to implement applications that run a lot of
concurrent tasks using the executor framework. It allows you to execute a large
number of tasks while controlling the resources you use and reducing the overhead
introduced by thread creation (it reuses Thread objects to execute different tasks).