Minimizing Memory Usage For Creating Application Subprocesses
(Note: In the Solaris OS, the term swap space is used to describe a combination of physical memory and disk swap space configured for the system.
However, with other Unix systems this term may mean swap space on disk, also known as backing store. To avoid any confusion, I'll use the term
Virtual Memory (VM) to mean physical memory plus disk swap space.)
Generally, the fork/exec method has worked quite well. However, it has disadvantages in some cases: it can run out of memory for no good reason,
and fork performance can be poor.
Out of Memory: For a large-memory process, the fork() system call can fail because of insufficient VM: fork() must reserve as much additional VM
as the parent already uses, so the parent's memory requirement effectively doubles. This can happen even when fork() is immediately followed by an
exec() call that would release most of that extra memory. When this happens, the application will usually terminate.
For example, suppose a 64-bit application is currently consuming 6 gigabytes (Gbytes) of VM and needs to create a subprocess to run the
ls(1) command. The parent process issues a fork() call, which will succeed only if another 6 Gbytes of VM are available at that moment. If the
system doesn't have that much VM available (a frequent situation), fork() will fail with ENOMEM. Obviously, the ls(1) command doesn't
need anywhere near 6 Gbytes of memory to run, but fork() doesn't know that.
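To make the problem concrete, here is a minimal sketch of the classic fork/exec pattern. It is not code from the article, and the function name run_with_fork_exec() is hypothetical. The fork() call is where the VM reservation for a full copy of the parent happens, even though execv() discards that copy immediately afterward.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with `argv` using the classic fork/exec
 * model. Returns the child's exit status, or -1 on failure. */
int run_with_fork_exec(const char *path, char *const argv[])
{
    pid_t pid = fork();          /* must reserve VM for a copy of the parent */
    if (pid == -1) {
        perror("fork");          /* e.g. ENOMEM when VM is insufficient */
        return -1;
    }
    if (pid == 0) {
        execv(path, argv);       /* replaces the child's address space */
        _exit(127);              /* reached only if execv() failed */
    }

    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR) {
            perror("waitpid");
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the ls(1) case above, a caller might declare char *args[] = { "ls", NULL }; and call run_with_fork_exec("/bin/ls", args). In a 6-Gbyte parent, it is the fork() inside this function that can fail.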
Not only applications but also Sun's own tools can suffer from the same problem. For example, the following Sun RFE (request for enhancement)
has been filed for dbx: "4748951 dbx shell should use posix_spawn() for non-builtin commands rather than fork(2)".
RFE 4748951 came about when a customer's utility invoked dbx to read a huge core file, using a script that also needed to run a cut(1) command
from within dbx. They got a "cannot fork - try again" error message that caused dbx to abort. An investigation revealed that dbx used fork/exec to
execute that tiny cut(1) command and ran out of VM during the fork() call.
The Solaris Java Virtual Machine (JVM) currently suffers from the same problem, as described in this Sun RFE: "5049299 Use
posix_spawn, not fork, on S10 to avoid swap exhaustion".
Fork Performance: The fork() call can hurt performance. Even though fork() has been improved over the years to use COW (copy-on-write)
semantics, it still has to copy a certain amount of data from parent to child. That copying is unnecessary if the child process immediately
replaces itself with a new program by calling exec(). This performance hit may be especially noticeable when the parent process has many
memory-mapped regions.
These disadvantages, i.e. running out of memory and poor fork performance, become especially important when the parent process consumes a
large amount of memory (which has become very common in recent years), and when the memory requests require reservation (commitment) of VM
as they do in the Solaris OS; see discussion of Solaris memory commitment below.
To deal with these disadvantages of the fork/exec model when fork() is immediately followed by exec(), the Berkeley version of Unix (BSD)
introduced the vfork() system call in the early 1980s. vfork(2) does not copy the parent process to the child. Both processes share the parent's
virtual address space; the parent is suspended until the child exits or calls exec().
The vfork(2) system call was also adopted in the Solaris OS. Much later, however, when multithreading (MT) became available and widely used,
it was discovered that vfork() can introduce a new problem when the application has multiple threads running: deadlock. The deadlock can happen
because the dynamic linker, ld.so.1, is involved in resolving the necessary symbols. In particular, if the child process calls an external function
(such as exec()), the dynamic linker may be invoked to resolve the Procedure Linkage Table (PLT) entry, and to do so it acquires a mutex lock.
This lock may already be held by a different thread in the parent process. If that happens, the parent and child processes deadlock, because no
thread in the parent can run until the child has called exec() or exit(). As a result, both the parent and the child processes will hang.
The introduction of posix_spawn(3C) [1], starting with the Solaris 10 OS, has solved these problems. In earlier Solaris versions
(Solaris 9 and Solaris 8 2/02), the system(3C) and popen(3C) calls were also made deadlock-safe and no longer demand twice the amount of VM.
Some operating systems, notably Linux, have what's known as the memory overcommit feature: when an application calls malloc() or any
other memory-requesting interface, the OS can return a non-NULL pointer without reserving any swap space for the requested memory.
Linux has posix_spawn() implemented with fork() and exec(). Due to its memory overcommit policy (as explained below), Linux may not suffer
from the first problem described above: running out of VM space when a large process calls fork(). However, the memory overcommit feature has
created serious problems of its own. See the "Memory Overcommit" discussion below.
It's worth mentioning here that there is an alternative method of creating subprocesses that can be used safely and efficiently with any version of
the Solaris OS or other Unix versions. During initialization of the main process (before it creates any threads or allocates any significant amount
of memory), you can fork/exec a special small helper process dedicated to creating subprocesses for the big parent process. The parent process can
send requests to create a subprocess, commands to execute, and so on, to the helper process via a pipe or any other inter-process communication
mechanism. Because the helper's memory footprint is very small, it can safely call fork/exec to create each subprocess: its fork() calls will not run
out of VM, and they will be fast.
The disadvantage of this alternative method is its extra complexity. An application using this method has to make sure the extra helper process
is properly terminated whenever the main process terminates, use MT-safe methods to communicate with the helper process, and so on. Also,
using a helper process makes it harder to share file descriptors with the child processes. Calling posix_spawn(), popen(), or system() directly
from the large parent process is much simpler.
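The helper-process technique might look roughly like the following sketch. The names spawn_helper, helper_start(), helper_run(), and helper_stop(), the length-prefixed pipe protocol, and the use of /bin/sh -c to run each command are all illustrative assumptions, not part of the article. helper_start() is meant to be called early, while the parent is still small; afterward only the helper calls fork(). For brevity, the sketch does not handle SIGPIPE (raised if the helper dies while the parent is writing to it) and is not safe to call from several threads without external locking.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical API: a small helper process that runs commands on request. */
typedef struct {
    pid_t pid;     /* the helper process */
    int   req_fd;  /* parent writes length-prefixed commands here */
    int   resp_fd; /* parent reads 32-bit exit statuses here */
} spawn_helper;

/* Transfer exactly n bytes, retrying after EINTR and short reads/writes. */
static int io_full(int fd, void *buf, size_t n, int writing)
{
    char *p = buf;
    while (n > 0) {
        ssize_t r = writing ? write(fd, p, n) : read(fd, p, n);
        if (r == -1 && errno == EINTR)
            continue;
        if (r <= 0)
            return -1;                /* error, or EOF while reading */
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Runs in the helper; fork() is cheap here because this process is small. */
static void helper_loop(int in, int out)
{
    char cmd[4096];
    uint32_t len;
    while (io_full(in, &len, sizeof len, 0) == 0 && len < sizeof cmd &&
           io_full(in, cmd, len, 0) == 0) {
        int32_t result = -1;
        int status;
        cmd[len] = '\0';
        pid_t pid = fork();
        if (pid == 0) {
            execl("/bin/sh", "sh", "-c", cmd, (char *)0);
            _exit(127);
        }
        if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            result = WEXITSTATUS(status);
        if (io_full(out, &result, sizeof result, 1) != 0)
            break;
    }
    _exit(0);                         /* EOF: the parent stopped the helper */
}

/* Call during initialization, before creating threads or using much memory. */
int helper_start(spawn_helper *h)
{
    int req[2], resp[2];
    if (pipe(req) == -1) return -1;
    if (pipe(resp) == -1) {
        close(req[0]); close(req[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid == -1) {
        close(req[0]); close(req[1]); close(resp[0]); close(resp[1]);
        return -1;
    }
    if (pid == 0) {                   /* helper keeps only its pipe ends */
        close(req[1]); close(resp[0]);
        helper_loop(req[0], resp[1]); /* never returns */
    }
    close(req[0]); close(resp[1]);
    h->pid = pid; h->req_fd = req[1]; h->resp_fd = resp[0];
    return 0;
}

/* Returns the command's exit status, or -1. Not MT-safe as written:
 * serialize calls (for example, with a mutex) if threads share a helper. */
int helper_run(spawn_helper *h, const char *cmd)
{
    uint32_t len = (uint32_t)strlen(cmd);
    int32_t result;
    if (io_full(h->req_fd, &len, sizeof len, 1) != 0 ||
        io_full(h->req_fd, (void *)cmd, len, 1) != 0 ||
        io_full(h->resp_fd, &result, sizeof result, 0) != 0)
        return -1;
    return (int)result;
}

/* Closing the request pipe makes the helper read EOF and exit. */
int helper_stop(spawn_helper *h)
{
    int status;
    close(h->req_fd); close(h->resp_fd);
    return waitpid(h->pid, &status, 0) == h->pid ? 0 : -1;
}
```

A caller would typically run helper_start() at the top of main(), call helper_run(&h, "ls -l /etc/passwd") whenever a subprocess is needed, and call helper_stop() at shutdown. Even this small sketch shows the extra bookkeeping the article warns about.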
https://www.oracle.com/technetwork/server-storage/solaris10/subprocess-136439.html 1/4
08/10/2018 Minimizing Memory Usage for Creating Application Subprocesses
The simplest way to start a new process from a C/C++ application is to use system(3C). The system() call causes the shell to execute the given
command. See the system(3C) man page for details. This interface is adequate if all you need to do is run a shell command and wait until it's
finished executing.
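A minimal sketch of the system(3C) approach follows; run_shell_command() is a hypothetical wrapper name. It shows the one piece of bookkeeping the interface still needs: decoding the wait status that system() returns.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <sys/wait.h>

/* Run a shell command with system(3C) and return its exit code, or -1 if
 * the child could not be created or the command was killed by a signal. */
int run_shell_command(const char *cmd)
{
    int status = system(cmd);         /* blocks until the command finishes */
    if (status == -1)
        return -1;                    /* could not create the child process */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);   /* 127 usually means sh or the command failed to run */
    return -1;                        /* terminated by a signal */
}
```

For example, run_shell_command("ls -l /etc/passwd") runs the command and returns ls's exit code once it has finished.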
A somewhat more powerful interface for starting a new process is popen(3C). In addition to starting a new process, popen() allows you to capture the
output of the given shell command and manipulate it in various ways, for example, to parse it. For details, see the popen(3C) man page.
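As a small illustration of capturing output with popen(3C), the following sketch counts the lines that a shell command writes to its standard output. The function name count_output_lines() is hypothetical; a real application would parse the data rather than just count newlines.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>

/* Run `cmd` via popen(3C) and return the number of newline-terminated
 * lines it writes to stdout, or -1 on failure. */
int count_output_lines(const char *cmd)
{
    FILE *fp = popen(cmd, "r");   /* the command's stdout is connected to fp */
    if (fp == NULL)
        return -1;

    int c, lines = 0;
    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            lines++;

    if (pclose(fp) == -1)         /* pclose() also waits for the child */
        return -1;
    return lines;
}
```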
Besides being more flexible, popen() is also safer in multithreaded programs than system(). The system() call modifies certain signal
dispositions that may affect other threads, while popen() doesn't. This is why the popen(3C) man page marks it MT-Safe, while system(3C)
is marked MT-Unsafe. See the system(3C) man page for details.
The most powerful interface for this functionality is posix_spawn(3C) (and its variant posix_spawnp(3C)), introduced in the Solaris 10 OS. It
lets you make additional adjustments of the kind that can be done between the fork() and exec() calls.
The posix_spawn() interface requires you to specify the full path of the executable file that the child process will run. The posix_spawnp()
variant can do the same, but it can also search the $PATH directories for the executable file if the file name is provided without a path.
Here are two example programs showing how you can use the posix_spawn() interface. These examples should be useful, considering that the
posix_spawn(3C) man page contains no examples and that the use of posix_spawn() can be quite bewildering.
posix_spawn_example1.c
This shows how to call posix_spawn() for the simple purpose of executing a shell command (/bin/ls -l /etc/passwd in this case) and
waiting until the command is finished. Also note how the program checks for errors from posix_spawn().
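The original posix_spawn_example1.c file is not reproduced here. The following is an independent minimal sketch in the same spirit, assuming a POSIX environment where the global environ variable is available: it runs /bin/ls -l /etc/passwd with posix_spawn(), waits for it, and reports errors. The function name run_ls_example() is hypothetical. Note that posix_spawn() returns an error number rather than setting errno.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Run /bin/ls -l /etc/passwd with posix_spawn() and wait for it.
 * Returns the command's exit status, or -1 on any failure. */
int run_ls_example(void)
{
    char *argv[] = { "ls", "-l", "/etc/passwd", NULL };
    pid_t pid;

    /* posix_spawn() returns an error number instead of setting errno. */
    int err = posix_spawn(&pid, "/bin/ls", NULL, NULL, argv, environ);
    if (err != 0) {
        fprintf(stderr, "posix_spawn: %s\n", strerror(err));
        return -1;
    }

    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR) {
            perror("waitpid");
            return -1;
        }
    }
    if (WIFSIGNALED(status)) {
        fprintf(stderr, "child killed by signal %d\n", WTERMSIG(status));
        return -1;
    }
    if (WEXITSTATUS(status) == 127)   /* the child may report exec failure this way */
        fprintf(stderr, "child exited with status 127\n");
    return WEXITSTATUS(status);
}
```

To let the library search $PATH instead, the call could become posix_spawnp(&pid, "ls", NULL, NULL, argv, environ); the other arguments stay the same.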
posix_spawn_example2.c
This example demonstrates how you can use posix_spawn() to do the kind of manipulation that is often done between fork() and exec(). It
mimics what a shell might do for file redirection. The program creates a child process that has its standard input bound to a particular file, without
disturbing the open file descriptors in the parent. For simplicity, it uses /bin/cat as the child process.
The posix_spawn_example2.c example program shows the use of both posix_spawn() and fork()/exec(), so that you can compare the
interfaces.
If invoked with the argument -spawn, posix_spawn_example2.c uses posix_spawn() and binds the child's input to the file /etc/hosts.
If invoked with no argument, posix_spawn_example2.c uses fork()/exec() and binds the child's input to the file /etc/passwd.
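The following sketch approximates what the text describes for posix_spawn_example2.c; it is not that file, and the function name cat_with_stdin() is hypothetical. Both paths start /bin/cat with its standard input bound to a file: the posix_spawn() path records the redirection with posix_spawn_file_actions_addopen(), while the fork() path performs the same open() and dup2() by hand between fork() and exec().

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <fcntl.h>
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Wait for pid; return its exit status, or -1 if it did not exit normally. */
static int wait_for_child(pid_t pid)
{
    int status;
    while (waitpid(pid, &status, 0) == -1)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Run /bin/cat with stdin bound to `path`; the parent's own descriptors are
 * not changed. use_spawn != 0 selects posix_spawn(), otherwise fork()/exec().
 * Returns cat's exit status, or -1. */
int cat_with_stdin(const char *path, int use_spawn)
{
    char *argv[] = { "cat", NULL };
    pid_t pid;

    if (use_spawn) {
        posix_spawn_file_actions_t fa;
        int err = posix_spawn_file_actions_init(&fa);
        if (err != 0)
            return -1;
        /* Recorded now, performed in the child: open(path) as fd 0. */
        err = posix_spawn_file_actions_addopen(&fa, STDIN_FILENO, path,
                                               O_RDONLY, 0);
        if (err == 0)
            err = posix_spawn(&pid, "/bin/cat", &fa, NULL, argv, environ);
        posix_spawn_file_actions_destroy(&fa);
        if (err != 0) {
            fprintf(stderr, "posix_spawn: %s\n", strerror(err));
            return -1;
        }
    } else {
        pid = fork();
        if (pid == -1) {
            perror("fork");
            return -1;
        }
        if (pid == 0) {
            /* The manual equivalent, done between fork() and exec(). */
            int fd = open(path, O_RDONLY);
            if (fd == -1 || dup2(fd, STDIN_FILENO) == -1)
                _exit(127);
            if (fd != STDIN_FILENO)
                close(fd);
            execv("/bin/cat", argv);
            _exit(127);
        }
    }

    int rc = wait_for_child(pid);
    if (rc == 127)   /* the child may report open or exec failure this way */
        fprintf(stderr, "child exited with status 127\n");
    return rc;
}
```

A main() modeled on the described behavior could call cat_with_stdin("/etc/hosts", 1) when given -spawn and cat_with_stdin("/etc/passwd", 0) otherwise.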
Even without a debugger, you can verify which path is being used by using truss(1). The fork() option actually calls fork1() in the Solaris 10
OS, while the posix_spawn() execution path calls vfork() (the libc-internal version of it).
Note how posix_spawn_example2.c performs error detection. It checks for an error code returned from each function related to
posix_spawn(). In addition, it checks for exit status 127, which the child process may return to indicate a problem there. See the ERRORS
section of the posix_spawn(3C) man page [1] for details of when the child process may exit with status 127.
These examples demonstrate how to use the simpler abilities of posix_spawn(), which is enough in many cases. However, posix_spawn() can
also perform more esoteric adjustments to the child process, such as changing user and group IDs, signal mask, and scheduling class. See the
posix_spawn() man page for details.
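As a hedged example of these attribute-based adjustments, the sketch below uses a posix_spawnattr_t object to give the child an empty signal mask and to reset SIGINT and SIGQUIT to their default actions, regardless of the parent's settings. The function name spawn_with_clean_signals() is hypothetical; POSIX_SPAWN_SETSIGMASK and POSIX_SPAWN_SETSIGDEF are standard posix_spawn flags, and user/group ID or scheduling changes would use other flags and attribute setters described in the man page.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Spawn `path` with an empty signal mask and with SIGINT and SIGQUIT reset
 * to SIG_DFL in the child. Returns the exit status, or -1 on failure. */
int spawn_with_clean_signals(const char *path, char *const argv[])
{
    posix_spawnattr_t attr;
    sigset_t empty_mask, dfl_set;
    pid_t pid;
    int err, status;

    sigemptyset(&empty_mask);
    sigemptyset(&dfl_set);
    sigaddset(&dfl_set, SIGINT);
    sigaddset(&dfl_set, SIGQUIT);

    if (posix_spawnattr_init(&attr) != 0)
        return -1;
    err = posix_spawnattr_setflags(&attr,
              (short)(POSIX_SPAWN_SETSIGMASK | POSIX_SPAWN_SETSIGDEF));
    if (err == 0)
        err = posix_spawnattr_setsigmask(&attr, &empty_mask);
    if (err == 0)
        err = posix_spawnattr_setsigdefault(&attr, &dfl_set);
    if (err == 0)
        err = posix_spawn(&pid, path, NULL, &attr, argv, environ);
    posix_spawnattr_destroy(&attr);
    if (err != 0) {
        fprintf(stderr, "posix_spawn: %s\n", strerror(err));
        return -1;
    }

    while (waitpid(pid, &status, 0) == -1)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```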
As of the Solaris 10 OS for posix_spawn(), and as of the Solaris 8 2/02 release for system() and popen(), none of these three
interfaces in the Solaris OS can create a deadlock, and none of them will cause an out-of-swap condition for large applications.
In the Solaris 10 OS, posix_spawn() is currently implemented using private-to-libc vfork(), execve(), and exit() functions. They are identical
to regular vfork(), execve(), and exit() in functionality, but they are not exported from libc and therefore don't cause the deadlock-in-the-
dynamic-linker problem that any multithreaded code outside of libc that calls vfork() can cause.
The Solaris 10 versions of system(3C) and popen(3C) are implemented using posix_spawn(). The Solaris 9 and Solaris 8 2/02 versions of
those interfaces are implemented with the private-to-libc vfork() and execve().
Now that the Solaris OS has been open sourced, you can see all the implementation details online, for example here:
http://cvs.opensolaris.org/source/xref/usr/src/lib/libc/port/threads/spawn.c
Of course, the Solaris implementation of posix_spawn() can change in the future, perhaps to make it more efficient in various ways. However, any
new implementation will always support the standard API.
Note that posix_spawn() is a requirement of the Single UNIX Specification, Version 3 (SUSv3) [3].
The vfork(2) system call itself can't be made MT-safe and it's no longer necessary anyway. In the Solaris 10 OS, vfork(2) is deprecated.
According to its man page, "Its sole legitimate use as a prelude to an immediate call to a function from the exec family can be achieved safely by
posix_spawn(3C) or posix_spawnp(3C)."
Advantages
The memory overcommit feature changes the malloc() failure semantics. All those good applications that faithfully check whether malloc() returns
NULL, and produce meaningful error messages and workarounds in that case, are doing it for nothing when memory overcommit is used.
Arguably, memory overcommit violates the C/C++ standards that require that when malloc() returns a non-NULL pointer, the allocated memory
should be available when needed.
The memory overcommit feature is global for the entire system. There is no way to use it for some applications but not others, or only for certain
memory buffers within a certain application.
Most importantly, when a memory overcommit system is out of VM, one or more processes will be killed by the infamous OOM (out-of-memory)
process killer due to memory pressure. This may be unacceptable, especially in enterprise-class environments. Random application programs
shouldn't be killed just because somebody else allocated too much VM and filled it with data.
An interesting analogy for this issue is described in "Respite from the OOM killer" [5] (which also contains a related discussion). Look
for the sentence starting with "An aircraft company discovered ..."
The Linux documentation regarding this issue is somewhat contradictory. Red Hat has the following article explaining it: "Understanding Virtual
Memory" [6], which contains an explanation of the overcommit_memory parameter. See the paragraph starting with "overcommit_memory
is a value ..."
Compare the overcommit_memory explanation in the above Red Hat article with the one given in the Linux documentation [7]. See
the section there starting with "The Linux kernel supports the following overcommit handling modes."
Under the Linux kernel version 2.6 and later, theoretically there is a way to modify the kernel's behavior so that it will not overcommit memory. This is
done by selecting what is called the strict overcommit mode via sysctl:
sysctl -w vm.overcommit_memory=2
or placing an equivalent vm.overcommit_memory=2 entry in /etc/sysctl.conf.
Mode 2 (which is new in 2.6) is certainly an improvement over modes 0 and 1 available in the older versions of the Linux kernel. However, mode 2
doesn't mean that memory will never be overcommitted. It just uses a different heuristic for guessing how much memory is safe to allow to be
allocated.
Also note that vm.overcommit_memory=2 is still not the default setting. The default is vm.overcommit_memory=0.
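On Linux, a program can check which mode is in effect by reading /proc/sys/vm/overcommit_memory, the file behind the vm.overcommit_memory sysctl. The sketch below is Linux-specific, the function name overcommit_mode() is hypothetical, and it simply returns -1 where that file does not exist (for example, on Solaris).

```c
#include <stdio.h>

/* Linux-only sketch: return the current vm.overcommit_memory mode
 * (0 = heuristic default, 1 = always overcommit, 2 = strict accounting),
 * or -1 if /proc/sys/vm/overcommit_memory cannot be read. */
int overcommit_mode(void)
{
    FILE *fp = fopen("/proc/sys/vm/overcommit_memory", "r");
    if (fp == NULL)
        return -1;

    int mode = -1;
    if (fscanf(fp, "%d", &mode) != 1)
        mode = -1;
    fclose(fp);
    return mode;
}
```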
In contrast, under the Solaris OS, when the application calls malloc() (which internally invokes sbrk(2) to get more memory from the system), the
kernel goes through its free memory lists trying to find the requested amount of VM. If found, the kernel returns a pointer to that memory and reserves
the swap space for it so that no other process can use it until the owner releases it. If not found, malloc() returns NULL.
The Solaris OS does not use any heuristics or guessing of any kind. Therefore, it never needs to kill random processes when it runs out of memory.
While the Solaris OS doesn't use memory overcommit in its malloc() and sbrk(), it does allow similar functionality, with much finer
granularity and more control, via the mmap(MAP_NORESERVE) feature. With mmap(MAP_NORESERVE), you can use this facility only for
selected memory buffers and/or only in selected applications. For details, see "Virtual Memory Arrays for Application Software" [4].
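A minimal sketch of the mmap(MAP_NORESERVE) approach follows. Where the system honors MAP_NORESERVE (as the Solaris OS does), no swap is reserved up front and only pages that are actually written consume VM. The function names alloc_noreserve() and free_noreserve() are hypothetical, and the code assumes MAP_ANON or MAP_ANONYMOUS is available (older systems can map /dev/zero instead). Because nothing is reserved, a shortage of swap shows up when pages are written rather than when the mapping is created.

```c
#define _DEFAULT_SOURCE 1   /* glibc: expose MAP_ANON and MAP_NORESERVE */
#include <stddef.h>
#include <sys/mman.h>

#if !defined(MAP_ANON) && defined(MAP_ANONYMOUS)
#define MAP_ANON MAP_ANONYMOUS
#endif

/* Map `size` bytes of zero-filled memory without reserving swap for it
 * (where MAP_NORESERVE is honored). Backing store is needed only for pages
 * that are actually written, so a shortage appears at write time instead
 * of here. Returns NULL on failure. */
void *alloc_noreserve(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Release a buffer obtained from alloc_noreserve(). */
void free_noreserve(void *p, size_t size)
{
    munmap(p, size);
}
```

Unlike the global overcommit setting, this choice is made per buffer, so the rest of the application keeps the normal reserve-on-allocate behavior.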
Your application can dynamically determine whether posix_spawn() is available by calling dlsym(RTLD_NEXT, "posix_spawn"). For example:
#include <dlfcn.h>

int (*posix_spawn_ptr)();
...
posix_spawn_ptr = (int (*)())dlsym(RTLD_NEXT, "posix_spawn");
if (posix_spawn_ptr != NULL)
{
    /* Call posix_spawn_ptr() the same way as posix_spawn() */
}
else
{
    /* posix_spawn() is not available; use older methods */
}
However, there is a problem with applying this general method here. Using posix_spawn(3C) requires including the system header file spawn.h,
as follows:
#include <spawn.h>
The problem is that the file spawn.h is not available under the Solaris 9 OS or earlier. To work around this problem, you can replace the above
#include <spawn.h> statement with the following:
/*
* To allow compiling such a program under Solaris 9 or earlier,
* copy /usr/include/spawn.h from Solaris 10
* locally and explicitly add definition of _RESTRICT_KYWD.
* Note that "spawn.h" is included, rather than <spawn.h>.
*/
#if (defined(__STDC__) && defined(_STDC_C99))
#define _RESTRICT_KYWD restrict
#else
#define _RESTRICT_KYWD
#endif
#include "spawn.h"
The resulting program will compile successfully under either Solaris 10 or an earlier Solaris version.
Note that having a local copy of a Solaris 10 header file in your application area is likely to require a certain amount of maintenance in the future
when the system header file spawn.h may change. When you start building your application on the Solaris 10 OS or later, it will be best to remove
the local copy of spawn.h and the special trick shown above, and use the system version of spawn.h directly instead. You may want to add a
comment to that effect to your special code, so that future developers will know why you added this code and a local copy of spawn.h to your
application.
Acknowledgments
I'd like to thank my Sun colleagues Morgan Herrington (who also created example program posix_spawn_example2.c), Chris Quenelle, and Eric
Sosman for their advice related to the issues discussed in this article.
References
[1] Solaris posix_spawn(3C) man page
[2] Bugzilla Bug 131938: posix_spawn implementation inquiry (why fork/exec rather than vfork/exec)
[3] Solaris Single Unix Specification Version 3 (SUSv3) Advanced Real Time Support
[4] Virtual Memory Arrays for Application Software
[5] Respite from the OOM killer
[6] Understanding Virtual Memory
[7] Linux/Documentation/vm/overcommit-accounting
About the Author
Greg Nakhimovsky is a Sun engineer working with application software vendors to make sure their products run well on Sun systems.