3.4 Process Termination

Normally, a process terminates in one of two ways. Either the executing program calls the exit function, or the program's main function returns. Each process has an exit code: a number that the process returns to its parent. The exit code is the argument passed to the exit function, or the value returned from main.

A process may also terminate abnormally, in response to a signal. For instance, the SIGBUS, SIGSEGV, and SIGFPE signals mentioned previously cause the process to terminate. Other signals are used to terminate a process explicitly. The SIGINT signal is sent to a process when the user attempts to end it by typing Ctrl+C in its terminal. The SIGTERM signal is sent by the kill command. The default disposition for both of these is to terminate the process. By calling the abort function, a process sends itself the SIGABRT signal, which terminates the process and produces a core file. The most powerful termination signal is SIGKILL, which ends a process immediately and cannot be blocked or handled by a program.

Any of these signals can be sent using the kill command by specifying an extra command-line flag; for instance, to end a troublesome process by sending it a SIGKILL, invoke the following, where pid is its process ID:

 
% kill -KILL pid

To send a signal from a program, use the kill function. The first parameter is the target process ID. The second parameter is the signal number; use SIGTERM to simulate the default behavior of the kill command. For instance, where child_pid contains the process ID of the child process, you can use the kill function to terminate a child process from the parent by calling it like this:

 
kill (child_pid, SIGTERM);

Include the <sys/types.h> and <signal.h> headers if you use the kill function.
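As a minimal sketch of the call above, the following helper forks a child that does nothing but wait for a signal, sends it SIGTERM, and then collects its termination status. The name terminate_child and its error handling are assumptions made for this illustration; waitpid and the WIFSIGNALED and WTERMSIG macros come from <sys/wait.h>.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that idles, terminate it with
   SIGTERM, and return the number of the signal that ended it, or -1
   on error.  */
int terminate_child (void)
{
  int status;
  pid_t child_pid = fork ();
  if (child_pid == -1)
    return -1;
  if (child_pid == 0) {
    /* This is the child process. Sleep until a signal arrives.  */
    for (;;)
      pause ();
  }
  /* This is the parent process. Ask the child to terminate.  */
  if (kill (child_pid, SIGTERM) == -1)
    return -1;
  /* Collect the child's termination status.  */
  if (waitpid (child_pid, &status, 0) == -1)
    return -1;
  return WIFSIGNALED (status) ? WTERMSIG (status) : -1;
}
```

Because the child does not handle SIGTERM, the default disposition applies and the helper should return SIGTERM.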

By convention, the exit code is used to indicate whether the program executed correctly. An exit code of zero indicates correct execution, while a nonzero exit code indicates that an error occurred. In the latter case, the particular value returned may give some indication of the nature of the error. It's a good idea to stick with this convention in your programs because other components of the GNU/Linux system assume this behavior. For instance, shells assume this convention when you connect multiple programs with the && (logical and) and || (logical or) operators. Therefore, you should explicitly return zero from your main function, unless an error occurs.

With most shells, it's possible to obtain the exit code of the most recently executed program using the special $? variable. Here's an example in which the ls command is invoked twice and its exit code is displayed after each invocation. In the first case, ls executes correctly and returns the exit code zero. In the second case, ls encounters an error (because the filename specified on the command line does not exist) and thus returns a nonzero exit code.

 
% ls /
bin   coda  etc   lib         misc  nfs  proc  sbin  usr 
boot  dev   home  lost+found  mnt   opt  root  tmp   var 
% echo $? 
0 
% ls bogusfile 
ls: bogusfile: No such file or directory 
% echo $? 
1 

Note that even though the parameter type of the exit function is int and the main function returns an int, Linux does not preserve the full 32 bits of the return code; only the low 8 bits reach the parent. In fact, you should use exit codes only between zero and 127. Exit codes above 128 have a special meaning: when a process is terminated by a signal, the shell reports its exit code as 128 plus the signal number.
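The truncation is easy to observe from a parent process. The sketch below assumes a hypothetical helper named observed_exit_code; it forks a child that exits with a given code and returns the exit code that the parent actually sees. It relies on waitpid and the WEXITSTATUS macro from <sys/wait.h>, and it calls _exit in the child so that the child does not flush any output buffered by the parent.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a child that exits with CODE and return the
   exit code that the parent observes, or -1 on error.  */
int observed_exit_code (int code)
{
  int status;
  pid_t child_pid = fork ();
  if (child_pid == -1)
    return -1;
  if (child_pid == 0)
    /* This is the child process. Exit immediately.  */
    _exit (code);
  /* This is the parent process. Wait for the child.  */
  if (waitpid (child_pid, &status, 0) == -1)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

With this helper, a child that exits with 259 is seen by the parent as having exited with 3, because only the low 8 bits of the code are kept.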

3.4.1 Waiting for Process Termination

If you typed in and ran the fork and exec example in Listing 3.4, you may have noticed that the output from the ls program often appears after the "main program" has already completed. That's because the child process, in which ls is run, is scheduled independently of the parent process. Because Linux is a multitasking operating system, both processes appear to execute simultaneously, and you can't predict whether the ls program will have a chance to run before or after the parent process runs.

In some situations, though, it is desirable for the parent process to wait until one or more child processes have completed. This can be done with the wait family of system calls. These functions allow you to wait for a process to finish executing, and enable the parent process to retrieve information about its child's termination. There are four different system calls in the wait family; you can choose to get a little or a lot of information about the process that exited, and you can choose whether you care about which child process terminated.

3.4.2 The wait System Calls

The simplest such function is called simply wait. It blocks the calling process until one of its child processes exits (or an error occurs). It returns a status code via an integer pointer argument, from which you can extract information about how the child process exited. For instance, the WEXITSTATUS macro extracts the child process's exit code.

You can use the WIFEXITED macro to determine from a child process's exit status whether that process exited normally (via the exit function or returning from main) or died from an unhandled signal. In the latter case, use the WTERMSIG macro to extract from its exit status the signal number by which it died.
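The following sketch shows how these macros might be combined. The helper classify_child and its return convention are assumptions made for this illustration: the child either exits with code 7 or sends itself a signal with raise, and the parent uses wait to find out which happened.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with code 7 if
   SIGNAL_TO_RAISE is zero, or sends itself that signal otherwise.
   Returns 1 and stores the exit code in *DETAIL if the child exited
   normally, returns 2 and stores the signal number if it died from a
   signal, and returns -1 on error.  */
int classify_child (int signal_to_raise, int *detail)
{
  int status;
  pid_t child_pid = fork ();
  if (child_pid == -1)
    return -1;
  if (child_pid == 0) {
    /* This is the child process.  */
    if (signal_to_raise != 0)
      raise (signal_to_raise);
    _exit (7);
  }
  /* This is the parent process. Wait for the child to finish.  */
  if (wait (&status) == -1)
    return -1;
  if (WIFEXITED (status)) {
    *detail = WEXITSTATUS (status);
    return 1;
  }
  if (WIFSIGNALED (status)) {
    *detail = WTERMSIG (status);
    return 2;
  }
  return -1;
}
```

This sketch assumes that the calling process has no other children, because wait returns as soon as any child terminates.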

Here is the main function from the fork and exec example again. This time, the parent process calls wait to wait until the child process, in which the ls command executes, is finished.

 
int main ()
{
  int child_status; 
 
  /* The argument list to pass to the "ls" command.  */ 
  char* arg_list[] = {
    "ls",     /* argv[0], the name of the program.  */
    "-l",
    "/",
    NULL      /* The argument list must end with a NULL.  */
  }; 
 
  /* Spawn a child process running the "ls" command. Ignore the 
     returned child process ID.  */ 
  spawn ("ls", arg_list); 
 
  /* Wait for the child process to complete.  */ 
  wait (&child_status); 
  if (WIFEXITED (child_status)) 
    printf ("the child process exited normally, with exit code %d\n", 
            WEXITSTATUS (child_status)); 
  else 
    printf ("the child process exited abnormally\n"); 
 
  return 0;
} 

Several similar system calls are available in Linux, which are more flexible or provide more information about the exiting child process. The waitpid function can be used to wait for a specific child process to exit instead of any child process. The wait3 function returns CPU usage statistics about the exiting child process, and the wait4 function allows you to specify additional options about which processes to wait for.
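As a sketch of waiting for one particular child, the two helpers below start children and collect them by process ID. The names spawn_exiting_child and exit_code_of are assumptions made for this illustration. Passing a process ID as the first argument to waitpid restricts the wait to that child, and passing zero as the third argument requests the default blocking behavior.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits immediately with CODE.
   Returns the child's process ID, or -1 on error.  */
pid_t spawn_exiting_child (int code)
{
  pid_t child_pid = fork ();
  if (child_pid == 0)
    /* This is the child process.  */
    _exit (code);
  return child_pid;
}

/* Hypothetical helper: wait for one specific child and return its exit
   code, or -1 if the wait failed or the child did not exit normally.  */
int exit_code_of (pid_t child_pid)
{
  int status;
  if (waitpid (child_pid, &status, 0) != child_pid)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

A parent that starts two children with spawn_exiting_child can then call exit_code_of on them in either order; each call returns the exit code of the child it names, regardless of which child finished first.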

3.4.3 Zombie Processes

If a child process terminates while its parent is calling a wait function, the child process vanishes and its termination status is passed to its parent via the wait call. But what happens when a child process terminates and the parent is not calling wait? Does it simply vanish? No, because then information about its termination (such as whether it exited normally and, if so, what its exit status is) would be lost. Instead, when a child process terminates, it becomes a zombie process.

A zombie process is a process that has terminated but has not been cleaned up yet. It is the responsibility of the parent process to clean up its zombie children. The wait functions do this, too, so it's not necessary to track whether your child process is still executing before waiting for it. Suppose, for instance, that a program forks a child process, performs some other computations, and then calls wait. If the child process has not terminated at that point, the parent process will block in the wait call until the child process finishes. If the child process finishes before the parent process calls wait, the child process becomes a zombie. When the parent process calls wait, the zombie child's termination status is extracted, the child process is deleted, and the wait call returns immediately.

What happens if the parent does not clean up its children? They stay around in the system, as zombie processes. The program in Listing 3.6 forks a child process, which terminates immediately. The parent then goes to sleep for a minute, without ever cleaning up the child process.

Listing 3.6 (zombie.c) Making a Zombie Process
#include <stdlib.h> 
#include <sys/types.h> 
#include <unistd.h> 
 
int main () 
{
  pid_t child_pid; 
 
  /* Create a child process.  */ 
  child_pid = fork ();
  if (child_pid > 0)  {
    /* This is the parent process. Sleep for a minute.  */ 
    sleep (60); 
  } 
  else {
    /* This is the child process. Exit immediately.  */ 
    exit (0); 
  } 
  return 0; 
} 

Try compiling this file to an executable named make-zombie. Run it, and while it's still running, list the processes on the system by invoking the following command in another window:

 
%  ps  -e  -o  pid,ppid,stat,cmd 

This lists the process ID, parent process ID, process status, and process command line. Observe that, in addition to the parent make-zombie process, there is another make-zombie process listed. It's the child process; note that its parent process ID is the process ID of the main make-zombie process. The child process is marked as <defunct>, and its status code is Z, for zombie.

What happens when the main make-zombie program ends, that is, when the parent process exits without ever calling wait? Does the zombie process stay around? No. Try running ps again, and note that both of the make-zombie processes are gone. When a program exits, its children are inherited by a special process, the init program, which always runs with process ID of 1 (it's the first process started when Linux boots). The init process automatically cleans up any zombie child processes that it inherits.

3.4.4 Cleaning Up Children Asynchronously

If you're using a child process simply to exec another program, it's fine to call wait immediately in the parent process, which will block until the child process completes. But often, you'll want the parent process to continue running, as one or more children execute asynchronously. How can you be sure that you clean up child processes that have completed so that you don't leave zombie processes, which consume system resources, lying around?

One approach would be for the parent process to call waitpid, wait3, or wait4 periodically, to clean up zombie children. Calling wait for this purpose doesn't work well because, if no children have terminated, the call will block until one does. However, waitpid, wait3, and wait4 take an additional flag parameter, to which you can pass the flag value WNOHANG. With this flag, the function runs in nonblocking mode: it will clean up a terminated child process if there is one, or simply return if there isn't. The return value of the call is the process ID of the terminated child in the former case, or zero in the latter case.
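A minimal sketch of this polling approach follows. It uses waitpid, which accepts the same WNOHANG flag; passing -1 as its first argument means "any child." The helper names reap_finished_children and poll_for_children are assumptions made for this illustration, and the call to usleep merely stands in for whatever useful work the parent would do between polls.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: clean up every child that has already
   terminated, without blocking. Returns the number cleaned up.  */
int reap_finished_children (void)
{
  int reaped = 0;
  int status;
  /* With WNOHANG, waitpid returns zero if children exist but none has
     terminated, and -1 if there are no children at all.  */
  while (waitpid (-1, &status, WNOHANG) > 0)
    ++reaped;
  return reaped;
}

/* Hypothetical driver: poll until EXPECTED children have been cleaned
   up or MAX_POLLS polls have been made. Returns the number of
   children cleaned up.  */
int poll_for_children (int expected, int max_polls)
{
  int reaped = 0;
  while (reaped < expected && max_polls-- > 0) {
    reaped += reap_finished_children ();
    if (reaped < expected)
      /* Stand-in for the parent's real work between polls.  */
      usleep (10000);
  }
  return reaped;
}
```

The drawback of this design is that a terminated child remains a zombie until the next poll, so the polling interval is a trade-off between wasted work and how long zombies linger.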

A more elegant solution is to notify the parent process when a child terminates. There are several ways to do this using the methods discussed in Chapter 5, "Interprocess Communication," but fortunately Linux does this for you, using signals. When a child process terminates, Linux sends the parent process the SIGCHLD signal. The default disposition of this signal is to do nothing, which is why you might not have noticed it before.

Thus, an easy way to clean up child processes is by handling SIGCHLD. Of course, when cleaning up the child process, it's important to store its termination status if this information is needed, because once the process is cleaned up using wait, that information is no longer available. Listing 3.7 shows a program that uses a SIGCHLD handler to clean up its child processes.

Listing 3.7 (sigchld.c) Cleaning Up Children by Handling SIGCHLD
#include <signal.h> 
#include <string.h> 
#include <sys/types.h> 
#include <sys/wait.h> 
 
sig_atomic_t child_exit_status; 
 
void clean_up_child_process (int signal_number) 
{
  /* Clean up the child process.  */ 
  int status; 
  wait (&status); 
  /* Store its exit status in a global variable.  */ 
  child_exit_status = status; 
} 
 
int main () 
{
  /* Handle SIGCHLD by calling clean_up_child_process.  */ 
  struct sigaction sigchld_action; 
  memset (&sigchld_action, 0, sizeof (sigchld_action)); 
  sigchld_action.sa_handler = &clean_up_child_process; 
  sigaction (SIGCHLD, &sigchld_action, NULL); 
 
  /* Now do things, including forking a child process.  */ 
  /* ...  */ 
 
  return 0;
} 

Note how the signal handler stores the child process's exit status in a global variable, from which the main program can access it. Because the variable is assigned in a signal handler, its type is sig_atomic_t.