2.1 Interaction With the Execution Environment

When you first studied C or C++, you learned that the special main function is the primary entry point for a program. When the operating system executes your program, it automatically provides certain facilities that help the program communicate with the operating system and the user. You probably learned about the two parameters to main, usually called argc and argv, which receive inputs to your program. You learned about the stdout and stdin (or the cout and cin streams in C++) that provide console input and output. These features are provided by the C and C++ languages, and they interact with the GNU/Linux system in certain ways. GNU/Linux provides other ways for interacting with the operating environment, too.

2.1.1 The Argument List

You run a program from a shell prompt by typing the name of the program. Optionally, you can supply additional information to the program by typing one or more words after the program name, separated by spaces. These are called command-line arguments. (You can also include an argument that contains a space, by enclosing the argument in quotes.) More generally, this is referred to as the program's argument list because it need not originate from a shell command line. In Chapter 3, "Processes," you'll see another way of invoking a program, in which a program can specify the argument list of another program directly.

When a program is invoked from the shell, the argument list contains the entire command line, including the name of the program and any command-line arguments that may have been provided. Suppose, for example, that you invoke the ls command in your shell to display the contents of the root directory and corresponding file sizes with this command line:

 
% ls -s / 

The argument list that the ls program receives has three elements. The first one is the name of the program itself, as specified on the command line, namely ls. The second and third elements of the argument list are the two command-line arguments, -s and /.

The main function of your program can access the argument list via the argc and argv parameters to main (if you don't use them, you may simply omit them). The first parameter, argc, is an integer that is set to the number of items in the argument list. The second parameter, argv, is an array of character pointers. The size of the array is argc, and the array elements point to the elements of the argument list, as NUL-terminated character strings.

Using command-line arguments is as easy as examining the contents of argc and argv. If you're not interested in the name of the program itself, don't forget to skip the first element.

Listing 2.1 demonstrates how to use argc and argv.

Listing 2.1 (arglist.c) Using argc and argv
#include <stdio.h>

int main (int argc, char* argv[])
{
  printf ("The name of this program is '%s'.\n", argv[0]);
  printf ("This program was invoked with %d arguments.\n", argc - 1);

  /* Were any command-line arguments specified?  */
  if (argc > 1) {
    /* Yes, print them.  */
    int i;
    printf ("The arguments are:\n");
    for (i = 1; i < argc; ++i)
      printf ("  %s\n", argv[i]);
  }

  return 0;
}

2.1.2 GNU/Linux Command-Line Conventions

Almost all GNU/Linux programs obey some conventions about how command-line arguments are interpreted. The arguments that programs expect fall into two categories: options (or flags) and other arguments. Options modify how the program behaves, while other arguments provide inputs (for instance, the names of input files).

Options come in two forms:

·         Short options consist of a single hyphen and a single character (usually a lowercase or uppercase letter). Short options are quicker to type.

·         Long options consist of two hyphens, followed by a name made of lowercase and uppercase letters and hyphens. Long options are easier to remember and easier to read (in shell scripts, for instance).

Usually, a program provides both a short form and a long form for most options it supports, the former for brevity and the latter for clarity. For example, most programs understand the options -h and --help, and treat them identically. Normally, when a program is invoked from the shell, any desired options follow the program name immediately. Some options expect an argument immediately following. Many programs, for example, interpret the option --output foo to specify that output of the program should be placed in a file named foo. After the options, there may follow other command-line arguments, typically input files or input data.

For example, the command ls -s / displays the contents of the root directory. The -s option modifies the default behavior of ls by instructing it to display the size (in kilobytes) of each entry. The / argument tells ls which directory to list. The --size option is synonymous with -s, so the same command could have been invoked as ls --size /.
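As a rough illustration of these conventions, the sketch below sorts a single argument into one of the categories described above by looking only at its spelling. The names argument_kind and classify_argument are hypothetical, and the sketch deliberately ignores details that real option parsers handle, such as several short options combined behind one hyphen.

```c
#include <string.h>

/* The categories of command-line argument described in the text.  */
enum argument_kind { ARG_SHORT_OPTION, ARG_LONG_OPTION, ARG_OTHER };

/* Classifies ARG by its spelling alone. A long option is two hyphens
   followed by a name; a short option is one hyphen followed by exactly
   one character. Everything else, including a lone "-" or "--", is
   treated as an ordinary argument in this sketch.  */
enum argument_kind classify_argument (const char* arg)
{
  size_t length = strlen (arg);

  if (length > 2 && arg[0] == '-' && arg[1] == '-')
    return ARG_LONG_OPTION;
  if (length == 2 && arg[0] == '-' && arg[1] != '-')
    return ARG_SHORT_OPTION;
  return ARG_OTHER;
}
```

With this helper, the arguments -s and --size are both recognized as options, while / is an ordinary argument.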

The GNU Coding Standards list the names of some commonly used command-line options. If you plan to provide any options similar to these, it's a good idea to use the names specified in the coding standards. Your program will behave more like other programs and will be easier for users to learn. You can view the GNU Coding Standards' guidelines for command-line options by invoking the following from a shell prompt on most GNU/Linux systems:

 
% info "(standards)User Interfaces" 

2.1.3 Using getopt_long

Parsing command-line options is a tedious chore. Luckily, the GNU C library provides a function that you can use in C and C++ programs to make this job somewhat easier (although still a bit annoying). This function, getopt_long, understands both short and long options. If you use this function, include the header file <getopt.h>.

Suppose, for example, that you are writing a program that is to accept the three options shown in Table 2.1.

Table 2.1. Example Program Options

Short Form     Long Form            Purpose
-h             --help               Display usage summary and exit
-o filename    --output filename    Specify output filename
-v             --verbose            Print verbose messages

In addition, the program is to accept zero or more additional command-line arguments, which are the names of input files.

To use getopt_long, you must provide two data structures. The first is a character string containing the valid short options, each a single letter. An option that requires an argument is followed by a colon. For your program, the string ho:v indicates that the valid options are -h, -o, and -v, with the second of these options followed by an argument.

To specify the available long options, you construct an array of struct option elements. Each element corresponds to one long option and has four fields. In normal circumstances, the first field is the name of the long option (as a character string, without the two hyphens); the second is 1 if the option takes an argument, or 0 otherwise; the third is NULL; and the fourth is a character constant specifying the short option synonym for that long option. The last element of the array should be all zeros. You could construct the array like this:

 
const struct option long_options[] = {
  { "help",     0, NULL, 'h' }, 
  { "output",   1, NULL, 'o' }, 
  { "verbose",  0, NULL, 'v' }, 
  { NULL,       0, NULL, 0   } 
}; 

You invoke the getopt_long function, passing it the argc and argv arguments to main, the character string describing short options, and the array of struct option elements describing the long options.

·         Each time you call getopt_long, it parses a single option, returning the short-option letter for that option, or -1 if no more options are found.

·         Typically, you'll call getopt_long in a loop, to process all the options the user has specified, and you'll handle the specific options in a switch statement.

·         If getopt_long encounters an invalid option (an option that you didn't specify as a valid short or long option), it prints an error message and returns the character ? (a question mark). Most programs will exit in response to this, possibly after displaying usage information.

·         When handling an option that takes an argument, the global variable optarg points to the text of that argument.

·         After getopt_long has finished parsing all the options, the global variable optind contains the index (into argv) of the first nonoption argument.

Listing 2.2 shows an example of how you might use getopt_long to process your arguments.

Listing 2.2 (getopt_long.c) Using getopt_long
#include <getopt.h> 
#include <stdio.h> 
#include <stdlib.h> 
 
/* The name of this program.  */ 
 
const char* program_name; 
 
/* Prints usage information for this program to STREAM (typically
   stdout or stderr), and exits the program with EXIT_CODE. Does not
   return.  */

void print_usage (FILE* stream, int exit_code)
{
  fprintf (stream, "Usage: %s options [ inputfile .... ]\n", program_name);
  fprintf (stream,
           "  -h  --help            Display this usage information.\n"
           "  -o  --output filename Write output to file.\n"
           "  -v  --verbose         Print verbose messages.\n");
  exit (exit_code);
}

/* Main program entry point.  ARGC contains number of argument list
   elements; ARGV is an array of pointers to them.  */

int main (int argc, char* argv[])
{
  int next_option;

  /* A string listing valid short option letters.  */
  const char* const short_options = "ho:v";
  /* An array describing valid long options.  */
  const struct option long_options[] = {
    { "help",    0, NULL, 'h' },
    { "output",  1, NULL, 'o' },
    { "verbose", 0, NULL, 'v' },
    { NULL,      0, NULL, 0   }   /* Required at end of array.  */
  };

  /* The name of the file to receive program output, or NULL for
     standard output.  */
  const char* output_filename = NULL;
  /* Whether to display verbose messages.  */
  int verbose = 0;

  /* Remember the name of the program, to incorporate in messages.
     The name is stored in argv[0].  */
  program_name = argv[0];

  do {
    next_option = getopt_long (argc, argv, short_options,
                               long_options, NULL);
    switch (next_option)
    {
    case 'h':   /* -h or --help  */
      /* User has requested usage information. Print it to standard
         output, and exit with exit code zero (normal termination).
         print_usage does not return, so no break is needed.  */
      print_usage (stdout, 0);

    case 'o':   /* -o or --output  */
      /* This option takes an argument, the name of the output file.  */
      output_filename = optarg;
      break;

    case 'v':   /* -v or --verbose  */
      verbose = 1;
      break;

    case '?':   /* The user specified an invalid option.  */
      /* Print usage information to standard error, and exit with exit
         code one (indicating abnormal termination).  */
      print_usage (stderr, 1);

    case -1:    /* Done with options.  */
      break;

    default:    /* Something else: unexpected.  */
      abort ();
    }
  }
  while (next_option != -1);

  /* Done with options.  OPTIND points to first nonoption argument.
     For demonstration purposes, print them if the verbose option was
     specified.  */
  if (verbose) {
    int i;
    for (i = optind; i < argc; ++i)
      printf ("Argument: %s\n", argv[i]);
  }

  /* The main program goes here.  */

  return 0;
}

Using getopt_long may seem like a lot of work, but writing code to parse the command-line options yourself would take even longer. The getopt_long function is very sophisticated and allows great flexibility in specifying what kind of options to accept. However, it's a good idea to stay away from the more advanced features and stick with the basic option structure described.

2.1.4 Standard I/O

The standard C library provides standard input and output streams (stdin and stdout, respectively). These are used by scanf, printf, and other library functions. In the UNIX tradition, use of standard input and output is customary for GNU/Linux programs. This allows the chaining of multiple programs using shell pipes and input and output redirection. (See the man page for your shell to learn its syntax.)

The C library also provides stderr, the standard error stream. Programs should print warning and error messages to standard error instead of standard output. This allows users to separate normal output and error messages, for instance, by redirecting standard output to a file while allowing standard error to print on the console. The fprintf function can be used to print to stderr, for example:

 
fprintf (stderr, "Error: ...");

These three streams are also accessible with the underlying UNIX I/O functions (read, write, and so on) via file descriptors: 0 for stdin, 1 for stdout, and 2 for stderr.
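As a rough illustration of the descriptor numbering, the sketch below writes a string straight to a file descriptor with write, bypassing the C library's stream buffering. The helper name write_string is hypothetical; STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO are the symbolic names that <unistd.h> defines for descriptors 0, 1, and 2, and fileno reports the descriptor behind a stream.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Writes MESSAGE to file descriptor FD using the write function.
   Returns the number of bytes written, or -1 on error.  */
ssize_t write_string (int fd, const char* message)
{
  return write (fd, message, strlen (message));
}

/* Returns nonzero if the three standard streams sit on the
   descriptors 0, 1, and 2, as they do when a program starts.  */
int standard_descriptors_match (void)
{
  return fileno (stdin) == STDIN_FILENO
         && fileno (stdout) == STDOUT_FILENO
         && fileno (stderr) == STDERR_FILENO;
}
```

A call such as write_string (STDERR_FILENO, "Error: ...\n") has much the same effect as the fprintf call on stderr, without going through the stream.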

When invoking a program, it is sometimes useful to redirect both standard output and standard error to a file or pipe. The syntax for doing this varies among shells; for Bourne-style shells (including bash, the default shell on most GNU/Linux distributions), the syntax is this:

 
% program > output_file.txt 2>&1 
% program 2>&1 | filter 

The 2>&1 syntax indicates that file descriptor 2 (stderr) should be merged into file descriptor 1 (stdout). Note that 2>&1 must follow a file redirection (the first example) but must precede a pipe redirection (the second example).

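Note that stdout is buffered. Data written to stdout is not sent to the console (or other device, if it's redirected) until the buffer fills, the program exits normally, or stdout is closed. (When stdout is connected to a terminal, the C library normally line-buffers it, so printing a newline also flushes the buffer.) You can explicitly flush the buffer by calling the following: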

 
fflush (stdout); 

In contrast, stderr is not buffered; data written to stderr goes directly to the console. [1]

[1] In C++, the same distinction holds for cout and cerr, respectively. Note that the endl token flushes a stream in addition to printing a newline character; if you don't want to flush the stream (for performance reasons, for example), use a newline constant, '\n', instead.

This can produce some surprising results. For example, this loop does not print one period every second; instead, the periods are buffered, and a bunch of them are printed together when the buffer fills.

 
while (1) {
  printf ("."); 
  sleep (1); 
} 

In this loop, however, the periods do appear once a second:

 
while (1) {
  fprintf (stderr, "."); 
  sleep (1); 
} 
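If the periods should go to standard output rather than standard error, one option is to flush stdout after each one. The following is a minimal sketch of that approach; the function name print_dots and its parameters are hypothetical, and the loop is bounded so that it terminates.

```c
#include <stdio.h>
#include <unistd.h>

/* Prints COUNT periods to stdout, pausing DELAY_SECONDS after each.
   Flushing after every period forces it out of the buffer right away.
   Returns the number of periods printed.  */
int print_dots (int count, unsigned int delay_seconds)
{
  int i;
  for (i = 0; i < count; ++i) {
    printf (".");
    fflush (stdout);
    sleep (delay_seconds);
  }
  printf ("\n");
  return i;
}
```

Calling print_dots (5, 1) should show one period per second even though stdout is buffered.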

2.1.5 Program Exit Codes

When a program ends, it indicates its status with an exit code. The exit code is a small integer; by convention, an exit code of zero denotes successful execution, while nonzero exit codes indicate that an error occurred. Some programs use different nonzero exit code values to distinguish specific errors.

With most shells, it's possible to obtain the exit code of the most recently executed program using the special $? variable. Here's an example in which the ls command is invoked twice and its exit code is printed after each invocation. In the first case, ls executes correctly and returns the exit code zero. In the second case, ls encounters an error (because the filename specified on the command line does not exist) and thus returns a nonzero exit code.

 
% ls / 
bin   coda etc  lib        misc nfs proc sbin usr 
boot  dev  home lost+found mnt  opt root tmp  var 
% echo $? 
0 
% ls bogusfile 
ls: bogusfile: No such file or directory 
% echo $? 
1 

A C or C++ program specifies its exit code by returning that value from the main function. There are other methods of providing exit codes, and special exit codes are assigned to programs that terminate abnormally (by a signal). These are discussed further in Chapter 3.
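A small sketch of this convention follows. The function exit_code_for_file is a hypothetical helper, not part of any library; it computes a value suitable for returning from main, using the EXIT_SUCCESS and EXIT_FAILURE constants that <stdlib.h> provides for the two conventional outcomes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns EXIT_SUCCESS if the file named PATH can be opened for
   reading. Otherwise, prints a message to standard error and returns
   EXIT_FAILURE.  */
int exit_code_for_file (const char* path)
{
  FILE* file = fopen (path, "r");
  if (file == NULL) {
    fprintf (stderr, "cannot open %s\n", path);
    return EXIT_FAILURE;
  }
  fclose (file);
  return EXIT_SUCCESS;
}
```

A main function could end with return exit_code_for_file (argv[1]); so that the shell's $? variable reflects whether the file was readable.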

2.1.6 The Environment

GNU/Linux provides each running program with an environment. The environment is a collection of variable/value pairs. Both environment variable names and their values are character strings. By convention, environment variable names are spelled in all capital letters.

You're probably familiar with several common environment variables already. For instance:

·         USER contains your username.

·         HOME contains the path to your home directory.

·         PATH contains a colon-separated list of directories through which Linux searches for commands you invoke.

·         DISPLAY contains the name and display number of the X Window server on which windows from graphical X Window programs will appear.

Your shell, like any other program, has an environment. Shells provide methods for examining and modifying the environment directly. To print the current environment in your shell, invoke the printenv program. Various shells have different built-in syntax for using environment variables; the following is the syntax for Bourne-style shells.

·         The shell automatically creates a shell variable for each environment variable that it finds, so you can access environment variable values using the $varname syntax. For instance:

          % echo $USER
          samuel
          % echo $HOME
          /home/samuel

·         You can use the export command to export a shell variable into the environment. For example, to set the EDITOR environment variable, you would use this:

          % EDITOR=emacs
          % export EDITOR

Or, for short:

 
% export EDITOR=emacs 

In a program, you access an environment variable with the getenv function in <stdlib.h>. That function takes a variable name and returns the corresponding value as a character string, or NULL if that variable is not defined in the environment. To set or clear environment variables, use the setenv and unsetenv functions, respectively.
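The sketch below shows one way these calls might be combined. Both helper names are hypothetical. The third argument to setenv controls whether an existing value is overwritten; passing zero leaves a variable alone if it is already set.

```c
#include <stdlib.h>

/* Returns the value of the environment variable NAME, or FALLBACK if
   the variable is not defined.  */
const char* getenv_or_default (const char* name, const char* fallback)
{
  const char* value = getenv (name);
  return value != NULL ? value : fallback;
}

/* Sets NAME to VALUE only if NAME is not already in the environment.
   Returns zero on success, as setenv does.  */
int set_env_if_unset (const char* name, const char* value)
{
  return setenv (name, value, 0);
}
```

To remove a variable again, call unsetenv with its name.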

Enumerating all the variables in the environment is a little trickier. To do this, you must access a special global variable named environ, which is defined in the GNU C library. This variable, of type char**, is a NULL-terminated array of pointers to character strings. Each string contains one environment variable, in the form VARIABLE=value.

The program in Listing 2.3, for instance, simply prints the entire environment by looping through the environ array.

Listing 2.3 (print-env.c) Printing the Execution Environment
#include <stdio.h> 
 
/* The ENVIRON variable contains the environment.  */ 
extern char** environ; 
 
int main () 
{
  char** var; 
  for (var = environ; *var != NULL; ++var) 
    printf ("%s\n", *var); 
  return 0; 
} 

Don't modify environ yourself; use the setenv and unsetenv functions instead.

Usually, when a new program is started, it inherits a copy of the environment of the program that invoked it (the shell program, if it was invoked interactively). So, for instance, programs that you run from the shell may examine the values of environment variables that you set in the shell.

Environment variables are commonly used to communicate configuration information to programs. Suppose, for example, that you are writing a program that connects to an Internet server to obtain some information. You could write the program so that the server name is specified on the command line. However, suppose that the server name is not something that users will change very often. You can use a special environment variable—say SERVER_NAME—to specify the server name; if that variable doesn't exist, a default value is used. Part of your program might look as shown in Listing 2.4.

Listing 2.4 (client.c) Part of a Network Client Program
#include <stdio.h> 
#include <stdlib.h> 
 
int main () 
{
  char* server_name = getenv ("SERVER_NAME"); 
  if (server_name == NULL) 
    /* The SERVER_NAME environment variable was not set. Use the 
       default.   */ 
    server_name = "server.my-company.com"; 
 
  printf ("accessing server %s\n", server_name); 
  /* Access the server here...  */ 
 
  return 0; 
} 

Suppose that this program is named client. Assuming that you haven't set the SERVER_NAME variable, the default value for the server name is used:

 
% client 
accessing server server.my-company.com 

But it's easy to specify a different server:

 
% export SERVER_NAME=backup-server.elsewhere.net 
% client 
accessing server backup-server.elsewhere.net 

2.1.7 Using Temporary Files

Sometimes a program needs to make a temporary file, to store large data for a while or to pass data to another program. On GNU/Linux systems, temporary files are stored in the /tmp directory. When using temporary files, you should be aware of the following pitfalls:

·         More than one instance of your program may be run simultaneously (by the same user or by different users). The instances should use different temporary filenames so that they don't collide.

·         The file permissions of the temporary file should be set in such a way that unauthorized users cannot alter the program's execution by modifying or replacing the temporary file.

·         Temporary filenames should be generated in a way that cannot be predicted externally; otherwise, an attacker can exploit the delay between testing whether a given name is already in use and opening a new temporary file.

GNU/Linux provides functions, mkstemp and tmpfile, that take care of these issues for you (in addition to several functions that don't). Which you use depends on whether you plan to hand the temporary file to another program, and whether you want to use UNIX I/O (open, write, and so on) or the C library's stream I/O functions (fopen, fprintf, and so on).

Using mkstemp

The mkstemp function creates a unique temporary filename from a filename template, creates the file with permissions so that only the current user can access it, and opens the file for read/write. The filename template is a character string ending with "XXXXXX" (six capital X's); mkstemp replaces the X's with characters so that the filename is unique. The return value is a file descriptor; use the write family of functions to write to the temporary file.

Temporary files created with mkstemp are not deleted automatically. It's up to you to remove the temporary file when it's no longer needed. (Programmers should be very careful to clean up temporary files; otherwise, the /tmp file system will fill up eventually, rendering the system inoperable.) If the temporary file is for internal use only and won't be handed to another program, it's a good idea to call unlink on the temporary file immediately. The unlink function removes the directory entry corresponding to a file, but because files in a file system are reference-counted, the file itself is not removed until there are no open file descriptors for that file, either. This way, your program may continue to use the temporary file, and the file goes away automatically as soon as you close the file descriptor. Because Linux closes file descriptors when a program ends, the temporary file will be removed even if your program terminates abnormally.

The pair of functions in Listing 2.5 demonstrates mkstemp. Used together, these functions make it easy to write a memory buffer to a temporary file (so that memory can be freed or reused) and then read it back later.

Listing 2.5 (temp_file.c) Using mkstemp
#include <stdlib.h> 
#include <unistd.h> 
 
/* A handle for a temporary file created with write_temp_file. In 
   this implementation, it's just a file descriptor.  */ 
typedef int temp_file_handle; 
 
/* Writes LENGTH bytes from BUFFER into a temporary file. The 
   temporary file is immediately unlinked. Returns a handle to the 
   temporary file.  */ 
 
temp_file_handle write_temp_file (char* buffer, size_t length) 
{
  /* Create the filename and file. The XXXXXX will be replaced with 
     characters that make the filename unique.  */ 
  char temp_filename[] = "/tmp/temp_file.XXXXXX"; 
  int fd = mkstemp (temp_filename); 
  /* Unlink the file immediately, so that it will be removed when the 
     file descriptor is closed.  */ 
  unlink (temp_filename); 
  /* Write the number of bytes to the file first.  */ 
  write (fd, &length, sizeof (length)); 
  /* Now write the data itself.  */ 
  write (fd, buffer, length); 
  /* Use the file descriptor as the handle for the temporary file. */ 
  return fd; 
} 
 
/* Reads the contents of a temporary file TEMP_FILE created with 
   write_temp_file. The return value is a newly allocated buffer of 
   those contents, which the caller must deallocate with free. 
   *LENGTH is set to the size of the contents, in bytes. The 
   temporary file is removed.  */ 
 
char* read_temp_file (temp_file_handle temp_file, size_t* length) 
{
  char* buffer; 
  /* The TEMP_FILE handle is a file descriptor to the temporary file.  */ 
  int fd = temp_file; 
  /* Rewind to the beginning of the file.  */ 
  lseek (fd, 0, SEEK_SET); 
  /* Read the size of the data in the temporary file.  */ 
  read (fd, length, sizeof (*length)); 
  /* Allocate a buffer and read the data.  */ 
  buffer = (char*) malloc (*length); 
  read (fd, buffer, *length); 
  /* Close the file descriptor, which will cause the temporary file to 
     go away.  */ 
  close (fd); 
  return buffer; 
} 
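Listing 2.5 covers the case in which the temporary file is private to the program. When the file must be handed to another program by name, it cannot be unlinked immediately. The following is a minimal sketch of that case, with return values checked; the function name write_named_temp_file is hypothetical, and the caller is assumed to unlink the file once the other program is done with it.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Creates a temporary file from FILENAME_TEMPLATE, which must be a
   writable string ending in "XXXXXX", and writes CONTENTS to it. On
   success, returns 0 and leaves the generated name in
   FILENAME_TEMPLATE; the caller must unlink the file later. On
   failure, returns -1 and leaves no file behind.  */
int write_named_temp_file (char* filename_template, const char* contents)
{
  size_t length = strlen (contents);
  int fd = mkstemp (filename_template);
  if (fd == -1)
    return -1;
  if (write (fd, contents, length) != (ssize_t) length) {
    close (fd);
    unlink (filename_template);
    return -1;
  }
  if (close (fd) != 0) {
    unlink (filename_template);
    return -1;
  }
  return 0;
}
```

The template must be a modifiable array, such as char name[] = "/tmp/demo.XXXXXX";, because mkstemp overwrites the X's in place.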

Using tmpfile

If you are using the C library I/O functions and don't need to pass the temporary file to another program, you can use the tmpfile function. This creates and opens a temporary file, and returns a file pointer to it. The temporary file is already unlinked, as in the previous example, so it is deleted automatically when the file pointer is closed (with fclose) or when the program terminates.
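A minimal sketch of tmpfile in use follows. The function name round_trip_through_tmpfile is hypothetical; it writes a buffer to an anonymous temporary file, rewinds, and reads the data back, roughly mirroring what the pair of functions in Listing 2.5 does with mkstemp.

```c
#include <stdio.h>

/* Writes LENGTH bytes from DATA to a temporary file created with
   tmpfile, then reads them back into RESULT, which must have room for
   LENGTH bytes. Returns the number of bytes read back, or 0 on
   failure. The temporary file disappears when it is closed.  */
size_t round_trip_through_tmpfile (const char* data, size_t length,
                                   char* result)
{
  size_t bytes_read = 0;
  FILE* temp = tmpfile ();
  if (temp == NULL)
    return 0;
  if (fwrite (data, 1, length, temp) == length) {
    /* Return to the start of the file before reading.  */
    rewind (temp);
    bytes_read = fread (result, 1, length, temp);
  }
  fclose (temp);
  return bytes_read;
}
```

No filename is ever visible to the program, so there is nothing to clean up beyond the call to fclose.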

GNU/Linux provides several other functions for generating temporary files and temporary filenames, including mktemp, tmpnam, and tempnam. Don't use these functions, though, because they suffer from the reliability and security problems already mentioned.