10.6 More Security Holes

Although this chapter will point out a few common security holes, you should by no means rely on this book to cover all possible security holes. A great many have already been discovered, and many more are out there waiting to be found. If you are trying to write secure code, there is really no substitute for having a security expert audit your code.

10.6.1 Buffer Overruns

Almost every major Internet application daemon, including the sendmail daemon, the finger daemon, the talk daemon, and others, has at one point been compromised through a buffer overrun.

If you are writing any code that will ever be run as root, you absolutely must be aware of this particular kind of security hole. If you are writing a program that performs any kind of interprocess communication, you should definitely be aware of this kind of security hole. If you are writing a program that reads files (or might read files) that are not owned by the user executing the program, you should be aware of this kind of security hole. That last criterion applies to almost every program. Fundamentally, if you're going to write GNU/Linux software, you ought to know about buffer overruns.

The idea behind a buffer overrun attack is to trick a program into executing code that it did not intend to execute. The usual mechanism for achieving this feat is to overwrite some portion of the program's process stack. The program's stack contains, among other things, the memory location to which the program will transfer control when the current function returns. Therefore, if you can put the code that you want to have executed into memory somewhere and then change the return address to point to that piece of memory, you can cause the program to execute anything. When the program returns from the function it is executing, it will jump to the new code and execute whatever is there, running with the privileges of the current process. Clearly, if the current process is running as root, this would be a disaster. If the process is running as another user, it's a disaster "only" for that user—and anybody else who depends on the contents of files owned by that user, and so forth.

If the program is running as a daemon and listening for incoming network connections, the situation is even worse. A daemon typically runs as root. If it contains buffer overrun bugs, anyone who can connect via the network to a computer running the daemon can seize control of the computer by sending a malignant sequence of data to the daemon over the network. A program that does not engage in network communications is much safer because only users who are already able to log in to the computer running the program are able to attack it.

The buggy versions of finger, talk, and sendmail all shared a common flaw. Each used a fixed-length string buffer, which implied a constant upper limit on the size of the string but then allowed network clients to provide strings that overflowed the buffer. For example, they contained code similar to this:

 
#include <stdio.h> 
 
int main () 
{
  /* Nobody in their right mind would have more than 32 characters in 
     their username. Plus, I think UNIX allows only 8-character 
     usernames. So, this should be plenty of space.  */ 
  char username[32]; 
  /* Prompt the user for the username.  */ 
  printf ("Enter your username: "); 
  /* Read a line of input.  */ 
  gets (username); 
  /* Do other things here...  */ 
 
  return 0; 
} 

The combination of the 32-character buffer with the gets function permits a buffer overrun. The gets function reads user input up until the next newline character and stores the entire result in the username buffer. The comments in the code are correct in that people generally have short usernames, so no well-meaning user is likely to type in more than 32 characters. But when you're writing secure software, you must consider what a malicious attacker might do. In this case, the attacker might deliberately type in a very long username. Local variables such as username are stored on the stack, so by exceeding the array bounds, it's possible to put arbitrary bytes onto the stack beyond the area reserved for the username variable. The username will overrun the buffer and overwrite parts of the surrounding stack, allowing the kind of attack described previously.

Fortunately, it's relatively easy to prevent buffer overruns. When reading strings, you should always use a function, such as getline, that either dynamically allocates a sufficiently large buffer or stops reading input if the buffer is full. For example, you could use this:

 
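char* username = NULL; size_t size = 0; 
ssize_t length = getline (&username, &size, stdin); 

Because username starts out as NULL and size as 0, this call automatically uses malloc to allocate a buffer big enough to hold the line and stores the buffer's address in username. The return value is the number of characters read, or -1 on failure or end of file. You have to remember to call free to deallocate the buffer, of course, to avoid leaking memory.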
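As a slightly fuller sketch of the getline approach, the function below reads one line from any stream and strips the trailing newline. The name read_username is a hypothetical helper introduced here for illustration; it is not part of any library.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Reads one line from STREAM and returns it in a buffer allocated
   with malloc, with any trailing newline removed.  Returns NULL on
   end of file or error.  The caller must free the result.
   (read_username is a hypothetical helper, not a library function.)  */
char* read_username (FILE* stream)
{
  char* line = NULL;
  size_t size = 0;
  /* getline grows the buffer as needed, so the length of the input
     is limited only by available memory.  */
  ssize_t length = getline (&line, &size, stream);
  if (length == -1) {
    /* getline may have allocated a buffer even though it failed.  */
    free (line);
    return NULL;
  }
  /* Strip the trailing newline, if there is one.  */
  if (length > 0 && line[length - 1] == '\n')
    line[length - 1] = '\0';
  return line;
}
```

A caller would write something like username = read_username (stdin) and later call free (username). No fixed upper limit on the length of the name is assumed anywhere.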

Your life will be even easier if you use C++ or another language that provides simple primitives for reading input. In C++, for example, you can simply use this:

 
string username; 
getline (cin, username); 

The username string will automatically be deallocated as well; you don't have to remember to free it. [8]

[8] Some programmers believe that C++ is a horrible and overly complex language. Their arguments about multiple inheritance and other such complications have some merit, but it is easier to write code that avoids buffer overruns and other similar problems in C++ than in C.

Of course, buffer overruns can occur with any statically sized array, not just with strings. If you want to write secure code, you should never write into a data structure, on the stack or elsewhere, without verifying that you're not going to write beyond its region of memory.
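The same discipline for a non-string array might look like the following minimal sketch. The structure sample_buffer, the constant MAX_SAMPLES, and the function sample_buffer_append are all hypothetical names chosen for this illustration.

```c
#include <stddef.h>

/* Hypothetical fixed capacity for this illustration.  */
#define MAX_SAMPLES 16

/* A statically sized array together with the number of slots used.  */
struct sample_buffer {
  int samples[MAX_SAMPLES];
  size_t count;
};

/* Appends VALUE to BUFFER.  Returns 0 on success, or -1 if the
   buffer is already full, in which case nothing is written.  */
int sample_buffer_append (struct sample_buffer* buffer, int value)
{
  /* Verify the bound before writing, never after.  */
  if (buffer->count >= MAX_SAMPLES)
    return -1;
  buffer->samples[buffer->count++] = value;
  return 0;
}
```

The point of the design is that every write goes through one function that checks the bound, so the caller cannot forget the check.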

10.6.2 Race Conditions in /tmp

Another very common problem involves the creation of files with predictable names, typically in the /tmp directory. Suppose that your program prog, running as root, always creates a temporary file called /tmp/prog and writes some vital information there. A malicious user can create a symbolic link from /tmp/prog to any other file on the system. When your program goes to create the file, the open system call will succeed. However, the data that you write will not go to /tmp/prog; instead, it will be written to some arbitrary file of the attacker's choosing.

This kind of attack is said to exploit a race condition. There is implicitly a race between you and the attacker. Whoever manages to create the file first wins. This attack is often used to destroy important parts of the file system. By creating the appropriate links, the attacker can trick a program running as root that is supposed to write a temporary file into overwriting an important system file instead. For example, by making a symbolic link to /etc/passwd, the attacker can wipe out the system's password database. There are also ways in which a malicious user can obtain root access using this technique.

One attempt at avoiding this attack is to use a randomized name for the file. For example, you could read from /dev/random to get some bits to use in the name of the file. This certainly makes it harder for a malicious user to guess the filename, but it doesn't make it impossible. The attacker might just create a large number of symbolic links, using many potential names. Even if she has to try 10,000 times before winning the race condition, that one time could be disastrous.

Another approach is to use the O_EXCL flag when calling open. This flag causes open to fail if the file already exists. Unfortunately, if you're using the Network File System (NFS), or if anyone who's using your program might ever be using NFS, that's not a sufficiently robust approach because O_EXCL is not reliable when NFS is in use. You can't ever really know for sure whether your code will be used on a system that uses NFS, so if you're highly paranoid, don't rely on using O_EXCL.
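On a local file system, the O_EXCL approach can be sketched as follows. The function name create_exclusive is hypothetical. When O_CREAT and O_EXCL are used together, open fails with errno set to EEXIST if the name already exists, even if it is a symbolic link that points nowhere.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Creates PATH and opens it for reading and writing, but only if
   nothing by that name exists yet.  Returns the file descriptor, or
   -1 with errno set; errno is EEXIST if the name was already taken,
   including by a symbolic link planted by someone else.
   (create_exclusive is a hypothetical helper name.)  */
int create_exclusive (const char* path)
{
  return open (path,
               O_RDWR | O_CREAT | O_EXCL,
               /* Only the owner may read or write the file.  */
               S_IRUSR | S_IWUSR);
}
```

A caller that gets -1 with errno equal to EEXIST should treat the name as compromised and give up or pick another name, not retry without O_EXCL.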

In Chapter 2, "Writing Good GNU/Linux Software," Section 2.1.7, "Using Temporary Files," we showed how to use mkstemp to create temporary files. Unfortunately, what mkstemp does on Linux is open the file with O_EXCL after trying to pick a name that is hard to guess. In other words, using mkstemp is still insecure if /tmp is mounted over NFS. [9] So, using mkstemp is better than nothing, but it's not fully secure.

[9] Obviously, if you're also a system administrator, you shouldn't mount /tmp over NFS.

One approach that works is to call lstat on the newly created file (lstat is discussed in Section B.2, "stat"). The lstat function is like stat, except that if the file referred to is a symbolic link, lstat tells you about the link, not the file to which it refers. If lstat tells you that your new file is an ordinary file, not a symbolic link, and that it is owned by you, then you should be okay.

Listing 10.5 presents a function that tries to securely open a file in /tmp. The authors of this book have not had it audited professionally, nor are we professional security experts, so there's a good chance that it has a weakness, too. We do not recommend that you use this code without getting an audit, but it should at least convince you that writing secure code is tricky. To help dissuade you, we've deliberately made the interface difficult to use in real programs. Error checking is an important part of writing secure software, so we've included error-checking logic in this example.

Listing 10.5 (temp-file.c) Create a Temporary File
#include <fcntl.h> 
#include <stdlib.h> 
#include <sys/stat.h> 
#include <unistd.h> 
 
/* Returns the file descriptor for a newly created temporary file. 
   The temporary file will be readable and writable by the effective 
   user ID of the current process but will not be readable or 
   writable by anybody else. 
 
   Returns -1 if the temporary file could not be created.  */ 
 
int secure_temp_file ( ) 
{
  /* This file descriptor points to /dev/random and allows us to get 
     a good source of random bits.  */ 
  static int random_fd = -1; 
  /* A random integer.  */ 
  unsigned int random; 
  /* A buffer, used to convert from a numeric to a string 
     representation of random. This buffer has fixed size, meaning 
     that we potentially have a buffer overrun bug if the integers on 
     this machine have a *lot* of bits.  */ 
  char filename[128]; 
  /* The file descriptor for the new temporary file.  */ 
  int fd; 
  /* Information about the newly created file.  */ 
  struct stat stat_buf; 
 
  /* If we haven't already opened /dev/random, do so now.  (This is 
     not threadsafe.)  */ 
  if (random_fd == -1) {
    /* Open /dev/random. Note that we're assuming that /dev/random 
       really is a source of random bits, not a file full of zeros 
       placed there by an attacker. */ 
    random_fd = open ("/dev/random", O_RDONLY); 
    /* If we couldn't open /dev/random, give up. */ 
    if (random_fd == -1) 
      return -1; 
  } 
 
  /* Read an integer from /dev/random. */ 
  if (read (random_fd, &random, sizeof (random)) != 
      sizeof (random)) 
    return -1; 
  /* Create a filename out of the random number. */ 
  sprintf (filename, "/tmp/%u", random); 
  /* Try to open the file. */ 
  fd = open (filename, 
             /* Use O_EXCL, even though it doesn't work under NFS. */ 
             O_RDWR | O_CREAT | O_EXCL, 
             /* Make sure nobody else can read or write the file. */ 
             S_IRUSR | S_IWUSR); 
  if (fd == -1) 
    return -1; 
 
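  /* Call lstat on the file, to make sure that it is not a symbolic 
     link. On any failure below, close the descriptor so that it is 
     not leaked. */ 
  if (lstat (filename, &stat_buf) == -1) {
    close (fd); 
    return -1; 
  } 
  /* If the file is not a regular file, someone has tried to trick 
     us. */ 
  if (!S_ISREG (stat_buf.st_mode)) {
    close (fd); 
    return -1; 
  } 
  /* If we don't own the file, someone else might remove it, read it, 
     or change it while we're looking at it. */ 
  if (stat_buf.st_uid != geteuid () || stat_buf.st_gid != getegid ()) {
    close (fd); 
    return -1; 
  } 
  /* If there are any more permission bits set on the file, 
     something's fishy. Mask out the file-type bits (S_IFMT) first; 
     st_mode holds them alongside the permission bits, so without 
     the mask this test would reject every regular file. */ 
  if ((stat_buf.st_mode & ~S_IFMT & ~(S_IRUSR | S_IWUSR)) != 0) {
    close (fd); 
    return -1; 
  } 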
 
  return fd; 
} 

This function calls open to create the file and then calls lstat a few lines later to make sure that the file is not a symbolic link. If you're thinking carefully, you'll realize that there seems to be a race condition at this point. In particular, an attacker could remove the file and replace it with a symbolic link between the time we call open and the time we call lstat. That won't harm us directly because we already have an open file descriptor to the newly created file, but it will cause us to indicate an error to our caller. This attack doesn't create any direct harm, but it does make it impossible for our program to get its work done. Such an attack is called a denial-of-service (DoS) attack.

Fortunately, the sticky bit comes to the rescue. Because the sticky bit is set on /tmp, other ordinary users cannot remove or rename files that you own in that directory. Of course, root can still remove files from /tmp, but if the attacker has root privilege, there's nothing you can do to protect your program.

If you choose to assume competent system administration, then /tmp will not be mounted via NFS. And if the system administrator was foolish enough to mount /tmp over NFS, then there's a good chance that the sticky bit isn't set, either. So, for most practical purposes, we think it's safe to use mkstemp. But you should be aware of these issues, and you should definitely not rely on O_EXCL to work correctly if the directory in use is not /tmp, nor should you rely on the sticky bit being set anywhere else.
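For reference, a minimal sketch of the mkstemp approach follows. The function name open_scratch_file and the template /tmp/scratch.XXXXXX are hypothetical choices for this illustration. Unlinking the file immediately is optional, but it means that the name disappears from /tmp while the descriptor stays usable.

```c
#include <stdlib.h>
#include <unistd.h>

/* Creates a temporary file in /tmp with mkstemp and returns a file
   descriptor open for reading and writing, or -1 on failure.
   (open_scratch_file and the template are hypothetical.)  */
int open_scratch_file (void)
{
  /* mkstemp replaces the six X characters with a name that is hard
     to guess; the array must be writable, not a string literal.  */
  char filename[] = "/tmp/scratch.XXXXXX";
  int fd = mkstemp (filename);

  if (fd == -1)
    return -1;
  /* Remove the name right away.  The file itself lives on until the
     descriptor is closed, but nobody can open it by name anymore.  */
  unlink (filename);
  return fd;
}
```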

10.6.3 Using system or popen

The third common security hole that every programmer should bear in mind involves using the shell to execute other programs. As a toy example, let's consider a dictionary server. This program is designed to accept connections via the Internet. Each client sends a word, and the server tells it whether that is a valid English word. Because every GNU/Linux system comes with a list of about 45,000 English words in /usr/dict/words, an easy way to build this server is to invoke the grep program, like this:

 
% grep -x word /usr/dict/words 

Here, word is the word that the user is curious about. The exit code from grep will tell you whether that word appears in /usr/dict/words. [10]

[10] If you don't know about grep, you should look at the manual pages. It's an incredibly useful program.

Listing 10.6 shows how you might try to code the part of the server that invokes grep:

Listing 10.6 (grep-dictionary.c) Search for a Word in the Dictionary
#include <stdio.h> 
#include <stdlib.h> 
#include <string.h> 
 
/* Returns a nonzero value if and only if the WORD appears in 
   /usr/dict/words.                                              */ 
int grep_for_word (const char* word) 
{
  size_t length; 
  char* buffer; 
  int exit_code; 
  /* Build up the string 'grep -x WORD /usr/dict/words'. Allocate the 
     string dynamically to avoid buffer overruns.  */ 
  length = 
    strlen ("grep -x ") + strlen (word) + strlen (" /usr/dict/words") + 1; 
  buffer = (char*) malloc (length); 
  /* If the allocation failed, report that the word was not found.  */ 
  if (buffer == NULL) 
    return 0; 
  sprintf (buffer, "grep -x %s /usr/dict/words", word); 
 
  /* Run the command.  */ 
  exit_code = system (buffer); 
  /* Free the buffer.  */ 
  free (buffer); 
  /* If grep returned 0, then the word was present in the 
     dictionary.  */ 
  return exit_code == 0; 
} 

Note that by calculating the number of characters we need and then allocating the buffer dynamically, we're sure to be safe from buffer overruns.

Unfortunately, the use of the system function (described in Chapter 3, "Processes," Section 3.2.4, "Using system") is unsafe. This function invokes the standard system shell to run the command and then returns the exit value. But what happens if a malicious hacker sends a "word" that is actually the following line or a similar string?

 
foo /dev/null; rm -rf / 

In that case, the server will execute this command:

 
grep -x foo /dev/null; rm -rf / /usr/dict/words 

Now the problem is obvious. The user has turned one command, ostensibly the invocation of grep, into two commands because the shell treats a semicolon as a command separator. The first command is still a harmless invocation of grep, but the second removes all files on the entire system! Even if the server is not running as root, all the files that can be removed by the user running the server will be removed. The same problem can arise with popen (described in Section 5.4.4, "popen and pclose"), which creates a pipe between the parent and child process but still uses the shell to run the command.

There are two ways to avoid these problems. One is to use the exec family of functions instead of system or popen. That solution avoids the problem because characters that the shell treats specially (such as the semicolon in the previous command) are not treated specially when they appear in the argument list to an exec call. Of course, you give up the convenience of system and popen.
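A sketch of the exec-based alternative, using fork, execlp, and waitpid, might look like this. The function name word_in_dictionary and its dictionary_path parameter are hypothetical. The -q, -F, and -e options are additions beyond the original command: they suppress output, treat the word as a fixed string rather than a regular expression, and keep a word that begins with a hyphen from being read as an option.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if WORD appears as a whole line in the file named by
   DICTIONARY_PATH, 0 if it does not, and -1 if grep could not be
   run or reported an error.  No shell is involved, so characters
   such as ';' in WORD have no special meaning.
   (word_in_dictionary is a hypothetical helper name.)  */
int word_in_dictionary (const char* word, const char* dictionary_path)
{
  int status;
  pid_t child = fork ();

  if (child == -1)
    return -1;
  if (child == 0) {
    /* In the child: WORD is passed as a single argument to grep.  */
    execlp ("grep", "grep", "-q", "-x", "-F", "-e", word,
            dictionary_path, (char*) NULL);
    /* execlp returns only if it failed.  */
    _exit (127);
  }
  /* In the parent: wait for grep and examine its exit code.  */
  if (waitpid (child, &status, 0) == -1 || !WIFEXITED (status))
    return -1;
  switch (WEXITSTATUS (status)) {
  case 0:
    return 1;   /* grep found a match.  */
  case 1:
    return 0;   /* grep found no match.  */
  default:
    return -1;  /* grep failed, or could not be executed.  */
  }
}
```

With this version, the string foo /dev/null; rm -rf / is simply a pattern that fails to match any line in the dictionary.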

The other alternative is to validate the string to make sure that it is benign. In the dictionary server example, you would make sure that the word provided contains only alphabetic characters, using the isalpha function. If it doesn't contain any other characters, there's no way to trick the shell into executing a second command. Don't implement the check by looking for dangerous and unexpected characters; it's always safer to explicitly check for the characters that you know are safe rather than try to anticipate all the characters that might cause trouble.
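A minimal sketch of such a check follows. The function name is_alphabetic_word is hypothetical, and rejecting the empty string is an extra assumption made here. Note that isalpha depends on the current locale; in the default "C" locale it accepts only the letters A through Z and a through z.

```c
#include <ctype.h>

/* Returns 1 if WORD is nonempty and consists only of alphabetic
   characters, 0 otherwise.
   (is_alphabetic_word is a hypothetical helper name.)  */
int is_alphabetic_word (const char* word)
{
  const char* p;

  /* Reject the empty string; it is not a word.  */
  if (*word == '\0')
    return 0;
  for (p = word; *p != '\0'; ++p)
    /* The cast is required: passing a negative char value to
       isalpha is undefined behavior.  */
    if (!isalpha ((unsigned char) *p))
      return 0;
  return 1;
}
```

The server would call this before building the command and refuse any request for which it returns 0.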