2.3 Writing and Using Libraries

Virtually all programs are linked against one or more libraries. Any program that uses a C function (such as printf or malloc) will be linked against the C runtime library. If your program has a graphical user interface (GUI), it will be linked against windowing libraries. If your program uses a database, the database provider will give you libraries that you can use to access the database conveniently.

In each of these cases, you must decide whether to link the library statically or dynamically. If you choose to link statically, your programs will be bigger and harder to upgrade, but probably easier to deploy. If you link dynamically, your programs will be smaller, easier to upgrade, but harder to deploy. This section explains how to link both statically and dynamically, examines the trade-offs in more detail, and gives some "rules of thumb" for deciding which kind of linking is better for you.

2.3.1 Archives

An archive (or static library) is simply a collection of object files stored as a single file. (An archive is roughly the equivalent of a Windows .LIB file.) When you provide an archive to the linker, the linker searches the archive for the object files it needs, extracts them, and links them into your program much as if you had provided those object files directly.

You can create an archive using the ar command. Archive files traditionally use a .a extension rather than the .o extension used by ordinary object files. Here's how you would combine test1.o and test2.o into a single libtest.a archive:

 
% ar cr libtest.a test1.o test2.o 

The cr flags tell ar to create the archive. [3] Now you can link with this archive using the -ltest option with gcc or g++, as described in Section 1.2.5, "Linking Object Files," in Chapter 1, "Getting Started."

[3] You can use other flags to remove a file from an archive or to perform other operations on the archive. These operations are rarely used but are documented on the ar man page.

When the linker encounters an archive on the command line, it searches the archive for definitions of all symbols (functions or variables) that are referenced by the object files it has already processed but that are not yet defined. The object files that define those symbols are extracted from the archive and included in the final executable. Because the linker searches the archive at the point where it is encountered on the command line, it usually makes sense to put archives at the end of the command line. For example, suppose that test.c contains the code in Listing 2.7 and app.c contains the code in Listing 2.8.

Listing 2.7 (test.c) Library Contents
int f () 
{
  return 3; 
} 
Listing 2.8 (app.c) A Program That Uses Library Functions
int f ();  /* Defined in test.c.  */

int main () 
{
  return f (); 
} 

Now suppose that test.o is combined with some other object files to produce the libtest.a archive. The following command line will not work:

 
% gcc -o app -L. -ltest app.o 
app.o: In function 'main': 
app.o(.text+0x4): undefined reference to 'f' 
collect2: ld returned 1 exit status 

The error message indicates that even though libtest.a contains a definition of f, the linker did not find it. That's because libtest.a was searched when it was first encountered, and at that point the linker hadn't seen any references to f.

On the other hand, if we use this line, no error messages are issued:

 
% gcc -o app app.o -L. -ltest 

The reason is that the reference to f in app.o causes the linker to include the test.o object file from the libtest.a archive.
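
The left-to-right search described above can be modeled in a few lines of C. This is only a toy sketch, not how the real linker is implemented: the link_input type and the count_unresolved function are hypothetical names invented for the illustration, each input defines and references at most one symbol, and an archive is treated as holding a single object file.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 16

/* One command-line input in this toy model (hypothetical type).  */
typedef struct {
  bool is_archive;         /* true for libX.a, false for a plain .o file */
  const char *defines;     /* symbol this input defines, or NULL */
  const char *references;  /* symbol this input references, or NULL */
} link_input;

static bool contains (const char *set[], size_t count, const char *name)
{
  for (size_t i = 0; i < count; ++i)
    if (strcmp (set[i], name) == 0)
      return true;
  return false;
}

/* Process the inputs from left to right and return the number of
   symbols that are still undefined at the end.  */
size_t count_unresolved (const link_input inputs[], size_t n_inputs)
{
  const char *defined[MAX_SYMBOLS];
  const char *referenced[MAX_SYMBOLS];
  size_t n_defined = 0, n_referenced = 0, unresolved = 0;

  for (size_t i = 0; i < n_inputs; ++i) {
    const link_input *in = &inputs[i];

    /* An archive member is extracted only if it defines a symbol that
       is referenced but not yet defined at this point in the scan.  */
    if (in->is_archive) {
      bool wanted = in->defines != NULL
        && contains (referenced, n_referenced, in->defines)
        && !contains (defined, n_defined, in->defines);
      if (!wanted)
        continue;
    }

    if (in->defines != NULL && n_defined < MAX_SYMBOLS
        && !contains (defined, n_defined, in->defines))
      defined[n_defined++] = in->defines;
    if (in->references != NULL && n_referenced < MAX_SYMBOLS
        && !contains (referenced, n_referenced, in->references))
      referenced[n_referenced++] = in->references;
  }

  for (size_t i = 0; i < n_referenced; ++i)
    if (!contains (defined, n_defined, referenced[i]))
      ++unresolved;
  return unresolved;
}
```

In this model, listing the archive that defines f before the object file that references f leaves one unresolved symbol, while the reverse order leaves none, which mirrors the two gcc command lines shown earlier.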

2.3.2 Shared Libraries

A shared library (also known as a shared object, or as a dynamically linked library) is similar to an archive in that it is a grouping of object files. However, there are many important differences. The most fundamental difference is that when a shared library is linked into a program, the final executable does not actually contain the code that is present in the shared library. Instead, the executable merely contains a reference to the shared library. If several programs on the system are linked against the same shared library, they will all reference the library, but none of them will actually contain it. Thus, the library is "shared" among all the programs that link with it.

A second important difference is that a shared library is not merely a collection of object files, out of which the linker chooses those that are needed to satisfy undefined references. Instead, the object files that compose the shared library are combined into a single object file so that a program that links against a shared library always includes all of the code in the library, rather than just those portions that are needed.

To create a shared library, you must compile the objects that will make up the library using the -fPIC option to the compiler, like this:

 
% gcc -c -fPIC test1.c 

The -fPIC option tells the compiler that you are going to be using test1.o as part of a shared object.

Position-Independent Code (PIC)

PIC stands for position-independent code. The functions in a shared library may be loaded at different addresses in different programs, so the code in the shared object must not depend on the address (or position) at which it is loaded. This consideration has no impact on you, as the programmer, except that you must remember to use the -fPIC flag when compiling code that will be used in a shared library.

Then you combine the object files into a shared library, like this:

 
% gcc -shared -fPIC -o libtest.so test1.o test2.o 

The -shared option tells the linker to produce a shared library rather than an ordinary executable. Shared libraries use the extension .so, which stands for shared object. Like static archives, the name always begins with lib to indicate that the file is a library.

Linking with a shared library is just like linking with a static archive. For example, the following line will link with libtest.so if it is in the current directory, or one of the standard library search directories on the system:

 
% gcc -o app app.o -L. -ltest 

Suppose that both libtest.a and libtest.so are available. Then the linker must choose one of the libraries and not the other. The linker searches each directory (first those specified with -L options, and then those in the standard directories). When the linker finds a directory that contains either libtest.a or libtest.so, it stops searching directories. If only one of the two variants is present in the directory, the linker chooses that variant. Otherwise, the linker chooses the shared library version, unless you explicitly instruct it otherwise. You can use the -static option to demand static archives. For example, the following line will use the libtest.a archive, even if the libtest.so shared library is also available:

 
% gcc -static -o app app.o -L. -ltest 

The ldd command displays the shared libraries that are linked into an executable. These libraries need to be available when the executable is run. Note that ldd will list an additional library called ld-linux.so, which is a part of GNU/Linux's dynamic linking mechanism.

Using LD_LIBRARY_PATH

When you link a program with a shared library, the linker does not put the full path to the shared library in the resulting executable. Instead, it places only the name of the shared library. When the program is actually run, the system searches for the shared library and loads it. The system searches only /lib and /usr/lib, by default. If a shared library that is linked into your program is installed outside those directories, it will not be found, and the system will refuse to run the program.

One solution to this problem is to use the -Wl,-rpath option when linking the program. Suppose that you use this:

 
% gcc -o app app.o -L. -ltest -Wl,-rpath,/usr/local/lib 

Then, when app is run, the system will search /usr/local/lib for any required shared libraries.

Another solution to this problem is to set the LD_LIBRARY_PATH environment variable when running the program. Like the PATH environment variable, LD_LIBRARY_PATH is a colon-separated list of directories. For example, if LD_LIBRARY_PATH is /usr/local/lib:/opt/lib, then /usr/local/lib and /opt/lib will be searched before the standard /lib and /usr/lib directories. You should also note that if you have LD_LIBRARY_PATH, the linker will search the directories given there in addition to the directories given with the -L option when it is building an executable. [4]

[4] You might see a reference to LD_RUN_PATH in some online documentation. Don't believe what you read; this variable does not actually do anything under GNU/Linux.

2.3.3 Standard Libraries

Even if you didn't specify any libraries when you linked your program, it almost certainly uses a shared library. That's because GCC automatically links in the standard C library, libc, for you. The standard C library math functions are not included in libc; instead, they're in a separate library, libm, which you need to specify explicitly. For example, to compile and link a program compute.c that uses trigonometric functions such as sin and cos, you must use this command:

 
% gcc -o compute compute.c -lm 

If you write a C++ program and link it using the c++ or g++ commands, you'll also get the standard C++ library, libstdc++, automatically.

2.3.4 Library Dependencies

One library will often depend on another library. For example, many GNU/Linux systems include libtiff, a library that contains functions for reading and writing image files in the TIFF format. This library, in turn, uses the libraries libjpeg (JPEG image routines) and libz (compression routines).

Listing 2.9 shows a very small program that uses libtiff to open a TIFF image file.

Listing 2.9 (tifftest.c) Using libtiff
#include <stdio.h> 
#include <tiffio.h> 
int main (int argc, char** argv) 
{
  TIFF* tiff; 
  tiff = TIFFOpen (argv[1], "r"); 
  TIFFClose (tiff); 
  return 0; 
} 

Save this source file as tifftest.c. To compile this program and link with libtiff, specify -ltiff on your link line:

 
% gcc -o tifftest tifftest.c -ltiff 

By default, this will pick up the shared-library version of libtiff, found at /usr/lib/libtiff.so. Because libtiff uses libjpeg and libz, the shared-library versions of these two are also drawn in (a shared library can point to other shared libraries that it depends on). To verify this, use the ldd command:

 
% ldd tifftest 
        libtiff.so.3 => /usr/lib/libtiff.so.3 (0x4001d000) 
        libc.so.6 => /lib/libc.so.6 (0x40060000) 
        libjpeg.so.62 => /usr/lib/libjpeg.so.62 (0x40155000) 
        libz.so.1 => /usr/lib/libz.so.1 (0x40174000) 
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000) 

Static libraries, on the other hand, cannot point to other libraries. If you decide to link with the static version of libtiff by specifying -static on your command line, you will encounter unresolved symbols:

 
% gcc -static -o tifftest tifftest.c -ltiff 
/usr/bin/../lib/libtiff.a(tif_jpeg.o): In function 'TIFFjpeg_error_exit': 
tif_jpeg.o(.text+0x2a): undefined reference to 'jpeg_abort' 
/usr/bin/../lib/libtiff.a(tif_jpeg.o): In function 'TIFFjpeg_create_compress': 
tif_jpeg.o(.text+0x8d): undefined reference to 'jpeg_std_error' 
tif_jpeg.o(.text+0xcf): undefined reference to 'jpeg_CreateCompress' 
... 

To link this program statically, you must specify the other two libraries yourself:

 
% gcc -static -o tifftest tifftest.c -ltiff -ljpeg -lz 

Occasionally, two libraries will be mutually dependent. In other words, the first archive will reference symbols defined in the second archive, and vice versa. This situation generally arises out of poor design, but it does occasionally arise. In this case, you can provide a single library multiple times on the command line. The linker will re-search the library each time it occurs. For example, this line will cause libfoo.a to be searched multiple times:

 
% gcc -o app app.o -lfoo -lbar -lfoo 

So, even if libfoo.a references symbols in libbar.a, and vice versa, the program will link successfully.

2.3.5 Pros and Cons

Now that you know all about static archives and shared libraries, you're probably wondering which to use. There are a few major considerations to keep in mind.

One major advantage of a shared library is that it saves space on the system where the program is installed. If you are installing 10 programs, and they all make use of the same shared library, then you save a lot of space by using a shared library. If you used a static archive instead, the archive is included in all 10 programs. So, using shared libraries saves disk space. It also reduces download times if your program is being downloaded from the Web.

A related advantage to shared libraries is that users can upgrade the libraries without upgrading all the programs that depend on them. For example, suppose that you produce a shared library that manages HTTP connections. Many programs might depend on this library. If you find a bug in this library, you can upgrade the library. Instantly, all the programs that depend on the library will be fixed; you don't have to relink all the programs the way you do with a static archive.

Those advantages might make you think that you should always use shared libraries. However, substantial reasons exist to use static archives instead. The fact that an upgrade to a shared library affects all programs that depend on it can be a disadvantage. For example, if you're developing mission-critical software, you might rather link to a static archive so that an upgrade to shared libraries on the system won't affect your program. (Otherwise, users might upgrade the shared library, thereby breaking your program, and then call your customer support line, blaming you!)

If you're not going to be able to install your libraries in /lib or /usr/lib, you should definitely think twice about using a shared library. (You won't be able to install your libraries in those directories if you expect users to install your software without administrator privileges.) In particular, the -Wl,-rpath trick won't work if you don't know where the libraries are going to end up. And asking your users to set LD_LIBRARY_PATH means an extra step for them. Because each user has to do this individually, this is a substantial additional burden.

You'll have to weigh these advantages and disadvantages for every program you distribute.

2.3.6 Dynamic Loading and Unloading

Sometimes you might want to load some code at run time without explicitly linking in that code. For example, consider an application that supports "plug-in" modules, such as a Web browser. The browser allows third-party developers to create plug-ins to provide additional functionality. The third-party developers create shared libraries and place them in a known location. The Web browser then automatically loads the code in these libraries.

This functionality is available under Linux by using the dlopen function. You could open a shared library named libtest.so by calling dlopen like this:

 
dlopen ("libtest.so", RTLD_LAZY) 

(The second parameter is a flag that indicates how to bind symbols in the shared library. You can consult the online man pages for dlopen if you want more information, but RTLD_LAZY is usually the setting that you want.) To use dynamic loading functions, include the <dlfcn.h> header file and link with the -ldl option to pick up the libdl library.

The return value from this function is a void * that is used as a handle for the shared library. You can pass this value to the dlsym function to obtain the address of a function that has been loaded with the shared library. For example, if libtest.so defines a function named my_function, you could call it like this:

 
void* handle = dlopen ("libtest.so", RTLD_LAZY); 
void (*test)() = dlsym (handle, "my_function"); 
(*test)(); 
dlclose (handle); 

The dlsym function can also be used to obtain a pointer to a global variable in the shared library.

Both dlopen and dlsym return NULL if they do not succeed. In that event, you can call dlerror (with no parameters) to obtain a human-readable error message describing the problem.
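
A version of the earlier fragment with these checks added might look like the following sketch. The helper name call_double_function and its signature are assumptions made for this illustration; it expects the looked-up symbol to be a function that takes and returns a double. The assignment through a void ** cast is the idiom commonly used to store the result of dlsym in a function pointer.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Load LIBRARY, look up SYMBOL, and call it as a function of type
   double (double).  Store the value in *RESULT and return 0 on
   success; print the dlerror message and return -1 on failure.
   The name and signature are hypothetical, chosen for this sketch.  */
int call_double_function (const char *library, const char *symbol,
                          double argument, double *result)
{
  double (*function) (double);
  void *handle = dlopen (library, RTLD_LAZY);

  if (handle == NULL) {
    fprintf (stderr, "dlopen failed: %s\n", dlerror ());
    return -1;
  }

  /* Store the void * returned by dlsym in a function pointer.  */
  *(void **) (&function) = dlsym (handle, symbol);
  if (function == NULL) {
    fprintf (stderr, "dlsym failed: %s\n", dlerror ());
    dlclose (handle);
    return -1;
  }

  *result = function (argument);
  dlclose (handle);
  return 0;
}
```

For instance, on a system where the math library is installed under the name libm.so.6 (an assumption about the target system), calling call_double_function ("libm.so.6", "cos", 0.0, &value) should store the cosine of zero in value, and passing the name of a library that does not exist should return -1 after printing the message from dlerror.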

The dlclose function unloads the shared library. Technically, dlopen actually loads the library only if it is not already loaded. If the library has already been loaded, dlopen simply increments the library reference count. Similarly, dlclose decrements the reference count and then unloads the library only if the reference count has reached zero.
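
The reference counting can be observed with a small sketch like the one below. The function name opens_share_one_handle is hypothetical, and the sketch relies on the documented glibc behavior that opening the same library again returns the same handle; other implementations may differ.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Open LIBRARY twice.  Return 1 if both calls produced the same
   handle, 0 if they differed, and -1 if the library could not be
   loaded.  The function name is hypothetical.  */
int opens_share_one_handle (const char *library)
{
  void *first = dlopen (library, RTLD_LAZY);
  void *second;
  int same;

  if (first == NULL)
    return -1;

  /* The library is already loaded, so this call only increments
     its reference count.  */
  second = dlopen (library, RTLD_LAZY);
  if (second == NULL) {
    dlclose (first);
    return -1;
  }

  same = (first == second);

  /* Each successful dlopen needs a matching dlclose; the library
     can be unloaded only after the last reference is dropped.  */
  dlclose (second);
  dlclose (first);
  return same;
}
```

Because every successful dlopen is paired with a dlclose, the reference count ends up where it started, regardless of whether some other part of the program also has the library loaded.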

If you're writing the code in your shared library in C++, you will probably want to declare those functions and variables that you plan to access elsewhere with the extern "C" linkage specifier. For instance, if the C++ function foo is in a shared library and you want to access it with dlsym, you should declare it like this:

 
extern "C" void foo (); 

This prevents the C++ compiler from mangling the function name, which would change the function's name from foo to a different, funny-looking name that encodes extra information about the function. A C compiler will not mangle names; it will use whichever name you give to your function or variable.
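
When the same declaration has to be read by both C and C++ compilers, a common idiom is to wrap it in a conditional extern "C" block, as in the sketch below. The definition of foo and the foo_call_count counter are hypothetical and are included only so that the example compiles on its own.

```c
/* Header-style declaration: a C++ compiler sees the extern "C"
   wrapper, while a C compiler skips it because __cplusplus is
   not defined.  */
#ifdef __cplusplus
extern "C" {
#endif

void foo (void);

#ifdef __cplusplus
}
#endif

/* Hypothetical definition so that the sketch is self-contained.  */
static int foo_call_count = 0;

void foo (void)
{
  ++foo_call_count;
}
```

With this arrangement, the symbol is named foo in the object file whether the source is compiled as C or as C++, so dlsym (handle, "foo") can find it in either case.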