11.1 Overview

The example program is part of a system for monitoring a running GNU/Linux system. It includes these features:

·         The program incorporates a minimal Web server. Local or remote clients access system information by requesting Web pages from the server via HTTP.

·         The program does not serve static HTML pages. Instead, the pages are generated on the fly by modules, each of which provides a page summarizing one aspect of the system's state.

·         Modules are not linked statically into the server executable. Instead, they are loaded dynamically from shared libraries. Modules can be added, removed, or replaced while the server is running.

·         The server services each connection in a child process. This enables the server to remain responsive even when individual requests take a while to complete, and it shields the server from failures in modules.

·         The server does not require superuser privilege to run, as long as it does not listen on a privileged port (a port numbered below 1024). However, running without superuser privilege limits the system information that the server can collect.

We provide four sample modules that demonstrate how modules might be written. They further illustrate some of the techniques for gathering system information presented previously in this book. The time module demonstrates using the gettimeofday system call. The issue module demonstrates low-level I/O and the sendfile system call. The diskfree module demonstrates the use of fork, exec, and dup2 by running a command in a child process. The processes module demonstrates the use of the /proc file system and various system calls.
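As an illustration of the fork, exec, and dup2 technique mentioned for the diskfree module, here is a minimal sketch. The function name run_command is hypothetical and this is not the module's actual code. It assumes that the caller supplies a descriptor (in the server, this would be the client connection) to receive the command's output.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program named by argv[0], with standard output and standard
   error redirected to output_fd.  Returns the program's exit code, or
   -1 if the child could not be created or did not exit normally.  */
int run_command (int output_fd, char *const argv[])
{
  pid_t child_pid = fork ();
  if (child_pid == -1)
    return -1;
  if (child_pid == 0) {
    /* Child: make output_fd the standard output and standard error.  */
    if (dup2 (output_fd, STDOUT_FILENO) == -1
        || dup2 (output_fd, STDERR_FILENO) == -1)
      _exit (EXIT_FAILURE);
    execvp (argv[0], argv);
    /* execvp returns only if the program could not be started.  */
    _exit (127);
  }
  /* Parent: wait for the command to finish.  */
  int status;
  if (waitpid (child_pid, &status, 0) == -1)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

For example, passing the argument list { "df", "-h", NULL } would send the output of df to output_fd. Because the parent waits before reading anything, output_fd should not be a pipe that the same process must drain; a large output would fill the pipe and block the command.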

11.1.1 Caveats

This program has many of the features you'd expect in an application program, such as command-line parsing and error checking. At the same time, we've made some simplifications to improve readability and to focus on the GNU/Linux-specific topics discussed in this book. Bear in mind these caveats as you examine the code.

·         We don't attempt to provide a full implementation of HTTP. Instead, we implement just enough for the server to interact with Web clients. A real-world program either would provide a more complete HTTP implementation or would interface with one of the various excellent Web server implementations [1] available instead of providing HTTP services directly.

[1] The most popular open source Web server for GNU/Linux is the Apache server, available from http://www.apache.org.

·         Similarly, we don't aim for full compliance with HTML specifications (see http://www.w3.org/MarkUp/). We generate simple HTML output that can be handled by popular Web browsers.

·         The server is not tuned for high performance or minimum resource usage. In particular, we intentionally omit some of the network configuration code that you would expect in a Web server. This topic is outside the scope of this book. See one of the many excellent references on network application development, such as UNIX Network Programming, Volume 1: Networking APIs—Sockets and XTI, by W. Richard Stevens (Prentice Hall, 1997), for more information.

·         We make no attempt to regulate the resources (number of processes, memory use, and so on) consumed by the server or its modules. Many multiprocess Web server implementations service connections using a fixed pool of processes rather than creating a new child process for each connection.

·         The server loads the shared library for a server module each time it is requested and then immediately unloads it when the request has been completed. A more efficient implementation would probably cache loaded modules.
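The load-and-unload cycle in the last caveat might look roughly like the following sketch. The function run_module_once, the type module_generate_fn, and the symbol name "module_generate" are assumptions made for illustration; the actual module interface may differ.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical module entry point: writes a page to the descriptor fd.  */
typedef void (*module_generate_fn) (int fd);

/* Load the shared library at path, call its page-generating function
   once, and unload the library again.  Returns 0 on success, or -1 if
   the library or the function could not be found.  */
int run_module_once (const char *path, int fd)
{
  void *handle = dlopen (path, RTLD_NOW);
  if (handle == NULL) {
    fprintf (stderr, "cannot load %s: %s\n", path, dlerror ());
    return -1;
  }

  /* Look up the entry point by name (an assumed name, see above).  */
  module_generate_fn generate =
    (module_generate_fn) dlsym (handle, "module_generate");
  int result = -1;
  if (generate != NULL) {
    generate (fd);
    result = 0;
  }

  /* Unload immediately.  A caching implementation would keep the
     handle and reuse it for later requests.  */
  dlclose (handle);
  return result;
}
```

On systems with older versions of the GNU C library, a program using these functions must be linked with -ldl.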

HTTP

The Hypertext Transfer Protocol (HTTP) is used for communication between Web clients and servers. The client connects to the server by establishing a connection to a well-known port (usually port 80 for Internet Web servers, but any port may be used). HTTP requests and headers are composed of plain text.

Once connected, the client sends a request to the server. The first line of a typical request is GET /page HTTP/1.0. The GET method indicates that the client is requesting that the server send it a Web page. The second element is the path to that page on the server. The third element is the protocol and version. Subsequent lines contain header fields, formatted similarly to email headers, which contain extra information about the client. The header ends with a blank line.
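To make the three elements of the request line concrete, here is a minimal sketch of how a server might split it apart. The function name parse_request_line and the buffer sizes are arbitrary choices for this example, and the sketch ignores the header fields that follow the first line.

```c
#include <stdio.h>
#include <string.h>

/* Split a request line such as "GET /page HTTP/1.0" into its method,
   path, and protocol version.  Returns 0 on success, or -1 if the line
   does not have three elements or the third is not an HTTP version.  */
int parse_request_line (const char *line, char method[16],
                        char path[256], char version[16])
{
  /* Each field width is one less than the buffer size, leaving room
     for the terminating null character.  */
  if (sscanf (line, "%15s %255s %15s", method, path, version) != 3)
    return -1;
  if (strncmp (version, "HTTP/", 5) != 0)
    return -1;
  return 0;
}
```

Given the line GET /page HTTP/1.0, this fills method with GET, path with /page, and version with HTTP/1.0.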

The server sends back a response indicating the result of processing the request. The first line of a typical response is HTTP/1.0 200 OK. The first element is the protocol version. The next two elements indicate the result: a numeric status code and a short text description. In this case, status code 200 indicates that the request was processed successfully. Subsequent lines contain header fields, formatted similarly to email headers. The header ends with a blank line. The server may then send arbitrary data to satisfy the request.

Typically, the server responds to a page request by sending back HTML source for the Web page. In this case, the response headers will include Content-type: text/html, indicating that the result is HTML source. The HTML source follows immediately after the header.
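A successful response carrying an HTML page could be assembled as in the following sketch. The function name format_html_response is hypothetical. The sketch ends each header line with a carriage return and a newline, which is the line ending that the HTTP specification calls for.

```c
#include <stdio.h>

/* Write a complete HTTP/1.0 response containing the page html into
   buffer.  Returns the length of the response, or -1 if it does not
   fit in size bytes.  */
int format_html_response (char *buffer, size_t size, const char *html)
{
  int length = snprintf (buffer, size,
                         "HTTP/1.0 200 OK\r\n"          /* Status line.  */
                         "Content-type: text/html\r\n"  /* Header field.  */
                         "\r\n"          /* Blank line ends the header.  */
                         "%s",           /* The HTML source follows.  */
                         html);
  if (length < 0 || (size_t) length >= size)
    return -1;
  return length;
}
```

The server would then write the first length bytes of buffer to the client connection.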

See the HTTP specification at http://www.w3.org/Protocols/ for more information.