5.5 Sockets

A socket is a bidirectional communication device that can be used to communicate with another process on the same machine or with a process running on other machines. Sockets are the only interprocess communication mechanism we'll discuss in this chapter that permits communication between processes on different computers. Internet programs such as Telnet, rlogin, FTP, talk, and the World Wide Web use sockets.

For example, you can obtain the WWW page from a Web server using the Telnet program because they both use sockets for network communications. [4] To open a connection to a WWW server at www.codesourcery.com, use telnet www.codesourcery.com 80. The magic constant 80 specifies a connection to the Web server process running on www.codesourcery.com instead of some other process. Try typing GET / after the connection is established. This sends a message through the socket to the Web server, which replies by sending the home page's HTML source and then closing the connection. For example:

[4] Usually, you'd use telnet to connect to a Telnet server for remote logins. But you can also use telnet to connect to a server of a different kind and then type commands directly at it.

 
% telnet www.codesourcery.com 80 
Trying 206.168.99.1... 
Connected to merlin.codesourcery.com (206.168.99.1). 
Escape character is '^]'. 
GET / 
<html> 
<head> 
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> 
... 

5.5.1 Socket Concepts

When you create a socket, you must specify three parameters: communication style, namespace, and protocol.

A communication style controls how the socket treats transmitted data and specifies the number of communication partners. When data is sent through a socket, it is packaged into chunks called packets. The communication style determines how these packets are handled and how they are addressed from the sender to the receiver.

·         Connection styles guarantee delivery of all packets in the order they were sent. If packets are lost or reordered by problems in the network, the receiver automatically requests their retransmission from the sender.

A connection-style socket is like a telephone call: The addresses of the sender and receiver are fixed at the beginning of the communication when the connection is established.

·         Datagram styles do not guarantee delivery or arrival order. Each packet must be labeled with its destination. The system guarantees only "best effort," so packets may be lost in transit or arrive in a different order than they were sent, because of network errors or other conditions.

A datagram-style socket behaves more like postal mail. The sender specifies the receiver's address for each individual message.

A socket namespace specifies how socket addresses are written. A socket address identifies one end of a socket connection. For example, socket addresses in the "local namespace" are ordinary filenames. In "Internet namespace," a socket address is composed of the Internet address (also known as an Internet Protocol address or IP address) of a host attached to the network and a port number. The port number distinguishes among multiple sockets on the same host.

A protocol specifies how data is transmitted. Some protocols are TCP/IP, the primary networking protocols used by the Internet; the AppleTalk network protocol; and the UNIX local communication protocol. Not all combinations of styles, namespaces, and protocols are supported.

5.5.2 System Calls

Sockets are more flexible than previously discussed communication techniques. These are the system calls involving sockets:

socket— Creates a socket

close— Destroys a socket

connect— Creates a connection between two sockets

bind— Labels a server socket with an address

listen— Configures a socket to accept connections

accept— Accepts a connection and creates a new socket for the connection

Sockets are represented by file descriptors.

Creating and Destroying Sockets

The socket and close functions create and destroy sockets, respectively. When you create a socket, specify the three socket choices: namespace, communication style, and protocol. For the namespace parameter, use constants beginning with PF_ (abbreviating "protocol families"). For example, PF_LOCAL or PF_UNIX specifies the local namespace, and PF_INET specifies Internet namespaces. For the communication style parameter, use constants beginning with SOCK_. Use SOCK_STREAM for a connection-style socket, or use SOCK_DGRAM for a datagram-style socket.

The third parameter, the protocol, specifies the low-level mechanism to transmit and receive data. Each protocol is valid for a particular namespace-style combination. Because there is usually one best protocol for each such pair, specifying 0 is usually the correct protocol. If socket succeeds, it returns a file descriptor for the socket. You can read from or write to the socket using read, write, and so on, as with other file descriptors. When you are finished with a socket, call close to remove it.
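As a minimal sketch of the two calls just described, the helper below creates a socket in a given namespace and communication style, passes 0 so that the system picks the protocol, and then destroys the socket again. The name create_and_destroy is our own invention for illustration; it is not part of the socket API.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a socket in NAMESPACE_FAMILY (a PF_
   constant) with communication style STYLE (a SOCK_ constant), then
   close it. Returns 0 if both calls succeed, -1 otherwise.  */
int create_and_destroy (int namespace_family, int style)
{
  /* Protocol 0 asks for the default protocol for this pair.  */
  int fd = socket (namespace_family, style, 0);
  if (fd == -1) {
    perror ("socket");
    return -1;
  }
  /* At this point fd is an ordinary file descriptor.  */
  if (close (fd) == -1) {
    perror ("close");
    return -1;
  }
  return 0;
}
```

A call such as create_and_destroy (PF_LOCAL, SOCK_STREAM) exercises the local namespace with a connection-style socket; an unsupported combination makes socket return -1.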

Calling connect

To create a connection between two sockets, the client calls connect, specifying the address of a server socket to connect to. A client is the process initiating the connection, and a server is the process waiting to accept connections. The client calls connect to initiate a connection from a local socket to the server socket specified by the second argument. The third argument is the length, in bytes, of the address structure pointed to by the second argument. Socket address formats differ according to the socket namespace.

Sending Information

Any technique to write to a file descriptor can be used to write to a socket. See Appendix B for a discussion of Linux's low-level I/O functions and some of the issues surrounding their use. The send function, which is specific to the socket file descriptors, provides an alternative to write with a few additional choices; see the man page for information.
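For illustration, here is a small sketch of send used in place of write. The helper name send_text is hypothetical. With the flags argument set to 0, send behaves like write on a socket; the man page lists the flags that change this behavior.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send TEXT, including its NUL terminator, over
   the connected socket SOCKET_FD. Returns the number of bytes sent,
   or -1 on error.  */
ssize_t send_text (int socket_fd, const char *text)
{
  /* The last argument is the flags word; 0 requests no special
     behavior.  */
  return send (socket_fd, text, strlen (text) + 1, 0);
}
```

Unlike write, send fails if its first argument is a file descriptor that does not refer to a socket.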

5.5.3 Servers

A server's life cycle consists of the creation of a connection-style socket, binding an address to its socket, placing a call to listen that enables connections to the socket, placing calls to accept incoming connections, and then closing the socket. Data isn't read and written directly via the server socket; instead, each time a program accepts a new connection, Linux creates a separate socket to use in transferring data over that connection. In this section, we introduce bind, listen, and accept.

An address must be bound to the server's socket using bind if a client is to find it. Its first argument is the socket file descriptor. The second argument is a pointer to a socket address structure; the format of this structure depends on the socket's address family. The third argument is the length of the address structure, in bytes.

Once an address is bound to a connection-style socket, the server must invoke listen to indicate that it is a server. Its first argument is the socket file descriptor. The second argument specifies how many pending connections are queued. If the queue is full, additional connections will be rejected. This does not limit the total number of connections that a server can handle; it limits just the number of clients attempting to connect that have not yet been accepted.

A server accepts a connection request from a client by invoking accept. The first argument is the socket file descriptor. The second argument points to a socket address structure, which is filled with the client socket's address. The third argument is the length, in bytes, of the socket address structure. The server can use the client address to determine whether it really wants to communicate with the client. The call to accept creates a new socket for communicating with the client and returns the corresponding file descriptor. The original server socket continues to accept new client connections.

To read data from a socket without removing it from the input queue, use recv. It takes the same arguments as read, plus an additional FLAGS argument. A flag of MSG_PEEK causes data to be read but not removed from the input queue.
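The effect of MSG_PEEK can be sketched in a few lines. The helper name peek_data is our own; the only real calls involved are recv and the MSG_PEEK flag.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: copy up to LENGTH bytes waiting on SOCKET_FD
   into BUFFER without removing them from the input queue. Returns
   the number of bytes copied, 0 if the peer closed the connection,
   or -1 on error.  */
ssize_t peek_data (int socket_fd, char *buffer, size_t length)
{
  return recv (socket_fd, buffer, length, MSG_PEEK);
}
```

Calling peek_data twice in a row returns the same bytes both times. A later read or recv without MSG_PEEK returns them once more and removes them from the queue.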

5.5.4 Local Sockets

Sockets connecting processes on the same computer can use the local namespace represented by the synonyms PF_LOCAL and PF_UNIX. These are called local sockets or UNIX-domain sockets. Their socket addresses, specified by filenames, are used only when creating connections.

The socket's name is specified in struct sockaddr_un. You must set the sun_family field to AF_LOCAL, indicating that this is a local namespace. The sun_path field specifies the filename to use and may be, at most, 108 bytes long. The actual length of struct sockaddr_un should be computed using the SUN_LEN macro. Any filename can be used, but the process creating the socket must have write permission on the directory, which permits adding files to the directory. On Linux, to connect to a socket, a process must have write permission on the socket file. Even though different computers may share the same filesystem, only processes running on the same computer can communicate with local namespace sockets.

The only permissible protocol for the local namespace is 0.

Because it resides in a file system, a local socket is listed as a file. For example, notice the initial s:

 
% ls -l /tmp/socket 
srwxrwx--x      1 user      group      0 Nov 13 19:18 /tmp/socket 

Call unlink to remove a local socket when you're done with it.

5.5.5 An Example Using Local Namespace Sockets

We illustrate sockets with two programs. The server program, in Listing 5.10, creates a local namespace socket and listens for connections on it. When it receives a connection, it reads text messages from the connection and prints them until the connection closes. If one of these messages is "quit," the server program removes the socket and ends. The socket-server program takes the path to the socket as its command-line argument.

Listing 5.10 (socket-server.c) Local Namespace Socket Server
#include <stdio.h> 
#include <stdlib.h> 
#include <string.h> 
#include <sys/socket.h> 
#include <sys/un.h> 
#include <unistd.h> 
 
/* Read text from the socket and print it out. Continue until the 
   socket closes. Return nonzero if the client sent a "quit" 
   message, zero otherwise.  */ 
 
int server (int client_socket) 
{
   while (1) {
     int length; 
     char* text; 
 
     /* First, read the length of the text message from the socket. If 
        read returns zero, the client closed the connection; an error 
        or a short read is treated the same way.  */ 
     if (read (client_socket, &length, sizeof (length)) 
         != (ssize_t) sizeof (length)) 
       return 0; 
     /* Allocate a buffer to hold the text.  */ 
     text = (char*) malloc (length); 
     if (text == NULL) 
       return 0; 
     /* Read the text itself, and print it.  */ 
     read (client_socket, text, length); 
     printf ("%s\n", text); 
     /* If the client sent the message "quit," we're all done. Compare 
        before freeing the buffer.  */ 
     if (!strcmp (text, "quit")) {
       free (text); 
       return 1; 
     } 
     /* Free the buffer.  */ 
     free (text); 
   } 
} 
 
int main (int argc, char* const argv[]) 
{
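    const char* const socket_name = argv[1]; 
    int socket_fd; 
    struct sockaddr_un name; 
    int client_sent_quit_message; 
 
    /* The socket path is required; argv[1] is NULL if it is missing.   */ 
    if (argc < 2) {
      fprintf (stderr, "usage: %s socket-path\n", argv[0]); 
      return 1; 
    } 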
 
    /* Create the socket.   */ 
    socket_fd = socket (PF_LOCAL, SOCK_STREAM, 0); 
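    /* Bind the socket to its address so that clients can find it.   */ 
    name.sun_family = AF_LOCAL; 
    strncpy (name.sun_path, socket_name, sizeof (name.sun_path) - 1); 
    name.sun_path[sizeof (name.sun_path) - 1] = '\0'; 
    if (bind (socket_fd, (struct sockaddr *) &name, SUN_LEN (&name)) == -1) {
      perror ("bind"); 
      return 1; 
    } 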
    /* Listen for connections.   */ 
    listen (socket_fd, 5); 
 
    /* Repeatedly accept connections, spinning off one server() to deal 
       with  each  client.  Continue  until  a  client  sends  a  "quit"  message.   */ 
    do {
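      struct sockaddr_un client_name; 
      /* accept expects the size of the address buffer on input.   */ 
      socklen_t client_name_len = sizeof (client_name); 
      int client_socket_fd; 
 
      /* Accept a connection.   */ 
      client_socket_fd = accept (socket_fd, (struct sockaddr *) &client_name, 
                                 &client_name_len); 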
      /* Handle the connection.   */ 
      client_sent_quit_message = server (client_socket_fd); 
      /* Close our end of the connection.   */ 
      close (client_socket_fd); 
    } 
    while (!client_sent_quit_message); 
 
    /* Remove the socket file.   */ 
    close (socket_fd); 
    unlink (socket_name); 
 
    return 0; 
}

The client program, in Listing 5.11, connects to a local namespace socket and sends a message. The path to the socket and the message are specified on the command line.

Listing 5.11 (socket-client.c) Local Namespace Socket Client
#include <stdio.h> 
#include <string.h> 
#include <sys/socket.h> 
#include <sys/un.h> 
#include <unistd.h> 
/* Write TEXT to the socket given by file descriptor SOCKET_FD.  */ 
 
void write_text (int socket_fd, const char* text) 
{
   /* Write the number of bytes in the string, including 
      NUL-termination.  */ 
   int length = strlen (text) + 1; 
   write (socket_fd, &length, sizeof (length)); 
   /* Write the string.  */ 
   write (socket_fd, text, length); 
} 
 
int main (int argc, char* const argv[]) 
{
   const char* const socket_name = argv[1]; 
   const char* const message = argv[2]; 
   int socket_fd; 
   struct sockaddr_un name; 
 
   /* Create the socket.  */ 
   socket_fd = socket (PF_LOCAL, SOCK_STREAM, 0); 
   /* Store the server's name in the socket address.  */ 
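   name.sun_family = AF_LOCAL; 
   strncpy (name.sun_path, socket_name, sizeof (name.sun_path) - 1); 
   name.sun_path[sizeof (name.sun_path) - 1] = '\0'; 
   /* Connect the socket.   */ 
   if (connect (socket_fd, (struct sockaddr *) &name, SUN_LEN (&name)) == -1) {
     perror ("connect"); 
     return 1; 
   } 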
   /* Write the text on the command line to the socket.   */ 
   write_text (socket_fd, message); 
   close (socket_fd); 
   return 0; 
} 

Before the client sends the message text, it sends the length of that text by sending the bytes of the integer variable length. Likewise, the server reads the length of the text by reading from the socket into an integer variable. This allows the server to allocate an appropriately sized buffer to hold the message text before reading it from the socket.
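One caveat applies to this length-prefix scheme: a single call to read may return fewer bytes than requested, so a more careful receiver loops until the whole message has arrived. The sketch below assumes the same message format as the two listings (an int length followed by that many bytes); the helper names read_fully and read_message are hypothetical, and handling of interrupted calls is omitted for brevity.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly COUNT bytes from FD into BUFFER.
   Returns 0 on success, or -1 on error or if the input ends first.  */
int read_fully (int fd, void *buffer, size_t count)
{
  char *position = buffer;

  while (count > 0) {
    ssize_t bytes_read = read (fd, position, count);
    if (bytes_read <= 0)
      return -1;
    position += bytes_read;
    count -= (size_t) bytes_read;
  }
  return 0;
}

/* Hypothetical helper: read one length-prefixed message from FD.
   Returns a newly allocated string that the caller must free, or
   NULL on failure.  */
char *read_message (int fd)
{
  int length;
  char *text;

  if (read_fully (fd, &length, sizeof (length)) == -1 || length <= 0)
    return NULL;
  text = malloc ((size_t) length);
  if (text == NULL)
    return NULL;
  if (read_fully (fd, text, (size_t) length) == -1) {
    free (text);
    return NULL;
  }
  /* Do not rely on the sender to NUL-terminate the text.  */
  text[length - 1] = '\0';
  return text;
}
```

Because the length is sent as the raw bytes of an int, this format works only when both ends agree on the size and byte order of int, which is always the case for two processes on the same machine.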

To try this example, start the server program in one window. Specify a path to a socket—for example, /tmp/socket.

 
%  ./socket-server /tmp/socket 

In another window, run the client a few times, specifying the same socket path plus messages to send to the server:

 
% ./socket-client /tmp/socket "Hello, world." 
% ./socket-client /tmp/socket "This is a test." 

The server program receives and prints these messages. To close the server, send the message "quit" from a client:

 
% ./socket-client /tmp/socket "quit" 

The server program terminates.

5.5.6 Internet-Domain Sockets

UNIX-domain sockets can be used only for communication between two processes on the same computer. Internet-domain sockets, on the other hand, may be used to connect processes on different machines connected by a network.

Sockets connecting processes through the Internet use the Internet namespace represented by PF_INET. The most common protocols are TCP/IP. The Internet Protocol (IP), a low-level protocol, moves packets through the Internet, splitting and rejoining the packets, if necessary. It guarantees only "best-effort" delivery, so packets may vanish or be reordered during transport. Every participating computer is specified using a unique IP number. The Transmission Control Protocol (TCP), layered on top of IP, provides reliable connection-oriented transport. It permits telephone-like connections to be established between computers and ensures that data is delivered reliably and in order.

DNS Names

Because it is easier to remember names than numbers, the Domain Name System (DNS) associates names such as www.codesourcery.com with computers' unique IP numbers. DNS is implemented by a worldwide hierarchy of name servers, but you don't need to understand DNS protocols to use Internet host names in your programs.

Internet socket addresses contain two parts: a machine and a port number. This information is stored in a struct sockaddr_in variable. Set the sin_family field to AF_INET to indicate that this is an Internet namespace address. The sin_addr field stores the Internet address of the desired machine as a 32-bit integer IP number. A port number distinguishes a given machine's different sockets. Because different machines store multibyte values in different byte orders, use htons to convert the port number to network byte order. See the man page for ip for more information.
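As a sketch of how these fields fit together, the helper below fills in a struct sockaddr_in from an address in dot notation and a port number. The name make_inet_address is hypothetical, and the use of inet_pton to parse the address is our choice for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill NAME with the Internet address given in
   dot notation by DOTTED_QUAD and with PORT. Returns 0 on success,
   or -1 if DOTTED_QUAD is not a valid address.  */
int make_inet_address (struct sockaddr_in *name, const char *dotted_quad,
                       unsigned short port)
{
  memset (name, 0, sizeof (*name));
  name->sin_family = AF_INET;
  /* Convert the port from host byte order to network byte order.  */
  name->sin_port = htons (port);
  /* inet_pton stores the address in network byte order and returns 1
     when the text is a valid address.  */
  if (inet_pton (AF_INET, dotted_quad, &name->sin_addr) != 1)
    return -1;
  return 0;
}
```

Network byte order is big-endian, so after make_inet_address (&name, "10.0.0.1", 80) the two bytes of sin_port are 0 followed by 80, regardless of the byte order of the host.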

To convert human-readable hostnames, either numbers in standard dot notation (such as 10.0.0.1) or DNS names (such as www.codesourcery.com) into 32-bit IP numbers, you can use gethostbyname. This returns a pointer to the struct hostent structure; the h_addr field contains the host's IP number. See the sample program in Listing 5.12.
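A short sketch of the conversion follows. The helper name lookup_ipv4 is hypothetical; it converts the result back to dot notation with inet_ntop so that the address can be printed. Keep in mind that gethostbyname returns a pointer to static data, and that current systems document it as obsolete in favor of getaddrinfo.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: look up HOST, which may be a DNS name or an
   address in dot notation, and store its first IPv4 address in TEXT
   in dot notation. Returns 0 on success, -1 on failure.  */
int lookup_ipv4 (const char *host, char *text, socklen_t text_size)
{
  struct hostent *hostinfo = gethostbyname (host);
  struct in_addr address;

  if (hostinfo == NULL || hostinfo->h_addrtype != AF_INET
      || hostinfo->h_addr_list[0] == NULL)
    return -1;
  /* h_addr is shorthand for h_addr_list[0], the first address.  */
  memcpy (&address, hostinfo->h_addr_list[0], sizeof (address));
  if (inet_ntop (AF_INET, &address, text, text_size) == NULL)
    return -1;
  return 0;
}
```

When HOST is already an address in dot notation, gethostbyname performs no DNS lookup and simply converts the text.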

Listing 5.12 illustrates the use of Internet-domain sockets. The program obtains the home page from the Web server whose hostname is specified on the command line.

Listing 5.12 (socket-inet.c) Read from a WWW Server
#include <stdlib.h> 
#include <stdio.h> 
#include <netinet/in.h> 
#include <netdb.h> 
#include <sys/socket.h> 
#include <unistd.h> 
#include <string.h> 
 
/* Print the contents of the home page for the server's socket.   */ 
 
void get_home_page (int socket_fd) 
{
    char buffer[10000]; 
    ssize_t number_characters_read; 
 
    /* Send the HTTP GET command for the home page.   */ 
    sprintf (buffer, "GET /\n"); 
    write (socket_fd, buffer, strlen (buffer)); 
    /* Read from the socket. The call to read may not 
    return all the data at one time, so keep 
    trying until we run out.   */ 
    while (1) {
      number_characters_read = read (socket_fd, buffer, sizeof (buffer)); 
      /* Stop at end of input (0) or on a read error (-1).   */ 
      if (number_characters_read <= 0) 
        return; 
      /* Write the data to standard output.   */ 
      fwrite (buffer, sizeof (char), number_characters_read, stdout); 
    } 
} 
 
int main (int argc, char* const argv[]) 
{
    int socket_fd; 
    struct sockaddr_in name; 
    struct hostent* hostinfo; 
 
    /* Create the socket.   */ 
    socket_fd = socket (PF_INET, SOCK_STREAM, 0); 
    /* Store the server's name in the socket address.   */ 
    name.sin_family = AF_INET; 
    /* Convert from strings to numbers.   */ 
    hostinfo = gethostbyname (argv[1]); 
    if (hostinfo == NULL) 
      return 1; 
    else 
      name.sin_addr = *((struct in_addr *) hostinfo->h_addr); 
    /* Web servers use port 80.   */ 
    name.sin_port = htons (80); 
    /* Connect to the Web server   */ 
    if (connect (socket_fd, (struct sockaddr *) &name, sizeof (struct sockaddr_in)) == -1) {
      perror ("connect"); 
      return 1; 
    } 
    /* Retrieve the server's home page.   */ 
    get_home_page (socket_fd); 
 
    return 0; 
} 

This program takes the hostname of the Web server on the command line (not a URL; that is, without the "http://"). It calls gethostbyname to translate the hostname into a numerical IP address and then connects a stream (TCP) socket to port 80 on that host. Web servers speak the Hypertext Transfer Protocol (HTTP), so the program issues the HTTP GET command and the server responds by sending the text of the home page.

Standard Port Numbers

By convention, Web servers listen for connections on port 80. Most Internet network services are associated with a standard port number. For example, secure Web servers that use SSL listen for connections on port 443, and mail servers (which speak SMTP) use port 25.

On GNU/Linux systems, the associations between protocol/service names and standard port numbers are listed in the file /etc/services. The first column is the protocol or service name. The second column lists the port number and the connection type: tcp for connection-oriented, or udp for datagram. If you implement custom network services using Internet-domain sockets, use port numbers greater than 1024.

For example, to retrieve the home page from the Web site www.codesourcery.com, invoke this:

 
% ./socket-inet www.codesourcery.com 
<html> 
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> 
 
... 

5.5.7 Socket Pairs

As we saw previously, the pipe function creates two file descriptors for the beginning and end of a pipe. Pipes are limited because the file descriptors must be used by related processes and because communication is unidirectional. The socketpair function creates two file descriptors for two connected sockets on the same computer. These file descriptors permit two-way communication between related processes. Its first three parameters are the same as those of the socket call: They specify the domain, communication style, and protocol. The last parameter is a two-integer array, which is filled with the file descriptors of the two sockets, similar to pipe. When you call socketpair, you must specify PF_LOCAL as the domain.
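To illustrate the two-way communication, here is a sketch in which a parent sends text to a child process over a socket pair and the child sends it back in uppercase over the same pair. The function name shout_via_child is hypothetical, and the sketch assumes a short, nonempty message that fits in one read.

```c
#include <ctype.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that uppercases TEXT and sends it
   back over a socket pair. The NUL-terminated answer is stored in
   REPLY. Returns 0 on success, -1 on failure.  */
int shout_via_child (const char *text, char *reply, size_t reply_size)
{
  int fds[2];
  pid_t child;
  ssize_t count = -1;

  if (reply_size == 0 || *text == '\0')
    return -1;
  if (socketpair (PF_LOCAL, SOCK_STREAM, 0, fds) == -1)
    return -1;
  child = fork ();
  if (child == -1) {
    close (fds[0]);
    close (fds[1]);
    return -1;
  }
  if (child == 0) {
    /* Child: fds[1] serves for both reading and writing.  */
    char buffer[256];
    ssize_t i;
    ssize_t received;

    close (fds[0]);
    received = read (fds[1], buffer, sizeof (buffer));
    for (i = 0; i < received; ++i)
      buffer[i] = (char) toupper ((unsigned char) buffer[i]);
    if (received > 0 && write (fds[1], buffer, (size_t) received) == -1)
      _exit (1);
    _exit (0);
  }
  /* Parent: fds[0] serves for both writing and reading.  */
  close (fds[1]);
  if (write (fds[0], text, strlen (text)) != -1)
    count = read (fds[0], reply, reply_size - 1);
  close (fds[0]);
  waitpid (child, NULL, 0);
  if (count < 0)
    return -1;
  reply[count] = '\0';
  return 0;
}
```

With a pipe, the same exchange would need two pipes, one for each direction; here each process uses a single descriptor for both.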