
17.10. The Socket Buffers

We've now covered most of the issues related to network interfaces. What's still missing is some more detailed discussion of the sk_buff structure. The structure is at the core of the network subsystem of the Linux kernel, and we now introduce both the main fields of the structure and the functions used to act on it.

Although there is no strict need to understand the internals of sk_buff, the ability to look at its contents can be helpful when you are tracking down problems and when you are trying to optimize your code. For example, if you look in loopback.c, you'll find an optimization based on knowledge of the sk_buff internals. The usual warning applies here: if you write code that takes advantage of knowledge of the sk_buff structure, you should be prepared to see it break with future kernel releases. Still, sometimes the performance advantages justify the additional maintenance cost.

We are not going to describe the whole structure here, just the fields that might be used from within a driver. If you want to see more, you can look at <linux/skbuff.h>, where the structure is defined and the functions are prototyped. Additional details about how the fields and functions are used can be easily retrieved by grepping in the kernel sources.

17.10.1. The Important Fields

The fields introduced here are the ones a driver might need to access. They are listed in no particular order.

struct net_device *dev;

The device receiving or sending this buffer.

union { /* ... */ } h;

union { /* ... */ } nh;

union { /* ... */ } mac;

Pointers to the various levels of headers contained within the packet. Each field of the union is a pointer to a different type of data structure. h hosts pointers to transport layer headers (for example, struct tcphdr *th); nh includes network layer headers (such as struct iphdr *iph); and mac collects pointers to link-layer headers (such as struct ethhdr *ethernet).

If your driver needs to look at the source and destination ports of a TCP packet, it can find them in skb->h.th; the source and destination IP addresses are in skb->nh.iph. See the header file for the full set of header types that can be accessed in this way.

Note that network drivers are responsible for setting the mac pointer for incoming packets. This task is normally handled by eth_type_trans, but non-Ethernet drivers have to set skb->mac.raw directly, as shown in Section 17.11.3.
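To make the three header pointers concrete, here is a minimal user-space sketch that locates the link, network, and transport headers inside a raw frame. It is not kernel code: struct mini_hdrs and the mini_ functions are hypothetical stand-ins for the raw view of the mac, nh, and h unions, and the sketch assumes an Ethernet frame carrying IPv4 and TCP without checking the EtherType or the IP protocol field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MINI_ETH_HLEN 14   /* 6 destination + 6 source + 2 type octets */

/* Hypothetical stand-in for the mac, nh, and h unions (raw view only). */
struct mini_hdrs {
    const unsigned char *mac;  /* link-layer header */
    const unsigned char *nh;   /* network-layer header */
    const unsigned char *h;    /* transport-layer header */
};

/* Locate the three headers in an Ethernet frame assumed to carry IPv4.
 * Returns 0 on success, -1 if the frame is too short or malformed. */
static int mini_set_headers(struct mini_hdrs *hd,
                            const unsigned char *frame, size_t len)
{
    size_t ihl;

    if (len < MINI_ETH_HLEN + 20)
        return -1;
    hd->mac = frame;
    hd->nh = frame + MINI_ETH_HLEN;
    /* Low nibble of the first IPv4 octet: header length in 32-bit words. */
    ihl = (size_t)(hd->nh[0] & 0x0f) * 4;
    if (ihl < 20 || len < MINI_ETH_HLEN + ihl + 4)
        return -1;
    hd->h = hd->nh + ihl;
    return 0;
}

/* TCP stores the source port in octets 0-1 and the destination port in
 * octets 2-3 of its header, both in network byte order. */
static uint16_t mini_tcp_source_port(const struct mini_hdrs *hd)
{
    assert(hd->h != NULL);
    return (uint16_t)((hd->h[0] << 8) | hd->h[1]);
}

static uint16_t mini_tcp_dest_port(const struct mini_hdrs *hd)
{
    assert(hd->h != NULL);
    return (uint16_t)((hd->h[2] << 8) | hd->h[3]);
}
```

In a real driver the typed union members give the same information without manual byte arithmetic; the sketch only shows where each pointer lands relative to the start of the frame.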

unsigned char *head;

unsigned char *data;

unsigned char *tail;

unsigned char *end;

Pointers used to address the data in the packet. head points to the beginning of the allocated space, data is the beginning of the valid octets (and is usually slightly greater than head), tail is the end of the valid octets, and end points to the maximum address tail can reach. Another way to look at it is that the available buffer space is skb->end - skb->head, and the currently used data space is skb->tail - skb->data.
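The relationship among the four pointers can be sketched in user space. The structure below is a hypothetical model that keeps only the pointer fields (it is not the kernel's sk_buff), and the two helpers compute the quantities just described.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the four pointer fields of sk_buff. */
struct mini_skb {
    unsigned char *head;  /* start of the allocated space */
    unsigned char *data;  /* start of the valid octets */
    unsigned char *tail;  /* end of the valid octets */
    unsigned char *end;   /* highest address tail can reach */
};

/* Point all four fields into a caller-supplied array, leaving `headroom`
 * unused octets in front of `used` valid octets. */
static void mini_skb_init(struct mini_skb *skb, unsigned char *buf,
                          size_t size, size_t headroom, size_t used)
{
    assert(headroom + used <= size);
    skb->head = buf;
    skb->data = buf + headroom;
    skb->tail = skb->data + used;
    skb->end  = buf + size;
}

/* Available buffer space: end - head. */
static size_t mini_skb_buffer_space(const struct mini_skb *skb)
{
    return (size_t)(skb->end - skb->head);
}

/* Currently used data space: tail - data. */
static size_t mini_skb_used_space(const struct mini_skb *skb)
{
    return (size_t)(skb->tail - skb->data);
}
```

For a 64-octet array initialized with 16 octets of headroom and 20 valid octets, the first helper returns 64 and the second returns 20.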

unsigned int len;

unsigned int data_len;

len is the full length of the data in the packet, while data_len is the length of the portion of the packet stored in separate fragments. The data_len field is 0 unless scatter/gather I/O is being used.

unsigned char ip_summed;

The checksum policy for this packet. The field is set by the driver on incoming packets, as described in Section 17.6.

unsigned char pkt_type;

Packet classification used in its delivery. The driver is responsible for setting it to PACKET_HOST (this packet is for me), PACKET_OTHERHOST (no, this packet is not for me), PACKET_BROADCAST, or PACKET_MULTICAST. Ethernet drivers don't modify pkt_type explicitly because eth_type_trans does it for them.
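For drivers that cannot rely on eth_type_trans, the classification can be sketched as a comparison of the destination hardware address against the broadcast address, the group (multicast) bit, and the interface's own address. The following user-space model loosely follows that logic. The enum and the function name are hypothetical, and a six-octet Ethernet-style address is assumed.

```c
#include <assert.h>
#include <string.h>

#define MINI_ETH_ALEN 6   /* octets in an Ethernet-style hardware address */

/* Hypothetical names; the kernel's own constants are PACKET_HOST,
 * PACKET_BROADCAST, PACKET_MULTICAST, and PACKET_OTHERHOST. */
enum mini_pkt_type {
    MINI_PACKET_HOST,
    MINI_PACKET_BROADCAST,
    MINI_PACKET_MULTICAST,
    MINI_PACKET_OTHERHOST
};

/* Classify a frame by its destination address, given the address of the
 * receiving interface. */
static enum mini_pkt_type
mini_classify(const unsigned char *dest, const unsigned char *dev_addr)
{
    static const unsigned char bcast[MINI_ETH_ALEN] =
        { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

    assert(dest != NULL && dev_addr != NULL);
    if (memcmp(dest, bcast, MINI_ETH_ALEN) == 0)
        return MINI_PACKET_BROADCAST;
    if (dest[0] & 1)                       /* group bit set: multicast */
        return MINI_PACKET_MULTICAST;
    if (memcmp(dest, dev_addr, MINI_ETH_ALEN) != 0)
        return MINI_PACKET_OTHERHOST;      /* unicast, but not for us */
    return MINI_PACKET_HOST;
}
```

The broadcast test comes first because the broadcast address also has the group bit set.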

skb_shinfo(struct sk_buff *skb);

unsigned int skb_shinfo(skb)->nr_frags;

skb_frag_t skb_shinfo(skb)->frags;

For performance reasons, some skb information is stored in a separate structure that sits at the end of the packet's data buffer (at skb->end). This "shared info" (so called because it can be shared among copies of the skb within the networking code) must be accessed via the skb_shinfo macro. There are several fields in this structure, but most of them are beyond the scope of this book. We saw nr_frags and frags in Section 17.5.3.

The remaining fields in the structure are not particularly interesting. They are used to maintain lists of buffers, to account for memory belonging to the socket that owns the buffer, and so on.

17.10.2. Functions Acting on Socket Buffers

Network devices that use an sk_buff structure act on it by means of the official interface functions. Many functions operate on socket buffers; here are the most interesting ones:

struct sk_buff *alloc_skb(unsigned int len, int priority);

struct sk_buff *dev_alloc_skb(unsigned int len);

Allocate a buffer. The alloc_skb function allocates a buffer and initializes both skb->data and skb->tail to skb->head. The dev_alloc_skb function is a shortcut that calls alloc_skb with GFP_ATOMIC priority and reserves some space between skb->head and skb->data. This data space is used for optimizations within the network layer and should not be touched by the driver.

void kfree_skb(struct sk_buff *skb);

void dev_kfree_skb(struct sk_buff *skb);

void dev_kfree_skb_irq(struct sk_buff *skb);

void dev_kfree_skb_any(struct sk_buff *skb);

Free a buffer. The kfree_skb call is used internally by the kernel. A driver should use one of the forms of dev_kfree_skb instead: dev_kfree_skb for noninterrupt context, dev_kfree_skb_irq for interrupt context, or dev_kfree_skb_any for code that can run in either context.

unsigned char *skb_put(struct sk_buff *skb, int len);

unsigned char *__skb_put(struct sk_buff *skb, int len);

Update the tail and len fields of the sk_buff structure; they are used to add data to the end of the buffer. Each function's return value is the previous value of skb->tail (in other words, it points to the data space just created). Drivers can use the return value to copy data by invoking memcpy(skb_put(...), data, len) or an equivalent. The difference between the two functions is that skb_put checks to be sure that the data fits in the buffer, whereas __skb_put omits the check.
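The bookkeeping done when data is appended can be modeled in a few lines of user-space C. This is a sketch, not the kernel implementation: struct mini_skb and the mini_ functions are hypothetical, and an assert stands in for the kernel's reaction to an overrun.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical user-space model of the fields touched when appending. */
struct mini_skb {
    unsigned char *head, *data, *tail, *end;
    unsigned int len;
};

/* An empty buffer: data and tail both start at head. */
static void mini_skb_init(struct mini_skb *skb, unsigned char *buf,
                          unsigned int size)
{
    skb->head = skb->data = skb->tail = buf;
    skb->end = buf + size;
    skb->len = 0;
}

/* Unchecked variant: advance tail and len, return the old tail. */
static unsigned char *mini_skb_put_unchecked(struct mini_skb *skb,
                                             unsigned int len)
{
    unsigned char *old_tail = skb->tail;

    skb->tail += len;
    skb->len += len;
    return old_tail;
}

/* Checked variant: refuse to move tail past end. */
static unsigned char *mini_skb_put(struct mini_skb *skb, unsigned int len)
{
    assert(len <= (unsigned int)(skb->end - skb->tail));
    return mini_skb_put_unchecked(skb, len);
}
```

With this model, memcpy(mini_skb_put(&skb, 5), "hello", 5) copies five octets to the old tail and leaves skb.len at 5, mirroring the memcpy idiom described above.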

unsigned char *skb_push(struct sk_buff *skb, int len);

unsigned char *__skb_push(struct sk_buff *skb, int len);

Functions to decrement skb->data and increment skb->len. They are similar to skb_put, except that data is added to the beginning of the packet instead of the end. The return value points to the data space just created. The functions are used to add a hardware header before transmitting a packet. Once again, __skb_push differs in that it does not check for adequate available space.
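Prepending a hardware header can be sketched the same way. In this user-space model (hypothetical names throughout, with a 14-octet Ethernet-style header assumed), the push operation moves data backward into the headroom and the caller fills in the space it gets back.

```c
#include <assert.h>
#include <string.h>

#define MINI_ETH_HLEN 14   /* assumed Ethernet-style header length */

/* Hypothetical user-space model of the fields touched when prepending. */
struct mini_skb {
    unsigned char *head, *data, *tail, *end;
    unsigned int len;
};

/* Start with `headroom` free octets in front of a payload of
 * `payload_len` octets. */
static void mini_skb_init(struct mini_skb *skb, unsigned char *buf,
                          unsigned int size, unsigned int headroom,
                          unsigned int payload_len)
{
    assert(headroom + payload_len <= size);
    skb->head = buf;
    skb->data = buf + headroom;
    skb->tail = skb->data + payload_len;
    skb->end = buf + size;
    skb->len = payload_len;
}

/* Move data backward by len octets and grow len; return the new data. */
static unsigned char *mini_skb_push(struct mini_skb *skb, unsigned int len)
{
    assert(len <= (unsigned int)(skb->data - skb->head));
    skb->data -= len;
    skb->len += len;
    return skb->data;
}

/* Prepend destination, source, and a 16-bit protocol in network order. */
static void mini_add_eth_header(struct mini_skb *skb,
                                const unsigned char *dest,
                                const unsigned char *src,
                                unsigned short proto)
{
    unsigned char *hdr = mini_skb_push(skb, MINI_ETH_HLEN);

    memcpy(hdr, dest, 6);
    memcpy(hdr + 6, src, 6);
    hdr[12] = (unsigned char)(proto >> 8);
    hdr[13] = (unsigned char)(proto & 0xff);
}
```

Starting from 16 octets of headroom and a 20-octet payload, adding the header leaves 2 octets of headroom and a total length of 34; tail does not move.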

int skb_tailroom(struct sk_buff *skb);

Returns the amount of space available for putting data in the buffer. If a driver puts more data into the buffer than it can hold, the system panics. Although you might object that a printk would be sufficient to tag the error, memory corruption is so harmful to the system that the developers decided to take definitive action. In practice, you shouldn't need to check the available space if the buffer has been correctly allocated. Since drivers usually get the packet size before allocating a buffer, only a severely broken driver puts too much data in the buffer, and a panic might be seen as due punishment.

int skb_headroom(struct sk_buff *skb);

Returns the amount of space available in front of data, that is, how many octets one can "push" to the buffer.
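Both quantities are simple pointer differences, as the following user-space sketch shows. The structure and helper names are hypothetical, and the model covers only a linear buffer. The mini_try_append helper is not a kernel function; it illustrates how code unsure of the remaining room could test it first instead of risking an overrun.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of a linear socket buffer. */
struct mini_skb {
    unsigned char *head, *data, *tail, *end;
    unsigned int len;
};

/* Octets free in front of data: how much can still be "pushed". */
static int mini_skb_headroom(const struct mini_skb *skb)
{
    return (int)(skb->data - skb->head);
}

/* Octets free after tail: how much can still be "put". */
static int mini_skb_tailroom(const struct mini_skb *skb)
{
    return (int)(skb->end - skb->tail);
}

/* Append len octets only if they fit. Returns where they were copied,
 * or NULL (leaving the buffer untouched) when there is not enough room. */
static unsigned char *mini_try_append(struct mini_skb *skb,
                                      const void *src, int len)
{
    unsigned char *dst;

    assert(src != NULL);
    if (len < 0 || len > mini_skb_tailroom(skb))
        return NULL;
    dst = skb->tail;
    memcpy(dst, src, (size_t)len);
    skb->tail += len;
    skb->len += (unsigned int)len;
    return dst;
}
```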

void skb_reserve(struct sk_buff *skb, int len);

Increments both data and tail. The function can be used to reserve headroom before filling the buffer. Most Ethernet interfaces reserve two bytes in front of the packet; thus, the IP header is aligned on a 16-byte boundary, after a 14-byte Ethernet header. snull does this as well, although the instruction was not shown in Section 17.6 to avoid introducing extra concepts at that point.
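The alignment arithmetic is easy to check with a small user-space model. The names below are hypothetical, a 14-octet Ethernet header is assumed, and the assert in mini_skb_reserve is an addition for the sketch rather than part of the function described above.

```c
#include <assert.h>

#define MINI_ETH_HLEN 14   /* assumed Ethernet header length */

/* Hypothetical user-space model of the pointer fields. */
struct mini_skb {
    unsigned char *head, *data, *tail, *end;
};

/* Open up len octets of headroom in a still-empty buffer by moving
 * data and tail forward together. */
static void mini_skb_reserve(struct mini_skb *skb, unsigned int len)
{
    assert(len <= (unsigned int)(skb->end - skb->tail));
    skb->data += len;
    skb->tail += len;
}

/* Offset of the IP header from head when the frame starts at data. */
static unsigned int mini_ip_header_offset(const struct mini_skb *skb)
{
    return (unsigned int)(skb->data - skb->head) + MINI_ETH_HLEN;
}
```

Without the reservation the IP header would start at offset 14; after mini_skb_reserve(&skb, 2) it starts at offset 16, which is the boundary mentioned above (assuming head itself is suitably aligned).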

unsigned char *skb_pull(struct sk_buff *skb, int len);

Removes data from the head of the packet. The driver won't need to use this function, but it is included here for completeness. It decrements skb->len and increments skb->data; this is how the hardware header (Ethernet or equivalent) is stripped from the beginning of incoming packets.
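Stripping a header is the mirror image of prepending one. The user-space sketch below uses hypothetical names; returning NULL when more octets are requested than the packet holds is how this model signals the error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of the fields touched when pulling. */
struct mini_skb {
    unsigned char *head, *data, *tail, *end;
    unsigned int len;
};

/* Drop len octets from the front of the packet. Returns the new data
 * pointer, or NULL (changing nothing) if the packet is shorter than len. */
static unsigned char *mini_skb_pull(struct mini_skb *skb, unsigned int len)
{
    assert(skb != NULL);
    if (len > skb->len)
        return NULL;
    skb->len -= len;
    skb->data += len;
    return skb->data;
}
```

Pulling 14 octets from a 34-octet Ethernet frame leaves data pointing at the network-layer header and len at 20. The stripped octets are still in memory in front of data, which is why a saved link-layer header pointer remains usable afterward.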

int skb_is_nonlinear(struct sk_buff *skb);

Returns a true value if this skb is separated into multiple fragments for scatter/gather I/O.

int skb_headlen(struct sk_buff *skb);

Returns the length of the first segment of the skb (that part pointed to by skb->data).
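Both of these helpers follow directly from the len and data_len fields described earlier. A minimal user-space sketch, with a hypothetical structure holding only those two fields:

```c
#include <assert.h>

/* Hypothetical model holding only the two length fields. */
struct mini_skb {
    unsigned int len;       /* full length of the packet data */
    unsigned int data_len;  /* portion held in separate fragments */
};

/* True when part of the packet lives in fragments. */
static int mini_skb_is_nonlinear(const struct mini_skb *skb)
{
    return skb->data_len != 0;
}

/* Octets in the first, linear segment (the part data points to). */
static unsigned int mini_skb_headlen(const struct mini_skb *skb)
{
    assert(skb->data_len <= skb->len);
    return skb->len - skb->data_len;
}
```

A 1500-octet packet with data_len of 0 is linear and has a head length of 1500; the same packet with 1446 octets in fragments is nonlinear and has a head length of 54.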

void *kmap_skb_frag(skb_frag_t *frag);

void kunmap_skb_frag(void *vaddr);

If you must directly access fragments in a nonlinear skb from within the kernel, these functions map and unmap them for you. An atomic kmap is used, so you cannot have more than one fragment mapped at a time.

The kernel defines several other functions that act on socket buffers, but they are meant to be used in higher layers of networking code, and the driver doesn't need them.
