17.10. The Socket Buffers
We've now covered
most of the issues related to network interfaces.
What's still missing is some more detailed
discussion of the sk_buff structure. The structure
is at the core of the network subsystem of the Linux kernel, and we
now introduce both the main fields of the structure and the functions
used to act on it.
Although there is no strict need to understand the internals of
sk_buff, the ability to look at its contents can
be helpful when you are tracking down problems and when you are
trying to optimize your code. For example, if you look in
loopback.c, you'll find an
optimization based on knowledge of the sk_buff
internals. The usual warning applies here: if you write code that
takes advantage of knowledge of the sk_buff
structure, you should be prepared to see it break with future kernel
releases. Still, sometimes the performance advantages justify the
additional maintenance cost.
We are not going to describe the whole structure here, just the
fields that might be used from within a driver. If you want to see
more, you can look at <linux/skbuff.h>,
where the structure is defined and the functions are prototyped.
Additional details about how the fields and functions are used can be
easily retrieved by grepping in the kernel sources.
17.10.1. The Important Fields
The
fields introduced here are the ones a driver might need to access.
They are listed in no particular order.
- struct net_device *dev;
-
The device receiving or sending this buffer.
- union { /* ... */ } h;
- union { /* ... */ } nh;
- union { /*... */} mac;
-
Pointers to the various levels of headers contained within the
packet. Each field of the union is a pointer to a different type of
data structure. h hosts pointers to transport
layer headers (for example, struct
tcphdr *th);
nh includes network layer headers (such as
struct iphdr
*iph); and mac collects
pointers to link-layer headers (such as struct
ethhdr *ethernet).
If your driver needs to look at the source and destination ports
of a TCP packet, it can find them in skb->h.th; the source and destination
IP addresses are in the network-layer header, skb->nh.iph.
See the header file for the full set of header types that can be
accessed in this way.
Note that network drivers are responsible for setting the
mac pointer for incoming packets. This task is
normally handled by eth_type_trans, but
non-Ethernet drivers have to set skb->mac.raw
directly, as shown in Section 17.11.3.
- unsigned char *head;
- unsigned char *data;
- unsigned char *tail;
- unsigned char *end;
-
Pointers used to address the data in the packet.
head points to the beginning of the allocated
space, data is the beginning of the valid octets
(and is usually slightly greater than head),
tail is the end of the valid octets, and
end points to the maximum address
tail can reach. Another way to look at it is that
the available buffer space is
skb->end - skb->head, and the currently
used data space is skb->tail - skb->data.
- unsigned int len;
- unsigned int data_len;
-
len is the full length of the data in the packet,
while data_len is the length of the portion of the
packet stored in separate fragments. The data_len
field is 0 unless scatter/gather I/O is being
used.
- unsigned char ip_summed;
-
The checksum policy for this packet.
The field is set by the driver on incoming packets, as described in
Section 17.6.
- unsigned char pkt_type;
-
Packet classification used in its delivery.
The driver is responsible for setting it to
PACKET_HOST (this packet is for me),
PACKET_OTHERHOST (no, this packet is not for me),
PACKET_BROADCAST, or
PACKET_MULTICAST. Ethernet drivers
don't modify pkt_type explicitly
because eth_type_trans does it for them.
- skb_shinfo(struct sk_buff *skb);
- unsigned int skb_shinfo(skb)->nr_frags;
- skb_frag_t skb_shinfo(skb)->frags[MAX_SKB_FRAGS];
-
For performance reasons, some skb information is stored in a separate
structure that sits in memory immediately after the end of the data
buffer (at skb->end). This
"shared info" (so called because it
can be shared among copies of the skb within the networking code)
must be accessed via the skb_shinfo macro. There are
several fields in this structure, but most of them are beyond the
scope of this book. We saw nr_frags and
frags in Section 17.5.3.
The remaining fields in the structure are not particularly
interesting. They are used to maintain lists of buffers, to account
for memory belonging to the socket that owns the buffer, and so on.
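To make the pointer and length relationships described above concrete, here is a small userspace model. It is a sketch under stated assumptions: struct model_skb and the model_* helpers are hypothetical names invented for illustration, not kernel symbols, and the real struct sk_buff has many more fields. The helpers compute the available buffer space (end - head), the used data space (tail - data), and the len - data_len relationship, which, as far as we know, is how skb_headlen and skb_is_nonlinear are written in 2.6 kernels.

```c
#include <stddef.h>

/* Hypothetical userspace model of the sk_buff fields described above.
 * This is NOT the kernel's struct sk_buff; it only mirrors the four
 * data pointers and the two length fields. */
struct model_skb {
    unsigned char *head;   /* beginning of the allocated space */
    unsigned char *data;   /* beginning of the valid octets */
    unsigned char *tail;   /* end of the valid octets */
    unsigned char *end;    /* maximum address tail can reach */
    unsigned int len;      /* full data length, fragments included */
    unsigned int data_len; /* bytes stored in separate fragments */
};

/* Available buffer space: skb->end - skb->head. */
static size_t model_buffer_space(const struct model_skb *skb)
{
    return (size_t)(skb->end - skb->head);
}

/* Currently used (linear) data space: skb->tail - skb->data. */
static size_t model_used_space(const struct model_skb *skb)
{
    return (size_t)(skb->tail - skb->data);
}

/* Length of the part pointed to by data: len minus the fragment bytes. */
static unsigned int model_headlen(const struct model_skb *skb)
{
    return skb->len - skb->data_len;
}

/* A buffer is nonlinear when some of its data lives in fragments. */
static int model_is_nonlinear(const struct model_skb *skb)
{
    return skb->data_len != 0;
}
```

For a linear buffer (data_len of 0), tail - data equals len; once scatter/gather fragments are attached, only the first len - data_len bytes live between data and tail.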
17.10.2. Functions Acting on Socket Buffers
Network devices that use an
sk_buff structure act on it by means of the
official interface functions. Many functions operate on socket
buffers; here are the most interesting ones:
- struct sk_buff *alloc_skb(unsigned int len, int priority);
- struct sk_buff *dev_alloc_skb(unsigned int len);
-
Allocate a buffer. The
alloc_skb function allocates a buffer and
initializes both skb->data and
skb->tail to skb->head.
The dev_alloc_skb function is a shortcut that
calls alloc_skb with
GFP_ATOMIC priority and reserves some space
between skb->head and
skb->data. This data space is used for
optimizations within the network layer and should not be touched by
the driver.
- void kfree_skb(struct sk_buff *skb);
- void dev_kfree_skb(struct sk_buff *skb);
- void dev_kfree_skb_irq(struct sk_buff *skb);
- void dev_kfree_skb_any(struct sk_buff *skb);
-
Free
a buffer. The kfree_skb call is used internally
by the kernel. A driver should use one of the forms of
dev_kfree_skb instead:
dev_kfree_skb for noninterrupt context,
dev_kfree_skb_irq for interrupt context, or
dev_kfree_skb_any for code that can run in
either context.
- unsigned char *skb_put(struct sk_buff *skb, int len);
- unsigned char *__skb_put(struct sk_buff *skb, int len);
-
Update
the tail and len fields of the
sk_buff structure; they are used to add data to
the end of the buffer. Each function's return value
is the previous value of skb->tail (in other
words, it points to the data space just created). Drivers can use the
return value to copy data by invoking
memcpy(skb_put(...), data,
len) or an equivalent. The difference between the
two functions is that skb_put checks to be sure
that the data fits in the buffer, whereas
__skb_put omits the check.
- unsigned char *skb_push(struct sk_buff *skb, int len);
- unsigned char *__skb_push(struct sk_buff *skb, int len);
-
Functions to
decrement skb->data and increment
skb->len. They are similar to
skb_put, except that data is added to the
beginning of the packet instead of the end. The return value points
to the data space just created. The functions are used to add a
hardware header before transmitting a packet. Once again,
__skb_push differs in that it does not check for adequate
available space.
- int skb_tailroom(struct sk_buff *skb);
-
Returns
the amount of space available for putting data in the buffer. If a
driver puts more data into the buffer than it can hold, the system
panics. Although you might object that a printk
would be sufficient to tag the error, memory corruption is so harmful
to the system that the developers decided to take definitive action.
In practice, you shouldn't need to check the
available space if the buffer has been correctly allocated. Since
drivers usually get the packet size before allocating a buffer, only
a severely broken driver puts too much data in the buffer, and a
panic might be seen as due punishment.
- int skb_headroom(struct sk_buff *skb);
-
Returns
the amount of space available in front of data,
that is, how many octets one can
"push" to the buffer.
- void skb_reserve(struct sk_buff *skb, int len);
-
Increments
both data and tail. The
function can be used to reserve headroom before filling the buffer.
Most Ethernet interfaces reserve two bytes in front of the packet so
that, after the 14-byte Ethernet header, the IP header is aligned on a
16-byte boundary (2 + 14 = 16). snull does this as well,
although the instruction was not shown in Section 17.6 to avoid introducing extra concepts at
that point.
- unsigned char *skb_pull(struct sk_buff *skb, int len);
-
Removes
data from the head of the packet. The driver won't
need to use this function, but it is included here for completeness.
It decrements skb->len and increments
skb->data; this is how the hardware header
(Ethernet or equivalent) is stripped from the beginning of incoming
packets.
- int skb_is_nonlinear(struct sk_buff *skb);
-
Returns a true value if this skb is separated into multiple fragments
for scatter/gather I/O.
- int skb_headlen(struct sk_buff *skb);
-
Returns the length of the first segment of the skb (that part pointed
to by skb->data).
- void *kmap_skb_frag(skb_frag_t *frag);
- void kunmap_skb_frag(void *vaddr);
-
If you must directly access fragments in a nonlinear skb from within
the kernel, these functions map and unmap them for you. An atomic
kmap is used, so you cannot have more than one fragment mapped at a
time.
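The pointer arithmetic performed by skb_reserve, skb_put, skb_push, skb_pull, skb_headroom, and skb_tailroom can be checked outside the kernel with a small userspace model. This is a sketch under stated assumptions: struct model_skb and every model_* function are hypothetical, they cover only the linear case, and abort() stands in for the kernel panic that skb_put and skb_push trigger when the data does not fit. Unlike dev_alloc_skb, model_alloc_skb reserves no headroom of its own.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a linear socket buffer; not the kernel struct. */
struct model_skb {
    unsigned char *head, *data, *tail, *end;
    unsigned int len;
};

/* Like alloc_skb as described above: data and tail both start at head. */
static struct model_skb *model_alloc_skb(unsigned int size)
{
    struct model_skb *skb = malloc(sizeof(*skb));
    if (!skb)
        return NULL;
    skb->head = malloc(size);
    if (!skb->head) {
        free(skb);
        return NULL;
    }
    skb->data = skb->tail = skb->head;
    skb->end = skb->head + size;
    skb->len = 0;
    return skb;
}

static void model_free_skb(struct model_skb *skb)
{
    free(skb->head);
    free(skb);
}

/* Octets that can still be pushed in front of data. */
static int model_headroom(const struct model_skb *skb)
{
    return (int)(skb->data - skb->head);
}

/* Octets that can still be put after tail. */
static int model_tailroom(const struct model_skb *skb)
{
    return (int)(skb->end - skb->tail);
}

/* Reserve headroom on an empty buffer: move data and tail together. */
static void model_reserve(struct model_skb *skb, int len)
{
    skb->data += len;
    skb->tail += len;
}

/* Add len bytes at the end; return the old tail (the new space). */
static unsigned char *model_put(struct model_skb *skb, int len)
{
    unsigned char *old_tail = skb->tail;
    if (len > model_tailroom(skb))
        abort(); /* stands in for the kernel panic */
    skb->tail += len;
    skb->len += len;
    return old_tail;
}

/* Add len bytes at the front, e.g. for a hardware header. */
static unsigned char *model_push(struct model_skb *skb, int len)
{
    if (len > model_headroom(skb))
        abort(); /* stands in for the kernel panic */
    skb->data -= len;
    skb->len += len;
    return skb->data;
}

/* Strip len bytes from the front; NULL if the packet is shorter. */
static unsigned char *model_pull(struct model_skb *skb, int len)
{
    if ((unsigned int)len > skb->len)
        return NULL;
    skb->len -= len;
    skb->data += len;
    return skb->data;
}
```

A receive-style walk-through: allocate 128 bytes, call model_reserve(skb, 2), then copy a 34-byte frame (a 14-byte Ethernet header plus a 20-byte IP header) with memcpy(model_put(skb, 34), frame, 34). Pulling 14 bytes strips the hardware header and leaves data 16 bytes past head, which is the alignment the two reserved bytes were meant to produce; pushing 14 bytes puts the header back.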
The kernel defines several other functions that act on socket
buffers, but they are meant to be used in higher layers of networking code,
and the driver doesn't need them.