17.3. The net_device Structure in Detail
The net_device structure is at the very
core of the network driver layer and deserves a complete description.
This list describes all the fields, but more to provide a reference
than to be memorized. The rest of this chapter briefly describes each
field as soon as it is used in the sample code, so you
don't need to keep referring back to this section.
17.3.1. Global Information
The first part of struct net_device is composed of
the following fields:
- char name[IFNAMSIZ];
The name of the device. If the name set by the driver contains a
%d format string,
register_netdev replaces it with a number to
make a unique name; assigned numbers start at 0.
- unsigned long state;
Device state. The field includes several
flags. Drivers do not normally manipulate these flags directly;
instead, a set of utility functions has been provided. These
functions are discussed shortly when we get into driver operations.
- struct net_device *next;
Pointer to the next device in the global
linked list. This field shouldn't be touched by the driver.
- int (*init)(struct net_device *dev);
An initialization function. If this pointer
is set, the function is called by
register_netdev to complete the initialization
of the net_device structure. Most modern network
drivers do not use this function any longer; instead, initialization
is performed before registering the interface.
17.3.2. Hardware Information
The following fields contain low-level hardware
information for relatively simple devices. They are a holdover from
the earlier days of Linux networking; most modern drivers do not make use
of them (with the possible exception of if_port).
We list them here for completeness.
- unsigned long rmem_end;
- unsigned long rmem_start;
- unsigned long mem_end;
- unsigned long mem_start;
Device memory information. These fields hold the beginning and ending
addresses of the shared memory used by the device. If the device has
different receive and transmit memories, the mem
fields are used for transmit memory and the rmem
fields for receive memory. The rmem fields are
never referenced outside of the driver itself. By convention, the
end fields are set so that end - start is the amount of
available onboard memory.
- unsigned long base_addr;
The I/O base address of the network interface. This field, like the
previous ones, is assigned by the driver during the device probe. The
ifconfig command can be used to display or
modify the current value. The base_addr can be
explicitly assigned on the kernel command line at system boot (via
the netdev= parameter) or at module load time. The
field, like the memory fields described above, is not used by the kernel.
- unsigned char irq;
The assigned interrupt number. The value of
dev->irq is printed by
ifconfig when interfaces are listed. This value
can usually be set at boot or load time and modified later using ifconfig.
- unsigned char if_port;
The port in use on multiport devices. This field is used, for
example, with devices that support both coaxial
(IF_PORT_10BASE2) and twisted-pair
(IF_PORT_100BASET) Ethernet connections. The full
set of known port types is defined in the kernel headers.
- unsigned char dma;
The DMA channel allocated by the device. The field makes sense only
with some peripheral buses, such as ISA. It is not used outside of
the device driver itself, except for informational purposes (in ifconfig).
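The end - start convention for the memory fields is simple arithmetic, sketched below in user space. The structure is a hypothetical stand-in for the relevant net_device fields, and the addresses in the usage note are invented for illustration.

```c
#include <assert.h>

/* Hypothetical user-space mirror of the legacy memory fields. */
struct model_hw_mem {
    unsigned long mem_start, mem_end;   /* shared or transmit memory */
    unsigned long rmem_start, rmem_end; /* receive memory, if separate */
};

/* By convention, end - start is the amount of onboard memory. */
static unsigned long model_region_size(unsigned long start, unsigned long end)
{
    assert(end >= start);               /* a sane region never runs backward */
    return end - start;
}
```

For a card whose transmit memory spans 0xd0000 to 0xd4000, model_region_size returns 0x4000, that is, 16 KB of onboard memory.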
17.3.3. Interface Information
Most of the information about the interface is correctly set up by the
ether_setup function (or whatever other setup
function is appropriate for the given hardware type). Ethernet cards
can rely on this general-purpose function for most of these fields,
but the flags and dev_addr
fields are device specific and must be explicitly assigned at
initialization time. In addition, some non-Ethernet interfaces can use
helper functions similar to ether_setup.
drivers/net/net_init.c exports a number of such
functions, including the following:
- void ltalk_setup(struct net_device *dev);
Sets up the fields for a LocalTalk device.
- void fc_setup(struct net_device *dev);
Initializes the fields for fiber-channel devices.
- void fddi_setup(struct net_device *dev);
Configures an interface for a Fiber Distributed Data Interface (FDDI) network.
- void hippi_setup(struct net_device *dev);
Prepares the fields for a High-Performance Parallel Interface (HIPPI) high-speed interconnect.
- void tr_setup(struct net_device *dev);
Handles setup for token ring network interfaces.
Most devices are covered by one of these classes. If yours is
something radically new and different, however, you need to assign
the following fields by hand:
- unsigned short hard_header_len;
The hardware header length, that is, the number of octets that lead
the transmitted packet before the IP header, or other protocol
information. The value of hard_header_len is
14 (ETH_HLEN) for Ethernet interfaces.
- unsigned mtu;
The maximum transfer unit (MTU). This field is used by the network
layer to drive packet transmission. Ethernet has an MTU of 1500
octets (ETH_DATA_LEN). This value can be changed with ifconfig.
- unsigned long tx_queue_len;
The maximum number of frames that can be queued on the
device's transmission queue. This value is set to
1000 by ether_setup, but you can change it. For
example, plip uses 10 to avoid wasting system
memory (plip has a lower throughput than a real Ethernet interface).
- unsigned short type;
The hardware type of the interface. The type field
is used by ARP to determine what kind of hardware address the
interface supports. The proper value for Ethernet interfaces is
ARPHRD_ETHER, and that is the value set by
ether_setup. The recognized types are defined in the kernel headers.
- unsigned char addr_len;
- unsigned char broadcast[MAX_ADDR_LEN];
- unsigned char dev_addr[MAX_ADDR_LEN];
Address length and device hardware addresses. The Ethernet address
length is six octets (we are referring to the hardware ID of the
interface board), and the broadcast address is made up of six
0xff octets; ether_setup
arranges for these values to be correct. The device address, on the
other hand, must be read from the interface board in a
device-specific way, and the driver should copy it to
dev_addr. The hardware address is used to generate
correct Ethernet headers before the packet is handed over to the
driver for transmission. The snull device
doesn't use a physical interface, and it invents its
own hardware address.
- unsigned short flags;
- int features;
Interface flags (detailed next).
The flags field is a bit
mask including the following bit values. The IFF_
prefix stands for "interface
flags." Some flags are managed by the kernel, and
some are set by the interface at initialization time to assert
various capabilities and other features of the interface. The valid
flags, which are defined in <linux/if.h>, are as follows:
This flag is read-only for the driver. The kernel turns it on when
the interface is active and ready to transfer packets.
This flag (maintained by the networking code) states that the
interface allows broadcasting. Ethernet boards do.
This marks debug mode. The flag can be used to control the verbosity
of your printk calls or for other debugging
purposes. Although no in-tree driver currently uses this flag, it can
be set and reset by user programs via ioctl, and
your driver can use it. The
misc-progs/netifdebug program can be used to
turn the flag on and off.
This flag should be set only in the loopback interface. The kernel
checks for IFF_LOOPBACK instead of hardwiring the
lo name as a special interface.
This flag signals that the interface is connected to a point-to-point
link. It is set by the driver or, sometimes, by
ifconfig. For example, plip
and the PPP driver have it set.
This means that the interface
can't perform ARP. For example, point-to-point
interfaces don't need to run ARP, which would only
impose additional traffic without retrieving useful information.
snull runs without ARP capabilities, so it sets this flag.
This flag is set (by the networking code) to activate promiscuous
operation. By default, Ethernet interfaces use a hardware filter to
ensure that they receive broadcast packets and packets directed to
that interface's hardware address only. Packet
sniffers such as tcpdump set promiscuous mode on
the interface in order to retrieve all packets that travel on the
interface's transmission medium.
This flag is set by drivers to mark
interfaces that are capable of multicast transmission. ether_setup sets
IFF_MULTICAST by default, so if your driver does
not support multicast, it must clear the flag at initialization time.
This flag tells the interface to receive all multicast packets. The
kernel sets it when the host performs multicast routing, only if
IFF_MULTICAST is set.
IFF_ALLMULTI is read-only for the driver.
Multicast flags are used in Section 17.14
later in this chapter.
These flags are used by the load equalization code. The interface
driver doesn't need to know about them.
These flags signal that the device is capable of switching between
multiple media types; for example, unshielded twisted
pair (UTP) versus coaxial Ethernet cables. If
IFF_AUTOMEDIA is set, the device selects the
proper medium automatically. In practice, the kernel makes no use of either flag.
This flag, set by the driver, indicates that the address of this
interface can change. It is not currently used by the kernel.
This flag indicates that the interface is up and running. It is
mostly present for BSD compatibility; the kernel makes little use of
it. Most network drivers need not worry about this flag.
This flag is unused in Linux, but it exists for BSD compatibility.
When a program changes IFF_UP, the
open or stop device method
is called. Furthermore, when IFF_UP or any other
flag is modified, the set_multicast_list
method is invoked. If the driver needs to perform some action in
response to a modification of the flags, it must take that action in
set_multicast_list. For example, when
IFF_PROMISC is set or reset,
set_multicast_list must notify the onboard
hardware filter. The responsibilities of this device method are
outlined in Section 17.14.
The features field of the
net_device structure is set by the driver to tell
the kernel about any special hardware capabilities that this
interface has. We will discuss some of these features; others are
beyond the scope of this book. The full set is:
Both of these flags control the use of scatter/gather I/O. If your
interface can transmit a packet that has been split into several
distinct memory segments, you should set
NETIF_F_SG. Of course, you have to actually
implement the scatter/gather I/O (we describe how that is done in
Section 17.5.3). NETIF_F_FRAGLIST states
that your interface can cope with packets that have been fragmented;
only the loopback driver does this in 2.6.
Note that the kernel does not perform scatter/gather I/O to your
device if the device does not also provide some form of checksumming.
The reason is that, if the kernel has to make a pass over a
fragmented ("nonlinear") packet to
calculate the checksum, it might as well copy the data and coalesce
the packet at the same time.
These flags are all ways of telling the kernel that it need not apply
checksums to some or all packets leaving the system by this
interface. Set NETIF_F_IP_CSUM if your interface
can checksum IP packets but not others. If no checksums are ever
required for this interface, set NETIF_F_NO_CSUM.
The loopback driver sets this flag, and snull
does, too; since packets are only transferred through system memory,
there is (one hopes!) no opportunity for them to be corrupted, and no
need to check them. If your hardware does checksumming itself, set the flag that advertises hardware checksumming.
Set this flag if your device can perform DMA to high memory. In the
absence of this flag, all packet buffers provided to your driver are
allocated in low memory.
These options describe your hardware's support for
802.1q VLAN packets. VLAN support is beyond what we can cover in this
chapter. If VLAN packets confuse your device (which they really
shouldn't), set the flag that marks the device as unable to handle them.
Set this flag if your device can perform TCP segmentation offloading.
TSO is an advanced feature that we cannot cover here.
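The rule that scatter/gather is used only together with some form of checksum offload can be expressed as a small feature check. This is a user-space sketch: the MODEL_F_ bits are hypothetical stand-ins for the NETIF_F_ flags, whose real values live in the kernel headers, and model_fixup_features is an invented name.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's NETIF_F_* feature bits. */
enum {
    MODEL_F_SG      = 1 << 0,   /* can transmit scattered (nonlinear) packets */
    MODEL_F_IP_CSUM = 1 << 1,   /* checksums IP packets only */
    MODEL_F_NO_CSUM = 1 << 2,   /* no checksum needed (e.g., loopback) */
    MODEL_F_HW_CSUM = 1 << 3,   /* hardware checksums all packets */
};

#define MODEL_F_ANY_CSUM (MODEL_F_IP_CSUM | MODEL_F_NO_CSUM | MODEL_F_HW_CSUM)

/*
 * Drop the scatter/gather bit when no checksum feature is advertised:
 * without checksum help, the kernel would have to walk every fragment to
 * compute the checksum, so it might as well copy and coalesce the packet.
 */
static unsigned model_fixup_features(unsigned features)
{
    if ((features & MODEL_F_SG) && !(features & MODEL_F_ANY_CSUM))
        features &= ~(unsigned)MODEL_F_SG;
    return features;
}
```

A device that sets only MODEL_F_SG ends up with no features after the check, while MODEL_F_SG | MODEL_F_IP_CSUM passes through unchanged.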
17.3.4. The Device Methods
As happens with the char
and block drivers, each network device declares the functions that
act on it. Operations that can be performed on network interfaces are
listed in this section. Some of the operations can be left
NULL, and others are usually untouched because
ether_setup assigns suitable methods to them.
Device methods for a network interface can be divided into two
groups: fundamental and optional. Fundamental methods include those
that are needed to be able to use the interface; optional methods
implement more advanced functionality that is not strictly
required. The following are the fundamental methods:
- int (*open)(struct net_device *dev);
Opens the interface. The interface is
opened whenever ifconfig activates it. The
open method should register any system resource
it needs (I/O ports, IRQ, DMA, etc.), turn on the hardware, and
perform any other setup your device requires.
- int (*stop)(struct net_device *dev);
Stops the interface. The interface is stopped when it is brought down. This
function should reverse operations performed at open time.
- int (*hard_start_xmit) (struct sk_buff *skb, struct net_device *dev);
Method that initiates the transmission of a packet. The full packet
(protocol headers and all) is contained in a socket buffer
(sk_buff) structure. Socket buffers are introduced
later in this chapter.
- int (*hard_header) (struct sk_buff *skb, struct net_device *dev, unsigned short type, void *daddr, void *saddr, unsigned len);
Function (called before hard_start_xmit) that builds the
hardware header from the source and destination hardware addresses
that were previously retrieved; its job is to organize the
information passed to it as arguments into an appropriate,
device-specific hardware header. eth_header is
the default function for Ethernet-like interfaces, and
ether_setup assigns this field accordingly.
- int (*rebuild_header)(struct sk_buff *skb);
Function used to rebuild the hardware header after ARP resolution completes
but before a packet is transmitted. The default function used by
Ethernet devices uses the ARP support code to fill the packet with missing information.
- void (*tx_timeout)(struct net_device *dev);
Method called by the networking code when
a packet transmission fails to complete within a reasonable period,
on the assumption that an interrupt has been missed or the interface
has locked up. It should handle the problem and resume packet transmission.
- struct net_device_stats *(*get_stats)(struct net_device *dev);
Whenever an application needs to
get statistics for the interface, this method is called. This
happens, for example, when ifconfig or
netstat -i is run. A sample implementation for
snull is introduced in Section 17.13.
- int (*set_config)(struct net_device *dev, struct ifmap *map);
Changes the interface configuration.
This method is the entry point for configuring the driver. The I/O
address for the device and its interrupt number can be changed at
runtime using set_config. This capability can be
used by the system administrator if the interface cannot be probed
for. Drivers for modern hardware normally do not need to implement
this method. The remaining device operations are optional:
- int weight;
- int (*poll)(struct net_device *dev, int *quota);
Method provided by NAPI-compliant drivers to operate the interface in a polled mode,
with interrupts disabled. NAPI (and the weight
field) are covered in Section 17.8.
- void (*poll_controller)(struct net_device *dev);
Function that asks the driver to check for events on the interface in
situations where interrupts are disabled. It is used for specific
in-kernel networking tasks, such as remote consoles and kernel
debugging over the network.
- int (*do_ioctl)(struct net_device *dev, struct ifreq *ifr, int cmd);
Performs interface-specific ioctl commands. (Implementation of those
commands is described in Section 17.12.) The corresponding field in
struct net_device can be left
as NULL if the interface doesn't
need any interface-specific commands.
- void (*set_multicast_list)(struct net_device *dev);
Method called when the multicast list for the device changes and when the
flags change. See Section 17.14
for further details and a sample implementation.
- int (*set_mac_address)(struct net_device *dev, void *addr);
Function that can be implemented if the
interface supports the ability to change its hardware address. Many
interfaces don't support this ability at all. Others
use the default eth_mac_addr implementation.
eth_mac_addr only copies the new address into
dev->dev_addr, and it does so only if the
interface is not running. Drivers that use
eth_mac_addr should set the hardware MAC address
from dev->dev_addr in their open method.
Function that takes action
if there is a change in the maximum transfer unit (MTU) for the
interface. If the driver needs to do anything particular when the MTU
is changed by the user, it should declare its own function;
otherwise, the default does the right thing.
snull has a template for the function if you are interested.
- int (*header_cache) (struct neighbour *neigh, struct hh_cache *hh);
header_cache is called to fill in the hh_cache structure with
the results of an ARP query. Almost all Ethernet-like drivers can use
the default eth_header_cache implementation.
- int (*header_cache_update) (struct hh_cache *hh, struct net_device *dev, unsigned char *haddr);
Method that updates the destination address in the
hh_cache structure in response to a change.
Ethernet devices use eth_header_cache_update.
- int (*hard_header_parse) (struct sk_buff *skb, unsigned char *haddr);
The hard_header_parse method extracts the source address from the packet contained in
skb, copying it into the buffer at
haddr. The return value from the function is the
length of that address. Ethernet devices normally use the default Ethernet implementation of this method.
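The split between optional methods and their defaults can be modeled in user space with a function pointer that may be NULL. In this sketch, model_set_mtu plays the role of the networking core and model_eth_change_mtu that of an Ethernet-style default; both names are hypothetical. The 68 to 1500 range is an assumption: 1500 is ETH_DATA_LEN, and 68 octets is the smallest datagram every IPv4 module must be able to forward without further fragmentation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_dev;
typedef int (*model_change_mtu_fn)(struct model_dev *dev, int new_mtu);

/* Hypothetical cut-down device: just the MTU and one optional method. */
struct model_dev {
    int mtu;
    model_change_mtu_fn change_mtu;   /* may be NULL */
};

/* Ethernet-style default: reject values outside the assumed range. */
static int model_eth_change_mtu(struct model_dev *dev, int new_mtu)
{
    if (new_mtu < 68 || new_mtu > 1500)
        return -EINVAL;
    dev->mtu = new_mtu;
    return 0;
}

/* Core-side caller: use the driver's method if present, else just store. */
static int model_set_mtu(struct model_dev *dev, int new_mtu)
{
    if (new_mtu <= 0)
        return -EINVAL;
    if (dev->change_mtu != NULL)
        return dev->change_mtu(dev, new_mtu);
    dev->mtu = new_mtu;
    return 0;
}
```

A driver that needs to react to MTU changes (for instance, by resizing its receive buffers) would point change_mtu at its own function; leaving the pointer NULL in this model simply stores the new value.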
17.3.5. Utility Fields
The remaining struct
net_device data fields are used by the interface to hold
useful status information. Some of the fields are used by
ifconfig and netstat to
provide the user with information about the current configuration.
Therefore, an interface should assign values to these fields:
- unsigned long trans_start;
- unsigned long last_rx;
Fields that hold a jiffies value. The driver is responsible for updating
these values when transmission begins and when a packet is received,
respectively. The trans_start value is used by the
networking subsystem to detect transmitter lockups.
last_rx is currently unused, but the driver should
maintain this field anyway to be prepared for future use.
- int watchdog_timeo;
The minimum time (in
jiffies) that should pass before the networking layer decides that a
transmission timeout has occurred and calls the
driver's tx_timeout function.
- void *priv;
The equivalent of filp->private_data. In modern
drivers, this field is set by alloc_netdev and
should not be accessed directly; use netdev_priv instead.
- struct dev_mc_list *mc_list;
- int mc_count;
Fields that handle multicast transmission.
mc_count is the count of items in
mc_list. See Section 17.14
for further details.
- spinlock_t xmit_lock;
- int xmit_lock_owner;
xmit_lock is used to avoid multiple simultaneous
calls to the driver's hard_start_xmit function.
xmit_lock_owner is the number of the CPU that has
obtained xmit_lock. The driver should make no
changes to these fields.
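The interplay of trans_start, watchdog_timeo, and tx_timeout can be sketched as a periodic check. The model below is a user-space approximation, not the kernel's watchdog: the function names are hypothetical, and the queue_stopped argument stands in for the state of the interface's transmit queue. Because jiffies is a free-running counter that eventually wraps, the comparison uses the same signed-difference idea as the kernel's time_after macro.

```c
#include <assert.h>
#include <stdbool.h>

/* Wraparound-safe "a is after b" for a free-running tick counter. */
static bool model_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Hypothetical watchdog check: the transmitter is considered stuck when
 * the queue is stopped and more than watchdog_timeo ticks have passed
 * since trans_start. A real driver would then have tx_timeout called.
 */
static bool model_tx_timed_out(bool queue_stopped, unsigned long now,
                               unsigned long trans_start, int watchdog_timeo)
{
    return queue_stopped &&
           model_time_after(now, trans_start + (unsigned long)watchdog_timeo);
}
```

The wraparound handling is why trans_start must be refreshed on every transmission: a stale value eventually looks like a lockup even if the counter has wrapped in between.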
There are other fields in struct net_device,
but they are not used by network drivers.