
17.8. Receive Interrupt Mitigation

When a network driver is written as we have described above, the processor is interrupted for every packet received by your interface. In many cases, that is the desired mode of operation, and it is not a problem. High-bandwidth interfaces, however, can receive thousands of packets per second. With that sort of interrupt load, the overall performance of the system can suffer.

As a way of improving the performance of Linux on high-end systems, the networking subsystem developers have created an alternative interface (called NAPI)[1] based on polling. "Polling" can be a dirty word among driver developers, who often see polling techniques as inelegant and inefficient. Polling is inefficient, however, only if the interface is polled when there is no work to do. When the system has a high-speed interface handling heavy traffic, there are always more packets to process. There is no need to interrupt the processor in such situations; it is enough that the new packets be collected from the interface every so often.

[1] NAPI stands for "new API"; the networking hackers are better at creating interfaces than naming them.

Stopping receive interrupts can take a substantial amount of load off the processor. NAPI-compliant drivers can also be told not to feed packets into the kernel when the networking code would only drop them due to congestion, which helps performance when that help is needed most. For various reasons, NAPI drivers are also less likely to reorder packets.

Not all devices can operate in the NAPI mode, however. A NAPI-capable interface must be able to store several packets (either on the card itself, or in an in-memory DMA ring). The interface should be capable of disabling interrupts for received packets, while continuing to interrupt for successful transmissions and other events. There are other subtle issues that can make writing a NAPI-compliant driver harder; see Documentation/networking/NAPI_HOWTO.txt in the kernel source tree for the details.
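
The storage requirement can be pictured as a fixed-size receive ring that the hardware fills and the poll routine later drains. What follows is a minimal user-space sketch of that idea, not code from snull or the kernel; the names rx_ring, ring_push, ring_pop, and RING_SLOTS are hypothetical, and plain integers stand in for packet descriptors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 8            /* hypothetical ring capacity */

struct rx_ring {
    int    slot[RING_SLOTS];    /* stand-in for packet descriptors */
    size_t head;                /* next slot the "hardware" writes */
    size_t tail;                /* next slot the poll routine reads */
    size_t count;               /* packets currently stored */
    size_t dropped;             /* arrivals lost because the ring was full */
};

/* "Hardware" side: store a packet, or count a drop if the ring is full. */
static bool ring_push(struct rx_ring *r, int pkt)
{
    assert(r->count <= RING_SLOTS);
    if (r->count == RING_SLOTS) {
        r->dropped++;
        return false;
    }
    r->slot[r->head] = pkt;
    r->head = (r->head + 1) % RING_SLOTS;
    r->count++;
    return true;
}

/* Poll side: remove the oldest packet; returns false when the ring is empty. */
static bool ring_pop(struct rx_ring *r, int *pkt)
{
    if (r->count == 0)
        return false;
    *pkt = r->slot[r->tail];
    r->tail = (r->tail + 1) % RING_SLOTS;
    r->count--;
    return true;
}
```

In this model, the ring capacity bounds how long polling can be deferred before arrivals start to be dropped, which is why an interface that can hold only one packet is a poor fit for NAPI.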

Relatively few drivers implement the NAPI interface. If you are writing a driver for an interface that may generate a huge number of interrupts, however, taking the time to implement NAPI may well prove worthwhile.

The snull driver, when loaded with the use_napi parameter set to a nonzero value, operates in the NAPI mode. At initialization time, we have to set up a couple of extra struct net_device fields:

if (use_napi) {
    dev->poll        = snull_poll;  /* polling function called by the kernel */
    dev->weight      = 2;           /* small on purpose, to show deferred reception */
}

The poll field must be set to your driver's polling function; we look at snull_poll shortly. The weight field describes the relative importance of the interface: how much traffic should be accepted from the interface when resources are tight. There are no strict rules for how the weight parameter should be set; by convention, 10-Mbps Ethernet interfaces set weight to 16, while faster interfaces use 64. You should not set weight to a value greater than the number of packets your interface can store. In snull, we set the weight to two as a way of demonstrating deferred packet reception.
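
The weight convention and the storage limit can be combined into one small rule. The helper below is a hypothetical sketch, not a kernel function: pick_weight and its parameters are invented here, and the 16 and 64 values are simply the conventional figures quoted above.

```c
#include <assert.h>

/* Hypothetical helper: choose a NAPI weight from link speed and ring size. */
static int pick_weight(int link_mbps, int ring_slots)
{
    /* Conventional values: 16 for 10-Mbps links, 64 for anything faster. */
    int weight = (link_mbps <= 10) ? 16 : 64;

    assert(ring_slots > 0);

    /* Never ask for more packets per poll than the device can store. */
    if (weight > ring_slots)
        weight = ring_slots;
    return weight;
}
```

Under these assumptions, a fast interface with only 32 receive slots would end up with a weight of 32 rather than 64.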

The next step in the creation of a NAPI-compliant driver is to change the interrupt handler. When your interface (which should start with receive interrupts enabled) signals that a packet has arrived, the interrupt handler should not process that packet. Instead, it should disable further receive interrupts and tell the kernel that it is time to start polling the interface. In the snull "interrupt" handler, the code that responds to packet reception interrupts has been changed to the following:

if (statusword & SNULL_RX_INTR) {
    snull_rx_ints(dev, 0);  /* Disable further interrupts */
    netif_rx_schedule(dev);
}

When the interface tells us that a packet is available, the interrupt handler leaves it in the interface; all that needs to happen at this point is a call to netif_rx_schedule, which causes our poll method to be called at some future point.
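
The effect of this handler on interrupt load can be seen in a small user-space model. This is only a sketch: struct fake_nic and the fake_* functions are hypothetical stand-ins, with two flags modeling what snull_rx_ints and netif_rx_schedule accomplish in the real driver.

```c
#include <assert.h>
#include <stdbool.h>

struct fake_nic {
    bool rx_ints_enabled;   /* models the effect of snull_rx_ints(dev, 0/1) */
    bool poll_scheduled;    /* models the effect of netif_rx_schedule(dev) */
    int  pending;           /* packets waiting in the interface */
    int  interrupts;        /* receive interrupts actually delivered */
};

/* Interrupt handler: leave the packet in place, mask, schedule polling. */
static void fake_rx_interrupt(struct fake_nic *nic)
{
    assert(nic->rx_ints_enabled);   /* masked interrupts never get here */
    nic->interrupts++;
    nic->rx_ints_enabled = false;
    nic->poll_scheduled = true;
}

/* A packet arrives; an interrupt is raised only if receive interrupts are on. */
static void fake_packet_arrives(struct fake_nic *nic)
{
    nic->pending++;
    if (nic->rx_ints_enabled)
        fake_rx_interrupt(nic);
}
```

In this model, a burst of packets arriving before the poll method runs costs a single interrupt; the rest simply accumulate in the interface until they are collected.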

The poll method has this prototype:

int (*poll)(struct net_device *dev, int *budget);

The snull implementation of the poll method looks like this:

static int snull_poll(struct net_device *dev, int *budget)
{
    int npackets = 0, quota = min(dev->quota, *budget);
    struct sk_buff *skb;
    struct snull_priv *priv = netdev_priv(dev);
    struct snull_packet *pkt;
    
    while (npackets < quota && priv->rx_queue) {
        pkt = snull_dequeue_buf(dev);
        skb = dev_alloc_skb(pkt->datalen + 2);
        if (! skb) {
            if (printk_ratelimit())
                printk(KERN_NOTICE "snull: packet dropped\n");
            priv->stats.rx_dropped++;
            snull_release_buffer(pkt);
            continue;
        }
        memcpy(skb_put(skb, pkt->datalen), pkt->data, pkt->datalen);
        skb->dev = dev;
        skb->protocol = eth_type_trans(skb, dev);
        skb->ip_summed = CHECKSUM_UNNECESSARY; /* don't check it */
        netif_receive_skb(skb);
        
        /* Maintain stats */
        npackets++;
        priv->stats.rx_packets++;
        priv->stats.rx_bytes += pkt->datalen;
        snull_release_buffer(pkt);
    }
    /* If we processed all packets, we're done; tell the kernel and reenable ints */
    *budget -= npackets;
    dev->quota -= npackets;
    if (! priv->rx_queue) {
        netif_rx_complete(dev);
        snull_rx_ints(dev, 1);
        return 0;
    }
    /* We couldn't process everything. */
    return 1;
}

The central part of the function is concerned with the creation of an skb holding the packet; this code is the same as what we saw in snull_rx before. A number of things are different, however:

  • The budget parameter provides a maximum number of packets that we are allowed to pass into the kernel. Within the device structure, the quota field gives another maximum; the poll method must respect the lower of the two limits. It should also decrement both dev->quota and *budget by the number of packets actually received. The budget value is a maximum number of packets that the current CPU can receive from all interfaces, while quota is a per-interface value that usually starts out as the weight assigned to the interface at initialization time.

  • Packets should be fed to the kernel with netif_receive_skb, rather than netif_rx.

  • If the poll method is able to process all of the available packets within the limits given to it, it should re-enable receive interrupts, call netif_rx_complete to turn off polling, and return 0. A return value of 1 indicates that there are packets remaining to be processed.
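
The accounting rules in the list above can be exercised outside the kernel with a simplified model. The sketch below is an illustration only: struct fake_dev, fake_poll, and min_int are hypothetical, packets are reduced to a counter, and setting rx_ints stands in for both re-enabling interrupts and calling netif_rx_complete.

```c
#include <assert.h>

struct fake_dev {
    int quota;      /* per-interface allowance, like dev->quota */
    int queued;     /* packets waiting in the interface */
    int rx_ints;    /* 1 when receive interrupts are enabled */
};

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* Returns 0 when the queue was drained, 1 when packets remain. */
static int fake_poll(struct fake_dev *dev, int *budget)
{
    int limit = min_int(dev->quota, *budget);   /* respect the lower limit */
    int npackets = min_int(limit, dev->queued);

    assert(limit >= 0);

    dev->queued -= npackets;    /* "deliver" the packets */
    *budget     -= npackets;    /* charge the per-CPU budget */
    dev->quota  -= npackets;    /* charge the per-interface quota */

    if (dev->queued == 0) {
        dev->rx_ints = 1;       /* done: interrupts back on, polling stops */
        return 0;
    }
    return 1;                   /* more work remains; poll again later */
}
```

With a quota of 2 and five queued packets, this model needs three calls to drain the queue, assuming the caller restores the quota between calls; the first two return 1 and the last returns 0.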

The networking subsystem guarantees that any given device's poll method will not be called concurrently on more than one processor. Calls to poll can still happen concurrently with calls to your other device methods, however.
