Poster of Linux kernelThe best gift for a Linux geek
 Linux kernel map 
⇦ prev ⇱ home next ⇨

8.4. vmalloc and Friends

The next memory allocation function that we show you is vmalloc, which allocates a contiguous memory region in the virtual address space. Although the pages are not consecutive in physical memory (each page is retrieved with a separate call to alloc_page), the kernel sees them as a contiguous range of addresses. vmalloc returns 0 (the NULL address) if an error occurs, otherwise, it returns a pointer to a linear memory area of size at least size.

We describe vmalloc here because it is one of the fundamental Linux memory allocation mechanisms. We should note, however, that use of vmalloc is discouraged in most situations. Memory obtained from vmalloc is slightly less efficient to work with, and, on some architectures, the amount of address space set aside for vmalloc is relatively small. Code that uses vmalloc is likely to get a chilly reception if submitted for inclusion in the kernel. If possible, you should work directly with individual pages rather than trying to smooth things over with vmalloc.

That said, let's see how vmalloc works. The prototypes of the function and its relatives (ioremap, which is not strictly an allocation function, is discussed later in this section) are as follows:

#include <linux/vmalloc.h>

void *vmalloc(unsigned long size);
void vfree(void * addr);
void *ioremap(unsigned long offset, unsigned long size);
void iounmap(void * addr);

It's worth stressing that memory addresses returned by kmalloc and _ get_free_pages are also virtual addresses. Their actual value is still massaged by the MMU (the memory management unit, usually part of the CPU) before it is used to address physical memory.[4] vmalloc is not different in how it uses the hardware, but rather in how the kernel performs the allocation task.

[4] Actually, some architectures define ranges of "virtual" addresses as reserved to address physical memory. When this happens, the Linux kernel takes advantage of the feature, and both the kernel and _ _get_free_pages addresses lie in one of those memory ranges. The difference is transparent to device drivers and other code that is not directly involved with the memory-management kernel subsystem.

The (virtual) address range used by kmalloc and _ _get_free_pages features a one-to-one mapping to physical memory, possibly shifted by a constant PAGE_OFFSET value; the functions don't need to modify the page tables for that address range. The address range used by vmalloc and ioremap, on the other hand, is completely synthetic, and each allocation builds the (virtual) memory area by suitably setting up the page tables.

This difference can be perceived by comparing the pointers returned by the allocation functions. On some platforms (for example, the x86), addresses returned by vmalloc are just beyond the addresses that kmalloc uses. On other platforms (for example, MIPS, IA-64, and x86_64), they belong to a completely different address range. Addresses available for vmalloc are in the range from VMALLOC_START to VMALLOC_END. Both symbols are defined in <asm/pgtable.h>.

Addresses allocated by vmalloc can't be used outside of the microprocessor, because they make sense only on top of the processor's MMU. When a driver needs a real physical address (such as a DMA address, used by peripheral hardware to drive the system's bus), you can't easily use vmalloc. The right time to call vmalloc is when you are allocating memory for a large sequential buffer that exists only in software. It's important to note that vmalloc has more overhead than _ _get_free_pages, because it must both retrieve the memory and build the page tables. Therefore, it doesn't make sense to call vmalloc to allocate just one page.

An example of a function in the kernel that uses vmalloc is the create_module system call, which uses vmalloc to get space for the module being created. Code and data of the module are later copied to the allocated space using copy_from_user. In this way, the module appears to be loaded into contiguous memory. You can verify, by looking in /proc/kallsyms, that kernel symbols exported by modules lie in a different memory range from symbols exported by the kernel proper.

Memory allocated with vmalloc is released by vfree, in the same way that kfree releases memory allocated by kmalloc.

Like vmalloc, ioremap builds new page tables; unlike vmalloc, however, it doesn't actually allocate any memory. The return value of ioremap is a special virtual address that can be used to access the specified physical address range; the virtual address obtained is eventually released by calling iounmap.

ioremap is most useful for mapping the (physical) address of a PCI buffer to (virtual) kernel space. For example, it can be used to access the frame buffer of a PCI video device; such buffers are usually mapped at high physical addresses, outside of the address range for which the kernel builds page tables at boot time. PCI issues are explained in more detail in Chapter 12.

It's worth noting that for the sake of portability, you should not directly access addresses returned by ioremap as if they were pointers to memory. Rather, you should always use readb and the other I/O functions introduced in Chapter 9. This requirement applies because some platforms, such as the Alpha, are unable to directly map PCI memory regions to the processor address space because of differences between PCI specs and Alpha processors in how data is transferred.

Both ioremap and vmalloc are page oriented (they work by modifying the page tables); consequently, the relocated or allocated size is rounded up to the nearest page boundary. ioremap simulates an unaligned mapping by "rounding down" the address to be remapped and by returning an offset into the first remapped page.

One minor drawback of vmalloc is that it can't be used in atomic context because, internally, it uses kmalloc(GFP_KERNEL) to acquire storage for the page tables, and therefore could sleep. This shouldn't be a problem—if the use of _ _get_free_page isn't good enough for an interrupt handler, the software design needs some cleaning up.

8.4.1. A scull Using Virtual Addresses: scullv

Sample code using vmalloc is provided in the scullv module. Like scullp, this module is a stripped-down version of scull that uses a different allocation function to obtain space for the device to store data.

The module allocates memory 16 pages at a time. The allocation is done in large chunks to achieve better performance than scullp and to show something that takes too long with other allocation techniques to be feasible. Allocating more than one page with _ _get_free_pages is failure prone, and even when it succeeds, it can be slow. As we saw earlier, vmalloc is faster than other functions in allocating several pages, but somewhat slower when retrieving a single page, because of the overhead of page-table building. scullv is designed like scullp. order specifies the "order" of each allocation and defaults to 4. The only difference between scullv and scullp is in allocation management. These lines use vmalloc to obtain new memory:

/* Allocate a quantum using virtual addresses */
if (!dptr->data[s_pos]) {
    dptr->data[s_pos] =
        (void *)vmalloc(PAGE_SIZE << dptr->order);
    if (!dptr->data[s_pos])
        goto nomem;
    memset(dptr->data[s_pos], 0, PAGE_SIZE << dptr->order);
}

and these lines release memory:

/* Release the quantum-set */
for (i = 0; i < qset; i++)
    if (dptr->data[i])
        vfree(dptr->data[i]);

If you compile both modules with debugging enabled, you can look at their data allocation by reading the files they create in /proc. This snapshot was taken on an x86_64 system:

salma% cat /tmp/bigfile > /dev/scullp0; head -5 /proc/scullpmem
Device 0: qset 500, order 0, sz 1535135
  item at 000001001847da58, qset at 000001001db4c000
       0:1001db56000
       1:1003d1c7000
   
salma% cat /tmp/bigfile > /dev/scullv0; head -5 /proc/scullvmem

Device 0: qset 500, order 4, sz 1535135
  item at 000001001847da58, qset at 0000010013dea000
       0:ffffff0001177000
       1:ffffff0001188000

The following output, instead, came from an x86 system:

rudo% cat /tmp/bigfile > /dev/scullp0; head -5 /proc/scullpmem

Device 0: qset 500, order 0, sz 1535135
  item at ccf80e00, qset at cf7b9800
       0:ccc58000
       1:cccdd000

rudo%  cat /tmp/bigfile > /dev/scullv0; head -5 /proc/scullvmem

Device 0: qset 500, order 4, sz 1535135
  item at cfab4800, qset at cf8e4000
       0:d087a000
       1:d08d2000

The values show two different behaviors. On x86_64, physical addresses and virtual addresses are mapped to completely different address ranges (0x100 and 0xffffff00), while on x86 computers, vmalloc returns virtual addresses just above the mapping used for physical memory.

    ⇦ prev ⇱ home next ⇨
    Poster of Linux kernelThe best gift for a Linux geek