
11.4. Other Portability Issues

In addition to data typing, there are a few other software issues to keep in mind when writing a driver if you want it to be portable across Linux platforms.

A general rule is to be suspicious of explicit constant values. Usually the code has been parameterized using preprocessor macros. This section lists the most important portability problems. Whenever you encounter other values that have been parameterized, you can find hints in the header files and in the device drivers distributed with the official kernel.

11.4.1. Time Intervals

When dealing with time intervals, don't assume that there are 1000 jiffies per second. Although this is currently true for the i386 architecture, not every Linux platform runs at this speed. The assumption can be false even for the x86 if you play with the HZ value (as some people do), and nobody knows what will happen in future kernels. Whenever you calculate time intervals using jiffies, scale your times using HZ (the number of timer interrupts per second). For example, to check against a timeout of half a second, compare the elapsed time against HZ/2. More generally, the number of jiffies corresponding to msec milliseconds is always msec*HZ/1000.
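The scaling rule above can be tried out in user space. The following is only a sketch: inside the kernel HZ is a compile-time constant, whereas here the tick rate is passed as a parameter so that several values can be compared. The helper name msec_to_ticks and the sample tick rates are assumptions made for illustration and are not kernel interfaces.

```c
#include <stdio.h>

/*
 * Hypothetical user-space helper: number of timer ticks that correspond
 * to msec milliseconds at a tick rate of hz ticks per second.
 * This is the msec*HZ/1000 formula from the text. Integer division
 * rounds down, so very short intervals can come out as zero ticks.
 */
unsigned long msec_to_ticks(unsigned long msec, unsigned long hz)
{
    return msec * hz / 1000;
}

/* Print the half-second timeout (HZ/2) for a few sample tick rates. */
void show_half_second(void)
{
    static const unsigned long rates[] = { 100, 250, 1000, 1024 };
    size_t i;

    for (i = 0; i < sizeof(rates) / sizeof(rates[0]); i++)
        printf("HZ=%4lu: 500 ms = %lu ticks\n",
               rates[i], msec_to_ticks(500, rates[i]));
}
```

The rounding is worth keeping in mind: with a tick rate of 100, a 1-ms interval scales to zero ticks, so code that must wait at least one tick has to round up explicitly.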

11.4.2. Page Size

When playing games with memory, remember that a memory page is PAGE_SIZE bytes, not 4 KB. Assuming that the page size is 4 KB and hardcoding the value is a common error among PC programmers. In fact, supported platforms show page sizes from 4 KB to 64 KB, and sometimes the size differs between implementations of the same platform. The relevant macros are PAGE_SIZE and PAGE_SHIFT. The latter contains the number of bits to shift an address to get its page number; its value is currently 12 or greater, since pages are 4 KB or larger. The macros are defined in <asm/page.h>; user-space programs can use the getpagesize library function if they ever need the information.
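For user-space code, the following sketch queries the page size at run time instead of hardcoding it. It uses sysconf(_SC_PAGESIZE), the POSIX counterpart of getpagesize. The helpers page_shift_of and page_number are hypothetical names that mimic what PAGE_SHIFT is used for in the kernel; they are not kernel interfaces.

```c
#include <stdio.h>
#include <unistd.h>

/* Number of bits to shift for a page size that is a power of two. */
unsigned int page_shift_of(unsigned long page_size)
{
    unsigned int shift = 0;

    while ((1UL << shift) < page_size)
        shift++;
    return shift;
}

/* Page number of an address: drop the offset-within-page bits. */
unsigned long page_number(unsigned long addr, unsigned int shift)
{
    return addr >> shift;
}

/* Report the page size of the machine the program is running on. */
void show_page_size(void)
{
    unsigned long size = (unsigned long)sysconf(_SC_PAGESIZE);

    printf("page size %lu bytes, shift %u\n", size, page_shift_of(size));
}
```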

Let's look at a nontrivial situation. If a driver needs 16 KB for temporary data, it shouldn't specify an order of 2 to __get_free_pages: an order of 2 requests four pages, which amounts to 16 KB only where pages are 4 KB. You need a portable solution. Such a solution, fortunately, has been written by the kernel developers and is called get_order:

#include <linux/gfp.h>
#include <asm/page.h>

/* get_order converts a byte count into an allocation order */
int order = get_order(16*1024);
buf = __get_free_pages(GFP_KERNEL, order);

Remember that the argument to get_order must be a power of two.
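To see why a hardcoded order is not portable, the calculation can be reproduced in user space. The function below is a hypothetical stand-in written for illustration, not the kernel's get_order; it takes the page shift as a parameter so that the result can be compared across page sizes.

```c
/*
 * Hypothetical user-space stand-in for get_order: smallest order such
 * that (1 << order) pages of (1 << page_shift) bytes hold size bytes.
 * size must be greater than zero.
 */
int order_for(unsigned long size, unsigned int page_shift)
{
    unsigned long pages = (size - 1) >> page_shift; /* pages beyond the first */
    int order = 0;

    while (pages) {
        order++;
        pages >>= 1;
    }
    return order;
}
```

With this sketch, a 16-KB request yields order 2 for 4-KB pages, order 1 for 8-KB pages, and order 0 for 64-KB pages, which is why the value has to be computed rather than written as a constant.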

11.4.3. Byte Order

Be careful not to make assumptions about byte ordering. Whereas the PC stores multibyte values low-byte first (little end first, thus little-endian), some high-level platforms work the other way (big-endian). Whenever possible, your code should be written such that it does not care about byte ordering in the data it manipulates. However, sometimes a driver needs to build an integer number out of single bytes or do the opposite, or it must communicate with a device that expects a specific order.

The include file <asm/byteorder.h> defines either __BIG_ENDIAN or __LITTLE_ENDIAN, depending on the processor's byte ordering. When dealing with byte ordering issues, you could code a bunch of #ifdef __LITTLE_ENDIAN conditionals, but there is a better way. The Linux kernel defines a set of macros that handle conversions between the processor's byte ordering and that of the data you need to store or load in a specific byte order. For example:

u32 cpu_to_le32 (u32);
u32 le32_to_cpu (u32);

These two macros convert a value from whatever the CPU uses to an unsigned, little-endian, 32-bit quantity and back. They work whether your CPU is big-endian or little-endian and, for that matter, whether it is a 32-bit processor or not. They return their argument unchanged in cases where there is no work to be done. Use of these macros makes it easy to write portable code without having to use a lot of conditional compilation constructs.

There are dozens of similar routines; you can see the full list in <linux/byteorder/big_endian.h> and <linux/byteorder/little_endian.h>. After a while, the pattern is not hard to follow. be64_to_cpu converts an unsigned, big-endian, 64-bit value to the internal CPU representation. le16_to_cpus, instead, converts a little-endian, 16-bit quantity in place: the trailing s stands for "in situ," and the macro takes a pointer to the value and overwrites it with the converted result. When dealing with pointers, you can also use functions like cpu_to_le32p, which take a pointer to the value to be converted rather than the value itself and return the converted value. See the include file for the rest.
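Outside the kernel, code that does not care about the processor's byte ordering can be written with shifts alone. The helpers below are hypothetical names chosen for this sketch, not kernel macros; they build a 32-bit integer out of single bytes and take it apart again, and they give the same result on big-endian and little-endian processors.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value from four consecutive bytes. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Read a 32-bit big-endian value from four consecutive bytes. */
uint32_t load_be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24
         | (uint32_t)p[1] << 16
         | (uint32_t)p[2] << 8
         | (uint32_t)p[3];
}

/* Store a 32-bit value as four bytes, low byte first. */
void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}
```

Because the functions address bytes by position rather than reinterpreting memory, no conditional compilation on the processor's byte ordering is needed.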

11.4.4. Data Alignment

The last problem worth considering when writing portable code is how to access unaligned data—for example, how to read a 4-byte value stored at an address that isn't a multiple of 4 bytes. i386 users often access unaligned data items, but not all architectures permit it. Many modern architectures generate an exception every time the program tries unaligned data transfers; data transfer is handled by the exception handler, with a great performance penalty. If you need to access unaligned data, you should use the following macros:

#include <asm/unaligned.h>
get_unaligned(ptr);
put_unaligned(val, ptr);

These macros are typeless and work for every data item, whether it's one, two, four, or eight bytes long. They are defined with any kernel version.
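In user-space code, a common way to get the same effect is to copy the bytes with memcpy and let the compiler choose a safe access sequence. The sketch below is not how the kernel macros are implemented; the function names are hypothetical and it handles only 32-bit values, whereas get_unaligned and put_unaligned are typeless.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address that may not be 4-byte aligned. */
uint32_t read_unaligned_u32(const void *ptr)
{
    uint32_t val;

    memcpy(&val, ptr, sizeof(val)); /* byte copy: no alignment requirement */
    return val;
}

/* Write a 32-bit value to an address that may not be 4-byte aligned. */
void write_unaligned_u32(uint32_t val, void *ptr)
{
    memcpy(ptr, &val, sizeof(val));
}
```

A caller parsing a byte buffer would use read_unaligned_u32(buf + 1) rather than casting buf + 1 to a uint32_t pointer and dereferencing it.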

Another issue related to alignment is portability of data structures across platforms. The same data structure (as defined in the C-language source file) can be compiled differently on different platforms. The compiler arranges structure fields to be aligned according to conventions that differ from platform to platform.

In order to write data structures for data items that can be moved across architectures, you should always enforce natural alignment of the data items in addition to standardizing on a specific endianness. Natural alignment means storing data items at an address that is a multiple of their size (for instance, 8-byte items go at an address that is a multiple of 8). To enforce natural alignment while preventing the compiler from arranging the fields in unpredictable ways, you should use filler fields that avoid leaving holes in the data structure.

To show how alignment is enforced by the compiler, the dataalign program is distributed in the misc-progs directory of the sample code, and an equivalent kdataalign module is part of misc-modules. This is the output of the program on several platforms and the output of the module on the SPARC64:

arch  Align:  char  short  int  long   ptr long-long  u8 u16 u32 u64
i386            1     2     4     4     4     4        1   2   4   4
i686            1     2     4     4     4     4        1   2   4   4
alpha           1     2     4     8     8     8        1   2   4   8
armv4l          1     2     4     4     4     4        1   2   4   4
ia64            1     2     4     8     8     8        1   2   4   8
mips            1     2     4     4     4     8        1   2   4   8
ppc             1     2     4     4     4     8        1   2   4   8
sparc           1     2     4     4     4     8        1   2   4   8
sparc64         1     2     4     4     4     8        1   2   4   8
x86_64          1     2     4     8     8     8        1   2   4   8

kernel: arch  Align: char short int long  ptr long-long u8 u16 u32 u64
kernel: sparc64        1    2    4    8    8     8       1   2   4   8

It's interesting to note that not all platforms align 64-bit values on 64-bit boundaries, so you need filler fields to enforce alignment and ensure portability.
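As an illustration of filler fields, the sketch below declares the same hypothetical record twice: once leaving the padding to the compiler and once with explicit fillers, so that every field sits at a multiple of its size. The structure and field names are invented for this example, and the fixed-width types come from <stdint.h> because the code is meant to compile in user space.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout left to the compiler: the offset of value depends on the platform. */
struct record_implicit {
    uint16_t id;
    uint64_t value;
    uint32_t flags;
};

/* Explicit fillers: every field is naturally aligned and no holes remain. */
struct record_explicit {
    uint16_t id;
    uint16_t pad1;   /* filler */
    uint32_t pad2;   /* filler: value now starts at offset 8 */
    uint64_t value;
    uint32_t flags;
    uint32_t pad3;   /* filler: total size is a multiple of 8 */
};

/* Returns nonzero if the explicit layout has the intended offsets and size. */
int layout_is_fixed(void)
{
    return offsetof(struct record_explicit, value) == 8 &&
           offsetof(struct record_explicit, flags) == 16 &&
           sizeof(struct record_explicit) == 24;
}
```

Since every field of the explicit version already sits at a multiple of its size, the compiler has no reason to insert padding of its own, whether it aligns 64-bit values on 4-byte or 8-byte boundaries.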

Finally, be aware that the compiler may quietly insert padding into structures itself to ensure that every field is aligned for good performance on the target processor. If you are defining a structure that is intended to match a structure expected by a device, this automatic padding may thwart your attempt. The way around this problem is to tell the compiler that the structure must be "packed," with no fillers added. For example, the kernel header file <linux/edd.h> defines several data structures used in interfacing with the x86 BIOS, and it includes the following definition:

struct {
        u16 id;
        u64 lun;
        u16 reserved1;
        u32 reserved2;
} __attribute__ ((packed)) scsi;

Without the __attribute__ ((packed)), the lun field would be preceded by two filler bytes, or by six if we compile the structure on a 64-bit platform.
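The effect of packing can be checked in user space with offsetof and sizeof. The sketch below assumes a compiler that accepts the GCC-style packed attribute (GCC and Clang do); the structure names scsi_padded and scsi_packed are made up for the comparison, and the kernel's u16, u32, and u64 types are replaced by their <stdint.h> equivalents.

```c
#include <stddef.h>
#include <stdint.h>

/* Same fields as in the text, layout left to the compiler. */
struct scsi_padded {
    uint16_t id;
    uint64_t lun;
    uint16_t reserved1;
    uint32_t reserved2;
};

/* Same fields, with the compiler told not to insert any filler. */
struct scsi_packed {
    uint16_t id;
    uint64_t lun;
    uint16_t reserved1;
    uint32_t reserved2;
} __attribute__ ((packed));

/* Offset of lun when the compiler chooses the padding (platform dependent). */
size_t lun_offset_padded(void)
{
    return offsetof(struct scsi_padded, lun);
}

/* Offset of lun in the packed layout: directly after the 2-byte id. */
size_t lun_offset_packed(void)
{
    return offsetof(struct scsi_packed, lun);
}
```

The packed structure occupies exactly 2 + 8 + 2 + 4 = 16 bytes, while the size of the padded one depends on the alignment rules of the platform.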

11.4.5. Pointers and Error Values

Many internal kernel functions return a pointer value to the caller. Many of those functions can also fail. In most cases, failure is indicated by returning a NULL pointer value. This technique works, but it is unable to communicate the exact nature of the problem. Some interfaces really need to return an actual error code so that the caller can make the right decision based on what actually went wrong.

A number of kernel interfaces return this information by encoding the error code in a pointer value. Such functions must be used with care, since their return value cannot simply be compared against NULL. To help in the creation and use of this sort of interface, a small set of functions has been made available (in <linux/err.h>).

A function returning a pointer type can return an error value with:

void *ERR_PTR(long error);

where error is the usual negative error code. The caller can use IS_ERR to test whether a returned pointer is an error code or not:

long IS_ERR(const void *ptr);

If you need the actual error code, it can be extracted with:

long PTR_ERR(const void *ptr);

You should use PTR_ERR only on a value for which IS_ERR returns a true value; any other value is a valid pointer.
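The calling convention can be sketched in user space as follows. The lowercase helpers err_ptr, is_err, and ptr_err are stand-ins written for this example and are not the kernel's implementations; the MAX_ERRNO limit of 4095 and the make_buffer function are likewise assumptions made for illustration. The idea is that the last few thousand values of the address range are never valid pointers, so they can carry a negative error code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095 /* assumed upper bound on error codes */

/* Encode a negative error code as a pointer value. */
static inline void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Nonzero if the pointer lies in the range reserved for error codes. */
static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Recover the error code; meaningful only when is_err(ptr) is true. */
static inline long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Hypothetical allocator that reports why it failed. */
void *make_buffer(size_t len)
{
    void *buf;

    if (len == 0)
        return err_ptr(-EINVAL);
    buf = malloc(len);
    if (!buf)
        return err_ptr(-ENOMEM);
    return buf;
}
```

A caller tests the result with is_err before using it and calls ptr_err only on that path; a comparison against NULL would miss both failure cases here.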
