9.1 When to Use Assembly Code

Although asm statements can be abused, they allow your programs to access the computer hardware directly, and they can produce programs that execute quickly. You can use them when writing operating system code that needs to interact directly with hardware. For example, /usr/include/asm/io.h contains assembly instructions to access input/output ports directly. The Linux source code file /usr/src/linux/arch/i386/kernel/process.c provides another example, using hlt in idle loop code. For more examples, see the other Linux source code files in /usr/src/linux/arch/ and /usr/src/linux/drivers/.

Assembly instructions can also speed the innermost loop of computer programs. For example, if the majority of a program's running time is spent computing the sine and cosine of the same angles, this innermost loop could be recoded using the fsincos x86 instruction, which computes both values at once. ^[2] See, for example, /usr/include/bits/mathinline.h, which wraps into macros some inline assembly sequences that speed transcendental function computation.

^[2] Algorithmic or data structure changes may be more effective in reducing a program's running time than using assembly instructions.
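
As a rough illustration of the fsincos idea, the sketch below wraps the instruction in a GCC asm statement. The function name sin_cos is hypothetical and is not taken from mathinline.h. The sketch assumes GCC-style extended asm, an x86 or x86-64 target, and an angle whose magnitude is below 2^63 (outside that range the instruction leaves its operand untouched, which this code does not handle). On other architectures it simply falls back to the C library.

```c
#include <math.h>

/* Hypothetical helper: compute the sine and cosine of ANGLE (in radians)
   together.  Assumes |angle| < 2^63 on x86.  */
void sin_cos (double angle, double *sine, double *cosine)
{
#if defined (__i386__) || defined (__x86_64__)
  /* fsincos replaces st(0) with its sine and then pushes the cosine,
     so afterward st(0) holds the cosine and st(1) holds the sine.
     The "t" and "u" constraints name those two x87 registers.  */
  __asm__ ("fsincos"
           : "=t" (*cosine), "=u" (*sine)
           : "0" (angle));
#else
  /* Fallback for other architectures: two ordinary library calls.  */
  *sine = sin (angle);
  *cosine = cos (angle);
#endif
}
```

A caller would write something like sin_cos (theta, &s, &c) in place of separate calls to sin and cos. Whether this is actually faster than what the compiler and C library already do depends on the processor, so measure before adopting it.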

You should use inline assembly to speed up code only as a last resort. Current compilers are quite sophisticated and know a lot about the details of the processors for which they generate code. Therefore, compilers can often choose code sequences that may seem unintuitive or roundabout but that actually execute faster than other instruction sequences. Unless you understand the instruction set and scheduling attributes of your target processor very well, you're probably better off letting the compiler's optimizers generate assembly code for you for most operations.

Occasionally, one or two assembly instructions can replace several lines of higher-level language code. For example, determining the position of the most significant nonzero bit of a nonzero integer using the C programming language requires a loop or floating-point computations. Many architectures have a single assembly instruction that computes this bit position; on the x86, it is bsr. We'll demonstrate the use of this instruction in Section 9.4, "Example."
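
To make the contrast concrete, here is a minimal sketch of both approaches. The function names msb_index_loop and msb_index_asm are hypothetical. The assembly version assumes GCC-style extended asm and an x86 or x86-64 target; on any other architecture it falls back to the loop. Both functions require a nonzero argument, because bsr leaves its result undefined when the operand is zero.

```c
#include <assert.h>

/* Portable C version: count how many times X can be shifted right
   before it becomes zero.  Bit positions are numbered from 0.  */
unsigned msb_index_loop (unsigned x)
{
  unsigned position = 0;

  assert (x != 0);
  while (x >>= 1)
    ++position;
  return position;
}

/* Single-instruction version for x86; other targets use the loop.  */
unsigned msb_index_asm (unsigned x)
{
#if defined (__i386__) || defined (__x86_64__)
  unsigned position;

  assert (x != 0);
  /* bsrl scans its source operand from the most significant bit down
     and writes the index of the first set bit to the destination.
     The instruction modifies the flags, hence the "cc" clobber.  */
  __asm__ ("bsrl %1, %0" : "=r" (position) : "rm" (x) : "cc");
  return position;
#else
  return msb_index_loop (x);
#endif
}
```

For the value 12 (binary 1100), both functions return 3. The loop runs once per bit below the most significant one, while the asm version executes a single instruction regardless of the value.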