9.4 Example

The x86 architecture includes instructions that determine the positions of the least significant set bit and the most significant set bit in a word. The processor can execute these instructions quite efficiently. In contrast, implementing the same operation in C requires a loop and a bit shift.

For example, the bsrl assembly instruction computes the position of the most significant bit set in its first operand, and places the bit position (counting from 0, the least significant bit) into its second operand. To place the bit position for number into position, we could use this asm statement:

 
asm ("bsrl %1, %0"  :  "=r" (position)  :  "r" (number)); 

One way you could implement the same operation in C is using this loop:

 
long i; 
for (i = (number >> 1), position = 0; i != 0; ++position) 
  i >>= 1; 

To test the relative speeds of these two versions, we'll place them in a loop that computes the bit positions for a large number of values. Listing 9.1 does this using the C loop implementation. The program loops over integers, from 1 up to the value specified on the command line. For each value of number, it computes the most significant bit that is set. Listing 9.2 does the same thing using the inline assembly instruction. Note that in both versions, we assign the computed bit position to a volatile variable result. This is to coerce the compiler's optimizer so that it does not eliminate the entire bit position computation; if the result is not used or stored in memory, the optimizer eliminates the computation as "dead code."

Listing 9.1 (bit-pos-loop.c) Find Bit Position Using a Loop
#include <stdio.h> 
#include <stdlib.h> 
 
int main (int argc, char* argv[]) 
{
  long max = atoi (argv[1]); 
  long number; 
  long i; 
  unsigned position; 
  volatile unsigned result; 
 
  /* Repeat the operation for a large number of values.  */ 
  for (number = 1; number <= max; ++number) {
    /* Repeatedly shift the number to the right, until the result is 
       zero. Keep count of the number of shifts this requires.  */ 
    for (i = (number >> 1), position = 0; i != 0; ++position) 
      i >>= 1; 
    /* The position of the most significant set bit is the number of 
       shifts we needed after the first one.  */ 
    result = position; 
  } 
 
  return 0; 
} 
Listing 9.2 (bit-pos-asm.c) Find Bit Position Using bsrl
#include <stdio.h> 
#include <stdlib.h> 
 
int main (int argc, char* argv[]) 
{
  long max = atoi (argv[1]); 
  long number; 
  unsigned position; 
  volatile unsigned result; 
 
  /* Repeat the operation for a large number of values.  */ 
  for (number = 1; number <= max; ++number) {
    /* Compute the position of the most significant set bit using the 
       bsrl assembly instruction.  */ 
    asm ("bsrl %1, %0" :  "=r" (position) : "r" (number)); 
    result = position; 
  } 
  return 0; 
} 

We'll compile both versions with full optimization:

 
% cc -O2 -o bit-pos-loop bit-pos-loop.c 
% cc -O2 -o bit-pos-asm bit-pos-asm.c 

Now let's run each using the time command to measure execution time. We'll specify a large value as the command-line argument, to make sure that each version takes at least a few seconds to run.

 
% time ./bit-pos-loop 250000000 
19.51user 0.00system 0:20.40elapsed 95%CPU (0avgtext+0avgdata 
0maxresident)k0inputs+0outputs (73major+11minor)pagefaults 0swaps 
% time ./bit-pos-asm 250000000 
3.19user 0.00system 0:03.32elapsed 95%CPU (0avgtext+0avgdata 
0maxresident)k0inputs+0outputs (73major+11minor)pagefaults 0swaps 

Notice that the version that uses inline assembly executes a great deal faster (your results for this example may vary).