9.3 Extended Assembly Syntax

In the subsections that follow, we describe the syntax rules for asm statements. An asm statement consists of up to four sections, which are separated by colons.

We will refer to this illustrative asm statement, which computes the Boolean expression x > y:

 
asm ("fucomip %%st(1), %%st; seta %%al" : 
     "=a" (result) : "u" (y), "t" (x) : "cc", "st"); 

First, fucomip compares its two operands x and y, and stores values indicating the result into the condition code register. Then seta converts these values into a 0 or 1 result.

9.3.1 Assembler Instructions

The first section contains the assembler instructions, enclosed in quotation marks. The example asm contains two assembly instructions, fucomip and seta, separated by semicolons. If the assembler does not permit semicolons, use newline characters (\n) to separate instructions.

The compiler does not interpret the instructions in this first section. It only replaces operand references such as %0 with the operands it has chosen and removes one level of percent signs, so %% becomes %. The meaning of %%st(1) and other such terms is architecture-dependent.

GCC will complain if you specify the -traditional option or the -ansi option when compiling a program containing asm statements. To avoid these errors, for example in header files, use the alternative keyword __asm__.
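As a minimal sketch of these rules, the following function uses the __asm__ keyword, separates two instructions with a newline, and names a register explicitly with a doubled percent sign. The function name twice is hypothetical, the example assumes an x86 target and AT&T assembler syntax, and the plain C fallback is only there so that the code compiles elsewhere. The operand sections it uses are described in the subsections that follow.

```c
/* Hypothetical helper: returns 2 * x.  Assumes an x86 target and
   AT&T syntax; other targets use the plain C fallback.  */
int twice (int x)
{
#if defined(__i386__) || defined(__x86_64__)
  int result;
  /* Two instructions separated by "\n\t".  %%eax reaches the
     assembler as %eax; %1 is replaced by the register holding x.
     The & in "=&a" tells GCC that EAX is written before x is last
     read, so x must not be placed in EAX.  */
  __asm__ ("movl %1, %%eax\n\t"
           "addl %1, %%eax"
           : "=&a" (result)
           : "r" (x)
           : "cc");
  return result;
#else
  return 2 * x;   /* fallback for non-x86 targets */
#endif
}
```

Calling twice (21) should yield 42 on either path.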

9.3.2 Outputs

The second section specifies the instructions' output operands using C syntax. Each operand is specified by an operand constraint string followed by a C expression in parentheses. For output operands, which must be lvalues, the constraint string should begin with an equals sign. The compiler checks that the C expression for each output operand is in fact an lvalue.

Letters specifying registers for a particular architecture can be found in the GCC source code, in the REG_CLASS_FROM_LETTER macro. For example, the gcc/config/i386/i386.h configuration file in GCC lists the register letters for the x86 architecture. [3] Table 9.1 summarizes these.

[3] You'll need to have some familiarity with GCC's internals to make sense of this file.

Table 9.1. Register Letters for the Intel x86 Architecture

Register Letter   Registers That GCC May Use
---------------   --------------------------
R                 General register (EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP)
q                 General register for data (EAX, EBX, ECX, EDX)
f                 Floating-point register
t                 Top floating-point register
u                 Second-from-top floating-point register
a                 EAX register
b                 EBX register
c                 ECX register
d                 EDX register
x                 SSE register (Streaming SIMD Extension register)
y                 MMX multimedia registers
A                 An 8-byte value formed from EAX and EDX
D                 Destination pointer for string operations (EDI)
S                 Source pointer for string operations (ESI)

Multiple operands in an asm statement, each specified by a constraint string and a C expression, are separated by commas, as illustrated in the example asm's input section. You may specify up to 10 operands, denoted %0, %1, ..., %9, in the output and input sections. If there are no output operands but there are input operands or clobbered registers, leave the output section empty or mark it with a comment such as /* no outputs */.
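To illustrate several operands and their numbering, here is a small sketch built on the x86 divl instruction, which divides the 64-bit value in EDX:EAX by its operand and leaves the quotient in EAX and the remainder in EDX. The function name divmod and its parameters are hypothetical, the code assumes an x86 target with AT&T syntax, and the divisor must be nonzero. The two outputs are %0 and %1, and the three inputs continue the numbering as %2, %3, and %4.

```c
/* Hypothetical helper: stores n / d in *quot and n % d in *rem.
   Assumes an x86 target; d must not be zero.  */
void divmod (unsigned n, unsigned d, unsigned *quot, unsigned *rem)
{
#if defined(__i386__) || defined(__x86_64__)
  unsigned q, r;
  __asm__ ("divl %4"
           : "=a" (q), "=d" (r)           /* %0 in EAX, %1 in EDX */
           : "a" (n), "d" (0u), "r" (d)   /* %2, %3, %4 */
           : "cc");
  *quot = q;
  *rem = r;
#else
  *quot = n / d;   /* fallback for non-x86 targets */
  *rem = n % d;
#endif
}
```

For example, divmod (17, 5, &q, &r) should leave 3 in q and 2 in r.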

9.3.3 Inputs

The third section specifies the input operands for the assembler instructions. The constraint string for an input operand should not contain an equals sign, because the equals sign marks an output operand. Otherwise, an input operand's syntax is the same as for output operands.

To indicate that a register is both read from and written to in the same asm, use an input constraint string of the output operand's number. For example, to indicate that an input register is the same as the first output register number, use 0. Output operands are numbered left to right, starting with 0. Merely specifying the same C expression for an output operand and an input operand does not guarantee that the two values will be placed in the same register.

This input section can be omitted if there are no input operands and the subsequent clobber section is empty.
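The matching constraint described above can be sketched as follows. The function name increment is hypothetical, and the example assumes an x86 target with AT&T syntax. The input constraint "0" tells GCC to place value in the same register it chose for output operand 0, so a single incl instruction both reads and writes that register.

```c
/* Hypothetical helper: returns value + 1.  Assumes an x86 target;
   other targets use the plain C fallback.  */
int increment (int value)
{
#if defined(__i386__) || defined(__x86_64__)
  int result;
  __asm__ ("incl %0"
           : "=r" (result)   /* operand 0: any general register */
           : "0" (value)     /* same register as operand 0 */
           : "cc");          /* incl changes the condition codes */
  return result;
#else
  return value + 1;   /* fallback for non-x86 targets */
#endif
}
```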

9.3.4 Clobbers

If an instruction modifies the values of one or more registers as a side effect, specify the clobbered registers in the asm's fourth section. For example, the fucomip instruction modifies the condition code register, which is denoted cc. Separate strings representing clobbered registers with commas. If the instruction can modify an arbitrary memory location, specify memory. Using the clobber information, the compiler determines which values must be reloaded after the asm executes. If you don't specify this information correctly, GCC may assume incorrectly that registers still contain values that have, in fact, been overwritten, which will affect your program's correctness.
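As a hedged illustration of the memory clobber, the sketch below writes through a pointer from inside an asm that has no output operands. The function name store_int is hypothetical, and the example assumes an x86 target with AT&T syntax. Because the asm lists memory in its clobber section, the compiler does not assume that values it loaded from memory before the asm are still current afterward.

```c
/* Hypothetical helper: stores value at *dst from inside the asm.
   Assumes an x86 target; other targets use the plain C fallback.  */
void store_int (int *dst, int value)
{
#if defined(__i386__) || defined(__x86_64__)
  __asm__ ("movl %1, (%0)"
           : /* no outputs */
           : "r" (dst), "r" (value)
           : "memory");   /* the asm writes to memory GCC cannot see */
#else
  *dst = value;   /* fallback for non-x86 targets */
#endif
}
```

After int x = 0; store_int (&x, 42); a subsequent read of x should see 42.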