A.3 Profiling

Now that your program is (hopefully) correct, we turn to speeding its execution. Using the profiler gprof, you can determine which functions require the most execution time. This can help you determine which parts of the program to optimize or rewrite to execute more quickly. It can also help you find errors. For example, you may find that a particular function is called many more times than you expect.

In this section, we describe how to use gprof. Rewriting code to run more quickly requires creativity and careful choice of algorithms.

Obtaining profiling information requires three steps:

1.       Compile and link your program to enable profiling.

2.       Execute your program to generate profiling data.

3.       Use gprof to analyze and display the profiling data.

Before we illustrate these steps, we introduce a large enough program to make profiling interesting.

A.3.1 A Simple Calculator

To illustrate profiling, we'll use a simple calculator program. To ensure that the calculator takes a nontrivial amount of time, we'll use unary numbers for calculations, something we would definitely not want to do in a real-world program. Code for this program appears at the end of this chapter.

A unary number is represented by as many symbols as its value. For example, the number 1 is represented by "x," 2 by "xx," and 3 by "xxx." Instead of using x's, our program represents a non-negative number using a linked list with as many elements as the number's value. The number.c file contains routines to create the number 0, add 1 to a number, subtract 1 from a number, and add, subtract, and multiply numbers. Another function converts a string holding a non-negative decimal number to a unary number, and a function converts from a unary number to an int. Addition is implemented using repeated addition of 1s, while subtraction uses repeated removal of 1s. Multiplication is defined using repeated addition. The unary predicates even and odd each return the unary number for 1 if and only if its one operand is even or odd, respectively; otherwise they return the unary number for 0. The two predicates are mutually recursive. For example, a number is even if it is zero, or if one less than the number is odd.

The calculator accepts one-line postfix expressions [1] and prints each expression's value—for example:

[1] In postfix notation, a binary operator is placed after its operands instead of between them. So, for example, to multiply 6 and 8, you would use 6 8 x. To multiply 6 and 8 and then add 5 to the result, you would use 6 8 x 5 +.

 
% ./calculator 
Please enter a postfix expression: 
2 3 + 
5 
Please enter a postfix expression: 
2 3 + 4 -
1 

The calculator, defined in calculator.c, reads each expression, storing intermediate values on a stack of unary numbers, defined in stack.c. The stack stores its unary numbers in a linked list.

A.3.2 Collecting Profiling Information

The first step in profiling a program is to annotate its executable to collect profiling information. To do so, use the -pg compiler flag when both compiling the object files and linking. For example, consider this code:

 
% gcc -pg -c -o calculator.o calculator.c 
% gcc -pg -c -o stack.o stack.c 
% gcc -pg -c -o number.o number.c 
% gcc -pg calculator.o stack.o number.o -o calculator 

This enables collecting information about function calls and timing information. To collect line-by-line use information, also specify the debugging flag -g. To count basic block executions, such as the number of do -loop iterations, use -a.

The second step is to run the program. While it is running, profiling data is collected into a file named gmon.out, only for those portions of the code that are exercised. You must vary the program's input or commands to exercise the code sections that you want to profile. The program must terminate normally for the profiling file to be written.

A.3.3 Displaying Profiling Data

Given the name of an executable, gprof analyzes the gmon.out file to display information about how much time each function required. For example, consider the "flat" profiling data for computing 1787 x 13 - 1918 using our calculator program, which is produced by executing gprof ./calculator:

 
Flat profile: 
 
Each sample counts as 0.01 seconds. 
  %   cumulative   self              self     total 
 time   seconds   seconds    calls  ms/call  ms/call  name 
 26.07      1.76     1.76 20795463     0.00     0.00  decrement_number 
 24.44      3.41     1.65     1787     0.92     1.72  add 
 19.85      4.75     1.34 62413059     0.00     0.00  zerop 
 15.11      5.77     1.02     1792     0.57     2.05  destroy_number 
 14.37      6.74     0.97 20795463     0.00     0.00  add_one 
  0.15      6.75     0.01     1788     0.01     0.01  copy_number 
  0.00      6.75     0.00     1792     0.00     0.00  make_zero 
  0.00      6.75     0.00       11     0.00     0.00  empty_stack 

Computing the function decrement_number and all the functions it calls required 26.07% of the program's total execution time. It was called 20,795,463 times. Each individual execution required 0.0 seconds—namely, a time too small to measure. The add function was invoked 1,787 times, presumably to compute the product. Each call required 0.92 seconds. The copy_number function was invoked only 1,788 times, while it and the functions it calls required only 0.15% of the total execution time. Sometimes the mcount and profil functions used by profiling appear in the data.

In addition to the flat profile data, which indicates the total time spent within each function, gprof produces call graph data showing the time spent in each function and its children within the context of a function call chain:

 
index % time    self  children    called     name 
                                                 <spontaneous> 
[1]    100.0     0.00    6.75                 main [1] 
                 0.00    6.75       2/2           apply_binary_function [2] 
                 0.00    0.00       1/1792        destroy_number [4] 
                 0.00    0.00       1/1           number_to_unsigned_int [10] 
                 0.00    0.00       3/3           string_to_number [12] 
                 0.00    0.00       3/5           push_stack [16] 
                 0.00    0.00       1/1           create_stack [18] 
                 0.00    0.00       1/11          empty_stack [14] 
                 0.00    0.00       1/5           pop_stack [15] 
                 0.00    0.00       1/1           clear_stack [17] 
-----------------------------------------------
                0.00    6.75       2/2           main [1] 
[2]   100.0     0.00    6.75       2         apply_binary_function [2] 
                0.00    6.74       1/1           product [3] 
                0.00    0.01       4/1792        destroy_number [4] 
                0.00    0.00       1/1           subtract [11] 
                0.00    0.00       4/11          empty_stack [14] 
                0.00    0.00       4/5           pop_stack [15] 
                0.00    0.00       2/5           push_stack [16] 
-----------------------------------------------
               0.00    6.74       1/1           apply_binary_function [2] 
[3]    99.8    0.00    6.74       1         product [3] 
               1.02    2.65    1787/1792        destroy_number [4] 
               1.65    1.43    1787/1787        add [5] 
               0.00    0.00    1788/62413059     zerop [7] 
               0.00    0.00       1/1792        make_zero [13] 

The first frame shows that executing main and its children required 100% of the program's 6.75 seconds. It called apply_binary_function twice, which was called a total of two times throughout the entire program. Its caller was <spontaneous>; this indicates that the profiler was not capable of determining who called main. This first frame also shows that string_to_number called push_stack three times but was called five times throughout the program. The third frame shows that executing product and the functions it calls required 99.8% of the program's total execution time. It was invoked once by apply_binary_function.

The call graph data displays the total time spent executing a function and its children. If the function call graph is a tree, this number is easy to compute, but recursively defined functions must be treated specially. For example, the even function calls odd, which calls even. Each largest such call cycle is given its own number and is displayed individually in the call graph data. Consider this profiling data from determining whether 1787 x 13 x 3 is even:

 
-----------------------------------------------
                0.00    0.02       1/1         main [1] 
[9]      0.1    0.00    0.02       1       apply_unary_function [9] 
                0.01    0.00       1/1         even <cycle 1> [13] 
                0.00    0.00       1/1806      destroy_number [5] 
                0.00    0.00       1/13        empty_stack [17] 
                0.00    0.00       1/6         pop_stack [18] 
                0.00    0.00       1/6         push_stack [19] 
-----------------------------------------------
[10]     0.1    0.01    0.00       1+69693 <cycle 1 as a whole> [10] 
                0.00    0.00   34847           even <cycle 1> [13] 
-----------------------------------------------
                               34847           even <cycle 1> [13] 
[11]     0.1    0.01    0.00   34847       odd <cycle 1> [11] 
                0.00    0.00 34847/186997954     zerop [7] 
                0.00    0.00     1/1806        make_zero [16] 
                               34846           even <cycle 1> [13] 

The 1+69693 in the [10] frame indicates that cycle 1 was called once, while the functions in the cycle were called 69,693 times. The cycle called the even function. The next entry shows that odd was called 34,847 times by even.

In this section, we have briefly discussed only some of gprof's features. Its info pages contain information about other useful features:

·         Use the -s option to sum the execution results from several different runs.

·         Use the -c option to identify children that could have been called but were not.

·         Use the -l option to display line-by-line profiling information.

·         Use the -A option to display source code annotated with percentage execution numbers.

The info pages also provide more information about the interpretation of the analyzed data.

A.3.4 How gprof Collects Data

When a profiled executable runs, every time a function is called its count is also incremented. Also, gprof periodically interrupts the executable to determine the currently executing function. These samples determine function execution times. Because Linux's clock ticks are 0.01 seconds apart, these interruptions occur, at most, every 0.01 seconds. Thus, profiles for quickly executing programs or for quickly executing infrequently called functions may be inaccurate. To avoid these inaccuracies, run the executable for longer periods of time, or sum together profile data from several executions. Read about the -s option to sum profiling data in gprof's info pages.

A.3.5 Source Code for the Calculator Program

Listing A.3 presents a program that calculates the value of postfix expressions.

Listing A.3 (calculator.c) Main Calculator Program
/* Calculate using unary numbers.  */ 
 
/* Enter one-line expressions using reverse postfix notation, e.g., 
     602 7 5 - 3 * + 
   Nonnegative numbers are entered using decimal notation.  The 
   operators "+", "-", and "*" are supported.  The unary operators 
   "even" and "odd" return the number 1 if its one operand is even 
   or odd, respectively.  Spaces must separate all words.  Negative 
   numbers are not supported.  */ 
 
#include <stdio.h> 
#include <stdlib.h> 
#include <string.h> 
#include <ctype.h> 
#include "definitions.h" 
 
/* Apply the binary function with operands obtained from the stack, 
   pushing the answer on the stack.  Return nonzero upon success.  */ 
 
int apply_binary_function (number (*function) (number, number), 
                           Stack* stack) 
{
  number operand1, operand2; 
  if (empty_stack (*stack)) 
    return 0; 
  operand2 = pop_stack (stack); 
  if (empty_stack (*stack)) 
    return 0; 
  operand1 = pop_stack (stack); 
  push_stack (stack, (*function) (operand1, operand2)); 
  destroy_number (operand1); 
  destroy_number (operand2); 
  return 1; 
} 
 
/* Apply the unary function with an operand obtained from the stack, 
   pushing the answer on the stack. Return nonzero upon success.  */ 
 
int apply_unary_function (number (*function) (number), 
                          Stack* stack) 
{
  number operand; 
  if (empty_stack (*stack)) 
    return 0; 
  operand = pop_stack (stack); 
  push_stack (stack, (*function) (operand)); 
  destroy_number (operand); 
  return 1; 
} 
 
int main () 
{
  char command_line[1000]; 
  char* command_to_parse; 
  char* token; 
  Stack number_stack = create_stack (); 
 
  while (1) {
    printf ("Please enter a postfix expression:\n"); 
    command_to_parse = fgets (command_line, sizeof (command_line), stdin); 
    if (command_to_parse == NULL) 
      return 0; 
 
    token = strtok (command_to_parse, " \t\n"); 
    command_to_parse = 0; 
    while (token != 0) {
      if (isdigit (token[0])) 
        push_stack (&number_stack, string_to_number (token)); 
      else if (((strcmp (token, "+") == 0) && 
                !apply_binary_function (&add, &number_stack)) || 
               ((strcmp (token, "-") == 0) && 
                !apply_binary_function (&subtract, &number_stack)) || 
               ((strcmp (token, "*") == 0) && 
                !apply_binary_function (&product, &number_stack)) || 
               ((strcmp (token, "even") == 0) && 
                !apply_unary_function (&even, &number_stack)) || 
               ((strcmp (token, "odd") == 0) && 
                !apply_unary_function (&odd, &number_stack))) 
        return 1; 
      token = strtok (command_to_parse, " \t\n"); 
   } 
   if (empty_stack (number_stack)) 
     return 1; 
   else {
     number answer = pop_stack (&number_stack); 
     printf ("%u\n", number_to_unsigned_int (answer)); 
     destroy_number (answer); 
     clear_stack (&number_stack); 
   } 
} 
 
 return 0; 
} 

The functions in Listing A.4 implement unary numbers using empty linked lists.

Listing A.4 (number.c) Unary Number Implementation
/* Operate on unary numbers.  */ 
 
#include <assert.h> 
#include <stdlib.h> 
#include <limits.h> 
#include "definitions.h" 
 
/* Create a number representing zero.  */ 
 
number make_zero () 
{
  return 0; 
} 
 
/* Return nonzero if the number represents zero.  */ 
 
int zerop (number n) 
{
return n == 0; 
} 
 
/* Decrease a positive number by 1.  */ 
 
number decrement_number (number n) 
{
  number answer; 
  assert (!zerop (n)); 
  answer = n->one_less_; 
  free (n); 
  return answer; 
} 
 
/* Add 1 to a number.  */ 
 
number add_one (number n) 
{
  number answer = malloc (sizeof (struct LinkedListNumber)); 
  answer->one_less_ = n; 
  return answer; 
} 
 
/* Destroying a number.  */ 
 
void destroy_number (number n) 
{
  while (!zerop (n)) 
    n = decrement_number (n); 
} 
 
/* Copy a number.  This function is needed only because of memory 
   allocation.  */ 
 
number copy_number (number n) 
{
  number answer = make_zero (); 
  while (!zerop (n)) {
    answer = add_one (answer); 
    n = n->one_less_; 
  } 
  return answer; 
} 
 
/* Add two numbers.  */ 
 
number add (number n1, number n2) 
{
  number answer = copy_number (n2); 
  number addend = n1; 
  while (!zerop (addend)) {
    answer = add_one (answer); 
    addend = addend->one_less_; 
  } 
  return answer; 
} 
 
/* Subtract a number from another.  */ 
 
number subtract (number n1, number n2) 
{
  number answer = copy_number (n1); 
  number subtrahend = n2; 
  while (!zerop (subtrahend)) {
    assert (!zerop (answer)); 
    answer = decrement_number (answer); 
    subtrahend = subtrahend->one_less_; 
  } 
  return answer; 
} 
 
/* Return the product of two numbers.  */ 
 
number product (number n1, number n2) 
{
  number answer = make_zero (); 
  number multiplicand = n1; 
  while (!zerop (multiplicand)) {
    number answer2 = add (answer, n2); 
    destroy_number (answer); 
      answer = answer2; 
      multiplicand = multiplicand->one_less_; 
   } 
   return answer; 
} 
 
/* Return nonzero if number is even.  */ 
 
number even (number n) 
{
  if (zerop (n)) 
     return add_one (make_zero ()); 
  else 
     return odd (n->one_less_); 
} 
 
/* Return nonzero if number is odd.  */ 
 
number odd (number n) 
{
  if (zerop (n)) 
    return make_zero (); 
  else 
    return even (n->one_less_); 
} 
 
/* Convert a string representing a decimal integer into a "number".  */ 
 
number string_to_number (char * char_number) 
{
  number answer = make_zero (); 
   int num = strtoul (char_number, (char **) 0, 0); 
   while (num != 0) {
      answer = add_one (answer); 
      --num; 
   } 
   return answer; 
} 
 
/* Convert a "number" into an "unsigned int".  */ 
 
unsigned number_to_unsigned_int (number n) 
{
  unsigned answer = 0; 
  while (!zerop (n)) {
    n = n->one_less_; 
    ++answer; 
  } 
  return answer; 
} 

The functions in Listing A.5 implement a stack of unary numbers using a linked list.

Listing A.5 (stack.c) Unary Number Stack
/* Provide a stack of "number"s.  */ 
 
#include <assert.h> 
#include <stdlib.h> 
#include "definitions.h" 
 
/* Create an empty stack.  */ 
 
Stack create_stack () 
{
  return 0; 
} 
 
/* Return nonzero if the stack is empty.  */ 
 
int empty_stack (Stack stack) 
{
  return stack == 0; 
} 
 
/* Remove the number at the top of a nonempty stack.  If the stack is 
   empty, abort.  */ 
 
number pop_stack (Stack* stack) 
{
  number answer; 
  Stack rest_of_stack; 
 
  assert (!empty_stack (*stack)); 
  answer = (*stack)->element_; 
  rest_of_stack = (*stack)->next_; 
  free (*stack); 
  *stack = rest_of_stack; 
  return answer; 
} 
 
/* Add a number to the beginning of a stack.  */ 
 
void push_stack (Stack* stack, number n) 
{
  Stack new_stack = malloc (sizeof (struct StackElement)); 
  new_stack->element_ = n; 
  new_stack->next_ = *stack; 
  *stack = new_stack; 
} 
 
/* Remove all the stack's elements.  */ 
void clear_stack (Stack* stack) 
{
  while (!empty_stack (*stack)) {
     number top = pop_stack (stack); 
     destroy_number (top); 
  } 
} 

Listing A.6 contains declarations for stacks and numbers.

Listing A.6 (definitions.h) Header File for number.c and stack.c
#ifndef DEFINITIONS_H 
#define DEFINITIONS_H 1 
 
/* Implement a number using a linked list.  */ 
struct LinkedListNumber 
{
  struct LinkedListNumber* 
                one_less_; 
}; 
typedef struct LinkedListNumber* number; 
 
/* Implement a stack of numbers as a linked list.  Use 0 to represent 
   an empty stack.  */ 
struct StackElement 
{
  number       element_; 
  struct       StackElement* next_; 
}; 
typedef struct StackElement* Stack; 
 
/* Operate on the stack of numbers.  */ 
Stack create_stack (); 
int empty_stack (Stack stack); 
number pop_stack (Stack* stack); 
void push_stack (Stack* stack, number n); 
void clear_stack (Stack* stack); 
 
/* Operations on numbers.  */ 
number make_zero (); 
void destroy_number (number n); 
number add (number n1, number n2); 
number subtract (number n1, number n2); 
number product (number n1, number n2); 
number even (number n); 
number odd (number n); 
number string_to_number (char* char_number); 
unsigned number_to_unsigned_int (number n); 
 
#endif /* DEFINITIONS_H */