lmbench
is a series of micro benchmarks intended to measure basic operating
system and hardware system metrics. The benchmarks fall into three
general classes: bandwidth, latency, and ``other''.
Most of the
lmbench
benchmarks use a standard timing harness described in timing(3)
and have a few standard options:
parallelism,
warmup,
and
repetitions.
Parallelism
specifies the number of benchmark processes to run in parallel.
This is primarily useful when measuring the performance of SMP
or distributed computers and can be used to evaluate the system's
performance scalability.
Warmup
is the minimum number of microseconds the benchmark should
execute the benchmarked capability before it begins measuring
performance. Again, this is primarily useful for SMP or distributed
systems and it is intended to give the process scheduler time to
"settle" and migrate processes to other processors. By measuring
performance over various
warmup
periods, users may evaluate the scheduler's responsiveness.
Repetitions
is the number of measurements that the benchmark should take. This
allows lmbench to provide greater or lesser statistical strength to
the results it reports. The default number of
repetitions
is 11.
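The interaction of warmup and repetitions can be sketched as follows.
This is an illustration only, not the harness described in timing(3):
the function name measure, its parameters, the iterations count, and
the choice of the median as the summary statistic are all assumptions
made for the example.

```python
import statistics
import time


def measure(op, warmup_us=0, repetitions=11, iterations=1000):
    """Time op() and return (median, samples) in microseconds per call.

    Hypothetical helper, not part of lmbench.
    """
    # Warmup: run the operation, unmeasured, for at least warmup_us
    # microseconds so the scheduler and caches can settle.
    deadline = time.perf_counter_ns() + warmup_us * 1000
    while time.perf_counter_ns() < deadline:
        op()

    # Take `repetitions` independent measurements, each averaging
    # over `iterations` calls to amortize the cost of reading the clock.
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter_ns()
        for _ in range(iterations):
            op()
        elapsed_ns = time.perf_counter_ns() - start
        samples.append(elapsed_ns / iterations / 1000.0)

    # Assumption: summarize with the median, which is insensitive to
    # the occasional outlier caused by interrupts or preemption.
    return statistics.median(samples), samples


if __name__ == "__main__":
    median_us, _ = measure(lambda: None, warmup_us=1000)
    print(f"{median_us:.4f} microseconds per call")
```

More repetitions give a more stable summary at the cost of a longer
run, which is the trade-off the repetitions option exposes.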
BANDWIDTH MEASUREMENTS
Data movement is fundamental to the performance of most computer systems.
The bandwidth measurements are intended to show how fast the system can move
data. The results of the bandwidth metrics can be compared, but care
must be taken to understand what is being compared. The
bandwidth benchmarks can be reduced to two main components: operating
system overhead and memory speeds. The bandwidth benchmarks report
their results as megabytes moved per second, but note that the
amount of data moved is not necessarily the same as the memory bandwidth
used to move it (a memory copy, for example, both reads and writes
every byte it moves). Consult the individual man pages for more
information.
Each of the bandwidth benchmarks is listed below with a brief overview of the
intent of the benchmark.
bw_file_rd
reading and summing of a file via the read(2) interface.
bw_mem_cp
memory copy.
bw_mem_rd
memory reading and summing.
bw_mem_wr
memory writing.
bw_mmap_rd
reading and summing of a file via the memory mapping mmap(2) interface.
bw_pipe
reading of data via a pipe.
bw_tcp
reading of data via a TCP/IP socket.
bw_unix
reading of data via a UNIX socket.
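The distinction between data moved and memory bandwidth used can be
illustrated with a memory copy, in the spirit of bw_mem_cp. The sketch
below is not the lmbench implementation: the function name
copy_bandwidth, the buffer size, the use of the fastest run, and the
definition of a megabyte as 10^6 bytes are assumptions made for the
example.

```python
import time


def copy_bandwidth(nbytes=8 * 1024 * 1024, repetitions=11):
    """Return (moved_mb_s, traffic_mb_s) for a buffer-to-buffer copy.

    Hypothetical helper, not part of lmbench.
    """
    src = bytearray(nbytes)
    dst = bytearray(nbytes)

    # Keep the fastest of several runs (an assumption of this sketch).
    best = float("inf")
    for _ in range(repetitions):
        start = time.perf_counter()
        dst[:] = src  # same-length slice assignment copies in place
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    best = max(best, 1e-9)  # guard against a zero reading from the clock

    # "Megabytes moved": the payload size divided by the elapsed time.
    moved_mb_s = nbytes / best / 1e6
    # Memory traffic: every byte is read once and written once, so the
    # memory system carries roughly twice the payload.
    traffic_mb_s = 2 * moved_mb_s
    return moved_mb_s, traffic_mb_s


if __name__ == "__main__":
    moved, traffic = copy_bandwidth()
    print(f"moved {moved:.0f} MB/s, memory traffic about {traffic:.0f} MB/s")
```

The factor of two is itself an approximation, since caches and
write-allocate policies can change the real traffic on a given machine.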
LATENCY MEASUREMENTS
Control messages are also fundamental to the performance of most
computer systems. The latency measurements are intended to show how fast
a system can be told to do some operation. The results of the
latency metrics can, for the most part, be compared to each other.
In particular, the
pipe, rpc, tcp, and udp transactions are all identical benchmarks
carried out over different system abstractions.
Most latency results are reported in microseconds per operation.
lat_connect
the time it takes to establish a TCP/IP connection.
lat_ctx
context switching; the number and size of processes is varied.
lat_fcntl
fcntl file locking.
lat_fifo
``hot potato'' transaction through a UNIX FIFO.
lat_fs
creating and deleting small files.
lat_pagefault
the time it takes to fault in a page from a file.
lat_mem_rd
memory read latency (accurate to the ~2-5 nanosecond range,
reported in nanoseconds).
lat_mmap
time to set up a memory mapping.
lat_ops
basic processor operations, such as integer XOR, ADD, SUB, MUL, DIV,
and MOD, and float ADD, MUL, DIV, and double ADD, MUL, DIV.
lat_pipe
``hot potato'' transaction through a UNIX pipe.
lat_proc
process creation times (various sorts).
lat_rpc
``hot potato'' transaction through Sun RPC over UDP or TCP.
lat_select
select(2) latency.
lat_sig
signal installation and catch latencies, as well as protection fault
signal latency.
lat_syscall
nontrivial entry into the operating system.
lat_tcp
``hot potato'' transaction through TCP.
lat_udp
``hot potato'' transaction through UDP.
lat_unix
``hot potato'' transaction through UNIX sockets.
lat_unix_connect
the time it takes to establish a UNIX socket connection.
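A ``hot potato'' transaction passes a small token from one process to
another and back, and the round-trip time is the reported latency. The
sketch below shows the idea over a pair of pipes, in the spirit of
lat_pipe. It is an illustration under stated assumptions, not the
lmbench code: the function name pipe_hot_potato, the one-byte token,
and the round-trip count are all invented for the example, and it
requires a system that provides fork(2).

```python
import os
import time


def pipe_hot_potato(round_trips=1000):
    """Return microseconds per round trip over a pair of pipes.

    Hypothetical helper, not part of lmbench.
    """
    parent_to_child_r, parent_to_child_w = os.pipe()
    child_to_parent_r, child_to_parent_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: hand every token straight back until the parent
        # closes its end of the pipe.
        os.close(parent_to_child_w)
        os.close(child_to_parent_r)
        try:
            while True:
                token = os.read(parent_to_child_r, 1)
                if not token:
                    break
                os.write(child_to_parent_w, token)
        finally:
            os._exit(0)

    # Parent: close the ends it does not use, then time the exchange.
    os.close(parent_to_child_r)
    os.close(child_to_parent_w)
    start = time.perf_counter()
    for _ in range(round_trips):
        os.write(parent_to_child_w, b"x")
        os.read(child_to_parent_r, 1)
    elapsed = time.perf_counter() - start

    # Closing the write end makes the child see end-of-file and exit.
    os.close(parent_to_child_w)
    os.close(child_to_parent_r)
    os.waitpid(pid, 0)
    return elapsed / round_trips * 1e6


if __name__ == "__main__":
    print(f"{pipe_hot_potato():.1f} microseconds per round trip")
```

Replacing the pipes with a TCP, UDP, or UNIX socket while keeping the
exchange loop unchanged gives the same transaction over a different
abstraction, which is what makes such results comparable. Interpreter
overhead means the absolute numbers from this sketch will be higher
than those from a C benchmark.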
OTHER MEASUREMENTS
mhz
processor cycle time.
tlb
TLB size and TLB miss latency.
line
cache line size (in bytes).
cache
cache statistics, such as line size, cache sizes, and memory parallelism.
stream
John McCalpin's STREAM benchmark.
par_mem
memory subsystem parallelism: how many requests the memory
subsystem can service in parallel, which may depend on the location of the
data in the memory hierarchy.