The first form of the command line (option -e) compresses the data from file
inputfile and writes the compressed data into outputfile. The second form of
the command line (option -d) decompresses file inputfile and writes the
output to outputfile.
OPTIONS
-d
    Decoding mode.
-e[blocksize]
    Encoding mode. The optional argument blocksize specifies the size, in
    kilobytes, of the input file blocks processed by the Burrows-Wheeler
    transform. The default block size is 2048 KB. The maximal block size is
    4096 KB. Specifying a larger block size usually produces higher
    compression ratios and increases the memory requirements of both the
    encoder and decoder. It is useless to specify a block size that is
    larger than the input file.
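The block size limits above can be sketched as a small helper. This is only an illustration: the function name `suggested_block_kb` and its clamping behaviour are assumptions made here, not something bzz itself is documented to do with out-of-range values. Only the 2048 KB default and the 4096 KB maximum come from the text above.

```python
import math

DEFAULT_BLOCK_KB = 2048  # default block size stated in this manual page
MAX_BLOCK_KB = 4096      # maximal block size stated in this manual page


def suggested_block_kb(input_size_bytes, requested_kb=DEFAULT_BLOCK_KB):
    """Return a block size in KB that respects the documented limits.

    Hypothetical helper: it caps the request at the documented maximum and
    at the size of the input file, since a block larger than the file
    brings no benefit. The 1 KB floor is an arbitrary choice of this sketch.
    """
    file_kb = max(1, math.ceil(input_size_bytes / 1024))
    return min(requested_kb, MAX_BLOCK_KB, file_kb)
```

For a 300 KB input, for instance, the helper suggests 300 even when 4096 is requested, which mirrors the remark that a block larger than the input file is useless.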
ALGORITHMS
The Burrows-Wheeler transform is performed using a combination of the
Karp-Miller-Rosenberg and the Bentley-Sedgewick algorithms. This approach is
comparable to that of Sadakane (DCC 98), with a slightly more flexible
ranking scheme. Symbols are then ordered according to a running estimate of
their occurrence frequencies. The symbol ranks are then coded using a simple
fixed tree and the ZP binary adaptive coder (Bottou, DCC 98).

The Burrows-Wheeler transform is also used in the well-known compressor
bzip2. The originality of bzz is the use of the ZP adaptive coder. The
adaptation noise can cost up to 5 percent in file size, but this penalty is
usually offset by the benefits of adaptation.
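For readers unfamiliar with the Burrows-Wheeler transform, the following is a minimal sketch of the transform and its inverse. It deliberately uses the naive method of sorting all rotations of a block, which is far slower than the Karp-Miller-Rosenberg and Bentley-Sedgewick combination mentioned above, and it is not taken from the bzz sources. The function names `bwt` and `inverse_bwt` are chosen here for illustration only.

```python
def bwt(block):
    """Naive Burrows-Wheeler transform of a non-empty string.

    Sorts every rotation of the block and returns the last column together
    with the row index at which the original block appears.
    """
    n = len(block)
    # Sort rotation start positions by the rotation they denote.
    rotations = sorted(range(n), key=lambda i: block[i:] + block[:i])
    # The character preceding each rotation start forms the last column.
    last_column = "".join(block[(i - 1) % n] for i in rotations)
    primary = rotations.index(0)
    return last_column, primary


def inverse_bwt(last_column, primary):
    """Invert bwt() using only the last column and the primary index."""
    n = len(last_column)
    # A stable sort of positions by character maps each row of the sorted
    # matrix to the row holding the same rotation shifted left by one.
    order = sorted(range(n), key=lambda i: last_column[i])
    out = []
    row = primary
    for _ in range(n):
        row = order[row]
        out.append(last_column[row])
    return "".join(out)
```

The transform tends to group identical symbols together (for example, `bwt("banana")` yields the last column `"nnbaaa"`), which is what makes the subsequent ranking and adaptive coding stages effective.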
PERFORMANCE
The following table shows comparative results (in bits per character) on the
Canterbury Corpus (http://corpus.canterbury.ac.nz). The very good bzz
performance on the spreadsheet file excl puts the weighted average ahead of
much more sophisticated compressors such as fsmx.

Compression performance

            text   fax  csrc  excl  sprc  tech  poem  html  lisp   man  play  Weighted  Average
compress    3.27  0.97  3.56  2.41  4.21  3.06  3.38  3.68  3.90  4.43  3.51      2.55     3.31
gzip -9     2.85  0.82  2.24  1.63  2.67  2.71  3.23  2.59  2.65  3.31  3.12      2.08     2.53
bzip2 -9    2.27  0.78  2.18  1.01  2.70  2.02  2.42  2.48  2.79  3.33  2.53      1.54     2.23
ppmd        2.31  0.99  2.11  1.08  2.68  2.19  2.48  2.38  2.43  3.00  2.53      1.65     2.20
fsmx        2.10  0.79  1.89  1.48  2.52  1.84  2.21  2.24  2.29  2.91  2.35      1.63     2.06
bzz         2.25  0.76  2.13  0.78  2.67  2.00  2.40  2.52  2.60  3.19  2.52      1.44     2.16

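The unit used in the table, bits per character, can be computed from file sizes as sketched below. The sketch assumes that one character corresponds to one byte of the original file. The function name `bits_per_character` is introduced here for illustration.

```python
def bits_per_character(original_bytes, compressed_bytes):
    """Compressed size in bits divided by the original size in characters.

    Assumes one character per byte of the original file.
    """
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return 8.0 * compressed_bytes / original_bytes
```

Under this assumption, a file compressed to one quarter of its original size scores 2.0 bits per character, and lower values indicate better compression.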
Note that DjVu contributors have several entries in this table. Program
compress was written some time ago by Joe Orost. Program ppmd
is an improvement of the