bzz (1)
Leading comments
Copyright (c) 2001-2003 Leon Bottou, Yann Le Cun, Patrick Haffner, Copyright (c) 2001 AT&T Corp., and Lizardtech, Inc. This is free documentation; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. The GNU General Public License's references to "object code" and "executables" are to be interpreted as the output of any document forma...
NAME
bzz - DjVu general purpose compression utility.
SYNOPSIS
Encoding:
bzz -e[blocksize] inputfile outputfile
Decoding:
bzz -d inputfile outputfile
DESCRIPTION
The first form of the command line (option -e) compresses the data from file inputfile and writes the compressed data into outputfile. The second form of the command line (option -d) decompresses file inputfile and writes the output to outputfile.
OPTIONS
- -d
- Decoding mode.
- -e[blocksize]
- Encoding mode. The optional argument blocksize specifies the size, in kilobytes, of the input file blocks processed by the Burrows-Wheeler transform. The default block size is 2048 KB. The maximal block size is 4096 KB. Specifying a larger block size usually produces higher compression ratios and increases the memory requirements of both the encoder and decoder. Specifying a block size larger than the input file brings no benefit.
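The two command forms and the block size limit can be sketched as a small Python wrapper. This is only an illustration: the helper names bzz_command and run_bzz are hypothetical, the sketch assumes a bzz executable is available on the PATH, and rejecting block sizes below 1 is an assumption, since this page documents only the 4096 KB maximum.

```python
import subprocess

MAX_BLOCK_KB = 4096  # maximal block size documented in OPTIONS


def bzz_command(mode, inputfile, outputfile, blocksize_kb=None):
    """Build the argument list for one bzz invocation (hypothetical helper)."""
    if mode == "encode":
        flag = "-e"
        if blocksize_kb is not None:
            # The lower bound of 1 is an assumption; only the maximum is documented.
            if blocksize_kb < 1 or blocksize_kb > MAX_BLOCK_KB:
                raise ValueError("block size must be between 1 and 4096 KB")
            flag += str(blocksize_kb)  # the size is glued to the option: -e1024
    elif mode == "decode":
        if blocksize_kb is not None:
            raise ValueError("block size applies to encoding only")
        flag = "-d"
    else:
        raise ValueError("mode must be 'encode' or 'decode'")
    return ["bzz", flag, inputfile, outputfile]


def run_bzz(mode, inputfile, outputfile, blocksize_kb=None):
    """Run bzz and raise CalledProcessError if it exits with a nonzero status."""
    subprocess.run(bzz_command(mode, inputfile, outputfile, blocksize_kb), check=True)
```

For example, run_bzz("encode", "notes.txt", "notes.bzz", 1024) would invoke bzz -e1024 notes.txt notes.bzz, and run_bzz("decode", "notes.bzz", "notes.out") would restore the data; the file names are placeholders.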
ALGORITHMS
The Burrows-Wheeler transform is performed using a combination of the Karp-Miller-Rosenberg and the Bentley-Sedgewick algorithms. This is comparable to (Sadakane, DCC 98) with a slightly more flexible ranking scheme. Symbols are then ordered according to a running estimate of their occurrence frequencies. The symbol ranks are then coded using a simple fixed tree and the ZP binary adaptive coder (Bottou, DCC 98).
The Burrows-Wheeler transform is also used in the well-known compressor bzip2. The originality of bzz is the use of the ZP adaptive coder. The adaptation noise can cost up to 5 percent in file size, but this penalty is usually offset by the benefits of adaptation.
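As an illustration of the transform itself, the following Python sketch computes a Burrows-Wheeler transform by naively sorting all rotations of a block, and then inverts it. It is not the Karp-Miller-Rosenberg and Bentley-Sedgewick combination used by bzz, it is far too slow for real block sizes, and it leaves out the symbol ranking and ZP coding stages entirely. The function names bwt and inverse_bwt are hypothetical.

```python
from typing import Tuple


def bwt(data: bytes) -> Tuple[bytes, int]:
    """Naive Burrows-Wheeler transform of one block.

    Returns the last column of the sorted rotation matrix together with the
    row index at which the original block appears.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    # Sort rotation start positions by the rotation they denote (quadratic memory use).
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    # The last symbol of the rotation starting at i is the symbol just before i.
    last = bytes(data[(i - 1) % n] for i in order)
    return last, order.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Rebuild the original block from the last column and the primary index."""
    n = len(last)
    if n == 0:
        return b""
    # A stable sort of positions by symbol maps each row of the first column
    # to the row holding the same symbol occurrence in the last column.
    nxt = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = nxt[row]
        out.append(last[row])
    return bytes(out)
```

The transform groups symbols that share a following context, which is what makes the later ranking and entropy coding stages effective. For instance, bwt(b"banana") returns (b"nnbaaa", 3), and inverse_bwt(b"nnbaaa", 3) returns b"banana".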
PERFORMANCE
The following table shows comparative results (in bits per character) on the Canterbury Corpus (corpus.canterbury.ac.nz). The very good bzz performance on the spreadsheet file excl puts the weighted average ahead of much more sophisticated compressors such as fsmx.
Compression performance
Program  | text | fax  | csrc | excl | sprc | tech | poem | html | lisp | man  | play | Weighted | Average |
compress | 3.27 | 0.97 | 3.56 | 2.41 | 4.21 | 3.06 | 3.38 | 3.68 | 3.90 | 4.43 | 3.51 | 2.55     | 3.31    |
gzip -9  | 2.85 | 0.82 | 2.24 | 1.63 | 2.67 | 2.71 | 3.23 | 2.59 | 2.65 | 3.31 | 3.12 | 2.08     | 2.53    |
bzip2 -9 | 2.27 | 0.78 | 2.18 | 1.01 | 2.70 | 2.02 | 2.42 | 2.48 | 2.79 | 3.33 | 2.53 | 1.54     | 2.23    |
ppmd     | 2.31 | 0.99 | 2.11 | 1.08 | 2.68 | 2.19 | 2.48 | 2.38 | 2.43 | 3.00 | 2.53 | 1.65     | 2.20    |
fsmx     | 2.10 | 0.79 | 1.89 | 1.48 | 2.52 | 1.84 | 2.21 | 2.24 | 2.29 | 2.91 | 2.35 | 1.63     | 2.06    |
bzz      | 2.25 | 0.76 | 2.13 | 0.78 | 2.67 | 2.00 | 2.40 | 2.52 | 2.60 | 3.19 | 2.52 | 1.44     | 2.16    |
Note that DjVu contributors have several entries in this table. Program compress was written some time ago by Joe Orost. Program ppmd is an improvement of the PPM-C method invented by Paul Howard.
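For readers who want to reproduce figures of this kind on their own files, the sketch below shows how a bits-per-character value is obtained from file sizes, and checks that the Average column of the table is consistent, up to rounding, with the plain mean of the eleven per-file figures. The helper names are hypothetical, and the weighting behind the Weighted column is not specified on this page, so it is not reproduced.

```python
def bits_per_character(original_bytes: int, compressed_bytes: int) -> float:
    """Compressed size in bits divided by the number of input characters (bytes)."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return 8.0 * compressed_bytes / original_bytes


def unweighted_average(values):
    """Plain arithmetic mean of per-file bits-per-character figures."""
    return sum(values) / len(values)


# Per-file figures copied from the table above (text through play).
GZIP_ROW = [2.85, 0.82, 2.24, 1.63, 2.67, 2.71, 3.23, 2.59, 2.65, 3.31, 3.12]
BZZ_ROW = [2.25, 0.76, 2.13, 0.78, 2.67, 2.00, 2.40, 2.52, 2.60, 3.19, 2.52]

# Each mean lands within 0.01 of the published Average column (2.53 and 2.16).
gzip_mean = unweighted_average(GZIP_ROW)
bzz_mean = unweighted_average(BZZ_ROW)
```

A file of 1000 bytes compressed to 250 bytes gives bits_per_character(1000, 250) == 2.0; lower values mean better compression.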