Imported bzip2 modified to build a decompression only dll for use by the ramdisk driver

svn path=/trunk/; revision=2298
This commit is contained in:
Phillip Susi 2001-10-16 20:59:19 +00:00
parent 718510b28a
commit f89bf61ed5
47 changed files with 49532 additions and 0 deletions

View file

@ -0,0 +1,167 @@
0.9.0
~~~~~
First version.
0.9.0a
~~~~~~
Removed 'ranlib' from Makefile, since most modern Unix-es
don't need it, or even know about it.
0.9.0b
~~~~~~
Fixed a problem with error reporting in bzip2.c. This does not effect
the library in any way. Problem is: versions 0.9.0 and 0.9.0a (of the
program proper) compress and decompress correctly, but give misleading
error messages (internal panics) when an I/O error occurs, instead of
reporting the problem correctly. This shouldn't give any data loss
(as far as I can see), but is confusing.
Made the inline declarations disappear for non-GCC compilers.
0.9.0c
~~~~~~
Fixed some problems in the library pertaining to some boundary cases.
This makes the library behave more correctly in those situations. The
fixes apply only to features (calls and parameters) not used by
bzip2.c, so the non-fixedness of them in previous versions has no
effect on reliability of bzip2.c.
In bzlib.c:
* made zero-length BZ_FLUSH work correctly in bzCompress().
* fixed bzWrite/bzRead to ignore zero-length requests.
* fixed bzread to correctly handle read requests after EOF.
* wrong parameter order in call to bzDecompressInit in
bzBuffToBuffDecompress. Fixed.
In compress.c:
* changed setting of nGroups in sendMTFValues() so as to
do a bit better on small files. This _does_ effect
bzip2.c.
0.9.5a
~~~~~~
Major change: add a fallback sorting algorithm (blocksort.c)
to give reasonable behaviour even for very repetitive inputs.
Nuked --repetitive-best and --repetitive-fast since they are
no longer useful.
Minor changes: mostly a whole bunch of small changes/
bugfixes in the driver (bzip2.c). Changes pertaining to the
user interface are:
allow decompression of symlink'd files to stdout
decompress/test files even without .bz2 extension
give more accurate error messages for I/O errors
when compressing/decompressing to stdout, don't catch control-C
read flags from BZIP2 and BZIP environment variables
decline to break hard links to a file unless forced with -f
allow -c flag even with no filenames
preserve file ownerships as far as possible
make -s -1 give the expected block size (100k)
add a flag -q --quiet to suppress nonessential warnings
stop decoding flags after --, so files beginning in - can be handled
resolved inconsistent naming: bzcat or bz2cat ?
bzip2 --help now returns 0
Programming-level changes are:
fixed syntax error in GET_LL4 for Borland C++ 5.02
let bzBuffToBuffDecompress return BZ_DATA_ERROR{_MAGIC}
fix overshoot of mode-string end in bzopen_or_bzdopen
wrapped bzlib.h in #ifdef __cplusplus ... extern "C" { ... }
close file handles under all error conditions
added minor mods so it compiles with DJGPP out of the box
fixed Makefile so it doesn't give problems with BSD make
fix uninitialised memory reads in dlltest.c
0.9.5b
~~~~~~
Open stdin/stdout in binary mode for DJGPP.
0.9.5c
~~~~~~
Changed BZ_N_OVERSHOOT to be ... + 2 instead of ... + 1. The + 1
version could cause the sorted order to be wrong in some extremely
obscure cases. Also changed setting of quadrant in blocksort.c.
0.9.5d
~~~~~~
The only functional change is to make bzlibVersion() in the library
return the correct string. This has no effect whatsoever on the
functioning of the bzip2 program or library. Added a couple of casts
so the library compiles without warnings at level 3 in MS Visual
Studio 6.0. Included a Y2K statement in the file Y2K_INFO. All other
changes are minor documentation changes.
1.0
~~~
Several minor bugfixes and enhancements:
* Large file support. The library uses 64-bit counters to
count the volume of data passing through it. bzip2.c
is now compiled with -D_FILE_OFFSET_BITS=64 to get large
file support from the C library. -v correctly prints out
file sizes greater than 4 gigabytes. All these changes have
been made without assuming a 64-bit platform or a C compiler
which supports 64-bit ints, so, except for the C library
aspect, they are fully portable.
* Decompression robustness. The library/program should be
robust to any corruption of compressed data, detecting and
handling _all_ corruption, instead of merely relying on
the CRCs. What this means is that the program should
never crash, given corrupted data, and the library should
always return BZ_DATA_ERROR.
* Fixed an obscure race-condition bug only ever observed on
Solaris, in which, if you were very unlucky and issued
control-C at exactly the wrong time, both input and output
files would be deleted.
* Don't run out of file handles on test/decompression when
large numbers of files have invalid magic numbers.
* Avoid library namespace pollution. Prefix all exported
symbols with BZ2_.
* Minor sorting enhancements from my DCC2000 paper.
* Advance the version number to 1.0, so as to counteract the
(false-in-this-case) impression some people have that programs
with version numbers less than 1.0 are in someway, experimental,
pre-release versions.
* Create an initial Makefile-libbz2_so to build a shared library.
Yes, I know I should really use libtool et al ...
* Make the program exit with 2 instead of 0 when decompression
fails due to a bad magic number (ie, an invalid bzip2 header).
Also exit with 1 (as the manual claims :-) whenever a diagnostic
message would have been printed AND the corresponding operation
is aborted, for example
bzip2: Output file xx already exists.
When a diagnostic message is printed but the operation is not
aborted, for example
bzip2: Can't guess original name for wurble -- using wurble.out
then the exit value 0 is returned, unless some other problem is
also detected.
I think it corresponds more closely to what the manual claims now.
1.0.1
~~~~~
* Modified dlltest.c so it uses the new BZ2_ naming scheme.
* Modified makefile-msc to fix minor build probs on Win2k.
* Updated README.COMPILATION.PROBLEMS.
There are no functionality changes or bug fixes relative to version
1.0.0. This is just a documentation update + a fix for minor Win32
build problems. For almost everyone, upgrading from 1.0.0 to 1.0.1 is
utterly pointless. Don't bother.

View file

@ -0,0 +1,39 @@
This program, "bzip2" and associated library "libbzip2", are
copyright (C) 1996-2000 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product
documentation would be appreciated but is not required.
3. Altered source versions must be plainly marked as such, and must
not be misrepresented as being the original software.
4. The name of the author may not be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Julian Seward, Cambridge, UK.
jseward@acm.org
bzip2/libbzip2 version 1.0 of 21 March 2000

View file

@ -0,0 +1,17 @@
PATH_TO_TOP = ../..
TARGET_TYPE = dynlink
TARGET_NAME = unbzip2
TARGET_NORC = yes
TARGET_LFLAGS = -nostartfiles -ffreestanding
TARGET_CFLAGS=-Wall -Winline -Os -fomit-frame-pointer -fno-strength-reduce -DBZ_NO_STDIO -DBZ_DECOMPRESS_ONLY $(BIGFILES)
TARGET_OBJECTS = bzlib.o randtable.o crctable.o decompress.o huffman.o dllmain.o
include $(PATH_TO_TOP)/rules.mak
include $(TOOLS_PATH)/helper.mk
test.exe: test.o ../../dk/w32/lib/unbzip2.a
$(CC) -g -o test.exe test.o ../../dk/w32/lib/unbzip2.a
test.o: test.c
$(CC) -g -c test.c

View file

@ -0,0 +1,43 @@
# This Makefile builds a shared version of the library,
# libbz2.so.1.0.1, with soname libbz2.so.1.0,
# at least on x86-Linux (RedHat 5.2),
# with gcc-2.7.2.3. Please see the README file for some
# important info about building the library like this.
SHELL=/bin/sh
CC=gcc
BIGFILES=-D_FILE_OFFSET_BITS=64
CFLAGS=-fpic -fPIC -Wall -Winline -O2 -fomit-frame-pointer -fno-strength-reduce $(BIGFILES)
OBJS= blocksort.o \
huffman.o \
crctable.o \
randtable.o \
compress.o \
decompress.o \
bzlib.o
all: $(OBJS)
$(CC) -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.1 $(OBJS)
$(CC) $(CFLAGS) -o bzip2-shared bzip2.c libbz2.so.1.0.1
rm -f libbz2.so.1.0
ln -s libbz2.so.1.0.1 libbz2.so.1.0
clean:
rm -f $(OBJS) bzip2.o libbz2.so.1.0.1 libbz2.so.1.0 bzip2-shared
blocksort.o: blocksort.c
$(CC) $(CFLAGS) -c blocksort.c
huffman.o: huffman.c
$(CC) $(CFLAGS) -c huffman.c
crctable.o: crctable.c
$(CC) $(CFLAGS) -c crctable.c
randtable.o: randtable.c
$(CC) $(CFLAGS) -c randtable.c
compress.o: compress.c
$(CC) $(CFLAGS) -c compress.c
decompress.o: decompress.c
$(CC) $(CFLAGS) -c decompress.c
bzlib.o: bzlib.c
$(CC) $(CFLAGS) -c bzlib.c

View file

@ -0,0 +1,166 @@
This is the README for bzip2, a block-sorting file compressor, version
1.0. This version is fully compatible with the previous public
releases, bzip2-0.1pl2, bzip2-0.9.0 and bzip2-0.9.5.
bzip2-1.0 is distributed under a BSD-style license. For details,
see the file LICENSE.
Complete documentation is available in Postscript form (manual.ps) or
html (manual_toc.html). A plain-text version of the manual page is
available as bzip2.txt. A statement about Y2K issues is now included
in the file Y2K_INFO.
HOW TO BUILD -- UNIX
Type `make'. This builds the library libbz2.a and then the
programs bzip2 and bzip2recover. Six self-tests are run.
If the self-tests complete ok, carry on to installation:
To install in /usr/bin, /usr/lib, /usr/man and /usr/include, type
make install
To install somewhere else, eg, /xxx/yyy/{bin,lib,man,include}, type
make install PREFIX=/xxx/yyy
If you are (justifiably) paranoid and want to see what 'make install'
is going to do, you can first do
make -n install or
make -n install PREFIX=/xxx/yyy respectively.
The -n instructs make to show the commands it would execute, but
not actually execute them.
HOW TO BUILD -- UNIX, shared library libbz2.so.
Do 'make -f Makefile-libbz2_so'. This Makefile seems to work for
Linux-ELF (RedHat 5.2 on an x86 box), with gcc. I make no claims
that it works for any other platform, though I suspect it probably
will work for most platforms employing both ELF and gcc.
bzip2-shared, a client of the shared library, is also build, but
not self-tested. So I suggest you also build using the normal
Makefile, since that conducts a self-test.
Important note for people upgrading .so's from 0.9.0/0.9.5 to
version 1.0. All the functions in the library have been renamed,
from (eg) bzCompress to BZ2_bzCompress, to avoid namespace pollution.
Unfortunately this means that the libbz2.so created by
Makefile-libbz2_so will not work with any program which used an
older version of the library. Sorry. I do encourage library
clients to make the effort to upgrade to use version 1.0, since
it is both faster and more robust than previous versions.
HOW TO BUILD -- Windows 95, NT, DOS, Mac, etc.
It's difficult for me to support compilation on all these platforms.
My approach is to collect binaries for these platforms, and put them
on the master web page (http://sourceware.cygnus.com/bzip2). Look
there. However (FWIW), bzip2-1.0 is very standard ANSI C and should
compile unmodified with MS Visual C. For Win32, there is one
important caveat: in bzip2.c, you must set BZ_UNIX to 0 and
BZ_LCCWIN32 to 1 before building. If you have difficulties building,
you might want to read README.COMPILATION.PROBLEMS.
VALIDATION
Correct operation, in the sense that a compressed file can always be
decompressed to reproduce the original, is obviously of paramount
importance. To validate bzip2, I used a modified version of Mark
Nelson's churn program. Churn is an automated test driver which
recursively traverses a directory structure, using bzip2 to compress
and then decompress each file it encounters, and checking that the
decompressed data is the same as the original. There are more details
in Section 4 of the user guide.
Please read and be aware of the following:
WARNING:
This program (attempts to) compress data by performing several
non-trivial transformations on it. Unless you are 100% familiar
with *all* the algorithms contained herein, and with the
consequences of modifying them, you should NOT meddle with the
compression or decompression machinery. Incorrect changes can and
very likely *will* lead to disastrous loss of data.
DISCLAIMER:
I TAKE NO RESPONSIBILITY FOR ANY LOSS OF DATA ARISING FROM THE
USE OF THIS PROGRAM, HOWSOEVER CAUSED.
Every compression of a file implies an assumption that the
compressed file can be decompressed to reproduce the original.
Great efforts in design, coding and testing have been made to
ensure that this program works correctly. However, the complexity
of the algorithms, and, in particular, the presence of various
special cases in the code which occur with very low but non-zero
probability make it impossible to rule out the possibility of bugs
remaining in the program. DO NOT COMPRESS ANY DATA WITH THIS
PROGRAM UNLESS YOU ARE PREPARED TO ACCEPT THE POSSIBILITY, HOWEVER
SMALL, THAT THE DATA WILL NOT BE RECOVERABLE.
That is not to say this program is inherently unreliable. Indeed,
I very much hope the opposite is true. bzip2 has been carefully
constructed and extensively tested.
PATENTS:
To the best of my knowledge, bzip2 does not use any patented
algorithms. However, I do not have the resources available to
carry out a full patent search. Therefore I cannot give any
guarantee of the above statement.
End of legalities.
WHAT'S NEW IN 0.9.0 (as compared to 0.1pl2) ?
* Approx 10% faster compression, 30% faster decompression
* -t (test mode) is a lot quicker
* Can decompress concatenated compressed files
* Programming interface, so programs can directly read/write .bz2 files
* Less restrictive (BSD-style) licensing
* Flag handling more compatible with GNU gzip
* Much more documentation, i.e., a proper user manual
* Hopefully, improved portability (at least of the library)
WHAT'S NEW IN 0.9.5 ?
* Compression speed is much less sensitive to the input
data than in previous versions. Specifically, the very
slow performance caused by repetitive data is fixed.
* Many small improvements in file and flag handling.
* A Y2K statement.
WHAT'S NEW IN 1.0
See the CHANGES file.
I hope you find bzip2 useful. Feel free to contact me at
jseward@acm.org
if you have any suggestions or queries. Many people mailed me with
comments, suggestions and patches after the releases of bzip-0.15,
bzip-0.21, bzip2-0.1pl2 and bzip2-0.9.0, and the changes in bzip2 are
largely a result of this feedback. I thank you for your comments.
At least for the time being, bzip2's "home" is (or can be reached via)
http://www.muraroa.demon.co.uk.
Julian Seward
jseward@acm.org
Cambridge, UK
18 July 1996 (version 0.15)
25 August 1996 (version 0.21)
7 August 1997 (bzip2, version 0.1)
29 August 1997 (bzip2, version 0.1pl2)
23 August 1998 (bzip2, version 0.9.0)
8 June 1999 (bzip2, version 0.9.5)
4 Sept 1999 (bzip2, version 0.9.5d)
5 May 2000 (bzip2, version 1.0pre8)

View file

@ -0,0 +1,130 @@
bzip2-1.0 should compile without problems on the vast majority of
platforms. Using the supplied Makefile, I've built and tested it
myself for x86-linux, sparc-solaris, alpha-linux, x86-cygwin32 and
alpha-tru64unix. With makefile.msc, Visual C++ 6.0 and nmake, you can
build a native Win32 version too. Large file support seems to work
correctly on at least alpha-tru64unix and x86-cygwin32 (on Windows
2000).
When I say "large file" I mean a file of size 2,147,483,648 (2^31)
bytes or above. Many older OSs can't handle files above this size,
but many newer ones can. Large files are pretty huge -- most files
you'll encounter are not Large Files.
Earlier versions of bzip2 (0.1, 0.9.0, 0.9.5) compiled on a wide
variety of platforms without difficulty, and I hope this version will
continue in that tradition. However, in order to support large files,
I've had to include the define -D_FILE_OFFSET_BITS=64 in the Makefile.
This can cause problems.
The technique of adding -D_FILE_OFFSET_BITS=64 to get large file
support is, as far as I know, the Recommended Way to get correct large
file support. For more details, see the Large File Support
Specification, published by the Large File Summit, at
http://www.sas.com/standard/large.file/
As a general comment, if you get compilation errors which you think
are related to large file support, try removing the above define from
the Makefile, ie, delete the line
BIGFILES=-D_FILE_OFFSET_BITS=64
from the Makefile, and do 'make clean ; make'. This will give you a
version of bzip2 without large file support, which, for most
applications, is probably not a problem.
Alternatively, try some of the platform-specific hints listed below.
You can use the spewG.c program to generate huge files to test bzip2's
large file support, if you are feeling paranoid. Be aware though that
any compilation problems which affect bzip2 will also affect spewG.c,
alas.
Known problems as of 1.0pre8:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* HP/UX 10.20 and 11.00, using gcc (2.7.2.3 and 2.95.2): A large
number of warnings appear, including the following:
/usr/include/sys/resource.h: In function `getrlimit':
/usr/include/sys/resource.h:168:
warning: implicit declaration of function `__getrlimit64'
/usr/include/sys/resource.h: In function `setrlimit':
/usr/include/sys/resource.h:170:
warning: implicit declaration of function `__setrlimit64'
This would appear to be a problem with large file support, header
files and gcc. gcc may or may not give up at this point. If it
fails, you might be able to improve matters by adding
-D__STDC_EXT__=1
to the BIGFILES variable in the Makefile (ie, change its definition
to
BIGFILES=-D_FILE_OFFSET_BITS=64 -D__STDC_EXT__=1
Even if gcc does produce a binary which appears to work (ie passes
its self-tests), you might want to test it to see if it works properly
on large files.
* HP/UX 10.20 and 11.00, using HP's cc compiler.
No specific problems for this combination, except that you'll need to
specify the -Ae flag, and zap the gcc-specific stuff
-Wall -Winline -O2 -fomit-frame-pointer -fno-strength-reduce.
You should retain -D_FILE_OFFSET_BITS=64 in order to get large
file support -- which is reported to work ok for this HP/UX + cc
combination.
* SunOS 4.1.X.
Amazingly, there are still people out there using this venerable old
banger. I shouldn't be too rude -- I started life on SunOS, and
it was a pretty darn good OS, way back then. Anyway:
SunOS doesn't seem to have strerror(), so you'll have to use
perror(), perhaps by doing adding this (warning: UNTESTED CODE):
char* strerror ( int errnum )
{
if (errnum < 0 || errnum >= sys_nerr)
return "Unknown error";
else
return sys_errlist[errnum];
}
Or you could comment out the relevant calls to strerror; they're
not mission-critical. Or you could upgrade to Solaris. Ha ha ha!
(what?? you think I've got Bad Attitude?)
* Making a shared library on Solaris. (Not really a compilation
problem, but many people ask ...)
Firstly, if you have Solaris 8, either you have libbz2.so already
on your system, or you can install it from the Solaris CD.
Secondly, be aware that there are potential naming conflicts
between the .so file supplied with Solaris 8, and the .so file
which Makefile-libbz2_so will make. Makefile-libbz2_so creates
a .so which has the names which I intend to be "official" as
of version 1.0.0 and onwards. Unfortunately, the .so in
Solaris 8 appeared before I decided on the final names, so
the two libraries are incompatible. We have since communicated
and I hope that the problems will have been solved in the next
version of Solaris, whenever that might appear.
All that said: you might be able to get somewhere
by finding the line in Makefile-libbz2_so which says
$(CC) -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.1 $(OBJS)
and replacing with
($CC) -G -shared -o libbz2.so.1.0.1 -h libbz2.so.1.0 $(OBJS)
If gcc objects to the combination -fpic -fPIC, get rid of
the second one, leaving just "-fpic".
That's the end of the currently known compilation problems.

View file

@ -0,0 +1,8 @@
2001-10-16: Imported bzip2 code into reactos project to create a decompression
library for use by the ramdisk driver to decompress the ramdisk image into mem
Modified makefile and some source code to build a .dll with only the
decompression code. There are 3 exports: the decompression routine, and two
function pointers that must be initialized to C malloc and free routines.
- Phillip Susi

View file

@ -0,0 +1,34 @@
Y2K status of bzip2 and libbzip2, versions 0.1, 0.9.0 and 0.9.5
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Informally speaking:
bzip2 is a compression program built on top of libbzip2,
a library which does the real work of compression and
decompression. As far as I am aware, libbzip2 does not have
any date-related code at all.
bzip2 itself copies dates from source to destination files
when compressing or decompressing, using the 'stat' and 'utime'
UNIX system calls. It doesn't examine, manipulate or store the
dates in any way. So as far as I can see, there shouldn't be any
problem with bzip2 providing 'stat' and 'utime' work correctly
on your system.
On non-unix platforms (those for which BZ_UNIX in bzip2.c is
not set to 1), bzip2 doesn't even do the date copying.
Overall, informally speaking, I don't think bzip2 or libbzip2
have a Y2K problem.
Formally speaking:
I am not prepared to offer you any assurance whatsoever
regarding Y2K issues in my software. You alone assume the
entire risk of using the software. The disclaimer of liability
in the LICENSE file in the bzip2 source distribution continues
to apply on this issue as with every other issue pertaining
to the software.
Julian Seward
Cambridge, UK
25 August 1999

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,439 @@
.PU
.TH bzip2 1
.SH NAME
bzip2, bunzip2 \- a block-sorting file compressor, v1.0
.br
bzcat \- decompresses files to stdout
.br
bzip2recover \- recovers data from damaged bzip2 files
.SH SYNOPSIS
.ll +8
.B bzip2
.RB [ " \-cdfkqstvzVL123456789 " ]
[
.I "filenames \&..."
]
.ll -8
.br
.B bunzip2
.RB [ " \-fkvsVL " ]
[
.I "filenames \&..."
]
.br
.B bzcat
.RB [ " \-s " ]
[
.I "filenames \&..."
]
.br
.B bzip2recover
.I "filename"
.SH DESCRIPTION
.I bzip2
compresses files using the Burrows-Wheeler block sorting
text compression algorithm, and Huffman coding. Compression is
generally considerably better than that achieved by more conventional
LZ77/LZ78-based compressors, and approaches the performance of the PPM
family of statistical compressors.
The command-line options are deliberately very similar to
those of
.I GNU gzip,
but they are not identical.
.I bzip2
expects a list of file names to accompany the
command-line flags. Each file is replaced by a compressed version of
itself, with the name "original_name.bz2".
Each compressed file
has the same modification date, permissions, and, when possible,
ownership as the corresponding original, so that these properties can
be correctly restored at decompression time. File name handling is
naive in the sense that there is no mechanism for preserving original
file names, permissions, ownerships or dates in filesystems which lack
these concepts, or have serious file name length restrictions, such as
MS-DOS.
.I bzip2
and
.I bunzip2
will by default not overwrite existing
files. If you want this to happen, specify the \-f flag.
If no file names are specified,
.I bzip2
compresses from standard
input to standard output. In this case,
.I bzip2
will decline to
write compressed output to a terminal, as this would be entirely
incomprehensible and therefore pointless.
.I bunzip2
(or
.I bzip2 \-d)
decompresses all
specified files. Files which were not created by
.I bzip2
will be detected and ignored, and a warning issued.
.I bzip2
attempts to guess the filename for the decompressed file
from that of the compressed file as follows:
filename.bz2 becomes filename
filename.bz becomes filename
filename.tbz2 becomes filename.tar
filename.tbz becomes filename.tar
anyothername becomes anyothername.out
If the file does not end in one of the recognised endings,
.I .bz2,
.I .bz,
.I .tbz2
or
.I .tbz,
.I bzip2
complains that it cannot
guess the name of the original file, and uses the original name
with
.I .out
appended.
As with compression, supplying no
filenames causes decompression from
standard input to standard output.
.I bunzip2
will correctly decompress a file which is the
concatenation of two or more compressed files. The result is the
concatenation of the corresponding uncompressed files. Integrity
testing (\-t)
of concatenated
compressed files is also supported.
You can also compress or decompress files to the standard output by
giving the \-c flag. Multiple files may be compressed and
decompressed like this. The resulting outputs are fed sequentially to
stdout. Compression of multiple files
in this manner generates a stream
containing multiple compressed file representations. Such a stream
can be decompressed correctly only by
.I bzip2
version 0.9.0 or
later. Earlier versions of
.I bzip2
will stop after decompressing
the first file in the stream.
.I bzcat
(or
.I bzip2 -dc)
decompresses all specified files to
the standard output.
.I bzip2
will read arguments from the environment variables
.I BZIP2
and
.I BZIP,
in that order, and will process them
before any arguments read from the command line. This gives a
convenient way to supply default arguments.
Compression is always performed, even if the compressed
file is slightly
larger than the original. Files of less than about one hundred bytes
tend to get larger, since the compression mechanism has a constant
overhead in the region of 50 bytes. Random data (including the output
of most file compressors) is coded at about 8.05 bits per byte, giving
an expansion of around 0.5%.
As a self-check for your protection,
.I
bzip2
uses 32-bit CRCs to
make sure that the decompressed version of a file is identical to the
original. This guards against corruption of the compressed data, and
against undetected bugs in
.I bzip2
(hopefully very unlikely). The
chances of data corruption going undetected is microscopic, about one
chance in four billion for each file processed. Be aware, though, that
the check occurs upon decompression, so it can only tell you that
something is wrong. It can't help you
recover the original uncompressed
data. You can use
.I bzip2recover
to try to recover data from
damaged files.
Return values: 0 for a normal exit, 1 for environmental problems (file
not found, invalid flags, I/O errors, &c), 2 to indicate a corrupt
compressed file, 3 for an internal consistency error (eg, bug) which
caused
.I bzip2
to panic.
.SH OPTIONS
.TP
.B \-c --stdout
Compress or decompress to standard output.
.TP
.B \-d --decompress
Force decompression.
.I bzip2,
.I bunzip2
and
.I bzcat
are
really the same program, and the decision about what actions to take is
done on the basis of which name is used. This flag overrides that
mechanism, and forces
.I bzip2
to decompress.
.TP
.B \-z --compress
The complement to \-d: forces compression, regardless of the
invokation name.
.TP
.B \-t --test
Check integrity of the specified file(s), but don't decompress them.
This really performs a trial decompression and throws away the result.
.TP
.B \-f --force
Force overwrite of output files. Normally,
.I bzip2
will not overwrite
existing output files. Also forces
.I bzip2
to break hard links
to files, which it otherwise wouldn't do.
.TP
.B \-k --keep
Keep (don't delete) input files during compression
or decompression.
.TP
.B \-s --small
Reduce memory usage, for compression, decompression and testing. Files
are decompressed and tested using a modified algorithm which only
requires 2.5 bytes per block byte. This means any file can be
decompressed in 2300k of memory, albeit at about half the normal speed.
During compression, \-s selects a block size of 200k, which limits
memory use to around the same figure, at the expense of your compression
ratio. In short, if your machine is low on memory (8 megabytes or
less), use \-s for everything. See MEMORY MANAGEMENT below.
.TP
.B \-q --quiet
Suppress non-essential warning messages. Messages pertaining to
I/O errors and other critical events will not be suppressed.
.TP
.B \-v --verbose
Verbose mode -- show the compression ratio for each file processed.
Further \-v's increase the verbosity level, spewing out lots of
information which is primarily of interest for diagnostic purposes.
.TP
.B \-L --license -V --version
Display the software version, license terms and conditions.
.TP
.B \-1 to \-9
Set the block size to 100 k, 200 k .. 900 k when compressing. Has no
effect when decompressing. See MEMORY MANAGEMENT below.
.TP
.B \--
Treats all subsequent arguments as file names, even if they start
with a dash. This is so you can handle files with names beginning
with a dash, for example: bzip2 \-- \-myfilename.
.TP
.B \--repetitive-fast --repetitive-best
These flags are redundant in versions 0.9.5 and above. They provided
some coarse control over the behaviour of the sorting algorithm in
earlier versions, which was sometimes useful. 0.9.5 and above have an
improved algorithm which renders these flags irrelevant.
.SH MEMORY MANAGEMENT
.I bzip2
compresses large files in blocks. The block size affects
both the compression ratio achieved, and the amount of memory needed for
compression and decompression. The flags \-1 through \-9
specify the block size to be 100,000 bytes through 900,000 bytes (the
default) respectively. At decompression time, the block size used for
compression is read from the header of the compressed file, and
.I bunzip2
then allocates itself just enough memory to decompress
the file. Since block sizes are stored in compressed files, it follows
that the flags \-1 to \-9 are irrelevant to and so ignored
during decompression.
Compression and decompression requirements,
in bytes, can be estimated as:
Compression: 400k + ( 8 x block size )
Decompression: 100k + ( 4 x block size ), or
100k + ( 2.5 x block size )
Larger block sizes give rapidly diminishing marginal returns. Most of
the compression comes from the first two or three hundred k of block
size, a fact worth bearing in mind when using
.I bzip2
on small machines.
It is also important to appreciate that the decompression memory
requirement is set at compression time by the choice of block size.
For files compressed with the default 900k block size,
.I bunzip2
will require about 3700 kbytes to decompress. To support decompression
of any file on a 4 megabyte machine,
.I bunzip2
has an option to
decompress using approximately half this amount of memory, about 2300
kbytes. Decompression speed is also halved, so you should use this
option only where necessary. The relevant flag is -s.
In general, try and use the largest block size memory constraints allow,
since that maximises the compression achieved. Compression and
decompression speed are virtually unaffected by block size.
Another significant point applies to files which fit in a single block
-- that means most files you'd encounter using a large block size. The
amount of real memory touched is proportional to the size of the file,
since the file is smaller than a block. For example, compressing a file
20,000 bytes long with the flag -9 will cause the compressor to
allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560
kbytes of it. Similarly, the decompressor will allocate 3700k but only
touch 100k + 20000 * 4 = 180 kbytes.
Here is a table which summarises the maximum memory usage for different
block sizes. Also recorded is the total compressed size for 14 files of
the Calgary Text Compression Corpus totalling 3,141,622 bytes. This
column gives some feel for how compression varies with block size.
These figures tend to understate the advantage of larger block sizes for
larger files, since the Corpus is dominated by smaller files.
Compress Decompress Decompress Corpus
Flag usage usage -s usage Size
-1 1200k 500k 350k 914704
-2 2000k 900k 600k 877703
-3 2800k 1300k 850k 860338
-4 3600k 1700k 1100k 846899
-5 4400k 2100k 1350k 845160
-6 5200k 2500k 1600k 838626
-7 6100k 2900k 1850k 834096
-8 6800k 3300k 2100k 828642
-9 7600k 3700k 2350k 828642
.SH RECOVERING DATA FROM DAMAGED FILES
.I bzip2
compresses files in blocks, usually 900kbytes long. Each
block is handled independently. If a media or transmission error causes
a multi-block .bz2
file to become damaged, it may be possible to
recover data from the undamaged blocks in the file.
The compressed representation of each block is delimited by a 48-bit
pattern, which makes it possible to find the block boundaries with
reasonable certainty. Each block also carries its own 32-bit CRC, so
damaged blocks can be distinguished from undamaged ones.
.I bzip2recover
is a simple program whose purpose is to search for
blocks in .bz2 files, and write each block out into its own .bz2
file. You can then use
.I bzip2
\-t
to test the
integrity of the resulting files, and decompress those which are
undamaged.
.I bzip2recover
takes a single argument, the name of the damaged file,
and writes a number of files "rec0001file.bz2",
"rec0002file.bz2", etc, containing the extracted blocks.
The output filenames are designed so that the use of
wildcards in subsequent processing -- for example,
"bzip2 -dc rec*file.bz2 > recovered_data" -- lists the files in
the correct order.
.I bzip2recover
should be of most use dealing with large .bz2
files, as these will contain many blocks. It is clearly
futile to use it on damaged single-block files, since a
damaged block cannot be recovered. If you wish to minimise
any potential data loss through media or transmission errors,
you might consider compressing with a smaller
block size.
.SH PERFORMANCE NOTES
The sorting phase of compression gathers together similar strings in the
file. Because of this, files containing very long runs of repeated
symbols, like "aabaabaabaab ..." (repeated several hundred times) may
compress more slowly than normal. Versions 0.9.5 and above fare much
better than previous versions in this respect. The ratio between
worst-case and average-case compression time is in the region of 10:1.
For previous versions, this figure was more like 100:1. You can use the
\-vvvv option to monitor progress in great detail, if you want.
Decompression speed is unaffected by these phenomena.
.I bzip2
usually allocates several megabytes of memory to operate
in, and then charges all over it in a fairly random fashion. This means
that performance, both for compressing and decompressing, is largely
determined by the speed at which your machine can service cache misses.
Because of this, small changes to the code to reduce the miss rate have
been observed to give disproportionately large performance improvements.
I imagine
.I bzip2
will perform best on machines with very large caches.
.SH CAVEATS
I/O error messages are not as helpful as they could be.
.I bzip2
tries hard to detect I/O errors and exit cleanly, but the details of
what the problem is sometimes seem rather misleading.
This manual page pertains to version 1.0 of
.I bzip2.
Compressed
data created by this version is entirely forwards and backwards
compatible with the previous public releases, versions 0.1pl2, 0.9.0
and 0.9.5,
but with the following exception: 0.9.0 and above can correctly
decompress multiple concatenated compressed files. 0.1pl2 cannot do
this; it will stop after decompressing just the first file in the
stream.
.I bzip2recover
uses 32-bit integers to represent bit positions in
compressed files, so it cannot handle compressed files more than 512
megabytes long. This could easily be fixed.
.SH AUTHOR
Julian Seward, jseward@acm.org.
http://sourceware.cygnus.com/bzip2
http://www.muraroa.demon.co.uk
The ideas embodied in
.I bzip2
are due to (at least) the following
people: Michael Burrows and David Wheeler (for the block sorting
transformation), David Wheeler (again, for the Huffman coder), Peter
Fenwick (for the structured coding model in the original
.I bzip,
and many refinements), and Alistair Moffat, Radford Neal and Ian Witten
(for the arithmetic coder in the original
.I bzip).
I am much
indebted for their help, support and advice. See the manual in the
source distribution for pointers to sources of documentation. Christian
von Roques encouraged me to look for faster sorting algorithms, so as to
speed up compression. Bela Lubkin encouraged me to improve the
worst-case compression performance. Many people sent patches, helped
with portability problems, lent machines, gave advice and were generally
helpful.

View file

@ -0,0 +1,462 @@
bzip2(1) bzip2(1)
NNAAMMEE
bzip2, bunzip2 - a block-sorting file compressor, v1.0
bzcat - decompresses files to stdout
bzip2recover - recovers data from damaged bzip2 files
SSYYNNOOPPSSIISS
bbzziipp22 [ --ccddffkkqqssttvvzzVVLL112233445566778899 ] [ _f_i_l_e_n_a_m_e_s _._._. ]
bbuunnzziipp22 [ --ffkkvvssVVLL ] [ _f_i_l_e_n_a_m_e_s _._._. ]
bbzzccaatt [ --ss ] [ _f_i_l_e_n_a_m_e_s _._._. ]
bbzziipp22rreeccoovveerr _f_i_l_e_n_a_m_e
DDEESSCCRRIIPPTTIIOONN
_b_z_i_p_2 compresses files using the Burrows-Wheeler block
sorting text compression algorithm, and Huffman coding.
Compression is generally considerably better than that
achieved by more conventional LZ77/LZ78-based compressors,
and approaches the performance of the PPM family of sta-
tistical compressors.
The command-line options are deliberately very similar to
those of _G_N_U _g_z_i_p_, but they are not identical.
_b_z_i_p_2 expects a list of file names to accompany the com-
mand-line flags. Each file is replaced by a compressed
version of itself, with the name "original_name.bz2".
Each compressed file has the same modification date, per-
missions, and, when possible, ownership as the correspond-
ing original, so that these properties can be correctly
restored at decompression time. File name handling is
naive in the sense that there is no mechanism for preserv-
ing original file names, permissions, ownerships or dates
in filesystems which lack these concepts, or have serious
file name length restrictions, such as MS-DOS.
_b_z_i_p_2 and _b_u_n_z_i_p_2 will by default not overwrite existing
files. If you want this to happen, specify the -f flag.
If no file names are specified, _b_z_i_p_2 compresses from
standard input to standard output. In this case, _b_z_i_p_2
will decline to write compressed output to a terminal, as
this would be entirely incomprehensible and therefore
pointless.
_b_u_n_z_i_p_2 (or _b_z_i_p_2 _-_d_) decompresses all specified files.
Files which were not created by _b_z_i_p_2 will be detected and
ignored, and a warning issued. _b_z_i_p_2 attempts to guess
the filename for the decompressed file from that of the
compressed file as follows:
filename.bz2 becomes filename
filename.bz becomes filename
filename.tbz2 becomes filename.tar
1
bzip2(1) bzip2(1)
filename.tbz becomes filename.tar
anyothername becomes anyothername.out
If the file does not end in one of the recognised endings,
_._b_z_2_, _._b_z_, _._t_b_z_2 or _._t_b_z_, _b_z_i_p_2 complains that it cannot
guess the name of the original file, and uses the original
name with _._o_u_t appended.
As with compression, supplying no filenames causes decom-
pression from standard input to standard output.
_b_u_n_z_i_p_2 will correctly decompress a file which is the con-
catenation of two or more compressed files. The result is
the concatenation of the corresponding uncompressed files.
Integrity testing (-t) of concatenated compressed files is
also supported.
You can also compress or decompress files to the standard
output by giving the -c flag. Multiple files may be com-
pressed and decompressed like this. The resulting outputs
are fed sequentially to stdout. Compression of multiple
files in this manner generates a stream containing multi-
ple compressed file representations. Such a stream can be
decompressed correctly only by _b_z_i_p_2 version 0.9.0 or
later. Earlier versions of _b_z_i_p_2 will stop after decom-
pressing the first file in the stream.
_b_z_c_a_t (or _b_z_i_p_2 _-_d_c_) decompresses all specified files to
the standard output.
_b_z_i_p_2 will read arguments from the environment variables
_B_Z_I_P_2 and _B_Z_I_P_, in that order, and will process them
before any arguments read from the command line. This
gives a convenient way to supply default arguments.
Compression is always performed, even if the compressed
file is slightly larger than the original. Files of less
than about one hundred bytes tend to get larger, since the
compression mechanism has a constant overhead in the
region of 50 bytes. Random data (including the output of
most file compressors) is coded at about 8.05 bits per
byte, giving an expansion of around 0.5%.
As a self-check for your protection, _b_z_i_p_2 uses 32-bit
CRCs to make sure that the decompressed version of a file
is identical to the original. This guards against corrup-
tion of the compressed data, and against undetected bugs
in _b_z_i_p_2 (hopefully very unlikely). The chances of data
corruption going undetected is microscopic, about one
chance in four billion for each file processed. Be aware,
though, that the check occurs upon decompression, so it
can only tell you that something is wrong. It can't help
you recover the original uncompressed data. You can use
_b_z_i_p_2_r_e_c_o_v_e_r to try to recover data from damaged files.
2
bzip2(1) bzip2(1)
Return values: 0 for a normal exit, 1 for environmental
problems (file not found, invalid flags, I/O errors, &c),
2 to indicate a corrupt compressed file, 3 for an internal
consistency error (eg, bug) which caused _b_z_i_p_2 to panic.
OOPPTTIIOONNSS
--cc ----ssttddoouutt
Compress or decompress to standard output.
--dd ----ddeeccoommpprreessss
Force decompression. _b_z_i_p_2_, _b_u_n_z_i_p_2 and _b_z_c_a_t are
really the same program, and the decision about
what actions to take is done on the basis of which
name is used. This flag overrides that mechanism,
and forces _b_z_i_p_2 to decompress.
--zz ----ccoommpprreessss
The complement to -d: forces compression, regard-
less of the invokation name.
--tt ----tteesstt
Check integrity of the specified file(s), but don't
decompress them. This really performs a trial
decompression and throws away the result.
--ff ----ffoorrccee
Force overwrite of output files. Normally, _b_z_i_p_2
will not overwrite existing output files. Also
forces _b_z_i_p_2 to break hard links to files, which it
otherwise wouldn't do.
--kk ----kkeeeepp
Keep (don't delete) input files during compression
or decompression.
--ss ----ssmmaallll
Reduce memory usage, for compression, decompression
and testing. Files are decompressed and tested
using a modified algorithm which only requires 2.5
bytes per block byte. This means any file can be
decompressed in 2300k of memory, albeit at about
half the normal speed.
During compression, -s selects a block size of
200k, which limits memory use to around the same
figure, at the expense of your compression ratio.
In short, if your machine is low on memory (8
megabytes or less), use -s for everything. See
MEMORY MANAGEMENT below.
--qq ----qquuiieett
Suppress non-essential warning messages. Messages
pertaining to I/O errors and other critical events
3
bzip2(1) bzip2(1)
will not be suppressed.
--vv ----vveerrbboossee
Verbose mode -- show the compression ratio for each
file processed. Further -v's increase the ver-
bosity level, spewing out lots of information which
is primarily of interest for diagnostic purposes.
--LL ----lliicceennssee --VV ----vveerrssiioonn
Display the software version, license terms and
conditions.
--11 ttoo --99
Set the block size to 100 k, 200 k .. 900 k when
compressing. Has no effect when decompressing.
See MEMORY MANAGEMENT below.
---- Treats all subsequent arguments as file names, even
if they start with a dash. This is so you can han-
dle files with names beginning with a dash, for
example: bzip2 -- -myfilename.
----rreeppeettiittiivvee--ffaasstt ----rreeppeettiittiivvee--bbeesstt
These flags are redundant in versions 0.9.5 and
above. They provided some coarse control over the
behaviour of the sorting algorithm in earlier ver-
sions, which was sometimes useful. 0.9.5 and above
have an improved algorithm which renders these
flags irrelevant.
MMEEMMOORRYY MMAANNAAGGEEMMEENNTT
_b_z_i_p_2 compresses large files in blocks. The block size
affects both the compression ratio achieved, and the
amount of memory needed for compression and decompression.
The flags -1 through -9 specify the block size to be
100,000 bytes through 900,000 bytes (the default) respec-
tively. At decompression time, the block size used for
compression is read from the header of the compressed
file, and _b_u_n_z_i_p_2 then allocates itself just enough memory
to decompress the file. Since block sizes are stored in
compressed files, it follows that the flags -1 to -9 are
irrelevant to and so ignored during decompression.
Compression and decompression requirements, in bytes, can
be estimated as:
Compression: 400k + ( 8 x block size )
Decompression: 100k + ( 4 x block size ), or
100k + ( 2.5 x block size )
Larger block sizes give rapidly diminishing marginal
returns. Most of the compression comes from the first two
4
bzip2(1) bzip2(1)
or three hundred k of block size, a fact worth bearing in
mind when using _b_z_i_p_2 on small machines. It is also
important to appreciate that the decompression memory
requirement is set at compression time by the choice of
block size.
For files compressed with the default 900k block size,
_b_u_n_z_i_p_2 will require about 3700 kbytes to decompress. To
support decompression of any file on a 4 megabyte machine,
_b_u_n_z_i_p_2 has an option to decompress using approximately
half this amount of memory, about 2300 kbytes. Decompres-
sion speed is also halved, so you should use this option
only where necessary. The relevant flag is -s.
In general, try and use the largest block size memory con-
straints allow, since that maximises the compression
achieved. Compression and decompression speed are virtu-
ally unaffected by block size.
Another significant point applies to files which fit in a
single block -- that means most files you'd encounter
using a large block size. The amount of real memory
touched is proportional to the size of the file, since the
file is smaller than a block. For example, compressing a
file 20,000 bytes long with the flag -9 will cause the
compressor to allocate around 7600k of memory, but only
touch 400k + 20000 * 8 = 560 kbytes of it. Similarly, the
decompressor will allocate 3700k but only touch 100k +
20000 * 4 = 180 kbytes.
Here is a table which summarises the maximum memory usage
for different block sizes. Also recorded is the total
compressed size for 14 files of the Calgary Text Compres-
sion Corpus totalling 3,141,622 bytes. This column gives
some feel for how compression varies with block size.
These figures tend to understate the advantage of larger
block sizes for larger files, since the Corpus is domi-
nated by smaller files.
Compress Decompress Decompress Corpus
Flag usage usage -s usage Size
-1 1200k 500k 350k 914704
-2 2000k 900k 600k 877703
-3 2800k 1300k 850k 860338
-4 3600k 1700k 1100k 846899
-5 4400k 2100k 1350k 845160
-6 5200k 2500k 1600k 838626
-7 6100k 2900k 1850k 834096
-8 6800k 3300k 2100k 828642
-9 7600k 3700k 2350k 828642
5
bzip2(1) bzip2(1)
RREECCOOVVEERRIINNGG DDAATTAA FFRROOMM DDAAMMAAGGEEDD FFIILLEESS
_b_z_i_p_2 compresses files in blocks, usually 900kbytes long.
Each block is handled independently. If a media or trans-
mission error causes a multi-block .bz2 file to become
damaged, it may be possible to recover data from the
undamaged blocks in the file.
The compressed representation of each block is delimited
by a 48-bit pattern, which makes it possible to find the
block boundaries with reasonable certainty. Each block
also carries its own 32-bit CRC, so damaged blocks can be
distinguished from undamaged ones.
_b_z_i_p_2_r_e_c_o_v_e_r is a simple program whose purpose is to
search for blocks in .bz2 files, and write each block out
into its own .bz2 file. You can then use _b_z_i_p_2 -t to test
the integrity of the resulting files, and decompress those
which are undamaged.
_b_z_i_p_2_r_e_c_o_v_e_r takes a single argument, the name of the dam-
aged file, and writes a number of files "rec0001file.bz2",
"rec0002file.bz2", etc, containing the extracted blocks.
The output filenames are designed so that the use of
wildcards in subsequent processing -- for example, "bzip2
-dc rec*file.bz2 > recovered_data" -- lists the files in
the correct order.
_b_z_i_p_2_r_e_c_o_v_e_r should be of most use dealing with large .bz2
files, as these will contain many blocks. It is clearly
futile to use it on damaged single-block files, since a
damaged block cannot be recovered. If you wish to min-
imise any potential data loss through media or transmis-
sion errors, you might consider compressing with a smaller
block size.
PPEERRFFOORRMMAANNCCEE NNOOTTEESS
The sorting phase of compression gathers together similar
strings in the file. Because of this, files containing
very long runs of repeated symbols, like "aabaabaabaab
..." (repeated several hundred times) may compress more
slowly than normal. Versions 0.9.5 and above fare much
better than previous versions in this respect. The ratio
between worst-case and average-case compression time is in
the region of 10:1. For previous versions, this figure
was more like 100:1. You can use the -vvvv option to mon-
itor progress in great detail, if you want.
Decompression speed is unaffected by these phenomena.
_b_z_i_p_2 usually allocates several megabytes of memory to
operate in, and then charges all over it in a fairly ran-
dom fashion. This means that performance, both for com-
pressing and decompressing, is largely determined by the
6
bzip2(1) bzip2(1)
speed at which your machine can service cache misses.
Because of this, small changes to the code to reduce the
miss rate have been observed to give disproportionately
large performance improvements. I imagine _b_z_i_p_2 will per-
form best on machines with very large caches.
CCAAVVEEAATTSS
I/O error messages are not as helpful as they could be.
_b_z_i_p_2 tries hard to detect I/O errors and exit cleanly,
but the details of what the problem is sometimes seem
rather misleading.
This manual page pertains to version 1.0 of _b_z_i_p_2_. Com-
pressed data created by this version is entirely forwards
and backwards compatible with the previous public
releases, versions 0.1pl2, 0.9.0 and 0.9.5, but with the
following exception: 0.9.0 and above can correctly decom-
press multiple concatenated compressed files. 0.1pl2 can-
not do this; it will stop after decompressing just the
first file in the stream.
_b_z_i_p_2_r_e_c_o_v_e_r uses 32-bit integers to represent bit posi-
tions in compressed files, so it cannot handle compressed
files more than 512 megabytes long. This could easily be
fixed.
AAUUTTHHOORR
Julian Seward, jseward@acm.org.
http://sourceware.cygnus.com/bzip2
http://www.muraroa.demon.co.uk
The ideas embodied in _b_z_i_p_2 are due to (at least) the fol-
lowing people: Michael Burrows and David Wheeler (for the
block sorting transformation), David Wheeler (again, for
the Huffman coder), Peter Fenwick (for the structured cod-
ing model in the original _b_z_i_p_, and many refinements), and
Alistair Moffat, Radford Neal and Ian Witten (for the
arithmetic coder in the original _b_z_i_p_)_. I am much
indebted for their help, support and advice. See the man-
ual in the source distribution for pointers to sources of
documentation. Christian von Roques encouraged me to look
for faster sorting algorithms, so as to speed up compres-
sion. Bela Lubkin encouraged me to improve the worst-case
compression performance. Many people sent patches, helped
with portability problems, lent machines, gave advice and
were generally helpful.
7

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,376 @@
NAME
bzip2, bunzip2 - a block-sorting file compressor, v1.0
bzcat - decompresses files to stdout
bzip2recover - recovers data from damaged bzip2 files
SYNOPSIS
bzip2 [ -cdfkqstvzVL123456789 ] [ filenames ... ]
bunzip2 [ -fkvsVL ] [ filenames ... ]
bzcat [ -s ] [ filenames ... ]
bzip2recover filename
DESCRIPTION
bzip2 compresses files using the Burrows-Wheeler block
sorting text compression algorithm, and Huffman coding.
Compression is generally considerably better than that
achieved by more conventional LZ77/LZ78-based compressors,
and approaches the performance of the PPM family of sta-
tistical compressors.
The command-line options are deliberately very similar to
those of GNU gzip, but they are not identical.
bzip2 expects a list of file names to accompany the com-
mand-line flags. Each file is replaced by a compressed
version of itself, with the name "original_name.bz2".
Each compressed file has the same modification date, per-
missions, and, when possible, ownership as the correspond-
ing original, so that these properties can be correctly
restored at decompression time. File name handling is
naive in the sense that there is no mechanism for preserv-
ing original file names, permissions, ownerships or dates
in filesystems which lack these concepts, or have serious
file name length restrictions, such as MS-DOS.
bzip2 and bunzip2 will by default not overwrite existing
files. If you want this to happen, specify the -f flag.
If no file names are specified, bzip2 compresses from
standard input to standard output. In this case, bzip2
will decline to write compressed output to a terminal, as
this would be entirely incomprehensible and therefore
pointless.
bunzip2 (or bzip2 -d) decompresses all specified files.
Files which were not created by bzip2 will be detected and
ignored, and a warning issued. bzip2 attempts to guess
the filename for the decompressed file from that of the
compressed file as follows:
filename.bz2 becomes filename
filename.bz becomes filename
filename.tbz2 becomes filename.tar
filename.tbz becomes filename.tar
anyothername becomes anyothername.out
If the file does not end in one of the recognised endings,
.bz2, .bz, .tbz2 or .tbz, bzip2 complains that it cannot
guess the name of the original file, and uses the original
name with .out appended.
As with compression, supplying no filenames causes decom-
pression from standard input to standard output.
bunzip2 will correctly decompress a file which is the con-
catenation of two or more compressed files. The result is
the concatenation of the corresponding uncompressed files.
Integrity testing (-t) of concatenated compressed files is
also supported.
You can also compress or decompress files to the standard
output by giving the -c flag. Multiple files may be com-
pressed and decompressed like this. The resulting outputs
are fed sequentially to stdout. Compression of multiple
files in this manner generates a stream containing multi-
ple compressed file representations. Such a stream can be
decompressed correctly only by bzip2 version 0.9.0 or
later. Earlier versions of bzip2 will stop after decom-
pressing the first file in the stream.
bzcat (or bzip2 -dc) decompresses all specified files to
the standard output.
bzip2 will read arguments from the environment variables
BZIP2 and BZIP, in that order, and will process them
before any arguments read from the command line. This
gives a convenient way to supply default arguments.
Compression is always performed, even if the compressed
file is slightly larger than the original. Files of less
than about one hundred bytes tend to get larger, since the
compression mechanism has a constant overhead in the
region of 50 bytes. Random data (including the output of
most file compressors) is coded at about 8.05 bits per
byte, giving an expansion of around 0.5%.
As a self-check for your protection, bzip2 uses 32-bit
CRCs to make sure that the decompressed version of a file
is identical to the original. This guards against corrup-
tion of the compressed data, and against undetected bugs
in bzip2 (hopefully very unlikely). The chances of data
corruption going undetected is microscopic, about one
chance in four billion for each file processed. Be aware,
though, that the check occurs upon decompression, so it
can only tell you that something is wrong. It can't help
you recover the original uncompressed data. You can use
bzip2recover to try to recover data from damaged files.
Return values: 0 for a normal exit, 1 for environmental
problems (file not found, invalid flags, I/O errors, &c),
2 to indicate a corrupt compressed file, 3 for an internal
consistency error (eg, bug) which caused bzip2 to panic.
OPTIONS
-c --stdout
Compress or decompress to standard output.
-d --decompress
Force decompression. bzip2, bunzip2 and bzcat are
really the same program, and the decision about
what actions to take is done on the basis of which
name is used. This flag overrides that mechanism,
and forces bzip2 to decompress.
-z --compress
The complement to -d: forces compression, regard-
less of the invokation name.
-t --test
Check integrity of the specified file(s), but don't
decompress them. This really performs a trial
decompression and throws away the result.
-f --force
Force overwrite of output files. Normally, bzip2
will not overwrite existing output files. Also
forces bzip2 to break hard links to files, which it
otherwise wouldn't do.
-k --keep
Keep (don't delete) input files during compression
or decompression.
-s --small
Reduce memory usage, for compression, decompression
and testing. Files are decompressed and tested
using a modified algorithm which only requires 2.5
bytes per block byte. This means any file can be
decompressed in 2300k of memory, albeit at about
half the normal speed.
During compression, -s selects a block size of
200k, which limits memory use to around the same
figure, at the expense of your compression ratio.
In short, if your machine is low on memory (8
megabytes or less), use -s for everything. See
MEMORY MANAGEMENT below.
-q --quiet
Suppress non-essential warning messages. Messages
pertaining to I/O errors and other critical events
will not be suppressed.
-v --verbose
Verbose mode -- show the compression ratio for each
file processed. Further -v's increase the ver-
bosity level, spewing out lots of information which
is primarily of interest for diagnostic purposes.
-L --license -V --version
Display the software version, license terms and
conditions.
-1 to -9
Set the block size to 100 k, 200 k .. 900 k when
compressing. Has no effect when decompressing.
See MEMORY MANAGEMENT below.
-- Treats all subsequent arguments as file names, even
if they start with a dash. This is so you can han-
dle files with names beginning with a dash, for
example: bzip2 -- -myfilename.
--repetitive-fast --repetitive-best
These flags are redundant in versions 0.9.5 and
above. They provided some coarse control over the
behaviour of the sorting algorithm in earlier ver-
sions, which was sometimes useful. 0.9.5 and above
have an improved algorithm which renders these
flags irrelevant.
MEMORY MANAGEMENT
bzip2 compresses large files in blocks. The block size
affects both the compression ratio achieved, and the
amount of memory needed for compression and decompression.
The flags -1 through -9 specify the block size to be
100,000 bytes through 900,000 bytes (the default) respec-
tively. At decompression time, the block size used for
compression is read from the header of the compressed
file, and bunzip2 then allocates itself just enough memory
to decompress the file. Since block sizes are stored in
compressed files, it follows that the flags -1 to -9 are
irrelevant to and so ignored during decompression.
Compression and decompression requirements, in bytes, can
be estimated as:
Compression: 400k + ( 8 x block size )
Decompression: 100k + ( 4 x block size ), or
100k + ( 2.5 x block size )
Larger block sizes give rapidly diminishing marginal
returns. Most of the compression comes from the first two
or three hundred k of block size, a fact worth bearing in
mind when using bzip2 on small machines. It is also
important to appreciate that the decompression memory
requirement is set at compression time by the choice of
block size.
For files compressed with the default 900k block size,
bunzip2 will require about 3700 kbytes to decompress. To
support decompression of any file on a 4 megabyte machine,
bunzip2 has an option to decompress using approximately
half this amount of memory, about 2300 kbytes. Decompres-
sion speed is also halved, so you should use this option
only where necessary. The relevant flag is -s.
In general, try and use the largest block size memory con-
straints allow, since that maximises the compression
achieved. Compression and decompression speed are virtu-
ally unaffected by block size.
Another significant point applies to files which fit in a
single block -- that means most files you'd encounter
using a large block size. The amount of real memory
touched is proportional to the size of the file, since the
file is smaller than a block. For example, compressing a
file 20,000 bytes long with the flag -9 will cause the
compressor to allocate around 7600k of memory, but only
touch 400k + 20000 * 8 = 560 kbytes of it. Similarly, the
decompressor will allocate 3700k but only touch 100k +
20000 * 4 = 180 kbytes.
Here is a table which summarises the maximum memory usage
for different block sizes. Also recorded is the total
compressed size for 14 files of the Calgary Text Compres-
sion Corpus totalling 3,141,622 bytes. This column gives
some feel for how compression varies with block size.
These figures tend to understate the advantage of larger
block sizes for larger files, since the Corpus is domi-
nated by smaller files.
Compress Decompress Decompress Corpus
Flag usage usage -s usage Size
-1 1200k 500k 350k 914704
-2 2000k 900k 600k 877703
-3 2800k 1300k 850k 860338
-4 3600k 1700k 1100k 846899
-5 4400k 2100k 1350k 845160
-6 5200k 2500k 1600k 838626
-7 6100k 2900k 1850k 834096
-8 6800k 3300k 2100k 828642
-9 7600k 3700k 2350k 828642
RECOVERING DATA FROM DAMAGED FILES
bzip2 compresses files in blocks, usually 900kbytes long.
Each block is handled independently. If a media or trans-
mission error causes a multi-block .bz2 file to become
damaged, it may be possible to recover data from the
undamaged blocks in the file.
The compressed representation of each block is delimited
by a 48-bit pattern, which makes it possible to find the
block boundaries with reasonable certainty. Each block
also carries its own 32-bit CRC, so damaged blocks can be
distinguished from undamaged ones.
bzip2recover is a simple program whose purpose is to
search for blocks in .bz2 files, and write each block out
into its own .bz2 file. You can then use bzip2 -t to test
the integrity of the resulting files, and decompress those
which are undamaged.
bzip2recover takes a single argument, the name of the dam-
aged file, and writes a number of files "rec0001file.bz2",
"rec0002file.bz2", etc, containing the extracted blocks.
The output filenames are designed so that the use of
wildcards in subsequent processing -- for example, "bzip2
-dc rec*file.bz2 > recovered_data" -- lists the files in
the correct order.
bzip2recover should be of most use dealing with large .bz2
files, as these will contain many blocks. It is clearly
futile to use it on damaged single-block files, since a
damaged block cannot be recovered. If you wish to min-
imise any potential data loss through media or transmis-
sion errors, you might consider compressing with a smaller
block size.
PERFORMANCE NOTES
The sorting phase of compression gathers together similar
strings in the file. Because of this, files containing
very long runs of repeated symbols, like "aabaabaabaab
..." (repeated several hundred times) may compress more
slowly than normal. Versions 0.9.5 and above fare much
better than previous versions in this respect. The ratio
between worst-case and average-case compression time is in
the region of 10:1. For previous versions, this figure
was more like 100:1. You can use the -vvvv option to mon-
itor progress in great detail, if you want.
Decompression speed is unaffected by these phenomena.
bzip2 usually allocates several megabytes of memory to
operate in, and then charges all over it in a fairly ran-
dom fashion. This means that performance, both for com-
pressing and decompressing, is largely determined by the
speed at which your machine can service cache misses.
Because of this, small changes to the code to reduce the
miss rate have been observed to give disproportionately
large performance improvements. I imagine bzip2 will per-
form best on machines with very large caches.
CAVEATS
I/O error messages are not as helpful as they could be.
bzip2 tries hard to detect I/O errors and exit cleanly,
but the details of what the problem is sometimes seem
rather misleading.
This manual page pertains to version 1.0 of bzip2. Com-
pressed data created by this version is entirely forwards
and backwards compatible with the previous public
releases, versions 0.1pl2, 0.9.0 and 0.9.5, but with the
following exception: 0.9.0 and above can correctly decom-
press multiple concatenated compressed files. 0.1pl2 can-
not do this; it will stop after decompressing just the
first file in the stream.
bzip2recover uses 32-bit integers to represent bit posi-
tions in compressed files, so it cannot handle compressed
files more than 512 megabytes long. This could easily be
fixed.
AUTHOR
Julian Seward, jseward@acm.org.
http://sourceware.cygnus.com/bzip2
http://www.muraroa.demon.co.uk
The ideas embodied in bzip2 are due to (at least) the fol-
lowing people: Michael Burrows and David Wheeler (for the
block sorting transformation), David Wheeler (again, for
the Huffman coder), Peter Fenwick (for the structured cod-
ing model in the original bzip, and many refinements), and
Alistair Moffat, Radford Neal and Ian Witten (for the
arithmetic coder in the original bzip). I am much
indebted for their help, support and advice. See the man-
ual in the source distribution for pointers to sources of
documentation. Christian von Roques encouraged me to look
for faster sorting algorithms, so as to speed up compres-
sion. Bela Lubkin encouraged me to improve the worst-case
compression performance. Many people sent patches, helped
with portability problems, lent machines, gave advice and
were generally helpful.

View file

@ -0,0 +1,435 @@
/*-----------------------------------------------------------*/
/*--- Block recoverer program for bzip2 ---*/
/*--- bzip2recover.c ---*/
/*-----------------------------------------------------------*/
/*--
This program is bzip2recover, a program to attempt data
salvage from damaged files created by the accompanying
bzip2-1.0 program.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product
documentation would be appreciated but is not required.
3. Altered source versions must be plainly marked as such, and must
not be misrepresented as being the original software.
4. The name of the author may not be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Julian Seward, Cambridge, UK.
jseward@acm.org
bzip2/libbzip2 version 1.0 of 21 March 2000
--*/
/*--
This program is a complete hack and should be rewritten
properly. It isn't very complicated.
--*/
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
typedef unsigned int UInt32;
typedef int Int32;
typedef unsigned char UChar;
typedef char Char;
typedef unsigned char Bool;
#define True ((Bool)1)
#define False ((Bool)0)
Char inFileName[2000];
Char outFileName[2000];
Char progName[2000];
UInt32 bytesOut = 0;
UInt32 bytesIn = 0;
/*---------------------------------------------------*/
/*--- I/O errors ---*/
/*---------------------------------------------------*/
/*---------------------------------------------*/
void readError ( void )
{
fprintf ( stderr,
"%s: I/O error reading `%s', possible reason follows.\n",
progName, inFileName );
perror ( progName );
fprintf ( stderr, "%s: warning: output file(s) may be incomplete.\n",
progName );
exit ( 1 );
}
/*---------------------------------------------*/
void writeError ( void )
{
fprintf ( stderr,
"%s: I/O error reading `%s', possible reason follows.\n",
progName, inFileName );
perror ( progName );
fprintf ( stderr, "%s: warning: output file(s) may be incomplete.\n",
progName );
exit ( 1 );
}
/*---------------------------------------------*/
void mallocFail ( Int32 n )
{
fprintf ( stderr,
"%s: malloc failed on request for %d bytes.\n",
progName, n );
fprintf ( stderr, "%s: warning: output file(s) may be incomplete.\n",
progName );
exit ( 1 );
}
/*---------------------------------------------------*/
/*--- Bit stream I/O ---*/
/*---------------------------------------------------*/
typedef
struct {
FILE* handle;
Int32 buffer;
Int32 buffLive;
Char mode;
}
BitStream;
/*---------------------------------------------*/
BitStream* bsOpenReadStream ( FILE* stream )
{
BitStream *bs = malloc ( sizeof(BitStream) );
if (bs == NULL) mallocFail ( sizeof(BitStream) );
bs->handle = stream;
bs->buffer = 0;
bs->buffLive = 0;
bs->mode = 'r';
return bs;
}
/*---------------------------------------------*/
BitStream* bsOpenWriteStream ( FILE* stream )
{
BitStream *bs = malloc ( sizeof(BitStream) );
if (bs == NULL) mallocFail ( sizeof(BitStream) );
bs->handle = stream;
bs->buffer = 0;
bs->buffLive = 0;
bs->mode = 'w';
return bs;
}
/*---------------------------------------------*/
void bsPutBit ( BitStream* bs, Int32 bit )
{
if (bs->buffLive == 8) {
Int32 retVal = putc ( (UChar) bs->buffer, bs->handle );
if (retVal == EOF) writeError();
bytesOut++;
bs->buffLive = 1;
bs->buffer = bit & 0x1;
} else {
bs->buffer = ( (bs->buffer << 1) | (bit & 0x1) );
bs->buffLive++;
};
}
/*---------------------------------------------*/
/*--
Returns 0 or 1, or 2 to indicate EOF.
--*/
Int32 bsGetBit ( BitStream* bs )
{
if (bs->buffLive > 0) {
bs->buffLive --;
return ( ((bs->buffer) >> (bs->buffLive)) & 0x1 );
} else {
Int32 retVal = getc ( bs->handle );
if ( retVal == EOF ) {
if (errno != 0) readError();
return 2;
}
bs->buffLive = 7;
bs->buffer = retVal;
return ( ((bs->buffer) >> 7) & 0x1 );
}
}
/*---------------------------------------------*/
void bsClose ( BitStream* bs )
{
Int32 retVal;
if ( bs->mode == 'w' ) {
while ( bs->buffLive < 8 ) {
bs->buffLive++;
bs->buffer <<= 1;
};
retVal = putc ( (UChar) (bs->buffer), bs->handle );
if (retVal == EOF) writeError();
bytesOut++;
retVal = fflush ( bs->handle );
if (retVal == EOF) writeError();
}
retVal = fclose ( bs->handle );
if (retVal == EOF) {
if (bs->mode == 'w') writeError(); else readError();
}
free ( bs );
}
/*---------------------------------------------*/
void bsPutUChar ( BitStream* bs, UChar c )
{
Int32 i;
for (i = 7; i >= 0; i--)
bsPutBit ( bs, (((UInt32) c) >> i) & 0x1 );
}
/*---------------------------------------------*/
void bsPutUInt32 ( BitStream* bs, UInt32 c )
{
Int32 i;
for (i = 31; i >= 0; i--)
bsPutBit ( bs, (c >> i) & 0x1 );
}
/*---------------------------------------------*/
Bool endsInBz2 ( Char* name )
{
Int32 n = strlen ( name );
if (n <= 4) return False;
return
(name[n-4] == '.' &&
name[n-3] == 'b' &&
name[n-2] == 'z' &&
name[n-1] == '2');
}
/*---------------------------------------------------*/
/*--- ---*/
/*---------------------------------------------------*/
#define BLOCK_HEADER_HI 0x00003141UL
#define BLOCK_HEADER_LO 0x59265359UL
#define BLOCK_ENDMARK_HI 0x00001772UL
#define BLOCK_ENDMARK_LO 0x45385090UL
UInt32 bStart[20000];
UInt32 bEnd[20000];
UInt32 rbStart[20000];
UInt32 rbEnd[20000];
Int32 main ( Int32 argc, Char** argv )
{
FILE* inFile;
FILE* outFile;
BitStream* bsIn, *bsWr;
Int32 currBlock, b, wrBlock;
UInt32 bitsRead;
Int32 rbCtr;
UInt32 buffHi, buffLo, blockCRC;
Char* p;
strcpy ( progName, argv[0] );
inFileName[0] = outFileName[0] = 0;
fprintf ( stderr, "bzip2recover 1.0: extracts blocks from damaged .bz2 files.\n" );
if (argc != 2) {
fprintf ( stderr, "%s: usage is `%s damaged_file_name'.\n",
progName, progName );
exit(1);
}
strcpy ( inFileName, argv[1] );
inFile = fopen ( inFileName, "rb" );
if (inFile == NULL) {
fprintf ( stderr, "%s: can't read `%s'\n", progName, inFileName );
exit(1);
}
bsIn = bsOpenReadStream ( inFile );
fprintf ( stderr, "%s: searching for block boundaries ...\n", progName );
bitsRead = 0;
buffHi = buffLo = 0;
currBlock = 0;
bStart[currBlock] = 0;
rbCtr = 0;
while (True) {
b = bsGetBit ( bsIn );
bitsRead++;
if (b == 2) {
if (bitsRead >= bStart[currBlock] &&
(bitsRead - bStart[currBlock]) >= 40) {
bEnd[currBlock] = bitsRead-1;
if (currBlock > 0)
fprintf ( stderr, " block %d runs from %d to %d (incomplete)\n",
currBlock, bStart[currBlock], bEnd[currBlock] );
} else
currBlock--;
break;
}
buffHi = (buffHi << 1) | (buffLo >> 31);
buffLo = (buffLo << 1) | (b & 1);
if ( ( (buffHi & 0x0000ffff) == BLOCK_HEADER_HI
&& buffLo == BLOCK_HEADER_LO)
||
( (buffHi & 0x0000ffff) == BLOCK_ENDMARK_HI
&& buffLo == BLOCK_ENDMARK_LO)
) {
if (bitsRead > 49)
bEnd[currBlock] = bitsRead-49; else
bEnd[currBlock] = 0;
if (currBlock > 0 &&
(bEnd[currBlock] - bStart[currBlock]) >= 130) {
fprintf ( stderr, " block %d runs from %d to %d\n",
rbCtr+1, bStart[currBlock], bEnd[currBlock] );
rbStart[rbCtr] = bStart[currBlock];
rbEnd[rbCtr] = bEnd[currBlock];
rbCtr++;
}
currBlock++;
bStart[currBlock] = bitsRead;
}
}
bsClose ( bsIn );
/*-- identified blocks run from 1 to rbCtr inclusive. --*/
if (rbCtr < 1) {
fprintf ( stderr,
"%s: sorry, I couldn't find any block boundaries.\n",
progName );
exit(1);
};
fprintf ( stderr, "%s: splitting into blocks\n", progName );
inFile = fopen ( inFileName, "rb" );
if (inFile == NULL) {
fprintf ( stderr, "%s: can't open `%s'\n", progName, inFileName );
exit(1);
}
bsIn = bsOpenReadStream ( inFile );
/*-- placate gcc's dataflow analyser --*/
blockCRC = 0; bsWr = 0;
bitsRead = 0;
outFile = NULL;
wrBlock = 0;
while (True) {
b = bsGetBit(bsIn);
if (b == 2) break;
buffHi = (buffHi << 1) | (buffLo >> 31);
buffLo = (buffLo << 1) | (b & 1);
if (bitsRead == 47+rbStart[wrBlock])
blockCRC = (buffHi << 16) | (buffLo >> 16);
if (outFile != NULL && bitsRead >= rbStart[wrBlock]
&& bitsRead <= rbEnd[wrBlock]) {
bsPutBit ( bsWr, b );
}
bitsRead++;
if (bitsRead == rbEnd[wrBlock]+1) {
if (outFile != NULL) {
bsPutUChar ( bsWr, 0x17 ); bsPutUChar ( bsWr, 0x72 );
bsPutUChar ( bsWr, 0x45 ); bsPutUChar ( bsWr, 0x38 );
bsPutUChar ( bsWr, 0x50 ); bsPutUChar ( bsWr, 0x90 );
bsPutUInt32 ( bsWr, blockCRC );
bsClose ( bsWr );
}
if (wrBlock >= rbCtr) break;
wrBlock++;
} else
if (bitsRead == rbStart[wrBlock]) {
outFileName[0] = 0;
sprintf ( outFileName, "rec%4d", wrBlock+1 );
for (p = outFileName; *p != 0; p++) if (*p == ' ') *p = '0';
strcat ( outFileName, inFileName );
if ( !endsInBz2(outFileName)) strcat ( outFileName, ".bz2" );
fprintf ( stderr, " writing block %d to `%s' ...\n",
wrBlock+1, outFileName );
outFile = fopen ( outFileName, "wb" );
if (outFile == NULL) {
fprintf ( stderr, "%s: can't write `%s'\n",
progName, outFileName );
exit(1);
}
bsWr = bsOpenWriteStream ( outFile );
bsPutUChar ( bsWr, 'B' ); bsPutUChar ( bsWr, 'Z' );
bsPutUChar ( bsWr, 'h' ); bsPutUChar ( bsWr, '9' );
bsPutUChar ( bsWr, 0x31 ); bsPutUChar ( bsWr, 0x41 );
bsPutUChar ( bsWr, 0x59 ); bsPutUChar ( bsWr, 0x26 );
bsPutUChar ( bsWr, 0x53 ); bsPutUChar ( bsWr, 0x59 );
}
}
fprintf ( stderr, "%s: finished\n", progName );
return 0;
}
/*-----------------------------------------------------------*/
/*--- end bzip2recover.c ---*/
/*-----------------------------------------------------------*/

Binary file not shown.

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,322 @@
/*-------------------------------------------------------------*/
/*--- Public header file for the library. ---*/
/*--- bzlib.h ---*/
/*-------------------------------------------------------------*/
/*--
This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product
documentation would be appreciated but is not required.
3. Altered source versions must be plainly marked as such, and must
not be misrepresented as being the original software.
4. The name of the author may not be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Julian Seward, Cambridge, UK.
jseward@acm.org
bzip2/libbzip2 version 1.0 of 21 March 2000
This program is based on (at least) the work of:
Mike Burrows
David Wheeler
Peter Fenwick
Alistair Moffat
Radford Neal
Ian H. Witten
Robert Sedgewick
Jon L. Bentley
For more information on these sources, see the manual.
--*/
#ifndef _BZLIB_H
#define _BZLIB_H
#ifdef __cplusplus
extern "C" {
#endif
#define BZ_RUN 0
#define BZ_FLUSH 1
#define BZ_FINISH 2
#define BZ_OK 0
#define BZ_RUN_OK 1
#define BZ_FLUSH_OK 2
#define BZ_FINISH_OK 3
#define BZ_STREAM_END 4
#define BZ_SEQUENCE_ERROR (-1)
#define BZ_PARAM_ERROR (-2)
#define BZ_MEM_ERROR (-3)
#define BZ_DATA_ERROR (-4)
#define BZ_DATA_ERROR_MAGIC (-5)
#define BZ_IO_ERROR (-6)
#define BZ_UNEXPECTED_EOF (-7)
#define BZ_OUTBUFF_FULL (-8)
#define BZ_CONFIG_ERROR (-9)
typedef
struct {
char *next_in;
unsigned int avail_in;
unsigned int total_in_lo32;
unsigned int total_in_hi32;
char *next_out;
unsigned int avail_out;
unsigned int total_out_lo32;
unsigned int total_out_hi32;
void *state;
void *(*bzalloc)(void *,int,int);
void (*bzfree)(void *,void *);
void *opaque;
}
bz_stream;
#ifndef BZ_IMPORT
#define BZ_EXPORT
#endif
#ifdef _WIN32
# include <stdio.h>
# include <windows.h>
# ifdef small
/* windows.h define small to char */
# undef small
# endif
# ifdef BZ_EXPORT
# define BZ_API(func) WINAPI func
# define BZ_EXTERN extern
# else
/* import windows dll dynamically */
# define BZ_API(func) (WINAPI * func)
# define BZ_EXTERN
# endif
#else
# define BZ_API(func) func
# define BZ_EXTERN extern
#endif
/*-- Core (low-level) library functions --*/
BZ_EXTERN int BZ_API(BZ2_bzCompressInit) (
bz_stream* strm,
int blockSize100k,
int verbosity,
int workFactor
);
BZ_EXTERN int BZ_API(BZ2_bzCompress) (
bz_stream* strm,
int action
);
BZ_EXTERN int BZ_API(BZ2_bzCompressEnd) (
bz_stream* strm
);
BZ_EXTERN int BZ_API(BZ2_bzDecompressInit) (
bz_stream *strm,
int verbosity,
int small
);
BZ_EXTERN int BZ_API(BZ2_bzDecompress) (
bz_stream* strm
);
BZ_EXTERN int BZ_API(BZ2_bzDecompressEnd) (
bz_stream *strm
);
/*-- High(er) level library functions --*/
#ifndef BZ_NO_STDIO
#define BZ_MAX_UNUSED 5000
typedef void BZFILE;
BZ_EXTERN BZFILE* BZ_API(BZ2_bzReadOpen) (
int* bzerror,
FILE* f,
int verbosity,
int small,
void* unused,
int nUnused
);
BZ_EXTERN void BZ_API(BZ2_bzReadClose) (
int* bzerror,
BZFILE* b
);
BZ_EXTERN void BZ_API(BZ2_bzReadGetUnused) (
int* bzerror,
BZFILE* b,
void** unused,
int* nUnused
);
BZ_EXTERN int BZ_API(BZ2_bzRead) (
int* bzerror,
BZFILE* b,
void* buf,
int len
);
BZ_EXTERN BZFILE* BZ_API(BZ2_bzWriteOpen) (
int* bzerror,
FILE* f,
int blockSize100k,
int verbosity,
int workFactor
);
BZ_EXTERN void BZ_API(BZ2_bzWrite) (
int* bzerror,
BZFILE* b,
void* buf,
int len
);
BZ_EXTERN void BZ_API(BZ2_bzWriteClose) (
int* bzerror,
BZFILE* b,
int abandon,
unsigned int* nbytes_in,
unsigned int* nbytes_out
);
BZ_EXTERN void BZ_API(BZ2_bzWriteClose64) (
int* bzerror,
BZFILE* b,
int abandon,
unsigned int* nbytes_in_lo32,
unsigned int* nbytes_in_hi32,
unsigned int* nbytes_out_lo32,
unsigned int* nbytes_out_hi32
);
#endif
/*-- Utility functions --*/
BZ_EXTERN int BZ_API(BZ2_bzBuffToBuffCompress) (
char* dest,
unsigned int* destLen,
char* source,
unsigned int sourceLen,
int blockSize100k,
int verbosity,
int workFactor
);
BZ_EXTERN int BZ_API(BZ2_bzBuffToBuffDecompress) (
char* dest,
unsigned int* destLen,
char* source,
unsigned int sourceLen,
int small,
int verbosity
);
/*--
Code contributed by Yoshioka Tsuneo
(QWF00133@niftyserve.or.jp/tsuneo-y@is.aist-nara.ac.jp),
to support better zlib compatibility.
This code is not _officially_ part of libbzip2 (yet);
I haven't tested it, documented it, or considered the
threading-safeness of it.
If this code breaks, please contact both Yoshioka and me.
--*/
BZ_EXTERN const char * BZ_API(BZ2_bzlibVersion) (
void
);
#ifndef BZ_NO_STDIO
BZ_EXTERN BZFILE * BZ_API(BZ2_bzopen) (
const char *path,
const char *mode
);
BZ_EXTERN BZFILE * BZ_API(BZ2_bzdopen) (
int fd,
const char *mode
);
BZ_EXTERN int BZ_API(BZ2_bzread) (
BZFILE* b,
void* buf,
int len
);
BZ_EXTERN int BZ_API(BZ2_bzwrite) (
BZFILE* b,
void* buf,
int len
);
BZ_EXTERN int BZ_API(BZ2_bzflush) (
BZFILE* b
);
BZ_EXTERN void BZ_API(BZ2_bzclose) (
BZFILE* b
);
BZ_EXTERN const char * BZ_API(BZ2_bzerror) (
BZFILE *b,
int *errnum
);
#endif
extern _stdcall void *(*BZ2_malloc)( unsigned long size );
extern _stdcall void (*BZ2_free)( void *ptr );
#ifdef __cplusplus
}
#endif
#endif
/*-------------------------------------------------------------*/
/*--- end bzlib.h ---*/
/*-------------------------------------------------------------*/

View file

@ -0,0 +1,530 @@
/*-------------------------------------------------------------*/
/*--- Private header file for the library. ---*/
/*--- bzlib_private.h ---*/
/*-------------------------------------------------------------*/
/*--
This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product
documentation would be appreciated but is not required.
3. Altered source versions must be plainly marked as such, and must
not be misrepresented as being the original software.
4. The name of the author may not be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Julian Seward, Cambridge, UK.
jseward@acm.org
bzip2/libbzip2 version 1.0 of 21 March 2000
This program is based on (at least) the work of:
Mike Burrows
David Wheeler
Peter Fenwick
Alistair Moffat
Radford Neal
Ian H. Witten
Robert Sedgewick
Jon L. Bentley
For more information on these sources, see the manual.
--*/
#ifndef _BZLIB_PRIVATE_H
#define _BZLIB_PRIVATE_H
#include <stdlib.h>
#ifndef BZ_NO_STDIO
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#endif
#include "bzlib.h"
/*-- General stuff. --*/
#define BZ_VERSION "1.0.1, 23-June-2000"
typedef char Char;
typedef unsigned char Bool;
typedef unsigned char UChar;
typedef int Int32;
typedef unsigned int UInt32;
typedef short Int16;
typedef unsigned short UInt16;
#define True ((Bool)1)
#define False ((Bool)0)
#ifndef __GNUC__
#define __inline__ /* */
#endif
#ifndef BZ_NO_STDIO
extern void BZ2_bz__AssertH__fail ( int errcode );
#define AssertH(cond,errcode) \
{ if (!(cond)) BZ2_bz__AssertH__fail ( errcode ); }
#if BZ_DEBUG
#define AssertD(cond,msg) \
{ if (!(cond)) { \
fprintf ( stderr, \
"\n\nlibbzip2(debug build): internal error\n\t%s\n", msg );\
exit(1); \
}}
#else
#define AssertD(cond,msg) /* */
#endif
#define VPrintf0(zf) \
fprintf(stderr,zf)
#define VPrintf1(zf,za1) \
fprintf(stderr,zf,za1)
#define VPrintf2(zf,za1,za2) \
fprintf(stderr,zf,za1,za2)
#define VPrintf3(zf,za1,za2,za3) \
fprintf(stderr,zf,za1,za2,za3)
#define VPrintf4(zf,za1,za2,za3,za4) \
fprintf(stderr,zf,za1,za2,za3,za4)
#define VPrintf5(zf,za1,za2,za3,za4,za5) \
fprintf(stderr,zf,za1,za2,za3,za4,za5)
#else
extern void bz_internal_error ( int errcode );
#define AssertH(cond,errcode) \
{ if (!(cond)) bz_internal_error ( errcode ); }
#define AssertD(cond,msg) /* */
#define VPrintf0(zf) /* */
#define VPrintf1(zf,za1) /* */
#define VPrintf2(zf,za1,za2) /* */
#define VPrintf3(zf,za1,za2,za3) /* */
#define VPrintf4(zf,za1,za2,za3,za4) /* */
#define VPrintf5(zf,za1,za2,za3,za4,za5) /* */
#endif
#define BZALLOC(nnn) (strm->bzalloc)(strm->opaque,(nnn),1)
#define BZFREE(ppp) (strm->bzfree)(strm->opaque,(ppp))
/*-- Constants for the back end. --*/
#define BZ_MAX_ALPHA_SIZE 258
#define BZ_MAX_CODE_LEN 23
#define BZ_RUNA 0
#define BZ_RUNB 1
#define BZ_N_GROUPS 6
#define BZ_G_SIZE 50
#define BZ_N_ITERS 4
#define BZ_MAX_SELECTORS (2 + (900000 / BZ_G_SIZE))
/*-- Stuff for randomising repetitive blocks. --*/
extern Int32 BZ2_rNums[512];
#define BZ_RAND_DECLS \
Int32 rNToGo; \
Int32 rTPos \
#define BZ_RAND_INIT_MASK \
s->rNToGo = 0; \
s->rTPos = 0 \
#define BZ_RAND_MASK ((s->rNToGo == 1) ? 1 : 0)
#define BZ_RAND_UPD_MASK \
if (s->rNToGo == 0) { \
s->rNToGo = BZ2_rNums[s->rTPos]; \
s->rTPos++; \
if (s->rTPos == 512) s->rTPos = 0; \
} \
s->rNToGo--;
/*-- Stuff for doing CRCs. --*/
extern UInt32 BZ2_crc32Table[256];
#define BZ_INITIALISE_CRC(crcVar) \
{ \
crcVar = 0xffffffffL; \
}
#define BZ_FINALISE_CRC(crcVar) \
{ \
crcVar = ~(crcVar); \
}
#define BZ_UPDATE_CRC(crcVar,cha) \
{ \
crcVar = (crcVar << 8) ^ \
BZ2_crc32Table[(crcVar >> 24) ^ \
((UChar)cha)]; \
}
/*-- States and modes for compression. --*/
#define BZ_M_IDLE 1
#define BZ_M_RUNNING 2
#define BZ_M_FLUSHING 3
#define BZ_M_FINISHING 4
#define BZ_S_OUTPUT 1
#define BZ_S_INPUT 2
#define BZ_N_RADIX 2
#define BZ_N_QSORT 12
#define BZ_N_SHELL 18
#define BZ_N_OVERSHOOT (BZ_N_RADIX + BZ_N_QSORT + BZ_N_SHELL + 2)
/*-- Structure holding all the compression-side stuff. --*/
typedef
struct {
/* pointer back to the struct bz_stream */
bz_stream* strm;
/* mode this stream is in, and whether inputting */
/* or outputting data */
Int32 mode;
Int32 state;
/* remembers avail_in when flush/finish requested */
UInt32 avail_in_expect;
/* for doing the block sorting */
UInt32* arr1;
UInt32* arr2;
UInt32* ftab;
Int32 origPtr;
/* aliases for arr1 and arr2 */
UInt32* ptr;
UChar* block;
UInt16* mtfv;
UChar* zbits;
/* for deciding when to use the fallback sorting algorithm */
Int32 workFactor;
/* run-length-encoding of the input */
UInt32 state_in_ch;
Int32 state_in_len;
BZ_RAND_DECLS;
/* input and output limits and current posns */
Int32 nblock;
Int32 nblockMAX;
Int32 numZ;
Int32 state_out_pos;
/* map of bytes used in block */
Int32 nInUse;
Bool inUse[256];
UChar unseqToSeq[256];
/* the buffer for bit stream creation */
UInt32 bsBuff;
Int32 bsLive;
/* block and combined CRCs */
UInt32 blockCRC;
UInt32 combinedCRC;
/* misc administratium */
Int32 verbosity;
Int32 blockNo;
Int32 blockSize100k;
/* stuff for coding the MTF values */
Int32 nMTF;
Int32 mtfFreq [BZ_MAX_ALPHA_SIZE];
UChar selector [BZ_MAX_SELECTORS];
UChar selectorMtf[BZ_MAX_SELECTORS];
UChar len [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
Int32 code [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
Int32 rfreq [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
/* second dimension: only 3 needed; 4 makes index calculations faster */
UInt32 len_pack[BZ_MAX_ALPHA_SIZE][4];
}
EState;
/*-- externs for compression. --*/
extern void
BZ2_blockSort ( EState* );
extern void
BZ2_compressBlock ( EState*, Bool );
extern void
BZ2_bsInitWrite ( EState* );
extern void
BZ2_hbAssignCodes ( Int32*, UChar*, Int32, Int32, Int32 );
extern void
BZ2_hbMakeCodeLengths ( UChar*, Int32*, Int32, Int32 );
/*-- states for decompression. --*/
#define BZ_X_IDLE 1
#define BZ_X_OUTPUT 2
#define BZ_X_MAGIC_1 10
#define BZ_X_MAGIC_2 11
#define BZ_X_MAGIC_3 12
#define BZ_X_MAGIC_4 13
#define BZ_X_BLKHDR_1 14
#define BZ_X_BLKHDR_2 15
#define BZ_X_BLKHDR_3 16
#define BZ_X_BLKHDR_4 17
#define BZ_X_BLKHDR_5 18
#define BZ_X_BLKHDR_6 19
#define BZ_X_BCRC_1 20
#define BZ_X_BCRC_2 21
#define BZ_X_BCRC_3 22
#define BZ_X_BCRC_4 23
#define BZ_X_RANDBIT 24
#define BZ_X_ORIGPTR_1 25
#define BZ_X_ORIGPTR_2 26
#define BZ_X_ORIGPTR_3 27
#define BZ_X_MAPPING_1 28
#define BZ_X_MAPPING_2 29
#define BZ_X_SELECTOR_1 30
#define BZ_X_SELECTOR_2 31
#define BZ_X_SELECTOR_3 32
#define BZ_X_CODING_1 33
#define BZ_X_CODING_2 34
#define BZ_X_CODING_3 35
#define BZ_X_MTF_1 36
#define BZ_X_MTF_2 37
#define BZ_X_MTF_3 38
#define BZ_X_MTF_4 39
#define BZ_X_MTF_5 40
#define BZ_X_MTF_6 41
#define BZ_X_ENDHDR_2 42
#define BZ_X_ENDHDR_3 43
#define BZ_X_ENDHDR_4 44
#define BZ_X_ENDHDR_5 45
#define BZ_X_ENDHDR_6 46
#define BZ_X_CCRC_1 47
#define BZ_X_CCRC_2 48
#define BZ_X_CCRC_3 49
#define BZ_X_CCRC_4 50
/*-- Constants for the fast MTF decoder. --*/
#define MTFA_SIZE 4096
#define MTFL_SIZE 16
/*-- Structure holding all the decompression-side stuff. --*/
typedef
struct {
/* pointer back to the struct bz_stream */
bz_stream* strm;
/* state indicator for this stream */
Int32 state;
/* for doing the final run-length decoding */
UChar state_out_ch;
Int32 state_out_len;
Bool blockRandomised;
BZ_RAND_DECLS;
/* the buffer for bit stream reading */
UInt32 bsBuff;
Int32 bsLive;
/* misc administratium */
Int32 blockSize100k;
Bool smallDecompress;
Int32 currBlockNo;
Int32 verbosity;
/* for undoing the Burrows-Wheeler transform */
Int32 origPtr;
UInt32 tPos;
Int32 k0;
Int32 unzftab[256];
Int32 nblock_used;
Int32 cftab[257];
Int32 cftabCopy[257];
/* for undoing the Burrows-Wheeler transform (FAST) */
UInt32 *tt;
/* for undoing the Burrows-Wheeler transform (SMALL) */
UInt16 *ll16;
UChar *ll4;
/* stored and calculated CRCs */
UInt32 storedBlockCRC;
UInt32 storedCombinedCRC;
UInt32 calculatedBlockCRC;
UInt32 calculatedCombinedCRC;
/* map of bytes used in block */
Int32 nInUse;
Bool inUse[256];
Bool inUse16[16];
UChar seqToUnseq[256];
/* for decoding the MTF values */
UChar mtfa [MTFA_SIZE];
Int32 mtfbase[256 / MTFL_SIZE];
UChar selector [BZ_MAX_SELECTORS];
UChar selectorMtf[BZ_MAX_SELECTORS];
UChar len [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
Int32 limit [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
Int32 base [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
Int32 perm [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
Int32 minLens[BZ_N_GROUPS];
/* save area for scalars in the main decompress code */
Int32 save_i;
Int32 save_j;
Int32 save_t;
Int32 save_alphaSize;
Int32 save_nGroups;
Int32 save_nSelectors;
Int32 save_EOB;
Int32 save_groupNo;
Int32 save_groupPos;
Int32 save_nextSym;
Int32 save_nblockMAX;
Int32 save_nblock;
Int32 save_es;
Int32 save_N;
Int32 save_curr;
Int32 save_zt;
Int32 save_zn;
Int32 save_zvec;
Int32 save_zj;
Int32 save_gSel;
Int32 save_gMinlen;
Int32* save_gLimit;
Int32* save_gBase;
Int32* save_gPerm;
}
DState;
/*-- Macros for decompression. --*/
#define BZ_GET_FAST(cccc) \
s->tPos = s->tt[s->tPos]; \
cccc = (UChar)(s->tPos & 0xff); \
s->tPos >>= 8;
#define BZ_GET_FAST_C(cccc) \
c_tPos = c_tt[c_tPos]; \
cccc = (UChar)(c_tPos & 0xff); \
c_tPos >>= 8;
#define SET_LL4(i,n) \
{ if (((i) & 0x1) == 0) \
s->ll4[(i) >> 1] = (s->ll4[(i) >> 1] & 0xf0) | (n); else \
s->ll4[(i) >> 1] = (s->ll4[(i) >> 1] & 0x0f) | ((n) << 4); \
}
#define GET_LL4(i) \
((((UInt32)(s->ll4[(i) >> 1])) >> (((i) << 2) & 0x4)) & 0xF)
#define SET_LL(i,n) \
{ s->ll16[i] = (UInt16)(n & 0x0000ffff); \
SET_LL4(i, n >> 16); \
}
#define GET_LL(i) \
(((UInt32)s->ll16[i]) | (GET_LL4(i) << 16))
#define BZ_GET_SMALL(cccc) \
cccc = BZ2_indexIntoF ( s->tPos, s->cftab ); \
s->tPos = GET_LL(s->tPos);
/*-- externs for decompression. --*/
extern Int32
BZ2_indexIntoF ( Int32, Int32* );
extern Int32
BZ2_decompress ( DState* );
extern void
BZ2_hbCreateDecodeTables ( Int32*, Int32*, Int32*, UChar*,
Int32, Int32, Int32 );
#endif
/*-- BZ_NO_STDIO seems to make NULL disappear on some platforms. --*/
#ifdef BZ_NO_STDIO
#ifndef NULL
#define NULL 0
#endif
#endif
/*-------------------------------------------------------------*/
/*--- end bzlib_private.h ---*/
/*-------------------------------------------------------------*/

View file

@ -0,0 +1,714 @@
/*-------------------------------------------------------------*/
/*--- Compression machinery (not incl block sorting) ---*/
/*--- compress.c ---*/
/*-------------------------------------------------------------*/
/*--
This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product
documentation would be appreciated but is not required.
3. Altered source versions must be plainly marked as such, and must
not be misrepresented as being the original software.
4. The name of the author may not be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Julian Seward, Cambridge, UK.
jseward@acm.org
bzip2/libbzip2 version 1.0 of 21 March 2000
This program is based on (at least) the work of:
Mike Burrows
David Wheeler
Peter Fenwick
Alistair Moffat
Radford Neal
Ian H. Witten
Robert Sedgewick
Jon L. Bentley
For more information on these sources, see the manual.
--*/
/*--
CHANGES
~~~~~~~
0.9.0 -- original version.
0.9.0a/b -- no changes in this file.
0.9.0c
* changed setting of nGroups in sendMTFValues() so as to
do a bit better on small files
--*/
#include "bzlib_private.h"
/*---------------------------------------------------*/
/*--- Bit stream I/O ---*/
/*---------------------------------------------------*/
/*---------------------------------------------------*/
void BZ2_bsInitWrite ( EState* s )
{
s->bsLive = 0;
s->bsBuff = 0;
}
/*---------------------------------------------------*/
static
void bsFinishWrite ( EState* s )
{
while (s->bsLive > 0) {
s->zbits[s->numZ] = (UChar)(s->bsBuff >> 24);
s->numZ++;
s->bsBuff <<= 8;
s->bsLive -= 8;
}
}
/*---------------------------------------------------*/
#define bsNEEDW(nz) \
{ \
while (s->bsLive >= 8) { \
s->zbits[s->numZ] \
= (UChar)(s->bsBuff >> 24); \
s->numZ++; \
s->bsBuff <<= 8; \
s->bsLive -= 8; \
} \
}
/*---------------------------------------------------*/
static
__inline__
void bsW ( EState* s, Int32 n, UInt32 v )
{
bsNEEDW ( n );
s->bsBuff |= (v << (32 - s->bsLive - n));
s->bsLive += n;
}
/*---------------------------------------------------*/
static
void bsPutUInt32 ( EState* s, UInt32 u )
{
bsW ( s, 8, (u >> 24) & 0xffL );
bsW ( s, 8, (u >> 16) & 0xffL );
bsW ( s, 8, (u >> 8) & 0xffL );
bsW ( s, 8, u & 0xffL );
}
/*---------------------------------------------------*/
static
void bsPutUChar ( EState* s, UChar c )
{
bsW( s, 8, (UInt32)c );
}
/*---------------------------------------------------*/
/*--- The back end proper ---*/
/*---------------------------------------------------*/
/*---------------------------------------------------*/
static
void makeMaps_e ( EState* s )
{
Int32 i;
s->nInUse = 0;
for (i = 0; i < 256; i++)
if (s->inUse[i]) {
s->unseqToSeq[i] = s->nInUse;
s->nInUse++;
}
}
/*---------------------------------------------------*/
static
void generateMTFValues ( EState* s )
{
UChar yy[256];
Int32 i, j;
Int32 zPend;
Int32 wr;
Int32 EOB;
/*
After sorting (eg, here),
s->arr1 [ 0 .. s->nblock-1 ] holds sorted order,
and
((UChar*)s->arr2) [ 0 .. s->nblock-1 ]
holds the original block data.
The first thing to do is generate the MTF values,
and put them in
((UInt16*)s->arr1) [ 0 .. s->nblock-1 ].
Because there are strictly fewer or equal MTF values
than block values, ptr values in this area are overwritten
with MTF values only when they are no longer needed.
The final compressed bitstream is generated into the
area starting at
(UChar*) (&((UChar*)s->arr2)[s->nblock])
These storage aliases are set up in bzCompressInit(),
except for the last one, which is arranged in
compressBlock().
*/
UInt32* ptr = s->ptr;
UChar* block = s->block;
UInt16* mtfv = s->mtfv;
makeMaps_e ( s );
EOB = s->nInUse+1;
for (i = 0; i <= EOB; i++) s->mtfFreq[i] = 0;
wr = 0;
zPend = 0;
for (i = 0; i < s->nInUse; i++) yy[i] = (UChar) i;
for (i = 0; i < s->nblock; i++) {
UChar ll_i;
AssertD ( wr <= i, "generateMTFValues(1)" );
j = ptr[i]-1; if (j < 0) j += s->nblock;
ll_i = s->unseqToSeq[block[j]];
AssertD ( ll_i < s->nInUse, "generateMTFValues(2a)" );
if (yy[0] == ll_i) {
zPend++;
} else {
if (zPend > 0) {
zPend--;
while (True) {
if (zPend & 1) {
mtfv[wr] = BZ_RUNB; wr++;
s->mtfFreq[BZ_RUNB]++;
} else {
mtfv[wr] = BZ_RUNA; wr++;
s->mtfFreq[BZ_RUNA]++;
}
if (zPend < 2) break;
zPend = (zPend - 2) / 2;
};
zPend = 0;
}
{
register UChar rtmp;
register UChar* ryy_j;
register UChar rll_i;
rtmp = yy[1];
yy[1] = yy[0];
ryy_j = &(yy[1]);
rll_i = ll_i;
while ( rll_i != rtmp ) {
register UChar rtmp2;
ryy_j++;
rtmp2 = rtmp;
rtmp = *ryy_j;
*ryy_j = rtmp2;
};
yy[0] = rtmp;
j = ryy_j - &(yy[0]);
mtfv[wr] = j+1; wr++; s->mtfFreq[j+1]++;
}
}
}
if (zPend > 0) {
zPend--;
while (True) {
if (zPend & 1) {
mtfv[wr] = BZ_RUNB; wr++;
s->mtfFreq[BZ_RUNB]++;
} else {
mtfv[wr] = BZ_RUNA; wr++;
s->mtfFreq[BZ_RUNA]++;
}
if (zPend < 2) break;
zPend = (zPend - 2) / 2;
};
zPend = 0;
}
mtfv[wr] = EOB; wr++; s->mtfFreq[EOB]++;
s->nMTF = wr;
}
/*---------------------------------------------------*/
#define BZ_LESSER_ICOST 0
#define BZ_GREATER_ICOST 15
static
void sendMTFValues ( EState* s )
{
Int32 v, t, i, j, gs, ge, totc, bt, bc, iter;
Int32 nSelectors, alphaSize, minLen, maxLen, selCtr;
Int32 nGroups, nBytes;
/*--
UChar len [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
is a global since the decoder also needs it.
Int32 code[BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
Int32 rfreq[BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
are also globals only used in this proc.
Made global to keep stack frame size small.
--*/
UInt16 cost[BZ_N_GROUPS];
Int32 fave[BZ_N_GROUPS];
UInt16* mtfv = s->mtfv;
if (s->verbosity >= 3)
VPrintf3( " %d in block, %d after MTF & 1-2 coding, "
"%d+2 syms in use\n",
s->nblock, s->nMTF, s->nInUse );
alphaSize = s->nInUse+2;
for (t = 0; t < BZ_N_GROUPS; t++)
for (v = 0; v < alphaSize; v++)
s->len[t][v] = BZ_GREATER_ICOST;
/*--- Decide how many coding tables to use ---*/
AssertH ( s->nMTF > 0, 3001 );
if (s->nMTF < 200) nGroups = 2; else
if (s->nMTF < 600) nGroups = 3; else
if (s->nMTF < 1200) nGroups = 4; else
if (s->nMTF < 2400) nGroups = 5; else
nGroups = 6;
/*--- Generate an initial set of coding tables ---*/
{
Int32 nPart, remF, tFreq, aFreq;
nPart = nGroups;
remF = s->nMTF;
gs = 0;
while (nPart > 0) {
tFreq = remF / nPart;
ge = gs-1;
aFreq = 0;
while (aFreq < tFreq && ge < alphaSize-1) {
ge++;
aFreq += s->mtfFreq[ge];
}
if (ge > gs
&& nPart != nGroups && nPart != 1
&& ((nGroups-nPart) % 2 == 1)) {
aFreq -= s->mtfFreq[ge];
ge--;
}
if (s->verbosity >= 3)
VPrintf5( " initial group %d, [%d .. %d], "
"has %d syms (%4.1f%%)\n",
nPart, gs, ge, aFreq,
(100.0 * (float)aFreq) / (float)(s->nMTF) );
for (v = 0; v < alphaSize; v++)
if (v >= gs && v <= ge)
s->len[nPart-1][v] = BZ_LESSER_ICOST; else
s->len[nPart-1][v] = BZ_GREATER_ICOST;
nPart--;
gs = ge+1;
remF -= aFreq;
}
}
/*---
Iterate up to BZ_N_ITERS times to improve the tables.
---*/
for (iter = 0; iter < BZ_N_ITERS; iter++) {
for (t = 0; t < nGroups; t++) fave[t] = 0;
for (t = 0; t < nGroups; t++)
for (v = 0; v < alphaSize; v++)
s->rfreq[t][v] = 0;
/*---
Set up an auxiliary length table which is used to fast-track
the common case (nGroups == 6).
---*/
if (nGroups == 6) {
for (v = 0; v < alphaSize; v++) {
s->len_pack[v][0] = (s->len[1][v] << 16) | s->len[0][v];
s->len_pack[v][1] = (s->len[3][v] << 16) | s->len[2][v];
s->len_pack[v][2] = (s->len[5][v] << 16) | s->len[4][v];
}
}
nSelectors = 0;
totc = 0;
gs = 0;
while (True) {
/*--- Set group start & end marks. --*/
if (gs >= s->nMTF) break;
ge = gs + BZ_G_SIZE - 1;
if (ge >= s->nMTF) ge = s->nMTF-1;
/*--
Calculate the cost of this group as coded
by each of the coding tables.
--*/
for (t = 0; t < nGroups; t++) cost[t] = 0;
if (nGroups == 6 && 50 == ge-gs+1) {
/*--- fast track the common case ---*/
register UInt32 cost01, cost23, cost45;
register UInt16 icv;
cost01 = cost23 = cost45 = 0;
# define BZ_ITER(nn) \
icv = mtfv[gs+(nn)]; \
cost01 += s->len_pack[icv][0]; \
cost23 += s->len_pack[icv][1]; \
cost45 += s->len_pack[icv][2]; \
BZ_ITER(0); BZ_ITER(1); BZ_ITER(2); BZ_ITER(3); BZ_ITER(4);
BZ_ITER(5); BZ_ITER(6); BZ_ITER(7); BZ_ITER(8); BZ_ITER(9);
BZ_ITER(10); BZ_ITER(11); BZ_ITER(12); BZ_ITER(13); BZ_ITER(14);
BZ_ITER(15); BZ_ITER(16); BZ_ITER(17); BZ_ITER(18); BZ_ITER(19);
BZ_ITER(20); BZ_ITER(21); BZ_ITER(22); BZ_ITER(23); BZ_ITER(24);
BZ_ITER(25); BZ_ITER(26); BZ_ITER(27); BZ_ITER(28); BZ_ITER(29);
BZ_ITER(30); BZ_ITER(31); BZ_ITER(32); BZ_ITER(33); BZ_ITER(34);
BZ_ITER(35); BZ_ITER(36); BZ_ITER(37); BZ_ITER(38); BZ_ITER(39);
BZ_ITER(40); BZ_ITER(41); BZ_ITER(42); BZ_ITER(43); BZ_ITER(44);
BZ_ITER(45); BZ_ITER(46); BZ_ITER(47); BZ_ITER(48); BZ_ITER(49);
# undef BZ_ITER
cost[0] = cost01 & 0xffff; cost[1] = cost01 >> 16;
cost[2] = cost23 & 0xffff; cost[3] = cost23 >> 16;
cost[4] = cost45 & 0xffff; cost[5] = cost45 >> 16;
} else {
/*--- slow version which correctly handles all situations ---*/
for (i = gs; i <= ge; i++) {
UInt16 icv = mtfv[i];
for (t = 0; t < nGroups; t++) cost[t] += s->len[t][icv];
}
}
/*--
Find the coding table which is best for this group,
and record its identity in the selector table.
--*/
bc = 999999999; bt = -1;
for (t = 0; t < nGroups; t++)
if (cost[t] < bc) { bc = cost[t]; bt = t; };
totc += bc;
fave[bt]++;
s->selector[nSelectors] = bt;
nSelectors++;
/*--
Increment the symbol frequencies for the selected table.
--*/
if (nGroups == 6 && 50 == ge-gs+1) {
/*--- fast track the common case ---*/
# define BZ_ITUR(nn) s->rfreq[bt][ mtfv[gs+(nn)] ]++
BZ_ITUR(0); BZ_ITUR(1); BZ_ITUR(2); BZ_ITUR(3); BZ_ITUR(4);
BZ_ITUR(5); BZ_ITUR(6); BZ_ITUR(7); BZ_ITUR(8); BZ_ITUR(9);
BZ_ITUR(10); BZ_ITUR(11); BZ_ITUR(12); BZ_ITUR(13); BZ_ITUR(14);
BZ_ITUR(15); BZ_ITUR(16); BZ_ITUR(17); BZ_ITUR(18); BZ_ITUR(19);
BZ_ITUR(20); BZ_ITUR(21); BZ_ITUR(22); BZ_ITUR(23); BZ_ITUR(24);
BZ_ITUR(25); BZ_ITUR(26); BZ_ITUR(27); BZ_ITUR(28); BZ_ITUR(29);
BZ_ITUR(30); BZ_ITUR(31); BZ_ITUR(32); BZ_ITUR(33); BZ_ITUR(34);
BZ_ITUR(35); BZ_ITUR(36); BZ_ITUR(37); BZ_ITUR(38); BZ_ITUR(39);
BZ_ITUR(40); BZ_ITUR(41); BZ_ITUR(42); BZ_ITUR(43); BZ_ITUR(44);
BZ_ITUR(45); BZ_ITUR(46); BZ_ITUR(47); BZ_ITUR(48); BZ_ITUR(49);
# undef BZ_ITUR
} else {
/*--- slow version which correctly handles all situations ---*/
for (i = gs; i <= ge; i++)
s->rfreq[bt][ mtfv[i] ]++;
}
gs = ge+1;
}
if (s->verbosity >= 3) {
VPrintf2 ( " pass %d: size is %d, grp uses are ",
iter+1, totc/8 );
for (t = 0; t < nGroups; t++)
VPrintf1 ( "%d ", fave[t] );
VPrintf0 ( "\n" );
}
/*--
Recompute the tables based on the accumulated frequencies.
--*/
for (t = 0; t < nGroups; t++)
BZ2_hbMakeCodeLengths ( &(s->len[t][0]), &(s->rfreq[t][0]),
alphaSize, 20 );
}
AssertH( nGroups < 8, 3002 );
AssertH( nSelectors < 32768 &&
nSelectors <= (2 + (900000 / BZ_G_SIZE)),
3003 );
/*--- Compute MTF values for the selectors. ---*/
{
UChar pos[BZ_N_GROUPS], ll_i, tmp2, tmp;
for (i = 0; i < nGroups; i++) pos[i] = i;
for (i = 0; i < nSelectors; i++) {
ll_i = s->selector[i];
j = 0;
tmp = pos[j];
while ( ll_i != tmp ) {
j++;
tmp2 = tmp;
tmp = pos[j];
pos[j] = tmp2;
};
pos[0] = tmp;
s->selectorMtf[i] = j;
}
};
/*--- Assign actual codes for the tables. --*/
for (t = 0; t < nGroups; t++) {
minLen = 32;
maxLen = 0;
for (i = 0; i < alphaSize; i++) {
if (s->len[t][i] > maxLen) maxLen = s->len[t][i];
if (s->len[t][i] < minLen) minLen = s->len[t][i];
}
AssertH ( !(maxLen > 20), 3004 );
AssertH ( !(minLen < 1), 3005 );
BZ2_hbAssignCodes ( &(s->code[t][0]), &(s->len[t][0]),
minLen, maxLen, alphaSize );
}
/*--- Transmit the mapping table. ---*/
{
Bool inUse16[16];
for (i = 0; i < 16; i++) {
inUse16[i] = False;
for (j = 0; j < 16; j++)
if (s->inUse[i * 16 + j]) inUse16[i] = True;
}
nBytes = s->numZ;
for (i = 0; i < 16; i++)
if (inUse16[i]) bsW(s,1,1); else bsW(s,1,0);
for (i = 0; i < 16; i++)
if (inUse16[i])
for (j = 0; j < 16; j++) {
if (s->inUse[i * 16 + j]) bsW(s,1,1); else bsW(s,1,0);
}
if (s->verbosity >= 3)
VPrintf1( " bytes: mapping %d, ", s->numZ-nBytes );
}
/*--- Now the selectors. ---*/
nBytes = s->numZ;
bsW ( s, 3, nGroups );
bsW ( s, 15, nSelectors );
for (i = 0; i < nSelectors; i++) {
for (j = 0; j < s->selectorMtf[i]; j++) bsW(s,1,1);
bsW(s,1,0);
}
if (s->verbosity >= 3)
VPrintf1( "selectors %d, ", s->numZ-nBytes );
/*--- Now the coding tables. ---*/
nBytes = s->numZ;
for (t = 0; t < nGroups; t++) {
Int32 curr = s->len[t][0];
bsW ( s, 5, curr );
for (i = 0; i < alphaSize; i++) {
while (curr < s->len[t][i]) { bsW(s,2,2); curr++; /* 10 */ };
while (curr > s->len[t][i]) { bsW(s,2,3); curr--; /* 11 */ };
bsW ( s, 1, 0 );
}
}
if (s->verbosity >= 3)
VPrintf1 ( "code lengths %d, ", s->numZ-nBytes );
/*--- And finally, the block data proper ---*/
nBytes = s->numZ;
selCtr = 0;
gs = 0;
while (True) {
if (gs >= s->nMTF) break;
ge = gs + BZ_G_SIZE - 1;
if (ge >= s->nMTF) ge = s->nMTF-1;
AssertH ( s->selector[selCtr] < nGroups, 3006 );
if (nGroups == 6 && 50 == ge-gs+1) {
/*--- fast track the common case ---*/
UInt16 mtfv_i;
UChar* s_len_sel_selCtr
= &(s->len[s->selector[selCtr]][0]);
Int32* s_code_sel_selCtr
= &(s->code[s->selector[selCtr]][0]);
# define BZ_ITAH(nn) \
mtfv_i = mtfv[gs+(nn)]; \
bsW ( s, \
s_len_sel_selCtr[mtfv_i], \
s_code_sel_selCtr[mtfv_i] )
BZ_ITAH(0); BZ_ITAH(1); BZ_ITAH(2); BZ_ITAH(3); BZ_ITAH(4);
BZ_ITAH(5); BZ_ITAH(6); BZ_ITAH(7); BZ_ITAH(8); BZ_ITAH(9);
BZ_ITAH(10); BZ_ITAH(11); BZ_ITAH(12); BZ_ITAH(13); BZ_ITAH(14);
BZ_ITAH(15); BZ_ITAH(16); BZ_ITAH(17); BZ_ITAH(18); BZ_ITAH(19);
BZ_ITAH(20); BZ_ITAH(21); BZ_ITAH(22); BZ_ITAH(23); BZ_ITAH(24);
BZ_ITAH(25); BZ_ITAH(26); BZ_ITAH(27); BZ_ITAH(28); BZ_ITAH(29);
BZ_ITAH(30); BZ_ITAH(31); BZ_ITAH(32); BZ_ITAH(33); BZ_ITAH(34);
BZ_ITAH(35); BZ_ITAH(36); BZ_ITAH(37); BZ_ITAH(38); BZ_ITAH(39);
BZ_ITAH(40); BZ_ITAH(41); BZ_ITAH(42); BZ_ITAH(43); BZ_ITAH(44);
BZ_ITAH(45); BZ_ITAH(46); BZ_ITAH(47); BZ_ITAH(48); BZ_ITAH(49);
# undef BZ_ITAH
} else {
/*--- slow version which correctly handles all situations ---*/
for (i = gs; i <= ge; i++) {
bsW ( s,
s->len [s->selector[selCtr]] [mtfv[i]],
s->code [s->selector[selCtr]] [mtfv[i]] );
}
}
gs = ge+1;
selCtr++;
}
AssertH( selCtr == nSelectors, 3007 );
if (s->verbosity >= 3)
VPrintf1( "codes %d\n", s->numZ-nBytes );
}
/*---------------------------------------------------*/
void BZ2_compressBlock ( EState* s, Bool is_last_block )
{
if (s->nblock > 0) {
BZ_FINALISE_CRC ( s->blockCRC );
s->combinedCRC = (s->combinedCRC << 1) | (s->combinedCRC >> 31);
s->combinedCRC ^= s->blockCRC;
if (s->blockNo > 1) s->numZ = 0;
if (s->verbosity >= 2)
VPrintf4( " block %d: crc = 0x%8x, "
"combined CRC = 0x%8x, size = %d\n",
s->blockNo, s->blockCRC, s->combinedCRC, s->nblock );
BZ2_blockSort ( s );
}
s->zbits = (UChar*) (&((UChar*)s->arr2)[s->nblock]);
/*-- If this is the first block, create the stream header. --*/
if (s->blockNo == 1) {
BZ2_bsInitWrite ( s );
bsPutUChar ( s, 'B' );
bsPutUChar ( s, 'Z' );
bsPutUChar ( s, 'h' );
bsPutUChar ( s, (UChar)('0' + s->blockSize100k) );
}
if (s->nblock > 0) {
bsPutUChar ( s, 0x31 ); bsPutUChar ( s, 0x41 );
bsPutUChar ( s, 0x59 ); bsPutUChar ( s, 0x26 );
bsPutUChar ( s, 0x53 ); bsPutUChar ( s, 0x59 );
/*-- Now the block's CRC, so it is in a known place. --*/
bsPutUInt32 ( s, s->blockCRC );
/*--
Now a single bit indicating (non-)randomisation.
As of version 0.9.5, we use a better sorting algorithm
which makes randomisation unnecessary. So always set
the randomised bit to 'no'. Of course, the decoder
still needs to be able to handle randomised blocks
so as to maintain backwards compatibility with
older versions of bzip2.
--*/
bsW(s,1,0);
bsW ( s, 24, s->origPtr );
generateMTFValues ( s );
sendMTFValues ( s );
}
/*-- If this is the last block, add the stream trailer. --*/
if (is_last_block) {
bsPutUChar ( s, 0x17 ); bsPutUChar ( s, 0x72 );
bsPutUChar ( s, 0x45 ); bsPutUChar ( s, 0x38 );
bsPutUChar ( s, 0x50 ); bsPutUChar ( s, 0x90 );
bsPutUInt32 ( s, s->combinedCRC );
if (s->verbosity >= 2)
VPrintf1( " final combined CRC = 0x%x\n ", s->combinedCRC );
bsFinishWrite ( s );
}
}
/*-------------------------------------------------------------*/
/*--- end compress.c ---*/
/*-------------------------------------------------------------*/

View file

@ -0,0 +1,144 @@
/*-------------------------------------------------------------*/
/*--- Table for doing CRCs ---*/
/*--- crctable.c ---*/
/*-------------------------------------------------------------*/
/*--
This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product
documentation would be appreciated but is not required.
3. Altered source versions must be plainly marked as such, and must
not be misrepresented as being the original software.
4. The name of the author may not be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Julian Seward, Cambridge, UK.
jseward@acm.org
bzip2/libbzip2 version 1.0 of 21 March 2000
This program is based on (at least) the work of:
Mike Burrows
David Wheeler
Peter Fenwick
Alistair Moffat
Radford Neal
Ian H. Witten
Robert Sedgewick
Jon L. Bentley
For more information on these sources, see the manual.
--*/
#include "bzlib_private.h"
/*--
I think this is an implementation of the AUTODIN-II,
Ethernet & FDDI 32-bit CRC standard. Vaguely derived
from code by Rob Warnock, in Section 51 of the
comp.compression FAQ.
--*/
UInt32 BZ2_crc32Table[256] = {
/*-- Ugly, innit? --*/
0x00000000L, 0x04c11db7L, 0x09823b6eL, 0x0d4326d9L,
0x130476dcL, 0x17c56b6bL, 0x1a864db2L, 0x1e475005L,
0x2608edb8L, 0x22c9f00fL, 0x2f8ad6d6L, 0x2b4bcb61L,
0x350c9b64L, 0x31cd86d3L, 0x3c8ea00aL, 0x384fbdbdL,
0x4c11db70L, 0x48d0c6c7L, 0x4593e01eL, 0x4152fda9L,
0x5f15adacL, 0x5bd4b01bL, 0x569796c2L, 0x52568b75L,
0x6a1936c8L, 0x6ed82b7fL, 0x639b0da6L, 0x675a1011L,
0x791d4014L, 0x7ddc5da3L, 0x709f7b7aL, 0x745e66cdL,
0x9823b6e0L, 0x9ce2ab57L, 0x91a18d8eL, 0x95609039L,
0x8b27c03cL, 0x8fe6dd8bL, 0x82a5fb52L, 0x8664e6e5L,
0xbe2b5b58L, 0xbaea46efL, 0xb7a96036L, 0xb3687d81L,
0xad2f2d84L, 0xa9ee3033L, 0xa4ad16eaL, 0xa06c0b5dL,
0xd4326d90L, 0xd0f37027L, 0xddb056feL, 0xd9714b49L,
0xc7361b4cL, 0xc3f706fbL, 0xceb42022L, 0xca753d95L,
0xf23a8028L, 0xf6fb9d9fL, 0xfbb8bb46L, 0xff79a6f1L,
0xe13ef6f4L, 0xe5ffeb43L, 0xe8bccd9aL, 0xec7dd02dL,
0x34867077L, 0x30476dc0L, 0x3d044b19L, 0x39c556aeL,
0x278206abL, 0x23431b1cL, 0x2e003dc5L, 0x2ac12072L,
0x128e9dcfL, 0x164f8078L, 0x1b0ca6a1L, 0x1fcdbb16L,
0x018aeb13L, 0x054bf6a4L, 0x0808d07dL, 0x0cc9cdcaL,
0x7897ab07L, 0x7c56b6b0L, 0x71159069L, 0x75d48ddeL,
0x6b93dddbL, 0x6f52c06cL, 0x6211e6b5L, 0x66d0fb02L,
0x5e9f46bfL, 0x5a5e5b08L, 0x571d7dd1L, 0x53dc6066L,
0x4d9b3063L, 0x495a2dd4L, 0x44190b0dL, 0x40d816baL,
0xaca5c697L, 0xa864db20L, 0xa527fdf9L, 0xa1e6e04eL,
0xbfa1b04bL, 0xbb60adfcL, 0xb6238b25L, 0xb2e29692L,
0x8aad2b2fL, 0x8e6c3698L, 0x832f1041L, 0x87ee0df6L,
0x99a95df3L, 0x9d684044L, 0x902b669dL, 0x94ea7b2aL,
0xe0b41de7L, 0xe4750050L, 0xe9362689L, 0xedf73b3eL,
0xf3b06b3bL, 0xf771768cL, 0xfa325055L, 0xfef34de2L,
0xc6bcf05fL, 0xc27dede8L, 0xcf3ecb31L, 0xcbffd686L,
0xd5b88683L, 0xd1799b34L, 0xdc3abdedL, 0xd8fba05aL,
0x690ce0eeL, 0x6dcdfd59L, 0x608edb80L, 0x644fc637L,
0x7a089632L, 0x7ec98b85L, 0x738aad5cL, 0x774bb0ebL,
0x4f040d56L, 0x4bc510e1L, 0x46863638L, 0x42472b8fL,
0x5c007b8aL, 0x58c1663dL, 0x558240e4L, 0x51435d53L,
0x251d3b9eL, 0x21dc2629L, 0x2c9f00f0L, 0x285e1d47L,
0x36194d42L, 0x32d850f5L, 0x3f9b762cL, 0x3b5a6b9bL,
0x0315d626L, 0x07d4cb91L, 0x0a97ed48L, 0x0e56f0ffL,
0x1011a0faL, 0x14d0bd4dL, 0x19939b94L, 0x1d528623L,
0xf12f560eL, 0xf5ee4bb9L, 0xf8ad6d60L, 0xfc6c70d7L,
0xe22b20d2L, 0xe6ea3d65L, 0xeba91bbcL, 0xef68060bL,
0xd727bbb6L, 0xd3e6a601L, 0xdea580d8L, 0xda649d6fL,
0xc423cd6aL, 0xc0e2d0ddL, 0xcda1f604L, 0xc960ebb3L,
0xbd3e8d7eL, 0xb9ff90c9L, 0xb4bcb610L, 0xb07daba7L,
0xae3afba2L, 0xaafbe615L, 0xa7b8c0ccL, 0xa379dd7bL,
0x9b3660c6L, 0x9ff77d71L, 0x92b45ba8L, 0x9675461fL,
0x8832161aL, 0x8cf30badL, 0x81b02d74L, 0x857130c3L,
0x5d8a9099L, 0x594b8d2eL, 0x5408abf7L, 0x50c9b640L,
0x4e8ee645L, 0x4a4ffbf2L, 0x470cdd2bL, 0x43cdc09cL,
0x7b827d21L, 0x7f436096L, 0x7200464fL, 0x76c15bf8L,
0x68860bfdL, 0x6c47164aL, 0x61043093L, 0x65c52d24L,
0x119b4be9L, 0x155a565eL, 0x18197087L, 0x1cd86d30L,
0x029f3d35L, 0x065e2082L, 0x0b1d065bL, 0x0fdc1becL,
0x3793a651L, 0x3352bbe6L, 0x3e119d3fL, 0x3ad08088L,
0x2497d08dL, 0x2056cd3aL, 0x2d15ebe3L, 0x29d4f654L,
0xc5a92679L, 0xc1683bceL, 0xcc2b1d17L, 0xc8ea00a0L,
0xd6ad50a5L, 0xd26c4d12L, 0xdf2f6bcbL, 0xdbee767cL,
0xe3a1cbc1L, 0xe760d676L, 0xea23f0afL, 0xeee2ed18L,
0xf0a5bd1dL, 0xf464a0aaL, 0xf9278673L, 0xfde69bc4L,
0x89b8fd09L, 0x8d79e0beL, 0x803ac667L, 0x84fbdbd0L,
0x9abc8bd5L, 0x9e7d9662L, 0x933eb0bbL, 0x97ffad0cL,
0xafb010b1L, 0xab710d06L, 0xa6322bdfL, 0xa2f33668L,
0xbcb4666dL, 0xb8757bdaL, 0xb5365d03L, 0xb1f740b4L
};
/*-------------------------------------------------------------*/
/*--- end crctable.c ---*/
/*-------------------------------------------------------------*/

View file

@ -0,0 +1,660 @@
/*-------------------------------------------------------------*/
/*--- Decompression machinery ---*/
/*--- decompress.c ---*/
/*-------------------------------------------------------------*/
/*--
This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product
documentation would be appreciated but is not required.
3. Altered source versions must be plainly marked as such, and must
not be misrepresented as being the original software.
4. The name of the author may not be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Julian Seward, Cambridge, UK.
jseward@acm.org
bzip2/libbzip2 version 1.0 of 21 March 2000
This program is based on (at least) the work of:
Mike Burrows
David Wheeler
Peter Fenwick
Alistair Moffat
Radford Neal
Ian H. Witten
Robert Sedgewick
Jon L. Bentley
For more information on these sources, see the manual.
--*/
#include "bzlib_private.h"
/*---------------------------------------------------*/
static
void makeMaps_d ( DState* s )
{
Int32 i;
s->nInUse = 0;
for (i = 0; i < 256; i++)
if (s->inUse[i]) {
s->seqToUnseq[s->nInUse] = i;
s->nInUse++;
}
}
/*---------------------------------------------------*/
#define RETURN(rrr) \
{ retVal = rrr; goto save_state_and_return; };
#define GET_BITS(lll,vvv,nnn) \
case lll: s->state = lll; \
while (True) { \
if (s->bsLive >= nnn) { \
UInt32 v; \
v = (s->bsBuff >> \
(s->bsLive-nnn)) & ((1 << nnn)-1); \
s->bsLive -= nnn; \
vvv = v; \
break; \
} \
if (s->strm->avail_in == 0) RETURN(BZ_OK); \
s->bsBuff \
= (s->bsBuff << 8) | \
((UInt32) \
(*((UChar*)(s->strm->next_in)))); \
s->bsLive += 8; \
s->strm->next_in++; \
s->strm->avail_in--; \
s->strm->total_in_lo32++; \
if (s->strm->total_in_lo32 == 0) \
s->strm->total_in_hi32++; \
}
#define GET_UCHAR(lll,uuu) \
GET_BITS(lll,uuu,8)
#define GET_BIT(lll,uuu) \
GET_BITS(lll,uuu,1)
/*---------------------------------------------------*/
#define GET_MTF_VAL(label1,label2,lval) \
{ \
if (groupPos == 0) { \
groupNo++; \
if (groupNo >= nSelectors) \
RETURN(BZ_DATA_ERROR); \
groupPos = BZ_G_SIZE; \
gSel = s->selector[groupNo]; \
gMinlen = s->minLens[gSel]; \
gLimit = &(s->limit[gSel][0]); \
gPerm = &(s->perm[gSel][0]); \
gBase = &(s->base[gSel][0]); \
} \
groupPos--; \
zn = gMinlen; \
GET_BITS(label1, zvec, zn); \
while (1) { \
if (zn > 20 /* the longest code */) \
RETURN(BZ_DATA_ERROR); \
if (zvec <= gLimit[zn]) break; \
zn++; \
GET_BIT(label2, zj); \
zvec = (zvec << 1) | zj; \
}; \
if (zvec - gBase[zn] < 0 \
|| zvec - gBase[zn] >= BZ_MAX_ALPHA_SIZE) \
RETURN(BZ_DATA_ERROR); \
lval = gPerm[zvec - gBase[zn]]; \
}
/*---------------------------------------------------*/
Int32 BZ2_decompress ( DState* s )
{
UChar uc;
Int32 retVal;
Int32 minLen, maxLen;
bz_stream* strm = s->strm;
/* stuff that needs to be saved/restored */
Int32 i;
Int32 j;
Int32 t;
Int32 alphaSize;
Int32 nGroups;
Int32 nSelectors;
Int32 EOB;
Int32 groupNo;
Int32 groupPos;
Int32 nextSym;
Int32 nblockMAX;
Int32 nblock;
Int32 es;
Int32 N;
Int32 curr;
Int32 zt;
Int32 zn;
Int32 zvec;
Int32 zj;
Int32 gSel;
Int32 gMinlen;
Int32* gLimit;
Int32* gBase;
Int32* gPerm;
if (s->state == BZ_X_MAGIC_1) {
/*initialise the save area*/
s->save_i = 0;
s->save_j = 0;
s->save_t = 0;
s->save_alphaSize = 0;
s->save_nGroups = 0;
s->save_nSelectors = 0;
s->save_EOB = 0;
s->save_groupNo = 0;
s->save_groupPos = 0;
s->save_nextSym = 0;
s->save_nblockMAX = 0;
s->save_nblock = 0;
s->save_es = 0;
s->save_N = 0;
s->save_curr = 0;
s->save_zt = 0;
s->save_zn = 0;
s->save_zvec = 0;
s->save_zj = 0;
s->save_gSel = 0;
s->save_gMinlen = 0;
s->save_gLimit = NULL;
s->save_gBase = NULL;
s->save_gPerm = NULL;
}
/*restore from the save area*/
i = s->save_i;
j = s->save_j;
t = s->save_t;
alphaSize = s->save_alphaSize;
nGroups = s->save_nGroups;
nSelectors = s->save_nSelectors;
EOB = s->save_EOB;
groupNo = s->save_groupNo;
groupPos = s->save_groupPos;
nextSym = s->save_nextSym;
nblockMAX = s->save_nblockMAX;
nblock = s->save_nblock;
es = s->save_es;
N = s->save_N;
curr = s->save_curr;
zt = s->save_zt;
zn = s->save_zn;
zvec = s->save_zvec;
zj = s->save_zj;
gSel = s->save_gSel;
gMinlen = s->save_gMinlen;
gLimit = s->save_gLimit;
gBase = s->save_gBase;
gPerm = s->save_gPerm;
retVal = BZ_OK;
switch (s->state) {
GET_UCHAR(BZ_X_MAGIC_1, uc);
if (uc != 'B') RETURN(BZ_DATA_ERROR_MAGIC);
GET_UCHAR(BZ_X_MAGIC_2, uc);
if (uc != 'Z') RETURN(BZ_DATA_ERROR_MAGIC);
GET_UCHAR(BZ_X_MAGIC_3, uc)
if (uc != 'h') RETURN(BZ_DATA_ERROR_MAGIC);
GET_BITS(BZ_X_MAGIC_4, s->blockSize100k, 8)
if (s->blockSize100k < '1' ||
s->blockSize100k > '9') RETURN(BZ_DATA_ERROR_MAGIC);
s->blockSize100k -= '0';
if (s->smallDecompress) {
s->ll16 = BZALLOC( s->blockSize100k * 100000 * sizeof(UInt16) );
s->ll4 = BZALLOC(
((1 + s->blockSize100k * 100000) >> 1) * sizeof(UChar)
);
if (s->ll16 == NULL || s->ll4 == NULL) RETURN(BZ_MEM_ERROR);
} else {
s->tt = BZALLOC( s->blockSize100k * 100000 * sizeof(Int32) );
if (s->tt == NULL) RETURN(BZ_MEM_ERROR);
}
GET_UCHAR(BZ_X_BLKHDR_1, uc);
if (uc == 0x17) goto endhdr_2;
if (uc != 0x31) RETURN(BZ_DATA_ERROR);
GET_UCHAR(BZ_X_BLKHDR_2, uc);
if (uc != 0x41) RETURN(BZ_DATA_ERROR);
GET_UCHAR(BZ_X_BLKHDR_3, uc);
if (uc != 0x59) RETURN(BZ_DATA_ERROR);
GET_UCHAR(BZ_X_BLKHDR_4, uc);
if (uc != 0x26) RETURN(BZ_DATA_ERROR);
GET_UCHAR(BZ_X_BLKHDR_5, uc);
if (uc != 0x53) RETURN(BZ_DATA_ERROR);
GET_UCHAR(BZ_X_BLKHDR_6, uc);
if (uc != 0x59) RETURN(BZ_DATA_ERROR);
s->currBlockNo++;
if (s->verbosity >= 2)
VPrintf1 ( "\n [%d: huff+mtf ", s->currBlockNo );
s->storedBlockCRC = 0;
GET_UCHAR(BZ_X_BCRC_1, uc);
s->storedBlockCRC = (s->storedBlockCRC << 8) | ((UInt32)uc);
GET_UCHAR(BZ_X_BCRC_2, uc);
s->storedBlockCRC = (s->storedBlockCRC << 8) | ((UInt32)uc);
GET_UCHAR(BZ_X_BCRC_3, uc);
s->storedBlockCRC = (s->storedBlockCRC << 8) | ((UInt32)uc);
GET_UCHAR(BZ_X_BCRC_4, uc);
s->storedBlockCRC = (s->storedBlockCRC << 8) | ((UInt32)uc);
GET_BITS(BZ_X_RANDBIT, s->blockRandomised, 1);
s->origPtr = 0;
GET_UCHAR(BZ_X_ORIGPTR_1, uc);
s->origPtr = (s->origPtr << 8) | ((Int32)uc);
GET_UCHAR(BZ_X_ORIGPTR_2, uc);
s->origPtr = (s->origPtr << 8) | ((Int32)uc);
GET_UCHAR(BZ_X_ORIGPTR_3, uc);
s->origPtr = (s->origPtr << 8) | ((Int32)uc);
if (s->origPtr < 0)
RETURN(BZ_DATA_ERROR);
if (s->origPtr > 10 + 100000*s->blockSize100k)
RETURN(BZ_DATA_ERROR);
/*--- Receive the mapping table ---*/
for (i = 0; i < 16; i++) {
GET_BIT(BZ_X_MAPPING_1, uc);
if (uc == 1)
s->inUse16[i] = True; else
s->inUse16[i] = False;
}
for (i = 0; i < 256; i++) s->inUse[i] = False;
for (i = 0; i < 16; i++)
if (s->inUse16[i])
for (j = 0; j < 16; j++) {
GET_BIT(BZ_X_MAPPING_2, uc);
if (uc == 1) s->inUse[i * 16 + j] = True;
}
makeMaps_d ( s );
if (s->nInUse == 0) RETURN(BZ_DATA_ERROR);
alphaSize = s->nInUse+2;
/*--- Now the selectors ---*/
GET_BITS(BZ_X_SELECTOR_1, nGroups, 3);
if (nGroups < 2 || nGroups > 6) RETURN(BZ_DATA_ERROR);
GET_BITS(BZ_X_SELECTOR_2, nSelectors, 15);
if (nSelectors < 1) RETURN(BZ_DATA_ERROR);
for (i = 0; i < nSelectors; i++) {
j = 0;
while (True) {
GET_BIT(BZ_X_SELECTOR_3, uc);
if (uc == 0) break;
j++;
if (j >= nGroups) RETURN(BZ_DATA_ERROR);
}
s->selectorMtf[i] = j;
}
/*--- Undo the MTF values for the selectors. ---*/
{
UChar pos[BZ_N_GROUPS], tmp, v;
for (v = 0; v < nGroups; v++) pos[v] = v;
for (i = 0; i < nSelectors; i++) {
v = s->selectorMtf[i];
tmp = pos[v];
while (v > 0) { pos[v] = pos[v-1]; v--; }
pos[0] = tmp;
s->selector[i] = tmp;
}
}
/*--- Now the coding tables ---*/
for (t = 0; t < nGroups; t++) {
GET_BITS(BZ_X_CODING_1, curr, 5);
for (i = 0; i < alphaSize; i++) {
while (True) {
if (curr < 1 || curr > 20) RETURN(BZ_DATA_ERROR);
GET_BIT(BZ_X_CODING_2, uc);
if (uc == 0) break;
GET_BIT(BZ_X_CODING_3, uc);
if (uc == 0) curr++; else curr--;
}
s->len[t][i] = curr;
}
}
/*--- Create the Huffman decoding tables ---*/
for (t = 0; t < nGroups; t++) {
minLen = 32;
maxLen = 0;
for (i = 0; i < alphaSize; i++) {
if (s->len[t][i] > maxLen) maxLen = s->len[t][i];
if (s->len[t][i] < minLen) minLen = s->len[t][i];
}
BZ2_hbCreateDecodeTables (
&(s->limit[t][0]),
&(s->base[t][0]),
&(s->perm[t][0]),
&(s->len[t][0]),
minLen, maxLen, alphaSize
);
s->minLens[t] = minLen;
}
/*--- Now the MTF values ---*/
EOB = s->nInUse+1;
nblockMAX = 100000 * s->blockSize100k;
groupNo = -1;
groupPos = 0;
for (i = 0; i <= 255; i++) s->unzftab[i] = 0;
/*-- MTF init --*/
{
Int32 ii, jj, kk;
kk = MTFA_SIZE-1;
for (ii = 256 / MTFL_SIZE - 1; ii >= 0; ii--) {
for (jj = MTFL_SIZE-1; jj >= 0; jj--) {
s->mtfa[kk] = (UChar)(ii * MTFL_SIZE + jj);
kk--;
}
s->mtfbase[ii] = kk + 1;
}
}
/*-- end MTF init --*/
nblock = 0;
GET_MTF_VAL(BZ_X_MTF_1, BZ_X_MTF_2, nextSym);
while (True) {
if (nextSym == EOB) break;
if (nextSym == BZ_RUNA || nextSym == BZ_RUNB) {
es = -1;
N = 1;
do {
if (nextSym == BZ_RUNA) es = es + (0+1) * N; else
if (nextSym == BZ_RUNB) es = es + (1+1) * N;
N = N * 2;
GET_MTF_VAL(BZ_X_MTF_3, BZ_X_MTF_4, nextSym);
}
while (nextSym == BZ_RUNA || nextSym == BZ_RUNB);
es++;
uc = s->seqToUnseq[ s->mtfa[s->mtfbase[0]] ];
s->unzftab[uc] += es;
if (s->smallDecompress)
while (es > 0) {
if (nblock >= nblockMAX) RETURN(BZ_DATA_ERROR);
s->ll16[nblock] = (UInt16)uc;
nblock++;
es--;
}
else
while (es > 0) {
if (nblock >= nblockMAX) RETURN(BZ_DATA_ERROR);
s->tt[nblock] = (UInt32)uc;
nblock++;
es--;
};
continue;
} else {
if (nblock >= nblockMAX) RETURN(BZ_DATA_ERROR);
/*-- uc = MTF ( nextSym-1 ) --*/
{
Int32 ii, jj, kk, pp, lno, off;
UInt32 nn;
nn = (UInt32)(nextSym - 1);
if (nn < MTFL_SIZE) {
/* avoid general-case expense */
pp = s->mtfbase[0];
uc = s->mtfa[pp+nn];
while (nn > 3) {
Int32 z = pp+nn;
s->mtfa[(z) ] = s->mtfa[(z)-1];
s->mtfa[(z)-1] = s->mtfa[(z)-2];
s->mtfa[(z)-2] = s->mtfa[(z)-3];
s->mtfa[(z)-3] = s->mtfa[(z)-4];
nn -= 4;
}
while (nn > 0) {
s->mtfa[(pp+nn)] = s->mtfa[(pp+nn)-1]; nn--;
};
s->mtfa[pp] = uc;
} else {
/* general case */
lno = nn / MTFL_SIZE;
off = nn % MTFL_SIZE;
pp = s->mtfbase[lno] + off;
uc = s->mtfa[pp];
while (pp > s->mtfbase[lno]) {
s->mtfa[pp] = s->mtfa[pp-1]; pp--;
};
s->mtfbase[lno]++;
while (lno > 0) {
s->mtfbase[lno]--;
s->mtfa[s->mtfbase[lno]]
= s->mtfa[s->mtfbase[lno-1] + MTFL_SIZE - 1];
lno--;
}
s->mtfbase[0]--;
s->mtfa[s->mtfbase[0]] = uc;
if (s->mtfbase[0] == 0) {
kk = MTFA_SIZE-1;
for (ii = 256 / MTFL_SIZE-1; ii >= 0; ii--) {
for (jj = MTFL_SIZE-1; jj >= 0; jj--) {
s->mtfa[kk] = s->mtfa[s->mtfbase[ii] + jj];
kk--;
}
s->mtfbase[ii] = kk + 1;
}
}
}
}
/*-- end uc = MTF ( nextSym-1 ) --*/
s->unzftab[s->seqToUnseq[uc]]++;
if (s->smallDecompress)
s->ll16[nblock] = (UInt16)(s->seqToUnseq[uc]); else
s->tt[nblock] = (UInt32)(s->seqToUnseq[uc]);
nblock++;
GET_MTF_VAL(BZ_X_MTF_5, BZ_X_MTF_6, nextSym);
continue;
}
}
/* Now we know what nblock is, we can do a better sanity
check on s->origPtr.
*/
if (s->origPtr < 0 || s->origPtr >= nblock)
RETURN(BZ_DATA_ERROR);
s->state_out_len = 0;
s->state_out_ch = 0;
BZ_INITIALISE_CRC ( s->calculatedBlockCRC );
s->state = BZ_X_OUTPUT;
if (s->verbosity >= 2) VPrintf0 ( "rt+rld" );
/*-- Set up cftab to facilitate generation of T^(-1) --*/
s->cftab[0] = 0;
for (i = 1; i <= 256; i++) s->cftab[i] = s->unzftab[i-1];
for (i = 1; i <= 256; i++) s->cftab[i] += s->cftab[i-1];
if (s->smallDecompress) {
/*-- Make a copy of cftab, used in generation of T --*/
for (i = 0; i <= 256; i++) s->cftabCopy[i] = s->cftab[i];
/*-- compute the T vector --*/
for (i = 0; i < nblock; i++) {
uc = (UChar)(s->ll16[i]);
SET_LL(i, s->cftabCopy[uc]);
s->cftabCopy[uc]++;
}
/*-- Compute T^(-1) by pointer reversal on T --*/
i = s->origPtr;
j = GET_LL(i);
do {
Int32 tmp = GET_LL(j);
SET_LL(j, i);
i = j;
j = tmp;
}
while (i != s->origPtr);
s->tPos = s->origPtr;
s->nblock_used = 0;
if (s->blockRandomised) {
BZ_RAND_INIT_MASK;
BZ_GET_SMALL(s->k0); s->nblock_used++;
BZ_RAND_UPD_MASK; s->k0 ^= BZ_RAND_MASK;
} else {
BZ_GET_SMALL(s->k0); s->nblock_used++;
}
} else {
/*-- compute the T^(-1) vector --*/
for (i = 0; i < nblock; i++) {
uc = (UChar)(s->tt[i] & 0xff);
s->tt[s->cftab[uc]] |= (i << 8);
s->cftab[uc]++;
}
s->tPos = s->tt[s->origPtr] >> 8;
s->nblock_used = 0;
if (s->blockRandomised) {
BZ_RAND_INIT_MASK;
BZ_GET_FAST(s->k0); s->nblock_used++;
BZ_RAND_UPD_MASK; s->k0 ^= BZ_RAND_MASK;
} else {
BZ_GET_FAST(s->k0); s->nblock_used++;
}
}
RETURN(BZ_OK);
endhdr_2:
GET_UCHAR(BZ_X_ENDHDR_2, uc);
if (uc != 0x72) RETURN(BZ_DATA_ERROR);
GET_UCHAR(BZ_X_ENDHDR_3, uc);
if (uc != 0x45) RETURN(BZ_DATA_ERROR);
GET_UCHAR(BZ_X_ENDHDR_4, uc);
if (uc != 0x38) RETURN(BZ_DATA_ERROR);
GET_UCHAR(BZ_X_ENDHDR_5, uc);
if (uc != 0x50) RETURN(BZ_DATA_ERROR);
GET_UCHAR(BZ_X_ENDHDR_6, uc);
if (uc != 0x90) RETURN(BZ_DATA_ERROR);
s->storedCombinedCRC = 0;
GET_UCHAR(BZ_X_CCRC_1, uc);
s->storedCombinedCRC = (s->storedCombinedCRC << 8) | ((UInt32)uc);
GET_UCHAR(BZ_X_CCRC_2, uc);
s->storedCombinedCRC = (s->storedCombinedCRC << 8) | ((UInt32)uc);
GET_UCHAR(BZ_X_CCRC_3, uc);
s->storedCombinedCRC = (s->storedCombinedCRC << 8) | ((UInt32)uc);
GET_UCHAR(BZ_X_CCRC_4, uc);
s->storedCombinedCRC = (s->storedCombinedCRC << 8) | ((UInt32)uc);
s->state = BZ_X_IDLE;
RETURN(BZ_STREAM_END);
default: AssertH ( False, 4001 );
}
AssertH ( False, 4002 );
save_state_and_return:
s->save_i = i;
s->save_j = j;
s->save_t = t;
s->save_alphaSize = alphaSize;
s->save_nGroups = nGroups;
s->save_nSelectors = nSelectors;
s->save_EOB = EOB;
s->save_groupNo = groupNo;
s->save_groupPos = groupPos;
s->save_nextSym = nextSym;
s->save_nblockMAX = nblockMAX;
s->save_nblock = nblock;
s->save_es = es;
s->save_N = N;
s->save_curr = curr;
s->save_zt = zt;
s->save_zn = zn;
s->save_zvec = zvec;
s->save_zj = zj;
s->save_gSel = gSel;
s->save_gMinlen = gMinlen;
s->save_gLimit = gLimit;
s->save_gBase = gBase;
s->save_gPerm = gPerm;
return retVal;
}
/*-------------------------------------------------------------*/
/*--- end decompress.c ---*/
/*-------------------------------------------------------------*/

View file

@ -0,0 +1,10 @@
int _stdcall DllMain( unsigned long a, unsigned long b, unsigned long c )
{
return 1;
}
void bz_internal_error ( int errcode )
{
return;
}

View file

@ -0,0 +1,176 @@
/*
minibz2
libbz2.dll test program.
by Yoshioka Tsuneo(QWF00133@nifty.ne.jp/tsuneo-y@is.aist-nara.ac.jp)
This file is Public Domain.
welcome any email to me.
usage: minibz2 [-d] [-{1,2,..9}] [[srcfilename] destfilename]
*/
#define BZ_IMPORT
#include <stdio.h>
#include <stdlib.h>
#include "bzlib.h"
#ifdef _WIN32
#include <io.h>
#endif
#ifdef _WIN32
#define BZ2_LIBNAME "libbz2-1.0.0.DLL"
#include <windows.h>
static int BZ2DLLLoaded = 0;
static HINSTANCE BZ2DLLhLib;
int BZ2DLLLoadLibrary(void)
{
HINSTANCE hLib;
if(BZ2DLLLoaded==1){return 0;}
hLib=LoadLibrary(BZ2_LIBNAME);
if(hLib == NULL){
fprintf(stderr,"Can't load %s\n",BZ2_LIBNAME);
return -1;
}
BZ2_bzlibVersion=GetProcAddress(hLib,"BZ2_bzlibVersion");
BZ2_bzopen=GetProcAddress(hLib,"BZ2_bzopen");
BZ2_bzdopen=GetProcAddress(hLib,"BZ2_bzdopen");
BZ2_bzread=GetProcAddress(hLib,"BZ2_bzread");
BZ2_bzwrite=GetProcAddress(hLib,"BZ2_bzwrite");
BZ2_bzflush=GetProcAddress(hLib,"BZ2_bzflush");
BZ2_bzclose=GetProcAddress(hLib,"BZ2_bzclose");
BZ2_bzerror=GetProcAddress(hLib,"BZ2_bzerror");
if (!BZ2_bzlibVersion || !BZ2_bzopen || !BZ2_bzdopen
|| !BZ2_bzread || !BZ2_bzwrite || !BZ2_bzflush
|| !BZ2_bzclose || !BZ2_bzerror) {
fprintf(stderr,"GetProcAddress failed.\n");
return -1;
}
BZ2DLLLoaded=1;
BZ2DLLhLib=hLib;
return 0;
}
int BZ2DLLFreeLibrary(void)
{
if(BZ2DLLLoaded==0){return 0;}
FreeLibrary(BZ2DLLhLib);
BZ2DLLLoaded=0;
}
#endif /* WIN32 */
void usage(void)
{
puts("usage: minibz2 [-d] [-{1,2,..9}] [[srcfilename] destfilename]");
}
int main(int argc,char *argv[])
{
int decompress = 0;
int level = 9;
char *fn_r = NULL;
char *fn_w = NULL;
#ifdef _WIN32
if(BZ2DLLLoadLibrary()<0){
fprintf(stderr,"Loading of %s failed. Giving up.\n", BZ2_LIBNAME);
exit(1);
}
printf("Loading of %s succeeded. Library version is %s.\n",
BZ2_LIBNAME, BZ2_bzlibVersion() );
#endif
while(++argv,--argc){
if(**argv =='-' || **argv=='/'){
char *p;
for(p=*argv+1;*p;p++){
if(*p=='d'){
decompress = 1;
}else if('1'<=*p && *p<='9'){
level = *p - '0';
}else{
usage();
exit(1);
}
}
}else{
break;
}
}
if(argc>=1){
fn_r = *argv;
argc--;argv++;
}else{
fn_r = NULL;
}
if(argc>=1){
fn_w = *argv;
argc--;argv++;
}else{
fn_w = NULL;
}
{
int len;
char buff[0x1000];
char mode[10];
if(decompress){
BZFILE *BZ2fp_r = NULL;
FILE *fp_w = NULL;
if(fn_w){
if((fp_w = fopen(fn_w,"wb"))==NULL){
printf("can't open [%s]\n",fn_w);
perror("reason:");
exit(1);
}
}else{
fp_w = stdout;
}
if((BZ2fp_r == NULL && (BZ2fp_r = BZ2_bzdopen(fileno(stdin),"rb"))==NULL)
|| (BZ2fp_r != NULL && (BZ2fp_r = BZ2_bzopen(fn_r,"rb"))==NULL)){
printf("can't bz2openstream\n");
exit(1);
}
while((len=BZ2_bzread(BZ2fp_r,buff,0x1000))>0){
fwrite(buff,1,len,fp_w);
}
BZ2_bzclose(BZ2fp_r);
if(fp_w != stdout) fclose(fp_w);
}else{
BZFILE *BZ2fp_w = NULL;
FILE *fp_r = NULL;
if(fn_r){
if((fp_r = fopen(fn_r,"rb"))==NULL){
printf("can't open [%s]\n",fn_r);
perror("reason:");
exit(1);
}
}else{
fp_r = stdin;
}
mode[0]='w';
mode[1] = '0' + level;
mode[2] = '\0';
if((fn_w == NULL && (BZ2fp_w = BZ2_bzdopen(fileno(stdout),mode))==NULL)
|| (fn_w !=NULL && (BZ2fp_w = BZ2_bzopen(fn_w,mode))==NULL)){
printf("can't bz2openstream\n");
exit(1);
}
while((len=fread(buff,1,0x1000,fp_r))>0){
BZ2_bzwrite(BZ2fp_w,buff,len);
}
BZ2_bzclose(BZ2fp_w);
if(fp_r!=stdin)fclose(fp_r);
}
}
#ifdef _WIN32
BZ2DLLFreeLibrary();
#endif
return 0;
}

View file

@ -0,0 +1,93 @@
# Microsoft Developer Studio Project File - Name="dlltest" - Package Owner=<4>
# Microsoft Developer Studio Generated Build File, Format Version 5.00
# ** 編集しないでください **
# TARGTYPE "Win32 (x86) Console Application" 0x0103
CFG=dlltest - Win32 Debug
!MESSAGE これは有効なメイクファイルではありません。 このプロジェクトをビルドするためには NMAKE を使用してください。
!MESSAGE [メイクファイルのエクスポート] コマンドを使用して実行してください
!MESSAGE
!MESSAGE NMAKE /f "dlltest.mak".
!MESSAGE
!MESSAGE NMAKE の実行時に構成を指定できます
!MESSAGE コマンド ライン上でマクロの設定を定義します。例:
!MESSAGE
!MESSAGE NMAKE /f "dlltest.mak" CFG="dlltest - Win32 Debug"
!MESSAGE
!MESSAGE 選択可能なビルド モード:
!MESSAGE
!MESSAGE "dlltest - Win32 Release" ("Win32 (x86) Console Application" 用)
!MESSAGE "dlltest - Win32 Debug" ("Win32 (x86) Console Application" 用)
!MESSAGE
# Begin Project
# PROP Scc_ProjName ""
# PROP Scc_LocalPath ""
CPP=cl.exe
RSC=rc.exe
!IF "$(CFG)" == "dlltest - Win32 Release"
# PROP BASE Use_MFC 0
# PROP BASE Use_Debug_Libraries 0
# PROP BASE Output_Dir "Release"
# PROP BASE Intermediate_Dir "Release"
# PROP BASE Target_Dir ""
# PROP Use_MFC 0
# PROP Use_Debug_Libraries 0
# PROP Output_Dir "Release"
# PROP Intermediate_Dir "Release"
# PROP Ignore_Export_Lib 0
# PROP Target_Dir ""
# ADD BASE CPP /nologo /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c
# ADD CPP /nologo /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c
# ADD BASE RSC /l 0x411 /d "NDEBUG"
# ADD RSC /l 0x411 /d "NDEBUG"
BSC32=bscmake.exe
# ADD BASE BSC32 /nologo
# ADD BSC32 /nologo
LINK32=link.exe
# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /machine:I386
# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /machine:I386 /out:"minibz2.exe"
!ELSEIF "$(CFG)" == "dlltest - Win32 Debug"
# PROP BASE Use_MFC 0
# PROP BASE Use_Debug_Libraries 1
# PROP BASE Output_Dir "dlltest_"
# PROP BASE Intermediate_Dir "dlltest_"
# PROP BASE Target_Dir ""
# PROP Use_MFC 0
# PROP Use_Debug_Libraries 1
# PROP Output_Dir "dlltest_"
# PROP Intermediate_Dir "dlltest_"
# PROP Ignore_Export_Lib 0
# PROP Target_Dir ""
# ADD BASE CPP /nologo /W3 /Gm /GX /Zi /Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c
# ADD CPP /nologo /W3 /Gm /GX /Zi /Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c
# ADD BASE RSC /l 0x411 /d "_DEBUG"
# ADD RSC /l 0x411 /d "_DEBUG"
BSC32=bscmake.exe
# ADD BASE BSC32 /nologo
# ADD BSC32 /nologo
LINK32=link.exe
# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /debug /machine:I386 /pdbtype:sept
# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /debug /machine:I386 /out:"minibz2.exe" /pdbtype:sept
!ENDIF
# Begin Target
# Name "dlltest - Win32 Release"
# Name "dlltest - Win32 Debug"
# Begin Source File
SOURCE=.\bzlib.h
# End Source File
# Begin Source File
SOURCE=.\dlltest.c
# End Source File
# End Target
# End Project

View file

@ -0,0 +1,228 @@
/*-------------------------------------------------------------*/
/*--- Huffman coding low-level stuff ---*/
/*--- huffman.c ---*/
/*-------------------------------------------------------------*/
/*--
This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product
documentation would be appreciated but is not required.
3. Altered source versions must be plainly marked as such, and must
not be misrepresented as being the original software.
4. The name of the author may not be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Julian Seward, Cambridge, UK.
jseward@acm.org
bzip2/libbzip2 version 1.0 of 21 March 2000
This program is based on (at least) the work of:
Mike Burrows
David Wheeler
Peter Fenwick
Alistair Moffat
Radford Neal
Ian H. Witten
Robert Sedgewick
Jon L. Bentley
For more information on these sources, see the manual.
--*/
#include "bzlib_private.h"
/*---------------------------------------------------*/
#define WEIGHTOF(zz0) ((zz0) & 0xffffff00)
#define DEPTHOF(zz1) ((zz1) & 0x000000ff)
#define MYMAX(zz2,zz3) ((zz2) > (zz3) ? (zz2) : (zz3))
#define ADDWEIGHTS(zw1,zw2) \
(WEIGHTOF(zw1)+WEIGHTOF(zw2)) | \
(1 + MYMAX(DEPTHOF(zw1),DEPTHOF(zw2)))
#define UPHEAP(z) \
{ \
Int32 zz, tmp; \
zz = z; tmp = heap[zz]; \
while (weight[tmp] < weight[heap[zz >> 1]]) { \
heap[zz] = heap[zz >> 1]; \
zz >>= 1; \
} \
heap[zz] = tmp; \
}
#define DOWNHEAP(z) \
{ \
Int32 zz, yy, tmp; \
zz = z; tmp = heap[zz]; \
while (True) { \
yy = zz << 1; \
if (yy > nHeap) break; \
if (yy < nHeap && \
weight[heap[yy+1]] < weight[heap[yy]]) \
yy++; \
if (weight[tmp] < weight[heap[yy]]) break; \
heap[zz] = heap[yy]; \
zz = yy; \
} \
heap[zz] = tmp; \
}
/*---------------------------------------------------*/
void BZ2_hbMakeCodeLengths ( UChar *len,
Int32 *freq,
Int32 alphaSize,
Int32 maxLen )
{
/*--
Nodes and heap entries run from 1. Entry 0
for both the heap and nodes is a sentinel.
--*/
Int32 nNodes, nHeap, n1, n2, i, j, k;
Bool tooLong;
Int32 heap [ BZ_MAX_ALPHA_SIZE + 2 ];
Int32 weight [ BZ_MAX_ALPHA_SIZE * 2 ];
Int32 parent [ BZ_MAX_ALPHA_SIZE * 2 ];
for (i = 0; i < alphaSize; i++)
weight[i+1] = (freq[i] == 0 ? 1 : freq[i]) << 8;
while (True) {
nNodes = alphaSize;
nHeap = 0;
heap[0] = 0;
weight[0] = 0;
parent[0] = -2;
for (i = 1; i <= alphaSize; i++) {
parent[i] = -1;
nHeap++;
heap[nHeap] = i;
UPHEAP(nHeap);
}
AssertH( nHeap < (BZ_MAX_ALPHA_SIZE+2), 2001 );
while (nHeap > 1) {
n1 = heap[1]; heap[1] = heap[nHeap]; nHeap--; DOWNHEAP(1);
n2 = heap[1]; heap[1] = heap[nHeap]; nHeap--; DOWNHEAP(1);
nNodes++;
parent[n1] = parent[n2] = nNodes;
weight[nNodes] = ADDWEIGHTS(weight[n1], weight[n2]);
parent[nNodes] = -1;
nHeap++;
heap[nHeap] = nNodes;
UPHEAP(nHeap);
}
AssertH( nNodes < (BZ_MAX_ALPHA_SIZE * 2), 2002 );
tooLong = False;
for (i = 1; i <= alphaSize; i++) {
j = 0;
k = i;
while (parent[k] >= 0) { k = parent[k]; j++; }
len[i-1] = j;
if (j > maxLen) tooLong = True;
}
if (! tooLong) break;
for (i = 1; i < alphaSize; i++) {
j = weight[i] >> 8;
j = 1 + (j / 2);
weight[i] = j << 8;
}
}
}
/*---------------------------------------------------*/
void BZ2_hbAssignCodes ( Int32 *code,
UChar *length,
Int32 minLen,
Int32 maxLen,
Int32 alphaSize )
{
Int32 n, vec, i;
vec = 0;
for (n = minLen; n <= maxLen; n++) {
for (i = 0; i < alphaSize; i++)
if (length[i] == n) { code[i] = vec; vec++; };
vec <<= 1;
}
}
/*---------------------------------------------------*/
void BZ2_hbCreateDecodeTables ( Int32 *limit,
Int32 *base,
Int32 *perm,
UChar *length,
Int32 minLen,
Int32 maxLen,
Int32 alphaSize )
{
Int32 pp, i, j, vec;
pp = 0;
for (i = minLen; i <= maxLen; i++)
for (j = 0; j < alphaSize; j++)
if (length[j] == i) { perm[pp] = j; pp++; };
for (i = 0; i < BZ_MAX_CODE_LEN; i++) base[i] = 0;
for (i = 0; i < alphaSize; i++) base[length[i]+1]++;
for (i = 1; i < BZ_MAX_CODE_LEN; i++) base[i] += base[i-1];
for (i = 0; i < BZ_MAX_CODE_LEN; i++) limit[i] = 0;
vec = 0;
for (i = minLen; i <= maxLen; i++) {
vec += (base[i+1] - base[i]);
limit[i] = vec-1;
vec <<= 1;
}
for (i = minLen + 1; i <= maxLen; i++)
base[i] = ((limit[i-1] + 1) << 1) - base[i];
}
/*-------------------------------------------------------------*/
/*--- end huffman.c ---*/
/*-------------------------------------------------------------*/

View file

@ -0,0 +1,27 @@
LIBRARY LIBBZ2
DESCRIPTION "libbzip2: library for data compression"
EXPORTS
BZ2_bzCompressInit
BZ2_bzCompress
BZ2_bzCompressEnd
BZ2_bzDecompressInit
BZ2_bzDecompress
BZ2_bzDecompressEnd
BZ2_bzReadOpen
BZ2_bzReadClose
BZ2_bzReadGetUnused
BZ2_bzRead
BZ2_bzWriteOpen
BZ2_bzWrite
BZ2_bzWriteClose
BZ2_bzWriteClose64
BZ2_bzBuffToBuffCompress
BZ2_bzBuffToBuffDecompress
BZ2_bzlibVersion
BZ2_bzopen
BZ2_bzdopen
BZ2_bzread
BZ2_bzwrite
BZ2_bzflush
BZ2_bzclose
BZ2_bzerror

View file

@ -0,0 +1,130 @@
# Microsoft Developer Studio Project File - Name="libbz2" - Package Owner=<4>
# Microsoft Developer Studio Generated Build File, Format Version 5.00
# ** 編集しないでください **
# TARGTYPE "Win32 (x86) Dynamic-Link Library" 0x0102
CFG=libbz2 - Win32 Debug
!MESSAGE これは有効なメイクファイルではありません。 このプロジェクトをビルドするためには NMAKE を使用してください。
!MESSAGE [メイクファイルのエクスポート] コマンドを使用して実行してください
!MESSAGE
!MESSAGE NMAKE /f "libbz2.mak".
!MESSAGE
!MESSAGE NMAKE の実行時に構成を指定できます
!MESSAGE コマンド ライン上でマクロの設定を定義します。例:
!MESSAGE
!MESSAGE NMAKE /f "libbz2.mak" CFG="libbz2 - Win32 Debug"
!MESSAGE
!MESSAGE 選択可能なビルド モード:
!MESSAGE
!MESSAGE "libbz2 - Win32 Release" ("Win32 (x86) Dynamic-Link Library" 用)
!MESSAGE "libbz2 - Win32 Debug" ("Win32 (x86) Dynamic-Link Library" 用)
!MESSAGE
# Begin Project
# PROP Scc_ProjName ""
# PROP Scc_LocalPath ""
CPP=cl.exe
MTL=midl.exe
RSC=rc.exe
!IF "$(CFG)" == "libbz2 - Win32 Release"
# PROP BASE Use_MFC 0
# PROP BASE Use_Debug_Libraries 0
# PROP BASE Output_Dir "Release"
# PROP BASE Intermediate_Dir "Release"
# PROP BASE Target_Dir ""
# PROP Use_MFC 0
# PROP Use_Debug_Libraries 0
# PROP Output_Dir "Release"
# PROP Intermediate_Dir "Release"
# PROP Ignore_Export_Lib 0
# PROP Target_Dir ""
# ADD BASE CPP /nologo /MT /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /YX /FD /c
# ADD CPP /nologo /MT /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /YX /FD /c
# ADD BASE MTL /nologo /D "NDEBUG" /mktyplib203 /o NUL /win32
# ADD MTL /nologo /D "NDEBUG" /mktyplib203 /o NUL /win32
# ADD BASE RSC /l 0x411 /d "NDEBUG"
# ADD RSC /l 0x411 /d "NDEBUG"
BSC32=bscmake.exe
# ADD BASE BSC32 /nologo
# ADD BSC32 /nologo
LINK32=link.exe
# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /machine:I386
# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /machine:I386 /out:"libbz2.dll"
!ELSEIF "$(CFG)" == "libbz2 - Win32 Debug"
# PROP BASE Use_MFC 0
# PROP BASE Use_Debug_Libraries 1
# PROP BASE Output_Dir "Debug"
# PROP BASE Intermediate_Dir "Debug"
# PROP BASE Target_Dir ""
# PROP Use_MFC 0
# PROP Use_Debug_Libraries 1
# PROP Output_Dir "Debug"
# PROP Intermediate_Dir "Debug"
# PROP Ignore_Export_Lib 0
# PROP Target_Dir ""
# ADD BASE CPP /nologo /MTd /W3 /Gm /GX /Zi /Od /D "WIN32" /D "_DEBUG" /D "_WINDOWS" /YX /FD /c
# ADD CPP /nologo /MTd /W3 /Gm /GX /Zi /Od /D "WIN32" /D "_DEBUG" /D "_WINDOWS" /YX /FD /c
# ADD BASE MTL /nologo /D "_DEBUG" /mktyplib203 /o NUL /win32
# ADD MTL /nologo /D "_DEBUG" /mktyplib203 /o NUL /win32
# ADD BASE RSC /l 0x411 /d "_DEBUG"
# ADD RSC /l 0x411 /d "_DEBUG"
BSC32=bscmake.exe
# ADD BASE BSC32 /nologo
# ADD BSC32 /nologo
LINK32=link.exe
# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /debug /machine:I386 /pdbtype:sept
# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /debug /machine:I386 /out:"libbz2.dll" /pdbtype:sept
!ENDIF
# Begin Target
# Name "libbz2 - Win32 Release"
# Name "libbz2 - Win32 Debug"
# Begin Source File
SOURCE=.\blocksort.c
# End Source File
# Begin Source File
SOURCE=.\bzlib.c
# End Source File
# Begin Source File
SOURCE=.\bzlib.h
# End Source File
# Begin Source File
SOURCE=.\bzlib_private.h
# End Source File
# Begin Source File
SOURCE=.\compress.c
# End Source File
# Begin Source File
SOURCE=.\crctable.c
# End Source File
# Begin Source File
SOURCE=.\decompress.c
# End Source File
# Begin Source File
SOURCE=.\huffman.c
# End Source File
# Begin Source File
SOURCE=.\libbz2.def
# End Source File
# Begin Source File
SOURCE=.\randtable.c
# End Source File
# End Target
# End Project

View file

@ -0,0 +1,63 @@
# Makefile for Microsoft Visual C++ 6.0
# usage: nmake -f makefile.msc
# K.M. Syring (syring@gsf.de)
# Fixed up by JRS for bzip2-0.9.5d release.
CC=cl
CFLAGS= -DWIN32 -MD -Ox -D_FILE_OFFSET_BITS=64
OBJS= blocksort.obj \
huffman.obj \
crctable.obj \
randtable.obj \
compress.obj \
decompress.obj \
bzlib.obj
all: lib bzip2 test
bzip2: lib
$(CC) $(CFLAGS) -o bzip2 bzip2.c libbz2.lib setargv.obj
$(CC) $(CFLAGS) -o bzip2recover bzip2recover.c
lib: $(OBJS)
lib /out:libbz2.lib $(OBJS)
test: bzip2
type words1
.\\bzip2 -1 < sample1.ref > sample1.rb2
.\\bzip2 -2 < sample2.ref > sample2.rb2
.\\bzip2 -3 < sample3.ref > sample3.rb2
.\\bzip2 -d < sample1.bz2 > sample1.tst
.\\bzip2 -d < sample2.bz2 > sample2.tst
.\\bzip2 -ds < sample3.bz2 > sample3.tst
@echo All six of the fc's should find no differences.
@echo If fc finds an error on sample3.bz2, this could be
@echo because WinZip's 'TAR file smart CR/LF conversion'
@echo is too clever for its own good. Disable this option.
@echo The correct size for sample3.ref is 120,244. If it
@echo is 150,251, WinZip has messed it up.
fc sample1.bz2 sample1.rb2
fc sample2.bz2 sample2.rb2
fc sample3.bz2 sample3.rb2
fc sample1.tst sample1.ref
fc sample2.tst sample2.ref
fc sample3.tst sample3.ref
clean:
del *.obj
del libbz2.lib
del bzip2.exe
del bzip2recover.exe
del sample1.rb2
del sample2.rb2
del sample3.rb2
del sample1.tst
del sample2.tst
del sample3.tst
.c.obj:
$(CC) $(CFLAGS) -c $*.c -o $*.obj

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,47 @@
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.54
from manual.texi on 23 March 2000 -->
<TITLE>bzip2 and libbzip2 - Introduction</TITLE>
<link href="manual_2.html" rel=Next>
<link href="manual_toc.html" rel=ToC>
</HEAD>
<BODY>
<p>Go to the first, previous, <A HREF="manual_2.html">next</A>, <A HREF="manual_4.html">last</A> section, <A HREF="manual_toc.html">table of contents</A>.
<P><HR><P>
<H1><A NAME="SEC1" HREF="manual_toc.html#TOC1">Introduction</A></H1>
<P>
<CODE>bzip2</CODE> compresses files using the Burrows-Wheeler
block-sorting text compression algorithm, and Huffman coding.
Compression is generally considerably better than that
achieved by more conventional LZ77/LZ78-based compressors,
and approaches the performance of the PPM family of statistical compressors.
</P>
<P>
<CODE>bzip2</CODE> is built on top of <CODE>libbzip2</CODE>, a flexible library
for handling compressed data in the <CODE>bzip2</CODE> format. This manual
describes both how to use the program and
how to work with the library interface. Most of the
manual is devoted to this library, not the program,
which is good news if your interest is only in the program.
</P>
<P>
Chapter 2 describes how to use <CODE>bzip2</CODE>; this is the only part
you need to read if you just want to know how to operate the program.
Chapter 3 describes the programming interfaces in detail, and
Chapter 4 records some miscellaneous notes which I thought
ought to be recorded somewhere.
</P>
<P><HR><P>
<p>Go to the first, previous, <A HREF="manual_2.html">next</A>, <A HREF="manual_4.html">last</A> section, <A HREF="manual_toc.html">table of contents</A>.
</BODY>
</HTML>

View file

@ -0,0 +1,484 @@
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.54
from manual.texi on 23 March 2000 -->
<TITLE>bzip2 and libbzip2 - How to use bzip2</TITLE>
<link href="manual_3.html" rel=Next>
<link href="manual_1.html" rel=Previous>
<link href="manual_toc.html" rel=ToC>
</HEAD>
<BODY>
<p>Go to the <A HREF="manual_1.html">first</A>, <A HREF="manual_1.html">previous</A>, <A HREF="manual_3.html">next</A>, <A HREF="manual_4.html">last</A> section, <A HREF="manual_toc.html">table of contents</A>.
<P><HR><P>
<H1><A NAME="SEC2" HREF="manual_toc.html#TOC2">How to use <CODE>bzip2</CODE></A></H1>
<P>
This chapter contains a copy of the <CODE>bzip2</CODE> man page,
and nothing else.
</P>
<BLOCKQUOTE>
<H4><A NAME="SEC3" HREF="manual_toc.html#TOC3">NAME</A></H4>
<UL>
<LI><CODE>bzip2</CODE>, <CODE>bunzip2</CODE>
- a block-sorting file compressor, v1.0
<LI><CODE>bzcat</CODE>
- decompresses files to stdout
<LI><CODE>bzip2recover</CODE>
- recovers data from damaged bzip2 files
</UL>
<H4><A NAME="SEC4" HREF="manual_toc.html#TOC4">SYNOPSIS</A></H4>
<UL>
<LI><CODE>bzip2</CODE> [ -cdfkqstvzVL123456789 ] [ filenames ... ]
<LI><CODE>bunzip2</CODE> [ -fkvsVL ] [ filenames ... ]
<LI><CODE>bzcat</CODE> [ -s ] [ filenames ... ]
<LI><CODE>bzip2recover</CODE> filename
</UL>
<H4><A NAME="SEC5" HREF="manual_toc.html#TOC5">DESCRIPTION</A></H4>
<P>
<CODE>bzip2</CODE> compresses files using the Burrows-Wheeler block sorting
text compression algorithm, and Huffman coding. Compression is
generally considerably better than that achieved by more conventional
LZ77/LZ78-based compressors, and approaches the performance of the PPM
family of statistical compressors.
</P>
<P>
The command-line options are deliberately very similar to those of GNU
<CODE>gzip</CODE>, but they are not identical.
</P>
<P>
<CODE>bzip2</CODE> expects a list of file names to accompany the command-line
flags. Each file is replaced by a compressed version of itself, with
the name <CODE>original_name.bz2</CODE>. Each compressed file has the same
modification date, permissions, and, when possible, ownership as the
corresponding original, so that these properties can be correctly
restored at decompression time. File name handling is naive in the
sense that there is no mechanism for preserving original file names,
permissions, ownerships or dates in filesystems which lack these
concepts, or have serious file name length restrictions, such as MS-DOS.
</P>
<P>
<CODE>bzip2</CODE> and <CODE>bunzip2</CODE> will by default not overwrite existing
files. If you want this to happen, specify the <CODE>-f</CODE> flag.
</P>
<P>
If no file names are specified, <CODE>bzip2</CODE> compresses from standard
input to standard output. In this case, <CODE>bzip2</CODE> will decline to
write compressed output to a terminal, as this would be entirely
incomprehensible and therefore pointless.
</P>
<P>
<CODE>bunzip2</CODE> (or <CODE>bzip2 -d</CODE>) decompresses all
specified files. Files which were not created by <CODE>bzip2</CODE>
will be detected and ignored, and a warning issued.
<CODE>bzip2</CODE> attempts to guess the filename for the decompressed file
from that of the compressed file as follows:
<UL>
<LI><CODE>filename.bz2 </CODE> becomes <CODE>filename</CODE>
<LI><CODE>filename.bz </CODE> becomes <CODE>filename</CODE>
<LI><CODE>filename.tbz2</CODE> becomes <CODE>filename.tar</CODE>
<LI><CODE>filename.tbz </CODE> becomes <CODE>filename.tar</CODE>
<LI><CODE>anyothername </CODE> becomes <CODE>anyothername.out</CODE>
</UL>
<P>
If the file does not end in one of the recognised endings,
<CODE>.bz2</CODE>, <CODE>.bz</CODE>,
<CODE>.tbz2</CODE> or <CODE>.tbz</CODE>, <CODE>bzip2</CODE> complains that it cannot
guess the name of the original file, and uses the original name
with <CODE>.out</CODE> appended.
</P>
<P>
As with compression, supplying no
filenames causes decompression from standard input to standard output.
</P>
<P>
<CODE>bunzip2</CODE> will correctly decompress a file which is the
concatenation of two or more compressed files. The result is the
concatenation of the corresponding uncompressed files. Integrity
testing (<CODE>-t</CODE>) of concatenated compressed files is also supported.
</P>
<P>
You can also compress or decompress files to the standard output by
giving the <CODE>-c</CODE> flag. Multiple files may be compressed and
decompressed like this. The resulting outputs are fed sequentially to
stdout. Compression of multiple files in this manner generates a stream
containing multiple compressed file representations. Such a stream
can be decompressed correctly only by <CODE>bzip2</CODE> version 0.9.0 or
later. Earlier versions of <CODE>bzip2</CODE> will stop after decompressing
the first file in the stream.
</P>
<P>
<CODE>bzcat</CODE> (or <CODE>bzip2 -dc</CODE>) decompresses all specified files to
the standard output.
</P>
<P>
<CODE>bzip2</CODE> will read arguments from the environment variables
<CODE>BZIP2</CODE> and <CODE>BZIP</CODE>, in that order, and will process them
before any arguments read from the command line. This gives a
convenient way to supply default arguments.
</P>
<P>
Compression is always performed, even if the compressed file is slightly
larger than the original. Files of less than about one hundred bytes
tend to get larger, since the compression mechanism has a constant
overhead in the region of 50 bytes. Random data (including the output
of most file compressors) is coded at about 8.05 bits per byte, giving
an expansion of around 0.5%.
</P>
<P>
As a self-check for your protection, <CODE>bzip2</CODE> uses 32-bit CRCs to
make sure that the decompressed version of a file is identical to the
original. This guards against corruption of the compressed data, and
against undetected bugs in <CODE>bzip2</CODE> (hopefully very unlikely). The
chances of data corruption going undetected is microscopic, about one
chance in four billion for each file processed. Be aware, though, that
the check occurs upon decompression, so it can only tell you that
something is wrong. It can't help you recover the original uncompressed
data. You can use <CODE>bzip2recover</CODE> to try to recover data from
damaged files.
</P>
<P>
Return values: 0 for a normal exit, 1 for environmental problems (file
not found, invalid flags, I/O errors, &#38;c), 2 to indicate a corrupt
compressed file, 3 for an internal consistency error (eg, bug) which
caused <CODE>bzip2</CODE> to panic.
</P>
<H4><A NAME="SEC6" HREF="manual_toc.html#TOC6">OPTIONS</A></H4>
<DL COMPACT>
<DT><CODE>-c --stdout</CODE>
<DD>
Compress or decompress to standard output.
<DT><CODE>-d --decompress</CODE>
<DD>
Force decompression. <CODE>bzip2</CODE>, <CODE>bunzip2</CODE> and <CODE>bzcat</CODE> are
really the same program, and the decision about what actions to take is
done on the basis of which name is used. This flag overrides that
mechanism, and forces bzip2 to decompress.
<DT><CODE>-z --compress</CODE>
<DD>
The complement to <CODE>-d</CODE>: forces compression, regardless of the
invokation name.
<DT><CODE>-t --test</CODE>
<DD>
Check integrity of the specified file(s), but don't decompress them.
This really performs a trial decompression and throws away the result.
<DT><CODE>-f --force</CODE>
<DD>
Force overwrite of output files. Normally, <CODE>bzip2</CODE> will not overwrite
existing output files. Also forces <CODE>bzip2</CODE> to break hard links
to files, which it otherwise wouldn't do.
<DT><CODE>-k --keep</CODE>
<DD>
Keep (don't delete) input files during compression
or decompression.
<DT><CODE>-s --small</CODE>
<DD>
Reduce memory usage, for compression, decompression and testing. Files
are decompressed and tested using a modified algorithm which only
requires 2.5 bytes per block byte. This means any file can be
decompressed in 2300k of memory, albeit at about half the normal speed.
During compression, <CODE>-s</CODE> selects a block size of 200k, which limits
memory use to around the same figure, at the expense of your compression
ratio. In short, if your machine is low on memory (8 megabytes or
less), use -s for everything. See MEMORY MANAGEMENT below.
<DT><CODE>-q --quiet</CODE>
<DD>
Suppress non-essential warning messages. Messages pertaining to
I/O errors and other critical events will not be suppressed.
<DT><CODE>-v --verbose</CODE>
<DD>
Verbose mode -- show the compression ratio for each file processed.
Further <CODE>-v</CODE>'s increase the verbosity level, spewing out lots of
information which is primarily of interest for diagnostic purposes.
<DT><CODE>-L --license -V --version</CODE>
<DD>
Display the software version, license terms and conditions.
<DT><CODE>-1 to -9</CODE>
<DD>
Set the block size to 100 k, 200 k .. 900 k when compressing. Has no
effect when decompressing. See MEMORY MANAGEMENT below.
<DT><CODE>--</CODE>
<DD>
Treats all subsequent arguments as file names, even if they start
with a dash. This is so you can handle files with names beginning
with a dash, for example: <CODE>bzip2 -- -myfilename</CODE>.
<DT><CODE>--repetitive-fast</CODE>
<DD>
<DT><CODE>--repetitive-best</CODE>
<DD>
These flags are redundant in versions 0.9.5 and above. They provided
some coarse control over the behaviour of the sorting algorithm in
earlier versions, which was sometimes useful. 0.9.5 and above have an
improved algorithm which renders these flags irrelevant.
</DL>
<H4><A NAME="SEC7" HREF="manual_toc.html#TOC7">MEMORY MANAGEMENT</A></H4>
<P>
<CODE>bzip2</CODE> compresses large files in blocks. The block size affects
both the compression ratio achieved, and the amount of memory needed for
compression and decompression. The flags <CODE>-1</CODE> through <CODE>-9</CODE>
specify the block size to be 100,000 bytes through 900,000 bytes (the
default) respectively. At decompression time, the block size used for
compression is read from the header of the compressed file, and
<CODE>bunzip2</CODE> then allocates itself just enough memory to decompress
the file. Since block sizes are stored in compressed files, it follows
that the flags <CODE>-1</CODE> to <CODE>-9</CODE> are irrelevant to and so ignored
during decompression.
</P>
<P>
Compression and decompression requirements, in bytes, can be estimated
as:
<PRE>
Compression: 400k + ( 8 x block size )
Decompression: 100k + ( 4 x block size ), or
100k + ( 2.5 x block size )
</PRE>
<P>
Larger block sizes give rapidly diminishing marginal returns. Most of
the compression comes from the first two or three hundred k of block
size, a fact worth bearing in mind when using <CODE>bzip2</CODE> on small machines.
It is also important to appreciate that the decompression memory
requirement is set at compression time by the choice of block size.
</P>
<P>
For files compressed with the default 900k block size, <CODE>bunzip2</CODE>
will require about 3700 kbytes to decompress. To support decompression
of any file on a 4 megabyte machine, <CODE>bunzip2</CODE> has an option to
decompress using approximately half this amount of memory, about 2300
kbytes. Decompression speed is also halved, so you should use this
option only where necessary. The relevant flag is <CODE>-s</CODE>.
</P>
<P>
In general, try and use the largest block size memory constraints allow,
since that maximises the compression achieved. Compression and
decompression speed are virtually unaffected by block size.
</P>
<P>
Another significant point applies to files which fit in a single block
-- that means most files you'd encounter using a large block size. The
amount of real memory touched is proportional to the size of the file,
since the file is smaller than a block. For example, compressing a file
20,000 bytes long with the flag <CODE>-9</CODE> will cause the compressor to
allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560
kbytes of it. Similarly, the decompressor will allocate 3700k but only
touch 100k + 20000 * 4 = 180 kbytes.
</P>
<P>
Here is a table which summarises the maximum memory usage for different
block sizes. Also recorded is the total compressed size for 14 files of
the Calgary Text Compression Corpus totalling 3,141,622 bytes. This
column gives some feel for how compression varies with block size.
These figures tend to understate the advantage of larger block sizes for
larger files, since the Corpus is dominated by smaller files.
<PRE>
Compress Decompress Decompress Corpus
Flag usage usage -s usage Size
-1 1200k 500k 350k 914704
-2 2000k 900k 600k 877703
-3 2800k 1300k 850k 860338
-4 3600k 1700k 1100k 846899
-5 4400k 2100k 1350k 845160
-6 5200k 2500k 1600k 838626
-7 6100k 2900k 1850k 834096
-8 6800k 3300k 2100k 828642
-9 7600k 3700k 2350k 828642
</PRE>
<H4><A NAME="SEC8" HREF="manual_toc.html#TOC8">RECOVERING DATA FROM DAMAGED FILES</A></H4>
<P>
<CODE>bzip2</CODE> compresses files in blocks, usually 900kbytes long. Each
block is handled independently. If a media or transmission error causes
a multi-block <CODE>.bz2</CODE> file to become damaged, it may be possible to
recover data from the undamaged blocks in the file.
</P>
<P>
The compressed representation of each block is delimited by a 48-bit
pattern, which makes it possible to find the block boundaries with
reasonable certainty. Each block also carries its own 32-bit CRC, so
damaged blocks can be distinguished from undamaged ones.
</P>
<P>
<CODE>bzip2recover</CODE> is a simple program whose purpose is to search for
blocks in <CODE>.bz2</CODE> files, and write each block out into its own
<CODE>.bz2</CODE> file. You can then use <CODE>bzip2 -t</CODE> to test the
integrity of the resulting files, and decompress those which are
undamaged.
</P>
<P>
<CODE>bzip2recover</CODE>
takes a single argument, the name of the damaged file,
and writes a number of files <CODE>rec0001file.bz2</CODE>,
<CODE>rec0002file.bz2</CODE>, etc, containing the extracted blocks.
The output filenames are designed so that the use of
wildcards in subsequent processing -- for example,
<CODE>bzip2 -dc rec*file.bz2 &#62; recovered_data</CODE> -- lists the files in
the correct order.
</P>
<P>
<CODE>bzip2recover</CODE> should be of most use dealing with large <CODE>.bz2</CODE>
files, as these will contain many blocks. It is clearly
futile to use it on damaged single-block files, since a
damaged block cannot be recovered. If you wish to minimise
any potential data loss through media or transmission errors,
you might consider compressing with a smaller
block size.
</P>
<H4><A NAME="SEC9" HREF="manual_toc.html#TOC9">PERFORMANCE NOTES</A></H4>
<P>
The sorting phase of compression gathers together similar strings in the
file. Because of this, files containing very long runs of repeated
symbols, like "aabaabaabaab ..." (repeated several hundred times) may
compress more slowly than normal. Versions 0.9.5 and above fare much
better than previous versions in this respect. The ratio between
worst-case and average-case compression time is in the region of 10:1.
For previous versions, this figure was more like 100:1. You can use the
<CODE>-vvvv</CODE> option to monitor progress in great detail, if you want.
</P>
<P>
Decompression speed is unaffected by these phenomena.
</P>
<P>
<CODE>bzip2</CODE> usually allocates several megabytes of memory to operate
in, and then charges all over it in a fairly random fashion. This means
that performance, both for compressing and decompressing, is largely
determined by the speed at which your machine can service cache misses.
Because of this, small changes to the code to reduce the miss rate have
been observed to give disproportionately large performance improvements.
I imagine <CODE>bzip2</CODE> will perform best on machines with very large
caches.
</P>
<H4><A NAME="SEC10" HREF="manual_toc.html#TOC10">CAVEATS</A></H4>
<P>
I/O error messages are not as helpful as they could be. <CODE>bzip2</CODE>
tries hard to detect I/O errors and exit cleanly, but the details of
what the problem is sometimes seem rather misleading.
</P>
<P>
This manual page pertains to version 1.0 of <CODE>bzip2</CODE>. Compressed
data created by this version is entirely forwards and backwards
compatible with the previous public releases, versions 0.1pl2, 0.9.0 and
0.9.5, but with the following exception: 0.9.0 and above can correctly
decompress multiple concatenated compressed files. 0.1pl2 cannot do
this; it will stop after decompressing just the first file in the
stream.
</P>
<P>
<CODE>bzip2recover</CODE> uses 32-bit integers to represent bit positions in
compressed files, so it cannot handle compressed files more than 512
megabytes long. This could easily be fixed.
</P>
<H4><A NAME="SEC11" HREF="manual_toc.html#TOC11">AUTHOR</A></H4>
<P>
Julian Seward, <CODE>jseward@acm.org</CODE>.
</P>
<P>
The ideas embodied in <CODE>bzip2</CODE> are due to (at least) the following
people: Michael Burrows and David Wheeler (for the block sorting
transformation), David Wheeler (again, for the Huffman coder), Peter
Fenwick (for the structured coding model in the original <CODE>bzip</CODE>,
and many refinements), and Alistair Moffat, Radford Neal and Ian Witten
(for the arithmetic coder in the original <CODE>bzip</CODE>). I am much
indebted for their help, support and advice. See the manual in the
source distribution for pointers to sources of documentation. Christian
von Roques encouraged me to look for faster sorting algorithms, so as to
speed up compression. Bela Lubkin encouraged me to improve the
worst-case compression performance. Many people sent patches, helped
with portability problems, lent machines, gave advice and were generally
helpful.
</P>
</BLOCKQUOTE>
<P><HR><P>
<p>Go to the <A HREF="manual_1.html">first</A>, <A HREF="manual_1.html">previous</A>, <A HREF="manual_3.html">next</A>, <A HREF="manual_4.html">last</A> section, <A HREF="manual_toc.html">table of contents</A>.
</BODY>
</HTML>

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,528 @@
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.54
from manual.texi on 23 March 2000 -->
<TITLE>bzip2 and libbzip2 - Miscellanea</TITLE>
<link href="manual_3.html" rel=Previous>
<link href="manual_toc.html" rel=ToC>
</HEAD>
<BODY>
<p>Go to the <A HREF="manual_1.html">first</A>, <A HREF="manual_3.html">previous</A>, next, last section, <A HREF="manual_toc.html">table of contents</A>.
<P><HR><P>
<H1><A NAME="SEC43" HREF="manual_toc.html#TOC43">Miscellanea</A></H1>
<P>
These are just some random thoughts of mine. Your mileage may
vary.
</P>
<H2><A NAME="SEC44" HREF="manual_toc.html#TOC44">Limitations of the compressed file format</A></H2>
<P>
<CODE>bzip2-1.0</CODE>, <CODE>0.9.5</CODE> and <CODE>0.9.0</CODE>
use exactly the same file format as the previous
version, <CODE>bzip2-0.1</CODE>. This decision was made in the interests of
stability. Creating yet another incompatible compressed file format
would create further confusion and disruption for users.
</P>
<P>
Nevertheless, this is not a painless decision. Development
work since the release of <CODE>bzip2-0.1</CODE> in August 1997
has shown complexities in the file format which slow down
decompression and, in retrospect, are unnecessary. These are:
<UL>
<LI>The run-length encoder, which is the first of the
compression transformations, is entirely irrelevant.
The original purpose was to protect the sorting algorithm
from the very worst case input: a string of repeated
symbols. But algorithm steps Q6a and Q6b in the original
Burrows-Wheeler technical report (SRC-124) show how
repeats can be handled without difficulty in block
sorting.
<LI>The randomisation mechanism doesn't really need to be
there. Udi Manber and Gene Myers published a suffix
array construction algorithm a few years back, which
can be employed to sort any block, no matter how
repetitive, in O(N log N) time. Subsequent work by
Kunihiko Sadakane has produced a derivative O(N (log N)^2)
algorithm which usually outperforms the Manber-Myers
algorithm.
I could have changed to Sadakane's algorithm, but I find
it to be slower than <CODE>bzip2</CODE>'s existing algorithm for
most inputs, and the randomisation mechanism protects
adequately against bad cases. I didn't think it was
a good tradeoff to make. Partly this is due to the fact
that I was not flooded with email complaints about
<CODE>bzip2-0.1</CODE>'s performance on repetitive data, so
perhaps it isn't a problem for real inputs.
Probably the best long-term solution,
and the one I have incorporated into 0.9.5 and above,
is to use the existing sorting
algorithm initially, and fall back to a O(N (log N)^2)
algorithm if the standard algorithm gets into difficulties.
<LI>The compressed file format was never designed to be
handled by a library, and I have had to jump though
some hoops to produce an efficient implementation of
decompression. It's a bit hairy. Try passing
<CODE>decompress.c</CODE> through the C preprocessor
and you'll see what I mean. Much of this complexity
could have been avoided if the compressed size of
each block of data was recorded in the data stream.
<LI>An Adler-32 checksum, rather than a CRC32 checksum,
would be faster to compute.
</UL>
<P>
It would be fair to say that the <CODE>bzip2</CODE> format was frozen
before I properly and fully understood the performance
consequences of doing so.
</P>
<P>
Improvements which I was able to incorporate into
0.9.0, despite using the same file format, are:
<UL>
<LI>Single array implementation of the inverse BWT. This
significantly speeds up decompression, presumably
because it reduces the number of cache misses.
<LI>Faster inverse MTF transform for large MTF values. The
new implementation is based on the notion of sliding blocks
of values.
<LI><CODE>bzip2-0.9.0</CODE> now reads and writes files with <CODE>fread</CODE>
and <CODE>fwrite</CODE>; version 0.1 used <CODE>putc</CODE> and <CODE>getc</CODE>.
Duh! Well, you live and learn.
</UL>
<P>
Further ahead, it would be nice
to be able to do random access into files. This will
require some careful design of compressed file formats.
</P>
<H2><A NAME="SEC45" HREF="manual_toc.html#TOC45">Portability issues</A></H2>
<P>
After some consideration, I have decided not to use
GNU <CODE>autoconf</CODE> to configure 0.9.5 or 1.0.
</P>
<P>
<CODE>autoconf</CODE>, admirable and wonderful though it is,
mainly assists with portability problems between Unix-like
platforms. But <CODE>bzip2</CODE> doesn't have much in the way
of portability problems on Unix; most of the difficulties appear
when porting to the Mac, or to Microsoft's operating systems.
<CODE>autoconf</CODE> doesn't help in those cases, and brings in a
whole load of new complexity.
</P>
<P>
Most people should be able to compile the library and program
under Unix straight out-of-the-box, so to speak, especially
if you have a version of GNU C available.
</P>
<P>
There are a couple of <CODE>__inline__</CODE> directives in the code. GNU C
(<CODE>gcc</CODE>) should be able to handle them. If you're not using
GNU C, your C compiler shouldn't see them at all.
If your compiler does, for some reason, see them and doesn't
like them, just <CODE>#define</CODE> <CODE>__inline__</CODE> to be <CODE>/* */</CODE>. One
easy way to do this is to compile with the flag <CODE>-D__inline__=</CODE>,
which should be understood by most Unix compilers.
</P>
<P>
If you still have difficulties, try compiling with the macro
<CODE>BZ_STRICT_ANSI</CODE> defined. This should enable you to build the
library in a strictly ANSI compliant environment. Building the program
itself like this is dangerous and not supported, since you remove
<CODE>bzip2</CODE>'s checks against compressing directories, symbolic links,
devices, and other not-really-a-file entities. This could cause
filesystem corruption!
</P>
<P>
One other thing: if you create a <CODE>bzip2</CODE> binary for public
distribution, please try and link it statically (<CODE>gcc -s</CODE>). This
avoids all sorts of library-version issues that others may encounter
later on.
</P>
<P>
If you build <CODE>bzip2</CODE> on Win32, you must set <CODE>BZ_UNIX</CODE> to 0 and
<CODE>BZ_LCCWIN32</CODE> to 1, in the file <CODE>bzip2.c</CODE>, before compiling.
Otherwise the resulting binary won't work correctly.
</P>
<H2><A NAME="SEC46" HREF="manual_toc.html#TOC46">Reporting bugs</A></H2>
<P>
I tried pretty hard to make sure <CODE>bzip2</CODE> is
bug free, both by design and by testing. Hopefully
you'll never need to read this section for real.
</P>
<P>
Nevertheless, if <CODE>bzip2</CODE> dies with a segmentation
fault, a bus error or an internal assertion failure, it
will ask you to email me a bug report. Experience with
version 0.1 shows that almost all these problems can
be traced to either compiler bugs or hardware problems.
<UL>
<LI>
Recompile the program with no optimisation, and see if it
works. And/or try a different compiler.
I heard all sorts of stories about various flavours
of GNU C (and other compilers) generating bad code for
<CODE>bzip2</CODE>, and I've run across two such examples myself.
2.7.X versions of GNU C are known to generate bad code from
time to time, at high optimisation levels.
If you get problems, try using the flags
<CODE>-O2</CODE> <CODE>-fomit-frame-pointer</CODE> <CODE>-fno-strength-reduce</CODE>.
You should specifically <EM>not</EM> use <CODE>-funroll-loops</CODE>.
You may notice that the Makefile runs six tests as part of
the build process. If the program passes all of these, it's
a pretty good (but not 100%) indication that the compiler has
done its job correctly.
<LI>
If <CODE>bzip2</CODE> crashes randomly, and the crashes are not
repeatable, you may have a flaky memory subsystem. <CODE>bzip2</CODE>
really hammers your memory hierarchy, and if it's a bit marginal,
you may get these problems. Ditto if your disk or I/O subsystem
is slowly failing. Yup, this really does happen.
Try using a different machine of the same type, and see if
you can repeat the problem.
<LI>This isn't really a bug, but ... If <CODE>bzip2</CODE> tells
you your file is corrupted on decompression, and you
obtained the file via FTP, there is a possibility that you
forgot to tell FTP to do a binary mode transfer. That absolutely
will cause the file to be non-decompressible. You'll have to transfer
it again.
</UL>
<P>
If you've incorporated <CODE>libbzip2</CODE> into your own program
and are getting problems, please, please, please, check that the
parameters you are passing in calls to the library, are
correct, and in accordance with what the documentation says
is allowable. I have tried to make the library robust against
such problems, but I'm sure I haven't succeeded.
</P>
<P>
Finally, if the above comments don't help, you'll have to send
me a bug report. Now, it's just amazing how many people will
send me a bug report saying something like
<PRE>
bzip2 crashed with segmentation fault on my machine
</PRE>
<P>
and absolutely nothing else. Needless to say, a such a report
is <EM>totally, utterly, completely and comprehensively 100% useless;
a waste of your time, my time, and net bandwidth</EM>.
With no details at all, there's no way I can possibly begin
to figure out what the problem is.
</P>
<P>
The rules of the game are: facts, facts, facts. Don't omit
them because "oh, they won't be relevant". At the bare
minimum:
<PRE>
Machine type. Operating system version.
Exact version of <CODE>bzip2</CODE> (do <CODE>bzip2 -V</CODE>).
Exact version of the compiler used.
Flags passed to the compiler.
</PRE>
<P>
However, the most important single thing that will help me is
the file that you were trying to compress or decompress at the
time the problem happened. Without that, my ability to do anything
more than speculate about the cause, is limited.
</P>
<P>
Please remember that I connect to the Internet with a modem, so
you should contact me before mailing me huge files.
</P>
<H2><A NAME="SEC47" HREF="manual_toc.html#TOC47">Did you get the right package?</A></H2>
<P>
<CODE>bzip2</CODE> is a resource hog. It soaks up large amounts of CPU cycles
and memory. Also, it gives very large latencies. In the worst case, you
can feed many megabytes of uncompressed data into the library before
getting any compressed output, so this probably rules out applications
requiring interactive behaviour.
</P>
<P>
These aren't faults of my implementation, I hope, but more
an intrinsic property of the Burrows-Wheeler transform (unfortunately).
Maybe this isn't what you want.
</P>
<P>
If you want a compressor and/or library which is faster, uses less
memory but gets pretty good compression, and has minimal latency,
consider Jean-loup
Gailly's and Mark Adler's work, <CODE>zlib-1.1.2</CODE> and
<CODE>gzip-1.2.4</CODE>. Look for them at
</P>
<P>
<CODE>http://www.cdrom.com/pub/infozip/zlib</CODE> and
<CODE>http://www.gzip.org</CODE> respectively.
</P>
<P>
For something faster and lighter still, you might try Markus F X J
Oberhumer's <CODE>LZO</CODE> real-time compression/decompression library, at
<BR> <CODE>http://wildsau.idv.uni-linz.ac.at/mfx/lzo.html</CODE>.
</P>
<P>
If you want to use the <CODE>bzip2</CODE> algorithms to compress small blocks
of data, 64k bytes or smaller, for example on an on-the-fly disk
compressor, you'd be well advised not to use this library. Instead,
I've made a special library tuned for that kind of use. It's part of
<CODE>e2compr-0.40</CODE>, an on-the-fly disk compressor for the Linux
<CODE>ext2</CODE> filesystem. Look at
<CODE>http://www.netspace.net.au/~reiter/e2compr</CODE>.
</P>
<H2><A NAME="SEC48" HREF="manual_toc.html#TOC48">Testing</A></H2>
<P>
A record of the tests I've done.
</P>
<P>
First, some data sets:
<UL>
<LI>B: a directory containing 6001 files, one for every length in the
range 0 to 6000 bytes. The files contain random lowercase
letters. 18.7 megabytes.
<LI>H: my home directory tree. Documents, source code, mail files,
compressed data. H contains B, and also a directory of
files designed as boundary cases for the sorting; mostly very
repetitive, nasty files. 565 megabytes.
<LI>A: directory tree holding various applications built from source:
<CODE>egcs</CODE>, <CODE>gcc-2.8.1</CODE>, KDE, GTK, Octave, etc.
2200 megabytes.
</UL>
<P>
The tests conducted are as follows. Each test means compressing
(a copy of) each file in the data set, decompressing it and
comparing it against the original.
</P>
<P>
First, a bunch of tests with block sizes and internal buffer
sizes set very small,
to detect any problems with the
blocking and buffering mechanisms.
This required modifying the source code so as to try to
break it.
<OL>
<LI>Data set H, with
buffer size of 1 byte, and block size of 23 bytes.
<LI>Data set B, buffer sizes 1 byte, block size 1 byte.
<LI>As (2) but small-mode decompression.
<LI>As (2) with block size 2 bytes.
<LI>As (2) with block size 3 bytes.
<LI>As (2) with block size 4 bytes.
<LI>As (2) with block size 5 bytes.
<LI>As (2) with block size 6 bytes and small-mode decompression.
<LI>H with buffer size of 1 byte, but normal block
size (up to 900000 bytes).
</OL>
<P>
Then some tests with unmodified source code.
<OL>
<LI>H, all settings normal.
<LI>As (1), with small-mode decompress.
<LI>H, compress with flag <CODE>-1</CODE>.
<LI>H, compress with flag <CODE>-s</CODE>, decompress with flag <CODE>-s</CODE>.
<LI>Forwards compatibility: H, <CODE>bzip2-0.1pl2</CODE> compressing,
<CODE>bzip2-0.9.5</CODE> decompressing, all settings normal.
<LI>Backwards compatibility: H, <CODE>bzip2-0.9.5</CODE> compressing,
<CODE>bzip2-0.1pl2</CODE> decompressing, all settings normal.
<LI>Bigger tests: A, all settings normal.
<LI>As (7), using the fallback (Sadakane-like) sorting algorithm.
<LI>As (8), compress with flag <CODE>-1</CODE>, decompress with flag
<CODE>-s</CODE>.
<LI>H, using the fallback sorting algorithm.
<LI>Forwards compatibility: A, <CODE>bzip2-0.1pl2</CODE> compressing,
<CODE>bzip2-0.9.5</CODE> decompressing, all settings normal.
<LI>Backwards compatibility: A, <CODE>bzip2-0.9.5</CODE> compressing,
<CODE>bzip2-0.1pl2</CODE> decompressing, all settings normal.
<LI>Misc test: about 400 megabytes of <CODE>.tar</CODE> files with
<CODE>bzip2</CODE> compiled with Checker (a memory access error
detector, like Purify).
<LI>Misc tests to make sure it builds and runs ok on non-Linux/x86
platforms.
</OL>
<P>
These tests were conducted on a 225 MHz IDT WinChip machine, running
Linux 2.0.36. They represent nearly a week of continuous computation.
All tests completed successfully.
</P>
<H2><A NAME="SEC49" HREF="manual_toc.html#TOC49">Further reading</A></H2>
<P>
<CODE>bzip2</CODE> is not research work, in the sense that it doesn't present
any new ideas. Rather, it's an engineering exercise based on existing
ideas.
</P>
<P>
Four documents describe essentially all the ideas behind <CODE>bzip2</CODE>:
<PRE>
Michael Burrows and D. J. Wheeler:
"A block-sorting lossless data compression algorithm"
10th May 1994.
Digital SRC Research Report 124.
ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz
If you have trouble finding it, try searching at the
New Zealand Digital Library, http://www.nzdl.org.
Daniel S. Hirschberg and Debra A. LeLewer
"Efficient Decoding of Prefix Codes"
Communications of the ACM, April 1990, Vol 33, Number 4.
You might be able to get an electronic copy of this
from the ACM Digital Library.
David J. Wheeler
Program bred3.c and accompanying document bred3.ps.
This contains the idea behind the multi-table Huffman
coding scheme.
ftp://ftp.cl.cam.ac.uk/users/djw3/
Jon L. Bentley and Robert Sedgewick
"Fast Algorithms for Sorting and Searching Strings"
Available from Sedgewick's web page,
www.cs.princeton.edu/~rs
</PRE>
<P>
The following paper gives valuable additional insights into the
algorithm, but is not immediately the basis of any code
used in bzip2.
<PRE>
Peter Fenwick:
Block Sorting Text Compression
Proceedings of the 19th Australasian Computer Science Conference,
Melbourne, Australia. Jan 31 - Feb 2, 1996.
ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps
</PRE>
<P>
Kunihiko Sadakane's sorting algorithm, mentioned above,
is available from:
<PRE>
http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz
</PRE>
<P>
The Manber-Myers suffix array construction
algorithm is described in a paper
available from:
<PRE>
http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps
</PRE>
<P>
Finally, the following paper documents some recent investigations
I made into the performance of sorting algorithms:
<PRE>
Julian Seward:
On the Performance of BWT Sorting Algorithms
Proceedings of the IEEE Data Compression Conference 2000
Snowbird, Utah. 28-30 March 2000.
</PRE>
<P><HR><P>
<p>Go to the <A HREF="manual_1.html">first</A>, <A HREF="manual_3.html">previous</A>, next, last section, <A HREF="manual_toc.html">table of contents</A>.
</BODY>
</HTML>

View file

@ -0,0 +1,173 @@
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.54
from manual.texi on 23 March 2000 -->
<TITLE>bzip2 and libbzip2 - Table of Contents</TITLE>
</HEAD>
<BODY>
<H1>bzip2 and libbzip2</H1>
<H2>a program and library for data compression</H2>
<H2>copyright (C) 1996-2000 Julian Seward</H2>
<H2>version 1.0 of 21 March 2000</H2>
<ADDRESS>Julian Seward</ADDRESS>
<P>
<P><HR><P>
<P>
This program, <CODE>bzip2</CODE>,
and associated library <CODE>libbzip2</CODE>, are
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
</P>
<P>
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
<UL>
<LI>
Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
<LI>
The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product
documentation would be appreciated but is not required.
<LI>
Altered source versions must be plainly marked as such, and must
not be misrepresented as being the original software.
<LI>
The name of the author may not be used to endorse or promote
products derived from this software without specific prior written
permission.
</UL>
<P>
THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY EXPRESS
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
</P>
<P>
Julian Seward, Cambridge, UK.
</P>
<P>
<CODE>jseward@acm.org</CODE>
</P>
<P>
<CODE>http://sourceware.cygnus.com/bzip2</CODE>
</P>
<P>
<CODE>http://www.cacheprof.org</CODE>
</P>
<P>
<CODE>http://www.muraroa.demon.co.uk</CODE>
</P>
<P>
<CODE>bzip2</CODE>/<CODE>libbzip2</CODE> version 1.0 of 21 March 2000.
</P>
<P>
PATENTS: To the best of my knowledge, <CODE>bzip2</CODE> does not use any patented
algorithms. However, I do not have the resources available to carry out
a full patent search. Therefore I cannot give any guarantee of the
above statement.
</P>
<UL>
<LI><A NAME="TOC1" HREF="manual_1.html#SEC1">Introduction</A>
<LI><A NAME="TOC2" HREF="manual_2.html#SEC2">How to use <CODE>bzip2</CODE></A>
<UL>
<UL>
<UL>
<LI><A NAME="TOC3" HREF="manual_2.html#SEC3">NAME</A>
<LI><A NAME="TOC4" HREF="manual_2.html#SEC4">SYNOPSIS</A>
<LI><A NAME="TOC5" HREF="manual_2.html#SEC5">DESCRIPTION</A>
<LI><A NAME="TOC6" HREF="manual_2.html#SEC6">OPTIONS</A>
<LI><A NAME="TOC7" HREF="manual_2.html#SEC7">MEMORY MANAGEMENT</A>
<LI><A NAME="TOC8" HREF="manual_2.html#SEC8">RECOVERING DATA FROM DAMAGED FILES</A>
<LI><A NAME="TOC9" HREF="manual_2.html#SEC9">PERFORMANCE NOTES</A>
<LI><A NAME="TOC10" HREF="manual_2.html#SEC10">CAVEATS</A>
<LI><A NAME="TOC11" HREF="manual_2.html#SEC11">AUTHOR</A>
</UL>
</UL>
</UL>
<LI><A NAME="TOC12" HREF="manual_3.html#SEC12">Programming with <CODE>libbzip2</CODE></A>
<UL>
<LI><A NAME="TOC13" HREF="manual_3.html#SEC13">Top-level structure</A>
<UL>
<LI><A NAME="TOC14" HREF="manual_3.html#SEC14">Low-level summary</A>
<LI><A NAME="TOC15" HREF="manual_3.html#SEC15">High-level summary</A>
<LI><A NAME="TOC16" HREF="manual_3.html#SEC16">Utility functions summary</A>
</UL>
<LI><A NAME="TOC17" HREF="manual_3.html#SEC17">Error handling</A>
<LI><A NAME="TOC18" HREF="manual_3.html#SEC18">Low-level interface</A>
<UL>
<LI><A NAME="TOC19" HREF="manual_3.html#SEC19"><CODE>BZ2_bzCompressInit</CODE></A>
<LI><A NAME="TOC20" HREF="manual_3.html#SEC20"><CODE>BZ2_bzCompress</CODE></A>
<LI><A NAME="TOC21" HREF="manual_3.html#SEC21"><CODE>BZ2_bzCompressEnd</CODE></A>
<LI><A NAME="TOC22" HREF="manual_3.html#SEC22"><CODE>BZ2_bzDecompressInit</CODE></A>
<LI><A NAME="TOC23" HREF="manual_3.html#SEC23"><CODE>BZ2_bzDecompress</CODE></A>
<LI><A NAME="TOC24" HREF="manual_3.html#SEC24"><CODE>BZ2_bzDecompressEnd</CODE></A>
</UL>
<LI><A NAME="TOC25" HREF="manual_3.html#SEC25">High-level interface</A>
<UL>
<LI><A NAME="TOC26" HREF="manual_3.html#SEC26"><CODE>BZ2_bzReadOpen</CODE></A>
<LI><A NAME="TOC27" HREF="manual_3.html#SEC27"><CODE>BZ2_bzRead</CODE></A>
<LI><A NAME="TOC28" HREF="manual_3.html#SEC28"><CODE>BZ2_bzReadGetUnused</CODE></A>
<LI><A NAME="TOC29" HREF="manual_3.html#SEC29"><CODE>BZ2_bzReadClose</CODE></A>
<LI><A NAME="TOC30" HREF="manual_3.html#SEC30"><CODE>BZ2_bzWriteOpen</CODE></A>
<LI><A NAME="TOC31" HREF="manual_3.html#SEC31"><CODE>BZ2_bzWrite</CODE></A>
<LI><A NAME="TOC32" HREF="manual_3.html#SEC32"><CODE>BZ2_bzWriteClose</CODE></A>
<LI><A NAME="TOC33" HREF="manual_3.html#SEC33">Handling embedded compressed data streams</A>
<LI><A NAME="TOC34" HREF="manual_3.html#SEC34">Standard file-reading/writing code</A>
</UL>
<LI><A NAME="TOC35" HREF="manual_3.html#SEC35">Utility functions</A>
<UL>
<LI><A NAME="TOC36" HREF="manual_3.html#SEC36"><CODE>BZ2_bzBuffToBuffCompress</CODE></A>
<LI><A NAME="TOC37" HREF="manual_3.html#SEC37"><CODE>BZ2_bzBuffToBuffDecompress</CODE></A>
</UL>
<LI><A NAME="TOC38" HREF="manual_3.html#SEC38"><CODE>zlib</CODE> compatibility functions</A>
<LI><A NAME="TOC39" HREF="manual_3.html#SEC39">Using the library in a <CODE>stdio</CODE>-free environment</A>
<UL>
<LI><A NAME="TOC40" HREF="manual_3.html#SEC40">Getting rid of <CODE>stdio</CODE></A>
<LI><A NAME="TOC41" HREF="manual_3.html#SEC41">Critical error handling</A>
</UL>
<LI><A NAME="TOC42" HREF="manual_3.html#SEC42">Making a Windows DLL</A>
</UL>
<LI><A NAME="TOC43" HREF="manual_4.html#SEC43">Miscellanea</A>
<UL>
<LI><A NAME="TOC44" HREF="manual_4.html#SEC44">Limitations of the compressed file format</A>
<LI><A NAME="TOC45" HREF="manual_4.html#SEC45">Portability issues</A>
<LI><A NAME="TOC46" HREF="manual_4.html#SEC46">Reporting bugs</A>
<LI><A NAME="TOC47" HREF="manual_4.html#SEC47">Did you get the right package?</A>
<LI><A NAME="TOC48" HREF="manual_4.html#SEC48">Testing</A>
<LI><A NAME="TOC49" HREF="manual_4.html#SEC49">Further reading</A>
</UL>
</UL>
<P><HR><P>
This document was generated on 23 March 2000 using the
<A HREF="http://wwwcn.cern.ch/dci/texi2html/">texi2html</A>
translator version 1.51a.</P>
</BODY>
</HTML>

View file

@ -0,0 +1,124 @@
/*-------------------------------------------------------------*/
/*--- Table for randomising repetitive blocks ---*/
/*--- randtable.c ---*/
/*-------------------------------------------------------------*/
/*--
This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression.
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product
documentation would be appreciated but is not required.
3. Altered source versions must be plainly marked as such, and must
not be misrepresented as being the original software.
4. The name of the author may not be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Julian Seward, Cambridge, UK.
jseward@acm.org
bzip2/libbzip2 version 1.0 of 21 March 2000
This program is based on (at least) the work of:
Mike Burrows
David Wheeler
Peter Fenwick
Alistair Moffat
Radford Neal
Ian H. Witten
Robert Sedgewick
Jon L. Bentley
For more information on these sources, see the manual.
--*/
#include "bzlib_private.h"
/*---------------------------------------------*/
Int32 BZ2_rNums[512] = {
619, 720, 127, 481, 931, 816, 813, 233, 566, 247,
985, 724, 205, 454, 863, 491, 741, 242, 949, 214,
733, 859, 335, 708, 621, 574, 73, 654, 730, 472,
419, 436, 278, 496, 867, 210, 399, 680, 480, 51,
878, 465, 811, 169, 869, 675, 611, 697, 867, 561,
862, 687, 507, 283, 482, 129, 807, 591, 733, 623,
150, 238, 59, 379, 684, 877, 625, 169, 643, 105,
170, 607, 520, 932, 727, 476, 693, 425, 174, 647,
73, 122, 335, 530, 442, 853, 695, 249, 445, 515,
909, 545, 703, 919, 874, 474, 882, 500, 594, 612,
641, 801, 220, 162, 819, 984, 589, 513, 495, 799,
161, 604, 958, 533, 221, 400, 386, 867, 600, 782,
382, 596, 414, 171, 516, 375, 682, 485, 911, 276,
98, 553, 163, 354, 666, 933, 424, 341, 533, 870,
227, 730, 475, 186, 263, 647, 537, 686, 600, 224,
469, 68, 770, 919, 190, 373, 294, 822, 808, 206,
184, 943, 795, 384, 383, 461, 404, 758, 839, 887,
715, 67, 618, 276, 204, 918, 873, 777, 604, 560,
951, 160, 578, 722, 79, 804, 96, 409, 713, 940,
652, 934, 970, 447, 318, 353, 859, 672, 112, 785,
645, 863, 803, 350, 139, 93, 354, 99, 820, 908,
609, 772, 154, 274, 580, 184, 79, 626, 630, 742,
653, 282, 762, 623, 680, 81, 927, 626, 789, 125,
411, 521, 938, 300, 821, 78, 343, 175, 128, 250,
170, 774, 972, 275, 999, 639, 495, 78, 352, 126,
857, 956, 358, 619, 580, 124, 737, 594, 701, 612,
669, 112, 134, 694, 363, 992, 809, 743, 168, 974,
944, 375, 748, 52, 600, 747, 642, 182, 862, 81,
344, 805, 988, 739, 511, 655, 814, 334, 249, 515,
897, 955, 664, 981, 649, 113, 974, 459, 893, 228,
433, 837, 553, 268, 926, 240, 102, 654, 459, 51,
686, 754, 806, 760, 493, 403, 415, 394, 687, 700,
946, 670, 656, 610, 738, 392, 760, 799, 887, 653,
978, 321, 576, 617, 626, 502, 894, 679, 243, 440,
680, 879, 194, 572, 640, 724, 926, 56, 204, 700,
707, 151, 457, 449, 797, 195, 791, 558, 945, 679,
297, 59, 87, 824, 713, 663, 412, 693, 342, 606,
134, 108, 571, 364, 631, 212, 174, 643, 304, 329,
343, 97, 430, 751, 497, 314, 983, 374, 822, 928,
140, 206, 73, 263, 980, 736, 876, 478, 430, 305,
170, 514, 364, 692, 829, 82, 855, 953, 676, 246,
369, 970, 294, 750, 807, 827, 150, 790, 288, 923,
804, 378, 215, 828, 592, 281, 565, 555, 710, 82,
896, 831, 547, 261, 524, 462, 293, 465, 502, 56,
661, 821, 976, 991, 658, 869, 905, 758, 745, 193,
768, 550, 608, 933, 378, 286, 215, 979, 792, 961,
61, 688, 793, 644, 986, 403, 106, 366, 905, 644,
372, 567, 466, 434, 645, 210, 389, 550, 919, 135,
780, 773, 635, 389, 707, 100, 626, 958, 165, 504,
920, 176, 193, 713, 857, 265, 203, 50, 668, 108,
645, 990, 626, 197, 510, 357, 358, 850, 858, 364,
936, 638
};
/*-------------------------------------------------------------*/
/*--- end randtable.c ---*/
/*-------------------------------------------------------------*/

Binary file not shown.

Binary file not shown.

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,39 @@
/* spew out a thoroughly gigantic file designed so that bzip2
can compress it reasonably rapidly. This is to help test
support for large files (> 2GB) in a reasonable amount of time.
I suggest you use the undocumented --exponential option to
bzip2 when compressing the resulting file; this saves a bit of
time. Note: *don't* bother with --exponential when compressing
Real Files; it'll just waste a lot of CPU time :-)
(but is otherwise harmless).
*/
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <stdlib.h>
/* The number of megabytes of junk to spew out (roughly) */
#define MEGABYTES 5000
#define N_BUF 1000000
char buf[N_BUF];
int main ( int argc, char** argv )
{
int ii, kk, p;
srandom(1);
setbuffer ( stdout, buf, N_BUF );
for (kk = 0; kk < MEGABYTES * 515; kk+=3) {
p = 25+random()%50;
for (ii = 0; ii < p; ii++)
printf ( "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" );
for (ii = 0; ii < p-1; ii++)
printf ( "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb" );
for (ii = 0; ii < p+1; ii++)
printf ( "ccccccccccccccccccccccccccccccccccccc" );
}
fflush(stdout);
return 0;
}

View file

@ -0,0 +1,6 @@
LIBRARY unbzip2.dll
EXPORTS
BZ2_bzBuffToBuffDecompress@24
BZ2_malloc
BZ2_free

View file

@ -0,0 +1,6 @@
LIBRARY unbzip2.dll
EXPORTS
BZ2_bzBuffToBuffDecompress=BZ2_bzBuffToBuffDecompress@24
BZ2_malloc
BZ2_free

View file

@ -0,0 +1,126 @@
/* A test program written to test robustness to decompression of
corrupted data. Usage is
unzcrash filename
and the program will read the specified file, compress it (in memory),
and then repeatedly decompress it, each time with a different bit of
the compressed data inverted, so as to test all possible one-bit errors.
This should not cause any invalid memory accesses. If it does,
I want to know about it!
p.s. As you can see from the above description, the process is
incredibly slow. A file of size eg 5KB will cause it to run for
many hours.
*/
#include <stdio.h>
#include <assert.h>
#include "bzlib.h"
#define M_BLOCK 1000000
typedef unsigned char uchar;
#define M_BLOCK_OUT (M_BLOCK + 1000000)
uchar inbuf[M_BLOCK];
uchar outbuf[M_BLOCK_OUT];
uchar zbuf[M_BLOCK + 600 + (M_BLOCK / 100)];
int nIn, nOut, nZ;
static char *bzerrorstrings[] = {
"OK"
,"SEQUENCE_ERROR"
,"PARAM_ERROR"
,"MEM_ERROR"
,"DATA_ERROR"
,"DATA_ERROR_MAGIC"
,"IO_ERROR"
,"UNEXPECTED_EOF"
,"OUTBUFF_FULL"
,"???" /* for future */
,"???" /* for future */
,"???" /* for future */
,"???" /* for future */
,"???" /* for future */
,"???" /* for future */
};
void flip_bit ( int bit )
{
int byteno = bit / 8;
int bitno = bit % 8;
uchar mask = 1 << bitno;
//fprintf ( stderr, "(byte %d bit %d mask %d)",
// byteno, bitno, (int)mask );
zbuf[byteno] ^= mask;
}
int main ( int argc, char** argv )
{
FILE* f;
int r;
int bit;
int i;
if (argc != 2) {
fprintf ( stderr, "usage: unzcrash filename\n" );
return 1;
}
f = fopen ( argv[1], "r" );
if (!f) {
fprintf ( stderr, "unzcrash: can't open %s\n", argv[1] );
return 1;
}
nIn = fread ( inbuf, 1, M_BLOCK, f );
fprintf ( stderr, "%d bytes read\n", nIn );
nZ = M_BLOCK;
r = BZ2_bzBuffToBuffCompress (
zbuf, &nZ, inbuf, nIn, 9, 0, 30 );
assert (r == BZ_OK);
fprintf ( stderr, "%d after compression\n", nZ );
for (bit = 0; bit < nZ*8; bit++) {
fprintf ( stderr, "bit %d ", bit );
flip_bit ( bit );
nOut = M_BLOCK_OUT;
r = BZ2_bzBuffToBuffDecompress (
outbuf, &nOut, zbuf, nZ, 0, 0 );
fprintf ( stderr, " %d %s ", r, bzerrorstrings[-r] );
if (r != BZ_OK) {
fprintf ( stderr, "\n" );
} else {
if (nOut != nIn) {
fprintf(stderr, "nIn/nOut mismatch %d %d\n", nIn, nOut );
return 1;
} else {
for (i = 0; i < nOut; i++)
if (inbuf[i] != outbuf[i]) {
fprintf(stderr, "mismatch at %d\n", i );
return 1;
}
if (i == nOut) fprintf(stderr, "really ok!\n" );
}
}
flip_bit ( bit );
}
#if 0
assert (nOut == nIn);
for (i = 0; i < nOut; i++) {
if (inbuf[i] != outbuf[i]) {
fprintf ( stderr, "difference at %d !\n", i );
return 1;
}
}
#endif
fprintf ( stderr, "all ok\n" );
return 0;
}

View file

@ -0,0 +1,5 @@
If compilation produces errors, or a large number of warnings,
please read README.COMPILATION.PROBLEMS -- you might be able to
adjust the flags in this Makefile to improve matters.

View file

@ -0,0 +1,4 @@
Doing 6 tests (3 compress, 3 uncompress) ...
If there's a problem, things might stop at this point.

View file

@ -0,0 +1,5 @@
Checking test results. If any of the four "cmp"s which follow
report any differences, something is wrong. If you can't easily
figure out what, please let me know (jseward@acm.org).

View file

@ -0,0 +1,23 @@
If you got this far and the "cmp"s didn't complain, it looks
like you're in business.
To install in /usr/bin, /usr/lib, /usr/man and /usr/include, type
make install
To install somewhere else, eg, /xxx/yyy/{bin,lib,man,include}, type
make install PREFIX=/xxx/yyy
If you are (justifiably) paranoid and want to see what 'make install'
is going to do, you can first do
make -n install or
make -n install PREFIX=/xxx/yyy respectively.
The -n instructs make to show the commands it would execute, but
not actually execute them.
Instructions for use are in the preformatted manual page, in the file
bzip2.txt. For more detailed documentation, read the full manual.
It is available in Postscript form (manual.ps) and HTML form
(manual_toc.html).
You can also do "bzip2 --help" to see some helpful information.
"bzip2 -L" displays the software license.