Bruce Eckel's Thinking in Java Contents | Prev | Next

Compression

Java 1.1 has also added some classes to support reading and writing streams in a compressed format. These are wrapped around existing IO classes to provide compression functionality.

One aspect of these Java 1.1 classes stands out: They are not derived from the new Reader and Writer classes, but instead are part of the InputStream and OutputStream hierarchies. So you might be forced to mix the two types of streams. (Remember that you can use InputStreamReader and OutputStreamWriter to provide easy conversion between one type and another.)
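As a rough sketch of that conversion, the following layers a Writer and a Reader over the GZIP streams. The class name GZIPTextBridge, its method names, and the choice of UTF-8 are assumptions made for illustration, and the try-with-resources syntax is newer than Java 1.1.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.*;

// Hypothetical helper: bridges character streams and GZIP byte streams.
class GZIPTextBridge {
  // Compresses text by wrapping a GZIPOutputStream in an OutputStreamWriter.
  static byte[] compress(String text) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (Writer out = new OutputStreamWriter(
           new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
      out.write(text);
    } // Closing the Writer also finishes the GZIP stream
    return bytes.toByteArray();
  }

  // Decompresses by wrapping a GZIPInputStream in an InputStreamReader.
  static String decompress(byte[] data) throws IOException {
    StringBuilder sb = new StringBuilder();
    try (Reader in = new InputStreamReader(
           new GZIPInputStream(new ByteArrayInputStream(data)),
           StandardCharsets.UTF_8)) {
      int c;
      while ((c = in.read()) != -1)
        sb.append((char) c);
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    String original = "Thinking in Java: caf\u00e9";
    String restored = decompress(compress(original));
    System.out.println(restored.equals(original));
  }
}
```

Naming the character encoding on both sides keeps the round trip independent of the platform default.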

Java 1.1 Compression class   Function
CheckedInputStream           getChecksum( ) produces checksum for any InputStream (not just decompression)
CheckedOutputStream          getChecksum( ) produces checksum for any OutputStream (not just compression)
DeflaterOutputStream         Base class for compression classes
ZipOutputStream              A DeflaterOutputStream that compresses data into the Zip file format
GZIPOutputStream             A DeflaterOutputStream that compresses data into the GZIP file format
InflaterInputStream          Base class for decompression classes
ZipInputStream               An InflaterInputStream that decompresses data that has been stored in the Zip file format
GZIPInputStream              An InflaterInputStream that decompresses data that has been stored in the GZIP file format

Although there are many compression algorithms, Zip and GZIP are possibly the most commonly used. Thus you can easily manipulate your compressed data with the many tools available for reading and writing these formats.
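Since CheckedInputStream works with any InputStream and not just a decompressing one, a minimal sketch of using it on uncompressed data might look like this. The class PlainChecksum and the method crcOf( ) are hypothetical names used only for illustration.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.*;

// Hypothetical helper: checksums an ordinary, uncompressed stream.
class PlainChecksum {
  // Returns the CRC32 of everything read from the given stream.
  static long crcOf(InputStream source) throws IOException {
    try (CheckedInputStream in =
           new CheckedInputStream(source, new CRC32())) {
      byte[] buf = new byte[1024];
      while (in.read(buf) != -1) {
        // Nothing to do: each read updates the checksum
      }
      return in.getChecksum().getValue();
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
    // "123456789" is the standard CRC-32 check string; prints cbf43926
    System.out.println(
      Long.toHexString(crcOf(new ByteArrayInputStream(data))));
  }
}
```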

Simple compression with GZIP

The GZIP interface is simple and thus is probably more appropriate when you have a single stream of data that you want to compress (rather than a collection of dissimilar pieces of data). Here’s an example that compresses a single file:

//: GZIPcompress.java
// Uses Java 1.1 GZIP compression to compress
// a file whose name is passed on the command
// line.
import java.io.*;
import java.util.zip.*;

public class GZIPcompress {
  public static void main(String[] args) {
    try {
      BufferedReader in =
        new BufferedReader(
          new FileReader(args[0]));
      BufferedOutputStream out =
        new BufferedOutputStream(
          new GZIPOutputStream(
            new FileOutputStream("test.gz")));
      System.out.println("Writing file");
      int c;
      while((c = in.read()) != -1)
        out.write(c);
      in.close();
      out.close();
      System.out.println("Reading file");
      BufferedReader in2 =
        new BufferedReader(
          new InputStreamReader(
            new GZIPInputStream(
              new FileInputStream("test.gz"))));
      String s;
      while((s = in2.readLine()) != null)
        System.out.println(s);
      in2.close();
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
} ///:~ 

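The use of the compression classes is straightforward: you simply wrap your output stream in a GZIPOutputStream or ZipOutputStream and your input stream in a GZIPInputStream or ZipInputStream. All else is ordinary IO reading and writing. This is, however, a good example of when you’re forced to mix the old IO streams with the new: in uses the Reader classes, whereas GZIPOutputStream’s constructor can accept only an OutputStream object, not a Writer object. Be aware that out.write(c) stores only the low-order eight bits of each character read, so this character-by-character copy is dependable only for plain ASCII text; to compress arbitrary files, read them through an InputStream instead.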

Multi-file storage with Zip

The Java 1.1 library that supports the Zip format is much more extensive. With it you can easily store multiple files, and there’s even a separate class to make the process of reading a Zip file easy. The library uses the standard Zip format so that it works with the many Zip tools already available. The following example has the same form as the previous example, but it handles as many command-line arguments as you want. In addition, it shows the use of the Checksum classes to calculate and verify the checksum for the file. There are two Checksum types: Adler32 (which is faster) and CRC32 (which is slower but slightly more accurate).

//: ZipCompress.java
// Uses Java 1.1 Zip compression to compress
// any number of files whose names are passed
// on the command line.
import java.io.*;
import java.util.*;
import java.util.zip.*;

public class ZipCompress {
  public static void main(String[] args) {
    try {
      FileOutputStream f =
        new FileOutputStream("test.zip");
      CheckedOutputStream csum =
        new CheckedOutputStream(
          f, new Adler32());
      ZipOutputStream out =
        new ZipOutputStream(
          new BufferedOutputStream(csum));
      out.setComment("A test of Java Zipping");
      // Can't read the above comment, though
      for(int i = 0; i < args.length; i++) {
        System.out.println(
          "Writing file " + args[i]);
        BufferedReader in =
          new BufferedReader(
            new FileReader(args[i]));
        out.putNextEntry(new ZipEntry(args[i]));
        int c;
        while((c = in.read()) != -1)
          out.write(c);
        in.close();
      }
      out.close();
      // Checksum valid only after the file
      // has been closed!
      System.out.println("Checksum: " +
        csum.getChecksum().getValue());
      // Now extract the files:
      System.out.println("Reading file");
      FileInputStream fi =
         new FileInputStream("test.zip");
      CheckedInputStream csumi =
        new CheckedInputStream(
          fi, new Adler32());
      ZipInputStream in2 =
        new ZipInputStream(
          new BufferedInputStream(csumi));
      ZipEntry ze;
      while((ze = in2.getNextEntry()) != null) {
        System.out.println("Reading file " + ze);
        int x;
        while((x = in2.read()) != -1)
          System.out.write(x);
      }
      // The checksum covers only the bytes read
      // so far, so print it after the entries
      // have been consumed:
      System.out.println("Checksum: " +
        csumi.getChecksum().getValue());
      in2.close();
      // Alternative way to open and read
      // zip files:
      ZipFile zf = new ZipFile("test.zip");
      Enumeration e = zf.entries();
      while(e.hasMoreElements()) {
        ZipEntry ze2 = (ZipEntry)e.nextElement();
        System.out.println("File: " + ze2);
        // ... and extract the data as before
      }
      zf.close();
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
} ///:~ 

For each file to add to the archive, you must call putNextEntry( ) and pass it a ZipEntry object. The ZipEntry object contains an extensive interface that allows you to get and set all the data available on that particular entry in your Zip file: name, compressed and uncompressed sizes, date, CRC checksum, extra field data, comment, compression method, and whether it’s a directory entry. However, even though the Zip format has a way to set a password, this is not supported in Java’s Zip library. And although CheckedInputStream and CheckedOutputStream support both Adler32 and CRC32 checksums, the ZipEntry class supports only an interface for CRC. This is a restriction of the underlying Zip format, but it might prevent you from using the faster Adler32.
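To make the ZipEntry interface a little more concrete, here is a sketch that writes a single uncompressed (STORED) entry, for which the size and CRC must be supplied before the data is written, and then reads the entry’s metadata back. The class StoredEntryDemo, its method names, and the entry name hello.txt are illustrative assumptions.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.*;

// Hypothetical demo of setting and getting ZipEntry metadata.
class StoredEntryDemo {
  // Writes one STORED entry; size and CRC are set up front.
  static byte[] writeStored(String name, byte[] data)
      throws IOException {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    ZipEntry entry = new ZipEntry(name);
    entry.setMethod(ZipEntry.STORED);
    entry.setSize(data.length);
    entry.setCrc(crc.getValue());
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream out = new ZipOutputStream(bytes)) {
      out.putNextEntry(entry);
      out.write(data, 0, data.length);
      out.closeEntry();
    }
    return bytes.toByteArray();
  }

  // Reads back the first entry's metadata from the archive bytes.
  static ZipEntry firstEntry(byte[] zip) throws IOException {
    try (ZipInputStream in =
           new ZipInputStream(new ByteArrayInputStream(zip))) {
      return in.getNextEntry();
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "hello zip".getBytes(StandardCharsets.US_ASCII);
    ZipEntry ze = firstEntry(writeStored("hello.txt", data));
    System.out.println("Name: " + ze.getName());
    System.out.println("Size: " + ze.getSize());
    System.out.println("CRC: " + Long.toHexString(ze.getCrc()));
    System.out.println(
      "Stored: " + (ze.getMethod() == ZipEntry.STORED));
  }
}
```

For the default DEFLATED method you would normally leave the size and CRC unset and let ZipOutputStream calculate them.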

To extract files, ZipInputStream has a getNextEntry( ) method that returns the next ZipEntry if there is one. As a more succinct alternative, you can read the file using a ZipFile object, which has a method entries( ) that returns an Enumeration of the ZipEntry objects.
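The ZipFile approach can also deliver the data for each entry through getInputStream( ). The following sketch builds a small archive in a temporary file and reads every entry back; the class ZipFileListing, its method names, and the entry names are assumptions made for illustration.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.*;

// Hypothetical demo of reading entries and their data with ZipFile.
class ZipFileListing {
  // Builds a small archive on disk so there is something to read.
  static File makeArchive(String... names) throws IOException {
    File f = File.createTempFile("demo", ".zip");
    f.deleteOnExit();
    try (ZipOutputStream out =
           new ZipOutputStream(new FileOutputStream(f))) {
      for (String name : names) {
        out.putNextEntry(new ZipEntry(name));
        out.write(("contents of " + name)
          .getBytes(StandardCharsets.US_ASCII));
        out.closeEntry();
      }
    }
    return f;
  }

  // Maps each entry name to its decompressed text.
  static Map<String, String> readAll(File archive)
      throws IOException {
    Map<String, String> result = new LinkedHashMap<>();
    try (ZipFile zf = new ZipFile(archive)) {
      Enumeration<? extends ZipEntry> e = zf.entries();
      while (e.hasMoreElements()) {
        ZipEntry ze = e.nextElement();
        try (InputStream in = zf.getInputStream(ze)) {
          ByteArrayOutputStream buf = new ByteArrayOutputStream();
          int b;
          while ((b = in.read()) != -1)
            buf.write(b);
          result.put(ze.getName(), buf.toString("US-ASCII"));
        }
      }
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    File archive = makeArchive("a.txt", "b.txt");
    System.out.println(readAll(archive));
  }
}
```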

In order to read the checksum you must somehow have access to the associated Checksum object. Here, a handle to the CheckedOutputStream and CheckedInputStream objects is retained, but you could also just hold onto a handle to the Checksum object.
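Holding onto the Checksum object itself might look like the following sketch, in which the stream is used and closed while the Adler32 object remains available afterward. The class SharedChecksum and its method name are hypothetical.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.*;

// Hypothetical demo: keep the Checksum, not the checked stream.
class SharedChecksum {
  // Writes data through a CheckedOutputStream and returns the
  // value accumulated in the Checksum object we kept hold of.
  static long checksumWhileWriting(byte[] data) throws IOException {
    Checksum sum = new Adler32();
    try (OutputStream out = new CheckedOutputStream(
           new ByteArrayOutputStream(), sum)) {
      out.write(data);
    }
    // The stream is closed, but the Checksum is still usable
    return sum.getValue();
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
    System.out.println(
      Long.toHexString(checksumWhileWriting(data)));
  }
}
```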

A baffling method in Zip streams is setComment( ). As shown above, you can set a comment when you’re writing a file, but there’s no way to recover the comment in the ZipInputStream. Comments appear to be supported fully on an entry-by-entry basis only via ZipEntry.

Of course, you are not limited to files when using the GZIP or Zip libraries – you can compress anything, including data to be sent through a network connection.
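As a sketch of compressing something other than a file, the following squeezes a byte array in memory, much as you might before handing it to a socket’s OutputStream. The class GZIPBytes and its method names are illustrative assumptions, and a ByteArrayOutputStream stands in for the network connection.

```java
import java.io.*;
import java.util.zip.*;

// Hypothetical demo: GZIP applied to in-memory data.
class GZIPBytes {
  // Compresses raw bytes; the sink could be any OutputStream.
  static byte[] gzip(byte[] raw) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(sink)) {
      out.write(raw);
    }
    return sink.toByteArray();
  }

  // Restores the original bytes from GZIP data.
  static byte[] gunzip(byte[] packed) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (GZIPInputStream in = new GZIPInputStream(
           new ByteArrayInputStream(packed))) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1)
        sink.write(buf, 0, n);
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] raw = new byte[10000]; // Highly repetitive: all zeros
    byte[] packed = gzip(raw);
    System.out.println(raw.length + " bytes compressed to "
      + packed.length);
    System.out.println(
      "Restored length: " + gunzip(packed).length);
  }
}
```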

The Java archive (jar) utility

The Zip format is also used in the Java 1.1 JAR (Java ARchive) file format, which is a way to collect a group of files into a single compressed file, just like Zip. However, like everything else in Java, JAR files are cross-platform so you don’t need to worry about platform issues. You can also include audio and image files as well as class files.

JAR files are particularly helpful when you deal with the Internet. Before JAR files, your Web browser would have to make repeated requests of a Web server in order to download all of the files that make up an applet. In addition, each of these files was uncompressed. By combining all of the files for a particular applet into a single JAR file, only one server request is necessary and the transfer is faster because of compression. And each entry in a JAR file can be digitally signed for security (refer to the Java documentation for details).

A JAR file consists of a single file containing a collection of zipped files along with a “manifest” that describes them. (You can create your own manifest file; otherwise the jar program will do it for you.) You can find out more about JAR manifests in the online documentation.
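To see that a JAR file really is a Zip file plus a manifest, this sketch creates a tiny JAR in memory and lists its entries with an ordinary ZipInputStream. It relies on the java.util.jar package, which arrived after Java 1.1 and is not covered in this section; the class JarIsZip, its method names, and the placeholder entry Hello.class are assumptions for illustration.

```java
import java.io.*;
import java.util.*;
import java.util.jar.*;
import java.util.zip.*;

// Hypothetical demo: a JAR read back as a plain Zip archive.
class JarIsZip {
  // Builds a JAR containing a manifest and one placeholder entry.
  static byte[] makeJar() throws IOException {
    Manifest mf = new Manifest();
    // Without a version attribute the manifest is written empty
    mf.getMainAttributes().put(
      Attributes.Name.MANIFEST_VERSION, "1.0");
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (JarOutputStream out = new JarOutputStream(bytes, mf)) {
      out.putNextEntry(new ZipEntry("Hello.class"));
      out.write(new byte[] {1, 2, 3}); // Not a real class file
      out.closeEntry();
    }
    return bytes.toByteArray();
  }

  // Lists entry names using only the Zip classes.
  static List<String> entryNames(byte[] archive)
      throws IOException {
    List<String> names = new ArrayList<>();
    try (ZipInputStream in = new ZipInputStream(
           new ByteArrayInputStream(archive))) {
      ZipEntry ze;
      while ((ze = in.getNextEntry()) != null)
        names.add(ze.getName());
    }
    return names;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(entryNames(makeJar()));
  }
}
```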

The jar utility that comes with Sun’s JDK automatically compresses the files of your choice. You invoke it on the command line:

jar [options] destination [manifest] inputfile(s)

The options are simply a collection of letters (no hyphen or any other indicator is necessary). These are:

c          Creates a new or empty archive.
t          Lists the table of contents.
x          Extracts all files.
x file     Extracts the named file.
f          Says: “I’m going to give you the name of the file.” If you don’t use this, jar assumes that its input will come from standard input, or, if it is creating a file, its output will go to standard output.
m          Says that a user-created manifest file is named on the command line. The manifest and archive names must appear in the same order as the m and f letters.
v          Generates verbose output describing what jar is doing.
0 (zero)   Only stores the files; doesn’t compress them (use to create a JAR file that you can put in your classpath).
M          Doesn’t automatically create a manifest file.

If a subdirectory is included in the files to be put into the JAR file, that subdirectory is automatically added, including all of its subdirectories, etc. Path information is also preserved.

Here are some typical ways to invoke jar:

jar cf myJarFile.jar *.class

This creates a JAR file called myJarFile.jar that contains all of the class files in the current directory, along with an automatically-generated manifest file.

jar cfm myJarFile.jar myManifestFile.mf *.class

Like the previous example, but adding a user-created manifest file called myManifestFile.mf.

jar tf myJarFile.jar

Produces a table of contents of the files in myJarFile.jar.

jar tvf myJarFile.jar

Adds the “verbose” flag to give more detailed information about the files in myJarFile.jar.

jar cvf myApp.jar audio classes image

Assuming audio, classes, and image are subdirectories, this combines all of the subdirectories into the file myApp.jar. The “verbose” flag is also included to give extra feedback while the jar program is working.

If you create a JAR file using the 0 (zero) option, that file can be placed in your CLASSPATH:

CLASSPATH="lib1.jar;lib2.jar;"

Then Java can search lib1.jar and lib2.jar for class files.

The jar tool isn’t as useful as a zip utility. For example, you can’t add files to an existing JAR file or update the files already in it; you can create JAR files only from scratch. Also, you can’t move files into a JAR file, erasing them as they are moved. However, a JAR file created on one platform will be transparently readable by the jar tool on any other platform (a problem that sometimes plagues zip utilities).

As you will see in Chapter 13, JAR files are also used to package Java Beans.
