Text processing
If
you come from a C or C++ background, you might be skeptical at first of
Java’s power when it comes to handling text. Indeed, one drawback is that
execution speed is slower, which could hinder some of your efforts. However,
the tools (in particular the String
class) are quite powerful, as the examples in this section show (and
performance improvements have been promised for Java).
As
you’ll see, these examples were created to solve problems that arose in
the creation of this book. However, they are not restricted to that and the
solutions they offer can easily be adapted to other situations. In addition,
they show the power of Java in an area that has not previously been emphasized
in this book.
Extracting code listings
You’ve
no doubt noticed that each complete code listing (not code fragment) in this
book begins and ends with special comment tag marks ‘
//:’
and ‘
///:~’.
This meta-information is included so that the code can be automatically
extracted from the book into compilable source-code files. In my previous book,
I had a system that allowed me to automatically incorporate tested code files
into the book. In this book, however, I discovered that it was often easier to
paste the code into the book once it was initially tested and, since it’s
hard to get right the first time, to perform edits to the code within the book.
But how to extract it and test the code? This program is the answer, and it
could come in handy when you set out to solve a text processing problem. It
also demonstrates many of the
String
class features.
I
first save the entire book in ASCII text format into a separate file. The
CodePackager
program has two modes (which you can see described in
usageString):
if you use the
-p
flag, it expects to see an input file containing the ASCII text from the book.
It will go through this file and use the comment tag marks to extract the code,
and it uses the file name on the first line to determine the name of the file.
In addition, it looks for the
package
statement in case it needs to put the file into a special directory (chosen via
the path indicated by the
package
statement).
But
that’s not all. It also watches for the change in chapters by keeping
track of the package names. Since the packages for each chapter begin with
c02, c03, c04, and so on to indicate the chapter where they belong (except for
those beginning with com, which are ignored for the purpose of keeping track
of chapters), the CodePackager program can tell when the chapter has changed
and put all subsequent files in the new chapter subdirectory, as long as the
first listing in each chapter contains a package statement with the chapter
number.
As
each file is extracted, it is placed into a
SourceCodeFile
object that is then placed into a collection. (This process will be more
thoroughly described later.) These
SourceCodeFile
objects could simply be stored in files, but that brings us to the second use
for this project. If you invoke
CodePackager
without
the
-p
flag it expects a “packed” file as input, which it will then
extract into separate files. So the
-p
flag means that the extracted files will be found “packed” into
this single file.
Why
bother with the packed file? Because different computer platforms have
different ways of storing text information in files. A big issue is the
end-of-line character or characters, but other issues can also exist. However,
Java has a special type of IO stream – the DataOutputStream
–
which promises that, regardless of what machine the data is coming from, the
storage of that data will be in a form that can be correctly retrieved by any
other machine by using a DataInputStream.
That is, Java handles all of the platform-specific
details, which is a large part of the promise of Java. So the
-p
flag stores everything into a single file in a universal format. You download
this file and the Java program from the Web, and when you run
CodePackager
on this file
without
the
-p
flag the files will all be extracted to appropriate places on your system. (You
can specify an alternate subdirectory; otherwise the subdirectories will just
be created in the current directory.) To ensure that no system-specific formats
remain,
File
objects are used everywhere a path or a file is described. In addition,
there’s a sanity check: an empty file is placed in each subdirectory; the
name of that file indicates how many files you should find in that subdirectory.
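The listing itself writes with writeBytes( ), but the promise is easiest to
see in isolation. Here is a minimal sketch (not part of CodePackager) that
round-trips data through a DataOutputStream and a DataInputStream using
writeUTF( ) and readUTF( ); the file name demo.dat is made up for the example:
import java.io.*;
public class PortableData {
  public static void main(String[] args) throws IOException {
    // Write data in Java's machine-independent format:
    DataOutputStream out = new DataOutputStream(
      new BufferedOutputStream(
        new FileOutputStream("demo.dat")));
    out.writeUTF("//: HelloDate.java");
    out.writeInt(42);
    out.close();
    // Read it back; the same values come out on any platform:
    DataInputStream in = new DataInputStream(
      new BufferedInputStream(
        new FileInputStream("demo.dat")));
    System.out.println(in.readUTF());
    System.out.println(in.readInt());
    in.close();
  }
}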
Here
is the code, which will be described in detail at the end of the listing:
//: CodePackager.java
// "Packs" and "unpacks" the code in "Thinking
// in Java" for cross-platform distribution.
/* Commented so CodePackager sees it and starts
a new chapter directory, but so you don't
have to worry about the directory where this
program lives:
package c17;
*/
import java.util.*;
import java.io.*;
class Pr {
static void error(String e) {
System.err.println("ERROR: " + e);
System.exit(1);
}
}
class IO {
static BufferedReader disOpen(File f) {
BufferedReader in = null;
try {
in = new BufferedReader(
new FileReader(f));
} catch(IOException e) {
Pr.error("could not open " + f);
}
return in;
}
static BufferedReader disOpen(String fname) {
return disOpen(new File(fname));
}
static DataOutputStream dosOpen(File f) {
DataOutputStream in = null;
try {
in = new DataOutputStream(
new BufferedOutputStream(
new FileOutputStream(f)));
} catch(IOException e) {
Pr.error("could not open " + f);
}
return in;
}
static DataOutputStream dosOpen(String fname) {
return dosOpen(new File(fname));
}
static PrintWriter psOpen(File f) {
PrintWriter in = null;
try {
in = new PrintWriter(
new BufferedWriter(
new FileWriter(f)));
} catch(IOException e) {
Pr.error("could not open " + f);
}
return in;
}
static PrintWriter psOpen(String fname) {
return psOpen(new File(fname));
}
static void close(Writer os) {
try {
os.close();
} catch(IOException e) {
Pr.error("closing " + os);
}
}
static void close(DataOutputStream os) {
try {
os.close();
} catch(IOException e) {
Pr.error("closing " + os);
}
}
static void close(Reader os) {
try {
os.close();
} catch(IOException e) {
Pr.error("closing " + os);
}
}
}
class SourceCodeFile {
public static final String
startMarker = "//:", // Start of source file
endMarker = "} ///:~", // End of source
endMarker2 = "}; ///:~", // C++ file end
beginContinue = "} ///:Continued",
endContinue = "///:Continuing",
packMarker = "###", // Packed file header tag
eol = // Line separator on current system
System.getProperty("line.separator"),
filesep = // System's file path separator
System.getProperty("file.separator");
public static String copyright = "";
static {
try {
BufferedReader cr =
new BufferedReader(
new FileReader("Copyright.txt"));
String crin;
while((crin = cr.readLine()) != null)
copyright += crin + "\n";
cr.close();
} catch(Exception e) {
copyright = "";
}
}
private String filename, dirname,
contents = new String();
private static String chapter = "c02";
// The file name separator from the old system:
public static String oldsep;
public String toString() {
return dirname + filesep + filename;
}
// Constructor for parsing from document file:
public SourceCodeFile(String firstLine,
BufferedReader in) {
dirname = chapter;
// Skip past marker:
filename = firstLine.substring(
startMarker.length()).trim();
// Find space that terminates file name:
if(filename.indexOf(' ') != -1)
filename = filename.substring(
0, filename.indexOf(' '));
System.out.println("found: " + filename);
contents = firstLine + eol;
if(copyright.length() != 0)
contents += copyright + eol;
String s;
boolean foundEndMarker = false;
try {
while((s = in.readLine()) != null) {
if(s.startsWith(startMarker))
Pr.error("No end of file marker for " +
filename);
// For this program, no spaces before
// the "package" keyword are allowed
// in the input source code:
else if(s.startsWith("package")) {
// Extract package name:
String pdir = s.substring(
s.indexOf(' ')).trim();
pdir = pdir.substring(
0, pdir.indexOf(';')).trim();
// Capture the chapter from the package
// ignoring the 'com' subdirectories:
if(!pdir.startsWith("com")) {
int firstDot = pdir.indexOf('.');
if(firstDot != -1)
chapter =
pdir.substring(0,firstDot);
else
chapter = pdir;
}
// Convert package name to path name:
pdir = pdir.replace(
'.', filesep.charAt(0));
System.out.println("package " + pdir);
dirname = pdir;
}
contents += s + eol;
// Move past continuations:
if(s.startsWith(beginContinue))
while((s = in.readLine()) != null)
if(s.startsWith(endContinue)) {
contents += s + eol;
break;
}
// Watch for end of code listing:
if(s.startsWith(endMarker) ||
s.startsWith(endMarker2)) {
foundEndMarker = true;
break;
}
}
if(!foundEndMarker)
Pr.error(
"End marker not found before EOF");
System.out.println("Chapter: " + chapter);
} catch(IOException e) {
Pr.error("Error reading line");
}
}
// For recovering from a packed file:
public SourceCodeFile(BufferedReader pFile) {
try {
String s = pFile.readLine();
if(s == null) return;
if(!s.startsWith(packMarker))
Pr.error("Can't find " + packMarker
+ " in " + s);
s = s.substring(
packMarker.length()).trim();
dirname = s.substring(0, s.indexOf('#'));
filename = s.substring(s.indexOf('#') + 1);
dirname = dirname.replace(
oldsep.charAt(0), filesep.charAt(0));
filename = filename.replace(
oldsep.charAt(0), filesep.charAt(0));
System.out.println("listing: " + dirname
+ filesep + filename);
while((s = pFile.readLine()) != null) {
// Watch for end of code listing:
if(s.startsWith(endMarker) ||
s.startsWith(endMarker2)) {
contents += s;
break;
}
contents += s + eol;
}
} catch(IOException e) {
System.err.println("Error reading line");
}
}
public boolean hasFile() {
return filename != null;
}
public String directory() { return dirname; }
public String filename() { return filename; }
public String contents() { return contents; }
// To write to a packed file:
public void writePacked(DataOutputStream out) {
try {
out.writeBytes(
packMarker + dirname + "#"
+ filename + eol);
out.writeBytes(contents);
} catch(IOException e) {
Pr.error("writing " + dirname +
filesep + filename);
}
}
// To generate the actual file:
public void writeFile(String rootpath) {
File path = new File(rootpath, dirname);
path.mkdirs();
PrintWriter p =
IO.psOpen(new File(path, filename));
p.print(contents);
IO.close(p);
}
}
class DirMap {
private Hashtable t = new Hashtable();
private String rootpath;
DirMap() {
rootpath = System.getProperty("user.dir");
}
DirMap(String alternateDir) {
rootpath = alternateDir;
}
public void add(SourceCodeFile f){
String path = f.directory();
if(!t.containsKey(path))
t.put(path, new Vector());
((Vector)t.get(path)).addElement(f);
}
public void writePackedFile(String fname) {
DataOutputStream packed = IO.dosOpen(fname);
try {
packed.writeBytes("###Old Separator:" +
SourceCodeFile.filesep + "###\n");
} catch(IOException e) {
Pr.error("Writing separator to " + fname);
}
Enumeration e = t.keys();
while(e.hasMoreElements()) {
String dir = (String)e.nextElement();
System.out.println(
"Writing directory " + dir);
Vector v = (Vector)t.get(dir);
for(int i = 0; i < v.size(); i++) {
SourceCodeFile f =
(SourceCodeFile)v.elementAt(i);
f.writePacked(packed);
}
}
IO.close(packed);
}
// Write all the files in their directories:
public void write() {
Enumeration e = t.keys();
while(e.hasMoreElements()) {
String dir = (String)e.nextElement();
Vector v = (Vector)t.get(dir);
for(int i = 0; i < v.size(); i++) {
SourceCodeFile f =
(SourceCodeFile)v.elementAt(i);
f.writeFile(rootpath);
}
// Add file indicating file quantity
// written to this directory as a check:
IO.close(IO.dosOpen(
new File(new File(rootpath, dir),
Integer.toString(v.size())+".files")));
}
}
}
public class CodePackager {
private static final String usageString =
"usage: java CodePackager packedFileName" +
"\nExtracts source code files from packed \n" +
"version of Tjava.doc sources into " +
"directories off current directory\n" +
"java CodePackager packedFileName newDir\n" +
"Extracts into directories off newDir\n" +
"java CodePackager -p source.txt packedFile" +
"\nCreates packed version of source files" +
"\nfrom text version of Tjava.doc";
private static void usage() {
System.err.println(usageString);
System.exit(1);
}
public static void main(String[] args) {
if(args.length == 0) usage();
if(args[0].equals("-p")) {
if(args.length != 3)
usage();
createPackedFile(args);
}
else {
if(args.length > 2)
usage();
extractPackedFile(args);
}
}
private static String currentLine;
private static BufferedReader in;
private static DirMap dm;
private static void
createPackedFile(String[] args) {
dm = new DirMap();
in = IO.disOpen(args[1]);
try {
while((currentLine = in.readLine())
!= null) {
if(currentLine.startsWith(
SourceCodeFile.startMarker)) {
dm.add(new SourceCodeFile(
currentLine, in));
}
else if(currentLine.startsWith(
SourceCodeFile.endMarker))
Pr.error("file has no start marker");
// Else ignore the input line
}
} catch(IOException e) {
Pr.error("Error reading " + args[1]);
}
IO.close(in);
dm.writePackedFile(args[2]);
}
private static void
extractPackedFile(String[] args) {
if(args.length == 2) // Alternate directory
dm = new DirMap(args[1]);
else // Current directory
dm = new DirMap();
in = IO.disOpen(args[0]);
String s = null;
try {
s = in.readLine();
} catch(IOException e) {
Pr.error("Cannot read from " + in);
}
// Capture the separator used in the system
// that packed the file:
if(s.indexOf("###Old Separator:") != -1 ) {
String oldsep = s.substring(
"###Old Separator:".length());
oldsep = oldsep.substring(
0, oldsep.indexOf('#'));
SourceCodeFile.oldsep = oldsep;
}
SourceCodeFile sf = new SourceCodeFile(in);
while(sf.hasFile()) {
dm.add(sf);
sf = new SourceCodeFile(in);
}
dm.write();
}
} ///:~
You’ll
first notice the
package
statement that is commented out. Since this is the first program in the
chapter, the
package
statement
is necessary to tell
CodePackager
that
the chapter has changed, but putting it in a package would be a problem. When
you create a
package,
you tie the resulting program to a particular directory structure, which is
fine for most of the examples in this book. Here, however, the
CodePackager
program must be compiled and run from an arbitrary directory, so the
package
statement is commented out. It will still
look
like an ordinary
package
statement to
CodePackager,
though, since the program isn’t sophisticated enough to detect multi-line
comments. (It has no need for such sophistication, a fact that comes in handy
here.)
The
first two classes are support/utility classes designed to make the rest of the
program more consistent to write and easier to read. The first,
Pr,
is similar to the ANSI C library
perror,
since it prints an error message (but also exits the program). The second class
encapsulates the creation of files, a process that was shown in Chapter 10 as
one that rapidly becomes verbose and annoying. In Chapter 10, the proposed
solution created new classes, but here
static
method
calls are used. Within those methods the appropriate exceptions are caught and
dealt with. These methods make the rest of the code much cleaner to read.
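For instance, a minimal sketch of how the helpers read at the point of use
(the file name here is hypothetical, and the error call is shown only as a
comment so the fragment doesn’t exit):
PrintWriter p = IO.psOpen("example.txt"); // Any IOException already handled
p.println("some output");
IO.close(p);
// Pr.error("could not continue"); // Would print the message and exit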
The
first class that helps solve the problem is
SourceCodeFile,
which represents all the information (including the contents, file name, and
directory) for one source code file in the book. It also contains a set of
String
constants representing the markers that start and end a file, a marker used
inside the packed file, the current system’s end-of-line separator and
file path separator (notice the use of
System.getProperty( )
to get the local version), and a copyright notice, which is extracted from the
following file
Copyright.txt.
//////////////////////////////////////////////////
// Copyright (c) Bruce Eckel, 1998
// Source code file from the book "Thinking in Java"
// All rights reserved EXCEPT as allowed by the
// following statements: You may freely use this file
// for your own work (personal or commercial),
// including modifications and distribution in
// executable form only. Permission is granted to use
// this file in classroom situations, including its
// use in presentation materials, as long as the book
// "Thinking in Java" is cited as the source.
// Except in classroom situations, you may not copy
// and distribute this code; instead, the sole
// distribution point is http://www.BruceEckel.com
// (and official mirror sites) where it is
// freely available. You may not remove this
// copyright and notice. You may not distribute
// modified versions of the source code in this
// package. You may not use this file in printed
// media without the express permission of the
// author. Bruce Eckel makes no representation about
// the suitability of this software for any purpose.
// It is provided "as is" without express or implied
// warranty of any kind, including any implied
// warranty of merchantability, fitness for a
// particular purpose or non-infringement. The entire
// risk as to the quality and performance of the
// software is with you. Bruce Eckel and the
// publisher shall not be liable for any damages
// suffered by you or any third party as a result of
// using or distributing software. In no event will
// Bruce Eckel or the publisher be liable for any
// lost revenue, profit, or data, or for direct,
// indirect, special, consequential, incidental, or
// punitive damages, however caused and regardless of
// the theory of liability, arising out of the use of
// or inability to use software, even if Bruce Eckel
// and the publisher have been advised of the
// possibility of such damages. Should the software
// prove defective, you assume the cost of all
// necessary servicing, repair, or correction. If you
// think you've found an error, please email all
// modified files with clearly commented changes to:
// Bruce@EckelObjects.com. (please use the same
// address for non-code errors found in the book).
//////////////////////////////////////////////////
When
extracting files from a packed file, the file separator of the system that
packed the file is also noted, so it can be replaced with the correct one for
the local system.
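As a quick illustration of the System.getProperty( ) calls used for those
constants (the values shown in the comments are only typical examples):
String eol = System.getProperty("line.separator");     // "\r\n" on Win32, "\n" on Unix
String filesep = System.getProperty("file.separator"); // "\" on Win32, "/" on Unix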
The
subdirectory name for the current chapter is kept in the field
chapter,
which is initialized to
c02.
(You’ll notice that the listing in Chapter 2 doesn’t contain a
package statement.) The only time that the
chapter
field changes is when a
package
statement is discovered in the current file.
Building a packed file
The
first constructor is used to extract a file from the ASCII text version of this
book. The calling code (which appears further down in the listing) reads each
line in until it finds one that matches the beginning of a listing. At that
point, it creates a new
SourceCodeFile
object, passing it the first line (which has already been read by the calling
code) and the BufferedReader
object from which to extract the rest of the source code listing.
At
this point, you begin to see heavy use of the
String
methods. To extract the file name, the overloaded version of substring( )
is called that takes the starting offset and goes to the end of the
String.
This starting index is produced by finding the length( )
of the
startMarker.
trim( )
removes white space from both ends of the
String.
The first line can also have words after the name of the file; these are
detected using indexOf( ),
which returns -1 if it cannot find the character you’re looking for and
the value where the first instance of that character is found if it does.
Notice there is also an overloaded version of
indexOf( )
that takes a
String
instead of a character.
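To see those calls in isolation, here is a small fragment (the first line is
made up for the example) that pulls the file name out the same way the
constructor does:
String startMarker = "//:";
String firstLine = "//: MyProgram.java // A hypothetical first line";
// Skip past the marker, then trim white space from both ends:
String filename = firstLine.substring(startMarker.length()).trim();
// Keep only the text up to the first space, if there is one:
if(filename.indexOf(' ') != -1)
  filename = filename.substring(0, filename.indexOf(' '));
System.out.println(filename); // Prints: MyProgram.java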
Once
the file name is parsed and stored, the first line is placed into the
contents
String
(which is used to hold the entire text of the source code listing). At this
point, the rest of the lines are read and concatenated into the
contents
String.
It’s not quite that simple, since certain situations require special
handling. One case is error checking: if you run into a
startMarker,
it means that no end marker was placed at the end of the listing that’s
currently being collected. This is an error condition that aborts the program.
The
second special case is the
package
keyword. Although Java is a free-form language, this program requires that the
package
keyword be at the beginning of the line. When the
package
keyword is seen, the package name is extracted by looking for the space at the
beginning and the semicolon at the end. (Note that this could also have been
performed in a single operation by using the overloaded
substring( )
that takes both the starting and ending indexes.) Then the dots in the package
name are replaced by the file separator, although an assumption is made here
that the file separator is only one character long. This is probably true on
all systems, but it’s a place to look if there are problems.
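Here is the same extraction as a stand-alone fragment, this time using the
single-operation substring( ) mentioned above (the package name is made up):
String s = "package c05.frogbean;";
// Grab everything between the space and the semicolon, then trim:
String pdir = s.substring(s.indexOf(' '), s.indexOf(';')).trim();
// Convert the package name to a path, assuming a one-character separator:
String filesep = System.getProperty("file.separator");
pdir = pdir.replace('.', filesep.charAt(0));
System.out.println(pdir); // c05/frogbean on Unix, c05\frogbean on Win32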
The
default behavior is to concatenate each line to
contents,
along with the end-of-line string, until the
endMarker
is discovered, which indicates that the constructor should terminate. If the
end of the file is encountered before the
endMarker
is seen, that’s an error.
Extracting from a packed file
The
second constructor is used to recover the source code files from a packed file.
Here, the calling method doesn’t have to worry about skipping over the
intermediate text. The file contains all the source-code files, placed
end-to-end. All you need to hand to this constructor is the
BufferedReader
where the information is coming from, and the constructor takes it from there.
There is some meta-information, however, at the beginning of each listing, and
this is denoted by the
packMarker.
If the
packMarker
isn’t there, it means the caller is mistakenly trying to use this
constructor where it isn’t appropriate.
Once
the
packMarker
is found, it is stripped off and the directory name (terminated by a ‘
#’)
and the file name (which goes to the end of the line) are extracted. In both
cases, the old separator character is replaced by the one that is current to
this machine using the
String
replace( )
method. The old separator is placed at the beginning of the packed file, and
you’ll see how that is extracted later in the listing.
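A stand-alone fragment showing the same header parsing (the header line and
the old separator are made up; on a real run the old separator comes from the
first line of the packed file):
String packMarker = "###";
String s = "###c05/frogbean#Example.java"; // Hypothetical packed header
s = s.substring(packMarker.length()).trim();
String dirname = s.substring(0, s.indexOf('#'));
String filename = s.substring(s.indexOf('#') + 1);
// Swap the packing system's separator for the local one:
String oldsep = "/";
String filesep = System.getProperty("file.separator");
dirname = dirname.replace(oldsep.charAt(0), filesep.charAt(0));
System.out.println(dirname + filesep + filename);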
The
rest of the constructor is quite simple. It reads and concatenates each line to
the
contents
until the
endMarker
is found.
Accessing and writing the listings
The
next set of methods are simple accessors:
directory( ),
filename( )
(notice the method can have the same spelling and capitalization as the field)
and
contents( ),
and
hasFile( )
to indicate whether this object contains a file or not. (The need for this will
be seen later.)
The
final three methods are concerned with writing this code listing into a file,
either a packed file via
writePacked( )
or a Java source file via
writeFile( ).
All
writePacked( )
needs is the
DataOutputStream,
which was opened elsewhere, and represents the file that’s being written.
It puts the header information on the first line and then calls
writeBytes( )
to write
contents
in a “universal” format.
When
writing the Java source file, the file must be created. This is done via
IO.psOpen( ),
handing it a File
object that contains not only the file name but also the path. But the question
now is: does this path exist? The user has the option of placing all the source
code directories into a completely different subdirectory, which might not even
exist. So before each file is written, File.mkdirs( )
is called with the path that you want to write the file into. This will make
the entire path all at once.
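A minimal sketch of that sequence, using made-up directory and file names and
the IO helper from the listing:
// mkdirs( ) builds every missing directory in the path in one call:
File path = new File("someRoot", "c05");
path.mkdirs();
PrintWriter p = IO.psOpen(new File(path, "Example.java"));
p.print("// file contents would go here");
IO.close(p);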
Containing the entire collection of listings
It’s
convenient to organize the listings as subdirectories while the whole
collection is being built in memory. One reason is another sanity check: as
each subdirectory of listings is created, an additional file is added whose
name contains the number of files in that directory.
The
DirMap
class produces this effect and demonstrates the concept of a
“multimap.” This is implemented using a Hashtable
whose keys are the subdirectories being created and whose values are Vector
objects containing the
SourceCodeFile
objects in that particular directory. Thus, instead of mapping a key to a
single value, the “multimap” maps a key to a set of values via the
associated
Vector.
Although this sounds complex, it’s remarkably straightforward to
implement. You’ll see that most of the size of the
DirMap
class is due to the portions that write to files, not to the
“multimap” implementation.
There
are two ways you can make a
DirMap:
the default constructor assumes that you want the directories to branch off of
the current one, and the second constructor lets you specify an alternate
absolute path for the starting directory.
The
add( )
method is where quite a bit of dense action occurs. First, the
directory( )
is extracted from the
SourceCodeFile
you want to add, and then the
Hashtable
is examined to see if it contains that key already. If not, a new
Vector
is added to the
Hashtable
and associated with that key. At this point, the
Vector
is there, one way or another, and it is extracted so the
SourceCodeFile
can be added. Because Vectors
can be easily combined with
Hashtables
like this, the power of both is amplified.
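Stripped of everything else, the “multimap” idiom looks like this (the key
and values are made up):
Hashtable t = new Hashtable();
String key = "c05"; // A directory name
if(!t.containsKey(key))     // First time this key is seen?
  t.put(key, new Vector()); // Then give it an empty Vector
((Vector)t.get(key)).addElement("first value");
((Vector)t.get(key)).addElement("second value");
// Walking every key and its associated Vector of values:
Enumeration e = t.keys();
while(e.hasMoreElements()) {
  String k = (String)e.nextElement();
  Vector v = (Vector)t.get(k);
  for(int i = 0; i < v.size(); i++)
    System.out.println(k + ": " + v.elementAt(i));
}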
Writing
a packed file involves opening the file to write (as a DataOutputStream
so the data is universally recoverable) and writing the header information
about the old separator on the first line. Next, an
Enumeration
of the
Hashtable
keys is produced and stepped through to select each directory and to fetch the
Vector
associated with that directory so each
SourceCodeFile
in that
Vector
can be written to the packed file.
Writing
the Java source files to their directories in
write( )
is
almost identical to
writePackedFile( )
since both methods simply call the appropriate method in
SourceCodeFile.
Here, however, the root path is passed into
SourceCodeFile.writeFile( )
and when all the files have been written the additional file with the name
containing the number of files is also written.
The main program
The
previously described classes are used within
CodePackager.
First you see the usage string that gets printed whenever the end user invokes
the program incorrectly, along with the
usage( )
method that calls it and exits the program. All
main( )
does is determine whether you want to create a packed file or extract from one,
then it ensures the arguments are correct and calls the appropriate method.
When
a packed file is created, it’s assumed to be made in the current
directory, so the
DirMap
is created using the default constructor. After the file is opened each line is
read and examined for particular conditions:
- If
the line starts with the starting marker for a source code listing, a new
SourceCodeFile
object is created. The constructor reads in the rest of the source listing. The
handle that results is directly added to the
DirMap.
- If
the line starts with the end marker for a source code listing, something has
gone wrong, since end markers should be found only by the
SourceCodeFile
constructor.
When
extracting a packed file, the extraction can be into the current directory or
into an alternate directory, so the
DirMap
object is created accordingly. The file is opened and the first line is read.
The old file path separator information is extracted from this line. Then the
input is used to create the first
SourceCodeFile
object, which is added to the
DirMap.
New
SourceCodeFile
objects are created and added as long as they contain a file. (The last one
created will simply return when it runs out of input and then
hasFile( )
will return false.)
Checking capitalization style
Although
the previous example can come in handy as a guide for some project of your own
that involves text processing, this project will be directly useful because it
performs a style check to make sure that your capitalization conforms to the
de facto Java style. It opens each
.java
file in the current directory and extracts all the class names and identifiers,
then shows you if any of them don’t meet the Java style.
For
the program to operate correctly, you must first build a class name repository
to hold all the class names in the standard Java library. You do this by moving
into all the source code subdirectories for the standard Java library and
running
ClassScanner
in each subdirectory. Provide as arguments the name of the repository file
(using the same path and name each time) and the
-a
command-line option to indicate that the class names should be added to the
repository.
To
use the program to check your code, run it and hand it the path and name of the
repository to use. It will check all the classes and identifiers in the current
directory and tell you which ones don’t follow the typical Java
capitalization style.
You
should be aware that the program isn’t perfect; there are a few times when it
will point out what it thinks is a problem but on looking at the code
you’ll see that nothing needs to be changed. This is a little annoying,
but it’s still much easier than trying to find all these cases by staring
at your code.
The
explanation immediately follows the listing:
//: ClassScanner.java
// Scans all files in directory for classes
// and identifiers, to check capitalization.
// Assumes properly compiling code listings.
// Doesn't do everything right, but is a very
// useful aid.
import java.io.*;
import java.util.*;
class MultiStringMap extends Hashtable {
public void add(String key, String value) {
if(!containsKey(key))
put(key, new Vector());
((Vector)get(key)).addElement(value);
}
public Vector getVector(String key) {
if(!containsKey(key)) {
System.err.println(
"ERROR: can't find key: " + key);
System.exit(1);
}
return (Vector)get(key);
}
public void printValues(PrintStream p) {
Enumeration k = keys();
while(k.hasMoreElements()) {
String oneKey = (String)k.nextElement();
Vector val = getVector(oneKey);
for(int i = 0; i < val.size(); i++)
p.println((String)val.elementAt(i));
}
}
}
public class ClassScanner {
private File path;
private String[] fileList;
private Properties classes = new Properties();
private MultiStringMap
classMap = new MultiStringMap(),
identMap = new MultiStringMap();
private StreamTokenizer in;
public ClassScanner() {
path = new File(".");
fileList = path.list(new JavaFilter());
for(int i = 0; i < fileList.length; i++) {
System.out.println(fileList[i]);
scanListing(fileList[i]);
}
}
void scanListing(String fname) {
try {
in = new StreamTokenizer(
new BufferedReader(
new FileReader(fname)));
// Doesn't seem to work:
// in.slashStarComments(true);
// in.slashSlashComments(true);
in.ordinaryChar('/');
in.ordinaryChar('.');
in.wordChars('_', '_');
in.eolIsSignificant(true);
while(in.nextToken() !=
StreamTokenizer.TT_EOF) {
if(in.ttype == '/')
eatComments();
else if(in.ttype ==
StreamTokenizer.TT_WORD) {
if(in.sval.equals("class") ||
in.sval.equals("interface")) {
// Get class name:
while(in.nextToken() !=
StreamTokenizer.TT_EOF
&& in.ttype !=
StreamTokenizer.TT_WORD)
;
classes.put(in.sval, in.sval);
classMap.add(fname, in.sval);
}
if(in.sval.equals("import") ||
in.sval.equals("package"))
discardLine();
else // It's an identifier or keyword
identMap.add(fname, in.sval);
}
}
} catch(IOException e) {
e.printStackTrace();
}
}
void discardLine() {
try {
while(in.nextToken() !=
StreamTokenizer.TT_EOF
&& in.ttype !=
StreamTokenizer.TT_EOL)
; // Throw away tokens to end of line
} catch(IOException e) {
e.printStackTrace();
}
}
// StreamTokenizer's comment removal seemed
// to be broken. This extracts them:
void eatComments() {
try {
if(in.nextToken() !=
StreamTokenizer.TT_EOF) {
if(in.ttype == '/')
discardLine();
else if(in.ttype != '*')
in.pushBack();
else
while(true) {
if(in.nextToken() ==
StreamTokenizer.TT_EOF)
break;
if(in.ttype == '*')
if(in.nextToken() !=
StreamTokenizer.TT_EOF
&& in.ttype == '/')
break;
}
}
} catch(IOException e) {
e.printStackTrace();
}
}
public String[] classNames() {
String[] result = new String[classes.size()];
Enumeration e = classes.keys();
int i = 0;
while(e.hasMoreElements())
result[i++] = (String)e.nextElement();
return result;
}
public void checkClassNames() {
Enumeration files = classMap.keys();
while(files.hasMoreElements()) {
String file = (String)files.nextElement();
Vector cls = classMap.getVector(file);
for(int i = 0; i < cls.size(); i++) {
String className =
(String)cls.elementAt(i);
if(Character.isLowerCase(
className.charAt(0)))
System.out.println(
"class capitalization error, file: "
+ file + ", class: "
+ className);
}
}
}
public void checkIdentNames() {
Enumeration files = identMap.keys();
Vector reportSet = new Vector();
while(files.hasMoreElements()) {
String file = (String)files.nextElement();
Vector ids = identMap.getVector(file);
for(int i = 0; i < ids.size(); i++) {
String id =
(String)ids.elementAt(i);
if(!classes.contains(id)) {
// Ignore identifiers of length 3 or
// longer that are all uppercase
// (probably static final values):
if(id.length() >= 3 &&
id.equals(
id.toUpperCase()))
continue;
// Check to see if first char is upper:
if(Character.isUpperCase(id.charAt(0))){
if(reportSet.indexOf(file + id)
== -1){ // Not reported yet
reportSet.addElement(file + id);
System.out.println(
"Ident capitalization error in:"
+ file + ", ident: " + id);
}
}
}
}
}
}
static final String usage =
"Usage: \n" +
"ClassScanner classnames -a\n" +
"\tAdds all the class names in this \n" +
"\tdirectory to the repository file \n" +
"\tcalled 'classnames'\n" +
"ClassScanner classnames\n" +
"\tChecks all the java files in this \n" +
"\tdirectory for capitalization errors, \n" +
"\tusing the repository file 'classnames'";
private static void usage() {
System.err.println(usage);
System.exit(1);
}
public static void main(String[] args) {
if(args.length < 1 || args.length > 2)
usage();
ClassScanner c = new ClassScanner();
File old = new File(args[0]);
if(old.exists()) {
try {
// Try to open an existing
// properties file:
InputStream oldlist =
new BufferedInputStream(
new FileInputStream(old));
c.classes.load(oldlist);
oldlist.close();
} catch(IOException e) {
System.err.println("Could not open "
+ old + " for reading");
System.exit(1);
}
}
if(args.length == 1) {
c.checkClassNames();
c.checkIdentNames();
}
// Write the class names to a repository:
if(args.length == 2) {
if(!args[1].equals("-a"))
usage();
try {
BufferedOutputStream out =
new BufferedOutputStream(
new FileOutputStream(args[0]));
c.classes.save(out,
"Classes found by ClassScanner.java");
out.close();
} catch(IOException e) {
System.err.println(
"Could not write " + args[0]);
System.exit(1);
}
}
}
}
class JavaFilter implements FilenameFilter {
public boolean accept(File dir, String name) {
// Strip path information:
String f = new File(name).getName();
return f.trim().endsWith(".java");
}
} ///:~
The
class
MultiStringMap
is a tool that allows you to map a group of strings onto each key entry. As in
the previous example, it uses a Hashtable
(this time with inheritance) with the key as the single string that’s
mapped onto the
Vector
value. The
add( )
method simply checks to see if there’s a key already in the
Hashtable,
and if not it puts one there. The
getVector( )
method produces a
Vector
for a particular key, and
printValues( ),
which is primarily useful for debugging, prints out all the values,
Vector by Vector.
To
keep life simple, the class names from the standard Java libraries are all put
into a Properties
object (from the standard Java library). Remember that a
Properties
object is a
Hashtable
that holds only
String
objects for both the key and value entries. However, it can be saved to disk
and restored from disk in one method call, so it’s ideal for the
repository of names. Actually, we need only a list of names, and a
Hashtable
can’t accept
null
for either its key or its value entry. So the same object will be used for both
the key and the value.
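The repository idiom in isolation (the file name is made up and exception
handling is omitted; later versions of Java call save( ) store( )):
Properties classes = new Properties();
// The same String is both key and value, since null isn't allowed:
classes.put("Vector", "Vector");
classes.put("Hashtable", "Hashtable");
OutputStream os = new FileOutputStream("classnames.txt");
classes.save(os, "Class name repository");
os.close();
// Later, the whole list comes back in one call:
Properties restored = new Properties();
InputStream is = new FileInputStream("classnames.txt");
restored.load(is);
is.close();
System.out.println(restored.containsKey("Vector")); // true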
For
the classes and identifiers that are discovered for the files in a particular
directory, two
MultiStringMaps
are used:
classMap
and
identMap.
Also, when the program starts up it loads the standard class name repository
into the
Properties
object
called
classes,
and when a new class name is found in the local directory that is also added to
classes
as
well as to
classMap.
This way,
classMap
can be used to step through all the classes in the local directory, and
classes
can be used to see if the current token is a class name (which indicates a
definition of an object or method is beginning, so grab the next tokens –
until a semicolon – and put them into
identMap).
The default constructor for
ClassScanner
creates a list of file names (using the
JavaFilter
implementation of FilenameFilter,
as described in Chapter 10). Then it calls
scanListing( )
for each file name.
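The file-name list itself comes from one call, which you could try by itself
(using the JavaFilter class at the end of the listing):
String[] list = new File(".").list(new JavaFilter());
for(int i = 0; i < list.length; i++)
  System.out.println(list[i]); // The .java files in the current directory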
Inside
scanListing( )
the source code file is opened and turned into a StreamTokenizer.
In the documentation, passing
true
to
slashStarComments( )
and
slashSlashComments( )
is supposed to strip those comments out, but this seems to be a bit flawed (it
doesn’t quite work in Java 1.0).
Instead, those lines are commented out and the comments are extracted by
another method. To do this, the ‘
/’
must be captured as an ordinary character rather than letting the
StreamTokenizer
absorb it as part of a comment, and the
ordinaryChar( )
method tells the
StreamTokenizer
to
do
this. This is also true for dots (‘
.’),
since we want to have the method calls pulled apart into individual
identifiers. However, the underscore, which is ordinarily treated by
StreamTokenizer
as an individual character, should be left as part of identifiers since it
appears in such
static
final
values as
TT_EOF
etc., used in this very program. The
wordChars( )
method
takes a range of characters you want to add to those that are left inside a
token that is being parsed as a word. Finally, when parsing for one-line
comments or discarding a line we need to know when an end-of-line occurs, so by
calling
eolIsSignificant(true)
the eol will show up rather than being absorbed by the
StreamTokenizer.
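Here is the same StreamTokenizer setup in a self-contained fragment, reading
from a made-up String instead of a file so you can see what the settings do:
StreamTokenizer st = new StreamTokenizer(
  new StringReader("class Foo { int MY_VALUE; } // end"));
st.ordinaryChar('/');      // Stop '/' from starting a comment
st.ordinaryChar('.');      // Break qualified names apart at the dots
st.wordChars('_', '_');    // Keep underscores inside identifiers
st.eolIsSignificant(true); // Report end-of-line instead of absorbing it
try {
  while(st.nextToken() != StreamTokenizer.TT_EOF)
    if(st.ttype == StreamTokenizer.TT_WORD)
      System.out.println(st.sval);
} catch(IOException e) {
  e.printStackTrace();
}
// Prints: class, Foo, int, MY_VALUE, end (one per line)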
The rest of
scanListing( )
reads and reacts to tokens until the end of the file, signified when
nextToken( )
returns the
final
static
value
StreamTokenizer.TT_EOF.
If
the token is a
‘/’
it is potentially a comment, so
eatComments( )
is called to deal with it. The only other situation we’re interested in
here is if it’s a word, of which there are some special cases.
If
the word is
class
or
interface
then the next token represents a class or interface name, and it is put into
classes
and
classMap.
If the word is
import
or
package,
then we don’t want the rest of the line. Anything else must be an
identifier (which we’re interested in) or a keyword (which we’re
not, but they’re all lowercase anyway so it won’t spoil things to
put those in). These are added to
identMap.
The
discardLine( )
method is a simple tool that looks for the end of a line. Note that any time
you get a new token, you must check for the end of the file.
The
eatComments( )
method is called whenever a forward slash is encountered in the main parsing
loop. However, that doesn’t necessarily mean a comment has been found, so
the next token must be extracted to see if it’s another forward slash (in
which case the line is discarded) or an asterisk. But if it’s neither of
those, it means the token you’ve just pulled out is needed back in the
main parsing loop! Fortunately, the pushBack( )
method allows you to “push back” the current token onto the input
stream so that when the main parsing loop calls nextToken( )
it will get the one you just pushed back.
For
convenience, the
classNames( )
method produces an array of all the names in the
classes
collection. This method is not used in the program but is helpful for debugging.
The
next two methods are the ones in which the actual checking takes place. In
checkClassNames( ),
the class names are extracted from the
classMap
(which, remember, contains only the names in this directory, organized by file
name so the file name can be printed along with the errant class name). This is
accomplished by pulling each associated
Vector
and stepping through that, looking to see if the first character is lower case.
If so, the appropriate error message is printed.
In
checkIdentNames( ),
a similar approach is taken: each identifier name is extracted from
identMap.
If the name is not in the
classes
list, it’s assumed to be an identifier or keyword. A special case is
checked: if the identifier length is 3 or more
and
all the characters are uppercase, this identifier is ignored because it’s
probably a
static
final
value such as
TT_EOF.
Of course, this is not a perfect algorithm, but it assumes that you’ll
eventually notice any all-uppercase identifiers that are out of place.
Instead
of reporting every identifier that starts with an uppercase character, this
method keeps track of which ones have already been reported in a
Vector
called
reportSet.
This treats the
Vector
as a “set” that tells you whether an item is already in the set.
The item is produced by concatenating the file name and identifier. If the
element isn’t in the set, it’s added and then the report is made.
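The set idiom by itself (the item string is made up):
Vector reportSet = new Vector();
String item = "SomeFile.java" + "BadIdent"; // File name + identifier
if(reportSet.indexOf(item) == -1) { // Not in the set yet
  reportSet.addElement(item);
  System.out.println("report it, once");
}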
The
rest of the listing is comprised of
main( ),
which busies itself by handling the command line arguments and figuring out
whether you’re building a repository of class names from the standard
Java library or checking the validity of code you’ve written. In both
cases it makes a
ClassScanner
object.
Whether
you’re building a repository or using one, you must try to open the
existing repository. By making a File
object and testing for existence, you can decide whether to open the file and
load( )
the
Properties
list
classes
inside
ClassScanner.
(The classes from the repository add to, rather than overwrite, the classes
found by the
ClassScanner
constructor.) If you provide only one command-line argument it means that you
want to perform a check of the class names and identifier names, but if you
provide two arguments (the second being “
-a”)
you’re
building a class name repository. In this case, an output file is opened and
the method
Properties.save( )
is used to write the list into a file, along with a string that provides
header information for the file.