Deleting all that Subversion metadata

Someone just sent me a Java project, which I need to import into Eclipse and take a look at.  (The changes aren’t ready to commit to Subversion yet.)

The project came to me with a target directory, which is easy to delete:

    rm -r target

but it also came with the Subversion metadata — an .svn folder under each regular folder.  The line I used to get rid of all these:

    find . -type d -name .svn -prune -print0 | xargs -0 rm -rf

Yeah xargs!
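Here is a slightly more defensive variant, sketched against a throwaway directory rather than a real checkout. `-type d -name .svn` matches only the metadata directories; an unquoted `*.svn*` can be expanded by the shell first and would also catch any ordinary file with ".svn" in its name. `-prune` keeps find from descending into directories that are about to vanish, and `-print0`/`xargs -0` keep paths containing spaces intact. The directory names below are made up for the demo.

```shell
# Build a throwaway tree that mimics a checkout (names are made up).
work=$(mktemp -d)
mkdir -p "$work/.svn" "$work/src/main/java/.svn" "$work/My Docs/.svn"
touch "$work/src/main/java/Foo.java" "$work/My Docs/.svn/entries"

# -type d -name .svn : only the metadata directories themselves
# -prune             : don't descend into a directory we're about to delete
# -print0 / xargs -0 : NUL-separated paths, so "My Docs" stays one argument
find "$work" -type d -name .svn -prune -print0 | xargs -0 rm -rf

# Count what is left; $(( )) strips any padding wc adds.
remaining=$(( $(find "$work" -name .svn | wc -l) ))
kept=$(( $(find "$work" -name Foo.java | wc -l) ))
echo "remaining .svn dirs: $remaining, source files kept: $kept"

rm -rf "$work"
```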

Finding the magic non-beans

Someone asked me yesterday what examples we have in the framework services “layer” where we provide functionality through a class that is not exposed as a Spring bean.  I could think of just a couple of examples off the top of my head, but both were kind of odd ones.  I wanted to search the framework services codebase to see if we have other uses of non-beans.

I reasoned that a distinctive characteristic of using a non-bean X is that you tend to see … = new X(... in code using the class.  Here’s what I decided I wanted:

  1. For *.java in the framework services code base, show me the names of the classes that are public (e.g., search for “public class”)
  2. For each class C in this list, search our entire code base for “ = new C(”

1. Finding the Public Classes

1.1. Listing the .java files

From the Windows command prompt I navigated to the directory of my workspace and issued a simple

    dir /s /b *.java

This output the filenames in C:… format, which would normally be fine, except that the Cygwin utilities I use for this type of file processing don’t deal well with the backslashes.  They expect a more Unixy output.  I can provide that by using the following instead:

    find . -name '*.java'

This output the same filenames in ./…/path/to/the/ format.
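The pattern given to find needs quotes. Without them, the shell expands `*.java` against the current directory before find ever sees it. A single `.java` file at the top level therefore silently narrows the search to that one name, and with two or more, GNU find typically rejects the command line outright. The following minimal demo uses a scratch directory and hypothetical file names:

```shell
# Scratch directory with one .java file at the top and one deeper down.
work=$(mktemp -d)
mkdir -p "$work/a/b"
touch "$work/Top.java" "$work/a/b/Deep.java"
cd "$work" || exit 1

# Unquoted: the shell expands *.java here first, so find actually
# receives "-name Top.java" and never reports Deep.java.
unquoted=$(( $(find . -name *.java | wc -l) ))

# Quoted: find gets the literal pattern and applies it at every depth.
quoted=$(( $(find . -name '*.java' | wc -l) ))

echo "unquoted: $unquoted, quoted: $quoted"
cd - > /dev/null || exit 1
rm -rf "$work"
```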

1.2. Whittling away the non-public classes

Next, I wanted to search each file in the above list (281 .java files) for the string “public class”.  (I could probably have skipped this step and just chopped the .java from the filenames to yield the class names, except that we have several package-private classes that I don’t care about for purposes of this search.)

Xargs to the rescue!

    find . -name '*.java' | xargs grep "public class"

This pares our list down to 172 classes.

1.3. Whittling away the test classes

As I examine the output from step 1.2, I notice that several of the public classes are in …/src/test/java/….  For purposes of this search I don’t care about those; I only want to see the public classes in production code.  Without bothering to spend time reading the find utility’s manpage, I modify the search to look like this:

    find . -name '*/main/*.java' | xargs grep "public class"

At this point, I get one of the most helpful warnings I’ve ever seen (thanks, findutils team!):

    find: warning: Unix filenames usually don't contain slashes (though pathnames do).  That means that '-name `*/main/*.java'' will probably evaluate to false all the time on this system.  You might find the '-wholename' test more useful, or perhaps '-samefile'.
    Alternatively, if you are using GNU grep, you could use 'find ... -print0 | grep -FzZ `*/main/*.java''.

Sure enough, no results found.  I fix the search to use the -wholename test, as the warning suggests:

    find . -wholename '*/main/*.java' | xargs grep "public class"

This works, and now my list is down to 79 public classes, all in src/main/java.
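`-wholename` is GNU find's spelling. POSIX calls the same test `-path`, which GNU find also accepts and which is the more portable choice. Unlike `-name`, it matches the pattern against the whole path find prints, and a `*` in the pattern also matches `/`. The sketch below uses a made-up module layout:

```shell
# Made-up module with one production and one test source file.
work=$(mktemp -d)
mkdir -p "$work/svc/src/main/java/com/x" "$work/svc/src/test/java/com/x"
touch "$work/svc/src/main/java/com/x/Service.java" \
      "$work/svc/src/test/java/com/x/ServiceTest.java"
cd "$work" || exit 1

# -name only ever sees the last path component ("Service.java"), so a
# pattern containing "/" cannot match; the warning goes to stderr.
by_name=$(( $(find . -name '*/main/*.java' 2> /dev/null | wc -l) ))

# -path (GNU synonym: -wholename) matches the whole path as printed,
# and its * also matches "/".
by_path=$(( $(find . -path '*/main/*.java' | wc -l) ))
main_file=$(find . -path '*/main/*.java')

echo "matched with name test: $by_name, with path test: $by_path ($main_file)"
cd - > /dev/null || exit 1
rm -rf "$work"
```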

1.4. Just the class names

Actually, what I have is 79 lines like this:

    ./exceptions/src/main/java/com/ontsys/fw/exception/InvalidDataImpl.java:public class InvalidDataImpl implements InvalidData {

I want just the class names.

A while later…

Here’s our command line now (broken into its three parts for readability; it’s all one line when I run it):

    find . -wholename '*/main/*.java'
    | xargs grep "public class "
    | sed -e 's/.*public class \([^ ]*\) .*/\1/'

To put into words what this is doing:

  • Line 1 lists all production Java files (excludes src/test/java/)
  • Line 2 looks in each file listed and prints the lines that contain the string “public class ”.
  • Line 3 looks for public class X (where X is a bunch of non-space characters) and prints just the class name.
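The backslashes matter most in the sed expression. In a basic regular expression, a capture group is written `\(` and `\)`, the replacement refers to the captured text as `\1`, and unescaped parentheses are just literal characters. The sketch below runs line 3 on a hypothetical grep output line. It also shows a variant with `-n` and the `p` flag, which drops lines that don't match instead of passing them through unchanged.

```shell
# Hypothetical grep output line: path, colon, matching source line.
line='./exceptions/src/main/java/com/ontsys/fw/exception/InvalidDataImpl.java:public class InvalidDataImpl implements InvalidData {'

# \( \) captures the run of non-space characters after "public class ";
# \1 in the replacement is that captured text.
class_name=$(printf '%s\n' "$line" | sed -e 's/.*public class \([^ ]*\) .*/\1/')
echo "$class_name"

# A line the regex does not match (no space after the class name).
odd='./x/src/main/java/Foo.java:public class Foo{'

# Plain s///: the unmatched line is printed unchanged.
passthrough=$(printf '%s\n' "$odd" | sed -e 's/.*public class \([^ ]*\) .*/\1/')

# -n plus the p flag: only lines where the substitution happened are printed.
dropped=$(printf '%s\n' "$odd" | sed -n -e 's/.*public class \([^ ]*\) .*/\1/p')
```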

1.5. Just the class names: a minor tweak

The regular expression we’re passing to grep in line 2 above has matched a Javadoc comment line in which the phrase “public class ” was used.  Let’s tweak line 2 to only match lines that have a capital letter after public class:

    find . -wholename '*/main/*.java' | xargs grep "public class [A-Z]" | sed -e 's/.*public class \([^ ]*\) .*/\1/'

So: now we have 78 class names printing out.  Now to see who instantiates these.
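A quick way to check that the `[A-Z]` tweak behaves as intended is to run both patterns over a hypothetical file whose Javadoc uses the phrase. `LC_ALL=C` is set because in some locales a range like `[A-Z]` can also match lowercase letters.

```shell
# C locale so that [A-Z] means exactly the 26 uppercase ASCII letters.
export LC_ALL=C

work=$(mktemp -d)
# Hypothetical source file whose Javadoc happens to say "public class ".
cat > "$work/Widget.java" << 'EOF'
/**
 * Not every public class here is a Spring bean.
 */
public class Widget {
}
EOF

# grep -c prints the number of matching lines.
loose=$(grep -c "public class " "$work/Widget.java")
strict=$(grep -c "public class [A-Z]" "$work/Widget.java")
echo "loose: $loose, strict: $strict"
rm -rf "$work"
```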

2. Who Instantiates These?

Now, for each class C in our list (78 of ’em), I want to know all the places in our codebase where “ = new C(” appears.

2.1. Who-all instantiates these?

Here’s an approach (O(N^2) at best, but I find it’s often easier for me to make it work fast once it works at all…):


    bash-3.2$ for c in `find . -wholename '*/main/*.java' | xargs grep "public class [A-Z]" | sed -e 's/.*public class \([^ ]*\) .*/\1/'`; do find . -wholename '*/main/*.java' | xargs grep " = new $c("; done

Notice that we’re running bash to get the backquote goodness.

For each class in our list-o-78, we search the working directory for production Java source files that instantiate that class directly.  This took five and a half minutes on my PC, and that was just framework services, not the whole codebase (which I guess I’d have to check out in its entirety…hmm…).
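If five and a half minutes becomes a problem, one option is to build all the “ = new C(” strings first and scan the files once with `grep -F -f`, instead of once per class. This is a different technique from the loop above. It is sketched here under the assumption that GNU or BSD grep is available, and the module and class names are hypothetical.

```shell
export LC_ALL=C

# Hypothetical module: Client instantiates Helper directly.
work=$(mktemp -d)
src="$work/app/src/main/java"
mkdir -p "$src"
cat > "$src/Helper.java" << 'EOF'
public class Helper {
}
EOF
cat > "$src/Client.java" << 'EOF'
public class Client {
    private final Helper helper = new Helper();
    private final Object lock = new Object();
}
EOF

# Step 1: one fixed string " = new C(" per public class, written to a file.
# (An empty pattern line would match everything, so sed writes the file
# directly rather than echoing a possibly empty variable into it.)
patfile="$work/patterns.txt"
find "$work" -path '*/main/*.java' -print0 \
  | xargs -0 grep -h "public class [A-Z]" \
  | sed -n -e 's/.*public class \([^ ]*\) .*/ = new \1(/p' > "$patfile"

# Step 2: a single pass over the sources; -F treats each pattern as a
# literal string, -H always prefixes the file name.
hits=$(find "$work" -path '*/main/*.java' -print0 \
  | xargs -0 grep -H -F -f "$patfile")
hit_count=$(printf '%s\n' "$hits" | grep -c .)
printf '%s\n' "$hits"
rm -rf "$work"
```

This approach reads the source tree once rather than 78 times. Whether that actually beats the loop on a given machine hasn't been measured here.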

2.2. Who that we care about instantiates these?

The results generated by step 2.1 consist mostly of instantiations of our POs (persistence objects?).  I would like to remove the PO instantiations from the results and see what’s left.

I can tell a PO because its class name always ends with PO.  So here’s our updated command line:

    bash-3.2$ for c in `find . -wholename '*/main/*.java' | xargs grep "public class [A-Z]" | sed -e 's/.*public class \([^ ]*\) .*/\1/'`; do find . -wholename '*/main/*.java' | xargs grep " = new $c(" | grep --invert-match " = new [A-Za-z]*PO("; done

This gets us just the five interesting instantiations.
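One refinement is worth considering for the PO filter. `[A-Za-z]*` can match nothing, so “ = new [A-Za-z]*PO” also discards a class whose name merely starts with PO. Requiring the opening parenthesis right after PO restricts the filter to names that end in PO. The sample lines and class names here are invented for illustration.

```shell
export LC_ALL=C

# Invented grep hits: a PO, a class that merely starts with "PO", and a
# genuinely interesting non-bean.
hits='a/A.java:    CustomerPO po = new CustomerPO();
b/B.java:    Thing t = new POJOFactory();
c/C.java:    InvalidData d = new InvalidDataImpl(msg);'

# Without "(": [A-Za-z]* may match nothing, so "new POJOFactory" is dropped too.
loose=$(printf '%s\n' "$hits" | grep --invert-match " = new [A-Za-z]*PO")

# With "(": only class names that end in PO are filtered out.
strict=$(printf '%s\n' "$hits" | grep --invert-match " = new [A-Za-z]*PO(")

printf 'loose keeps:\n%s\nstrict keeps:\n%s\n' "$loose" "$strict"
```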

Next time: How to avoid all this using SVN Searcher!