Using jcFind to find duplicates within an archive
Posted by danielmeyer on July 29, 2008
The other day I used jcFind to search for duplicate classes inside my .war file. Just wanted to jot down how I did that.
- I have Cygwin installed, including the Python package.
- I downloaded jcfind-1.0.5.zip from sourceforge.net and extracted the jcfind Python script therefrom.
- I unzipped my xa-example .war file to a temporary directory (jcfind doesn’t currently support searching recursively inside archives).
- I ran the command shown below.
The Command
Here’s the command, and then I’ll explain each part:
python d:\path\to\jcfind\jcfind @c:\tempdir\xa-example.war-dir | grep "\.class$" | rev | cut --delimiter=. --fields=2- | rev | cut --delimiter=" " --fields=2 | sort | uniq --repeated
The Explanation
| Element | Explanation |
|---|---|
| `python d:\path\to\jcfind\jcfind` | jcfind is a Python script, so we need to start up Python to use it |
| `@c:\tempdir\xa-example.war-dir` | The syntax jcfind expects is searchstring@directory-to-search. With no searchstring, jcfind lists all contents of the archives. |
| `\| grep "\.class$"` | jcfind lists directories too, but we want just classes, so we search for .class at the end of a line |
| `\| rev` | reverse the string; sometimes it’s easier to do things from the other end… |
| `\| cut --delimiter=. --fields=2-` | chop off “.class” |
| `\| rev` | turn the strings back around frontways |
| `\| cut --delimiter=" " --fields=2` | chop off the path so we can see which adjacent lines are identical |
| `\| sort` | get any duplicates next to each other |
| `\| uniq --repeated` | only show duplicate classnames |
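
For anyone who would rather not chain that many commands together, here is a rough Python sketch of the same filtering steps. It assumes, as the `cut --delimiter=" "` step implies, that each line of jcfind output is a path, a space, and then the entry name. That format is an assumption, not something taken from jcfind's documentation, and `duplicate_class_names` is just a name made up for this sketch.

```python
from collections import Counter


def duplicate_class_names(lines):
    """Return the sorted class names that occur more than once in the listing."""
    names = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Same job as the grep step: keep only lines ending in ".class".
        if not line.endswith(".class"):
            continue
        # Same job as rev | cut | rev: drop the trailing ".class".
        stem = line[: -len(".class")]
        # Same job as the second cut: keep the second space-separated field.
        # Like cut, fall back to the whole line when there is no space.
        fields = stem.split(" ")
        names.append(fields[1] if len(fields) > 1 else stem)
    # Same job as sort | uniq --repeated: one line per name seen at least twice.
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```

To try it, save the jcfind output to a file and pass the open file object (or any list of lines) to `duplicate_class_names`.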
‘Course, for all this, it still doesn’t tell you which libraries the duplicates are in…
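
One possible way around that is sketched below. Note that it does not use jcfind at all: it swaps in Python's standard `zipfile` module to read every .jar under the unzipped directory and remember which jars each class came from. The function name `find_duplicate_classes` is made up for illustration, and the sketch only looks inside .jar files, so loose classes (for example under WEB-INF/classes) are ignored.

```python
import zipfile
from collections import defaultdict
from pathlib import Path


def find_duplicate_classes(root):
    """Map each class found in more than one .jar under root to those jars."""
    root = Path(root)
    locations = defaultdict(list)
    for jar in sorted(root.rglob("*.jar")):
        with zipfile.ZipFile(jar) as archive:
            # A set guards against a jar listing the same entry twice.
            for entry in sorted(set(archive.namelist())):
                if entry.endswith(".class"):
                    class_name = entry[: -len(".class")]
                    locations[class_name].append(jar.relative_to(root).as_posix())
    # Keep only the classes that turned up in at least two jars.
    return {name: jars for name, jars in locations.items() if len(jars) > 1}
```

Call it with the directory the .war was unzipped into, then loop over the returned dictionary to print each duplicated class next to the jars that contain it.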