finding same files across hard drives

D. Hugh Redelmeier hugh-pmF8o41NoarQT0dZR+AlfA at public.gmane.org
Sat Nov 29 19:38:36 UTC 2008


| From: Jose <jtc-vS8X3Ji+8Wg6e3DpGhMbh2oLBQzVVOGK at public.gmane.org>

| I've been trying to find files with the same name

[Some of your typos make it a bit harder to understand what you are
asking.]

| Is there any linux rpm or souce to compile utility that may help to do this?

This kind of thing is easy to do with a shell script.  For that reason
I've never investigated if there are utilities to make this easier.

Looking for matching names is a bit scary to me.  I'd prefer to look
for duplicate contents.

To find non-obvious matching files, I do an md5sum or sha1sum of each
file and then find files with identical hashes.  Being paranoid, I
actually do a cmp before I'm sure that they match (the chance of
cryptographic hashes matching but the contents differing is VERY
slight).
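
The hash-then-cmp approach above might be sketched roughly like this.
It assumes GNU coreutils (sha1sum, and uniq with -w and --all-repeated)
and file names without newlines or backslashes.  The function names
find_dups and same_file are made up for illustration.

```shell
# find_dups DIR: print groups of files under DIR whose SHA-1 hashes match.
# Groups are separated by a blank line.
# Assumption: GNU uniq (-w and --all-repeated are not POSIX).
find_dups() {
	find "$1" -type f -exec sha1sum {} + |
		sort |
		uniq -w 40 --all-repeated=separate
}

# same_file A B: the paranoid check.  Succeeds only if the two files
# are byte-for-byte identical.
same_file() {
	cmp -s "$1" "$2"
}
```

Something like "find_dups /some/dir" lists the candidates; same_file
can then confirm each pair before anything gets removed.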

Note: a lot of files are empty: the fact that all of them have
identical contents really doesn't say that they are the "same" file in a
semantic sense.
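
One way to keep empty files out of the comparison is to have find skip
them.  This is only a sketch; the function name nonempty_files is made
up for illustration.

```shell
# nonempty_files DIR: list regular files under DIR that contain at
# least one byte.  -size +0 matches files occupying more than zero
# 512-byte blocks (sizes are rounded up), so only empty files are skipped.
nonempty_files() {
	find "$1" -type f -size +0 -print
}
```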

Are your duplicates systematically placed?

Here is a shell script that I just whipped up WITHOUT TESTING.
It requires that the file contents match, not just the name.
Since I don't actually know what you want, I don't know whether this
script could be useful.

================================================================
# stop if anything goes wrong (or an unset variable is used)
set -ue

# good directory:
GD=$HOME/good
# bad directory:
BD=/somewhere/else

cd "$GD"
find . -type f -print |
	while IFS= read -r p
	do
		# only a regular file in BD with identical contents counts
		if [ -f "$BD/$p" ] && cmp -s "$p" "$BD/$p"
		then
			#### after testing this, change this to
			#### actually rm
			echo rm "$BD/$p"
		fi
	done
================================================================
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists




