finding the same files across hard drives

Jose jtc-vS8X3Ji+8Wg6e3DpGhMbh2oLBQzVVOGK at public.gmane.org
Sat Nov 29 20:03:00 UTC 2008


D. Hugh Redelmeier wrote:
> | From: Jose <jtc-vS8X3Ji+8Wg6e3DpGhMbh2oLBQzVVOGK at public.gmane.org>
> 
> | I've been trying to find files with the same name
> 
> [Some of your typos make it a bit harder to understand what you are
> asking.]
> 
> | Is there any Linux RPM or source-to-compile utility that may help to do this?
> 
> This kind of thing is easy to do with a shell script.  For that reason
> I've never investigated if there are utilities to make this easier.
> 
> Looking for matching names is a bit scary to me.  I'd prefer to look
> for duplicate contents.
> 
> To find non-obvious matching files, I do a md5sum or sha1sum of each
> file and then find files with identical hashes.  Being paranoid, I
> actually do a cmp before I'm sure that they match (the chance of
> cryptographic hashes matching but the contents differing is VERY
> slight).
> 
> Note: a lot of files are empty: the fact that all of them have
> identical contents really doesn't say that they are the "same" file in a
> semantic sense.
> 
> Are your duplicates systematically placed?
> 
> Here is a shell script that I just whipped up WITHOUT TESTING.
> It requires that the file contents match, not just the name.
> Since I don't actually know what you want, I don't know whether this
> script could be useful.
> 
> ================================================================
> # stop if anything goes wrong
> set -ue
> 
> # good directory:
> GD=$HOME/good
> # bad directory:
> BD=/somewhere/else
> 
> cd "$GD"
> find . -type f -print |
> 	while read -r p
> 	do
> 		if [ -f "$p" ] && cmp -s "$p" "$BD/$p"
> 		then
> 			#### after testing this, change this to
> 			#### actually rm
> 			echo rm "$BD/$p"
> 		fi
> 	done
> ================================================================
> --
> The Toronto Linux Users Group.      Meetings: http://gtalug.org/
> TLUG requests: Linux topics, No HTML, wrap text below 80 columns
> How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists
> 
Hi Hugh

Thanks for the script. Basically, not having tape backups, I ended up
copying the same files onto different hard drives as "backups". Now that
I have a backup solution, I would like to consolidate a single copy of
the data and back it up properly.
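For consolidating like this, the hash-first approach Hugh describes can be
sketched as a small shell function (an untested sketch: it assumes GNU
md5sum and uniq, and only groups files whose digests collide, leaving the
cmp confirmation and any rm to you; the directory arguments are
illustrative):

```shell
#!/bin/sh
# Group files by MD5 digest; files printed in the same group are
# duplicate candidates and should still be confirmed with cmp before
# anything is removed.
find_dups() {
    find "$@" -type f -exec md5sum {} + |
        sort |                             # identical digests become adjacent
        uniq -w32 --all-repeated=separate  # keep only repeated 32-char digests
}

# Example: scan the "good" tree and an old backup drive together.
# find_dups "$HOME/good" /somewhere/else
```

Each blank-line-separated group in the output shares one digest, so the
paths within a group point at likely copies of the same data.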

Thanks again,

Jose