finding same files across hard drives

Andrei andreilitvin-bJEeYj9oJeDQT0dZR+AlfA at public.gmane.org
Sun Nov 30 17:55:24 UTC 2008


And if you are looking for duplicate names, you could use 

find /dir -type f | xargs -n 1 sh -c 'echo `basename $0` $0' ...

however, that breaks when names contain spaces, so you
can try a more "evil" variant:

find /dir -type f -print0 |
  xargs -0 -n 1 sh -c 'echo "$(basename "$0" | md5sum | cut -d" " -f1)" "$0"' | ...

to join on the MD5 of the name. It will run much faster than hashing the
file contents, but as someone else said, matching on name alone is fairly
dangerous for deduplication. An MD5 search of the contents (which may take
one or two good nights' sleep :-) ) is probably better.
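
If you only want a list of same-named files to review by hand (not to
delete automatically), here is a minimal sketch that copes with spaces in
names. It assumes GNU find (for -printf) and that no name contains a tab or
newline; dupnames is a hypothetical helper name, not an existing utility.

```shell
# dupnames DIR... (hypothetical helper): print "NAME<TAB>PATH" for every
# regular file whose base name occurs more than once under the given dirs.
# Assumes GNU find (-printf) and no tab or newline characters in names.
dupnames() {
  tab=$(printf '\t')
  find "$@" -type f -printf '%f\t%p\n' |
    LC_ALL=C sort -t "$tab" -k1,1 |
    awk -F '\t' '
      # Same name as the previous line: print the first member of the
      # group once, then this line.
      $1 == prev { if (!shown) print prevline; print; shown = 1; next }
      # New name: remember it and wait to see if it repeats.
      { prev = $1; prevline = $0; shown = 0 }
    '
}
```

Running "dupnames /dir1 /dir2" prints one line per file in each group of
same-named files; files with a unique name are not shown.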

Andrei


On Sun, 2008-11-30 at 12:48 -0500, Andrei wrote:
> How about something like:
> 
> find /dir1 -type f -exec md5sum {} + | sort > data1.txt
> find /dir2 -type f -exec md5sum {} + | sort > data2.txt
> 
> join ./data1.txt ./data2.txt 
> 
> 
> I think this should give you all the files with the same content (not
> sure how it would handle duplicates though, but I guess it should work)
> 
> Regards,
> Andrei
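
A variant of the join approach quoted above that keeps file names with
spaces in a single field: a minimal sketch, assuming GNU coreutils (md5sum,
mktemp) and no tabs or newlines in file names. samecontent is a
hypothetical function name, not an existing tool.

```shell
# samecontent DIR1 DIR2 (hypothetical helper): print "HASH<TAB>PATH1<TAB>PATH2"
# for each pair of files with identical MD5 under DIR1 and DIR2.
# Assumes GNU coreutils and no tab or newline characters in file names.
samecontent() {
  tab=$(printf '\t')
  tmp1=$(mktemp) || return 1
  tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
  # "-exec ... {} +" hands names (spaces included) straight to md5sum and
  # runs nothing for a directory with no files.
  # sed turns md5sum's "HASH  PATH" into "HASH<TAB>PATH" so a path with
  # spaces stays a single field for join.
  find "$1" -type f -exec md5sum {} + | sed "s/  /$tab/" |
    LC_ALL=C sort -t "$tab" -k1,1 > "$tmp1"
  find "$2" -type f -exec md5sum {} + | sed "s/  /$tab/" |
    LC_ALL=C sort -t "$tab" -k1,1 > "$tmp2"
  # join needs both inputs sorted on the key in the same collation (C here).
  LC_ALL=C join -t "$tab" "$tmp1" "$tmp2"
  rm -f "$tmp1" "$tmp2"
}
```

If a file occurs several times in one tree, join prints one line for every
pairing of matching lines, so duplicates show up as extra lines rather than
being lost.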
> 
> 
> On Sat, 2008-11-29 at 12:07 -0500, Jose wrote:
> > Hi everybody
> > 
> > I've been trying to find files with the same name. Basically, I made
> > multiple copies when I had these workstations. Now I have a machine
> > capable of holding more disks and data, but I need a list so I can
> > safely delete the data from one drive (or drives) and keep the other.
> > I tried a combination of find and du, but the output is not helpful.
> > 
> > Is there any Linux RPM or source-to-compile utility that may help with this?
> > 
> > Thanks,
> > 
> > Jose
> > --
> > The Toronto Linux Users Group.      Meetings: http://gtalug.org/
> > TLUG requests: Linux topics, No HTML, wrap text below 80 columns
> > How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists
> 
