Duplicate file finding script

Jason Shein jason-xgs8i/e9EeWTtA8H5PvdGCwD8/FfD2ys at public.gmane.org
Tue Sep 20 17:03:59 UTC 2005


For those of you whose hard drives are getting cluttered with possibly duplicate
files, try this little script out.

It will recursively MD5sum all files in a directory and write the results to a
file called rem-duplicates.sh. Open that file in an editor and un-comment the
file(s) you would like removed. Then run it, and all un-commented files will
be deleted.
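The detection step can be tried safely on throwaway files first. A minimal sketch (the directory and file names below are invented for illustration): files with identical content produce identical MD5 digests, and `uniq -w 32 -d --all-repeated=separate` keeps only the lines whose first 32 characters (the digest) repeat.

```shell
# Create a scratch directory with two duplicates and one unique file:
tmp=$(mktemp -d)
echo "same content" > "$tmp/a.txt"
echo "same content" > "$tmp/b.txt"
echo "different"    > "$tmp/c.txt"
# Hash every file, sort by hash, and keep only repeated digests:
dups=$(find "$tmp" -type f -print0 | xargs -0 -n1 md5sum |
  sort --key=1,32 | uniq -w 32 -d --all-repeated=separate)
echo "$dups"    # lists a.txt and b.txt, but not c.txt
rm -rf "$tmp"
```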


Step-by-step:

#nano duplicate_finder.sh

paste in script below

##beginning of script##
OUTF=rem-duplicates.sh;
echo "#! /bin/sh" > $OUTF;
# Hash every regular file (following symlinks), group identical hashes,
# and emit a commented-out "rm" line for each member of a duplicate set.
find "$@" -type f -follow -print0 |
  xargs -0 -n1 md5sum |
    sort --key=1,32 | uniq -w 32 -d --all-repeated=separate |
    sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> $OUTF;
chmod a+x $OUTF; ls -l $OUTF
##end of script##
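The sed stage strips the hash, backslash-escapes any shell-special characters in the path, and prefixes the line with "#rm" so nothing is deleted until you un-comment it. A quick sketch of what one line goes through (the hash and filename here are made up):

```shell
# A hypothetical md5sum output line, piped through the script's sed:
echo 'd41d8cd98f00b204e9800998ecf8427e  ./My Photos/pic (1).jpg' |
  sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/'
# → #rm ./My\ Photos/pic\ \(1\).jpg
```

Note that spaces and parentheses come out escaped, so the generated rm lines are safe to run as-is once un-commented.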

Go into the directory you would like to search, then run the script.
#sh <path to script>duplicate_finder.sh

This could take a while, depending on the size of the directory.

Now open the file it has created and see what it found.
#nano rem-duplicates.sh

Un-comment the unwanted files, then run the resulting script.

**WARNING**  THESE FILES WILL BE DELETED PERMANENTLY!

#sh rem-duplicates.sh
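The un-comment-then-run step can also be rehearsed on scratch files before touching real data. A hypothetical illustration (every path below is a throwaway example): only the lines you have un-commented actually delete anything.

```shell
# Build a fake rem-duplicates.sh over two scratch files:
tmp=$(mktemp -d)
touch "$tmp/keep.txt" "$tmp/dup.txt"
cat > "$tmp/rem-duplicates.sh" <<EOF
#! /bin/sh
#rm $tmp/keep.txt
rm $tmp/dup.txt
EOF
sh "$tmp/rem-duplicates.sh"
# keep.txt is still there; dup.txt has been deleted.
```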

-- 
Jason Shein
Director of Networking, Operations and Systems
Detached Networks
jason-xgs8i/e9EeWTtA8H5PvdGCwD8/FfD2ys at public.gmane.org
( 905 ) - 876 - 4158 Voice
( 905 ) - 876 - 5817 Mobile
http://www.detachednetworks.ca
--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml
