Duplicate file finding script
Jason Shein
jason-xgs8i/e9EeWTtA8H5PvdGCwD8/FfD2ys at public.gmane.org
Tue Sep 20 17:03:59 UTC 2005
For those of you whose hard drives are cluttered with possibly duplicate
files, try this little script out.

It will recursively MD5sum all files in a directory and write the results
to a script called rem-duplicates.sh. Open that file in an editor and
un-comment the file(s) you would like removed; then run rem-duplicates.sh,
and all un-commented files will be deleted.
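Each group of duplicates shows up in rem-duplicates.sh as a block of
commented-out rm commands, with a blank line between groups. A sketch of
what the generated file might look like (the filenames here are made up):

#! /bin/sh
#rm ./photos/img001.jpg
#rm ./backup/img001.jpg

#rm ./docs/notes.txt
#rm ./docs/notes\ copy.txt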
Step-by-step:
#nano duplicate_finder.sh
Paste in the script below.
##beginning of script##
OUTF=rem-duplicates.sh

# The generated file starts out as an executable shell script.
echo "#! /bin/sh" > $OUTF

# MD5sum every file, sort so identical hashes sit next to each other, and
# keep only the duplicated groups (uniq -w 32 compares just the 32-character
# hash). sed then strips the hash, backslash-escapes unsafe characters in
# the filename, and writes each entry as a commented-out rm command.
find "$@" -type f -follow -print0 |
xargs -0 -n1 md5sum |
sort --key=1,32 | uniq -w 32 -d --all-repeated=separate |
sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> $OUTF

chmod a+x $OUTF
ls -l $OUTF
##end of script##
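To see how each line is built: md5sum prints a 32-character hash followed
by the filename, and the sed expression strips the hash, backslash-escapes
anything outside [a-zA-Z0-9./_-], and prefixes the result with "#rm ". A
hypothetical pair of identical files would pass through like this:

d41d8cd98f00b204e9800998ecf8427e  ./notes.txt
d41d8cd98f00b204e9800998ecf8427e  ./notes copy.txt

becomes, in rem-duplicates.sh:

#rm ./notes.txt
#rm ./notes\ copy.txt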
Go into the directory you would like to search, then run the script.

#sh <path to script>duplicate_finder.sh

This could take a while, depending on the size of the directory.
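Since the script hands its arguments straight to find via "$@", you can
also name one or more directories on the command line instead of cd'ing
into them first (with no arguments, GNU find searches the current
directory). The paths below are just examples:

#sh duplicate_finder.sh /home/jason/photos /mnt/backup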
Now open the file it has created and see what it found.

#nano rem-duplicates.sh

Un-comment the unwanted files, then run the resulting script.
**WARNING** THESE FILES WILL BE DELETED PERMANENTLY!
#sh rem-duplicates.sh
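Files that share an MD5 hash are almost certainly identical, but if you
want to be extra careful you can compare any pair byte-for-byte with cmp
before deleting (again, hypothetical paths); cmp prints nothing and exits
with status 0 when the two files match:

#cmp ./photos/img001.jpg ./backup/img001.jpg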
--
Jason Shein
Director of Networking, Operations and Systems
Detached Networks
jason-xgs8i/e9EeWTtA8H5PvdGCwD8/FfD2ys at public.gmane.org
(905) 876-4158 Voice
(905) 876-5817 Mobile
http://www.detachednetworks.ca