war story: parallel(1) command

David Mason dmason-bqArmZWzea/GcjXNFnLQ/w at public.gmane.org
Tue Jul 30 15:04:49 UTC 2013


On 30 July 2013 00:33, Eric B <gyre-Ja3L+HSX0kI at public.gmane.org> wrote:
> It is easy to find collisions on a Linux filesystem with a 32-bit CRC
> checksum.  If you have more than 65,000 (~ 2^(32/2)) files,
> you will probably find at least one.
>
> One would think that MD5 is good enough,
> but because it is cryptographically broken, you could find collisions
> that were legitimately generated and not adversarial.
> For example, you might unpack something related to hashes, and it
> contains examples of two different files with duplicate MD5 hashes.
>
> To be safe, use a stronger hash.
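As a rough check on the 2^(32/2) figure quoted above, the standard birthday
approximation gives the chance that at least two of n files share a checksum.
This is a minimal sketch that assumes CRC-32 values behave like uniformly
random 32-bit numbers, which is an idealization. The function name is
hypothetical.

```python
import math


def crc32_collision_probability(n_files: int, bits: int = 32) -> float:
    """Approximate P(at least one shared value) among n_files hashes.

    Assumes (idealization) the hash acts like a uniform random function
    over 2**bits possible values.
    """
    space = 2 ** bits
    # Birthday approximation: P ~= 1 - exp(-n(n-1) / (2 * space))
    return 1.0 - math.exp(-n_files * (n_files - 1) / (2 * space))


for n in (10_000, 65_536, 100_000, 1_000_000):
    print(f"{n:>9} files: {crc32_collision_probability(n):.3f}")
```

Under this approximation, 65,536 files give roughly a 39% chance of at least
one collision, and the 50% mark falls near 1.18 * 2^16, about 77,000 files.
So the quoted figure is the right order of magnitude.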

If you're trying to find duplicates, use the fastest scan possible as
a first cut (file size, say). Then, if you have a lot of files of the
same size, compare hashes (any hash will do, because a collision gets
caught in the last step), and finally do a byte-wise comparison on the
files whose hashes match.

../Dave
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists