war story: parallel(1) command

Jamon Camisso jamon.camisso-H217xnMUJC0sA/PxXw9srA at public.gmane.org
Fri Aug 2 19:42:30 UTC 2013


On 02/08/13 12:24 PM, Eric wrote:
> On Thu, 1 Aug 2013, James Knott wrote:
> 
>> D. Hugh Redelmeier wrote:
>>> MD5 hashes are quite reasonably distributed over the 128-bit space. You'd
>>> need something like 2^64 things for the birthday paradox to fire.
>>
>> Okay.  You have 128 bit hash.  That's 2^128 possible combinations.  Now
>> take a bunch of 1 KB (2^10) files.  Those are 2^13 bits and 2^(2^13)
>> combinations.  That means for any given hash, you could have an
>> extremely large number of different files that size that could have the
>> same hash.  Granted, duplicate hashes are extremely rare, but you can
>> never claim they're impossible.
> 
> Are you arguing about this specific case, or the principle?
> 
> To make it clear, consider an extreme example:
> If an event has a probability of
> occurring once in a googolplex universe lifetimes, will that
> occur in reality?  Of course, not.
> 
> Next, is that worth wasting CPU/IO/real time programming for
> it because someone says "It is not impossible."?

Absolutely. We invented digital computers to externalize computational
work for us as a society because it is faster and more efficient than
doing it by hand or with a slide rule.

As such, we ought to use computers to do work, like checking the results
of algorithms that are good enough for most intents and purposes, but
not all, even accounting for an outlier in a 2^128 space.

For example, a forensic analysis of a filesystem would have to show with
absolute certainty that any hash collision, however remotely improbable
in any infinite number of universes' lifetimes, has been documented,
files compared and accounted for.

Close enough is just not good enough, and as was originally pointed out,
file I/O is the bottleneck, not CPU. So once a file is read, might as
well account for even the remote possibilities with a few extra CPU loops.

Jamon
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists





More information about the Legacy mailing list