war story: parallel(1) command

Christopher Browne cbbrowne-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org
Thu Aug 1 23:02:19 UTC 2013


On Thu, Aug 1, 2013 at 4:16 AM, Eric <gyre-Ja3L+HSX0kI at public.gmane.org> wrote:
> On Wed, 31 Jul 2013, Lennart Sorensen wrote:
>
>> On Tue, Jul 30, 2013 at 06:33:44PM -0400, Eric B wrote:
>> > Your "likely the same" is context dependent.
>> > I agree with what you say above in the context of random file
>> > corruption or in the case of files containing random bits.
>> >
>> > For Hugh's case, he wants to hash all the files in a real filesystem
>> > to find real differences.
>> >
>> > If one calculates the SHA-N hash for each file, that would
>> > answer the question ("Are these files the same or different?")
>> > with virtual certainty.  There is NO need for an additional
>> > compare if the same hash is found.
>>
>> Of course there is.  If you don't, you simply indicate you have no
>> understanding of what a hash is.
>
> Did you follow the link Hugh provided?
> http://marc.info/?l=git&m=115678778717621&w=2
>
>> > When events are too astronomically unlikely,
>> > they never happen in reality.
>>
>> That's not good enough for file comparison.
>>
>> Of course you are unlikely to find two different files with the same
>> hash, so most likely the extra comparison won't happen on files that
>> are not the same, but you still need to do it.
>
> No, you do not need to do it, and doing it is a waste of time.
> If something does not happen in reality, there is no need to test
> for it.
>
> In the cases I am talking about, think of a collision as being so
> astronomically unlikely that it is indistinguishable from
> impossible.
> A distinction without a difference is not a real difference.
> Any other factual event in this universe is more likely
> to occur than such a collision.
> For all intents and purposes, a real collision is impossible.
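To make the disagreement concrete, here is a minimal Python sketch of
the workflow under discussion (the names `file_digest`, `identical`,
and `group_duplicates` are my own, not anything Hugh posted): hash
every file, group by digest, and, if you side with Lennart, confirm
each group with a byte-by-byte compare.

```python
import hashlib
import os

def file_digest(path, algo="sha256", chunk_size=1 << 16):
    """Hash a file in chunks so large files are not read into memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def identical(path_a, path_b, chunk_size=1 << 16):
    """Byte-by-byte comparison -- the 'extra check' under debate."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                return False
            if not a:          # both files exhausted simultaneously
                return True

def group_duplicates(paths, verify=True):
    """Group paths by digest; optionally verify each group byte-wise."""
    by_digest = {}
    for p in paths:
        by_digest.setdefault(file_digest(p), []).append(p)
    groups = [g for g in by_digest.values() if len(g) > 1]
    if verify:  # Lennart's position: never trust the digest alone
        for g in groups:
            assert all(identical(g[0], p) for p in g[1:])
    return groups
```

Eric's position amounts to calling `group_duplicates(paths,
verify=False)`; the extra pass costs one more full read of each file
in a matched group.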

This seems in conflict with Cromwell's Rule...
http://en.wikipedia.org/wiki/Cromwell%27s_rule

The reference is to Oliver Cromwell. Cromwell wrote to the synod of
the Church of Scotland on August 5, 1650, including a phrase that has
become well known and frequently quoted:

  "I beseech you, in the bowels of Christ, think it possible that you
may be mistaken."

As Lindley puts it, assigning a probability should "leave a little
probability for the moon being made of green cheese; it can be as
small as 1 in a million, but have it there since otherwise an army of
astronauts returning with samples of the said cheese will leave you
unmoved."[3] Similarly, in assessing the likelihood that tossing a
coin will result in either a head or a tail facing upwards, there is a
possibility, albeit remote, that the coin will land on its edge and
remain in that position.

While I agree that SHA-1 was designed to try to resist collisions, I
do not think I have quite so much faith as you do as to the permanency
or *true* impossibility of this.

People thought that the Merkle-Hellman knapsack cryptosystem was
strong; it turned out not to be nearly as strong as imagined.  It
would be unfortunate if Git
depended (to the point of falling apart destructively!) on the True
Impossibility of SHA-1 collisions, only for us to discover, 10 years
from now, that someone can induce collisions at some moderate
computational cost, and, thereby, throw patches at Git repositories
that make them fall over.

This is not a purely theoretical problem; cryptographers have already
found better-than-brute-force attacks against SHA-1.
<http://www.schneier.com/blog/archives/2005/02/cryptanalysis_o.html>
The difficulty was reduced in 2005 from 2^80 calculations to 2^69.
That's the sort of thing that theoretical cryptanalysis tends to
achieve; moving from "toughest brute force" solutions to ones that
reduce the solution space to something easier.  Of course, "easier !=
easy"; 2^69 calculations is still a pretty material amount of work,
more than what the distributed.net project did.
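The size of that reduction is easy to compute (the work factors are
the ones cited in the Schneier post above; the attribution to Wang,
Yin, and Yu is my addition):

```python
# A generic birthday attack on a 160-bit hash needs about 2**80
# hash evaluations; the 2005 cryptanalysis (Wang, Yin, and Yu)
# reduced that to about 2**69.
brute_force = 2 ** 80
cryptanalytic = 2 ** 69

speedup = brute_force // cryptanalytic
print(speedup)  # 2**11 = 2048: the attack became ~2000x cheaper
```

A factor of 2048 still leaves 2^69 operations, hence "easier != easy"
-- but the trend line is what worries the Cromwell camp.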

But I think I'd rather go with Cromwell, and be a little paranoid
that what, at this point, seems "impossible before the heat death of
the universe" might prove more tractable.

The Japanese were apparently mistaken as to how large the largest
tsunami could be; they treated as impossible something that was
evidently rather less improbable than assumed.  A few billion dollars
of extra spending might have prevented what's estimated at > $200B in
losses.

When using hashing for uniqueness, I'm inclined not to treat
collisions as Truly Impossible, when we well know that that isn't
actually so.
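One way to make that inclination quantitative is the birthday bound,
which approximates the odds of an *accidental* collision (a sketch of
my own, not from the thread; note it says nothing about engineered
collisions, which are the real worry above):

```python
def accidental_collision_probability(n, bits=160):
    """Approximate birthday bound: chance that at least two of n
    uniformly random `bits`-bit digests coincide, ~ n*(n-1)/2 / 2**bits."""
    return (n * (n - 1) // 2) / float(2 ** bits)

# A billion files still leaves the accidental odds absurdly small ...
p = accidental_collision_probability(10 ** 9)
assert p < 1e-30
# ... but a cryptanalytic break bypasses this bound entirely, which
# is the Cromwell-style caution being argued for.
```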
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists