war story: parallel(1) command

Lennart Sorensen lsorense-1wCw9BSqJbv44Nm34jS7GywD8/FfD2ys at public.gmane.org
Wed Jul 31 18:00:21 UTC 2013


On Wed, Jul 31, 2013 at 10:51:52AM -0400, Christopher Browne wrote:
> I hope that comes with an "expect, but verify."
> 
> If it's a hard dependency, and there's no test, then your repository
> might get destroyed if a (highly improbable) collision took place.
> 
> It's tempting to say "no need to bother, [heat death of universe]...",
> but depending on how bad it is to have a collision, it may be somewhat
> important to check.
> 
> For Git, a collision would have pretty perverse effects; it would mean
> two changes seem like they're the same, and they'll both be treated as
> the parents of successor patches, which would be Mighty Destructive to
> the repository, as it makes it stop making sense.  (Particularly if
> you've been pruning out disconnected patches, so that it's pretty
> certain that those that remain will be parents of something.)
> 
> I have a "don't care" case, myself; I have a script that I use to
> purge mail out of my MH instance.  It pulls all the messages that
> appear deleted (e.g. - a message that gets a comma prepended to the
> filename) as well as those that have gotten archived by the way I use
> Maildir, and stows them all into an MH "Deleted" folder.
> 
> There are expected to be a great many duplicates, as, in order to
> be careful not to lose things as they get refiled to apropos places, I
> tend to keep copies around.
> 
> I have a step that deduplicates the messages, where I compare via MD5
> checksums, and throw away the dupes before taking what's left over in
> ~/Mail/Deleted and stowing that into a compressed tarball on the
> possibility of need for future reference.
> 
> It's *possible* that I could lose a few messages to collisions, but
> it's certainly no disaster, as this was mail I was not really planning
> to ever do anything with again.  So I accept here the possibility of
> there being a few losses, don't care.
> 
> If I were using this to dedupe, say, my photograph collection, I
> wouldn't consider the checksum to be enough, as I don't want to
> Perhaps, Randomly lose a few pictures rather magically.
> 
> Mind you, if false duplicates seem to be nearly impossible, people
> will be liable to have an excessive level of trust.  Until the
> plane/train crashes, or some other such disaster, and they'll swing
> back in the other direction...
> 
> I'm slightly surprised that SCMs aren't using UUIDs instead; they tend
> to have more suitable uniqueness guarantees.

Using a hash means the identifier represents the state of the SCM tree.
A UUID is just random data.  Very different purpose.
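
A rough sketch of the difference, in Python.  The function name
git_blob_id is made up for illustration; the hashing scheme (SHA-1 over
a "blob <size>\0" header followed by the content) is how Git names blob
objects.  The point is only that the hash is derived from the content,
while a random UUID is not derived from anything.

```python
import hashlib
import uuid


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 ID Git would assign to a blob with this content.

    Hypothetical helper name; the header format is Git's own.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The same content always yields the same ID, on any machine,
# so the ID stands for the content itself.
first = git_blob_id(b"hello\n")
second = git_blob_id(b"hello\n")
same_hash = first == second

# Two random UUIDs "for the same content" have nothing in common;
# they identify nothing beyond themselves.
label_one = uuid.uuid4()
label_two = uuid.uuid4()
same_uuid = label_one == label_two
```

So two people who independently commit identical content end up with
the same object ID without ever talking to each other, which a UUID
cannot give you.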

-- 
Len Sorensen
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists




