war story: parallel(1) command

Christopher Browne cbbrowne-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org
Sun Jul 28 01:21:53 UTC 2013


I had a similarly pleasing experience with parallel a few months ago.

I wanted to spread out the database load for an import process.

I:

- Used split to break the big files into 1000-tuple chunks

- This gave me literally thousands of data files, each of which needed
filtering as well as loading into the DB.  (I'm renumbering various things,
nicely handled via a small C program or two.)
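
For anyone who wants to see the chunking step spelled out, here is a rough Python sketch of what split did for me. It is not what I actually ran (that was plain split(1)); it assumes one tuple per line, and the function name, the chunk_NNNNNN file naming and the paths are all made up for illustration.

```python
import itertools
from pathlib import Path


def split_into_chunks(src, out_dir, lines_per_chunk=1000):
    """Write src out as numbered chunk files of at most lines_per_chunk lines each.

    Hypothetical helper: assumes one tuple per line in the input file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with open(src, "r", encoding="utf-8") as infile:
        for index in itertools.count():
            # islice pulls the next batch without reading the whole file into memory.
            batch = list(itertools.islice(infile, lines_per_chunk))
            if not batch:
                break
            chunk_path = out_dir / f"chunk_{index:06d}"
            chunk_path.write_text("".join(batch), encoding="utf-8")
            chunk_paths.append(chunk_path)
    return chunk_paths
```

A 2500-line input would come out as three files of 1000, 1000 and 500 lines, in the original order.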

By using parallel, I could set up a series of concurrent processing streams
covering both filtering and ultimately loading into DB.

Having parallel restrict things to ~10 worker processes let me harness the
available parallelism, as the servers do have multiple physical disks and
CPUs.

The restriction to 10 concurrent jobs kept the servers from bogging down.
It's obviously stupid to try to have 100 or 1000 processes fighting over
CPUs.

Prll looked like it might be easier to get installed on systems that might
lack C compilers, but seemed a little more fragile otherwise, though that's
a woefully vague impression.