dump a perl array into a psql DB via 'copy'; help?

Taavi Burns jaaaarel-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org
Wed Dec 15 22:28:31 UTC 2004


On Wed, 15 Dec 2004 01:38:56 -0500, Madison Kelly <linux-5ZoueyuiTZhBDgjK7y7TUQ at public.gmane.org> wrote:
> (currently taking 11 seconds to process 22,000 files on my machine) I
> then call 'psql' to read in the contents. This works but the read alone
> takes another 31 seconds. I know this sounds somewhat trivial but I need
> it to be faster.
<snip>
> out. Next I want to dump that contents of the array into 'psql' one line
> at a time but not commit the changes until the whole array is in (which
> is how I believe the 'copy' works from a text file). This way I would
> avoid a second disk IO hit by avoiding the need to have 'psql' read the
> file.

http://www.postgresql.org/docs/7.4/interactive/sql-copy.html

You might make use of "COPY FROM STDIN" so that you can pipe the
output of your program directly into psql, though this will add the overhead
of transferring data between your process and the client process and then
to the server process, as opposed to your process to disk to the server
process.  I can't say which might be faster.

Also consider writing the file (or streamingn it) as BINARY instead of
text.  The web page above states that it's a bit faster.  If you already have
your data in variables, it might even decrease your write time to pack
them into the psql binary format, and certainly speed up psql's read
processing.  I doubt it'll be earth-shattering, but probably good for a few
percent.  :)

As someone else suggested, you can also write the file to /dev/shm to avoid
having it hit disk, though if the file is larger than half the RAM of
the machine
in question, you'll run out of space (and probably cause some horrible
swapping while your at it, and negate any performance gain you were hoping
for).  Piping the data directly to psql would likely perform much better on
a dual CPU system, as data could be processed as it's generated.  Even on
a uniprocessor, if that one CPU isn't pegged for those first 11 seconds, you
might see an overall increase in throughput by piping.

-- 
taa
/*eof*/
--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml





More information about the Legacy mailing list