Postgres/Perl performance help needed

Tue May 25 20:20:40 UTC 2004

On Tue, May 25, 2004 at 04:16:23PM -0400, Madison Kelly wrote:
> Hi all,
> 
>   The intrepid "Asker of questions" is back (so sorry!).
> 
>   I am hoping I can pick the brains of those of you here who play with 
> postgresql a lot. Maybe you can help me configure the server to be more 
> efficient or suggest a different way to approach my code.
> 
>   I have run into a problem where I need to write data into a 
> postgresql database and it is unbearably slow to do the task. I have 
> been reading (and will continue to read after sending this) what I can 
> find on the postgresql.org site and wherever Google points me, but 
> so far my attempts to resolve the problem have in fact made it worse.
> 
>   Specs: I have Fedora Core 1 on a Pentium3 650MHz CPU with 448MB RAM 
> (it's a box made of parts) running in runlevel 3 with no X up.
> 
>   I have tried increasing the shared memory limit by passing:
> 
> echo 128000000 > /proc/sys/kernel/shmmax
> 
>   to the kernel and editing /var/lib/pgsql/data/postgresql.conf to have:
> 
> shared_buffers = 15200
> sort_mem = 32168
> 
>   Before I started messing with things I recorded a directory with 2,490 
> files and folders (just the names, obviously!) in 23 seconds, which was 
> not reasonable. When I tried to record a filesystem with 175,000 records 
> it took 32 minutes... Since I started tweaking, the same number of 
> records takes 35 seconds.
> 
>   What the script does during this time is:
> - Read the contents of an 'ls' call into an array
> - Process each entry in that array by splitting out the data into string 
> variables
> - Look at the permission to see if it is a directory or a file.
> - If it is a file, check within the database to see if the record exists
>   - If it exists, update it
>   - If it does not exist, insert it
> - If it is a directory, first check it against an array of ignored 
> directories
>   - If the directory isn't to be ignored:
>     - Check to see if the directory already exists as a record
>       - If it does, update it
>       - If it does not, insert it
>       - Read the contents of the subdirectory using the same steps here.
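
The update-or-insert step in the list above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the table
`files`, its columns, and the ignore list are hypothetical, and Python's
standard-library sqlite3 module stands in for PostgreSQL so the example
runs on its own. It is also a variant of the described flow: the UPDATE
itself serves as the existence check, so no separate SELECT is issued.

```python
import sqlite3

IGNORED_DIRS = {"/proc", "/tmp"}  # hypothetical ignore list


def record_entry(conn, path, is_dir, size):
    """Update the row for `path` if it exists, otherwise insert it."""
    cur = conn.execute(
        "UPDATE files SET is_dir = ?, size = ? WHERE path = ?",
        (is_dir, size, path),
    )
    if cur.rowcount == 0:  # no existing record matched, so insert one
        conn.execute(
            "INSERT INTO files (path, is_dir, size) VALUES (?, ?, ?)",
            (path, is_dir, size),
        )


def record_all(conn, entries):
    """Record (path, is_dir, size) tuples, skipping ignored directories."""
    for path, is_dir, size in entries:
        if is_dir and path in IGNORED_DIRS:
            continue
        record_entry(conn, path, is_dir, size)


# In-memory database with an assumed schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files ("
    "path TEXT PRIMARY KEY, is_dir INTEGER, size INTEGER)"
)
record_all(conn, [("/home", True, 4096), ("/home/a.txt", False, 10)])
record_all(conn, [("/home/a.txt", False, 25), ("/proc", True, 0)])
rows = conn.execute("SELECT path, size FROM files ORDER BY path").fetchall()
print(rows)
```

The second call updates the existing row for /home/a.txt and skips /proc.
Recursing into subdirectories is left out to keep the sketch short.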

Do you have indexes on the important fields in the database?  If not, go
create them.  Looking up a record by an indexed field is much faster than
by an unindexed one, since the database no longer has to scan the whole
table for every check.
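
As a rough illustration of what an index changes, here is a small sketch.
The table `files`, the column `path`, and the index name are made up for
the example, and Python's standard-library sqlite3 module is used instead
of PostgreSQL purely so it runs without a server. It asks the database
for its query plan before and after the index is created.

```python
import sqlite3


def lookup_plan(conn, path):
    """Return SQLite's query-plan text for the existence check on a path."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT 1 FROM files WHERE path = ?", (path,)
    ).fetchall()
    # The last column of each plan row is the human-readable detail string.
    return " ".join(row[-1] for row in rows)


# Hypothetical table of recorded files, filled with synthetic rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT, size INTEGER, mtime INTEGER)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    ((f"/data/file{i}", i, 0) for i in range(10000)),
)

plan_before = lookup_plan(conn, "/data/file9999")
conn.execute("CREATE INDEX files_path_idx ON files (path)")
plan_after = lookup_plan(conn, "/data/file9999")

print("before:", plan_before)  # a scan of the whole table
print("after: ", plan_after)   # a search that uses files_path_idx
```

PostgreSQL accepts the same basic CREATE INDEX name ON table (column)
form, and its EXPLAIN command plays the role of the plan query used here.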

Have you checked how long it takes the perl script to run without the
database actually being used, just for the processing of the ls data?
Is there perhaps some way to 'know' that certain files don't need to be
processed again, so that updates become more efficient?
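
Both ideas can be tried in isolation. The sketch below is in Python
rather than Perl and makes several assumptions: the listing is in
`ls -l` format with the field layout noted in the comment, the listing
itself is synthetic, and "unchanged" simply means the same size and
timestamp text as on the previous run. It times the parsing with no
database involved, then counts how many entries would still need a write.

```python
import time


def parse_ls_line(line):
    """Split one `ls -l`-style line into (is_dir, size, mtime, name)."""
    # Assumed layout: perms links owner group size month day time name
    fields = line.split(None, 8)
    perms, size, name = fields[0], int(fields[4]), fields[8]
    return perms.startswith("d"), size, " ".join(fields[5:8]), name


def changed_entries(lines, known):
    """Yield parsed entries whose (size, mtime) differs from `known`."""
    for line in lines:
        is_dir, size, mtime, name = parse_ls_line(line)
        if known.get(name) != (size, mtime):
            yield is_dir, size, mtime, name


# Synthetic listing standing in for the real `ls` output.
lines = [
    f"-rw-r--r-- 1 user group {i} May 25 16:16 file{i}.txt"
    for i in range(175000)
]

start = time.perf_counter()
parsed = [parse_ls_line(line) for line in lines]
elapsed = time.perf_counter() - start
print(f"parsed {len(parsed)} lines in {elapsed:.2f}s with no database")

# Pretend every file except the last was recorded on a previous run.
known = {name: (size, mtime) for _, size, mtime, name in parsed[:-1]}
todo = list(changed_entries(lines, known))
print(f"{len(todo)} of {len(lines)} entries need a database write")
```

If the parsing alone is fast, the time is going into the database calls.
Note that the timestamp shown by `ls -l` has only minute resolution, so a
real change check would probably want something finer than this.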

Lennart Sorensen
--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml