Postgres/Perl performance help needed
Madison Kelly
linux-5ZoueyuiTZhBDgjK7y7TUQ at public.gmane.org
Tue May 25 20:16:23 UTC 2004
Hi all,
The intrepid "Asker of questions" is back (so sorry!).
I am hoping I can pick the brains of those of you here who play with
postgresql a lot. Maybe you can help me configure the server to be more
efficient or suggest a different way to approach my code.
I have run into a problem where I need to write data into a
postgresql database and it is unbearably slow to do the task. I have
been reading (and will continue to read after sending this) what I can
find on the postgresql.org site and elsewhere that google points me but
so far my attempts to resolve the problem have in fact made it worse.
Specs: I have Fedora Core 1 on a Pentium3 650MHz CPU with 448MB RAM
(it's a box make of parts) running in runlevel 3 with no X up.
I have tried increasing the amount of swap space by passing:
echo 128000000 > /proc/sys/kernel/shmmax
to the kernel and editing /var /lib/pgsql/data/postgresql.cong to have:
shared_buffers = 15200
sort_mem = 32168
Before I started messing with things I record a directory with 2,490
files and folders (just the names, obviously!) in 23 seconds which was
not reasonable. When I tried to record a filesystem with 175,000 records
it took 32 minutes... Since I have started tweaking the same number of
records takes 35 seconds.
What the script does during this time is:
- Read the contents of an 'ls' call into an array
- Process each entry in that array by spliting out the data in string
variables
- Look at the permission to see if it is a directory or a file.
- If it is a file, check within the database to see if the record exists
- If it exists, update it
- If it does not exist, insert it
- If it is a direcotry first check it against an array of ignored
directories
- If the directory isn't to ignored:
- Check to see if the directory already exists as a record
- If it does, update it
- if it does not, insert it
- Read the contents of the subdirectory using the same steps here.
Thanks everyone!!
Madison
Here is the code snipet:
=- Begin perl code snippet -=
$starttime=time;
$real_dir=$mntdir;
# Set the 'file_exist' flag to 'false' and reset exiting files to 'true'.
$DB->do("UPDATE file_dir SET file_exist='f' WHERE
file_src_uuid='$file_src_uuid'") || die "$DBI::errstr";
# Jump to our file list sub routine
&list_files($real_dir, $exclude_list_num, $relative_dir);
$endtime=time;
$totaltime=($endtime - $starttime);
sub list_files
{
my $real_dir = shift;
my $exclude_list_num = shift;
my $relative_dir = shift;
print " \n";
print " =- Working in directory; Real Dir: '$real_dir', Relative Dir:
'$relative_dir' -=\n";
print "Permission\t Num\t Owner\t Group\t Size\t Mod Date\t Mod
Time\t\t GMT\t File Name\n";
open (LS, "sudo ls -lA --full-time '$real_dir' 2>&1 |");
my @file_list;
while (<LS>)
{
next if /^total \d+/;
s/\n//;
push ( @file_list, $_ );
}
close(LS);
for ( @file_list )
{
my ($file_perm, $num, $file_own_user, $file_own_grp, $file_size,
$file_mod_date, $file_mod_time, $file_mod_time_zone,
$file_name)=split/\s+/, $_, 9;
next if ( $file_perm eq "total" );
if ( $file_perm =~ /^d/ )
{
$baddir=0;
for ($j=0; $j <= $exclude_list_num; $j++)
{
my $dircheck;
if ( $real_dir eq "/" )
{
$dircheck=$real_dir.$file_name;
}
else
{
$dircheck=$real_dir."/".$file_name;
}
if ( $exclude_list[$j] eq $dircheck )
{
$baddir=1;
}
}
if ( $baddir == 0 )
{
if ( $real_dir eq "/" )
{
$passed_real_dir=$real_dir.$file_name;
}
else
{
$passed_real_dir=$real_dir."/".$file_name;
}
$file_parent_dir=$relative_dir;
if ( $relative_dir eq "/" )
{
$passed_relative_dir=$relative_dir.$file_name;
}
else
{
$passed_relative_dir=$relative_dir."/".$file_name;
}
$i++;
print "$file_perm\t $file_own_user\t $file_own_grp\t $file_size\t
$file_mod_date\t $file_mod_time\t $file_mod_time_zone\t $file_name\n";
$DBreq=$DB->prepare("SELECT null FROM file_dir WHERE
file_src_uuid='$file_src_uuid' AND file_parent_dir='$relative_dir' AND
file_name='$file_name'") || die "$DBI::errstr";
$num=$DBreq->execute();
if ( $num > 0 )
{
$DB->do("UPDATE file_dir SET file_perm='$file_perm',
file_own_user='$file_own_user', file_own_grp='$file_own_grp',
file_size=$file_size, file_mod_date='$file_mod_date',
file_mod_time='$file_mod_time',
file_mod_time_zone='$file_mod_time_zone', file_exist='t' WHERE
file_src_uuid='$file_src_uuid' AND file_parent_dir='$relative_dir' AND
file_name='$file_name'") || die "$DBI::errstr";
}
else
{
$DB->do("INSERT INTO file_dir ( file_src_uuid, file_name, file_dir,
file_parent_dir, file_perm, file_own_user, file_own_grp, file_size,
file_mod_date, file_mod_time, file_mod_time_zone, file_backup,
file_display, file_exist ) VALUES ( '$file_src_uuid', '$file_name', 't',
'$file_parent_dir', '$file_perm', '$file_own_user', '$file_own_grp',
$file_size, '$file_mod_date', '$file_mod_time', '$file_mod_time_zone',
0, 0, 't' )") || die "$DBI::errstr";
}
&list_files($passed_real_dir, $exclude_list_num, $passed_relative_dir);
}
}
elsif ( $file_perm =~ /^l/ )
{
# MADDY: Add the option to let the user decide whether to follow
symlinks or not then catch it here.
$file_dir="f";
$file_parent_dir=$relative_dir;
$i++;
print "$file_perm\t $num\t $file_own_user\t $file_own_grp\t
$file_size\t $file_mod_date\t $file_mod_time\t $file_mod_time_zone\t
$file_name\n";
$DBreq=$DB->prepare("SELECT null FROM file_dir WHERE
file_src_uuid='$file_src_uuid' AND file_parent_dir='$relative_dir' AND
file_name='$file_name'") || die "$DBI::errstr";
$num=$DBreq->execute();
if ( $num > 0 )
{
$DB->do("UPDATE file_dir SET file_perm='$file_perm',
file_own_user='$file_own_user', file_own_grp='$file_own_grp',
file_size=$file_size, file_mod_date='$file_mod_date',
file_mod_time='$file_mod_time',
file_mod_time_zone='$file_mod_time_zone', file_exist='t' WHERE
file_src_uuid='$file_src_uuid' AND file_parent_dir='$relative_dir' AND
file_name='$file_name'") || die "$DBI::errstr";
}
else
{
$DB->do("INSERT INTO file_dir ( file_src_uuid, file_name, file_dir,
file_parent_dir, file_perm, file_own_user, file_own_grp, file_size,
file_mod_date, file_mod_time, file_mod_time_zone, file_backup,
file_display, file_exist ) VALUES ( '$file_src_uuid', '$file_name', 'f',
'$file_parent_dir', '$file_perm', '$file_own_user', '$file_own_grp',
$file_size, '$file_mod_date', '$file_mod_time', '$file_mod_time_zone',
0, 0, 't' )") || die "$DBI::errstr";
}
}
elsif ( $file_perm =~ /^-/ )
{
$file_dir="f";
$file_parent_dir=$relative_dir;
$i++;
print "$file_perm\t $num\t $file_own_user\t $file_own_grp\t
$file_size\t $file_mod_date\t $file_mod_time\t $file_mod_time_zone\t
$file_name\n";
$DBreq=$DB->prepare("SELECT null FROM file_dir WHERE
file_src_uuid='$file_src_uuid' AND file_parent_dir='$relative_dir' AND
file_name='$file_name'") || die "$DBI::errstr";
$num=$DBreq->execute();
if ( $num > 0 )
{
$DB->do("UPDATE file_dir SET file_perm='$file_perm',
file_own_user='$file_own_user', file_own_grp='$file_own_grp',
file_size=$file_size, file_mod_date='$file_mod_date',
file_mod_time='$file_mod_time',
file_mod_time_zone='$file_mod_time_zone', file_exist='t' WHERE
file_src_uuid='$file_src_uuid' AND file_parent_dir='$relative_dir' AND
file_name='$file_name'") || die "$DBI::errstr";
}
else
{
$DB->do("INSERT INTO file_dir ( file_src_uuid, file_name, file_dir,
file_parent_dir, file_perm, file_own_user, file_own_grp, file_size,
file_mod_date, file_mod_time, file_mod_time_zone, file_backup,
file_display, file_exist ) VALUES ( '$file_src_uuid', '$file_name', 'f',
'$file_parent_dir', '$file_perm', '$file_own_user', '$file_own_grp',
$file_size, '$file_mod_date', '$file_mod_time', '$file_mod_time_zone',
0, 0, 't' )") || die "$DBI::errstr";
}
}
else
{
system 'echo " | |- \''.$file_name.'\' is not a known file type,
skipping." >> '.$log;
}
}
print " =- Return from; Real Dir: '$real_dir', Relative Dir:
'$relative_dir':\n";
print " \n";
return;
}
=- End perl code snippet -=
--
The Toronto Linux Users Group. Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml
More information about the Legacy
mailing list