Postgres/Perl performance help needed
Madison Kelly
linux-5ZoueyuiTZhBDgjK7y7TUQ at public.gmane.org
Wed May 26 00:16:51 UTC 2004
Fraser Campbell wrote:
> On Tuesday 25 May 2004 16:16, Madison Kelly wrote:
>
>
>> Before I started messing with things I record a directory with 2,490
>>files and folders (just the names, obviously!) in 23 seconds which was
>>not reasonable. When I tried to record a filesystem with 175,000 records
>>it took 32 minutes... Since I have started tweaking the same number of
>>records takes 35 seconds.
>
>
> How are you connecting? I see a variable $DB but I'm not sure what type of
> object it is. Is it possible that you are reconnecting to the database on
> every single select/insert/update, that will massively slow down the
> operations. I'm not a DBI (or perl) expert but perhaps if you can show how
> your establishing the database connection, and doing subsequent
> select/insert/deletes, someone else can chip in.
>
> I expect that having indexes on the database could slow you down (at least for
> inserts).
>
> If you can't get postgresql working right you could always dump it for mysql.
> Mysql does have some licensing quirks these days that may (legitimately) give
> you second thoughts but don't let the database guys tell you that it's "just
> an interface to files" (or something like that). Still I'd give postgresql
> an honest shot since you started out with that.
>
> Mysql has never been a bottleneck for me, you can only know for sure about
> your app by trying it.
>
Hi Fraser, and thanks for responding! I am not sure if I am keeping
the connection open or not. I have tried adding 'autocommit = false' to
'postgresql.conf' but then everything broke (maybe for other reasons, I
have been poking at a lot) which -should- hold everything in memory
until the job is done and then commit the lot but maybe I understand how
to use it wrong...
Learning so much at once is really burning my poor brain out... :/ I
really appreciate your (and everyone else's!) help!
Madison
=- Here is how I open the database:
# Open the connection to the database
my $DB = DBI->connect("DBI:Pg:dbname=$db_name","$user")|| die("Connect
error (Is PostgresSQL running?): $DBI::errstr");
=- Here is what I do before jumping into my sub-routine for the first
time:
$starttime=time;
$real_dir=$mntdir;
# Set the 'file_exist' flag to 'false' and reset exiting files to 'true'.
$DB->do("UPDATE file_dir SET file_exist='f' WHERE
file_src_uuid='$file_src_uuid'") || die "$DBI::errstr";
# Jump to our file list sub routine
&list_files($real_dir, $exclude_list_num, $relative_dir);
$endtime=time;
$totaltime=($endtime - $starttime);
system 'echo " |- Displayed \''.$i.'\' files and directories in
\''.$totaltime.'\' seconds." >> '.$log;
system 'echo " \\- Finished reading in the file and directory list for
the partition mounted at \''.$mntdir.'\'!" >> '.$log;
=- Here is the subroutine that does all the work and feeds data into
the database:
sub list_files
{
my $real_dir = shift;
my $exclude_list_num = shift;
my $relative_dir = shift;
open (LS, "sudo ls -lA --full-time '$real_dir' 2>&1 |");
my @file_list;
while (<LS>)
{
next if /^total \d+/;
s/\n//;
push ( @file_list, $_ );
}
close(LS);
for ( @file_list )
{
my ($file_perm, $num, $file_own_user, $file_own_grp, $file_size,
$file_mod_date, $file_mod_time, $file_mod_time_zone,
$file_name)=split/\s+/, $_, 9;
next if ( $file_perm eq "total" );
if ( $file_perm =~ /^d/ )
{
my $ufl;
$baddir=0;
for ($j=0; $j <= $exclude_list_num; $j++)
{
my $dircheck;
if ( $real_dir eq "/" )
{
$dircheck=$real_dir.$file_name;
}
else
{
$dircheck=$real_dir."/".$file_name;
}
if ( $exclude_list[$j] eq $dircheck )
{
$baddir=1;
}
}
if ( $baddir == 0 )
{
if ( $real_dir eq "/" )
{
$passed_real_dir=$real_dir.$file_name;
}
else
{
$passed_real_dir=$real_dir."/".$file_name;
}
$file_parent_dir=$relative_dir;
if ( $relative_dir eq "/" )
{
$passed_relative_dir=$relative_dir.$file_name;
}
else
{
$passed_relative_dir=$relative_dir."/".$file_name;
}
$i++;
print "$file_perm\t $file_own_user\t $file_own_grp\t $file_size\t
$file_mod_date\t $file_mod_time\t $file_mod_time_zone\t
$relative_dir"."$file_name\n";
$DBreq=$DB->prepare("SELECT null FROM file_dir WHERE
file_src_uuid='$file_src_uuid' AND file_parent_dir='$relative_dir' AND
file_name='$file_name'") || die "$DBI::errstr";
$num=$DBreq->execute();
system 'echo " |- Num equals \''.$num.'\', Filesize equals
\''.$file_size.'\'" >> '.$log;
if ( $num > 0 )
{
system 'echo " | \\- \"UPDATE file_dir SET
file_perm=\''.$file_perm.'\', file_own_user=\''.$file_own_user.'\',
file_own_grp=\''.$file_own_grp.'\', file_size='.$file_size.',
file_mod_date=\''.$file_mod_date.'\',
file_mod_time=\''.$file_mod_time.'\',
file_mod_time_zone=\''.$file_mod_time_zone.'\', file_exist=\'t\' WHERE
file_src_uuid=\''.$file_src_uuid.'\' AND
file_parent_dir=\''.$relative_dir.'\' AND
file_name=\''.$file_name.'\'\"" >> '.$log;
$DB->do("UPDATE file_dir SET file_perm='$file_perm',
file_own_user='$file_own_user', file_own_grp='$file_own_grp',
file_size=$file_size, file_mod_date='$file_mod_date',
file_mod_time='$file_mod_time',
file_mod_time_zone='$file_mod_time_zone', file_exist='t' WHERE
file_ufl='$file_ufl'") || die "$DBI::errstr";
}
else
{
system 'echo " | \\- \"INSERT INTO file_dir ( file_src_uuid,
file_name, file_dir, file_parent_dir, file_perm, file_own_user,
file_own_grp, file_size, file_mod_date, file_mod_time,
file_mod_time_zone, file_backup, file_display, file_exist ) VALUES (
\''.$file_src_uuid.'\', \''.$file_name.'\', \'t\',
\''.$file_parent_dir.'\', \''.$file_perm.'\', \''.$file_own_user.'\',
\''.$file_own_grp.'\', '.$file_size.', \''.$file_mod_date.'\',
\''.$file_mod_time.'\', \''.$file_mod_time_zone.'\', 0, 0, \'t\' )\"" >>
'.$log;
$DB->do("INSERT INTO file_dir ( file_src_uuid, file_name, file_dir,
file_parent_dir, file_perm, file_own_user, file_own_grp, file_size,
file_mod_date, file_mod_time, file_mod_time_zone, file_backup,
file_display, file_exist ) VALUES ( '$file_src_uuid', '$file_name', 't',
'$file_parent_dir', '$file_perm', '$file_own_user', '$file_own_grp',
$file_size, '$file_mod_date', '$file_mod_time', '$file_mod_time_zone',
0, 0, 't' )") || die "$DBI::errstr";
}
&list_files($passed_real_dir, $exclude_list_num, $passed_relative_dir);
}
}
elsif ( $file_perm =~ /^l/ )
{
# MADDY: Add the option to let the user decide whether to follow
symlinks or not then catch it here.
$file_dir="f";
$file_parent_dir=$relative_dir;
$i++;
print "$file_perm\t $file_own_user\t $file_own_grp\t $file_size\t
$file_mod_date\t $file_mod_time\t $file_mod_time_zone\t
$relative_dir"."$file_name\n";
$DBreq=$DB->prepare("SELECT null FROM file_dir WHERE
file_src_uuid='$file_src_uuid' AND file_parent_dir='$relative_dir' AND
file_name='$file_name'") || die "$DBI::errstr";
$num=$DBreq->execute();
system 'echo " |- Num equals \''.$num.'\', Filesize equals
\''.$file_size.'\'" >> '.$log;
if ( $num > 0 )
{
system 'echo " | \\- \"UPDATE file_dir SET
file_perm=\''.$file_perm.'\', file_own_user=\''.$file_own_user.'\',
file_own_grp=\''.$file_own_grp.'\', file_size='.$file_size.',
file_mod_date=\''.$file_mod_date.'\',
file_mod_time=\''.$file_mod_time.'\',
file_mod_time_zone=\''.$file_mod_time_zone.'\', file_exist=\'t\' WHERE
file_src_uuid=\''.$file_src_uuid.'\' AND
file_parent_dir=\''.$relative_dir.'\' AND
file_name=\''.$file_name.'\'\"" >> '.$log;
$DB->do("UPDATE file_dir SET file_perm='$file_perm',
file_own_user='$file_own_user', file_own_grp='$file_own_grp',
file_size=$file_size, file_mod_date='$file_mod_date',
file_mod_time='$file_mod_time',
file_mod_time_zone='$file_mod_time_zone', file_exist='t' WHERE
file_ufl='$file_ufl'") || die "$DBI::errstr";
}
else
{
system 'echo " | \\- \"INSERT INTO file_dir ( file_src_uuid,
file_name, file_dir, file_parent_dir, file_perm, file_own_user,
file_own_grp, file_size, file_mod_date, file_mod_time,
file_mod_time_zone, file_backup, file_display, file_exist ) VALUES (
\''.$file_src_uuid.'\', \''.$file_name.'\', \'t\',
\''.$file_parent_dir.'\', \''.$file_perm.'\', \''.$file_own_user.'\',
\''.$file_own_grp.'\', '.$file_size.', \''.$file_mod_date.'\',
\''.$file_mod_time.'\', \''.$file_mod_time_zone.'\', 0, 0, \'t\' )\"" >>
'.$log;
$DB->do("INSERT INTO file_dir ( file_src_uuid, file_name, file_dir,
file_parent_dir, file_perm, file_own_user, file_own_grp, file_size,
file_mod_date, file_mod_time, file_mod_time_zone, file_backup,
file_display, file_exist ) VALUES ( '$file_src_uuid', '$file_name', 't',
'$file_parent_dir', '$file_perm', '$file_own_user', '$file_own_grp',
$file_size, '$file_mod_date', '$file_mod_time', '$file_mod_time_zone',
0, 0, 't' )") || die "$DBI::errstr";
}
}
elsif ( $file_perm =~ /^-/ )
{
$file_dir="f";
$file_parent_dir=$relative_dir;
$i++;
print "$file_perm\t $file_own_user\t $file_own_grp\t $file_size\t
$file_mod_date\t $file_mod_time\t $file_mod_time_zone\t
$relative_dir"."$file_name\n";
$DBreq=$DB->prepare("SELECT null FROM file_dir WHERE
file_src_uuid='$file_src_uuid' AND file_parent_dir='$relative_dir' AND
file_name='$file_name'") || die "$DBI::errstr";
$num=$DBreq->execute();
system 'echo " |- Num equals \''.$num.'\', Filesize equals
\''.$file_size.'\'" >> '.$log;
if ( $num > 0 )
{
system 'echo " | \\- \"UPDATE file_dir SET
file_perm=\''.$file_perm.'\', file_own_user=\''.$file_own_user.'\',
file_own_grp=\''.$file_own_grp.'\', file_size='.$file_size.',
file_mod_date=\''.$file_mod_date.'\',
file_mod_time=\''.$file_mod_time.'\',
file_mod_time_zone=\''.$file_mod_time_zone.'\', file_exist=\'t\' WHERE
file_src_uuid=\''.$file_src_uuid.'\' AND
file_parent_dir=\''.$relative_dir.'\' AND
file_name=\''.$file_name.'\'\"" >> '.$log;
$DB->do("UPDATE file_dir SET file_perm='$file_perm',
file_own_user='$file_own_user', file_own_grp='$file_own_grp',
file_size=$file_size, file_mod_date='$file_mod_date',
file_mod_time='$file_mod_time',
file_mod_time_zone='$file_mod_time_zone', file_exist='t' WHERE
file_ufl='$file_ufl'") || die "$DBI::errstr";
}
else
{
system 'echo " | \\- \"INSERT INTO file_dir ( file_src_uuid,
file_name, file_dir, file_parent_dir, file_perm, file_own_user,
file_own_grp, file_size, file_mod_date, file_mod_time,
file_mod_time_zone, file_backup, file_display, file_exist ) VALUES (
\''.$file_src_uuid.'\', \''.$file_name.'\', \'t\',
\''.$file_parent_dir.'\', \''.$file_perm.'\', \''.$file_own_user.'\',
\''.$file_own_grp.'\', '.$file_size.', \''.$file_mod_date.'\',
\''.$file_mod_time.'\', \''.$file_mod_time_zone.'\', 0, 0, \'t\' )\"" >>
'.$log;
$DB->do("INSERT INTO file_dir ( file_src_uuid, file_name, file_dir,
file_parent_dir, file_perm, file_own_user, file_own_grp, file_size,
file_mod_date, file_mod_time, file_mod_time_zone, file_backup,
file_display, file_exist ) VALUES ( '$file_src_uuid', '$file_name', 't',
'$file_parent_dir', '$file_perm', '$file_own_user', '$file_own_grp',
$file_size, '$file_mod_date', '$file_mod_time', '$file_mod_time_zone',
0, 0, 't' )") || die "$DBI::errstr";
}
}
else
{
system 'echo " | |- \''.$file_name.'\' is not a known file type,
skipping." >> '.$log;
}
}
return;
}
--
The Toronto Linux Users Group. Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml
More information about the Legacy
mailing list