Postgres/Perl performance help needed

Madison Kelly linux-5ZoueyuiTZhBDgjK7y7TUQ at public.gmane.org
Wed May 26 00:16:51 UTC 2004


Fraser Campbell wrote:
> On Tuesday 25 May 2004 16:16, Madison Kelly wrote:
> 
> 
>>   Before I started messing with things I record a directory with 2,490
>>files and folders (just the names, obviously!) in 23 seconds which was
>>not reasonable. When I tried to record a filesystem with 175,000 records
>>it took 32 minutes... Since I have started tweaking the same number of
>>records takes 35 seconds.
> 
> 
> How are you connecting?  I see a variable $DB but I'm not sure what type of 
> object it is.  Is it possible that you are reconnecting to the database on 
> every single select/insert/update, that will massively slow down the 
> operations.  I'm not a DBI (or perl) expert but perhaps if you can show how 
> your establishing the database connection, and doing subsequent 
> select/insert/deletes, someone else can chip in.
> 
> I expect that having indexes on the database could slow you down (at least for 
> inserts).
> 
> If you can't get postgresql working right you could always dump it for mysql. 
> Mysql does have some licensing quirks these days that may (legitimately) give 
> you second thoughts but don't let the database guys tell you that it's "just 
> an interface to files" (or something like that).  Still I'd give postgresql 
> an honest shot since you started out with that.
> 
> Mysql has never been a bottleneck for me, you can only know for sure about 
> your app by trying it.
> 

   Hi Fraser, and thanks for responding! I am not sure if I am keeping 
the connection open or not. I have tried adding 'autocommit = false' to 
'postgresql.conf' but then everything broke (maybe for other reasons, I 
have been poking at a lot) which -should- hold everything in memory 
until the job is done and then commit the lot but maybe I understand how 
to use it wrong...

   Learning so much at once is really burning my poor brain out... :/ I 
really appreciate your (and everyone else's!) help!

Madison

  =- Here is how I open the database:

# Open the connection to the database
my $DB = DBI->connect("DBI:Pg:dbname=$db_name","$user")|| die("Connect 
error (Is PostgresSQL running?): $DBI::errstr"); 


  =- Here is what I do before jumping into my sub-routine for the first 
time:

$starttime=time;
$real_dir=$mntdir;
# Set the 'file_exist' flag to 'false' and reset exiting files to 'true'.
$DB->do("UPDATE file_dir SET file_exist='f' WHERE 
file_src_uuid='$file_src_uuid'") || die "$DBI::errstr";
# Jump to our file list sub routine
&list_files($real_dir, $exclude_list_num, $relative_dir);
$endtime=time;
$totaltime=($endtime - $starttime);
system 'echo " |- Displayed \''.$i.'\' files and directories in 
\''.$totaltime.'\' seconds." >> '.$log;
system 'echo " \\- Finished reading in the file and directory list for 
the partition mounted at \''.$mntdir.'\'!" >> '.$log;

  =- Here is the subroutine that does all the work and feeds data into 
the database:

sub list_files
{
	my $real_dir = shift;
	my $exclude_list_num = shift;
	my $relative_dir = shift;
	open (LS, "sudo ls -lA --full-time '$real_dir' 2>&1 |");
	my @file_list;
	while (<LS>)
	{
		next if /^total \d+/;
		s/\n//;
		push ( @file_list, $_ );
	}
	close(LS);
	for ( @file_list )
	{
		my ($file_perm, $num, $file_own_user, $file_own_grp, $file_size, 
$file_mod_date, $file_mod_time, $file_mod_time_zone, 
$file_name)=split/\s+/, $_, 9;
		next if ( $file_perm eq "total" );
		if ( $file_perm =~ /^d/ )
		{
			my $ufl;
			$baddir=0;
			for ($j=0; $j <= $exclude_list_num; $j++)
			{
				my $dircheck;
				if ( $real_dir eq "/" )
				{
					$dircheck=$real_dir.$file_name;
				}
				else
				{
					$dircheck=$real_dir."/".$file_name;
				}
				if ( $exclude_list[$j] eq $dircheck )
				{
					$baddir=1;
				}
			}
			if ( $baddir == 0 )
			{
				if ( $real_dir eq "/" )
				{
					$passed_real_dir=$real_dir.$file_name;
				}
				else
				{
					$passed_real_dir=$real_dir."/".$file_name;
				}
				$file_parent_dir=$relative_dir;
				if ( $relative_dir eq "/" )
				{
					$passed_relative_dir=$relative_dir.$file_name;
				}
				else
				{
					$passed_relative_dir=$relative_dir."/".$file_name;
				}
				$i++;
				print "$file_perm\t $file_own_user\t $file_own_grp\t $file_size\t 
$file_mod_date\t $file_mod_time\t $file_mod_time_zone\t 
$relative_dir"."$file_name\n";
				$DBreq=$DB->prepare("SELECT null FROM file_dir WHERE 
file_src_uuid='$file_src_uuid' AND file_parent_dir='$relative_dir' AND 
file_name='$file_name'") || die "$DBI::errstr";
				$num=$DBreq->execute();
				system 'echo " |- Num equals \''.$num.'\', Filesize equals 
\''.$file_size.'\'" >> '.$log;
				if ( $num > 0 )
				{
					system 'echo " | \\- \"UPDATE file_dir SET 
file_perm=\''.$file_perm.'\', file_own_user=\''.$file_own_user.'\', 
file_own_grp=\''.$file_own_grp.'\', file_size='.$file_size.', 
file_mod_date=\''.$file_mod_date.'\', 
file_mod_time=\''.$file_mod_time.'\', 
file_mod_time_zone=\''.$file_mod_time_zone.'\', file_exist=\'t\' WHERE 
file_src_uuid=\''.$file_src_uuid.'\' AND 
file_parent_dir=\''.$relative_dir.'\' AND 
file_name=\''.$file_name.'\'\"" >> '.$log;
					$DB->do("UPDATE file_dir SET file_perm='$file_perm', 
file_own_user='$file_own_user', file_own_grp='$file_own_grp', 
file_size=$file_size, file_mod_date='$file_mod_date', 
file_mod_time='$file_mod_time', 
file_mod_time_zone='$file_mod_time_zone', file_exist='t' WHERE 
file_ufl='$file_ufl'") || die "$DBI::errstr";
				}
				else
				{
					system 'echo " | \\- \"INSERT INTO file_dir ( file_src_uuid, 
file_name, file_dir, file_parent_dir, file_perm, file_own_user, 
file_own_grp, file_size, file_mod_date, file_mod_time, 
file_mod_time_zone, file_backup, file_display, file_exist ) VALUES ( 
\''.$file_src_uuid.'\', \''.$file_name.'\', \'t\', 
\''.$file_parent_dir.'\', \''.$file_perm.'\', \''.$file_own_user.'\', 
\''.$file_own_grp.'\', '.$file_size.', \''.$file_mod_date.'\', 
\''.$file_mod_time.'\', \''.$file_mod_time_zone.'\', 0, 0, \'t\' )\"" >> 
'.$log;
					$DB->do("INSERT INTO file_dir ( file_src_uuid, file_name, file_dir, 
file_parent_dir, file_perm, file_own_user, file_own_grp, file_size, 
file_mod_date, file_mod_time, file_mod_time_zone, file_backup, 
file_display, file_exist ) VALUES ( '$file_src_uuid', '$file_name', 't', 
'$file_parent_dir', '$file_perm', '$file_own_user', '$file_own_grp', 
$file_size, '$file_mod_date', '$file_mod_time', '$file_mod_time_zone', 
0, 0, 't' )") || die "$DBI::errstr";
				}
				&list_files($passed_real_dir, $exclude_list_num, $passed_relative_dir);
			}
		}
		elsif ( $file_perm =~ /^l/ )
		{
			# MADDY: Add the option to let the user decide whether to follow 
symlinks or not then catch it here.
			$file_dir="f";
			$file_parent_dir=$relative_dir;
			$i++;
			print "$file_perm\t $file_own_user\t $file_own_grp\t $file_size\t 
$file_mod_date\t $file_mod_time\t $file_mod_time_zone\t 
$relative_dir"."$file_name\n";
			$DBreq=$DB->prepare("SELECT null FROM file_dir WHERE 
file_src_uuid='$file_src_uuid' AND file_parent_dir='$relative_dir' AND 
file_name='$file_name'") || die "$DBI::errstr";
			$num=$DBreq->execute();
			system 'echo " |- Num equals \''.$num.'\', Filesize equals 
\''.$file_size.'\'" >> '.$log;
			if ( $num > 0 )
			{
				system 'echo " | \\- \"UPDATE file_dir SET 
file_perm=\''.$file_perm.'\', file_own_user=\''.$file_own_user.'\', 
file_own_grp=\''.$file_own_grp.'\', file_size='.$file_size.', 
file_mod_date=\''.$file_mod_date.'\', 
file_mod_time=\''.$file_mod_time.'\', 
file_mod_time_zone=\''.$file_mod_time_zone.'\', file_exist=\'t\' WHERE 
file_src_uuid=\''.$file_src_uuid.'\' AND 
file_parent_dir=\''.$relative_dir.'\' AND 
file_name=\''.$file_name.'\'\"" >> '.$log;
				$DB->do("UPDATE file_dir SET file_perm='$file_perm', 
file_own_user='$file_own_user', file_own_grp='$file_own_grp', 
file_size=$file_size, file_mod_date='$file_mod_date', 
file_mod_time='$file_mod_time', 
file_mod_time_zone='$file_mod_time_zone', file_exist='t' WHERE 
file_ufl='$file_ufl'") || die "$DBI::errstr";
			}
			else
			{
				system 'echo " | \\- \"INSERT INTO file_dir ( file_src_uuid, 
file_name, file_dir, file_parent_dir, file_perm, file_own_user, 
file_own_grp, file_size, file_mod_date, file_mod_time, 
file_mod_time_zone, file_backup, file_display, file_exist ) VALUES ( 
\''.$file_src_uuid.'\', \''.$file_name.'\', \'t\', 
\''.$file_parent_dir.'\', \''.$file_perm.'\', \''.$file_own_user.'\', 
\''.$file_own_grp.'\', '.$file_size.', \''.$file_mod_date.'\', 
\''.$file_mod_time.'\', \''.$file_mod_time_zone.'\', 0, 0, \'t\' )\"" >> 
'.$log;
				$DB->do("INSERT INTO file_dir ( file_src_uuid, file_name, file_dir, 
file_parent_dir, file_perm, file_own_user, file_own_grp, file_size, 
file_mod_date, file_mod_time, file_mod_time_zone, file_backup, 
file_display, file_exist ) VALUES ( '$file_src_uuid', '$file_name', 't', 
'$file_parent_dir', '$file_perm', '$file_own_user', '$file_own_grp', 
$file_size, '$file_mod_date', '$file_mod_time', '$file_mod_time_zone', 
0, 0, 't' )") || die "$DBI::errstr";
			}
		}
		elsif ( $file_perm =~ /^-/ )
		{
			$file_dir="f";
			$file_parent_dir=$relative_dir;
			$i++;
			print "$file_perm\t $file_own_user\t $file_own_grp\t $file_size\t 
$file_mod_date\t $file_mod_time\t $file_mod_time_zone\t 
$relative_dir"."$file_name\n";
			$DBreq=$DB->prepare("SELECT null FROM file_dir WHERE 
file_src_uuid='$file_src_uuid' AND file_parent_dir='$relative_dir' AND 
file_name='$file_name'") || die "$DBI::errstr";
			$num=$DBreq->execute();
			system 'echo " |- Num equals \''.$num.'\', Filesize equals 
\''.$file_size.'\'" >> '.$log;
			if ( $num > 0 )
			{
				system 'echo " | \\- \"UPDATE file_dir SET 
file_perm=\''.$file_perm.'\', file_own_user=\''.$file_own_user.'\', 
file_own_grp=\''.$file_own_grp.'\', file_size='.$file_size.', 
file_mod_date=\''.$file_mod_date.'\', 
file_mod_time=\''.$file_mod_time.'\', 
file_mod_time_zone=\''.$file_mod_time_zone.'\', file_exist=\'t\' WHERE 
file_src_uuid=\''.$file_src_uuid.'\' AND 
file_parent_dir=\''.$relative_dir.'\' AND 
file_name=\''.$file_name.'\'\"" >> '.$log;
				$DB->do("UPDATE file_dir SET file_perm='$file_perm', 
file_own_user='$file_own_user', file_own_grp='$file_own_grp', 
file_size=$file_size, file_mod_date='$file_mod_date', 
file_mod_time='$file_mod_time', 
file_mod_time_zone='$file_mod_time_zone', file_exist='t' WHERE 
file_ufl='$file_ufl'") || die "$DBI::errstr";
			}
			else
			{
				system 'echo " | \\- \"INSERT INTO file_dir ( file_src_uuid, 
file_name, file_dir, file_parent_dir, file_perm, file_own_user, 
file_own_grp, file_size, file_mod_date, file_mod_time, 
file_mod_time_zone, file_backup, file_display, file_exist ) VALUES ( 
\''.$file_src_uuid.'\', \''.$file_name.'\', \'t\', 
\''.$file_parent_dir.'\', \''.$file_perm.'\', \''.$file_own_user.'\', 
\''.$file_own_grp.'\', '.$file_size.', \''.$file_mod_date.'\', 
\''.$file_mod_time.'\', \''.$file_mod_time_zone.'\', 0, 0, \'t\' )\"" >> 
'.$log;
				$DB->do("INSERT INTO file_dir ( file_src_uuid, file_name, file_dir, 
file_parent_dir, file_perm, file_own_user, file_own_grp, file_size, 
file_mod_date, file_mod_time, file_mod_time_zone, file_backup, 
file_display, file_exist ) VALUES ( '$file_src_uuid', '$file_name', 't', 
'$file_parent_dir', '$file_perm', '$file_own_user', '$file_own_grp', 
$file_size, '$file_mod_date', '$file_mod_time', '$file_mod_time_zone', 
0, 0, 't' )") || die "$DBI::errstr";
			}
		}
		else
		{
			system 'echo " | |- \''.$file_name.'\' is not a known file type, 
skipping." >> '.$log;
		}
	}
	return;
}

--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml





More information about the Legacy mailing list