perl; getting a subset of hash keys based on a given value(s)

Madison Kelly linux-5ZoueyuiTZhBDgjK7y7TUQ at public.gmane.org
Tue Mar 29 19:38:38 UTC 2005


Hi all,

   I have a couple of large (200,000+) hashes in perl that I need to 
pull keys from based on given values within them.

   In the first hash I want to pull one or more entries (inner hashes) 
that match a given value and then, for each, see if a match exists in 
the second hash. With the help of TPM I've got it so that I can 
brute-force this by reading every entry in the hash and using 'next if' 
to skip the ones that don't match what I want, but this is horribly 
inefficient.

   My hashes are built like this:

$src{$n++} = {
	dev_uuid	=>	"<device_uuid>",
	parent_dir	=>	"<parent_dir>",
	file_name	=>	"<file_name>",
	type		=>	"<file_type>",
	size		=>	"<file_size>",
	user		=>	"<owning_user>",
	uid		=>	"<owner_uid>"
};

and

$dst{$o++} = {
	from_uuid	=>	"<source_uuid>",
	dev_uuid	=>	"<device_uuid>",
	parent_dir	=>	"<parent_dir>",
	file_name	=>	"<file_name>",
	type		=>	"<file_type>",
	size		=>	"<file_size>",
	user		=>	"<owning_user>",
	uid		=>	"<owner_uid>"
};

   Another way to ask my question would be to show that if this was a 
database I might get all the keys from '%src' that I want like this:

SELECT keys FROM %src WHERE parent_dir='/home' AND dev_uuid='abc123';
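
For illustration, that WHERE clause maps fairly directly onto perl's 
built-in 'grep' applied to 'keys %src'. The sketch below uses made-up 
sample records (the values and the '@wanted' name are assumptions, not 
real data), and it is still a full scan of the hash, so it only tidies 
the syntax rather than making the lookup faster:

```perl
use strict;
use warnings;

# Hypothetical sample data in the same shape as %src.
my %src = (
    0 => { dev_uuid => 'abc123', parent_dir => '/home', file_name => 'a.txt', type => 'f' },
    1 => { dev_uuid => 'abc123', parent_dir => '/etc',  file_name => 'b.txt', type => 'f' },
    2 => { dev_uuid => 'zzz999', parent_dir => '/home', file_name => 'c.txt', type => 'f' },
    3 => { dev_uuid => 'abc123', parent_dir => '/home', file_name => 'docs',  type => 'd' },
);

# Keep only the keys whose record passes both tests (the "WHERE" part).
# The sort is just to make the output order predictable.
my @wanted = sort { $a <=> $b } grep {
    $src{$_}{parent_dir} eq '/home' && $src{$_}{dev_uuid} eq 'abc123'
} keys %src;

print "matching keys: @wanted\n";
```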

   Then for all the returned keys I would get the corresponding 
'file_name' and 'type' and see if a matching key exists in the '%dst' 
hash. Perhaps like so:

SELECT keys FROM %dst WHERE from_uuid='$src{$n}{dev_uuid}' AND 
parent_dir='$src{$n}{parent_dir}' AND file_name='$src{$n}{file_name}' 
AND type='$src{$n}{type}';

   If one (or more) '%dst' hashes match then I know a match was found, 
and I will use the returned 'dst' key(s) to compare more details against 
the 'src' key I am currently working on. Is there a way in perl to do 
this?
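
One possible approach, sketched here under assumptions rather than 
offered as the definitive answer: walk '%dst' once and build a second 
hash keyed on the four fields being compared, so each '%src' record 
becomes a single hash lookup. The 'match_key' helper, the '%dst_index' 
and '%found' names and the sample values are all hypothetical. The 
fields are joined with a NUL byte on the assumption that it never 
appears in a UUID or a Unix path.

```perl
use strict;
use warnings;

# Hypothetical sample records in the same shape as %src and %dst.
my %src = (
    0 => { dev_uuid => 'abc123', parent_dir => '/', file_name => 'etc',     type => 'd' },
    1 => { dev_uuid => 'abc123', parent_dir => '/', file_name => 'vmlinuz', type => 'f' },
);
my %dst = (
    0 => { from_uuid => 'abc123', dev_uuid => 'bak001', parent_dir => '/', file_name => 'etc',     type => 'd' },
    1 => { from_uuid => 'zzz999', dev_uuid => 'bak001', parent_dir => '/', file_name => 'vmlinuz', type => 'f' },
);

# Build one composite string from the fields that must all agree.
sub match_key { return join "\0", @_ }

# Single pass over %dst: composite key => list of %dst keys sharing it.
my %dst_index;
for my $d (keys %dst) {
    my $rec = $dst{$d};
    my $key = match_key(@$rec{qw(from_uuid parent_dir file_name type)});
    push @{ $dst_index{$key} }, $d;
}

# Each %src record is now one lookup instead of a scan of %dst.
my %found;
for my $s (sort { $a <=> $b } keys %src) {
    my $rec  = $src{$s};
    my $hits = $dst_index{ match_key(@$rec{qw(dev_uuid parent_dir file_name type)}) } || [];
    $found{$s} = [ sort { $a <=> $b } @$hits ];
    print "src $s: ", (@$hits ? "dst key(s) @{ $found{$s} }" : "no match"), "\n";
}
```

Building the index costs one pass over '%dst' plus the memory for the 
extra hash; after that the total work grows with the number of records 
rather than with the product of the two hash sizes.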

   The brute force way I was doing it was:

print "\nStarting the hash test:\n";
foreach my $part ( keys %src )
{
	# Only look at source entries directly under "/".
	next if $src{$part}{parent_dir} ne "/";
	print " |- [ Source ] - [s$part]-[".$src{$part}{dev_uuid}.":".$src{$part}{parent_dir}.$src{$part}{file_name}.":".$src{$part}{type}."]\n";
	my $match=0;
	# Scan every destination entry until all four fields agree.
	for my $dpart ( keys %dst )
	{
		print " |- [ Check  ] - [d$dpart]-[".$dst{$dpart}{from_uuid}.":".$dst{$dpart}{parent_dir}.$dst{$dpart}{file_name}.":".$dst{$dpart}{type}."] on: [".$dst{$dpart}{file_name}."]\n";
		next if $dst{$dpart}{from_uuid} ne $src{$part}{dev_uuid};
		next if $dst{$dpart}{parent_dir} ne $src{$part}{parent_dir};
		next if $dst{$dpart}{file_name} ne $src{$part}{file_name};
		next if $dst{$dpart}{type} ne $src{$part}{type};
		print " |- [ Match  ] - the file has been backed up to: [".$dst{$dpart}{dev_uuid}."]\n";
		$match=1;
		last;
	}
	if ( $match == 0 )
	{
		print " |- [ Match  ] - No match!\n";
	}
}
print " \\- Test finished!\n\n";
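
The outer loop above also visits every '%src' entry just to skip the 
ones outside the wanted directory. A rough sketch of one alternative, 
assuming the extra memory is acceptable, is to group the '%src' keys by 
device and parent directory once, up front. The '%by_dir' and 
'@in_root' names and the sample records are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sample data in the same shape as %src.
my %src = (
    0 => { dev_uuid => 'abc123', parent_dir => '/',     file_name => 'etc',     type => 'd' },
    1 => { dev_uuid => 'abc123', parent_dir => '/home', file_name => 'madison', type => 'd' },
    2 => { dev_uuid => 'abc123', parent_dir => '/',     file_name => 'vmlinuz', type => 'f' },
    3 => { dev_uuid => 'zzz999', parent_dir => '/',     file_name => 'boot',    type => 'd' },
);

# One pass: $by_dir{dev_uuid}{parent_dir} holds the list of %src keys
# found in that directory on that device.
my %by_dir;
for my $s (keys %src) {
    push @{ $by_dir{ $src{$s}{dev_uuid} }{ $src{$s}{parent_dir} } }, $s;
}

# Fetching a directory's entries is now a direct lookup; the "|| []"
# covers a directory that has no entries at all.
my @in_root = sort { $a <=> $b } @{ $by_dir{abc123}{'/'} || [] };
print "keys in / on abc123: @in_root\n";
```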

   I am hoping there is a better way I could do the 'foreach ...' 
section to accomplish what I want. Before this I was creating the hash 
keys using '<dev_id>:<parent_dir>:<file_name>:<file_type>' and then using:

foreach ( grep {/^$src_dirs[$i]\:.*[^d]$/} keys %src )

   But that was taking over 2 sec/dir, which meant my program would take 
up to 7h to run this one task. The great guys and gals at TPM helped me 
get my head around 'name => value' style hashes. This style of hash is 
still somewhat new to me, so I may be missing the obvious. :p

Thanks all!

Madison

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Madison Kelly (Digimer)
TLE-BU, The Linux Experience; Back Up
http://tle-bu.thelinuxexperience.com
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-




