perl; getting a subset of hash keys based on a given value(s)
Madison Kelly
linux-5ZoueyuiTZhBDgjK7y7TUQ at public.gmane.org
Tue Mar 29 19:38:38 UTC 2005
Hi all,
I have a couple of large (200,000+ entries) hashes in Perl that I need to
pull keys from, based on the values they contain.
From the first hash I want to pull the entries (one or more) that match
given values, and then, for each one, check whether a match exists in the
second hash.
With the help of TPM I can brute-force this by reading every value in
the hash and skipping ('next if') the ones that don't match what I want,
but this is horribly inefficient.
My hashes are built like this:
$src{$n++} = {
dev_uuid => "<device_uuid>",
parent_dir => "<parent_dir>",
file_name => "<file_name>",
type => "<file_type>",
size => "<file_size>",
user => "<owning_user>",
uid => "<owner_uid>"
};
and
$dst{$o++} = {
from_uuid => "<source_uuid>",
dev_uuid => "<device_uuid>",
parent_dir => "<parent_dir>",
file_name => "<file_name>",
type => "<file_type>",
size => "<file_size>",
user => "<owning_user>",
uid => "<owner_uid>"
};
Another way to ask my question: if this were a database, I might get all
the '%src' keys I want like this:
SELECT keys FROM %src WHERE parent_dir='/home' AND dev_uuid='abc123';
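A minimal sketch of that first query in Perl, assuming the index can be built once up front and then reused for many lookups: group the '%src' keys by 'dev_uuid' and 'parent_dir' in a single pass, so each query becomes one hash lookup instead of a scan over every entry. The sample data and the '%src_by_dir' name are hypothetical, and "\0" is used as the key separator on the assumption that it never appears in a UUID or a path.

```perl
use strict;
use warnings;

# Hypothetical sample data in the same shape as %src above.
my %src = (
    0 => { dev_uuid => 'abc123', parent_dir => '/home', file_name => 'a.txt',  type => 'f' },
    1 => { dev_uuid => 'abc123', parent_dir => '/etc',  file_name => 'b.conf', type => 'f' },
    2 => { dev_uuid => 'abc123', parent_dir => '/home', file_name => 'c.txt',  type => 'f' },
);

# One pass over %src: "dev_uuid\0parent_dir" => [ %src keys in that directory ].
my %src_by_dir;
for my $key ( keys %src )
{
    my $k = join "\0", $src{$key}{dev_uuid}, $src{$key}{parent_dir};
    push @{ $src_by_dir{$k} }, $key;
}

# SELECT keys FROM %src WHERE parent_dir='/home' AND dev_uuid='abc123';
my $wanted = join "\0", 'abc123', '/home';
my @keys = sort { $a <=> $b } @{ $src_by_dir{$wanted} || [] };
print "@keys\n";    # prints "0 2"
```

Building the index costs one pass over the hash; after that, every directory query is a constant-time lookup, which matters when the same hash is queried once per directory.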
Then, for each returned key, I would take the corresponding 'dev_uuid',
'parent_dir', 'file_name' and 'type' and check whether a matching entry
exists in the '%dst' hash. Perhaps like so:
SELECT keys FROM %dst WHERE from_uuid='$src{$n}{dev_uuid}' AND
parent_dir='$src{$n}{parent_dir}' AND file_name='$src{$n}{file_name}'
AND type='$src{$n}{type}';
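For the second query, one option (a sketch under stated assumptions, not the only approach) is to index '%dst' once by the four compared fields, so that finding every '%dst' key for a given '%src' entry is a single hash lookup that can return one or more keys. The helper 'match_key', the '%dst_index' name and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample data in the same shape as %src and %dst above.
my %src = (
    0 => { dev_uuid => 'abc123', parent_dir => '/home', file_name => 'a.txt', type => 'f' },
);
my %dst = (
    10 => { from_uuid => 'abc123', dev_uuid => 'def456', parent_dir => '/home', file_name => 'a.txt', type => 'f' },
    11 => { from_uuid => 'abc123', dev_uuid => 'ghi789', parent_dir => '/home', file_name => 'a.txt', type => 'f' },
    12 => { from_uuid => 'abc123', dev_uuid => 'def456', parent_dir => '/home', file_name => 'b.txt', type => 'f' },
);

# Hypothetical helper: one composite key from a UUID plus the three
# fields compared in the query ("\0" assumed absent from all values).
sub match_key
{
    my ($uuid, $entry) = @_;
    return join "\0", $uuid, @{$entry}{qw(parent_dir file_name type)};
}

# One pass over %dst: composite key => [ %dst keys sharing that key ].
my %dst_index;
for my $dkey ( keys %dst )
{
    my $k = match_key($dst{$dkey}{from_uuid}, $dst{$dkey});
    push @{ $dst_index{$k} }, $dkey;
}

# The second SELECT becomes one lookup per %src entry.
my $entry   = $src{0};
my $k       = match_key($entry->{dev_uuid}, $entry);
my @matches = sort { $a <=> $b } @{ $dst_index{$k} || [] };
print "@matches\n";    # prints "10 11"
```

Storing an array reference per key keeps every matching '%dst' key, which covers the "one (or more)" case.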
If one (or more) '%dst' entries match, then I know a match was found, and
I will use the returned 'dst' key(s) to compare more details against the
'src' key I am currently working on. Is there a way to do this in Perl?
The brute-force way I was doing it was:
print "\nStarting the hash test:\n";
foreach my $part ( keys %src )
{
    # Only test source entries whose parent directory is "/".
    next if $src{$part}{parent_dir} ne "/";
    print " |- [ Source ] - [s$part]-[" . $src{$part}{dev_uuid} . ":"
        . $src{$part}{parent_dir} . $src{$part}{file_name} . ":"
        . $src{$part}{type} . "]\n";
    my $match = 0;
    # Scan every %dst entry; this inner loop is what makes the test slow.
    for my $dpart ( keys %dst )
    {
        print " |- [ Check ] - [d$dpart]-[" . $dst{$dpart}{from_uuid} . ":"
            . $dst{$dpart}{parent_dir} . $dst{$dpart}{file_name} . ":"
            . $dst{$dpart}{type} . "] on: [" . $dst{$dpart}{file_name} . "]\n";
        next if $dst{$dpart}{from_uuid}  ne $src{$part}{dev_uuid};
        next if $dst{$dpart}{parent_dir} ne $src{$part}{parent_dir};
        next if $dst{$dpart}{file_name}  ne $src{$part}{file_name};
        next if $dst{$dpart}{type}       ne $src{$part}{type};
        print " |- [ Match ] - the file has been backed up to: ["
            . $dst{$dpart}{dev_uuid} . "]\n";
        $match = 1;
        last;
    }
    if ( $match == 0 )
    {
        print " |- [ Match ] - No match!\n";
    }
}
print " \\- Test finished!\n\n";
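Applied to the brute-force test above, the same indexing idea might look like this sketch: build the '%dst' lookup table once, then replace the inner 'for my $dpart' loop with a single hash lookup per '%src' entry. That brings the work from roughly (size of %src) x (size of %dst) comparisons down to about (size of %src) + (size of %dst) hash operations. The sample data and the '%dst_index' and '%backup_of' names are hypothetical; unlike the original loop, which stops at the first match with 'last', this version reports every backup copy, and it drops the per-entry '[ Check ]' debug lines.

```perl
use strict;
use warnings;

# Hypothetical sample data; in the real program %src and %dst are already loaded.
my %src = (
    0 => { dev_uuid => 'abc123', parent_dir => '/', file_name => 'etc',     type => 'd' },
    1 => { dev_uuid => 'abc123', parent_dir => '/', file_name => 'vmlinuz', type => 'f' },
);
my %dst = (
    0 => { from_uuid => 'abc123', dev_uuid => 'def456', parent_dir => '/', file_name => 'etc', type => 'd' },
);

# Build the lookup table once ("\0" assumed absent from every field).
my %dst_index;
for my $dpart ( keys %dst )
{
    my $d = $dst{$dpart};
    my $k = join "\0", $d->{from_uuid}, $d->{parent_dir}, $d->{file_name}, $d->{type};
    push @{ $dst_index{$k} }, $dpart;
}

my %backup_of;    # %src key => [ devices holding a backup copy ]
print "\nStarting the hash test:\n";
for my $part ( sort { $a <=> $b } keys %src )
{
    my $s = $src{$part};
    next if $s->{parent_dir} ne "/";
    my $k = join "\0", $s->{dev_uuid}, $s->{parent_dir}, $s->{file_name}, $s->{type};
    # A single lookup replaces the scan over every %dst key.
    my @hits = @{ $dst_index{$k} || [] };
    if (@hits)
    {
        $backup_of{$part} = [ map { $dst{$_}{dev_uuid} } @hits ];
        print " |- [ Match ] - the file has been backed up to: [$_]\n"
            for @{ $backup_of{$part} };
    }
    else
    {
        print " |- [ Match ] - No match!\n";
    }
}
print " \\- Test finished!\n\n";
```

The index is only valid while '%dst' is unchanged; if entries are added to '%dst' during the run, they need to be pushed into '%dst_index' as well.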
I am hoping there is a better way to do the 'foreach ...' section.
Before this I was building the hash keys as
'<dev_id>:<parent_dir>:<file_name>:<file_type>' and then using:
foreach ( grep { /^\Q$src_dirs[$i]\E:.*[^d]$/ } keys %src )
But that was taking over 2 seconds per directory, which meant my program
would take up to 7 hours to run this one task. The great guys and gals at
TPM helped me get my head around 'name => value' style hashes, but this
style of hash is still somewhat new to me, so I may be missing the
obvious. :p
Thanks all!
Madison
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Madison Kelly (Digimer)
TLE-BU, The Linux Experience; Back Up
http://tle-bu.thelinuxexperience.com
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
--
The Toronto Linux Users Group. Meetings: http://tlug.ss.org
More information about the Legacy mailing list