perl; getting a subset of hash keys based on a given value(s)

Wed Mar 30 06:17:43 UTC 2005

| From: Madison Kelly <linux-5ZoueyuiTZhBDgjK7y7TUQ at public.gmane.org>

|   I have a couple of large (200,000+) hashes in perl that I need to pull keys
| from based on given values within it.

Admission: I don't know perl.  One consequence is that I may have
misunderstood what you are saying.

I'm not really able to extract your problem from your description.
You seem to couch it in terms of an (apparently unacceptable)
solution.  It would be useful to step back and describe your actual
problem.

If I understand what you've done, I think that each of your tables is
represented as an array of hashes.  Each array has on the order of
200,000 entries.  Each entry is a hash representing a tuple of 7 (in
src) or 8 (in dst) members.

I don't know all the operations you need to do on these
datastructures.  The only ones you have revealed are:

- adding entries

- probing for and fetching entries with specific parent_dir and
  dev_uuid values

It is dangerous to design a datastructure without having an idea of
all the operations that it needs to support.  But I'll do that anyway,
assuming the only operations are the ones I've listed.

You could replace the array with a hash where the key is from
parent_dir x dev_uuid and the value is the rest of the tuple.
I am guessing that this key would not be unique, so the value would
have to be a list of rest-of-tuples.

This datastructure ought to be fast for the operations you've shown
us.

Without knowing perl's implementation, I'd guess that 400,000 hashes
might be significantly more expensive than 400,000 arrays.  So you
might consider representing each tuple with an array.

You could use a database, and your description certainly biases one to
thinking that way, but you haven't actually asked for anything that
needs the complexity of one.

You might find it useful to learn and think about datastructures in a
more abstract way than just what perl provides.  These could then be
implemented in perl, if you want.  In fact, when you know what you're
looking for, you can probably find an implementation in cpan.

It is useful to distinguish an abstract datastructure (one that
satisfies the problem requirements) from its
implementation/representation.  For example "tuple" suggests an
abstract datastructure that can be implemented in a bunch of ways
including as an array or a hash.

Hope this helps,

Hugh Redelmeier
hugh-pmF8o41NoarQT0dZR+AlfA at public.gmane.org
--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml