Building cross reference -- how?

William Park opengeometry-FFYn/CNdgSA at public.gmane.org
Mon Oct 14 08:03:17 UTC 2013


On Mon, Oct 14, 2013 at 03:09:04AM -0400, D. Hugh Redelmeier wrote:
> | On Sun, Oct 13, 2013 at 01:03:46PM -0400, D. Hugh Redelmeier wrote:
> | > What's wrong with the multiple fgrep solution that you came up with?
> | > The answer might help us understand your problem better.
> | 
> | Nothing wrong with the grep solution, really.  It's better than the
> | awk solution,
> |     awk '$0 ~ re1 && $0 ~ re2 {print}'
> | because awk is doing 2 full passes whereas grep is doing somewhere
> | between 1 and 2 passes.
> 
> I don't know what you mean by passes.  I guess you mean: scans of a
> line in the buffer.  That's generally cheap.  The number of times you
> read the file is more expensive.  Both read it once.

I meant that the regex goes through each line twice, and regex
matching is not fast either.
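To make the comparison concrete, here is a small sketch of both filters
on made-up input.  The words 'alpha' (standing in for re1) and 'beta'
(for re2) and the sample() helper are hypothetical.  Note that the awk
one-liner quoted above needs re1 and re2 passed in with -v.  Left unset,
they are empty regexes that match every line.  The last command counts
how often the second regex actually runs.  Because && short-circuits, it
only runs on lines that already matched the first one.

```shell
#!/bin/sh
# Hypothetical sample: 'alpha' (re1) is rare, 'beta' (re2) is common.
sample() {
  printf '%s\n' 'alpha beta' 'beta only' 'gamma' 'beta gamma'
}

# Chained fixed-string greps (grep -F is the standard spelling of fgrep):
# the second grep only ever sees lines that already matched 'alpha'.
sample | grep -F 'alpha' | grep -F 'beta'

# The awk version.  re1 and re2 must be set with -v; if left unset they
# are empty regexes, which match every line.
sample | awk -v re1='alpha' -v re2='beta' '$0 ~ re1 && $0 ~ re2 { print }'

# Same awk filter, but counting how often the second regex is tested.
# Because && short-circuits, test2() runs only on lines matching re1.
sample | awk -v re1='alpha' -v re2='beta' '
  function test2(line) { n2++; return line ~ re2 }
  $0 ~ re1 && test2($0) { print }
  END { printf "lines=%d re2_tests=%d\n", NR, n2 }'
```

With this input, the counter shows the second regex tested once over
four lines.  So the awk cost is closer to one regex scan per line than
two when the first pattern is rare.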

> 
> && is a conditional AND, so the right side will only be
> evaluated if the left is true (probably rare).
> 
> | But, I think this road is "dead end".
> 
> That implies you are wanting to go somewhere, but we cannot help if we
> don't know where that is.

I'm trying to avoid scanning every file in full.  If I have 1M files,
each with 1K lines, then that's 1G lines.  I was trying to reduce that
to 1M lines by extracting the cross-reference words beforehand, and I
was looking to reduce it even further.  I think I have found a way.
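The post doesn't say what the further reduction is.  One common
technique for this kind of lookup is a word-level inverted index.  It
keeps one small list per word, naming the files that contain that word,
so a two-word query reads two short lists instead of scanning the
corpus.  The sketch below assumes that technique.  It is not
necessarily the one William found.  The corpus, the idx/ layout, and
the both() helper are all hypothetical.

```shell
#!/bin/sh
# Sketch of a per-word inverted index (an assumed technique, not
# necessarily the one from the thread).  Corpus and names are made up.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/corpus" "$work/idx"
printf 'alpha beta\ngamma\n' > "$work/corpus/a.txt"
printf 'beta delta\n'        > "$work/corpus/b.txt"
printf 'alpha\nbeta gamma\n' > "$work/corpus/c.txt"

# Build once: split each file into alphanumeric words, keep each word
# once per file, and append the file's name to idx/<word>.
for f in "$work"/corpus/*; do
  name=$(basename "$f")
  tr -cs '[:alnum:]' '\n' < "$f" | grep -v '^$' | sort -u |
  while read -r w; do
    printf '%s\n' "$name" >> "$work/idx/$w"
  done
done

# Query: read only the two posting lists, never the corpus itself.
# Prints the files that contain both words (hypothetical helper).
both() {
  [ -f "$work/idx/$1" ] && [ -f "$work/idx/$2" ] || return 0
  sort "$work/idx/$1" > "$work/q1"
  sort "$work/idx/$2" > "$work/q2"
  comm -12 "$work/q1" "$work/q2"
}

both alpha beta
```

Building the index still reads every line once.  After that, each query
touches only the lists for its two words.  comm -12 prints the lines
common to both sorted lists, which here are a.txt and c.txt.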
-- 
William
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists