Building cross reference -- how?
William Park
opengeometry-FFYn/CNdgSA at public.gmane.org
Mon Oct 14 08:03:17 UTC 2013
On Mon, Oct 14, 2013 at 03:09:04AM -0400, D. Hugh Redelmeier wrote:
> | On Sun, Oct 13, 2013 at 01:03:46PM -0400, D. Hugh Redelmeier wrote:
> | > What's wrong with the multiple fgrep solution that you came up with?
> | > The answer might help us understand your problem better.
> |
> | Nothing wrong with grep solution, really. It's better than awk
> | solution,
> | awk '$0 ~ re1 && $0 ~ re2 {print}'
> | because awk is doing 2 full passes whereas grep is doing somewhere
> | between 1 and 2 passes.
>
> I don't know what you mean by passes. I guess you mean: scans of a
> line in the buffer. That's generally cheap. The number of times you
> read the file is more expensive. Both read it once.
I meant that the regex engine goes through each line twice, and regex
matching is not fast either.
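
For concreteness, the two approaches being compared look roughly like
this. It is only a sketch: the sample file and the words "alpha" and
"beta" are made-up placeholders, the function names both_grep and
both_awk are hypothetical, and the awk version uses index() for
fixed-string matching instead of the regex match shown in the quote
above, as one possible way to avoid regex cost.

```shell
# Hypothetical sample data; the words are placeholders, not real input.
sample=$(mktemp)
printf '%s\n' 'alpha beta' 'alpha only' 'beta only' 'neither' > "$sample"

# Chained fgrep: the second grep only sees lines that passed the first.
both_grep() { grep -F -e "$1" "$3" | grep -F -e "$2"; }

# Single awk process: index() is a fixed-string search, and && skips
# the second test whenever the first one fails.
both_awk() { awk -v a="$1" -v b="$2" 'index($0, a) && index($0, b)' "$3"; }

both_grep alpha beta "$sample"
both_awk alpha beta "$sample"
```

Both calls print the one line that contains both words. Either way the
file itself is read only once.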
>
> && is a conditional AND, so the right side will only be
> evaluated if the left is true (probably rare).
>
> | But, I think this road is "dead end".
>
> That implies you are wanting to go somewhere, but we cannot help if we
> don't know where that is.
I'm trying to avoid scanning the entire file. If I have 1M files, each
with 1K lines, that's 1G lines. I was trying to reduce that to 1M lines
by extracting the cross-reference words beforehand, and I was looking
to reduce it even further. I think I've found a way.
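
The "extract beforehand" idea can be sketched as below, under some loud
assumptions: one index line per file, of the form "path word word ...",
so 1M files give a 1M-line index; paths contain no whitespace; and a
"word" is any run of letters, digits or underscores. The function names
build_index and lookup and the index format are hypothetical, chosen
only for illustration, and this is not necessarily the further
reduction mentioned above.

```shell
# build_index DIR: print one line per file, "path word1 word2 ...".
build_index() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        # Split into words, drop duplicates, join them on one line.
        words=$(tr -cs '[:alnum:]_' '\n' < "$f" | sort -u | tr '\n' ' ')
        printf '%s %s\n' "$f" "$words"
    done
}

# lookup INDEX WORD1 WORD2: print paths whose index line has both words.
lookup() {
    awk -v a="$2" -v b="$3" '{
        split("", seen)                         # clear the per-line word set
        for (i = 2; i <= NF; i++) seen[$i] = 1  # field 1 is the path
        if ((a in seen) && (b in seen)) print $1
    }' "$1"
}

# Tiny demonstration with two made-up files.
dir=$(mktemp -d)
printf 'foo bar\nbaz\n' > "$dir/one.txt"
printf 'foo\nqux\n' > "$dir/two.txt"
build_index "$dir" > "$dir.idx"
lookup "$dir.idx" foo bar
```

The index is built once; each later query then scans one line per file
instead of every line of every file.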
--
William
--
The Toronto Linux Users Group. Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists