regexp

Peter L. Peres plp-ysDPMY98cNQDDBjDh4tngg at public.gmane.org
Sat Apr 24 20:01:30 UTC 2004


On Sat, 24 Apr 2004, John Macdonald wrote:

> That will reject "abc=deh" which I'd include in the
> specified "def and anything else".  You need:
>
>      /abc=(([a-eg-z][a-z][a-z])|([a-z][a-fh-z][a-z])|([a-z][a-z][a-gi-z]))/
>
> It gets even messier if you also want to allow
> other than exactly 3 lower case letters to be in the
> assigned value.

Argh. I did something like this in Perl:

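while (<>) {
	# Compare the raw capture: "\Q$1" would add backslashes before "@" and
	# ".", so it could never equal the plain address string.
	if ( /=<([^>]*)>/ && ($tm = $1) && ($tm ne 'foo@bar.baz') )  {
		print "got it = ($tm)\n";
	}
}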

which is ugly beyond words. There has got to be a better way. Later I'll
want to prune matches to $1 against a list, so I'll likely use a hash or a
function for the "ne" part. If you haven't guessed yet, this is about
pruning certain email addresses from a list extracted from mail logs
(whitelisting/blacklisting etc.). The above works and was tested on a log
with 20000+ lines, which it managed in a couple of tens of seconds on a
slow machine (with CPU load 0.95, over ~3000 matches).

Surely there is a way to specify a negative match block in a regexp?!
Anyway, when I tried to condense the above if() into a single expression
using lookahead in Perl (?! etc.) it did not work as I feel it should.
This is my first time with lookahead, so I may be doing something wrong.
Could someone rewrite the above using lookahead as an example? I am hoping
to be able to rewrite this using compiled regexps in C and make it more
efficient.
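
For reference, a minimal sketch of what the lookahead form might look like,
next to the hash-based pruning mentioned above. The sub names, the second
blacklist entry and the sample log lines are made up for illustration; only
foo@bar.baz and the "=<...>" pattern come from the loop above. It also
assumes an empty "<>" should be skipped, as the truth test on $tm did.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical blacklist; the second address is a placeholder.
my %skip = map { $_ => 1 } ('foo@bar.baz', 'spam@example.com');

# Variant 1: a single fixed address excluded by negative lookahead.
# (?!foo\@bar\.baz>) makes the match fail at this "=<" when the text that
# follows is exactly that address and the closing ">".
# [^>]+ (not *) also rejects an empty "<>".
sub match_lookahead {
    my ($line) = @_;
    return $line =~ /=<(?!foo\@bar\.baz>)([^>]+)>/ ? $1 : undef;
}

# Variant 2: capture first, then prune against the hash.
# This scales to a long list without growing the regexp.
sub match_hash {
    my ($line) = @_;
    return ($line =~ /=<([^>]+)>/ && !$skip{$1}) ? $1 : undef;
}

# Made-up sample lines standing in for the mail log.
my @sample = (
    'to=<foo@bar.baz>, relay=local',
    'to=<alice@example.org>, relay=local',
    'from=<spam@example.com>, size=1234',
);

for my $line (@sample) {
    my $a = match_lookahead($line);
    my $b = match_hash($line);
    print "lookahead: got it = ($a)\n" if defined $a;
    print "hash:      got it = ($b)\n" if defined $b;
}
```

One difference from the original if(): when a line holds the excluded
address and then a second "=<...>", the lookahead version moves on and
matches the second one, while the capture-then-compare version looks only at
the first. Note also that POSIX regcomp()/regexec() in C have no lookahead;
a PCRE-style library would be needed for variant 1, whereas variant 2 needs
only a plain capture plus a lookup.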

TIA, and thanks to all who posted so far,
Peter


--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml




