regexp
Peter L. Peres
plp-ysDPMY98cNQDDBjDh4tngg at public.gmane.org
Sat Apr 24 20:01:30 UTC 2004
On Sat, 24 Apr 2004, John Macdonald wrote:
> That will reject "abc=deh" which I'd include in the
> specified "def and anything else". You need:
>
> /abc=(([a-eg-z][a-z][a-z])|([a-z][a-fh-z][a-z])|([a-z][a-z][a-gi-z]))/
>
> It gets even messier if you also want to allow
> other than exactly 3 lower case letters to be in the
> assigned value.
Argh. I did something like this in Perl:
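while (<>) {
    # Capture the address between "=<" and ">", then skip the one unwanted address.
    # No \Q here: quoting $1 would add backslashes and the string compare would never be equal.
    if ( /=<([^>]*)>/ && ($tm = $1) && ($tm ne 'foo@bar.baz') ) {
        print "got it = ($tm)\n";
    }
}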
which is ugly beyond words. There has got to be a better way. Later I'll
want to prune matches to $1 against a list, so I'll likely use a hash or a
function for the "ne" part. If you haven't guessed yet, this is about
pruning certain email addresses from a list extracted from mail logs
(whitelisting/blacklisting etc.). The above works and was tested on a log
with 20000+ lines, which it managed in a few tens of seconds on a
slow machine (with CPU load 0.95 over ~3000 matches). Surely there is a
way to specify a negative match block in a regexp?! Anyway, when I tried to
condense the above if() into a single expression using lookahead in Perl
(the (?!...) construct) it did not work as I felt it should. This is my
first time with lookahead, so I may be doing something wrong. Could someone
rewrite the above using lookahead as an example? I am hoping to be able to
rewrite this using compiled regexps in C and make it more efficient.
TIA, and thanks to all who have posted so far,
Peter
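
For reference, here is a minimal sketch of the lookahead form asked about
above, next to the hash-based pruning mentioned as a later step. It is an
illustration under assumptions, not a tested drop-in: the blocked addresses,
the sample lines, and the names by_lookahead and by_hash are all made up
for the example.

```perl
use strict;
use warnings;

# Hypothetical addresses to prune; not taken from the original logs.
my @blocked = ('foo@bar.baz', 'spam@example.com');

# Variant 1: negative lookahead. Right after "=<", refuse to match when a
# blocked address followed by ">" comes next. quotemeta keeps "." and "@"
# literal; the ">" inside the lookahead stops "foo@bar.bazaar" from being
# rejected just because it starts with a blocked address.
my $alternation = join '|', map { quotemeta($_) } @blocked;
my $re = qr/=<(?!(?:$alternation)>)([^>]*)>/;

sub by_lookahead {
    my ($line) = @_;
    return $line =~ $re ? $1 : undef;
}

# Variant 2: plain capture plus a hash lookup, so the pattern stays fixed
# while the list of addresses grows.
my %is_blocked = map { $_ => 1 } @blocked;

sub by_hash {
    my ($line) = @_;
    return undef unless $line =~ /=<([^>]*)>/;
    return $is_blocked{$1} ? undef : $1;
}

# Made-up sample lines standing in for mail log entries.
for my $line ('from=<foo@bar.baz>', 'from=<foo@bar.bazaar>', 'to=<alice@example.org>') {
    my $addr = by_lookahead($line);
    print "got it = ($addr)\n" if defined $addr;
}
```

The two variants differ on lines with more than one "=<...>" field: the
lookahead version moves on and may match a later field, while the hash
version only ever looks at the first one. Also note that POSIX
regcomp()/regexec() do not support lookahead, so a C port would need a
PCRE-style library or the capture-then-compare form.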
--
The Toronto Linux Users Group. Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml