regexp matching question

Behdad Esfahbod behdad-26n5VD7DAF2Tm46uYYfjYg at public.gmane.org
Thu Oct 6 03:28:27 UTC 2005


On Wed, 5 Oct 2005, Walter Dnes wrote:

> On Thu, Oct 06, 2005 at 12:58:36AM +0300, Peter wrote
>
> > The problem is that formail fails to properly split certain files
> > and I do not know why. I tried to understand the problem but formail
> > source is un-maintainable imho. Thus I am trying to work around it
> > using my own matcher. That's where the regexp question came in.
>
>   The problem is that there are no "out-of-band" separators in an mbox
> file.  There are certain header-pattern conventions which imply a
> separator.  You can run into situations where the text of an email
> message contains what looks like headers/separators, even to the most
> sophisticated matching algorithm.  It can be hard to separate email body
> From headers<g>.  It's not quite as bad as Microsoft's
> "begin loveletter.txt.exe" cockup, but the principle is the same.  If
> you subscribe to a procmail or other anti-spam list, where people post
> sample email headers, you *WILL* see formail screw up the splits.  This
> is one area where maildir format reigns supreme.

I'm not sure exactly what you mean, but AFAIK a new message starts
when the regexp "^From " matches, and the header ends when two
consecutive newlines (DOS or Unix conventions) match.
What's wrong with that?
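
To make the rule concrete, and to show the failure mode Walter describes,
here is a minimal Python sketch. It assumes mboxo-style input in which
each message begins with a "From " envelope line (From followed by a
space, which is not the same as the "From:" header) and body lines are
not escaped. The names split_mbox and split_header_body and the sample
addresses are made up for illustration; this is not how formail works
internally.

```python
import re

# Assumption: mboxo-style input, where each message starts with a line
# beginning "From " (the envelope line), not the "From:" header.
SEPARATOR = re.compile(r"^From ", re.MULTILINE)
# The header ends at the first blank line; accept Unix (\n) or DOS (\r\n).
HEADER_END = re.compile(r"\r?\n\r?\n")


def split_mbox(text):
    """Naively split mbox text at every line that starts with 'From '."""
    starts = [m.start() for m in SEPARATOR.finditer(text)]
    return [text[a:b] for a, b in zip(starts, starts[1:] + [len(text)])]


def split_header_body(message):
    """Split one message into (header, body) at the first blank line."""
    parts = HEADER_END.split(message, maxsplit=1)
    return parts[0], (parts[1] if len(parts) > 1 else "")


# Two real messages; the first body contains a line starting with "From ".
sample = (
    "From alice@example.org Thu Oct  6 03:28:27 2005\n"
    "From: alice@example.org\n"
    "Subject: first\n"
    "\n"
    "It can be hard to separate email body\n"
    "From headers.\n"
    "\n"
    "From bob@example.org Thu Oct  6 04:00:00 2005\n"
    "From: bob@example.org\n"
    "Subject: second\n"
    "\n"
    "Hello.\n"
)

chunks = split_mbox(sample)
print(len(chunks))  # 3 chunks, although the sample holds only 2 messages
```

The extra chunk comes from the body line "From headers.", which the
separator regexp cannot tell apart from a real envelope line. That is why
mbox writers usually escape such body lines as ">From ", and why a
one-file-per-message format such as maildir avoids the question.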

--behdad
http://behdad.org/
--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml




