regexp matching question

Tony Abou-Assaleh taa-HInyCGIudOg at public.gmane.org
Fri Oct 7 05:52:11 UTC 2005


For the mbox format, it is illegal to have "^From " anywhere in the
body of the message (or even the header). The program that creates the
mbox file prepends a 'From blah blah' line to each message, and ensures
that no other such line appears anywhere in the message. If such a line
appears, it is prefixed with '>'. At least this is how it's done on the
system I checked.

To rule out my mail client doing the conversion, I sent an email using
telnet to port 25 and included a 'From blah' line in the header and in
the body. Both were prefixed with '>' when I opened the mbox file
(using less). Further, the 'From blah' in the header was moved to become
the first line of the body. I think this is because a header field name
must not contain spaces before the ':' (along with other restrictions).

In short, if you're dealing with a (valid, non-corrupted) mbox file,
then it is safe to use '^From ' as a message delimiter.
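
To make the above concrete, here is a minimal Python sketch of both
halves: the '>' quoting that the delivery program applies, and the
splitting on '^From ' that becomes safe as a result. The function names
(quote_from_lines, split_mbox) and the example.org addresses are made up
for illustration. It only handles the simple case I described and is not
what any particular delivery agent actually runs.

```python
def quote_from_lines(body):
    """Prefix '>' to every body line that starts with 'From '.

    This mimics the quoting described above; it is an illustrative
    helper, not the code of any real mail delivery agent.
    """
    quoted = []
    for line in body.splitlines(keepends=True):
        if line.startswith("From "):
            line = ">" + line
        quoted.append(line)
    return "".join(quoted)


def split_mbox(text):
    """Split mbox-style text into messages at lines matching '^From '.

    Checking startswith("From ") on each line is equivalent to matching
    the regexp '^From ' in multi-line mode. Anything before the first
    'From ' line is kept as part of the first chunk.
    """
    messages = []
    current = []
    for line in text.splitlines(keepends=True):
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages


# Hypothetical two-message mailbox; the first body contains a 'From ' line.
body = "Hello\nFrom blah blah\nBye\n"
mbox = (
    "From sender1@example.org Fri Oct  7 05:52:11 2005\n"
    "Subject: one\n"
    "\n"
    + quote_from_lines(body)
    + "\n"
    "From sender2@example.org Fri Oct  7 06:00:00 2005\n"
    "Subject: two\n"
    "\n"
    "Second body\n"
)
messages = split_mbox(mbox)
print(len(messages))
```

Because the body line was quoted to '>From blah blah' before being
written, the splitter sees only the two real delimiter lines. Without
the quoting step, the same input would be split into three pieces.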

Cheers,

TAA

-----------------------------------------------------
Tony Abou-Assaleh
Lecturer, Computer Science Department
Brock University, St. Catharines, ON, Canada, L2S 3A1
Office: MC J215
Tel:    +1(905)688-5550 ext. 5243
Fax:    +1(905)688-3255
Email:  taa-HInyCGIudOg at public.gmane.org
WWW:    http://www.cosc.brocku.ca/~taa/
----------------------[THE END]----------------------

On Fri, 7 Oct 2005, Walter Dnes wrote:

> On Wed, Oct 05, 2005 at 11:28:27PM -0400, Behdad Esfahbod wrote:
>
> > I'm not sure what you exactly mean, but AFAIK, a new message is
> > started when the regexp "^From:" matches, and the header ends
> > when two consecutive new lines (Dos or Unix conventions) match.
> > What's wrong with that?
>
>   In ordinary emails, probably nothing.  With maildir, I don't see the
> "^From " (*NOT* "^From:" as in your message).  The rule there is that
> the headers begin at line 1, and end with the first set of two
> consecutive "newlines".
>
>   But what happens when you're on a procmail or anti-spam mailing-list
> where people deliberately (and properly, I might add) post sample
> headers that match your criteria for a new message???  For working with
> procmail when being passed one message at a time (e.g. POP or fetchmail
> or analyzing multiple maildir format messages) the way to avoid problems
> is to specify a ridiculously high number with formail's "-m" parameter.
> I.e. you avoid false splits by effectively telling formail not to split
> out messages, regardless of what it sees.
>
>   As I said, that works fine when you *KNOW* that you're working with
> *ONLY ONE MESSAGE*.  The mbox format problem is that *ALL* the messages
> are kept in one humongous file.  formail *MUST* attempt to split them
> apart using heuristics.  If someone plunks a set of valid sample headers
> into the body of a message without quoting the headers, that will cause
> formail to think it sees a new message.
>
>   The message that I was replying to talked about problems splitting out
> messages, and that implies mbox format.
>
> --
> Walter Dnes <waltdnes-SLHPyeZ9y/tg9hUCZPvPmw at public.gmane.org>
> An infinite number of monkeys pounding away on keyboards will
> eventually produce a report showing that Windows is more secure,
> and has a lower TCO, than linux.
> --
> The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
> TLUG requests: Linux topics, No HTML, wrap text below 80 columns
> How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml
>




