Q: Mailbox format

Fraser Campbell fraser-eicrhRFjby5dCsDujFhwbypxlwaOVQ5f at public.gmane.org
Thu Apr 22 22:29:42 UTC 2004


On Wednesday 21 April 2004 22:58, S P Arif Sahari Wibowo wrote:

> Well, in 1999 Mark Crispin wrote an article about mailbox format, and he
> wrote why maildir (file per message) format is not a good idea. Is his
> reasoning not correct or not applicable anymore?
>
> http://www.washington.edu/imap/documentation/formats.txt.html

I think the article was written by someone with an extreme bias and investment 
in mbox format ... since it's posted on UW's site I assume that Mark is 
someone involved with the UW-IMAP project (my ASSumptions could be wrong).

3rd paragraph, he says:

       A flat-file format mailbox is always a file, never a directory.
    This means that it is impossible to have a flat-file format mailbox
    that has inferior mailbox names under it (so-called "dual-usage"
    mailboxes).  For some inexplicable reason, some people want this.

I have a Debian folder into which a bunch of debian mailing lists are filtered 
(announce, security, etc).  These lists are fairly low traffic so I choose 
not to create dedicated folders for them.  At the same time I have 
debian-user which is _very_ high traffic.  That I file in 
Debian->User->Year->Month.

He implies that my chosen email filing method is a bizarre contortion.  You 
don't suppose that he might be slightly biased, since his IMAP implementation 
is not capable of filing email like this? ... or maybe I'm just being overly 
sensitive at being called a non-conformist ;-)
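
For illustration, here is a minimal sketch of how a layout like mine can sit 
on disk in a Maildir, using Python's standard mailbox module.  It relies on 
the Maildir++ convention (the one courier-imap uses), where every subfolder 
is a dot-prefixed directory inside the top-level maildir, so a folder can 
hold messages and have folders beneath it at the same time.  The function 
name, the path and the sample messages are hypothetical.

```python
import mailbox
import os
import tempfile


def make_debian_folders(root):
    """Create a 'Debian' folder that holds mail and also has a nested folder."""
    inbox = mailbox.Maildir(root, create=True)
    # Maildir++ flattens the hierarchy: "Debian.User.2004.04" becomes the
    # directory ".Debian.User.2004.04" directly under the top-level maildir.
    debian = inbox.add_folder("Debian")
    archive = inbox.add_folder("Debian.User.2004.04")
    # Low-traffic list mail is filed straight into "Debian" ...
    debian.add("Subject: security announcement\n\nbody\n")
    # ... while debian-user mail goes into the year/month folder below it.
    archive.add("Subject: debian-user post\n\nbody\n")
    return sorted(inbox.list_folders())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(make_debian_folders(os.path.join(tmp, "Maildir")))
```

Nothing here is specific to any one IMAP server; it only shows that the 
"dual-usage" arrangement is an ordinary directory layout under Maildir.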

> There's a general reason why file/message formats are a bad idea. Just
> about every filesystem in existance serializes file creation and deletions
> because these manipulate the free space map.  This turns out to be an
> enormous problem when you start creating/deleting more than a few messages
> per second; you spend all your time thrashing in the filesystem.
> 
> It is also extremely slow to do a text search through a file/message
> format mailbox.  All of those open()s and close()s really add up to major
> filesystem thrashing.

While I expect there is some validity to the above claims, I don't think they 
are that relevant.  Maybe disks and CPUs have sped up in the 5 years since he 
wrote this, or maybe he was just grasping at straws to try to keep people from 
deserting his mail storage format.

I have asked the maildir question myself on other lists and I have seen 
repeated discussions about it.  I don't recall anyone, anywhere, ever 
recommending mbox (flat file) over maildir.  Russell Coker (Debian 
developer/bonnie++/etc), who is a big proponent of maildir, mentions that he 
has built multiple servers that host 200,000 users (single server).  With mbox 
I doubt it would be possible (maybe with insanely low quotas).

I've seen good hardware grind to a halt because of mbox/uw-imap when handling 
only 25-50 users; those same machines performed like racehorses once 
converted over to Maildir/courier-imap ... these are real conversions that I 
have done, and the difference was night and day.

Maildir isn't without its problems ... lots of small files can make some 
processes much slower (like a backup using tar).  You should be careful about 
your choice of filesystem if you are building a very large system or expect 
huge mailboxes.  Reiserfs is well regarded for Maildir storage though with 
recent kernels the older guys (and additional choices) might have caught up.

Another Maildir advantage is that it is very easy to script since every email 
is an individual file.  A couple of lines in plain old shell can very easily 
manipulate your email ... try that with a folder (mbox) where all your 
messages are crammed into a huge file (it's harder).
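
To make that concrete, here is a rough sketch of the kind of housekeeping I 
mean.  I had plain shell in mind; this version is written in Python only so 
that it is self-contained, and the helper names (message_files, older_than, 
purge) are made up for the example.  It assumes nothing more than the 
standard new/ and cur/ subdirectories of a maildir.

```python
import os
import tempfile
import time


def message_files(maildir):
    """Yield the path of every message file in new/ and cur/."""
    for sub in ("new", "cur"):
        folder = os.path.join(maildir, sub)
        if not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                yield path


def older_than(maildir, days, now=None):
    """Return message files last modified more than `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    return [p for p in message_files(maildir) if os.path.getmtime(p) < cutoff]


def purge(paths):
    """Delete the given messages; each one is just a file to remove."""
    for path in paths:
        os.remove(path)
    return len(paths)


if __name__ == "__main__":
    # Build a throwaway maildir with one fresh and one 100-day-old message.
    with tempfile.TemporaryDirectory() as md:
        for sub in ("tmp", "new", "cur"):
            os.mkdir(os.path.join(md, sub))
        for name in ("fresh.msg", "stale.msg"):
            with open(os.path.join(md, "cur", name), "w") as fh:
                fh.write("Subject: test\n\nbody\n")
        stale_time = time.time() - 100 * 86400
        os.utime(os.path.join(md, "cur", "stale.msg"), (stale_time, stale_time))
        print("removed", purge(older_than(md, 90)), "old message(s)")
```

Doing the same to an mbox means parsing the one big file, finding the 
message boundaries and rewriting the file without the unwanted messages.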

-- 
Fraser Campbell <fraser-Txk5XLRqZ6CsTnJN9+BGXg at public.gmane.org>                 http://www.wehave.net/
Georgetown, Ontario, Canada                               Debian GNU/Linux