Q: Mailbox format
Fraser Campbell
fraser-eicrhRFjby5dCsDujFhwbypxlwaOVQ5f at public.gmane.org
Thu Apr 22 22:29:42 UTC 2004
On Wednesday 21 April 2004 22:58, S P Arif Sahari Wibowo wrote:
> Well, in 1999 Mark Crispin wrote an article about mailbox format, and he
> wrote why maildir (file per message) format is not a good idea. Is his
> reasoning not correct or not applicable anymore?
>
> http://www.washington.edu/imap/documentation/formats.txt.html
I think the article was written by someone with an extreme bias and investment
in mbox format ... since it's posted on UW's site I assume that Mark is
someone involved with the UW-IMAP project (my ASSumptions could be wrong).
3rd paragraph, he says:
    A flat-file format mailbox is always a file, never a directory.
    This means that it is impossible to have a flat-file format mailbox
    that has inferior mailbox names under it (so-called "dual-usage"
    mailboxes). For some inexplicable reason, some people want this.
I have a Debian folder into which a bunch of debian mailing lists are filtered
(announce, security, etc). These lists are fairly low traffic so I choose
not to create dedicated folders for them. At the same time I have
debian-user which is _very_ high traffic. That I file in
Debian->User->Year->Month.
He implies that my chosen email filing method is a bizarre contortion. You
don't suppose that he might be slightly biased, since his IMAP implementation
is not capable of filing email like this? ... Or maybe I'm just being overly
sensitive at being called a non-conformist ;-)
> There's a general reason why file/message formats are a bad idea. Just
> about every filesystem in existance serializes file creation and deletions
> because these manipulate the free space map. This turns out to be an
> enormous problem when you start creating/deleting more than a few messages
> per second; you spend all your time thrashing in the filesystem.
>
> It is also extremely slow to do a text search through a file/message
> format mailbox. All of those open()s and close()s really add up to major
> filesystem thrashing.
While I expect there is some validity to the above claims, I don't think it's
that relevant. Maybe disks and CPUs have sped up in the 5 years since he wrote
this, or maybe he was just grasping at straws to try to keep people from
deserting his mail storage format.
I have asked the maildir question myself on other lists and I have seen
repeated discussions about it. I don't recall anyone, anywhere, ever
recommending mbox (flat file) over maildir. Russell Coker (Debian
developer/bonnie++/etc), who is a big proponent of maildir, mentions that he
has built multiple servers that host 200,000 users (single server). With mbox
I doubt it would be possible (maybe with insanely low quotas).
I've seen good hardware grind to a halt because of mbox/uw-imap (when only
handling 25-50 users). Those same machines performed like racehorses once
converted over to Maildir/courier-imap ... these are real conversions that I
have done, and the difference was night and day.
Maildir isn't without its problems ... lots of small files can make some
processes much slower (like a backup using tar). You should be careful about
your choice of filesystem if you are building a very large system or expect
huge mailboxes. Reiserfs is well regarded for Maildir storage, though with
recent kernels the older filesystems (and additional choices) might have
caught up.
Another Maildir advantage is that it is very easy to script since every email
is an individual file. A couple of lines in plain old shell can very easily
manipulate your email ... try that with a folder (mbox) where all your
messages are crammed into a huge file (it's harder).
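
To make the scripting point concrete, here is a rough sketch (in Python rather
than shell, purely for illustration). It assumes the usual Maildir layout where
delivered messages sit as individual files under new/ and cur/. The function
name find_messages and the ~/Maildir path are my own placeholders, not anything
from a particular mail server.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def find_messages(maildir, needle):
    """Return paths of messages whose Subject contains `needle`.

    Assumes the standard Maildir layout: one file per message under
    the new/ and cur/ subdirectories. The match is case-insensitive.
    """
    parser = BytesParser(policy=policy.default)
    hits = []
    for sub in ("new", "cur"):
        folder = Path(maildir) / sub
        if not folder.is_dir():
            continue
        for path in sorted(folder.iterdir()):
            if not path.is_file():
                continue
            with path.open("rb") as fh:
                # headersonly=True skips parsing the body structure.
                msg = parser.parse(fh, headersonly=True)
            subject = str(msg.get("Subject", ""))
            if needle.lower() in subject.lower():
                hits.append(path)
    return hits


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever your Maildir lives.
    for path in find_messages(Path.home() / "Maildir", "debian"):
        print(path)
```

Because each hit is just a file path, deleting or refiling a message is an
ordinary file removal or rename. With mbox the same operation means rewriting
the single big file that holds the whole folder.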
--
Fraser Campbell <fraser-Txk5XLRqZ6CsTnJN9+BGXg at public.gmane.org> http://www.wehave.net/
Georgetown, Ontario, Canada Debian GNU/Linux