[Dovecot] New mailbox format

dean gaudet dean-list-dovecot at arctic.org
Fri Sep 23 21:54:08 EEST 2005


On Fri, 23 Sep 2005, Timo Sirainen wrote:

> The point is to have a mailbox format where the mailbox can consist of one
> or more files. Grouping multiple files in a single file makes it faster to
> read, but it's slower to expunge mails from the beginning of the file. So
> this format would allow sysadmin to specify rules on how large the files
> would be allowed to grow.

for about a decade now i've set up all my inbound mail to deliver to two
mboxes -- one is an "inbox", the other is an "archive".  the inbox is
what i look at with my MUA, and i delete things from it as soon as i'm
done reading or dealing with them.  the archive is there so i never have
to think about whether i want to save something (and the disk space is
totally manageable)... similar to how gmail works.

my "archive" is a collection of mbox files.  one named "current" which is
where new deliveries occur, and the others named YYYYMMDD.bz2, which are
compressed/read-only archived mboxes.  the rotation/compression occurs
in a cronjob depending on the size of the current file.
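
a rough sketch of what such a size-triggered rotation could look like, in
python.  this is not the actual cronjob: the function name, the threshold
argument and the ".rotating" staging suffix are all made up for
illustration, and it ignores mbox locking against the delivery agent
entirely.

```python
import bz2
import os
import shutil
import time


def rotate_if_large(archive_dir, max_bytes, today=None):
    """Compress 'current' into YYYYMMDD.bz2 once it exceeds max_bytes.

    Returns the path of the new archive, or None if nothing was rotated.
    Assumption: nothing is delivering to 'current' while this runs.
    """
    current = os.path.join(archive_dir, "current")
    if not os.path.exists(current) or os.path.getsize(current) <= max_bytes:
        return None
    stamp = today or time.strftime("%Y%m%d")
    target = os.path.join(archive_dir, stamp + ".bz2")
    if os.path.exists(target):
        return None  # already rotated under this date; leave it alone
    # Move the file aside first so later deliveries start a fresh "current".
    staging = current + ".rotating"
    os.rename(current, staging)
    with open(staging, "rb") as src, bz2.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.chmod(target, 0o444)  # archived mboxes are read-only
    os.remove(staging)
    return target
```

a cron entry would then just call rotate_if_large() with the archive
directory and whatever size limit suits the disk.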

it's a bit of a kludge, because the file boundaries are very obvious if
you need to find a thread that's spread across a few of them.  but it's
all just mbox so it's easy to grep and concat a few files into a temporary
mbox and extract a thread with any MUA.
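
for illustration, the concat step might be sketched like this in python.
the helper names are hypothetical, matching on a Subject substring is only
a crude stand-in for real threading, and it assumes every mbox file ends
with the usual blank line so the pieces can simply be glued together.

```python
import bz2
import mailbox
import shutil


def gather(paths, out_path):
    """Concatenate plain and .bz2 mbox files into one temporary mbox."""
    with open(out_path, "wb") as out:
        for path in paths:
            opener = bz2.open if path.endswith(".bz2") else open
            with opener(path, "rb") as src:
                shutil.copyfileobj(src, out)


def subjects_matching(mbox_path, needle):
    """List the Subject of every message whose Subject contains needle."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [msg["Subject"] for msg in box
                if needle in (msg["Subject"] or "")]
    finally:
        box.close()
```

once gather() has written the temporary mbox, any MUA can open it directly.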

i've wanted to turn this into a "real" format supported by dovecot for a
while but i never seem to get to it... it sounds like you're headed in a
similar direction.


> This format is mostly designed to work nicely with separate index files
> such as Dovecot has. The flags are stored in the mailbox files mostly as a
> backup in case the index gets lost or corrupted.

here's one point where my thinking has differed -- i'd treat the mailbox
files as read-only (plus one file which is append-only) and include an
append-only modification log for recovery purposes...  read-only mailbox
files permit compression, and don't have the nightly cleanup lockout
problems you mention later.

the log is affordable in terms of disk space.  the mailbox files plus
the log are sufficient to recover the current state of the mailbox.
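
a minimal sketch of the recovery idea, in python.  the json-lines entry
layout and the function names are assumptions made up for this example,
not a proposed on-disk format: each flag change is appended as one line,
and replaying the log over the flags found in the mailbox files gives the
current state.

```python
import json


def append_change(log_path, uid, add=(), remove=()):
    """Append one flag change as a single line; old lines are never rewritten."""
    entry = {"uid": uid, "add": sorted(add), "remove": sorted(remove)}
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")


def replay(log_path, initial_flags):
    """Rebuild current flags from the mailbox files' flags plus the log."""
    flags = {uid: set(f) for uid, f in initial_flags.items()}
    try:
        with open(log_path, encoding="utf-8") as log:
            for line in log:
                if not line.endswith("\n"):
                    break  # torn final write from a crash; ignore it
                entry = json.loads(line)
                current = flags.setdefault(entry["uid"], set())
                current.update(entry["add"])
                current.difference_update(entry["remove"])
    except FileNotFoundError:
        pass  # no log yet: the mailbox files alone are the state
    return flags
```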

in my case i'd probably grow the log forever... because i'd also never
be doing expunges... but the process of doing an expunge can also update
any info embedded into the (compressed|uncompressed) mailbox files.

the cost of a log in terms of disk writes for updates is probably better
than updates to the mailbox files themselves -- assuming a log entry
is compact enough that dozens fit in a 4KiB filesystem block you can
amortize several updates into one synchronous disk write rather than
having dozens of synchronous writes to separate blocks.
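
to make the amortization concrete, here's a small python sketch.  the
6-byte entry layout (uid, op, flag) and the BatchedLog class are invented
for the example; the point is only that many queued entries go out in one
append followed by a single fsync.

```python
import os
import struct

# Hypothetical fixed-size entry: 32-bit uid, 8-bit op, 8-bit flag = 6 bytes.
ENTRY = struct.Struct("<IBB")


class BatchedLog:
    """Queue log entries in memory and flush them with one fsync."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
        self.pending = []
        self.syncs = 0  # number of synchronous flushes issued so far

    def add(self, uid, op, flag):
        self.pending.append(ENTRY.pack(uid, op, flag))

    def commit(self):
        if not self.pending:
            return
        data = b"".join(self.pending)
        while data:  # os.write may write less than requested
            data = data[os.write(self.fd, data):]
        os.fsync(self.fd)  # one synchronous write covers the whole batch
        self.syncs += 1
        self.pending.clear()

    def close(self):
        self.commit()
        os.close(self.fd)
```

with entries this small, several hundred fit in one 4KiB block, so a
batch of dozens of updates costs the same single synchronous write.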


> When the file is opened, it must be shared locked with flock(). Expunging
> mails requires exclusive flock(), so expunges can't happen while someone is
> still reading the file.

with read-only mailbox files there's no sharing restriction -- an expunge
can create a new mbox file, update the index, and unlink the old one.
(minor easy-to-solve race if a reader gets a filename from the index
and the file is unlinked before it's opened... just loop... expunges
should be infrequent enough this has no livelock potential.)
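
the "just loop" part could be sketched like this in python.  index_lookup
is a hypothetical callable standing in for whatever maps a message to its
current mailbox file, and the retry cap is an arbitrary safety net.

```python
def open_mail_file(index_lookup, uid, max_tries=10):
    """Open the file holding uid, retrying if an expunge replaced it.

    index_lookup(uid) is assumed to return the current path from the index.
    """
    for _ in range(max_tries):
        path = index_lookup(uid)
        try:
            return open(path, "rb")
        except FileNotFoundError:
            # The file went away between the lookup and the open;
            # ask the index again for the replacement.
            continue
    raise RuntimeError("mailbox file kept disappearing for uid %r" % (uid,))
```

once the open succeeds the reader is safe: an unlinked file stays readable
through its open descriptor until it's closed.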

but i guess that's not very quota friendly... oops.


> Compatibility
> -------------
> 
> If needed, it would be possible to create new/ and tmp/ directories under
> the mailbox to allow mail deliveries to be in maildir format. The maildir
> files would be then either just moved or maybe be appended to existing mail
> files.

yeah i've definitely wanted delivery to require no changes -- in my case
it just happens to the mbox file named "current".

i think it's best not to deal with indices and other fancy things during
delivery because there's no opportunity to amortize synchronized disk
writes...  and i know in the case of large ISP mail sites there tend to
be a lot of users who never read their mail.  you'll support more users
on less hardware if the synchronous disk writes are at a minimum.

-dean

