[Dovecot] 1.0-test24 and some mbox benchmarking

Timo Sirainen tss at iki.fi
Fri Jul 2 21:40:24 EEST 2004


http://dovecot.org/test/

Again only mbox fixes. I found some more bugs which could have also
caused some of the mbox problems that people reported.

I found them today when I thought I'd do a bit of testing again with
my favourite 1.4GB mbox. Then I thought I might as well see how it
compares against UW-IMAP. First, a brief explanation of how they work
internally:


Dovecot 1.0-test24
------------------

Dovecot works by trying to do everything in one "sync" function. It reads
new mails, inserts missing headers, writes header modifications and
expunges messages.

Dovecot leaves 100 bytes of padding in every mail's headers, which it
can use to avoid moving the rest of the file forward when it needs to
insert space. However, if there's not enough space (and with new mails
there isn't), it needs to do the moving.
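The padding bookkeeping can be sketched as follows. The sketch reduces each message to two numbers: how much padding its headers currently have, and how many bytes its pending header changes need. `Msg` and `find_rewrite_end()` are hypothetical names, not Dovecot's code. Starting from the first message that didn't fit, the shortfall is accumulated until enough spare padding has been seen. Reaching the end of the mailbox first means the file has to grow instead.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Msg:
    padding: int  # spare padding bytes currently in the headers
    needed: int   # bytes the pending header changes want to add


def find_rewrite_end(msgs: List[Msg], start: int) -> Tuple[Optional[int], int]:
    """Count how many bytes we're short, starting at msgs[start].

    Returns (last, grow_by): `last` is the index of the last message that
    has to be rewritten, or None if the end of the mailbox was reached
    before enough padding was seen; `grow_by` is then how many bytes the
    file has to be extended.
    """
    short_by = 0
    for i in range(start, len(msgs)):
        short_by += msgs[i].needed - msgs[i].padding
        if short_by <= 0:
            return i, 0
    return None, short_by


msgs = [
    Msg(padding=100, needed=20),  # fits into its own padding, no move needed
    Msg(padding=0, needed=60),    # new mail: no padding yet, 60 bytes short
    Msg(padding=100, needed=80),  # 20 bytes spare, still 40 short
    Msg(padding=100, needed=50),  # 50 bytes spare, covers the rest
]
print(find_rewrite_end(msgs, 1))      # (3, 0)
print(find_rewrite_end(msgs[:3], 1))  # (None, 40): grow the file by 40 bytes
```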

Moving is done by first reading messages and counting how many bytes
we're short. Once we've seen enough padding, we rewrite those
messages. If we reach the end of the file, we grow the file and rewrite
the messages. Rewriting is done backwards, so we don't have to do much
buffering, and if the rewrite gets interrupted (crash, power
loss, etc.) the data loss is very small, a few kilobytes maximum.
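Here is a minimal sketch of the backwards rewrite. It covers only the simplest case: moving one byte range `[start, end)` forward by `delta` bytes to make room. `shift_forward()` is a hypothetical helper, not Dovecot's actual implementation. Chunks are copied starting from the end of the range, so a chunk is never written over data that hasn't been read yet. As a result, only one chunk needs to be buffered at a time.

```python
import tempfile
from typing import BinaryIO


def shift_forward(f: BinaryIO, start: int, end: int, delta: int,
                  chunk: int = 4096) -> None:
    """Move bytes [start, end) of f forward by delta (>= 0) bytes, in place.

    Chunks are copied from the end of the range towards its start, so a
    chunk's destination only overlaps data that has already been read,
    and only one chunk is held in memory at a time. The bytes left in
    [start, start + delta) are stale and are meant to be overwritten.
    """
    pos = end
    while pos > start:
        n = min(chunk, pos - start)
        pos -= n
        f.seek(pos)
        data = f.read(n)
        f.seek(pos + delta)
        f.write(data)
    f.flush()


def demo(chunk: int = 4096) -> bytes:
    with tempfile.TemporaryFile() as f:
        f.write(b"HEADERbody-of-message")
        # Make room for 4 more header bytes right after "HEADER" (offset 6).
        shift_forward(f, 6, 21, 4, chunk)
        f.seek(6)
        f.write(b"+new")
        f.seek(0)
        return f.read()


print(demo(chunk=5))  # b'HEADER+newbody-of-message'
```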


UW-IMAP 2004
------------

With SELECT, UW-IMAP only reads the mbox; a rewrite is triggered only by
the LOGOUT, EXPUNGE and CHECK commands. UW-IMAP also inserts padding, but less
than Dovecot. It tries to fit the added headers into 50 bytes, and
whatever isn't used is left as padding. Normally this seems to give around
15-20 bytes of padding. Perhaps I should shrink Dovecot's padding too.
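The padding arithmetic is sketched below. `uw_padding()` is a hypothetical name. The post doesn't say what UW-IMAP does when the added headers don't fit into 50 bytes, so this version simply assumes no padding is left in that case. The 32-byte input is only an illustrative value.

```python
HEADER_BUDGET = 50  # bytes the added headers are fitted into


def uw_padding(added_header_bytes: int) -> int:
    """Padding left over after fitting added headers into the 50-byte budget.

    Assumption: if the headers don't fit, no padding is left.
    """
    return max(HEADER_BUDGET - added_header_bytes, 0)


print(uw_padding(32))  # 18, within the 15-20 bytes seen in practice
```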

Rewriting works by reading the file forward into a buffer and writing the
changes as needed. This is quite fast, but it means the buffer can grow
large, and if the rewrite gets interrupted everything in the buffer is
lost. In my test mbox this would have been 23MB of lost data, but normally
much less.
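The simulation below shows why the buffer grows. It performs a forward in-place rewrite on an in-memory "file", with insertions given as sorted (offset, bytes) pairs. `forward_rewrite()` is hypothetical and is not UW-IMAP's code. Everything after an insertion point moves forward, so the write position runs ahead of the read position by the number of bytes inserted so far. The original data in between has to sit in the buffer. This is consistent with the 23MB figure: the rewritten test mbox is about 23MB larger than the original.

```python
from typing import List, Tuple


def forward_rewrite(f: bytearray, inserts: List[Tuple[int, bytes]],
                    chunk: int = 4096) -> int:
    """Insert bytes into f in place, copying front to back.

    `inserts` holds (offset, new_bytes) pairs, with offsets in the original
    data and sorted ascending. Returns the largest number of original bytes
    that had to be held in the buffer because the write position had
    overtaken them (not counting the chunk currently being copied).
    """
    orig_len = len(f)
    f.extend(bytes(sum(len(b) for _, b in inserts)))  # grow the file first
    buf = bytearray()  # original bytes [pos, r): read but not yet written back
    pos = 0            # original offset of buf[0]
    r = 0              # next original offset to read
    w = 0              # next offset to write
    peak = 0

    def write(data: bytes) -> None:
        nonlocal r, w, peak
        # Read (save) any original bytes this write is about to overwrite.
        end = min(w + len(data), orig_len)
        if end > r:
            buf.extend(f[r:end])
            r = end
        peak = max(peak, len(buf))
        f[w:w + len(data)] = data
        w += len(data)

    for off, new in inserts + [(orig_len, b"")]:
        # Copy the original bytes up to the insertion point, then insert.
        while pos < off:
            n = min(chunk, off - pos)
            if r < pos + n:  # not read yet
                buf.extend(f[r:pos + n])
                r = pos + n
            data = bytes(buf[:n])
            del buf[:n]
            pos += n
            write(data)
        write(new)
    return peak


mbox = bytearray(b"AAAABBBBCCCCDDDDDDDD")
peak = forward_rewrite(mbox, [(4, b"xx"), (8, b"yyy")])
print(bytes(mbox), peak)  # b'AAAAxxBBBByyyCCCCDDDDDDDD' 5
```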


Benchmarks
----------

I simply rewrote a 1.4GB mbox containing 361052 mails: Linux kernel mailing
list archives from the years 1996-2002. The computer is an Athlon XP 2700+
with 1GB of memory.

Reads and writes were counted with Linux's iostat command. Nothing else was
using that partition, so the numbers should be accurate, except that
UW-IMAP's read count is a few blocks too high because I got tired of
waiting for it and started looking into the mbox to see how far it had gotten.

Read counts could also be somewhat off if some of the mbox was already in
the buffer cache. I tried to flush it anyway by cat'ing 2x4GB of data to
/dev/null. The kernel used around 900MB of memory for caching.

Total CPU times may also be a bit off, as the computer was being used at
the same time.


Dovecot 1.0-test24
------------------

reads : 4007432 blocks = 1956 MB
writes: 2947381 blocks = 1439 MB

original mbox : 1420611590 B = 2774632 blocks = 1354 MB
rewritten mbox: 1472684487 B = 2876336 blocks = 1404 MB
indexes       :   14452732 B =   28227 blocks =   14 MB

   7221164 dovecot.index
     10436 dovecot.index.cache
   7221132 dovecot.index.log (this will be truncated after a while)

 - 16064 kB VSZ, 8012 kB RSS after SELECT completed
     - 14MB is mmap()ed index files
     - 216kB heap left, of which 70kB is actually in use
     - heap usage stayed at 25MB VSZ/RSS while syncing, but the allocations
       were large enough that libc served them with anonymous mmap()s, so
       they were released after the sync
     - VSZ peaked at 32MB, most likely because the index file(s) were
       temporarily mmap()ed more than once
 - 63.37s user
 - 16.56s system
 - 24% cpu
 - 5:22.75 total
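The block counts above work out as 512-byte blocks (1420611590 B = 2774632 blocks), so the conversions are simple division. This sketch assumes "MB" means 2^20 bytes and rounds down. Under those assumptions it reproduces the read and write figures and the mbox sizes.

```python
BLOCK = 512          # bytes per block, as in "1420611590 B = 2774632 blocks"
MIB = 1024 * 1024    # assuming "MB" means 2^20 bytes


def blocks_to_mb(blocks: int) -> int:
    return blocks * BLOCK // MIB   # rounded down


def bytes_to_blocks(size: int) -> int:
    return size // BLOCK


print(blocks_to_mb(4007432), blocks_to_mb(2947381))    # 1956 1439
print(bytes_to_blocks(1420611590), 1420611590 // MIB)  # 2774632 1354
```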


UW-IMAP 2004
------------

Reading:

 - memory: 70044 kB VSZ, 67724 kB RSS
 - CPU: 7s user, 53s total

In total:

reads : 5549640 blocks = 2709 MB
writes: 2875581 blocks = 1404 MB

original mbox : 1420611590 B = 2774632 blocks = 1354 MB
rewritten mbox: 1444411190 B = 2821115 blocks = 1377 MB

 - 93416 kB VSZ, 91120 kB RSS
 - 2295.81s user
 - 23.99s system
 - 96% cpu
 - 39:57.90 total


Notes
-----

Not counting Dovecot's indexes, UW-IMAP wrote 21MB less. But Dovecot
wrote 27MB more padding, so just by shrinking it Dovecot would have
written 6MB less data. Also, because Dovecot writes the file backwards,
it needs to do some extra seeking and overlapping writes, but I
guess the OS merged them nicely.
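The write accounting in the paragraph above can be spelled out with the MB figures from the benchmark results. Like the paragraph, it treats the whole size difference between the two rewritten mboxes as padding.

```python
# MB figures from the benchmark results above.
dovecot_writes = 1439  # includes the index files
dovecot_indexes = 14
uw_writes = 1404
dovecot_mbox = 1404    # rewritten mbox size
uw_mbox = 1377

extra_writes = dovecot_writes - dovecot_indexes - uw_writes  # mbox data only
extra_padding = dovecot_mbox - uw_mbox
print(extra_writes, extra_padding, extra_padding - extra_writes)  # 21 27 6
```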

I'm not exactly sure why UW-IMAP uses so much CPU for rewriting, but it
does, and so Dovecot is over 7x faster in total (with 36x less user CPU time).
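The ratios can be recomputed from the timing figures in the benchmark results. Elapsed time gives the 7x figure and user CPU time gives the 36x. Counting system time as well, the CPU ratio comes out at about 29x. Truncating (user + system) / elapsed also reproduces the 24% and 96% CPU figures.

```python
def seconds(clock: str) -> float:
    """Parse an m:ss.ss elapsed time such as "5:22.75"."""
    minutes, secs = clock.split(":")
    return int(minutes) * 60 + float(secs)


def cpu_percent(t: dict) -> int:
    return int(100 * (t["user"] + t["system"]) / t["elapsed"])


dovecot = {"user": 63.37, "system": 16.56, "elapsed": seconds("5:22.75")}
uw = {"user": 2295.81, "system": 23.99, "elapsed": seconds("39:57.90")}

speedup = uw["elapsed"] / dovecot["elapsed"]
user_ratio = uw["user"] / dovecot["user"]
cpu_ratio = (uw["user"] + uw["system"]) / (dovecot["user"] + dovecot["system"])
print(f"{speedup:.1f}x faster, {user_ratio:.1f}x less user CPU, "
      f"{cpu_ratio:.1f}x less user+system CPU")
# 7.4x faster, 36.2x less user CPU, 29.0x less user+system CPU
print(cpu_percent(dovecot), cpu_percent(uw))  # 24 96
```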

Dovecot should also support delaying the rewrite. This is mostly useful for
POP3 clients which delete all their mail at logout, so they won't need the
rewrite at all. Dovecot also writes all flag changes to disk immediately,
while UW-IMAP delays them so it can write more at once. That results in less
total I/O as well.
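The toy model below shows why deferring pays off, counting only flag writes. The event names and `count_flag_writes()` are hypothetical and don't reflect how either server is implemented. Immediate mode writes on every flag change. Deferred mode only remembers which messages are dirty until a sync point. Flags of messages that are expunged before that point are never written at all, as in the POP3 case.

```python
from typing import Iterable, Set, Tuple


def count_flag_writes(events: Iterable[Tuple[str, int]], deferred: bool) -> int:
    """Count the message-header writes caused by flag changes.

    events: ("flag", uid) changes a message's flags, ("expunge", uid)
    deletes it, ("sync", 0) is a point where pending changes are written
    (e.g. CHECK or LOGOUT).
    """
    writes = 0
    dirty: Set[int] = set()
    for kind, uid in events:
        if kind == "flag":
            if deferred:
                dirty.add(uid)   # remember it, write later
            else:
                writes += 1      # write right away
        elif kind == "expunge":
            dirty.discard(uid)   # no point writing flags of a deleted mail
        elif kind == "sync":
            writes += len(dirty)
            dirty.clear()
    return writes


imap = [("flag", 1), ("flag", 2), ("flag", 1), ("sync", 0)]
pop3 = [("flag", 1), ("flag", 2), ("expunge", 1), ("expunge", 2), ("sync", 0)]
print(count_flag_writes(imap, False), count_flag_writes(imap, True))  # 3 2
print(count_flag_writes(pop3, False), count_flag_writes(pop3, True))  # 2 0
```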
