[Dovecot] Need fast Maildir to mdbox conversion

Robin dovecot at r.paypc.com
Wed Mar 28 06:00:30 EEST 2012


On 3/27/2012 3:40 PM, Jeff Gustafson wrote:
> 	I looked around the 'Net to see if there might be a custom program for
> offline Maildir to mdbox conversion. So far I haven't turned up
> anything. The problem for us is that the dsync program simply takes a
> lot of time to convert mailboxes.

Is it slower than doing IMAP APPENDs over an authenticated Dovecot 
connection?

I've used a simple Perl script based on Mail::IMAPClient and Mail::Box 
to import 180,000+ mailboxes into Dovecot's mdbox at fairly high speed, 
and all it does is IMAP APPENDs.  (I had to shard the mailboxes because 
these Perl-based tools exhaust RAM on mailboxes larger than about 
600MB.)
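For anyone who hasn't looked at what such an importer actually puts on the wire: each message is one APPEND command whose body is sent as an IMAP literal (RFC 3501, section 6.3.11). The C sketch below only formats the command line. `format_append()` is a hypothetical helper, not part of the Perl script above or of Mail::IMAPClient, and it assumes plain 7-bit ASCII mailbox names. With an ordinary synchronizing literal `{n}`, the client has to wait for the server's `+` continuation before sending the n octets, so an unpipelined importer pays at least one round trip per message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the command line of an IMAP APPEND (RFC 3501, 6.3.11) into buf:
 *   <tag> APPEND "<mailbox>" [(<flags>)] {<msg_len>}\r\n
 * The message itself is sent afterwards as a synchronizing literal, once the
 * server has answered with a "+" continuation, followed by a final CRLF.
 * Returns the line length, or -1 if it does not fit.
 * Hypothetical helper for illustration only. */
int format_append(char *buf, size_t bufsz, const char *tag,
                  const char *mailbox, const char *flags, size_t msg_len)
{
    char quoted[512];
    size_t q = 0;
    int n;

    assert(buf != NULL && tag != NULL && mailbox != NULL);

    /* Quoted-string form of the mailbox name: escape '"' and '\'. */
    quoted[q++] = '"';
    for (const char *p = mailbox; *p != '\0'; p++) {
        if (q + 3 >= sizeof quoted)  /* keep room for escape, char, quote, NUL */
            return -1;
        if (*p == '"' || *p == '\\')
            quoted[q++] = '\\';
        quoted[q++] = *p;
    }
    quoted[q++] = '"';
    quoted[q] = '\0';

    if (flags != NULL && flags[0] != '\0')
        n = snprintf(buf, bufsz, "%s APPEND %s (%s) {%zu}\r\n",
                     tag, quoted, flags, msg_len);
    else
        n = snprintf(buf, bufsz, "%s APPEND %s {%zu}\r\n",
                     tag, quoted, msg_len);

    if (n < 0 || (size_t)n >= bufsz)
        return -1;
    return n;
}
```

For example, `format_append(line, sizeof line, "A003", "saved-messages", "\\Seen", 310)` produces `A003 APPEND "saved-messages" (\Seen) {310}` plus CRLF, the quoted-mailbox form of the APPEND example in RFC 3501.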

On my development VM test box (32-bit Slackware 13.37, 2G/2G split 
kernel, no RAID, a Q6600 with only two cores allocated to the VM, and 
8GB of DDR2 RAM), a run of that script gives:

Emails=180,044
real    237m28.485s  (about 12.6 emails/second)
user    94m50.425s
sys     10m09.389s
21,984,824  /mail/home
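Working the numbers from the `time` output above (nothing here beyond arithmetic on the figures shown):

```latex
\begin{aligned}
t_{\text{real}} &= 237 \times 60 + 28.485 = 14{,}248.485\ \text{s} \\
\text{rate} &= \frac{180{,}044\ \text{emails}}{14{,}248.485\ \text{s}} \approx 12.64\ \text{emails/s} \\
t_{\text{user}} + t_{\text{sys}} &= (94 \times 60 + 50.425) + (10 \times 60 + 9.389) = 6{,}299.814\ \text{s} \\
\frac{t_{\text{user}} + t_{\text{sys}}}{t_{\text{real}}} &\approx 0.44
\end{aligned}
```

The user and sys figures only cover the timed client process and any children it waited for, not the Dovecot server processes, so roughly 44% CPU suggests the client spent more than half the run waiting rather than computing. That is an inference from these three numbers, not something measured separately.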

I'm writing a Swiss Army knife style (C-based, no bytecode crap 
languages) mailbox "transcoding" tool, since none appear to exist.  To 
keep it simple, I/O to and from "remote" mailboxes (network connections) 
is not pipelined.  It won't require more than MAXEMAILSIZE's worth of 
RAM (if one of the directions involves a remote connection), and so far, 
when processing MIX, Maildir, and mbox files, it's extremely fast.
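The memory bound and the lack of pipelining described above can be sketched as a plain copy loop. This is a minimal illustration under stated assumptions, not the tool itself: the `msg_source`/`msg_sink` callback interface, the `SRC_*` codes, `copy_mailbox()` and the 64MB `MAXEMAILSIZE` value are all hypothetical. One buffer of at most MAXEMAILSIZE bytes is allocated once and reused, and each message is read completely before it is written, so memory used for message data does not grow with mailbox size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-message size cap; the real value isn't given. */
#define MAXEMAILSIZE (64u * 1024u * 1024u)

/* Result codes for a source's next() callback (hypothetical interface). */
enum { SRC_END = 0, SRC_OK = 1, SRC_TOO_BIG = 2, SRC_ERR = -1 };

struct msg_source {
    /* Copy the next message into buf (at most cap bytes) and set *len. */
    int (*next)(void *ctx, char *buf, size_t cap, size_t *len);
    void *ctx;
};

struct msg_sink {
    /* Store one complete message; return 0 on success. */
    int (*put)(void *ctx, const char *buf, size_t len);
    void *ctx;
};

/* Copy every message from src to dst, one at a time.  A single buffer of
 * cap bytes is the only message storage, so peak memory for message data
 * is bounded by cap.  Nothing is pipelined: each message is fully read,
 * then fully written, before the next read starts.  Oversized messages
 * are counted in *skipped and not copied.  Returns the number of messages
 * copied, or -1 on error. */
long copy_mailbox(struct msg_source *src, struct msg_sink *dst,
                  size_t cap, long *skipped)
{
    assert(src != NULL && dst != NULL && skipped != NULL);
    char *buf = malloc(cap);
    long copied = 0;

    if (buf == NULL)
        return -1;
    *skipped = 0;
    for (;;) {
        size_t len = 0;
        int rc = src->next(src->ctx, buf, cap, &len);
        if (rc == SRC_END)
            break;
        if (rc == SRC_TOO_BIG) {
            (*skipped)++;
            continue;
        }
        if (rc != SRC_OK || dst->put(dst->ctx, buf, len) != 0) {
            free(buf);
            return -1;
        }
        copied++;
    }
    free(buf);
    return copied;
}

/* Tiny in-memory source and counting sink, only to exercise the loop. */
struct array_src { const char **msgs; size_t n, i; };

static int array_next(void *ctx, char *buf, size_t cap, size_t *len)
{
    struct array_src *s = ctx;
    if (s->i == s->n)
        return SRC_END;
    const char *m = s->msgs[s->i++];
    size_t l = strlen(m);
    if (l > cap)
        return SRC_TOO_BIG;
    memcpy(buf, m, l);
    *len = l;
    return SRC_OK;
}

struct count_sink { size_t messages, bytes; };

static int count_put(void *ctx, const char *buf, size_t len)
{
    struct count_sink *k = ctx;
    (void)buf;
    k->messages++;
    k->bytes += len;
    return 0;
}
```

A real caller would pass `MAXEMAILSIZE` as `cap`. Skipping oversized messages rather than aborting is just one possible policy; the post doesn't say what the actual tool does with them.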

Adding support for [sm]dbox shouldn't be problematic.  At the moment, 
it supports everything Panda's c-client supports (including Panda's 
"MIX" format), plus Maildir/Maildir++.

Write support for Maildir is extremely UNDER-tested so far, as I've 
mainly used the tool to import Maildir hives.
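For context on what Maildir write support involves, here is a minimal C sketch of delivering one message using the usual Maildir protocol: create the file in `tmp/` under a unique name, write and fsync it, then link it into `new/` and remove the `tmp/` entry, so readers never see a half-written message. The function name `maildir_deliver()` and the exact unique-name layout (seconds, `P` plus PID, `Q` plus a per-process counter, hostname) are illustrative assumptions. It does not handle Maildir++ folders, the `cur/` info suffix, or hostnames that would need escaping.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>     /* mkdtemp(), for setting up a test Maildir */
#include <string.h>
#include <sys/stat.h>   /* mkdir(), for setting up a test Maildir */
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Deliver one message into an existing Maildir (tmp/, new/ and cur/ must
 * already exist).  Writes tmp/<name>, fsyncs it, links it to new/<name>
 * and unlinks the tmp/ copy.  On success stores the final path in
 * out_path and returns 0; returns -1 on any failure.
 * Hypothetical helper for illustration only. */
int maildir_deliver(const char *maildir, const char *msg, size_t len,
                    char *out_path, size_t out_sz)
{
    static unsigned long counter;   /* deliveries by this process; not thread-safe */
    char host[256], name[512], tmp_path[1024];
    int n;

    assert(maildir != NULL && msg != NULL && out_path != NULL);
    if (gethostname(host, sizeof host) != 0)
        return -1;
    host[sizeof host - 1] = '\0';

    /* Unique name: seconds, P<pid>, Q<counter>, hostname. */
    n = snprintf(name, sizeof name, "%ld.P%ldQ%lu.%s", (long)time(NULL),
                 (long)getpid(), ++counter, host);
    if (n < 0 || (size_t)n >= sizeof name)
        return -1;
    n = snprintf(tmp_path, sizeof tmp_path, "%s/tmp/%s", maildir, name);
    if (n < 0 || (size_t)n >= sizeof tmp_path)
        return -1;
    n = snprintf(out_path, out_sz, "%s/new/%s", maildir, name);
    if (n < 0 || (size_t)n >= out_sz)
        return -1;

    /* O_EXCL: never reuse an existing tmp/ file. */
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    size_t off = 0;
    int ok = 1;
    while (ok && off < len) {
        ssize_t w = write(fd, msg + off, len - off);
        if (w > 0)
            off += (size_t)w;
        else if (w == 0 || errno != EINTR)
            ok = 0;
    }
    if (ok && fsync(fd) != 0)
        ok = 0;
    if (close(fd) != 0)
        ok = 0;

    /* link() fails with EEXIST instead of overwriting an existing file. */
    if (ok && link(tmp_path, out_path) != 0)
        ok = 0;
    unlink(tmp_path);   /* drop the tmp/ name whether or not link() worked */
    return ok ? 0 : -1;
}
```

The link() then unlink() sequence follows the original Maildir description; some implementations use rename() instead. Calling it once per message with a fresh counter value gives each delivery its own file, which is exactly the one-file-per-message model discussed next.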

I've experimented with Maildir as a format, and while the 
one-email-per-file model seems like a sensible idea, it mostly seems to 
shift stress from one part of the system to another, mainly the 
filesystem, and not many filesystems handle that many files in one 
directory very efficiently.

None of my users have mailboxes with fewer than 100K emails in them; 
some have more than a million.

=R=
