[Dovecot] Dovecot tuning for GFS2

Stan Hoeppner stan at hardwarefreak.com
Fri Aug 23 04:57:40 EEST 2013


On 8/21/2013 4:07 PM, Jan-Frode Myklebust wrote:

> I would strongly suggest using mdbox instead. AFAIK clusterfs' aren't

I'd recommend mdbox as well, with a healthy rotation size.  The larger
files won't increase IMAP performance substantially but they can make
backup significantly quicker.

> very good at handling many small files. It's a worst case random I/O 
> usage pattern, with high rate of metadata operations on top.

Just for clarification, small files and random IO patterns at the disks
are only a small fraction of the maildir problem.  The majority of it is
metadata--the create, move, rename, etc. operations.  To keep the
in-memory filesystem state consistent across all nodes, and to avoid
the extra IOPS that on-disk synchronization structures would put on the
storage, cluster filesystems exchange all metadata updates and
synchronization data over the cluster interconnect.  This is
inherently slow.

With a local filesystem and multiple processes, this coherence dance
takes place at DRAM latencies--tens of nanoseconds--and scales well as
load increases because DRAM bandwidth is 25-100 GB/s.  With a cluster
filesystem it takes place at interconnect latency, tens to hundreds of
μs, or roughly 1,000x to 10,000x higher latency.  And it doesn't scale
well, as bandwidth is limited to ~100 MB/s with GbE, ~1 GB/s with 10GbE
or Myrinet.  Stepping up to Infiniband 4x DDR can get you ~2 GB/s and
slightly lower latency, but that's a lot of extra expense for a mail
cluster, given the performance won't scale with the $$ spent.  The
switch and HCAs will cost more than the COTS servers.
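
To put rough numbers on the latency gap, here is a minimal Python sketch.
The two latency constants are assumptions picked from inside the ranges
quoted above (50 ns and 50 μs), not measurements, and the function names
are made up for illustration.  It only shows how a per-operation round
trip caps the rate of strictly serialized metadata updates.

```python
# Illustrative values chosen from the ranges quoted in the text; not measured.
DRAM_LATENCY_S = 50e-9          # "tens of nanoseconds"
INTERCONNECT_LATENCY_S = 50e-6  # "tens to hundreds of microseconds"


def latency_ratio(remote_s, local_s):
    """How many times slower one remote round trip is than a local one."""
    return remote_s / local_s


def serialized_ops_per_second(latency_s):
    """Upper bound on operations per second when each one must wait
    for the previous round trip to complete."""
    return 1.0 / latency_s


if __name__ == "__main__":
    ratio = latency_ratio(INTERCONNECT_LATENCY_S, DRAM_LATENCY_S)
    print(f"interconnect vs DRAM latency: ~{ratio:.0f}x")
    print(f"serialized ops/s at DRAM latency:         "
          f"{serialized_ops_per_second(DRAM_LATENCY_S):,.0f}")
    print(f"serialized ops/s at interconnect latency: "
          f"{serialized_ops_per_second(INTERCONNECT_LATENCY_S):,.0f}")
```

Real cluster filesystems batch and cache lock state, so actual rates will
differ; the point is only the order-of-magnitude gap.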

Selecting the right mailbox format is in essence free, and mostly solves
the maildir metadata and IOPS problem.

> We use IBM GPFS for clusterfs, and have finally completed the conversion
> of a 130+ million inode maildir filesystem, into an 18 million inode mdbox
> filesystem. I have no hard performance data showing the difference
> between maildir/mdbox, but at a minimum mdbox is much easier to manage.
> Backup of 130+ million files is painful, and it also feels nice to be
> able to schedule batches of mailbox purges to off-hours, instead of doing
> them at peak hours.

130m to 18m is 'only' a 7-fold decrease.  18m inodes is still rather
large for any filesystem, cluster or local.  A check on an 18m inode XFS
filesystem, even on fast storage, would take quite some time.  I'm sure
it would take quite a bit longer to check a GFS2 with 18m inodes.  Any
reason you didn't go a little larger with your mdbox rotation size?
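
As a back-of-the-envelope check on those numbers, a small Python sketch
follows.  The inode counts come from the quoted message; the total mail
volume and the rotation sizes in the loop are purely hypothetical, and
the file estimate is a rough lower bound that ignores index files,
directories, and per-user rounding.

```python
# Inode counts as quoted in the thread.
MAILDIR_INODES = 130_000_000
MDBOX_INODES = 18_000_000


def reduction_factor(before, after):
    """Ratio of inode counts before and after the conversion."""
    return before / after


def estimated_mdbox_files(total_mail_bytes, rotate_size_bytes):
    """Rough lower bound on storage files: total volume divided by the
    rotation size, rounded up (integer ceiling division)."""
    return -(-total_mail_bytes // rotate_size_bytes)


if __name__ == "__main__":
    factor = reduction_factor(MAILDIR_INODES, MDBOX_INODES)
    print(f"inode reduction: ~{factor:.1f}x")

    # HYPOTHETICAL total mail volume, for illustration only.
    total_bytes = 20 * 1024**4  # 20 TiB
    for rotate_mib in (2, 10, 50):
        files = estimated_mdbox_files(total_bytes, rotate_mib * 1024**2)
        print(f"rotation size {rotate_mib:>3} MiB -> at least {files:,} files")
```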

-- 
Stan
