[Dovecot] Single instance storage - testing please

Timo Sirainen tss at iki.fi
Thu Aug 26 22:32:18 EEST 2010


http://hg.dovecot.org/dovecot-2.0-sis contains the code for it.
Otherwise it's the latest (as of writing this) dovecot-2.0 hg tree.
Please test if you're interested in SIS. :)

Once there's at least some testing, I'll probably add this to v2.0.x
since very little of this new code is used when SIS is disabled (which
is the default of course).

SIS works pretty much as explained in
http://dovecot.org/list/dovecot/2010-July/050832.html and
http://dovecot.org/list/dovecot/2010-July/050992.html

Two things I'm not yet entirely sure about:

1. What hash algorithm to use? Currently it's hard coded to SHA1.
Besides more CPU usage, the other potential problem with larger hashes
is that they also generate larger filenames. The filenames are currently
hex-encoded, but to save space they could be changed to some kind of
modified-base64 (base64 uses '/' chars, so it can't be regular base64).
Example filename lengths:

          hex  modified-base64
  SHA1    73   50
  SHA256  97   66
  SHA512  161  109

Yet another possibility would be to use SHA256/SHA512 and just truncate
the hash to a smaller number of bits.

2. Should I add support for trusting hash uniqueness, to avoid the disk
I/O generated by the byte-by-byte comparison? It could still first check
that the file sizes match.
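
To make both questions a bit more concrete, here is a rough Python sketch
(not Dovecot code). It assumes filenames have the form "<hash>-<guid>" with
a 128-bit GUID, which is what the 32-hex-char suffixes in the examples
further down suggest, and it uses unpadded URL-safe base64 as a stand-in for
"modified-base64". filename_length() and is_duplicate() are hypothetical
names made up for this illustration.

```python
import base64
import filecmp
import hashlib
import os

GUID_BYTES = 16  # assumption: 128-bit GUID suffix after the hash


def filename_length(algorithm: str, encoding: str) -> int:
    """Length of "<hash>-<guid>" for the given hash and text encoding."""
    digest_bytes = hashlib.new(algorithm).digest_size

    def encoded_len(nbytes: int) -> int:
        if encoding == "hex":
            return 2 * nbytes
        # Unpadded URL-safe base64 stands in for "modified-base64" here.
        return len(base64.urlsafe_b64encode(bytes(nbytes)).rstrip(b"="))

    # hash + "-" separator + GUID
    return encoded_len(digest_bytes) + 1 + encoded_len(GUID_BYTES)


def is_duplicate(path_a: str, path_b: str, trust_hash: bool = False) -> bool:
    """Decide whether two files whose hashes already match are duplicates."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False  # cheap check, file contents are never read
    if trust_hash:
        return True  # rely on hash uniqueness, skip reading the contents
    return filecmp.cmp(path_a, path_b, shallow=False)  # byte-by-byte compare


if __name__ == "__main__":
    for name in ("sha1", "sha256", "sha512"):
        print(name, filename_length(name, "hex"), filename_length(name, "base64"))
```

With trust_hash=True the only per-file cost left is a stat() call, which is
the I/O saving question 2 is about.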

Usage
-----

You can enable SIS for sdbox and mdbox:

  mail_attachment_dir = /var/attachments

Just setting the above enables "instant SIS", where the byte-by-byte
comparison is done immediately while saving mails. The alternative is to
defer the comparison by setting:

  mail_attachment_fs = sis-queue /var/attachments/queue:posix

This does no deduplication by itself. For that you'll need a nightly
(or whatever) run, which calls:

  doveadm sis deduplicate /var/attachments /var/attachments/queue

There's also a feature to easily find all attachments based on a hash.
For example:

  % sha1sum foo
  351641b73feb7cf7e87e5a8c3ca9a37d7b21e525  foo
  % doveadm sis find /var/attachments 351641b73feb7cf7e87e5a8c3ca9a37d7b21e525
  /var/attachments/35/16/351641b73feb7cf7e87e5a8c3ca9a37d7b21e525-e13a841f28ba764c123b00008c4a11c1
  /var/attachments/35/16/351641b73feb7cf7e87e5a8c3ca9a37d7b21e525-1d3b940628ba764c0b3b00008c4a11c1
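
The output above suggests the on-disk layout: the first two and the next two
hex chars of the hash are used as directory names, and each file is named
"<hash>-<guid>". Assuming that layout holds in general (it is only inferred
from this one example), a lookup could be sketched in Python like this.
sis_find() and sha1_of_file() are hypothetical helpers, not part of doveadm.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: str) -> str:
    """Hex SHA1 of a file's contents, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def sis_find(attachment_dir: str, hex_hash: str) -> list[str]:
    """List files named "<hash>-<guid>" under <dir>/<h[0:2]>/<h[2:4]>/."""
    # Assumed layout, inferred from the example paths in this mail.
    subdir = Path(attachment_dir) / hex_hash[0:2] / hex_hash[2:4]
    return sorted(str(p) for p in subdir.glob(f"{hex_hash}-*"))
```

For example, sis_find("/var/attachments", sha1_of_file("foo")) would look in
/var/attachments/35/16/ for the hash shown above.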

If you want to save attachments to separate files without SIS (e.g.
you want to use your filesystem's deduplication), set:

  mail_attachment_fs = posix

By default only attachments larger than 128 kB are written to attachment
storage. You can change the limit with:

  mail_attachment_min_size = 128k

It's also possible to create a plugin that adds further restrictions on
when an attachment is saved separately. This might be useful to reduce
disk seeks for attachments that are typically shown inline by
clients/webmail. You can do this by overriding the
mailbox.save_is_attachment() method.

If you want to distribute attachments to multiple filesystems, just
create /var/attachments/[0-9a-f][0-9a-f] as symlinks pointing to
whatever mount paths you want.
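
A small Python sketch of setting up those symlinks. The round-robin spread
and the mount paths in the usage note are only assumptions for illustration,
and link_hash_dirs() is a made-up helper; any mapping from the 256 directory
names to mount points works.

```python
from pathlib import Path


def link_hash_dirs(attachment_dir: str, mounts: list[str]) -> None:
    """Create <attachment_dir>/00 .. ff as symlinks spread over the mounts."""
    root = Path(attachment_dir)
    root.mkdir(parents=True, exist_ok=True)
    for i in range(256):
        name = f"{i:02x}"  # "00", "01", ..., "ff"
        # Round-robin choice of mount (an assumption, not a requirement).
        target = Path(mounts[i % len(mounts)]) / name
        target.mkdir(parents=True, exist_ok=True)
        link = root / name
        # Leave anything that already exists at this name untouched.
        if not link.is_symlink() and not link.exists():
            link.symlink_to(target, target_is_directory=True)
```

E.g. link_hash_dirs("/var/attachments", ["/mnt/att1", "/mnt/att2"]), where
/mnt/att1 and /mnt/att2 are placeholder mount points.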


