Dovecot and data migration

Alain BERNARD memoefix at gmail.com
Thu May 7 14:10:54 UTC 2015


Thank you.

In fact, Postfix adds an individual Delivered-To: header line with the
final envelope recipient address in order to stop mail forwarding loops as
early as possible. This is a real problem for mail with multiple recipients
when we try to find exact duplicates by comparing the hash values of the
messages: each recipient's copy differs, so the hashes differ too.
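
To illustrate the point, here is a minimal Python sketch (an illustration
only, not something we run in production). The function name message_digest
and its ignore_headers parameter are my own invention, and the sketch assumes
LF line endings as in maildir files. It hashes a raw message while optionally
skipping named header fields, so that two copies differing only in
Delivered-To: can be recognised as the same message:

```python
import hashlib


def message_digest(raw: bytes, ignore_headers=()) -> str:
    """Return the SHA-256 hex digest of a raw message.

    Header fields named in ignore_headers (case-insensitive) are left out
    of the hash, including their folded continuation lines.
    Assumes LF line endings, as in maildir files.
    """
    skip = {name.lower().encode("ascii") for name in ignore_headers}
    head, sep, body = raw.partition(b"\n\n")
    kept = []
    dropping = False
    for line in head.split(b"\n"):
        if line[:1] in (b" ", b"\t"):
            # Continuation line: shares the fate of the header it belongs to.
            if not dropping:
                kept.append(line)
            continue
        name = line.split(b":", 1)[0].strip().lower()
        dropping = name in skip
        if not dropping:
            kept.append(line)
    return hashlib.sha256(b"\n".join(kept) + sep + body).hexdigest()


# Two copies of one message, as delivered to two recipients.
common = b"From: a@example.org\nSubject: hi\n\nbody\n"
for_fredb = b"Delivered-To: fredb@example.org\n" + common
for_gregk = b"Delivered-To: gregk@example.org\n" + common

# Whole-file hashes differ; hashes without Delivered-To: match.
print(message_digest(for_fredb) == message_digest(for_gregk))
print(message_digest(for_fredb, ("Delivered-To",))
      == message_digest(for_gregk, ("Delivered-To",)))
```

Note that this only identifies candidates. The files themselves are still
different byte for byte, so hard linking them would silently drop one
recipient's Delivered-To: line.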

To test this, I used the -p parameter:

# /usr/libexec/dovecot/deliver -p tempfile -d fredb
# /usr/libexec/dovecot/deliver -p tempfile -d gregk

# ll /usr/libexec/dovecot/deliver
lrwxrwxrwx 1 root mail 11 16 févr. 09:15 /usr/libexec/dovecot/deliver -> dovecot-lda

# ll /store/vmail/gam/fredb/Maildir/cur/1430986604.M985408P7547.mail6.domain.org\,S\=1037\,W\=1059\:2\,a
-rw------- 1 vmail vmail 1037  7 mai   10:16 /store/vmail/gam/fredb/Maildir/cur/1430986604.M985408P7547.mail6.domain.org,S=1037,W=1059:2,a

However, the file isn't hard linked. fredb and gregk each have a copy of
the same message, but the link count in the listing above is 1, not 2, and
the two files have different inode numbers.
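
For reference, the check and a possible relinking step can be sketched in
Python as below. This is only a sketch under assumptions: the helper names
same_file and relink and the ".relink-tmp" suffix are hypothetical, both
paths must be on the same filesystem, and I have not verified how Dovecot
reacts to files being replaced underneath it.

```python
import filecmp
import os


def same_file(a: str, b: str) -> bool:
    """True if both paths refer to the same inode on the same device."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


def relink(keep: str, duplicate: str) -> None:
    """Replace `duplicate` with a hard link to `keep`.

    Refuses to act unless the two files are byte-for-byte identical.
    """
    if same_file(keep, duplicate):
        return  # already hard linked, nothing to do
    if not filecmp.cmp(keep, duplicate, shallow=False):
        raise ValueError("files differ, refusing to relink")
    tmp = duplicate + ".relink-tmp"  # hypothetical temporary name
    os.link(keep, tmp)               # new name for the kept inode
    os.replace(tmp, duplicate)       # atomically swap it into place
```

After relink(path_of_fredb_copy, path_of_gregk_copy), os.stat() on either
path should report st_nlink == 2 and the same st_ino. Keep in mind that hard
linked files also share owner and permissions.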

Regards,

2015-05-07 8:12 GMT+02:00 Steffen Kaiser <skdovecot at smail.inf.fh-brs.de>:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Wed, 6 May 2015, Alain BERNARD wrote:
>
>> Our legacy data store retains a single copy of a message regardless of the
>> number of mailboxes in which that message resides. It does this by creating
>> hard links to that message in the mailboxes containing that message.
>>
>> Thus, when we perform data migration to the server target (Dovecot), the
>> copies of the same message are copied over with the migration process
>> (imapsync). We use the storage format maildir.
>>
>> With a small message store, this means that a lot of messages are
>> duplicated unnecessarily. How can we reduce the message store size caused
>> by duplicate storage of identical messages?
>
> There is no function in Dovecot doing that. For the synchronisation you
> can come up with some filesystem related script doing that easily.
>
>> Does a relinking function exist, and can it be run in real time? How can
>> we configure Dovecot to deduplicate for all users, using a hash to
>> determine whether the file already exists?
>
> In Dovecot v1 I did this with an external script that hard linked equal
> files in the cur and new directories when they resided in more than 10 or
> so mailboxes.
>
> But in the production phase with Dovecot v2 you will face a pitfall:
> with LMTP all messages are different now, because of the user-related
> Delivered-To and final Received headers. With deliver this does not happen,
> but your MTA possibly adds different headers then, because usually LDAs are
> called per recipient. Dovecot deliver has the "-p" option to optionally
> hard link the message file to the file given as the argument of -p. But then
> you must use some scripting to have your MTA call that script for all final
> recipients. You should also check whether Sieve is compatible with -p,
> because I remember some bug reports.
>
> - -- Steffen Kaiser
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
>
> iQEVAwUBVUsCN3z1H7kL/d9rAQJp7Af/dPVmZcYQN48P4rgThc6RLFoB4PeLTF3B
> X42XqLmyje0d1Hv2YJMJXdSJccYJ4vp14MWJ0h11I3jOor17lnBGBTBqPyxZI7gL
> bYDJI2DUSh1CoQ2Sed9vRe5uKaDDlfuPFIym5JE4EJky8m8uEYSa+RRr/jtxbzpn
> RyKTn0SWls818hC5rISowvYyej5tvgZcq1lQn7yglqbriudJY33PHaa4EA7aaKVC
> ok4kiL9R0hKLTVjmeibxe0ZfI5MALVqkr1m5UOKXVj0M8lMHxx+qOoMlmkU3fXqI
> vwgvgYusvp3OeJJw23CJ5T0haaltzRcHJFil9F/4CLwMrsI44NnhgA==
> =JbnI
> -----END PGP SIGNATURE-----
>