SIS: dovecot creates duplicated attachments (sometimes)

Alexander Moisseev moiseev at mezonplus.ru
Fri Jul 20 10:25:58 EEST 2018


It's been 2 years since my previous post, but the issue is still present (Dovecot 2.3.2).

Long story short, SIS attachment deduplication was broken in Dovecot v2.2.16. After it was fixed in v2.2.22, I wrote this script to re-deduplicate the attachments on my servers: https://github.com/moisseev/doveadm-tools/blob/master/bin/dsisck
Occasionally, when I run the script, it still finds new duplicated attachments (not too many). On inspection, the duplicates have the same hash and the same size, match byte for byte, and were created at the same time, but one of them has a different inode number.
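
The check described above (same hash, identical content, different inode) can be sketched in a few lines of Python. This is only an illustration, not how dsisck works internally. It assumes the layout visible in the listings in this post: attachment files named <hash>-<guid>, with the per-hash links kept in "hashes" subdirectories. The function names are made up for this example.

```python
import filecmp
import os
import sys
from collections import defaultdict


def find_split_attachments(root):
    """Map each hash to {(device, inode): [paths]} when it spans several inodes."""
    by_hash = defaultdict(lambda: defaultdict(list))
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: "hashes" directories hold only the per-hash links, skip them.
        dirnames[:] = [d for d in dirnames if d != "hashes"]
        for name in filenames:
            digest, sep, _guid = name.partition("-")
            if not sep:
                continue  # not a <hash>-<guid> attachment name
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            by_hash[digest][(st.st_dev, st.st_ino)].append(path)
    # A properly deduplicated hash has exactly one inode behind all its names.
    return {h: dict(inodes) for h, inodes in by_hash.items() if len(inodes) > 1}


def confirmed_duplicates(root):
    """Yield (hash, kept_path, duplicate_path) for byte-identical copies."""
    for digest, inodes in sorted(find_split_attachments(root).items()):
        representatives = [paths[0] for paths in inodes.values()]
        kept = representatives[0]
        for other in representatives[1:]:
            # shallow=False forces a real byte-by-byte comparison.
            if filecmp.cmp(kept, other, shallow=False):
                yield digest, kept, other


if __name__ == "__main__":
    # Usage: python3 sis_dups.py /vmail/attachments  (read-only, changes nothing)
    for digest, kept, other in confirmed_duplicates(sys.argv[1]):
        print(f"{digest}: {other} duplicates {kept}")
```

The sketch only reports candidates and never relinks anything. Files that share a hash but differ in content are deliberately not reported.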

For instance, 3 days ago it was:
   Attachments processed: 31535
   Duplicated attachments: 370
   Missed hash files: 1313

And today:
   Attachments processed: 32368
   Duplicated attachments: 381
   Missed hash files: 1317

So, 11 new duplicates and 4 new missed hashes.

Can someone suggest a way to debug this problem?

Since a lot of time has passed since my previous post, I am reposting it below.

On 5/16/2016 9:27 AM, Alexander Moisseev wrote:
> SIS deduplication was broken in 2.2.16 and was fixed by https://github.com/dovecot/core/commit/3b39022ea0513363241cf852b7d454c841584ea1
> but sometimes (just a few times a month or so) dovecot still creates duplicated attachments.
> 
> As you can see in the directory listings below, all attachments were created at (about) the same time, but one of them has a different inode number.
> 
> 
> It is easy to detect using https://github.com/moisseev/doveadm-tools/blob/master/bin/dsisck
> 
> # dsisck -n
> 
> mail_uid=vmail
> mail_attachment_dir=/vmail/attachments
> 
> ==> Checking SIS...
> 
> # ln -f ./30/b3/30b367c584a123eee59478adf3e4f4c9e1226545-c56eae04a67c3157287f01003d96bafd ./30/b3/30b367c584a123eee59478adf3e4f4c9e1226545-7ad0a411d17c31572b7a01003d96bafd
> # mkdir -p -m 700 ./64/8f/hashes && ln ./64/8f/648f5cfa27af6d20c8570fdcaeab997663e15105-55430d2cd4432c571cb600003d96bafd ./64/8f/hashes/648f5cfa27af6d20c8570fdcaeab997663e15105
> # ln -f ./6f/3f/6f3fa3e4d374a9c80d07af54960ce0e7adb2e0fe-b1935817ed753557546700003d96bafd ./6f/3f/6f3fa3e4d374a9c80d07af54960ce0e7adb2e0fe-72da1f32e38e3557947b00003d96bafd
> # ln -f ./dd/d9/ddd97aa6f624d4f54968d2c4956fc3a9d796b31b-6bf7c122fefb3157ad3500003d96bafd ./dd/d9/ddd97aa6f624d4f54968d2c4956fc3a9d796b31b-90fd1415e7e53157112800003d96bafd
> 
> --------------------------------------------------------------
>   Unexpected objects found: 0
> 
>   Attachments processed: 43500
>   Different attachments with similar hashes skipped: 0
> 
>   Attachments deduplicated: 3
>   File system blocks freed up: 1800
>   Attachment deduplication attempts failed: 0
> 
>   Hash files created: 1
>   Hash files creation attempts failed: 0
> 
>   Hash files re-linked: 0
>   Hash files re-link attempts failed: 0
> 
>   Orphaned hash files: 0
>   Invalid hash files (not a regular file): 0
> --------------------------------------------------------------
> 
> # cd /vmail/attachments
> 
> 
> # ls -li ./30/b3/ ./30/b3/hashes/
> ./30/b3/:
> total 1996
> 7464083 -rw-------  1 vmail  vmail  337719 May 10 09:16 30b367c584a123eee59478adf3e4f4c9e1226545-7ad0a411d17c31572b7a01003d96bafd
> 7464082 -rw-------  6 vmail  vmail  337719 May 10 09:16 30b367c584a123eee59478adf3e4f4c9e1226545-bb6eae04a67c3157287f01003d96bafd
> 7464082 -rw-------  6 vmail  vmail  337719 May 10 09:16 30b367c584a123eee59478adf3e4f4c9e1226545-bd6eae04a67c3157287f01003d96bafd
> 7464082 -rw-------  6 vmail  vmail  337719 May 10 09:16 30b367c584a123eee59478adf3e4f4c9e1226545-bf6eae04a67c3157287f01003d96bafd
> 7464082 -rw-------  6 vmail  vmail  337719 May 10 09:16 30b367c584a123eee59478adf3e4f4c9e1226545-c36eae04a67c3157287f01003d96bafd
> 7464082 -rw-------  6 vmail  vmail  337719 May 10 09:16 30b367c584a123eee59478adf3e4f4c9e1226545-c56eae04a67c3157287f01003d96bafd
> 7464084 drwx------  2 vmail  vmail     512 May 10 09:16 hashes
> 
> ./30/b3/hashes/:
> total 332
> 7464082 -rw-------  6 vmail  vmail  337719 May 10 09:16 30b367c584a123eee59478adf3e4f4c9e1226545
> 
> # ls -li ./6f/3f/ ./6f/3f/hashes/
> ./6f/3f/:
> total 644
> 15088414 -rw-------  1 vmail  vmail  326656 May 13 11:23 6f3fa3e4d374a9c80d07af54960ce0e7adb2e0fe-72da1f32e38e3557947b00003d96bafd
> 15088422 -rw-------  2 vmail  vmail  326656 May 13 11:23 6f3fa3e4d374a9c80d07af54960ce0e7adb2e0fe-b1935817ed753557546700003d96bafd
> 15088448 drwx------  2 vmail  vmail     512 May 13 11:23 hashes
> 
> ./6f/3f/hashes/:
> total 320
> 15088422 -rw-------  2 vmail  vmail  326656 May 13 11:23 6f3fa3e4d374a9c80d07af54960ce0e7adb2e0fe
> 
> # ls -li ./dd/d9/ ./dd/d9/hashes/
> ./dd/d9/:
> total 748
> 80548 -rw-------  3 vmail  vmail  250640 May 10 18:19 ddd97aa6f624d4f54968d2c4956fc3a9d796b31b-68f7c122fefb3157ad3500003d96bafd
> 80548 -rw-------  3 vmail  vmail  250640 May 10 18:19 ddd97aa6f624d4f54968d2c4956fc3a9d796b31b-6bf7c122fefb3157ad3500003d96bafd
> 80547 -rw-------  1 vmail  vmail  250640 May 10 18:19 ddd97aa6f624d4f54968d2c4956fc3a9d796b31b-90fd1415e7e53157112800003d96bafd
> 80549 drwx------  2 vmail  vmail     512 May 10 18:19 hashes
> 
> ./dd/d9/hashes/:
> total 248
> 80548 -rw-------  3 vmail  vmail  250640 May 10 18:19 ddd97aa6f624d4f54968d2c4956fc3a9d796b31b
> 
> 
> # doveconf -n
> # 2.2.24 (a82c823): /usr/local/etc/dovecot/dovecot.conf
> # Pigeonhole version 0.4.14 (099a97c)
> # OS: FreeBSD 10.3-RELEASE i386
> auth_default_realm = example.com
> auth_mechanisms = digest-md5 cram-md5 plain apop
> doveadm_password =  # hidden, use -P to show it
> first_valid_gid = 1000
> first_valid_uid = 1000
> lda_mailbox_autosubscribe = yes
> listen = *
> mail_attachment_dir = /vmail/attachments
> mail_gid = vmail
> mail_home = /vmail/%d/%n
> mail_location = mdbox:~/mdbox
> mail_plugins = quota zlib acl
> mail_shared_explicit_inbox = yes
> mail_uid = vmail
> managesieve_notify_capability = mailto
> managesieve_sieve_capability = fileinto reject envelope encoded-character vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy include variables body enotify environment mailbox date index ihave duplicate mime foreverypart extracttext
> namespace {
>    list = children
>    location = mdbox:%%h/mdbox:INDEXPVT=~/mdbox/shared/%%u
>    prefix = shared/%%u/
>    separator = /
>    subscriptions = no
>    type = shared
> }
> namespace inbox {
>    inbox = yes
>    location =
>    mailbox Archives {
>      special_use = \Archive
>    }
>    mailbox Drafts {
>      special_use = \Drafts
>    }
>    mailbox Junk {
>      auto = subscribe
>      special_use = \Junk
>    }
>    mailbox Sent {
>      special_use = \Sent
>    }
>    mailbox Trash {
>      special_use = \Trash
>    }
>    prefix =
>    separator = /
> }
> passdb {
>    args = scheme=plain username_format=%n /usr/local/etc/dovecot/dovecot.auth/%d.passwd
>    driver = passwd-file
> }
> plugin {
>    acl = vfile
>    acl_shared_dict = file:/vmail/shared-mailboxes.db
>    antispam_backend = mailtrain
>    antispam_mail_notspam = report_ham
>    antispam_mail_sendmail = /usr/local/libexec/dovecot/dovecot-lda
>    antispam_mail_sendmail_args = -d;spam at example.com;-m
>    antispam_mail_spam = report_spam
>    antispam_spam = Junk
>    antispam_trash = Trash;train_ham;train_prob;train_spam
>    quota = dict:User quota::file:%h/dovecot-quota
>    quota_rule = *:storage=2G
>    quota_rule2 = Trash:storage=+10%%
>    quota_rule3 = Spam:storage=+20%%
>    quota_status_nouser = DUNNO
>    quota_status_success = DUNNO
>    sieve_after = /usr/local/etc/dovecot/sieve/sieve.after
>    sieve_before = /usr/local/etc/dovecot/sieve/sieve.before
>    sieve_vacation_min_period = 0
>    zlib_save = gz
>    zlib_save_level = 3
> }
> postmaster_address = postmaster at example.com
> protocols = imap lmtp sieve pop3
> quota_full_tempfail = yes
> service auth {
>    unix_listener /var/spool/postfix/private/auth {
>      group = postfix
>      mode = 0660
>      user = postfix
>    }
>    unix_listener auth-userdb {
>      user = vmail
>    }
> }
> service config {
>    unix_listener config {
>      mode = 0600
>      user = vmail
>    }
> }
> service imap-login {
>    inet_listener imaps {
>      port = 0
>    }
>    process_limit = 200
> }
> service lmtp {
>    unix_listener /var/spool/postfix/private/dovecot-lmtp {
>      group = postfix
>      mode = 0660
>      user = postfix
>    }
>    user = vmail
> }
> service pop3-login {
>    inet_listener pop3s {
>      port = 0
>    }
> }
> service quota-status {
>    client_limit = 1
>    executable = quota-status -p postfix
>    unix_listener /var/spool/postfix/private/quota-status {
>      user = postfix
>    }
> }
> ssl_cert = </etc/ssl/certs/mx.example.com.crt
> ssl_key = </etc/ssl/private/mx.example.com.key
> userdb {
>    args = username_format=%n /usr/local/etc/dovecot/dovecot.auth/%d.passwd
>    driver = passwd-file
> }
> verbose_proctitle = yes
> protocol imap {
>    mail_plugins = quota zlib acl antispam imap_quota imap_zlib imap_acl
> }
> protocol lmtp {
>    mail_plugins = quota zlib acl sieve
> }


