[Dovecot] Corrupted transaction log file / record size too small

Steven Winikoff smw at alcor.concordia.ca
Fri Oct 10 21:22:41 EEST 2008


>>   2) recompiled dovecot without optimization (ie, gcc -O0 instead of -O2)
>>      (as per http://www.mail-archive.com/dovecot@dovecot.org/msg12936.html)
>
>The problem here turned out to be a bug in Sun CC (and gcc -O2  
>apparently worked after all, so maybe a problem with testing).

Thanks.  I'll revert to the default -O2 the next time we recompile
(which will probably be version 1.1.4 sometime early next week).


>>   3) reverted to using flock for file locking instead of fcntl (we'd been
>>      using flock with dovecot 1.0.10; 1.1.3 used fcntl by default in a
>>      couple of places, so I changed it back to see what would happen)
>
>This shouldn't really make a difference.

That's good, because in hindsight I haven't noticed any difference. :-)


>>   - we still see the occasional assertion failure at line 1091 of
>>     index-mail.c; this is the one that I'd hoped to solve by
>>     recompiling without optimization, but at least it seems to be
>>     happening less frequently than before
>
>I had hoped v1.1.3 would have fixed this. But perhaps this is also  
>related to the disk quota issues. Are there any other errors that are  
>logged before the assert?

Nothing obvious, I'm afraid.  For example, this occurence is typical of
the 24 which we've logged since Sept. 26 (which is as far back as we
keep dovecot log data online; for the record, we upgraded from 1.0.10
to 1.1.3 on Sept. 23):

   Oct  8 13:56:39 alcor dovecot: pop3[15127] USER1, NNN.NNN.NNN.NNN: Disconnected: Logged out top=0/0, retr=0/0, del=0/0, size=0, xfer=12/27
   Oct  8 13:56:39 alcor dovecot: pop3-login: Login: [15118], USER2, NNN.NNN.NNN.NNN
   Oct  8 13:56:39 alcor dovecot: wrapper[15139]: pop3, USER2, NNN.NNN.NNN.NNN
   Oct  8 13:56:39 alcor dovecot: pop3-login: Login: [15058], USER3, NNN.NNN.NNN.NNN
   Oct  8 13:56:39 alcor dovecot: wrapper[15141]: pop3, USER3, NNN.NNN.NNN.NNN
   Oct  8 13:56:39 alcor dovecot: pop3[15141] USER3, NNN.NNN.NNN.NNN: mkdir(/home/USER3/.imap/INBOX) failed: Disk quota exceeded
   Oct  8 13:56:39 alcor dovecot: imap-login: Login: [15121], USER4, NNN.NNN.NNN.NNN
   Oct  8 13:56:39 alcor dovecot: wrapper[15143]: imap, USER4, NNN.NNN.NNN.NNN
   Oct  8 13:56:39 alcor dovecot: pop3[15139] USER2, NNN.NNN.NNN.NNN: Disconnected: Logged out top=0/0, retr=0/0, del=0/0, size=0, xfer=12/27
   Oct  8 13:56:39 alcor dovecot: imap-login: Login: [15001], USER5, NNN.NNN.NNN.NNN
   Oct  8 13:56:40 alcor dovecot: wrapper[15159]: imap, USER5, NNN.NNN.NNN.NNN
   Oct  8 13:56:40 alcor dovecot: imap-login: Login: [15158], USER6, NNN.NNN.NNN.NNN
   Oct  8 13:56:40 alcor dovecot: wrapper[15162]: imap, USER6, NNN.NNN.NNN.NNN
   Oct  8 13:56:40 alcor dovecot: imap[15159] USER5, NNN.NNN.NNN.NNN: Disconnected: Logged out bytes=305/13377
   Oct  8 13:56:40 alcor dovecot: imap[15162] USER6, NNN.NNN.NNN.NNN: Disconnected: Logged out bytes=110/356
   Oct  8 13:56:40 alcor dovecot: Panic: imap[15143] USER4, NNN.NNN.NNN.NNN: file index-mail.c: line 1091 (index_mail_close): assertion failed: (!mail->data.destroying_stream)
   Oct  8 13:56:40 alcor dovecot: imap[15143] USER4, NNN.NNN.NNN.NNN: Raw backtrace: dovecot-imap [0x4a3ddd] -> dovecot-imap [0x4a4b9d] -> dovecot-imap(i_fatal+0) [0x4a40d1] -> dovecot-imap(index_mail_close+0xfa) [0x4538d4] -> dovecot-imap [0x4538fe] -> dovecot-imap(index_mail_set_seq+0x6a) [0x453ac6] -> dovecot-imap [0x445e6e] -> dovecot-imap(mail_set_seq+0x28) [0x45f6a5] -> dovecot-imap(index_storage_search_next_nonblock+0xe9) [0x458ff0] -> dovecot-imap(mailbox_search_next_nonblock+0x33) [0x462de3] -> dovecot-imap(mailbox_search_next+0x2e) [0x462da2] -> dovecot-imap [0x415e99] -> dovecot-imap(cmd_copy+0x19c) [0x4160ac] -> dovecot-imap [0x41d0ac] -> dovecot-imap [0x41d2e9] -> dovecot-imap [0x41d41c] -> dovecot-imap [0x41d460] -> dovecot-imap(client_input+0xb4) [0x41d5ed] -> dovecot-imap(io_loop_handler_run+0x17d) [0x4aea11] -> dovecot-imap(io_loop_run+0x3b) [0x4adc6a] -> dovecot-imap(main+0x99) [0x42908b] -> /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b306fee78c4] -> dovec!
 ot-imap [0x414939]
   Oct  8 13:56:40 alcor dovecot: child 15143 (imap) killed with signal 6
   Oct  8 13:56:40 alcor dovecot: pop3[15141] USER3, NNN.NNN.NNN.NNN: Disconnected: Logged out top=0/0, retr=1/12816, del=1/1, size=12798, xfer=40/12944
   Oct  8 13:56:40 alcor dovecot: imap-login: Login: [15003], USER7, NNN.NNN.NNN.NNN

Although the usernames and IP addresses have been elided, each USERn
token always represents the same specific account wherever it appears.


>>   - this doesn't happen often, but once in a while we see assertion
>>     failures at line 163 of src/pop3-login/client-authenticate.c or
>>     line 200 of src/imap-login/client-authenticate.c
>
>I think I've managed to get this also sometimes when killing dovecot- 
>auth or doing something else "wrong" (I don't remember exactly), but  
>not during normal operation. I should have noted down how to reproduce  
>this when I still knew how to reproduce it..

For us this appears to happen during normal operations.  That's not to
say I've never killed a dovecot-auth process :-), but it's not something
I do often (in fact the last time I did was when I recompiled 1.1.3 as
described previously, which by now was ten days ago).

For what it may be worth, this particular symptom appears to happen in
clusters.  The most recent spate that we saw were as follows (in context,
in case any of the other log messages may be relevant):

   Oct 10 04:48:45 alcor dovecot: pop3-login: Disconnected (no auth attempts): [9703], NNN.NNN.NNN.NNN
   Oct 10 04:48:45 alcor dovecot: pop3-login: Disconnected (no auth attempts): [9705], NNN.NNN.NNN.NNN
   Oct 10 04:48:45 alcor dovecot: pop3-login: Disconnected (no auth attempts): [9702], NNN.NNN.NNN.NNN
   Oct 10 04:48:46 alcor dovecot: pop3-login: Disconnected (no auth attempts): [9704], NNN.NNN.NNN.NNN
   Oct 10 04:48:46 alcor dovecot: pop3-login: Disconnected: Inactivity (auth failed, 1 attempts): [9500], USER10, NNN.NNN.NNN.NNN
   Oct 10 04:48:46 alcor dovecot: Panic: pop3-login: file client-authenticate.c: line 163 (sasl_callback): assertion failed: (!client->destroyed || reply == SASL_SERVER_REPLY_CLIENT_ERROR || reply == SASL_SERVER_REPLY_MASTER_FAILED)
   Oct 10 04:48:46 alcor dovecot: imap-login: Disconnected: Inactivity (auth failed, 1 attempts): [9502], USER11, NNN.NNN.NNN.NNN, TLS
   Oct 10 04:48:46 alcor dovecot: Panic: imap-login: file client-authenticate.c: line 200 (sasl_callback): assertion failed: (!client->destroyed || reply == SASL_SERVER_REPLY_CLIENT_ERROR || reply == SASL_SERVER_REPLY_MASTER_FAILED)
   Oct 10 04:48:46 alcor dovecot: imap-login: Disconnected: Inactivity (auth failed, 1 attempts): [9503], USER12, NNN.NNN.NNN.NNN, TLS
   Oct 10 04:48:46 alcor dovecot: Panic: imap-login: file client-authenticate.c: line 200 (sasl_callback): assertion failed: (!client->destroyed || reply == SASL_SERVER_REPLY_CLIENT_ERROR || reply == SASL_SERVER_REPLY_MASTER_FAILED)
   Oct 10 04:48:46 alcor dovecot: imap-login: Disconnected: Inactivity (auth failed, 1 attempts): [8777], USER9, NNN.NNN.NNN.NNN
   Oct 10 04:48:46 alcor dovecot: Panic: imap-login: file client-authenticate.c: line 200 (sasl_callback): assertion failed: (!client->destroyed || reply == SASL_SERVER_REPLY_CLIENT_ERROR || reply == SASL_SERVER_REPLY_MASTER_FAILED)
   Oct 10 04:48:47 alcor dovecot: imap[4258] USER8, 67.223.74.40: Disconnected: Logged out bytes=7407/20930
   Oct 10 04:48:47 alcor dovecot: pop3-login: Disconnected: Inactivity (auth failed, 1 attempts): [9575], USER13, NNN.NNN.NNN.NNN
   Oct 10 04:48:47 alcor dovecot: Panic: pop3-login: file client-authenticate.c: line 163 (sasl_callback): assertion failed: (!client->destroyed || reply == SASL_SERVER_REPLY_CLIENT_ERROR || reply == SASL_SERVER_REPLY_MASTER_FAILED)
   Oct 10 04:48:47 alcor dovecot: pop3-login: Disconnected: Inactivity (auth failed, 1 attempts): [9576], USER14, NNN.NNN.NNN.NNN
   Oct 10 04:48:47 alcor dovecot: Panic: pop3-login: file client-authenticate.c: line 163 (sasl_callback): assertion failed: (!client->destroyed || reply == SASL_SERVER_REPLY_CLIENT_ERROR || reply == SASL_SERVER_REPLY_MASTER_FAILED)
   Oct 10 04:48:47 alcor dovecot: auth(default): Master requested auth for nonexisting client 9498
   Oct 10 04:48:47 alcor dovecot: auth(default): Master requested auth for nonexisting client 9500
   Oct 10 04:48:47 alcor dovecot: auth(default): Master requested auth for nonexisting client 9499
   Oct 10 04:48:47 alcor dovecot: auth(default): Master requested auth for nonexisting client 9502
   Oct 10 04:48:47 alcor dovecot: auth(default): Master requested auth for nonexisting client 9503
   Oct 10 04:48:47 alcor dovecot: auth(default): Master requested auth for nonexisting client 9575
   Oct 10 04:48:47 alcor dovecot: auth(default): Master requested auth for nonexisting client 9576
   Oct 10 04:48:47 alcor dovecot: auth(default): Master requested auth for nonexisting client 9174
   Oct 10 04:48:47 alcor dovecot: imap[9809] USER9, 132.205.7.20: Connection closed bytes=0/0
   Oct 10 04:48:47 alcor dovecot: imap-login: Disconnected: Inactivity (auth failed, 1 attempts): [9174], USER8, NNN.NNN.NNN.NNN, TLS
   Oct 10 04:48:47 alcor dovecot: Panic: imap-login: file client-authenticate.c: line 200 (sasl_callback): assertion failed: (!client->destroyed || reply == SASL_SERVER_REPLY_CLIENT_ERROR || reply == SASL_SERVER_REPLY_MASTER_FAILED)
   Oct 10 04:48:48 alcor dovecot: imap-login: Disconnected (no auth attempts): [9896], NNN.NNN.NNN.NNN
   Oct 10 04:48:48 alcor dovecot: imap-login: Disconnected (no auth attempts): [9895], NNN.NNN.NNN.NNN
   Oct 10 04:48:48 alcor dovecot: pop3-login: Disconnected (no auth attempts): [9893], NNN.NNN.NNN.NNN


>> I missed this originally, but on reflection I notice in the logs that
>> every time the "record size too small" error appears, inevitably the
>> user involved is over disk quota and out of grace period.  For  
>> example,
>
>Yes, this is most likely what's the problem. I remember testing v1.0  
>quite a lot in "out of disk space" conditions, but I haven't really  
>done any such testing with v1.1. I'll try that and see if I can get  
>these problems reproduced and fixed.

Thank you! :-)  Please let me know if there's anything I can do to help.

     - Steven
________________________________________________________________________
Steven Winikoff                |
Concordia University           | "Hanging is too good for a man who
Montreal, QC, Canada           |  makes puns; he should be drawn and
smw at alcor.concordia.ca         |  quoted."
http://alcor.concordia.ca/~smw |                            - Fred Allen


More information about the dovecot mailing list