[Patch] Fix hang in safe_sendfile on SmartOS
# HG changeset patch # User Sebastian Wiedenroth <sebastian.wiedenroth@skylime.net> # Date 1437050484 -7200 # Thu Jul 16 14:41:24 2015 +0200 # Node ID 7ef3a533b097e8e6590e754dc56ad308ab29233b # Parent e3640ccaa76d77a9658126d1f8f306480dad8af7 Fix hang in safe_sendfile on SmartOS The call to sendfile on SmartOS can fail with EOPNOTSUPP. This is a valid error code and documented in the man page. This error code needs to be handled or else dovecot will retry the sendfile call endlessly and hang. diff -r e3640ccaa76d -r 7ef3a533b097 src/lib/sendfile-util.c --- a/src/lib/sendfile-util.c Sat Jan 10 04:32:42 2015 +0200 +++ b/src/lib/sendfile-util.c Thu Jul 16 14:41:24 2015 +0200 @@ -116,7 +116,7 @@ if (errno == EINVAL) { /* most likely trying to read past EOF */ ret = 0; - } else if (errno == EAFNOSUPPORT) { + } else if (errno == EAFNOSUPPORT || errno == EOPNOTSUPP) { /* not supported, return Linux-like EINVAL so caller sees only consistent errnos. */ errno = EINVAL;
On 07/16/2015 06:03 PM, Sebastian Wiedenroth wrote:
Committed .. However, I think a more important bug is that it hangs. It's definitely not supposed to hang. Which process was it that was hanging How can I reproduce that? I can only get it to disconnect the IMAP client.
Thanks!
It was the managesieve process that was hanging. To trigger it we used sieve-connect [1] like this: sieve-connect -u demo@example.com mailbox.example.com
To find the issue we used a dtrace script [2]. With an unpatched version it would show:
CPU ID FUNCTION:NAME
0 4623 safe_sendfile:return 7 4 fffffd7fffdff8b8 154 -> 77
0 4623 safe_sendfile:return 7 4 fffffd7fffdff8b8 9223372036854775807 -> 77
0 4623 safe_sendfile:return 7 4 fffffd7fffdff8b8 9223372036854775807 -> 77
and then just repeat the last line as it retried the call forever. This is where it hangs, spinning on the cpu.
After looking at the code I confirmed the issue with a dtrace one-liner that tracks the sendfilev syscall:
dtrace -n 'syscall::sendfilev:return {printf("%d %x\n", arg0, errno)}'
This showed that the call returned with EOPNOTSUPP (0x7a): 0 6155 sendfilev:return -1 7a
The man page lists this as a valid error code and handling it the same way as EAFNOSUPPORT fixed the issue for us. There are a few more error codes in the man page that currently are not handled by dovecot. This might be something to look into in the future.
I hope this answer provides the details you’re looking for.
Best regards, Sebastian
[1] https://github.com/philpennock/sieve-connect [2] https://gist.github.com/wiedi/4b4ebe5f92ac5b54951b
On 07 Sep 2015, at 23:19, Sebastian Wiedenroth <sebastian.wiedenroth@skylime.net> wrote:
Thanks. I did find a bug in Pigeonhole with this when issuing a GET command :)
Also I see now why it's looping, more or less. sendfile() is still indicating that it's sending some data (by updating s_offset) even though it's returning a failure. I wonder if reverting the earlier EOPNOTSUPP change and applying this patch causes it to assert-crash instead of going to infinite loop? http://hg.dovecot.org/dovecot-2.2/rev/f6dd24658fb1
participants (2)
-
Sebastian Wiedenroth
-
Timo Sirainen