Understanding why Dovecot unexpectedly died

Steffen Kaiser skdovecot at smail.inf.fh-brs.de
Tue Nov 18 08:40:31 UTC 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, 15 Nov 2014, Luca Bertoncello wrote:

> I use Dovecot 1.2.17 (I can't upgrade right now, for various reasons),
> controlled by Pacemaker (I have an HA cluster).
> Now I see that Pacemaker often restarts Dovecot. I wrote my own script to

Please define "often".
If it is very often, try starting Dovecot from a script and capturing its
output, e.g.:

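#!/bin/bash
# Start Dovecot in the foreground (-F), so this script waits for it and
# can record when and how it exits.

logf=/tmp/dovecot.start.log

(
  /../sbin/dovecot -F
  rc=$?
  # Append a timestamp and Dovecot's exit code once it terminates.
  echo "$(date) rc=$rc"
  exit $rc
) >>"$logf" 2>&1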



> manage Dovecot, since Pacemaker does not have its own.
>
> The "monitor" section of my script looks like this:
>
> monitor)
>                if [ ! -e $OCF_RESKEY_pid ]; then
>                        echo "stopped (no pidfile)"
> echo "DOVECOT STOPPED - NO PIDFILE" | /usr/bin/logger -p local0.info -t DOVECOT-MONITOR -i
>                        exit $OCF_NOT_RUNNING
>                else
>                        /bin/ps axuwf | /bin/grep `/bin/cat $OCF_RESKEY_pid` | /bin/grep -v grep > /dev/null 2>&1

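This check is imprecise: grep matches the PID as a substring anywhere in
the ps output, so it produces false positives, especially if the PID is
low (PID 63 would also match PID 630 or 1632, or any other field that
happens to contain those digits). Doesn't your system accept

if ! ps -p "$(/bin/cat "$OCF_RESKEY_pid")" >/dev/null 2>&1; then

to query one particular process ID?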
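
If the box runs Linux, a still stricter check is possible. This is only a
sketch under assumptions: the helper name dovecot_pid_alive is made up, it
relies on /proc/PID/comm (Linux 2.6.33 or later), and it assumes the master
process shows up as "dovecot", as in the netstat output quoted below.
Besides checking that the PID exists, it compares the command name, so a
stale pidfile whose PID has been reused by an unrelated process is not
mistaken for a running Dovecot.

```bash
#!/bin/bash
# Sketch (assumes Linux /proc); dovecot_pid_alive is a hypothetical helper.
#
# dovecot_pid_alive PIDFILE [NAME]
# Succeeds only if PIDFILE contains a numeric PID, that process still
# exists, and its command name (/proc/PID/comm) equals NAME
# (default: dovecot).
dovecot_pid_alive() {
    local pidfile=$1 name=${2:-dovecot} pid="" comm
    [ -r "$pidfile" ] || return 1
    # read reports failure if the file lacks a trailing newline, so the
    # content is validated below instead of trusting read's exit status.
    read -r pid < "$pidfile"
    # Reject empty or non-numeric content (e.g. a truncated pidfile).
    [[ $pid =~ ^[0-9]+$ ]] || return 1
    # /proc/PID/comm disappears as soon as the process is gone.
    comm=$(cat "/proc/$pid/comm" 2>/dev/null) || return 1
    # Guard against PID reuse: the process must really be the one we want.
    [ "$comm" = "$name" ]
}
```

In the monitor action it would replace the ps | grep pipeline, e.g.:

    if ! dovecot_pid_alive "$OCF_RESKEY_pid"; then
        exit $OCF_NOT_RUNNING
    fi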

>                        if [ $? -ne 0 ]; then
>                                echo "stopped"
> echo "DOVECOT STOPPED - NO PROCESS" | /usr/bin/logger -p local0.info -t DOVECOT-MONITOR -i
>                                exit $OCF_NOT_RUNNING
>                        else

How about logging the output of

lsof -p `/bin/cat $OCF_RESKEY_pid`
lsof -c dovecot
netstat -tupan

into a temporary file, say /tmp/dovecot.monitor.log, whenever the check
fails?
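
A minimal sketch of such logging, meant to be called right before the
monitor exits with a failure code. The function name log_dovecot_state and
the variable DOVECOT_DIAG_LOG are hypothetical; the default path is the one
suggested above. lsof and netstat are treated as optional, so the function
still writes a timestamped header if a tool is not installed.

```bash
#!/bin/bash
# Sketch: log_dovecot_state and DOVECOT_DIAG_LOG are hypothetical names.
DOVECOT_DIAG_LOG=${DOVECOT_DIAG_LOG:-/tmp/dovecot.monitor.log}

# log_dovecot_state PIDFILE
# Appends a timestamped snapshot of open files and sockets to the log.
log_dovecot_state() {
    local pidfile=$1 pid=""
    [ -r "$pidfile" ] && read -r pid < "$pidfile"
    {
        echo "=== $(date) pidfile=$pidfile pid=${pid:-none} ==="
        # Each tool is optional; skip it quietly if it is missing.
        if command -v lsof >/dev/null 2>&1; then
            # Files and sockets of the PID named in the pidfile, if any.
            [ -n "$pid" ] && lsof -p "$pid"
            # Everything opened by processes whose name starts with dovecot.
            lsof -c dovecot
        fi
        if command -v netstat >/dev/null 2>&1; then
            netstat -tupan
        fi
    } >>"$DOVECOT_DIAG_LOG" 2>&1
    return 0
}
```

For example, put log_dovecot_state "$OCF_RESKEY_pid" just before
exit $OCF_NOT_RUNNING and exit $OCF_ERR_GENERIC; the next unexpected
restart then leaves a snapshot in the log.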

>                                if [ "`/bin/netstat -tupan | /bin/grep dovecot | /bin/grep $OCF_RESKEY_bindaddr | /usr/bin/wc -l`" -ne 0 ]; then
>                                        exit $OCF_SUCCESS
>                                else
> echo "DOVECOT STOPPED - NO LISTEN [`/bin/netstat -tupan | /bin/grep dovecot`]" | /usr/bin/logger -p local0.info -t DOVECOT-MONITOR -i
>                                        exit $OCF_ERR_GENERIC
>                                fi
>                        fi
>                fi
>                exit $OCF_SUCCESS
>                ;;
>
> The "logger" calls were just added to try to understand why it dies...
> Well, I can see in my syslog, when Pacemaker restarts Dovecot, these lines:
>
> Nov 15 18:59:09 mail01 DOVECOT-MONITOR[530]: DOVECOT STOPPED - NO LISTEN [tcp        0      0 192.168.33.1:37545      192.168.33.3:3306       ESTABLISHED 637/dovecot-auth
> Nov 15 18:59:09 mail01 DOVECOT-MONITOR[530]: tcp        0      0 192.168.33.1:37537      192.168.33.3:3306       ESTABLISHED 529/dovecot-auth]
>
> So there is no "dovecot" process listening anymore... Normally I see these:
>
> tcp        0      0 0.0.0.0:110             0.0.0.0:*               LISTEN      634/dovecot
> tcp        0      0 0.0.0.0:143             0.0.0.0:*               LISTEN      634/dovecot
> tcp        0      0 0.0.0.0:993             0.0.0.0:*               LISTEN      634/dovecot
> tcp        0      0 0.0.0.0:995             0.0.0.0:*               LISTEN      634/dovecot
> tcp        0      0 192.168.33.1:40994      192.168.33.3:3306       VERBUNDEN   891/dovecot-auth
> tcp        0      0 192.168.33.1:40984      192.168.33.3:3306       VERBUNDEN   638/dovecot-auth
> tcp6       0      0 :::110                  :::*                    LISTEN      634/dovecot
> tcp6       0      0 :::143                  :::*                    LISTEN      634/dovecot
> tcp6       0      0 :::993                  :::*                    LISTEN      634/dovecot
> tcp6       0      0 :::995                  :::*                    LISTEN      634/dovecot
>
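
One more remark on the LISTEN check in your script: "grep dovecot" also
matches the dovecot-auth connections to MySQL, so whether the test can
tell them apart from real listeners depends on $OCF_RESKEY_bindaddr; if it
were 192.168.33.1, the auth connections alone would make it pass. A sketch
of a stricter test that counts only LISTEN sockets owned by the master
PID. count_master_listeners is a hypothetical name, and the column layout
is assumed to be that of the Linux netstat -tupan output above (State in
column 6, PID/Program in column 7).

```bash
#!/bin/bash
# count_master_listeners MASTER_PID   (hypothetical helper)
# Reads "netstat -tupan" output on stdin and prints the number of LISTEN
# sockets whose PID/Program column is exactly MASTER_PID/dovecot.
# dovecot-auth connections and PIDs that merely share digits do not match.
count_master_listeners() {
    awk -v want="$1/dovecot" '$6 == "LISTEN" && $7 == want { n++ } END { print n + 0 }'
}
```

Possible use in the monitor action (LC_ALL=C because netstat may
translate state names in other locales):

    pid=$(/bin/cat "$OCF_RESKEY_pid")
    n=$(LC_ALL=C /bin/netstat -tupan 2>/dev/null | count_master_listeners "$pid")
    [ "$n" -gt 0 ] || exit $OCF_ERR_GENERIC

>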
> In the mail.log and mail.err I can't see anything but:
>
> Nov 15 18:59:13 mail01 dovecot: Dovecot v1.2.17 starting up
> Nov 15 18:59:13 mail01 dovecot: auth-worker(default): mysql: Connected to 192.168.33.3 (exim)
>
> And in the syslog there is nothing about Dovecot...
>
> Any idea?
>
> Thanks a lot!
> Luca Bertoncello
> (lucabert at lucabert.de)
>

- -- 
Steffen Kaiser
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQEVAwUBVGsF/3z1H7kL/d9rAQLpJwf/TkKJ6pLDGH434gTuZ6kyvUfDbuuONNHm
NJpLktdHjsTMj6DU5hmygWnVJfa2aJseT6FGn3GQCyIVHoQQIF5YmBo6UPyYjW9U
JEjDortE20LobEEhUOHegBuIu05pfyHQbjdcRM2OXh99G4o3BtDiHqAnPskFyY2X
VMEwH3j9a00EgTDeh37NECgI4iITCt2WYZAGcOweCTiEj+8ll4Og/bAA0Q3Lk+aP
A0i4DnGzyPPayvKEzLmtfgJ0J6mKXNyD+14VPRcaGj4y+KrMc628JVAXpmyvO7N1
9J9drp5qUdeuyMXWQejI4rkvP0ZsuUKaMPJ94uJ2vCBtviLJJ8uoIA==
=tBd9
-----END PGP SIGNATURE-----

