[Dovecot] mysql auth failover failing

Timo Sirainen tss at iki.fi
Mon Sep 12 15:30:45 EEST 2011


On Fri, 2011-09-09 at 19:33 -0700, Paul B. Henson wrote:

> According to the sample SQL configuration file "HA / round-robin 
> load-balancing is supported by giving multiple host settings, like: 
> host=sql1.host.org host=sql2.host.org".
> 
> However, as far as I can tell dovecot only connects to the first listed 
> host, and processes all queries through it, there does not appear to be 
> any load-balancing going on.

The current code creates a connection to the second server only when the
first connection is already busy with an SQL query, or when it's not
working. Once there is more than one connection, it starts doing
round-robin lookups.

This works well enough with PostgreSQL because it does asynchronous
lookups, so two simultaneous lookups create a second connection. MySQL
does synchronous lookups, though, so the second connection is normally
never created.
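
As a rough illustration of the difference, here is a toy model in Python.
It is not Dovecot's actual code: the names LazyPool, start_lookup,
sync_lookups and async_lookups are made up for this sketch, and it only
models the "first connection is busy" case, not the "not working" case.

```python
class Conn:
    """One open connection to one SQL host (toy stand-in)."""
    def __init__(self, host):
        self.host = host
        self.busy = False


class LazyPool:
    """Opens a connection to the next host only when all open ones are busy."""
    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.conns = []
        self.next_idx = 0

    def start_lookup(self):
        n = len(self.conns)
        # Round robin over the connections that are already open.
        for i in range(n):
            idx = (self.next_idx + i) % n
            conn = self.conns[idx]
            if not conn.busy:
                self.next_idx = (idx + 1) % n
                conn.busy = True
                return conn
        # Every open connection is busy: connect to the next host, if any.
        if n < len(self.hosts):
            conn = Conn(self.hosts[n])
            conn.busy = True
            self.conns.append(conn)
            return conn
        return None  # a real pool would queue the request here


def sync_lookups(pool, count):
    """MySQL-like: each lookup finishes before the next one starts."""
    used = []
    for _ in range(count):
        conn = pool.start_lookup()
        used.append(conn.host)
        conn.busy = False
    return used


def async_lookups(pool, count):
    """PostgreSQL-like: all lookups are started before any of them finishes."""
    started = [pool.start_lookup() for _ in range(count)]
    used = [c.host for c in started if c is not None]
    for c in started:
        if c is not None:
            c.busy = False
    return used
```

With two hosts, sync_lookups() never sees a busy connection, so the pool
stays at one connection. async_lookups() with two lookups opens the
second connection, and later lookups alternate between the two hosts.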

I suppose the fix for this would be to always connect to all SQL servers
at startup.
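
A minimal sketch of that idea in Python, only to show the intended
behaviour. EagerPool and the connect callback are hypothetical names for
this example, not anything in Dovecot, and retrying failed hosts later is
left out.

```python
class EagerPool:
    """Connects to every configured host up front, then does round robin."""
    def __init__(self, hosts, connect):
        self.conns = {}
        self.failed = []
        for host in hosts:
            try:
                # connect is a caller-supplied function (assumption) that
                # raises ConnectionError when the host is unreachable.
                self.conns[host] = connect(host)
            except ConnectionError:
                self.failed.append(host)
        self._order = list(self.conns)
        self._next = 0

    def pick(self):
        """Return the host to use for the next lookup."""
        if not self._order:
            raise ConnectionError("no SQL server reachable")
        host = self._order[self._next % len(self._order)]
        self._next += 1
        return host
```

With both servers up, pick() alternates between them even when lookups
are synchronous. With the first server down, every lookup goes to the
second one right away.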

> That's not necessarily a dealbreaker; however, high-availability does 
> not appear to be working either.
> 
> If I shutdown the first mysql server, dovecot starts to log connection 
> failures:
> 
> Sep  9 15:47:34 tweak dovecot: auth: Error: 
> mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): 
> Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - 
> waiting for 1 seconds before retry
> 
> Sep  9 15:47:39 tweak dovecot: auth: Error: 
> mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): 
> Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - 
> waiting for 25 seconds before retry

Those are intentional.

> And postfix starts to fail authentications:
> 
> Sep  9 15:47:35 tweak postfix/smtpd[5119]: warning: 
> bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5 
> authentication failed: Connection lost to authentication server

It should have created the second connection here instead of failing.

> Now and again the authentication process dies:
> 
> Sep  9 15:47:39 tweak dovecot: auth: Panic: file auth-request-handler.c: 
> line 697 (auth_request_handler_flush_failures): assertion failed: 
> (auth_request->state == AUTH_REQUEST_STATE_FINISHED)

And this of course shouldn't happen either.

> Requests start to pile up:
> 
> Sep  9 15:51:46 tweak dovecot: auth: Warning: auth workers: Auth request 
> was queued for 25 seconds, 45 left in queue
> 
> Lookups time out:
> 
> Sep  9 15:57:22 tweak dovecot: auth: Error: auth worker: Aborted 
> request: Lookup timed out

These are the result of the previous failures.

> This occasionally pops up:
> 
> Sep  9 15:58:38 tweak dovecot: auth: Fatal: 
> net_connect_unix(auth-worker) failed: Resource temporarily unavailable

Probably this too.

> And sometimes the auth process gets temporarily disabled:
> 
> Sep  9 15:58:57 tweak dovecot: master: Error: service(auth): command 
> startup failed, throttling

Most likely related to the crash, although I think this still shouldn't
have happened.

> I don't think all authentications fail during the scenario, but I think 
> the majority do. Based on the network traffic, dovecot is almost 
> continuously trying to connect to the first listed server. It sometimes 
> connects to the second listed server, but when it does, the connection 
> does not persist, it goes away almost immediately.

There are multiple auth-worker processes, each with its own internal
MySQL connections and separate retry counters.
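
This is also why retry delays of very different lengths can show up in
the log within a few seconds of each other. A small Python sketch of the
idea follows. The Worker class and the (1, 5, 25) delay schedule are
invented for the example and are not Dovecot's real retry logic.

```python
class Worker:
    """One auth-worker process with its own retry counter."""
    def __init__(self, name, delays=(1, 5, 25)):
        self.name = name
        self.delays = delays  # hypothetical schedule, in seconds
        self.failures = 0     # private to this worker, never shared

    def on_connect_failure(self):
        # Use the delay for the current failure count, capped at the last one.
        delay = self.delays[min(self.failures, len(self.delays) - 1)]
        self.failures += 1
        return f"{self.name}: waiting for {delay} seconds before retry"
```

A worker that has already failed twice reports the long delay, while a
freshly started worker hitting the same dead server reports the short
one.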

I'll try to debug this soon.



