[Dovecot] 2.0, hourly performance stats

Udo Wolter udo.wolter at charite.de
Mon Nov 8 12:36:56 EET 2010

* Ralf Hildebrandt <Ralf.Hildebrandt at charite.de>:
> I'm getting constantly high numbers of page reclaims & involuntary
> context switches for dovecot/auth.
> page reclaims = minor faults = cpu switching back to system-mode. But
> why is the auth process doing that so excessively? Same for the large
> number of involuntary context switches...

Some additions:

The last time we started with 2.0 we ran into big trouble, which was also
visible on the VMware ESX side. The CPU load was constantly around 95%, and on
the VM side top showed the processes mainly in kernel space (system

This morning we didn't have that high a load, although the processes were
still in kernel space too often. But as long as the load doesn't get too high,
the ESX doesn't show any problems; the stats even went up and down (which they
didn't do the last time, when we had the real problems: they just stayed flat
at a high level...).

Of course we could test it during the peak time around noon, but then the
mail system begins to stumble under the high load and users might complain. We
also have no real test scenario, because it's not easy to put "real" pressure
on the machine, so we have to test in production. But I cannot switch to 2.0
permanently; that would cause too many problems.

Anyway, even when it runs without causing problems on the ESX side, we can see
the processes in kernel space. They stay there far too long, and Ralf seems to
have found the reason: too many page faults. That's all we can say for now.
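
For anyone who wants to watch the same counters on a single process, here is a
minimal sketch, assuming a Linux box with /proc mounted. It is not how the
hourly stats above were produced; the function names (parse_stat,
parse_status, sample) are made up for illustration. It reads the minor/major
fault counts from /proc/<pid>/stat and the context switch counts from
/proc/<pid>/status.

```python
def parse_stat(stat_text):
    """Return (minor_faults, major_faults) from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces,
    # so split after the last ')' before splitting on whitespace.
    fields = stat_text.rsplit(")", 1)[1].split()
    # fields[0] is the state (field 3 in proc(5)); minflt is field 10 and
    # majflt is field 12, i.e. indexes 7 and 9 here.
    return int(fields[7]), int(fields[9])


def parse_status(status_text):
    """Return the context switch counters from the text of /proc/<pid>/status."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    counters = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            counters[key] = int(value)
    return counters


def sample(pid):
    """Read all four counters for one process (hypothetical helper)."""
    with open(f"/proc/{pid}/stat") as f:
        minflt, majflt = parse_stat(f.read())
    with open(f"/proc/{pid}/status") as f:
        counters = parse_status(f.read())
    counters["minflt"] = minflt
    counters["majflt"] = majflt
    return counters
```

Calling sample() on the auth process's pid twice, a few seconds apart, and
subtracting the two results gives the rate of minor faults and involuntary
context switches over that interval, since the kernel counters only ever grow.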


