[Dovecot] POP3 error

Tue Mar 8 18:46:57 EET 2011

On 03/08/2011 09:26 AM, Chris Wilson wrote:
> Hi Thierry,
>
> On Tue, 8 Mar 2011, Thierry de Montaudry wrote:
>> On 08 Mar 2011, at 13:24, Chris Wilson wrote:
>>>
>>>> top - 11:10:14 up 14 days, 12:04,  2 users,  load average: 55.04, 29.13, 14.55
>>>> Tasks: 474 total,  60 running, 414 sleeping,   0 stopped,   0 zombie
>>>> Cpu(s): 99.6%us,  0.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
>>>> Mem:  16439812k total, 16353268k used,    86544k free,    33268k buffers
>>>> Swap:  4192956k total,      140k used,  4192816k free,  8228744k cached
>>
>> As the load averages (55.04, 29.13, 14.55) show, the load was still
>> climbing when I took this snapshot, which is not a normal situation.
>> Usually this machine's load is between 1 and 4, which is quite OK for a
>> quad core. The spike only happens when dovecot starts generating errors
>> and pop3, imap and http get stuck. It went up to 200, and I was still
>> able to stop the web and mail daemons and restart them, after which
>> everything was back to normal.
>
> I don't have a definite answer, but I remember that there has been a
> long-running bug in the Linux kernel with schedulers behaving badly under
> heavy writes:
>
> "One of the problems commonly talked about in our forums and elsewhere is
> the poor responsiveness of the Linux desktop when dealing with significant
> disk activity on systems where there is insufficient RAM or the disks are
> slow. The GUI basically drops to its knees when there is too much disk
> activity..." [http://www.phoronix.com/scan.php?page=news_item&px=ODQ3Mw]
> (note, it's not just the GUI, all other tasks can starve when a disk I/O
> queue builds up).
>
> "There are a few options to tune the linux IO scheduler that can help a
> bunch... Typically CFQ stalls too long under heavy writes, especially if
> your disk subsystem sucks, so particularly if you have several spindles
> deadline is worth a try." [http://communities.vmware.com/thread/82544]
>
> "I run Ubuntu on a moderately powerful quad-core x86-64 system and the
> desktop response is basically crippled whenever something is reading or
> writing large files as fast as it can (at normal priority)... For example,
> cat /path/to/LARGE_FILE > /dev/null ... Everything else gets completely
> unusable because of the I/O latency."
> [https://bugs.launchpad.net/ubuntu/+source/linux/+bug/343371]
>
> "I was just running mkfs.ext4 -b 4096 -E stride=128 -E stripe-width=128 -O
> ^has_journal /dev/sdb2 on my SSD18M connected via USB1.1, and the result
> was, well, absolutely, positively _DEVASTATING_. The entire system became
> _FULLY_ unresponsive, not even switching back down to tty1 via Ctrl-Alt-F1
> worked (took 20 seconds for even this key to be respected)."
> [http://lkml.org/lkml/2010/4/4/86]
>
> "This regression has been around since about the 2.6.18 timeframe and has
> eluded a lot of testing to isolate the root cause. The most promising fix
> is in the VM subsystem (mm) where the LRU scan has been changed to favor
> keeping executable pages active longer. Most of these symptoms come down
> to VM thrashing to make room for I/O pages. The key change/commit is
> ab4754d24a0f2e05920170c845bd84472814c6, "vmscan: make mapped executable
> pages the first class citizen"... This change was merged into the 2.6.31r1
> kernel."
> [https://bugs.launchpad.net/ubuntu/+source/linux/+bug/131094/comments/235]
>
> One possible cause is that writing to a slow device can block the write
> queue for other devices, causing the machine to come to a standstill when
> there's plenty of useful work that it could be doing.
>
> This could cause a cascading failure in your server as soon as disk
> I/O write load goes over a certain point, a bit like a swap death. I'm not
> sure if the fact that you're using NFS makes a difference; perhaps only if
> you memory-map files?
>
> You could test this by booting with the NOOP or anticipatory scheduler
> instead of the default CFQ to see if it makes any difference.
>
> Cheers, Chris.

You can change it on the fly with:
`echo noop > /sys/block/${DEVICE}/queue/scheduler`
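
Before switching, it may help to see which scheduler each disk is using now. The same sysfs file lists the available schedulers, with the active one in square brackets, e.g. `noop anticipatory deadline [cfq]`. Below is a minimal sketch that prints the active scheduler per device. The function names `active_scheduler` and `show_schedulers` are made up for this example. Reading the file works as any user, but writing to it needs root, and a change made this way should not survive a reboot.

```shell
#!/bin/sh
# Hypothetical helper: extract the bracketed (active) scheduler name from a
# line such as "noop anticipatory deadline [cfq]". Prints nothing if the
# line contains no bracketed entry.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Hypothetical helper: walk every block device that exposes a scheduler file
# and print "device: active-scheduler".
show_schedulers() {
    for f in /sys/block/*/queue/scheduler; do
        [ -r "$f" ] || continue          # skip if the glob matched nothing
        dev=${f#/sys/block/}             # strip the leading path
        dev=${dev%%/*}                   # keep only the device name
        printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
    done
}

show_schedulers
```

If one of the alternatives turns out to help, the `elevator=` kernel boot parameter is, as far as I know, the usual way to make the choice stick across reboots on kernels of this vintage.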

-- 
-Eric 'shubes'