[Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot

Javier de Miguel Rodríguez javierdemiguel at us.es
Tue Dec 14 14:15:09 EET 2010

> Javier, your number and type of spindles aren't the source of your I/O
> bottleneck.  Your array controller is.  Or, more specifically, its lack
> of enough fast network ports is the problem.  The P4300 BK718A model of
> which you speak only has two 1GbE iSCSI ports.  That's only 200 MB/s
> full duplex.  That's less link bandwidth than a single 2 Gbit fiber
> channel interface.  2 Gbit FC is 2 generations and over 8 years old now.
>   It was superseded by 4 Gbit FC and then 8 Gbit FC, which is the current
> standard.  8 Gbit FC provides 800 MB/s full duplex bandwidth per link.

     Hmm... I do not fully agree with your reasoning. An IMAP server
generates a lot of random I/O, but not much bandwidth. Our LeftHand
Networks nodes have two 1 GbE ports in adaptive load balancing (a kind
of active-active connection), meaning 4 Gbps full duplex.

     I am attaching a screenshot of the performance of the LeftHand:
the average is 15 MB/s and 1,700 IOPS. The highest load (today) is
~62 MB/s, with a whopping 9000 IOPS, well above the theoretical IOPS of
2 RAID 5 arrays of 8 disks each (SAS 15K). The cache is working as
expected, and the queue depth is 226 (a bit overloaded, though).
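
     To show what I mean by "theoretical IOPS", here is a rough sketch
of the estimate. The seek time and the read/write mixes are assumptions
of mine, not measurements from our array, and the factor of 4 is the
usual rule of thumb for small random writes on RAID 5:

```python
# Rough estimate of the spindle-limited random IOPS of 2 RAID 5 sets of
# 8 x 15K SAS disks. All input figures are assumptions, not measurements.

def disk_iops(rpm, avg_seek_ms):
    """Random IOPS of one disk: one seek plus half a rotation per I/O."""
    rotational_latency_ms = 0.5 * 60_000 / rpm
    return 1000 / (avg_seek_ms + rotational_latency_ms)

def raid5_iops(n_disks, per_disk_iops, write_fraction):
    """Host-visible IOPS of a RAID 5 set.

    Assumes each small random write costs 4 disk I/Os
    (read data, read parity, write data, write parity).
    """
    raw = n_disks * per_disk_iops
    return raw / ((1 - write_fraction) + 4 * write_fraction)

per_disk = disk_iops(rpm=15_000, avg_seek_ms=3.5)  # assumed average seek

for write_fraction in (0.0, 0.3, 0.5):  # assumed read/write mixes
    total = 2 * raid5_iops(8, per_disk, write_fraction)
    print(f"writes {write_fraction:.0%}: ~{total:.0f} IOPS from 16 spindles")
```

     Even with reads only, the estimate stays far below 9000 IOPS, so
the difference has to be absorbed by the cache.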

> There are many other fine SAN arrays on the market.  I mention Nexsan
> merely because they are very affordable, fast as hell for the price,
> easy to manage, and I've used them.  Due to the low price, they lack the
> more advanced features of higher priced units, such as snapshots, remote
> replication, etc.  As I mentioned, I'm not a big fan of relying on SAN
> controllers to perform all my disaster recovery functions, so I do that
> with software, such as VCB, and a dedicated backup server with a SAN
> attached robotic library.  Thus, the lack of this functionality in the
> Nexsan units is not an issue here.

     We have more than 65k users, and we need active-active disaster
recovery. Half of our ESX cluster and LeftHand nodes are at one site,
the other half is at another site, and we have tested that if we shut
down a complete site we can recover in minutes (the VMs simply start up
at the other one). Only LeftHand (in the cheap market) gave us real-time
replication of storage, a kind of "network RAID 1" for our storage...

     I still think that my problem is IOPS related, not bandwidth
related. My maximum bandwidth today was 60 MB/s, which fits entirely in
1 Gbps, but the queue depth is high because of the large number of IOPS
(9000) that "only" 16 disks cannot handle. I can buy better storage
heads to delay all those writes, or avoid a lot of them by putting the
indexes on an SSD or in a ramdisk.
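
     The arithmetic behind that, as a small sketch. It uses the raw
line rate of a single 1 Gbps link and ignores iSCSI/TCP overhead, and it
assumes that the bandwidth peak and the IOPS peak happened at the same
moment, which the screenshot does not prove:

```python
# Back-of-the-envelope check: does the peak load fit in one 1 Gbps link,
# and how large is the average I/O? Raw line rate, protocol overhead ignored.

PEAK_MB_S = 60.0   # peak throughput quoted above
PEAK_IOPS = 9000   # peak IOPS quoted above (assumed simultaneous)
LINK_GBPS = 1.0    # a single 1 GbE port

link_mb_s = LINK_GBPS * 1000 / 8          # raw capacity in MB/s
link_utilisation = PEAK_MB_S / link_mb_s  # fraction of the link in use
avg_io_kb = PEAK_MB_S * 1000 / PEAK_IOPS  # average size of one I/O in KB

print(f"link capacity:    {link_mb_s:.0f} MB/s")
print(f"link utilisation: {link_utilisation:.0%}")
print(f"average I/O size: {avg_io_kb:.1f} KB")
```

     Under those assumptions one link is about half used and the average
I/O is only a few KB, which is the pattern of small random I/O rather
than of a saturated link.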

     Thanks for all the info, I did not know about Nexsan.



