[Dovecot] Using MySQL to store email?

Geo Carncross geocar-dovecot at internetconnection.net
Thu Jun 8 23:48:59 EEST 2006


On Wed, 2006-06-07 at 09:29 -0700, Marc Perkel wrote:
> 
> Les Mikesell wrote:
> > I think the people who expect an improvement from databases over maildir
> > are used to unix filesystems that degrade badly as the number of files
> > in a directory increase.  These days many, like Reiserfs and XFS, are
> > much better.  My theory is that if your filesystem isn't a good place
> > to store things you should fix that before thinking about databases.
> 
> Suppose you have a total of one million messages stored for 5000 users 
> across 800 domains and you want to delete all message that were sent 
> from a specific host. With MySQL it's a one line command and would take 
> only a few seconds to execute. Using Maildir it would take hours because 
> you would have to search every message. That's the power of MySQL.

And with great power comes great frustration: Instead of taking 50msec
to fetch a message, it takes 80msec, and instead of taking 25msec to
load a message into a mailbox, it now takes 200-500msec!

Note that what you REALLY want to do is create a smart index of
everything- it exists, it's called ZOE. It's not fast, and I think in
practice, it's not all that much fun to use, but it does EXACTLY what
you're talking about- builds a relational model of incoming email.

http://zoe.nu/

It's Slow. REALLY slow. And don't fool yourself into thinking it's slow
because it's written in Java. It's slow because fulltext loading
messages is slow.


> Maildir only is fast for indexing file names. But if you are indexing 
> across users and domains by host, or headers, or senders, or whatever 
> then only a database can support these multiple indexes. There are 
> things you can do with databases that are way beyond what you can 
> imagine. Especially if you are integrating it with a spam system.

No. The cost is GREATER for MySQL than it is for Maildir because MySQL
has to spend the time to index those things whether they're used or not.
Building an index on everything is slow- you spend all your time
maintaining indexes. Most of the time they're not needed- and for the
cases where they're interesting, a specialized solution is always
faster.

> For example, a new message comes in and you find that sender matches 
> email in 100 people's spam folders and none in any other folder? It can 
> be classified as spam. If however the from address matches ham in people 
> folder and no spam then you can probably deliver it without spam scanning.

Sender-checks are useless- most junk mail in my mailboxes come from
random addresses. Language classifiers don't use MySQL as a backend
because MySQL is slow. Sparse fields are much faster, and not at all
difficult to implement.


> This issue isn't performance, it's power.

Really?

Tue, 06 Jun 2006 05:38:41 -0700  (08:38 EDT)
> Seems to me that MySQL could somplify all the backend stuff 
> that everyone struggles with and with replication one could create
> very 
> massive and reliable systems

Tue, 06 Jun 2006 16:40:27 -0700  (19:40 EDT)
> Not true for "always". If you have 100,000 messages in a folder the
> database will win easy.

It seems to me that this is poorly thought out with no real goals in
mind. It seems like a way to "keep your options open" - except you
neglect that doing so incurs a great cost. I recommend you take a look
at ZOE and think a little more if you want all email to work that way.


> If you compare it with what people are doing now then MySQL would probably be a little slower.

No, it will definitely be a lot slower.

> But if you compare it to what is possible that you can't do now then MySQL wins.

I'm not interested in doing "what's possible"- and in fact, the vast
majority of people aren't either. They're interested in solving
problems, or having the problems solved for them.

Please go look at ZOE and DBMail and decide if you still think this is a
good idea after using these-- maybe it will help you decide exactly what
you want to do- and perhaps if you do decide to "make your own LDA"
it'll help you decide what's wrong with them- so you don't repeat their
mistakes.

-- 
Internet Connection High Quality Web Hosting
http://www.internetconnection.net/

-- 
Internet Connection High Quality Web Hosting
http://www.internetconnection.net/



More information about the dovecot mailing list