[Dovecot] SQL indexes [was: How to get rid of locks]

Sun Apr 8 11:29:46 EEST 2007

On 8.4.2007, at 10.29, Daniel L. Miller wrote:

> This is probably a scary thought, but . . . what would it take for  
> the indexing part of Dovecot to be implemented via an API/plug-in  
> model?  I'm curious about the effect of using an external SQL  
> engine (my vote would be Firebird) for processing these, and using  
> a open plug-in method would allow for that without binding Dovecot  
> to a particular implementation.

Well.. It would be possible to make the lib-index API completely  
virtualized, but I don't think there's much point. The lib-index API  
actually doesn't have all that much to do with reading/writing index  
files. It's much more about easily manipulating mailbox metadata in  
memory.

For example the way I was planning on implementing SQL mail storage  
was to create an in-memory index and keep it updated by reading the  
data from SQL. The same metadata is in SQL, but it still needs to be  
stored into Dovecot's internal structures (== the indexes).

Then there is however dovecot.index.cache file. It's a pretty simple  
database, so replacing it with SQL would make more sense. The cache  
file API isn't virtualizable yet either, but I was planning on doing  
that if I ever got around to making the SQL mail storage plugin  
really usable.

SQL cache replacement would need to be a bit tricky however to work.  
Currently lib-storage API works like:

  - mail_alloc() is done first. It tells what fields it most likely  
wants to fetch.
  - mail_set_seq() can be used to switch to whatever message in mailbox
  - mail_get_*() functions can be used to fetch the message data.

The simplest SQL implementation would just do a SQL query for each  
mail_get_*() call, but this would also be the slowest implementation.  
A bit better would be to use one SQL query to fetch all the data  
specified by mail_alloc() in the first mail_get_*() function call.  
Then if something extra is fetched that would generate extra SQL  
queries.

However most of the time mail_set_seq() isn't used randomly. It's  
mostly done only when building a reply for THREAD command. Usually  
searching is used:

  - mailbox_search_init() specifies search arguments. For FETCH  
commands this is simple "sequences 1-10".
  - mailbox_search_next() finds the next match and calls mail_set_seq 
() for that mail

So hooking into these functions you could figure out in the _init()  
that you want to do the SQL query for messages 1-10, and the first  
call to _next() tells you the mail structure where you can get the  
list of wanted fields. So IMAP command:

UID FETCH 1:5,10:20 (ENVELOPE BODY INTERNALDATE)

Could be done with a single SQL query, something like:

select envelope, body, internaldate from message_cache where uid  
between 1 and 5 or uid between 10 and 20;

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 186 bytes
Desc: This is a digitally signed message part
Url : http://dovecot.org/pipermail/dovecot/attachments/20070408/8035c730/attachment-0001.pgp