[Dovecot] FTS Plugin design

Timo Sirainen tss at iki.fi
Thu Apr 16 01:23:53 EEST 2009


On Mon, 2009-04-13 at 11:18 +0100, Rui Carneiro wrote:
> I haven't yet understood what the plugin's design is and how the plugins are
> called from the core system, and I was wondering if anyone could help me with
> that.

fts-storage.c hooks into all the functions in the mail-storage API that it
needs to. Currently indexing isn't done while messages are being saved,
but instead just before searching. The relevant functions are:

 - fts_mailbox_search_init() tries to figure out if FTS can optimize the
search. If it can, it checks whether the FTS index is up to date and,
if not, starts the indexing.

 - fts_mailbox_search_next_nonblock() continues the indexing (or
searching after indexing) for a while. The idea is that the IMAP
connection can process other commands during a long-running search, so
the fts plugin indexes FTS_SEARCH_NONBLOCK_COUNT (50) messages at a time.
It would be nice if that value were calculated dynamically and based on
bytes instead of messages, but that's maybe too much trouble.

 - fts_mailbox_search_next_update_seq() uses the FTS search results to
update mail-storage's search state so that it skips messages that don't
match.

 - fts_build_mail() indexes a single mail. It parses the message and
returns the data in small blocks. For text/* and message/rfc822 parts
those blocks are currently sent to the FTS backend. This is where I think
you should hook in your attachment parsing. Change
fts_build_want_index_part() to look for more content types that you're
interested in, and then, before feeding the blocks to the FTS backend,
put them through your own converter function, something like:

int attachment_extract_text(struct attachment_extract_context *ctx,
                            const struct message_block *input,
                            struct message_block *output);
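
To make the converter idea a bit more concrete, here is a rough,
self-contained sketch. Everything in it beyond the function prototype is
an assumption for illustration: the struct message_block below is a
simplified stand-in (only data and size), not Dovecot's real definition;
the fields of struct attachment_extract_context, the
want_index_content_type() helper and the "application/x-example" type are
made up; and the "conversion" just drops non-printable bytes where a real
converter would actually parse the attachment format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for Dovecot's struct message_block (assumption:
   only the data pointer and size are modeled here). */
struct message_block {
	const unsigned char *data;
	size_t size;
};

/* Hypothetical per-attachment converter state. */
struct attachment_extract_context {
	const char *content_type;  /* e.g. "application/x-example" */
	unsigned char buf[4096];   /* converted text for the current block */
};

/* Hypothetical version of the check that fts_build_want_index_part()
   would be extended with: text parts and message/rfc822 as before, plus
   one made-up attachment type. The comparison is case-sensitive to keep
   the sketch short; real MIME types are case-insensitive. */
int want_index_content_type(const char *content_type)
{
	return strncmp(content_type, "text/", 5) == 0 ||
		strcmp(content_type, "message/rfc822") == 0 ||
		strcmp(content_type, "application/x-example") == 0;
}

/* Convert one input block to text. Returns 0 on success, -1 if the
   block doesn't fit into the context's buffer. The toy conversion keeps
   printable ASCII, newlines and tabs, and drops every other byte. */
int attachment_extract_text(struct attachment_extract_context *ctx,
			    const struct message_block *input,
			    struct message_block *output)
{
	size_t i, out = 0;

	assert(ctx != NULL && input != NULL && output != NULL);
	if (input->size > sizeof(ctx->buf))
		return -1;

	for (i = 0; i < input->size; i++) {
		unsigned char c = input->data[i];

		if ((c >= 0x20 && c < 0x7f) || c == '\n' || c == '\t')
			ctx->buf[out++] = c;
	}
	/* The output block points into ctx->buf, so it is valid only
	   until the next call with the same context. */
	output->data = ctx->buf;
	output->size = out;
	return 0;
}
```

The output block would then be what gets fed to the FTS backend instead
of the raw input block. Keeping the buffer in the context means the
caller doesn't have to allocate or free anything per block.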

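The "index a few messages at a time" behaviour described for
fts_mailbox_search_next_nonblock() can be sketched roughly like this.
Only FTS_SEARCH_NONBLOCK_COUNT and its value come from the description
above; struct build_state, index_one_mail() and build_more() are
hypothetical names, and the real code tracks much more state than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FTS_SEARCH_NONBLOCK_COUNT 50

/* Hypothetical indexing state: which message sequences are still left. */
struct build_state {
	uint32_t next_seq;  /* next message sequence to index */
	uint32_t last_seq;  /* last message sequence in the mailbox */
	uint32_t indexed;   /* number of messages indexed so far */
};

/* Stand-in for fts_build_mail(): here it only counts the message. */
static void index_one_mail(struct build_state *st, uint32_t seq)
{
	(void)seq;
	st->indexed++;
}

/* Index at most FTS_SEARCH_NONBLOCK_COUNT messages and return, so the
   caller can handle other commands in between. Returns true once
   everything is indexed, false if it should be called again. */
bool build_more(struct build_state *st)
{
	unsigned int n;

	assert(st != NULL);
	for (n = 0; n < FTS_SEARCH_NONBLOCK_COUNT; n++) {
		if (st->next_seq > st->last_seq)
			return true;
		index_one_mail(st, st->next_seq++);
	}
	return st->next_seq > st->last_seq;
}
```

With 120 unindexed messages this takes three calls: two that index 50
messages each and return false, and a third that indexes the remaining
20 and returns true. Basing the batch on bytes instead would mean
replacing the counter n with a running total of message sizes.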
