Sieve: Saving "pristine" messages for backups and spam training

Ben Johnson ben at indietorrent.org
Mon Aug 11 21:52:35 UTC 2014


On 8/11/2014 11:42 AM, Jeff Rice wrote:
> Hello,
> I'm trying to work out a way to have my Sieve filter save a "pristine"
> version of email messages as a backup, primarily to use for training the
> spam filter.  What I would like is to have every message saved into a single,
> site-wide directory (in the global sieve) before being processed
> additionally and delivered.  The messages in that directory will be used
> to train the spam filter without having to worry about removing
> SpamAssassin headers and so forth.

Provided I understand you correctly, my first thought is that saving a
duplicate copy of every single message that arrives on this system seems
wasteful.

Why not save only the messages that would actually be useful for spam
training purposes?

> 
> I thought fileinto :copy might do what I wanted, but this creates a
> backup directory individually for each user.  That's unmanageable for
> the spam training process I use. redirect *could* work, but that adds a
> header during the process so the email saved would not be "pristine".
> 
> I'm thinking of using the extprograms plugin to pipe to a program that
> will do a simple copy.  That feels very hackish, however, and I'm hoping
> there is a more elegant solution.
> 

There is: the Dovecot Antispam plug-in. It does exactly what you
describe, and it avoids the problem of storing a duplicate copy of
all messages.
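
For comparison, the pipe-to-a-copy-program idea would boil down to
something like the rough sketch below. It is only an illustration, not
a tested setup: the function name save_pristine, the ".eml" suffix and
the idea of passing the target directory as the single argument are all
assumptions made up for this example.

```python
#!/usr/bin/env python3
"""Hypothetical copy program for a Sieve pipe action: saves stdin unmodified."""
import os
import socket
import sys
import time


def save_pristine(raw: bytes, corpus_dir: str) -> str:
    """Write the raw message bytes to a uniquely named file; return its path."""
    os.makedirs(corpus_dir, exist_ok=True)
    # Unique name built from timestamp, PID and host name (an assumed scheme).
    name = f"{time.time_ns()}.{os.getpid()}.{socket.gethostname()}"
    tmp_path = os.path.join(corpus_dir, "." + name + ".tmp")
    final_path = os.path.join(corpus_dir, name + ".eml")
    # Write to a hidden temporary file first, then rename, so a trainer
    # scanning the directory never sees a half-written message.
    with open(tmp_path, "wb") as fh:
        fh.write(raw)
    os.replace(tmp_path, final_path)
    return final_path


# Runs only when exactly one argument (the target directory) is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    # Read bytes, not text, so the message is stored exactly as received.
    save_pristine(sys.stdin.buffer.read(), sys.argv[1])
```

Note that this still stores a copy of every message at delivery time,
which is exactly the overhead the plug-in avoids.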

In short, when a user drags a message from any folder to "Junk", you'll
receive a "pristine" copy of the message at any local address you
specify, delivered to any folder you specify (e.g., "Train as SPAM")
within that "training user's" mailbox.

Conversely, when a user drags a message from "Junk" to any other folder,
you'll receive a copy of the message in your "Train as HAM" folder.

Then, you can point your anti-spam solution's training executable to
these two "pristine master corpus" folders.
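
As a rough sketch of that last step, assuming SpamAssassin's sa-learn
is the training executable and the training user's mail is stored in
Maildir++ layout (both are assumptions; adjust for your own setup), a
small wrapper could look like this. The function names and the example
path are hypothetical.

```python
import os
import subprocess


def build_training_commands(maildir_root,
                            spam_folder="Train as SPAM",
                            ham_folder="Train as HAM"):
    """Return one sa-learn command per message directory of each folder."""
    commands = []
    for flag, folder in (("--spam", spam_folder), ("--ham", ham_folder)):
        # Assumption: Maildir++ stores a folder as ".<name>" under the root.
        base = os.path.join(maildir_root, "." + folder)
        # Messages live in the cur/ and new/ subdirectories of a Maildir.
        for sub in ("cur", "new"):
            commands.append(["sa-learn", flag, os.path.join(base, sub)])
    return commands


def train(maildir_root):
    """Run every training command, stopping at the first failure."""
    for cmd in build_training_commands(maildir_root):
        subprocess.run(cmd, check=True)
```

Calling train("/home/trainer/Maildir") (a made-up path) from cron would
then feed both folders to the trainer.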

If you ever need to reclassify messages, or expunge them, doing so is
trivial with this master corpus approach.

> Am I missing something obvious here?
> 
> Thanks!
> Jeff

I'm happy to provide a sample script for the Antispam plug-in's mailtrain
back-end, as that's the one I use.

Cheers,

-Ben

