[Dovecot] deploying dspam

Tom Allison tallison at tacocat.net
Thu Dec 16 13:13:36 EET 2004


Mark E. Mallett wrote:
> On Thu, Dec 16, 2004 at 09:58:53AM +1100, Curtis Maloney wrote:
> 
>>It never came across to me that you were wanting something specific with 
>>dspam... more that you wanted an explicit trigger for when a user decided 
>>something was/wasn't SPAM.  And I, personally, love the idea.
> 
> 
> I'd still like to see more general hooks on moving into and out of
> folders, or ways to "redeliver" email, or folders that could act as
> pipes, e.g. as mentioned in this thread:
> 
>    http://www.dovecot.org/list/dovecot/2003-July/001973.html
> 
> mm

Here's how I use training with dovecot.  It's hardly related to dovecot, 
but since we've strayed this far, I thought I would try something that 
might bring it back around.

bogofilter tests each incoming message without updating its database.  
This keeps the database smaller, and since it doesn't change, I believe 
it stays cached.

bogofilter sorts mail into three categories: (H)am, (U)nsure, (S)pam.

Ham is copied into a folder, "Ham", and delivered as usual.
Unsure is copied into a folder, "Unsure", and delivered as usual.
Spam is delivered into a folder, "Spam".
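To make the three-way split concrete, here is a minimal Python sketch of the routing step. It assumes bogofilter is run in test-only mode (no -u, so the wordlist is not updated) and that its exit status is used as the verdict: 0 for spam, 1 for ham, 2 for unsure and 3 for an error, as the bogofilter man page documents (check your version). It also reads the description above as: spam is filed only in the Spam folder, not delivered normally. The Verdict, verdict_from_exit_status and route names are hypothetical, and the actual copying and delivery is left to the MDA.

```python
from enum import Enum
from typing import Tuple


class Verdict(Enum):
    HAM = "H"
    UNSURE = "U"
    SPAM = "S"


# Assumed bogofilter exit statuses: 0 = spam, 1 = ham, 2 = unsure (3 = error).
_EXIT_STATUS = {0: Verdict.SPAM, 1: Verdict.HAM, 2: Verdict.UNSURE}


def verdict_from_exit_status(status: int) -> Verdict:
    """Map a bogofilter exit status to a verdict; anything else is an error."""
    try:
        return _EXIT_STATUS[status]
    except KeyError:
        raise ValueError(f"bogofilter failed with exit status {status}") from None


def route(verdict: Verdict) -> Tuple[str, bool]:
    """Return (folder to file the message in, whether to also deliver as usual)."""
    if verdict is Verdict.HAM:
        return ("Ham", True)
    if verdict is Verdict.UNSURE:
        return ("Unsure", True)
    # Spam only goes into the Spam folder; it is not delivered normally.
    return ("Spam", False)
```

For example, route(verdict_from_exit_status(2)) gives ("Unsure", True): a copy goes to Unsure and the message is still delivered.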

The rest is done through crontabs.

crontab: All email in Ham and Spam that is more than 4 days old is 
automatically moved out of the IMAP system (into an mbox, actually, but 
one that is no longer accessible over IMAP).
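The age check that first cron job performs can be sketched as below. This is a hypothetical helper, not the actual script: it assumes each message in Ham or Spam has already been paired with the time it arrived (however that is obtained, e.g. from its Date: header), and it only selects what should be moved; moving it out of the IMAP-visible mbox is not shown.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple

# Messages older than this are moved out of Ham and Spam.
MAX_AGE = timedelta(days=4)


def expired(messages: Iterable[Tuple[str, datetime]],
            now: datetime,
            max_age: timedelta = MAX_AGE) -> List[str]:
    """Return the IDs of messages that arrived more than max_age before now."""
    return [msg_id for msg_id, received in messages if now - received > max_age]
```

Passing `now` in explicitly, rather than calling datetime.now() inside, keeps the check easy to test and lets the cron job use one consistent timestamp for the whole run.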

the human: moves messages from Ham/Spam/Unsure into separate folders, 
NewHam and NewSpam.

crontab: All email in NewHam and NewSpam is checked for learning.  If a 
message's bogofilter score (H/U/S) doesn't match the folder it was placed 
in, it is used for training.  In other words, if the score is Unsure or 
Ham and the message is in NewSpam, then $score != $folder and it is used 
for retraining.
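The NewHam/NewSpam check boils down to comparing the score letter with the folder. Here is a minimal sketch, with hypothetical names, that returns which bogofilter registration flag to use: -s registers a message as spam and -n registers it as ham. Plain -s/-n should be enough in this setup because the original classification never updated the wordlist, so there is no earlier registration to undo first.

```python
from typing import Optional

# Folder the human dropped the message into -> the score it should have had.
FOLDER_CLASS = {"NewHam": "H", "NewSpam": "S"}


def training_action(score: str, folder: str) -> Optional[str]:
    """Return the bogofilter flag to train with, or None if no retraining is needed.

    score is bogofilter's verdict letter: "H", "U" or "S".
    """
    expected = FOLDER_CLASS[folder]
    if score == expected:
        return None  # $score == $folder: nothing to learn
    # Mismatch (including Unsure): register the message as the folder's class.
    return "-s" if expected == "S" else "-n"
```

So an Unsure or Ham message found in NewSpam yields "-s", while a Spam message found in NewSpam is skipped.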

I like this method because the crontabs can be run at night when the 
load is small.

If you trigger training based on a mail copy, what happens when someone 
dumps 400 emails into a folder all at once?  What happens when 30 people 
do this all at the same time?  On a smaller system, having this happen 
at peak hours might not be acceptable.

I would prefer to implement a system where you can queue up training in 
large numbers, but the actual training is done in a managed way.  Over 
time, the amount of training that actually occurs drops to fewer than 
one message per week, so it's not time-critical.  At first, I ran it 
hourly.  Now I run it at midnight only.  But on a large system, I would 
never deploy anything without an initial wordlist to provide some 
filtering, which would also make hourly jobs unnecessary.
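One way to get the queue-then-train behaviour described above is to make the trigger cheap and the training bounded. In this sketch (the class and parameter names are hypothetical), dumping 400 messages into a folder only appends 400 entries to an in-memory queue, and a cron job later calls drain() with a cap so that no single run does more than `limit` training operations. The `train` callback stands in for whatever actually invokes bogofilter; a real deployment would also need the queue to survive restarts, for example by persisting it to disk.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class TrainingQueue:
    """Accepts training requests at any time; trains only when drain() is called."""

    def __init__(self) -> None:
        self._pending: Deque[Tuple[str, str]] = deque()

    def enqueue(self, message_id: str, flag: str) -> None:
        # Cheap: just remember the request; no wordlist work happens here.
        self._pending.append((message_id, flag))

    def drain(self, train: Callable[[str, str], None], limit: int) -> int:
        """Train on at most `limit` queued requests, oldest first; return the count."""
        done = 0
        while self._pending and done < limit:
            message_id, flag = self._pending.popleft()
            train(message_id, flag)
            done += 1
        return done

    def __len__(self) -> int:
        return len(self._pending)
```

With 400 queued requests and limit=100, one nightly run trains on the oldest 100 and leaves 300 for later runs, so a burst of copies never turns into a burst of training.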

So where does dovecot fall into all of this?

I don't know.  I really can't make an argument for doing anything to an 
IMAP server that would help with any of this without also creating 
potential problems.  Dumping mail into pipes would lead to an 
unrecoverable condition after a human error (mail dropped into the wrong 
pipe).

Perhaps the only thing to ask is whether moving email around through the 
file system will really screw up the dovecot indexes.  Sometimes dovecot 
reports some pretty strange message counts for these folders.


