[Dovecot] Using MySQL to store email?

Eric S. Johansson esj at harvee.org
Wed Jun 7 23:58:44 EEST 2006


Jan Kundrát wrote:
> Marc Perkel wrote:
>> For example, a new message comes in and you find that sender matches
>> email in 100 people's spam folders and none in any other folder? It can
>> be classified as spam. If however the from address matches ham in people
>> folder and no spam then you can probably deliver it without spam scanning.
> 
> It's called auto-whitelisting and smart spam scanners should do that.
> 

Actually, auto white listing is any one of a number of techniques used
to eliminate false positives from "known parties".  I use one in camram
where anyone you send e-mail to is automatically white listed.  To
distinguish that from the often confusing auto white listing
terminology, I call it a "friends list".  It works exceedingly well, and
I haven't had any significant problems even when the site has been
infected with zombies.  With any automatic white listing tool, you need
the human feedback which says "this is spam".  That feedback enables
automatic removal of the entry from the auto white list and
blacklisting of the IP address the message came from (you did preserve
the source IP address as a new header in the message, didn't you?).
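
To make the friends list idea concrete, here is a minimal sketch.  It is
not camram's actual code: the class name, the method names, and the
in-memory sets are all made up for illustration, and a real system would
persist this state and read the source IP from the preserved header.

```python
from dataclasses import dataclass, field


@dataclass
class FriendsList:
    """Hypothetical in-memory friends list; not camram's implementation."""

    friends: set[str] = field(default_factory=set)
    bad_ips: set[str] = field(default_factory=set)

    @staticmethod
    def _normalize(address: str) -> str:
        # Compare addresses case-insensitively and ignore stray whitespace.
        return address.strip().lower()

    def record_outgoing(self, recipient: str) -> None:
        # Anyone the user sends mail to becomes a friend automatically.
        self.friends.add(self._normalize(recipient))

    def is_friend(self, sender: str) -> bool:
        return self._normalize(sender) in self.friends

    def report_spam(self, sender: str, source_ip: str) -> None:
        # Human feedback "this is spam": drop the sender from the list
        # and blacklist the IP address the message came from.
        self.friends.discard(self._normalize(sender))
        self.bad_ips.add(source_ip)
```

Typical use would be to call record_outgoing() for each recipient of
outbound mail, check is_friend() on inbound mail before any content
filtering, and call report_spam() when the user flags a message.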

The analysis technique originally suggested is classically naïve.  A
technique I'm playing with that appears to work much better is to use
the output of the content filter to predict whether a message is good or
bad.  All of the bad messages are placed into a dumpster and expired
after five days.  If a message is left in the dumpster, its source IP
address is listed as a "bad source".

Any message that passes the content filter, friends filter, or spam
filter is recorded as a "good source".  If the ratio of good source to
bad source drops below 80%, the site is listed as contaminated and its
messages are automatically dumped in the spam trap for human analysis.
If the ratio drops below 40%, it's listed as spam and all messages are
brown listed.
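
A rough sketch of the good/bad bookkeeping follows.  It assumes the
"ratio" means the share of good messages out of all messages recorded
for a source IP, which is one reading of the percentages above; the
class, method names, and the "unknown"/"clean" labels are hypothetical.

```python
from collections import defaultdict

# Thresholds from the description above, applied to good / (good + bad).
CONTAMINATED_BELOW = 0.80
SPAM_BELOW = 0.40


class SourceReputation:
    """Hypothetical per-IP counters for good and bad messages."""

    def __init__(self) -> None:
        self.good: dict[str, int] = defaultdict(int)
        self.bad: dict[str, int] = defaultdict(int)

    def record_good(self, ip: str) -> None:
        # Message passed one of the filters.
        self.good[ip] += 1

    def record_bad(self, ip: str) -> None:
        # Message was left in the dumpster.
        self.bad[ip] += 1

    def classify(self, ip: str) -> str:
        good = self.good.get(ip, 0)
        bad = self.bad.get(ip, 0)
        total = good + bad
        if total == 0:
            return "unknown"  # no history for this source yet
        share = good / total
        if share < SPAM_BELOW:
            return "spam"
        if share < CONTAMINATED_BELOW:
            return "contaminated"  # route to the spam trap for a human
        return "clean"
```

With very few messages from a source the share swings wildly, so a real
deployment would probably want a minimum message count before acting on
the classification.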

The main downside of this technique is that it does increase the
workload for the user (more content in the spam trap), and it does seem
to work better if you have multiple sources for feeding the good/bad
ratio analysis.

My two cents' worth.

