bogofilter satisfaction report

JoeHill joehill-rieW9WUcm8FFJ04o6PK0Fg at public.gmane.org
Wed Apr 27 23:35:54 UTC 2005


On Thu, 28 Apr 2005 01:48:33 +0300 (IDT)
Peter disseminated the following:

> plp@plp:~$ man bogofilter|wc -l
> 625

Ya, that was what put me off at first, but then when I started seeing 10 - 20
spam/day...

> In the end, it was painless.

Running a couple of commands to train Bogofilter is about as painless as it can 
get compared to trying to fight spam with RegExp, eh?

> > A few spam are still getting past Bogofilter, but they're getting caught by
> > my existing Procmail rules (no HTML, no Outlook, stuff like that).
> 
> It will get better. Also look at the score of spam that 'passes'. That 
> will give you a hint on how low you can dare to push the spam threshold 
> in bogofilter.conf. The lower you push it the more drastic the filter, 
> but it may start reaping legitimate emails.

Hmmmm, not seeing any scoring in the headers of the ones that got through. Back 
to the manpage ;-)
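
A likely reason, as far as I can tell from the manpage: when bogofilter is
called as a Procmail condition ('* ? bogofilter') it only hands back an exit
status, and the X-Bogosity header gets written only when the mail is piped
through it in passthrough mode (-p). Once the header is there, something like
the rough Python sketch below could pull the verdict and score out of it. The
header layout in the regex and the sample message are assumptions (the version
string is a placeholder), so check them against real mail before relying on it.

```python
import re
from email import message_from_string

# Assumed layout: "<Verdict>, tests=bogofilter, spamicity=<float>, version=..."
# This is a guess at the default format; adjust it to what your mail shows.
BOGOSITY_RE = re.compile(r"^(?P<verdict>\w+),.*?spamicity=(?P<score>[0-9.]+)")


def bogosity(raw_message):
    """Return (verdict, score) from X-Bogosity, or None if absent/unparseable."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Bogosity")
    if header is None:
        return None
    match = BOGOSITY_RE.match(header.strip())
    if match is None:
        return None
    return match.group("verdict"), float(match.group("score"))


# Hypothetical sample message, made up for illustration only.
sample = (
    "From: someone@example.com\n"
    "Subject: hello\n"
    "X-Bogosity: Unsure, tests=bogofilter, spamicity=0.520000, version=x.y.z\n"
    "\n"
    "body\n"
)

print(bogosity(sample))
```

Collecting the scores of the spam that slipped through this way would give
the hint Peter mentions about how low the threshold can safely go.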

> > Now, I called Bogo from Procmail as in the example from the manpage:
> >
> > :0HB:
> > * ? bogofilter
> > $MAILDIR/bogospam
> >
> > I'm curious, do you use the -u switch? In the FAQ, it seems to indicate that
> > this can be dangerous if one does not keep an eye on things.
> 
> I have the -u switch on. Never gave me a hard time. Since no spam ever 
> gets deleted (I collect it and delete it periodically) there is no risk 
> of losing data.

This is the only part I don't get (probably because I have not done enough
reading yet). The mail Bogofilter sends to ~/mail/bogospam, how the heck do I
pick it apart and see which mail is which, let alone separate it out for further
training?? I scan my Procmail log periodically to check how things are going,
but if I see a false positive...

Okay, I'll go do more reading.
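
One way to pick the file apart, assuming ~/mail/bogospam is an ordinary mbox
(which is what a Procmail delivery to a plain file normally produces): Python's
standard mailbox module can walk it message by message. The function names
below are made up for this sketch, and the path is just the one from my setup.

```python
import mailbox
import os


def list_mbox(path):
    """Return (index, sender, subject, bogosity) for every message in an mbox."""
    box = mailbox.mbox(path, create=False)
    try:
        rows = []
        for index, msg in enumerate(box):
            rows.append((
                index,
                msg.get("From", "?"),
                msg.get("Subject", "?"),
                msg.get("X-Bogosity", "(none)"),
            ))
        return rows
    finally:
        box.close()


def export_message(path, index, out_path):
    """Copy message number `index` into its own file, e.g. for retraining."""
    box = mailbox.mbox(path, create=False)
    try:
        keys = box.keys()
        with open(out_path, "wb") as out:
            out.write(box.get_bytes(keys[index]))
    finally:
        box.close()


if __name__ == "__main__":
    spam_box = os.path.expanduser("~/mail/bogospam")  # assumed location
    if os.path.exists(spam_box):
        for index, sender, subject, verdict in list_mbox(spam_box):
            print(index, sender, subject, verdict, sep=" | ")
```

A false positive exported this way could then be fed back with something like
'bogofilter -Sn < exported-file', which, as I read the manpage, unregisters it
as spam and registers it as non-spam (the mirror image of the -Ns Peter
mentions for spam that got through).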

> With -u bogofilter trains itself all the time. No matter what the spammers
> come up with bogofilter will learn (from you, when you send it the passed
> spam with -Ns) and keep reinforcing what it learned with every extra spam in
> the same Bayesian category. I wonder if it uses a hmm. I will have to pick it
> apart, it is intriguing.

I have no idea what 'hmm' is, but I'll take a wild guess: the 'h' is for
'heuristic'?
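
To get a feel for the "trains itself" idea quoted above, here is a toy
word-count Bayesian filter in Python. It is NOT bogofilter's actual algorithm,
just a minimal sketch of the general technique; the class name, the smoothing,
and the 0.9 cutoff are all assumptions made for illustration.

```python
import math
from collections import Counter


class ToyBayes:
    """Toy Bayesian word filter; an illustration, not bogofilter's method."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count each distinct word once per message.
        self.counts[label].update(set(text.lower().split()))
        self.messages[label] += 1

    def spamicity(self, text):
        # Sum per-word log odds, with add-one smoothing for unseen words.
        log_odds = 0.0
        for word in set(text.lower().split()):
            p_spam = (self.counts["spam"][word] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][word] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # keep exp() in range
        return 1.0 / (1.0 + math.exp(-log_odds))

    def classify_and_learn(self, text, cutoff=0.9):
        # Rough analogue of autoupdate: the verdict is fed straight back in.
        label = "spam" if self.spamicity(text) >= cutoff else "ham"
        self.train(text, label)
        return label


filt = ToyBayes()
filt.train("cheap pills online", "spam")
filt.train("cheap watches online", "spam")
filt.train("meeting notes attached", "ham")
filt.train("lunch meeting tomorrow", "ham")
print(filt.spamicity("cheap pills"), filt.spamicity("meeting tomorrow"))
```

Because classify_and_learn() trains on its own verdict, a wrong call
reinforces itself until somebody corrects it, which is presumably why the FAQ
says to keep an eye on things when running with -u.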

-- 
JoeHill / RLU #282046 / www.freeyourmachine.org
19:25:47 up 65 days, 20:34, 7 users, load average: 0.03, 0.03, 0.00
+++++++++++++++++++++++++++
"Superstition, idolatry, and hypocrisy have ample wages, but truth goes
a-begging." -- Martin Luther 
--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml




