bogofilter satisfaction report
JoeHill
joehill-rieW9WUcm8FFJ04o6PK0Fg at public.gmane.org
Wed Apr 27 23:35:54 UTC 2005
On Thu, 28 Apr 2005 01:48:33 +0300 (IDT)
Peter disseminated the following:
> plp at plp:~$ man bogofilter|wc -l
> 625
Ya, that was what put me off at first, but then when I started seeing 10 - 20
spam/day...
> In the end, it was painless.
Running a couple of commands to train Bogofilter is about as painless as it can
get compared to trying to fight spam with RegExp, eh?
> > A few spam are still getting past Bogofilter, but they're getting caught
> > by my existing Procmail rules (no HTML, no Outlook, stuff like that).
>
> It will get better. Also look at the score of spam that 'passes'. That
> will give you a hint on how low you can dare to push the spam threshold
> in bogofilter.conf. The lower you push it the more drastic the filter,
> but it may start reaping legitimate emails.
Hmmmm, not seeing any scoring in the headers of the ones that got through. Back
to the manpage ;-)
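
For what it's worth, a rough sketch of how one might pull the score out of a
message once the header is there. As far as I can tell from the manpage, the
X-Bogosity header only shows up when bogofilter runs in passthrough mode (-p);
a recipe that only tests the exit code would not add it. The header layout in
the comment and the function name bogosity() are my assumptions for
illustration, so check them against what your own install writes.

```python
import email
import re

# Assumed header shape (only present when bogofilter is run with -p):
#   X-Bogosity: Ham, tests=bogofilter, spamicity=0.312345, version=...
SPAMICITY_RE = re.compile(r"spamicity=([0-9]*\.?[0-9]+)")


def bogosity(raw_message):
    """Return (label, spamicity) from X-Bogosity, or None if the header is absent."""
    msg = email.message_from_string(raw_message)
    header = msg.get("X-Bogosity")
    if header is None:
        return None
    # The label is whatever precedes the first comma, e.g. "Ham" or "Spam".
    label = header.split(",", 1)[0].strip()
    match = SPAMICITY_RE.search(header)
    score = float(match.group(1)) if match else None
    return label, score
```

Running that over the spam that slipped through would give the list of scores
Peter is talking about, which is what you would compare against the threshold
in bogofilter.conf.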
> > Now, I called Bogo from Procmail as in the example from the manpage:
> >
> > :0HB:
> > * ? bogofilter
> > $MAILDIR/bogospam
> >
> > I'm curious, do you use the -u switch? In the FAQ, it seems to indicate that
> > this can be dangerous if one does not keep an eye on things.
>
> I have the -u switch on. Never gave me a hard time. Since no spam ever
> gets deleted (I collect it and delete it periodically) there is no risk
> of losing data.
This is the only part I don't get (probably because I have not done enough
reading yet). The mail Bogofilter sends to ~/mail/bogospam: how the heck do I
pick it apart and see which mail is which, let alone separate it out for
further training? I scan my Procmail log periodically to check how things are
going, but if I see a false positive...
Okay, I'll go do more reading.
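
One way to pick the folder apart, sketched with Python's standard mailbox
module. This assumes ~/mail/bogospam is an ordinary mbox file, which is what a
Procmail recipe delivering to a plain filename normally produces. The function
names list_messages() and extract_message() are made up for this example.

```python
import mailbox


def list_messages(mbox_path):
    """Return (index, sender, subject) for each message in an mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [
            (i, msg.get("From", "?"), msg.get("Subject", "(no subject)"))
            for i, msg in enumerate(box)
        ]
    finally:
        box.close()


def extract_message(mbox_path, index, out_path):
    """Write the index-th message to its own file, e.g. for retraining."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        keys = box.keys()
        with open(out_path, "wb") as out:
            # get_bytes() returns the message without the mbox "From " line.
            out.write(box.get_bytes(keys[index]))
    finally:
        box.close()
```

A false positive pulled out this way could then be fed back to bogofilter on
standard input with the correction switches (the reverse of the -Ns Peter
mentions below; check the manpage for the exact pair).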
> With -u bogofilter trains itself all the time. No matter what the spammers
> come up with bogofilter will learn (from you, when you send it the passed
> spam with -Ns) and keep reinforcing what it learned with every extra spam in
> the same Bayesian category. I wonder if it uses a hmm. I will have to pick it
> apart, it is intriguing.
I have no idea what 'hmm' is, but I'll take a wild guess: the 'h' is for
'heuristic'?
--
JoeHill / RLU #282046 / www.freeyourmachine.org
19:25:47 up 65 days, 20:34, 7 users, load average: 0.03, 0.03, 0.00
+++++++++++++++++++++++++++
"Superstition, idolatry, and hypocrisy have ample wages, but truth goes
a-begging." -- Martin Luther
--
The Toronto Linux Users Group. Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml