|Newer page:||version 3||Last edited on Wednesday, July 6, 2005 12:22:02 am||by AristotlePagaltzis|
|Older page:||version 1||Last edited on Sunday, August 10, 2003 3:14:49 pm||by PerryLorier|
Filtering based on statistics. For every document (email) that arrives, you look at each word in that document and see the probability that that word appears in previous SPAM or HAM[*] documents (emails). You then use a Naive Bayesian calculation to figure out the probability that the new document is SPAM or HAM. If it's SPAM, you put it into the SPAM folder.
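A minimal sketch of the steps above in Python. It assumes a crude lowercase word tokenizer, add-one (Laplace) smoothing so that unseen words don't zero everything out, and log-probabilities to avoid underflow on long messages. The names `NaiveBayesFilter`, `train` and `spam_probability` are hypothetical and not taken from any particular filter.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase words (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical toy filter: learns word counts from labelled mail, then scores new mail."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Count the words of a message already known to be 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        """Return P(spam | words), treating every word as independent (the 'naive' part)."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in self.LABELS:
            # Prior: how common this class was in training (add-one smoothed).
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing; the extra +1 leaves room for never-seen words,
                # so a single unknown word can't force the whole product to zero.
                count = self.word_counts[label][word]
                log_p += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_score[label] = log_p
        # Turn the two log scores into a normalised probability without overflow.
        top = max(log_score.values())
        spam = math.exp(log_score["spam"] - top)
        ham = math.exp(log_score["ham"] - top)
        return spam / (spam + ham)


if __name__ == "__main__":
    f = NaiveBayesFilter()
    f.train("cheap pills buy now", "spam")
    f.train("win money now", "spam")
    f.train("lunch meeting tomorrow", "ham")
    f.train("see you at the meeting", "ham")
    message = "buy cheap pills"
    # Assumed rule: anything scoring above 0.5 goes to the SPAM folder.
    folder = "SPAM" if f.spam_probability(message) > 0.5 else "Inbox"
    print(folder)
```

The 0.5 cut-off is an arbitrary choice for the sketch; a real filter would likely want a much higher spam probability before moving mail, since misfiling ham usually costs more than letting some spam through.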
It's called "Naive Bayesian" because it assumes that events (words) are independent, when they obviously are not. However, it works remarkably well, and attempts to make it "smarter" tend to end up with the error rate getting higher and higher. It's simple, fast, effective, wrong, and it actually works. Welcome to the glorious world of MachineLearning.
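Written out, the independence assumption is what lets the probability of a whole message be replaced by a product of per-word probabilities. This is the standard textbook form of the calculation, shown as an illustration rather than the exact formula any particular filter uses:

```latex
P(\text{spam} \mid w_1, \dots, w_n)
  = \frac{P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam})}
         {P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam})
          + P(\text{ham}) \prod_{i=1}^{n} P(w_i \mid \text{ham})}
```

The "wrong" step is replacing the true joint probability P(w_1, ..., w_n | spam) with the product of the individual P(w_i | spam) terms.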
[*] Ham, obviously, is not spam.