Markovian discrimination
Markovian discrimination in spam filtering is a method used in
CRM114
and other spam filters to model the statistical behavior of spam and
nonspam more accurately than simple
Bayesian methods do. A simple Bayesian model of written text contains
only the dictionary of legal words and their relative probabilities. A
Markovian model adds the relative transition probabilities: given
one word, they predict what the next word will be. It is based on the theory
of Markov chains developed by Andrei Markov, hence the name. In essence, a Bayesian filter works
on single words alone, while a Markovian filter works on phrases or
entire sentences.
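The difference between the two models can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not code from CRM114 or any other filter: the function names `unigram_probs` and `transition_probs` and the sample token list are hypothetical, and tokens are produced by splitting on whitespace.

```python
from collections import Counter, defaultdict

def unigram_probs(tokens):
    """Simple Bayesian view: relative probability of each word on its own."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def transition_probs(tokens):
    """Markovian view: probability of the next word given the current word."""
    followers = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        followers[current][following] += 1
    return {
        current: {w: n / sum(nexts.values()) for w, n in nexts.items()}
        for current, nexts in followers.items()
    }

# Hypothetical training text, used only to exercise the two functions.
tokens = "buy cheap pills now buy cheap watches now".split()
uni = unigram_probs(tokens)
trans = transition_probs(tokens)

print(uni["buy"])      # share of all tokens that are "buy"
print(trans["cheap"])  # distribution over words seen after "cheap"
```

The unigram table records only how common each word is, while the transition table also records word order, which is the extra information a Markovian filter uses.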
There are two types of Markov models: the visible Markov model and the
hidden Markov model, or HMM. The difference is that in a visible Markov
model the current word is considered to contain the entire state of the
language model, while a hidden Markov model hides the state and presumes only
that the current word is probabilistically related to the actual internal state
of the language.
For example, in a visible Markov model the word "the" should predict with
accuracy the following word, while in a hidden Markov model the entire prior
text implies the actual state and predicts the following words, but does not
actually guarantee that state or prediction. Since the latter case is what is
encountered in spam filtering, hidden Markov models are almost always used.
Because of storage limitations, the specific type of hidden Markov
model called a
Markov random field is particularly applicable, usually with a clique size
of between four and six tokens.
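To make the idea of a bounded clique more concrete, the following sketch slides a window of a few tokens over a message and emits every phrase that ends at the current token and fits inside the window. It is a simplified assumption about how such features might be generated, not the actual feature scheme of CRM114 or any other filter; the name `window_features`, the `window` parameter, and the sample message are hypothetical.

```python
def window_features(tokens, window=5):
    """Emit every contiguous phrase of at most `window` tokens.

    Phrases are grouped by the token they end on, so each token is seen
    together with up to window - 1 tokens of preceding context.
    """
    features = []
    for end in range(1, len(tokens) + 1):
        start = max(0, end - window)
        # Every phrase that ends at this token and fits in the window.
        for begin in range(start, end):
            features.append(" ".join(tokens[begin:end]))
    return features

# Hypothetical message, used only to exercise the function.
message = "click here to claim your free prize".split()
feats = window_features(message, window=4)

for feature in feats[:6]:
    print(feature)
```

Bounding the window keeps the number of features per token fixed, which is one way to read the storage argument above: a filter would then keep spam and nonspam statistics per phrase rather than per word.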
This guide is licensed under the GNU
Free Documentation License. It uses material from the Wikipedia.