Complement set email filtering
Complement Set Filtering (CSF) is a method for filtering unsolicited bulk
email (UBE, or spam). The technique uses at least two email accounts: a
primary account, which receives both spam and non-spam, and one or more
secondary accounts, which receive only spam. CSF identifies the email messages
contained in both the primary and secondary email sets (their intersection)
and removes them from the primary set. What remains is the set-theoretic
difference between the primary and secondary sets.
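The set relationship can be written out explicitly. In the notation below, $P$ (the messages in the primary account) and $S$ (the messages in the secondary accounts) are symbols introduced here for illustration and do not appear in the original description; "equal" should be read loosely as "substantially similar".

```latex
\begin{align*}
  \text{flagged as spam} &= P \cap S \\
  \text{kept in the primary account} &= P \setminus (P \cap S) = P \setminus S
\end{align*}
```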
Implementation
CSF is implemented by comparing message content in a UBE account (a separate
mailbox or alias) with message content in a primary account. By definition,
messages contained in the UBE account are spam, so messages in the primary
account that are substantially similar to messages in the UBE account are also
spam. When the same message is found in both the primary account and the UBE
account, it is deleted from the primary account.
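The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes messages are available as plain-text bodies, and it treats two messages as "the same" when their whitespace- and case-normalized bodies hash to the same value. The names `fingerprint`, `filter_primary`, `primary_inbox` and `ube_inbox` are hypothetical.

```python
import hashlib
import re


def fingerprint(body: str) -> str:
    """Hash a message body after collapsing whitespace and lowercasing.

    Assumption: exact match on the normalized body stands in for
    "the same message"; a real filter would likely be more tolerant.
    """
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_primary(primary: list[str], ube: list[str]) -> list[str]:
    """Return the primary messages that do not also appear in the UBE account."""
    spam_prints = {fingerprint(body) for body in ube}
    return [body for body in primary if fingerprint(body) not in spam_prints]


# Hypothetical sample data
primary_inbox = [
    "Quarterly report attached.",
    "Buy cheap watches   NOW",
    "Lunch on Friday?",
]
ube_inbox = ["buy cheap watches now", "You have won a prize"]

kept = filter_primary(primary_inbox, ube_inbox)
print(kept)
```

Hashing keeps the lookup cheap when the UBE account is large, but it only catches messages that are identical after normalization.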
The UBE account is established by creating a mailbox (or alias) that combines
a common first name (to help spammers guess the address) with the domain of the
primary account, then exposing the UBE account to the internet. For example, if
the primary mailbox is johnm@domain.com, the UBE account might be john@domain.com
(see diagram below). After the UBE mailbox is set up, its address is exposed
to spammers by posting it to message boards, portal groups, WHOIS listings,
e-commerce sites and Usenet.
![Complement Set Email Filtering](./modules/Online_Advertising-MM/images/CSet0000.png)
CSF works especially well in corporate environments where the domain is
targeted by spammers and UBE tends to be very similar from mailbox to mailbox.
Also, because CSF does not depend on characteristics of past UBE to identify
current UBE, it is particularly well suited to identifying UBE with new subject
matter.
Advantages of CSF
Many spam-filtering techniques search for patterns and known spam subject
matter in the headers and bodies of messages. Others use probabilities (Bayesian
statistical methods, for example) to identify unwanted messages. CSF can be
used as a standalone filter or combined with other techniques.
CSF has at least three advantages over Bayesian and pattern analysis
algorithms. First, CSF does not depend on content analysis other than what is
required to find similarities between messages in the primary and UBE accounts.
Second, CSF does not use scoring (word ranking), which can be circumvented with
message obfuscation (V!agra instead of Viagra, for example). Third, CSF takes
advantage of the fact that most UBE contains identical message content,
particularly messages targeted at specific corporate domains.
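To illustrate why single-character obfuscation matters less when whole messages are compared, the sketch below uses Python's standard `difflib.SequenceMatcher` to test whether a primary message is substantially similar to any message in the UBE account. The similarity measure and the 0.9 threshold are assumptions chosen for this example; the source does not specify how similarity is computed. The function names and sample messages are hypothetical.

```python
from difflib import SequenceMatcher


def is_substantially_similar(a: str, b: str, threshold: float = 0.9) -> bool:
    """Compare two message bodies character by character, ignoring case.

    Assumption: a ratio at or above `threshold` counts as "substantially
    similar"; 0.9 is an arbitrary illustrative value.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def filter_similar(primary: list[str], ube: list[str],
                   threshold: float = 0.9) -> list[str]:
    """Keep only primary messages that resemble no message in the UBE account."""
    return [
        message
        for message in primary
        if not any(is_substantially_similar(message, spam, threshold)
                   for spam in ube)
    ]


# Hypothetical sample data: the primary copy is obfuscated, the UBE copy is not
ube_messages = ["Cheap Viagra available today, order now"]
primary_messages = [
    "Cheap V!agra available today, order now",
    "Meeting moved to 3pm, see agenda",
]

print(filter_similar(primary_messages, ube_messages))
```

Here the obfuscated message differs from the UBE copy by one character, so it is still matched and removed, while the unrelated message is kept. Pairwise comparison grows with the product of the two mailbox sizes, so a practical filter would probably need a cheaper pre-filter.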
This guide is licensed under the GNU Free Documentation License. It uses
material from Wikipedia.