Word salad
Word salad is a mixture of seemingly meaningful words that together signify
nothing; the phrase draws its name from the positive symptom of psychosis of
the same name (see Word salad (mental health)). When applied to a physical
theory, "word salad" is a derogatory description that labels the theory as
senseless or utterly devoid of meaning.
In the context of computer science and linguistics, explicitly constructed word
salad is a tool for demonstrating the difference between random utterance and
coherent expression of thought. Software such as Dissociated Press within Emacs
demonstrates the construction of interesting-but-meaningless word salad from
large samples of coherent language, by constructing new, random documents that
share some of the same word or letter clustering properties as the language
sample. These word salads appear as natural language to the inattentive eye or
ear, but are clearly meaningless when read or listened to with full attention.
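The idea of building random text that keeps the word clustering of a sample can
be sketched with a small word-level Markov chain. This is an illustrative
approximation, not the actual Dissociated Press implementation in Emacs; the
function names build_chain and generate and the order parameter are assumptions
made for this example.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(chain, length=30, seed=None):
    """Walk the chain at random and return `length` words of word salad."""
    if not chain:
        raise ValueError("sample text is too short for the chosen order")
    rng = random.Random(seed)
    keys = sorted(chain)  # sorted so that a fixed seed gives a fixed result
    state = rng.choice(keys)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if followers:
            out.append(rng.choice(followers))
            state = tuple(out[-len(state):])
        else:
            # Dead end (this run only occurs at the end of the sample): restart.
            state = rng.choice(keys)
            out.extend(state)
    return " ".join(out[:length])


sample = "the cat sat on the mat and the dog sat on the rug by the door"
print(generate(build_chain(sample, order=1), length=12, seed=1))
```

A higher order copies longer runs from the sample and reads more naturally; a
lower order produces text that is more obviously scrambled.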
In the 21st century, e-mail spammers have begun using word salad construction
as a way to elude e-mail filtering.
In spam e-mail
In response to the growing problem of spam e-mail, filtering tools became
available starting around 2002 which implemented a widely employed method known
as the naive Bayes classifier. This method uses the probability of various
words appearing in spam e-mails to automatically classify incoming messages.
For a short time, this worked fairly well to classify e-mails as probable spam.
In response, spammers developed word salad to fool programs employing this
method of classification. By adding large amounts of random text somewhere in
their message, spammers hope to confuse Bayesian classifiers into classifying
the message as "ham" (non-spam e-mail). Typically, this text contains random
words from a dictionary.
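The dilution effect can be illustrated with a toy naive Bayes scorer. This is a
minimal sketch under stated assumptions, not the method of any particular
filter: the training messages are invented, the names train and spam_log_odds
are hypothetical, add-one smoothing is used, and words never seen in training
are simply ignored.

```python
import math
from collections import Counter


def train(messages):
    """Count words per label. `messages` is a list of (text, label) pairs."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for text, label in messages:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals


def spam_log_odds(text, counts, totals):
    """Return log P(spam|text) - log P(ham|text); positive leans towards spam."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    n_spam = sum(counts["spam"].values())
    n_ham = sum(counts["ham"].values())
    score = math.log(totals["spam"] / totals["ham"])  # prior odds
    for word in text.lower().split():
        if word not in vocab:
            continue  # assumption: unseen words carry no evidence
        p_spam = (counts["spam"][word] + 1) / (n_spam + len(vocab))
        p_ham = (counts["ham"][word] + 1) / (n_ham + len(vocab))
        score += math.log(p_spam / p_ham)
    return score


# Invented training data, for illustration only.
training = [
    ("buy cheap pills now", "spam"),
    ("cheap pills buy now online", "spam"),
    ("meeting agenda for the project", "ham"),
    ("lunch on the river after the meeting", "ham"),
]
counts, totals = train(training)

plain = "buy cheap pills"
padded = plain + " meeting agenda project river lunch"
print(spam_log_odds(plain, counts, totals))
print(spam_log_odds(padded, counts, totals))
```

The padded message scores lower than the plain one because every added word is
more common in the legitimate training messages than in the spam ones. Random
dictionary words help the spammer only to the extent that they resemble words
the filter has learned from legitimate mail.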
Algorithms for detecting word salad are clearly possible and not particularly
difficult to implement, although most would be more computationally intensive
than the rules used by spam filters as of 2006. A statistical
approach based on Zipf's law of word frequency has potential in detecting simple
word salad, as do grammar checking and the use of natural language processing
algorithms. Statistical Markovian analysis, where short phrases are used to
determine if they are likely to occur in normal English sentences, is another
statistical approach that would be effective against completely random phrasing
but might be fooled by Dissociated Press techniques.
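The Markovian analysis described above can be sketched as a check of how many
adjacent word pairs in a message also occur in a reference sample of normal
English. This is a simplified stand-in for a real statistical model: the
reference text is a tiny invented sample, and the names bigrams and
known_bigram_ratio are hypothetical.

```python
def bigrams(text):
    """Return the list of adjacent lowercase word pairs in `text`."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def known_bigram_ratio(message, reference_bigrams):
    """Fraction of the message's word pairs that also occur in the reference."""
    pairs = bigrams(message)
    if not pairs:
        return 0.0
    return sum(pair in reference_bigrams for pair in pairs) / len(pairs)


# A real filter would need a large corpus; this sample is for illustration.
reference = set(bigrams(
    "the quick brown fox jumps over the lazy dog and the quick dog sleeps"
))

coherent = "the quick brown fox jumps over the lazy dog"
salad = "dog the fox lazy quick over jumps brown"
print(known_bigram_ratio(coherent, reference))
print(known_bigram_ratio(salad, reference))
```

A low ratio suggests randomly ordered words. Text produced by Dissociated Press
style generators reuses real word pairs from its sample, so it would tend to
score high on a check like this, which is the weakness noted above.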
Sentence and paragraph salad
In a related technique, actual text from some large corpus of legitimate
English (the plays of Shakespeare, other e-texts distributed by Project
Gutenberg, random World Wide Web pages, or the like) is inserted into the
email. This approach attempts to get around algorithms that could be devised to
detect the more primitive form of word salad.
Paragraph salad will reduce the effectiveness of any of the algorithms
mentioned above and will make a message look more legitimate to any Bayesian
filter. The only approaches that might thwart sentence and paragraph salad
would be very high-level, expensive natural language processing, some kind of
artificial intelligence algorithm involving a search engine, or an exhaustive
listing of spam emails. All of these techniques would be exceptionally
expensive, and would likely not be very successful at filtering spam despite
their high cost.
Letter salad
On an even smaller scale than word salad, spammers misspell words to try to
thwart Bayesian filters. Misspelling Viagra as Via6ra, \/|/\Gr/\, or any one of
a number of other ways, or even using characters from international character
sets, is an attempt to avoid the high efficiency with which a Bayesian filter
would classify any email containing certain words as spam. A simple spell
checker might significantly reduce the effectiveness of letter salad
approaches, yet most present spam filters do not use one.
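A rough sketch of such a spell-checking step, using Python's standard difflib
module, is shown here. The lookalike table, the word list, and the 0.8
similarity cutoff are all assumptions chosen for this example; a real filter
would need a much larger table and dictionary.

```python
import difflib

# Hypothetical lookalike table. Multi-character sequences come first so they
# are replaced before the single characters they contain.
LOOKALIKES = [
    ("\\/", "v"), ("/\\", "a"),
    ("|", "i"), ("6", "g"), ("0", "o"), ("1", "l"), ("@", "a"),
]


def normalize(token):
    """Lowercase the token and undo common character substitutions."""
    token = token.lower()
    for fake, real in LOOKALIKES:
        token = token.replace(fake, real)
    return token


def correct(token, dictionary, cutoff=0.8):
    """Return the closest dictionary word, or the normalized token if none."""
    candidate = normalize(token)
    if candidate in dictionary:
        return candidate
    matches = difflib.get_close_matches(candidate, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else candidate


words = ["viagra", "meeting", "project"]  # tiny illustrative dictionary
for token in ["Via6ra", "\\/|/\\Gr/\\", "meeting"]:
    print(token, "->", correct(token, words))
```

Running the corrected tokens through the Bayesian filter, instead of the raw
ones, would let the filter see the word the spammer tried to hide.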
The lengths to which some spammers have gone with letter salad have often
produced illegible, almost laughable messages. Reading such email has become
akin to deciphering complex custom license plates.
Future
As spam filters get better at detecting simple word and letter salad,
spammers will likely migrate towards sentence and paragraph salad techniques. In
the process of obscuring their message from improving spam filters, they will
also obscure their message from potential targets of their advertising, virus
distribution, or phishing.
Eventually, the profitability of spam may fall far enough that its volume is
substantially reduced.
Recommendations
End users should take no action upon receiving email with word salad content,
or whose sender or purpose is unclear. Opening questionable email, and
especially clicking on links contained in it, may put information security at
risk.
This guide is licensed under the GNU
Free Documentation License. It uses material from the Wikipedia.