Word salad

From Wikipedia the free encyclopedia, by MultiMedia

Word salad is a mixture of seemingly meaningful words that together signify nothing. The phrase draws its name from word salad in mental health, a positive symptom of psychosis. When applied to a physical theory, "word salad" is a derogatory description that labels the theory as senseless or utterly devoid of meaning.

In the context of computer science and linguistics, explicitly constructed word salad is a tool for demonstrating the difference between random utterance and coherent expression of thought. Software such as Dissociated Press in Emacs demonstrates the construction of interesting but meaningless word salad from large samples of coherent language: it builds new, random documents that share some of the word- or letter-clustering properties of the language sample. These word salads pass for natural language to the inattentive eye or ear, but are clearly meaningless when read or listened to with full attention. In the 21st century, e-mail spammers have begun using word salad construction as a way to elude e-mail filtering.
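
The clustering idea can be sketched with a small word-level Markov chain. This is an illustration only, not the actual Dissociated Press code from Emacs; the function names build_chain and generate and the order parameter are assumptions made for this example.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def generate(chain, length=20, seed=None):
    """Walk the chain at random; jump to a random key at dead ends.

    Assumes the chain is non-empty, i.e. the sample text had more
    than `order` words.
    """
    rng = random.Random(seed)
    keys = list(chain)
    state = rng.choice(keys)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # No recorded continuation: restart from a random key.
            state = rng.choice(keys)
            out.extend(state)
            continue
        nxt = rng.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out[:length])
```

Fed a large sample of coherent text, generate returns text in which every short run of words was seen in the sample, while the whole carries no meaning. A larger order gives output that looks more natural locally.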

In spam e-mail

In response to the growing problem of spam e-mail, filtering tools built on a widely used method, the naive Bayes classifier, became available starting around 2002. This method uses the probability of various words appearing in spam and in legitimate email to classify each incoming message automatically. For a short time, this worked fairly well at flagging probable spam. In response, spammers developed word salad to fool programs employing this method of classification. By adding large amounts of random text somewhere in their message, spammers hope to confuse Bayesian classifiers into classifying the message as "ham" (non-spam e-mail). Typically, this text consists of random words from a dictionary.
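
The effect of padding on a Bayesian score can be sketched with a toy naive Bayes scorer. The training messages are invented for illustration, and the names train and spam_log_odds are hypothetical. The sketch assumes add-one smoothing and equal class priors, which real filters need not share.

```python
import math
from collections import Counter


def train(messages):
    """Count word occurrences over a list of message strings."""
    counts = Counter()
    for msg in messages:
        counts.update(msg.lower().split())
    return counts


def spam_log_odds(message, spam_counts, ham_counts):
    """Sum of per-word log likelihood ratios; a positive value leans spam.

    Uses add-one (Laplace) smoothing and assumes equal class priors.
    """
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values()) + len(vocab)
    ham_total = sum(ham_counts.values()) + len(vocab)
    score = 0.0
    for word in message.lower().split():
        p_spam = (spam_counts[word] + 1) / spam_total
        p_ham = (ham_counts[word] + 1) / ham_total
        score += math.log(p_spam / p_ham)
    return score


# Toy training data, invented for illustration only.
spam_counts = train(["buy cheap pills now", "cheap pills online buy now"])
ham_counts = train(["meeting moved to friday", "lunch on friday with the team"])

plain = "buy cheap pills"
padded = plain + " meeting friday lunch team"  # same pitch plus "hammy" words

print(spam_log_odds(plain, spam_counts, ham_counts))
print(spam_log_odds(padded, spam_counts, ham_counts))
```

The padded message scores lower than the plain one, because every added word that the filter associates with legitimate mail pulls the total toward ham. Random dictionary words exploit the same effect less directly: some of them happen to be words the filter has seen mostly in legitimate mail.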

Algorithms for detecting word salad are feasible and not particularly difficult to implement, though most would be more computationally intensive than the rules spam filters used as of 2006. A statistical approach based on Zipf's law of word frequency has potential for detecting simple word salad, as do grammar checking and natural language processing algorithms. Statistical Markovian analysis, which checks whether short phrases are likely to occur in normal English sentences, is another statistical approach. It would be effective against completely random phrasing but might be fooled by Dissociated Press techniques.
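
A minimal sketch of the Markovian idea follows, assuming a reference set of word pairs collected from ordinary English. The tiny reference text and the function names bigrams and known_bigram_fraction are hypothetical; a usable detector would need a far larger corpus and a tuned threshold.

```python
def bigrams(text):
    """Return the list of adjacent word pairs in `text`."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def known_bigram_fraction(message, reference_bigrams):
    """Fraction of adjacent word pairs in `message` found in the reference set."""
    pairs = bigrams(message)
    if not pairs:
        return 0.0
    hits = sum(1 for pair in pairs if pair in reference_bigrams)
    return hits / len(pairs)


# Tiny stand-in for a corpus of normal English (illustration only).
reference_text = "the cat sat on the mat and the dog sat on the rug"
reference = set(bigrams(reference_text))

natural = "the dog sat on the mat"
salad = "mat the on rug dog cat"

print(known_bigram_fraction(natural, reference))  # 1.0
print(known_bigram_fraction(salad, reference))    # 0.0
```

Randomly ordered dictionary words score near zero because their word pairs rarely occur in real text. Output built from a Markov chain over real text reuses pairs from its sample and so scores high, which is why this check might be fooled by Dissociated Press techniques.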

Sentence and paragraph salad

In a related technique, actual text from some large corpus of legitimate English (the plays of Shakespeare, other e-texts distributed by Project Gutenberg, random World Wide Web pages, or the like) is added to the email. This approach attempts to get around algorithms that could be devised to detect the more primitive form of word salad.

Paragraph salad reduces the effectiveness of all the algorithms mentioned above and makes a message look more legitimate to any Bayesian filter. The only algorithms that might thwart sentence and paragraph salad are very high-level and expensive natural language processing, some kind of artificial intelligence algorithm involving a search engine, or an exhaustive listing of spam emails. All of these techniques would be exceptionally expensive and, despite their cost, would likely not be very successful at filtering spam.

Letter salad

On an even smaller scale than word salad, spammers misspell words to try to thwart Bayesian filters. Misspelling Viagra as Via6ra or \/|/\Gr/\, or in any number of other ways, or even substituting characters from international character sets, is an attempt to evade the high efficiency with which a Bayesian filter classifies any email containing certain words as spam. A simple spell checker might significantly reduce the effectiveness of letter salad approaches, yet most present spam filters do not use one.
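
The spell-checker idea can be sketched with Python's standard difflib module. The substitution table LOOKALIKES, the word list WATCHED, and the 0.8 similarity cutoff are all assumptions made for this example, not part of any particular filter.

```python
import difflib

# Hypothetical look-alike substitutions; a real table would be much longer.
LOOKALIKES = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                            "5": "s", "6": "g", "@": "a", "$": "s"})

# Hypothetical list of words a filter already treats as strong spam signals.
WATCHED = ["viagra", "mortgage", "lottery"]


def normalize(token):
    """Lowercase a token and undo simple single-character substitutions."""
    return token.lower().translate(LOOKALIKES)


def match_watched(token, cutoff=0.8):
    """Return the watched word closest to `token`, or None if none is close."""
    candidates = difflib.get_close_matches(
        normalize(token), WATCHED, n=1, cutoff=cutoff)
    return candidates[0] if candidates else None


print(match_watched("Via6ra"))  # viagra
print(match_watched("hello"))   # None
```

A filter could replace each matched token with its watched word before Bayesian scoring. This sketch handles only one-for-one character swaps and near misspellings. Multi-character drawings such as \/|/\Gr/\ would need a separate, more elaborate normalization step.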

The lengths to which some spammers have gone with letter salad have often produced illegible, almost laughable messages. Reading such email has become akin to deciphering complex custom license plates.

Future

As spam filters get better at detecting simple word and letter salad, spammers will likely migrate towards sentence and paragraph salad techniques. In the process of obscuring their message from improving spam filters, they will also obscure their message from potential targets of their advertising, virus distribution, or phishing. At some point, the profitability of spam may be brought down to the point that its volume is substantially reduced.

Recommendations

End users should take no action upon receiving email that contains word salad or whose sender or purpose is unclear. Opening questionable email, and especially clicking on links it contains, may put overall information security at risk.


This guide is licensed under the GNU Free Documentation License. It uses material from the Wikipedia.

 