Spamdexing
Spamdexing or search engine spamming is the practice of
deliberately creating web pages which will be indexed by search engines
in order to increase the chance of a website or page being placed close
to the beginning of search engine results, or to influence the category
to which the page is assigned. Many designers of web pages try to get a
good ranking in search engines and design their pages accordingly. The
word is a portmanteau of
spamming and indexing.
Spamdexing refers exclusively to practices that are dishonest and that mislead
search and indexing programs into giving a page a ranking it does not deserve.
"White hat" techniques for making a website indexable by search engines, without
misleading the indexation process, are known as
search engine optimization (SEO). SEO techniques do not involve deceit.
Search engine spammers, on the contrary, are generally aware that the content
that they promote is not very useful or relevant to the ordinary internet
surfer. Search engines use a variety of
algorithms to determine relevancy ranking. Some check whether the search
term appears in the META keywords tag; others check whether it appears in
the body text of a web page.
A variety of techniques are used to spamdex (see below). Many search engines
check for instances of spamdexing and will remove suspect pages from their
indexes.
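The two kinds of check named above (META keywords and body text) can be sketched as a toy scorer. It gives one point when the search term is listed in the META keywords tag and one more point for each occurrence in the page text. The `naive_score` function, its weighting and the sample pages are hypothetical illustrations; no real search engine is claimed to work this way. The point is only that both signals are fully under the page author's control.

```python
import re
from html.parser import HTMLParser


class PageText(HTMLParser):
    """Collects META keywords and the words of the page text."""

    def __init__(self):
        super().__init__()
        self.meta_keywords = []
        self.words = []
        self._skip = 0  # > 0 while inside <script> or <style>, whose text is not shown

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.meta_keywords += [k.strip().lower() for k in content.split(",") if k.strip()]
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words += re.findall(r"[a-z0-9]+", data.lower())


def naive_score(html, term):
    """Hypothetical toy score: 1 if term is a META keyword, plus 1 per occurrence in the text."""
    page = PageText()
    page.feed(html)
    page.close()
    term = term.lower()
    return (1 if term in page.meta_keywords else 0) + page.words.count(term)


honest = ('<html><head><meta name="keywords" content="guitars"></head>'
          '<body><p>We sell guitars.</p></body></html>')
stuffed = ('<html><head><meta name="keywords" content="guitars, cheap guitars, music"></head>'
           '<body><p>guitars guitars guitars guitars</p></body></html>')
print(naive_score(honest, "guitars"))   # 2
print(naive_score(stuffed, "guitars"))  # 5
```

The stuffed page says less than the honest one but scores more than twice as high, which is exactly the weakness that spamdexing exploits.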
The rise of spamdexing in the mid-1990s made the leading search engines of
the time less useful, and the success of Google at both
producing better search results and combating keyword spamming, through its
reputation-based
PageRank
link analysis system, helped it become the dominant search site late in the
decade, where it remains. While it has not been rendered useless by spamdexing,
Google has not been immune to more sophisticated methods either.
Google bombing is another form of web vandalism; it involves creating pages
whose links directly affect the ranking of other sites[1].
Common spamdexing techniques can be classified into two broad classes:
content spam and link spam.
Content spam
These techniques involve altering the logical view that a search engine has
of the page's contents. They all target variants of the
vector space model for information retrieval on text collections.
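In the vector space model, a page and a query are each represented as a vector of term weights, and pages are ranked by the cosine of the angle between the two vectors. The sketch below uses plain term frequencies (no IDF weighting, a simplification chosen for brevity) and hypothetical page text to show why repeating the query terms pulls a page's vector toward the query.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine of the angle between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


query = term_vector("cheap flights")
honest = term_vector("we compare airline prices and list cheap flights to europe")
# Content spam: the same page with the query terms repeated twenty more times.
stuffed = honest + term_vector("cheap flights " * 20)

print(round(cosine(query, honest), 3))  # 0.447
print(cosine(query, stuffed) > cosine(query, honest))  # True
```

The stuffed page's vector is dominated by the two query terms, so it lands almost on top of the query vector even though it adds no information.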
- Hidden or invisible text
- Disguising keywords and phrases by making them the same (or almost
the same) color as the background, using a tiny font size or hiding them
in parts of the HTML code that are not normally displayed, such as noframes
sections, ALT attributes and noscript sections. This makes a page appear
relevant to a web crawler for terms a visitor never sees, so that it is more
likely to be found.
Example: A promoter of a Ponzi scheme wants to attract web surfers to a site where he
advertises his scam. He places hidden text appropriate for a fan page of
a popular music group on his page, hoping that the page will be listed
as a fan site and receive many visits from music lovers.
- Keyword stuffing
- This involves inserting hidden or random text on a web page to
raise the keyword density, that is, the ratio of keywords to other words
on the page. Older versions of indexing programs simply counted how often
a keyword appeared and used that count to determine relevance. Most
modern search engines can analyze a page for keyword stuffing and
determine whether the frequency is above a "normal" level.
- Meta tag stuffing
- Repeating keywords in the
Meta
tags, and using keywords that are unrelated to the site's content.
- Gateway or
doorway pages
- Creating low-quality web pages that contain very little content and
are instead stuffed with very similar keywords and phrases. They are
designed to rank highly within the search results. A doorway page will
generally have "click here to enter" in the middle of it.
- Scraper sites
- Scraper sites are created using various programs such as Traffic
Equalizer. These programs are designed to 'scrape' search engine results
pages and create 'content' for a website. These types of websites are
generally full of clickable ads.
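Keyword density, as described under keyword stuffing above, is simply a word's count divided by the total number of words on the page. The sketch below computes it and flags words above a cutoff. The 10% threshold and the small stopword list are hypothetical choices for illustration, not values any search engine is known to use.

```python
import re
from collections import Counter

# Hypothetical list of common words that naturally reach high densities.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on"}


def keyword_density(text):
    """Return each word's share of all words in the text (its keyword density)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return {}
    counts = Counter(words)
    return {word: n / len(words) for word, n in counts.items()}


def flag_stuffing(text, threshold=0.10):
    """List non-stopwords whose density exceeds the (hypothetical) 'normal' threshold."""
    density = keyword_density(text)
    return sorted(w for w, d in density.items() if d > threshold and w not in STOPWORDS)


normal = "Our shop repairs guitars and basses. Bring your instrument in for a free estimate."
stuffed = normal + " guitars" * 10
print(flag_stuffing(normal))   # []
print(flag_stuffing(stuffed))  # ['guitars']
```

A fixed threshold like this misfires on very short pages, where every word has a high share, so a practical check would also need a minimum page length or a comparison against densities seen on typical pages.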
Link spam
Link spam takes advantage of link-based ranking algorithms, such as
Google's
PageRank
algorithm, which gives a higher ranking to a
website the more other highly-ranked websites link to it. These techniques also
aim at influencing other link-based ranking techniques such as the HITS
algorithm.
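The link-based ranking described above can be illustrated with a minimal power-iteration sketch of the textbook PageRank formula, using the commonly cited damping factor of 0.85. Google's production ranking is far more elaborate and not public, so treat this only as a model of the idea. The page names and link graphs are hypothetical; the example shows how a tightly knit group of pages linking to a target can lift it from the bottom of the ranking to the top.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict page -> score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share of (1 - damping) / n.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical web: three ordinary pages plus "target", which nobody links to.
base = {"news": ["blog", "shop"], "blog": ["news"], "shop": ["news"], "target": ["news"]}

# The same web with a link farm: ten farm pages link to "target",
# and "target" links back to each of them.
farm_pages = [f"farm{i}" for i in range(10)]
farm = dict(base)
farm["target"] = farm_pages
for p in farm_pages:
    farm[p] = ["target"]

before = pagerank(base)
after = pagerank(farm)
print(max(before, key=before.get))  # news
print(max(after, key=after.get))    # target
```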
- Link farms
- Involves creating tightly knit communities of pages that reference each
other, also known humorously as mutual admiration societies
[2]
- Hidden links
- Putting
links
where visitors will not see them in order to increase
link popularity.
- Sybil attack
- This is the forging of multiple identities for malicious intent,
named after Sybil, the pseudonymous subject of a book about a woman
diagnosed with multiple personality disorder. A spammer may create multiple
web sites at different domain names that all link to each other, such as
fake blogs known as spam blogs.
- Spam in blogs
- This is the placing or solicitation of links at random on other
sites, with a desired keyword in the hyperlinked text of the
inbound link. Guest books, forums, blogs and any site that accepts
visitors' comments are particular targets and are often victims of
drive-by spamming, in which automated software creates nonsense posts
with links that are usually irrelevant and unwanted.
- Spam blogs
- A spam blog, by contrast, is a fake blog created exclusively
with the intent of spamming.
- Referer log spamming
- When someone follows a link from one web page (the referer) to another
(the referee), the person's browser sends the referer's address to the
referee. Some websites publish a referer log showing which pages link to
them. By having a robot access many sites repeatedly, giving a message or
a specific address as the referer, a spammer makes that message or address
appear in the referer logs of those sites. Since some search engines base
the importance of a site on the number of different sites linking to it,
referer-log spam may be used to increase the search engine rankings of the
spammer's sites by getting the referer logs of many sites to link to them.
- Buying expired domains
- Some link spammers monitor DNS records for domains that will expire
soon, then buy them when they expire and replace the pages with links to
their pages.
Some of these techniques may be used to create a Google bomb, that is,
to cooperate with other users to boost the ranking of a particular page
for a particular query.
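Link farms and groups of Sybil sites share a visible trait: almost every outgoing link is returned by the page it points to. The sketch below turns that observation into a simple heuristic that computes, for each page, the fraction of its outgoing links that link straight back, and flags pages where nearly all of them do. The function names, both thresholds and the sample graph are hypothetical; this is not presented as a method any search engine uses, and real farms can avoid direct reciprocity.

```python
def reciprocal_ratio(links):
    """For each page, the fraction of its outgoing links that link straight back."""
    ratios = {}
    for page, targets in links.items():
        out = set(targets) - {page}  # ignore self-links
        if out:
            mutual = sum(1 for t in out if page in links.get(t, ()))
            ratios[page] = mutual / len(out)
    return ratios


def suspected_farm(links, min_ratio=0.9, min_links=3):
    """Pages with at least min_links outgoing links, nearly all reciprocated.

    Both thresholds are hypothetical tuning parameters.
    """
    ratios = reciprocal_ratio(links)
    return sorted(
        page for page, r in ratios.items()
        if r >= min_ratio and len(set(links[page]) - {page}) >= min_links
    )


# Hypothetical graph: an ordinary cluster plus five sites that all link to each other.
web = {"news": ["wiki", "shop"], "wiki": ["news"], "shop": []}
web.update({f"s{i}": [f"s{j}" for j in range(5) if j != i] for i in range(5)})
print(suspected_farm(web))  # ['s0', 's1', 's2', 's3', 's4']
```

The `min_links` cutoff keeps ordinary pages with a single mutual link (such as `wiki` here) from being flagged.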
Other types of spamdexing
- Mirror websites
- Hosting multiple websites, all with the same content, at different URLs.
Some search engines give a higher rank to results where the searched
keyword appears in the URL.
- Page
redirects
- Taking the user to another page without his or her intervention,
e.g. using META refresh tags, CGI scripts, Java, JavaScript, server-side
redirects or other server-side techniques.
- Cloaking
- Cloaking refers to any of several means of serving the search-engine
spider a different page from the one seen by human users. It can be an
attempt to mislead search engines regarding the content on a particular web
site. However, cloaking can also be used ethically to increase the
accessibility of a site to users with disabilities, or to provide human
users with content that search engines are not able to process or parse.
It is also used to deliver content based on a user's location; Google
itself uses IP delivery, a form of cloaking, to deliver results.
One form of cloaking is code swapping: optimizing a page for top
ranking, then swapping another page in its place once a top ranking is achieved.
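Mirror websites and cloaking can both be examined by comparing two versions of a page's text. Mirrors are different URLs that return near-identical text, while a cloaked URL returns very different text to a crawler than to a browser. The sketch below measures overlap using word shingles (overlapping k-word sequences) and Jaccard similarity, a standard near-duplicate technique. Fetching the two versions is left out; one assumed approach is to request the same URL with crawler-like and browser-like User-Agent headers. The sample texts are hypothetical, and any cutoff for calling a pair "mirror" or "cloaked" would be a tuning assumption.

```python
import re


def shingles(text, k=3):
    """Set of overlapping k-word sequences ("shingles") in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(text_a, text_b, k=3):
    """Jaccard similarity of the two texts' shingle sets: 1.0 identical, 0.0 disjoint."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


mirror_a = "Cheap flights to Europe. Compare airline prices and book today."
mirror_b = "Cheap flights to Europe. Compare airline prices and book today!"
crawler_view = mirror_a
browser_view = "Miracle pills ship overnight with no prescription required."
print(similarity(mirror_a, mirror_b))          # 1.0
print(similarity(crawler_view, browser_view))  # 0.0
```

A score near 1.0 for two different URLs suggests mirrors. A score near 0.0 for two views of the same URL suggests cloaking, although the legitimate location- or accessibility-based variation mentioned above can also lower the score.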
The following techniques are also widely acknowledged as being
spam, or "black
hat":
Online Advertising, made by MultiMedia | Free content and software
This guide is licensed under the GNU
Free Documentation License. It uses material from the Wikipedia.