Google Sitemaps
From Wikipedia, the free encyclopedia, by MultiMedia
Google Sitemaps is a service offered by Google to
help its crawlers better index webpages.
![Google Sitemaps page](./modules/Google_Guide-MM/images/Google49.jpg)
About
The Google Sitemap Protocol allows you to inform search engines about URLs
on your website that are available for crawling. A Sitemap is an XML file
that lists the URLs for a site using the Google Sitemap Protocol. The
protocol was written to be highly scalable so it can accommodate sites of
any size. It also enables webmasters to include additional information about
each URL (when it was last updated, how often it changes, and how important
it is in relation to other URLs in the site) so that search engines can
crawl the site more intelligently.
Sitemaps are particularly beneficial in situations when it is difficult for
users to access all areas of a website through the browseable interface. For
example, any site where certain pages are only accessible via a search form
would benefit from creating a Sitemap and submitting it to
search engines.
Note that the Sitemap Protocol is only a supplement to, and does not in any
way replace, the existing crawl-based mechanisms that search engines already
use to discover URLs. By submitting a Sitemap (or Sitemaps) to a search
engine, you are only helping that engine's crawlers do a better job of
crawling your site.
Using this protocol does not guarantee that your webpages will be included
in search indexes, nor does it influence the way your pages are ranked by a
search engine.
If you submit an XML sitemap via a Google account, Google provides current
crawler problem reports after a simple verification procedure that ensures
only the site owner gets access to the stats area. For WAP sites, Google
uses a different procedure, and the URLs contained in the XML sitemap must
be renderable on a mobile device.
XML Sitemap Format
The Sitemap Protocol format consists of XML tags. The file itself must be
UTF-8 encoded.
Sample
A sample Sitemap that contains just one URL and uses all optional tags is
shown below.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84
http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">
<url>
<loc>http://www.yoursite.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
As with all XML files, any data values (including URLs) must use entity
escape codes for the following characters: ampersand (&), single quote ('),
double quote ("), less than (<) and greater than (>).
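The escaping rule above can be sketched in Python. This is only an illustrative sketch, not part of the official generator: the helper names escape_value and url_entry are hypothetical, and the entity codes used are the standard XML ones (&amp;amp;, &amp;apos;, &amp;quot;, &amp;lt;, &amp;gt;).

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; the quote characters
# are added here through its optional "entities" mapping.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_value(value):
    """Entity-escape a data value before placing it in a Sitemap tag."""
    return escape(value, EXTRA_ENTITIES)


def url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Build one <url> element (hypothetical helper); optional tags
    are emitted only when a value is supplied."""
    parts = ["<url>", "<loc>%s</loc>" % escape_value(loc)]
    if lastmod is not None:
        parts.append("<lastmod>%s</lastmod>" % escape_value(lastmod))
    if changefreq is not None:
        parts.append("<changefreq>%s</changefreq>" % escape_value(changefreq))
    if priority is not None:
        parts.append("<priority>%s</priority>" % priority)
    parts.append("</url>")
    return "\n".join(parts)


if __name__ == "__main__":
    # A URL with a query string: the ampersand must become &amp;
    print(url_entry("http://www.yoursite.com/view?widget=3&page=2",
                    lastmod="2005-01-01", changefreq="monthly", priority=0.8))
```

A URL such as http://www.yoursite.com/view?widget=3&page=2 would therefore appear in the file with its ampersand written as an entity.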
You can compress your Sitemap files using gzip. Compressing your Sitemap
files will reduce your bandwidth requirement. Please note that Google
requires the uncompressed Sitemap file to be no larger than 10MB; search
engines will not process Sitemaps larger than that.
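As a rough sketch of the compression step, the following Python checks the uncompressed size before gzipping. The function name compress_sitemap is hypothetical, and the sketch assumes the 10MB limit means 10,485,760 bytes of UTF-8 encoded text.

```python
import gzip

# Assumed reading of the limit: 10MB = 10 * 1024 * 1024 bytes, uncompressed.
MAX_SITEMAP_BYTES = 10 * 1024 * 1024


def compress_sitemap(xml_text):
    """Return gzip-compressed Sitemap bytes, refusing oversized input."""
    raw = xml_text.encode("utf-8")  # Sitemap files must be UTF-8 encoded
    if len(raw) > MAX_SITEMAP_BYTES:
        raise ValueError(
            "Sitemap is %d bytes uncompressed; the limit is %d"
            % (len(raw), MAX_SITEMAP_BYTES)
        )
    return gzip.compress(raw)


if __name__ == "__main__":
    data = compress_sitemap("<urlset></urlset>")
    # Write the compressed bytes to a file served by the web server.
    with open("sitemap.gz", "wb") as handle:
        handle.write(data)
```

The size check runs on the uncompressed bytes because the limit applies before compression, so a small .gz file can still be rejected.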
You can also provide multiple Sitemap files, but each file that you provide
must have no more than 50,000 URLs and must be no larger than 10MB
(10,485,760 bytes) when uncompressed. These limits help to ensure that your
web server does not get bogged down serving very large files.
If you want to list more than 50,000 URLs, you must create multiple Sitemap
files. If you anticipate your Sitemap growing beyond 50,000 URLs or 10MB,
you can create multiple Sitemap files and list them in a Sitemap index file.
Sitemap index files should not list more than 1,000 Sitemaps and should be
named as sitemap_index.xml.
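One way to stay within these limits is to split the URL list into chunks and list the resulting files in an index. The Python below is a minimal sketch under stated assumptions: the helper names split_urls and build_index are hypothetical, the sitemapindex, sitemap and loc element names follow the protocol's index format, and the namespace is taken from the schema location in the sample above.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50000
MAX_SITEMAPS_PER_INDEX = 1000
# Assumption: the index uses the same 0.84 namespace as the sample Sitemap.
NAMESPACE = "http://www.google.com/schemas/sitemap/0.84"


def split_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_urls):
    """Build the text of a Sitemap index file listing the given Sitemaps."""
    if len(sitemap_urls) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("a Sitemap index should list at most 1,000 Sitemaps")
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<sitemapindex xmlns="%s">' % NAMESPACE]
    for url in sitemap_urls:
        lines.append("<sitemap>")
        lines.append("<loc>%s</loc>" % escape(url))
        lines.append("</sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    all_urls = ["http://www.yoursite.com/page%d" % n for n in range(120001)]
    chunks = split_urls(all_urls)
    # Hypothetical file names, one per chunk.
    names = ["http://www.yoursite.com/sitemap%d.xml.gz" % (i + 1)
             for i in range(len(chunks))]
    with open("sitemap_index.xml", "w", encoding="utf-8") as handle:
        handle.write(build_index(names))
```

Each chunk would still need to be written out as its own Sitemap file and checked against the 10MB limit, since a file can exceed the size limit before it reaches 50,000 URLs.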
It is strongly recommended that you place your Sitemap at the root directory
of your Web server (http://yoursite.com/sitemap.gz). After you produce your
Sitemap, you will need to notify search engines of the Sitemap's location.
The search engines that you notify will then retrieve your Sitemap and make
the URLs available to their crawlers.
Official Google Sitemaps Generator
Google provides a script that generates the XML file according to the
Sitemap Protocol. It can build the Sitemap from your server logs, a web
directory, or a list of URLs. The script is written in Python and hosted on
SourceForge (Sitemap Python Script). It can be scheduled to run via cron or
Windows Task Scheduler. During execution, the script notifies Google that
the sitemap has changed so that Google can schedule a download of it.
XML Configuration File
The script requires at least a configuration file when it is run.
$ python sitemap_gen.py --config=<path/config.xml>
The zip or gzip download includes an example_config.xml file, which is
extensively documented.
Validation Tools
There are a number of tools available to help you validate the structure of
your Sitemap based on this schema. You can find a list of XML related tools
at each of the following locations:
http://www.w3.org/XML/Schema#Tools
http://www.xml.com/pub/a/2000/12/13/schematools.html
http://www.smart-it-consulting.com/internet/google/submit-validate-sitemap/
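Before turning to a full schema validator, a few basic structural checks can be scripted. The Python below is only a sketch and does not validate against sitemap.xsd. It assumes the 0.84 namespace is declared as the default namespace on urlset, and it assumes the protocol's usual value rules (changefreq drawn from a fixed list, priority between 0.0 and 1.0). The function name check_sitemap is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: elements live in the 0.84 namespace from the schema location.
NS = "{http://www.google.com/schemas/sitemap/0.84}"
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly",
                     "monthly", "yearly", "never"}
MAX_URLS = 50000


def check_sitemap(xml_text):
    """Return a list of problems found; an empty list means none found."""
    problems = []
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.tag != NS + "urlset":
        return ["root element is not <urlset> in the 0.84 namespace"]
    urls = root.findall(NS + "url")
    if len(urls) > MAX_URLS:
        problems.append("more than 50,000 URLs in one Sitemap")
    for number, url in enumerate(urls, 1):
        loc = url.findtext(NS + "loc")
        if loc is None or not loc.strip():
            problems.append("url #%d: missing <loc>" % number)
        freq = url.findtext(NS + "changefreq")
        if freq is not None and freq.strip() not in CHANGEFREQ_VALUES:
            problems.append("url #%d: unexpected changefreq" % number)
        priority = url.findtext(NS + "priority")
        if priority is not None:
            try:
                value = float(priority)
            except ValueError:
                problems.append("url #%d: priority is not a number" % number)
            else:
                if not 0.0 <= value <= 1.0:
                    problems.append("url #%d: priority out of range" % number)
    return problems
```

ET.fromstring raises xml.etree.ElementTree.ParseError for input that is not well-formed XML, so a caller would normally wrap the call in a try block.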
Third Party Generation Tools
There are a number of third-party tools available to generate, edit, view,
submit and validate XML sitemaps. You can find a list of Google Sitemaps
related tools at each of the following locations:
http://code.google.com/sm_thirdparty.html
http://www.sitemaptools.com/
http://www.smart-it-consulting.com/article.htm?node=133&page=41
XSLT Stylesheet
XML files are normally not easy for humans to read. To improve readability,
you can add an XSLT stylesheet that transforms the Sitemap into HTML. More
information can be found at
http://enarion.net/google/sitemaps/stylesheet/
Google Guide made by MultiMedia | Free content and software
This guide is licensed under the GNU
Free Documentation License. It uses material from the Wikipedia.