Google sitemaps - how and why to use them

Every site should have a sitemap! This used to mean a simple HTML page with text links to every other (important) page on the site, each with good anchor text and a brief description of the page content. The purpose of the HTML sitemap was to give search engine spiders easy links to follow, ensuring that every page on the site was included in the search engine index. HTML sitemaps were, and still are, a secondary navigation system for human visitors and for other spiders that still like to follow good old HTML links.

Enter Google sitemaps during the summer of 2005. Google described them as

".. an experiment in web crawling."

to enable Google

"…to expand our coverage of the web and speed up the discovery and addition of pages to our index."

Of course, we all want to help Google to do that where our website is concerned.

 

How a Google sitemap is constructed

A Google sitemap is simply an XML file that lists every page on a site. Each page gets its own url entry, and the only tag required inside it is loc, the location of the page, i.e. its URL.

So a Google sitemap listing just one page and no optional information would look like:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.mydomain.co.uk/</loc>
  </url>
</urlset>

There are several optional tags that can also be included:

  • last modified (lastmod) - the date on which the page was last modified
  • change frequency (changefreq) - how frequently the content of a page changes, e.g. daily, weekly, monthly or never
  • priority (priority) - how important you, as the owner of the site, rate each page relative to other pages on the site, on a scale from 0.0 to 1.0.

Adding a few more pages and the optional tags, we get:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.mydomain.co.uk/</loc>
    <lastmod>2005-11-17</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.mydomain.co.uk/contact.html</loc>
    <lastmod>2005-11-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.2</priority>
  </url>
  <url>
    <loc>http://www.mydomain.co.uk/training.html</loc>
    <lastmod>2005-10-12</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>http://www.mydomain.co.uk/archive.html</loc>
    <lastmod>2005-10-13</lastmod>
    <changefreq>never</changefreq>
    <priority>0.1</priority>
  </url>
</urlset>
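
A file like the one above does not have to be typed by hand. The following is a minimal sketch of how it could be produced, assuming Python 3.8 or later and its standard xml.etree.ElementTree module. The function name build_sitemap and the shape of the page dictionaries are hypothetical, chosen for illustration; they are not part of any Google tool. The namespace is copied from the examples above.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the sitemap examples in this article.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def build_sitemap(pages):
    """Return sitemap XML (as bytes) for a list of page dicts.

    Each dict must have a "loc" key. The keys "lastmod", "changefreq"
    and "priority" are optional and are only written when present.
    (Hypothetical helper, for illustration only.)
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    # xml_declaration=True adds the <?xml ...?> line (Python 3.8+).
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    pages = [
        {"loc": "http://www.mydomain.co.uk/", "lastmod": "2005-11-17",
         "changefreq": "daily", "priority": 0.8},
        {"loc": "http://www.mydomain.co.uk/contact.html"},
    ]
    print(build_sitemap(pages).decode("utf-8"))
```

ElementTree escapes characters such as & in URLs automatically, which is easy to forget when writing the file by hand.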

Of course, Google only takes the optional data as an indication of how to treat pages. Giving all your pages a priority of 0.9, say, will simply tell Google that you value each page equally, whereas giving pages different levels of priority will give Google a clue as to how you rate their individual importance. The same goes for the change frequency data: it is the differences between pages that Google is looking for. How much notice Google actually takes of this optional data is, of course, another matter!

 

Using a sitemap generator

This is akin to using something like Dreamweaver to build your site: it just makes the job a lot easier.

Google suggests its own sitemap generator, but then goes on to talk about Python and the need for Python to be installed on your server. My feeling is that at this point you either explain to the IT support department exactly which pages you want included in the sitemap, or you turn to the list of third-party tools Google suggests.
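
To give a feel for what a generator does behind the scenes, here is a rough sketch, not Google's own generator, that walks a local folder of .html files and lists each one with its last-modified date. It assumes Python 3.8 or later, and it assumes the folder layout mirrors the URL layout of the site. The function name sitemap_from_directory and its two parameters are hypothetical.

```python
import os
import time
import urllib.parse
import xml.etree.ElementTree as ET

# Namespace copied from the sitemap examples in this article.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def sitemap_from_directory(root_dir, base_url):
    """Return sitemap XML (bytes) listing every .html file under root_dir.

    Assumes (for illustration) that a file at root_dir/a/b.html is
    served at base_url/a/b.html.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # visit sub-folders in a stable order
        for name in sorted(filenames):
            if not name.endswith(".html"):
                continue
            path = os.path.join(dirpath, name)
            # Turn the file path into a URL path with forward slashes,
            # percent-encoding characters such as spaces.
            rel = os.path.relpath(path, root_dir).replace(os.sep, "/")
            loc = base_url.rstrip("/") + "/" + urllib.parse.quote(rel)
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = loc
            # Use the file's modification time (UTC) as lastmod.
            mtime = time.gmtime(os.path.getmtime(path))
            ET.SubElement(url, "lastmod").text = time.strftime("%Y-%m-%d", mtime)
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical folder and domain; adjust to your own site.
    xml_bytes = sitemap_from_directory("public_html", "http://www.mydomain.co.uk/")
    with open("sitemap.xml", "wb") as f:
        f.write(xml_bytes)
```

A sketch like this only sees files on disk, so pages built dynamically by the server would still have to be added by other means, which is one reason to look at the third-party tools.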

 
