Columnist Janet Driscoll Miller reminds us that in an age of content syndication, a well-maintained XML sitemap is key to establishing your site as the original source of your content.
In the early days of search engines, I wasn’t much of a believer in XML sitemaps. But over time, I began to see firsthand how they can benefit websites.
XML sitemaps serve as a way to communicate directly with the search engines, alerting them to new or changed content very quickly and helping to ensure that the content is indexed faster.
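Much of that alerting works through each entry's optional `<lastmod>` date, which the sitemaps.org protocol defines. The Python sketch below is a simplified, hypothetical illustration of how a crawler could use those dates to pick out changed pages. It is not a description of how Google or Bing actually schedule crawling. The function names `changed_since` and `parse_lastmod` and the example.com URLs are placeholders.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace defined by the sitemaps.org protocol
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmod(value: str) -> datetime:
    """Parse a W3C date/datetime such as 2015-06-02 or 2015-06-02T09:30:00Z."""
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    # Date-only values carry no time zone; treat them as UTC so they compare cleanly
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def changed_since(sitemap_xml: str, since: datetime) -> list[str]:
    """Return the <loc> of every entry whose <lastmod> is later than `since`."""
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if loc is None or lastmod is None:
            continue  # without a date there is nothing to compare
        if parse_lastmod(lastmod) > since:
            changed.append(loc.strip())
    return changed


if __name__ == "__main__":
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/old-post</loc><lastmod>2015-01-10</lastmod></url>
  <url><loc>https://example.com/new-post</loc><lastmod>2015-06-02T09:30:00+00:00</lastmod></url>
</urlset>"""
    print(changed_since(sample, datetime(2015, 6, 1, tzinfo=timezone.utc)))
    # prints ['https://example.com/new-post']
```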
For content publishers, it’s become critical to help Google, specifically, understand whether your site is the original publisher of content. Why? Panda.
Content Syndication, Duplicate Content & Panda
It’s not uncommon for publishers to syndicate their content on other websites. It’s also common for a publisher’s content to be “curated” by other websites without any formal syndication agreement.
Unfortunately, the definition of content curation is fuzzy at best. In a quick Google search for a recent Search Engine Land article, I found over 47 copies of the article on other sites. (Editor’s note: these are not authorized copies.)
For every publisher site offering syndicated content or having content curated by others (with or without permission), the stakes could not be higher with Google. The Panda algorithm update focused in part on removing duplicate content from search engine results pages — meaning that if a site is not deemed the content originator, it’s at risk of being excluded from the results altogether.
XML sitemaps are just one tool that can help content creators establish their claim as the content originator.
Just how profound can XML sitemaps be for indicating content origination?
In theory, the content originator should have the earliest indexed timestamp for the content. But consider this example from a publisher that is not using XML sitemaps: the curating or syndicating site had the same content indexed nearly 40 minutes before the original publisher did.
[Screenshots: indexed timestamps for the Original Content Publisher and for the Curated or Syndicated Site]
How to Get Started
So, how should you get started? First, you’ll need to create an XML sitemap for your site. Some content management systems (CMS) can generate XML sitemaps automatically. For WordPress users, I recommend the Yoast SEO plugin; when this was written, WordPress had no built-in sitemap generation (basic core sitemaps arrived later, in WordPress 5.5). If you are already using Yoast for SEO, make sure you have updated to the most recent version.
Ideally, you’ll want to use a plugin for your CMS (or built-in CMS functionality) to create a sitemap, because these tools normally update your sitemap automatically as content is added or changed. If you don’t use a CMS such as WordPress, you can also create an XML sitemap with tools like xml-sitemaps.com; however, you’ll need to update the sitemap manually on a regular basis to keep its information correct and up to date.
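If you generate the file yourself rather than through a plugin, a small script can at least make regeneration repeatable. Below is a minimal Python sketch using only the standard library. It assumes you can pull a list of page URLs and last-modified dates from your own system; the `build_sitemap` name and the example.com pages are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, date]]) -> str:
    """Return sitemap XML listing each (absolute URL, last-modified date) pair."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in URLs automatically
        ET.SubElement(url, "loc").text = loc
        # A plain YYYY-MM-DD date is a valid W3C <lastmod> value
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages; in practice this list would come from your content store
    pages = [
        ("https://example.com/", date(2015, 6, 1)),
        ("https://example.com/blog/xml-sitemaps?ref=a&b", date(2015, 6, 2)),
    ]
    print(build_sitemap(pages))
```

You would save the output as, for example, sitemap.xml at your site root and rerun the script whenever content changes. That rerun step is exactly what a CMS plugin automates for you.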
If you have a particularly large website, you may also need to use a sitemap index. Under the sitemaps protocol, a single sitemap file can list at most 50,000 URLs, so if your site has more than 50,000 URLs, you’ll need to split them across several sitemaps and use an index to tie them together. You can learn how to create indices and more about sitemaps at sitemaps.org.
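Here is a rough Python sketch of that splitting step. It chunks a URL list into files of at most 50,000 entries and builds a sitemap index that points at each one. The naming scheme (sitemap-1.xml, sitemap-2.xml and so on), the function names and the example.com base URL are all hypothetical. Each child sitemap must actually be published at the URL the index lists.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def build_urlset(urls: list[str]) -> str:
    """Serialize one sitemap file containing only <loc> entries."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = loc
    return ET.tostring(urlset, encoding="unicode")


def build_index(sitemap_urls: list[str]) -> str:
    """Serialize a <sitemapindex> that points at each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = loc
    return ET.tostring(index, encoding="unicode")


def split_sitemaps(
    urls: list[str], base: str = "https://example.com/sitemap"
) -> tuple[str, dict[str, str]]:
    """Return (index XML, {child sitemap URL: child sitemap XML})."""
    children = {}
    for start in range(0, len(urls), MAX_URLS_PER_SITEMAP):
        part = urls[start:start + MAX_URLS_PER_SITEMAP]
        number = start // MAX_URLS_PER_SITEMAP + 1
        children[f"{base}-{number}.xml"] = build_urlset(part)
    return build_index(list(children)), children
```

With 100,001 URLs this produces three child sitemaps (50,000, 50,000 and 1 entries) plus one index. You would then submit the index file, which lets search engines discover each child sitemap.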
After you’ve created your sitemaps (and potentially sitemap indices), you’ll need to register them with the various search engines. Both Google and Bing encourage webmasters to register sitemaps and RSS feeds through Google Webmaster Tools (now Google Search Console) and Bing Webmaster Tools.
Taking this step tells the search engines where your sitemap is, so as soon as the sitemap is updated, they can react faster and index the new content. Registering your RSS feeds matters too: content curators or syndicators may be using those feeds to pull your content into their sites automatically, and submitting the feeds gives the search engines a chance to see new content at roughly the same time those sites do.
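Before registering a sitemap, it can help to confirm locally that the file is well-formed and within the protocol’s limits. The Python sketch below checks a few things the sitemaps.org protocol requires: a urlset or sitemapindex root in the sitemap namespace, no more than 50,000 entries, no more than 50 MB uncompressed, and an absolute URL in every `<loc>`. The `check_sitemap` helper is hypothetical, it is not a full validator, and it does not contact Google or Bing.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_ENTRIES = 50_000             # max <url> (or <sitemap>) entries per file
MAX_BYTES = 50 * 1024 * 1024     # 50 MB uncompressed (52,428,800 bytes)


def check_sitemap(data: bytes) -> list[str]:
    """Return a list of problems found; an empty list means none of these checks failed."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append(f"file is {len(data)} bytes; limit is {MAX_BYTES}")
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return problems + [f"not well-formed XML: {exc}"]
    # A regular sitemap holds <url> entries; an index holds <sitemap> entries
    if root.tag == f"{{{SITEMAP_NS}}}urlset":
        child = "url"
    elif root.tag == f"{{{SITEMAP_NS}}}sitemapindex":
        child = "sitemap"
    else:
        return problems + [f"unexpected root element {root.tag}"]
    entries = root.findall(f"{{{SITEMAP_NS}}}{child}")
    if len(entries) > MAX_ENTRIES:
        problems.append(f"{len(entries)} entries; limit is {MAX_ENTRIES}")
    for entry in entries:
        loc = entry.findtext(f"{{{SITEMAP_NS}}}loc")
        # The protocol requires full URLs that begin with the protocol
        if not loc or not loc.strip().startswith(("http://", "https://")):
            problems.append(f"missing or relative <loc>: {loc!r}")
    return problems
```

For example, calling `check_sitemap` on the bytes of your sitemap.xml returns an empty list when none of these problems are present.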
Source: w3trainingschool