Wednesday, October 15, 2008

Which URLs to include in your sitemap

I have been looking for information about which URLs we should include in our sitemaps. Is it good to include dynamic URLs in your sitemap, or only the URLs you consider important on your website? While surfing the web I found the following information, which I am going to share here; it might be helpful for you too.

1) Sean Suchter from Yahoo! suggests putting all the important pages in your sitemap, rather than every page on your site. Yahoo! considers sitemaps when figuring out which pages are valuable on a site, and if they believe this to be a trusted signal from a publisher, they will use it more (the other engines seemed to agree).

2) Nathan Myhrvold from Microsoft suggests that URL structures in sitemaps are very important. In your sitemap file, use the shortest, most authoritative, canonical version of each URL you want in the search engines' index; they'll use that to help automatically filter duplicates and figure out which version to display.

3) Vanessa Fox, a former Google employee, suggests putting a comprehensive list of URLs in the Sitemap. Here is what she says:

"There are several ways people approach what to put in Sitemaps:

-Put the important pages in the Sitemap. This method is a good one if it’s problematic to put all pages in the Sitemap. The point of the Sitemap is to let search engines know more about your site, particularly about the pages of your site, and this approach tells the search engines about the pages you care about most. That should give search engines a signal that all other things being equal, you’re telling them that these pages are the ones you care about. (Of course, all signals normally aren’t equal, so instead this will be one signal balanced among many, but the same idea holds.) So, that’s a solid approach.

-Put the non-indexed pages in the Sitemap. The idea behind this method is that search engines already know about the rest of your site, so you’re just making sure they know about these as well. This may seem the opposite of the first approach. After all, if from the first approach search engines should get a signal that the pages in the Sitemap are most important, then wouldn’t the search engines use that same signal for this set of URLs? When really they might be the least important (hence the non-indexing). It may seem that way, but actually that’s not the case. Since search engines use the Sitemap as one of many signals, what you’re really saying with URLs in a Sitemap, is hey, search engine! pay attention to these pages! It generally won’t cause the search engine to then forsake all other signals that caused indexing of the other pages. It will just focus some extra attention on these. A Sitemap comes into play the most in the crawling process. So, if some pages aren’t indexed, it makes sense to make sure the search engines know about them so they can crawl them.

-Put a comprehensive list of URLs in the Sitemap. This is my preferred approach when it’s technically practical. Why not tell search engines what the definitive list of pages on your site is? Why limit it to really important ones? One benefit to this is that there’s at least one place other than crawling that Sitemaps can be helpful, and that’s canonicalization. If a search engine has detected that several URLs display the same page, the version of the URL that’s in the Sitemap is a signal as to which is the canonical version.

In reality, any of these approaches are good ones. Sitemaps enable the site owner to have a voice in the long list of signals that search engines use to crawl and index pages. Since they’re a signal and not a directive, they don’t correlate to just one option. The signal tells the search engines that you care about their crawlers taking a look at these pages, and many times, they then do.

I imagine that each search engine uses the Sitemap signals slightly differently, since after all, each search engine has different crawling and indexing algorithms. However, I do think that it would be useful for the search engines to come together and let us know how exactly they use them and how they differ in using them. In particular, it would be very helpful if, as part of, they got together and made sure they weren’t using Sitemaps for opposing purposes. You don’t want to have a shared standard that is used so differently that if a site owner compiles a Sitemap in a particular way, it helps with one search engine and hurts with another.

When I worked on the collaboration, it was all about figuring out what the standard should be and coming together to support it. Now that all the major engines do, I think the next step is sorting out more details about how they’re used (particularly since the search engines should now have lots of data about how they can best be used) and giving site owners best practices."
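
To make the canonicalization advice above a bit more concrete, here is a minimal Python sketch of how a comprehensive sitemap could be built with only one canonical version of each URL. The canonicalization rules (lowercasing the host, dropping fragments, ignoring a couple of query parameters) and the names canonicalize, build_sitemap and IGNORED_PARAMS are my own assumptions for illustration; none of the people quoted here described an implementation, and your site may need different rules.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical query parameters that do not change the page content
# on this example site (an assumption; adjust for your own site).
IGNORED_PARAMS = {"sessionid", "utm_source"}


def canonicalize(url):
    """Reduce a URL to one assumed canonical form."""
    parts = urlsplit(url)
    # Keep only the query parameters that affect the content.
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if k.lower() not in IGNORED_PARAMS]
    path = parts.path or "/"  # treat an empty path as the site root
    # Lowercase scheme and host, drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       path, urlencode(query), ""))


def build_sitemap(urls):
    """Return sitemap XML listing each canonical URL exactly once."""
    seen = set()
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        canonical = canonicalize(url)
        if canonical in seen:
            continue  # duplicate of a URL that is already listed
        seen.add(canonical)
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = canonical
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    pages = [
        "http://Example.com/page?sessionid=1",
        "http://example.com/page#top",
        "http://example.com/other?id=2",
    ]
    print(build_sitemap(pages))
```

In this sketch the first two example URLs collapse into a single entry, so the sitemap only ever shows the search engines the version you want indexed.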
