I realise this isn’t a big issue at the end of the day, but when something bugs you, you have to follow it up, right? This is a follow-up to my initial post yesterday concerning duplicate content. I’m still on the fence about which direction to take as far as indexing category and tag pages goes. Why? Well, it seems that everyone has a different opinion on the subject. Do we even know what is considered to be duplicate content? It looks doubtful.

If I have a category page that contains 10 different posts, all on the same topic area, each post makes up only about a tenth of the page, so the “duplicate” content is surely diluted tenfold. Would a search engine still consider the category page to contain duplicate content from the individual article pages? Common sense says it shouldn’t. The same goes for tag pages, as long as they list enough posts to dilute the duplicate content. I have read many opinions stating that it could still be duplicate content, while others say it really isn’t. Honestly, the latter view makes more sense to me.

Another issue is simply the number of pages on the internet. Can a search engine like Google really compare billions of pages to check for duplicate content, and would it be worth it for them? OK, so scale may matter less when the pages being compared all sit on a single domain, but taking into account the dilution factor mentioned above, would it really be in Google’s interest to flag it as an issue for the average website? It’s obviously not malicious in nature, so it makes no sense to penalise it.

So it gets me thinking: why not take the meta tags I’m already using to control what is indexed, and add a new factor, the number of posts under a given tag or category? If it’s less than a certain number, make the page noindex; if it’s over that number, change the meta tag to allow indexing. This means I can make sure the dilution factor is large enough to avert any real risk of being penalised for having multiple URLs leading to “the same” content. Then I also get the benefit of the enriched keyword density for categories and tags.
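
To make the idea concrete, here is a rough sketch in Python rather than the actual template code on this site. The function name, the threshold of 5 and the "follow" part of the directive are all my own assumptions; the only fixed part is the rule itself: too few posts means noindex, enough posts means index.

```python
# Hypothetical cut-off: the "certain number" is still undecided, so 5 is a placeholder.
MIN_POSTS_TO_INDEX = 5


def robots_meta_tag(post_count: int, min_posts: int = MIN_POSTS_TO_INDEX) -> str:
    """Return the robots meta tag for a category or tag page.

    Pages listing fewer than `min_posts` posts are kept out of the index;
    pages at or above the threshold are allowed in. Treating the exact
    threshold as "index" is an assumption, as is keeping "follow" on both.
    """
    if post_count >= min_posts:
        directive = "index, follow"
    else:
        directive = "noindex, follow"
    return f'<meta name="robots" content="{directive}">'


# Example: a tag with 2 posts versus a category with 12 posts.
print(robots_meta_tag(2))
print(robots_meta_tag(12))
```

The threshold is a parameter on purpose, so it can be tuned later without touching the logic.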

A couple more issues still to decide upon. I need to make sure all my meta descriptions for category and tag pages are unique, and decide whether to index paged content, or just the first page of each category / tag page. I think I’ll stick with just the first page for now.
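
For the unique descriptions part, a quick check along these lines might do the job. Again this is only a sketch in Python: the function name and the example URLs are made up, and it assumes the descriptions have already been collected into a simple URL-to-text mapping.

```python
from collections import defaultdict


def find_duplicate_descriptions(descriptions: dict[str, str]) -> dict[str, list[str]]:
    """Group category/tag URLs that share the same meta description."""
    pages_by_text: dict[str, list[str]] = defaultdict(list)
    for url, text in descriptions.items():
        # Ignore case and extra whitespace so near-identical text still matches.
        key = " ".join(text.split()).casefold()
        pages_by_text[key].append(url)
    # Keep only descriptions used by more than one page.
    return {text: urls for text, urls in pages_by_text.items() if len(urls) > 1}


# Hypothetical example data.
pages = {
    "/category/seo/": "Posts about SEO",
    "/tag/seo/": "posts about  seo",
    "/tag/meta-tags/": "Posts about meta tags",
}
print(find_duplicate_descriptions(pages))
```

Anything it reports is a pair (or more) of pages whose descriptions need rewriting.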

Any opinions on this approach? Please leave a comment and let me know!