I am amazed at the number of supplemental pages that are caused by Google’s duplicate content filter. Many good sites do NOT understand that they in fact have duplicate content problems, let alone how to resolve them.
We have seen good quality sites lose good internal pages because they change currency, have a dynamic naming convention, use a similar contact form for hundreds of products/listings, or have been duplicated by competitors.
These are not sites trying to spam Google with “Page Spam”. They are simply among the thousands, and probably millions, of troubled sites falling victim to a filter that most webmasters agree is a tad “overly aggressive”.
I now tell my clients to think of “Some” Search Engines as a Spam Paranoid Grandma.
This sounds funny, and it is, but it is also somewhat true. The SEs must tell spam from credible content, so they build filters into their systems that try to deter spam. These filters have an accuracy percentage, meaning the SEs know some innocent sites will be hurt in order to punish the majority of bad apples. What the acceptable percentage is, is unknown. We can only guess that it is probably 75% or higher, as otherwise the filter would be useless.
As a webmaster, those numbers simply are not good enough. I have seen too many good quality sites hurt by duplicates, and duplicating a site’s content is now becoming an effective practice for competitor removal.
Filters tend to penalize relentlessly, meaning once hurt, the site or page is dead forever. It’s the kiss of Google Death.
I will admit I spend too much time talking about this filter. Sorry Matt, but I believe this one is hurting too many decent sites and increasing the size of the supplemental index. It simply does not determine the original content accurately enough.
Although some of this is the programmers’ fault, some webmasters are simply unaware of the whole duplicate content filter.
To me this is the biggest headache of a filter, as it really hurts a website’s rankings and acts as a penalty. You must be very careful if you develop dynamic sites and ensure that there is never a way to reproduce the same page more than once or to reach the same page through multiple URL paths.
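One rough way to spot multiple URL paths leading to the same page is to reduce every known URL to a normalized form and see which ones collide. The sketch below is only an illustration: the parameter names in `IGNORED_PARAMS` and the example URLs are assumptions, not taken from any real site, and you would need to adjust them to your own naming convention.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that change presentation but not the page content.
IGNORED_PARAMS = {"currency", "sessionid", "sort"}


def normalize(url):
    """Reduce a URL to a comparable form: lowercase host, no trailing
    slash, ignored parameters dropped, remaining parameters sorted."""
    parts = urlsplit(url)
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def find_duplicates(urls):
    """Group URLs by normalized form and keep groups with more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    crawled = [
        "http://Example.com/widget?id=7&currency=USD",
        "http://example.com/widget/?currency=EUR&id=7",
        "http://example.com/about",
    ]
    for page, paths in find_duplicates(crawled).items():
        print(page, "is reachable as", paths)
```

Any group this reports is a candidate for the fixes listed here: pick one URL to keep and block or redirect the rest.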
1) Determine the pages
2) Determine if a robots.txt can be used
3) Use a NOFOLLOW tag when in doubt
4) Never use different URLs for the same page
5) Ensure mod rewrites are working properly and there is no way to get to the page dynamically. If there is, disallow access via robots.txt.
6) Robots.txt the metas.
7) When changing currencies, use if/else statements that add a robots “noindex” to the page.
8) Go crazy and obsess over other possibilities.
9) POST OTHER SOLUTIONS IN THE COMMENTS.
10) Create new named URLs for the old pages after removing the duplicate pages. 301 redirect the old pages.
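Steps 5 and 7 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a drop-in fix: `DEFAULT_CURRENCY`, the function names, and the sample robots.txt rule are all hypothetical. Also note that Python’s standard `urllib.robotparser` matches rules by simple path prefix, so it may not mirror every pattern a search engine understands.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the default-currency version is the one page you want indexed.
DEFAULT_CURRENCY = "USD"


def robots_meta(currency):
    """Step 7: leave only the default-currency version of a page indexable."""
    if currency.upper() == DEFAULT_CURRENCY:
        return '<meta name="robots" content="index, follow">'
    else:
        return '<meta name="robots" content="noindex, follow">'


def is_blocked(robots_txt, url, agent="Googlebot"):
    """Step 5: check whether a robots.txt body disallows a dynamic URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical rule blocking the dynamic script behind the rewritten URLs.
    rules = "User-agent: *\nDisallow: /product.php\n"
    print(robots_meta("EUR"))
    print(is_blocked(rules, "http://example.com/product.php?id=7"))
    print(is_blocked(rules, "http://example.com/widget-7.html"))
```

Running a check like `is_blocked` against every dynamic URL that your rewrites are supposed to hide is a cheap way to confirm the robots.txt actually covers them.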