
Reusing Web Content without Getting Penalized

by Dr. Ralph F. Wilson, E-Commerce Consultant
Web Marketing Today, Issue 126, July 9, 2003

For the July 2, 2003 Doctor Ebiz I received this question:

"Our organization creates huge amounts of content. This content is created and 'owned' by different internal divisions. Much of this content is re-usable across divisions. However, we have heard that allowing the same content to appear on multiple web properties can cause penalties from search engines. How can we reuse content without getting blacklisted?" -- Keith Seabourn, Campus Crusade for Christ, International

I shared this question with Mike Grehan, author of the highly respected Search Engine Marketing: The essential best practice guide. Drawing on his reply, I answered the question briefly in Doctor Ebiz, but felt it was important to include Mike's full answer for those who are interested:

I've had a chance to review your question regarding duplicate material. As is always the case where search engines are concerned, there's no real "cut and dried" answer, I'm afraid.
Search engines are aware that there are many legitimate reasons for uploading duplicate content to the web. These include improving access time by providing regional versions of sites, sharing research data, and using common promotional material for intermediaries selling the same products. However, the one area that really does concern them is what they term "pseudo identities". They see this as a method of spamming search engines with seemingly different websites that in fact have the same content. For example, two websites such as www.discountdvd_.com and www.xxxpass_.com may both point to the same "adult" material.
One of the world's leading scientists in the field of information retrieval on the web has conducted the most authoritative research into detecting duplicate material online, and his methods have been adopted very successfully by the major search engines. I'd rather not explain in detail exactly how these methods work in an open forum such as your newsletter, as this could double as a lesson in how to spam more effectively. Suffice it to say that search engines can detect duplicate material, starting with the most obvious analysis: finding clusters of pages that are byte-wise identical or even just very similar. In the main, they view duplicate content as a high percentage of paths (that is, the portion of the URL after the hostname, or the file name) that are present on more than one website. This applies even more if the content under those paths links to documents that have other similar content, such as duplicate page content with exactly the same outbound links.
It's safe to say that, in the widest definition, search engines do not consider hosts that replicate content but rename paths to be "mirrors". Therefore, when syndicating the same or similar content, the best practice is always to rename the page file names on whichever server hosts them. Of course, many similar pages hosted under the same IP address, even with different file names, are also likely to set the "alarm bells" ringing.
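To make the idea of "a high percentage of paths present on more than one website" concrete, here is a minimal Python sketch. It is only an illustration of the general notion described above, not the method any search engine actually uses. The hostnames, the helper names `paths_by_host` and `shared_path_fraction`, and the choice to measure overlap against the smaller site are all assumptions made for this example.

```python
from urllib.parse import urlsplit


def paths_by_host(urls):
    """Group the path portion of each URL under its hostname."""
    hosts = {}
    for url in urls:
        parts = urlsplit(url)
        hosts.setdefault(parts.netloc.lower(), set()).add(parts.path or "/")
    return hosts


def shared_path_fraction(paths_a, paths_b):
    """Fraction of the smaller site's paths that also exist on the other site."""
    if not paths_a or not paths_b:
        return 0.0
    smaller = min(len(paths_a), len(paths_b))
    return len(paths_a & paths_b) / smaller


# Hypothetical crawl results for two hosts (illustrative URLs only).
urls = [
    "http://site-a.example/products/widget.html",
    "http://site-a.example/products/gadget.html",
    "http://site-a.example/about.html",
    "http://site-b.example/products/widget.html",
    "http://site-b.example/products/gadget.html",
    "http://site-b.example/contact.html",
]

hosts = paths_by_host(urls)
overlap = shared_path_fraction(hosts["site-a.example"], hosts["site-b.example"])
print(f"Shared path fraction: {overlap:.2f}")
```

In this toy data, two of the three paths on each host match, so the two sites would look suspiciously alike by this crude measure. Renaming the file names on one host would drive the figure toward zero.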
To be completely certain that search engines will not penalise you or drop you from their database for reproducing content across a number of servers, the ideal approach is simply to use the 'robots.txt' protocol to keep this material from being indexed by search engines in the first place (http://www.robotstxt.org/wc/exclusion.html).
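As a rough illustration of the robots.txt approach, the Python sketch below defines a small exclusion file and checks it with the standard library's `urllib.robotparser`. The `/syndicated/` directory and the `division-b.example` hostname are hypothetical; the assumption is that all reused content on a given server lives under one directory that can be excluded as a whole.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a server that hosts reused content
# under /syndicated/ (the directory name is an assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow: /syndicated/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler should skip the duplicated copy...
blocked = parser.can_fetch("*", "http://division-b.example/syndicated/article.html")
# ...but may still index content outside the excluded directory.
allowed = parser.can_fetch("*", "http://division-b.example/original/article.html")

print("syndicated copy crawlable:", blocked)
print("original content crawlable:", allowed)
```

Note that robots.txt is only a request: it keeps well-behaved crawlers away from the duplicated copies, but it relies on each crawler choosing to honour it.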


