How to Find Duplicate Content

Duplicate content is one of the biggest problems in SEO, especially when it comes to analyzing data, goals and conversions. If you have duplicate content on your website, you are competing with yourself on the Search Engine Results Page (SERP): you are targeting the same keywords and phrases twice (or more), which confuses search engines. Duplicate content also splits your “link juice” across each duplicate page, which is bad for Page Rank (Google’s Stamp of Approval) and incoming links, and will ultimately hurt your rankings.

So what is considered Duplicate Content?

Duplicate content is when multiple pages share the same Page Title, the same Page Description, or the same body content.
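To make that definition concrete, here is a minimal Python sketch that groups pages by each of the three criteria. It assumes you already have the title, description and body text of each page (for example from a crawl); the `PAGES` sample data and the function names are hypothetical, and the whitespace and case normalization is my own choice rather than anything a search engine is known to do.

```python
import hashlib
from collections import defaultdict

# Hypothetical sample data: in practice these fields would come from a crawl.
PAGES = [
    {"url": "/blog/seo-tips", "title": "SEO Tips",
     "description": "Ten SEO tips.", "body": "Tip one..."},
    {"url": "/category/seo/seo-tips", "title": "SEO Tips",
     "description": "Ten SEO tips.", "body": "Tip one..."},
    {"url": "/about", "title": "About Us",
     "description": "Who we are.", "body": "We build sites."},
]


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def body_fingerprint(text):
    """Hash the normalized body so long pages can be compared cheaply."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def find_duplicates(pages):
    """Return {field: {value: [urls]}} for values shared by two or more pages."""
    keys = {
        "title": lambda p: normalize(p["title"]),
        "description": lambda p: normalize(p["description"]),
        "body": lambda p: body_fingerprint(p["body"]),
    }
    report = {}
    for field, key in keys.items():
        groups = defaultdict(list)
        for page in pages:
            groups[key(page)].append(page["url"])
        # Keep only the values that appear on more than one URL.
        report[field] = {v: urls for v, urls in groups.items() if len(urls) > 1}
    return report


if __name__ == "__main__":
    for field, groups in find_duplicates(PAGES).items():
        for urls in groups.values():
            print(field, urls)
```

Note that this only catches exact matches after normalization; near-duplicates with small wording changes would need a fuzzier comparison.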

If you’re running a Content Management System (CMS) such as Drupal, eFlex, or WordPress, and your site is content heavy, you may well have duplicate content without knowing it, especially if you use categories and/or tags within your CMS. I see this mostly with WordPress.

How to find Duplicate Content

While I’m sure there are many approaches to identifying duplicate content, and I hope you will share some of your methods, my go-to tool is Google Webmaster Central.

Google Webmaster Central has a great set of tools and will help you quickly find duplicate content by reporting pages with the same Page Titles and Page Descriptions.  If you’re using Google’s Webmaster Tools, here are the steps to find duplicate content:

  1. Log in to your GWC account
  2. Click on Diagnostics
  3. Click HTML suggestions

Here you will see tables listing duplicate content, if any:

Handle Duplicate Content Example 1

Click on any one of the links and you will see a list of links showing you where the duplicates are. (Please note that if you are running campaigns, as in this illustration, you don’t need to worry about those URLs because they’re not indexed.)

Handle Duplicate Content Example 2
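If you want to cross-check a page outside of Webmaster Tools, the two fields that report compares can be read straight from the page's HTML. The following is a rough sketch using only Python's standard library; `HeadFieldsParser`, `extract_head_fields` and `SAMPLE_HTML` are hypothetical names for illustration, and fetching the HTML is left out on purpose.

```python
from html.parser import HTMLParser


class HeadFieldsParser(HTMLParser):
    """Collect the <title> text and the meta description of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            # Attribute values can be None (e.g. <meta name>), hence the "or".
            if (attributes.get("name") or "").lower() == "description":
                self.description = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_head_fields(html):
    """Return (title, description); either is "" if the page lacks it."""
    parser = HeadFieldsParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description


# Hypothetical page used only to show the call.
SAMPLE_HTML = """<html><head>
<title> SEO Tips </title>
<meta name="description" content="Ten SEO tips.">
</head><body><p>Tip one...</p></body></html>"""

if __name__ == "__main__":
    print(extract_head_fields(SAMPLE_HTML))
```

Running this over two URLs from the same row of the report should give identical values if they really are duplicates.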

 

Now that you’ve identified the duplicate content, it’s time to do some more research. First things first: DO NOT DELETE any of the duplicate pages! Before you take any action, you need to ask yourself a few questions:

  1. Are any of the duplicate pages ranked higher than the others?
  2. Do any of these pages have incoming links?
  3. Are any of the duplicate pages indexed by search engines?

Checking Page Rank

This is a simple but very important check. Whatever your browser of choice, there are plenty of extensions available that display the Page Rank of the pages you’re browsing. Personally, I use Google Chrome most of the time, but I also like Firefox because of all the great SEO add-ons available for it. If you don’t have any browser extensions installed, you can simply visit the following page to check each duplicate page’s Page Rank:

http://www.google-page-rank-check.com/

After checking each duplicate page, make sure to document the values, if any, so that you can refer to these later.

Checking for Incoming Links

Open up your handy Google Webmaster Tools and follow these steps:

  1. Click Your site on the web
  2. Click Links to your site

Here you will see several views of how and where other sites link to your website.

Handle Duplicate Content Example 3

 

As before, document any incoming links against the pages they point to.
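One way to keep these notes is a small table with one row per duplicate page. The sketch below is only a suggestion: the `DuplicateAuditRow` fields, the `group` label and the sample values are all hypothetical, and a spreadsheet would do the same job.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class DuplicateAuditRow:
    """One duplicate page and what we found out about it (hypothetical layout)."""
    url: str
    group: str                         # label shared by pages that duplicate each other
    page_rank: Optional[int] = None    # None means "not checked yet"
    incoming_links: int = 0
    indexed: Optional[bool] = None     # None means "not checked yet"


def to_csv(rows):
    """Serialize the audit rows to CSV text; unchecked values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(DuplicateAuditRow)],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up values, purely to show the output shape.
    rows = [
        DuplicateAuditRow("/blog/seo-tips", "seo-tips", page_rank=3, incoming_links=12),
        DuplicateAuditRow("/category/seo/seo-tips", "seo-tips"),
    ]
    print(to_csv(rows))
```

Leaving `indexed` empty until you have run the index checks makes it obvious which pages still need attention.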

Checking Search Engine Indexes

Because I’m not really doing any sort of marketing, I only deal with the giants – Google and Bing. Fortunately, both search engines will show you what has been indexed in their respective catalogues by using the same commands. There are several commands you can use:

site: –  this command will show you all the pages indexed in the catalogue for the specified domain name.

Handle Duplicate Content Example 4

inurl: – this command limits search results to those where the query appears in the URL.

Handle Duplicate Content Example 5

intitle: – similar to the inurl command, this command limits results to those where the query appears in the title tag.

Handle Duplicate Content Example 6

Just for fun, you could do a vanity search by leaving out the site command – intitle:"Steven Britton"

I should mention there are more commands, but you have the ones you’ll need to find the duplicate content that you’re looking for in the search engine indexes. So, go through these steps, find your duplicate content and document it.
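If you have a long list of duplicates to check, typing these commands by hand gets tedious. Here is a small sketch that assembles the three commands into a query string and a search URL you can open in a browser. The helper names and `example.com` are hypothetical, and the sketch only builds URLs; it does not fetch or parse results.

```python
from urllib.parse import quote_plus

# Public search pages; the query goes in the "q" parameter.
SEARCH_ENGINES = {
    "google": "https://www.google.com/search?q=",
    "bing": "https://www.bing.com/search?q=",
}


def build_query(domain=None, inurl=None, intitle=None):
    """Combine the site:, inurl: and intitle: commands into one query string."""
    parts = []
    if domain:
        parts.append(f"site:{domain}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if intitle:
        # Quote the title so multi-word phrases are matched as a phrase.
        parts.append(f'intitle:"{intitle}"')
    return " ".join(parts)


def search_url(engine, query):
    """Return a URL-encoded search link for the given engine."""
    return SEARCH_ENGINES[engine] + quote_plus(query)


if __name__ == "__main__":
    query = build_query(domain="example.com", intitle="SEO Tips")
    print(query)
    for engine in SEARCH_ENGINES:
        print(search_url(engine, query))
```

Leaving out `domain` gives you the vanity-search form, for example `build_query(intitle="Steven Britton")`.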

If you have any methods that you’ve had success with, please post them here so that others, including myself, can benefit from them.

Coming soon: How to Remove Duplicate Content

  • Setting up 301 redirects
  • Adding canonical links
  • Removing duplicate content from search engines

 

 

 

 
