When To Canonicalize, Noindex, Or Do Nothing With Similar Content

Picture your content the way you picture yourself. Are you carrying baggage you could get rid of? Are you carrying something you want to keep but might repurpose or see differently?

Website content is no different. We’ve all likely sat in a group discussing content we would like to cut from our website, only to realize there is still a need for it, whether for a specific prospect, an internal team, or another audience.

While we look for ways to slim our websites as much as possible for content management purposes, we also want to do the same for the search engine bots that crawl them.

We want their visits, which we hope are daily, to be fast and efficient.

Ideally, those visits show them who we are, what we are about, and, where we have content that can’t be removed, how we are labeling it for them.

Luckily, search engine crawlers want to understand our content just as much as we want them to. They give us two ways to label it: canonicalizing content and noindexing content.

Beware, however: doing this incorrectly could leave important website content misunderstood by search engine crawlers, or not read at all.

Canonicalize?

Screenshot by author, July 2022

Canonical tags provide a great way of instructing search engines: “Yes, we know this content is not that unique or valuable, but we must have it.”

It can also be a great way to point value to content originating from another domain or vice versa.

Either way, this is your chance to show the crawling bots how you perceive your website content.

To use it, place the tag (a link element with rel="canonical" and an href pointing to the preferred URL) within the head section of the page’s source code.

The canonical tag can be a great way to deal with content that you know is duplicate or similar but that must stay in place, either because users on the site need it or because the site maintenance team is slow to consolidate it.
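As a minimal sketch of what that placement looks like and how you might verify it, the Python below parses a page and collects canonical link elements found inside the head. The page markup, the example.com URL, and the names CanonicalFinder and find_canonicals are illustrative assumptions, not part of any SEO tool.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attr = dict(attrs)
            rel_values = (attr.get("rel") or "").lower().split()
            if "canonical" in rel_values and attr.get("href"):
                self.canonicals.append(attr["href"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonicals(html):
    """Return the canonical URLs declared in the head of an HTML document."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical page: a color variation that points to its parent shirt page.
PAGE = """
<html>
  <head>
    <title>Crew Neck T-Shirt - Red</title>
    <link rel="canonical" href="https://www.example.com/shirts/crew-neck/">
  </head>
  <body><h1>Crew Neck T-Shirt - Red</h1></body>
</html>
"""

print(find_canonicals(PAGE))  # ['https://www.example.com/shirts/crew-neck/']
```

A page that returns an empty list has no canonical tag in its head, and a page that returns more than one has conflicting tags worth reviewing.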

If you think this tag is a good fit for your website, review your website and address site sections that have separate URLs but similar content (e.g., copy, images, headings, and title elements).

Website auditing tools such as Screaming Frog and the Semrush Site Audit section are a quick way to see content similarities.

If you think there might be some other similar content culprits out there, you can take a deeper look with tools such as Similar Page Checker and Siteliner, which will review your site for similar content.

Now that you have a good feel for cases of similarity, you need to understand if this lack of uniqueness is worthy of canonicalization. Here are a few examples and solutions:

Example 1: Your website exists at both HTTP and HTTPS versions of site pages, or your website exists with both www. and non-www. page versions.

Solution: Point a canonical tag to the page version with the most links (inbound and internal) until you can redirect all duplicate pages one-to-one.
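One way to sketch this selection, assuming you already have per-URL link counts from a crawl export, is to group URLs that differ only by protocol or the www. prefix and pick the most-linked version in each group. The link counts, URLs, and function names below are hypothetical, and the grouping deliberately ignores query strings.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def variant_key(url):
    """Reduce a URL to host (without www.) plus path, ignoring the scheme."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + (parts.path or "/")


def pick_canonicals(link_counts):
    """Map every URL to the most-linked URL among its protocol/www variants."""
    groups = defaultdict(list)
    for url in link_counts:
        groups[variant_key(url)].append(url)

    canonical_for = {}
    for urls in groups.values():
        best = max(urls, key=lambda u: link_counts[u])
        for url in urls:
            canonical_for[url] = best
    return canonical_for


# Hypothetical link counts per URL (assumed data, e.g. from a crawl export).
LINK_COUNTS = {
    "http://example.com/shirts/": 3,
    "http://www.example.com/shirts/": 12,
    "https://example.com/shirts/": 7,
    "https://www.example.com/shirts/": 48,
    "https://www.example.com/pants/": 9,
}

CANONICAL_FOR = pick_canonicals(LINK_COUNTS)
for url, target in CANONICAL_FOR.items():
    print(url, "->", target)
```

The same mapping can later serve as the one-to-one redirect list once redirects are possible.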

Example 2: You sell products that are highly similar where there is no unique copy on these pages but slight variations in the name, image, price, etc. Should you canonically point the specific product pages to the product parent page?

Solution: Here, my advice is to do nothing. These pages are unique enough to be indexed. They have unique names differentiating them, and this could help you rank for long-tail keywords.

Example 3: You sell t-shirts but have a separate page for every color of every shirt.

Solution: Point the canonical tags on the color pages to the parent shirt page. Each color page isn’t a distinct product, just a very similar variation.

Use Case: Canonical Tagging Content That’s Unique Enough To Succeed

Building on the examples above, I want to show that slightly similar content can sometimes still be appropriate for indexation.

What if the site had shirts with child pages for different shirt types, like long sleeves and tank tops? Each of these is now a different product, not just a variation. As mentioned previously, this can pay off for long-tail web searches.

Here’s a great example: An automotive sales site that features pages for car makes, associated models, and variations of those models (2Dr, 4Dr, V8, V6, deluxe edition, etc.). The initial thought with this site is that all variations are simply near duplications of the model pages.

You may think, why would we want to annoy search engines with this near duplicative content when we can canonicalize these pages to point to the model page as the representative page?

We moved in this direction at first. Still, the concern that these variation pages could succeed on their own led us to revise the canonical tags so that each page referenced itself.

Suppose you do point the canonical tag to the parent model page. Even though you have shown search engines the content hierarchy, they may still rank the canonicalized page if the search is specific enough.

So, what did we see?

We found that organic traffic increased to both child and parent pages. It’s my opinion that when you give credit back to the child pages, the parent page looks to have more authority as it has many child pages which are now given back “credit.”

Since September of this year, when we revised the canonical tags, monthly organic traffic to this site area has grown fivefold, with 754 pages driving organic traffic compared to the 154 recognized earlier in the previous year.

Screenshot by author with Semrush, July 2022

Don’t Make These Canonicalization Mistakes

  • Setting canonical tags that pass through a redirect before resolving to the final page can do a great disservice. It slows search engines down, because they are trying to understand content importance but are forced to jump between URLs.
  • Similarly, if you point canonical tags towards URL targets that return 404 error pages, you are essentially pointing search engines into a wall.
  • Don’t point canonical tags to the wrong page version (i.e., www./non-www., HTTP/HTTPS). As discussed, website crawling tools may reveal unintentional website duplication. Don’t make the mistake of assigning page importance to a weaker page version.
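The three mistakes above can be checked mechanically once you have crawl data. The sketch below assumes a hypothetical dictionary of HTTP status codes per URL (the kind of data a crawling tool can export) and an assumed preferred page version of https://www.example.com; it does not fetch anything itself.

```python
from urllib.parse import urlsplit

# Assumed preferred page version for this hypothetical site.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"

# Hypothetical crawl results: URL -> HTTP status code returned for that URL.
STATUS = {
    "https://www.example.com/shirts/": 200,
    "https://www.example.com/old-shirts/": 301,
    "https://www.example.com/retired/": 404,
    "http://example.com/shirts/": 200,
}


def audit_canonical_target(target, status_by_url):
    """Return a list of problems with a canonical target; empty means none found."""
    problems = []

    status = status_by_url.get(target)
    if status is None:
        problems.append("target was not crawled")
    elif 300 <= status < 400:
        problems.append("target redirects before resolving")
    elif status >= 400:
        problems.append("target returns an error page")

    parts = urlsplit(target)
    if parts.scheme != PREFERRED_SCHEME or parts.hostname != PREFERRED_HOST:
        problems.append("target is not the preferred page version")

    return problems


for canonical_target in STATUS:
    print(canonical_target, audit_canonical_target(canonical_target, STATUS))
```

An empty list only means none of these three issues was detected, not that the canonical choice itself is the right one.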

Noindex?

You can also utilize the meta robots noindex tag to exclude similar or duplicate content entirely.

Placing the noindex tag in the head section of your source code will stop search engines from indexing these pages.
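For illustration, here is a small Python sketch that reads the meta robots tag from a page and reports whether the page is noindexed. The page markup and the names RobotsMetaFinder, robots_directives, and is_noindexed are assumptions made for this example; the sketch treats the "none" value as equivalent to "noindex, nofollow".

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives listed in <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of meta robots directives declared on a page."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


def is_noindexed(html):
    """True when the page asks search engines not to index it."""
    directives = robots_directives(html)
    return "noindex" in directives or "none" in directives


# Hypothetical page that stays available to users but is kept out of the index.
NOINDEX_PAGE = """
<html>
  <head>
    <title>Manufacturer Documentation</title>
    <meta name="robots" content="noindex, follow">
  </head>
  <body><h1>Manufacturer Documentation</h1></body>
</html>
"""

print(is_noindexed(NOINDEX_PAGE))  # True
```

Running a check like this across a crawl is a quick way to confirm the tag appears only on the pages you intended.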

Beware: While the meta robots noindex tag is a quick way to remove duplicate content from ranking consideration, it can be dangerous to your organic traffic if you fail to use it appropriately.

This tag has been used in the past to pare large sites down to only their search-critical pages so that site crawl spend is as efficient as possible.

However, you want search engines to see all relevant site content to understand site taxonomy and the hierarchy of pages.

That said, if this tag doesn’t scare you too much, you can use it so that search engines index only what you deem fresh, unique content.

Here are a few ways noindexing might come up as a solution:

Example 1: To aid your customers, you provide documentation from the manufacturer, even though the manufacturer already features it on its own website.

Solution: Continue providing documentation to aid your on-site customers but noindex these pages.

This content already belongs to, and is indexed under, the manufacturer, whose site likely has much more domain authority than yours. In other words, you are unlikely to be the ranking website for this content.

Example 2: You offer several different but similar products. The only differentiation is color, size, count, etc. You don’t want to waste crawl spend.

Solution: Use canonical tags rather than noindex. A long-tail search could drive qualified traffic because a given page would still be indexed and able to rank.

Example 3: You have a lot of old products that you don’t sell much of anymore and are no longer a primary focus.

Solution: This scenario typically surfaces in a content or sales audit. If the products do little for the company, consider retirement.

Consider either canonically pointing these pages to relevant category pages or redirecting them to relevant category pages. These pages have age and trust, may have links, and may possess rankings.

Use Case: Don’t Sacrifice Rankings/Traffic For Crawl Spend Considerations

When it comes to our websites, we know we want to put our best foot forward for search engines.

We don’t want to waste their time when crawling, and we don’t want to create a perception that most of our content lacks uniqueness.

In the example below, meta robots noindex tags were placed on child product variation pages during a domain transition/relaunch, to keep the bloat of somewhat similar product page content away from search engines.

The graph below shows the total number of ranking keywords that transitioned from one domain to the other.

When the meta robots noindex tags were removed, the overall number of ranking terms grew by 50%.

Screenshot by author with Semrush, July 2022

Don’t Make These Meta Robots Noindex Mistakes

  • Don’t place a meta robots noindex tag on a page that has inbound link value. Placing the tag will eliminate the valuable link equity that you have. If a page like this needs to go, permanently redirect it to another relevant site page instead.
  • If you’re noindexing a page that is included in the main, footer, or supporting navigation, make sure that the directive isn’t “noindex, nofollow” but “noindex, follow” so search engines that are crawling the site can still pass through the links on the noindexed page.
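Both mistakes can be flagged from a simple page inventory. The sketch below assumes a hypothetical Page record holding each page’s meta robots content, its inbound link count, and whether it appears in site navigation; the field names, URLs, and warning messages are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    robots: str          # content of the meta robots tag, "" if absent
    inbound_links: int   # links from other sites pointing at this page
    in_navigation: bool  # linked from main, footer, or supporting navigation


def parse_directives(content):
    """Split a meta robots content value into a set of lowercase directives."""
    return {part.strip().lower() for part in content.split(",") if part.strip()}


def noindex_warnings(pages):
    """Flag noindexed pages that match either mistake described above."""
    warnings = []
    for page in pages:
        directives = parse_directives(page.robots)
        if "noindex" not in directives and "none" not in directives:
            continue  # page is indexable, nothing to check
        if page.inbound_links > 0:
            warnings.append((page.url, "has inbound links; consider a permanent redirect"))
        if page.in_navigation and ("nofollow" in directives or "none" in directives):
            warnings.append((page.url, "in navigation; use noindex, follow"))
    return warnings


# Hypothetical page inventory (assumed data).
PAGES = [
    Page("https://www.example.com/docs/manual/", "noindex, follow", 0, True),
    Page("https://www.example.com/old-product/", "noindex", 14, False),
    Page("https://www.example.com/sitemap-page/", "noindex, nofollow", 0, True),
    Page("https://www.example.com/shirts/", "", 30, True),
]

for url, message in noindex_warnings(PAGES):
    print(url, "->", message)
```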

Conclusion

Sometimes it is hard to part ways with website content.

The canonical and meta robots noindex tags are a great way to preserve website functionality for all users while also instructing search engines.

In the end, be careful how you tag! It’s easy to lose search presence if you do not fully understand the tagging process.

Featured Image: Jack Frog/Shutterstock