An editor from the popular news site The Verge tweeted that a new article was displaced from page one of Google’s search results by other sites that had copied it. Google’s Danny Sullivan explained why that is happening.
Copied Content that Ranks Frustrates Publishers
Publishers have been frustrated for many years by copied content that outranks the original.
Some of the complaints are due to a misunderstanding.
For example, when a person searches for a nonsense phrase, such as randomly selected words from an article, Google doesn’t know what to do with it because it’s not a real search query and there is no answer for a nonsense phrase.
So Google defaults to a text search, which means it returns search results based on the words in the search query matching the words on a web page.
The real test of whether copied content is outranking the original is whether it does so for competitive keywords that users actually search for.
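To illustrate why a plain text match cannot tell an original from a copy, here is a minimal sketch of word-overlap scoring. It is not Google’s algorithm. The function names, the scoring formula, and the example URLs are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase a string and return its distinct word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def term_match_score(query, page_text):
    """Fraction of the query's distinct words that also appear on the page.

    Hypothetical scoring used only to illustrate plain keyword matching.
    """
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(page_text)) / len(query_terms)

# Hypothetical pages: an original article, a verbatim copy, and an unrelated page.
pages = {
    "original.example/article": "Copied articles are outranking the original publisher",
    "scraper.example/copy": "Copied articles are outranking the original publisher",
    "unrelated.example/post": "How to bake sourdough bread at home",
}

query = "copied articles are outranking the original publisher"
scores = {url: term_match_score(query, text) for url, text in pages.items()}
```

In this toy model the original and the copy receive the same score, so word matching by itself gives no reason to prefer the original.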
Should a Page Rank Twice if It’s in a Top Stories Result?
But this situation introduces a different scenario. Google will not rank an article at the top of the normal search results if that web page is already ranking in the Top Stories featured results at the top of the search results page.
Top Stories is a featured result where Google shows news articles related to a search query.
So if someone searches for a headline, Google will usually show the article at the top of the search results in a Top Stories section.
But in this case it doesn’t show the original article at the top of the normal search results because of what Google calls deduplication, an algorithm that stops the same page from ranking twice.
So the question is, should Google rank the same page twice, once in the Top Stories and again at the top of the normal search results?
Entire First Page Consists of Stolen Content
Someone from The Verge tweeted that aside from Google’s featured news section at the top of the search results, a search for a headline from a new article resulted in Google showing an entire top ten that consists of nothing but stolen content.
The person tweeted:
“Hey Google, I just searched for a headline that was published on my website and the ENTIRE FIRST PAGE after the news box was of websites stealing our content. The Verge didn’t show up until page 2.
This problem is getting worse.”
— Dieter Bohn (@backlon) January 18, 2022
Google’s Danny Sullivan acknowledged that writers searching with a headline expect to see their articles ranking at the top of the search results, not on page two.
But he also noted that searching by headline is not necessarily how regular searchers would search.
Danny’s response is debatable. A reasonable argument could be made that many people search the title of an article when they want to find it to share with a friend or on social media. So there is a real reason why people other than the author of an article may search for the title of an article.
Danny Sullivan from Google tweeted:
“We’ll take a look. I know searching by headline is common for writers and yes, I’d expect this to show first for that. But it doesn’t reflect how most people might seek this content (and for how they might search, I do find it). But again, we’ll look to improve.”
Danny next followed up with an explanation for why an original article ranks on page two for its own headline:
Here’s a follow-up on what’s happening & what we’re looking at. You do mention this, but it’s not clear from the screenshot that your article is the first thing on the page (as shown). Because it’s showing in Top Stories, it is getting deduplicated from the rest of the page… pic.twitter.com/YWCtcPAThZ
— Danny Sullivan (@dannysullivan) January 18, 2022
Deduplication can often be useful. Doing this search in the way that user might by using solution-seeking terms rather than unusual terms in the headline, there you are at the top in Top Stories plus deduplicating means there’s more variety from other publications…. pic.twitter.com/638IAZLWIV
— Danny Sullivan (@dannysullivan) January 18, 2022
In searches like that, our systems also are going to generally seek to show the most helpful, reliable info they can. That’s why you don’t see a lot of duplicates of your article showing. Duplicates certainly exist, but it isn’t that helpful to show them….
— Danny Sullivan (@dannysullivan) January 18, 2022
Search Queries That Trigger Alternative Search Results
Danny Sullivan’s next tweet explains how a search query with a lot of terms, such as a headline, causes Google’s algorithm to fall back to returning results that are more like old-style keyword searches, where the results are based not on search intent or links but on the keywords themselves.
Here is what Danny tweeted:
That leads to headline-oriented searches. As I said before, that’s super common among authors. I used to do it all the time, myself. But headline searches contain typically contain a lot of terms, so our systems shift to return pages that have those terms…
— Danny Sullivan (@dannysullivan) January 18, 2022
As I mentioned above, there is a search intent behind searching for headlines. It may be that Google hasn’t recognized “headline-oriented searches” as a search intent that the algorithm should be aware of.
Danny continued his answer:
This means authors are more likely to find duplicates, even though for typical searches that readers would do, these are unlikely to appear. But our deduplication feature may still kick in even for these, as was happening in this case….
— Danny Sullivan (@dannysullivan) January 18, 2022
As I said, deduplication can be helpful. But we also understand the concern this might be raising. We’ve been doing this with Top Stories since last May, but we’re going to revisit this to see if we should continue or perhaps make other changes.
— Danny Sullivan (@dannysullivan) January 18, 2022
Also, I’m still checking, but I believe this deduplication is especially unique in that it only happens with Top Stories if there’s a single story shown or perhaps only for the very first story shown.
— Danny Sullivan (@dannysullivan) January 18, 2022
Just to cap off with the further clarification I promised, we deduplicate a link from web results if a link appears as the first link in Top Stories and if the Top Stories box appears before web results. If it comes after, we don’t. And again, it’s something we’re reviewing.
— Danny Sullivan (@dannysullivan) January 19, 2022
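The rule in that final clarification can be sketched in code. This is only an illustration of the rule as the tweet states it, not Google’s implementation. The `ResultsPage` class, its field names, and the example URLs are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ResultsPage:
    """Hypothetical model of a single search results page."""
    top_stories: list = field(default_factory=list)  # URLs in the Top Stories box
    web_results: list = field(default_factory=list)  # URLs in the normal web results
    top_stories_first: bool = True  # True if the box appears before the web results

def deduplicate(page):
    """Apply the rule as described in the tweet.

    A link is dropped from the web results only when it is the first link
    in Top Stories and the Top Stories box appears before the web results.
    """
    if not page.top_stories or not page.top_stories_first:
        return list(page.web_results)
    first_story = page.top_stories[0]
    return [url for url in page.web_results if url != first_story]

web = ["publisher.example/article", "scraper.example/copy"]

box_on_top = ResultsPage(["publisher.example/article"], web, top_stories_first=True)
box_below = ResultsPage(["publisher.example/article"], web, top_stories_first=False)
no_box = ResultsPage([], web)

results_box_on_top = deduplicate(box_on_top)
results_box_below = deduplicate(box_below)
results_no_box = deduplicate(no_box)
```

Under these assumptions, the publisher’s article is removed from the web results only in the first case. When the Top Stories box sits below the web results or is absent, the article stays in the web results.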
News Articles and Deduplication
Deduplication is when Google attempts to stop one article from ranking twice in the search results. Danny Sullivan stated that an article might not appear in the regular search results if it is already ranking in Top Stories and the Top Stories section appears at the top of the page.
So the question is, is this a situation where a web page should rank twice, because a user might want to see the original article at the top of the search results, even if it’s already in the Top Stories section?
Once the Top Stories section disappears, the news article should rank at the top of the search results.
Content is Top Ranked After Top Stories is Gone
And that, as can be seen in the above screenshot, is what is happening right now.
This is an interesting question where Google has to decide what is fair for the publisher and what is useful for the searcher.