There has been ongoing discussion over the past few weeks across social media that Googlebot has dramatically reduced its crawling. For example, the founder of a web crawl analysis service tweeted a graph showing how Google’s crawl activity has declined since November 11, 2021.
Although the indexing slowdown doesn’t affect all sites, many on Twitter and Reddit agree that something changed at Google with respect to indexing, and they support the claim with screenshots of Googlebot activity.
Evidence of Reduced Crawling
Anecdotal evidence of Google crawling anomalies has been stacking up on social media. The problem with social media is that one can make virtually any observation about Google and be nearly guaranteed to receive agreement.
Anecdotes are interesting but not as persuasive as data-backed observations, which is what appeared recently on Twitter.
A founder of crawler and log analysis service Seolyzer (@Seolyzer_io) posted a graph of Google crawling behavior that showed a dramatic drop off of crawling activity beginning on November 11th.
He posted:
“Googlebot is on strike! Googlebot has drastically reduced its crawl activity on many large sites since November 11 at 6PM (GMT).”
— Olivier @Seolyzer.io (@Seolyzer_io) November 15, 2021
304 Server Response Code and Googlebot Crawling
Some have noted a pattern with Googlebot suddenly no longer crawling pages that serve a 304 server response code.
A 304 status code means Not Modified.
That response code is generated by a server when a browser (or Googlebot) makes a conditional request for a page.
In a conditional request, the browser (or Googlebot) tells the server that it already has a copy of the page saved in its cache, so the server should not send the page again unless it has been updated (modified) since.
Here is a definition of the 304 (Not Modified) server response code from the HTTP Working Group:
“The 304 (Not Modified) status code indicates that a conditional GET or HEAD request has been received and would have resulted in a 200 (OK) response if it were not for the fact that the condition evaluated to false.
In other words, there is no need for the server to transfer a representation of the target resource because the request indicates that the client, which made the request conditional, already has a valid representation; the server is therefore redirecting the client to make use of that stored representation as if it were the payload of a 200 (OK) response.”
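The server-side logic that definition describes can be sketched in a few lines. This is a minimal illustration, not any particular server's implementation; the function name and the date handling are assumptions made for the example.

```python
# Minimal sketch of server-side conditional-GET logic, as described in
# the HTTP definition quoted above. Illustrative only; the function
# name is not from any real framework.
from email.utils import parsedate_to_datetime
from datetime import datetime, timezone

def conditional_status(if_modified_since, last_modified):
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_modified_since:
        try:
            cached_at = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable header: fall back to a full response
        if last_modified <= cached_at:
            return 304  # Not Modified: client may reuse its cached copy
    return 200  # send the full representation

# Example: page last changed Nov 1; client cached it Nov 10 -> 304
page_modified = datetime(2021, 11, 1, tzinfo=timezone.utc)
print(conditional_status("Wed, 10 Nov 2021 00:00:00 GMT", page_modified))
```

The key point for the crawling discussion: a 304 carries no body, so it tells the crawler only that nothing changed since its last visit.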
304 Response Causes Less Googlebot Crawling?
One person tweeted confirmation (in French) that several sites with AMP that he monitors experienced a sharp drop in 304 responses on November 12.
I can confirm this here too: in Search Console, across several sites with AMP, a blatant drop in 304s on November 12.
— Erwan Le Tallec (@eletallec) November 15, 2021
The person who posted the original tweet responded with a post of a graph showing how Google nearly stopped crawling pages that responded with a 304 server response code:
Theory 2: 304s
— Olivier @Seolyzer.io (@Seolyzer_io) November 15, 2021
Others noticed a similar issue where pages serving a 304 response had drastically lower crawl rates:
lol, I saw this over the weekend, and before starting a thread I was looking for info or announcements that could explain it, but the explanation is clearly AMP and 304s
— Raphael Doucet (@RaphSEO) November 15, 2021
Another person noticed reduced crawls on travel pages but a crawl increase on ecommerce pages:
Saw this pattern only on tourism and travel portal in Croatia, ecommerce verticals are fine (even saw huge increase in crawls after evening 11 Nov on several)
— Marko Cvijic (@MarkoCvijic) November 15, 2021
Many others are sharing analytics and search console screenshots:
@JohnMu I think there are more routing problems with Google crawling again. Local Nginx server and S3 headers, same problem. Can you tell us something about it? Maybe Cloudflare related problem? https://t.co/c8eV9C4pxg @Seolyzer_io
— Carlos Redondo (@carlosredondo) November 15, 2021
More data:
I extracted some data, centered around 21-11-11 19:39 (Parisian time). Some Google verified crawl IPs went completely dark from that point in time.
— Baptiste M. (@bactisme) November 15, 2021
304 Response Code Should Not Alter Crawling
Google’s official developer help page documentation on Googlebot crawling states that a 304 response code should not impact crawling.
Here’s what Google’s official documentation advises:
“Googlebot signals the indexing pipeline that the content is the same as last time it was crawled.
The indexing pipeline may recalculate signals for the URL, but otherwise the status code has no effect on indexing.”
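The crawler's half of that exchange can also be sketched. The cache structure and function name below are hypothetical, not Googlebot's actual implementation; the sketch only shows how a recrawl request would carry the validators (ETag, Last-Modified) saved from a previous response, so the server can answer 304 instead of resending the page.

```python
# Sketch of the crawler-side half of a conditional request. The cached
# page structure and function name are hypothetical, for illustration.
def revalidation_headers(cached_page):
    """Build If-None-Match / If-Modified-Since headers from cached metadata."""
    headers = {}
    if cached_page.get("etag"):
        headers["If-None-Match"] = cached_page["etag"]
    if cached_page.get("last_modified"):
        headers["If-Modified-Since"] = cached_page["last_modified"]
    return headers

# A crawler would attach these headers when refetching a known URL.
cached = {"etag": '"abc123"', "last_modified": "Thu, 11 Nov 2021 18:00:00 GMT"}
print(revalidation_headers(cached))
```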
Is it possible that Google’s behavior has changed (permanently or temporarily) and that the developer page is now outdated?
Cookie Consent Theory
The 304 server response theory is one of many offered to explain why Googlebot might not index a web page.
One person tweeted that Google increased indexing after removing a cookie consent bar.
Google not crawling and indexing new pages anymore? I had the same problem and removed the cookie consent bar (Cookiepro) to test. Guess what – problem solved. @JohnMu – any ideas why Google might not crawl and index new pages with a cookie-consent popup?
— Dennis Sievers (@resiever) November 16, 2021
Why would a cookie consent bar cause Google to not index a web page? Could the cookie consent bar have triggered a 304 response, causing Google to not index the page?
Reduced Googlebot Crawls Discussed at Reddit
The phenomenon of reduced Googlebot crawls was also discussed on Reddit.
A Redditor described how in the past articles from their successful site were indexed within 10 minutes of submitting them via Google Search Console.
They related that recently only half of new articles were being indexed.
But that changed in November according to this Reddit post:
“For whatever reason now less than half of our new articles are indexing, even with me manually submitting them all right after publishing.”
Other redditors shared similar experiences:
“A lot of people are experiencing similar right now… Something seems to be going on with Google.”
“Something is up with Google indexing new posts…”
“My website is 17 years old… suddenly, the latest article took weeks to get indexed.”
Google Says Nothing is Broken
Google’s John Mueller responded to the questions on Reddit:
“I don’t see anything broken in the way Google indexes stuff at the moment. I do see us being critical about what we pick up for indexing though, as any search engine should.”
Is Google Testing New Crawling Patterns?
In October, Bing announced an open source indexing protocol called IndexNow. Its goal is to reduce how often crawlers need to revisit web pages, cutting the energy used at data centers for crawling and at servers for serving those pages. The protocol also benefits publishers because it speeds up notifying search engines when pages are updated or created, resulting in faster indexing of quality web pages.
In November Google announced that it would test the new IndexNow indexing protocol to see if there are benefits to it.
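For context, an IndexNow notification is just a small JSON document that a publisher POSTs to a participating search engine, listing changed URLs so engines can recrawl them without polling. A minimal sketch, with placeholder host, key, and URLs:

```python
# Sketch of an IndexNow notification payload, following the public
# IndexNow protocol. The host, key, and URLs below are placeholders.
import json

def build_indexnow_payload(host, key, urls):
    """Assemble the JSON body a publisher submits to an IndexNow endpoint."""
    return json.dumps({
        "host": host,
        "key": key,        # the key is verified via a text file hosted on the site
        "urlList": urls,   # pages that were created or updated
    })

payload = build_indexnow_payload(
    "www.example.com",
    "your-indexnow-key",
    ["https://www.example.com/new-article"],
)
# This body would be POSTed to a participating engine's /indexnow
# endpoint with Content-Type: application/json.
```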
Saving energy and reducing carbon footprints are among the most important issues of our time. Could it be that Google is testing ways to reduce crawling without radically changing to a new protocol?
Has Google Reduced Web Page Crawling?
There are some claims that Google has stopped indexing altogether, but that is incorrect. However, there is significant discussion on social media, backed with data, supporting the observation that Googlebot crawling patterns have changed.