Rendering SEO Manifesto: Why We Need to Go Beyond JavaScript SEO

Want to make sure that your content gets properly accessed by search engines and ranks high?

In his SEJ eSummit session, Bartosz Góralewicz presented how Google is rendering websites on a large scale and shared insights based on Google’s patents and documentation.

Here’s a recap of his presentation.

The Problem with JavaScript

Góralewicz and his team found that 40% of content relying on JavaScript is not indexed after 14 days.

It gets worse.

Ten percent of URLs within an average domain are not indexed by Google, and we're talking about unique, indexable URLs.

This is something to keep an eye on, especially since these trends change over time and can get worse.

In 2015, Google claimed that it was good at rendering, saying:

“[A]s long as you’re not blocking Googlebot from crawling your JavaScript or CSS files, we are generally able to render and understand your web pages like modern browsers.”

Since 2017, Góralewicz and his team have run many other experiments, including JavaScript cloaking experiments, which revealed crawling and indexing issues encountered by JavaScript-based websites.

That same year, Google started talking openly about JavaScript SEO.

Today, while we have Google’s Martin Splitt who has been incredibly helpful to the SEO community, there are still questions left unanswered.

In November 2019 at the Chrome Developer Summit, Splitt announced that the median time for rendering at Google had improved from up to a week the year before to only five seconds in 2019.

However, additional research by Góralewicz and the Onely team found that "[while] the median rendering delay may be virtually non-existent for new websites, the delay in indexing JavaScript content is still very much there."

Many JavaScript-powered websites don’t get indexed and don’t rank even after two weeks.

They also discovered that:

  • There are huge brands barely in Google’s index.
  • Indexing HTML is not as easy as assumed.
  • Indexing trends fluctuate during Google updates.
  • You can get kicked out of Google’s index.

One of the challenges with diagnosing indexing drops right now is that the site: command is unreliable and can return a lot of false negatives.

Getting Into Google’s Index: A Big SEO Challenge

Getting your content into Google’s index is an absolute foundation of your online presence – and it remains a big SEO challenge today.

And this problem is only going to get bigger: Google's resources are limited, and it cannot render and index the entire web, especially given the cost that now comes with many modern websites.

Just look at some of the biggest brands with significant indexing issues.

If Google won’t index your webpages, all of the other SEO activities won’t matter at all.

The good news is that both SEOs and Googlers are starting the conversation about indexing issues, and we now have better data sources to validate them.
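
One example of such a data source is the URL Inspection API in Google Search Console, which reports Google's own record of a URL's indexing status. Below is a minimal sketch; the access token, property, and URLs are placeholder assumptions you'd supply yourself:

```typescript
// Sketch: querying the Search Console URL Inspection API for a URL's index status.
// GSC_ACCESS_TOKEN is an assumed OAuth2 token; the URLs are placeholders.
const ACCESS_TOKEN = process.env.GSC_ACCESS_TOKEN;

async function inspectUrl(inspectionUrl: string, siteUrl: string): Promise<void> {
  const res = await fetch(
    "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${ACCESS_TOKEN}`,
        "Content-Type": "application/json",
      },
      // inspectionUrl = the page to check; siteUrl = your verified property
      body: JSON.stringify({ inspectionUrl, siteUrl }),
    }
  );
  const data = await res.json();
  // coverageState is a human-readable verdict, e.g.
  // "Submitted and indexed" or "Crawled - currently not indexed".
  console.log(data.inspectionResult?.indexStatusResult?.coverageState);
}

inspectUrl("https://example.com/some-page", "https://example.com/");
```

Unlike a site: query, this reads Google's record for the URL directly, so it avoids many of the false negatives mentioned above.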

And most indexing problems can actually be solved through technical SEO.

Here’s how.

Batch-Optimized Rendering: How It Works

Google is looking at your website from a batch-optimized rendering and fetch architecture (BOR) perspective.

What Google sees through this architecture is different from what users see in a browser.

So how does BOR work?

Step 1: BOR Skips All Resources That Are Not Essential to Generate a Preview of Your Page

The first step of batch-optimized rendering and fetch architecture is to remove all of the resources that Google doesn't need in order to generate the preview, or layout, of your website.

This includes:

  • Tracking scripts (Google Analytics, Hotjar, etc.)
  • Ads
  • Images

Removing these extra resources alone can save more than 50% of loading, scripting, and rendering time, which saves quite a lot of resources on Google's end.
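
You can approximate this first step yourself by loading a page twice, once normally and once with non-essential resources blocked, and comparing the timings. Here's a sketch using Puppeteer; the blocklist below is an illustrative assumption, not Google's actual skip list:

```typescript
// Sketch: comparing load time with and without "non-essential" resources.
// The blocked types/hosts are illustrative guesses at what BOR would skip.
import puppeteer from "puppeteer";

const BLOCKED_TYPES = new Set(["image", "media", "font"]);
const BLOCKED_HOSTS = ["google-analytics.com", "hotjar.com", "doubleclick.net"];

async function timedLoad(url: string, block: boolean): Promise<number> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  if (block) {
    await page.setRequestInterception(true);
    page.on("request", (req) => {
      const blocked =
        BLOCKED_TYPES.has(req.resourceType()) ||
        BLOCKED_HOSTS.some((host) => req.url().includes(host));
      blocked ? req.abort() : req.continue();
    });
  }
  const start = Date.now();
  await page.goto(url, { waitUntil: "networkidle0" });
  const elapsed = Date.now() - start;
  await browser.close();
  return elapsed;
}

const url = "https://example.com/";
console.log("full load:", await timedLoad(url, false), "ms");
console.log("stripped load:", await timedLoad(url, true), "ms");
```

A large gap between the two numbers suggests your page spends much of its loading and rendering cost on resources Google would skip anyway.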

Step 2: Set the Value of a Virtual Clock

The second step is that Google sets the value of a virtual clock (which we'll talk a bit more about below).

Step 3: The Website’s Layout Is Generated

Once the time on that virtual clock “runs out”, the website’s layout is generated.

There are two key concepts to remember:

  • The Virtual Clock.
  • The Layout.

What Is a Virtual Clock?

The virtual clock measures the cost of rendering a website.

It is, in effect, a rendering budget on Google's side, and each website is assigned a slice of that "budget."

When rendering pauses to fetch resources (i.e., scripts, CSS files, image dimensions, etc.), the virtual clock does not advance. It only advances during actual rendering.

This means that if you have a lot of CSS, JavaScript, or other resources within your website, you need more “virtual time” on the virtual clock.

But there’s no guarantee how much of that virtual clock time you’ll be able to get.

While we don’t know what the limit is (and we might never know), we can figure out how resource-hungry our website is.

Using Chrome DevTools, you can slow down your CPU and see how it affects scripting and rendering.

Let’s take H&M’s website as an example.

It increased the time by up to 25 times.

We can see how H&M may struggling with rendering and indexing.

How to Measure the ‘Virtual Clock Load’ of Your Website

Góralewicz recommends two options to measure your “virtual clock load.”
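
One way to approximate this measurement (a sketch under our own assumptions, not necessarily one of the two options from the presentation) is to script the DevTools CPU-slowdown test with Puppeteer and read the cumulative scripting and layout durations:

```typescript
// Sketch: measure scripting/layout cost at different CPU throttling levels.
// emulateCPUThrottling(6) mimics a "6x slowdown" in DevTools;
// page.metrics() reports cumulative durations in seconds.
import puppeteer from "puppeteer";

async function renderCost(url: string, throttle: number) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.emulateCPUThrottling(throttle);
  await page.goto(url, { waitUntil: "networkidle0" });
  const { ScriptDuration, LayoutDuration } = await page.metrics();
  await browser.close();
  return { ScriptDuration, LayoutDuration };
}

console.log(await renderCost("https://example.com/", 1)); // baseline
console.log(await renderCost("https://example.com/", 6)); // throttled
```

The wider the gap between the baseline and throttled numbers, the more "virtual time" your website likely needs.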

The Layout of Your Page

When the virtual clock's time runs out, the layout is generated, regardless of whether the page is only halfway through rendering.

This leads to a lot of potential challenges.

Most importantly, this is where JavaScript SEO ends and rendering SEO begins.

Rendering SEO focuses heavily on how the layout plays out within this whole process.

Content Location Matters

We already know that text appearing above the fold is more important than text below the fold.

It turns out that it also affects how Google is going to crawl that content.

Google’s 2011 patent, Scheduling resource crawls, tells us how the search engine looks at different sections of the websites, as well as the links within those sections, with a different priority.

This goes to show that JavaScript SEO is just the tip of the iceberg. It's only focused on whether Google can see our content.

Rendering is way beyond that.

It’s a much broader topic because apart from Google just seeing the content, we’re now interested in:

  • The layout of the page.
  • The importance of content, based on text size, placement, etc.
  • Internal and external link extraction.
  • Entry change rates.
  • Other factors that have to do with how a website is rendered and how it looks after that, including images.

Batch Rendering vs. Images

Google’s rendering service is using mock images. Here’s an example of how that plays out.

What About Links?

The value of links depends on their location and attributes.

We’ve known this for quite some time, but this gets more interesting when we look into more patterns from Google.

The position of the link within the page matters.

It affects how Google will crawl that link and what kind of “rating” Google will assign to that link.

Additionally, links placed in important sections of your page may be assigned a higher value compared to links in less important sections.

According to the Ranking documents based on user behavior and/or feature data patent (the Reasonable Surfer model), there are many other features associated with links, including:

  • Surrounding text: words before and/or after the link.
  • Type of link (e.g., image/text).
  • How commercial the anchor text associated with a link might be.
  • Number of links in the source document.
  • Font size.

Moreover, Google doesn’t analyze pages on a block-level. A link, even if placed in a popular section of a page, can be considered unimportant – for instance, when it’s a “Terms of Service” link, a banner advertisement, or it’s a link unrelated to the document.

It’s important to note that Google, to fully apply the reasonable surfer model, it’s necessary that the page is fully rendered.


So Which Sections Do & Don’t Get Indexed?

Through nine months of research, Góralewicz and his team found that Google uses very similar heuristics to pick which parts of a website should be rendered and which can be skipped.

To diagnose partial indexing, the Onely team looked at popular websites to see which parts of a given layout are indexed and which are not.

What they discovered is that Google seems to ignore some parts of the websites more eagerly than others.

For example, Google seems to struggle with rendering “related items” and “you may also be interested in” sections.

Google will most likely index your main content.

But… if your website is heavy on the scripting and rendering side, there's a good chance that Google will skip parts of your page that are less crucial than the main content once it has tried to understand the layout.

Google has mentioned that it will interrupt scripts when they're heavy, but we didn't know what that meant until now.
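
A simple way to spot-check this on your own site is to search Google for an exact sentence taken from the suspect section, scoped to your domain. Here's a small helper that builds such a query URL; the domain and fragment are placeholders:

```typescript
// Sketch: build a Google query that checks whether a specific text fragment
// from a page section made it into the index.
function partialIndexCheckUrl(site: string, fragment: string): string {
  const query = `site:${site} "${fragment}"`;
  return `https://www.google.com/search?q=${encodeURIComponent(query)}`;
}

console.log(
  partialIndexCheckUrl(
    "example.com",
    "a unique sentence copied from the related items section"
  )
);
```

If the main content turns up but sentences from "related items" sections don't, you're seeing partial indexing in action.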

Partial Indexing: Key Findings

You may think that partial indexing is not that significant of a problem.

  • When Google indexes your main content first, we can assume this is a smart decision on its part.
  • But it means Google will often ignore parts of your layout.
  • Which may lead to sitewide indexing and crawling issues.
  • And we are back to the problem that, after 14 days, around 40% of JavaScript content is not indexed.

But this leads to an even more significant problem – after 14 days, 10% of the URLs are not indexed.

This goes way beyond JavaScript SEO because rendering happens with and without JavaScript.

JavaScript is not the primary reason for rendering.

Knowing what we know now, should we still call it JavaScript SEO?

Takeaways

To wrap up his presentation, Góralewicz shared the following takeaways:

  • Rendering SEO and indexing are going to be among the hottest SEO trends. Soon.
  • If you're not indexed, all other SEO activities you're doing won't matter.
  • Indexing is something you can see and measure. It drives revenue. Directly.
  • For the first time in the history of SEO, we have a good understanding of how rendering and indexing work, so let's make good use of it.

Watch this Presentation

You can now watch Góralewicz’s full presentation from SEJ eSummit on June 2.
