Google’s Gary Illyes reveals the search index uses a tiered system where the most popular content is indexed on faster, more expensive storage.
This topic is discussed in the latest episode of Google’s Search Off the Record podcast, which deals with language complexities in search index selection.
In explaining how Google builds its search index, Illyes says content is indexed on three types of storage:
- RAM (Random Access Memory): Fastest and most expensive
- SSD (Solid State Drive): Very fast, and less expensive than RAM
- HDD (Hard Disk Drive): Slowest and least costly
Google reserves the fastest storage for documents that are likely to be served in search results on a frequent basis.
Illyes states:
“And then, when we build our index, and we use all those signals that we have. Let’s pick one, say, page rank, then we try to estimate how much we would serve those documents that we indexed.
So will it be like every second? Will we have a query that triggers those docs? Or will it be once a week or will it be once a year?
And based on that, we might use different kinds of storages to build the index.”
Illyes goes on to give examples of what would be stored on RAM, what would be stored on SSDs, and what would be stored on HDDs.
Content that’s likely to be served every second ends up on RAM, with SSDs acting as the next tier down. Together, these two tiers hold only a small portion of Google’s entire index.
The bulk of Google’s index is stored on hard drives because, in Illyes’ words, hard drives are cheap, accessible, and easy to replace.
“So for example, for documents that we know that might be surfaced every second, for example, they will end up on something super fast. And the super fast would be the RAM. Like part of our serving index is on RAM.
Then we’ll have another tier, for example, for solid state drives because they are fast and not as expensive as RAM. But still not– the bulk of the index wouldn’t be on that.
The bulk of the index would be on something that’s cheap, accessible, easily replaceable, and doesn’t break the bank. And that would be hard drives or floppy disks.”
Of course, Illyes is kidding about floppy disks. That’s the type of dry humor you get from him on the podcast.
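To make the idea concrete, here is a minimal Python sketch of the kind of tiering Illyes describes: estimate how often a document will be served, then pick a storage tier based on that estimate. This is purely illustrative. The function name `pick_tier` and the cutoffs (once per second for RAM, once per week for SSD) are assumptions borrowed from the example frequencies in the quote, not anything Google has disclosed.

```python
from enum import Enum


class Tier(Enum):
    RAM = "ram"  # fastest, most expensive
    SSD = "ssd"  # fast, cheaper than RAM
    HDD = "hdd"  # slowest, cheapest; holds the bulk of the index


SECONDS_PER_DAY = 86_400

# Hypothetical cutoffs, expressed as estimated serves per day.
# Google has not published real thresholds; these are placeholders.
RAM_MIN_SERVES_PER_DAY = SECONDS_PER_DAY  # about once per second
SSD_MIN_SERVES_PER_DAY = 1 / 7            # about once per week


def pick_tier(estimated_serves_per_day: float) -> Tier:
    """Map an estimated serving frequency to a storage tier."""
    if estimated_serves_per_day >= RAM_MIN_SERVES_PER_DAY:
        return Tier.RAM
    if estimated_serves_per_day >= SSD_MIN_SERVES_PER_DAY:
        return Tier.SSD
    return Tier.HDD


# Try the three example frequencies from the quote.
for label, rate in [
    ("every second", SECONDS_PER_DAY),
    ("once a week", 1 / 7),
    ("once a year", 1 / 365),
]:
    print(f"{label} -> {pick_tier(rate).name}")
```

With these placeholder cutoffs, a document served every second lands on RAM, one served weekly lands on SSD, and one served yearly lands on HDD. In practice the estimate would come from signals like the one Illyes mentions, and the real boundaries are unknown.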
To my knowledge, this is the first time Google has let the public in on information about its search index storage tiers. It’s interesting to know the most searched-for content is stored on RAM and SSDs.
The cost of storing even a small percentage of Google’s index on RAM and SSDs must be exorbitant, though it’s likely the cost of faster storage is justified by how important the documents inside are to people.
The demand for the content must be so high that Google doesn’t want to risk a delay in getting it out to searchers.
As it relates to SEO, there’s no way to optimize for one type of storage over the other, and there’s no way to tell which of the storage tiers your site is indexed on.
My guess is a decidedly small percentage of web pages are indexed on RAM or SSDs. Bringing it back to SEO, this is a good thing as it means the majority of sites are competing on a level playing field when it comes to index storage speed.