22 Things We Might Have Learned From The Google Search Leak

This week, some documents leaked that supposedly give us a rare peek behind the curtain of the world’s biggest search engine.

In case you missed it, you should:

  1. Read the SparkToro post (welcome back Rand)
  2. Read the iPullRank post from Mike King
  3. Watch the video from Erfan Azimi (the original leaker)
  4. Download this PDF

And most importantly you should:

REMEMBER: THIS IS AN UNCONFIRMED “LEAK”. YOU SHOULD NOT CHANGE YOUR SEO STRATEGY BASED ON THIS. WE DO NOT KNOW THAT GOOGLE USES ANYTHING FROM THIS LEAK TO RANK WEBSITES. PLEASE READ THIS.

So now that we are on the same page:

What does the “leak” contain?

chromeintotal

It’s really just a reference to all of the different attributes and modules that are potentially used in Google Search (including Chrome browser data!).

It’s important to note that the leaked documents don’t show the specific weight of elements as ranking factors. They also don’t prove which elements are actually used in the ranking systems.

And just because Google collects data doesn’t mean they use it!

But it does show the types of data Google likely collects and how much they collect – this alone was mind-blowing.

Take a look at this page which lists things like this:

productReviewPUhqPage

Or if there is a Panda-based demotion:

pandademotion

And quite honestly:

There are some really weird things in here, like the Skin Tone Twiddler:

skintonetwiddler

I’ll talk to you more about Twiddlers later.

This has the potential to be an absolutely seismic event in the SEO world.

But you must remember this is an unconfirmed leak and we do not know that Google is using this to rank websites in search!

With that said:

Here are 22 things that stood out to me so far, including some new discoveries that have not been covered anywhere else.

(Download the PDF here)

1. Domain Authority Probably Exists

It’s possible that Google is using a “Domain Authority” type metric in their ranking process.

The leaked documents revealed metrics called “siteAuthority” and “authorityPromotion”.

siteauthority and authoritypromotion

This seems closely associated with domain authority and likely applies sitewide when determining rankings.

Why is it so surprising?

For years, Google has consistently denied the existence of any domain authority metric. They have continually said that ONLY individual pages are evaluated based on their PageRank.

Never a sitewide authority ranking factor.

But that’s not all…

The leak also revealed a signal called “Homepage PageRank”.

It’s thought that Google uses the PageRank score of a website’s homepage for new pages until those new pages get their own PageRank score.

In fact:

PageRank is mentioned 16 times in the documents:

pagerank mentions

This is likely used in combination with “siteAuthority” above.

So it’s likely that your website’s authority matters. Building authority with quality backlinks, brand mentions, and other offsite authority signals is probably important.

2. Clicks Do Matter

This is one of the most unsurprising reveals…

Clicks play a significant role in determining search rankings.

From the recent Google anti-trust trial, we already know that systems like NavBoost and Glue exist.

But what we don’t know is the impact these systems have on the SERPs.

navboost search

Let me explain:

NavBoost is a unique system in Google Search that uses information about users’ clicks and interactions with search results to adjust rankings.

good clicks or bad clicks

NavBoost looks at different types of clicks:

  • Good Clicks: Used to identify positive user interactions with search results.
  • Bad Clicks: Used to identify negative user interactions with search results.
  • Last Longest Clicks: Used to measure the time users spend on a webpage after clicking from Google search results before returning.
  • Impressions: Used to track the number of times a search result is displayed to users.
  • Squashed Clicks: Clicks that are considered less valuable or spammy.
  • Unsquashed Clicks: Clicks that are considered more valuable and genuine.
  • Unicorn Clicks: Specific clicks with unique characteristics, potentially indicating high-quality interactions.
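To make the idea concrete, here’s a toy sketch of how a click could be bucketed into categories like these. Everything here is my own guesswork: the leak names the categories, not the logic behind them, so the thresholds and fields below are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical click record -- the real signals NavBoost uses are unknown.
@dataclass
class ClickEvent:
    dwell_seconds: float      # time on the page before returning to the SERP
    returned_to_serp: bool    # did the user come back and click elsewhere?

def categorise_click(click: ClickEvent) -> str:
    """Guess at how a NavBoost-style system might bucket a single click."""
    if click.returned_to_serp and click.dwell_seconds < 10:
        return "bad"           # quick bounce back to results ("pogo-sticking")
    if not click.returned_to_serp or click.dwell_seconds > 60:
        return "last_longest"  # the click that effectively ended the search
    return "good"

print(categorise_click(ClickEvent(dwell_seconds=5, returned_to_serp=True)))  # bad
```

Again, this is a sketch of the concept, not Google’s logic: the real system presumably aggregates millions of clicks per query rather than judging one at a time.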

Google has said that click data does not impact search rankings, so why is this data collected?

click data

However, the leak suggests that Google uses clicks as a key metric for evaluating a page’s relevance and quality.

They even categorise click quality!

Crazy, right?

I think that click data plays a significant role in the search rankings. Mostly because of an email from Google VP Alexander Grushetsky:

navboost email

It’s something that all SEOs should be aware of.

The key takeaway is to create great content that answers search intent. Optimise your site and content to increase click-thru rates and dwell time.

Focus on user experience and reduce pogo-sticking as much as possible.

3. The Sandbox Is Probably Real

I’ve been saying for ages that the Google sandbox exists. Most SEOs figured this out by testing their own websites.

Despite that, Google has strongly denied that anything like the sandbox exists. But the leaked documents suggest otherwise.

Google uses a metric called “hostAge” which references the Sandbox:

hostage

This probably means that established and trusted websites gain rankings much faster. New websites must prove themselves trustworthy before being allowed out of the “sandbox.”

4. Google Might “Whitelist” Websites

During major events like the COVID-19 pandemic or democratic elections, Google collected isElectionAuthority and isCovidLocalAuthority data.

electionauthority

These “whitelists” also extend to other niches like:

  • Travel
  • Health
  • Politics

And even restaurants it seems:

restaurant whitelist

If true, this shows that Google could give preferential treatment to specific websites if it wanted to.

5. Chrome Data Might Be Used For Ranking

I wasn’t completely shocked by this.

We could already demonstrate that if you uploaded a PDF to a brand new domain and then accessed the direct URL of that PDF in Chrome, the PDF would magically get indexed.

But Google has been adamant that they don’t use ANY Google Chrome data as part of organic search. In 2012, Matt Cutts said, “Google does not use Chrome data for search ranking or quality purposes”.

John Mueller has also reinforced this idea throughout the years.

But it looks like Google tracks users’ interactions through the Chrome browser using uniqueChromeViews:

uniquechromeviews

And more than that…

They also use the chromeInTotal metric to track site-level Chrome views:

chromeintotal

An anonymous source said the Chrome browser was created almost purely to collect more clickstream data. Naughty Google.

Let’s not forget they also admitted they were collecting data from users when using incognito mode for years.

6. There’s Not A “Single” Ranking Algorithm

Google uses multiple algorithms and systems to rank websites.

Most people think of the Google algorithm as one thing. One massive equation of complex calculations that determines which websites rank and in what order.

But Google is more sophisticated than that…

The documentation shows that Google’s ranking system is actually a series of microservices. Each built for different purposes.

Here’s a list of the mentioned systems by function and name:

Crawling

  • Trawler: Manages web crawling, including the crawl queue and understands how often pages are updated/changed.

Indexing

  • Alexandria: Primary indexing system.
  • SegIndexer: Organises web pages into tiers within the Google index.
  • TeraGoogle: Secondary indexing system likely responsible for managing pages that aren’t updated frequently.

Rendering

Processing

Ranking

  • Mustang: Primary system for scoring, ranking and serving content in the SERPs.
  • Ascorer: Main rankings algorithm before re-ranking adjustments.
  • NavBoost: Re-ranks based on user click logs (huge discovery).

Freshness

Serving

  • Google Web Server (GWS): Front end server interacting with users.
  • SuperRoot: Manages post-processing and re-ranking of results.
  • SnippetBrain: Generates snippets for results.
  • Glue: Pulls together universal results using user behaviour.
  • Cookbook: System for generating signals and values at runtime.

The leaked documents outline even more ranking systems, like SAFT. But based on the documentation alone, I couldn’t figure out what they actually do.

Here’s the interesting part:

Google’s algorithm works in layers. The first ranking system does its thing, followed by the second, the third and so on.

Each system adjusts based on the factors it handles, resulting in a refined version of the search results.

Presumably, when Google updates its algorithm, it changes weights in each system. This is why different updates affect different websites.
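Here’s a toy sketch of what “ranking in layers” could look like: a base scorer followed by a series of re-ranking stages, each adjusting the previous stage’s output. The stage names nod to systems from the leak, but the maths, the fields, and the weights are entirely invented.

```python
# Illustrative pipeline only -- none of these formulas come from the leak.

def base_score(page):                 # stand-in for a Mustang/Ascorer-style scorer
    return page["relevance"]

def navboost(page, score):            # re-rank using click data
    return score * (1 + page["good_click_rate"])

def quality_boost(page, score):       # re-rank using quality signals
    return score + page["site_quality"]

def rank(pages):
    """Run the base scorer, then each re-ranking stage in order."""
    scored = [(base_score(p), p) for p in pages]
    for stage in (navboost, quality_boost):
        scored = [(stage(p, s), p) for s, p in scored]
    return [p["url"] for _, p in sorted(scored, key=lambda t: t[0], reverse=True)]

pages = [
    {"url": "a.com", "relevance": 1.0, "good_click_rate": 0.1, "site_quality": 0.2},
    {"url": "b.com", "relevance": 0.9, "good_click_rate": 0.5, "site_quality": 0.3},
]
print(rank(pages))  # ['b.com', 'a.com'] -- b.com overtakes a.com after re-ranking
```

The point of the sketch is the shape, not the numbers: a page can lead after the first stage and still lose once later systems have their say, which matches the idea that different updates hit different websites.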

Note: The weight and influence that each of these systems has in determining rankings is unknown, nor do we know whether Google actually uses any of this at all.

7. Google “Twiddles” With Search Results

There isn’t a lot of information about “Twiddlers” online, so let’s break down their role here:

Think of Twiddlers as behind-the-scenes filters. They tweak the search results after the main ranking algorithm has done its job.

Here’s how they work:

Just before a search result is shown, a Twiddler steps in and makes some final adjustments. These adjustments can either boost a website or push it down.

freshness twiddler

Twiddlers are a way for Google to fine-tune search rankings based on ranking factors beyond what the core algorithm considers.
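To illustrate, here’s a hypothetical “FreshnessTwiddler” written as a final-pass function over an already-ranked list. The Twiddler name appears in the leak; the logic, threshold, and boost value below are pure guesswork on my part.

```python
def freshness_twiddler(results, max_age_days=30, boost=1.2):
    """Final-pass adjustment: promote recently updated pages.

    `results` is a list of (score, page) pairs from the main ranking stage.
    All numbers here are invented for illustration.
    """
    adjusted = []
    for score, page in results:
        if page["days_since_update"] <= max_age_days:
            score *= boost            # nudge fresh pages up
        adjusted.append((score, page))
    return sorted(adjusted, key=lambda pair: pair[0], reverse=True)

results = [
    (1.0, {"url": "old.com", "days_since_update": 200}),
    (0.9, {"url": "new.com", "days_since_update": 2}),
]
print([page["url"] for _, page in freshness_twiddler(results)])  # ['new.com', 'old.com']
```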

Google systems that likely use the Twiddler function are:

qualityboost

Basically, any system ending in “Boost”.

But the weirdest Twiddler reference I found has to be the Skin Tone Twiddler:

skintonetwiddler

Not really sure what that does, but there seems to be a lot of opportunity for twiddling the search results!

8. Google Authorship Still Exists

It looks like that Google might store information about authors in their index.

What does this mean?

It indicates that authorship could still be a ranking factor.

There has been much debate about E-EAT among SEOs. This is mainly because it’s very hard to score “expertise” and “authority” quantitatively.

But the leaks clearly show that data and information are associated with authors. Google has an “isAuthor” function and an “authorName” attribute within the algorithm:

isauthor

All of this likely plays some role in their algorithm.

9. Anchor Text Is Important

Google uses anchor text within links to measure relevance and quality.

Anchor text is so important that getting it wrong can affect how the link impacts your website (demotion, promotion, ignored).

They are tracking all kinds of anchor text spam signals:

phraseanchor

But they don’t stop there.

There is also this document that collects even more anchor text data:

anchortext

Most people think that Google uses the anchor text to understand the website the link points to. That’s true, but that only examines the link from one side.

Google looks at both sides of the link and uses the anchor text as a signal. The algorithm checks whether the web page is relevant to the anchor text.

If the anchor text doesn’t accurately describe the content of the target site, the web page may actually get demoted in the search results. Or it’s possible Google ignores the link.
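To picture the idea, here’s a deliberately naive anchor-relevance check based on word overlap. Real systems would use far richer language models; this sketch only exists to show the “both sides of the link” concept, and the 50% threshold is made up.

```python
def anchor_matches_target(anchor: str, target_text: str) -> bool:
    """Crude relevance check: do most anchor words appear on the target page?"""
    anchor_terms = set(anchor.lower().split())
    target_terms = set(target_text.lower().split())
    overlap = len(anchor_terms & target_terms) / len(anchor_terms)
    return overlap >= 0.5  # arbitrary threshold for illustration

print(anchor_matches_target("best running shoes", "our guide to running shoes"))  # True
print(anchor_matches_target("cheap flights", "gardening tips for spring"))        # False
```

In a mismatch case like the second call, the leak suggests the link could be demoted or ignored rather than counted.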

That means it’s essential for the anchor text to be relevant and accurate. This video shows how we choose anchor text.

10. Exact Match Domain Challenges

Using an exact match domain name might pose some challenges.

Let me explain:

Domains like “BestCheapLaptops.com” or “NewYorkLocksmith.com” could be viewed as spammy or manipulative. This is especially true if the content on those domains is low-quality or spammy in nature.

This is important because of the exactMatchDomainDemotion:

exactMatchDomainDemotion

While these are likely minor signals, it’s clear that exact match domains are treated differently.

Not only that, but if you’re paying attention you will see that it suggests EMDs were previously boosted:

emd boost demotion

This makes sense, because anyone who has been doing SEO for a while will know that EMDs used to give a huge boost in the SERPs.

11. Panda Is Behaviour & Link Based

Google Panda has always been seen as the “content quality” update.

What hasn’t been clear is how Panda works. Until now…

The Panda algorithm focuses on how users interact with a website (user behaviour) and the site’s linking root domains (backlinks).

Google then assigns each website a “Site Quality Score” based on its content quality and the number of quality backlinks, which acts as a ranking modifier.

If you score high, you get a boost.

Score low and you could potentially be demoted by Panda.

pandademotion

How does Panda measure user behaviour?

Short answer: NavBoost.

The Panda system likely works with the NavBoost system to analyse user behaviour like clicks, interactions and other engagements.

click data

This feeds into the site quality score and impacts your rankings.

This is eye-opening information about Panda because the modifier affects your entire website. It is essential to have a strong backlink profile and ensure that your website provides an excellent user experience.

12. Google Could Demote You

It’s not nice to hear, but it could be true.

During the leak, many “demotions” were released. These are ranking factors that, if applied, can significantly lower your rankings.

What is interesting about this is some “demotions” were previously labelled as “boosts” like the exactMatchDomainDemotion:

emd boost demotion

Here’s a list of some of them:

  • Anchor Mismatch: If a link anchor text doesn’t match the content of the site it points to, Google reduces its value (as mentioned in point 9).
  • SERP Demotion: If a page shows signs that users aren’t happy based on the search results and clicks, it may be ranked lower.
  • Nav Demotion: Pages with poor navigation or bad user experience can be ranked lower.
  • Exact Match Domains Demotion: Exact match domains (e.g. “bestrunningshoes.com”) don’t get as much ranking boost and may even be scrutinised more.
  • Product Review Demotion: Low-quality product reviews are identified and ranked lower.
  • Location Demotions: Pages not specific to a location may be ranked lower as Google tries to match people to pages with relevant location-based content.
  • Porn Demotions: Adult content is obviously ranked lower.
  • Panda Demotion: Seemingly related to the Panda update.
  • Other Link Demotions: Google has a very sophisticated link graph that helps them understand backlinks.

You can find more by looking for the word “demotion” on this page e.g. there is reference to the babyPandaV2Demotion.

But here’s the takeaway:

You should be in the clear as long as you have great content, excellent user experience and a robust backlink profile.

13. Content Length Matters

Google has a token system for measuring content originality.

Why is this important?

Google looks at the number of tokens and compares it to the total number of words on the page.

But here’s the important part…

The system has:

  • numTokens – A maximum number of tokens (words) it can process on a single page. This means that if your content exceeds this limit, any additional tokens (words) beyond the maximum aren’t considered.
  • leadingtext – It seems to store the leading text of a page explicitly

word count

That’s why you must put your most important and unique information as high on the page as possible. This should ensure that Google reads it correctly.
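Here’s a quick sketch of what a numTokens-style limit implies: anything past the cut-off is simply never seen by the system. The limit below is made up, as the real value isn’t in the leak.

```python
MAX_TOKENS = 50  # hypothetical limit -- the real number is unknown

def visible_tokens(text, limit=MAX_TOKENS):
    """Return only the tokens a limited system would ever process."""
    tokens = text.split()
    return tokens[:limit]  # words past the limit are invisible to the system

# The key point lands up front; the detail is buried past the limit.
article = "important summary first " * 20 + "buried detail " * 100
seen = visible_tokens(article)
print(len(seen), "buried" in seen)  # 50 False
```

The buried words never make it into `seen`, which is exactly why front-loading your most important content matters if a limit like this exists.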

It’s also worth pointing out the “OriginalContentScore” that only seems to apply to short-form content.

originalcontentscore

This means that quality content is not solely based on length.

There is also the spamtokensContentScore, which is used in a Twiddler.

So if you are going to write long content, make sure the most important stuff is as close to the start as possible.

14. Quality Raters Might Count

If you didn’t know…

Google has a human team of “quality raters” to evaluate websites. They have been public about this and even released the quality rater guidelines they give to their team.

But no one knows for sure how (or even if) quality raters can impact the search algorithm.

The leaked Google search documents give us some insight.

Bottom line: They probably can.

Some elements show that information gathered by quality raters is stored, such as the furballUrl and raterCanUnderstandTopic attributes in this relevance ratings document.

And the humanRatings attribute:

humanRatings

Plus there’s other references here:

ewok

I wonder what the “golden set” external to EWOK is.

It’s still unclear whether this information is used just for training data sets or actually impacts the algorithm.

That means there’s a good chance if a quality rater visits your website, their opinion could affect your rankings.

It’s worth checking Google’s Search Quality Rater Guidelines to ensure your website is up to par.

15. Backlinks Are Still Important

For ages, Google has said that links are less important than they once were. I don’t think anyone actually believed them because independent tests proved otherwise.

The leaked documents have zero information that shows links are less important.

In fact:

There is the linkInfo attribute which states how important it is to the model:

linkinfo

What’s more?

Backlinks act as a modifier on numerous ranking factors and twiddlers.

The added importance of PageRank for home pages and the siteAuthority metric also indicates that backlinks are still very important.

Which is probably why PageRank is mentioned 16 times in total:

pagerank mentions

The “link graph” that Google engineers use to understand backlinks is pretty sophisticated. That’s why Google is so good at identifying high-quality and low-quality links.

But it’s clear that PageRank is important in these documents:

pagerankweight

In fact:

Links seem to be so important to Google’s algorithm that…

16. Traffic Impacts Link Value

This one was really interesting and something I, and many other SEOs, have been banging on about for years.

This comes from an anonymous source connected to the leaked Google search documents.

Google classifies links into 3 categories (sourceType):

  • Low quality
  • Medium quality
  • High quality

link weight

They then use click data to determine the link’s category.

If you build a link from a page that does not get traffic, the link is placed in the “low quality” category. It will have zero effect on your website rankings, and Google may even ignore the link completely.

On the other hand…

Getting a link from a page with a lot of traffic has the opposite effect. Google views the link as “high quality”, which helps your site rank better.

How does Google know which pages get clicks?

With the totalClicks attribute of course:

totalClicks

Getting a link from a top tier page should have a more significant impact on your rankings. This is because the link is also regarded as higher quality.
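As a rough illustration, here’s how a sourceType bucketing based on click counts might look. The three tiers come from the leak; the click thresholds are entirely invented.

```python
def classify_link_source(total_clicks: int) -> str:
    """Bucket a linking page into a quality tier based on its traffic.

    The tier names mirror the leak's sourceType categories; the
    thresholds are illustrative guesses, not real values.
    """
    if total_clicks == 0:
        return "low"      # likely ignored or heavily discounted
    if total_clicks < 100:
        return "medium"
    return "high"         # links from well-trafficked pages carry more weight

print(classify_link_source(0), classify_link_source(50), classify_link_source(5000))
```

Under this model, a link from a page with zero clicks lands in the “low” bucket and may contribute nothing at all.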

The bottom line is this…

You probably want backlinks from regularly updated pages with real organic traffic. These are likely to have the biggest impact on your site!

17. Backlink Velocity Is Measured

Lots of advanced metrics are collected to monitor backlink velocity.

Backlink velocity simply refers to the rate at which a website gains or loses backlinks over a certain period of time.

One of the signals Google might use is the “phraseAnchorSpamDays” metric, which tracks the speed and volume of new links.

phraseAnchorSpamDays

This helps distinguish between genuine link growth and deliberate attempts to manipulate rankings.

That’s a pretty big insight into how Google link penalties could be handed out.

18. Your Changes Are Tracked

Google’s index works a bit like the Wayback Machine.

It stores different versions of web pages over time. Google’s documentation even seems to suggest that Google keeps a record of everything it indexes forever.

CrawlerChangerateUrlHistory

This is why you can’t redirect a page to another page and expect the link equity to pass on. Google knows the context and relevance of the previous page.

But the system itself seems to only consider the last 20 versions (urlHistory) of the page:

url history

What does that mean?

In theory, to “start fresh” with Google, you might need to make changes and get the pages re-indexed over 20 times.
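The 20-version limit behaves like a fixed-size buffer: once it’s full, the oldest version falls off the back. Here’s a tiny sketch of the idea using a plain Python deque, purely as an analogy for what urlHistory appears to describe.

```python
from collections import deque

# A buffer that retains only the most recent 20 versions of a URL.
history = deque(maxlen=20)
for version in range(1, 26):          # index 25 versions of the page
    history.append(f"version-{version}")

print(history[0], len(history))       # version-6 20 -- versions 1-5 have aged out
```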

19. User Signals Are Critical

The SparkToro post shared an email from Google VP Alexander Grushetsky:

navboost email

They also went on to share a resume which amongst many other interesting things, reads:

“Navboost. This is already one of Google’s strongest ranking signals.”

So user intent is clearly extremely important in Google’s search algorithm.

If people search, scroll past your website, and click on a competitor enough times, Google will recognise that your competitor is satisfying the query better than you are.

Conversely, the algorithm is designed to promote your website to the top of the organic search results when it’s the one being clicked on. Even if you don’t have great content or a strong backlink profile…

Google will likely reward your website because users want it.

That means user intent and the NavBoost system are likely the most influential ranking factors in Google’s algorithm beyond backlinks.

You could argue that relevant backlinks will get you to page 1, but positive user signals are required to keep you there.

20. Content Freshness Matters

We know that Google loves fresh content and aged content that is regularly updated.

This document shows that Google has multiple metrics designed to track how fresh the content of a page is.

bylineDate

These are the three big ones:

  • bylineDate – This is the date displayed on the page.
  • syntacticDate – This is the date from the URL or in the title.
  • semanticDate – This is the estimated date pulled from the page content.

It seems Google wants to be extremely confident in the freshness of the content. They have multiple metrics they can use to cross-reference each other.
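Here’s a rough sketch of what cross-referencing those three dates might look like. The “agreement window” logic is my own assumption about why multiple date signals would be useful; nothing in the leak describes how they are actually combined.

```python
from datetime import date

def estimate_freshness(byline, syntactic, semantic):
    """Combine the three date signals; trust them more when they agree.

    Argument names mirror the leak's bylineDate, syntacticDate and
    semanticDate; the combination logic is invented.
    """
    dates = [byline, syntactic, semantic]
    spread = (max(dates) - min(dates)).days
    confident = spread <= 7          # arbitrary agreement window
    return max(dates), confident     # use the newest date, flag confidence

newest, confident = estimate_freshness(
    date(2024, 5, 1), date(2024, 5, 1), date(2024, 5, 3)
)
print(newest, confident)  # 2024-05-03 True
```

If the dates disagreed wildly (say, a byline from 2024 but content that reads as 2015), the confidence flag would flip, which is one plausible reason to keep dates consistent everywhere.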

There are also lastSignificantUpdate and contentage attributes:

lastSignificantUpdate

Ensure the date your content was updated is consistent across structured data, page titles, content, and sitemaps. Update your content to keep it fresh.

21. Domain Registration Information

Google collects and stores the latest domain registration information. That means they probably know who owns each domain and where it is registered.

domain registration information

Domain names are like digital property. Google uses its status as a domain registrar to track a domain’s ownership history.

It isn’t confirmed whether they use this information to impact the search results, but they definitely collect the data and they did touch on expired domain abuse in the March 2024 update.

22. Small Personal Sites Are Flagged

This might trigger a lot of people.

And I can understand why.

There is a specific attribute that indicates a site is a “small personal site”.

smallPersonalSite

The big question is why?

In the screenshot above it says “Score of small personal site promotion” so perhaps this is used to help promote small personal sites?

There isn’t any additional information about what a small personal site means.

There also isn’t an indication that this is being used in the algorithm for anything.

But who knows?

Wrapping It Up

Remember to download the PDF here for your quick reference.

Don’t forget:

This is an unconfirmed leak. We do not know that Google is using any of the modules or attributes to rank websites.

Even if Google is collecting this data, that doesn’t inherently mean they are using it in their ranking algorithms.

What’s my big takeaway?

The truth is that nothing has come to light that will dramatically change how I do SEO this year.

If anything, the leak reflects the case study that I published in 2012:

  1. Create great content
  2. Promote it

That case study in 2012 was actually inspired by a Matt Cutts video published 14 years ago in 2010, which basically said:

  • Create a blog and establish yourself as an authority
  • Do something that is original, unique or different
  • Create content that is helpful to the community
  • Answer common questions in your niche
  • Publish original research
  • Create videos
  • Publish how to guides and tutorials
  • Create lists of helpful resources

Weird how that Matt Cutts advice is still relevant, and arguably more relevant than ever before.

If you want to learn more about the leaks you should:

  1. Download this PDF
  2. Read the SparkToro post
  3. Read the iPullRank post from Mike King
  4. Watch the video from Erfan Azimi (the original leaker)

What do you think about all of this? Have you found anything weird in the “leak”? Please leave a comment below and let me know!

