This week some documents leaked that are supposedly giving us a rare peek behind the curtain of the world’s biggest search engine.
In case you missed it, you should:
And most importantly you should:
REMEMBER: THIS IS AN UNCONFIRMED “LEAK”. WE DO NOT KNOW THAT GOOGLE USES ANYTHING FROM THIS LEAK TO RANK WEBSITES. YOU SHOULD NOT CHANGE YOUR SEO STRATEGY BASED ON THIS. PLEASE READ THIS.
So now that we are on the same page:
What does the “leak” contain?
It’s really just a reference to all of the different attributes and modules that are potentially used in Google Search (including Chrome browser data!).
It’s important to note that the leaked documents don’t show the specific weight of elements as ranking factors. They also don’t prove which elements are actually used in the ranking systems.
And just because Google collects data doesn’t mean they use it!
But it does show the types of data Google likely collects and how much they collect – this alone was mind-blowing.
Take a look at this page which lists things like this:
Or if there is a Panda based demotion:
And quite honestly:
There are some really weird things in here, like the Skin Tone Twiddler:
I’ll talk to you more about Twiddlers later.
This has potential to be an absolutely seismic event in the SEO world.
But you must remember this is an unconfirmed leak and we do not know that Google is using this to rank websites in search!
With that said:
Here are 22 things that stood out to me so far, including some new discoveries that have not been covered anywhere else.
That’s not all…
We also made this highly popular guide to the Google Algorithm leak.
It specifically looks at link building:
What Will I Learn?
It’s possible that Google is using a “Domain Authority” type metric in their ranking process.
The leaked documents revealed metrics called “siteAuthority” and “authorityPromotion”.
These seem closely associated with domain authority and likely apply sitewide when determining rankings.
Why is it so surprising?
For years, Google has consistently denied the existence of any domain authority metric. They have continually said that ONLY individual pages are evaluated based on their PageRank.
Never a sitewide authority ranking factor.
But that’s not all…
The leak also confirmed a signal called “Homepage PageRank”.
It’s thought that Google uses the PageRank score of a website’s homepage for new pages until those new pages get their own PageRank score.
In fact:
PageRank is mentioned 16 times in the documents:
This is likely used in combination with “siteAuthority” above.
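To make that fallback idea concrete, here is a toy Python sketch. The function name and data shapes are my own invention; nothing in the leak describes an actual implementation:

```python
def effective_pagerank(page_url: str, pagerank: dict, homepage_url: str) -> float:
    """Return a page's PageRank, falling back to the homepage's score.

    Hypothetical sketch: the leak suggests new pages may inherit the
    homepage's PageRank until they earn a score of their own.
    """
    if page_url in pagerank:
        return pagerank[page_url]
    # New page with no score yet: borrow the homepage's PageRank.
    return pagerank.get(homepage_url, 0.0)
```

So under this sketch, a brand-new page on a strong domain starts from the homepage's score rather than from zero.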
So it’s likely that your website’s authority matters. Building authority with quality backlinks, brand mentions, and other offsite authority signals is probably important.
This is one of the most unsurprising reveals…
Clicks play a significant role in determining search rankings.
From the recent Google anti-trust trial, we already know that systems like NavBoost and Glue exist.
But what we don’t know is the impact these systems have on the SERPs.
Let me explain:
NavBoost is a unique system in Google Search that uses information about users’ clicks and interactions with search results to adjust rankings.
NavBoost looks at different types of clicks:
Google has said that click data does not impact search rankings, so why is this data collected?
However, the leak suggests that Google uses clicks as a key metric for evaluating a page’s relevance and quality.
They even categorise click quality!
Crazy, right?
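The leak names click-quality attributes like goodClicks and badClicks but says nothing about how a click gets sorted into a bucket. Here is a toy Python sketch of the idea; the thresholds are entirely invented:

```python
def categorise_click(dwell_seconds: float, returned_to_serp: bool) -> str:
    """Toy categorisation of a single click by quality.

    The 10s and 60s thresholds are invented for illustration only;
    the leak names attributes like goodClicks and badClicks but does
    not reveal how they are computed.
    """
    if returned_to_serp and dwell_seconds < 10:
        return "bad"  # quick pogo-stick back to the results page
    if dwell_seconds >= 60:
        return "good"  # long dwell suggests the query was satisfied
    return "neutral"
```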
I think that click data plays a significant role in the search rankings. Mostly because of an email from Google VP Alexander Grushetsky:
It’s something that all SEOs should be aware of.
The key takeaway is to create great content that answers search intent. Optimise your site and content to increase click-thru rates and dwell time.
Focus on user experience and reduce pogo-sticking as much as possible.
I’ve been saying for ages that the Google sandbox exists. Most SEOs figured this out by testing their own websites.
Despite that, Google has strongly denied that anything like the sandbox exists. But the leaked documents suggest otherwise.
Google uses a metric called “hostAge” which references the Sandbox:
This probably means that established and trusted websites gain rankings much faster. New websites must prove themselves trustworthy before being allowed out of the “sandbox.”
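A sandbox driven by host age could be as simple as this toy Python check. The leak names a hostAge attribute; the 180-day threshold here is entirely my invention:

```python
from datetime import date

def in_sandbox(host_first_seen: date, today: date, min_age_days: int = 180) -> bool:
    """Toy sandbox check based on host age.

    The leak references a hostAge attribute but gives no threshold;
    180 days is an arbitrary placeholder.
    """
    return (today - host_first_seen).days < min_age_days
```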
During major events like the COVID-19 pandemic or democratic elections, Google collected isElectionAuthority and isCovidLocalAuthority data.
These “whitelists” also extend to other niches like:
And even restaurants it seems:
If true, this shows that Google could give preferential treatment to specific websites if it wanted to.
I wasn’t completely shocked by this.
We could already demonstrate that if you upload a PDF to a brand new domain, then access the direct URL of that PDF in Chrome – the PDF would magically get indexed.
But Google has been adamant that they don’t use ANY Google Chrome data as part of organic search. In 2012, Matt Cutts said, “Google does not use Chrome data for search ranking or quality purposes”.
John Mueller has also reinforced this idea throughout the years.
But it looks like Google tracks users’ interactions through the Chrome browser using uniqueChromeViews:
And more than that…
They also use the chromeInTotal metric to track site-level Chrome views:
An anonymous source said the Chrome browser was created almost purely to collect more clickstream data. Naughty Google.
Let’s not forget they also admitted they were collecting data from users when using incognito mode for years.
Google uses multiple algorithms and systems to rank websites.
Most people think of the Google algorithm as one thing. One massive equation of complex calculations that determines which websites rank and in what order.
But Google is more sophisticated than that…
The documentation shows that Google’s ranking system is actually a series of microservices. Each built for different purposes.
Here’s a list of the mentioned systems by function and name:
The leaked documents outline even more ranking systems, like SAFT. But based on the documentation alone, I couldn’t figure out what they actually do.
Here’s the interesting part:
Google’s algorithm works in layers. The first ranking system does its thing, followed by the second, the third and so on.
Each system adjusts based on the factors it handles, resulting in a refined version of the search results.
Presumably, when Google updates its algorithm, it changes weights in each system. This is why different updates affect different websites.
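That layered flow can be sketched in a few lines of Python. Everything here is a toy: the system names, the score adjustments, and the weights are invented to illustrate the "series of microservices" idea, not to describe Google's actual code:

```python
def rank(pages, systems):
    """Apply each ranking system in sequence.

    Each 'system' maps a {page: score} dict to an adjusted dict,
    mirroring the layered refinement described above.
    """
    scores = {page: 1.0 for page in pages}
    for system in systems:
        scores = system(scores)
    # Final SERP order: highest adjusted score first.
    return sorted(scores, key=scores.get, reverse=True)

# Two invented 'systems', each adjusting only the signal it handles.
def topicality(scores):
    return {p: s * (2.0 if "guide" in p else 1.0) for p, s in scores.items()}

def freshness(scores):
    return {p: s * (1.5 if "2024" in p else 1.0) for p, s in scores.items()}
```

Tweaking the multiplier inside any one layer changes the final order without touching the others, which fits the observation that different updates affect different websites.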
Note: The weight and influence that each of these systems has in determining rankings is unknown, nor do we know whether Google actually uses any of this at all.
There isn’t a lot of information about “Twiddlers” online, so let’s break down their role here:
Think of Twiddlers as behind-the-scenes filters. They tweak the search results after the main ranking algorithm has done its job.
Just before a search result is shown, a Twiddler steps in and makes some final adjustments. These adjustments can either boost a website or push it down.
Twiddlers are a way for Google to fine-tune search rankings based on ranking factors beyond what the core algorithm considers.
Google systems that likely use the Twiddler function are:
Basically, any system ending in “Boost”.
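Conceptually, a Twiddler is just a last-minute re-rank pass. This toy Python sketch shows the shape of the idea; the example twiddler and the offset mechanics are invented, not taken from the leak:

```python
def nav_demotion(ranked):
    # Invented example twiddler: demote doorway-looking URLs by two positions.
    return {url: -2.0 for url in ranked if "doorway" in url}

def apply_twiddlers(ranked, twiddlers):
    """Make final boost/demote adjustments to an already-ordered list,
    mirroring the described Twiddler step.

    Positive offsets boost a URL, negative offsets demote it; each
    offset shifts the URL's effective position in the final list.
    """
    offsets = {url: 0.0 for url in ranked}
    for twiddler in twiddlers:
        for url, delta in twiddler(ranked).items():
            offsets[url] += delta
    # Lower effective position = better placement in the final SERP.
    return sorted(ranked, key=lambda url: ranked.index(url) - offsets[url])
```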
But the weirdest Twiddler reference I found has to be the Skin Tone Twiddler:
Not really sure what that does, but there seems to be a lot of opportunity for twiddling the search results!
It looks like Google might store information about authors in their index.
What does this mean?
It indicates that authorship could still be a ranking factor.
There has been much debate about E-EAT among SEOs. This is mainly because it’s very hard to score “expertise” and “authority” quantitatively.
But the leaks clearly show that data and information are associated with authors. Google has an “isAuthor” function and an “authorName” attribute within the algorithm:
All of this likely plays some role in their algorithm.
Google uses anchor text within links to measure relevance and quality.
Anchor text is so important that getting it wrong can affect how the link impacts your website (demotion, promotion, ignored).
They are tracking all kinds of anchor text spam signals:
But they don’t stop there.
There is also this document that collects even more anchor text data:
Most people think that Google uses the anchor text to understand the website the link points to. That’s true, but that only examines the link from one side.
Google looks at both sides of the link and uses the anchor text for a signal. The algorithm checks whether the web page is relevant to the anchor text.
If the anchor text doesn’t accurately describe the content of the target site, the web page may actually get demoted in the search results. Or it’s possible Google ignores the link.
That means it’s essential for the anchor text to be relevant and accurate. This video shows how we choose anchor text.
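The two-sided check described above might look something like this toy Python sketch. Real relevance models are far richer than word overlap, and the outcome thresholds are purely my invention:

```python
def anchor_link_outcome(anchor_text: str, target_page_text: str) -> str:
    """Toy check of whether anchor text matches the page it points to.

    Word overlap and the 0.5 threshold are invented for illustration;
    the leak does not describe how the relevance check works.
    """
    anchor_words = set(anchor_text.lower().split())
    page_words = set(target_page_text.lower().split())
    overlap = len(anchor_words & page_words) / max(len(anchor_words), 1)
    if overlap >= 0.5:
        return "counted"  # anchor describes the target: link passes value
    if overlap > 0:
        return "ignored"  # weak match: link may simply be discounted
    return "demoted"      # no match at all: possible spam signal
```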
Using an exact match domain name might pose some challenges.
Let me explain:
Domains like “BestCheapLaptops.com” or “NewYorkLocksmith.com” could be viewed as spammy or manipulative. This is especially true if the content on those domains is low-quality or spammy in nature.
This is important because of the exactMatchDomainDemotion:
While these are likely minor signals, it’s clear that exact match domains are treated differently.
Not only that, but if you’re paying attention you will see that it suggests EMDs were previously boosted:
This makes sense, because anyone who has been doing SEO for a while will know that EMDs used to give a huge boost in the SERPs.
Google Panda has always been seen as the “content quality” update.
What hasn’t been clear is how Panda works. Until now…
The Panda algorithm focuses on how users interact with a website (user behaviour) and the site’s linking root domains (backlinks).
Google then assigns each website a “Site Quality Score” based on its content quality and the number of quality backlinks which acts as a ranking modifier.
If you score high, you get a boost.
Score low and you could potentially be demoted by Panda.
How does Panda measure user behaviour?
Short answer: NavBoost.
The Panda system likely works with the NavBoost system to analyse user behaviour like clicks, interactions and other engagements.
This feeds into the site quality score and impacts your rankings.
This is eye-opening information about Panda because the modifier affects your entire website. It is essential to have a strong backlink profile and ensure that your website provides an excellent user experience.
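A sitewide modifier built from those two inputs could look like this toy Python sketch. The weights, the log scale, and centring on 1.0 are all invented for illustration; nothing in the leak gives the actual formula:

```python
import math

def site_quality_modifier(good_click_share: float, linking_root_domains: int) -> float:
    """Toy sitewide quality modifier combining user behaviour and
    linking root domains.

    good_click_share is the fraction of 'good' clicks (0.0 to 1.0).
    The 0.6/0.4 weights and the log scale are arbitrary placeholders.
    """
    link_signal = min(math.log10(linking_root_domains + 1) / 4, 1.0)
    score = 0.6 * good_click_share + 0.4 * link_signal
    # Centre on 1.0 so it acts as a ranking multiplier:
    # above 1.0 boosts the whole site, below 1.0 demotes it.
    return 0.5 + score
```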
It’s not nice to hear, but it could be true.
During the leak, many “demotions” were released. These are ranking factors that, if applied, can significantly lower your rankings.
What is interesting about this is some “demotions” were previously labelled as “boosts” like the exactMatchDomainDemotion:
Here’s a list of some of them:
You can find more by looking for the word “demotion” on this page e.g. there is reference to the babyPandaV2Demotion.
But here’s the takeaway:
You should be in the clear as long as you have great content, excellent user experience and a robust backlink profile.
Google has a token system for measuring content originality.
Why is this important?
Google looks at the number of tokens and compares it to the total number of words on the page.
But here’s the important part…
That’s why you must put your most important and unique information as high on the page as possible. This should ensure that Google reads it correctly.
It’s also worth pointing out the “OriginalContentScore” that only seems to apply to short-form content.
This means that quality content is not solely based on length.
There is also the spamtokensContentScore which is used in a twiddler:
So if you are going to write long content, make sure the most important stuff is as close to the start as possible.
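A token budget like the one described can be illustrated with a tiny Python check. The 20-token limit here is arbitrary; the leak does not reveal the real number:

```python
def key_info_survives(page_words, key_words, max_tokens=20):
    """Check whether the key information falls inside a token budget.

    Only the first max_tokens words are 'considered'; anything after
    the cutoff is ignored. The limit of 20 is an invented placeholder.
    """
    considered = set(page_words[:max_tokens])
    return all(word in considered for word in key_words)
```

Under this sketch, burying your unique details deep in a long page risks them falling outside whatever budget actually applies.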
If you didn’t know…
Google has a human team of “quality raters” to evaluate websites. They have been public about this and even released the quality rater guidelines they give to their team.
But no one knows for sure how (or even if) quality raters can impact the search algorithm.
The leaked Google search documents give us some insight.
Bottom line: They probably can.
Some elements show that information gathered by quality raters is stored, such as the furballUrl and raterCanUnderstandTopic attributes in this relevance ratings document.
And the humanRatings attribute:
Plus there are other references here:
I wonder what the “golden set” external to EWOK is.
It’s still unclear whether this information is used just for training data sets or actually impacts the algorithm.
That means there’s a good chance if a quality rater visits your website, their opinion could affect your rankings.
It’s worth checking Google’s Search Quality Rater Guidelines to ensure your website is up to par.
For ages, Google has said that links are less important than they once were. I don’t think anyone actually believed them because independent tests proved otherwise.
The leaked documents have zero information that shows links are less important.
In fact:
There is the linkInfo attribute which states how important it is to the model:
What’s more?
Backlinks act as a modifier on numerous ranking factors and twiddlers.
The added importance of PageRank for home pages and the siteAuthority metric also indicates that backlinks are still very important.
Which is probably why PageRank is mentioned 16 times in total:
The “link graph” that Google engineers use to understand backlinks is pretty sophisticated. That’s why Google is so good at identifying high-quality and low-quality links.
But it’s clear that PageRank is important in these documents:
In fact:
Links seem to be so important to Google’s algorithm that…
This one was really interesting and something I, and many other SEOs, have been banging on about for years.
This comes from an anonymous source connected to the leaked Google search documents.
Google classifies links into 3 categories (sourceType):
They then use click data to determine the link’s category.
If you build a link from a page that does not get traffic, the link is placed in the “low quality” category. It will have zero effect on your website rankings, and Google may even ignore the link completely.
On the other hand…
Getting a link from a page with a lot of traffic has the opposite effect. Google views the link as “high quality”, which helps your site rank better.
How does Google know which pages get clicks?
With the totalClicks attribute of course:
Getting a link from a top tier page should have a more significant impact on your rankings. This is because the link is also regarded as higher quality.
The bottom line is this…
You probably want backlinks from regularly updated pages with real organic traffic. These are likely to have the biggest impact on your site!
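That three-tier idea can be sketched as a simple Python bucketing function. The thresholds are invented; the leak names a totalClicks attribute and a sourceType classification but gives no numbers:

```python
def link_source_quality(total_clicks: int) -> str:
    """Bucket a linking page into low/medium/high quality by its
    click traffic, per the described sourceType idea.

    The 0 and 1000 thresholds are arbitrary placeholders.
    """
    if total_clicks == 0:
        return "low"      # likely ignored entirely
    if total_clicks < 1000:
        return "medium"
    return "high"         # traffic-rich page: strongest signal
```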
Lots of advanced metrics are collected to monitor backlink velocity.
Backlink velocity simply refers to the rate at which a website gains or loses backlinks over a certain period of time.
One of the signals Google might use to do this is the “PhraseAnchorSpamDays” metric to track the speed and volume of new links.
This helps distinguish between genuine link growth and deliberate attempts to manipulate rankings.
That’s a pretty big insight into how Google link penalties could be handed out.
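A velocity check like that might compare recent link gains against the historical baseline. This toy Python sketch invents both the window and the spike factor; the leak only names metrics like PhraseAnchorSpamDays, not how they are evaluated:

```python
def anchor_spam_spike(daily_new_links, window=7, factor=5.0):
    """Flag a link-velocity spike: recent daily gains far exceed
    the prior baseline.

    The 7-day window and 5x factor are invented thresholds.
    """
    if len(daily_new_links) <= window:
        return False  # not enough history to judge
    recent = sum(daily_new_links[-window:]) / window
    baseline = sum(daily_new_links[:-window]) / (len(daily_new_links) - window)
    return recent > factor * max(baseline, 1.0)
```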
Google’s index works a bit like the Wayback Machine.
It stores different versions of web pages over time. Google’s documentation even seems to suggest that Google keeps a record of everything it indexes forever.
This is why you can’t redirect a page to another page and expect the link equity to pass on. Google knows the context and relevance of the previous page.
But the system itself seems to only consider the last 20 versions (urlHistory) of the page:
In theory, to “start fresh” with Google, you might need to change a page and get it re-indexed more than 20 times.
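Keeping only the last 20 versions of a page is exactly what a bounded history buffer does. A minimal Python sketch, assuming the 20-version cap described for urlHistory:

```python
from collections import deque

class UrlHistory:
    """Store only the most recent versions of a page, mirroring the
    described urlHistory behaviour: older versions fall off the end."""

    def __init__(self, max_versions: int = 20):
        self.versions = deque(maxlen=max_versions)

    def record(self, content: str) -> None:
        self.versions.append(content)
```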
The SparkToro post shared an email from Google VP Alexander Grushetsky:
They also went on to share a resume which amongst many other interesting things, reads:
“Navboost. This is already one of Google’s strongest ranking signals.”
So user intent is clearly extremely important in Google search algorithm.
If people search, scroll past your website and click on a competitor enough times, Google will recognize that your competitor is satisfying the query better than you are.
The algorithm is designed to promote your website to the top of the organic search results because it’s being clicked on. Even if you don’t have great content or a strong backlink profile…
Google will likely reward your website because users want it.
That means user intent and the NavBoost system are likely the most influential ranking factors in Google’s algorithm beyond backlinks.
You could argue that relevant backlinks will get you to page 1, but positive user signals are required to keep you there.
We know that Google loves fresh content and aged content that is regularly updated.
This document shows that Google has multiple metrics designed to track how fresh the content of a page is.
These are the three big ones:
It seems Google wants to be extremely confident in the freshness of the content. They have multiple metrics they can use to cross-reference each other.
There are also lastSignificantUpdate and contentage attributes:
Ensure the date your content was updated is consistent across structured data, page titles, content, and sitemaps. Update your content to keep it fresh.
Google collects and stores the latest domain registration information. That means they probably know who owns each domain and where it is registered.
Domain names are like digital property. Google uses its status as a domain registrar to track a domain’s ownership history.
It isn’t confirmed whether they use this information to impact the search results, but they definitely collect the data and they did touch on expired domain abuse in the March 2024 update.
This might trigger a lot of people.
And I can understand why.
There is a specific attribute that indicates a site is a “small personal site”.
The big question is why?
In the screenshot above it says “Score of small personal site promotion” so perhaps this is used to help promote small personal sites?
There isn’t any additional information about what a small personal site means.
There also isn’t an indication that this is being used in the algorithm for anything.
But who knows?
Remember to download the PDF here for your quick reference.
Don’t forget:
This is an unconfirmed leak. We do not know that Google are using any of the modules or attributes to rank websites.
Even if Google are collecting this data, that doesn’t inherently mean they are using it in their ranking algorithms.
The truth is that nothing has come to light that will dramatically change how I do SEO this year.
If anything, the leak reflects the case study that I published in 2012:
That case study in 2012 was actually inspired by a Matt Cutts video published 14 years ago in 2010, which basically said:
Weird how that Matt Cutts advice is still relevant and arguably, more relevant than ever before.
If you want to learn more about the leaks you should:
What do you think about all of this? Have you found anything weird in the “leak”? Please leave a comment below and let me know!