The most interesting thing about the leaked Google docs is being able to compare them with Google’s public statements.
For example:
Google stated that they do not use a spam score…
But that’s not strictly true…
…because we can see that a spam score does exist within the leaked docs:
Weird, right?
And it doesn’t stop there!
Because I’ve used the leaked Google data to bust 15 other SEO myths below.
Let’s kick it off with a big one…
Google has often denied that click data from Chrome users has been used to inform rankings in the SERPs.
In fact, they’ve been pretty vocal about it:
Not only this…
Gary Illyes went on to publicly slam both Rand Fishkin and the notion of CTR being used to influence search rankings in his Reddit AMA:
“Dwell time, CTR, whatever Fishkin’s new theory is, those are generally made up crap. Search is much more simple than people think.”
– Gary Illyes
These kinds of statements have been reinforced multiple times throughout the years by other senior members of Google’s team.
But the leaked API docs seem to tell a different story…
Contrary to Google’s claims, the leak suggests that Google does use Chrome data to evaluate site-wide authority and user engagement metrics.
The API module “NavBoost” (which appears 50 times in the leaked API) is described as a “re-ranking based on click logs of user behaviour.”
This alone is pretty crazy.
It suggests that user interactions via the Chrome browser are indeed being tracked and used to evaluate websites.
Here are some more metrics and systems that could be using Chrome data for rankings:
That’s a lot of click and engagement data being captured, and in case you missed it, they track mobile-specific click data too.
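To make that a bit more concrete, here’s a minimal Python sketch of how click-log re-ranking could work in principle. Every field name, weight and formula below is my own assumption for illustration only; the leak tells us that NavBoost re-ranks using click data, not how the maths actually works.

```python
# Hypothetical sketch of click-based re-ranking, loosely in the spirit of NavBoost.
# The field names, weights and formula are illustrative assumptions, not values
# taken from the leaked documentation.

def click_boost(good_clicks: int, impressions: int, last_longest_clicks: int) -> float:
    """Turn raw click-log counts into a small re-ranking boost."""
    if impressions == 0:
        return 0.0
    ctr = good_clicks / impressions
    # Reward results that were the searcher's final, satisfied click.
    satisfaction = last_longest_clicks / impressions
    return 0.7 * ctr + 0.3 * satisfaction

results = [
    {"url": "https://example.com/a", "base_score": 0.82,
     "good_clicks": 40, "impressions": 1000, "last_longest_clicks": 25},
    {"url": "https://example.com/b", "base_score": 0.80,
     "good_clicks": 120, "impressions": 1000, "last_longest_clicks": 90},
]

# Blend the original relevance score with the click-derived boost, then re-rank.
for result in results:
    result["final_score"] = result["base_score"] + 0.5 * click_boost(
        result["good_clicks"], result["impressions"], result["last_longest_clicks"]
    )

results.sort(key=lambda r: r["final_score"], reverse=True)
print([r["url"] for r in results])  # page /b overtakes /a on click signals
```

In this toy example, the page with the stronger click signals overtakes the page with the slightly higher base relevance score. That, in spirit, is what a click-based re-ranker would do.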
Here’s the takeaway:
Even though Google downplays the significance of click data, the leak suggests they’re using it (a lot) behind the scenes to tweak the search rankings.
Time for a new browser?
Google has continuously denied the existence of a “domain authority” type metric.
They have publicly said that ONLY individual pages are evaluated based on their PageRank.
In the video below, John Mueller said they don’t have a website authority score:
Not only that…
On Twitter (in a now deleted Tweet) he also said:
Sounds pretty definitive, right?
But these statements aren’t exactly true…
The Google algorithm leak confirms the existence of a metric called “siteAuthority“.
The siteAuthority metric seems to imply that Google ranks websites based on their perceived authority.
It may even play a significant role in Google’s ranking algorithm.
We don’t know for sure the impact (if any) siteAuthority has on rankings.
But again…
Google is clearly collecting data and, at the very least, measuring each website’s site authority.
To be clear: ‘siteAuthority’ is likely very different from Moz’s Domain Authority (DA) and Ahrefs’ Domain Rating (DR). Both of those third-party metrics are heavily link-based.
It’s also worth mentioning that there is a page-level PageRank score as well as a site-level one:
Google’s ‘siteAuthority’ measurement is more likely to be a combination of factors rather than a single signal. Here are some examples of possible components:
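How Google actually weights those components internally is anyone’s guess. Purely as a thought experiment, here’s a tiny Python sketch of how several site-level signals could be blended into one score. Every input name and weight here is invented; the leak names the siteAuthority attribute, not its formula.

```python
# Illustrative only: a toy "site authority" blend. The components, weights and
# formula are assumptions; the leak names the siteAuthority attribute, not its maths.

def site_authority(link_quality: float, brand_signals: float,
                   content_quality: float, user_engagement: float) -> float:
    """Blend several site-level signals (each normalised to 0..1) into one score."""
    weights = {"links": 0.35, "brand": 0.25, "content": 0.25, "engagement": 0.15}
    return round(weights["links"] * link_quality
                 + weights["brand"] * brand_signals
                 + weights["content"] * content_quality
                 + weights["engagement"] * user_engagement, 3)

print(site_authority(link_quality=0.6, brand_signals=0.4,
                     content_quality=0.8, user_engagement=0.5))  # 0.585
```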
This myth was busted years ago through individual testing, so it came as no surprise. But it’s nice to see that the leak affirms SEOs were on the right track.
For context, the Google sandbox stops new websites from ranking until they have proven themselves trustworthy.
Most SEOs believe it lasts for about 3-6 months.
Google has always denied that any kind of “sandbox” exists.
Here’s what they’ve said in the past:
What’s more?
Check out John Mueller’s response here to a question about whether there’s a new Google sandbox in the algorithm.
His answer doesn’t exactly fill you with confidence, right?
And now we might know why!
The leak revealed a metric called “hostAge.”
Not to be confused with your website being taken hostage, this metric seems to show that Google measures a domain’s age.
It also strongly suggests a sandbox mechanism has been built into the algorithm.
John Mueller has also answered “No” in the past when he was asked if domain age matters for rankings.
And that might be true.
But why would Google collect data about a domain’s age if they don’t use it?
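For what it’s worth, here’s a speculative Python sketch of how a sandbox-style dampener could use a hostAge-type value. The 180-day ramp and the formula are my assumptions, not anything stated in the leaked docs.

```python
# Speculative sketch: scale a new site's ranking score up over its first ~6 months.
# The hostAge attribute exists in the leak; this ramp-up curve is purely an assumption.

def sandbox_multiplier(host_age_days: int, ramp_days: int = 180) -> float:
    """Return 0..1, reaching full strength once the host is older than ramp_days."""
    return min(1.0, host_age_days / ramp_days)

for age in (14, 90, 180, 400):
    print(f"{age} days old -> multiplier {sandbox_multiplier(age):.2f}")
```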
Human quality raters are hired by Google to evaluate the quality and relevance of search results.
They follow strict guidelines outlined in the Search Quality Evaluator Guidelines.
Think of these guidelines as a detailed document that tells quality raters how to evaluate websites correctly for search.
Google has always been upfront about the existence of search quality raters and their roles.
BUT…
In this Google Hangout, John Mueller said quality raters “are not specifically flagging sites“ and “that not something that we’d use there directly.“
Is that true, though?
Short answer: It doesn’t look that way.
The term “EWOK” appears multiple times in the API, referencing the quality raters platform.
Not only that, but some elements show that information gathered by quality raters is stored, such as:
On top of that, some big attributes uncovered during the leak show the existence of “Golden Documents.”
This likely means that content can be labelled as high-quality and ultimately given preferential treatment in rankings.
Pretty crazy, right?
The bottom line here is this:
There’s a good chance that if a quality rater visits your website, their opinion could affect your rankings.
I actually published a post about the impact a search engine evaluator can have on your site back in 2019, but it was getting too much heat at the time, so I deleted it.
I’ve always believed that freshness played an important role in content rankings.
Almost every time I update old posts, they get a rankings boost.
Google has emphasized the importance of fresh content, but they’ve been a bit vague about how important it is.
What’s more?
They’ve gone as far as saying that Google doesn’t favour fresh content.
Check out this now deleted tweet from John Mueller:
The truth is that freshness does seem to matter, maybe more than most of us realise. In fact, I have a dedicated module on content freshness in the 28 Day SEO Challenge.
The leaked docs reveal that Google looks at 127 attributes linked to content freshness and date-based signals:
Here are some attributes backing up statements that content should be maintained:
This shows that Google measures when content was last updated.
That probably means that Google prioritises “fresh” content in the organic search results.
There appear to be layers to this algorithm that treat time-sensitive and news-specific content differently:
Nothing really changes strategically for us based on this information. But it’s good to know that Google actively measures freshness.
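Purely as a thought experiment, here’s one way a freshness signal could decay over time, written as a small Python function. The half-life, the formula and the idea of a single “boost” number are all assumptions on my part; the leak shows that Google stores date-based signals, not how it scores them.

```python
# Hypothetical freshness boost that decays with time since the last update.
# The half-life and formula are assumptions; the leak only shows that date
# signals are stored, not how they are scored.
from datetime import date
import math

def freshness_boost(last_updated: date, today: date, half_life_days: float = 365.0) -> float:
    """Exponential decay: a just-updated page scores ~1.0, an old one trends toward 0."""
    age_days = (today - last_updated).days
    return math.exp(-math.log(2) * age_days / half_life_days)

print(round(freshness_boost(date(2024, 1, 1), date(2024, 7, 1)), 2))  # roughly 0.71
```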
My advice?
Keep your content fresh and update important pages regularly. Follow the process in the 28 Day SEO challenge to do that.
Google loves to say that content, whether produced by AI or humans, is evaluated and rewarded equally.
According to Google, the key is demonstrating great E-E-A-T:
Google has also said in the past that AI content is perfectly fine as long as it meets quality standards.
Here’s what their Search Guidance document says:
This literally opened the floodgates to thousands of websites spamming the algorithm with AI-generated content.
The irony?
Google made the same mistake by publishing their own AI Overviews telling people to eat rocks:
But what people seem to forget is that Google can do a 180 at any time.
And it looks like they have done that with AI content.
They just didn’t tell anyone!
Look at the fallout from the March 2024 core update: websites using AI content were destroyed.
The leaked API documents might give us some insight into why that is…
The term “golden” appears as a flag in the API leak, indicating a “gold-standard” document.
What does it mean?
The attribute notes indicate that the flag gives extra weight to human-written content over AI-written content.
It could also give preferential treatment when it comes to rankings.
To what extent, we don’t know. And how Google measures that difference we also don’t know.
What we do know is that Google has rolled back their own AI Overviews.
Google has insisted for years that it doesn’t manually intervene in rankings.
The algorithm creates an even playing field. The best content will rank.
That’s why people trust Google’s search results so much.
Here’s what John Mueller has said in the past:
But the leak tells a different story…
It reveals that Google maintains whitelists for specific content types.
For example, certain trusted sites seem to have gotten special treatment during elections or for COVID-related information.
In the leaked API documents, there are two specific terms:
These seem to indicate that whitelists for elections and COVID-related news existed. It’s possible these sites received more visibility and high rankings when covering these topics.
There are also attributes that suggest ranking multipliers could be applied to sites that have been vetted.
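To picture what a ranking multiplier for vetted sites might look like mechanically, here’s a tiny Python sketch. The topics, domains and the 1.2 multiplier are invented placeholders; the leak names the whitelist-related attributes, not the numbers behind them.

```python
# Purely illustrative: how a topic-specific whitelist could translate into a
# ranking multiplier. The domains, topics and multiplier value are invented.

VETTED_SITES = {
    "elections": {"official-election-source.example"},
    "covid": {"trusted-health-site.example"},
}

def whitelist_multiplier(domain: str, topic: str) -> float:
    """Boost vetted domains for sensitive topics; leave everything else untouched."""
    return 1.2 if domain in VETTED_SITES.get(topic, set()) else 1.0

print(whitelist_multiplier("trusted-health-site.example", "covid"))  # 1.2
print(whitelist_multiplier("random-blog.example", "covid"))          # 1.0
```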
Even though Google claims to be impartial, the leak shows that it has the power to whitelist any site it wants.
It’s likely they already have.
Scary stuff.
Backlinks have long been considered a top ranking factor.
But more recently, Google has consistently downplayed their importance.
Here are a couple of statements to chew on:
Then two years on…
John Mueller stated the following in a live Q&A at the brightonSEO conference:
“But my guess is over time, [links] won’t be such a big factor as sometimes it is today. I think already, that’s something that’s been changing quite a bit.” – John Mueller
Links are important, and they’re here to stay.
This is particularly true for high-quality, diverse backlinks from authoritative sites. It also lines up with the continued relevance of key ranking elements like PageRank.
The leaked docs mention over 136 attributes focused on inbound links. Some key attributes include:
Here are some more snippets that likely highlight the importance of inbound links and their quality:
My guess is that backlinks will continue to play an important role in search rankings. How else will Google decipher quality?
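As a reminder of why links carry so much signal in the first place, here’s a deliberately stripped-down toy version of PageRank in Python. This is the textbook algorithm from the original 1998 Brin and Page paper, not anything taken from the leak, and it skips refinements like dangling-page handling.

```python
# Toy PageRank: iteratively distribute each page's rank across the pages it links to.
# Simplified for illustration; real systems handle dangling pages, personalisation, etc.

def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    pages = list(links)
    rank = {page: 1 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1 - damping) / len(pages) for page in pages}
        for page, outlinks in links.items():
            for target in outlinks:
                if target in new_rank:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print({page: round(score, 3) for page, score in pagerank(toy_web).items()})
```

Run it and you’ll see page “c”, the most linked-to page in the toy graph, end up with the highest score.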
Now we are getting technical.
But technical is important!
John Mueller claimed in this deleted Tweet that anchor text ratios, word count and link count are not quality indicators for search rankings.
But that isn’t exactly true.
The leak seems to show that diversity in backlinks is absolutely critical.
It’s not just the sheer number of backlinks that matters most, but the variety and quality.
Something quality link building services have been saying for a while.
Diverse backlinks from multiple domains signal to Google that the content is widely trusted and referenced.
That’s what a quality backlink profile looks like.
If numbers don’t indicate quality, why are there 44 attributes specifically measuring metrics around anchor counts like:
The leak contains a whole module featuring anchor text spam demotions – and it’s extensive to say the least.
It also indicates that anchor text plays a significant role in backlink quality, with over 76 anchor-based attributes.
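To show the kind of thing those attributes could be counting, here’s a rough Python sketch that computes a few anchor-level metrics for a toy backlink profile. The metric names, the example links and the idea that a high exact-match ratio looks spammy are my assumptions, not figures from the leak.

```python
# A rough sketch of the kind of anchor-level metrics the leak hints at.
# The metric names, thresholds and example links are my own assumptions.
from collections import Counter
from urllib.parse import urlparse

backlinks = [
    {"source": "https://site-a.example/post", "anchor": "best running shoes"},
    {"source": "https://site-b.example/review", "anchor": "best running shoes"},
    {"source": "https://site-b.example/roundup", "anchor": "this guide"},
    {"source": "https://site-c.example/news", "anchor": "example.com"},
]

referring_domains = {urlparse(link["source"]).netloc for link in backlinks}
anchor_counts = Counter(link["anchor"] for link in backlinks)
exact_match_ratio = anchor_counts["best running shoes"] / len(backlinks)

print("Referring domains:", len(referring_domains))    # 3
print("Anchor distribution:", dict(anchor_counts))
print("Exact-match anchor ratio:", exact_match_ratio)  # 0.5, a high ratio can look spammy
```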
My advice?
You should follow this tutorial to choose the right anchor text for your links.
I always get nervous when Google says, “Don’t worry, we’ll take care of it.”
They don’t have a great track record of following through. I prefer to take matters into my own hands.
And when it comes to disavowing bad links, maybe you should too…
Here’s what Google has said in the past:
That’s not all…
John Mueller also backed up his past statements in a now-deleted comment on Reddit:
Those are pretty strong statements.
Contrary to Google’s public stance, the leak shows that bad links do matter.
There are multiple demotions for spammy links, along with scoring systems that could penalise sites with spammy outbound link signals and anchor text.
For example, the attributes “phraseAnchorSpamPenalty” and “spamrank” appear in the API docs.
These seem to indicate penalties for sites associated with spammy links and anchor text.
Not a place you want to be. Other attributes include:
While Google tells you not to worry about spammy links, the leaks suggest that ignoring them could harm your site’s rankings.
I for one don’t want to rely on the “hope” strategy of Google catching it for me.
Here’s a myth that might come as a surprise…
Google has often emphasised the importance of link quality and relevance.
They’ve also mentioned that links don’t expire – at worst they may be devalued over time.
In a Webmaster Central Office Hours, John Mueller was asked “Do links expire after a while?”
John Mueller immediately responded: “No, they don’t expire…”
That seems pretty straightforward. But it might not be the whole story.
The leaks suggest that links from newer pages pack more punch than those from older, authoritative pages.
The leaked API documents show two key terms here:
The key takeaway here is that as a page becomes less important to Google, the links can seemingly expire or lose their strength over time.
New links seem to have a more significant impact.
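Here’s a speculative Python sketch of that idea: a backlink whose value fades as time passes and as the linking page matters less. The decay curve, half-life and numbers are assumptions, not values from the leaked docs.

```python
# Speculative sketch: a backlink whose value fades as time passes and as the
# linking page loses importance. The curve, half-life and weights are assumptions.

def link_value(source_page_importance: float, days_since_link_created: int,
               half_life_days: float = 730.0) -> float:
    """A fresh link from an important page is worth the most; value decays over time."""
    time_decay = 0.5 ** (days_since_link_created / half_life_days)
    return source_page_importance * time_decay

print(round(link_value(0.9, days_since_link_created=30), 2))    # ~0.87, a fresh link
print(round(link_value(0.9, days_since_link_created=1500), 2))  # ~0.22, an ageing link
```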
Google has always highlighted the importance of citations and references.
And that’s good!
But they’ve never once indicated that there’s an internal system for tracking the accuracy and confidence levels of the references you add to your content.
I mean, why would they?
They’ve only ever downplayed the idea that linking out to sources boosts SEO:
Not only that…
John Mueller doubled down with this tweet:
The leaked documents suggest there is indeed an internal scoring system for citations and references, particularly in YMYL (Your Money, Your Life) niches.
They mention metrics like “outlinkScore” and “outlinkDomainRelationship” that appear to evaluate the quality and relevance of outbound links.
Strap in for this one…
Here’s what the modules and attributes suggest:
What does this really mean?
They seem to show that linking to high-quality, relevant external sites CAN boost your page’s authority and trustworthiness.
This is directly opposite to what Google has said in the past.
I know, I know.
While this might not come as a big shock, it should be a wake-up call.
Google is tracking links in and out of your site. Ensure your website only links to reputable websites you would confidently recommend to a friend.
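If you want to act on that, here’s a small Python sketch for auditing your own outbound links against a list of domains you trust. The “reputable” list and the example URLs are placeholders, and this has nothing to do with Google’s internal outlinkScore; it’s just a practical self-check.

```python
# Illustrative self-audit: check your outbound links against domains you trust.
# The "reputable" set and example URLs are placeholders, not anything from Google.
from urllib.parse import urlparse

REPUTABLE_DOMAINS = {"nature.com", "nhs.uk", "gov.uk"}  # your own vetted list

outbound_links = [
    "https://www.nature.com/articles/some-study",
    "https://spammy-pill-store.example/buy-now",
]

for url in outbound_links:
    domain = urlparse(url).netloc.removeprefix("www.")
    status = "OK" if domain in REPUTABLE_DOMAINS else "REVIEW"
    print(status, url)
```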
Google has always said they don’t use a specific spam score for links.
When an SEO asked about Semrush’s “toxic link score” for their company website, John Mueller answered:
Most SEOs know that content and quality links work hand in hand.
The leak supports this.
Google seems to have multiple scoring systems for evaluating links.
This indicates a more aggressive approach to link evaluation than they’ve communicated publicly.
In fact, Google’s entire link graph is very sophisticated. Especially when it comes to toxic link scoring.
The leaked API documents show a whole bundle of metrics for link spam:
With a variety of different labels for link penalties:
Interesting, right?
I was pretty impressed by the lengths Google goes to in order to measure link quality. To what extent it affects your backlinks, we don’t know.
But this clearly contradicts what they have said in the past about having no spam score.
Here’s another myth that’s been circulating for a while…
Google has claimed their ranking algorithms don’t use a specific “authorship score” to rank content.
Back in 2013, John Mueller said this on Webmaster Central:
“We don’t use Authorship for ranking.”
But this stance changed with the introduction of Schema…
In this 2021 Hangout, John Mueller advised that recognising the author is important for Schema.
Which one is true?
The leaked documents show that Google tracks authorship information. Google stores authors as entities, and authors are an explicit feature in its systems.
This means authorship is a clearly defined and intentionally integrated component within Google’s systems.
It also implies that Google’s algorithms specifically recognise and utilise information about authors as a factor in their ranking process.
This would explain how Google tracks the “expertise” in E-E-A-T.
If that’s true, the author of your content may impact how content is evaluated for quality and relevance.
Bottom line:
Authorship is back.
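One practical way to respond is to make authorship explicit in your markup. The schema.org Article and Person types below are real and publicly documented; generating the JSON-LD with Python is just for illustration, the author details are made up, and the leak doesn’t tell us exactly how much weight (if any) this markup carries.

```python
# Making authorship explicit with schema.org markup. The @type values are real
# schema.org vocabulary; the author details below are invented placeholders.
import json

article_markup = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Your article title",
    "author": {
        "@type": "Person",
        "name": "Jane Example",  # hypothetical author
        "url": "https://example.com/about/jane-example",
        "sameAs": ["https://www.linkedin.com/in/jane-example"],
    },
    "datePublished": "2024-06-01",
    "dateModified": "2024-06-15",
}

# Paste the output into a <script type="application/ld+json"> tag on the page.
print(json.dumps(article_markup, indent=2))
```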
Google has long maintained that subdomains and domains are treated equally in their rankings.
In 2017, John Mueller said the following in response to a question about whether subdomains or subfolders are better for SEO:
“Google web search is fine with using either subdomains or subdirectories.[…] We do have to learn how to crawl them separately, but for the most part, that’s just a formality for the first few days. So, in short, use what works best for your setup.” – John Mueller
These statements imply that Google views subdomains similarly to their main domains regarding SEO and crawling.
According to Google, there’s no significant difference in how they are treated for ranking purposes.
This is another classic Google “don’t worry, we’ll handle it for you” statement.
It’s like the toxic links in Myth #10.
But the leak seems to show that’s not actually the case.
There are two key things here:
The second point is potentially a big one…
It could mean you would need to build the authority of your subdomain AND main domain separately.
That directly contradicts what Google has said in the past and is something to keep an eye on in the future.
The Google search leak has given us a rare glimpse inside Google’s systems.
The leak reveals a complex system with over 14,000 ranking attributes, which is incredible to dig through.
Especially when you contrast that with Google’s public statements!
From seemingly false statements to complete 180-degree turns, Google has gone back and forth like a ping-pong ball over the years.
However, one thing is clear:
You should take what Google says with a huge pinch of salt!
In reality, it’s always best to test on your own, or at least follow the tried and tested processes from other SEOs around the world.
Want to learn more about the leak?
Download our complete Google search leak analysis, which unpacks 22 things that stood out to me most.