{"id":8936,"date":"2024-05-27T19:20:52","date_gmt":"2024-05-28T02:20:52","guid":{"rendered":"https:\/\/sparktoro.com\/blog\/?p=8936"},"modified":"2024-07-03T12:28:32","modified_gmt":"2024-07-03T19:28:32","slug":"an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them","status":"publish","type":"post","link":"https:\/\/sparktoro.com\/blog\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\/","title":{"rendered":"An Anonymous Source Shared Thousands of Leaked Google Search API Documents with Me; Everyone in SEO Should See Them"},"content":{"rendered":"\n<p><em><strong>Update:<\/strong> We are diving deep into the Google API leak and what it means for marketers in our next episode of SparkToro Office Hours on June 27. Join Michael King, Founder &amp; CEO, iPullRank, and Rand Fishkin, Co-founder &amp; CEO, SparkToro, to learn more. <a href=\"https:\/\/sparktoro.registration.goldcast.io\/events\/052be756-fba8-4bfc-aa9d-c6eafe63262e\" target=\"_blank\" rel=\"noreferrer noopener\">RSVP for the free webinar<\/a>.<\/em><\/p>\n\n\n\n<p>On Sunday, May 5th, I received an email from a person claiming to have access to a massive leak of API documentation from inside Google\u2019s Search division. 
The email further claimed that these leaked documents were confirmed as authentic by ex-Google employees, and that those ex-employees and others had shared additional, private information about Google\u2019s search operations.<\/p>\n\n\n\n<p>Many of their claims directly contradict <a href=\"https:\/\/www.seroundtable.com\/google-ctr-search-rankings-27157.html\">public statements<\/a> made by Googlers over the years, in particular the company\u2019s <a href=\"https:\/\/www.seroundtable.com\/google-ctr-dwell-time-signals-myths-27083.html\">repeated denial<\/a> that <a href=\"https:\/\/www.blindfiveyearold.com\/is-click-through-rate-a-ranking-signal\">click-centric user signals<\/a> are employed, <a href=\"https:\/\/iloveseo.com\/seo\/google-says-subdomains-vs-subfolders-doesnt-matter\/\">denial<\/a> that subdomains are considered separately in rankings, <a href=\"https:\/\/www.seroundtable.com\/google-sandbox-nope-28082.html\">denials<\/a> of a sandbox for newer websites, <a href=\"https:\/\/www.seroundtable.com\/google-domain-age-23697.html\">denials<\/a> that a domain\u2019s age is collected or considered, and more.&nbsp;<\/p>\n\n\n\n<p>Naturally, I was skeptical. 
The claims made by this source (who asked to remain anonymous) seemed extraordinary\u2013claims like:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>In their early years, Google\u2019s search team recognized a need for full clickstream data (every URL visited by a browser) for a large percentage of web users to improve their search engine\u2019s result quality.<\/li>\n\n\n\n<li>A system called \u201cNavBoost\u201d (cited by VP of Search, Pandu Nayak, in his <a href=\"https:\/\/thecapitolforum.com\/wp-content\/uploads\/2023\/10\/101823-USA-v-Google-PM.pdf\">DOJ case testimony<\/a>) initially gathered data from Google\u2019s <a href=\"https:\/\/moz.com\/blog\/what-is-googles-pagerank-good-for-whiteboard-friday\">Toolbar PageRank<\/a>, and the desire for more clickstream data served as the key motivation for the creation of the Chrome browser (<a href=\"https:\/\/www.npr.org\/2008\/09\/05\/94299337\/google-launches-chrome-web-browser\">launched<\/a> in 2008).<\/li>\n\n\n\n<li>NavBoost uses the number of searches for a given keyword to identify trending search demand, the number of clicks on a search result (I ran <a href=\"https:\/\/sparktoro.com\/blog\/queries-clicks-influence-googles-results\/\">several<\/a> <a href=\"https:\/\/x.com\/randfish\/status\/1704722903550644430\">experiments<\/a> on this from 2013-2015), and long clicks versus short clicks (which I <a href=\"https:\/\/moz.com\/blog\/impact-of-queries-and-clicks-on-googles-rankings-whiteboard-friday\">presented theories about in this 2015 video<\/a>).<\/li>\n\n\n\n<li>Google uses cookie history, logged-in Chrome data, and pattern detection (referred to in the leak as \u201cunsquashed\u201d clicks versus \u201csquashed\u201d clicks) as effective means for fighting manual and automated click spam.<\/li>\n\n\n\n<li>NavBoost also scores queries for user intent. 
For example, certain thresholds of attention and clicks on videos or images will trigger video or image features for that query and related, NavBoost-associated queries.<\/li>\n\n\n\n<li>Google examines clicks and engagement on searches both during and after the main query (referred to as a &#8220;NavBoost query&#8221;). For instance, if many users search for \u201cRand Fishkin,\u201d don\u2019t find SparkToro, and immediately change their query to &#8220;SparkToro&#8221; and click SparkToro.com in the search result, SparkToro.com (and websites mentioning &#8220;SparkToro&#8221;) will receive a boost in the search results for the &#8220;Rand Fishkin&#8221; keyword.<\/li>\n\n\n\n<li>NavBoost&#8217;s data is used at the host level for evaluating a site&#8217;s overall quality (my anonymous source speculated that this could be what Google and SEOs called \u201c<a href=\"https:\/\/www.youtube.com\/watch?v=k06ezwACeNM\">Panda<\/a>\u201d). This evaluation can result in a boost or a demotion.<\/li>\n\n\n\n<li>Other minor factors such as penalties for domain names that exactly match unbranded search queries (e.g. mens-luxury-watches.com or milwaukee-homes-for-sale.net), a newer \u201cBabyPanda\u201d score, and spam signals are also considered during the quality evaluation process.<\/li>\n\n\n\n<li>NavBoost geo-fences click data, taking into account country and state\/province levels, as well as mobile versus desktop usage. 
However, if Google lacks data for certain regions or user-agents, they may apply the process universally to the query results.<\/li>\n\n\n\n<li>During the Covid-19 pandemic, Google employed whitelists for websites that could appear high in the results for Covid-related searches.<\/li>\n\n\n\n<li>Similarly, during democratic elections, Google employed whitelists for sites that should be shown (or demoted) for election-related information.<\/li>\n<\/ul>\n\n\n\n<p>And these are only the tip of the iceberg.<\/p>\n\n\n\n<p>Extraordinary claims require extraordinary evidence. And while some of these overlap with information revealed during the Google\/DOJ case (some of which <a href=\"https:\/\/x.com\/randfish\/status\/1288980563219513345\">you can read about on this thread from 2020<\/a>), many are novel and suggest insider knowledge.<\/p>\n\n\n\n<p>So, this past Friday, May 24th (following several emails), I had a video call with the anonymous source.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69d541c7348b9&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69d541c7348b9\" class=\"aligncenter size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"604\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-leak-conversation-anonymized-1024x604.jpg\" alt=\"\" class=\"wp-image-8937\" style=\"object-fit:cover\" srcset=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-leak-conversation-anonymized-1024x604.jpg 1024w, 
https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-leak-conversation-anonymized-300x177.jpg 300w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-leak-conversation-anonymized-768x453.jpg 768w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-leak-conversation-anonymized.jpg 1126w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">An anonymized screen capture from Rand\u2019s call with the source<\/figcaption><\/figure>\n<\/div>\n\n\n<p>Update (5\/28 at 10:00am Pacific): The anonymous source has decided to come forward. 
This video announces their identity, <a href=\"https:\/\/www.linkedin.com\/in\/erfanazimi\/\">Erfan Azimi<\/a>, an SEO practitioner and the founder of EA Eagle Digital.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-4-3 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Erfan Azimi: Leaked Google Ranking Factors (Public Statement) - Rand Fishkin, Mike King\" width=\"640\" height=\"480\" src=\"https:\/\/www.youtube.com\/embed\/AEb8_rbfFVw?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>Prior to the email and call, I had neither met nor heard of Erfan. He asked that his identity remain veiled, and that I merely include the quote below:<\/p>\n\n\n\n<p class=\"has-text-align-center has-medium-font-size\" style=\"font-style:italic;font-weight:300\"><em>An eagle uses the storm to reach unimaginable heights.<\/em><br>&#8211; Matshona Dhliwayo<\/p>\n\n\n\n<p>After the call, I was able to confirm details of Erfan&#8217;s work history, mutual acquaintances from the marketing world, and several of their claims about being at particular events with industry insiders (including Googlers), though I cannot confirm details of the meetings nor the content of discussions they claim to have had.<\/p>\n\n\n\n<p>During our call, Erfan showed me the leak itself: more than 2,500 pages of API documentation containing 14,014 attributes (API features) that appear to come from Google\u2019s internal \u201cAPI Content Warehouse.\u201d Based on the document\u2019s commit history, this code was uploaded to GitHub on Mar 27, 2024 and not removed until May 7, 2024. 
(Note: because this piece was, post-publishing, edited to reflect Erfan&#8217;s identity, he&#8217;s referred to below as &#8220;the anonymous source&#8221;).<\/p>\n\n\n\n<p>This documentation doesn\u2019t show things like the weight of particular elements in the search ranking algorithm, nor does it prove which elements are used in the ranking systems. But, it does show incredible details about data Google collects. Here&#8217;s an example of the document format:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69d541c7351be&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69d541c7351be\" class=\"aligncenter size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"574\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-quality-navboost-1024x574.gif\" alt=\"\" class=\"wp-image-8941\" srcset=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-quality-navboost-1024x574.gif 1024w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-quality-navboost-300x168.gif 300w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-quality-navboost-768x430.gif 768w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-quality-navboost-1536x860.gif 1536w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" 
\/><figcaption class=\"wp-element-caption\">Screen capture of leaked data about \u201cgood\u201d and \u201cbad\u201d clicks, including length of clicks (i.e. how long a visitor spends on a web page they\u2019ve clicked from Google\u2019s search results before going back to the search results)<\/figcaption><\/figure>\n<\/div>\n\n\n<p>After walking me through a handful of these API modules, the source explained their motivations (around transparency, holding Google to account, etc.) 
and their hope: that I would publish an article sharing this leak, revealing some of the many interesting pieces of data it contained, and refuting some \u201clies\u201d Googlers &#8220;had been spreading for years.\u201d<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69d541c7355cc&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69d541c7355cc\" class=\"aligncenter size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"758\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-statements-on-using-clicks-in-rankings-1024x758.jpg\" alt=\"\" class=\"wp-image-8946\" srcset=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-statements-on-using-clicks-in-rankings-1024x758.jpg 1024w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-statements-on-using-clicks-in-rankings-300x222.jpg 300w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-statements-on-using-clicks-in-rankings-768x569.jpg 768w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-statements-on-using-clicks-in-rankings-1536x1137.jpg 1536w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-statements-on-using-clicks-in-rankings.jpg 1580w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" 
\/><figcaption class=\"wp-element-caption\">A sample of statements from Google representatives (Matt Cutts, Gary Illyes, and John Mueller) denying the use of click-based user signals in rankings over the years<\/figcaption><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\">Is this API Leak Authentic? Can We Trust It?<\/h2>\n\n\n\n<p>A critical next step in the process was verifying the authenticity of the API Content Warehouse documents. So, I reached out to some ex-Googler friends, shared the leaked docs, and asked for their thoughts. Three ex-Googlers wrote back: one said they didn\u2019t feel comfortable looking at or commenting on it. The other two shared the following (off the record and anonymously):<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\u201cI didn\u2019t have access to this code when I worked there. But this certainly looks legit.\u201d<\/li>\n\n\n\n<li>\u201cIt has all the hallmarks of an internal Google API.\u201d<\/li>\n\n\n\n<li>\u201cIt\u2019s a Java-based API. 
And someone spent a lot of time adhering to Google\u2019s own internal standards for documentation and naming.\u201d<\/li>\n\n\n\n<li>\u201cI\u2019d need more time to be sure, but this matches internal documentation I\u2019m familiar with.\u201d<\/li>\n\n\n\n<li>\u201cNothing I saw in a brief review suggests this is anything but legit.\u201d<\/li>\n<\/ul>\n\n\n\n<p>Next, I needed help analyzing and deciphering the naming conventions and more technical aspects of the documentation. I\u2019ve worked with APIs a bit, but it\u2019s been 20 years since I wrote code and 6 years since I practiced SEO professionally. So, I reached out to one of the world\u2019s foremost technical SEOs: <a href=\"https:\/\/www.linkedin.com\/in\/michaelkingphilly\/\">Mike King<\/a>, founder of <a href=\"https:\/\/ipullrank.com\">iPullRank<\/a>.<\/p>\n\n\n\n<p>During a 40-minute phone call on Friday afternoon, Mike reviewed the leak and confirmed my suspicions: <strong>this appears to be a legitimate set of documents from inside Google\u2019s Search division<\/strong>, and contains an extraordinary amount of previously-unconfirmed information about Google\u2019s inner workings.<\/p>\n\n\n\n<p>2,500 technical documents is an unreasonable amount of material to ask one man (a dad, husband, and entrepreneur, no less) to review in a single weekend. But, that didn\u2019t stop Mike from doing his best.<br>He\u2019s put together an <strong><a href=\"https:\/\/ipullrank.com\/google-algo-leak\">exceptionally detailed initial review of the Google API leak here<\/a><\/strong>, which I\u2019ll reference more in the findings below. And he\u2019s also agreed to join us at <a href=\"https:\/\/sparktoro.com\/sparktogether\">SparkTogether 2024<\/a> in Seattle, WA on Oct. 
8, where he\u2019ll present the fully transparent story of this leak in far greater detail, and with the benefit of the next few months of analysis.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/sparktoro.com\/sparktogether\"><img loading=\"lazy\" decoding=\"async\" width=\"900\" height=\"267\" src=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/mike-king-sparktogether-2024.jpg\" alt=\"\" class=\"wp-image-8948\" srcset=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/mike-king-sparktogether-2024.jpg 900w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/mike-king-sparktogether-2024-300x89.jpg 300w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/mike-king-sparktogether-2024-768x228.jpg 768w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><\/figure>\n<\/div>\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Qualifications and Motivations for this Post<\/h2>\n\n\n\n<p>Before we go further, a few disclaimers: I no longer work in the SEO field. My knowledge of and experience with SEO is 6+ years out of date. I don\u2019t have the technical expertise or knowledge of Google\u2019s internal operations to analyze an API documentation leak and confirm with certainty whether it\u2019s authentic (hence getting Mike\u2019s help and the input of ex-Googlers).<\/p>\n\n\n\n<p>So why publish on this topic?<\/p>\n\n\n\n<p>Because when I spoke to the party that sent me this information, I found them credible, thoughtful, and deeply knowledgeable. Despite going into the conversation deeply skeptical, I could identify no red flags, nor any malicious motivation. 
This person\u2019s sole aim appeared quite aligned with my own: to hold Google accountable for public statements that conflict with private conversations and leaked documentation, and to bring greater transparency to the field of search marketing. And they believed that, despite my years removed from SEO, I was the best person to share this publicly.<\/p>\n\n\n\n<p>These are goals I cared about deeply for almost two decades. And while my professional life has moved on (I now run two companies: <a href=\"https:\/\/sparktoro.com\">SparkToro<\/a>, which makes audience research software, and <a href=\"https:\/\/snackbarstudio.com\">Snackbar Studio<\/a>, an indie video game developer), my interest in and connections to the world of Search Engine Optimization remain strong. I feel a deep obligation to share information about how the world\u2019s dominant search engine works, especially information Google would prefer to keep quiet. And sadly, I\u2019m not sure where else to send something this potentially groundbreaking.<\/p>\n\n\n\n<p>Years ago, before he left journalism to become Google\u2019s Search Liaison, <a href=\"https:\/\/dannysullivan.com\/\">Danny Sullivan<\/a> would have been my go-to source for a leak of this magnitude. He had the gravitas, resume, knowledge, and experience to examine a claim like this and present it fairly in the court of public opinion. There have been so many times in the last few years I\u2019ve wished for Danny\u2019s calm, even-handed, tough-but-fair-on-Google approach to newsworthy pieces like this\u2013pieces that could reach as far as the company\u2019s statements on the witness stand (e.g. 
<a href=\"https:\/\/martech.org\/dark-google-search-terms-not-provided-one-year-later\/\">his eloquent writing on Google\u2019s indefensible privacy claims about organic keyword data<\/a>).<\/p>\n\n\n\n<p>Whatever Google&#8217;s paying him, it isn&#8217;t nearly enough.<\/p>\n\n\n\n<p>Apologies that instead of Danny, dear reader, you\u2019re stuck with me. But since you are, I\u2019m going to assume you may not be familiar with my background or credentials, and briefly share those.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>I started doing SEO for small businesses in the Seattle area in 2001, and co-founded the SEO consultancy that would become <a href=\"https:\/\/moz.com\">Moz<\/a> (originally called SEOmoz) in 2003.<\/li>\n\n\n\n<li>For the next 15 years, I worked in the search marketing industry and was often recognized as an influential leader in that field. I authored\/co-authored <a href=\"https:\/\/www.amazon.com\/Lost-Founder-Painfully-Honest-Startup\/dp\/0593853962\/\">Lost and Founder: A Painfully Honest Field Guide to the Startup World<\/a>, <a href=\"https:\/\/www.artofseo.com\/\">The Art of SEO<\/a>, and <a href=\"https:\/\/www.amazon.com\/Inbound-Marketing-SEO-Insights-Blog\/dp\/1118551559\">Inbound Marketing and SEO<\/a>.<\/li>\n\n\n\n<li>Publications including <a href=\"https:\/\/www.wsj.com\/articles\/rand-fishkins-tales-from-the-startup-world-1528824786\">the WSJ<\/a>, <a href=\"https:\/\/www.inc.com\/larry-kim\/15-remarkable-facts-about-entrepreneurial-wizard-of-moz-rand-fishkin.html\">Inc<\/a>, <a href=\"https:\/\/www.forbes.com\/sites\/robertadams\/2016\/09\/06\/how-rand-fishkin-created-an-seo-sempire-the-story-of-moz-com\/?sh=4a571f812b90\">Forbes<\/a>, and <a href=\"https:\/\/www.google.com\/search?q=rand+fishkin+seo+news\">hundreds more<\/a> have written about and quoted me on the world of SEO and Google search, many of them citing a popular weekly video series I hosted for a decade: <a 
href=\"https:\/\/www.youtube.com\/playlist?list=PL8A5C517175C28573\">Whiteboard Friday<\/a>.<\/li>\n\n\n\n<li>Moz grew to 35,000+ paying customers of its SEO software, revenues of $50M+, and a team of ~200 before being <a href=\"https:\/\/sparktoro.com\/blog\/the-final-chapter-of-my-first-startup\/\">sold<\/a> to a private equity buyer in 2021. I <a href=\"https:\/\/sparktoro.com\/blog\/last-day-moz-first-day-sparktoro\/\">left<\/a> in 2018 and started SparkToro, and in 2023, Snackbar Studio.<\/li>\n\n\n\n<li>I dropped out of college at the University of Washington in 2001 and do not hold a degree, yet my work on Google and SEO has been <a href=\"https:\/\/www.seroundtable.com\/rand-fishkin-congressional-hearing-on-google-27904.html\">cited by the United States Congress<\/a>, the US <a href=\"https:\/\/www.ftc.gov\/system\/files\/ftc_gov\/pdf\/FTC%20and%20Justice%20Department%20Listening%20Forum%20on%20Firsthand%20Effects%20of%20Mergers%20and%20Acquisitions-%20Technology%20-%20May%2012%2C%202022_0.pdf\">Federal Trade Commission<\/a>, the <a href=\"https:\/\/www.wsj.com\/articles\/lyrics-site-genius-com-accuses-google-of-lifting-its-content-11560677400\">Wall Street Journal<\/a>, <a href=\"https:\/\/www.ftc.gov\/system\/files\/ftc_gov\/pdf\/FTC%20and%20Justice%20Department%20Listening%20Forum%20on%20Firsthand%20Effects%20of%20Mergers%20and%20Acquisitions-%20Technology%20-%20May%2012%2C%202022_0.pdf\">New York Times<\/a>, and John Oliver\u2019s <a href=\"https:\/\/sparktoro.com\/blog\/google-apple-and-amazon-stifle-innovation-when-they-favor-their-own-products\/\">Last Week Tonight<\/a>, among dozens of others.<\/li>\n\n\n\n<li>I hold several <a href=\"https:\/\/patents.justia.com\/inventor\/s-rand-mitchell-fishkin\">patents<\/a> around the design of a web scale link index, and am the creator of numerous link-index metrics, including <a href=\"https:\/\/moz.com\/learn\/seo\/domain-authority\">Domain Authority<\/a>, a machine-learning based score commonly used in the 
digital marketing world to assess a website&#8217;s capability to rank in Google&#8217;s search engine.<\/li>\n<\/ul>\n\n\n\n<p>OK. Back to the Google leak.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">What is the Google API Content Warehouse?<\/h2>\n\n\n\n<p>When looking through the massive trove of API documentation, the first reasonable set of questions might be: \u201cWhat is this? What is it used for? Why does it exist in the first place?\u201d<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69d541c73691a&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69d541c73691a\" class=\"aligncenter size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"689\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-reference-anchors-1024x689.png\" alt=\"\" class=\"wp-image-8950\" srcset=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-reference-anchors-1024x689.png 1024w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-reference-anchors-300x202.png 300w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-reference-anchors-768x517.png 768w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-reference-anchors.png 1180w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" 
\/><\/figure>\n<\/div>\n\n\n<p>The leak appears to come from <a href=\"https:\/\/github.com\/googleapis\/\">GitHub<\/a>, and the most credible explanation for its exposure matches what my anonymous source told me on our call: these documents were inadvertently and briefly made public (many links in the documentation point to <a href=\"https:\/\/github.com\/googleapis\/google-api-nodejs-client\/issues\/1884\">private GitHub repositories<\/a> and <a href=\"https:\/\/g3doc.corp.google.com\/repository\/webref\/preprocessing\/names\/tnc_classifier\/README.md\">internal pages on Google\u2019s corporate site<\/a> that require specific, Google-credentialed logins). During this probably-accidental, public period between March and May of 2024, the API documentation was spread to Hexdocs (which indexes public GitHub repos) and found\/circulated by other sources (I\u2019m certain that others have a copy, though it\u2019s odd that I could find no public discourse until now).<\/p>\n\n\n\n<p>According to my ex-Googler sources, documentation like this exists on almost every Google team, explaining various API attributes and modules to help familiarize those working on a project with the data elements available. 
This leak matches other documentation in public GitHub repositories and in <a href=\"https:\/\/cloud.google.com\/apis\/docs\/overview\">Google\u2019s Cloud API documentation<\/a>, using the same notation style, formatting, and even process\/module\/feature names and references.<\/p>\n\n\n\n<p>If that all sounds like a technical mouthful, think of this as instructions for members of Google\u2019s search engine team. It\u2019s like an inventory of books in a library, a card catalogue of sorts, telling those employees who need to know what\u2019s available and how they can get it.<\/p>\n\n\n\n<p>But whereas libraries are public, Google search is one of the most secretive, closely guarded black boxes in the world. <strong>In the last quarter century, no leak of this magnitude or detail has ever been reported from Google\u2019s search division<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How certain can we be that Google\u2019s search engine uses everything detailed in these API docs?<\/h2>\n\n\n\n<p>That\u2019s open to interpretation. Google could have retired some of these, used others exclusively for testing or internal projects, or even made API features available that were never employed.<\/p>\n\n\n\n<p>However, there are references in the documentation to deprecated features and specific notes on others indicating they should no longer be used. That strongly suggests those not marked with such details were still in active use as of the March 2024 leak.<\/p>\n\n\n\n<p>We also can\u2019t say for certain whether the March leak is of the most recent version of this documentation. 
The most recent date I can find referenced in the API docs is August of 2023:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69d541c73714d&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69d541c73714d\" class=\"aligncenter size-full wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"922\" height=\"431\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/august-2023-reference-in-google-api-leak.png\" alt=\"\" class=\"wp-image-8952\" srcset=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/august-2023-reference-in-google-api-leak.png 922w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/august-2023-reference-in-google-api-leak-300x140.png 300w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/august-2023-reference-in-google-api-leak-768x359.png 768w\" sizes=\"auto, (max-width: 922px) 100vw, 922px\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" 
\/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p>The relevant text reads:<\/p>\n\n\n\n<p><em>\u201cThe domain-level display name of the website, such as &#8220;Google&#8221; for google.com. See go\/site-display-name for more details. <strong>As of Aug 2023, this field is being deprecated<\/strong> in favor of info.[AlternativeTitlesResponse].site_display_name_response field, which also contains host-level site display names with additional information.\u201d<\/em><\/p>\n\n\n\n<p>A reasonable reader would conclude that the documentation was up-to-date as of last summer (references to other changes in 2023 and earlier years, all the way back to 2005, are also present), and possibly even up-to-date as of the March 2024 date of disclosure.<\/p>\n\n\n\n<p>Google search obviously changes massively from year to year, and recent introductions like their <a href=\"https:\/\/www.androidauthority.com\/shut-down-google-ai-overview-3446038\/\">much-maligned AI Overviews<\/a>, do not make an appearance in this leak. Which of the items mentioned are actively used today in Google\u2019s ranking systems? That&#8217;s open to speculation. This trove contains fascinating references, many that will be entirely new to non-Google-search-engineers.<\/p>\n\n\n\n<p>But,<strong> I would urge readers not to point to a particular API feature in this leak and say: \u201cSEE! That\u2019s proof Google uses XYZ in their rankings.\u201d<\/strong> It\u2019s not quite proof. It\u2019s a strong indication, stronger than patent applications or public statements from Googlers, but still no guarantee.<\/p>\n\n\n\n<p>That said, it\u2019s as close to a smoking gun as anything since <a href=\"https:\/\/www.justice.gov\/d9\/2023-11\/418021.pdf\">Google\u2019s execs testified in the DOJ trial<\/a> last year. And, speaking of that testimony, much of it is corroborated and expanded on in the document leak, as Mike <a href=\"https:\/\/ipullrank.com\/google-algo-leak\">details in his post<\/a>. 
\ud83d\udc40<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What can we learn from the Data Warehouse Leak?<\/strong><\/h2>\n\n\n\n<p>I expect that interesting and marketing-applicable insights will be mined from this massive file set for years to come. It\u2019s simply too big and too dense to think that a weekend of browsing could unearth a comprehensive set of takeaways, or even come close.<\/p>\n\n\n\n<p>However, I will share five of the most interesting, early discoveries in my perusal, some that shed new light on things Google has long been assumed to be doing, and others that suggest the company\u2019s public statements (especially those on what they \u201ccollect\u201d) have been erroneous. Because doing so would be tedious and could be perceived as personal grievances (given Google\u2019s historic attacks on my work), I won\u2019t bother showing side-by-sides of what Googlers said vs. what this document insinuates. Besides, Mike did a great job of that in his post.<\/p>\n\n\n\n<p>Instead, I\u2019ll focus on interesting and\/or useful takeaways, and my conclusions from the whole of the modules I\u2019ve been able to review, <a href=\"https:\/\/ipullrank.com\/google-algo-leak\">Mike\u2019s piece<\/a> on the leak, and how this combines with other things we know to be true of Google.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">#1: Navboost and the use of clicks, CTR, long vs. 
short clicks, and user data<\/h2>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69d541c737a05&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69d541c737a05\" class=\"aligncenter size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"857\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-quality-navboost-goodclicks-badclicks-1024x857.jpg\" alt=\"\" class=\"wp-image-8955\" srcset=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-quality-navboost-goodclicks-badclicks-1024x857.jpg 1024w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-quality-navboost-goodclicks-badclicks-300x251.jpg 300w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-quality-navboost-goodclicks-badclicks-768x643.jpg 768w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-quality-navboost-goodclicks-badclicks.jpg 1068w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 
2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p>A handful of modules in the documentation make reference to features like \u201cgoodClicks,\u201d \u201cbadClicks,\u201d \u201clastLongestClicks,\u201d impressions, squashed, unsquashed, and unicorn clicks. These are tied to Navboost and Glue, two words that may be familiar to folks who reviewed <a href=\"https:\/\/thecapitolforum.com\/wp-content\/uploads\/2023\/10\/101823-USA-v-Google-PM.pdf\">Google\u2019s DOJ testimony<\/a>. Here\u2019s a relevant excerpt from DOJ attorney Kenneth Dintzer\u2019s cross-examination of Pandu Nayak, VP of Search on the Search Quality team:<\/p>\n\n\n\n<p style=\"font-style:italic;font-weight:400\">Q. So remind me, is navboost all the way back to 2005?<br>A. It&#8217;s somewhere in that range. It might even be before that.<\/p>\n\n\n\n<p style=\"font-style:italic;font-weight:400\">Q. And it&#8217;s been updated. It&#8217;s not the same old navboost that it was back then?<br>A. No.<\/p>\n\n\n\n<p style=\"font-style:italic;font-weight:400\">Q. And another one is glue, right?<br>A. Glue is just another name for navboost that includes all of the other features on the page.<\/p>\n\n\n\n<p style=\"font-style:italic;font-weight:400\">Q. Right. I was going to get there later, but we can do that now. Navboost does web results, just like we discussed, right?<br>A. Yes.<\/p>\n\n\n\n<p style=\"font-style:italic;font-weight:400\">Q. And glue does everything else that&#8217;s on the page that&#8217;s not web results, right?<br>A. That is correct.<\/p>\n\n\n\n<p style=\"font-style:italic;font-weight:400\">Q. Together they help find the stuff and rank the stuff that ultimately shows up on our SERP?<br>A. That is true. 
They&#8217;re both signals into that, yes.<\/p>\n\n\n\n<p>A savvy reader of these API documents would find they support Mr. Nayak\u2019s testimony (and align with Google\u2019s <a href=\"https:\/\/thecapitolforum.com\/wp-content\/uploads\/2023\/10\/101823-USA-v-Google-PM.pdf\">patent on site quality<\/a>):<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.3.0\/GoogleApi.ContentWarehouse.V1.Model.QualityNavboostCrapsCrapsData.html\">Quality Navboost Data module<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.3.0\/GoogleApi.ContentWarehouse.V1.Model.QualityNavboostCrapsFeatureCrapsData.html#module-attributes\">Geo-segmentation of Navboost Data<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.3.0\/GoogleApi.ContentWarehouse.V1.Model.QualityNavboostCrapsCrapsClickSignals.html\">Clicks Signals in Navboost<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.3.0\/GoogleApi.ContentWarehouse.V1.Model.QualityNavboostCrapsAgingDataAgingAgeBucket.html#module-attributes\">Data Aging Impressions and clicks<\/a><\/li>\n<\/ul>\n\n\n\n<p>Google appears to have ways to filter out clicks they don\u2019t want to count in their ranking systems, and include ones they do. They also seem to measure length of clicks (i.e. <a href=\"https:\/\/moz.com\/blog\/seo-satisfaction\">pogo-sticking<\/a> &#8211; when a searcher clicks a result and then quickly clicks the back button, unsatisfied by the answer they found) and impressions.<\/p>\n\n\n\n<p>Plenty has already been written about Google\u2019s use of click data, so I won\u2019t belabor the point. 
What matters is that Google has named and described features for that measurement, adding even more evidence to the pile.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">#2: Use of Chrome browser clickstreams to power Google Search<\/h2>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69d541c738447&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69d541c738447\" class=\"aligncenter size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"512\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/chrome-total-views-google-api-content-warehouse-1024x512.gif\" alt=\"\" class=\"wp-image-8959\" srcset=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/chrome-total-views-google-api-content-warehouse-1024x512.gif 1024w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/chrome-total-views-google-api-content-warehouse-300x150.gif 300w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/chrome-total-views-google-api-content-warehouse-768x384.gif 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 
.5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p>My anonymous source claimed that way back in 2005, Google wanted the full clickstream of billions of Internet users, and with Chrome, they\u2019ve now got it. The API documents suggest Google calculates several types of metrics that can be called using Chrome views related to both individual pages and entire domains.<\/p>\n\n\n\n<p><a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.4.0\/GoogleApi.ContentWarehouse.V1.Model.QualitySitemapTargetGroup.html#module-attributes\">This document<\/a>, describing the features around how Google creates Sitelinks, is particularly interesting. It showcases a call named topUrl, which is \u201cA list of top urls with highest two_level_score, i.e., chrome_trans_clicks.\u201d My read is that Google likely uses the number of clicks on pages in Chrome browsers to determine the most popular\/important URLs on a site, which go into the calculation of which to include in the sitelinks feature.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"608\" height=\"505\" src=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/image-13.png\" alt=\"\" class=\"wp-image-8961\" srcset=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/image-13.png 608w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/image-13-300x249.png 300w\" sizes=\"auto, (max-width: 608px) 100vw, 608px\" \/><\/figure>\n<\/div>\n\n\n<p>E.g., 
in the above screenshot from Google\u2019s results, pages like \u201cPricing,\u201d the \u201cBlog,\u201d and the \u201cLogin\u201d pages are our most-visited, and Google knows this through their tracking of billions of Chrome users\u2019 clickstreams.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.3.0\/GoogleApi.ContentWarehouse.V1.Model.QualityNsrNsrData.html#module-attributes\">Quality NSR Data module<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.3.0\/GoogleApi.ContentWarehouse.V1.Model.VideoContentSearchVideoInfo.html#module-attributes\">Video Content Search module<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.4.0\/GoogleApi.ContentWarehouse.V1.Model.QualitySitemapTargetGroup.html#module-attributes\">Quality Sitemap module<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">#3: Whitelists in Travel, Covid, and Politics<\/h2>\n\n\n\n<p>A module on \u201cGood Quality Travel Sites\u201d would lead reasonable readers to conclude that a whitelist exists for Google in the travel sector (unclear if this is exclusively for Google\u2019s \u201cTravel\u201d search tab, or web search more broadly). References in several places to flags for \u201cisCovidLocalAuthority\u201d and \u201cisElectionAuthority\u201d further suggest that Google is whitelisting particular domains that are appropriate to show for highly controversial or potentially problematic queries.&nbsp;<\/p>\n\n\n\n<p>For example, following the 2020 US Presidential election, one candidate claimed (without evidence) that the election had been stolen, and encouraged their followers to storm the Capitol and take potentially violent action against lawmakers, i.e. 
commit an insurrection.<\/p>\n\n\n\n<p>Google would almost certainly be one of the first places people turned to for information about this event, and if their search engine returned propaganda websites that inaccurately portrayed the election evidence, that could directly lead to more contention, violence, or even the end of US democracy. Those of us who want free and fair elections to continue should be very grateful Google\u2019s engineers are employing whitelists in this case.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.3.0\/GoogleApi.ContentWarehouse.V1.Model.QualityNsrNsrData.html#module-attributes\">Quality NSR Data Attributes<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.3.0\/GoogleApi.ContentWarehouse.V1.Model.AssistantApiSettingsMusicFilter.html#module-attributes\">Assistant API Settings for Music Filters<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.3.0\/GoogleApi.ContentWarehouse.V1.Model.VideoContentSearchVideoGeneratedQueryFeatures.html#module-attributes\">Video Content Search Query Features<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.3.0\/GoogleApi.ContentWarehouse.V1.Model.QualityTravelGoodSitesData.html\">Quality Travel Sites Data module<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">#4: Employing Quality Rater Feedback<\/h2>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69d541c7395ed&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69d541c7395ed\" class=\"aligncenter size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"703\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" 
data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-quality-rater-ewok-warehouse-api-leak-1024x703.gif\" alt=\"\" class=\"wp-image-8966\" srcset=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-quality-rater-ewok-warehouse-api-leak-1024x703.gif 1024w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-quality-rater-ewok-warehouse-api-leak-300x206.gif 300w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-quality-rater-ewok-warehouse-api-leak-768x527.gif 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p>Google has long had a quality rating platform called EWOK (Cyrus Shepard, a notable leader in the SEO space, spent several years contributing to this and <a href=\"https:\/\/zyppy.com\/seo\/google-search-quality-rater\/\">wrote about it here<\/a>). 
We now have evidence that some elements from the quality raters are used in the search systems.<\/p>\n\n\n\n<p>How influential these rater-based signals are, and what precisely they\u2019re used for is unclear to me in an initial read, but I suspect some thoughtful SEO detectives will dig into the leak, learn, and publish more about it. What I find fascinating is that scores and data generated by EWOK\u2019s quality raters may be <strong>directly involved<\/strong> in Google\u2019s search system, rather than simply a training set for experiments. Of course, it\u2019s possible these are \u201cjust for testing,\u201d but as you browse through the leaked documents, you\u2019ll find that when that\u2019s true, it\u2019s specifically called out in the notes and module details.<\/p>\n\n\n\n<p><a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.3.0\/GoogleApi.ContentWarehouse.V1.Model.RepositoryWebrefDocLevelRelevanceRatings.html\">This one<\/a> calls out a \u201cper document relevance rating\u201d sourced from evaluations done via EWOK. 
There\u2019s no detailed notation, but it\u2019s not much of a logic-leap to imagine how important those human evaluations of websites really are.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"450\" src=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/humanratings-entities-google-api-warehouse-leak-1024x450.gif\" alt=\"\" class=\"wp-image-8981\" srcset=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/humanratings-entities-google-api-warehouse-leak-1024x450.gif 1024w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/humanratings-entities-google-api-warehouse-leak-300x132.gif 300w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/humanratings-entities-google-api-warehouse-leak-768x338.gif 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<p>This one calls out &#8220;Human Ratings (e.g. 
ratings from EWOK)&#8221; and notes that they&#8217;re &#8220;typically only populated in the evaluation pipelines,&#8221; which suggests they may be primarily training data in this module (I&#8217;d argue that&#8217;s still a hugely important role, and marketers shouldn&#8217;t dismiss how important it is that quality raters perceive and rate their websites well).<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.4.0\/GoogleApi.ContentWarehouse.V1.Model.RepositoryWebrefMentionRatingsSingleMentionRating.html#module-attributes\">Webref Mention Ratings module<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.4.0\/GoogleApi.ContentWarehouse.V1.Model.RepositoryWebrefTaskData.html\">Webref Task Data module<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.3.0\/GoogleApi.ContentWarehouse.V1.Model.RepositoryWebrefDocLevelRelevanceRatings.html\">Document Level Relevance module<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.4.0\/GoogleApi.ContentWarehouse.V1.Model.RepositoryWebrefPerDocRelevanceRating.html\">Webref per Doc Relevance Rating module<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.4.0\/GoogleApi.ContentWarehouse.V1.Model.RepositoryWebrefEntityJoin.html\">Webref Entity Join<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">#5: Google Uses Click Data to Determine How to Weight Links in Rankings<\/h2>\n\n\n\n<p>This one&#8217;s fascinating, and comes directly from the anonymous source who first shared the leak. In their words: &#8220;Google has three buckets\/tiers for classifying their link indexes (low, medium, high quality). Click data is used to determine which link graph index tier a document belongs to. 
See <a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.4.0\/GoogleApi.ContentWarehouse.V1.Model.AnchorsAnchor.html\">SourceType here<\/a>, and <a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.4.0\/GoogleApi.ContentWarehouse.V1.Model.RepositoryWebrefDocumentMetadata.html\">TotalClicks here<\/a>.&#8221; In summary:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If Forbes.com\/Cats\/ has no clicks it goes into the low-quality index and the link is ignored<\/li>\n\n\n\n<li>If Forbes.com\/Dogs\/ has a high volume of clicks from verifiable devices (all the Chrome-related data discussed previously), it goes into the high-quality index and the link passes ranking signals<\/li>\n<\/ul>\n\n\n\n<p>Once the link becomes &#8220;trusted&#8221; because it belongs to a higher tier index, it can flow PageRank and anchors, or be filtered\/demoted by link spam systems. Links from the low-quality link index won&#8217;t hurt a site&#8217;s ranking; they are merely <a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.4.0\/GoogleApi.ContentWarehouse.V1.Model.IndexingDocjoinerAnchorSpamInfo.html\">ignored<\/a>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Big Picture Takeaways for Marketers who Care About Organic Search Traffic<\/h2>\n\n\n\n<p>If you care strategically about the value of organic search traffic, but don\u2019t have much use for the technical details of how Google works, this section\u2019s for you. It\u2019s my attempt to sum up much of Google\u2019s evolution from the period this leak covers: 2005 &#8211; 2023, and I won\u2019t limit myself exclusively to confirmed elements of the leak.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Brand matters more than anything else<br><\/strong>Google has numerous ways to identify entities, sort, rank, filter, and employ them. 
Entities include brands (brand names, their official websites, associated social accounts, etc.), and as we\u2019ve <a href=\"https:\/\/sparktoro.com\/blog\/who-sends-traffic-on-the-web-and-how-much-new-research-from-datos-sparktoro\/\">seen in our clickstream research with Datos<\/a>, they\u2019ve been on an inexorable path toward exclusively ranking and sending traffic to big, powerful brands that dominate the web &gt; small, independent sites and businesses.<br><br>If there was one universal piece of advice I had for marketers seeking to broadly improve their organic search rankings and traffic, it would be: &#8220;Build a notable, popular, well-recognized brand in your space, outside of Google search.&#8221;<br><\/li>\n\n\n\n<li><strong>Experience, expertise, authoritativeness, and trustworthiness <\/strong>(\u201cE-E-A-T\u201d)<strong> might not matter as directly as some SEOs think.<\/strong><br>The only mention of topical expertise in the leak we\u2019ve found so far is a brief <a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.3.0\/GoogleApi.ContentWarehouse.V1.Model.AppsPeopleOzExternalMergedpeopleapiMapsExtendedData.html#module-attributes\">notation<\/a> about Google Maps review contributions. The other aspects of E-E-A-T are either buried, indirect, labeled in hard-to-identify ways, or, more likely (in my opinion) correlated with things Google uses and cares about, but not specific elements of the ranking systems.<br><br>As Mike <a href=\"https:\/\/ipullrank.com\/google-algo-leak\">noted<\/a> in his article, there is documentation in the leak suggesting Google can identify authors and treats them as entities in the system. Building up one\u2019s influence as an author online may indeed lead to ranking benefits in Google. But what exactly in the ranking systems makes up \u201cE-E-A-T\u201d and how powerful those elements are is an open question. I\u2019m a bit worried that E-E-A-T is 80% propaganda, 20% substance. 
There are plenty of powerful brands that rank remarkably well in Google and have very little experience, expertise, authoritativeness, or trustworthiness, as <a href=\"https:\/\/housefresh.com\/david-vs-digital-goliaths\/\">HouseFresh&#8217;s recent, viral article<\/a> details in depth.<br><\/li>\n\n\n\n<li><strong>Content and links are secondary when user intention around navigation (and the patterns that intent creates) is present.<\/strong><br>Let&#8217;s say, for example, that many people in the Seattle area search for &#8220;Lehman Brothers&#8221; and scroll to page 2, 3, or 4 of the search results until they find the theatre listing for the Lehman Brothers stage production, then click that result. Fairly quickly, Google will learn that&#8217;s what searchers for those words in that area want.<br><br>Even if the Wikipedia article about Lehman Brothers&#8217; role in the financial crisis of 2008 were to invest heavily in link building and content optimization, it&#8217;s unlikely it could outrank the user-intent signals (calculated from queries and clicks) of Seattle&#8217;s theatre-goers.<br><br>Extending this example to the broader web and search as a whole, if you can create demand for your website among enough likely searchers in the regions you&#8217;re targeting, you may be able to end-around the need for classic on-and-off-page SEO signals like links, anchor text, optimized content, and the like. Navboost, combined with the intent of users, is likely the most powerful ranking factor in Google&#8217;s systems. As Google VP Alexander Grushetsky put it in <a href=\"https:\/\/www.justice.gov\/d9\/2023-11\/417828.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">a 2019 email to other Google execs<\/a> (including Danny Sullivan and Pandu Nayak): <br><br>&#8220;<em>We already know, one signal could be more powerful than the whole big system on a given metric. 
For example, I&#8217;m pretty sure that NavBoost alone was \/ is more positive on clicks (and likely even on precision \/ utility metrics) by itself than the rest of ranking (BTW, engineers outside of Navboost team used to be also not happy about the power of Navboost, and the fact it was &#8220;stealing wins&#8221;)<\/em>&#8221;<br><br>Those seeking even more confirmation could review Google engineer <a href=\"https:\/\/avalonlibrary.net\/Project_Veritas_Google_document_dump\/Fake%20News\/Paul%20Haahr_%20Google%20Resume.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Paul Haahr&#8217;s detailed resume<\/a>, which states:<br><br><em>&#8220;I&#8217;m the manager for logs-based ranking projects. The team&#8217;s efforts are currently split among four areas: 1) Navboost. This is already one of Google&#8217;s strongest ranking signals. Current work is on automation in building new navboost data;&#8221;<\/em><br><\/li>\n\n\n\n<li><strong>Classic ranking factors: PageRank, anchors (topical PageRank based on the anchor text of the link), and text-matching have been waning in importance for years. But Page Titles are still quite important.<\/strong><br>This is a finding from Mike&#8217;s excellent analysis that I&#8217;d be foolish not to call out here. PageRank still appears to have a place in search indexing and rankings, but it&#8217;s almost certainly evolved from the original 1998 paper. 
The document leak insinuates multiple versions of PageRank (<a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.3.0\/GoogleApi.ContentWarehouse.V1.Model.CompositeDocForwardingDup.html#module-attributes\">rawPagerank<\/a>, a deprecated <a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.3.0\/GoogleApi.ContentWarehouse.V1.Model.GeostoreUrlProto.html#module-attributes\">PageRank referencing &#8220;nearest seeds,&#8221;<\/a> <a href=\"https:\/\/hexdocs.pm\/google_api_content_warehouse\/0.3.0\/GoogleApi.ContentWarehouse.V1.Model.IndexingSignalAggregatorAgeWeightedCoverageData.html#module-attributes\">firstCoveragePageRank<\/a> from when the document was first served, etc.) have been created and discarded over the years. And anchor text links, while present in the leak, don&#8217;t seem to be as crucial or omnipresent as I&#8217;d have expected from my earlier years in SEO.<br><\/li>\n\n\n\n<li><strong>For most small and medium businesses and newer creators\/publishers, SEO is likely to show poor returns until you\u2019ve established credibility, navigational demand, and a strong reputation among a sizable audience.<\/strong><br>SEO is a big brand, popular domain&#8217;s game. As an entrepreneur, I&#8217;m not ignoring SEO, but I strongly expect that for the years ahead, until\/unless SparkToro becomes a much larger, more popular, more searched-for and clicked-on brand in its industry, this website will continue to be outranked, even for its original content, by aggregators and publishers who&#8217;ve existed for 10+ years.<br><br>This is almost certainly true for other creators, publishers, and SMBs. The content you create is unlikely to perform well in Google if competition from big, popular websites with well-known brands exists. Google no longer rewards scrappy, clever, SEO-savvy operators who know all the right tricks. 
They reward established brands, search-measurable forms of popularity, and established domains that searchers already know and click. From 1998 &#8211; 2018 (or so), one could reasonably start a powerful marketing flywheel with SEO for Google. In 2024, I don&#8217;t think that&#8217;s realistic, at least not on the English-language web in competitive sectors.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Next Steps for the Search Industry<\/h2>\n\n\n\n<p>I\u2019m excited to see how practitioners with more recent experience and deeper technical knowledge go about analyzing this leak. I encourage anyone curious to dig into the documentation, attempt to connect it to other public documents, statements, testimony, and ranking experiments, then publish their findings. For example, Emina Demiri-Watson <a href=\"https:\/\/www.womenintechseo.com\/knowledge\/digging-into-the-video-content-modules-of-googles-api-leak\/\">published a superb analysis of the video content modules<\/a> in the leak, exposing how Google Videos and YouTube rank results.<\/p>\n\n\n\n<p>Historically, some of the search industry\u2019s loudest voices and most prolific publishers have been happy to uncritically repeat Google\u2019s public statements. 
They write headlines like \u201cGoogle says XYZ is true,\u201d rather than \u201cGoogle Claims XYZ; Evidence Suggests Otherwise.\u201d<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1018\" height=\"309\" src=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/image-14.png\" alt=\"\" class=\"wp-image-8984\" style=\"width:600px\" srcset=\"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/image-14.png 1018w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/image-14-300x91.png 300w, https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/image-14-768x233.png 768w\" sizes=\"auto, (max-width: 1018px) 100vw, 1018px\" \/><figcaption class=\"wp-element-caption\">The SEO industry doesn&#8217;t benefit from these kinds of headlines<\/figcaption><\/figure>\n<\/div>\n\n\n<p>Please, do better. If this leak and the DOJ trial can create just one change, I hope this is it.<\/p>\n\n\n\n<p>When those new to the field read Search Engine Roundtable, Search Engine Land, SE Journal, and the many agency blogs and websites that cover the SEO field\u2019s news, they don\u2019t necessarily know how seriously to take Google\u2019s statements. Journalists and authors should not presume that readers are savvy enough to know that dozens or hundreds of past public comments by Google\u2019s official representatives were later proven wrong.<\/p>\n\n\n\n<p>This obligation isn\u2019t just about helping the search industry\u2014it\u2019s about helping the whole world. Google is one of the most powerful, influential forces for the spread of information and commerce on this planet. Only recently have they been held to some account by governments and reporters. 
The work of journalists and writers in the search marketing field carries weight in the courts of public opinion, in the halls of elected officials, and in the hearts of Google employees, all of whom have the power to change things for the better or ignore them at our collective peril.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Thank you to Mike King for his invaluable help on this document leak story, to <a href=\"https:\/\/amandanat.com\/\">Amanda Natividad<\/a> for editing help, and to the anonymous source who shared this leak with me. I expect that updates to this piece may arrive over the next few days and weeks as it reaches more eyeballs. If you have findings that support or contradict statements I\u2019ve made here, please feel free to share them in the comments below.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Update: We are diving deep into the Google API leak and what it means for marketers in our next episode of SparkToro Office Hours on June 27. Join Michael King, Founder &amp; CEO, iPullRank, and Rand Fishkin, Co-founder &amp; CEO, SparkToro, to learn more. RSVP for the free webinar. 
On Sunday, May 5th, I received<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8,24,56],"tags":[],"class_list":["post-8936","post","type-post","status-publish","format-standard","hentry","category-data","category-industry","category-seo"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>An Anonymous Source Shared Thousands of Leaked Google Search API Documents with Me; Everyone in SEO Should See Them - SparkToro<\/title>\n<meta name=\"description\" content=\"Update: We are diving deep into the Google API leak and what it means for marketers in our next episode of SparkToro Office Hours on June 27. Join Michael\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/sparktoro.com\/blog\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"An Anonymous Source Shared Thousands of Leaked Google Search API Documents with Me; Everyone in SEO Should See Them - SparkToro\" \/>\n<meta property=\"og:description\" content=\"Update: We are diving deep into the Google API leak and what it means for marketers in our next episode of SparkToro Office Hours on June 27. 
Join Michael\" \/>\n<meta property=\"og:url\" content=\"https:\/\/sparktoro.com\/blog\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\/\" \/>\n<meta property=\"og:site_name\" content=\"SparkToro\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/facebook.com\/sparktoro\" \/>\n<meta property=\"article:published_time\" content=\"2024-05-28T02:20:52+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-07-03T19:28:32+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-leak-conversation-anonymized-1024x604.jpg\" \/>\n<meta name=\"author\" content=\"Rand Fishkin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@sparktoro\" \/>\n<meta name=\"twitter:site\" content=\"@sparktoro\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Rand Fishkin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"26 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\\\/\"},\"author\":{\"name\":\"Rand Fishkin\",\"@id\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/#\\\/schema\\\/person\\\/2acd0781db4d1905cecfec9bea406570\"},\"headline\":\"An Anonymous Source Shared Thousands of Leaked Google Search API Documents with Me; Everyone in SEO Should See Them\",\"datePublished\":\"2024-05-28T02:20:52+00:00\",\"dateModified\":\"2024-07-03T19:28:32+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\\\/\"},\"wordCount\":5539,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/wp-content\\\/uploads\\\/2024\\\/05\\\/google-api-content-warehouse-leak-conversation-anonymized-1024x604.jpg\",\"articleSection\":[\"Data\",\"Industry\",\"SEO\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/sparktoro.com\\\/blog\\\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/an-anonymous-source-shared-thousands
-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\\\/\",\"url\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\\\/\",\"name\":\"An Anonymous Source Shared Thousands of Leaked Google Search API Documents with Me; Everyone in SEO Should See Them - SparkToro\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/wp-content\\\/uploads\\\/2024\\\/05\\\/google-api-content-warehouse-leak-conversation-anonymized-1024x604.jpg\",\"datePublished\":\"2024-05-28T02:20:52+00:00\",\"dateModified\":\"2024-07-03T19:28:32+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/#\\\/schema\\\/person\\\/2acd0781db4d1905cecfec9bea406570\"},\"description\":\"Update: We are diving deep into the Google API leak and what it means for marketers in our next episode of SparkToro Office Hours on June 27. 
Join Michael\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/sparktoro.com\\\/blog\\\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\\\/#primaryimage\",\"url\":\"https:\\\/\\\/images.sparktoro.com\\\/blog\\\/wp-content\\\/uploads\\\/2024\\\/05\\\/google-api-content-warehouse-leak-conversation-anonymized.jpg\",\"contentUrl\":\"https:\\\/\\\/images.sparktoro.com\\\/blog\\\/wp-content\\\/uploads\\\/2024\\\/05\\\/google-api-content-warehouse-leak-conversation-anonymized.jpg\",\"width\":1126,\"height\":664},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/\",\"name\":\"SparkToro\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/#\\\/schema\\\/person\\\/2acd0781db4d1905cecfec9bea406570\",\"name\":\"Rand 
Fishkin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/67da22d493ce011f93cef084e8e9132d009216229c75909ed1ae4d1482d164b6?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/67da22d493ce011f93cef084e8e9132d009216229c75909ed1ae4d1482d164b6?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/67da22d493ce011f93cef084e8e9132d009216229c75909ed1ae4d1482d164b6?s=96&d=mm&r=g\",\"caption\":\"Rand Fishkin\"},\"url\":\"https:\\\/\\\/sparktoro.com\\\/blog\\\/author\\\/rand\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"An Anonymous Source Shared Thousands of Leaked Google Search API Documents with Me; Everyone in SEO Should See Them - SparkToro","description":"Update: We are diving deep into the Google API leak and what it means for marketers in our next episode of SparkToro Office Hours on June 27. Join Michael","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/sparktoro.com\/blog\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\/","og_locale":"en_US","og_type":"article","og_title":"An Anonymous Source Shared Thousands of Leaked Google Search API Documents with Me; Everyone in SEO Should See Them - SparkToro","og_description":"Update: We are diving deep into the Google API leak and what it means for marketers in our next episode of SparkToro Office Hours on June 27. 
Join Michael","og_url":"https:\/\/sparktoro.com\/blog\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\/","og_site_name":"SparkToro","article_publisher":"https:\/\/facebook.com\/sparktoro","article_published_time":"2024-05-28T02:20:52+00:00","article_modified_time":"2024-07-03T19:28:32+00:00","og_image":[{"url":"https:\/\/sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-leak-conversation-anonymized-1024x604.jpg","type":"","width":"","height":""}],"author":"Rand Fishkin","twitter_card":"summary_large_image","twitter_creator":"@sparktoro","twitter_site":"@sparktoro","twitter_misc":{"Written by":"Rand Fishkin","Est. reading time":"26 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/sparktoro.com\/blog\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\/#article","isPartOf":{"@id":"https:\/\/sparktoro.com\/blog\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\/"},"author":{"name":"Rand Fishkin","@id":"https:\/\/sparktoro.com\/blog\/#\/schema\/person\/2acd0781db4d1905cecfec9bea406570"},"headline":"An Anonymous Source Shared Thousands of Leaked Google Search API Documents with Me; Everyone in SEO Should See 
Them","datePublished":"2024-05-28T02:20:52+00:00","dateModified":"2024-07-03T19:28:32+00:00","mainEntityOfPage":{"@id":"https:\/\/sparktoro.com\/blog\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\/"},"wordCount":5539,"commentCount":0,"image":{"@id":"https:\/\/sparktoro.com\/blog\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\/#primaryimage"},"thumbnailUrl":"https:\/\/sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-leak-conversation-anonymized-1024x604.jpg","articleSection":["Data","Industry","SEO"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/sparktoro.com\/blog\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/sparktoro.com\/blog\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\/","url":"https:\/\/sparktoro.com\/blog\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\/","name":"An Anonymous Source Shared Thousands of Leaked Google Search API Documents with Me; Everyone in SEO Should See Them - 
SparkToro","isPartOf":{"@id":"https:\/\/sparktoro.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/sparktoro.com\/blog\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\/#primaryimage"},"image":{"@id":"https:\/\/sparktoro.com\/blog\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\/#primaryimage"},"thumbnailUrl":"https:\/\/sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-leak-conversation-anonymized-1024x604.jpg","datePublished":"2024-05-28T02:20:52+00:00","dateModified":"2024-07-03T19:28:32+00:00","author":{"@id":"https:\/\/sparktoro.com\/blog\/#\/schema\/person\/2acd0781db4d1905cecfec9bea406570"},"description":"Update: We are diving deep into the Google API leak and what it means for marketers in our next episode of SparkToro Office Hours on June 27. Join Michael","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/sparktoro.com\/blog\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/sparktoro.com\/blog\/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them\/#primaryimage","url":"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-leak-conversation-anonymized.jpg","contentUrl":"https:\/\/images.sparktoro.com\/blog\/wp-content\/uploads\/2024\/05\/google-api-content-warehouse-leak-conversation-anonymized.jpg","width":1126,"height":664},{"@type":"WebSite","@id":"https:\/\/sparktoro.com\/blog\/#website","url":"https:\/\/sparktoro.com\/blog\/","name":"SparkToro","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/sparktoro.com\/blog\/?s={search_term_string}"},"query-
input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/sparktoro.com\/blog\/#\/schema\/person\/2acd0781db4d1905cecfec9bea406570","name":"Rand Fishkin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/67da22d493ce011f93cef084e8e9132d009216229c75909ed1ae4d1482d164b6?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/67da22d493ce011f93cef084e8e9132d009216229c75909ed1ae4d1482d164b6?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/67da22d493ce011f93cef084e8e9132d009216229c75909ed1ae4d1482d164b6?s=96&d=mm&r=g","caption":"Rand Fishkin"},"url":"https:\/\/sparktoro.com\/blog\/author\/rand\/"}]}},"_links":{"self":[{"href":"https:\/\/sparktoro.com\/blog\/wp-json\/wp\/v2\/posts\/8936","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sparktoro.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sparktoro.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sparktoro.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/sparktoro.com\/blog\/wp-json\/wp\/v2\/comments?post=8936"}],"version-history":[{"count":63,"href":"https:\/\/sparktoro.com\/blog\/wp-json\/wp\/v2\/posts\/8936\/revisions"}],"predecessor-version":[{"id":9139,"href":"https:\/\/sparktoro.com\/blog\/wp-json\/wp\/v2\/posts\/8936\/revisions\/9139"}],"wp:attachment":[{"href":"https:\/\/sparktoro.com\/blog\/wp-json\/wp\/v2\/media?parent=8936"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sparktoro.com\/blog\/wp-json\/wp\/v2\/categories?post=8936"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sparktoro.com\/blog\/wp-json\/wp\/v2\/tags?post=8936"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}