New Research: We analyzed 332 million queries over 21 months to uncover never-before-published data on how people use Google

Google is a massive presence on the modern web. They’ve held 90%+ of the global search market share for 15 years. Even the rise of AI tools has been unable to budge that number (so far). They get <9% of all web visits, but send more than 60% of all referring traffic to other sites.

Google is certainly one of the most talked-about, profitable, powerful, and regulatory-resistant companies in the world. But, somehow, in all their years of operation, no one’s attempted to credibly answer key questions about what people use Google to do. Questions like:

  • What percent of all Google searches are for brands vs. generic terms?
  • What percent of Google searches are navigational (people just using the search engine to get somewhere else on the web)?
  • What percent are commercial (i.e. have a financial, business services, or commerce intent) or transactional (i.e. are directly looking to buy something)?
  • What percent are informational (looking up weather, traffic, news, history, painting techniques, or billions of other purely reference-focused answers)?
  • What’s the breakdown of topics people search for in Google? Is adult content a huge portion? Do people search for jobs & career topics as much as they do home & garden or arts & entertainment?
  • And, the question on everyone’s mind, what percent of all Google searches include the words “Paul Rudd?”

Thanks to our partners at Datos (a Semrush company), we’ve got the first-ever high quality answers to all of these, and more.

Datos furnished us with Google search data from a subset of their multi-million person panel: ~130K US devices, mobile and desktop, who were actively using Google for 21 consecutive months: January 2023 – September 2024. For a variety of reasons, Datos limited the queries provided to only those with 100+ Google searches over the 21 month period (though we also have an analysis of all search keywords for a single month below).

The distribution of query terms looks like this:

The data displayed in this report has been provided by Datos, A Semrush Company. The analysis is based on Datos’s US panel, representing a diverse and statistically significant sample of users, and covers the months of (01/23-09/24). For further information please visit Datos’s website and its Privacy Policy.

Even this summary graphic reveals a striking insight into how people use Google: the incredible concentration of volume at the top of the demand curve (those 100 query terms with 155K+ searches).

Following are a dozen charts and graphs breaking down this study step by step, but for those who prefer, we’re also offering a SparkToro Office Hours webinar on Monday, December 9th to uncover even more from this remarkable study.

I encourage you to read the full report, but if you’d like to skip around or reference a particular section, here’s the table of contents:

Breakdown of Methodology

There have been attempts at similar research from the search marketing community before, but the methodologies were always problematic. That’s because if you start with a set of websites’ traffic data (prior to 2013, Google told websites which search terms sent traffic), you’ll get a sample biased by not only the sites themselves, but also by queries that Google answers themselves (zero-click answers are ~60% of all searches). For similar reasons, we can’t use a giant group of keywords from a commercial dataset either (because of selection bias, automated searches skewing volume, and the blind spot of new and long-tail search terms).

What’s needed is a panel of thousands of devices, mobile and desktop, that are confirmed to be owned and used by real people (no bots) and aggregation of every Google search performed on those devices across many months. That’s the magic Datos provides. They’re a source of clickstream information, and thus are able to “see” the queries people make in Google and the URLs they visit before/after, even if the browser never leaves Google’s website.

There are some limitations, of course:

  • Searches in the Google App are not counted in this methodology (those in browsers, whether mobile or desktop, are included)
  • For this study, we did not include searches that started anywhere except Google web search (i.e. not Google Shopping, News, Hotels, Maps, Images, Videos, etc.). We hope to look at these in future studies together.
  • So-called “negative-click searches” where Google displays answers instantly in the search bar before a query is completed aren’t counted, either.
  • Searches performed in Google’s Gemini AI tool aren’t included—only Google web search

In order to analyze and classify more than 320,000 query terms, I enlisted the help of LLMs. Specifically, OpenAI’s GPT-4o mini API, which proved to be accurate, fast, and relatively affordable. My process was:

  • Hand-classify 1,000 searches on the various vectors: branded vs. generic, navigational/informational/commercial/transactional, and topic classification (into 24 categories of human interest)
  • Using the GPTforSheets plugin and a budget of ~$100, run each of the major services (Claude, ChatGPT, Gemini, and Mistral) and the various models they offer across the 1,000 hand-classified keywords.
  • After much banging of head against desk, attempting to find a high-quality model and prompt combination, I recruited the help of an AI expert, DataSci101’s Britney Muller (BTW – she’s offering an Actionable AI for Marketers Course that I can’t recommend enough).
  • Britney helped me unlock the key here: more descriptive classification categories > more descriptive prompts. For example, by changing the search intent classification prompt from “Navigational” to “Navigating to a particular website,” I was able to get far greater accuracy. I’ll provide all the prompts used in a future “how to classify search query terms” post.
  • Compare the accuracy of the LLM classification to my hand-classification.
  • The Results: after a great deal of trial & error, GPT-4o mini consistently achieved 96% accuracy against my hand-classification! Total budget ~$500 (including the experimentation process)
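For readers who want to run the same kind of check, here is a minimal sketch of the accuracy comparison described in the steps above: the share of query terms where a model's label matches the hand label. The function name, the case-insensitive matching, and the four sample rows are illustrative assumptions, not the prompts or data used in the study.

```python
def classification_accuracy(hand_labels, model_labels):
    """Share of query terms where the model's label matches the hand label."""
    if len(hand_labels) != len(model_labels):
        raise ValueError("label lists must be the same length")
    if not hand_labels:
        raise ValueError("need at least one labeled query term")
    # Compare case-insensitively so "Navigational" and "navigational" match
    # (an assumption; a stricter check would compare labels exactly).
    matches = sum(
        hand.strip().lower() == model.strip().lower()
        for hand, model in zip(hand_labels, model_labels)
    )
    return matches / len(hand_labels)


# Hypothetical mini-sample; the study compared 1,000 hand-classified terms.
hand = ["Navigational", "Informational", "Commercial", "Informational"]
model = ["navigational", "Informational", "Informational", "Informational"]
accuracy = classification_accuracy(hand, model)
print(f"{accuracy:.0%}")
```

Running each candidate model and prompt through the same function makes the combinations directly comparable on one number.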

From there, I ran all 320K+ keyword rows through each of the 4 prompting classifications, hand-checked a few hundred to be sure accuracy remained high, and produced the graphs, charts, and data you’ll find below.

Distribution of Google’s Search Demand

Simply looking at the quantity of searches for each query term (what search marketers call “keywords”) is fascinating. Across the dataset, our ~130K device panel made 331,697,810 searches for 320,775 unique query terms. That’s ~121 Google searches/searcher/month (but don’t forget, that’s not inclusive of queries with <100 total searches over the 21 months). If we were to include that long tail, it would be closer to ~200 Google searches per searcher per month, which aligns perfectly with our previous research on AI tools vs. Google search.
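The per-searcher figure above is simple division, sketched below. The one assumption is the panel size: the study says "~130K" devices, so 130,000 is used here as a round stand-in for the exact count.

```python
total_searches = 331_697_810  # total searches reported in the study
panel_devices = 130_000       # assumption: "~130K" treated as exactly 130,000
months = 21                   # January 2023 through September 2024

# Average searches per device per month, for query terms with 100+ searches.
per_device_per_month = total_searches / panel_devices / months
print(round(per_device_per_month, 1))
```

Because the panel size is approximate, the result should be read as "about 121," not as a precise figure.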

Datos also did us a big favor by excluding multiple searches from the same device in the same 24-hour period from the dataset. So, if a searcher in Datos’ panel Googled “mole verde oaxaqueño” ten times in a day (which I may have done this past weekend while shopping at three different grocery stores for the hard-to-acquire ingredients), it would only count as a single search.
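To make that de-duplication rule concrete, here is a rough sketch. Datos's actual pipeline isn't described in this post, so two things are assumptions: "same 24-hour period" is read as the same calendar day, and queries are matched case-insensitively. The device IDs and timestamps are invented.

```python
from datetime import datetime


def dedupe_daily(events):
    """Keep one search per (device, query, calendar day).

    `events` is an iterable of (device_id, query, timestamp) tuples.
    """
    seen = set()
    kept = []
    for device_id, query, timestamp in events:
        # Normalize the query so casing and stray spaces don't create "new" searches.
        key = (device_id, query.strip().lower(), timestamp.date())
        if key not in seen:
            seen.add(key)
            kept.append((device_id, query, timestamp))
    return kept


# Hypothetical events for two devices.
events = [
    ("device-1", "mole verde oaxaqueño", datetime(2024, 9, 14, 9, 0)),
    ("device-1", "mole verde oaxaqueño", datetime(2024, 9, 14, 13, 30)),
    ("device-1", "Mole Verde Oaxaqueño", datetime(2024, 9, 14, 17, 45)),
    ("device-1", "mole verde oaxaqueño", datetime(2024, 9, 15, 10, 0)),
    ("device-2", "mole verde oaxaqueño", datetime(2024, 9, 14, 9, 5)),
]
deduped = dedupe_daily(events)
print(len(deduped))
```

In this sample, the three same-day searches from one device collapse into one, while the next-day search and the other device's search are kept.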

There are a lot of ways to visualize distributions like this, but I found that this one shows off just how big the “fat head” of Google’s demand curve is.

I wish I could compare this to 10 or 20 years ago, because I’d put a lot of money on that head being much bigger than it ever has been in the past. Sadly, more of the web’s traffic is going to fewer and fewer sites over time, and Google is responsible for a massive portion of all navigational traffic.

Here’s what happens if we put all 332 million searches into buckets of 10,000 query terms:

The top 10K query terms are 46% of all search demand, and a shocking 148 of those make up almost 15% of total volume (for queries w/ at least 100 searches) over the 21-month study period. Query terms like:

  • YouTube
  • Gmail
  • Amazon
  • Facebook
  • ChatGPT
  • Google Translate
  • WhatsApp Web
  • Google Maps
  • Pornhub (I know, I know, you’re shocked and appalled – more on adult content searches below)
  • Google Docs
  • Instagram
  • Weather
  • Netflix
  • Speed Test
  • Calculator

These kinds of primarily-navigational queries are what many of us search for dozens of times a month.
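A "top N terms are X% of demand" figure like the ones above can be computed by sorting terms by volume and summing the head. The sketch below uses made-up counts for five terms; the function name and the numbers are illustrative, not taken from the dataset.

```python
def top_k_share(volumes, k):
    """Fraction of total search volume captured by the k highest-volume terms."""
    ranked = sorted(volumes, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("total volume is zero")
    return sum(ranked[:k]) / total


# Hypothetical search counts per query term.
term_volumes = {
    "youtube": 500,
    "gmail": 300,
    "weather": 100,
    "how to fix a leaky tap": 60,
    "stardew valley fish guide": 40,
}
share = top_k_share(term_volumes.values(), 2)
print(f"{share:.0%}")
```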

“What about the long tail?” you ask… Good news. For the single month of September 2024, Datos provided me with a full list of every search performed by the panel (I conducted my analysis and promptly deleted it, so don’t ask!). That dataset is equally fascinating:

In total, those 612,981 single-search keywords were almost double the volume of all searches with 2-10 queries, yet made up a paltry 2.2% of total search volume.

The Long Tail is almost unimaginably long. Every searcher in the panel performed, on average, 5 searches that no other person performed even once (alternative theory: there are a handful of *really weird* searchers who skew the query distribution). Here’s a different look at September’s data in query-volume-segmented buckets that highlights it:

Out of 1.05M query terms employed by our ~130K device panel, more than half (59%) had only a single search.
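The gap between "share of query terms" and "share of searches" is easier to see with a small worked example. This sketch groups terms into volume buckets and reports both shares for each one. The bucket edges and the ten sample counts are assumptions chosen for illustration, not the study's data.

```python
from collections import defaultdict

# Hypothetical bucket edges (inclusive); None means "no upper limit".
BUCKETS = [
    ("1 search", 1, 1),
    ("2-10 searches", 2, 10),
    ("11+ searches", 11, None),
]


def bucket_for(volume):
    """Return the label of the bucket a term's search count falls into."""
    for label, low, high in BUCKETS:
        if volume >= low and (high is None or volume <= high):
            return label
    raise ValueError(f"no bucket for volume {volume}")


def bucket_shares(volumes):
    """Map each bucket to (share of unique terms, share of total searches)."""
    volumes = list(volumes)
    if not volumes:
        raise ValueError("need at least one query term")
    term_counts = defaultdict(int)
    search_counts = defaultdict(int)
    for volume in volumes:
        label = bucket_for(volume)
        term_counts[label] += 1
        search_counts[label] += volume
    total_terms = len(volumes)
    total_searches = sum(volumes)
    return {
        label: (term_counts[label] / total_terms,
                search_counts[label] / total_searches)
        for label, _, _ in BUCKETS
    }


# Made-up monthly search counts for ten query terms.
sample = [1] * 6 + [2, 2] + [40, 50]
shares = bucket_shares(sample)
for label, (term_share, search_share) in shares.items():
    print(f"{label}: {term_share:.0%} of terms, {search_share:.0%} of searches")
```

In this toy sample, single-search terms are most of the vocabulary but a small slice of the searches, which is the same shape the September data shows at a much larger scale.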

Google has a statistic they’ve long promoted: “15% of searches we see every day are new.”

I believe them, but I don’t think that’s purely the result of the long tail phenomenon. Seeing this data, I’m now convinced that it’s because artists put out new musical albums that get millions of searches the day they’re announced, new video games spur the same query patterns, new natural disasters strike, new previously-unimaginable political appointments are made, and 500,000 other events occur every day that no one would have searched for before because no one had a reason to perform that search.

My opinion changed the moment I rendered the graph above. If 130K searchers’ most unusual, searched-only-once-in-the-month query terms make up just 2.2% of total volume, that 15% of unique queries Google talks about must be coming from a different part of the demand curve.

Pretty cool how data can change your understanding of a phenomenon, right?

Branded vs. Generic Searches

The first big question I needed an LLM’s help to solve was around brands. Is Google mostly a destination where people seek out websites, brands, companies, people, and products they already know? Or is the majority of search unbranded, generic queries from brand-agnostic information seekers?

Before I show you the data, I’ll tell you what my best guess was, and you can make your own. I estimated that approximately 2/3rds of Google searches would be for brands, and 1/3rd for generic query terms.

I was dead wrong.

Google’s searchers query far fewer branded terms than I expected (though probably much more than many search marketers assume). Just over 44% of Google searches are for branded terms.

Interestingly, branded terms make up a considerably smaller share of unique keywords:

But, because branded terms tend to attract more searches per keyword than unbranded ones, the balance by search volume gets closer to 50/50.
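The two ways of counting (by unique keyword vs. by search volume) can be sketched as follows. The labels, queries, and counts are invented for illustration; only the idea of weighting each keyword by its search count comes from the text above.

```python
from collections import defaultdict


def shares_by_label(rows):
    """Return {label: (share of unique keywords, share of search volume)}.

    `rows` is an iterable of (query, label, searches) tuples.
    """
    rows = list(rows)
    if not rows:
        raise ValueError("need at least one row")
    keyword_counts = defaultdict(int)
    volume = defaultdict(int)
    for _query, label, searches in rows:
        keyword_counts[label] += 1   # each keyword counts once
        volume[label] += searches    # each keyword counts once per search
    total_keywords = len(rows)
    total_volume = sum(volume.values())
    return {
        label: (keyword_counts[label] / total_keywords,
                volume[label] / total_volume)
        for label in keyword_counts
    }


# Hypothetical classified keywords with made-up search counts.
rows = [
    ("youtube", "branded", 300),
    ("gmail", "branded", 150),
    ("weather", "generic", 250),
    ("calculator", "generic", 200),
    ("speed test", "generic", 100),
]
result = shares_by_label(rows)
keyword_share, volume_share = result["branded"]
print(f"branded: {keyword_share:.0%} of keywords, {volume_share:.0%} of searches")
```

The same function works for any classification, including the intent and topic breakdowns, since it only needs a label and a search count per keyword.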

In a future post, I hope to break this down in a few industries and product sectors, but roughly speaking I can say that in many commercial, especially B2C sectors, brand outweighs generic by even more than the 2/3rds ratio I estimated. It’s just that a lot of Google searches are informational rather than navigational, commercial, or transactional.

And speaking of search intent types…

Distribution of Search Intent

Seven years ago, I left the world of SEO to start SparkToro in the audience research sector. But before that, I spent 17 years, the entirety of my adult, professional life, living and breathing search. And yet, somehow, I never answered possibly the most important, broad question any outsider would ask about how Google is used:

What are people looking for when they search the web?

Cue head-smacking moment. That this research evaded me (and the entire sector I worked in) for so long is an embarrassment. It would be like working in real estate data and never looking into the distribution of single-family vs. multi-family vs. commercial vs. industrial demand… Bonkers.

But, at last, we’ve got an answer:

I find this absolutely fascinating and powerfully revealing:

  • Just over half of all Google searches are informational in nature—they’re people seeking information about a topic of interest or need.
  • A third of Google searches are navigational—these folks already know where they want to go and are simply using Google to get them there more accurately, efficiently, or safely than typing in a web address manually.
  • A further 14.5% of searches are of a commercial nature, though they may not necessarily be looking to immediately transact. Most product comparison or product information searches (B2B and B2C) fall into this category.
  • And finally, transactional keywords that indicate likely intent to purchase something, sign up for something, or engage an entity for services comprise just 0.69% of all searches (though these are likely among the most desirable and valuable query terms among search marketers)

Once again, the volume of keywords and queries differs dramatically:

Navigational searches are (unsurprisingly) far higher in query volume than in total number of unique query terms.

Topical Classification of Google Search Demand

No lies, I LOVED this portion of the analysis. It blew many of my previous assumptions about the makeup of Google’s search demand to smithereens, and I love when data does that.

Previously, I’d have thought searches in sectors like News & Media, Food & Drink, and Adult Content (c’mon, it’s the Internet) would be at the top of a chart like this. And instead we find…

Actors, movies, television shows, musical artists, and video games are so much bigger than I ever would have imagined. No longer will I wonder why Google puts so much energy into creating portal-like experiences for these queries. Fully 25% of Google is just these few, entertainment-focused topics!

If you’re a website that monetizes by selling CPM ads (i.e., you get paid based on traffic, and care little about what kind), this breakdown likely doesn’t matter much. But, if you’re wondering how much of Google’s search volume is monetizable beyond low-paying display and retargeting ads, the answer is likely between 60-70%.

The rest of Google’s search ecosystem is in fields like Arts & Entertainment, Games, Science & Education, Adult Content, and Reference Materials, which are not un-monetizable by any means (Games alone is a multi-hundred-billion dollar industry), but are composed of query terms that are very difficult for all but a handful of publishers, websites, or businesses to get value from. It’s hard to sell Stardew Valley t-shirts to people searching for Stardew Valley, tough to convince even a die-hard Jack Black fan to see every TV cartoon he’s done a voice for, and next-to-impossible to make money from a search for “Taylor Swift Instagram,” unless you’re Taylor Swift.

Also, I don’t know about you, but I’m pleasantly surprised to see Adult Content searches (which include a healthy number of navigational queries to websites that I did not know contained such material. Note to Geraldine: your husband’s browser history may look a bit weird the last two months) all the way down at 3.6%. I’d heard stats years ago that upwards of 20% of Internet activity was in this category, so either y’all are moving on or you’re taking more precautions about how you navigate to these sites 🤐.

Somewhat less interesting is the breakdown of number of queries vs. overall search volume. Nonetheless, I’ve visualized it for those who might want to show it at a board meeting or in an industry presentation:

The most visible change is the delta with social media, which is 4.22% of search volume, but only 1.27% of unique keywords. That’s because so many people search for the networks navigationally, e.g. “Facebook,” “Tik Tok,” “Reddit,” etc.

Key Takeaways

This study was one of the most time-consuming, intensive, and educational experiences I’ve had with big data in my career, and while I expect many will draw their own conclusions, my big ones are:

#1: Google has become every brand’s home page

I don’t mean folks are setting Google as their browser homepage (though many are). I mean that fully one-third of Google’s usage is purely navigational, and your brand’s search results in Google are likely to be the first and biggest-impact impression people have about who you are and what you do. Reputation manage those SERPs carefully, friends.

#2: Google’s zero-click answers, including AI Overviews, have already taken over many of the largest categories of search demand

Arts & Entertainment, Games, Sports, Finance, and Reference queries, 5 of the 9 biggest categories of query demand, are dominated by Google’s own answers, often with media-portal-like customization of results (see #6 below). I don’t see Google stopping there (unless the DOJ case is successful). If you’re reliant on traffic from unbranded queries in a category Google can enter, I’d work hard and fast to develop other ways people can find your brand.

#3: A few thousand query terms make up a quarter of all Google searches, and that looks to be rising over time.

Yes, there’s still a massive number of unique query terms, but the “Long Tail” (<11 searches/month) is only 3.6% of demand. Monopolies are winning. Big brands and topics are winning. People are gravitating to a smaller number of less diverse destinations and ideas. Little wonder big companies and rich individuals are getting a larger and larger share of the economic and attention pie around the world.

#4: It is far better to be the brand your target customers search for than it is to do search marketing for the millions of query permutations in a space (especially since your biggest competitor is often Google themselves)

From 1997-2017, search was an excellent place to start marketing. Today, search is largely a reward for doing marketing right everywhere else.

That’s not to say any person or company should ignore Google’s potential as a channel, but I’d certainly urge marketers to put their eggs in many baskets, distribute their content and brand everywhere their audiences pay attention, and not rely on SEO or PPC to carry their marketing efforts.

#5: Google has become a place people go *after* they discover a need rather than a demand-creation or even demand-nudging platform

Scrolling through hundreds of thousands of searches will give you the distinct impression that search is what people do after they realize a need rather than a place they discover brands and services to investigate.

The classification of queries into topics and intent reinforces my theory (above) that buyer journeys have shifted to discovery happening in places like social media feeds, YouTube, podcasts, industry news sites, Google Discover and Apple News, email newsletters, and conferences/events.

#6: Paul Rudd is a popular dude

There’s also my wife Geraldine’s favorite bit of information from this study, which she insisted I share with you all: Paul Rudd appears in 0.00184% of all Google searches by our panel, and since this is a statistically significant sample size, we can likely say with confidence that for every 100,000 Google searches you perform, you’ll look up Paul Rudd (probably because you can’t believe he’s actually that young) almost twice (0.00184% × 100,000 ≈ 1.84).

Technically, though, Taylor Swift was the most searched-for person in the dataset 😉

P.S. Don’t forget to register here for even more fascinating data and discussions to be revealed in our SparkToro Office Hours webinar on Monday, December 9.