Even the most casual video aficionado knows YouTube (acquired by Google in 2006). As a Google search user, you may even feel like you encounter more YouTube videos than videos from other sources, but does the data back this up?
A Wall Street Journal article in June 2020 measured a strong advantage of YouTube in Google search results, but that article focused on 98 hand-selected videos to compare YouTube to other platforms.
Using a set of over two million Google.com (US) desktop searches captured in early October 2020, we were able to extract more than 250,000 results with video carousels on page one. Most organic video results in 2020 appear in a carousel, like this one:
This carousel appeared on a search for “How to be an investor” (Step 1: Find a bag of money). Notice the arrow on the far right: currently, searchers can scroll through up to ten videos. While our research tracked all ten positions, most of this report will focus on the three visible positions.
Anecdotally, we see YouTube pop up a lot in Google results, but how dominant is it across the three visible video carousel slots in our data set? Here’s a breakdown:
YouTube’s presence across the first three video slots was remarkably consistent, at (1) 94.1%, (2) 94.2% and (3) 94.2%. Khan Academy and Facebook took the #2 and #3 rankings for each carousel slot, with Facebook gaining share in later slots.
Obviously, this is a massive drop from the first to second largest share, and YouTube’s presence only varied from 94.1% to 95.1% across all ten slots. Across all visible videos in the carousel, here are the top ten sites in our data set:
Note that, due to technical limitations with how search spiders work, many Facebook and Twitter videos require a login and are unavailable to Google. That said, the #2 to #10 biggest players in the video carousel — including some massive brands with deep pockets for video content — add up to only 3.7% of visible videos.
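For readers who want to run a similar breakdown on their own data, here is a minimal sketch of the per-slot share calculation. It is not the code used for this study: the function names, the host-name clean-up rule, and the toy URLs are all made up for illustration, and it assumes each carousel has already been extracted as an ordered list of video URLs.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def site_of(url):
    """Return a simplified host name.

    Assumption: stripping 'www.' and 'm.' is enough to group a site's URLs.
    A real pipeline would likely need a more careful domain normalizer.
    """
    host = urlparse(url).netloc.lower()
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]
    return host


def share_by_slot(carousels, visible_slots=3):
    """Map slot number -> {site: percent of results in that slot}."""
    counts = defaultdict(Counter)
    for urls in carousels:
        # Only the first `visible_slots` videos are shown without scrolling.
        for slot, url in enumerate(urls[:visible_slots], start=1):
            counts[slot][site_of(url)] += 1

    shares = {}
    for slot, counter in counts.items():
        total = sum(counter.values())
        shares[slot] = {site: 100.0 * n / total for site, n in counter.items()}
    return shares


# Toy data only (hypothetical URLs, not results from the study).
sample = [
    ["https://www.youtube.com/watch?v=a", "https://www.youtube.com/watch?v=b",
     "https://www.khanacademy.org/x"],
    ["https://www.youtube.com/watch?v=c", "https://www.facebook.com/v/1",
     "https://www.youtube.com/watch?v=d"],
    ["https://m.youtube.com/watch?v=e", "https://www.youtube.com/watch?v=f"],
    ["https://www.youtube.com/watch?v=g", "https://www.youtube.com/watch?v=h",
     "https://www.youtube.com/watch?v=i"],
]
shares = share_by_slot(sample)
```

Note that each slot's percentages use that slot's own total as the denominator, so a carousel with fewer than three videos simply contributes nothing to the slots it lacks.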
Pardon my grammar, but “How to…?” questions have become a hot spot for video results, and naturally lend themselves to niche players like HGTV. Here’s a video carousel from a search for “how to organize a pantry”:
It looks promising on the surface, but does this niche show more diversity of websites at scale? Our data set included just over 45,000 “How to …” searches with video carousels. Here’s the breakdown of the top three sites for each slot:
In our data set, YouTube is even more dominant in the how-to niche, taking up 97-98% of each of the three visible slots. Khan Academy came in second, and Microsoft (specifically, the Microsoft support site) rounded out the third position (but at <1% in all three slots).
Most of this analysis was based on a snapshot of data in early October. Given that Google frequently makes changes and runs thousands of tests per year, could we have just picked a particularly unusual day? To answer that, we pulled YouTube’s prevalence across all videos in the carousel on the first day of each month of 2020:
YouTube’s dominance was fairly steady across 2020, ranging from 92.0% to 95.3% in our data set (and actually increasing a bit since January). Clearly, this is neither a temporary nor a particularly recent condition.
Another challenge in studying Google results, even with large data sets, is the possibility of sampling bias. There is no truly “random” sample of search results (more on that in Appendix A), but we’re lucky enough to have a second data set with a long history. While this data set is only 10,000 keywords, it was specifically designed to evenly represent the industry categories in Google Ads. On October 9, we were able to capture 2,390 video carousels from this data set. Here’s how they measured up:
The top three sites in each of the carousel slots were identical to the 2M-keyword data set, and YouTube’s dominance was even higher (up from 94% to 96%). We have every confidence that the prevalence of YouTube results measured in this study is not a fluke of a single day or a single data set.
Does YouTube have an unfair advantage? “Fair” is a difficult concept to quantify, so let’s explore Google’s perspective.
Google’s first argument would probably be that YouTube has the lion’s share of video results because it hosts the lion’s share of videos. Unfortunately, it’s hard to get reliable numbers across the entire world of video hosting, and especially for social platforms. YouTube is undoubtedly a massive player and likely hosts the majority of non-social, public videos in the United States, but 94% seems like a big share even for the lion.
The larger problem is that this dominance becomes self-perpetuating. Over the past few years, more major companies have hosted videos on YouTube and created YouTube channels because doing so makes it easier to get results in Google search than hosting on smaller platforms or on their own sites.
Google’s more technical argument is that the video search algorithm has no inherent preference for YouTube. As a search marketer, I’ve learned to view this argument narrowly. There’s probably not a line of code in the algorithm that says something like:
IF site = ‘YouTube’ THEN ranking = 1
Defined narrowly, I believe that Google is telling the truth. However, there’s no escaping the fact that Google and YouTube share a common backbone and many of the same internal organs, which provides advantages that may be insurmountable.
For example, Google’s video algorithm might reward speed. This makes sense — a slow-loading video is a bad customer experience and makes Google look bad. Naturally, Google’s direct ownership over YouTube means that their access to YouTube data is lightning fast. Realistically, how can a competitor, even with billions in investment, produce an experience that’s faster than a direct pipeline to Google? Likewise, YouTube’s data structure is naturally going to be optimized for Google to easily process and digest, relying on inside knowledge that might not be equally available to all players.
For now, from a marketing perspective, we’re left with little choice but to cover our bases and take the advantage YouTube seems to offer. There’s no reason we should expect YouTube’s numbers to decrease, and every reason to expect YouTube’s dominance to grow, at least without a paradigm-shifting disruption to the industry.
Many thanks to Eric H. and Michael G. on our Vancouver team for sharing their knowledge about the data set and how to interpret it, and to Eric and Rob L. for trusting me with Athena access to a treasure trove of data.
The bulk of the data for this study was collected in early October 2020 from a set of just over two million Google.com, US-based, desktop search results. After minor de-duplication and clean-up, this data set yielded 258K searches with video carousels on page one. These carousels accounted for 2.1 million total video results/URLs and 767K visible results (Google displays up to three per carousel, without scrolling).
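As a quick sanity check, the rounded figures above can be compared against each other. The sketch below uses only the numbers stated in this paragraph (258K carousels, 767K visible results, 2.1 million total results); the variable names are my own, and because the inputs are rounded, the outputs are approximations rather than exact study values.

```python
# Rounded figures taken from the paragraph above.
carousels = 258_000          # searches with a video carousel on page one
visible_results = 767_000    # results shown without scrolling
total_results = 2_100_000    # all video results/URLs across carousels

# Google displays up to three videos per carousel without scrolling,
# so the visible count can never exceed three per carousel.
max_visible = carousels * 3

# Average videos per carousel, visible and overall (approximate).
avg_visible = visible_results / carousels
avg_total = total_results / carousels

print(f"Upper bound on visible results: {max_visible:,}")
print(f"Average visible videos per carousel: {avg_visible:.2f}")
print(f"Average total videos per carousel: {avg_total:.2f}")
```

The visible count falls slightly below the three-per-carousel ceiling, which is what we would expect if a small number of carousels contained fewer than three videos.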
The how-to analysis was based on a smaller data set of 45K keywords that explicitly began with the words “how to”. Neither data set is a randomly selected sample, and both may be biased toward certain industries or verticals.
The follow-up 10K data set was constructed specifically as a research data set and is evenly distributed across 20 major industry categories in Google Ads. This data set was specifically designed to represent a wide range of competitive terms.
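To make “evenly distributed” concrete, here is a rough sketch of drawing an equal number of keywords from each category: 10,000 keywords across 20 categories works out to 500 per category. This is an illustration only. How keywords were actually chosen within each category is not described here, so the random draw, the function name, and the placeholder keyword pool are all assumptions.

```python
import random


def even_sample(keywords_by_category, total, seed=0):
    """Draw an equal number of keywords from every category.

    Assumption: keywords are picked at random within each category.
    The actual selection method for the 10K data set may differ.
    """
    per_category = total // len(keywords_by_category)
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    picked = {}
    for category, keywords in keywords_by_category.items():
        if len(keywords) < per_category:
            raise ValueError(f"Not enough keywords in {category!r}")
        picked[category] = rng.sample(keywords, per_category)
    return picked


# Placeholder pool: 20 hypothetical categories with 1,000 keywords each.
pool = {
    f"category_{i}": [f"keyword_{i}_{j}" for j in range(1000)]
    for i in range(20)
}
picked = even_sample(pool, total=10_000)
print({category: len(words) for category, words in picked.items()})
```

An even split like this trades realism for coverage: no single industry can dominate the results, but the mix no longer reflects how often each category is actually searched.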
Why don’t we use true random sampling? Outside of the textbook, a truly random sample is rarely achieved, though it is theoretically possible. Selecting a random sample of adults in the United States, for example, is incredibly difficult (as soon as you pick up the phone or send out an email, you’ve introduced bias), but at least we know that, at any particular moment, the population of adults in the United States is a finite set of individual people.
The same isn’t true of Google searches. Searches are not a finite set, but a cloud of words being conjured out of the void by searchers every millisecond. According to Google themselves: “There are trillions of searches on Google every year. In fact, 15 percent of searches we see every day are new.” The population of searches is not only in the trillions, but changing every minute.
Ultimately, we rely on large data sets where possible, try to understand the flaws in any given data set, and replicate our work across multiple data sets. This study was replicated against two very different data sets, as well as a third set created by a thematic slice of the first set, and validated against multiple dates in 2020.