The Traffic Prediction Accuracy of 12 Metrics from Compete, Alexa, SimilarWeb, & More

Services that claim to predict the traffic websites receive have been around for 15+ years, but I’ve long been skeptical of their accuracy (having seen how poorly some have predicted traffic on sites whose analytics I accessed). In 2012, I ran a project to test the veracity of the numbers reported by some of these tools, and came away very unimpressed. Mark Collier (of the Open Algorithm Project) expanded on these tests in 2013 and discovered similarly disappointing results. Historically, it would seem, services like Alexa, Compete, Quantcast, and others simply haven’t done a consistently good job of estimating a website’s traffic.

What about now? Some newer services, like SimilarWeb and SEMRush, have been lauded by practitioners in the marketing community for having much better data. Compete, supposedly, has improved its service, too. I figure it’s time to re-run the numbers and see how traffic prediction scores perform in 2015.

[Image: imec-traffic-chart – a snippet of the traffic data spreadsheet]

A few months back, I asked for volunteers through the IMEC Labs project to share their web traffic data in order to see how accurate the predictive traffic scores from various vendors stacked up. I received 4 months of traffic data (unique visits, Dec 2014-March 2015) from 143 websites, ranging from a few hundred visits per month all the way up to 25 million+ monthly visits, then added columns with data from each of the services and metrics listed below. A snippet of the data is in the spreadsheet above.

Unfortunately, I can’t share the sites themselves, as we promised privacy to the survey participants. However, I’ve got some fascinating results from the aggregate data. Let’s look at the metrics we measured, then we’ll get into my thoughts and opinions on each:

12 Metrics for Predicting Web Traffic (in order of data quality/accuracy*)

  1. SimilarWeb – monthly visits
    • Avg. of Metric/Actual Traffic: 406.37%
    • % of Data within 70-130% of Actual Traffic: 22.00%
    • Spearman’s Correlation w/ Actual Traffic: 0.827
    • Standard Error: 0.0504
    • Data coverage: 87.41%
  2. Compete.com – total monthly visits
    • Avg. of Metric/Actual Traffic: 115.52%
    • % of Data within 70-130% of Actual Traffic: 8.52%
    • Spearman’s Correlation w/ Actual Traffic: 0.843
    • Standard Error: 0.0837
    • Data coverage: 31.47%
  3. SEMRush – monthly search visits
    • Avg. of Metric/Actual Traffic: 28.84%
    • % of Data within 70-130% of Actual Traffic: 1.21%
    • Spearman’s Correlation w/ Actual Traffic: 0.696
    • Standard Error: 0.0668
    • Data coverage: 88.11%
  4. Quantcast – monthly visits
    • Avg. of Metric/Actual Traffic: 53.41%
    • % of Data within 70-130% of Actual Traffic: 3.75%
    • Spearman’s Correlation w/ Actual Traffic: 0.906
    • Standard Error: 0.097
    • Data coverage: 13.99%
  5. SimilarWeb – global rank
    • Spearman’s Correlation w/ Actual Traffic: 0.839
    • Standard Error: 0.0435
    • Data coverage: 97.2%
  6. Alexa – Alexa Rank
    • Spearman’s Correlation w/ Actual Traffic: 0.702
    • Standard Error: 0.0607
    • Data coverage: 100%
  7. Moz – Domain Authority
    • Spearman’s Correlation w/ Actual Traffic: 0.702
    • Standard Error: 0.0601
    • Data coverage: 100%
  8. Facebook – Likes
    • Spearman’s Correlation w/ Actual Traffic: 0.677
    • Standard Error: 0.0667
    • Data coverage: 83.22%
  9. Google AdWords – # of monthly searches for brand name
    • Spearman’s Correlation w/ Actual Traffic: 0.673
    • Standard Error: 0.068
    • Data coverage: 84.62%
  10. Moz – Linking Root Domains
    • Spearman’s Correlation w/ Actual Traffic: 0.594
    • Standard Error: 0.0687
    • Data coverage: 100%
  11. Twitter – Followers
    • Spearman’s Correlation w/ Actual Traffic: 0.529
    • Standard Error: 0.0777
    • Data coverage: 85.31%
  12. SEMRush – Ads
    • Spearman’s Correlation w/ Actual Traffic: 0.355
    • Standard Error: 0.1561
    • Data coverage: 88.81%

* Since the sample sizes for the data are small, and we’re dealing with several unique metrics and ways of measuring accuracy, this ordering is my own judgment call, based on my beliefs about the relative value of the accuracy, coverage, and distribution of the metrics collected.


The distinctions between these metrics are important to understand. Compete’s, Quantcast’s, and SimilarWeb’s monthly visits, along with SEMRush’s monthly search visits, can all be directly compared against actual traffic, which is why they’ve got the additional numerical measures. The rest can only be compared to the traffic numbers using correlations (e.g. for Moz’s Domain Authority or quantity of Twitter followers, there’s no way to calculate the % of data within 70-130% of actual traffic, because those metrics aren’t trying to estimate traffic at all).

So, what does the correlation coefficient (the one measure we can universally compare) mean in this context? It’s not as simple as the percentage of time Quantcast or Compete or Facebook Likes will be entirely accurate about a website’s traffic. Instead, the correlation tells us the degree to which, as actual traffic rises or falls, a corresponding rise or fall shows up in that metric’s numbers. For example, Compete might say your site’s traffic is 2X what your own analytics show, but if you lose half your traffic, Compete’s numbers are much more likely than your Twitter follower count to reflect a corresponding drop. Likewise, if you’re comparing site A vs. site B, Compete is more likely than Twitter followers (on average – individual results can and do vary) to correctly show which site’s traffic is higher.
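To make that concrete, here’s a minimal sketch of how a rank correlation like Spearman’s can be computed, assuming Python with pandas and scipy; the site figures and column names below are invented for illustration.

```python
# Minimal sketch: Spearman's correlation between actual traffic and a
# service's estimates. All figures below are hypothetical.
import pandas as pd
from scipy.stats import spearmanr

df = pd.DataFrame({
    "actual_visits":    [1_200, 45_000, 380_000, 2_100_000, 25_000_000],
    "service_estimate": [900, 80_000, 250_000, 4_000_000, 19_000_000],
})

# Spearman's rho compares ranks, so it measures whether sites with more
# actual traffic also tend to receive higher estimates -- not whether
# the estimates themselves are numerically close to reality.
rho, p_value = spearmanr(df["actual_visits"], df["service_estimate"])
print(f"Spearman's rho: {rho:.3f} (p = {p_value:.3f})")
```

Notice that these made-up estimates are often way off in absolute terms, yet the orderings match perfectly (rho = 1.0) – exactly the distinction between accuracy and correlation described above.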

Here’s a breakdown of the ways in which we looked at relative accuracy (a code sketch showing how each could be calculated follows the list):

  • Avg. of Metric/Actual Traffic – this number tells us, on average, where estimated traffic sits in comparison to actual traffic. By this measure, SimilarWeb looks wildly optimistic, reporting 400%+ of the actual traffic of the sites in our sample. Compete, on the other hand, looks much better at ~115%. But averages aren’t everything; in fact, while Compete was closer on average, that’s more because their highs and lows balanced out than because they were frequently close to reality.
  • % of Data within 70-130% of Actual Traffic – this tells us what percent of the estimated traffic numbers reported by a service were within 70-130% of the real numbers. This metric is where SimilarWeb performed best, getting a near-accurate take on web traffic for nearly 1/4 of the sites in our sample. The other services couldn’t come close: Compete at 8.5%, Quantcast at 3.75%, and SEMRush at 1.21% simply aren’t usable, IMO. While SimilarWeb was often way over on its traffic estimates, its strength in landing in this range so much more often than its competitors makes it my top pick for now.
  • Spearman’s Correlation with Actual Traffic – Spearman’s tells us how reliably a rise or fall in actual traffic is matched by a corresponding rise or fall in the metric. Quantcast, Compete, and SimilarWeb are all pretty good at this, with correlations above 0.8.
  • Standard Error – this is the standard error of the correlation estimates, which gives a sense of how much variance to expect in those numbers.
  • Data Coverage – coverage is critical because it shows how frequently the services we looked at actually had real data for the sites in the sample. Moz, Alexa, SimilarWeb, and SEMRush all had data on 85%+ of the sites, while Compete (~31%) and Quantcast (~14%) lagged far behind.
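Here’s a rough sketch of how those measures could be computed, again assuming Python with pandas, numpy, and scipy. Everything here is illustrative: the function and Series names are my own, and the bootstrap is just one plausible way to estimate the standard error (the exact method behind the numbers above isn’t specified).

```python
# Illustrative sketch of the four accuracy measures described above.
# `estimate` is assumed to be NaN wherever a service had no data.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def accuracy_report(actual: pd.Series, estimate: pd.Series) -> dict:
    covered = estimate.notna()          # sites the service actually has data for
    a, e = actual[covered], estimate[covered]
    ratio = e / a                       # estimate as a fraction of actual traffic
    rho, _ = spearmanr(a, e)

    # Bootstrap standard error for rho -- one plausible approach; the
    # post's exact method is an assumption here.
    rng = np.random.default_rng(0)
    idx = rng.integers(0, len(a), size=(1000, len(a)))
    a_arr, e_arr = a.to_numpy(), e.to_numpy()
    boot_rhos = [spearmanr(a_arr[i], e_arr[i])[0] for i in idx]

    return {
        "avg_metric_over_actual": ratio.mean(),   # e.g. 4.06 => 406%
        "pct_within_70_130": ((ratio >= 0.7) & (ratio <= 1.3)).mean(),
        "spearman_rho": rho,
        "standard_error": np.std(boot_rhos),
        "data_coverage": covered.mean(),          # share of sites with any data
    }
```

Running this once per service against the same `actual` Series would reproduce a table like the one above.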

Based on this data (which is far from perfect – a sample 5-10X this size would be far preferable, though the statistical variances suggest many of these numbers would likely hold), I wouldn’t feel confident using any of these numbers to predict actual traffic, but I would recommend using a combination of SimilarWeb, Compete, and (when available) Quantcast to get a rough sense of relative traffic between sites. For example, if you’re trying to determine how much traffic CompetitorX.com receives, SimilarWeb will give you a bit worse than a 1-in-4 shot of being within 70-130% of the real number. But if you want to know whether CompetitorX.com gets more or less traffic than CompetitorY.com and CompetitorZ.com, looking at Compete and SimilarWeb (and Quantcast, if available) can probably give you a good sense of the relative web audience sizes they reach.
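One hypothetical way to operationalize that kind of relative comparison (a sketch of the idea, not a method from the study itself): rank the competitors within each service, then average the ranks so that no single service’s inflated or deflated scale dominates. All site names and numbers below are made up.

```python
# Illustrative only: combining several services' estimates to rank
# competitors by relative traffic. Figures are hypothetical.
import pandas as pd

estimates = pd.DataFrame({
    "similarweb": [520_000, 310_000, 890_000],
    "compete":    [300_000, 180_000, 410_000],
    "quantcast":  [None,    150_000, 600_000],  # Quantcast often lacks data
}, index=["competitorx.com", "competitory.com", "competitorz.com"])

# Rank sites within each service (1 = most traffic), then average the
# ranks; missing values are simply skipped.
avg_rank = estimates.rank(ascending=False).mean(axis=1)
print(avg_rank.sort_values())  # lower average rank => likely more traffic
```

Averaging ranks rather than raw estimates is deliberate – given how far off the absolute numbers can be, only the orderings are worth trusting.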

Overall, I’m not hugely impressed with any of these services. They’ve all got a long way to go, and sadly haven’t improved as dramatically as I’d hoped in the 3 years since we first ran this comparison. In particular, I’m baffled by the number of otherwise savvy marketers, investors, and business people who continue to rely on Alexa data. Moz’s Domain Authority, which isn’t even trying to measure web traffic, has an identical correlation in this sample (0.702), and Alexa clearly hasn’t been keeping up with the competition.

One final note – including SEMRush isn’t entirely fair in this dataset, because it specifically tracks search visits, not all traffic, and thus we’re not measuring its data against the reality it’s trying to illustrate. However, since we were curious how SEMRush would stack up against metrics like Moz’s Domain Authority and the Facebook/Twitter numbers, it’s included here for those interested.