This is an update to our previous article on the SEO impact of losing search query data, published in late October.
Now that more time has passed and Google has rolled out secure search sessions (with suppressed query data) widely to its logged-in users, we can assess the damage more accurately. Here's where we stand today, along with some fresh solutions and ideas for countering the lost data.
First of all, don't panic! For those of you who are legitimately concerned, let me reassure you: this is not a severe problem, and we can work through it. Yes, it's far from ideal, and it does make our work harder, but it is not impossible to work around.
There have been frequent reports that the volume of (not provided) traffic is higher than initially reported. We're seeing the same upward trend, but there aren't any big shocks: Google's claim that only about 10% of total organic traffic would be affected is largely holding true, with some outliers.
Here's the data from our most recent analysis.
- Avg. (not provided) share of SEO traffic: 7.68% (previously reported at 0.76%)
- Avg. revenue position of (not provided) among SEO keywords: 1.85 (previously reported at 8.2)
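If you want to track these two figures on your own sites, a minimal Python sketch follows. It assumes the first metric is the share of organic visits reported as (not provided) and the second is the rank of the (not provided) row when a site's organic keywords are sorted by revenue, each averaged across sites. The row layout, the function names, and the sample numbers are hypothetical, not the data behind the figures above.

```python
from statistics import mean

NOT_PROVIDED = "(not provided)"


def not_provided_metrics(rows):
    """Return (share of visits in %, revenue rank) for the (not provided) row.

    rows: list of (keyword, visits, revenue) tuples from one site's organic
    keyword report (hypothetical layout). Rank is 1-based; None if absent.
    """
    total_visits = sum(visits for _, visits, _ in rows)
    np_visits = sum(visits for kw, visits, _ in rows if kw == NOT_PROVIDED)
    share_pct = 100.0 * np_visits / total_visits if total_visits else 0.0

    # Sort keywords by revenue (highest first) and find where (not provided) lands.
    by_revenue = sorted(rows, key=lambda row: row[2], reverse=True)
    rank = next(
        (i for i, row in enumerate(by_revenue, start=1) if row[0] == NOT_PROVIDED),
        None,
    )
    return share_pct, rank


def portfolio_averages(sites):
    """Average the per-site share and revenue rank across a set of sites."""
    results = [not_provided_metrics(rows) for rows in sites]
    avg_share = mean(share for share, _ in results)
    avg_rank = mean(rank for _, rank in results if rank is not None)
    return avg_share, avg_rank


# Hypothetical sample data for two sites.
site_a = [
    (NOT_PROVIDED, 80, 900.0),
    ("brand name", 500, 5000.0),
    ("blue widgets", 420, 300.0),
]
site_b = [
    (NOT_PROVIDED, 30, 1200.0),
    ("brand name", 270, 800.0),
]

print(portfolio_averages([site_a, site_b]))  # (9.0, 1.5)
```

A fractional average position such as 1.85 simply means (not provided) is the top revenue "keyword" on some sites and second or lower on others.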
What Are We Doing About It?
More important than the trend, which I don't think is that surprising, is the answer to this question: what are we going to do about it?
There have been some interesting discussions lately trying to answer that question. Avinash Kaushik threw out a few ideas for capturing (not provided) data in an excellent piece. One of the key takeaways from his approach is to make assumptions based on known branded and non-branded query sets and apply that distribution to the (not provided) segment. AJ Kohn proposed a similar (but slightly different) approach that involves clustering keywords into groups and making assumptions about (not provided) using those clusters.
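The branded/non-branded idea boils down to one ratio. Here is a minimal Python sketch, assuming brand queries can be identified by whole-word matches against a short list of brand terms; the `brand_split` helper, the "acme" brand, and the visit counts are hypothetical illustrations, not taken from Avinash's piece.

```python
import re


def brand_split(keyword_visits, brand_terms, not_provided_visits):
    """Estimate branded vs. non-branded visits hidden inside (not provided).

    keyword_visits: {query: organic visits} for queries that were passed.
    brand_terms: non-empty list of brand words (hypothetical example list).
    Returns (estimated branded visits, estimated non-branded visits).
    """
    terms = list(brand_terms)
    if not terms:
        raise ValueError("brand_terms must not be empty")
    # Whole-word, case-insensitive match on any of the brand terms.
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )

    total = sum(keyword_visits.values())
    branded = sum(v for q, v in keyword_visits.items() if pattern.search(q))

    # Apply the branded share of known queries to the (not provided) pool.
    est_branded = not_provided_visits * branded / total
    return est_branded, not_provided_visits - est_branded


known = {"acme shoes": 600, "acme outlet": 200, "running shoes": 1200}
print(brand_split(known, ["acme"], 500))  # (200.0, 300.0)
```

Branded queries make up 800 of 2,000 known visits (40%), so 40% of the 500 (not provided) visits are assumed to be branded. The estimate is only as good as the brand-term list and the assumption that logged-in searchers use brand terms at the same rate as everyone else.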
3 Solutions for Recapturing Lost SEO Queries
We've been mulling over some ideas here, too, and have come up with the following possible solutions. We're still researching the most effective "solution" for getting at this lost data (it's not really a solution so much as a "hint"). So far, here's what we've got:
- Landing pages for the (not provided) segment and for queries that are retained tend to line up. In other words, we don't see (not provided) traffic going predominantly to unique, or outlier, landing pages. No real surprise there. Because that is the case, though, making assumptions about the distribution of queries for those landing pages becomes easier. We can, for example, project the query distribution of our known landing page URLs onto the (not provided) landing page URLs to re-create the (admittedly pseudo) data. The catch is that this won't account for outlier terms, notably long-tail varieties, that are infrequent, hard to predict, and often occur only once. That's the cost.
- Paid search can be a fruitful area to explore. For campaigns that drive both SEO and PPC traffic to the same set of URLs, managers can run reports to pull these URLs and line them up together. A search query report then shows the actual queries users entered that triggered a PPC ad for each URL. These queries can be "back-filled" to re-create data for the lost (not provided) segment on the same URL. Granted, it's not apples to apples, but it's something.
- Probably the most elegant solution for (not provided) is to simply report on the segment by landing page URL. Look at quality metrics such as time on site, average pageviews, bounce rate, and conversion or revenue numbers, and compare them to known segments. Then, when called upon to do a deeper analysis, run a few of the techniques here to derive approximate pseudo-query data for the lost terms.
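The landing-page projection in the first bullet, and the PPC back-fill in the second, share the same mechanics: take a per-URL query distribution from a source you trust and spread that URL's (not provided) visits across it. A minimal Python sketch follows; the `project_not_provided` helper, the "(unattributed)" placeholder, and the sample URLs and counts are hypothetical.

```python
def project_not_provided(known_by_url, not_provided_by_url):
    """Spread each URL's (not provided) visits across that URL's known queries.

    known_by_url: {url: {query: visits}} built from organic visits that kept
        their query, or from a PPC search query report for the same URLs.
    not_provided_by_url: {url: (not provided) visits landing on that URL}.
    Returns {url: {query: estimated visits}}. Visits on URLs with no known
    queries are kept under a placeholder label rather than dropped.
    """
    estimates = {}
    for url, np_visits in not_provided_by_url.items():
        queries = known_by_url.get(url, {})
        total = sum(queries.values())
        if total == 0:
            estimates[url] = {"(unattributed)": np_visits}
            continue
        # Each known query gets a slice proportional to its share on this URL.
        estimates[url] = {q: np_visits * v / total for q, v in queries.items()}
    return estimates


# Hypothetical example data.
known_by_url = {
    "/shoes/": {"running shoes": 300, "trail running shoes": 100},
    "/about/": {"acme history": 50},
}
not_provided_by_url = {"/shoes/": 80, "/about/": 10, "/blog/new-post/": 5}

for url, queries in project_not_provided(known_by_url, not_provided_by_url).items():
    print(url, queries)
```

Passing organic queries as `known_by_url` gives the landing-page projection; passing PPC search query report data gives the back-fill. Either way, the output can only reshuffle queries already seen on each URL, which is exactly the long-tail blind spot described above.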
Some smart folks on our team pointed out a problem with the first approach. If you’re making assumptions about the lost (not provided) data by distributing the data you do have, aren’t you just recreating the same data? For example, if keyword traffic breaks down as A = 2,000, B = 1,000, C = 750, and D = 250, with (not provided) = 1,000, then distributing proportionally would simply yield A = 2,500, B = 1,250, C = 938, and D = 312. We end up with the same distribution, just with bigger numbers.
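The arithmetic behind that objection is easy to check. This minimal Python sketch uses the keyword labels and counts from the example; the `redistribute` helper is a hypothetical name for plain proportional allocation.

```python
import math


def redistribute(known, not_provided):
    """Proportionally spread (not provided) visits over known keywords (unrounded)."""
    total = sum(known.values())
    return {kw: v + not_provided * v / total for kw, v in known.items()}


known = {"A": 2000, "B": 1000, "C": 750, "D": 250}
estimated = redistribute(known, 1000)

# Python's round() rounds halves to even: 937.5 -> 938 and 312.5 -> 312.
print({kw: round(v) for kw, v in estimated.items()})
# {'A': 2500, 'B': 1250, 'C': 938, 'D': 312}

# Every keyword's share of the total is unchanged, so no new information is created.
old_total = sum(known.values())
new_total = sum(estimated.values())
assert all(math.isclose(known[kw] / old_total, estimated[kw] / new_total) for kw in known)
```

Each keyword is multiplied by the same factor, (4,000 + 1,000) / 4,000 = 1.25, so rankings and relative shares come out exactly as they went in.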
AJ Kohn has partially accounted for that problem. In his previously cited piece, he writes:
"If you want to take things a step further you can apply the distribution of the clustered keywords against the pool of (not provided) traffic. First you reduce the denominator by subtracting the (not provided) traffic from the total. In this instance that’s 208 – 88 which is 120.
"Even without any clustering you can take the first keyword (bounce rate vs. exit rate) and determine that it comprises 20% of the remaining traffic (24/120). You can then apply that 20% to the (not provided) traffic (88) and conclude that approximately 18 visits to (not provided) are comprised of that specific keyword."
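Kohn's calculation can be reproduced in a few lines. In this minimal Python sketch, the 208 total visits, 88 (not provided) visits, and the 24-visit keyword come from the quote; the `estimate_from_remaining` helper and the second keyword in the cluster ("what is bounce rate", 12 visits) are hypothetical, added to show that a cluster is handled by summing its members first.

```python
def estimate_from_remaining(total_visits, not_provided_visits, group_visits):
    """Apply a keyword's (or cluster's) share of known traffic to (not provided).

    Returns (share of the known traffic, estimated (not provided) visits).
    """
    known_visits = total_visits - not_provided_visits  # reduce the denominator
    share = group_visits / known_visits
    return share, share * not_provided_visits


# Figures from the quote: 208 total, 88 (not provided), 24 for one keyword.
share, visits = estimate_from_remaining(208, 88, 24)
print(f"{share:.0%} of known traffic -> about {round(visits)} (not provided) visits")
# 20% of known traffic -> about 18 (not provided) visits

# Hypothetical cluster: sum the member keywords first, then apply the same step.
bounce_rate_cluster = {"bounce rate vs. exit rate": 24, "what is bounce rate": 12}
share, visits = estimate_from_remaining(208, 88, sum(bounce_rate_cluster.values()))
print(f"{share:.0%} of known traffic -> about {round(visits)} (not provided) visits")
# 30% of known traffic -> about 26 (not provided) visits
```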
What About Google Webmaster Tools?
Finally, a word about Google Webmaster Tools. Others have discussed (and Google itself has stated) that Search Query reports run here can be a useful replacement for (not provided). It should be noted that this data is indeed an actual replacement for the lost data: Google passes query information for its logged-in users to AdWords and Webmaster Tools. That's the good news.
The bad news is that Search Query reports from GWT are famously unreliable. First, they are limited to a rolling 30-day window, so unless you're manually exporting the reports each month (no API is offered), you're only seeing a snapshot of the data. Second, the data itself is limited to the top 1,000 terms. Finally, and probably most importantly, it's not a replacement for analytics. As Matt Cutts has correctly stated,
"Please don't make the argument that the data in our webmaster console is equivalent to the data that websites can currently find in their server logs, because that's not the case."
It's nice to have some data available in GWT, but it's not a solution.
What about you? What are you doing about this problem?
UPDATE: Ben Goodsell of RKG has a recent piece on Search Engine Watch covering this topic, "5 Stages of Coping With Lost Search Query Data." Entertaining and spot on!