
3 Proposed Solutions for Regaining Lost SEO Query Data

This is an update to our previous article on the SEO Impact of Losing Search Query Data, published back in late October.

Now that more time has passed and Google’s rolled out secure sessions (with suppressed query data) widely for its logged-in users, we can assess the damage with more accuracy. Here’s where we are today, with some fresh solutions and ideas for countering the lost data.

Patience Grasshopper

First of all, don’t panic! Let me reassure those of you who are legitimately concerned about this: it’s not a severe problem, and we can work through it. Yes, it’s not at all ideal, and it does make our work harder. But it is not impossible to work around.

There have been frequent reports that (not provided) data is higher than previously reported, and while we’re seeing the same trend, there aren’t any big shocks. Google’s claim that only about 10% of total organic traffic would be impacted is largely holding true, with some outliers.

[Chart: Sadly, a familiar sight; (not provided) rises as a segment.]

Here’s the data from our most recent analysis.

  • Avg. share of organic (SEO) traffic reported as (not provided): 7.68% (previously reported at 0.76%)
  • Avg. position of (not provided) among revenue-driving organic keywords: 1.85 (previously reported at position 8.2)

What Are We Doing About It?

More important than the trend, which I don’t find that surprising, is the answer to this question: what are we going to do about it?

There have been some interesting discussions of late trying to answer that question. Avinash threw out a few ideas for capturing (not provided) data in an excellent piece. One of the key takeaways from his approach is to make assumptions based on known branded and non-branded query sets, and apply that distribution to the (not provided) segment. A similar (but slightly different) approach, from AJ Kohn, involves clustering keywords into groups and making assumptions about (not provided) using those clusters.

3 Solutions for Recapturing Lost SEO Queries

We’ve been mulling over some ideas here, too, and have come up with the following possible solutions. We’re still researching the most effective “solution” (it’s not really a solution at all, rather it’s a “hint”) to get at this lost data. So far, here’s what we’ve got:

  1. Landing pages for the (not provided) segment and for queries that are retained tend to line up. In other words, we don’t see (not provided) driving predominantly to unique, or outlier, landing pages. No real surprise there. Since that is the case, making assumptions about the distribution of queries for those landing pages becomes easier. We can, for example, project the query distribution of our known landing page URLs onto the (not provided) landing page URLs to re-create the (admittedly pseudo) data. The catch is that this won’t account for outlier terms, notably long-tail varieties, that are infrequent, hard to predict, and often occur only once. That’s the cost.
  2. Paid search can be a fruitful area to explore. For campaigns that are driving both SEO and PPC traffic to the same set of URLs, managers can run reports to pull these URLs and line them up together. Then, using a search query report, the actual query a user entered to fire a PPC ad can be found for a URL. Those queries can be “back-filled” to re-create data for the lost (not provided) segment on the same URL (a rough sketch follows this list). It’s certainly not apples to apples, but it’s something.
  3. Probably the most elegant solution for (not provided) is simply to report on the segment by landing page URL. Look at quality metrics such as time on site, average pageviews, bounce rate, and conversion or revenue numbers, and compare them to known segments. Then, when called upon to do a deeper analysis, run a few of the techniques here to derive pseudo (but approximate) query data for the lost terms.
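
To make the second idea a bit more concrete, here’s a rough sketch of the back-fill step. It assumes you’ve exported a paid search query report and an organic landing page report to CSV; the file names and columns (ppc_queries.csv, organic_landing.csv, url, query, clicks, not_provided_visits) are just placeholders, and spreading the hidden visits in proportion to PPC clicks is only one way you might do it.

    # Rough sketch: estimate the query mix behind (not provided) visits on a URL
    # from the paid search queries that drove PPC traffic to that same URL.
    # Assumed (hypothetical) exports:
    #   ppc_queries.csv     -> url, query, clicks
    #   organic_landing.csv -> url, not_provided_visits
    import csv
    from collections import defaultdict

    # Total PPC clicks per (url, query)
    ppc = defaultdict(lambda: defaultdict(int))
    with open("ppc_queries.csv") as f:
        for row in csv.DictReader(f):
            ppc[row["url"]][row["query"]] += int(row["clicks"])

    # Spread each URL's (not provided) visits across the queries that
    # drove PPC traffic to the same URL, proportionally to clicks.
    with open("organic_landing.csv") as f:
        for row in csv.DictReader(f):
            url, hidden = row["url"], int(row["not_provided_visits"])
            total_clicks = sum(ppc[url].values())
            if total_clicks == 0:
                continue  # no PPC overlap for this URL; nothing to infer
            for query, clicks in ppc[url].items():
                estimate = float(hidden) * clicks / total_clicks
                print("%s  %s  ~%.1f inferred visits" % (url, query, estimate))

It’s pseudo data, of course, but it at least ties the hidden visits back to real queries users typed for those pages.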

Some smart folks on our team pointed out a problem with the first approach. If you’re making assumptions about the lost (not provided) data by distributing the data you do have, aren’t you just recreating the same data? For example, if the keyword traffic breaks down as A = 2,000, B = 1,000, C = 750, and D = 250, with (not provided) = 1,000, then the redistributed numbers would simply end up as A = 2,500, B = 1,250, C = 938, and D = 312. We end up with the same relative distribution, just with bigger numbers.
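
That objection is easy to see in a few lines of code, using the hypothetical numbers above:

    # Naively spreading the (not provided) bucket across known keywords just
    # scales the existing distribution. Numbers from the example above.
    known = {"A": 2000, "B": 1000, "C": 750, "D": 250}
    not_provided = 1000

    total_known = sum(known.values())         # 4,000 visits with known queries
    adjusted = {}
    for kw, visits in known.items():
        share = float(visits) / total_known   # each keyword's share of known traffic
        adjusted[kw] = visits + share * not_provided

    print(adjusted)
    # {'A': 2500.0, 'B': 1250.0, 'C': 937.5, 'D': 312.5}
    # Same relative shares as before; the numbers are just bigger.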

AJ Kohn has partially accounted for that problem. In his previously cited piece, he writes:

“If you want to take things a step further you can apply the distribution of the clustered keywords against the pool of (not provided) traffic. First you reduce the denominator by subtracting the (not provided) traffic from the total. In this instance that’s 208 – 88 which is 120.

“Even without any clustering you can take the first keyword (bounce rate vs. exit rate) and determine that it comprises 20% of the remaining traffic (24/120). You can then apply that 20% to the (not provided) traffic (88) and conclude that approximately 18 visits to (not provided) are comprised of that specific keyword.”
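
His arithmetic drops neatly into a small helper; the figures below are the example numbers from his quote:

    def infer_hidden_visits(keyword_visits, total_visits, not_provided_visits):
        # Work out the keyword's share of the traffic where the query is still
        # reported, then apply that share to the (not provided) pool.
        reported = total_visits - not_provided_visits    # 208 - 88 = 120
        share = float(keyword_visits) / reported         # 24 / 120 = 0.20
        return share * not_provided_visits               # 0.20 * 88 = 17.6

    print(round(infer_hidden_visits(24, 208, 88)))       # ~18 visits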

What About Google Webmaster Console?

Finally, a word about Google Webmaster Tools. Others have discussed (and Google itself has stated) that the Search Query reports run here can be a useful substitute for (not provided). And it should be noted that this data does include the suppressed queries: Google passes query information for its logged-in users to AdWords and Webmaster Tools. That’s the good news.

[Screenshot: Google Webmaster Tools data, only a snapshot. Just a sample; GWT data is not a replacement for analytics.]

The unfortunate bad news is that Search Query reports from GWT are famously unreliable. First, they are limited to a rolling 30-day window, so unless you’re manually exporting the reports each month (no API is offered), you’re only seeing a snapshot of data. Second, the data itself is limited to the top 1,000 terms. Finally, and probably most importantly, it’s not a replacement for analytics. As Matt Cutts has correctly stated,

“Please don’t make the argument that the data in our webmaster console is equivalent to the data that websites can currently find in their server logs, because that’s not the case.”

It’s nice to have some data available in GWT, but it’s not a solution.
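
If you do want to lean on GWT anyway, one stopgap for the rolling 30-day window is to export the Search Queries report by hand each month and fold it into a running history. Here’s a rough sketch of that bookkeeping; the directory layout, file naming, and column headers are assumptions about how you might store the exports.

    # Stitch manually downloaded GWT Search Queries exports into one history.
    # Assumes files like gwt_exports/top_queries_2011-12.csv with (hypothetical)
    # column headers Query, Impressions, Clicks.
    import csv
    import glob

    history = []  # one record per (month, query)
    for path in sorted(glob.glob("gwt_exports/top_queries_*.csv")):
        month = path.rsplit("_", 1)[-1].replace(".csv", "")  # e.g. "2011-12"
        with open(path) as f:
            for row in csv.DictReader(f):
                history.append({
                    "month": month,
                    "query": row["Query"],
                    "impressions": row["Impressions"],
                    "clicks": row["Clicks"],
                })

    print("%d query rows accumulated across all exports" % len(history))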

What about you? What are you doing about this problem?

Special thanks go out to Jamey Barlow, Jody O’Donnell, Cara Pettersen, and the rest of the RKG SEO team for help writing this post!

UPDATE: Ben Goodsell of RKG has a recent piece on SearchEngineWatch covering this topic: 5 Stages of Coping With Lost Search Query Data. Entertaining and spot on!

 


  • Adam Audette
    Adam Audette is the Chief Knowledge Officer of RKG.
  • Comments
    14 Responses to “3 Proposed Solutions for Regaining Lost SEO Query Data”
    1. Hugo Guzman says:

      Good stuff here, Adam! Thanks for the tips. I particularly like the second suggestion. Haven’t tried that one yet.

    2. Michael says:

      The quote from Matt is more than a year old and a bit outdated. While the Webmaster Tools data is not complete, it’s better than it used to be, as reported in this post (cf. http://googlewebmastercentral.blogspot.com/2011/10/accessing-search-query-data-for-your.html ) on the Google Webmaster Central blog from October 2011.

    3. Adam Audette says:

      Thanks Hugo! Glad you liked #2. Let me know what you find.

    4. Anna says:

      I’ve set up a filter in GA that reports the page where the (not provided) traffic lands. Now instead of just (not provided) I can see something like “not provided – example.com/page-about-bmw”. This way you can at least see what kind of keywords drive traffic (given that the pages are optimized for certain keywords). Not ideal for homepage though, since both brand and non-brand searches land there. But still better than nothing!

      I am glad smart people like yourself are starting to come up with solutions to the (not provided) problem! :-) Thanks for sharing!

    5. Adam Audette says:

      Michael – I tend to like to beat up Webmaster Tools. :) I shouldn’t – it’s actually a great tool – and I agree getting better. Thanks for the heads up on that.

      Anna – I like it. Simplicity in action!

    6. Hi Adam

      Nice tips! I like the second tip most, and will try it right away. :)

    7. Ani Lopez says:

      More information can be exported from the WMT integration in Google Analytics than from WMT itself:
      two months of data instead of just one. You can then match it with Google Analytics keyword data in Excel to go further:
      http://dynamical.biz/blog/web-analytics/matching-webmastertools-analytics-keywords-data-40.html

    8. Jeff Bronson says:

      Thank you for this tip. I will need to apply it to a test Analytics account first and assess behavior. Just as I expected, this only applies to organic data, not paid.

    Trackbacks
    Check out what others are saying...
    1. [...] 3 Proposed Solutions for Regaining Lost SEO Query Data, Rimm Kaufman [...]

    2. [...] 3 Proposed Solutions for Regaining Lost SEO Query Data, Rimm Kaufman [...]

    3. [...] has some half-hearted suggestions for getting back recently-encrypted Google referrer [...]

    4. [...] 3 proposed solutions for regaining lost seo query data. RKG [...]

    5. [...] about the changes to Google referral data for Google Account users. But don’t fret, The Rimm-Kaufman Group and Econsultancy have some ideas how you can “steal” some of that data back.While [...]

    6. [...] been a year since Google announced secure search, and while we’ve covered a few creative methods to somewhat account for (not provided) data, a sound solution still has not [...]