Oct 18, 2012

Google's (not provided) One Year Later

It's been a year since Google announced secure search, and while we've covered a few creative methods to partially account for (not provided) data, a sound solution still has not surfaced in the search marketing community. For background and details, please read Danny Sullivan's Google Put A Price On Privacy, a strong article on this topic.

RKG (not provided) share

Now that Google secure search is established, we've seen hidden queries (represented by (not provided) and other labels in various analytics platforms) continue to rise. Currently the segment makes up about 25% of all organic search traffic for RKG clients. There's no sign of the increase slowing, either. Major factors propelling the rise in (not provided) include the following:

  • The increased adoption of Google Plus, now above 400 million registered users. Growing the social network is one of Google's foremost internal goals. Because new Google Plus accounts mean more signed-in users performing searches, this is probably the single largest factor in the rise of the (not provided) segment.
  • Firefox defaulting to secure search in the latest version of the browser.

It's a shame Google doesn't pass query data in the referrer when both Google and the destination site are on secure (SSL) servers, as the standard SSL-to-SSL referrer behavior would allow. That change would let sites choose to serve a secure version by default, keeping user data private while retaining valuable query data. Since Google isn't supporting this, we're left with very few ways to recover the lost referral data.
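To make the mechanics concrete, here is a simplified Python sketch of how an analytics package might assign a keyword to a Google organic visit. It assumes, for illustration only, that the query travels in the referrer's `q` parameter and that secure search sends that parameter empty; it also checks only google.com hosts. The function name `referral_keyword` is hypothetical, and real analytics platforms handle many more Google domains and edge cases.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs


def referral_keyword(referrer: str) -> Optional[str]:
    """Return the keyword for a visit referred by a Google search page.

    Simplifying assumptions (hypothetical, for illustration):
    - only google.com and its subdomains count as Google;
    - the query is carried in the "q" parameter;
    - secure search leaves "q" empty, which we label "(not provided)".
    Returns None for non-Google referrers.
    """
    parsed = urlparse(referrer)
    host = parsed.netloc.lower()
    if not (host == "google.com" or host.endswith(".google.com")):
        return None
    # parse_qs drops blank values by default, so "q=" behaves like a missing q.
    query = parse_qs(parsed.query).get("q", [""])[0].strip()
    return query if query else "(not provided)"


print(referral_keyword("http://www.google.com/search?q=running+shoes"))
print(referral_keyword("https://www.google.com/url?sa=t&rct=j&q=&esrc=s"))
```

The first call yields the decoded query ("running shoes"); the second, with an empty `q`, yields "(not provided)".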

Methods to Recapture Lost (not provided) Data

Landing pages for the (not provided) segment (Google Analytics)

The first and best option is to use the search query data in Google Webmaster Tools (GWT). The good news is that none of the referral data in GWT shows up as (not provided). The bad news is that it is only a sample of your entire query landscape: it is neither definitive nor comprehensive. A second downside is that this data rotates every month and the API does not expose it, so if you want to track trends you'll have to either export it manually each month or get more creative.
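If you do export GWT query data by hand each month, a small script can stitch the files into a trend table. The sketch below is a minimal Python example, assuming each export is a CSV with `Query` and `Clicks` columns (check these against your actual export headers) and that clicks are plain integers; the function name `build_query_trend` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def build_query_trend(monthly_exports):
    """Combine monthly GWT query exports into {query: {month: clicks}}.

    monthly_exports maps a month label (e.g. "2012-09") to the CSV text of
    that month's export. The "Query" and "Clicks" column names are assumed.
    """
    trend = defaultdict(dict)
    for month, csv_text in monthly_exports.items():
        for row in csv.DictReader(io.StringIO(csv_text)):
            # Strip thousands separators such as "1,200" before converting.
            clicks = int(row["Clicks"].replace(",", ""))
            trend[row["Query"]][month] = clicks
    return dict(trend)


# Hypothetical sample exports for two months.
exports = {
    "2012-08": "Query,Clicks\nrunning shoes,120\nshoe reviews,45\n",
    "2012-09": "Query,Clicks\nrunning shoes,150\ntrail shoes,30\n",
}
trend = build_query_trend(exports)
print(trend["running shoes"])
```

Keying the table by query first makes it easy to see which terms appear, disappear, or grow from one export to the next.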

Because of these limitations, resourceful marketers need to look for other techniques.

One of the most obvious methods to replace lost (not provided) queries is by leveraging scraped data. Please note that RKG does not condone the practice of scraping Google's SERPs. The practice is in violation of Google's Terms of Service. This information is provided only as a guide to those who use third-party scraped data, or are willing to violate Google's TOS by scraping.

The technique for reclaiming a portion of the lost data is described below. For this description we're using the test site ShoeDigest.com, so we have no problem sharing its data transparently:

Using SEMRush data for (not provided)

  1. In your analytics platform, segment organic search traffic and add the URL or landing page dimension. See all that (not provided) in there? At least you have the URLs.
  2. Go to your competitive insight tool of choice, for example SEMRush, and generate an organic search report for your domain. Export that into Excel (or CSV). Next, sort the data by landing page.
  3. Now, match the SEMRush landing page URLs against the (not provided) landing page URLs from your analytics. What you'll end up with is some of the recovered query data, grouped by landing page.
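The matching in step 3 is essentially a join on landing page. Here is a minimal Python sketch, assuming the SEMRush export has `Keyword` and `Url` columns and that you have a list of landing page paths with (not provided) visits from your analytics export; the function names and sample data are hypothetical. URLs are normalized to lowercase paths without trailing slashes so that full URLs and bare paths line up.

```python
import csv
import io
from collections import defaultdict
from urllib.parse import urlparse


def normalize_path(url):
    """Reduce a full URL or bare path to a lowercase path with no trailing slash."""
    path = urlparse(url).path if "://" in url else url
    path = path.split("?")[0]  # drop any query string on bare paths
    return path.rstrip("/").lower() or "/"


def recover_queries(semrush_csv, not_provided_pages):
    """Map each (not provided) landing page to the keywords SEMRush lists for it.

    semrush_csv: CSV text with assumed "Keyword" and "Url" columns.
    not_provided_pages: landing page paths that received (not provided) visits.
    Pages with no SEMRush match are left out of the result.
    """
    keywords_by_page = defaultdict(list)
    for row in csv.DictReader(io.StringIO(semrush_csv)):
        keywords_by_page[normalize_path(row["Url"])].append(row["Keyword"])

    recovered = {}
    for page in not_provided_pages:
        keywords = keywords_by_page.get(normalize_path(page))
        if keywords:
            recovered[page] = keywords
    return recovered


# Hypothetical sample data.
semrush = (
    "Keyword,Url\n"
    "best running shoes,http://www.shoedigest.com/running-shoes/\n"
    "shoe digest,http://www.shoedigest.com/\n"
    "hiking boots,http://www.shoedigest.com/hiking-boots/\n"
)
pages = ["/running-shoes/", "/", "/sandals/"]
print(recover_queries(semrush, pages))
```

As the limitations below explain, any page the tool doesn't cover (like "/sandals/" here) simply stays unrecovered.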

Note: you won't be able to get clicks, only the query data. To take this process a step further and estimate clicks, follow Ben Goodsell's advice outlined in his SearchEngineWatch column.

Limitations Using Third-party Competitive Tools

There are several limitations with this approach.

The recovered (not provided) queries

  • First, scraped data is never comprehensive. It can only be used directionally, no matter how "complete" the data is claimed to be. That's because a scraper is entirely reliant on the source it's attempting to scrape, so it can never be definitive.
  • Second, the data is only as fresh as the last time it was captured. Using any third-party data means depending on the provider's data collection schedule and timelines.
  • The final limitation, and probably the most glaring, is that anyone has access to the same data you do. If the only requirement to access scraped data is to sign up for an account, there is nothing stopping your competitors from leveraging the exact same data set you have access to.

Even with those limitations, if you're serious about reclaiming some lost query data, this is currently a viable option.

Secure Search and the Future of SEO

There is no doubt that the loss of referral data makes SEO harder. There is a flip side, however. Google is now harder to game with pre-Panda tactics like heavy internal and external anchor text and thin, overlapping keyword-themed pages, so the barrier to entry and the level of competition are both higher. As I've written before, the combination of Panda, Penguin, and secure search has created a flight to quality in SEO that has never been seen before. The best online marketers don't see limitations in this new era: they see opportunities.

The bar has risen, and it's up to the smartest and most diligent search marketers to understand this and execute strategies and tactics that not only work but are sustainable and, as much as possible, 'future-proof' against algorithms we haven't yet seen. The rise of (not provided) creates an economy in which demand for the best services and technologies will only increase.

Google's secure search also creates an opportunity for Bing, and even Blekko or DuckDuckGo, to differentiate themselves by offering all the data SEOs want. And for SEOs, it has never been more important to leverage referral data from Bing.

I look forward to your thoughts and comments! What are you doing to deal with this problem?

Update: I failed to mention that iOS 6 has switched to Google secure search. This is big and adds to the challenges ahead!


7 Responses to "Google's (not provided) One Year Later"
Hi Adam, I also read the article on searchengineland.com, and I think your solution gets much closer to what we all want. Of course it still takes some custom work, but not that much, and you get a lot more insight. I will keep searching for an even closer solution. This will help point everyone in the right direction.
Thanks for sharing additional ways to identify (not provided) data. We recently conducted a study and found that Google “not provided” now accounts for almost 40 percent of referring traffic data from organic search, an increase of 171 percent since it was introduced a year ago. Here is the link if you are interested in reading the full study: http://www.optify.net/press-releases/optify-study-googles-not-provided-rises-to-almost-40-of-organic-traffic-for-b2b-sites Jennifer
What about doing the following calculation? You have the overall number of Google organic search visits; call it OGSV. Then you have the number of visits per keyword; call it VPK. These days you will usually find that the top keyword (or one of the top) is “not provided”; call its visits NPKV. If you subtract the not provided visits from all Google organic visits (OGSV - NPKV), you get the number of provided visits; call it PKV. Now assume that, for a given keyword (one with more than a few visits), its share of visits is the same within the not provided data as within the provided data. Why would it not be? Then you can calculate each keyword's share by dividing its provided visits by the overall provided visits (PKV). The result is a fraction representing the percentage of visits the keyword had. To get a close estimate of the keyword's real number of visits, multiply that fraction by the total number of not provided visits (NPKV); this gives the number of visits the keyword had among the “not provided” keywords (it usually won’t be a whole number, since you're using ratios here). Add that number to the keyword's recorded provided visits to get a very close estimate of its real number of visits. The calculation has to be done per dataset for a chosen period of time, since the numbers are relative only to each other, not to the site itself.
For example, a site has 3000 Google organic visits in a given month, of which 1000 are “not provided”, meaning that 2000 are provided. The keyword “test” shows 50 visits from Google organic search in the analytics data. This means that 50/2000 = 0.025 of the provided visits came from this keyword. To calculate how many of the not provided visits came from this keyword, we multiply 0.025 by the number of not provided visits (1000): 0.025*1000 = 25. We add this number to the keyword's provided visits (50) and get the estimated number of visits after controlling for the not provided figure in that dataset, in this case 50 + 25 = 75. Does this method make sense to anybody? It does to me.
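The calculation described in this comment can be written as a short Python function. It is only as good as the comment's core assumption, that (not provided) visits are distributed across keywords in the same proportions as provided visits; the function name `estimate_keyword_visits` is hypothetical.

```python
def estimate_keyword_visits(total_organic, not_provided, keyword_visits):
    """Estimate a keyword's true visits, assuming hidden visits split in the
    same proportions as provided visits.

    total_organic:  all Google organic visits (OGSV)
    not_provided:   visits labeled (not provided) (NPKV)
    keyword_visits: provided visits recorded for the keyword
    """
    provided = total_organic - not_provided  # PKV = OGSV - NPKV
    if provided <= 0:
        raise ValueError("no provided visits to base the ratio on")
    # keyword share of provided visits, applied to the hidden visits
    hidden = keyword_visits * not_provided / provided
    return keyword_visits + hidden


# The comment's worked example: 3000 organic visits, 1000 (not provided),
# 50 provided visits for "test".
print(estimate_keyword_visits(3000, 1000, 50))  # 75.0
```

Multiplying before dividing keeps the example's arithmetic exact; the result is generally fractional, as the comment notes.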


Check out what others are saying...
[...] Google’s (not provided) One Year Later, Rimm Kaufman [...]
[...] users.  While this has made the job of an SEO more difficult, there are some workarounds to recapture the lost query data, and the (not provided) visits are still properly attributed to the organic [...]
[...] Adam Audette/RKG: Google’s (not provided) One Year Later [...]
