THE RKGBLOG

Is the Vary: User-Agent HTTP Header Broken?

Mobile SEO practices got a little easier last summer when Google’s Pierre Far announced definitive guidelines for mobile SEO at SMX Advanced in Seattle. The outline was easy to follow, allowed for multiple scenarios, and explained the search engine’s blue-sky recommendation: responsive design.

Like most SEOs, we took this announcement, incorporated it into our best practices, and proceeded to implement it with clients.

One of the recommendations is to use the “Vary: User-Agent” HTTP header to alert search engines to the presence of mobile content. It falls under the recommendation for dynamically serving different HTML on the same URL.

According to those recommendations, the Vary HTTP header has two important and useful implications:

  1. It signals to caching servers used in ISPs and elsewhere that they should consider the user-agent when deciding whether to serve the page from cache or not. Without the Vary HTTP header, a cache may mistakenly serve mobile users the cache of the desktop HTML page, or vice versa.
  2. It helps Googlebot discover your mobile-optimized content faster, as a valid Vary HTTP header is one of the signals we [Google] may use to crawl URLs that serve mobile-optimized content.
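
As a rough illustration of what dynamic serving with this header looks like, here is a minimal Python sketch. The function names, the sample HTML, and the naive "mobile" substring check are all assumptions made for the example; they are not taken from Google's guidelines.

```python
from typing import Dict, Tuple

# Placeholder bodies standing in for the two versions of one URL.
MOBILE_HTML = "<html><body>Mobile layout</body></html>"
DESKTOP_HTML = "<html><body>Desktop layout</body></html>"


def is_mobile(user_agent: str) -> bool:
    # Hypothetical and deliberately naive; real sites use fuller device detection.
    return "mobile" in user_agent.lower()


def build_response(user_agent: str) -> Tuple[Dict[str, str], str]:
    """Return (headers, body) for a URL that serves different HTML per device."""
    body = MOBILE_HTML if is_mobile(user_agent) else DESKTOP_HTML
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        # Declares that the body depends on the User-Agent request header.
        "Vary": "User-Agent",
    }
    return headers, body
```

Both versions carry the same Vary header; only the body changes with the requesting device.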

Tech teams often ask us why they should implement the header, which is fairly easy to do, and the explanations above are usually well received. That was until one client implemented the header and saw a very large increase in the traffic and load on its web servers. As far as we knew, this was not supposed to happen, so we needed to investigate further.

It’s worth noting that the client uses Akamai as its CDN. This is not uncommon: we have many clients who use this geographically dispersed platform to off-load resources and decrease load times.

When our client talked with Akamai, they learned that the massive traffic increase to their website was due to the implemented Vary header.

When upstream providers (in this case, Akamai) can’t cache a response, they have to keep asking the origin web server for documents and assets. As a result, the CDN sent far more traffic directly to the client’s web servers.

This is explained further in Akamai’s documentation:

The HTTP Vary header is used by servers to indicate that the object being served will vary (the content will be different) based on some attribute of the incoming request, such as the requesting client’s specified user-agent or language. The Akamai servers cannot cache different versions of the content based on the values of the Vary header. As a result, objects received with a Vary header that contains any value(s) other than “Accept-Encoding” will not be cached. To do so might result in some users receiving the incorrect version of the content (wrong language, etc.)
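
The rule described in that passage can be sketched as a small Python check. This is our paraphrase of the quoted behavior, not Akamai code or configuration; the function name is made up for the example.

```python
from typing import Optional


def cdn_would_cache(vary_header: Optional[str]) -> bool:
    """Model the quoted rule: cache only if Vary is absent or names Accept-Encoding alone."""
    if vary_header is None:
        return True
    # Vary is a comma-separated list of header names, compared case-insensitively.
    fields = {name.strip().lower() for name in vary_header.split(",") if name.strip()}
    # Any field other than Accept-Encoding makes the object uncacheable under this rule.
    return fields <= {"accept-encoding"}
```

Under this model, "Vary: Accept-Encoding" is still cacheable, while "Vary: User-Agent" and "Vary: Accept-Encoding, User-Agent" are not.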

What exactly is the real issue? Why is this signal a problem?

We contacted webpagetest.org and received one possible theory from Patrick Meenan, a Staff Software Engineer at Google:

“Vary: User-Agent is broken for the Internet in general.  …the basic problem is that the user-agents vary so wildly that they are almost unique for every individual (not quite that bad but IE made it a mess by including the version numbers of .Net that are installed on users machines as part of the string). If you Vary on User-Agent then intermediate caches will pretty much end up never caching resources (like Akamai).”

Meenan’s explanation complicates matters for clients that use CDNs and follow Google’s mobile recommendation. Because user-agent strings vary so much, responses sent with this header effectively stop being cached at all. IE exacerbates the problem by including the versions of .NET installed on the requesting computer in its user-agent string.
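
To make the cardinality problem concrete, here is a small Python simulation of a cache that honors Vary: User-Agent by storing one copy per distinct user-agent string. The user-agent strings are illustrative examples and the cache is a toy model; as noted above, a CDN like Akamai skips caching altogether rather than storing one copy per string.

```python
from typing import Dict, List, Tuple


def count_origin_fetches(requests: List[Tuple[str, str]], vary_on_user_agent: bool) -> int:
    """Count how many requests miss the cache and must go to the origin server."""
    cache: Dict[Tuple[str, str], str] = {}
    fetches = 0
    for url, user_agent in requests:
        # With Vary: User-Agent, the exact user-agent string becomes part of the cache key.
        key = (url, user_agent if vary_on_user_agent else "")
        if key not in cache:
            fetches += 1
            cache[key] = "response body"
    return fetches


# Illustrative strings: three IE installs that differ only in their .NET tokens, plus Chrome.
USER_AGENTS = [
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; .NET CLR 2.0.50727)",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; .NET CLR 2.0.50727; .NET CLR 3.5.30729)",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; .NET CLR 3.5.30729; .NET4.0C)",
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17",
]
REQUESTS = [("/index.html", ua) for ua in USER_AGENTS]
```

With these four requests for the same page, count_origin_fetches(REQUESTS, True) reports four origin fetches, while count_origin_fetches(REQUESTS, False) reports one.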

We posted our questions to the “page-speed-discuss” forum in Google Groups, and the responses we received were consistent with this explanation:

Bryan McQuade – Page Speed Developer at Google (according to his Twitter profile):

“Many HTTP caches decide that Vary: User-Agent is effectively Vary: * since the number of user-agents in the wild is so large. By asking to Vary on User-Agent you are asking your CDN to store many copies of your resource which is not very efficient for them, hence their turning off caching in this case.”

For now, we’re hunting for a workaround to this problem, with help from Google, and we will update this blog post when more possible solutions become available.

Comments
21 Responses to “Is the Vary: User-Agent HTTP Header Broken?”
  1. Adam Audette says:

    http://webaim.org/blog/user-agent-string-history/

    “And thus Chrome used WebKit, and pretended to be Safari, and WebKit pretended to be KHTML, and KHTML pretended to be Gecko, and all browsers pretended to be Mozilla, and Chrome called itself Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.27 Safari/525.13, and the user agent string was a complete mess, and near useless, and everyone pretended to be everyone else, and confusion abounded.”

  2. Jody,
    Thank you very much for this post. We had a project on the log to add “Vary” headers, but I guess it will not happen as we’re using Akamai as well.
    I hope Google can come up with a better solution than this. Wondering if a preference within WMT would work?

    Thanks,
    @svolinsky

  3. TraiaN says:

    I am a bit intrigued by the article. If Google’s recommended configuration for smartphone-optimized sites is to use responsive web design, why would you even recommend “Dynamically serving different HTML on the same URL” as best practice?

  4. TraiaN says:

    Just to clarify, what I actually meant to say in the previous comment is: why don’t you recommend responsive web design as a best practice, rather than going to the trouble of serving different content on the same URL?

  5. Jody O'Donnell says:

    @TraiaN – There are times when the client isn’t ready or willing to do responsive design. E-Commerce, in particular, can be very difficult with responsive design. We try to push for the blue-sky recommendations, but often have to deal with the pragmatic limitations of time, effort and/or money.

    @svolinsky – This would be a great solution for Google, but I would rather see the search engines work with a public solution so we SEOs don’t have to have multiple solutions to make something work.

  6. Mike says:

    So for now, if we’ve implemented all other aspects of Google’s recommended set up (two-way “bidirectional” annotation), would you agree that it’s best to just leave the Vary: User-Agent HTTP header out of the mix?

  7. Jody O'Donnell says:

    @mike – I would do that if you have someone using a CDN. So far, this is only an issue with those types of setups. Specifically, we haven’t tested outside of Akamai. If you are talking about a normal website setup without the CDN, we haven’t seen this same issue.

  8. Mike says:

    Thanks for the response Jody. Good to know.

  9. Jim Robinson says:

    Thanks for sharing this info, Jody. Very helpful. Any recent update on a solution?

  10. Jim says:

    Thanks for that link Adam. I’ll report back if I learn anything new.

  11. Ha!*!*y says:

    What about updating the HTTP Protocol (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.44) to have all agents send a device type? If one is not sent, then default back to no caching.

    Vary: User-Device

    Device = mobile, desktop or tablet

  12. I wonder how much websites are suffering from this..

    I sent a question and Matt Cutts answered that they still recommend using the HTTP Vary header in these cases, even if Akamai or other CDNs don’t cache the URLs with it (as he states, there are other ways to cache...)

    http://www.youtube.com/watch?v=va6qtaiZRHg

    So we have a real problem here!

  13. Jody O'Donnell says:

    @Christian Oliver – I agree, it is rather a problem. We did have a client who fully implemented it with Akamai (prior to us understanding the problem) and their web server became overloaded from the number of requests coming back from Akamai. While I understand what Matt Cutts is trying to say, implementing it with Akamai will incur the site latency and web server traffic you were hoping to avoid.

  14. @Jody O’Donnell Totally! The same happened in the company I work for

    Anyway, it seems there is a way to solve this with Akamai: basically, configuring Akamai to ignore the Vary: User-Agent header (so it continues to cache your webpages) BUT still send it on to the clients. Guy Podjarny, from Akamai, explains it here:

    https://groups.google.com/d/msg/page-speed-discuss/kk-_1xslHrE/ravPkRjQYl4J

    I’ll try to get this implemented in my company and see if it works

  15. Andrew says:

    What are people’s thoughts on just creating this Vary: User-Agent header attribute for googlebot only? (IF Googlebot, then add header attribute). I’ve added this header attribute, but only really to help Google. I’m not using Akamai, but am concerned my own server will be overloaded with ISP requests (if I’m understanding the issue correctly). If the header is for Googlebot only, I help Google’s understanding of my site without the extra traffic from other sources.

  16. Jody O'Donnell says:

    The thread Christian is talking about has some very granular detail from Akamai that gives not only an explanation of the issue but also a possible fix. If anyone gives the solution a whirl, please let us know how it turns out!

    @Andrew – I am not sure the header is only for Google. Google says it uses it as a signal to better understand your mobile configuration, but there are caching servers that also use this information. I do like the idea of using it as a signal to Google if you are running into caching problems because of the implementation.

  17. I’m a bit late in commenting on this post, but was the original problem on every resource or only on web pages? The Vary header should be set ONLY on the text/html MIME type, not on every resource. Of course it strictly depends on how the CDN is working.

    But if the CDN is used only to cache the static resources (js, css, images, …) and not the HTML, adding the Vary header only on that MIME type should solve the problem.

Trackbacks
Check out what others are saying...
  1. [...] Is the Vary: User-Agent HTTP Header Broken?, Rimm Kaufman [...]

  2. [...] in order to successfully understand and deploy any of these approaches. And beyond that, it takes technical SEO to understand when a Google best practice could potentially create a sub-optimal user experience for your [...]
