Is the Vary: User-Agent HTTP Header Broken?
Mobile SEO practices got a little easier last summer when Google’s Pierre Far announced definitive guidelines for mobile SEO at SMX Advanced in Seattle. The outline was easy to follow, allowed for multiple configurations, and laid out the search engine’s blue-sky proposal: responsive design.
Like most SEOs, we took this announcement, incorporated its content into our best practices, and proceeded to implement it with clients.
One of the recommendations is to use the “Vary: User-Agent” HTTP header to signal to search engines that a mobile site exists. It falls under the recommendation for dynamically serving different HTML on the same URL:
According to those recommendations, the Vary HTTP header has two important and useful implications:
- It signals to caching servers used in ISPs and elsewhere that they should consider the user-agent when deciding whether to serve the page from cache or not. Without the Vary HTTP header, a cache may mistakenly serve mobile users the cache of the desktop HTML page, or vice versa.
- It helps Googlebot discover your mobile-optimized content faster, as a valid Vary HTTP header is one of the signals we [Google] may use to crawl URLs that serve mobile-optimized content.
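In practice, the guideline amounts to adding one response header whenever the HTML served at a URL depends on the requesting device. A minimal sketch in Python (the handler and the crude mobile check are hypothetical illustrations, not our client’s implementation):

```python
def build_response_headers(user_agent):
    """Serve different HTML on one URL and advertise that fact via Vary.

    Hypothetical sketch; real mobile detection is far more involved.
    """
    is_mobile = "Mobile" in user_agent  # crude placeholder check
    headers = {
        "Content-Type": "text/html",
        # Tell caches (and Googlebot) that the body varies by user agent.
        "Vary": "User-Agent",
    }
    body = ("<html>mobile version</html>" if is_mobile
            else "<html>desktop version</html>")
    return headers, body

headers, body = build_response_headers("Mozilla/5.0 (iPhone) Mobile Safari")
```

The key point is simply that the `Vary: User-Agent` line goes out on every response from that URL, regardless of which variant is served.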
Tech teams often ask us why they should implement the header, which is fairly easy to do, and the explanations above are generally well received. That changed when one client implemented the header and saw a very large spike in traffic and resource usage on its web servers. As far as we knew, this was not supposed to happen, so it was evident we needed to investigate further.
It’s worth noting that the client uses Akamai as its CDN. This is not uncommon: many of our clients leverage this geographically dispersed platform to offload resources and decrease load times.
When our client talked with Akamai, they learned that the massive traffic increase to their website was due to the implemented Vary header.
When an upstream provider (in this case, Akamai) can’t cache a response, it has to keep asking the origin web server for documents and assets. As a result, the CDN sent far more traffic directly to the client’s web servers.
This is explained further in Akamai’s documentation:
The HTTP Vary header is used by servers to indicate that the object being served will vary (the content will be different) based on some attribute of the incoming request, such as the requesting client’s specified user-agent or language. The Akamai servers cannot cache different versions of the content based on the values of the Vary header. As a result, objects received with a Vary header that contains any value(s) other than “Accept-Encoding” will not be cached. To do so might result in some users receiving the incorrect version of the content (wrong language, etc.)
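Akamai’s rule boils down to a simple predicate: an object is cacheable only if its Vary header is absent or lists nothing but Accept-Encoding. A rough paraphrase in Python (our reading of the documentation, not Akamai’s actual logic):

```python
def akamai_style_cacheable(vary_header):
    """Return True if an edge cache following Akamai's documented rule
    would cache the object.

    Paraphrase of the docs for illustration, not real Akamai code.
    """
    if vary_header is None:
        return True  # no Vary header: safe to cache one copy
    # Vary may list several comma-separated request headers.
    values = {v.strip().lower() for v in vary_header.split(",")}
    # Only "Accept-Encoding" (alone) is acceptable; anything else
    # makes the object uncacheable at the edge.
    return values == {"accept-encoding"}
```

Under this rule, `Vary: Accept-Encoding` stays cacheable, while `Vary: User-Agent` (or even `Vary: Accept-Encoding, User-Agent`) sends every request back to the origin.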
What exactly is the real issue? Why is this signal a problem? Patrick Meenan explained it this way:
“Vary: User-Agent is broken for the Internet in general. …the basic problem is that the user-agents vary so wildly that they are almost unique for every individual (not quite that bad but IE made it a mess by including the version numbers of .Net that are installed on users machines as part of the string). If you Vary on User-Agent then intermediate caches will pretty much end up never caching resources (like Akamai).”
Meenan’s explanation further complicates the issue for clients following Google’s mobile recommendation while using a CDN. Because user-agent strings are so varied, resources served with the header effectively stop being cached at all. IE makes the problem worse by appending the versions of .NET installed on the requesting machine to the user-agent string.
We posted our questions to the “page-speed-discuss” forum in Google Groups, and the responses lined up with this narrative:
Bryan McQuade – Page Speed Developer at Google (according to his Twitter profile):
“Many HTTP caches decide that Vary: User-Agent is effectively Vary: * since the number of user-agents in the wild is so large. By asking to Vary on User-Agent you are asking your CDN to store many copies of your resource which is not very efficient for them, hence their turning off caching in this case.”
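The effect McQuade describes can be simulated with a toy cache that includes the full user-agent string in its key whenever Vary: User-Agent is set. Two IE installs that differ only in their installed .NET versions then become two separate cache entries and two separate origin requests. A hypothetical sketch:

```python
class ToyVaryCache:
    """Toy edge cache that keys on (url, user-agent) because the origin
    sent Vary: User-Agent. Illustrative only."""

    def __init__(self):
        self.store = {}
        self.origin_hits = 0

    def get(self, url, user_agent):
        key = (url, user_agent)  # the full UA string goes into the key
        if key not in self.store:
            self.origin_hits += 1  # cache miss: fetch from the origin
            self.store[key] = f"page for {user_agent}"
        return self.store[key]

cache = ToyVaryCache()
# Two IE installs differing only in their .NET CLR tokens:
cache.get("/home", "Mozilla/4.0 (MSIE 8.0; .NET CLR 2.0.50727)")
cache.get("/home", "Mozilla/4.0 (MSIE 8.0; .NET CLR 3.5.30729)")
```

Since nearly every visitor carries a slightly different string, the cache hit rate collapses toward zero, which is why a CDN treats Vary: User-Agent as effectively uncacheable.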
For now, we’re hunting for a workaround to this problem, with help from Google, and we will update this blog post when possible solutions become available.