Oct 12, 2012

Dealing with the Flood of Site Search Pages in Travel

How should travel sites, such as those of hotels and airlines, deal with the flood of site search pages generated when users either search for dates and locations or run a general site search? Drowning your troubles might be tempting, but it’s not much help. If you work in SEO for the travel industry, you’ve probably dealt with this question, and my research across a number of leading brands shows a variety of ways these pages are being managed.

With Google’s recent tendency to return site search pages in the SERPs, it’s a topic worth serious consideration. For the travel industry, the problem with Google returning date/location search pages in the SERPs is that rates and availability change so quickly that an indexed search page even an hour old could be useless: the flight or room category may now be sold out, or the price may have changed due to revenue management rules.

For my research, I simply used the advanced search operators ‘site:’, ‘intitle:’ and ‘inurl:’, so the numbers are pretty rough, but close enough to make my point.

Hotel examples:
•    Marriott has around 3,890,000 URLs indexed containing ‘search’, covering a variety of search page types. With about 5 million pages indexed in total for this domain, nearly 80% of this company’s indexed pages are site search pages.

•    IHG disallows URLs containing ‘/*/searchresult*’ or ‘/*/hotelsearchresult*’ in their robots.txt file, so they have no direct date/location search pages indexed (not including suppressed listings).  However, they do have around 500 pages indexed containing ‘search’ that appear to be generated when multiple hotels match a search.

•    Ritz-Carlton has around 2,000 pages indexed with URLs containing ‘Reservations/Default.htm?ci=’, which are generated after searching for availability at a given location.
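IHG’s rules rely on the ‘*’ wildcard, which Google supports in robots.txt even though it wasn’t part of the original robots.txt convention. As a rough way to check which URLs a pattern like ‘/*/searchresult*’ would catch, here is a minimal Python sketch of Google-style matching: ‘*’ matches any run of characters, a trailing ‘$’ anchors the end, and rules match from the start of the path. The example paths are hypothetical, not IHG’s actual URLs, and the sketch deliberately ignores Allow rules and Google’s longest-match precedence.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Google-style robots.txt path pattern into a regex.

    '*' matches any sequence of characters; a trailing '$' anchors the
    end of the URL. Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal chunk, then rejoin the chunks with '.*' for each '*'.
    body = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_patterns: list[str]) -> bool:
    """True if the path (including any query string) matches a Disallow rule.

    re.match anchors at the start of the string, which gives robots.txt
    its prefix-matching behavior. Allow rules are not modeled here.
    """
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)


# The two patterns quoted in the post.
IHG_STYLE_RULES = ["/*/searchresult*", "/*/hotelsearchresult*"]

# Hypothetical paths, for illustration only.
print(is_disallowed("/hotels/us/en/searchresult?city=Atlanta", IHG_STYLE_RULES))  # True
print(is_disallowed("/hotels/us/en/atlanta/atlhb/hoteldetail", IHG_STYLE_RULES))  # False
```

Note that a trailing ‘*’ is redundant under prefix matching; ‘/*/searchresult’ would block the same URLs.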

Airline examples:
•    United.com: approximately 15,000 URLs indexed containing ‘search’, out of nearly 2.1 million total URLs indexed, or well under 1%. These are a mix of general site search, car search, cruise search, etc.

•    Delta.com: uses a single URL while it searches, then dynamically loads the results and appends ‘#top’ to the URL, entirely avoiding the creation of new URLs. However, a search for indexed pages with ‘search’ in the title returns about 6,500 results, many of which come from the Delta blog, SkyMiles Marketplace and various international subdomains.

•    American Airlines: about 18,300 URLs containing ‘search’ indexed, out of a total of 138,000 pages, or roughly 13%. These come from general site search rather than flight/date searches. They may be using parameter handling in Google Webmaster Tools, some form of JavaScript, or both to keep flight/date search pages out of the index.
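Delta’s approach works because everything after ‘#’ in a URL is a fragment: browsers keep it client-side and never send it to the server, so crawlers generally treat a page with and without ‘#top’ as the same address. A short Python sketch using the standard library’s urldefrag makes the point; the example-airline.com URLs and their parameters are hypothetical.

```python
from urllib.parse import urldefrag


def crawlable_url(url: str) -> str:
    """Return the URL as it is actually requested from the server.

    The #fragment is dropped, since it is never sent over HTTP.
    """
    return urldefrag(url).url


urls = [
    # Hypothetical URLs for illustration.
    "https://www.example-airline.com/flight-search",
    "https://www.example-airline.com/flight-search#top",
    "https://www.example-airline.com/flight-search?from=ATL&to=LAX&date=2012-11-05",
]

# The '#top' variant collapses into the plain URL; the query-string
# variant does not, because each parameter combination is a new URL.
distinct = {crawlable_url(u) for u in urls}
print(len(distinct))  # 2
```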

As you can see, approaches and results vary widely.
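For reference, the rough shares behind those comparisons can be recomputed from the counts quoted above. These are the approximate site:/inurl: figures from this post, not fresh data.

```python
def search_share(search_urls: int, total_urls: int) -> float:
    """Percentage of indexed URLs that contain 'search'."""
    return 100 * search_urls / total_urls


# Approximate counts quoted in the post (October 2012).
counts = {
    "Marriott": (3_890_000, 5_000_000),
    "United": (15_000, 2_100_000),
    "American Airlines": (18_300, 138_000),
}

for brand, (search_urls, total_urls) in counts.items():
    print(f"{brand}: {search_share(search_urls, total_urls):.1f}%")
# Marriott: 77.8%
# United: 0.7%
# American Airlines: 13.3%
```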

If our goals are to streamline the crawl experience for Googlebot and other search engine spiders, and to present well-optimized, high-converting pages to users via the SERPs, we can do better than leaving all search result pages open to indexation. The question is, how?

Primary Strategies for Handling Search Pages

Here I’ll identify the primary methods of dealing with these pages:

•    Disallow in robots.txt: Also known as the ‘sledgehammer’ approach, this might be a little more brute force than necessary. It uses disallow statements to block all URLs containing a certain phrase from being crawled (blocked URLs can still show up in the index, just without a snippet). While effective in certain situations, the main issue here is that you could be missing out on valuable traffic and revenue.

•    Allow all to be indexed: This method results in far more pages indexed than necessary, bloating the index with undesired, poorly optimized pages that convert poorly, if at all.

•    Exclude via parameter handling in Google Webmaster Tools: This can be an effective way to handle dynamically generated pages, but it has limitations. Accounting for all the permutations of URLs and parameters can be difficult and unwieldy, making it a nightmare to manage effectively. And even if a site has a clean URL structure, you’re potentially missing out on valuable traffic that could be optimized for.

•    Research and Optimize: We’ve learned from ecommerce that some search result pages can generate a good amount of traffic and conversions, so this option calls for researching high-performing search result pages, then creating static, well-optimized pages that draw traffic and convert well. Unwanted search result pages should be kept out of the index using a smart combination of meta robots ‘noindex, follow’ tags and, depending on the situation, carefully crafted disallow statements in the robots.txt file. The two belong on different sets of URLs, since a crawler blocked by robots.txt never fetches the page and so never sees its noindex tag.
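Part of what makes parameter handling unwieldy is that every new combination, or even reordering, of query parameters is a distinct URL to a crawler. Before configuring anything, it can help to inventory which parameters actually appear on each search path. A minimal sketch, assuming you already have a list of crawled URLs (from server logs or a crawler export); the domain and the loc/checkin/checkout parameter names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def parameter_inventory(urls):
    """Group URLs by path.

    Returns {path: (sorted parameter names seen, number of distinct query strings)}.
    """
    params_by_path = defaultdict(set)
    queries_by_path = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        queries_by_path[parts.path].add(parts.query)
        for name, _value in parse_qsl(parts.query, keep_blank_values=True):
            params_by_path[parts.path].add(name)
    return {
        path: (sorted(params_by_path[path]), len(queries))
        for path, queries in queries_by_path.items()
    }


# Hypothetical crawl sample. The third URL only reorders the second
# one's parameters, yet it still counts as a separate URL.
crawled = [
    "https://www.example-hotel.com/reservations?loc=NYC&checkin=2012-11-05&checkout=2012-11-07",
    "https://www.example-hotel.com/reservations?loc=NYC&checkin=2012-11-06&checkout=2012-11-07",
    "https://www.example-hotel.com/reservations?checkout=2012-11-07&checkin=2012-11-06&loc=NYC",
    "https://www.example-hotel.com/search?q=spa",
]

for path, (names, n_urls) in parameter_inventory(crawled).items():
    print(path, names, n_urls)
# /reservations ['checkin', 'checkout', 'loc'] 3
# /search ['q'] 1
```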

The last approach should result in well-optimized pages, which draw traffic in the SERPs, convert well and ultimately drive more revenue by providing users with the information they want.
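To make the ‘noindex, follow’ part of the research-and-optimize approach concrete, here is a minimal sketch of per-URL logic a site might use to choose its meta robots value. The whitelist of optimized landing pages, the ‘/search’ path check and the volatile parameter names are all hypothetical placeholders; a real site would derive them from its own analytics and URL scheme.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical: static, optimized pages built from high-performing searches.
OPTIMIZED_PAGES = {"/hotels/new-york/times-square", "/hotels/chicago/downtown"}

# Hypothetical: parameters that make a page date/availability specific.
VOLATILE_PARAMS = {"checkin", "checkout", "rooms", "adults"}


def meta_robots_for(url: str) -> str:
    """Pick a meta robots value for a page on a hypothetical hotel site."""
    parts = urlsplit(url)
    if parts.path in OPTIMIZED_PAGES:
        return "index, follow"
    params = parse_qs(parts.query, keep_blank_values=True)
    if "/search" in parts.path or not VOLATILE_PARAMS.isdisjoint(params):
        # Keep the page out of the index, but let bots follow its links
        # through to deeper property pages.
        return "noindex, follow"
    return "index, follow"


def meta_robots_tag(url: str) -> str:
    """Render the tag for the page's <head>."""
    return f'<meta name="robots" content="{meta_robots_for(url)}">'


print(meta_robots_tag("https://www.example-hotel.com/reservations?checkin=2012-11-05&checkout=2012-11-07"))
# <meta name="robots" content="noindex, follow">
```

Serving ‘noindex, follow’ rather than a robots.txt disallow lets crawlers fetch the page, see the directive and still pass through to the pages it links to.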

[Figure: Venn diagram of optimized search pages]

Hopefully this has been helpful in thinking about how to handle site search pages.  For more insight into SEO and PPC for travel, keep an eye out for upcoming blog posts from Adam Reitelbach outlining essential SEO and PPC tips for travel.

And if you’re attending the PhoCusWright conference in Scottsdale in November, be sure to come by and say hello.


4 Responses to "Dealing with the Flood of Site Search Pages in Travel"
Paul, I don't think robots.txt is an appropriate solution to this particular problem these days. Websites end up with thousands of URL references indexed that show no snippet in the search results because Google can't crawl the pages. Having tens of thousands or millions of otherwise useless, duplicate URLs indexed might not hurt a global brand like Marriott; however, it seems like exactly the kind of low-quality content signal the Google Panda algorithm would seek out, and a smaller brand or site could fall victim to it. I think a better solution is a combination of meta robots noindex, X-Robots-Tag noindex, rel="canonical" and HTTP 301 redirects. In each of those scenarios, a website preserves its link equity and at the same time reduces its total indexed pages, which has the added benefit of improving the odds that the right page will be returned for the appropriate query instead of a now-invalid 'search' results URL. Cheers, Al.
Hi Al, Thanks for the comment. Good points. I think we need to distinguish between search results pages that are already indexed and keeping future pages from being indexed. To remove previously indexed pages, a 301 redirect would probably be the best tool. As you pointed out, it preserves link equity and reduces unwanted pages in the index. Disallow statements in robots.txt wouldn't be a good fit for this. Keeping future pages from becoming indexed would most likely be a job for the meta robots 'noindex, follow' tag. Since there won't be any backlink equity to deal with, this keeps the pages out of the index while allowing search engine robots to continue following links to deeper pages. Of course, each situation is different and requires a unique approach, and not every company can execute on blue-sky recommendations. That's why it's so important to evaluate the individual situation and have a few options available, depending on resources and capabilities. Really appreciate the feedback!
kwan says:
Thanks for the article.


Check out what others are saying...
[...] Dealing with the Flood of Site Search Pages in Travel, Rimm Kaufman [...]
