I’ve noticed that both small and large e-tailers tend to run custom in-house e-commerce software, while mid-sized firms often depend on third-party e-commerce platforms.

For folks who’ve built their own infrastructure, database scaling is a key strategic concern. A slow database leads to a slow site, and a slow site converts fewer visitors. Indeed, speed is perhaps the most important, yet most often overlooked, component of usability. (See, for example, the comments Google’s Marissa Mayer made at Web 2.0 in November 2006 on the importance of site speed.)

Often, your database is your bottleneck: speed up database reads and writes, and your site will zoom. Better hardware helps, but fast-growing sites soon reach the point where it makes sense to scale out (add more machines) rather than up (buy a bigger machine).
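Scaling out usually means spreading data across several database servers. Hash-based sharding is one common way to do that, and the sketch below shows it under some assumptions. The shard names, the `shard_for` helper, and the choice of customer ID as the key are all hypothetical. None of the sites in the series below is claimed to work this way.

```python
import zlib

# Hypothetical pool of interchangeable database hosts; names are placeholders.
SHARDS = ("db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3")


def shard_for(customer_id: int, shards: tuple[str, ...] = SHARDS) -> str:
    """Map a customer ID to one shard using a stable hash.

    zlib.crc32 is used instead of Python's built-in hash(), because hash()
    of a string can change between interpreter runs. With crc32, the same
    customer always lands on the same shard.
    """
    digest = zlib.crc32(str(customer_id).encode("utf-8"))
    return shards[digest % len(shards)]


if __name__ == "__main__":
    for cid in (1001, 1002, 1003):
        print(cid, "->", shard_for(cid))
```

Simple modulo hashing has a drawback. When the number of shards changes, most keys map to a different shard, so the data has to move. Consistent hashing is the usual refinement when shards are added often.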

If your in-house IT folks are responsible for your database strategy, I highly recommend the fascinating series of short posts on database approaches that O’Reilly published last spring. Tim O’Reilly talked to prominent websites about the nuts and bolts of their database strategies:

* Second Life
* Bloglines and Memeorandum
* Flickr
* NASA World Wind
* Craigslist
* O’Reilly Research
* Google File System and BigTable
* Findory and Amazon
* Brian Aker of MySQL Responds

Of the nine, the O’Reilly Research post is likely the most relevant to mid-sized retailers. In it, Roger Magoulas describes how savvy DBA work cut the run time of an important query from “query never finished” to a zippy “query runs under two minutes.” His tips: clean up your data (Magoulas discussed the performance hit from dealing with orphaned rows), partition your data sensibly, and use automation to keep your partitioning appropriate.
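Orphaned rows are child rows whose parent row no longer exists, such as an order line pointing at a deleted order. The sketch below shows one common SQL pattern for finding and deleting them. It uses SQLite and a hypothetical `orders`/`order_items` schema. It is not the method described in Magoulas’s post, which the summary above does not spell out.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two orphaned order_items."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
        CREATE TABLE order_items (item_id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
        INSERT INTO orders VALUES (1, 'alice'), (2, 'bob');
        -- items 12 and 13 reference orders 3 and 4, which do not exist
        INSERT INTO order_items VALUES
            (10, 1, 'SKU-A'), (11, 2, 'SKU-B'), (12, 3, 'SKU-C'), (13, 4, 'SKU-D');
    """)
    return conn


def find_orphans(conn: sqlite3.Connection) -> list[int]:
    """Return item_ids whose order_id has no matching row in orders."""
    rows = conn.execute("""
        SELECT i.item_id
        FROM order_items AS i
        LEFT JOIN orders AS o ON o.order_id = i.order_id
        WHERE o.order_id IS NULL
        ORDER BY i.item_id
    """).fetchall()
    return [r[0] for r in rows]


def delete_orphans(conn: sqlite3.Connection) -> int:
    """Delete orphaned order_items and return how many rows were removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("""
            DELETE FROM order_items
            WHERE NOT EXISTS (
                SELECT 1 FROM orders AS o WHERE o.order_id = order_items.order_id
            )
        """)
    return cur.rowcount


if __name__ == "__main__":
    conn = make_demo_db()
    print("orphaned items:", find_orphans(conn))  # expected: [12, 13]
    print("rows deleted:", delete_orphans(conn))  # expected: 2
```

Periodic cleanup treats the symptom. Foreign key constraints stop new orphans from being created in the first place. Note that SQLite enforces them only after `PRAGMA foreign_keys = ON`.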

But even the war stories from the larger sites are instructive.

Last spring, Craigslist’s active data (the most recent 90 days) was 114 GB in 56 million rows, with another 238 GB and 96 million rows in its less-active archive (data older than 90 days). (One suspects all these numbers are significantly bigger now, a year later.) And compared to most shops, Craigslist has practically no IT staff: the entire firm is 24 people. Last spring, at this scale, Craigslist was still running a single master database, though it was in the process of moving to a cluster.
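Splitting 90 days of active data from an older archive is a form of partitioning by age. It is also the kind of chore the O’Reilly Research tips suggest automating. The sketch below shows one way to do it under assumptions: the `postings_active`/`postings_archive` tables, the `roll_to_archive` helper, and the use of SQLite are all hypothetical. The post does not describe how Craigslist actually moves its data.

```python
import sqlite3
from datetime import datetime, timedelta

ACTIVE_WINDOW_DAYS = 90  # mirrors the 90-day "active" cutoff described above


def make_demo_db() -> sqlite3.Connection:
    """Create hypothetical active and archive tables with identical columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE postings_active  (id INTEGER PRIMARY KEY, posted_at TEXT, title TEXT);
        CREATE TABLE postings_archive (id INTEGER PRIMARY KEY, posted_at TEXT, title TEXT);
    """)
    return conn


def roll_to_archive(conn: sqlite3.Connection, now: datetime,
                    window_days: int = ACTIVE_WINDOW_DAYS) -> int:
    """Move rows older than the window from the active table to the archive.

    Timestamps are stored as ISO-8601 strings, which sort chronologically.
    The copy and the delete run in one transaction, so a failure part-way
    through leaves both tables unchanged. Returns the number of rows moved.
    """
    cutoff = (now - timedelta(days=window_days)).isoformat()
    with conn:
        conn.execute(
            "INSERT INTO postings_archive SELECT * FROM postings_active WHERE posted_at < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM postings_active WHERE posted_at < ?", (cutoff,))
    return cur.rowcount


if __name__ == "__main__":
    conn = make_demo_db()
    now = datetime(2007, 3, 12)
    conn.executemany(
        "INSERT INTO postings_active VALUES (?, ?, ?)",
        [(1, (now - timedelta(days=10)).isoformat(), "recent"),
         (2, (now - timedelta(days=100)).isoformat(), "old"),
         (3, (now - timedelta(days=200)).isoformat(), "older")],
    )
    print("rows archived:", roll_to_archive(conn, now))  # expected: 2
```

A job like this would typically run nightly from a scheduler such as cron. At larger scale, a database’s native range partitioning is often preferable, because dropping or detaching an old partition avoids copying rows one by one. Both MySQL and PostgreSQL offer this.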

And here’s Second Life’s Ian Wilkes on SL’s preference for scaling-through-architecture vs. scaling-through-hardware:

I think the biggest lesson we learned is that databases need to be treated as a commodity. Standardized, interchangeable parts are far better in the long run than highly-optimized, special-purpose gear. Web 2.0 applications will require more horsepower with less money than “One Database” or his big brother “One Cluster All Hail The Central Cluster” will offer.

Great series.

Trackback

http://www.rimmkaufman.com/rkgblog/2007/03/12/database-scaling-war-stories/trackback/

Blogs Citing This Post

  1. Pingback: Creating Scalable applications. 2 Havent this been done before? « Computing Life on October 25, 2007
