I’ve noticed that smaller and larger e-tailers tend to run custom in-house e-commerce software, while mid-sized firms often depend on third-party e-commerce platforms.

For folks who’ve built their own infrastructure, database scaling is a key strategic concern. Slow databases lead to a slow site, and a slow site suffers from reduced conversion. Indeed, speed is perhaps the most important — but oft-overlooked — component of usability. (See, for example, Google’s Marissa Mayer’s Nov ‘06 Web 2.0 comments on the importance of site speed.)

Often, your database is your bottleneck. Speed up database reads and writes and your site will zoom. Better hardware helps, but fast-growing sites soon reach the point where it makes sense to scale out (add more machines) rather than up (buy a bigger one).

If your in-house IT folks are responsible for your database strategy, I highly recommend this series of fascinating snippets on database approaches put up by O’Reilly last spring. Tim O’Reilly talked to prominent web sites about the nuts and bolts of their database strategies here:

* Second Life
* Bloglines and Memeorandum
* Flickr
* NASA World Wind
* Craigslist
* O’Reilly Research
* Google File System and BigTable
* Findory and Amazon
* Brian Aker of MySQL Responds

Of the nine, the O’Reilly Research post is likely most relevant to mid-sized retailers — Roger Magoulas discusses how savvy DBA skills reduced the run time for an important query from “query never finished” to a zippy “query runs under two minutes.” Tips: clean up your data (Magoulas discussed the performance hit from having to deal with orphaned rows), partition your data sensibly, and use automation to keep your partitioning appropriate.
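To make the "clean up your data" tip concrete, here is a minimal sketch of finding and removing orphaned rows, using Python's built-in sqlite3 module. The `orders` and `order_items` tables and both function names are hypothetical, made up for illustration; this is not Magoulas's actual schema or method, just one common way to do this kind of cleanup.

```python
import sqlite3

# Hypothetical schema: order_items.order_id should point at orders.id.
ORPHAN_FILTER = (
    "NOT EXISTS (SELECT 1 FROM orders WHERE orders.id = order_items.order_id)"
)

def build_demo():
    """Create an in-memory database with two orphaned order_items rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY);
        CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER);
        INSERT INTO orders (id) VALUES (1), (2);
        INSERT INTO order_items (order_id) VALUES (1), (2), (2), (99), (100);
    """)
    return conn

def count_orphans(conn):
    """Count order_items rows whose order_id matches no row in orders."""
    sql = "SELECT COUNT(*) FROM order_items WHERE " + ORPHAN_FILTER
    return conn.execute(sql).fetchone()[0]

def delete_orphans(conn):
    """Delete the orphaned rows and return how many were removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM order_items WHERE " + ORPHAN_FILTER)
    return cur.rowcount

if __name__ == "__main__":
    conn = build_demo()
    print("orphans before:", count_orphans(conn))
    print("deleted:", delete_orphans(conn))
    print("orphans after:", count_orphans(conn))
```

On a production database you would likely count first, review a sample, and only then delete. Declaring a foreign key constraint stops new orphans from appearing in the first place.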

But even the war stories from the larger sites are instructive.

Last spring, Craigslist’s active data (90 days) was 114 GB and 56 million rows, with another 238 GB and 96 million rows in its less-active (older than 90 days) archive. (One suspects all these numbers are significantly bigger now, a year later.) And compared to most shops, Craigslist has practically no IT staff: the entire firm is 24 people. At that scale, Craigslist was still running a single master database, but was in the process of moving to a cluster.
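An active/archive split like this is the sort of partitioning that is easy to automate. Below is a minimal sketch using Python's built-in sqlite3 module. The `posts` and `posts_archive` tables, the `posted_on` column, and the function name are all assumptions for illustration; this is not how Craigslist actually does it. Only the 90-day window comes from the numbers above.

```python
import sqlite3
from datetime import date, timedelta

def archive_old_posts(conn, today, active_days=90):
    """Move posts older than `active_days` from posts to posts_archive.

    Dates are stored as ISO strings (YYYY-MM-DD), which sort correctly
    as plain text. Returns the number of rows moved.
    """
    cutoff = (today - timedelta(days=active_days)).isoformat()
    with conn:  # copy and delete in one transaction
        conn.execute(
            "INSERT INTO posts_archive SELECT * FROM posts WHERE posted_on < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM posts WHERE posted_on < ?", (cutoff,))
    return cur.rowcount

def build_demo():
    """Create an in-memory database with two recent and two old posts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE posts (id INTEGER PRIMARY KEY, posted_on TEXT);
        CREATE TABLE posts_archive (id INTEGER PRIMARY KEY, posted_on TEXT);
        INSERT INTO posts (posted_on) VALUES
            ('2007-03-01'), ('2007-01-15'), ('2006-12-01'), ('2006-06-30');
    """)
    return conn

if __name__ == "__main__":
    conn = build_demo()
    moved = archive_old_posts(conn, today=date(2007, 3, 12))
    print("rows moved to archive:", moved)
```

Run from a nightly scheduled job, something like this keeps the hot table small without anyone having to remember to do it.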

And here’s Second Life’s Ian Wilkes on SL’s preference for scaling-through-architecture vs. scaling-through-hardware:

I think the biggest lesson we learned is that databases need to be treated as a commodity. Standardized, interchangeable parts are far better in the long run than highly-optimized, special-purpose gear. Web 2.0 applications will require more horsepower with less money than “One Database” or his big brother “One Cluster All Hail The Central Cluster” will offer.

Great series.


Trackback

http://www.rimmkaufman.com/rkgblog/2007/03/12/database-scaling-war-stories/trackback/
