More and more online marketers are doing more and more testing. There’s blogosphere buzz around testing offers, testing web page design, testing AdWords copy, etc. And all this testing is a Very Good Thing, for well-designed tests can literally transform your business.

Question: When you get a “statistically significant” uptick from a test, is it always a winner?

Answer: Usually, but not always.

There are three situations in which your stats software will bless a set of results as “statistically significant” when they aren’t really a winner.

Huge Sample, Small Effect

The larger your test sample (impressions, clicks, catalogs mailed, whatever), the smaller the effect you can detect. It is a little-known fact that if a test is really huge, you’ll nearly always find a statistically significant difference between the control and test cells. The problem is that the difference may be too small to have any practical business significance. For example, with two cells of 10,000,000 apiece, a 1.01% response rate is statistically different from a 1% response rate (t=2.24, p=0.03). However, a single basis point difference has no business impact for the typical direct marketer.
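To check the example above yourself, here is a minimal Python sketch of a pooled two-proportion z-test, which for cells this large should behave like the t-test quoted. The function name and the `min_abs_lift` business threshold are illustrative assumptions, not part of any particular stats package.

```python
import math

def two_proportion_z_test(orders_a, n_a, orders_b, n_b):
    """Pooled two-proportion z-test. Returns (z, two-sided p-value)."""
    rate_a = orders_a / n_a
    rate_b = orders_b / n_b
    pooled = (orders_a + orders_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail area under the standard normal curve.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Two cells of 10,000,000: control responds at 1.00%, test at 1.01%.
n = 10_000_000
z, p = two_proportion_z_test(100_000, n, 101_000, n)
abs_lift = 101_000 / n - 100_000 / n

# Hypothetical business rule: ignore lifts under a tenth of a percentage point.
min_abs_lift = 0.001
worth_acting_on = p < 0.05 and abs_lift >= min_abs_lift

print(f"z = {z:.2f}, p = {p:.3f}, lift = {abs_lift:.4%}")
print("worth acting on:", worth_acting_on)
```

With these inputs z comes out at roughly 2.24 and p at roughly 0.025, so the result passes the usual 0.05 cutoff, yet the one basis point lift fails the (assumed) business threshold.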

Takeaway advice: Make sure statistically significant effects are large enough to have business significance.

Appropriate Sample, Huge Outlier

Most statistical tests rest on an assumption that noise in your test is normally distributed. This is usually a great assumption, but sometimes it isn’t true. Under a normal assumption, about 95% of the data should fall within 2 standard deviations of the mean, 99.7% should fall within 3 standard deviations of the mean, and you should essentially never see data 5 or 6 standard deviations out. When a stats package sees a 5 or 10 sigma event, the software quivers with excitement and starts ringing happy bells. But if the assumptions about the error model were wrong, you could be led to make a bad decision (hopefully not as significant as Bear Stearns’ recent loss of $1.6 billion).
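The sigma figures above can be reproduced in a few lines of Python using only the standard library. This is a sketch under the normal assumption itself; the helper names are made up for illustration.

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

def share_beyond(k):
    """Share of a normal distribution more than k standard deviations from the mean."""
    return math.erfc(k / math.sqrt(2))

for k in (2, 3, 5, 6):
    odds = 1 / share_beyond(k)
    print(f"{k} sigma: within = {share_within(k):.4%}, beyond = 1 in {odds:,.0f}")
```

If the noise really were normal, a 5 sigma observation would turn up about once in 1.7 million draws and a 6 sigma observation about once in 500 million, which is why seeing one in a modest test is better read as a warning about the model than as a discovery.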

Takeaway advice: Check your data for outliers. For direct marketers, an outlier is often a single gigantic order, making whichever test cell was lucky enough to receive it look like a grand slam. If you find atypical events are driving your significance, toss ‘em out.
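One way to run that outlier check is sketched below in Python. It flags orders that sit far from the median, measured in units of the median absolute deviation (MAD), a swapped-in robust yardstick that a single giant order cannot distort the way it distorts a standard deviation. The order values, the function name, and the threshold of 5 are all made up for illustration.

```python
import statistics

def flag_outliers(values, threshold=5.0):
    """Return values more than `threshold` robust sigmas from the median."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    robust_sigma = 1.4826 * mad  # scales MAD to match a normal std dev
    return [v for v in values if abs(v - median) / robust_sigma > threshold]

# Hypothetical order values (dollars) for each cell.
control_orders = [51, 49, 60, 54, 46, 57, 53, 56]
test_orders = [52, 48, 61, 55, 47, 58, 50, 4900]

outliers = flag_outliers(test_orders)
trimmed = [v for v in test_orders if v not in outliers]

print("flagged in test cell:", outliers)
print("control mean:", statistics.mean(control_orders))
print("test mean, all orders:", statistics.mean(test_orders))
print("test mean, outliers removed:", statistics.mean(trimmed))
```

In this made-up data the test cell averages over $650 per order with the $4,900 order included and $53 without it, essentially the same as the control. The "grand slam" was one order.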

Appropriate Sample, Small Time Period

Most statistical tests rest on an assumption that noise in your test is stationary, which is a fancy term for “not changing over time.” A retailer with a high-traffic site running a multivariate (MVT) test could see a statistically significant winner in a day or two. However, if all the data came from three weekdays in the first quarter, you don’t know if those results will hold on weekends, or in Q4.

Takeaway advice: Make sure your tests run long enough to be representative. During the holiday peak, roll out early winners quickly (so as not to miss the opportunity), but keep a small holdout back-test to confirm your early results.
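A quick sanity check on representativeness is to split the results by time segment and see whether the lift holds up in each one. The Python sketch below compares weekdays with weekends; the daily figures and the function name are hypothetical, and the same split could be done by week or by season.

```python
from datetime import date

def rate_by_day_type(daily_rows):
    """daily_rows: (date, visitors, orders) tuples.
    Returns conversion rate for 'weekday' and 'weekend' separately."""
    totals = {"weekday": [0, 0], "weekend": [0, 0]}
    for day, visitors, orders in daily_rows:
        key = "weekend" if day.weekday() >= 5 else "weekday"  # 5 = Sat, 6 = Sun
        totals[key][0] += visitors
        totals[key][1] += orders
    return {k: (o / v if v else None) for k, (v, o) in totals.items()}

# Made-up daily results: two weekdays and one weekend per cell.
control = [
    (date(2007, 11, 19), 1000, 20), (date(2007, 11, 20), 1000, 20),
    (date(2007, 11, 24), 1000, 30), (date(2007, 11, 25), 1000, 30),
]
test = [
    (date(2007, 11, 19), 1000, 25), (date(2007, 11, 20), 1000, 25),
    (date(2007, 11, 24), 1000, 28), (date(2007, 11, 25), 1000, 28),
]

control_rates = rate_by_day_type(control)
test_rates = rate_by_day_type(test)
for key in ("weekday", "weekend"):
    lift = test_rates[key] / control_rates[key] - 1
    print(f"{key}: control {control_rates[key]:.2%}, test {test_rates[key]:.2%}, lift {lift:+.1%}")
```

In this invented data the test cell wins by 25% on weekdays and loses by about 7% on weekends. A test stopped after the first two days would have crowned a winner that does not hold up across the week.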

• • •

Direct marketing testing is both art and science. The science is designing good tests and running the stats. The art is knowing what to test, how to interpret results, and how to use the findings to significantly improve your business.


Comments

  1. Matthew Griffin, November 24, 2007:

    This is a great article, especially for those of us who are forced into a situation where we have to attempt to decipher web stats every day but have little actual statistical training.

  2. AndyEd, November 24, 2007:

    Well said!

    I often use daily means across a many-week sample to hypothesis test split test factors on small and medium size websites. Particularly in this case, another potential statistical pitfall, even with a seemingly normal distribution, is heteroscedasticity (http://en.wikipedia.org/wiki/Heteroskedasticity).

    Many standard statistical tests rely on the assumption that variance is equal across repeated samples. Keeping an eye on this assumption is also a useful safeguard in split test analyses.

  3. Stephen Schramke, November 27, 2007:

    Alan - Thank you for including the link to the Blow-up article. It was fascinating reading that I would have never come across had I not been reading your blog. I also really enjoyed the book Super Crunchers that you recommended.
