There are three kinds of lies: lies, damned lies, and statistics. -- Benjamin Disraeli
I nearly choked on my Sunday morning bagel reading an article in yesterday's Washington Post titled "State of the Household".
Staff writer Neil Irwin wanted to show how macroeconomic trends are affecting the average American household's financial statement.
The key word here is "average."
"Average" often isn't the same as "typical", even though the article's subhead -- "Translating Big Economic Trends Into Something a Real Family Might Face", emphasis mine -- suggests they are one and the same.
According to the Post, here are some facts about the average U.S. household, with my comments beneath each item:
- Income: Household income of $55k.
High but plausible. Median U.S. household income in 2006 was $48k, so $55k overstates "typical" income, but it's in the ballpark.
- Income: Interest income of $10k.
Huh? Assuming a 5% money market yield, that would mean having $200k in cash lying around (!!!). No, that isn't typical.
- Asset: Stocks and mutual funds totaling $96k
Nope, not typical.
- Asset: Business equity of $70k
- Expense: Heating oil costs of $205
When I lived in New England, our heating bills were at least five-fold that. So not typical for heating oil users. Now I live in Virginia, and heat with a heat pump. So now my oil costs are exactly zero. So not typical for non-heating-oil-users, either.
I am sure these averages are correct, but they're also utterly meaningless.
Averages do a lousy job of characterizing skewed distributions, because averages are very sensitive to outliers. Bill Gates, Warren Buffett, et al. play a large role in moving those "typical" interest income numbers reported above.
And averages might not be the best characterization of a population. As Arnie Barnett, one of my doctoral advisors, used to say, "The average (the centroid) of a donut is right in the middle, exactly right where there is no donut."
Averages are popular because they have great theoretical properties, and thus form the foundation of basic and advanced statistics.
Medians often perform better at characterizing a distribution. (The median of a distribution or sample is the value such that at most half the population is above it and at most half is below it.)
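To make the outlier point concrete, here is a minimal Python sketch. The ten interest-income figures are invented for illustration (they are not from the Post article or any survey); I picked them so that the mean lands on the Post's $10k figure.

```python
from statistics import mean, median

# Hypothetical annual interest income (dollars) for ten households.
# These numbers are made up for illustration, not real survey data.
incomes = [0, 0, 50, 100, 150, 200, 300, 500, 1_000, 97_700]

avg = mean(incomes)    # pulled far upward by the single large value
mid = median(incomes)  # midpoint of the two middle values, 150 and 200

print(f"mean:   ${avg:,.0f}")
print(f"median: ${mid:,.0f}")

# Drop the one outlier and see how far each statistic moves.
without_outlier = incomes[:-1]
print(f"mean without outlier:   ${mean(without_outlier):,.0f}")
print(f"median without outlier: ${median(without_outlier):,.0f}")
```

In this toy sample, one household accounts for nearly all of the total. The mean says $10,000, while the median says $175, and removing that single household barely moves the median but collapses the mean.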
OK, so averages can deceive. What's the relevance to paid search?
In an earlier post on calculating optimal PPC bids (Computing Optimal Pay-Per-Click Bids In 19 Easy Steps), I gave an example along these lines:
"Assuming an average click to your site generates an average sales of $7.88, and assuming your margin and financial goals dictate an advertising-to-sales ratio of 28.6%, then you should bid on average 7.88 x .286 = $2.25 per click." (post)
In the footnotes, I point out that
In real life, conversion and SPC are the most important thing NOT to suppose. Averages and assumptions will kill you here. You need good data and careful statistics. Also, in real life, AOV and conversion vary widely by term and engine.
Correctly estimating conversion, AOV, and SPC on medium-traffic “body” terms and on very low volume “tail” terms is very, very important. At our firm that’s one of the most important ingredients to our secret sauce. Correctly estimating low-probability events has been an interest of mine since my doctoral research on the topic at MIT. (post)
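As a rough sketch of why this matters, the snippet below first reproduces the bid arithmetic from the quoted example, then applies the same formula term by term. The function name `bid_per_click`, the three term labels, and their sales-per-click values are all hypothetical; I chose the values so that their simple average is the $7.88 from the example. This is an illustration of the caveat, not our firm's estimation method.

```python
from statistics import mean

def bid_per_click(sales_per_click: float, ad_to_sales_ratio: float) -> float:
    """Bid = expected sales per click x target advertising-to-sales ratio."""
    return sales_per_click * ad_to_sales_ratio

AD_TO_SALES = 0.286  # target advertising-to-sales ratio from the example

# The single-average bid from the quoted example.
average_bid = bid_per_click(7.88, AD_TO_SALES)
print(f"bid from the average: ${average_bid:.2f}")

# Hypothetical sales per click (SPC) by term; invented numbers whose
# simple average is $7.88.
spc_by_term = {"head term": 15.00, "body term": 6.64, "tail term": 2.00}
average_spc = mean(list(spc_by_term.values()))

for term, spc in spc_by_term.items():
    term_bid = bid_per_click(spc, AD_TO_SALES)
    print(f"{term}: SPC ${spc:.2f} -> bid ${term_bid:.2f}")
```

With these made-up numbers, bidding the average on every term would underbid the strongest term and overbid the weakest one by a wide margin, even though the average itself is computed correctly.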
In PPC bidding as in newspaper columns, use caution when using averages to characterize highly skewed or highly dispersed distributions.