Jun 21, 2010

# "Advanced Statistics" and other Meaningless Drivel

In the recent Forrester Wave on Attribution solution providers, one criterion for excellence was the use of "Advanced Statistical Models." Indeed, that phrase pops up in the context of bid management on almost every paid search agency's website, including ours. While high-brow statistics is certainly necessary, it isn't sufficient to guarantee results.

The problem is: "Advanced" doesn't necessarily mean "Accurate." Those who fail to recognize that distinction, treating stats modeling as a check-box rather than a matter of ongoing research, tend to get left behind where the rubber meets the road in performance.

Take the case of attribution modeling. We've been working with our stats consulting team on how best to determine which ads should get what fraction of credit for an order. It's a very difficult problem. Truthfully, it's beyond difficult.

Most folks ended their study of math at a happy point where applying math properly generated the "right" answer. Many folks assume that this must always be the case. Some of us who trudged on into the nether-world of math and quantum mechanics found to our dismay that there often isn't agreement on any one "answer" and that mathematical approaches can be valid and still not be particularly useful.

Let's take a look at some real world examples of how equally "advanced" statistical models produce dramatically different results.

We handed a few million rows of attribution data to some of the sharpest stats PhDs in the land and asked them to build a model for spreading credit judiciously between advertisements. These folks decided that the best approach would be to use a dynamic Bayesian Network to build a predictive model. This highly advanced statistical model produced lousy results.

It turns out that Bayesian models don't handle sequential ordering well, so a sale following the pattern of activity below was mishandled badly.

Example #1:

This first draft of a model credited the sale almost entirely to the affiliate ad, with less than 10% of the credit given to the competitive natural search touch that preceded it. As marketers, we're pretty sure that's wrong. The "right" answer isn't intuitively obvious, but the notion that the first touch deserves a good bit more credit than that is pretty clear.
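The internals of that first model aren't published, but the failure mode is easy to illustrate with a toy sketch: a model that computes credit from touch frequencies alone, ignoring sequence, produces the same split no matter which ad came first. The channel names and the credit rule below are illustrative, not the consultants' actual model.

```python
# Toy illustration of an order-insensitive ("bag of touches") credit model.
# Because it looks only at touch counts, reordering the path changes nothing,
# so it cannot distinguish "search first, affiliate last" from the reverse.
from collections import Counter

def bag_of_touches_credit(path):
    """Split credit by touch frequency, ignoring the order of touches."""
    counts = Counter(path)
    total = sum(counts.values())
    return {channel: n / total for channel, n in counts.items()}

path_a = ["natural_search", "affiliate"]   # search touch precedes affiliate
path_b = ["affiliate", "natural_search"]   # same touches, reversed order

# Both orderings yield an identical 50/50 split.
assert bag_of_touches_credit(path_a) == bag_of_touches_credit(path_b)
```

Any model that wants to give the initiating touch extra weight has to see the sequence, which is exactly the signal this style of model throws away.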

Using our experience as marketers, our internal stats folks and I helped our stats consultants think through other signals in the data that the modeling needed to sniff out. They abandoned the Bayesian approach and came back with 8 different models that produced much more reasonable, intuitively appealing results, though those models still differ markedly from one another.

Example #2:

In this case, the user's first two visits to the site came through a single CSE ad. An hour and a half later, they came back to the site through a competitive (non-brand) paid search ad, and then they went digging for coupons through a few affiliate ads. They ended up going back to the CSE before placing the order.

Here are the ways the different models parsed credit between the ads:

Do any of those models strike you as "dead-on"? "Way off base?"
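The eight models themselves are proprietary, but the standard rule-based baselines they compete against can be sketched quickly. The touch path below is reconstructed loosely from the description of Example #2, and the three rules are generic industry conventions, not the post's models.

```python
# Three common rule-based attribution baselines applied to a touch path
# loosely matching Example #2: two CSE visits, a non-brand paid search
# click, a couple of affiliate clicks, and a final CSE visit.
def first_touch(path):
    """All credit to the touch that initiated contact."""
    return {path[0]: 1.0}

def last_touch(path):
    """All credit to the touch immediately before the order."""
    return {path[-1]: 1.0}

def linear(path):
    """Equal credit to every touch in the path."""
    share = 1.0 / len(path)
    credit = {}
    for channel in path:
        credit[channel] = credit.get(channel, 0.0) + share
    return credit

path = ["CSE", "CSE", "PPC non-brand", "affiliate", "affiliate", "CSE"]

for rule in (first_touch, last_touch, linear):
    print(rule.__name__, rule(path))
```

Note that first touch and last touch both hand everything to the CSE on this path, while a linear split gives the affiliate a third of the credit; the disagreements between "advanced" models are layered on top of exactly this kind of divergence.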

Let's see how those same models handled a different scenario:

Example #3:

In this case, affiliate ads actually initiated the contact in late April. The user then re-engaged through a competitive paid search ad before going back to their favorite coupon site to grab a discount. How would you parse the credit?

Here's what the models did:

Of course, the reality is that we can't know what actually motivated each individual. The best we can hope to do is develop modeling techniques that allow us to get close to the truth in aggregate. These are all highly advanced statistical approaches to the problem and they don't reach the same "answers." That they produce different answers doesn't surprise us, but may be startling to folks who don't live in the world of statistics.

This is the broader point. Whether we're talking about bidding algorithms, circulation modeling software, or attribution management, the fact that the statistics under the hood are "advanced" doesn't make them "right." Not all advanced models are equally predictive, and absent guidance from marketing experience and testing between modeling approaches, "advanced models" can stink. Indeed, we threw out many "advanced" models along the path to our current bidding algorithm.

Marketers absent mathematical know-how are stuck without the ability to manage large-scale, difficult problems. However, mathematicians lacking marketing acumen don't fare much better.

Finding the best solution requires both the math and the marketing intuition to make it work.

Buyer's Conundrum: How do you choose between service providers given that they all claim "Advanced Statistics" and none of them will let you look under the hood?

My observation is this: the folks who speak of their solution as a completed puzzle probably never even opened the box. Anyone who isn't still tinkering with their models probably never put much thought into them in the first place.

Talk to their clients about results, and hire the folks who produce great results but are still curious.

Billy Wolt says:
Good stuff George. Regardless of how you go about crediting the sale, in 2 of the 3 examples I am 100% sure the affiliate will want 100% credit for the sale, as it was the last click that led to the conversion. To make it more complex, there are other variables you could include, such as the keywords that led to the clicks on PPC and CSE... were the keywords for different products? Was the product sold related to any of the keywords searched? Back to the main point of the article, speaking to clients of the agency would be best, although I have noticed many agencies claiming the same clients. Perhaps they were a client at one point, but having the name on the site makes them look better.
Thanks Billy, You raise excellent points. Particularly for commissioned sales (aka revenue sharing) pricing models, advertisers have to be able to pay based on the "properly" parsed credit for this to have terrific value. Learning that you're overspending or underspending on a particular channel is helpful, but being able to fix it is sometimes a challenge. I think baking this understanding into the contracts with vendors for any rev share deal will become essential. We've often scratched our heads over the fact that other PPC agencies list some of our clients as their clients. Possible that they worked with them in the past, or on something else, but still strange. We're happy to have folks call our former clients as well as our current clients, but listing them on your marketing literature forever seems odd. George
Jim says:
As usual, an excellent post! The fact that the models gave different answers makes me think of the streetlight problem in research (i.e., we research what we can see even if it doesn't address the real issue). The data you are using is "ad views/clicks," which you then correlate with "credit for sales." That the models give such different recommendations would indicate to me that "ad views/clicks" does not correlate directly with "credit for sales." Some component is missing.
Thanks Jim, How much weight to give each factor in the modeling and which factors to include/exclude are decisions that impact the results for each case. By necessity, we have to go by feel in terms of which models yield results that more closely jibe with our marketing intuition. What's interesting is that the aggregated results for each of these 8 greatly improved models (as opposed to the first one that stunk) were fairly close together. That fact gave us confidence that we were looking at this the right way and that all of these approaches were preferable to first touch, or last touch, or some less well designed stats models.
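The sanity check George describes can be sketched in a few lines: aggregate each model's channel-level credit across all orders, then look at the spread between models. The model names, channels, and numbers below are invented purely for illustration; the point is the mechanics of the comparison, not the values.

```python
# Hedged sketch of comparing models by their aggregated channel credit.
# If the spread between models is small for every channel, the models
# agree in aggregate even when they disagree order by order.
model_aggregates = {           # invented numbers, for illustration only
    "model_1": {"ppc": 0.38, "cse": 0.22, "affiliate": 0.18, "seo": 0.22},
    "model_2": {"ppc": 0.41, "cse": 0.20, "affiliate": 0.16, "seo": 0.23},
    "model_3": {"ppc": 0.36, "cse": 0.24, "affiliate": 0.19, "seo": 0.21},
}

def channel_spread(aggregates, channel):
    """Max minus min share assigned to a channel across models."""
    shares = [agg[channel] for agg in aggregates.values()]
    return max(shares) - min(shares)

for channel in sorted({"ppc", "cse", "affiliate", "seo"}):
    print(f"{channel}: spread across models = "
          f"{channel_spread(model_aggregates, channel):.2f}")
```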
Sid says:
Hi George, Nice article, and I agree with a lot of what you said. I hear "advanced" and "algorithms" thrown around a lot. Anything can be an algorithm; long division is an algorithm. People also tend to forget that sometimes simpler solutions work just fine and in the long run are better. To use a car analogy: it's like comparing a Honda Civic with a Beamer. The Beamer might have better performance stats, but after 10 years the Civic gives far fewer problems :) I am curious to know if you have factored time into your models, i.e., whether you accounted for a touch point that occurred a month ago vs. a day ago? Sid
Hi Sid, As I'm sure you know, in the language of statistics, time is a "pain in the butt," technically speaking. Some of the models incorporated time factors and some did not. There were a couple of different ways to get at that notion: creating bins of time lags between touches, or between a given touch and the order, and looking for predictive differences based on those bins. We did find a couple of binning mechanisms that seemed to be of value. We also have the ability to overlay heuristic rules deprecating values based on time lags of various types. The question is whether baking time factors into the model produces "better" results than leaving the time factors out and overlaying heuristic rules. We haven't answered that question yet. George
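A rough sketch of the two ideas in that reply: binning the time lag between a touch and the order, and overlaying a heuristic rule that deprecates a touch's weight as the lag grows. The bin edges and the half-life below are assumptions for illustration, not values from the post.

```python
# Sketch of time-lag binning plus a heuristic time-decay overlay.
from datetime import datetime, timedelta

LAG_BINS = [                         # illustrative bin edges, not the post's
    (timedelta(hours=24), "same_day"),
    (timedelta(days=7), "same_week"),
    (timedelta(days=30), "same_month"),
]

def lag_bin(touch_time, order_time):
    """Label the lag between a touch and the order by coarse time bins."""
    lag = order_time - touch_time
    for edge, label in LAG_BINS:
        if lag <= edge:
            return label
    return "older"

def time_decay_weight(touch_time, order_time, half_life_days=7.0):
    """Heuristic overlay: halve a touch's weight every half_life_days."""
    lag_days = (order_time - touch_time).total_seconds() / 86400.0
    return 0.5 ** (lag_days / half_life_days)

order = datetime(2010, 6, 21)
print(lag_bin(order - timedelta(hours=2), order))     # a recent touch
print(lag_bin(order - timedelta(days=45), order))     # a stale touch
print(time_decay_weight(order - timedelta(days=7), order))
```

The open question in the reply maps directly onto this split: either feed the bin labels into the model as features, or train the model without time and multiply its output by a decay weight like the one above afterward.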
Dave says:
George, I go back and forth on the value of this type of analysis. I'm not sure to what degree a marketer can influence the aggregate channels visitors come to the site on, except to funnel more in from a first touch. The models you show are all directionally similar; are the marginal differences between them great enough to drive significantly different investment strategies? I see the value in this type of analysis as a means to cluster visitors based on their shopping intent or customer type: Brand Loyalist / Bargain Shopper / Occasional Buyer / Event Buyer / etc. Have you tried this type of analysis to help explain what types of buyers drive site traffic?
Hi Dave, The differences between the models in the second set are fairly small in aggregate, but the differences between any of these models and first or last touch are quite material. The differences are larger still if the first or last touch model gives credit to touches coming from brand search (paid or natural), which we think is an out-and-out mistake unless there are no other marketing touches involved. We treat competitive, non-brand natural search as a marketing channel. I agree that to have real value the system has to reveal actionable data. We think this does, in the sense that it can reveal marketing channels that are likely under-invested versus those that are likely over-invested. Acting on that data can lead to true top-line and bottom-line improvement. To this point, in every case affiliates have been grossly over-credited by last touch models. Identifying those affiliates who are more likely to be cannibals than others and tossing them out of the program throws money to the bottom line. Re-investing that money into sales-driving channels (paid and organic search, display, CSE) helps the top line.