Sep 22, 2009

# Recognizing Signal & Noise in Paid Search

In the paid search industry, metrics like Conversion Rate (CR) and Sales-per-Click (SPC) often vary. Sometimes there is a reason for this variation, and it’s part of our job to find it; other times, no reason exists. This unexplained variation is often referred to as “noise.” Noise sits on top of, and can sometimes cloud our view of, the true underlying pattern, or “signal.”

Take a simple example in which the signal (in this case, the CR) is set to 1% and we create mock click and order data. Since CR is the likelihood that a person who clicks on an ad will order something, we can use a random number generator to decide which of the pretend clicks turn into pretend orders.
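For readers who prefer code to spreadsheets, here is a minimal Python sketch of that mock-data idea. The function names and the `seed` parameter are illustrative assumptions, not part of the original spreadsheet; the only logic taken from the post is that each click becomes an order with probability equal to the true CR.

```python
import random


def simulate_observed_cr(clicks, true_cr, rng):
    """Simulate `clicks` pretend clicks and return the observed conversion rate.

    Each click turns into an order when a uniform random draw in [0, 1)
    falls below `true_cr`, the signal we chose in advance.
    """
    orders = sum(1 for _ in range(clicks) if rng.random() < true_cr)
    return orders / clicks


def run_batches(batches, clicks, true_cr, seed=None):
    """Repeat the simulation `batches` times and return every observed CR."""
    rng = random.Random(seed)  # a fixed seed makes a run repeatable
    return [simulate_observed_cr(clicks, true_cr, rng) for _ in range(batches)]


if __name__ == "__main__":
    # 10 iterations of 1000 clicks with a true CR of 1%
    for observed in run_batches(batches=10, clicks=1000, true_cr=0.01):
        print(f"{observed:.1%}")
```

Each run without a seed plays the same role as pressing refresh on a sheet of random numbers: the signal stays at 1%, but the observed CRs move around it.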

From one batch of random numbers, we see that 10 iterations of 1000 clicks yield observed CRs ranging from 0.5% to 1.4%, although we know the underlying rate to be 1%. These sample CRs deviate from the true value by up to 50%; that is, with noise, the observed CR can be anywhere from roughly 50% to 150% of its true value.

Repeating the simulation with a higher signal, say a CR of 5%, paints a similar picture, but with one key difference. After 10 iterations of 1000 clicks, observed CRs land anywhere between 3.6% and 6.4%, a wide range, just like in the first simulation. This time, however, the relative variation has shrunk: these sample CRs are between roughly 70% and 130% of the true CR. Thus, the higher the underlying value of the metric, whether it’s CR, CTR (click-through rate), or SPC, the less relative noise you should expect to see.
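One way to see why relative noise shrinks as the signal grows is to treat each click as an independent yes/no trial (a binomial model, which is an assumption of this sketch rather than something the post states). Under that model, the standard error of an observed rate is sqrt(p(1-p)/n), and dividing by p gives the noise relative to the signal. The function name below is hypothetical.

```python
import math


def relative_standard_error(true_cr, clicks):
    """Standard error of the observed CR, expressed as a fraction of the true CR.

    Assumes each click converts independently with probability `true_cr`.
    """
    standard_error = math.sqrt(true_cr * (1 - true_cr) / clicks)
    return standard_error / true_cr


if __name__ == "__main__":
    for cr in (0.01, 0.05):
        rse = relative_standard_error(cr, clicks=1000)
        print(f"true CR {cr:.0%}: typical relative noise of about {rse:.0%}")
```

At 1000 clicks this works out to roughly 31% for a 1% CR and roughly 14% for a 5% CR. The most extreme of 10 batches will usually sit further out than one standard error, which is in line with the 50% and 30% swings seen in the two simulations.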

Another twist on the simulation teaches us about the expected impact of traffic on variation. When we use random numbers to generate 10 iterations of only 100 clicks (mimicking keywords that receive lower traffic), the variation balloons quickly. This is what sampling theory predicts: the standard error of an observed rate shrinks in proportion to the square root of the sample size, and the famous Central Limit Theorem tells us that observed values cluster approximately normally around the true value as samples grow. For us, that means that traffic and noise (remember, unexplained variation) should be inversely related.
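The traffic effect can also be checked empirically. The sketch below simulates many batches at each traffic level and compares the spread of the observed CRs with the binomial standard error sqrt(p(1-p)/n). The function names, the default of 2000 batches, and the seed are assumptions chosen for illustration; the post itself uses only 10 iterations per run.

```python
import math
import random
import statistics


def observed_crs(batches, clicks, true_cr, rng):
    """Return one observed CR per batch of `clicks` simulated clicks."""
    return [
        sum(rng.random() < true_cr for _ in range(clicks)) / clicks
        for _ in range(batches)
    ]


def spread_by_traffic(true_cr, traffic_levels, batches=2000, seed=0):
    """Map each traffic level to (simulated spread, theoretical standard error)."""
    rng = random.Random(seed)
    result = {}
    for clicks in traffic_levels:
        crs = observed_crs(batches, clicks, true_cr, rng)
        simulated = statistics.pstdev(crs)  # spread actually seen in the mock data
        theoretical = math.sqrt(true_cr * (1 - true_cr) / clicks)
        result[clicks] = (simulated, theoretical)
    return result


if __name__ == "__main__":
    for clicks, (sim, theory) in spread_by_traffic(0.01, [100, 1000]).items():
        print(f"{clicks:>5} clicks: simulated {sim:.4f}, theoretical {theory:.4f}")
```

Because the standard error scales with one over the square root of the click count, cutting traffic from 1000 clicks to 100 should make the spread about 3.2 times wider, not 10 times.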

This concept of ‘signal vs. noise’ is helpful to understand and keep in mind. When analyzing, have an idea of what you think the signal is and, depending on that and the amount of traffic you’re seeing, gauge the expected amount of noise. You’ll then have a better idea of what kind of performance warrants investigation and what kind is just due to chance.
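As a rough illustration of gauging expected noise, the sketch below builds a band of about two standard errors around the assumed signal and checks whether an observed CR falls inside it. The function name, the `z=2.0` default, and the example numbers are all assumptions made for this sketch. It relies on a normal approximation that becomes unreliable when the expected number of orders is very small, so treat it as a first-pass filter rather than a formal test.

```python
import math


def is_within_expected_noise(observed_cr, expected_cr, clicks, z=2.0):
    """Return True if `observed_cr` lies within `z` standard errors of `expected_cr`.

    Assumes independent clicks (binomial model) and a normal approximation.
    """
    standard_error = math.sqrt(expected_cr * (1 - expected_cr) / clicks)
    return abs(observed_cr - expected_cr) <= z * standard_error


if __name__ == "__main__":
    # A 0.5% CR on 1000 clicks, when we believe the signal is 1%
    print(is_within_expected_noise(0.005, 0.01, clicks=1000))
    # A 0.2% CR on 10000 clicks against the same 1% signal
    print(is_within_expected_noise(0.002, 0.01, clicks=10000))
```

In the first case the shortfall is within the band and plausibly just chance; in the second, the extra traffic narrows the band enough that the drop looks worth investigating.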

For Some Hands-On Fun:

We've included an Excel spreadsheet, the Statistical Noise Maker, where you can explore the data and see the results I've mentioned above. The first tab contains the 1000-click simulation, and the second tab the 100-click simulation. To refresh the data (that is, to generate new random numbers), press F9 and watch the graphs adjust themselves. To change the signal CR, enter a new value in cell A2. Compare the look of the graphs and the amount of variation between different CRs, and between 1000 and 100 clicks.

Have fun!!

Jon says:
Awesome post. Thank you for sharing, and providing the spreadsheet for us to use!
says:
Well done, Jen, well done!
Jeff says:
A pseudo random number generator may mimic a real process (conversion rate, etc...) but it doesn't mean that conversion is truly a stochastic process, readily modeled. We just have limited knowledge of all initial and existing conditions which vary continually based upon the state of the visitor generating the values. Good post though, Jeff