Detecting Significant Changes In Your Data
For statisticians, significance is an essential but often routine concept. For those who don’t remember the details of college statistics courses, significance is a nebulous concept that lends magical credence to whatever data it describes. Sometimes you make a change in your paid search program, watch the data come in, and want to claim that numbers are improving because of your initiative.
How can you support this claim? Can you discredit the possibility that the apparent improvement is just noise? How can you apply that authoritative label of “significant”?
Here I’d like to walk you through a basic test of significance that you can use to de-mystify changes in your paid search data.
If you’d like to skip the math, click here.
Let’s start with a situational example… say you’ve added Google Site Links to your brand ads and you want to show that brand click-through rate (CTR) has improved as a result.
- First, you need to know what value brand CTR is potentially improving from. Let’s call this value mu (pronounced myoo), and you can choose it in a variety of ways: the average or median CTR over the past month, the average or median CTR from this time of year last year, etc. It should really be whatever value you believe CTR to truly center around.
- Next, you need data points.That is, you need several days of CTR data since the Site Links have been running. How many days is up to you. Generally, more is better, but I’ll touch on that later. The number of days you have is n. Take the average of the CTRs from those days; this is called xbar. Lastly, take the standard deviation (excel function stdev) of these CTRs and call it s.
- Now we can compute a t-score, and with it, the probability that the change in CTR you’re seeing is or isn’t attributable to chance. Set t = |xbar – mu| / (s/squareroot(n)). Then use the function tdist in excel, and for the arguments, plug in t, n-1, and 1. The number that this function returns is the probability that the change in CTR is simply due to chance, aka noise. If this probability is very small, then we say CTR has changed significantly.
I’ve prepared an excel spreadsheet that handles the arithmetic. In this model, change the gray shaded cells to reflect your data. Enter the data that you think has fundamentally changed in column C. Only include data points since the change began. Then, in cell G2, enter the value from which you believe the data to have changed. That is, the average value of the data before the change.
The value p, produced in cell G7, is the probability that the change you’re seeing is only due to chance, and thus meaningless. Typically, a p-level must be below 5% to be considered significant. (If you want to be super, super sure, you can use 1% or 0.1% instead.) In other words, if your p-value is 5% or less, you can confidently say that the change in your data is real, definite, and due to something other than statistical noise. It’s a pretty safe bet that whatever initiative you took – whether it was switching landing pages, altering ad copy, or refining your bidding – was the catalyst for the improvement instead.
Allow me to fill in the spreadsheet with an example. For an imaginary online retailer, brand CTR hovers around 4.4%, so I fill in cell G2 with the value 4.4. The retailer enables Google Site Links, and CTRs for the 3 days afterward are 4.3, 5.2, and 5. So I enter those three data points into column C. And voila… the p-level comes back as 12.66%. This says that there is a 12.66% chance that the rise in CTR was due only to noise.
Not significant. Sorry, click-through-rates haven’t really increased, or at least, we can’t be very confident that the observed change is anything more than random noise.
But… three days is not much data. As smart analysts, we are cautious when examining trends over only a few days, and this significance test incorporates such wisdom. As the number of data points (n) you use increases, p-levels fall. For example, if all the numbers in the above example were the same except that you used 7 days instead of 3 (so n=7), the corresponding probability drops to 2.6%. In this instance, it’s very unlikely (2.6% unlikely) that the increase in CTR was due to noise, so here you can rather confidently say, “Yes, CTR has increased, and it wasn’t due to chance. It was probably due to the site links.”