Swings between A/B in experiments

Question

Hi RevCat team and community,

Would love some guidance around LTV calculations in the Experiments feature. We’ve been price testing two price points for 2+ months now, and every day we check Experiments the result is different, with wild swings between variants A and B. I would have thought that after two monthly renewal cycles we would start to see a trend or more signal, but we get a different recommendation each day.

Is there a recommended time period to run an experiment? Or is there someone on the RevCat team who can help us analyze the results or set up our experiment differently?

Thanks - love the idea behind the feature - just looking for the best way to use it.

Cheers

sharif · Accepted Answer

Please bear with us while Experiments is in beta! While this could be an issue in the feature, there may be other causes for constantly shifting probabilities.

There are two characteristics to look out for in your experiment’s results: uncertainty and tossups. High uncertainty and close tossups can cause the probabilities to shift drastically over time as more data is gathered and when changing LTV periods.
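
To make the tossup effect concrete, here is a small simulation. It is not RevenueCat’s model: as a simplifying assumption it uses a paid-conversion rate as a stand-in for LTV, Beta(1, 1) priors, and made-up traffic (`users_per_day`, `days`, and the true rates are all hypothetical). At the end of each simulated day it records a Monte Carlo estimate of the probability that variant A beats variant B.

```python
import random

def prob_a_beats_b(paid_a, n_a, paid_b, n_b, rng, draws=2000):
    """Monte Carlo estimate of P(rate_A > rate_B) under Beta(1, 1) priors."""
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + paid_a, 1 + n_a - paid_a)
        rate_b = rng.betavariate(1 + paid_b, 1 + n_b - paid_b)
        if rate_a > rate_b:
            wins += 1
    return wins / draws

def daily_probabilities(true_a, true_b, users_per_day=50, days=60, seed=1):
    """Simulate an A/B test and record P(A beats B) at the end of each day."""
    rng = random.Random(seed)
    paid_a = paid_b = n_a = n_b = 0
    history = []
    for _ in range(days):
        for _ in range(users_per_day):
            # Each new user in each variant converts with that variant's true (hypothetical) rate.
            n_a += 1
            n_b += 1
            if rng.random() < true_a:
                paid_a += 1
            if rng.random() < true_b:
                paid_b += 1
        history.append(prob_a_beats_b(paid_a, n_a, paid_b, n_b, rng))
    return history

tossup = daily_probabilities(0.05, 0.05)  # identical variants
clear = daily_probabilities(0.08, 0.04)   # clearly different variants
print([round(p, 2) for p in tossup[::10]])
print([round(p, 2) for p in clear[::10]])
```

With identical variants the daily probability typically keeps drifting instead of settling on a winner, while the clearly different pair climbs toward 1 and stays there. Re-running with a different `seed` shows how much the tossup curve can move.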

When the model is uncertain of the LTV, it means that the model is unable to use the data to make a good prediction. One great chart to look at for this is the box and whisker plot. If the whiskers are very long, that’s an indication of high uncertainty, which can cause the needle on the LTV gauge to swing between the two extremes as small amounts of data tip the scales. Look for the time period with the shortest whiskers - that’s the time period the model is most certain about, and the one to focus your LTV gauge on.
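
A rough sketch of picking the period with the shortest whiskers, assuming you have samples of predicted LTV for each period. The sample lists and period labels below are hypothetical, not data exported from RevenueCat. Whiskers follow the common Tukey convention (1.5 × IQR), which may differ from how the dashboard draws them. One design choice to flag: the ranking divides whisker length by the median, because longer periods have larger LTVs in dollar terms and would otherwise always look less certain.

```python
import random
import statistics

def whisker_span(samples):
    """Distance between Tukey-style whiskers: the most extreme samples within 1.5 * IQR of the box."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    low = min(x for x in samples if x >= q1 - 1.5 * iqr)
    high = max(x for x in samples if x <= q3 + 1.5 * iqr)
    return high - low

def most_certain_period(samples_by_period):
    """Return the period whose whiskers are shortest relative to its median LTV."""
    return min(
        samples_by_period,
        key=lambda period: whisker_span(samples_by_period[period])
        / statistics.median(samples_by_period[period]),
    )

# Hypothetical predicted-LTV samples per period.
rng = random.Random(7)
samples_by_period = {
    "7 days": [rng.gauss(4.0, 1.0) for _ in range(500)],
    "1 month": [rng.gauss(9.0, 0.9) for _ in range(500)],
    "1 year": [rng.gauss(40.0, 12.0) for _ in range(500)],
}
for period, samples in samples_by_period.items():
    print(period, round(whisker_span(samples), 2))
print("Most certain:", most_certain_period(samples_by_period))
```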

On the other hand, a tossup can occur even when you have good quality data. The model has enough data to make a prediction, but the predicted LTVs of the two offerings are so close that it can’t choose a clear winner. Small amounts of data can also tip the scales here and cause seemingly large swings in the LTV prediction. Going back to the box and whisker plot, if two boxes overlap significantly, then the LTV for that time period is a tossup. Look for the time period with the least overlap and focus your LTV gauge on that time period.
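
The two tossup signals can also be sketched numerically: the share of paired draws in which A’s LTV beats B’s, and how much the two interquartile boxes overlap. The LTV draws here are hypothetical normal samples, and `win_probability` and `box_overlap` are illustrative helpers, not part of any RevenueCat API.

```python
import random
import statistics

def win_probability(ltv_a, ltv_b):
    """Share of paired (independent) draws in which variant A's LTV exceeds variant B's."""
    return sum(a > b for a, b in zip(ltv_a, ltv_b)) / len(ltv_a)

def box_overlap(ltv_a, ltv_b):
    """Overlap of the two interquartile boxes as a fraction of the narrower box.

    0.0 means the boxes are disjoint; 1.0 means the narrower box lies entirely inside the other.
    """
    a_q1, _, a_q3 = statistics.quantiles(ltv_a, n=4)
    b_q1, _, b_q3 = statistics.quantiles(ltv_b, n=4)
    overlap = max(0.0, min(a_q3, b_q3) - max(a_q1, b_q1))
    return overlap / min(a_q3 - a_q1, b_q3 - b_q1)

# Hypothetical LTV draws for each variant.
rng = random.Random(42)
ltv_a = [rng.gauss(10.0, 2.0) for _ in range(5000)]
ltv_b_close = [rng.gauss(10.2, 2.0) for _ in range(5000)]  # near-identical LTV: a tossup
ltv_b_far = [rng.gauss(16.0, 2.0) for _ in range(5000)]    # clearly higher LTV

print(round(win_probability(ltv_a, ltv_b_close), 2), round(box_overlap(ltv_a, ltv_b_close), 2))
print(round(win_probability(ltv_a, ltv_b_far), 2), round(box_overlap(ltv_a, ltv_b_far), 2))
```

For the near-identical pair the win probability lands near 0.5 and the boxes overlap almost completely, so a handful of new purchases can push it to either side. For the clearly different pair the boxes are disjoint and A’s win probability sits near 0, meaning B is the clear winner.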

It’s difficult to explain exactly why your LTV prediction is changing drastically without actually looking through your data. If you feel like this is a bug in the model, please feel free to open a support ticket and we’ll look into it. However, here are some possible causes of uncertainty and tossups and how to fix them:

- The two variants are too similar. This makes it more difficult for the model to choose a clear winner, resulting in a tossup. Making them more different (in price, duration, intro period, etc.) will produce larger differences that the model can pick up on.
- Try focusing on just one time period. Use the box and whisker plot and the line chart to find the time periods with the least uncertainty, and follow those over time.
- Depending on which products you’re using in your experiment, the underlying data could have changed drastically over its two months. For example, it’s more difficult to predict the LTV of an annual subscription: annual subscriptions renew infrequently and may not be purchased as often as monthly subscriptions, which would have renewed at least once during the experiment’s lifetime. The uncertainty in the LTV of annual subscriptions can throw the model off. A similar case is when trial periods are long, for example 3-month or 12-month trials. The box and whisker plot is great for this case, as it’ll tell you whether the model is much more uncertain about the LTV of one variant than the other. There’s not much to fix here other than running your experiment even longer to gather more data.
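
To illustrate why annual plans are harder to predict early on, this sketch compares the width of a 90% credible interval for the renewal rate of monthly versus annual subscribers two months into an experiment. The counts are hypothetical, and the Beta(1, 1) prior and Monte Carlo interval are simplifying assumptions, not a description of RevenueCat’s model. With no annual renewals observed yet, the interval is just the prior, covering almost the whole 0 to 1 range.

```python
import random

def interval_width(renewed, opportunities, level=0.9, draws=20000, seed=0):
    """Width of a Monte Carlo credible interval for a renewal rate under a Beta(1, 1) prior."""
    rng = random.Random(seed)
    samples = sorted(
        rng.betavariate(1 + renewed, 1 + opportunities - renewed) for _ in range(draws)
    )
    tail = (1 - level) / 2
    low = samples[int(tail * draws)]
    high = samples[int((1 - tail) * draws) - 1]
    return high - low

# Hypothetical two-month snapshot: monthly subscribers have had renewal chances, annual ones have not.
monthly = interval_width(renewed=280, opportunities=400)
annual = interval_width(renewed=0, opportunities=0)
print(round(monthly, 3), round(annual, 3))
```

Only more time, and therefore more renewal opportunities, narrows the annual interval, which is why running the experiment longer is the main remedy here.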
