One of the best tools in the data scientist’s toolkit is the ability to build predictive models that allow companies to act *before* something occurs. For example, you might want to predict when a customer is going to churn so that you can intervene (e.g., give them a discount or customer service call) before they leave you forever. Or you might want to know when a component in your factory is likely to fail so that you can replace it before it fails and halts production.

But understanding the profitability of a predictive model underlying a strategy like those above is difficult. A model-based intervention strategy's Return on Investment (ROI) depends heavily on context: you need to understand your model's performance, the number and nature of the events you are trying to predict, and the profits/costs associated with your intervention's hits/misses. It also requires that you set the sensitivity or *threshold* of your predictive model: how confident should it be in a predicted outcome before you intervene?

Moreover, model ROIs can often be counterintuitive. For example, in cases where the base rate of a predicted event is relatively high (e.g., 70%), it's often not worth caring about predictive model accuracy — you would need a near-perfect model in order to meaningfully outperform a strategy that simply intervenes every time. Conversely, if the base rate of a predicted event is low, then even small gains in model performance can matter to the bottom line.
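To make the high-base-rate case concrete, here is a minimal sketch with entirely hypothetical numbers (1,000 customers, a 70% churn base rate, $100 earned per successful intervention, $20 cost per intervention, and an assumed model operating point of 80% recall at 60% precision). Even a model that sounds reasonably good can lose to blanket intervention at this base rate:

```python
# Hypothetical numbers: 1,000 customers, 70% base churn rate,
# each successful intervention earns $100, each intervention costs $20.
n, base_rate = 1000, 0.70
profit_per_hit, cost_per_intervention = 100, 20

# Strategy 1: intervene on everyone (no model needed).
blanket_profit = n * base_rate * profit_per_hit - n * cost_per_intervention
# 1000 * 0.7 * 100 - 1000 * 20 = 50,000

# Strategy 2: a model that catches 80% of churners (recall)
# at 60% precision (an assumed, illustrative operating point).
recall, precision = 0.80, 0.60
hits = n * base_rate * recall            # 560 churners correctly flagged
flagged = hits / precision               # ~933 total interventions made
model_profit = hits * profit_per_hit - flagged * cost_per_intervention
# 560 * 100 - 933.3 * 20 ≈ 37,333 — worse than intervening on everyone

print(blanket_profit, model_profit)
```

With a low base rate (say 7% instead of 70%), the same arithmetic flips: blanket intervention wastes money on the many customers who were never going to churn, and the model's targeting pays off.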

We built the calculator below to estimate the ROI for a predictive model-based intervention strategy based on just a few simple, configurable parameters. It also demonstrates how your decision about predictive model threshold is critical to this estimation. (To learn more about how this model works, read below.)

## How does this calculator work?

This calculator estimates the profits and costs associated with implementing a predictive model based on each of the parameters you indicated above. To do so, it must make some assumptions. One major assumption concerns the predictions generated by the model: namely, that they are normally distributed in two classes (positive predictions, negative predictions) and that the Receiver Operating Characteristic (ROC) curve of the model is symmetric. Although these assumptions are reasonable, they mean that models with different properties may have different ROIs, despite being matched on model performance, when performance is defined as the area under the ROC curve (the AUC).
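A sketch of what that assumption buys you, using the standard equal-variance binormal model (negative-class scores ~ N(0, 1), positive-class scores ~ N(d, 1), which yields a symmetric ROC with AUC = Φ(d/√2)). Note this is an illustration of the general modeling approach, not the calculator's exact implementation; the function names are our own:

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def binormal_rates(auc, threshold):
    """Under the equal-variance binormal model (negatives ~ N(0,1),
    positives ~ N(d,1)), AUC = Phi(d / sqrt(2)). Recover the class
    separation d from the AUC, then return the (TPR, FPR) you would
    observe when intervening on every score above `threshold`."""
    # Invert AUC = Phi(d / sqrt(2)) by bisection (it is monotone in d).
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if norm_cdf(mid / sqrt(2)) < auc:
            lo = mid
        else:
            hi = mid
    d = (lo + hi) / 2
    tpr = 1 - norm_cdf(threshold - d)  # sensitivity at this cutoff
    fpr = 1 - norm_cdf(threshold)      # false-alarm rate at this cutoff
    return tpr, fpr
```

Sweeping `threshold` traces out the full ROC curve for a given AUC, which is exactly the knob the calculator exposes: a single AUC plus a threshold choice pins down the hit and false-alarm rates that drive the ROI.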

## What if I don’t know how much an intervention earns or costs my business?

This is very common if you are trying to predict something noisy like customer retention and lifetime value (LTV). To estimate the earnings and costs associated with an intervention, you’ll need to run an experiment.

You could, for example, assign half of your customers to receive an intervention (e.g., a coupon, a call from their account rep) and the other half to receive no intervention. Next, analyze the effects of the intervention by comparing these groups. If you are interested in LTV and retention, you might want to use survival modeling to see whether the churn rate of your intervention group is lower than that of your control group. If you know how much your customers spend, this churn rate difference can be converted into an increase in LTV (i.e., profit!).
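The conversion from a churn-rate difference to an LTV difference can be sketched with a simple geometric-lifetime model. All numbers below are hypothetical experiment results, and the constant-churn assumption is a simplification (a full survival model would relax it):

```python
# Hypothetical experiment results: monthly churn in control vs. intervention.
monthly_margin = 30.0   # average profit per customer per month (assumed)
churn_control = 0.08    # 8% of control customers churn each month
churn_treated = 0.06    # 6% churn with the intervention

# With a constant monthly churn rate, expected customer lifetime is
# 1 / churn months, so LTV = monthly_margin / churn.
ltv_control = monthly_margin / churn_control   # $375
ltv_treated = monthly_margin / churn_treated   # $500
uplift_per_customer = ltv_treated - ltv_control  # $125

print(uplift_per_customer)
```

Comparing that per-customer uplift against the cost of the intervention tells you whether the intervention pays for itself before you ever bring a model into the picture.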

We recommend that our clients implement at least a preliminary predictive model before running the experiment. This allows them to assess how much revenue an intervention generates *depending on the predicted probabilities*, which in turn helps optimize the decision about *whom* to intervene on so as to maximize profitability.

## Does this only apply to customers?

No. We use customer-focused language in this calculator so that it is concrete. However, the same basic principles would apply if you were attempting to calculate the ROI of a model predicting widgets breaking in a factory, breaches of a secure network, or anything else. You just need to know (1) how many events are possible, (2) the base rate of the event of interest, and (3) the costs/profits associated with intervening.
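Those three inputs, combined with the model's operating point at your chosen threshold, are all that an expected-ROI computation needs. A minimal domain-agnostic sketch (the function and its signature are ours, not the calculator's internals):

```python
def intervention_roi(n_events, base_rate, tpr, fpr,
                     profit_per_hit, cost_per_intervention):
    """Expected ROI of a model-based intervention strategy, given:
    (1) how many events are possible, (2) the base rate of the event,
    (3) the profit/cost of intervening, plus the model's true- and
    false-positive rates at the chosen threshold."""
    positives = n_events * base_rate
    negatives = n_events - positives
    hits = positives * tpr              # events we correctly act on
    false_alarms = negatives * fpr      # interventions that were wasted
    interventions = hits + false_alarms
    profit = hits * profit_per_hit
    cost = interventions * cost_per_intervention
    return (profit - cost) / cost if cost else float("inf")

# Illustrative: 1,000 widgets, 10% failure base rate, a model operating
# at 80% TPR / 10% FPR, $100 saved per caught failure, $20 per replacement.
print(intervention_roi(1000, 0.10, 0.80, 0.10, 100, 20))
```

Whether the "events" are customers churning, widgets breaking, or network breaches, only the labels change; the arithmetic is identical.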

## Why did you choose AUC as a model performance metric?

AUC is a standard metric used to evaluate binary models predicting either positive or negative outcomes, 1’s or 0’s. We use it here because it’s a “threshold-free” evaluation method; that is, one can calculate AUC without ever committing to a probability threshold over which someone is considered a positive prediction. Instead, AUC concerns itself only with the raw probabilities being predicted, as it is equal to the probability that a random positive outcome sample from your data will be attributed a higher probability of being positive than a random negative outcome sample. This gives us a sense for the model’s performance or accuracy without having to turn those probabilities into discrete, binary outcomes.
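That pairwise-probability definition of AUC translates directly into code. This brute-force version is a sketch for intuition (production code would use a sorted/rank-based method, which gives the same answer faster):

```python
def auc_by_pairs(scores, labels):
    """AUC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example.
    Ties count as half a win. Labels are 1 (positive) or 0 (negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Three of the four positive/negative pairs are ranked correctly → 0.75.
print(auc_by_pairs([0.9, 0.4, 0.6, 0.3], [1, 1, 0, 0]))
```

Note that no threshold appears anywhere in the computation: only the relative ordering of the scores matters, which is exactly what makes AUC "threshold-free."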

One of the biggest issues with AUC is that it’s difficult for non-statisticians to interpret. And that is why we built this calculator!